How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 7.250

Self‐organizing maps for latent semantic analysis of free‐form text in support of public policy analysis

The huge amount of free‐form unstructured text in the blogosphere, its increasing rate of production, and its shrinking window of relevance present serious challenges to the public policy analyst who seeks to take public opinion into account. Most of the tools that address this problem use XML tagging and other Web 3.0 approaches, which do not address the actual content of blog posts and the associated commentary. We give a tutorial review of latent semantic analysis and self‐organizing maps, as considered in this context, and show how to apply the self‐organizing map over a probabilistic latent semantic space to the completely unsupervised clustering of unstructured text, in a way that is entirely independent of spelling, grammar, and even source language. This provides an algorithm suitable for clustering free‐form commentary with a well‐structured test environment. Because academic paper abstracts have a known ground truth, the algorithm is applied to them instead, treating them as unstructured text as though they were blog posts. The algorithm constructs a word category map and a document map in which words with similar meaning and documents with similar content are clustered together.

WIREs Data Mining Knowl Discov 2014, 4:71–86. doi: 10.1002/widm.1112

This article is categorized under:
Algorithmic Development > Web Mining
Application Areas > Government and Public Sector
Technologies > Structure Discovery and Clustering

Geometry of updating the BMU and its neighbors toward the input sample marked with X. The solid and dashed lines correspond to the situation before and after updating, and show the effective range of the neighborhood function. (Reprinted with permission from Ref  under GNU General Public License. Copyright 2000 Helsinki University of Technology)
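
The update described in this caption can be illustrated with a minimal sketch of a single self‐organizing map training step. This is not the article's implementation: the rectangular grid, the Gaussian neighborhood function, and the names best_matching_unit, som_update, learning_rate, and sigma are all assumptions chosen for illustration. The sketch finds the best matching unit (BMU) for one input sample and pulls the BMU and its grid neighbors toward that sample, with the pull weakening as grid distance from the BMU grows.

```python
import math


def best_matching_unit(weights, x):
    """Return the (row, col) of the unit whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # Squared Euclidean distance in input space.
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


def som_update(weights, x, learning_rate=0.5, sigma=1.0):
    """Move the BMU and its grid neighbors toward x, in place.

    Assumption: a Gaussian neighborhood over squared grid distance;
    other neighborhood functions are equally possible.
    """
    br, bc = best_matching_unit(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-grid_dist2 / (2.0 * sigma ** 2))
            for i in range(len(w)):
                w[i] += learning_rate * h * (x[i] - w[i])
    return (br, bc)


# Hypothetical 2x2 map with 2-dimensional weight vectors.
grid = [[[0.0, 0.0], [1.0, 0.0]],
        [[0.0, 1.0], [1.0, 1.0]]]
bmu = som_update(grid, [0.9, 0.1])
```

In a full training run this step would be repeated over many samples while learning_rate and sigma shrink, so that the map first orders itself globally and then fine‐tunes locally. In the setting of the abstract, each input sample would be a word or document vector in the latent semantic space.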
Document map, clustered into 20 clusters.
Word category map, clustered into 20 clusters.

