How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 2.111

Self‐organizing maps for latent semantic analysis of free‐form text in support of public policy analysis


The huge amount of free‐form unstructured text in the blogosphere, its increasing rate of production, and its shrinking window of relevance present serious challenges to the public policy analyst who seeks to take public opinion into account. Most of the tools that address this problem use XML tagging and other Web 3.0 approaches, which do not address the actual content of blog posts and the associated commentary. We give a tutorial review of latent semantic analysis and self‐organizing maps, as considered in this context, and show how to apply the self‐organizing map over a probabilistic latent semantic space to the problem of completely unsupervised clustering of unstructured text in such a way as to be entirely independent of spelling, grammar, and even source language. This provides an algorithm suitable for clustering free‐form commentary, together with a well‐structured test environment. Rather than to blog posts, the algorithm is applied to academic paper abstracts, treated as unstructured text as though they were blog posts, because this set of documents has a known ground truth. The algorithm constructs a word category map and a document map in which words with similar meaning and documents with similar content are clustered together. WIREs Data Mining Knowl Discov 2014, 4:71–86. doi: 10.1002/widm.1112

Conflict of interest: The authors have declared no conflicts of interest for this article.

Geometry of updating the best‐matching unit (BMU) and its neighbors toward the input sample marked with X. The solid and dashed lines correspond to the situation before and after updating, respectively, and show the effective range of the neighborhood function. (Reprinted with permission from Ref  under GNU General Public License. Copyright 2000 Helsinki University of Technology)
Document map, clustered into 20 clusters.
Word category map, clustered into 20 clusters.
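
The update described in the first figure caption, in which the BMU and its neighbors move toward an input sample, can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian neighborhood function, the parameter names `learning_rate` and `sigma`, and the toy three-node map are assumptions chosen only to show the general shape of one self‐organizing map update step.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def som_update(weights, grid, x, learning_rate=0.5, sigma=1.0):
    """One SOM update step (illustrative sketch, assumed Gaussian neighborhood).

    weights: list of weight vectors, one per map node
    grid:    list of node coordinates on the map lattice
    x:       the input sample
    Returns the BMU index and the updated weight vectors.
    """
    # Best-matching unit: the node whose weight vector is closest to x.
    bmu = min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))

    updated = []
    for i, w in enumerate(weights):
        # Neighborhood strength decays with lattice distance from the BMU.
        h = math.exp(-sq_dist(grid[i], grid[bmu]) / (2.0 * sigma ** 2))
        # Move each node toward x in proportion to its neighborhood strength.
        updated.append([wi + learning_rate * h * (xi - wi) for wi, xi in zip(w, x)])
    return bmu, updated


# Toy 1-D map of three nodes with 2-D weight vectors (hypothetical values).
grid = [(0,), (1,), (2,)]
weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
x = [2.0, 3.0]
bmu, new_weights = som_update(weights, grid, x)
```

In this toy run the node nearest the sample is selected as the BMU and moves the furthest toward it, while nodes further away on the lattice move progressively less, which is the behavior the solid and dashed lines in the figure depict.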

Browse by Topic

Technologies > Structure Discovery and Clustering
Algorithmic Development > Web Mining
Application Areas > Government and Public Sector
