How to cite this WIREs title:
WIREs Cogn Sci
Impact Factor: 3.476

Word maturity indices with latent semantic analysis: why, when, and where is Procrustes rotation applied?



This paper describes and explains a computational methodology for modeling the semantic development of word representations: word maturity (WM). The methodology builds on the longitudinal word monitoring developed by Kireyev and Landauer, which uses latent semantic analysis (LSA) to represent lexical units. The paper is divided into two parts. First, the steps required to model the development of word meaning are explained in detail, covering the technical and theoretical aspects of each step. Second, a simple example application of the methodology is provided, using tools that are accessible to applied researchers. The paper can serve as a user‐friendly guide for researchers interested in modeling changes in the semantic representations of words. Some current aspects of the technique and future directions are also discussed.

WIREs Cogn Sci 2018, 9:e1457. doi: 10.1002/wcs.1457

This article is categorized under:
Computer Science > Natural Language Processing
Linguistics > Language Acquisition
Psychology > Development and Aging
A three‐dimensional space (a room) gives a simplified picture of the semantic space; latent semantic analysis (LSA) usually represents words with about 300 dimensions.
The x axis represents the different developmental stages or the reader's age (simulated reading experience). The y axis represents the maturity level of the word at each stage by means of its similarity to the adult meaning, expressed as the cosine between the word vector at each stage and the word vector in the adult stage.
Use of the MS Excel solver module. Sample of the solver module form in MS Excel 2013. The goal is to minimize the value of cell F12, which holds the sum of squared differences between each observed WM point and the corresponding point predicted by the function. The best fit is the one that makes this value minimal. The solver finds the minimum of F12 by trying out different values of I4 and I5, which hold the ‘a’ and ‘b’ parameters of the logistic function.
Use of the MS Excel solver module. Calculation of the parameters ‘a’ and ‘b’ that best fit the distribution of the five WM scores for ‘botella’ (‘bottle’), and estimated values for ages 0–21 obtained from the logistic function with these parameters. If WM scores are available for all ages, the ages are specified in column C and the WM scores are placed in column D.
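The fitting procedure described in the two solver captions can be sketched outside Excel. The snippet below is only an illustration under stated assumptions: the captions do not give the exact form of the logistic function, so the parameterization here (‘a’ as the age at which WM reaches 0.5, ‘b’ as the growth rate) is assumed; the coarse-to-fine grid search is a crude stand-in for the solver, not its actual algorithm; and the five ages and WM scores are synthetic values generated for the example, not the ‘botella’ data.

```python
import math

def logistic(age, a, b):
    # Assumed parameterization (not taken from the article):
    # a = age at which WM reaches 0.5, b = growth rate.
    return 1.0 / (1.0 + math.exp(-b * (age - a)))

def sse(params, ages, scores):
    # Sum of squared differences between observed and predicted WM
    # (the quantity held in cell F12 in the spreadsheet description).
    a, b = params
    return sum((logistic(t, a, b) - y) ** 2 for t, y in zip(ages, scores))

def fit_logistic(ages, scores, a_range=(0.0, 21.0), b_range=(0.01, 3.0),
                 steps=40, rounds=6):
    # Coarse-to-fine grid search: try a grid of (a, b) values, keep the best,
    # then search again on a finer grid around it.
    (a_lo, a_hi), (b_lo, b_hi) = a_range, b_range
    best = None
    for _ in range(rounds):
        da = (a_hi - a_lo) / steps
        db = (b_hi - b_lo) / steps
        grid_a = [a_lo + i * da for i in range(steps + 1)]
        grid_b = [b_lo + j * db for j in range(steps + 1)]
        best = min(((a, b) for a in grid_a for b in grid_b),
                   key=lambda p: sse(p, ages, scores))
        # Shrink the search window to two grid steps around the current best.
        a_lo, a_hi = best[0] - 2 * da, best[0] + 2 * da
        b_lo, b_hi = best[1] - 2 * db, best[1] + 2 * db
    return best

# Synthetic example: five hypothetical stage ages with WM scores generated
# from known parameters, so the fit can be checked against them.
ages = [3, 6, 9, 12, 18]
scores = [logistic(t, 6.0, 0.5) for t in ages]

a_hat, b_hat = fit_logistic(ages, scores)
estimated_wm = [logistic(t, a_hat, b_hat) for t in range(22)]  # ages 0-21
```

With real WM scores the minimum of the squared error will not be zero, and a dedicated optimizer would normally replace the grid search; the search ranges for ‘a’ and ‘b’ are also assumptions that may need adjusting.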
Alignment by means of the Procrustes rotation with Gallito Studio.
WM distribution of the words dog, turkey, predator, and focal. (Adapted from Ref ).
(a) The centering, scaling, and rotation matrix Q previously obtained from the figure alignment are applied to the entire old map. (b) Because words are assumed to be in continuous change, no word can be selected as a ‘pivot’; instead, the paragraphs common to each intermediate age and the adult stage are used. The rotation matrix Q that aligns the two paragraph matrices is also used to align the word matrices. The cosine between the aligned vector dog’ and the adult vector dog, Cosine(dog’, dog), gives the WM of ‘dog.’
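The alignment in this caption can be illustrated with a small NumPy sketch. It is a minimal example under stated assumptions, not the Gallito Studio implementation: the function names are hypothetical, the rotation is obtained with the standard SVD solution to the orthogonal Procrustes problem, and the three‐dimensional "spaces" are synthetic (the earlier‐stage space is built as a rotated, scaled, and shifted copy of the adult one so that the result can be checked).

```python
import numpy as np

def fit_procrustes(source, target):
    # Centering: remove each figure's centroid.
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    S, T = source - mu_s, target - mu_t
    # Rotation: Q = U V^T from the SVD of S^T T minimizes ||scale * S Q - T||.
    U, sigma, Vt = np.linalg.svd(S.T @ T)
    Q = U @ Vt
    # Scaling: optimal uniform scale factor for that rotation.
    scale = sigma.sum() / (S ** 2).sum()
    return mu_s, mu_t, scale, Q

def apply_procrustes(points, transform):
    # Apply the centering, scaling, and rotation learned from the pivots.
    mu_s, mu_t, scale, Q = transform
    return scale * (points - mu_s) @ Q + mu_t

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Synthetic data (made up for illustration): adult-space vectors for 20
# common paragraphs and for one word.
rng = np.random.default_rng(0)
adult_paragraphs = rng.normal(size=(20, 3))
adult_dog = rng.normal(size=3)

# Build an earlier-stage space as a hidden rotation, scaling, and shift of
# the adult space, so its dimensions do not match the adult dimensions.
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))

def to_child_space(x):
    return 0.5 * x @ hidden_rotation + 1.0

child_paragraphs = to_child_space(adult_paragraphs)
child_dog = to_child_space(adult_dog)

# Learn the alignment from the common paragraphs, then reuse it for the word.
transform = fit_procrustes(child_paragraphs, adult_paragraphs)
wm_dog = cosine(apply_procrustes(child_dog, transform), adult_dog)
wm_unaligned = cosine(child_dog, adult_dog)  # not meaningful without alignment
```

In this constructed case the word has not changed between stages, so the aligned cosine is 1 up to rounding, while the unaligned cosine is arbitrary. With real LSA spaces the paragraph figures would not match exactly, and the aligned cosine would reflect how far the word's meaning still is from the adult one.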
(a) Centering the figures, (b) scaling them, and (c) finding a rotation.
Representation of towns on a Roman map (a), and representation of towns on a current map (b). Towns that already existed at the time of the Roman occupation (black dots) are used as ‘pivots’ (i.e., invariants) to place towns that no longer exist (Lumintium and Brigantium) on the current map. The methodology starts by aligning the figures formed by the pivots.
D1, D2, and D3 represent the latent dimensions of the space for the first developmental stage. D1’, D2’, and D3’ represent the latent dimensions of a later developmental stage, for example, the adult language stage. D1, D2, and D3 do not have the same meaning as D1’, D2’, and D3’, so words in the two spaces are not directly comparable.
Two semantic spaces representing two ages. The corpora used to create the spaces are cumulative: the stage 2 corpus encompasses both the newly added texts and the texts from stage 1.
