WIREs Comp Stat

Principal component analysis

Principal component analysis (PCA) is a multivariate technique that analyzes a data table in which observations are described by several inter‐correlated quantitative dependent variables. Its goal is to extract the important information from the table, to represent it as a set of new orthogonal variables called principal components, and to display the pattern of similarity of the observations and of the variables as points in maps. The quality of the PCA model can be evaluated using resampling and cross‐validation techniques such as the bootstrap and the jackknife. PCA can be generalized as correspondence analysis (CA) in order to handle qualitative variables and as multiple factor analysis (MFA) in order to handle heterogeneous sets of variables. Mathematically, PCA depends upon the eigen‐decomposition of positive semi‐definite matrices and upon the singular value decomposition (SVD) of rectangular matrices. Copyright © 2010 John Wiley & Sons, Inc.
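
As a minimal sketch of the last technical sentence above, the following code (NumPy is our choice, and the 6 x 3 table is made up) computes the components of a small centered table in two ways: from the eigen-decomposition of the positive semi-definite matrix X^T X and from the SVD of X itself. Under these assumptions the eigenvalues equal the squared singular values, and the factor scores can be written either as P diag(delta) or as X Q.

```python
import numpy as np

# Hypothetical 6 x 3 data table (rows = observations, columns = variables).
X_raw = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 5.0, 2.0],
    [5.0, 7.0, 2.5],
    [6.0, 8.5, 4.0],
    [8.0, 9.0, 5.0],
    [9.0, 12.0, 5.5],
])

# Center each column so the analysis describes deviations from the means.
X = X_raw - X_raw.mean(axis=0)

# Route 1: eigen-decomposition of the positive semi-definite matrix X^T X.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]        # eigh returns ascending eigenvalues
eigvals, Q_eig = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the rectangular matrix X = P diag(delta) Q^T.
P, delta, Qt = np.linalg.svd(X, full_matrices=False)
Q_svd = Qt.T

# Factor scores: F = P diag(delta), which also equals X Q.
F = P * delta

print("eigenvalues of X^T X:", np.round(eigvals, 4))
print("squared singular values:", np.round(delta ** 2, 4))
```

The two routes agree up to the sign of each component, which is arbitrary in PCA.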

Figure 1.

The geometric steps for finding the components of a principal component analysis. To find the components, (1) center the variables, then plot them against each other. (2) Find the main direction of the cloud of points (called the first component) such that the sum of the squared distances from the points to the component is minimal. Add a second component orthogonal to the first such that the sum of the squared distances is, again, minimal. (3) When the components have been found, rotate the figure so that the first component is horizontal (and the second component vertical), then erase the original axes. Note that the final graph could have been obtained directly by plotting the observations from the coordinates given in Table 1.
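
The three geometric steps can be checked numerically. This stdlib-only sketch uses made-up 2-D points (not the data of Table 1) and finds the first component by brute force, trying many directions and keeping the one with the smallest sum of squared perpendicular distances. It then compares that direction with the closed-form angle of the leading eigenvector of the 2 x 2 cross-product matrix. The helper name `residual_ss` is hypothetical.

```python
import math

# Hypothetical 2-D points (made up for illustration).
points = [(1.0, 2.0), (2.0, 3.5), (3.0, 3.0), (4.0, 5.5), (5.0, 6.0), (6.0, 6.5)]

# Step 1: center the variables.
n = len(points)
mx = sum(x for x, _ in points) / n
my = sum(y for _, y in points) / n
centered = [(x - mx, y - my) for x, y in points]


def residual_ss(theta):
    """Sum of squared perpendicular distances from the centered points to the
    line through the origin with direction (cos theta, sin theta)."""
    ux, uy = math.cos(theta), math.sin(theta)
    total = 0.0
    for x, y in centered:
        proj = x * ux + y * uy                 # coordinate on the line
        total += x * x + y * y - proj * proj   # Pythagoras: squared distance to the line
    return total


# Step 2 (brute force): scan directions in [0, pi) and keep the best one.
steps = 100_000
best_theta = min((math.pi * k / steps for k in range(steps)), key=residual_ss)

# Closed form for the same direction: 0.5 * atan2(2 Sxy, Sxx - Syy).
sxx = sum(x * x for x, _ in centered)
syy = sum(y * y for _, y in centered)
sxy = sum(x * y for x, y in centered)
theta_closed = 0.5 * math.atan2(2 * sxy, sxx - syy)

# The second component is orthogonal to the first.
theta_second = theta_closed + math.pi / 2

# Step 3: "rotating" the figure amounts to expressing each point in the
# (first component, second component) coordinate system.
scores = [(x * math.cos(theta_closed) + y * math.sin(theta_closed),
           x * math.cos(theta_second) + y * math.sin(theta_second))
          for x, y in centered]
```

Minimizing the squared distances to a line through the centroid is the same as maximizing the variance of the projections, which is why the brute-force search and the eigenvector agree.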

Figure 2.

Plot of the centered data, with the first and second components. The projections (or coordinates) of the word ‘neither’ on the first and the second components are equal to −5.60 and −2.38, respectively.
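
The coordinates plotted here are factor scores: dot products of each centered observation with a unit-length component. A short NumPy sketch on hypothetical data (not the data of Table 1) illustrates two properties of these coordinates, taking λ as the sum of the squared factor scores on a component (an eigenvalue of X^T X): the squared coordinates on a component add up to its λ, and, because the components form an orthonormal basis, every observation keeps its squared distance to the centroid.

```python
import numpy as np

# Hypothetical data with two variables (made-up values), then centered.
X_raw = np.array([[3.0, 14.0], [6.0, 7.0], [2.0, 11.0], [6.0, 9.0],
                  [2.0, 9.0], [9.0, 4.0], [6.0, 8.0], [5.0, 11.0]])
X = X_raw - X_raw.mean(axis=0)

# Unit-length component directions from the SVD of the centered data.
_, delta, Qt = np.linalg.svd(X, full_matrices=False)
Q = Qt.T

# Coordinates (factor scores) of every observation on both components.
F = X @ Q

# Eigenvalue of each component = sum of squared coordinates on that component.
lambdas = (F ** 2).sum(axis=0)

# Squared distance of each observation to the centroid, before and after
# expressing it in component coordinates.
dist_before = (X ** 2).sum(axis=1)
dist_after = (F ** 2).sum(axis=1)
```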

Figure 3.

How to find the coordinates (i.e., factor scores) of a supplementary observation on the principal components: (a) the French word sur is plotted in the space of the active observations from its deviations from the means of the W and Y variables; and (b) the projections of sur on the principal components give its coordinates.
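
Projecting a supplementary observation uses only the means and components of the active observations, so the supplementary point has no influence on the solution. A minimal NumPy sketch, with hypothetical values standing in for the W and Y variables:

```python
import numpy as np

# Hypothetical active data: two variables (columns) for six observations.
active = np.array([[2.0, 8.0], [4.0, 7.0], [5.0, 5.0],
                   [6.0, 6.0], [8.0, 3.0], [9.0, 1.0]])
means = active.mean(axis=0)
X = active - means

# Components are computed from the active observations only.
_, _, Qt = np.linalg.svd(X, full_matrices=False)
Q = Qt.T

# A supplementary observation (hypothetical values) is centered with the
# *active* means and then projected onto the same components.
sup = np.array([3.0, 4.0])
f_sup = (sup - means) @ Q
```

The same two operations (subtract the active means, multiply by Q) applied to an active row reproduce that row's factor scores, which is why supplementary and active observations can share one map.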

Figure 4.

Circle of correlations and plot of the loadings of (a) the variables with principal components 1 and 2, and (b) the variables and supplementary variables with principal components 1 and 2. Note that the supplementary variables are not positioned on the unit circle.
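
The circle of correlations places each variable at its correlations with the components. In this sketch (simulated data, NumPy assumed, and `corr_with_components` is a hypothetical helper) the two active variables are fully described by the two components, so their squared correlations sum to 1 and they sit on the unit circle. A supplementary variable that the active variables explain only partly falls inside it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: two active variables and one supplementary variable.
n = 30
a1 = rng.normal(size=n)
a2 = 0.6 * a1 + rng.normal(scale=0.8, size=n)
active = np.column_stack([a1, a2])
supplementary = a1 - 0.5 * a2 + rng.normal(scale=1.0, size=n)

# PCA of the centered active variables only.
X = active - active.mean(axis=0)
P, delta, Qt = np.linalg.svd(X, full_matrices=False)
F = P * delta                     # factor scores of the observations


def corr_with_components(v, scores):
    """Correlation of variable v with each column of the factor scores."""
    return np.array([np.corrcoef(v, scores[:, k])[0, 1]
                     for k in range(scores.shape[1])])


# Coordinates on the circle of correlations.
active_corr = np.array([corr_with_components(active[:, j], F) for j in range(2)])
sup_corr = corr_with_components(supplementary, F)

# Squared distance from the origin in the (component 1, component 2) plane.
active_radius2 = (active_corr ** 2).sum(axis=1)
sup_radius2 = (sup_corr ** 2).sum()
```

Because the factor scores are centered and mutually uncorrelated, a variable's squared radius equals the R-squared of regressing it on the components; that is 1 for an active variable when all components are kept and below 1 for the noisy supplementary one.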

Figure 5.

PCA wine characteristics. Factor scores of the observations plotted on the first two components. λ1 = 4.76, τ1 = 68%; λ2 = 1.81, τ2 = 26%.
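
A worked check of the τ values: τ expresses an eigenvalue as a percentage of the total inertia (the sum of all eigenvalues). Assuming the seven wine variables were standardized so that each contributes one unit of inertia, the total inertia is 7 and the percentages in the caption follow from the eigenvalues:

```python
# Eigenvalues reported in the caption for the wine PCA.
lambda_1, lambda_2 = 4.76, 1.81

# Assumption: the seven variables were standardized so that each contributes
# one unit of inertia, making the total inertia (sum of all eigenvalues) 7.
n_variables = 7
total_inertia = float(n_variables)

tau_1 = 100 * lambda_1 / total_inertia   # percentage of inertia, component 1
tau_2 = 100 * lambda_2 / total_inertia   # percentage of inertia, component 2
print(round(tau_1), round(tau_2))        # prints: 68 26
```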

Figure 6.

PCA wine characteristics. Correlations (and circle of correlations) of the variables with Components 1 and 2. λ1 = 4.76, τ1 = 68%; λ2 = 1.81, τ2 = 26%.

Figure 7.

PCA wine characteristics. (a) Original loadings of the seven variables. (b) The loadings of the seven variables showing the original axes and the new (rotated) axes derived from varimax. (c) The loadings after varimax rotation of the seven variables.
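
Varimax looks for the orthogonal rotation of the loadings that maximizes, within each component, the variance of the squared loadings. The sketch below uses a common SVD-based iteration for raw varimax (no Kaiser row normalization); it is one possible implementation, not necessarily the one used for this figure, and the loadings are hypothetical. Because the rotation is orthogonal, each variable's communality (its sum of squared loadings) is unchanged.

```python
import numpy as np


def varimax_criterion(L):
    """Raw varimax criterion: sum over columns of
    sum_i l_ij^4 - (1/p) * (sum_i l_ij^2)^2."""
    p = L.shape[0]
    L2 = L ** 2
    return float(np.sum((L2 ** 2).sum(axis=0) - (L2.sum(axis=0) ** 2) / p))


def varimax(loadings, max_iter=500, tol=1e-10):
    """Orthogonal varimax rotation (raw, no Kaiser row normalization).
    Returns the rotated loadings and the rotation matrix R."""
    p, k = loadings.shape
    R = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient of the varimax criterion with respect to the rotation
        # (up to a constant factor).
        G = loadings.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        R = u @ vt                     # closest orthogonal matrix to G
        d = s.sum()
        if d_old != 0 and d < d_old * (1 + tol):
            break                      # no meaningful improvement: stop
        d_old = d
    return loadings @ R, R


# Hypothetical loadings: a clean two-factor structure, deliberately rotated
# by 30 degrees so that it no longer looks simple.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.1],
                   [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
angle = np.deg2rad(30)
T = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle), np.cos(angle)]])
original = simple @ T
rotated, R = varimax(original)
```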

Figure 8.

PCA example. Amount of Francs spent (per month) on food type by social class and number of children. Factor scores for principal components 1 and 2. λ1 = 3,023,141.24, τ1 = 88%; λ2 = 290,575.84, τ2 = 8%. BC = blue collar; WC = white collar; UC = upper class; 2 = 2 children; 3 = 3 children; 4 = 4 children; 5 = 5 children.
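
The eigenvalues in this caption are in the millions, which suggests that the spending variables were analyzed in their original units (francs) rather than standardized; standardized variables would give a total inertia on the order of the number of variables. This hypothetical two-variable NumPy sketch contrasts the two choices. On raw scales the first component is dominated by the variable with the largest variance, while after standardization each variable contributes exactly one unit of inertia. The helper `pca_eigenvalues` and all data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical monthly spending (francs) on two food types for 12 households;
# the second variable is on a much larger scale than the first.
bread = rng.normal(400, 40, size=12)
meat = 10 * bread + rng.normal(0, 800, size=12)
data = np.column_stack([bread, meat])


def pca_eigenvalues(X):
    """Eigenvalues (sums of squared factor scores) and first component
    direction of an already-centered table."""
    _, delta, Qt = np.linalg.svd(X, full_matrices=False)
    return delta ** 2, Qt[0]


centered = data - data.mean(axis=0)
# Standardize so that every column has a unit sum of squares.
standardized = centered / np.sqrt((centered ** 2).sum(axis=0))

lam_raw, q1_raw = pca_eigenvalues(centered)
lam_std, q1_std = pca_eigenvalues(standardized)

# Percentage of inertia (tau) for each component under both choices.
tau_raw = 100 * lam_raw / lam_raw.sum()
tau_std = 100 * lam_std / lam_std.sum()
```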

Figure 9.

PCA example: Amount of Francs spent (per month) on food type by social class and number of children. Correlations (and circle of correlations) of the variables with Components 1 and 2. λ1 = 3,023,141.24, τ1 = 88%; λ2 = 290,575.84, τ2 = 8%.

Figure 10.

CA punctuation. The projections of the rows and the columns are displayed in the same map. λ1 = 0.0178, τ1 = 76.16%; λ2 = 0.0056, τ2 = 23.84%.
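
Correspondence analysis can be computed as the SVD of the matrix of standardized residuals of a contingency table. In this NumPy sketch (the counts are hypothetical) the squared singular values are the principal inertias λ, their sum equals the chi-square statistic divided by the grand total, and row and column principal coordinates can be drawn in the same map. A table with three columns has at most two non-trivial dimensions, so its τ values sum to 100%, as the two τ values in this caption do.

```python
import numpy as np

# Hypothetical contingency table (rows = texts, columns = punctuation marks).
N = np.array([[30, 10, 5],
              [20, 25, 10],
              [5, 15, 30],
              [10, 10, 10]], dtype=float)

n = N.sum()
Pm = N / n                      # correspondence matrix
r = Pm.sum(axis=1)              # row masses
c = Pm.sum(axis=0)              # column masses

# Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}.
S = (Pm - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
eigenvalues = sv ** 2           # principal inertias (lambda)
tau = 100 * eigenvalues / eigenvalues.sum()

# Principal coordinates of rows and columns, plotted in the same map.
row_coords = (U * sv) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]

# Total inertia equals the chi-square statistic divided by the grand total.
expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / n
chi2 = ((N - expected) ** 2 / expected).sum()
```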

Figure 11.

MFA wine ratings and oak type. (a) Plot of the global analysis of the wines on the first two principal components. (b) Projection of the experts onto the global analysis. Experts are represented by their faces. A line segment links the position of the wine for a given expert to its global position. λ1 = 2.83, τ1 = 84%; λ2 = 2.83, τ2 = 11%.
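
Multiple factor analysis, in one common formulation, normalizes each expert's centered table by its first singular value, runs a PCA on the concatenated tables, and then projects each expert's table onto the global components to obtain partial factor scores (the per-expert positions linked by line segments to the global positions in panel (b)). This NumPy sketch assumes equal masses for the wines and simulated ratings. With partial scores defined as K Z_k Q_k, where Q_k holds the rows of Q belonging to table k, the global factor scores are exactly the average of the partial ones.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated ratings: 6 wines rated by 3 experts, each with their own
# number of descriptors (columns).
n_wines = 6
latent = rng.normal(size=(n_wines, 1))
tables = [latent @ rng.normal(size=(1, m)) + 0.3 * rng.normal(size=(n_wines, m))
          for m in (3, 4, 2)]


def center(X):
    """Subtract each column's mean."""
    return X - X.mean(axis=0)


# Step 1: normalize each centered table by its first singular value so that
# no single expert dominates the global analysis.
normalized = []
for X in tables:
    Xc = center(X)
    first_sv = np.linalg.svd(Xc, compute_uv=False)[0]
    normalized.append(Xc / first_sv)

# Step 2: global PCA of the concatenated tables.
Z = np.hstack(normalized)
P, delta, Qt = np.linalg.svd(Z, full_matrices=False)
Q = Qt.T
F = P * delta                              # global factor scores

# Step 3: partial factor scores, one set per expert: K * Z_k @ Q_k.
K = len(normalized)
splits = np.cumsum([X.shape[1] for X in normalized])[:-1]
Q_blocks = np.split(Q, splits, axis=0)
partial = [K * Zk @ Qk for Zk, Qk in zip(normalized, Q_blocks)]
```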

Figure 12.

MFA wine ratings and oak type. Circles of correlations for the original variables. Each expert's variables have been separated for ease of interpretation.

