How to cite this WIREs title:
WIREs Comp Stat

# Principal component analysis

### Abstract

Principal component analysis (PCA) is a multivariate technique that analyzes a data table in which observations are described by several inter‐correlated quantitative dependent variables. Its goal is to extract the important information from the table, to represent it as a set of new orthogonal variables called principal components, and to display the pattern of similarity of the observations and of the variables as points in maps. The quality of the PCA model can be evaluated using cross‐validation techniques such as the bootstrap and the jackknife. PCA can be generalized as correspondence analysis (CA) in order to handle qualitative variables and as multiple factor analysis (MFA) in order to handle heterogeneous sets of variables. Mathematically, PCA depends upon the eigen‐decomposition of positive semi‐definite matrices and upon the singular value decomposition (SVD) of rectangular matrices.

Copyright © 2010 John Wiley & Sons, Inc.

This article is categorized under:

Statistical and Graphical Methods of Data Analysis > Multivariate Analysis

Statistical and Graphical Methods of Data Analysis > Dimension Reduction
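
The abstract states that PCA rests on the SVD of a rectangular matrix and on the eigen‐decomposition of a positive semi‐definite matrix. The following is a minimal sketch of that link, assuming NumPy is available. The function name `pca_svd`, the variable names, and the toy data are illustrative assumptions, not taken from the article. The eigenvalues here are those of the cross‐product matrix of the centered table, with no division by the number of observations.

```python
import numpy as np

def pca_svd(X):
    """PCA of a data table X (rows = observations, columns = variables)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each variable
    # SVD of the centered table: Xc = P @ diag(delta) @ Qt
    P, delta, Qt = np.linalg.svd(Xc, full_matrices=False)
    scores = P * delta                           # coordinates of the observations
    eigenvalues = delta ** 2                     # eigenvalues of Xc.T @ Xc
    tau = 100 * eigenvalues / eigenvalues.sum()  # percentage of inertia per component
    return scores, Qt.T, eigenvalues, tau

# Hypothetical toy data, not taken from the article.
X = np.array([[2.0, 1.0], [4.0, 3.0], [6.0, 2.0], [8.0, 6.0], [10.0, 8.0]])
scores, loadings, eigenvalues, tau = pca_svd(X)

# The same eigenvalues come from the eigen-decomposition of the
# positive semi-definite cross-product matrix Xc.T @ Xc.
Xc = X - X.mean(axis=0)
eig = np.linalg.eigvalsh(Xc.T @ Xc)[::-1]        # eigvalsh returns ascending order
```

In this sketch the columns of `scores` are mutually orthogonal, and `tau` plays the role of the percentages reported as τ in the figure captions.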

Circle of correlations and plot of the loadings of (a) the variables with principal components 1 and 2, and (b) the variables and supplementary variables with principal components 1 and 2. Note that the supplementary variables are not positioned on the unit circle.
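
The positions in a circle of correlations can be sketched as the correlations between each variable and the components. The example below is a rough illustration under stated assumptions: the data, the supplementary variable `sup`, and the helper `correlations_with_components` are all hypothetical, and NumPy is assumed. With two active variables and two components, each active variable falls on the unit circle, whereas a supplementary variable generally falls inside it.

```python
import numpy as np

def correlations_with_components(V, scores):
    """Correlation of each column of V with each column of scores."""
    Vc = V - V.mean(axis=0)
    Fc = scores - scores.mean(axis=0)
    num = Vc.T @ Fc
    den = np.outer(np.linalg.norm(Vc, axis=0), np.linalg.norm(Fc, axis=0))
    return num / den

# Hypothetical active table (two variables) and one supplementary variable.
X = np.array([[3.0, 14.0], [6.0, 7.0], [2.0, 11.0],
              [6.0, 9.0], [9.0, 4.0], [4.0, 8.0]])
sup = np.array([[1.0], [5.0], [2.0], [2.0], [4.0], [6.0]])

# Components are computed from the active variables only.
Xc = X - X.mean(axis=0)
P, delta, Qt = np.linalg.svd(Xc, full_matrices=False)
F = P * delta

r_active = correlations_with_components(X, F)    # one row per active variable
r_sup = correlations_with_components(sup, F)     # one row per supplementary variable

# Distance of each variable from the origin of the correlation plot.
radius_active = np.sqrt((r_active ** 2).sum(axis=1))
radius_sup = np.sqrt((r_sup ** 2).sum(axis=1))
```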

How to find the coordinates (i.e., factor scores) on the principal components of a supplementary observation: (a) the French word sur is plotted in the space of the active observations from its deviations to the W and Y variables; and (b) the projections of sur on the principal components give its coordinates.
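
The two steps in this caption can be sketched in code: express the supplementary observation as deviations from the means of the active observations, then project it onto the components. This is a minimal sketch assuming NumPy. The data values and the helper name `project_supplementary` are hypothetical and are not the article's word data.

```python
import numpy as np

# Hypothetical active observations (rows) on two variables.
X_active = np.array([[3.0, 14.0], [6.0, 7.0], [2.0, 11.0],
                     [6.0, 9.0], [9.0, 4.0]])
mean = X_active.mean(axis=0)
Xc = X_active - mean

# Loadings Q from the SVD of the centered active table.
_, _, Qt = np.linalg.svd(Xc, full_matrices=False)
Q = Qt.T

def project_supplementary(x_sup, mean, Q):
    """Coordinates of a supplementary observation on the components."""
    deviation = np.asarray(x_sup, dtype=float) - mean   # deviations from the active means
    return deviation @ Q                                # projections onto the components

# A hypothetical supplementary observation.
f_sup = project_supplementary([3.0, 12.0], mean, Q)
```

Applying the same function to an active observation returns that observation's own factor scores, which is a convenient sanity check.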

Plot of the centered data, with the first and second components. The projections (or coordinates) of the word ‘neither’ on the first and the second components are equal to −5.60 and −2.38.

The geometric steps for finding the components of a principal component analysis. To find the components: (1) center the variables, then plot them against each other. (2) Find the main direction (called the first component) of the cloud of points such that the sum of the squared distances from the points to the component is minimal. Add a second component orthogonal to the first such that the sum of the squared distances is minimal. (3) When the components have been found, rotate the figure in order to position the first component horizontally (and the second component vertically), then erase the original axes. Note that the final graph could have been obtained directly by plotting the observations from the coordinates given in Table 1.
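
For two variables, the three geometric steps can be sketched without any linear algebra library. The example below is an illustration under stated assumptions: it uses the closed‐form angle of the principal axis for a two‐variable cloud (a shortcut chosen for this sketch, not a method described in the caption), and the function names and data points are hypothetical.

```python
import math

def pca_2d(points):
    """Center a 2-D cloud, find the first component, and rotate onto it."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]         # step 1: center

    sxx = sum(x * x for x, _ in centered)
    syy = sum(y * y for _, y in centered)
    sxy = sum(x * y for x, y in centered)
    # Step 2: angle of the line through the origin that minimizes the sum
    # of squared perpendicular distances from the points to the line.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)

    # Step 3: rotate so the first component is horizontal and the
    # second (orthogonal) component is vertical.
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(x * c + y * s, -x * s + y * c) for x, y in centered]
    return theta, centered, rotated

def squared_distance_to_line(centered, angle):
    """Sum of squared perpendicular distances to the line at the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return sum((-x * s + y * c) ** 2 for x, y in centered)

# Hypothetical data points, not the values of Table 1.
points = [(3.0, 14.0), (6.0, 7.0), (2.0, 11.0), (6.0, 9.0), (9.0, 4.0)]
theta, centered, rotated = pca_2d(points)
```

After the rotation, the two new coordinates are uncorrelated and the total sum of squares of the cloud is unchanged, which reflects the fact that the rotation only changes the axes, not the distances between points.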

MFA wine ratings and oak type. Circles of correlations for the original variables. Each expert's variables have been separated for ease of interpretation.

MFA wine ratings and oak type. (a) Plot of the global analysis of the wines on the first two principal components. (b) Projection of the experts onto the global analysis. Experts are represented by their faces. A line segment links the position of the wine for a given expert to its global position. λ1 = 2.83, τ1 = 84%; λ2 = 2.83, τ2 = 11%.

CA punctuation. The projections of the rows and the columns are displayed in the same map. λ1 = 0.0178, τ1 = 76.16%; λ2 = 0.0056, τ2 = 23.84%.

PCA example. Amount of Francs spent (per month) on food type by social class and number of children. Correlations (and circle of correlations) of the variables with Components 1 and 2. λ1 = 3,023,141.24, τ1 = 88%; λ2 = 290,575.84, τ2 = 8%.

PCA example. Amount of Francs spent (per month) on food type by social class and number of children. Factor scores for principal components 1 and 2. λ1 = 3,023,141.24, τ1 = 88%; λ2 = 290,575.84, τ2 = 8%. BC = blue collar; WC = white collar; UC = upper class; 2 = 2 children; 3 = 3 children; 4 = 4 children; 5 = 5 children.

PCA wine characteristics. (a) Original loadings of the seven variables. (b) The loadings of the seven variables showing the original axes and the new (rotated) axes derived from varimax. (c) The loadings after varimax rotation of the seven variables.

PCA wine characteristics. Correlations (and circle of correlations) of the variables with Components 1 and 2. λ1 = 4.76, τ1 = 68%; λ2 = 1.81, τ2 = 26%.

PCA wine characteristics. Factor scores of the observations plotted on the first two components. λ1 = 4.76, τ1 = 68%; λ2 = 1.81, τ2 = 26%.
