How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 7.250

Survey on establishing the optimal number of factors in exploratory factor analysis applied to data mining


In many fields of research, including agriculture and plant science, large quantities of data are frequently obtained that must be analyzed using different data mining techniques. Data mining sometimes involves applying different methods of statistical data analysis. Exploratory Factor Analysis (EFA) is frequently used as a technique for data reduction and structure detection in data mining. In this survey, we study EFA applied to data mining, focusing on the problem of establishing the optimal number of factors to retain. The number of factors to retain is the most important decision to make after factor extraction in EFA. Many researchers have discussed criteria for choosing the optimal number of factors. Mistakes in factor extraction consist of extracting too few or too many factors, and an inappropriate number of factors may lead to erroneous conclusions. We present a comprehensive review of the state of the art on this subject. The main focus is on the most frequently applied factor selection methods, namely the Kaiser criterion, Cattell's scree test, and Monte Carlo parallel analysis. We highlight that, depending on the specifics of the research, it can be important to analyze the total cumulative variance explained by the selected number of extracted factors: the extracted factors must explain at least a minimum threshold of cumulative variance. The ExtrOptFact algorithm presents the steps that must be performed in EFA to select the optimal number of factors. For validation purposes, we present a case study performed on data obtained in an experimental study that we carried out on the Brassica napus plant. Applying the ExtrOptFact algorithm to Principal Component Analysis led to the selection of three components, named Qualitative, Generative, and Vegetative, which together explained 92% of the total variance.
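The selection criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the ExtrOptFact algorithm, whose steps are not given on this page. It only shows the Kaiser criterion (retain eigenvalues of the correlation matrix greater than 1), a simple Monte Carlo parallel analysis, and the cumulative variance explained by the retained factors. The function names, the 200 simulations, the 95th percentile cutoff, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def correlation_eigenvalues(data):
    """Eigenvalues of the correlation matrix of `data`, sorted descending."""
    corr = np.corrcoef(data, rowvar=False)
    # eigvalsh is appropriate because a correlation matrix is symmetric.
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def kaiser_count(eigvals):
    """Kaiser criterion: number of eigenvalues strictly greater than 1."""
    return int(np.sum(eigvals > 1.0))


def parallel_analysis_count(data, n_sims=200, percentile=95, seed=0):
    """Monte Carlo parallel analysis (illustrative variant).

    Retains the leading factors whose observed eigenvalue exceeds the
    chosen percentile of eigenvalues obtained from random normal data
    of the same shape. n_sims and percentile are assumed defaults.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = data.shape
    observed = correlation_eigenvalues(data)
    simulated = np.empty((n_sims, n_cols))
    for i in range(n_sims):
        random_data = rng.standard_normal((n_rows, n_cols))
        simulated[i] = correlation_eigenvalues(random_data)
    threshold = np.percentile(simulated, percentile, axis=0)
    count = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break  # stop at the first factor that does not beat chance
        count += 1
    return count


def cumulative_variance(eigvals, k):
    """Share of total variance explained by the first k factors."""
    return float(np.sum(eigvals[:k]) / np.sum(eigvals))


if __name__ == "__main__":
    # Synthetic data (an assumption): 6 variables driven by 2 latent factors.
    rng = np.random.default_rng(42)
    latent = rng.standard_normal((500, 2))
    data = np.repeat(latent, 3, axis=1) + 0.3 * rng.standard_normal((500, 6))
    eigvals = correlation_eigenvalues(data)
    k = kaiser_count(eigvals)
    print("Kaiser:", k)
    print("Parallel analysis:", parallel_analysis_count(data))
    print("Cumulative variance:", cumulative_variance(eigvals, k))
```

In practice, the count suggested by each criterion would then be checked against a research-specific minimum for cumulative variance, as the abstract describes. The scree test is a visual inspection of the same sorted eigenvalues, so it is not automated here.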
This article is categorized under: Algorithmic Development > Statistics; Algorithmic Development > Biological Data Mining; Algorithmic Development > Structure Discovery
Visual analysis of V2 normality. (a) Histogram of V2; (b) QQ plot of V2
Visual analysis of V1 normality. (a) Histogram of V1; (b) QQ plot of V1
Visual representation of the experimental data with linear trendlines included. (a) V1 variable; (b) V2 variable; (c) V3 variable; (d) V4 variable; (e) V5 variable
A large field of B. napus that forms a complex ecosystem
Cattell's scree test. The plot of eigenvalues. X‐axis represents the eigenvalue number. Y‐axis represents eigenvalues
Plot of eigenvalues. Scree test proposed by Cattell
Visual analysis of V5 normality. (a) Histogram of V5; (b) QQ plot of V5
Visual analysis of V4 normality. (a) Histogram of V4; (b) QQ plot of V4
Visual analysis of V3 normality. (a) Histogram of V3; (b) QQ plot of V3

