How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 2.541

Model‐based clustering and classification of functional data


Complex data analysis is a central topic in modern statistics and learning systems, and it is attracting broader interest as high‐dimensional data become more prevalent. The challenge is to develop statistical models and autonomous algorithms that can extract knowledge from raw data through clustering techniques, or predict future data through classification techniques. Latent data models, including mixture model‐based approaches, are among the most popular and successful approaches in both supervised and unsupervised learning. Although they are traditional tools in multivariate analysis, they are growing in popularity within the framework of functional data analysis (FDA). FDA is the data analysis paradigm in which each datum is a function rather than a real vector. In many application areas, including signal and image processing, functional imaging, and bioinformatics, the data are often available as discretized values of functions, curves, or surfaces. This functional aspect introduces difficulties that do not arise in classical multivariate data analysis. We review approaches for model‐based clustering and classification of functional data. We present well‐grounded statistical models along with efficient algorithmic tools to address problems in the clustering and classification of functional data, including their heterogeneity, missing information, and dynamical hidden structures. The presented models and algorithms are illustrated on real‐world functional data analysis problems from several areas of application.

This article is categorized under: Fundamental Concepts of Data and Knowledge > Data Concepts; Algorithmic Development > Statistics; Technologies > Structure Discovery and Clustering
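As a rough illustration of what the abstract means by model‐based clustering of curves observed as discretized values, the sketch below fits a mixture of polynomial regressions with an EM algorithm. It is a generic textbook‐style example and not one of the specific models reviewed in the article; the function names `farthest_point_init` and `fit_curve_mixture`, the deterministic initialization, the common sampling grid, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def farthest_point_init(Y, n_clusters):
    """Pick initial prototype curves: curve 0, then the curve farthest from those chosen."""
    chosen = [0]
    d = ((Y - Y[0]) ** 2).sum(axis=1)
    for _ in range(1, n_clusters):
        nxt = int(d.argmax())
        chosen.append(nxt)
        d = np.minimum(d, ((Y - Y[nxt]) ** 2).sum(axis=1))
    return Y[chosen].copy()

def fit_curve_mixture(t, Y, n_clusters, degree=3, n_iter=50):
    """EM for a mixture of polynomial regressions (illustrative sketch).

    t: (m,) common sampling grid; Y: (n, m), one discretized curve per row.
    Each cluster has a polynomial mean curve and its own noise variance.
    """
    n, m = Y.shape
    X = np.vander(t, degree + 1, increasing=True)   # (m, degree + 1) design matrix
    fitted = farthest_point_init(Y, n_clusters)      # (K, m) initial mean curves
    sigma2 = np.full(n_clusters, Y.var() + 1e-8)     # initial noise variances
    pi = np.full(n_clusters, 1.0 / n_clusters)       # initial mixing proportions
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each curve.
        sq = ((Y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)   # (n, K)
        log_p = np.log(pi) - 0.5 * m * np.log(2 * np.pi * sigma2) - 0.5 * sq / sigma2
        log_p -= log_p.max(axis=1, keepdims=True)    # stabilize before exponentiating
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: with a shared grid, weighted least squares on all curves
        # reduces to ordinary least squares on the weighted mean curve.
        weights = resp.sum(axis=0) + 1e-10
        pi = weights / weights.sum()
        mean_curves = (resp.T @ Y) / weights[:, None]                  # (K, m)
        beta = np.linalg.lstsq(X, mean_curves.T, rcond=None)[0]        # (degree + 1, K)
        fitted = (X @ beta).T
        sq = ((Y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)
        sigma2 = (resp * sq).sum(axis=0) / (weights * m) + 1e-8
    return resp.argmax(axis=1), beta.T, sigma2, pi

# Synthetic example (assumed data): two groups of noisy curves on a shared grid.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 30)
group_a = 2.0 * t + 0.1 * rng.standard_normal((10, t.size))
group_b = 2.0 - 2.0 * t**2 + 0.1 * rng.standard_normal((10, t.size))
Y = np.vstack([group_a, group_b])
labels, beta, sigma2, pi = fit_curve_mixture(t, Y, n_clusters=2)
print(labels)
```

The sketch assumes fully observed curves on a common grid with a single polynomial regime per cluster. Handling missing information or hidden regime changes, as the abstract mentions, would require richer models than this one.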
Misclassification error and intracluster inertia in relation to the noise level
Results obtained with the proposed RHLP on a real switch operation time series. The rows display the signal and the polynomial regimes (top), the corresponding estimated logistic proportions (middle), and the obtained mean curve (bottom)
Clustering of switch operation time series obtained with the MixHMMR model
Clusters and the corresponding piecewise prototypes for each cluster obtained with the CEM‐PWRM algorithm for the satellite data set
Clusters and the corresponding piecewise means for each cluster, obtained with the CEM‐PWRM algorithm for the Tecator data set
Cluster means obtained by the proposed BMSSR model with K = 12 components
Clustering results obtained by the proposed robust EM‐like algorithm and the bSRM model with a cubic B‐spline of seven knots for the yeast cell cycle data. Each subfigure corresponds to a cluster
Phonemes data and clustering results obtained by the proposed robust EM‐like algorithm and the bSRM model with a cubic B‐spline of seven knots. The five subfigures show the automatically retrieved clusters, which correspond to the phonemes “ao,” “aa,” “yi,” “dcl,” and “sh”
Variation of the number of clusters and the value of the objective function as a function of the iteration index for the bSRM models, for the waveform data
Original waveform data (a) and clustering results obtained by the proposed robust EM‐like algorithm and the bSRM model, using a cubic B‐spline with three knots. Each subfigure (b)–(d) corresponds to a cluster
Examples of individual curves from Figure (e)
Examples of functional data sets
Results obtained with the proposed FMDA‐MiXRHLP for the real switch operation curves. The subplots illustrate the estimated clusters (subclasses) for class 1 and the corresponding mean curves (a), each subclass of class 1 with the estimated mean curve presented in a bold line (c and d), the polynomial regressors (degree p = 3) (f and g), the corresponding logistic proportions of class 1 (i and j), the estimated mean curve for class 2 (b), the polynomial regressors of class 2 (e), and the corresponding logistic proportions of class 2 (h)
