How to cite this WIREs title: WIREs Comp Stat

Model‐based cluster analysis



Abstract: Cluster analysis seeks to identify homogeneous subgroups of cases in a population. This article provides an introduction to model‐based clustering using finite mixture models and their extensions. Finite mixtures have been used successfully for more than a hundred years for clustering and classification, but have become increasingly popular in the last decade owing to advances in computer technology and software availability. Unlike traditional methods of cluster analysis, which rely on heuristic or distance‐based procedures, finite mixture modeling provides a formal statistical framework on which to base the clustering procedure. Finite mixture models assume that the population is made up of several distinct subsets (or clusters), each following a different multivariate probability distribution. Model‐based cluster analysis can handle a mix of nominal, ordinal, count, and continuous variables, any of which may contain missing values. We demonstrate how the problems of determining the number of clusters and choosing an appropriate clustering method reduce to a model selection problem, for which objective procedures exist. We briefly discuss how model‐based cluster analysis can be used to analyze complex and structured (e.g., longitudinal) datasets. WIREs Comput Stat 2012. doi: 10.1002/wics.1204

This article is categorized under:
Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification
Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods
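The model selection idea in the abstract can be illustrated with a minimal sketch: fit a univariate Gaussian mixture by the EM algorithm for several candidate numbers of clusters, and pick the number that minimizes the Bayesian information criterion (BIC). This is a hand-rolled toy example on synthetic data, not the procedure used in the article; the data, initialization, and iteration count are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data from two Gaussian subpopulations (hypothetical example).
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

def gmm_bic_1d(x, k, n_iter=200):
    """Fit a univariate k-component Gaussian mixture by EM; return its BIC."""
    n = len(x)
    w = np.full(k, 1.0 / k)                        # mixing proportions
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))  # spread-out initial means
    var = np.full(k, x.var())
    for _ in range(n_iter):
        # E-step: responsibilities (posterior cluster membership probabilities)
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted updates of the component parameters
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    # Log-likelihood at the final parameter values
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    loglik = np.log((w * dens).sum(axis=1)).sum()
    n_params = 3 * k - 1  # k means, k variances, k-1 free mixing weights
    return -2 * loglik + n_params * np.log(n)

# Model selection: the number of clusters with the lowest BIC is preferred.
bics = {k: gmm_bic_1d(x, k) for k in (1, 2, 3)}
best_k = min(bics, key=bics.get)
print(best_k)
```

Choosing the number of clusters thus becomes an objective comparison of fitted models rather than a heuristic decision; in practice one would use mature software (e.g., the mclust package mentioned below) rather than this sketch.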

Illustration of an unstructured finite mixture model using a latent variable framework. Circles represent unobserved (latent) variables and squares observed (manifest) variables. C denotes the latent cluster variable, whose classes are inferred from multiple observed indicator variables, items 1–5. The errors e1–e5 are the residual error terms of the observed variables, with mean 0. Double‐headed arrows between the error terms represent correlation (or covariance) between the observed variables that cannot be explained by their correlation with the latent variable. In the unstructured model, cluster membership is allowed to affect all parameters in the model. The effect of the residual error terms is set to 1, and one path from C to the manifest variables is constrained to 0 to avoid an ‘unidentified’ model.


(a) Model fit assessed using the Bayesian information criterion (BIC); the VEV model with either two or three clusters is optimal. (b) Cluster assignment using the VEV model with three clusters. (c) Errors in the mclust classification; black symbols indicate incorrectly classified individuals. (d) Uncertainty of cluster allocation; larger symbols indicate more uncertainty. Observations in the bottom cluster have a high certainty of being in the correct cluster, while those near the top two clusters have higher uncertainty, as these species are more similar and there is less of a distinction between the characteristics of these flowers. The analysis was carried out using the R package mclust.39


Pearson's crab data,24 one of the earliest uses of finite mixture modeling.


Dendrogram of a hierarchical cluster analysis of Anderson's Iris dataset.7 A dendrogram shows the nested structure of the clustering process using a hierarchical analysis. It consists of many U‐shaped lines connecting objects in a hierarchical tree. The height of each U represents the distance between the two objects being connected.
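The nested merging that a dendrogram depicts can be sketched in a few lines: agglomerative clustering repeatedly joins the two closest objects or clusters, and each merge height corresponds to the height of a U in the tree. This is an illustrative sketch on synthetic stand-in data (not Anderson's Iris measurements), using SciPy's hierarchical clustering routines.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Two hypothetical groups of 4-dimensional measurements, standing in
# for the four flower measurements in an Iris-style dataset.
X = np.vstack([rng.normal(0.0, 0.5, (10, 4)),
               rng.normal(3.0, 0.5, (10, 4))])

# Agglomerative (hierarchical) clustering with average linkage; each row
# of Z records one merge and its height, i.e. the distance between the
# two objects (or clusters) being connected.
Z = linkage(X, method="average")

# Cutting the tree so that two clusters remain recovers the two groups.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Plotting `Z` with `scipy.cluster.hierarchy.dendrogram` would draw a tree of the same kind as the figure above; cutting the tree at different heights yields different numbers of clusters, which is the heuristic choice that model-based clustering replaces with formal model selection.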


