How to cite this WIREs title:
WIREs Comp Stat

Bayesian mixture models for cytometry data analysis

Abstract

Bayesian mixture models are increasingly used for model‐based clustering and for follow‐up analysis of the identified clusters. As such, they are of particular interest for analyzing cytometry data, where unsupervised clustering and association studies are often part of the scientific questions. Cytometry data are large quantitative data sets measured in a multidimensional space that typically ranges from a few dimensions to several dozen, a number that keeps increasing thanks to innovative high‐throughput biotechnologies. We present several recent parametric and nonparametric Bayesian mixture modeling approaches, and describe the advantages and limitations of these models in different research contexts for cytometry data analysis. We also acknowledge current computational challenges associated with the use of Bayesian mixture models for analyzing cytometry data, and we draw attention to recent developments in advanced numerical algorithms for estimating large Bayesian mixture models, which we believe have the potential to make Bayesian mixture models more applicable to new types of single‐cell data with higher dimensions.

This article is categorized under:
Statistical Learning and Exploratory Methods of the Data Sciences > Knowledge Discovery
Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods
Statistical and Graphical Methods of Data Analysis > Bayesian Methods and Theory

An example of a mixture of two Gaussian distributions
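
As a sketch of what such a mixture looks like in symbols, a two‐component Gaussian mixture density can be written as follows. The notation (mixing weight π, means μ_k, standard deviations σ_k) is assumed here for illustration and is not taken from the article.

```latex
p(x) = \pi \, \mathcal{N}(x \mid \mu_1, \sigma_1^2)
     + (1 - \pi) \, \mathcal{N}(x \mid \mu_2, \sigma_2^2),
\qquad 0 \le \pi \le 1 .
```
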
Trace plot of posterior samples for the first dimension of component means of one flow cytometry sample
Visualization of the data from one flow cytometry (FCM) sample with manual gating. This FCM sample is replicate 1 of patient 1,228 processed at Stanford from the T‐cell panel in the HIPC Lyoplate study. 30,427 cells are displayed before standardization of the features. Diagonal plots represent marginal densities per cell population, lower triangle plots are 2‐D scatter plots of gated cells, and upper triangle plots represent bivariate densities per cell population
Illustration of the Dirichlet process mixture model (DPMM) through the stick breaking process representation. Panel (a) displays a truncated draw (only the 1,000 most frequent θ values) from a stick breaking process with base distribution G0 and concentration parameter α = 2, together with the probability mass of each θ value, which can be interpreted as the frequency of the corresponding component. Panel (b) overlays the base distribution G0. Panel (c) displays the kernel density fθ, that is, the probability density assumed for each component. Panel (d) displays the resulting mixture density that constitutes the DPMM
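
The stick breaking construction described in the caption above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the base distribution G0 is taken to be a Gaussian with mean 0 and standard deviation 3, the kernel fθ is taken to be a Gaussian with mean θ and a fixed standard deviation of 0.5, and all function names are hypothetical. Only α = 2 and the truncation at 1,000 atoms come from the caption.

```python
import math
import random


def stick_breaking(alpha, n_atoms, base_sampler, rng):
    """Truncated stick breaking draw; returns (weights, atoms)."""
    weights, atoms = [], []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(remaining * v)    # weight (frequency) of this component
        atoms.append(base_sampler(rng))  # theta_k drawn from the base distribution G0
        remaining *= 1.0 - v
    return weights, atoms


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian, used here as the assumed kernel."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(x, weights, atoms, kernel_sd):
    """Mixture density: sum over components of weight_k * f(x | theta_k)."""
    return sum(w * normal_pdf(x, theta, kernel_sd) for w, theta in zip(weights, atoms))


rng = random.Random(0)  # fixed seed so the draw is reproducible
weights, atoms = stick_breaking(
    alpha=2.0,                                # concentration parameter from the caption
    n_atoms=1000,                             # truncation level from the caption
    base_sampler=lambda r: r.gauss(0.0, 3.0), # assumed G0
    rng=rng,
)
density_at_zero = mixture_density(0.0, weights, atoms, kernel_sd=0.5)
```

With α = 2 the expected fraction of stick left after each break is 2/3, so the mass beyond the first 1,000 atoms is negligible and the truncated weights sum to 1 up to floating‐point error. Smaller values of α concentrate the mass on fewer components, while larger values spread it over more.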
