WIREs Comp Stat

# A convergence diagnostic for Bayesian clustering


## Abstract

In many applications of Bayesian clustering, posterior sampling on the discrete state space of cluster allocations is carried out with Markov chain Monte Carlo (MCMC) techniques. Because it is typically difficult to design transition kernels that explore this state space efficiently, MCMC convergence diagnostics are especially important in clustering applications. Here we propose a diagnostic tool for discrete-space MCMC, focusing on Bayesian clustering applications in which the model parameters have been integrated out. We construct a Hotelling-type statistic on the highest-probability states and use regenerative sampling theory to derive its equilibrium distribution. By leveraging information from the unnormalized posterior, our diagnostic offers added protection against chains that appear to have converged but visit states with incorrect relative frequencies. The methodology is illustrated with a Bayesian clustering analysis of genetic mutants of the flowering plant *Arabidopsis thaliana*.

This article is categorized under:

- Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification
- Statistical Learning and Exploratory Methods of the Data Sciences > Knowledge Discovery
- Statistical and Graphical Methods of Data Analysis > Markov Chain Monte Carlo
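The abstract's central idea, a Hotelling-type statistic on high-probability states that is calibrated with regenerative sampling and the unnormalized posterior, can be illustrated with a small sketch. This is not the authors' exact statistic. It rests on three assumptions. First, there is a fixed regeneration state `atom` that the chain revisits often, so the tours between successive visits are i.i.d. Second, states are stored in a canonical, hashable form; for clustering, this could be a tuple of labels relabelled in order of first appearance, so that label switching does not split one partition into several states. Third, `log_post(s)` is a hypothetical user-supplied function that returns the unnormalized log posterior.

Under stationarity, the expected number of visits to a state $s$ per tour equals $\pi(s)/\pi(\text{atom})$, and the unnormalized posterior gives this ratio exactly. The sketch therefore compares the mean per-tour visit counts of $K$ chosen states with these ratios, using a $T^2$ statistic and a large-sample $\chi^2_K$ tail probability. The atom itself is excluded because it is visited exactly once per tour, which would make the covariance matrix singular.

```python
import math

import numpy as np


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) for X ~ chi-squared(df), integer df >= 1."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    # Regularized upper incomplete gamma Q(df/2, y) via the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def tour_counts(chain, atom, states):
    """Visits to each of `states` within each complete tour between visits to `atom`.

    The segment before the first visit to `atom` and the unfinished final
    tour are discarded.
    """
    index = {s: k for k, s in enumerate(states)}
    tours, current = [], None
    for x in chain:
        if x == atom:
            if current is not None:
                tours.append(current)
            current = [0] * len(states)
        elif current is not None and x in index:
            current[index[x]] += 1
    return np.array(tours, dtype=float).reshape(-1, len(states))


def hotelling_rs(chain, atom, states, log_post):
    """Hotelling-type T^2 of mean per-tour visit counts against pi(s)/pi(atom).

    Hypothetical interface: `states` are K hashable high-probability states,
    not including `atom`; `log_post(s)` is the unnormalized log posterior.
    Returns (T^2, approximate chi-squared(K) tail probability).
    """
    k = len(states)
    X = tour_counts(chain, atom, states)
    n = X.shape[0]
    if n <= k:
        raise ValueError("need more complete tours than states")
    # Expected visits per tour under stationarity: pi(s) / pi(atom).
    mu = np.exp([log_post(s) - log_post(atom) for s in states])
    diff = X.mean(axis=0) - mu
    S = np.atleast_2d(np.cov(X, rowvar=False))  # sample covariance across tours
    t2 = float(n * diff @ np.linalg.solve(S, diff))
    return t2, chi2_sf(t2, k)
```

A very small tail probability flags a chain whose visit frequencies disagree with the unnormalized posterior, even when trace plots look stable. The $\chi^2$ calibration is only asymptotic in the number of tours, so the test needs many more tours than $K$.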
Convergence diagnosis of MCMC algorithms. (a) Absolute error between the true pairwise co-cluster probability $\rho_{ij}$ and its regenerative sampling estimate. (b) Coefficient of variation for the larger of the co-cluster and anti-cluster probability estimates. (c) Tail probability of the Hotelling-RS statistic, partitioning on the $K$ most probable states.
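Panels (a) and (b) refer to a regenerative sampling estimate of $\rho_{ij}$. One common way to form such an estimate is sketched below; it is an assumption, not necessarily the paper's exact estimator. The estimate is the ratio of the per-tour sums of the indicator $1\{z_i = z_j\}$ to the per-tour lengths, and its standard error comes from the delta method. The sketch takes the anti-cluster probability to be $1 - \rho_{ij}$, which has the same standard error as $\hat\rho_{ij}$. The coefficient of variation of the larger of the two is therefore that standard error divided by $\max(\hat\rho_{ij}, 1 - \hat\rho_{ij})$. The function name and the tuple representation of allocations are hypothetical.

```python
import math

import numpy as np


def regenerative_coclustering(chain, atom, i, j):
    """Regenerative ratio estimate of rho_ij = P(z_i == z_j) and a coefficient of variation.

    chain: sequence of allocation vectors (e.g., tuples of cluster labels).
    atom: a fixed allocation used as the regeneration state (assumption).
    Returns (rho_hat, cv), where cv is the coefficient of variation of
    max(rho_hat, 1 - rho_hat), taking 1 - rho_ij as the anti-cluster probability.
    """
    sums, lengths = [], []
    s, length = 0, None  # length stays None until the first visit to atom
    for z in chain:
        if z == atom:
            if length is not None:  # close the previous complete tour
                sums.append(s)
                lengths.append(length)
            s, length = 0, 0
        if length is not None:
            s += int(z[i] == z[j])
            length += 1
    Y = np.array(sums, dtype=float)
    T = np.array(lengths, dtype=float)
    n = len(T)
    if n < 2:
        raise ValueError("need at least two complete tours")
    rho = Y.sum() / T.sum()
    # Delta-method standard error of a ratio estimator built from i.i.d. tours.
    se = math.sqrt(np.var(Y - rho * T, ddof=1) / n) / T.mean()
    return rho, se / max(rho, 1.0 - rho)
```

Because the anti-cluster estimate is a deterministic function of $\hat\rho_{ij}$, reporting the coefficient of variation on the larger of the two avoids inflated relative errors when $\hat\rho_{ij}$ is close to 0.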
Left: cumulative PMF of cluster allocations ordered by decreasing posterior probability. Right: co-occurrence probabilities $\rho_{ij}$ for all pairs with $\rho_{ij} > 0.05$; the contribution of the top $K = 10$ cluster allocations is shown in black and that of the remaining allocations in red.
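The two panels can be reproduced in outline with the sketch below. It assumes a list of allocations with normalized posterior probabilities, which may be exact or estimated; `topk_coclustering` is a hypothetical helper. Sorting the allocations by decreasing probability gives the cumulative PMF in the left panel. The co-occurrence probability $\rho_{ij} = \sum_m p_m \, 1\{z^{(m)}_i = z^{(m)}_j\}$ then splits into the contribution of the top $K$ allocations and that of the remainder.

```python
from itertools import accumulate


def topk_coclustering(allocs, probs, i, j, K):
    """Cumulative PMF of allocations sorted by decreasing probability, and the
    split of rho_ij = sum_m p_m * 1{z_i == z_j} into top-K and remainder parts.

    allocs: list of allocation vectors; probs: their normalized posterior
    probabilities (exact or estimated; an assumption of this sketch).
    """
    order = sorted(range(len(probs)), key=lambda m: probs[m], reverse=True)
    cum_pmf = list(accumulate(probs[m] for m in order))
    # Contribution of each allocation to rho_ij, in decreasing-probability order.
    contrib = [probs[m] if allocs[m][i] == allocs[m][j] else 0.0 for m in order]
    return cum_pmf, sum(contrib[:K]), sum(contrib[K:])
```

When the remainder term is small for every pair of interest, the top-$K$ allocations account for most of the co-clustering structure.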
Profile plot of metabolite measurements for each mutant. Mutant categories are indicated by color: defective in starch biosynthesis (red), defective in starch degradation (blue), comparative plant (green), wild types (brown), and uncharacterized mutants (orange). On the left is the agglomerative clustering dendrogram obtained by the method of Partovi Nia and Davison (2012); the optimal clustering for this method is displayed on the right.