
# Examining missing data mechanisms via homogeneity of parameters, homogeneity of distributions, and multivariate normality


This paper reviews methods for identifying missing data mechanisms. The three well‐known mechanisms of missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) are considered. A number of tests treat rejection of homogeneity of means and/or covariances (HMC) among observed data patterns as grounds for rejecting MCAR. The utility of these tests, as well as their shortcomings, is discussed. In particular, examples of MAR and MNAR data with homogeneous means and covariances between their observed data patterns are provided, for which tests of HMC fail to reject MCAR. More generally, tests of homogeneity of parameter estimates between various subsets of data are reviewed, and their utility as tests of MCAR and MAR (in special cases) is pointed out. Since many tests of MCAR assume multinormality, methods to assess this assumption in the context of incomplete data are reviewed. Tests of homogeneity of distributions among observed data patterns for MCAR are also considered. A new nonparametric test of this type is proposed on the basis of pairwise comparison of marginal distributions. Finally, methods of examining the missing data mechanism based on sensitivity analysis are reviewed, including methods that model the missing data mechanism using logistic, probit, and latent variable regression models, as well as methods that do not require modeling the missing data mechanism. The paper concludes with some practical comments about the validity and utility of tests of missing data mechanisms. WIREs Comput Stat 2014, 6:56–73. doi: 10.1002/wics.1287

Conflict of interest: The authors have declared no conflicts of interest for this article.

The intervals marked as ‘M’ are truncated from X to obtain the random variable $\tilde{X}$. The intervals marked ‘O’ form the range of $\tilde{X}$.
Q–Q plots comparing the distribution of observed data on variable 1 for the two groups of completely observed cases and incomplete cases. The left panels correspond to the missing not at random (MNAR) data and the right panels correspond to the missing at random (MAR) data for the two cases where missing data are generated according to () and ().
The densities g(x), h(x), and the standard normal density φ(x).
The JJ‐NP test of missing completely at random (MCAR) applied to a set of bivariate normal data with ρ = 0.8 and incomplete data generated according to the logistic regression model with α = −1 and β = −20.
The JJ‐NP test of missing completely at random (MCAR) applied to a set of incomplete bivariate normal data with ρ = 0.8 and missingness generated according to the logistic regression model with α = −2 and β = −1.65.
Logistic curves for two different parameter values.
The JJ‐NP test of missing completely at random (MCAR) applied to a set of data with missingness generated according to a missing at random (MAR) mechanism.
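The setting in the captions above (bivariate normal data with ρ = 0.8 and missingness driven by a logistic regression model with parameters α and β) can be illustrated with a rough simulation. The sketch below is not the JJ‐NP test. It uses a plain two-sample Kolmogorov–Smirnov statistic as a stand-in for comparing the marginal distribution of variable 1 between completely observed and incomplete cases. Several details are assumptions made only for illustration: the function names `simulate_mar`, `ks_statistic`, and `compare_patterns` are hypothetical, both variables have standard normal margins, and the probability that variable 2 is missing is taken to be logistic(α + β·x1), so missingness depends only on the always-observed variable 1.

```python
import math
import random


def simulate_mar(n, rho, alpha, beta, seed=0):
    """Draw n bivariate normal pairs (x1, x2) with correlation rho, then
    delete x2 with probability logistic(alpha + beta * x1).

    Assumption (not from the source): missingness depends on x1 only.
    Setting beta = 0 gives an MCAR mechanism.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rho * x1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        p_missing = 1.0 / (1.0 + math.exp(-(alpha + beta * x1)))
        rows.append((x1, None if rng.random() < p_missing else x2))
    return rows


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical distribution functions of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every observation equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def compare_patterns(rows):
    """Compare the distribution of x1 between complete and incomplete cases."""
    complete = [x1 for x1, x2 in rows if x2 is not None]
    incomplete = [x1 for x1, x2 in rows if x2 is None]
    return ks_statistic(complete, incomplete)


if __name__ == "__main__":
    mar = simulate_mar(2000, rho=0.8, alpha=-2.0, beta=-1.65, seed=1)
    mcar = simulate_mar(2000, rho=0.8, alpha=-2.0, beta=0.0, seed=1)
    print("KS statistic, logistic missingness:", compare_patterns(mar))
    print("KS statistic, MCAR (beta = 0):", compare_patterns(mcar))
```

Under these assumptions the statistic should be noticeably larger when β ≠ 0 than when β = 0, because the incomplete cases are then concentrated at one end of the range of variable 1. A formal test would also need a reference distribution or a permutation scheme, which this sketch omits.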