How to cite this WIREs title:
WIREs Comp Stat

A review of multivariate distributions for count data derived from the Poisson distribution


The Poisson distribution has been widely studied and used for modeling univariate count‐valued data. However, multivariate generalizations of the Poisson distribution that permit dependencies have been far less popular. Yet, real‐world, high‐dimensional, count‐valued data found in word counts, genomics, and crime statistics, for example, exhibit rich dependencies and motivate the need for multivariate distributions that can appropriately model this data. We review multivariate distributions derived from the univariate Poisson, categorizing these models into three main classes: (1) where the marginal distributions are Poisson, (2) where the joint distribution is a mixture of independent multivariate Poisson distributions, and (3) where the node‐conditional distributions are derived from the Poisson. We discuss the development of multiple instances of these classes and compare the models in terms of interpretability and theory. Then, we empirically compare multiple models from each class on three real‐world datasets that have varying data characteristics from different domains, namely traffic accident data, biological next generation sequencing data, and text data. These empirical experiments develop intuition about the comparative advantages and disadvantages of each class of multivariate distribution that was derived from the Poisson. Finally, we suggest new research directions as explored in the subsequent Discussion section.

WIREs Comput Stat 2017, 9:e1398. doi: 10.1002/wics.1398

This article is categorized under: Statistical and Graphical Methods of Data Analysis > Multivariate Analysis

(Left) The first class of Poisson generalizations is based on the assumption that the univariate marginals are derived from the Poisson. (Middle) The second class is based on the idea of mixing independent multivariate Poissons into a joint multivariate distribution. (Right) The third class is based on the assumption that the univariate conditional distributions are derived from the Poisson.
Classic3 text dataset (low counts and medium overdispersion): maximum mean discrepancy (top) and Spearman ρ’s difference (bottom) for different numbers of variables: 10 (left), 100 (middle), 1000 (right). The Poisson SQR model performs better on this low‐count dataset than in previous settings.
BRCA RNA‐Seq dataset (medium counts and medium overdispersion): maximum mean discrepancy (MMD) (top) and Spearman ρ’s difference (bottom) for different numbers of variables: 10 (left), 100 (middle), 1000 (right). While mixtures (‘Log‐Normal’ and ‘Mixture Poiss’) perform well in terms of MMD, the Gaussian copula paired with Poisson marginals (‘Copula Poisson’) can model dependency structure well, as evidenced by the Spearman metric.
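The Spearman metric named in these captions compares the rank correlations of the observed data with those of samples drawn from a fitted model. A minimal sketch follows, under the assumption that the metric is the mean absolute difference of pairwise Spearman ρ values; the captions do not give the exact definition, and the function names `average_ranks`, `spearman_rho`, and `spearman_difference` are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank the values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column has no defined rank correlation
    return cov / math.sqrt(vx * vy)


def spearman_difference(data, model_samples):
    """Assumed metric: mean |rho_data - rho_model| over all variable pairs.

    Both arguments are lists of rows; each row holds one count per variable.
    """
    n_vars = len(data[0])
    diffs = []
    for a, b in combinations(range(n_vars), 2):
        rho_data = spearman_rho([r[a] for r in data], [r[b] for r in data])
        rho_model = spearman_rho([r[a] for r in model_samples],
                                 [r[b] for r in model_samples])
        diffs.append(abs(rho_data - rho_model))
    return sum(diffs) / len(diffs)
```

Because count data contain many ties, the sketch averages tied ranks rather than breaking ties arbitrarily; a smaller value means the model reproduces the pairwise dependency structure of the data more closely.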
Crash severity dataset (high counts and high overdispersion): maximum mean discrepancy (left) and Spearman ρ’s difference (right). As expected, for high overdispersion, mixture models (‘Log‐Normal’ and ‘Mixture Poiss’) seem to perform the best.
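To illustrate why mixture models suit overdispersed counts, the sketch below draws from a two‐variable Poisson log‐normal mixture: the rates are random, and the counts are independent Poissons given the rates. The shared latent factor, the parameter values, and the helper names `sample_poisson` and `sample_lognormal_mixture` are assumptions made for this example, not the models fitted in the article.

```python
import math
import random
from statistics import mean, pvariance


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate by CDF inversion (fine for moderate lam)."""
    u = min(rng.random(), 1.0 - 1e-12)  # guard against u rounding to 1
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def sample_lognormal_mixture(n, mu1, mu2, sigma, seed=0):
    """Counts are independent Poissons given log-normal rates.

    Assumption for illustration: one shared standard-normal factor z drives
    both rates, which induces positive dependence between the two counts.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        lam1 = math.exp(mu1 + sigma * z)
        lam2 = math.exp(mu2 + sigma * z)
        rows.append((sample_poisson(rng, lam1), sample_poisson(rng, lam2)))
    return rows


rows = sample_lognormal_mixture(20000, mu1=1.0, mu2=1.5, sigma=0.7)
xs = [x for x, _ in rows]
ys = [y for _, y in rows]
# A plain Poisson has variance equal to its mean; the mixture exceeds it.
print("mean:", mean(xs), "variance:", pvariance(xs))
```

Mixing over the rate adds the variance of the rate to the Poisson variance, so the marginal variance exceeds the marginal mean, which is the overdispersion referred to in the caption.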
Node‐conditional distributions (left) are univariate probability distributions of one variable conditioned on the other variables, while radial‐conditional distributions (right) are univariate probability distributions of the vector scaling conditioned on the vector direction. Both conditional distributions are helpful in understanding square root (SQR) graphical models. (Illustration from Ref )
A copula distribution (left)—which is defined over the unit hypercube and has uniform marginal distributions—paired with univariate Poisson marginal distributions for each variable (middle) defines a valid discrete joint distribution with Poisson marginals (right).
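The construction in this caption can be sketched in a few lines: draw a dependent pair of uniforms from a copula, then push each uniform through the inverse Poisson CDF. The example below assumes a bivariate Gaussian copula with latent correlation `rho`; the helper names `poisson_quantile` and `sample_copula_poisson` and all parameter values are hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist, mean


def poisson_quantile(u, lam):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def sample_copula_poisson(n, lam1, lam2, rho, seed=0):
    """Sample pairs with Poisson marginals coupled by a Gaussian copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    samples = []
    for _ in range(n):
        # Step 1: correlated standard normals (the latent Gaussian).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Step 2: the normal CDF maps them to dependent uniforms (the copula).
        u1 = min(std_normal.cdf(z1), 1.0 - 1e-12)
        u2 = min(std_normal.cdf(z2), 1.0 - 1e-12)
        # Step 3: the inverse Poisson CDF gives each variable its marginal.
        samples.append((poisson_quantile(u1, lam1),
                        poisson_quantile(u2, lam2)))
    return samples


draws = sample_copula_poisson(20000, lam1=3.0, lam2=7.0, rho=0.8)
xs = [x for x, _ in draws]
ys = [y for _, y in draws]
print("marginal means:", mean(xs), mean(ys))
```

Each marginal stays exactly Poisson regardless of `rho`, because the uniforms are individually uniform on [0, 1]; only the dependence between the two counts changes with the copula parameter.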
