
Geometry in statistics


Abstract

Geometry is a broad area with applications to many parts of statistics. In this article the focus is on the role of dual information geometries in statistical inference. A great deal of research has been done on the application of these dual geometries to higher-order asymptotics, and a brief review is given. Greater attention is given to providing insight into dual geometries as extensions of Euclidean geometry, and to how a further extension, called the dual simplicial geometry, can provide a general framework for computational algorithms. WIREs Comp Stat 2010 2 686–694. DOI: 10.1002/wics.128

This article is categorized under:
Statistical and Graphical Methods of Data Analysis > Information Theoretic Methods
Statistical and Graphical Methods of Data Analysis > Nonparametric Methods

The data are represented by an n‐dimensional vector y, and the model space M by an m‐dimensional subspace indicated by the blue solid line. The MLE μ̂ is the point on M that is closest to y in terms of Dy; it is also the orthogonal projection of y onto M along the dashed line connecting y to μ̂. The divergence Dy(y, μ0) between the data y and the model point μ0 is partitioned into the 'residual' Dy(y, μ̂) and the Kullback–Leibler divergence between the MLE μ̂ and μ0.
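The partition described in this caption can be checked numerically in the simplest special case, where the divergence reduces to half the squared Euclidean distance and the MLE is the ordinary orthogonal projection onto a linear model space. The sketch below is our own illustration under that Gaussian assumption, not code from the article; the names `X`, `mu_hat`, and `mu0` are ours.

```python
import numpy as np

# Minimal sketch (Gaussian special case): the divergence D_y is half the
# squared Euclidean distance, and the MLE is the orthogonal projection of
# y onto the model subspace M spanned by the columns of X.
rng = np.random.default_rng(0)
n, m = 5, 2
X = rng.standard_normal((n, m))    # columns span the model space M
y = rng.standard_normal(n)         # observed data vector

# Orthogonal projection of y onto M: this is the MLE mu_hat
P = X @ np.linalg.inv(X.T @ X) @ X.T
mu_hat = P @ y

mu0 = X @ rng.standard_normal(m)   # an arbitrary point mu_0 in M

def D(a, b):
    """Squared-distance divergence (the Gaussian case of D_y)."""
    return 0.5 * np.sum((a - b) ** 2)

# Pythagorean partition: total divergence = 'residual' + divergence within M
assert np.isclose(D(y, mu0), D(y, mu_hat) + D(mu_hat, mu0))
```

The assertion holds because y − μ̂ is orthogonal to M while μ̂ − μ0 lies in M, which is exactly the Pythagorean relationship the figure depicts.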


Representation of the simplicial structure for dimensions 2, 1, and 0. The interior is two dimensional, the edges are one dimensional, and the vertices have dimension zero. These changes in the dimension of the model space correspond to changes in support in the sample space. This change in dimension is the most important difference from the manifold structure of classical dual geometries. In the largest dimension, in this case 2, the local structure can be identified with ℝ²; however, this is not true for points on the boundary of the simplex. This is illustrated by the 'cone' of vectors emanating from the distribution placing all its mass on the second cell.
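The correspondence between support and dimension in this caption can be made concrete by enumerating the faces of the 2-simplex of distributions on three cells: each nonempty support set S gives a face of dimension |S| − 1. This is a small illustrative sketch of our own, not code from the article.

```python
from itertools import combinations

# Sketch: faces of the closed 2-simplex on 3 cells, indexed by support set.
# A distribution supported on k cells lives on a face of dimension k - 1.
faces = {}
for k in range(1, 4):
    for S in combinations(range(3), k):
        faces[S] = k - 1   # dimension of the face with support S

# 3 vertices (dim 0), 3 edges (dim 1), and the interior (dim 2)
print(sorted(faces.values()))   # [0, 0, 0, 1, 1, 1, 2]
```

A vertex such as support {1} (all mass on the second cell) has dimension zero, which is why the local structure there is a cone of directions rather than a copy of ℝ².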


The data are represented by an n‐dimensional vector y, and the model space M by an m‐dimensional subspace indicated by the blue solid line. The MLE (μ̂, or equivalently θ̂) is the point on M that is closest to y in terms of Dy; it is also the orthogonal projection of y onto M along the dashed curved line connecting y to μ̂. The divergence Dy(y, μ0) between the data y and the model point μ0 (equivalently, θ0) is partitioned into the 'residual' and the Kullback–Leibler divergence between the MLE and θ0. The key difference between this figure and Figure 1 is that the curves connecting the data to the model space M are no longer straight lines. They would be straight lines if the figure were drawn using the parametrization μ, but then M would no longer be a straight line. Both parametrizations μ and θ are required simultaneously to express the Pythagorean relationship for the divergence, which is connected with the statistical ideas of maximum likelihood, sufficiency, information, and efficiency.
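The curved version of the Pythagorean partition can also be checked numerically. As an illustration of our own (not from the article), take the Poisson family with its canonical log link, where the divergence Dy is the Kullback–Leibler deviance: for an intercept-only model the MLE μ̂ is the sample mean, the score equation makes the cross term vanish, and the partition in the caption holds exactly even though the geometry is no longer Euclidean.

```python
import numpy as np

# Sketch (Poisson case): D is the KL deviance sum_i [a_i log(a_i/b_i) - a_i + b_i].
def D(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    with np.errstate(divide="ignore", invalid="ignore"):
        t = np.where(a > 0, a * np.log(a / b), 0.0)
    return float(np.sum(t - a + b))

y = np.array([3.0, 1.0, 4.0, 1.0, 5.0])   # observed counts
mu_hat = np.full_like(y, y.mean())         # MLE under the intercept-only model
mu0 = np.full_like(y, 2.0)                 # another point mu_0 (i.e., theta_0) in M

# Pythagorean partition: D(y, mu0) = 'residual' D(y, mu_hat) + KL D(mu_hat, mu0)
assert np.isclose(D(y, mu0), D(y, mu_hat) + D(mu_hat, mu0))
```

The cross term here is Σ(yᵢ − μ̂)(θ̂ᵢ − θ₀ᵢ) with θ = log μ, which vanishes by the score equation Σ(yᵢ − μ̂) = 0; this is the interplay between the μ and θ parametrizations that the caption emphasizes.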
