WIREs Comp Stat

# Estimation of covariance and precision matrix, network structure, and a view toward systems biology

The covariance matrix and its inverse, known as the precision matrix, have many applications in multivariate analysis because their elements encode the variances, correlations, covariances, and conditional independence relationships between variables. Estimating the precision matrix directly, without any matrix inversion, has attracted significant attention in the literature. We review the methods that have been implemented in R, together with their R packages, focusing on the setting where there are more variables than samples, and we discuss the ideas behind them. We describe how sparse precision matrix estimation methods can be used to infer network structure. Finally, we discuss methods that are suitable for constructing gene coexpression networks.

WIREs Comput Stat 2017, 9:e1415. doi: 10.1002/wics.1415

This article is categorized under:

- Statistical Models > Linear Models
- Applications of Computational Statistics > Computational and Molecular Biology
- Statistical and Graphical Methods of Data Analysis > Multivariate Analysis
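As a small illustration of how precision-matrix entries encode conditional independence, the following Python/NumPy sketch (our own example, not code from the reviewed R packages) converts a precision matrix Θ into partial correlations ρ_ij = −Θ_ij / sqrt(Θ_ii Θ_jj). For Gaussian data, a zero partial correlation means the two variables are conditionally independent given all the others. The three-variable chain and the function name `partial_correlations` are hypothetical.

```python
import numpy as np


def partial_correlations(theta):
    """Partial correlations rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)."""
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Hypothetical chain X1 - X2 - X3: the precision matrix is tridiagonal, so X1 and
# X3 are conditionally independent given X2 although they are marginally correlated.
theta = np.array([[2.0, -0.8, 0.0],
                  [-0.8, 2.0, -0.8],
                  [0.0, -0.8, 2.0]])
sigma = np.linalg.inv(theta)          # the corresponding covariance matrix
pcor = partial_correlations(theta)

print("covariance of X1 and X3:", round(sigma[0, 2], 4))
print("partial correlation of X1 and X2:", round(pcor[0, 1], 4))
print("partial correlation of X1 and X3:", abs(pcor[0, 2]))
```

The zero pattern of Θ, not of Σ, is what defines the edges of a Gaussian graphical model, which is why sparse precision estimation doubles as network inference.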
Heatmaps of the estimated covariance matrix when the ground truth is known. We have used p = 100 and n = 95 as a representative example. Note that the nonzero elements of the covariance matrix estimated with Glasso are somewhat smaller than their counterparts in the sample covariance matrix, due to the shrinkage effect of the L1 penalty.
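The shrinkage mentioned in this caption can be read off the stationarity condition of the graphical lasso. Assuming the standard penalized Gaussian log-likelihood with an elementwise L1 penalty ρ‖Θ‖₁ (the exact penalty in a given R package, for instance whether the diagonal is penalized, may differ), a short derivation is:

```latex
\hat\Theta = \arg\max_{\Theta \succ 0}
  \Big\{ \log\det\Theta - \operatorname{tr}(S\Theta) - \rho \|\Theta\|_1 \Big\},
\qquad \hat W = \hat\Theta^{-1}

% stationarity (subgradient) condition
\hat W - S - \rho\,\Gamma = 0, \qquad
\Gamma_{ij} =
\begin{cases}
  \operatorname{sign}(\hat\Theta_{ij}), & \hat\Theta_{ij} \neq 0,\\
  \gamma_{ij} \in [-1, 1], & \hat\Theta_{ij} = 0.
\end{cases}

% consequence for every off-diagonal entry (i \neq j)
|\hat W_{ij} - S_{ij}| \le \rho, \qquad
\hat W_{ij} = S_{ij} + \rho\,\operatorname{sign}(\hat\Theta_{ij})
  \quad \text{whenever } \hat\Theta_{ij} \neq 0.
```

So if a nonzero Θ̂_ij has the opposite sign to S_ij and |S_ij| > ρ, then |Ŵ_ij| = |S_ij| − ρ: the estimated covariance entry is pulled toward zero by exactly ρ. This is the typical case for a pair whose sample covariance and estimated partial correlation are both positive, because a positive partial correlation corresponds to a negative precision entry.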
Gene coexpression network estimated with the weighted Glasso from Arabidopsis thaliana expression data comprising 739 genes.
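The weighted Glasso replaces the single tuning parameter ρ with a matrix of entry-specific penalties ρ_ij. As a rough sketch only, and not the implementation in any R package reviewed here, the block coordinate descent scheme commonly used for the graphical lasso can accept such a penalty matrix directly. The function `weighted_glasso`, its arguments, the penalty values, and the simulated data are all hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Soft-thresholding operator sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def weighted_glasso(S, rho, max_iter=100, tol=1e-6):
    """Block coordinate descent for the graphical lasso with penalty matrix rho.

    rho may be a scalar or a symmetric p x p array of nonnegative penalties.
    Returns (W, Theta): the estimated covariance and precision matrices.
    """
    S = np.asarray(S, dtype=float)
    p = S.shape[0]
    R = np.broadcast_to(np.asarray(rho, dtype=float), (p, p))
    W = S + np.diag(np.diag(R))        # diagonal stays fixed at S_jj + rho_jj
    B = np.zeros((p, p))               # B[:, j]: lasso coefficients for column j
    for _ in range(max_iter):
        W_old = W.copy()
        for j in range(p):
            idx = np.arange(p) != j
            W11 = W[np.ix_(idx, idx)]
            s12, r12, beta = S[idx, j], R[idx, j], B[idx, j]
            # Inner lasso: min_b 0.5 b'W11 b - b's12 + sum_k r12_k |b_k|
            for _sweep in range(1000):
                beta_old = beta.copy()
                for k in range(p - 1):
                    partial = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
                    beta[k] = soft_threshold(partial, r12[k]) / W11[k, k]
                if np.max(np.abs(beta - beta_old)) < tol:
                    break
            B[idx, j] = beta
            W[idx, j] = W[j, idx] = W11 @ beta
        if np.max(np.abs(W - W_old)) < tol:
            break
    # Recover Theta column by column from W and the lasso coefficients
    Theta = np.zeros((p, p))
    for j in range(p):
        idx = np.arange(p) != j
        Theta[j, j] = 1.0 / (W[j, j] - W[idx, j] @ B[idx, j])
        Theta[idx, j] = -B[idx, j] * Theta[j, j]
    return W, (Theta + Theta.T) / 2


# Hypothetical data from a chain graph on 5 variables
rng = np.random.default_rng(0)
theta_true = 2 * np.eye(5) - 0.8 * (np.eye(5, k=1) + np.eye(5, k=-1))
X = rng.multivariate_normal(np.zeros(5), np.linalg.inv(theta_true), size=200)
S = np.cov(X, rowvar=False)

penalty = np.full((5, 5), 0.05)
penalty[0, 1] = penalty[1, 0] = 10.0   # hypothetical prior knowledge: no edge 1-2
W, Theta = weighted_glasso(S, penalty)
print(np.round(Theta, 2))
```

A very large ρ_ij effectively forbids an edge, while a small one makes it easy to keep; this is how external information can be injected into the network estimate.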
A dendrogram obtained from hierarchical clustering of yeast microarray data with 2001 genes (top). Distinct modules, shown in different colors, are defined by a height cutoff on the branches (middle). The bottom panel marks the essential genes in the data; essential genes are more likely to have high connectivity in the graph.
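One simple way to obtain modules like these is sketched below in Python with SciPy, under assumptions of our own: the dissimilarity between genes is 1 − |correlation|, the linkage is average linkage, and the cutoff is a fixed height. The caption does not state which dissimilarity or linkage was used for the figure, and `coexpression_modules` and the simulated data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def coexpression_modules(X, height_cutoff):
    """Cluster genes (columns of X) and cut the dendrogram at a fixed height."""
    corr = np.corrcoef(X, rowvar=False)
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)   # dissimilarity in [0, 1]
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    # Genes whose cophenetic distance is at most height_cutoff share a module
    return fcluster(Z, t=height_cutoff, criterion="distance")


# Hypothetical data: 8 genes driven by two latent factors (4 genes each)
rng = np.random.default_rng(1)
n = 200
f1, f2 = rng.normal(size=(2, n))
X = np.column_stack([f + 0.3 * rng.normal(size=n) for f in (f1,) * 4 + (f2,) * 4])
labels = coexpression_modules(X, height_cutoff=0.5)
print(labels)
```

Lowering the cutoff splits the tree into more, smaller modules; raising it merges them.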
Gene coexpression network estimated with hard thresholding from yeast microarray data with 2001 genes. Nodes are colored according to the modules shown in Figure .
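Hard thresholding keeps an edge between two genes when their absolute correlation exceeds a cutoff τ, and a gene's connectivity is then its number of edges. The Python sketch below assumes Pearson correlation and a user-chosen τ; both are our assumptions, since how the cutoff was chosen for this figure is not stated here. The function names and data are hypothetical.

```python
import numpy as np


def hard_threshold_network(X, tau):
    """Adjacency matrix with a_ij = 1 if |cor(x_i, x_j)| > tau (genes = columns of X)."""
    corr = np.corrcoef(X, rowvar=False)
    A = (np.abs(corr) > tau).astype(int)
    np.fill_diagonal(A, 0)          # no self-loops
    return A


def connectivity(A):
    """Connectivity (degree) of each node: its number of neighbours."""
    return A.sum(axis=1)


# Hypothetical data: two co-expressed groups of 4 genes plus one unrelated gene
rng = np.random.default_rng(2)
n = 200
f1, f2 = rng.normal(size=(2, n))
groups = [f1] * 4 + [f2] * 4 + [np.zeros(n)]
X = np.column_stack([f + 0.3 * rng.normal(size=n) for f in groups])
A = hard_threshold_network(X, tau=0.5)
print(connectivity(A))
```

Unlike the penalized estimators, this adjacency is built from marginal correlations, so it does not distinguish direct from indirect associations.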
Graphs estimated from a simulated data set compared with the true graph structure. We have used n = 100 and p = 200 as a representative example. (a) The optimal graph. (b) The optimal graph estimated with Glasso when the tuning parameter is chosen with the Rotation Information Criterion (RIC). (c) The optimal graph estimated with the Sparse Column-wise Inverse Operator (SCIO) when the tuning parameter is chosen with cross validation.
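SCIO estimates the precision matrix one column at a time. As we understand the method, column i solves the lasso-type problem min_β ½ βᵀSβ − e_iᵀβ + λ‖β‖₁, where S is the sample covariance matrix and e_i is the i-th unit vector, and the columns are then symmetrized. The Python sketch below solves each column by coordinate descent and symmetrizes by keeping the entry of smaller magnitude. The solver, the symmetrization rule, and the name `scio` are our assumptions rather than the reference implementation; in practice λ would be chosen by cross validation, as in the figure.

```python
import numpy as np


def soft_threshold(x, t):
    """Soft-thresholding operator sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def scio(S, lam, max_iter=1000, tol=1e-10):
    """Column-wise estimate of the precision matrix from covariance S."""
    S = np.asarray(S, dtype=float)
    p = S.shape[0]
    Omega = np.zeros((p, p))
    for i in range(p):
        e = np.zeros(p)
        e[i] = 1.0
        beta = np.zeros(p)
        # Coordinate descent for 0.5 b'S b - e'b + lam * ||b||_1
        for _ in range(max_iter):
            beta_old = beta.copy()
            for k in range(p):
                partial = e[k] - S[k] @ beta + S[k, k] * beta[k]
                beta[k] = soft_threshold(partial, lam) / S[k, k]
            if np.max(np.abs(beta - beta_old)) < tol:
                break
        Omega[:, i] = beta
    # Symmetrize: keep whichever of Omega_ij, Omega_ji is smaller in magnitude
    return np.where(np.abs(Omega) <= np.abs(Omega.T), Omega, Omega.T)


theta_true = 2 * np.eye(5) - 0.8 * (np.eye(5, k=1) + np.eye(5, k=-1))
S = np.linalg.inv(theta_true)            # population covariance, for illustration
print(np.round(scio(S, lam=0.0), 3))     # lam = 0 solves S @ beta = e_i for each column
print(np.round(scio(S, lam=0.1), 3))     # lam > 0 shrinks entries toward zero
```

Because each column is a separate problem, the p columns can be solved in parallel, which is part of what makes column-wise methods attractive when p is large.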
Graphs estimated with the Meinshausen and Bühlmann approximation on a subsample of the riboflavin data set. (a) The optimal graph when the tuning parameter is chosen with the Rotation Information Criterion (RIC). (b) The optimal graph when the tuning parameter is chosen with the Stability Approach to Regularization Selection (StARS).
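The Meinshausen and Bühlmann approximation regresses each variable on all the others with the lasso and places an edge where the coefficients are nonzero. StARS then chooses the tuning parameter from the stability of the estimated edges across subsamples. The NumPy sketch below is a simplified illustration under our own assumptions: standardized variables, a plain coordinate-descent lasso, the "and" rule for combining the two regressions of a pair, and a subsample size b = min(10√n, 0.8n), a common rule of thumb. It computes the StARS instability for a single λ only; StARS would then select the least regularization whose (monotonized) instability stays below a small threshold such as 0.05. All function names and data are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, max_iter=1000, tol=1e-8):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)              # current residual y - Xb
    for _ in range(max_iter):
        b_old = b.copy()
        for k in range(p):
            r += X[:, k] * b[k]               # remove feature k from the fit
            z = X[:, k] @ r / n
            b[k] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[k]
            r -= X[:, k] * b[k]               # add it back with its new value
        if np.max(np.abs(b - b_old)) < tol:
            break
    return b


def neighborhood_selection(X, lam):
    """Meinshausen-Bühlmann graph estimate using the 'and' rule."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    p = Z.shape[1]
    B = np.zeros((p, p))
    for j in range(p):
        others = np.arange(p) != j
        B[others, j] = lasso_cd(Z[:, others], Z[:, j], lam)
    nz = B != 0
    return (nz & nz.T).astype(int)        # edge only if both regressions select it


def stars_instability(X, lam, n_sub=20, seed=0):
    """Average edge instability 2 * theta * (1 - theta) over subsamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    b = int(min(10 * np.sqrt(n), 0.8 * n))
    freq = np.zeros((p, p))
    for _ in range(n_sub):
        rows = rng.choice(n, size=b, replace=False)
        freq += neighborhood_selection(X[rows], lam)
    theta = freq / n_sub                  # edge selection frequencies
    upper = np.triu_indices(p, k=1)
    return np.mean(2 * theta[upper] * (1 - theta[upper]))


rng = np.random.default_rng(3)
theta_true = 2 * np.eye(5) - 0.8 * (np.eye(5, k=1) + np.eye(5, k=-1))
X = rng.multivariate_normal(np.zeros(5), np.linalg.inv(theta_true), size=1000)
print(neighborhood_selection(X, lam=0.1))
print(stars_instability(X, lam=0.1))
```

An edge that is selected in every subsample, or in none, contributes zero instability; edges selected about half the time contribute the most.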
Heatmaps of the estimated adjacency matrices for Constrained L1-minimization for Inverse Matrix Estimation (CLIME), Sparse Column-wise Inverse Operator (SCIO), and Tuning-Insensitive Graph Estimation and Regression (TIGER) when the ground truth is known. We have used p = 100 and n = 95 as a representative example. White denotes a zero element of the adjacency (precision) matrix and black a nonzero element.
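CLIME estimates each column of the precision matrix by solving min ‖β‖₁ subject to ‖Sβ − e_i‖_∞ ≤ λ, which is a linear program, and then symmetrizes the result. The Python sketch below writes β = u − v with u, v ≥ 0 and passes the program to SciPy's `scipy.optimize.linprog`. The LP formulation, the smaller-magnitude symmetrization, the zero threshold used to read off the adjacency matrix, and the function names are our own illustrative choices rather than the code of the reviewed R packages.

```python
import numpy as np
from scipy.optimize import linprog


def clime(S, lam):
    """CLIME: column-wise L1 minimization, one linear program per column."""
    S = np.asarray(S, dtype=float)
    p = S.shape[0]
    c = np.ones(2 * p)                   # objective sum(u) + sum(v) = ||beta||_1
    M = np.hstack([S, -S])               # S @ beta with beta = u - v
    A_ub = np.vstack([M, -M])            # encodes |S beta - e_i| <= lam elementwise
    Omega = np.zeros((p, p))
    for i in range(p):
        e = np.zeros(p)
        e[i] = 1.0
        b_ub = np.concatenate([lam + e, lam - e])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
        if not res.success:
            raise RuntimeError(res.message)
        Omega[:, i] = res.x[:p] - res.x[p:]
    # Symmetrize: keep whichever of Omega_ij, Omega_ji is smaller in magnitude
    return np.where(np.abs(Omega) <= np.abs(Omega.T), Omega, Omega.T)


def adjacency_from_precision(Omega, eps=1e-6):
    """1 where an off-diagonal precision entry is nonzero (black in the heatmap)."""
    A = (np.abs(Omega) > eps).astype(int)
    np.fill_diagonal(A, 0)
    return A


theta_true = 2 * np.eye(5) - 0.8 * (np.eye(5, k=1) + np.eye(5, k=-1))
S = np.linalg.inv(theta_true)            # population covariance, for illustration
Omega = clime(S, lam=0.0)                # lam = 0 forces S @ beta = e_i exactly
print(adjacency_from_precision(Omega))
```

With a sample covariance and p > n, S is singular and λ must be positive for the constraints to be satisfiable by a sparse β; the tuning of λ is what the compared methods handle differently.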