WIREs Comp Stat

Genome‐wide prediction of chromatin accessibility based on gene expression


Abstract

Decoding gene regulation in a biological system requires information from both transcriptome and regulome. While multiple high‐throughput transcriptome and regulome mapping technologies are available, transcriptome profiling is more widely used. Today, over a million bulk and single‐cell gene expression samples are publicly available. This number is orders of magnitude larger than the number of available regulome samples. Most of the gene expression samples do not have corresponding regulome data. However, it is possible to obtain regulome information via prediction. Open chromatin is a hallmark of active regulatory elements. This mini‐review discusses recent advances in predicting chromatin accessibility using gene expression data, including both the development of prediction methods and their applications in expanding the regulome catalog, improving regulome analysis, integrating transcriptome and regulome data, and facilitating single‐cell analysis of gene regulation.

This article is categorized under:

Applications of Computational Statistics > Genomics/Proteomics/Genetics
Data: Types and Structure > Massive Data
Statistical Models > Linear Models
An overview of predicting chromatin accessibility using gene expression. (a) Chromatin accessibility is associated with activities of cis‐regulatory elements which control genes' transcriptional activities. (b) A timeline of landmark events in the development of methods for predicting genome‐wide chromatin accessibility
Applications of transcriptome‐based chromatin accessibility prediction
Predicting chromatin accessibility using scRNA‐seq. (a) Single‐cell regulome data are discrete, whereas bulk chromatin accessibility is continuous. (b) The BIRD pipeline for predicting chromatin accessibility using scRNA‐seq data from different platforms. (c) A comparison between scATAC‐seq and BIRD‐predicted chromatin accessibility in an example genomic region. Bulk DNase‐seq and ATAC‐seq are shown as the gold standard. (d) Pearson correlation between bulk ATAC‐seq and chromatin accessibility obtained from scATAC‐seq or BIRD by pooling an increasing number of granulocyte‐macrophage progenitor (GMP) cells. BIRD prediction is based on raw or scVI‐imputed 10× Genomics scRNA‐seq data. (e) Differential chromatin accessibility between GMP and common myeloid progenitor (CMP) cells obtained from BIRD or scATAC‐seq by pooling the same number of cells was compared to differential signals between GMP and CMP from bulk ATAC‐seq (gold standard). The Pearson correlation is shown as a function of pooled cell number. In (d) and (e), for each cell number n (= 1, 5, 10, 20) and cell type, BIRD was run by pooling n random cells sampled from scRNA‐seq, and scATAC‐seq profiles were obtained by pooling n random cells sampled from scATAC‐seq. This random sampling procedure was repeated 10 times for each of BIRD and scATAC‐seq. The figure shows mean ± standard deviation (vertical bars) of the correlation across the 10 independently sampled datasets. BIRD significantly outperformed scATAC‐seq in all comparisons (Wilcoxon rank‐sum test, Benjamini–Hochberg FDR < 0.05), except that scATAC‐seq had performance similar to BIRD‐raw when pooling 20 cells in (e) (FDR = 0.436).
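Panel (d) turns discrete single-cell signals into a continuous profile by pooling n random cells, correlating the pooled profile with bulk ATAC-seq, and repeating the draw 10 times. A minimal sketch of that resampling loop follows, assuming cells are rows of a cell-by-region matrix and that pooling is a plain average; the function `pooled_correlation` and the synthetic binary data are hypothetical stand-ins, not part of BIRD or the original analysis.

```python
import numpy as np


def pooled_correlation(cell_matrix, bulk_profile, n_cells, n_repeats=10, seed=0):
    """Repeatedly pool n_cells random cells and correlate with a bulk profile.

    Hypothetical helper for illustration only.
    cell_matrix: array of shape (cells, regions), one row per cell.
    bulk_profile: array of shape (regions,), the gold-standard bulk signal.
    Returns (mean, standard deviation) of the Pearson correlations.
    """
    rng = np.random.default_rng(seed)
    correlations = np.empty(n_repeats)
    for k in range(n_repeats):
        # Draw n_cells distinct cells without replacement.
        idx = rng.choice(cell_matrix.shape[0], size=n_cells, replace=False)
        # Pool by averaging the selected cells region by region.
        pooled = cell_matrix[idx].mean(axis=0)
        correlations[k] = np.corrcoef(pooled, bulk_profile)[0, 1]
    return correlations.mean(), correlations.std()


# Synthetic example: binary single-cell signals whose probability of being
# open tracks a continuous bulk profile (stand-in data, not real scATAC-seq).
rng = np.random.default_rng(1)
bulk = rng.gamma(shape=2.0, scale=1.0, size=2000)
open_prob = 0.3 * bulk / bulk.max()
cells = (rng.random((200, bulk.size)) < open_prob).astype(float)

results = {n: pooled_correlation(cells, bulk, n) for n in (1, 5, 10, 20)}
for n, (mean_r, sd_r) in results.items():
    print(f"{n:>2} cells: r = {mean_r:.3f} +/- {sd_r:.3f}")
```

Averaging more cells shrinks the per-cell sampling noise, which is why, in this toy setting, the correlation with the bulk profile rises as n grows.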
Predicting open chromatin using gene expression and DNA sequences via deep learning. DNA sequences are transformed into a one‐dimensional input with four channels via one‐hot encoding. The sequence features and gene expression features are then used by a neural network to classify genomic regions as open or closed chromatin.
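One-hot encoding gives each base its own channel, so a sequence of length L becomes a 4 × L array. A minimal sketch, assuming a channels-first layout (the convention of 1D convolution layers such as PyTorch's `Conv1d`) and all-zero columns for ambiguous bases such as N; both are illustrative choices rather than details from the caption, and `one_hot_encode` is a hypothetical helper.

```python
import numpy as np

BASES = "ACGT"  # channel order: A=0, C=1, G=2, T=3


def one_hot_encode(sequence):
    """Encode a DNA string as a float32 array of shape (4, len(sequence)).

    Hypothetical helper: each column has a single 1 in the channel of its base;
    any other character (e.g. N) leaves the whole column at zero.
    """
    sequence = sequence.upper()
    encoding = np.zeros((len(BASES), len(sequence)), dtype=np.float32)
    for position, base in enumerate(sequence):
        channel = BASES.find(base)  # -1 when the character is not A/C/G/T
        if channel >= 0:
            encoding[channel, position] = 1.0
    return encoding


x = one_hot_encode("ACGTN")
print(x.shape)
print(x)
```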
Overview of BIRD. (a) BIRD uses matched gene expression and chromatin accessibility data to train prediction models which are then applied to predict chromatin accessibility in new gene expression samples. (b) BIRD first builds a regression model for each CRE. Co‐expressed genes are clustered to reduce the predictor dimension. (c) BIRD also clusters co‐activated CREs and builds multi‐level prediction models for CRE clusters. It then uses model averaging to combine CRE‐level and CRE‐cluster‐level prediction models to improve the prediction accuracy for each CRE. (d) An example genomic region comparing the true chromatin accessibility signals, BIRD‐predicted signals, and the mean signals from the training data. (e)–(f) BIRD, ChromImpute, and the mean DNase‐seq profile of training samples (negative control) are compared in terms of the Pearson correlation between the predicted and true DNase‐seq signals across all CREs (i.e., cross‐locus correlation) in 10 test samples (e) and across all samples (i.e., cross‐sample correlation) at each CRE (f). For (f), each boxplot shows the distribution of cross‐sample correlation for all CREs, and the mean correlation is shown above the boxplot
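The CRE-level step in (b) and the evaluation in (e)–(f) can be illustrated with a heavily simplified sketch. It assumes k-means for gene clustering and ordinary least squares for each CRE (the caption names neither), uses synthetic data driven by a few hidden programs, and omits the CRE-cluster models and model averaging of (c); all function names are hypothetical. Cross-locus correlation is computed per test sample across CREs, and cross-sample correlation per CRE across test samples, matching the definitions above.

```python
import numpy as np

# Hypothetical helper names throughout; this is not BIRD's code or API.


def kmeans_labels(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for each row of points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the previous center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def cluster_features(expr, labels):
    """Average co-expressed genes: (samples, genes) -> (samples, clusters)."""
    return np.column_stack([expr[:, labels == j].mean(axis=1) for j in np.unique(labels)])


def with_intercept(features):
    return np.column_stack([np.ones(len(features)), features])


def row_correlations(a, b):
    """Pearson correlation between matching rows of a and b."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))


# Synthetic data: 5 hidden regulatory programs drive both genes and CREs.
rng = np.random.default_rng(0)
n_train, n_test, n_genes, n_cres, k = 60, 10, 300, 500, 10
n_total = n_train + n_test
programs = rng.normal(size=(n_total, 5))
expr = programs @ rng.normal(size=(5, n_genes)) + 0.5 * rng.normal(size=(n_total, n_genes))
access = programs @ rng.normal(size=(5, n_cres)) + 0.5 * rng.normal(size=(n_total, n_cres))
expr_train, expr_test = expr[:n_train], expr[n_train:]
access_train, access_test = access[:n_train], access[n_train:]

# Standardize genes with training statistics, then cluster genes by their
# profiles across training samples to reduce the predictor dimension.
mu, sd = expr_train.mean(axis=0), expr_train.std(axis=0)
z_train, z_test = (expr_train - mu) / sd, (expr_test - mu) / sd
labels = kmeans_labels(z_train.T.copy(), k)

# One least-squares model per CRE, solved jointly: column c of coef is CRE c.
design_train = with_intercept(cluster_features(z_train, labels))
coef, *_ = np.linalg.lstsq(design_train, access_train, rcond=None)
predicted = with_intercept(cluster_features(z_test, labels)) @ coef

# Negative control: the mean training profile, used for every test sample.
baseline = np.tile(access_train.mean(axis=0), (n_test, 1))

cross_locus = row_correlations(predicted, access_test)      # one value per test sample
cross_sample = row_correlations(predicted.T, access_test.T)  # one value per CRE
baseline_cross_locus = row_correlations(baseline, access_test)
print(f"cross-locus r: model {cross_locus.mean():.2f}, baseline {baseline_cross_locus.mean():.2f}")
print(f"mean cross-sample r: {cross_sample.mean():.2f}")
```

Fitting all CREs in one `lstsq` call is equivalent to fitting a separate regression per CRE, because each output column is solved independently against the same design matrix. Cross-sample correlation is not computed for the baseline because its prediction is constant across samples.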

