How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 7.250

FiRePat—Finding Regulatory Patterns between sRNAs and Genes

Abstract

Small RNAs (sRNAs) are regulatory RNA fragments that can regulate gene expression through RNA silencing. Because sRNAs are negative regulators, it is generally assumed that the expression profiles of sRNAs and their targets are negatively correlated. Recently, however, examples of positive correlation between the expression of sRNAs and their targets have been discovered. At present, it is not known how many sRNA‐target pairs are positively or negatively correlated, nor in which situations (e.g., under which treatments) these correlations can be observed. One of the first steps toward answering these questions is to develop tools for a genome-wide characterization of the covariation of sRNA and gene expression levels. We present FiRePat (Finding Regulatory Patterns), an unsupervised data mining tool that detects sRNA–gene pairs with correlated expression levels. It is applicable to large datasets, typically produced by high-throughput sequencing of sRNAs and mRNAs or by microarray experiments. The method consists of three steps: first, we select differentially expressed sRNAs and genes; second, we compute the correlation between sRNA and gene series for all possible sRNA–gene pairs; and third, we cluster the sRNA or gene expression series, simultaneously inducing clusters in the other series. Potential uses of FiRePat are presented using publicly available sRNA and mRNA datasets for both plants and animals. The standard output of FiRePat, a list of correlated sRNA–mRNA pairs, can be used to investigate the causes and consequences of the respective expression patterns.

© 2012 Wiley Periodicals, Inc.

This article is categorized under: Algorithmic Development > Biological Data Mining
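
The first two steps described in the abstract (selecting differentially expressed series, then correlating every sRNA–gene pair) can be illustrated with a minimal sketch. This is not the FiRePat implementation. The offset fold change formula, the offset value of 20, the function names, and the toy expression series are all assumptions made for illustration, and the clustering step is omitted.

```python
from itertools import product
from math import sqrt


def offset_fold_change(series, offset=20.0):
    """Assumed OFC definition: (max + offset) / (min + offset).

    The offset (hypothetical value) damps fold changes between low counts.
    """
    return (max(series) + offset) / (min(series) + offset)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / (sd_x * sd_y)


def correlated_pairs(srnas, genes, srna_ofc, gene_ofc, min_abs_pcc):
    """Return (sRNA, gene, PCC) tuples sorted from most negative to most positive."""
    # Step 1: keep only series whose OFC exceeds the chosen threshold.
    s_sel = {k: v for k, v in srnas.items() if offset_fold_change(v) > srna_ofc}
    g_sel = {k: v for k, v in genes.items() if offset_fold_change(v) > gene_ofc}
    # Step 2: correlate every selected sRNA with every selected gene.
    pairs = []
    for (s_id, s_val), (g_id, g_val) in product(s_sel.items(), g_sel.items()):
        r = pearson(s_val, g_val)
        if abs(r) >= min_abs_pcc:
            pairs.append((s_id, g_id, r))
    return sorted(pairs, key=lambda p: p[2])


# Toy series over four samples (made-up numbers, not data from the article).
srnas = {"sRNA_a": [10, 200, 400, 800], "sRNA_b": [50, 52, 49, 51]}
genes = {
    "gene_x": [900, 500, 300, 100],
    "gene_y": [100, 300, 500, 900],
    "gene_z": [100, 100, 100, 101],
}
pairs = correlated_pairs(srnas, genes, srna_ofc=3, gene_ofc=2, min_abs_pcc=0.9)
for s_id, g_id, r in pairs:
    print(f"{s_id}\t{g_id}\t{r:+.3f}")
```

In this toy input, the nearly flat series sRNA_b and gene_z are dropped by the OFC filter, so only pairs involving sRNA_a are correlated. Sorting by the coefficient puts negatively correlated pairs first, which matches the conventional expectation for sRNA targets.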

Screenshot of the colored HTML output of FiRePat applied to the Arabidopsis samples, for genes and loci defined with SiLoCo (Ref 10). The upper and lower panels present the sRNA–locus/gene pairs for the probesets 259008 and 267083, respectively, clustered by probeset expression values for the four samples (flower, leaf, seedling, and silique). These probesets map to genes annotated as metallothionein and calcium‐binding EF hand family protein, respectively. The shades of green/red indicate the increase/decrease in expression level relative to the first point. The first value in the series is set to 0. The left side of the table contains gene expression levels and the right side contains sRNA loci expression levels. The last column is the Pearson correlation coefficient for the sRNA–locus/gene pair.

Expression levels for small RNA (sRNA) loci mapping to a gene promoter (left) and the corresponding gene (right). The black line represents the sequence of the gene promoter and the boxes represent the abundance/size class distribution of sRNAs grouped into 100 nt windows for four plant samples; S1: flower, S2: leaf, S3: seedling, S4: siliques. Larger boxes indicate higher abundance. The colors correspond to the size classes 21 (red), 22 (green), 23 (orange), and 24 (blue). The height of the box for the first sample S1 corresponds to the cumulative sRNA abundance in log scale. The height of the following boxes S2–S4 is proportional to the log offset fold change (OFC) relative to the height of the box corresponding to S1. The expression profile of the corresponding mRNA is shown on the right, in linear scale, and was obtained using data from an Affymetrix Arabidopsis Chip.

Distribution of the Pearson correlation coefficient (PCC) values (x‐axis) versus the number of sRNA–gene pairs (y‐axis) in the plant (a) and human (b) datasets. The bar labels represent the percentage of pairs in each correlation interval. To generate these distributions for the plant dataset, we selected genes with an offset fold change (OFC) > 16 (417 genes) and small RNAs (sRNAs) with OFC > 3 (316 sRNAs). For the human dataset, we selected genes with OFC > 10 (602 genes) and miRNAs with OFC > 2 (201 miRNAs).
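
A distribution of this kind can be tabulated by binning the PCC values into equal-width intervals and reporting each bin's share of all pairs. The sketch below is illustrative only: the number of bins, the function name, and the sample coefficients are assumptions, not values taken from the figure.

```python
def pcc_histogram(pccs, n_bins=10):
    """Bin PCC values into equal-width intervals over [-1, 1].

    Returns a list of (low, high, count, percent) tuples, one per interval.
    """
    width = 2.0 / n_bins
    counts = [0] * n_bins
    for r in pccs:
        idx = int((r + 1.0) / width)
        # Clamp so that r == 1.0 (or tiny float overshoots) stays inside the range.
        idx = max(0, min(idx, n_bins - 1))
        counts[idx] += 1
    total = len(pccs)
    return [
        (-1.0 + i * width, -1.0 + (i + 1) * width, c,
         100.0 * c / total if total else 0.0)
        for i, c in enumerate(counts)
    ]


# Made-up coefficients for illustration, binned into four intervals.
sample_pccs = [-0.95, -0.2, 0.1, 0.3, 0.99, 1.0]
hist = pcc_histogram(sample_pccs, n_bins=4)
for low, high, count, percent in hist:
    print(f"[{low:+.2f}, {high:+.2f}): {count} pairs ({percent:.1f}%)")
```

The percentage column plays the role of the bar labels described in the caption; the last interval is treated as closed so that a coefficient of exactly 1 is counted.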
