How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 4.476

Data mining of functional RNA structures in genomic sequences


Abstract

The normal functions of genomes depend on the precise expression of messenger RNAs and noncoding RNAs (ncRNAs) such as transfer RNAs and microRNAs in eukaryotes. These ncRNAs and functional RNA structures (FRSs) act as regulators or response elements for cellular factors and participate in transcription, posttranscriptional processing, and translation. Knowledge discovery of these FRSs in huge DNA/RNA sequence databases is a very important step to reach our goal of going from genomic sequence data to biological knowledge for understanding RNA‐based regulation. Analyses of a large number of FRSs have indicated that the FRS can be well characterized by some quantitative measures such as significance and well‐ordered scores of the local segment. Various data mining tools have been developed and successfully applied to FRS discovery in genomic sequence databases. Here, we summarize our efforts in the computational discovery of structured features of ncRNAs and FRSs within complex genomes by EDscan and SigED.

© 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1: 88–95. DOI: 10.1002/widm.13

This article is categorized under:
Algorithmic Development > Biological Data Mining
Algorithmic Development > Spatial and Temporal Data Mining
Application Areas > Health Care
Technologies > Structure Discovery and Clustering

The optimal structure (OS) and corresponding optimal restrained structure (ORS) computed from the Caenorhabditis elegans let‐7 precursor sequence (a) and from its randomly shuffled sequence (b). The computed lowest free energies of the OS and ORS are −38.5 (E) and −13.8 (Ef) kcal/mol for the natural functional RNA, and −15.0 (E) and −13.9 (Ef) kcal/mol for the randomly shuffled sequence. Ediff values are 24.7 kcal/mol for the let‐7 precursor and 1.1 kcal/mol for its randomly shuffled sequence. The much greater Ediff of the folded wild‐type let‐7 sequence indicates a significantly more well‐ordered OS (ref. 17).
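The Ediff values in this caption are consistent with taking the difference between the ORS energy and the OS energy, Ediff = Ef − E. The short Python sketch below reproduces that arithmetic from the reported energies. The relation is inferred from the numbers in the caption rather than quoted from the article, and the function name `ediff` is a hypothetical helper, not part of EDscan or SigED.

```python
def ediff(e_os: float, e_ors: float) -> float:
    """Energy difference between the optimal restrained structure (Ef)
    and the optimal structure (E), in kcal/mol.

    Assumption: Ediff = Ef - E, inferred from the values in the caption.
    """
    return e_ors - e_os


# Energies (kcal/mol) reported in the caption.
let7 = ediff(e_os=-38.5, e_ors=-13.8)      # natural let-7 precursor
shuffled = ediff(e_os=-15.0, e_ors=-13.9)  # randomly shuffled sequence

print(f"let-7 precursor Ediff: {let7:.1f} kcal/mol")
print(f"shuffled Ediff:        {shuffled:.1f} kcal/mol")
```

A large Ediff means that forbidding the base pairs of the optimal structure costs a great deal of stability, which is the sense in which the caption calls the let‐7 fold well ordered.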


Zscre of local segments computed for the genomic sequence of Caenorhabditis elegans (accession no. AF274345). Zscre values were computed by StemED by moving windows of 75 nt (row 1), 100 nt (row 2), 125 nt (row 3), and 150 nt (row 4) along the sequence from 5′ to 3′ in steps of 3 nt. Each plot shows Zscre against the position of the middle nt of these overlapping segments. The reported stem‐loop of let‐7 can be easily distinguished in each plot by the maximal Zscre, denoted by the peak in the plot (ref. 30).
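The window scan described in this caption can be sketched in Python as follows. This is a minimal illustration of the windowing only: it is not StemED, and the `score` argument is a hypothetical placeholder for whatever per-segment measure (such as Zscre) is being profiled. The window sizes and the 3-nt step come from the caption; the helper names and the toy G-content score are assumptions made for the example.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def sliding_windows(seq: str, size: int, step: int = 3) -> Iterator[Tuple[int, str]]:
    """Yield (1-based position of the middle nt, segment), moving 5' to 3'.

    For even window sizes the upper of the two middle positions is used.
    """
    for start in range(0, len(seq) - size + 1, step):
        mid = start + size // 2 + 1
        yield mid, seq[start:start + size]


def scan(
    seq: str,
    score: Callable[[str], float],
    sizes: Tuple[int, ...] = (75, 100, 125, 150),
    step: int = 3,
) -> Dict[int, List[Tuple[int, float]]]:
    """Return one profile per window size as a list of (midpoint, score)."""
    return {
        size: [(mid, score(seg)) for mid, seg in sliding_windows(seq, size, step)]
        for size in sizes
    }


def peak(profile: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the (midpoint, score) pair with the maximal score."""
    return max(profile, key=lambda point: point[1])


# Toy usage: G content stands in for a real structural score (placeholder only).
toy_seq = "A" * 150 + "G" * 75 + "A" * 150
toy_score = lambda seg: seg.count("G") / len(seg)
profiles = scan(toy_seq, toy_score)
best_mid, best_score = peak(profiles[75])
print(best_mid, best_score)
```

Plotting each profile against the midpoint gives one row per window size, and a feature such as the let‐7 stem‐loop would appear as the maximum of each row.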
