HHS Public Access. Author manuscript; accepted for publication in a peer-reviewed journal.

Mol Biotechnol. Author manuscript; available in PMC 2009 October 28.
Published in final edited form as:
PMCID: PMC2770092
NIHMSID: NIHMS152441

Informatic Resources for Identifying and Annotating Structural RNA Motifs

Abstract

Post-transcriptional regulation of genes and transcripts is a vital aspect of cellular processes and, unlike transcriptional regulation, remains a largely unexplored domain. One of the most obvious and important questions to explore is the discovery of functional RNA elements. Many RNA elements have been characterized to date, ranging from cis-regulatory motifs within mRNAs to large families of non-coding RNAs. As with protein-coding genes, the functional motifs of these RNA elements are highly conserved; unlike protein-coding genes, however, it is most often structure rather than sequence that is conserved. Proper characterization of these structural RNA motifs is both the key and the limiting step in understanding the post-transcriptional aspects of the genomic world. Here we focus on the task of structural motif discovery and provide a survey of the informatics resources geared towards this task.

Introduction

Post-transcriptional regulation of genes and transcripts is a vital aspect of cellular processes and, unlike transcriptional regulation, remains a largely unexplored domain. One of the most obvious and important questions to explore is the discovery of functional RNA elements. Many RNA elements have been characterized to date, ranging from cis-regulatory motifs within mRNAs to large families of non-coding RNAs such as pre-miRNAs, snRNAs, snoRNAs, gRNAs, tRNAs, rRNAs, and assorted ribozymes. As with protein-coding genes, the functional motifs of these RNA elements are highly conserved; unlike protein-coding genes, however, it is most often structure rather than sequence that is conserved. Proper characterization of these structural RNA motifs is both the key and the limiting step in understanding the post-transcriptional aspects of the genomic world.
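To make the point about structure versus sequence conservation concrete, the Python sketch below checks two hypothetical hairpin sequences (toy examples, not drawn from any real RNA family) against the same dot-bracket structure. The two sequences share only their four-nucleotide loop, yet every stem position has changed in a compensatory way, so both remain fully base-paired. Treating Watson-Crick and G-U wobble pairs as "canonical" is a common simplification assumed here, not a thermodynamic model.

```python
# Canonical pairs: Watson-Crick plus G-U wobble (a simplifying assumption).
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the 0-based (i, j) base pairs encoded by a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return sorted(pairs)


def is_compatible(seq: str, structure: str) -> bool:
    """True if every base pair of the structure is canonical in this sequence."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    return all((seq[i], seq[j]) in CANONICAL_PAIRS for i, j in parse_dot_bracket(structure))


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical hairpins: same fold, different stem sequences (compensatory changes).
HAIRPIN = "((((....))))"
SEQ_1 = "GGACUUCGGUCC"
SEQ_2 = "CAUGUUCGCAUG"

if __name__ == "__main__":
    print(is_compatible(SEQ_1, HAIRPIN), is_compatible(SEQ_2, HAIRPIN))  # True True
    print(f"sequence identity: {identity(SEQ_1, SEQ_2):.2f}")  # 0.33
```

By contrast, a single change on one side of a stem, such as G to A at the first position of `SEQ_1`, breaks the outermost pair and makes the sequence incompatible with the structure. This is why sequence identity alone is a poor guide to functional relatedness for structured RNAs.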

Here we focus on the task of structural motif discovery and on the informatics resources and tools geared towards it. We first present the existing databases of RNA structures and their known instances (Table 1). These range from databases of experimentally determined 3D structures to ones in which consensus structures have been compiled either manually from the literature or computationally. They also include databases that catalog the results of genome-wide searches for conserved structures. Complementing these structure databases is a collection of tools for finding instances of known structures in new sequences (Table 2).

Table 1
Structural Motif Catalogs
Table 2
Search Tools for Known Structural Motifs
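Many of the search tools in Table 2 let the user describe a structural pattern and then scan sequences for matches. The following is a minimal sketch of that idea for a single hairpin. It assumes a hypothetical descriptor (fixed stem length, bounded loop length, optional G-U pairs) that loosely resembles, but does not reproduce, the descriptor languages of tools such as RNAMotif or Locomotif. It enumerates every window exhaustively, which is fine for illustration but ignores mismatches, bulges, and scoring.

```python
from dataclasses import dataclass

# Canonical pairs; allowing G-U wobble pairs is optional (assumption for this sketch).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


@dataclass(frozen=True)
class HairpinDescriptor:
    """Hypothetical descriptor: a fixed-length stem closing a loop of bounded size."""
    stem_len: int
    min_loop: int
    max_loop: int
    allow_gu: bool = True


@dataclass(frozen=True)
class Hit:
    start: int      # 0-based index of the first 5' stem nucleotide
    end: int        # exclusive end: one past the last 3' stem nucleotide
    structure: str  # dot-bracket string describing seq[start:end]


def search_hairpins(seq: str, d: HairpinDescriptor) -> list[Hit]:
    """Exhaustively report every window that matches the hairpin descriptor."""
    allowed = (WATSON_CRICK | WOBBLE) if d.allow_gu else WATSON_CRICK
    seq = seq.upper().replace("T", "U")  # accept DNA input by mapping T to U
    hits: list[Hit] = []
    for start in range(len(seq)):
        for loop in range(d.min_loop, d.max_loop + 1):
            end = start + 2 * d.stem_len + loop
            if end > len(seq):
                break  # longer loops would also run past the end of the sequence
            # Pair the m-th stem base from the 5' side with the m-th from the 3' side.
            if all((seq[start + m], seq[end - 1 - m]) in allowed for m in range(d.stem_len)):
                structure = "(" * d.stem_len + "." * loop + ")" * d.stem_len
                hits.append(Hit(start, end, structure))
    return hits


if __name__ == "__main__":
    descriptor = HairpinDescriptor(stem_len=4, min_loop=3, max_loop=6)
    for hit in search_hairpins("AAAGGACUUCGGUCCAAA", descriptor):
        print(hit)
```

The run time grows with sequence length times the number of allowed loop sizes, which is why dedicated tools add indexing, filtering, or scoring schemes before scanning whole genomes.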

We then turn to tools that discover new structural motifs in a set of related sequences. These fall into two main families: those that require the sequences to be pre-aligned (Table 3) and those that can work with unaligned sequences (Table 4). The first group includes notable covariance-model-based approaches, as well as a smattering of classifier-driven, Bayesian, thermodynamic, and aggregate approaches. The second group contains many improvements on the Sankoff algorithm for simultaneous sequence/structure alignment, along with novel approaches such as shape abstraction, suffix arrays, genetic programming, and formal grammars. To aid in comparing and benchmarking these motif prediction algorithms, we also list the two known efforts to compile standardized datasets of motif-containing sequences (Table 5). The newer of these, TUTR, also contains matched control sets that help to properly estimate the sensitivity and specificity of each algorithm.
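Approaches that work on pre-aligned sequences typically exploit covariation: two alignment columns that mutate together while preserving base pairing are evidence of a conserved helix. The sketch below is a simplified illustration of that signal, not the algorithm of any listed tool. It scores column pairs by mutual information, MI(i, j) = Σ f_ij(x, y) log2[f_ij(x, y) / (f_i(x) f_j(y))], keeps only pairs that are canonical (Watson-Crick or G-U) in every ungapped row (an assumed, strict threshold), and then greedily assembles a nested consensus structure. The four-sequence alignment is a hypothetical toy example.

```python
import math
from collections import Counter
from itertools import combinations

# Canonical pairs, including G-U wobble (simplifying assumption).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def column_mi(alignment: list[str], i: int, j: int) -> float:
    """Mutual information (bits) between columns i and j, skipping rows gapped at either."""
    rows = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(rows)
    if n == 0:
        return 0.0
    joint = Counter(rows)
    left = Counter(a for a, _ in rows)
    right = Counter(b for _, b in rows)
    return sum(
        (c / n) * math.log2((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )


def covarying_pairs(alignment: list[str], min_sep: int = 4, min_compat: float = 1.0):
    """Return (mi, i, j) for pairing-compatible column pairs, highest MI first.

    min_sep leaves at least min_sep - 1 unpaired columns between i and j;
    min_compat is the fraction of ungapped rows that must pair canonically.
    """
    scored = []
    for i, j in combinations(range(len(alignment[0])), 2):
        if j - i < min_sep:
            continue
        rows = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
        if not rows:
            continue
        compat = sum(pair in CANONICAL for pair in rows) / len(rows)
        if compat >= min_compat:
            scored.append((column_mi(alignment, i, j), i, j))
    return sorted(scored, reverse=True)


def greedy_consensus(scored, length: int) -> str:
    """Greedily accept high-scoring pairs that neither reuse nor cross earlier ones."""
    chosen: list[tuple[int, int]] = []
    used: set[int] = set()
    for _, i, j in scored:
        if i in used or j in used:
            continue
        if any(i < a < j < b or a < i < b < j for a, b in chosen):
            continue  # would create a pseudoknot, which plain dot-bracket cannot express
        chosen.append((i, j))
        used.update((i, j))
    structure = ["."] * length
    for i, j in chosen:
        structure[i], structure[j] = "(", ")"
    return "".join(structure)


# Hypothetical toy alignment: the stems vary in a compensatory way, the loop is fixed.
ALIGNMENT = [
    "GGACUUCGGUCC",
    "CAUGUUCGCAUG",
    "AGUCUUCGGACU",
    "UCGAUUCGUCGA",
]

if __name__ == "__main__":
    pairs = covarying_pairs(ALIGNMENT)
    print(pairs)
    print(greedy_consensus(pairs, len(ALIGNMENT[0])))  # ((((....))))
```

With only four sequences, mutual information is noisy: a fully variable column has high MI with almost any other column, which is why the pairing-compatibility filter matters here. Tools such as RNAalifold or covariance-model approaches combine covariation with thermodynamic or probabilistic models rather than relying on it alone.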

Table 3
Consensus Structures in Aligned Sequences
Table 4
Consensus Structures in Unaligned Sequences
Table 5
Benchmark Data for Consensus Structure Prediction
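Benchmark sets with matched controls, such as those described for TUTR, allow both kinds of error to be counted. A minimal sketch of sequence-level evaluation follows. It assumes each benchmark is reduced to sets of sequence identifiers; the identifiers are hypothetical and no particular TUTR file format is assumed. Sensitivity is TP / (TP + FN), the fraction of motif-containing sequences a tool flags, and specificity is TN / (TN + FP), the fraction of control sequences it correctly leaves unflagged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # motif-containing sequences the tool flagged
    fn: int  # motif-containing sequences the tool missed
    fp: int  # control sequences the tool flagged
    tn: int  # control sequences the tool correctly left unflagged

    @property
    def sensitivity(self) -> float:
        total = self.tp + self.fn
        return self.tp / total if total else 0.0

    @property
    def specificity(self) -> float:
        total = self.tn + self.fp
        return self.tn / total if total else 0.0

    @property
    def ppv(self) -> float:
        """Positive predictive value: fraction of flagged sequences that are true positives."""
        total = self.tp + self.fp
        return self.tp / total if total else 0.0


def evaluate(predicted: set[str], positives: set[str], controls: set[str]) -> Confusion:
    """Sequence-level confusion counts; predicted IDs in neither set are ignored."""
    if positives & controls:
        raise ValueError("positive and control sets must be disjoint")
    return Confusion(
        tp=len(predicted & positives),
        fn=len(positives - predicted),
        fp=len(predicted & controls),
        tn=len(controls - predicted),
    )


if __name__ == "__main__":
    # Hypothetical IDs: 4 motif-containing sequences and 8 matched controls.
    positives = {"pos1", "pos2", "pos3", "pos4"}
    controls = {f"ctl{k}" for k in range(1, 9)}
    predicted = {"pos1", "pos2", "pos3", "ctl1", "ctl2"}
    result = evaluate(predicted, positives, controls)
    # sensitivity=0.75 specificity=0.75 ppv=0.60
    print(f"sensitivity={result.sensitivity:.2f} specificity={result.specificity:.2f} ppv={result.ppv:.2f}")
```

Without a control set, only sensitivity can be measured; a tool that flags every sequence would then look perfect, which is the gap that matched controls close.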

We have not included here the numerous tools for predicting structures in individual sequences, for predicting interactions between RNA structures, or those for folding sequences in a specific association context or with specific thermodynamic constraints. These go well beyond the task of motif prediction as it relates to families of functionally related mRNAs and ncRNAs.

Using the listed tools, it should be possible to survey the known space of functional RNA motifs, to search for known motifs in new sequences, and to discover new structure families in related sets of aligned and unaligned sequences. This should provide a good starting point for studies of post-transcriptional regulatory elements and non-coding RNAs.

Acknowledgments

We wish to thank the members of the Tenenbaum Lab for helpful suggestions and discussion, especially Chris Zaleski and Frank Doyle. This work was supported in part by NIH grant U01HG004571 to SAT from the NHGRI.

Contributor Information

Ajish D. George, Gen*NY*Sis Center for Excellence in Cancer Genomics, University at Albany-SUNY, Department of Biomedical Sciences, School of Public Health, 1 Discovery Drive, Room 220, Rensselaer, NY 12144.

Scott A. Tenenbaum, Gen*NY*Sis Center for Excellence in Cancer Genomics, University at Albany-SUNY, Department of Biomedical Sciences, School of Public Health, 1 Discovery Drive, Room 220, Rensselaer, NY 12144, Phone (518) 591-7157; FAX (518) 591-7201.

Bibliography

  • Abreu-Goodger C, Merino E. RibEx: a web server for locating riboswitches and other conserved bacterial regulatory elements. Nucleic Acids Research. 2005;33(Web Server issue):W690–2.
  • Abreu-Goodger C, et al. Conserved regulatory motifs in bacteria: riboswitches and beyond. Trends in Genetics. 2004;20(10):475–9.
  • Anwar M, Nguyen T, Turcotte M. Identification of consensus RNA secondary structures using suffix arrays. BMC Bioinformatics. 2006;7:244.
  • Bafna V, Zhang S. FastR: fast database search tool for non-coding RNA. Proceedings of the IEEE Computational Systems Bioinformatics Conference (CSB); 2004. pp. 52–61.
  • Bauer M, Klau GW, Reinert K. Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization. BMC Bioinformatics. 2007;8:271.
  • Berman HM, et al. The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophysical Journal. 1992;63(3):751–9.
  • di Bernardo D, Down T, Hubbard T. ddbRNA: detection of conserved secondary structures in multiple alignments. Bioinformatics. 2003;19(13):1606–11.
  • Bindewald E, et al. RNAJunction: a database of RNA junctions and kissing loops for three-dimensional structural analysis and nanodesign. Nucleic Acids Research. 2008;36(Database issue):D392–7.
  • Bindewald E, Shapiro BA. RNA secondary structure prediction from sequence alignments using a network of k-nearest neighbor classifiers. RNA. 2006;12(3):342–52.
  • Busch A, Backofen R. INFO-RNA--a fast approach to inverse RNA folding. Bioinformatics. 2006;22(15):1823–31.
  • Chang T, et al. RNAMST: efficient and flexible approach for identifying RNA structural homologs. Nucleic Acids Research. 2006;34(Web Server issue):W423–W428.
  • Coventry A, Kleitman DJ, Berger B. MSARI: multiple sequence alignments for statistical detection of RNA secondary structure. Proceedings of the National Academy of Sciences of the United States of America. 2004;101(33):12102–7.
  • Dalli D, et al. STRAL: progressive alignment of non-coding RNA using base pairing probability vectors in quadratic time. Bioinformatics. 2006;22(13):1593–9.
  • Do CB, Foo C, Batzoglou S. A max-margin model for efficient simultaneous alignment and folding of RNA sequences. Bioinformatics. 2008;24(13):i68–76.
  • Doyle F, et al. Bioinformatic tools for studying post-transcriptional gene regulation: The UAlbany TUTR collection and other informatic resources. Methods in Molecular Biology. 2008;419:39–52.
  • Dsouza M, Larsen N, Overbeek R. Searching for patterns in genomic data. Trends in Genetics. 1997;13(12):497–8.
  • Eddy SR. Computational analysis of RNAs. Cold Spring Harbor Symposia on Quantitative Biology. 2006;71:117–28.
  • Gardner P, Giegerich R. A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics. 2004;5(1):140.
  • Gautheret D, Lambert A. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles. Journal of Molecular Biology. 2001;313(5):1003–11.
  • Griffiths-Jones S, et al. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Research. 2006;34(Suppl 1):D140–144.
  • Griffiths-Jones S, et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Research. 2005;33(Suppl 1):D121–124.
  • Hamada M, et al. Mining frequent stem patterns from unaligned RNA sequences. Bioinformatics. 2006;22(20):2480–7.
  • Hofacker IL. RNA consensus structure prediction with RNAalifold. Methods in Molecular Biology. 2007;395:527–44.
  • Hofacker IL. RNA secondary structure analysis using the Vienna RNA package. Current Protocols in Bioinformatics. 2004;Chapter 12:Unit 12.2.
  • Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Research. 2003;31(13):3429–31.
  • Hofacker IL, Bernhart SHF, Stadler PF. Alignment of RNA base pairing probability matrices. Bioinformatics. 2004;20(14):2222–7.
  • Holmes I. Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics. 2005;6:73.
  • Horesh Y, et al. RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules. BMC Bioinformatics. 2007;8:366.
  • Huang H, et al. RegRNA: an integrated web server for identifying regulatory RNA motifs and elements. Nucleic Acids Research. 2006;34(Web Server issue):W429–34.
  • Hu Y. GPRM: A genetic programming approach to finding common RNA secondary structure elements. Nucleic Acids Research. 2003;31(13):3446–9.
  • Jacobs GH, et al. Transterm--extended search facilities and improved integration with other databases. Nucleic Acids Research. 2006;34(Database issue):D37–40.
  • Ji Y, Xu X, Stormo GD. A graph theoretical approach for predicting common RNA secondary structure motifs including pseudoknots in unaligned sequences. Bioinformatics. 2004;20(10):1591–602.
  • Katoh K, Toh H. Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework. BMC Bioinformatics. 2008;9:212.
  • Kin T, Tsuda K, Asai K. Marginalized kernels for RNA sequence data analysis. Genome Informatics. 2002;13:112–22.
  • Kiryu H, Kin T, Asai K. Robust prediction of consensus secondary structures using averaged base pairing probability matrices. Bioinformatics. 2007;23(4):434–41.
  • Kiryu H, et al. Murlet: a practical multiple alignment tool for structural RNA sequences. Bioinformatics. 2007;23(13):1588–98.
  • Klein RJ, Eddy SR. RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinformatics. 2003;4:44.
  • Knight R, Birmingham A, Yarus M. BayesFold: rational 2° folds that combine thermodynamic, covariation, and chemical data for aligned RNA sequences. RNA. 2004;10(9):1323–36.
  • Knudsen B, Hein J. Pfold: RNA secondary structure prediction using stochastic context-free grammars. Nucleic Acids Research. 2003;31(13):3423–8.
  • Lambert A, et al. Computing expectation values for RNA motifs using discrete convolutions. BMC Bioinformatics. 2005;6:118.
  • Le S, Maizel JV, Zhang K. An algorithm for detecting homologues of known structured RNAs in genomes. Proceedings of the IEEE Computational Systems Bioinformatics Conference (CSB); 2004. pp. 300–10.
  • Lestrade L, Weber MJ. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Research. 2006;34(Database issue):D158–62.
  • Le SY, Zhang K, Maizel JV. A method for predicting common structures of homologous RNAs. Computers and Biomedical Research. 1995;28(1):53–66.
  • Lindgreen S, Gardner PP, Krogh A. MASTR: multiple alignment and structure prediction of non-coding RNAs using simulated annealing. Bioinformatics. 2007;23(24):3304–11.
  • Liu J, et al. A method for aligning RNA secondary structures and its application to RNA motif detection. BMC Bioinformatics. 2005;6:89.
  • Macke TJ, et al. RNAMotif, an RNA secondary structure definition and search algorithm. Nucleic Acids Research. 2001;29(22):4724–35.
  • Matsui H, Sato K, Sakakibara Y. Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures. Proceedings of the IEEE Computational Systems Bioinformatics Conference (CSB); 2004. pp. 290–9.
  • Meyer IM, Miklós I. SimulFold: simultaneously inferring RNA structures including pseudoknots, alignments, and trees using a Bayesian MCMC framework. PLoS Computational Biology. 2007;3(8):e149.
  • Mignone F, et al. UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs. Nucleic Acids Research. 2005;33(Suppl 1):D141–146.
  • Moretti S, et al. R-Coffee: a web server for accurately aligning noncoding RNA sequences. Nucleic Acids Research. 2007;36(Web Server issue):W10–3.
  • Pavesi G, et al. RNAProfile: an algorithm for finding conserved secondary structure motifs in unaligned RNA sequences. Nucleic Acids Research. 2004;32(10):3258–69.
  • Pedersen JS, et al. Identification and Classification of Conserved RNA Secondary Structures in the Human Genome. PLoS Computational Biology. 2006;2(4):e33.
  • Pesole G, Liuni S. Internet resources for the functional analysis of 5′ and 3′ untranslated regions of eukaryotic mRNAs. Trends in Genetics. 1999;15(9):378.
  • Reeder J, Reeder J, Giegerich R. Locomotif: from graphical motif description to RNA motif search. Bioinformatics. 2007;23(13):i392–400.
  • Rivas E, Eddy SR. Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics. 2001;2:8.
  • Rocheleau L, Pelchat M. The Subviral RNA Database: a toolbox for viroids, the hepatitis delta virus and satellite RNAs research. BMC Microbiology. 2006;6:24.
  • Ruan J, Stormo GD, Zhang W. An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots. Bioinformatics. 2004;20(1):58–66.
  • Sakakibara Y. Pair hidden Markov models on tree structures. Bioinformatics. 2003;19(Suppl 1):i232–40.
  • Sakakibara Y, et al. Stem kernels for RNA sequence analyses. Journal of Bioinformatics and Computational Biology. 2007;5(5):1103–22.
  • Siebert S, Backofen R. MARNA: multiple alignment and consensus structure prediction of RNAs based on sequence structure comparisons. Bioinformatics. 2005;21(16):3352–9.
  • Steffen P, et al. RNAshapes: an integrated RNA analysis package based on abstract shapes. Bioinformatics. 2006;22(4):500–3.
  • Tabei Y, et al. A fast structural multiple alignment method for long RNA sequences. BMC Bioinformatics. 2007;9:33.
  • Thébault P, et al. Searching RNA motifs and their intermolecular contacts with constraint networks. Bioinformatics. 2006;22(17):2074–80.
  • Touzet H. Comparative analysis of RNA genes: the caRNAc software. Methods in Molecular Biology. 2007;395:465–74.
  • Veksler-Lublinsky I, et al. A structure-based flexible search method for motifs in RNA. Journal of Computational Biology. 2007;14(7):908–26.
  • Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of noncoding RNAs. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(7):2454–9.
  • Will S, et al. Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering. PLoS Computational Biology. 2007;3(4):e65.
  • Wilm A, Higgins DG, Notredame C. R-Coffee: a method for multiple alignment of non-coding RNA. Nucleic Acids Research. 2007;36(9):e52.
  • Wilm A, Linnenbrink K, Steger G. ConStruct: Improved construction of RNA consensus structures. BMC Bioinformatics. 2007;9:219.
  • Xie J, et al. Sno/scaRNAbase: a curated database for small nucleolar RNAs and cajal body-specific RNAs. Nucleic Acids Research. 2007;35(Database issue):D183–7.
  • Xue C, Liu G. RScan: fast searching structural similarities for structured RNAs in large databases. BMC Genomics. 2007;8:257.
  • Xu X, Ji Y, Stormo GD. RNA Sampler: a new sampling based algorithm for common RNA secondary structure prediction and structural alignment. Bioinformatics. 2007;23(15):1883–91.
  • Yao Z, Weinberg Z, Ruzzo WL. CMfinder--a covariance model based RNA motif finding algorithm. Bioinformatics. 2006;22(4):445–52.
  • Zhang S, et al. Searching genomes for noncoding RNA using FastR. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2005;2(4):366–79.
  • Zhou Y, et al. GISSD: Group I Intron Sequence and Structure Database. Nucleic Acids Research. 2007;36(Database issue):D31–7.