Related Articles

1.  MODOMICS: a database of RNA modification pathways. 2008 update 
Nucleic Acids Research  2008;37(Database issue):D118-D121.
MODOMICS, a database devoted to the systems biology of RNA modification, has been subjected to substantial improvements. It provides comprehensive information on the chemical structure of modified nucleosides, pathways of their biosynthesis, sequences of RNAs containing these modifications and RNA-modifying enzymes. MODOMICS also provides cross-references to other databases and to literature. In addition to the previously available manually curated tRNA sequences from a few model organisms, we have now included additional tRNAs and rRNAs, and all RNAs with 3D structures in the Nucleic Acid Database, in which modified nucleosides are present. In total, 3460 modified bases in RNA sequences of different organisms have been annotated. New RNA-modifying enzymes have been also added. The current collection of enzymes includes mainly proteins for the model organisms Escherichia coli and Saccharomyces cerevisiae, and is currently being expanded to include proteins from other organisms, in particular Archaea and Homo sapiens. For enzymes with known structures, links are provided to the corresponding Protein Data Bank entries, while for many others homology models have been created. Many new options for database searching and querying have been included. MODOMICS can be accessed at
PMCID: PMC2686465  PMID: 18854352
2.  MODOMICS: a database of RNA modification pathways—2013 update 
Nucleic Acids Research  2012;41(Database issue):D262-D267.
MODOMICS is a database of RNA modifications that provides comprehensive information concerning the chemical structures of modified ribonucleosides, their biosynthetic pathways, RNA-modifying enzymes and location of modified residues in RNA sequences. In the current database version, accessible at, we included new features: a census of human and yeast snoRNAs involved in RNA-guided RNA modification, a new section covering the 5′-end capping process, and a catalogue of ‘building blocks’ for chemical synthesis of a large variety of modified nucleosides. The MODOMICS collections of RNA modifications, RNA-modifying enzymes and modified RNAs have been also updated. A number of newly identified modified ribonucleosides and more than one hundred functionally and structurally characterized proteins from various organisms have been added. In the RNA sequences section, snRNAs and snoRNAs with experimentally mapped modified nucleosides have been added and the current collection of rRNA and tRNA sequences has been substantially enlarged. To facilitate literature searches, each record in MODOMICS has been cross-referenced to other databases and to selected key publications. New options for database searching and querying have been implemented, including a BLAST search of protein sequences and a PARALIGN search of the collected nucleic acid sequences.
PMCID: PMC3531130  PMID: 23118484
3.  Prediction of uridine modifications in tRNA sequences 
BMC Bioinformatics  2014;15(1):326.
In the past, a number of methods have been developed for predicting post-translational modifications in proteins. In contrast, only limited attempts have been made to understand post-transcriptional modifications. Recently, it has been shown that tRNA modifications play a direct role in genome structure and codon usage. This study is an attempt to understand kingdom-wise tRNA modifications, particularly uridine modifications (UMs), as the majority of modifications are uridine-derived.
A three-step strategy was applied to develop an efficient method for the prediction of UMs. In the first step, we developed a common prediction model for all the kingdoms using a dataset from MODOMICS-2008. Support Vector Machine (SVM) based prediction models were developed and evaluated by five-fold cross-validation. Different approaches were applied, and a hybrid approach combining binary and structural information achieved the highest area under the curve (AUC) of 0.936. In the second step, we used tRNA sequences newly added in MODOMICS-2012 as an independent dataset to evaluate the kingdom-wise prediction performance of the common model developed in the first step, and achieved AUCs between 0.910 and 0.949. In the third and last step, we used different datasets from MODOMICS-2012 to develop individual kingdom-wise prediction models and achieved AUCs between 0.915 and 0.987.
The hybrid approach is efficient not only for predicting kingdom-wise modifications but also for classifying them into the two most prominent UMs: Pseudouridine (Y) and Dihydrouridine (D). A webserver called tRNAmod has been developed, which predicts UMs from both tRNA sequences and whole genomes.
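The "binary" sequence encoding and the AUC figure quoted above can be illustrated with a minimal Python sketch. This is not the tRNAmod implementation: the function names, the window size and the all-zero padding are assumptions made for illustration, and the structural features and the SVM itself are omitted.

```python
# Hypothetical illustration only; names and window size are assumptions,
# not taken from tRNAmod.
NUCLEOTIDES = "ACGU"


def one_hot_window(sequence, center, flank=2):
    """Binary-encode the 2*flank+1 residues centred on index `center`.

    Each residue becomes four 0/1 values (A, C, G, U). Positions that
    fall outside the sequence are encoded as four zeros.
    """
    features = []
    for pos in range(center - flank, center + flank + 1):
        base = sequence[pos] if 0 <= pos < len(sequence) else None
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

In a cross-validation setting, `auc` would be applied to the classifier scores of each held-out fold; the pairwise form is quadratic in the number of examples and is chosen here only for readability.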
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-326) contains supplementary material, which is available to authorized users.
PMCID: PMC4287530  PMID: 25272949
Uridine modifications; Pseudouridine; Dihydrouridine; 5-methyl-uridine; tRNAmod
4.  RNomics and Modomics in the halophilic archaea Haloferax volcanii: identification of RNA modification genes 
BMC Genomics  2008;9:470.
Naturally occurring RNAs contain numerous enzymatically altered nucleosides. Differences in RNA populations (RNomics) and patterns of RNA modifications (Modomics) depend on the organism analyzed and are two of the criteria that distinguish the three kingdoms of life. While the genomic sequences of the RNA molecules can be derived from whole genome sequence information, the modification profile cannot; it requires either direct sequencing of the RNAs or predictive methods based on the presence or absence of the modification genes.
By employing a comparative genomics approach, we predicted almost all of the genes coding for the t+rRNA modification enzymes in the mesophilic moderate halophile Haloferax volcanii. These encode both guide RNAs and enzymes. Some are orthologous to previously identified genes in Archaea, Bacteria or in Saccharomyces cerevisiae, but several are original predictions.
The number of modifications in t+rRNAs in the halophilic archaeon is surprisingly low when compared with other Archaea or Bacteria, particularly the hyperthermophilic organisms. This may result from the specific lifestyle of halophiles that require high intracellular salt concentration for survival. This salt content could allow RNA to maintain its functional structural integrity with fewer modifications. We predict that the few modifications present must be particularly important for decoding, accuracy of translation or are modifications that cannot be functionally replaced by the electrostatic interactions provided by the surrounding salt-ions. This analysis also guides future experimental validation work aiming to complete the understanding of the function of RNA modifications in Archaeal translation.
PMCID: PMC2584109  PMID: 18844986
5.  The YqfN protein of Bacillus subtilis is the tRNA: m1A22 methyltransferase (TrmK) 
Nucleic Acids Research  2008;36(10):3252-3262.
N1-methylation of adenosine to m1A occurs in several different positions in tRNAs from various organisms. A methyl group at position N1 prevents Watson–Crick-type base pairing by adenosine and is therefore important for regulation of structure and stability of tRNA molecules. Thus far, only one family of genes encoding enzymes responsible for m1A methylation at position 58 has been identified, while other m1A methyltransferases (MTases) remain elusive. Here, we show that Bacillus subtilis open reading frame yqfN is necessary and sufficient for N1-adenosine methylation at position 22 of bacterial tRNA. Thus, we propose to rename YqfN as TrmK, according to the traditional nomenclature for bacterial tRNA MTases, or TrMet(m1A22) according to the nomenclature from the MODOMICS database of RNA modification enzymes. tRNAs purified from a ΔtrmK strain are a good substrate in vitro for the recombinant TrmK protein, which is sufficient for m1A methylation at position 22 as are tRNAs from Escherichia coli, which natively lacks m1A22. TrmK is conserved in Gram-positive bacteria and present in some Gram-negative bacteria, but its orthologs are apparently absent from archaea and eukaryota. Protein structure prediction indicates that the active site of TrmK does not resemble the active site of the m1A58 MTase TrmI, suggesting that these two enzymatic activities evolved independently.
PMCID: PMC2425500  PMID: 18420655
6.  RNApathwaysDB—a database of RNA maturation and decay pathways 
Nucleic Acids Research  2012;41(Database issue):D268-D272.
Many RNA molecules undergo complex maturation, involving e.g. excision from primary transcripts, removal of introns, post-transcriptional modification and polyadenylation. The level of mature, functional RNAs in the cell is controlled not only by the synthesis and maturation but also by degradation, which proceeds via many different routes. The systematization of data about RNA metabolic pathways and enzymes taking part in RNA maturation and degradation is essential for the full understanding of these processes. RNApathwaysDB, available online at, is an online resource about maturation and decay pathways involving RNA as the substrate. The current release presents information about reactions and enzymes that take part in the maturation and degradation of tRNA, rRNA and mRNA, and describes pathways in three model organisms: Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. RNApathwaysDB can be queried with keywords, and sequences of protein enzymes involved in RNA processing can be searched with BLAST. Options for data presentation include pathway graphs and tables with enzymes and literature data. Structures of macromolecular complexes involving RNA and proteins that act on it are presented as ‘potato models’ using DrawBioPath—a new javascript tool.
PMCID: PMC3531052  PMID: 23155061
7.  Compilation of tRNA sequences. 
Nucleic Acids Research  1979;6(1):r1-r19.
This compilation presents in a small space the tRNA sequences so far published in order to enable rapid orientation and comparison. The numbering of tRNAPhe from yeast is used as has been done earlier (1) but following the rules proposed by the participants of the Cold Spring Harbor Meeting on tRNA 1978 (2) (Fig. 1). This numbering allows comparisons with the three dimensional structure of tRNAPhe, the only structure known from X-ray analysis. The secondary structure of tRNAs is indicated by specific underlining. In the primary structure a nucleoside followed by a nucleoside in brackets or a modification in brackets denotes that both types of nucleosides can occupy this position. Part of a sequence in brackets designates a piece of sequence not unambiguously analyzed. Rare nucleosides are named according to the IUPAC-IUB rules (for some more complicated rare nucleosides and their identification see Table 1); those with lengthy names are given with the prefix x and specified in the footnotes. Footnotes are numbered according to the coordinates of the corresponding nucleoside and are indicated in the sequence by an asterisk. The references are restricted to the citation of the latest publication in those cases where several papers deal with one sequence. For additional information the reader is referred either to the original literature or to other tRNA sequence compilations (3--7). Mutant tRNAs are dealt with in a separate compilation prepared by J. Celis (see below). The compilers would welcome any information by the readers regarding missing material or erroneous presentation. On the basis of this numbering system computer printed compilations of tRNA sequences in a linear form and in cloverleaf form are in preparation.
PMCID: PMC327698  PMID: 424282
8.  Compilation of tRNA sequences. 
Nucleic Acids Research  1980;8(1):r1-r22.
This compilation presents in a small space the tRNA sequences so far published. The numbering of tRNAPhe from yeast is used following the rules proposed by the participants of the Cold Spring Harbor Meeting on tRNA 1978 (1,2; Fig. 1). This numbering allows comparisons with the three dimensional structure of tRNAPhe. The secondary structure of tRNAs is indicated by specific underlining. In the primary structure a nucleoside followed by a nucleoside in brackets or a modification in brackets denotes that both types of nucleosides can occupy this position. Part of a sequence in brackets designates a piece of sequence not unambiguously analyzed. Rare nucleosides are named according to the IUPAC-IUB rules (for complicated rare nucleosides and their identification see Table 1); those with lengthy names are given with the prefix x and specified in the footnotes. Footnotes are numbered according to the coordinates of the corresponding nucleoside and are indicated in the sequence by an asterisk. The references are restricted to the citation of the latest publication in those cases where several papers deal with one sequence. For additional information the reader is referred either to the original literature or to other tRNA sequence compilations (3-7). Mutant tRNAs are dealt with in a compilation by J. Celis (8). The compilers would welcome any information by the readers regarding missing material or erroneous presentation. On the basis of this numbering system computer printed compilations of tRNA sequences in a linear form and in cloverleaf form are in preparation.
PMCID: PMC327253  PMID: 6986608
9.  Sno/scaRNAbase: a curated database for small nucleolar RNAs and cajal body-specific RNAs 
Nucleic Acids Research  2006;35(Database issue):D183-D187.
Small nucleolar RNAs (snoRNAs) and Cajal body-specific RNAs (scaRNAs) are named for their subcellular localization within nucleoli and Cajal bodies (conserved subnuclear organelles present in the nucleoplasm), respectively. They have been found to play important roles in rRNA, tRNA, snRNA, and even mRNA modification and processing. All snoRNAs fall into two categories, box C/D snoRNAs and box H/ACA snoRNAs, according to their distinct sequence and secondary structure features. Box C/D snoRNAs and box H/ACA snoRNAs mainly function in guiding 2′-O-ribose methylation and pseudouridylation, respectively. ScaRNAs possess both box C/D snoRNA and box H/ACA snoRNA sequence motif features, but guide snRNA modifications that are transcribed by RNA polymerase II. Here we present a Web-based sno/scaRNA database, called sno/scaRNAbase, to facilitate sno/scaRNA research by providing a more comprehensive knowledge base. Covering 1979 records derived from 85 organisms for the first time, sno/scaRNAbase is not only dedicated to filling gaps between existing organism-specific sno/scaRNA databases that are focused on different sno/scaRNA aspects, but also provides sno/scaRNA scientists with an opportunity to adopt a unified nomenclature for sno/scaRNAs. Derived from a systematic literature curation and annotation effort, the sno/scaRNAbase provides an easy-to-use gateway to important sno/scaRNA features such as sequence motifs, possible functions, homologues, secondary structures, genomic organization, the chromosome location of sno/scaRNA genes, and more. Approximate searches, in addition to accurate and straightforward searches, make the database search more flexible. A BLAST search engine is implemented to enable BLAST searches of query sequences against all sno/scaRNAbase sequences. Thus our sno/scaRNAbase serves as a more uniform and friendly platform for sno/scaRNA research. The database is freely available at .
PMCID: PMC1669756  PMID: 17099227
10.  MetaCyc: a multiorganism database of metabolic pathways and enzymes 
Nucleic Acids Research  2005;34(Database issue):D511-D516.
MetaCyc is a database of metabolic pathways and enzymes located at . Its goal is to serve as a metabolic encyclopedia, containing a collection of non-redundant pathways central to small molecule metabolism, which have been reported in the experimental literature. Most of the pathways in MetaCyc occur in microorganisms and plants, although animal pathways are also represented. MetaCyc contains metabolic pathways, enzymatic reactions, enzymes, chemical compounds, genes and review-level comments. Enzyme information includes substrate specificity, kinetic properties, activators, inhibitors, cofactor requirements and links to sequence and structure databases. Data are curated from the primary literature by curators with expertise in biochemistry and molecular biology. MetaCyc serves as a readily accessible comprehensive resource on microbial and plant pathways for genome analysis, basic research, education, metabolic engineering and systems biology. Querying, visualization and curation of the database is supported by SRI's Pathway Tools software. The PathoLogic component of Pathway Tools is used in conjunction with MetaCyc to predict the metabolic network of an organism from its annotated genome. SRI and the European Bioinformatics Institute employed this tool to create pathway/genome databases (PGDBs) for 165 organisms, available at the website. These PGDBs also include predicted operons and pathway hole fillers.
PMCID: PMC1347490  PMID: 16381923
11.  BioWarehouse: a bioinformatics database warehouse toolkit 
BMC Bioinformatics  2006;7:170.
This article addresses the problem of interoperation of heterogeneous bioinformatics databases.
We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research.
BioWarehouse embodies significant progress on the database integration problem for bioinformatics.
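The application example above (the fraction of characterized enzyme activities with no sequence) reduces to a set difference once the data sit in one place. The following Python sketch shows that calculation on in-memory collections; it does not use the BioWarehouse schema or SQL, and the function name, argument layout and sample identifiers are hypothetical.

```python
# Hypothetical illustration; BioWarehouse itself answers this with SQL
# over its relational schema, which is not reproduced here.
def fraction_without_sequence(ec_numbers, sequence_annotations):
    """Return (fraction, orphan_list) for EC numbers lacking any sequence.

    ec_numbers: iterable of EC numbers for characterized enzyme activities.
    sequence_annotations: iterable of (sequence_id, ec_number) pairs.
    """
    all_ec = set(ec_numbers)
    if not all_ec:
        raise ValueError("no EC numbers supplied")
    # EC numbers that have at least one annotated sequence.
    covered = {ec for _, ec in sequence_annotations}
    orphans = all_ec - covered
    return len(orphans) / len(all_ec), sorted(orphans)


if __name__ == "__main__":
    # Made-up sample data, for demonstration only.
    ecs = ["1.1.1.1", "2.7.1.1", "3.5.4.4", "4.2.1.1"]
    annotations = [("seqA", "1.1.1.1"), ("seqB", "4.2.1.1")]
    fraction, orphans = fraction_without_sequence(ecs, annotations)
    print(f"{fraction:.0%} of EC numbers have no sequence: {orphans}")
```

Annotations that refer to EC numbers outside the supplied list are ignored, so the fraction is always taken over the characterized activities only.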
PMCID: PMC1444936  PMID: 16556315
12.  YMDB: the Yeast Metabolome Database 
Nucleic Acids Research  2011;40(Database issue):D815-D820.
The Yeast Metabolome Database (YMDB) is a richly annotated ‘metabolomic’ database containing detailed information about the metabolome of Saccharomyces cerevisiae. Modeled closely after the Human Metabolome Database, the YMDB contains >2000 metabolites with links to 995 different genes/proteins, including enzymes and transporters. The information in YMDB has been gathered from hundreds of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the YMDB also contains an extensive collection of experimental intracellular and extracellular metabolite concentration data compiled from detailed Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) metabolomic analyses performed in our lab. This is further supplemented with thousands of NMR and MS spectra collected on pure, reference yeast metabolites. Each metabolite entry in the YMDB contains an average of 80 separate data fields including comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, intracellular/extracellular concentrations, growth conditions and substrates, pathway information, enzyme data, gene/protein sequence data, as well as numerous hyperlinks to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of S. cerevisiae's importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers, but also to yeast biologists, systems biologists, the industrial fermentation industry, as well as the beer, wine and spirit industry.
PMCID: PMC3245085  PMID: 22064855
13.  Discovering cis-Regulatory RNAs in Shewanella Genomes by Support Vector Machines 
PLoS Computational Biology  2009;5(4):e1000338.
An increasing number of cis-regulatory RNA elements have been found to regulate gene expression post-transcriptionally in various biological processes in bacterial systems. Effective computational tools for large-scale identification of novel regulatory RNAs are strongly desired to facilitate our exploration of gene regulation mechanisms and regulatory networks. We present a new computational program named RSSVM (RNA Sampler+Support Vector Machine), which employs Support Vector Machines (SVMs) for efficient identification of functional RNA motifs from random RNA secondary structures. RSSVM uses a set of distinctive features to represent the common RNA secondary structure and structural alignment predicted by RNA Sampler, a tool for accurate common RNA secondary structure prediction, and is trained with functional RNAs from a variety of bacterial RNA motif/gene families covering a wide range of sequence identities. When tested on a large number of known and random RNA motifs, RSSVM shows a significantly higher sensitivity than other leading RNA identification programs while maintaining the same false positive rate. RSSVM performs particularly well on sets with low sequence identities. The combination of RNA Sampler and RSSVM provides a new, fast, and efficient pipeline for large-scale discovery of regulatory RNA motifs. We applied RSSVM to multiple Shewanella genomes and identified putative regulatory RNA motifs in the 5′ untranslated regions (UTRs) in S. oneidensis, an important bacterial organism with extraordinary respiratory and metal reducing abilities and great potential for bioremediation and alternative energy generation. From 1002 sets of 5′-UTRs of orthologous operons, we identified 166 putative regulatory RNA motifs, including 17 of the 19 known RNA motifs from Rfam, an additional 21 RNA motifs that are supported by literature evidence, 72 RNA motifs overlapping predicted transcription terminators or attenuators, and other candidate regulatory RNA motifs. 
Our study provides a list of promising novel regulatory RNA motifs potentially involved in post-transcriptional gene regulation. Combined with the previous cis-regulatory DNA motif study in S. oneidensis, this genome-wide discovery of cis-regulatory RNA motifs may offer more comprehensive views of gene regulation at a different level in this organism. The RSSVM software, predictions, and analysis results on Shewanella genomes are available at
Author Summary
RNA is remarkably versatile, acting not only as messengers to transfer genetic information from DNA to protein but also as critical structural components and catalytic enzymes in the cell. More intriguingly, RNA elements in messenger RNAs have been widely found in bacteria to control the expression of their downstream genes. The functions of these RNA elements are intrinsically linked to their secondary structures, which are usually conserved across multiple closely related species during evolution and often shared by genes in the same metabolic pathways. We developed a new computational approach to find putative functional RNA elements by looking for conserved RNA secondary structures that are distinguished from random RNA secondary structures in the orthologous RNA sequences from related species. We applied this approach to multiple Shewanella genomes and predicted putative regulatory RNA elements in Shewanella oneidensis, a bacterium that has extraordinary respiratory and metal reducing abilities and great potential for bioremediation and alternative energy generation. Our findings not only recovered many RNA elements that are known or supported by literature evidence but also included exciting novel RNA elements for further exploration.
PMCID: PMC2659441  PMID: 19343219
14.  ECMDB: The E. coli Metabolome Database 
Nucleic Acids Research  2012;41(Database issue):D625-D630.
The Escherichia coli Metabolome Database (ECMDB) is a comprehensively annotated metabolomic database containing detailed information about the metabolome of E. coli (K-12). Modelled closely on the Human and Yeast Metabolome Databases, the ECMDB contains >2600 metabolites with links to ∼1500 different genes and proteins, including enzymes and transporters. The information in the ECMDB has been collected from dozens of textbooks, journal articles and electronic databases. Each metabolite entry in the ECMDB contains an average of 75 separate data fields, including comprehensive compound descriptions, names and synonyms, chemical taxonomy, compound structural and physicochemical data, bacterial growth conditions and substrates, reactions, pathway information, enzyme data, gene/protein sequence data and numerous hyperlinks to images, references and other public databases. The ECMDB also includes an extensive collection of intracellular metabolite concentration data compiled from our own work as well as other published metabolomic studies. This information is further supplemented with thousands of fully assigned reference nuclear magnetic resonance and mass spectrometry spectra obtained from pure E. coli metabolites that we (and others) have collected. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of E. coli’s importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers but also to molecular biologists, systems biologists and individuals in the biotechnology industry.
PMCID: PMC3531117  PMID: 23109553
15.  MUBII-TB-DB: a database of mutations associated with antibiotic resistance in Mycobacterium tuberculosis 
BMC Bioinformatics  2014;15:107.
Tuberculosis is an infectious bacterial disease caused by Mycobacterium tuberculosis. It remains a major health threat, killing over one million people every year worldwide. An early antibiotic therapy is the basis of the treatment, and the emergence and spread of multidrug and extensively drug-resistant mutant strains raise significant challenges. As these bacteria grow very slowly, drug resistance mutations are currently detected using molecular biology techniques. Resistance mutations are identified by sequencing the resistance-linked genes followed by a comparison with the literature data. The only online database is the TB Drug Resistance Mutation database (TBDReaM database); however, it requires mutation detection before use, and its interrogation is complex due to its loose syntax and grammar.
The MUBII-TB-DB database is a simple, highly structured text-based database that contains a set of Mycobacterium tuberculosis mutations (DNA and proteins) occurring at seven loci: rpoB, pncA, katG; mabA(fabG1)-inhA, gyrA, gyrB, and rrs. Resistance mutation data were extracted after the systematic review of MEDLINE referenced publications before March 2013. MUBII analyzes the query sequence obtained by PCR-sequencing using two parallel strategies: i) a BLAST search against a set of previously reconstructed mutated sequences and ii) the alignment of the query sequences (DNA and its protein translation) with the wild-type sequences. The post-treatment includes the extraction of the aligned sequences together with their descriptors (position and nature of mutations). The whole procedure is performed using the internet. The results are graphs (alignments) and text (description of the mutation, therapeutic significance). The system is quick and easy to use, even for technicians without bioinformatics training.
MUBII-TB-DB is a structured database of the mutations occurring at seven loci of major therapeutic value in tuberculosis management. Moreover, the system provides interpretation of the mutations in biological and therapeutic terms and can evolve by the addition of newly described mutations. Its goal is to provide easy and comprehensive access through a client–server model over the Web to an up-to-date database of mutations that lead to the resistance of M. tuberculosis to antibiotics.
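The second strategy described above, aligning the query with the wild-type sequence and extracting the position and nature of each mutation, can be sketched in Python as follows. This is an illustrative assumption, not the MUBII code: it takes two sequences that are already aligned (gaps written as `-`) and reports differences in wild-type coordinates; the function name and output tuple layout are hypothetical.

```python
# Hypothetical illustration; the alignment step itself is assumed to have
# been done already and is not part of this sketch.
def describe_mutations(wild_type, query):
    """List differences between two pre-aligned sequences.

    Both strings must have equal length, with '-' marking alignment gaps.
    Returns tuples (position, kind, wild_type_residue, query_residue),
    where position is 1-based in ungapped wild-type coordinates. An
    insertion is reported at the last wild-type position before it.
    """
    if len(wild_type) != len(query):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    wt_pos = 0
    for wt, q in zip(wild_type, query):
        if wt != "-":
            wt_pos += 1  # advance only on real wild-type residues
        if wt == q:
            continue
        if wt == "-":
            mutations.append((wt_pos, "insertion", "-", q))
        elif q == "-":
            mutations.append((wt_pos, "deletion", wt, "-"))
        else:
            mutations.append((wt_pos, "substitution", wt, q))
    return mutations
```

The same function works on DNA or on the protein translation, since it only compares characters column by column.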
PMCID: PMC4021062  PMID: 24731071
Tuberculosis; Antibiotics; Mutation database; Sequence database; Web
16.  A novel enzymatic pathway leading to 1-methylinosine modification in Haloferax volcanii tRNA. 
Nucleic Acids Research  1995;23(21):4312-4319.
Transfer RNAs of the extreme halophile Haloferax volcanii contain several modified nucleosides, among them 1-methylpseudouridine (m1 psi), pseudouridine (psi), 2'-O-methylcytosine (Cm) and 1-methylinosine (m1I), present in positions 54, 55, 56 and 57 of the psi-loop, respectively. At the same positions in tRNAs from eubacteria and eukaryotes, ribothymidine (T-54), pseudouridine (psi-55), non-modified cytosine (C-56) and non-modified adenosine or guanosine (A-57 or G-57) are found in the so-called T psi-loop. Using as substrate a T7 transcript of Haloferax volcanii tRNA(Ile) devoid of modified nucleosides, the enzymatic activities of several tRNA modification enzymes, including those for m1 psi-54, psi-55, Cm-56 and m1I-57, were detected in cell extracts of H. volcanii. Here, we demonstrate that modification of A-57 into m1I-57 in H. volcanii tRNA(Ile) occurs via a two-step enzymatic process. The first step corresponds to the formation of m1A-57 catalyzed by an S-adenosylmethionine-dependent tRNA methyltransferase, followed by the deamination of the 6-amino group of the adenine moiety by a 1-methyladenosine-57 deaminase. This enzymatic pathway differs from that leading to the formation of m1I-37 in the anticodon loop of eukaryotic tRNA(Ala). In the latter case, inosine-37 formation precedes the S-adenosylmethionine-dependent methylation of I-37 into m1I-37. Thus, enzymatic strategies for catalyzing the formation of 1-methylinosine in tRNAs differ in organisms from distinct evolutionary kingdoms.
PMCID: PMC307385  PMID: 7501451
17.  Distinguishing Protein-Coding from Non-Coding RNAs through Support Vector Machines 
PLoS Genetics  2006;2(4):e29.
RIKEN's FANTOM project has revealed many previously unknown coding sequences, as well as an unexpected degree of variation in transcripts resulting from alternative promoter usage and splicing. Ever more transcripts that do not code for proteins have been identified by transcriptome studies, in general. Increasing evidence points to the important cellular roles of such non-coding RNAs (ncRNAs). The distinction of protein-coding RNA transcripts from ncRNA transcripts is therefore an important problem in understanding the transcriptome and carrying out its annotation. Very few in silico methods have specifically addressed this problem. Here, we introduce CONC (for “coding or non-coding”), a novel method based on support vector machines that classifies transcripts according to features they would have if they were coding for proteins. These features include peptide length, amino acid composition, predicted secondary structure content, predicted percentage of exposed residues, compositional entropy, number of homologs from database searches, and alignment entropy. Nucleotide frequencies are also incorporated into the method. Confirmed coding cDNAs for eukaryotic proteins from the Swiss-Prot database constituted the set of true positives, ncRNAs from RNAdb and NONCODE the true negatives. Ten-fold cross-validation suggested that CONC distinguished coding RNAs from ncRNAs at about 97% specificity and 98% sensitivity. Applied to 102,801 mouse cDNAs from the FANTOM3 dataset, our method reliably identified over 14,000 ncRNAs and estimated the total number of ncRNAs to be about 28,000.
There are two types of RNA: messenger RNAs (mRNAs), which are translated into proteins, and non-coding RNAs (ncRNAs), which function as RNA molecules. Besides textbook examples such as tRNAs and rRNAs, non-coding RNAs have been found to carry out very diverse functions, from mRNA splicing and RNA modification to translational regulation. It has been estimated that non-coding RNAs make up the vast majority of transcription output of higher eukaryotes. Discriminating mRNA from ncRNA has become an important biological and computational problem. The authors describe a computational method based on a machine learning algorithm known as a support vector machine (SVM) that classifies transcripts according to features they would have if they were coding for proteins. These features include peptide length, amino acid composition, secondary structure content, and protein alignment information. The method is applied to the dataset from the FANTOM3 large-scale mouse cDNA sequencing project; it identifies over 14,000 ncRNAs in mouse and estimates the total number of ncRNAs in the FANTOM3 data to be about 28,000.
PMCID: PMC1449884  PMID: 16683024
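Three of the features named in the CONC abstract above (peptide length, amino acid composition, compositional entropy) can be illustrated with a short Python sketch. This is not the CONC implementation: the function names are hypothetical, compositional entropy is assumed here to be the Shannon entropy of the residue frequencies, and the confusion-matrix counts in the usage lines are invented only to show how sensitivity and specificity are computed.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_features(peptide: str) -> dict:
    """Return length, residue composition, and Shannon entropy (in bits).

    Assumption: "compositional entropy" is taken to mean the Shannon
    entropy of the amino acid frequencies; CONC may define it differently.
    """
    counts = Counter(aa for aa in peptide.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("peptide contains no standard amino acids")
    composition = {aa: counts[aa] / total for aa in AMINO_ACIDS}
    entropy = -sum(p * math.log2(p) for p in composition.values() if p > 0)
    return {"length": total, "composition": composition, "entropy": entropy}


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Usage with a toy peptide and hypothetical counts (not data from the paper).
features = peptide_features("ACAC")
sens, spec = sensitivity_specificity(tp=98, fn=2, tn=97, fp=3)
```

In a classifier of this kind, feature vectors like these would be computed for each candidate translation and passed to a support vector machine; the training step is omitted here.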
18.  Methylated nucleosides in tRNA and tRNA methyltransferases 
Frontiers in Genetics  2014;5:144.
To date, more than 90 modified nucleosides have been found in tRNA and the biosynthetic pathways of the majority of tRNA modifications include a methylation step(s). Recent studies of the biosynthetic pathways have demonstrated that the availability of methyl group donors for the methylation in tRNA is important for correct and efficient protein synthesis. In this review, I focus on the methylated nucleosides and tRNA methyltransferases. The primary functions of tRNA methylations are linked to the different steps of protein synthesis, such as the stabilization of tRNA structure, reinforcement of the codon-anticodon interaction, regulation of wobble base pairing, and prevention of frameshift errors. However, beyond these basic functions, recent studies have demonstrated that tRNA methylations are also involved in the RNA quality control system and regulation of tRNA localization in the cell. In a thermophilic eubacterium, tRNA modifications and the modification enzymes form a network that responds to temperature changes. Furthermore, several modifications are involved in genetic diseases, infections, and the immune response. Moreover, structural, biochemical, and bioinformatics studies of tRNA methyltransferases have been clarifying the details of tRNA methyltransferases and have enabled these enzymes to be classified. In the final section, the evolution of modification enzymes is discussed.
PMCID: PMC4033218  PMID: 24904644
RNA modification; RNA methylation; RNA maturation
19.  EPIC-DB: a proteomics database for studying Apicomplexan organisms 
BMC Genomics  2009;10:38.
High throughput proteomics experiments are useful for analyzing the protein expression of an organism, identifying the correct gene structure of a genome, or locating possible post-translational modifications within proteins. High throughput methods necessitate publicly accessible and easily queried databases for efficiently and logically storing, displaying, and analyzing the large volume of data.
EPICDB is a publicly accessible, queryable, relational database that organizes and displays experimental, high throughput proteomics data for Toxoplasma gondii and Cryptosporidium parvum. Along with detailed information on mass spectrometry experiments, the database also provides antibody experimental results and analysis of functional annotations, comparative genomics, and aligned expressed sequence tag (EST) and genomic open reading frame (ORF) sequences. The database contains all available alternative gene datasets for each organism, which comprises a complete theoretical proteome for the respective organism, and all data is referenced to these sequences. The database is structured around clusters of protein sequences, which allows for the evaluation of redundancy, protein prediction discrepancies, and possible splice variants. The database can be expanded to include genomes of other organisms for which proteome-wide experimental data are available.
EPICDB is a comprehensive database of genome-wide T. gondii and C. parvum proteomics data and incorporates many features that allow for the analysis of the entire proteomes and/or annotation of specific protein sequences. EPICDB is complementary to other genomics databases of these organisms by offering complete mass spectrometry analysis on a comprehensive set of all available protein sequences.
PMCID: PMC2652494  PMID: 19159464
20.  REPAIRtoire—a database of DNA repair pathways 
Nucleic Acids Research  2010;39(Database issue):D788-D792.
REPAIRtoire is the first comprehensive database resource for systems biology of DNA damage and repair. The database collects and organizes the following types of information: (i) DNA damage linked to environmental mutagenic and cytotoxic agents, (ii) pathways comprising individual processes and enzymatic reactions involved in the removal of damage, (iii) proteins participating in DNA repair and (iv) diseases correlated with mutations in genes encoding DNA repair proteins. REPAIRtoire also provides links to publications and external databases. REPAIRtoire contains information about eight main DNA damage checkpoint, repair and tolerance pathways: DNA damage signaling, direct reversal repair, base excision repair, nucleotide excision repair, mismatch repair, homologous recombination repair, nonhomologous end-joining and translesion synthesis. The pathway/protein dataset is currently limited to three model organisms: Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. The DNA repair and tolerance pathways are represented as graphs and in tabular form with descriptions of each repair step and corresponding proteins, and individual entries are cross-referenced to supporting literature and primary databases. REPAIRtoire can be queried by the name of pathway, protein, enzymatic complex, damage and disease. In addition, a tool for drawing custom DNA–protein complexes is available online. REPAIRtoire is freely available and can be accessed at
PMCID: PMC3013684  PMID: 21051355
21.  An Embarrassment of Riches: The Enzymology of RNA Modification 
Summary of Recent Advances
The maturation of transfer RNA (tRNA) involves extensive chemical modification of the constituent nucleosides, resulting in the formation of structurally diverse nucleosides. Many of the pathways to these modified nucleosides are characterized by chemically complex transformations, some of which are unprecedented in other areas of biology. To illustrate the scope of the field, recent progress in understanding the enzymology leading to the formation of 2 distinct classes of modified nucleosides is reviewed: the thiouridines and queuosine, a 7-deazaguanosine. In particular, recent data validating the involvement of several proposed intermediates in the formation of thiouridines are discussed, including 2 key enzyme intermediates and the activated tRNA intermediate. The discovery and mechanistic characterization of a new enzyme activity in the queuosine pathway are also discussed.
PMCID: PMC2430154  PMID: 18294973
22.  The RNA modification database--1998. 
Nucleic Acids Research  1998;26(1):196-197.
The RNA modification database provides a comprehensive listing of posttranscriptionally modified nucleosides from RNA, and is maintained as an updated version of the initial printed report [Limbach, P.A., Crain, P.F. and McCloskey, J.A. (1994) Nucleic Acids Res., 22, 2183-2196]. Information provided for each nucleoside includes: the type of RNA in which it occurs and phylogenetic distribution; common chemical name and symbol; Chemical Abstracts registry number and index name; chemical structure; initial literature citations for structural characterization or occurrence, and for chemical synthesis. The data are available through the World Wide Web at: .html
PMCID: PMC147197  PMID: 9399834
23.  PlantRNA, a database for tRNAs of photosynthetic eukaryotes 
Nucleic Acids Research  2012;41(Database issue):D273-D279.
PlantRNA database ( compiles transfer RNA (tRNA) gene sequences retrieved from fully annotated plant nuclear, plastidial and mitochondrial genomes. The set of annotated tRNA gene sequences has been manually curated for maximum quality and confidence. The novelty of this database resides in the inclusion of biological information relevant to the function of all the tRNAs entered in the library. This includes 5′- and 3′-flanking sequences, A and B box sequences, region of transcription initiation and poly(T) transcription termination stretches, tRNA intron sequences, aminoacyl-tRNA synthetases and enzymes responsible for tRNA maturation and modification. Finally, data on mitochondrial import of nuclear-encoded tRNAs as well as the bibliome for the respective tRNAs and tRNA-binding proteins are also included. The current annotation concerns complete genomes from 11 organisms: five flowering plants (Arabidopsis thaliana, Oryza sativa, Populus trichocarpa, Medicago truncatula and Brachypodium distachyon), a moss (Physcomitrella patens), two green algae (Chlamydomonas reinhardtii and Ostreococcus tauri), one glaucophyte (Cyanophora paradoxa), one brown alga (Ectocarpus siliculosus) and a pennate diatom (Phaeodactylum tricornutum). The database will be regularly updated and implemented with new plant genome annotations so as to provide extensive information on tRNA biology to the research community.
PMCID: PMC3531208  PMID: 23066098
24.  Inferring Noncoding RNA Families and Classes by Means of Genome-Scale Structure-Based Clustering 
PLoS Computational Biology  2007;3(4):e65.
The RFAM database defines families of ncRNAs by means of sequence similarities that are sufficient to establish homology. In some cases, such as microRNAs and box H/ACA snoRNAs, functional commonalities define classes of RNAs that are characterized by structural similarities, and typically consist of multiple RNA families. Recent advances in high-throughput transcriptomics and comparative genomics have produced very large sets of putative noncoding RNAs and regulatory RNA signals. For many of them, evidence for stabilizing selection acting on their secondary structures has been derived, and at least approximate models of their structures have been computed. The overwhelming majority of these hypothetical RNAs cannot be assigned to established families or classes. We present here a structure-based clustering approach that is capable of extracting putative RNA classes from genome-wide surveys for structured RNAs. The LocARNA (local alignment of RNA) tool implements a novel variant of the Sankoff algorithm that is sufficiently fast to deal with several thousand candidate sequences. The method is also robust against false positive predictions, i.e., a contamination of the input data with unstructured or nonconserved sequences. We have successfully tested the LocARNA-based clustering approach on the sequences of the RFAM-seed alignments. Furthermore, we have applied it to a previously published set of 3,332 predicted structured elements in the Ciona intestinalis genome (Missal K, Rose D, Stadler PF (2005) Noncoding RNAs in Ciona intestinalis. Bioinformatics 21 (Supplement 2): i77–i78). In addition to recovering, e.g., tRNAs as a structure-based class, the method identifies several RNA families, including microRNA and snoRNA candidates, and suggests several novel classes of ncRNAs for which to date no representative has been experimentally characterized.
Author Summary
For a long time, it was believed that the control of processes in living organisms is performed almost exclusively by proteins. Only recently, scientists learned that a further class of molecules, namely special RNAs, plays an important role in cell control. In consequence, research on such RNAs has enjoyed increasing attention over the last few years. These RNAs were called noncoding RNAs (ncRNA), because, unlike most other RNAs, these molecules do not code for proteins. Due to recent research successes, one can predict many potential new ncRNAs by comparing the genomes of related organisms. Technically, comparing such RNAs is challenging and computationally expensive, since related ncRNAs often show only weak similarity on the sequence level, but share similar structures. In the paper, we present the new method LocARNA for fast and accurate comparison of RNAs with respect to their sequence and structure. Using this method, we define a distance measure between pairs of ncRNAs based on sequence and structure. This is then used for combining RNAs into clusters in order to identify groups of similar RNAs in large unorganized sets of RNA. The final aim of such a comparison is to identify new classes of ncRNAs. We applied our clustering procedure to a previously published set of 3,332 predicted ncRNAs in the C. intestinalis genome. In addition to rediscovering known classes of RNAs, e.g., tRNAs, the method predicts microRNA candidates, and suggests several novel, experimentally uncharacterized classes of ncRNAs. For verification, we clustered about 4,000 RNAs of RFAM, which is a large database that contains RNAs with an already known classification into families. Our results show good performance of the presented structure-based clustering approach.
PMCID: PMC1851984  PMID: 17432929
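The step described in the author summary above, turning a pairwise distance measure into groups of similar RNAs, can be sketched in Python. The abstract does not name the clustering algorithm, so average-linkage agglomerative clustering is used here purely as an illustration; the function name, the threshold parameter, and the distance values are hypothetical and are not LocARNA output.

```python
def average_linkage(dist, threshold):
    """Merge clusters until the closest pair is farther apart than threshold.

    dist: symmetric matrix (list of lists) of pairwise distances.
    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def cluster_distance(a, b):
        # Mean of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical distances for four RNAs: 0 and 1 are close, 2 and 3 are close.
example = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
groups = average_linkage(example, threshold=0.5)
```

With these toy values the two close pairs end up in separate groups. This naive version recomputes every cluster distance on each pass, which is adequate for a handful of sequences but not for thousands.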
25.  A complete landscape of post-transcriptional modifications in mammalian mitochondrial tRNAs 
Nucleic Acids Research  2014;42(11):7346-7357.
In mammalian mitochondria, 22 species of tRNAs encoded in mitochondrial DNA play crucial roles in the translation of 13 essential subunits of the respiratory chain complexes involved in oxidative phosphorylation. Following transcription, mitochondrial tRNAs are modified by nuclear-encoded tRNA-modifying enzymes. These modifications are required for the proper functioning of mitochondrial tRNAs (mt tRNAs), and the absence of these modifications can cause pathological consequences. To date, however, the information available about these modifications has been incomplete. To address this issue, we isolated all 22 species of mt tRNAs from bovine liver and comprehensively determined the post-transcriptional modifications in each tRNA by mass spectrometry. Here, we describe the primary structures with post-transcriptional modifications of seven species of mt tRNAs which were previously uncharacterized, and provide revised information regarding base modifications in five other mt tRNAs. In the complete set of bovine mt tRNAs, we found 15 species of modified nucleosides at 118 positions (7.48% of total bases). This result provides insight into the molecular mechanisms underlying the decoding system in mammalian mitochondria and enables prediction of candidate tRNA-modifying enzymes responsible for each modification of mt tRNAs.
PMCID: PMC4066797  PMID: 24831542
