MODOMICS is a relational database that links datasets of modification reactions/pathways, RNA-modifying enzymes and the sequences of target RNAs, which can be queried via three menus: ‘PATHWAYS’, ‘ENZYMES’ and ‘RNAs’. The modification pathways are represented as a set of graphs visualized with GRAPHVIZ (http://www.graphviz.org/), in which the nodes represent the modified nucleosides and the edges represent the transformations (real or putative) between them, e.g. enzymatic reactions. Currently, the pathway dataset comprises five graphs, corresponding to all known modifications of adenosine (A), cytidine (C), guanosine (G) and uridine (U), and to the queuosine pathway (Q); of these, the graph of U modifications is clearly the most diversified. The ‘PATHWAYS’ menu offers a variety of filtering options to display a whole graph or a fragment of it, corresponding to reactions that occur in a particular kingdom of life (Eukaryota, Archaea, Bacteria or viruses) or organelle (currently only mitochondrial modifications are supported), to reactions in a particular subset of RNAs (tRNAs, rRNAs, mRNAs, small RNAs or chromosomal RNAs), or to hypermodifications that result from processing of an already modified nucleoside (e.g. I and its derivatives, Ψ and its derivatives, etc.). All nodes of the (sub)graph, i.e. the images of all nucleosides, are hyperlinked to dynamically generated windows comprising two panels (example shown in ). The upper panel displays basic information about the selected nucleoside, including its common name, its one-letter code in the tRNA sequence database, the symbol used in MODOMICS and the chemical structure in SMILES code, hyperlinked to the corresponding image generated using the SMI2GIF script kindly provided by Daylight Chemical Information Systems Inc. (http://www.daylight.com/). The lower panel includes a subgraph of reactions leading to the selected nucleoside from its precursor(s) (unless the selected nucleoside is the precursor itself), and all known hypermodifications formed from that nucleoside.
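The graph representation described above can be illustrated with a minimal Python sketch. It is not the MODOMICS implementation: the `Reaction` record, the function names and the handful of edges are hypothetical, the kingdom and RNA-type annotations are placeholders and not database records, and `Y` is used as an ASCII stand-in for Ψ. The sketch shows filtering edges by kingdom or RNA type, extracting the subgraph around one nucleoside (its precursors plus everything formed from it), and emitting DOT text that GRAPHVIZ can render.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """One edge of a pathway graph: substrate nucleoside -> product nucleoside."""
    substrate: str
    product: str
    kingdoms: frozenset
    rna_types: frozenset


ALL_KINGDOMS = frozenset({"Eukaryota", "Archaea", "Bacteria"})

# Illustrative edges only; the annotations are placeholders, not MODOMICS records.
REACTIONS = [
    Reaction("U", "Y", ALL_KINGDOMS, frozenset({"tRNA", "rRNA"})),
    Reaction("Y", "m1Y", frozenset({"Eukaryota", "Archaea"}), frozenset({"tRNA", "rRNA"})),
    Reaction("A", "I", frozenset({"Eukaryota", "Bacteria"}), frozenset({"tRNA"})),
    Reaction("I", "m1I", frozenset({"Eukaryota"}), frozenset({"tRNA"})),
]


def filter_reactions(reactions, kingdom=None, rna_type=None):
    """Keep the edges annotated with the requested kingdom and/or RNA type."""
    return [
        r for r in reactions
        if (kingdom is None or kingdom in r.kingdoms)
        and (rna_type is None or rna_type in r.rna_types)
    ]


def reachable(reactions, start, forward=True):
    """Collect all edges reachable from `start`, walking edges forward or backward."""
    found, stack, seen = [], [start], {start}
    while stack:
        node = stack.pop()
        for r in reactions:
            src, dst = (r.substrate, r.product) if forward else (r.product, r.substrate)
            if src != node:
                continue
            if r not in found:
                found.append(r)
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return found


def node_subgraph(reactions, node):
    """Edges leading to `node` from its precursors, plus edges to its derivatives."""
    upstream = reachable(reactions, node, forward=False)
    downstream = reachable(reactions, node, forward=True)
    return upstream + [r for r in downstream if r not in upstream]


def to_dot(reactions, name="pathway"):
    """Serialize the edges as a DOT digraph for rendering with GRAPHVIZ."""
    lines = [f"digraph {name} {{"]
    lines += [f'    "{r.substrate}" -> "{r.product}";' for r in reactions]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(filter_reactions(REACTIONS, kingdom="Bacteria")))
    print(to_dot(node_subgraph(REACTIONS, "I"), name="inosine"))
```

Keeping the annotations on the edges rather than on the nodes matches the description above, in which filters select reactions and the nucleosides shown are simply those the remaining edges connect.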
All edges of the (sub)graph, i.e. the arrows that connect the images of nucleosides, are hyperlinked to static ‘reaction’ windows comprising one or more panels (example shown in ). The upper panel displays basic information about the selected reaction, namely the type of RNA in which it occurs and its occurrence in the phylogenetic context (inferred from the known phylogenetic distribution of the substrate and product nucleosides). The other panels display information about enzymes known to catalyze the selected reaction in different substrates and in different organisms. Currently, the database of enzymes includes only proteins from E. coli and S. cerevisiae, but it will be expanded in the future and may eventually comprise all orthologs of the functionally characterized enzymes identifiable in fully sequenced genomes. At this moment, however, MODOMICS includes only enzymes and corresponding protein cofactors that have been experimentally validated, of which most (but not all) have known amino acid sequences. The enzyme panels are also directly accessible from the ‘ENZYMES’ menu, which lists all entries of the current database in tabular form and can be sorted by name (e.g. Trm1, Dus2, MnmC, etc.), enzyme type (e.g. methyltransferase, deaminase, etc.) or organism of origin. The enzyme database can also be searched for entries whose fields match a query formulated as a regular expression (e.g. ‘Trm*’, ‘*pus*’, ‘m1*’ or ‘*transferase’, etc.). The enzyme names include both the traditional one(s), which have been inferred quite erratically from different characteristics of the gene, the protein, the organism, the type of reaction or even the whole pathway (e.g. Tgs1p, Abd1p, Gar1p, Nop1p, MnmA, GidA, etc.), and a novel name assigned according to a newly developed, uniform nomenclature (H.G. and J.M.B., manuscript in preparation). The enzyme panel lists all relevant types of reactions between nucleosides in different substrates and provides the EC numbers (if available) and links to the corresponding entries in the BRENDA database (7). Other information about the enzyme concerns the sequence (if available) and includes the name of the open reading frame in the genome, the NCBI GenPept (10) and SwissProt (11) accession numbers, the experimentally solved structure in the PDB (12) (in the future, experimentally validated computational models will also be included) and the literature concerning experimental characterization of the protein, identification of the gene and, where available, structural analysis, together with links to the relevant database entries.
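The search and sorting behaviour of the ‘ENZYMES’ table can be sketched as follows. The example queries (‘Trm*’, ‘*pus*’) read like shell-style wildcards, so this sketch assumes case-insensitive wildcard matching via Python's `fnmatch` module; that is an assumption about the query syntax, not a description of the MODOMICS server. The five records form a small hand-written table for illustration only, and the field names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hand-written illustrative records; field names are hypothetical.
ENZYMES = [
    {"name": "Trm1", "type": "methyltransferase", "organism": "S. cerevisiae"},
    {"name": "Dus2", "type": "dihydrouridine synthase", "organism": "S. cerevisiae"},
    {"name": "Pus1", "type": "pseudouridine synthase", "organism": "S. cerevisiae"},
    {"name": "TrmA", "type": "methyltransferase", "organism": "E. coli"},
    {"name": "TadA", "type": "deaminase", "organism": "E. coli"},
]


def search(entries, pattern):
    """Return entries in which any field matches the wildcard pattern.

    Matching is case-insensitive (assumed): both sides are lower-cased
    before the comparison.
    """
    pat = pattern.lower()
    return [
        e for e in entries
        if any(fnmatchcase(str(value).lower(), pat) for value in e.values())
    ]


def sort_by(entries, field):
    """Return a new list sorted by one column, ignoring letter case."""
    return sorted(entries, key=lambda e: e[field].lower())


if __name__ == "__main__":
    for query in ("Trm*", "*pus*", "*transferase"):
        print(query, [e["name"] for e in search(ENZYMES, query)])
    print([e["name"] for e in sort_by(ENZYMES, "organism")])
```

Because every field is tested, a query such as ‘*transferase’ finds entries through the enzyme type while ‘Trm*’ finds them through the name, which is consistent with the mixed examples given in the text.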
The ‘RNAs’ menu displays multiple sequence alignments of homologous RNA families in which modifications have been observed. Currently, this dataset is complementary to the enzyme dataset and includes only tRNA sequences from S. cerevisiae (both cytoplasmic and mitochondrial) and E. coli, together with the corresponding unmodified tDNA sequences. In the future it will be extended to manually validated tRNA sequences from other model organisms for which tRNA data are available (3), as well as to tDNA sequences predicted from fully sequenced genomes [e.g. the tRNomics dataset from ref. (13)]. It will also include other RNA sequences with identified modifications, such as rRNAs and various small non-coding RNAs. The content of the currently available tRNA sequence alignment can be filtered according to several options, including the organism and strain or taxon of origin, the anticodon and the amino acid specificity.
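Because each tRNA sequence is stored alongside its unmodified tDNA, the modified positions can in principle be located by comparing the two. The sketch below is a hypothetical illustration, not part of MODOMICS: it assumes the two sequences are already aligned to the same length, it assumes one-letter codes for modified nucleosides (`T`, `P` and `D` are used here purely for illustration), and the two short sequences are toy strings, not database entries.

```python
# Assumed one-letter codes for modified nucleosides (illustrative subset only).
MODIFIED_CODES = {"T": "m5U", "P": "pseudouridine", "D": "dihydrouridine"}


def transcribe(tdna):
    """Return the unmodified RNA expected from a tDNA sequence (T -> U)."""
    return tdna.upper().replace("T", "U")


def modified_positions(trna, tdna):
    """List (1-based position, expected base, observed code) for each mismatch.

    Both sequences must already be aligned to the same length; alignment
    gaps ('-') present in both rows compare equal and are ignored.
    """
    if len(trna) != len(tdna):
        raise ValueError("sequences must be aligned to the same length")
    expected = transcribe(tdna)
    return [
        (i + 1, exp, obs)
        for i, (exp, obs) in enumerate(zip(expected, trna))
        if exp != obs
    ]


if __name__ == "__main__":
    toy_tdna = "GTTCGATC"   # toy sequence, not a database entry
    toy_trna = "GTPCGAUC"   # same region with two assumed modification codes
    for pos, exp, obs in modified_positions(toy_trna, toy_tdna):
        print(pos, exp, "->", MODIFIED_CODES.get(obs, obs))
```

Transcribing the tDNA first matters because a `T` in the RNA row denotes a modified uridine under the assumed coding, whereas a `T` in the DNA row is an ordinary base.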