The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact firstname.lastname@example.org
Maintained at the University of Texas Health Science Center at Tyler, Texas, the tmRNA database (tmRDB) is accessible at the URL http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html with mirror sites located at Auburn University, Auburn, Alabama (http://www.ag.auburn.edu/mirror/tmRDB/) and the Royal Veterinary and Agricultural University, Denmark (http://tmrdb.kvl.dk/). The signal recognition particle database (SRPDB) at http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html is mirrored at http://srpdb.kvl.dk/ and the University of Goteborg (http://bio.lundberg.gu.se/dbs/SRPDB/SRPDB.html). The databases assist in investigations of the tmRNP (a ribonucleoprotein complex which liberates stalled bacterial ribosomes) and the SRP (a particle which recognizes signal sequences and directs secretory proteins to cell membranes). The curated tmRNA and SRP RNA alignments consider base pairs supported by comparative sequence analysis. Also shown are alignments of the tmRNA-associated proteins SmpB, ribosomal protein S1, alanyl-tRNA synthetase and Elongation Factor Tu, as well as the SRP proteins SRP9, SRP14, SRP19, SRP21, SRP54 (Ffh), SRP68, SRP72, cpSRP43, FlhF, SRP receptor (alpha) and SRP receptor (beta). All alignments can be easily examined using a new exploratory browser. The databases provide links to high-resolution structures and serve as depositories for structures obtained by molecular modeling.
Ribosomes extend their repertoire of functions by binding to additional ribonucleoprotein particles (RNPs) that can determine the fate of the protein as it emerges from the large ribosomal subunit. Two such complexes are the transfer-messenger RNP (tmRNP) and the signal recognition particle (SRP). The tmRNP, composed of the tmRNA, small protein B (SmpB) and ribosomal protein S1, rescues bacterial ribosomes stalled on faulty mRNAs. The potentially damaging polypeptides are tagged with a short peptide, released from the ribosome and destroyed by intracellular proteases [reviewed in (1)]. Similarly, the SRP binds to emerging signal sequences and directs secretory proteins to cellular membranes [recently reviewed in (2)]. The investigations of tmRNP and SRP combined with the knowledge gained from the high-resolution structures of the ribosome (3–5) have contributed significantly to our understanding of protein translation and translocation, but many questions remain to be answered. To assist in the ongoing studies, the updated tmRDB and SRPDB resources offer detailed descriptions of the biological roles of tmRNP and SRP, ordered lists of the components and links to high-resolution structures. Alignment-derived RNA secondary structures are supported by comparative sequence analysis. A new browser allows the user to easily explore the alignments.
New tmRNA sequences provided at the tmRNA website (6) were merged with the previous tmRNA alignment (7). New SRP RNAs were identified using SRPscan (8) or combinations of BLAST (9), RNABOB (Eddy, unpublished data) and Infernal (10) with secondary structure predictions by MFOLD (11). The sequences were placed in phylogenetic order guided by NCBI Taxonomy (12,13). Sequences were aligned automatically with CLUSTAL (14) or manually using BioEdit (15) observing the previously described covariation rules (16). The RNA editor SARSE (A. Lind-Thomsen et al., 2005, manuscript in preparation) was used for semi-automated cleanup of the alignments. RNAdbtools (17) was applied to confirm compensatory base changes, check base pairing consistencies and possible RNA helix extensions. Pfold (18) was used to predict the secondary structure of subgroups of the alignment.
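The check for compensatory base changes can be illustrated with a minimal sketch. This is not the RNAdbtools code; the function name `column_pair_support`, the gap symbols and the toy alignment are assumptions made purely for illustration. The idea is that two alignment columns support a proposed base pair when they form canonical (Watson-Crick or G-U) pairs in the sequences examined, and the support is stronger when at least two of those pairs differ at both positions.

```python
# Hypothetical illustration of a covariation check; not the RNAdbtools implementation.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
GAPS = {"-", "."}  # assumed gap symbols


def column_pair_support(alignment, i, j):
    """Summarize base-pair support between alignment columns i and j (0-based)."""
    pairs = []
    for seq in alignment:
        a, b = seq[i].upper(), seq[j].upper()
        if a in GAPS or b in GAPS:
            continue  # ignore sequences with a gap at either position
        pairs.append((a, b))
    paired = [p for p in pairs if p in CANONICAL]
    distinct = set(paired)
    # Compensatory change: two canonical pairs that differ at BOTH positions
    compensatory = any(
        p[0] != q[0] and p[1] != q[1] for p in distinct for q in distinct
    )
    return {
        "n": len(pairs),
        "paired": len(paired),
        "distinct_pairs": sorted(distinct),
        "compensatory": compensatory,
    }


# Toy alignment: columns 0 and 4 covary (G-C, C-G, A-U); the last row is gapped.
toy = ["GAAAC", "CAAAG", "AAAAU", "-AAAC"]
report = column_pair_support(toy, 0, 4)
print(report["compensatory"])  # True
```

A pair of columns that shows only G-C and G-U would be reported as paired but not compensatory, because the change affects a single position.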
Protein sequences were identified in GenBank (13) using BLAST (9) with a subset of representative sequences from the previous versions of tmRDB (7) and SRPDB (19) as queries. The output was examined manually to generate a set of unique sequences for each protein family. Sequences were aligned using Jalview (20), CLUSTAL (14) and MUSCLE (21).
The alignments can be viewed, zoomed and scrolled in a www-browser under development for genomes by the Danish Genome Institute (also directly accessible at http://www.genomics.dk:8000/RNA). It currently features basic navigation, with color-dot, grey-dot, character display and zoom to any level. More features will be added.
The tmRDB contains a total of 555 tmRNA sequences in the range of 250–434 nt. Because of the continuous rapid emergence of new sequences this dataset is not complete but nevertheless representative. [The tmRNA website (6) can be consulted for the most recent new tmRNA sequences.] All bacterial groups, including the Alphaproteobacteria (55 sequences) previously thought to lack tmRNA, contained tmRNA genes. Consistent with the evolutionary relationship between bacteria and organelles, tmRNAs were found in most of the chloroplast and mitochondrial genomes. However, tmRNA genes were lacking in the chloroplasts of higher plants. Interestingly, tmRNAs could be identified in the genomes of certain bacteriophages.
Most tmRNAs were composed of one continuous molecule. Less frequently, tmRNAs were encoded in the DNA in two sections which, when transcribed, are expected to fold into a tmRNA-like configuration. These two-part tmRNAs were found in the genomes of most Alphaproteobacteria, as well as in some Cyanobacteria and Betaproteobacteria (Table 1). The appearance of this adaptation in these distinct phylogenetic groups suggested that two-part tmRNAs arose in evolution three times independently (22). No tmRNA genes were identified in the archaea or the nuclear genomes of the eukarya.
The tmRNA sequences were aligned using comparative sequence analysis as described previously for SRP RNA (16). An outline of the secondary structure of Escherichia coli tmRNA is depicted in Figure 1A. Shown are the tRNA-like domain (TLD), the messenger RNA-like domain (MLD), and the pseudoknot (pk) domain (PKD). Modifications to the E.coli reference structure include the reduction or deletion of pseudoknots, the appearance of new helices (e.g. in pk2 of Betaproteobacteria) and structural replacements, e.g. the change of pk4 into two tandem pseudoknots (see diagram b in Supplementary Data 1). The phylogenetic distribution of the features is summarized in Table 1.
A cluster of hydrophobic amino acids at the C-terminus and a variable length of 8–35 amino acids characterized the 539 tmRNA-encoded tag-peptides. Alanine or glycine were the most frequent resume codons. Tag peptide sequences have been experimentally confirmed for E.coli and Bacillus subtilis.
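How a tag peptide and its resume codon are read from the mRNA-like domain can be sketched as follows. This is an illustrative assumption about how one might do it, not the procedure used to compile the database: the function `translate_tag` is hypothetical, the standard genetic code is assumed, and the input is taken to be a reading frame that already starts at the resume codon. The example sequence is included only to exercise the code.

```python
from itertools import product

# Standard genetic code, bases ordered U, C, A, G ('*' marks a stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_tag(rna):
    """Translate a tag reading frame, starting at the resume codon, up to the first stop."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    peptide = []
    for k in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[k:k + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


# Hypothetical input frame used for illustration.
frame = "GCAAACGACGAAAACUACGCUUUAGCAGCUUAA"
tag = translate_tag(frame)
print(tag)                # ANDENYALAA
print(tag[0] in "AG")     # True: the resume codon encodes alanine or glycine
print(8 <= len(tag) <= 35)  # True: within the length range reported above
```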
SmpB. This protein is an essential trans-translational co-factor (23) and is present in all bacteria. The protein forms quaternary complexes with aminoacylated tmRNA, EF-Tu and GTP (24). SmpB mutants which lack the C-terminal tail of the protein bind to ribosomes but are unable to tag the truncated proteins (25).
Ribosomal protein S1. This protein contains up to six related domains. The protein binds and cross-links to the MLD and pk2–pk4. The NMR structure of a single protein S1 RNA-binding domain of E.coli has been determined (26), but little is known about the arrangement of full-length protein S1 during trans-translation. The alignment suggested four groups of sequences which differed in the number of domains. Overall, domains four, five and six were less conserved and absent in some of the S1 homologues. The protein S1 sequences of Candidatus Tremblaya princeps and Clostridium acetobutylicum ATCC 824 were distinct with respect to their low levels of homology to any other aligned sequence.
Alanyl-tRNA synthetase. Aminoacylation of tmRNA constitutes a prerequisite step in trans-translation, since uncharged tmRNA mutants do not bind to 70S ribosomes in vivo (27). Studies carried out in vitro demonstrated that the aminoacyl moiety can be changed without affecting the ability of the tmRNA to participate in protein tagging. The majority of the tmRNAs are expected to be charged with alanine because they possess in their acceptor stem a G-U base pair as the critical determinant for aminoacylation with alanyl-tRNA synthetase.
EF-Tu. Elongation factor Tu, found in all organisms, forms a ternary complex with GTP and Ala-tmRNA in vitro. EF-Tu primarily interacts with the acceptor arm of the tRNA-like domain of tmRNA (24). Although Ala-tmRNA has a lower association rate constant for the EF-Tu·GTP complex than Ala-tRNA, chemical and enzymatic footprinting indicate that the architecture of this complex closely resembles canonical ternary complexes.
A description of the phylogenetic distribution of the secondary structural features of tmRNA based on an alignment of 274 sequences was provided recently (28). From the analysis of 555 sequences the following insights into tmRNA phylogeny were obtained: (i) most tmRNAs consist of a single polynucleotide chain with a TLD, a relatively unstructured MLD, and a variable number of pseudoknots. (ii) The variability of the predicted pseudoknot structures suggests a preservation of RNA folding without the need for sequence conservation. (iii) In the Alphaproteobacteria, some Betaproteobacteria, and some Cyanobacteria, the tmRNAs are composed of two chains. These two-piece tmRNAs contain fewer pseudoknots than the typical one-piece tmRNAs. (iv) Plastids contain one-piece tmRNAs with a reduced number of pseudoknots. (v) Most mitochondria may be devoid of trans-translation because they lack SmpB and contain only very short two-piece tmRNAs which appear to have lost the MLD. Examples of tmRNA secondary structure diagrams are shown in Supplementary Data 1.
A total of 393 SRP RNAs were identified using the procedures described in Materials and Methods. SRP RNA genes were found to be present in all major phylogenetic groups as well as the photosynthetic plastids of red algal origin (except the substantially smaller plastid of the haptophyte Emiliania huxleyi) and the chloroplasts of some green algae (29). More than one variant was found in 33 organisms. Many novel SRP RNA sequences were found to add to our knowledge of the phylogenetic distribution of the secondary structure features (Table 2).
An overview of the SRP RNA secondary structure elements was presented in a recent nomenclature proposal (30) similar to what is shown in Figure 1B. Several new sequences, e.g. from Eremothecium gossypii, Kluyveromyces waltii and Kluyveromyces lactis, provided additional support for the proposed helices. In the Onygenales group within Pezizomycotina (Histoplasma and four other species), we found a new helix (‘extra’ helix E in Figure 1 and Table 2) located toward the 5′ end of helix 6. The phylogenetic distribution of all helices is indicated in Table 2. Representative SRP RNA secondary structure diagrams are shown in Supplementary Data 2.
Most bacteria, as well as certain chloroplasts, contained a small SRP RNA of 60–115 nt consisting solely of helix 8. The conserved apical tetraloop of this helix typically had the consensus sequence GNRA, with rare G to U mutations in the first position, and occasionally the sequence URRC (8). In some gram-positive bacteria (Bacillales and Clostridia groups) and the deeply-branching gram-negative bacterium Thermotoga maritima, the SRP RNA was of the archaeal type but lacked helix 6. Several of these SRP RNAs, as well as some archaeal SRP RNAs, had a non-consensus UGUNR motif (UAUNR, UAUN or CNNNR). In certain Crenarchaeota (Aeropyrum pernix) this part seemed to be extended, perhaps forming a helix. The apical loop of the highly conserved helix 8 consisted of 4 nt in most organisms. Plants and certain fungi, however, possessed six nucleotides in this loop. Recently, we found that Trichomonas, Phytophthora, and Entamoeba have a pentaloop with the consensus sequence G[AT][AT]AA.
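The helix 8 loop consensus sequences named above can be matched against a candidate loop with a short sketch. The function `classify_loop` and the motif labels are hypothetical, and two assumptions are made: the consensus strings use IUPAC ambiguity codes (N = any base, R = purine), and the pentaloop consensus G[AT][AT]AA is read in the RNA alphabet as G[AU][AU]AA.

```python
import re

# IUPAC ambiguity codes needed for the motifs discussed here (RNA alphabet).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "[AG]", "W": "[AU]", "N": "[ACGU]"}

# Hypothetical labels; the pentaloop is written as GWWAA (W = A or U).
MOTIFS = [("GNRA tetraloop", "GNRA"),
          ("URRC tetraloop", "URRC"),
          ("GWWAA pentaloop", "GWWAA")]


def motif_to_regex(motif):
    """Convert an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[c] for c in motif))


def classify_loop(loop):
    """Return the label of the first motif that matches the whole loop, else None."""
    loop = loop.upper().replace("T", "U")  # accept DNA-style input as well
    for label, motif in MOTIFS:
        if motif_to_regex(motif).fullmatch(loop):
            return label
    return None


print(classify_loop("GAAA"))   # GNRA tetraloop
print(classify_loop("UGAC"))   # URRC tetraloop
print(classify_loop("GTTAA"))  # GWWAA pentaloop
print(classify_loop("CCCC"))   # None
```

Using `fullmatch` means a loop is classified only when its length agrees with the motif, so a hexaloop such as those reported for plants would return None here.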
The eukaryal SRP RNA was highly variable, particularly with respect to the small (Alu) domain (see Table 2 and Supplementary Data 2). Secondary structure models were presented for the Saccharomyces SRP RNAs (31,32). These models showed that helices 3 and 4 were missing, whereas helices 9–12 had been acquired. The SRP RNA secondary structures of the non-Ascomycota fungi Phakopsora and Rhizopus differed from the Ascomycota and were similar to the metazoan SRP RNAs. In Diplomonads and Microsporidia, the small domain seemed to have disappeared to leave an SRP RNA composed only of the large (S) domain.
SRP9, SRP14 and SRP21. A total of 24 SRP9 protein sequences were identified: 16 sequences from the Metazoa, one each from Dictyostelium discoideum and Entamoeba histolytica, three from plants and three from the Alveolata group. SRP14 (a total of 33 sequences) was found in all of the Eukarya examined, including the Fungi. Both SRP9 and SRP14 were absent in Bacteria, Archaea and some eukaryal groups. SRP21 sequences were identified in 12 fungal genomes. Evidence was provided that the metazoan SRP9 is homologous to the fungal SRP21 (31). This finding was consistent with the finding that a gradual evolutionary change from SRP9 to SRP21 had occurred with Pezizomycotina and Schizosaccharomyces pombe representing intermediates. However, further studies are required to clarify the functional role of SRP21 in fungi.
SRP19. Protein SRP19 was found in all the examined Eukarya and Archaea. The presence of SRP19 correlated strongly with the appearance of SRP RNA helix 6, thus confirming the important role of SRP19 in the assembly of the large (S) domain (33).
SRP54, also referred to in Bacteria as Ffh (fifty-four homologue), contains the signal sequence binding pocket (34) and thus is likely to be an essential component of every SRP. The SRPDB lists 115 sequences from all phylogenetic groups. We identified homologues to the chloroplast Ffh, cpSRP54, in Arabidopsis, Pisum, Chlamydomonas and Cyanidioschyzon merolae.
SRP68 and SRP72. A total of 31 SRP68 and 34 SRP72 sequences from the Fungi, Metazoa, Mycetozoa, Plants, Alveolata and Euglenozoa groups were found. Homologues of these proteins were not identified in the Bacteria and Archaea. Both proteins are known to form a heterodimer within the large domain of the mammalian SRP, but relatively little is known about their structure. The SRP72 alignment revealed a new lysine-rich domain, originally identified as Pfam B 7529, which will be added to Pfam (35). A corresponding peptide of 63 amino acids located near the C-terminus of human SRP72 with the consensus PDPXRWLPXXER was shown to bind to SRP RNA with high affinity (36).
cpSRP43. cpSRP43 is a unique nuclear-encoded protein and part of the post-translational SRP found only in chloroplasts. The protein binds to polypeptides destined for the thylakoid membrane. cpSRP43 contains four ankyrin repeats at the N-terminus and two chromodomains at the C-terminus. It forms a complex with cpSRP54 via its chromodomains (37).
SRP Receptor (alpha) (FtsY). The SRP receptor is a single polypeptide (FtsY) in the Bacteria and Archaea. In Eukaryotes, the SRP receptor is composed of two subunits, alpha and beta. The alpha subunit is related to FtsY and to SRP54 (Ffh) due to their GTPase domain similarity. Unique to the alpha subunit of the SRP receptor (FtsY) is an N-terminal A-region which is thought to be responsible for interacting with the membrane or the beta subunit [reviewed in (2)].
SRP Receptor (beta) was found in all Eukaryotes including the Fungi. The protein is characterized by a transmembrane anchor and binds to the alpha subunit of the receptor. Like SRP54 (Ffh), the beta subunit also contains a GTPase domain.
FlhF. This protein was characterized first as a flagellar gene from B.subtilis. It belongs to the same family of GTP-binding proteins as Ffh and FtsY (38) suggesting a role in SRP function. However, FlhF was shown recently to be dispensable for protein secretion (39).
An extensive inventory of SRP RNA and protein components has allowed us to arrive at a comprehensive view of SRP phylogeny (Table 2). Essential elements include (i) the development of an altered Alu domain in the Ascomycota lacking helices 3 and 4, accompanied by the appearance of protein SRP21, (ii) the emergence of the more complex Saccharomyces SRP RNAs with multiple insertions, (iii) the retention of a metazoan-type SRP in the Basidiomycota, (iv) the appearance of eukaryotic SRPs that lack the typical mammalian SRP proteins or the small (Alu) domain, (v) the presence of a much reduced SRP in bacteria and chloroplasts composed of only one protein (Ffh) and a small RNA that seems to be absent in the chloroplasts of higher plants and (vi) the conservation of the composition and secondary structure of the archaeal SRP.
Exploring RNA and protein alignments has become increasingly difficult with the growing number of sequences. We have implemented and continue to develop a browser which displays alignments at various zoom levels, like a map. The user can explore and see more clearly the species- and group-specific differences. Further improvements in the quality of the alignments can be expected. Overall, these advances will lead to a better understanding not only of trans-translation and co-translational protein translocation, but also of the functional potential of the ribosome.
The data are freely accessible for research purposes at the internet addresses http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html and http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html or at the corresponding mirror sites provided in the Abstract. This article should be cited in research projects which use the tmRDB and SRPDB resources.
Supplementary Data are available at NAR Online.
We thank Jorgen Kjems for RNA expertise and support, Allan Lind-Thomsen for assistance with SARSE and Florian Müller for the ERNA-3D modeling program. E.S.A. is supported by the Interdisciplinary Nanoscience Center (iNANO) of the University of Aarhus. M.A.R. is supported by the SWEGENE consortium. J.G. is supported by the Danish Research Council for Technology and Production Sciences and the Danish Center for Scientific Computing. This work was also supported by NIH grants GM-58267 to J.W. and GM-49034 to C.Z. Funding to pay the Open Access publication charges for this article was provided by NIH grant GM-49034 to C.Z.
Conflict of interest statement. None declared.