Selenocysteine (Sec or U) is encoded by UGA, a stop codon reassigned by a Sec-specific elongation factor and a distinctive RNA structure. To discover possible code variations in extant organisms we analyzed 6.4 trillion base pairs of metagenomic sequences and 24,903 microbial genomes for tRNASec species. As expected, UGA is the predominant Sec codon in use. We also found tRNASec species that recognize the stop codons UAG and UAA, and ten sense codons. Selenoprotein synthesis programmed by UAG in Geodermatophilus and Blastococcus, and by the Cys codon UGU in Aeromonas salmonicida, was confirmed by metabolic labeling with 75Se or mass spectrometry. Other tRNASec species with different anticodons enabled Escherichia coli to synthesize active formate dehydrogenase H, a selenoenzyme. This illustrates the ease with which the genetic code may evolve new coding schemes, possibly helping organisms adapt to changing environments. Our results reveal that the genetic code is much more flexible than previously thought.
The micronutrient selenium is present in proteins in the form of the versatile 21st amino acid, selenocysteine, in which the thiol moiety of cysteine (Cys) is replaced by a selenol group. Selenoproteins are present in organisms from all domains of life; such proteins are essential in mammalian cells, yet plants and fungi lack this amino acid. Sec is present in the active site of many redox enzymes. The codon for Sec is UGA, which is normally a translational stop signal. During translation of selenoprotein mRNAs, UGA is recoded by the interaction of a specialized elongation factor, SelB (in bacteria), with a downstream Sec insertion sequence (SECIS) [5–6]. Recently, a synthetic biology study succeeded in reassigning Sec to a large number of sense and stop codons in Escherichia coli, demonstrating that alterations to the genetic code can be tolerated. This prompted the question of whether deviations from the standard UGA Sec assignment may occur naturally.
A computational study scanning several trillion base pairs of metagenomic data revealed a large number of stop codon reassignments in bacteria and bacteriophages. This inspired us to perform a comprehensive search of the available metagenomic and microbial genomic sequence data for anticodon variants of the typical tRNASecUCA, the longest tRNA, with a tertiary structure quite different from that of canonical tRNAs. A BLAST search of the tRNADB-CE database revealed four tRNASecCUA sequences, suggesting UAG codon recognition. Searching all public microbial genomes in the National Center for Biotechnology Information (NCBI) and all assembled metagenome data in the Integrated Microbial Genomes (IMG) system yielded a tRNASecGCA group, indicating UGC codon recognition. We then developed a general computational pipeline that scanned ~6.4 Tb of unassembled short reads, ~180 Gb of assembled contigs (> 2 kb), and 24,903 microbial genomes in IMG (Figure 1A). The results affirmed UGA as the predominant Sec codon. In addition, 12 different tRNASec anticodon variants were discovered that are capable of recognizing the stop codons UAG and UAA, and 10 sense codons (Figure 1A). Further sequence validations (see Supplementary Information) confirmed that these tRNASec variants are not sequencing artifacts.
We grouped these non-canonical tRNASec species by anticodon type, sequence, and structural similarity (Figure 1A, and Supplementary Information). The largest group (anticodon CUA) contains 366 nearly identical tRNASec sequences from the actinobacterial Geodermatophilaceae family (Figure 1A, 1B left panel), while the other 3 tRNASecCUA species may be of rhizosphere bacterial origin (Figures S1, S2). The tRNASecCUA species amounted to 3% of the total tRNASec species found in a soil metagenome (3300001205).
The next group contains tRNASec species (with anticodon GCA able to decode Cys) from Betaproteobacteria and termite gut symbionts (Figure 1A, Figure S3). Two Aeromonas salmonicida genomes contain a tRNASecACA, able to decode the UGU Cys codon (Figure 1B, middle panel). The other 16 members of this group may originate from Chloroflexi (Figure S4) and ocean bacteria (Figure S5). The 2 tRNASecUUU/CUU species (recognizing AAA/AAG Lys codons) may derive from the Solirubrobacterales (Figure S6). Additional tRNASec variants able to recognize the stop codon UAA and 6 sense codons (CGA, AGA, GGA, UUA, UCA, UGG) were also found (Figure S7).
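The anticodon-to-codon assignments quoted above can be checked with a few lines of code. The following is a minimal sketch, assuming strict Watson-Crick pairing at all three positions (wobble pairing and base modifications are ignored); the function name `cognate_codon` is chosen for illustration and is not part of the pipeline described in the text.

```python
# Watson-Crick complements for RNA bases (assumption: no wobble pairing).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def cognate_codon(anticodon: str) -> str:
    """Return the codon (5'->3') that pairs with an anticodon (5'->3').

    Codon and anticodon pair antiparallel, so the codon is the
    reverse complement of the anticodon.
    """
    return "".join(COMPLEMENT[base] for base in reversed(anticodon.upper()))


# Anticodons named in the text: canonical UCA plus several variants.
for anticodon in ["UCA", "CUA", "GCA", "ACA", "UUU", "CUU"]:
    print(f"tRNASec {anticodon} -> codon {cognate_codon(anticodon)}")
```

Under this assumption the canonical anticodon UCA maps to UGA, and the variants CUA, GCA, ACA, UUU, and CUU map to UAG, UGC, UGU, AAA, and AAG, in agreement with the assignments given in the text.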
We then wanted to confirm the coding properties of these non-canonical tRNASec species. For proof of selenoprotein synthesis three strategies were possible: (i) metabolic labeling of the organisms with 75Se, (ii) replacing parts of the E. coli selenoprotein synthesis machinery with genes and tRNASec from our genomic or metagenomic findings, and (iii) replacing E. coli tRNASec with the newly discovered tRNASec species. In the last two strategies E. coli formate dehydrogenase H (FDHH, encoded by the fdhF gene) would serve as the reporter.
To confirm UAG-directed Sec incorporation we grew Geodermatophilus obscurus G-20 and Blastococcus saxobsidens cells in the presence of [75Se]selenite and detected radiolabeled selenoproteins of 140 and 50 kDa. The genome sequences predict formate dehydrogenases (FDHs) (Figure S8, Tables S1 & S2) and UGSC-motif proteins (Figures S1, S8, Table S3). This was the first indication of UAG read-through by Sec to form the expected FDH (FdxG) and UGSC-motif protein products (Figure 2A). Crude cell extracts were then resolved by SDS-PAGE; the proteins in the gel slices corresponding to 140 and 50 kDa were trypsinized for subsequent analysis by liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS). The 140 kDa gel slices from G. obscurus and B. saxobsidens harbored full-length FdxG, while the 50 kDa gel slice from G. obscurus contained a UGSC-motif protein (Figure 2A & Table S4).
Like E. coli, the G. obscurus genome [13b] encodes a Sec incorporation machinery consisting of the selA, selB, selC and selD genes. In bacteria, SelD produces the Se donor selenophosphate, SelA converts Ser-tRNASec (the tRNA is the product of the selC gene) to Sec-tRNASec, and SelB carries Sec-tRNASec to the ribosome in a SECIS-dependent manner. To test their functionality, the E. coli ΔselABC ΔfdhF strain ME6 was complemented with the G. obscurus selABC genes. The product of the E. coli fdhF gene is the selenoenzyme FDHH (Table S1), whose activity requires Sec at position 140; if Sec is replaced by Cys, the activity drops 300-fold. FDHH is readily detected by the reduction of benzyl viologen, which results in a purple color. To serve as a reporter, the plasmid-encoded E. coli fdhF gene transformed into strain ME6 was modified to have a TAG codon in position 140, followed by a G. obscurus-type SECIS element, leading to an FDHH variant with two amino acid changes (see Figure 2B). Expression of the G. obscurus selABC genes in this modified strain produced active FDHH (Figure 2B). These data confirm that in this E. coli FDHH variant the UAG140 codon is recoded to Sec.
UGU (Cys) recoding was confirmed in Aeromonas salmonicida subsp. pectinolytica 34mel, the type strain of this subspecies of the γ-proteobacterium A. salmonicida. Unlike other Aeromonas species, the pectinolytica subspecies and strain Y577 pair tRNASecACA with a UGU Cys codon in fdhF. In addition to the in vivo FDH activity in pectinolytica cells (Figure 2C), their anaerobic metabolic labeling with 75Se produced radioactive Sec-containing FDHH (Figure 2D). We confirmed Sec incorporation encoded by UGU140 in FDHH by overexpressing the protein from a plasmid (Figure 2E) and by LC-MS/MS analysis (Table S4). The recoded selenopeptide (LC retention time 24.82) had the correct mass (Figure S9) and the appropriate secondary fragmentation pattern (Figure 2F). The co-eluting Cys- and Sec-peptides were detected through their different masses with ion intensities of ~100:1, respectively (Figure S9A). The mass peaks of the Sec-peptides were absent in protein samples obtained from pectinolytica cells expressing an fdhF variant lacking the SECIS element (Table S4). Thus, the Cys140 codon of the fdhF gene is translated as Sec in a SECIS-dependent manner in A. salmonicida subsp. pectinolytica 34mel.
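The mass difference that separates co-eluting Cys- and Sec-peptides can be estimated from elemental masses. The sketch below is illustrative only and is not the analysis used in this study: it assumes the two peptides differ solely by the exchange of one sulfur atom for one selenium atom (identical sequence, identical modifications), and it uses standard monoisotopic masses of the most abundant isotopes, 32S and 80Se. The function names are hypothetical.

```python
# Standard monoisotopic masses in Da (most abundant isotopes).
MONO_32S = 31.972071
MONO_80SE = 79.916522


def sec_vs_cys_mass_shift() -> float:
    """Mass gained when one S atom is replaced by one Se atom (Da).

    Assumption: the peptides are otherwise identical, so all other
    atoms cancel out of the difference.
    """
    return MONO_80SE - MONO_32S


def sec_vs_cys_mz_shift(charge: int) -> float:
    """Expected m/z offset between the Sec- and Cys-peptide ions at a given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return sec_vs_cys_mass_shift() / charge


for z in (1, 2, 3):
    print(f"z = {z}: Sec-peptide appears {sec_vs_cys_mz_shift(z):.4f} m/z above the Cys-peptide")
```

Selenium has several abundant isotopes, so a Sec-peptide also shows a characteristic isotope envelope; the single-isotope shift computed here is only the offset of the 80Se peak.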
UGC Cys→Sec recoding by tRNASecGCA may be a common trait in the Burkholderiales. In a long metagenomic Burkholderiales contig the selC gene is flanked by selB and selA genes, and the selABC operon is located next to a formate dehydrogenase (fdoG) gene whose active site UGC Cys codon is followed by a putative SECIS element (Figures 1B right panel & 2G). The E. coli ΔselABC strain was complemented with the Burkholderiales contig selAB and selC-opal variant, and an E. coli fdhF variant harboring the Burkholderiales contig SECIS element. This strain produced active FDHH (purple color) (Figure 2H, the 2nd row), whereas a variant with an inactive SECIS element (carrying a G25C mutation) did not form FDHH and the cells were colorless (Figure 2H, the 4th row). In combination with the Burkholderiales contig selC, we changed UGA140 to UGC in the chimeric fdhF variants that carried functional or inactive SECIS elements. Because the FDHH Cys140 enzyme itself produced a purple color (Figure 2H, the 1st & 3rd rows), 75Se labeling was used to demonstrate that the functional SECIS element led to a clear signal (Figure 2H). Thus, the Burkholderiales contig selA, selB, tRNASecGCA, and SECIS enabled UGC recoding in E. coli.
The metagenomic tRNASec variants that recognize other stop and sense codons were also tested for Sec reassignment. We selected one representative metagenomic tRNASec species for each anticodon type and expressed each in an E. coli ΔselC ΔfdhF strain, together with the E. coli fdhF variant that carries the cognate codon at position 140. Surprisingly, all of the tested tRNASec species except tRNASecUCC recoded the respective codons for Sec, as they supported the expression of active FDHH in their host E. coli cells (Figure S10). It should be mentioned that GGA, the codon read by tRNASecUCC, was also poorly recoded in our earlier Sec recoding strategy. The different recoding efficiencies may result from distortions of the ideal SECIS element structure by the nature of the upstream codon. In light of these results we believe that these tRNASec species may be used for recoding sense codons in the organisms they originate from.
What about eukaryotic organisms? Although we found 9 tRNASec variants of algal origin (2 are shown in Figure S7), they need further validation, because they are almost identical to canonical tRNASec species. A similar search of 92 mammalian genomes (215 Gbp) and of the Drosophila melanogaster genome (139 Mbp) showed no exception to the use of UGA as the Sec codon. Whether this is related to the necessity of selenoproteins in high-level redox signaling pathways or is due to the sophisticated backup systems remains to be investigated. However, in the lower eukaryote Euplotes crassus UGA serves both as a Cys codon and as a SECIS-dependent Sec codon.
Natural reassignment of sense codons has not been seen in bacteria, but it is known in mitochondrial genomes, where a particular codon lost its original assignment and now leads to insertion of another amino acid. Our case here is different: Sec insertion is mediated by a SECIS element and thus gives rise to dual use of a codon that otherwise specifies another amino acid (through pairing with tRNASec variants carrying the proper anticodon).
What might account for this facile recoding to Sec? It is pertinent to note that Sec incorporation is different from that of all other amino acids; it is facilitated by its own ‘orthogonal’ system consisting of a different elongation factor (SelB), a required SECIS RNA element, a structurally unusual tRNA (tRNASec), a dual meaning stop codon (UGA), and the use of release factor 2. Therefore, Sec recoding events may not have as general an effect on the protein translation machinery as one might expect from recoding canonical sense codons. The ease of Cys to Sec recoding may be a consequence of the often desirable properties of selenoenzymes and selenoproteins with novel redox functions and increased enzyme activity [4, 26], while still allowing the expression of useful Cys-proteins and Cys-enzymes. Our finding of facile Sec recoding also opens our minds to the possible existence of other coding schemes. It also underscores the limitations of the current computational programs to predict selenoproteins from genome sequences, as these algorithms rest on UGA as the sole Sec codon.
Overall, our approach provides new evidence of a limited but unequivocal plasticity of the genetic code, whose secrets still lie hidden in the majority of unsequenced organisms.
We thank Hans Aerni and Jesse Rinehart for advice on LC-MS/MS and Jean Kanyo (Yale University) for the dedicated efforts on the MS analyses. We also thank Andreas Brune, Filipa Gody-Vitorino, Hans-Peter Klenk, Ryan Lynch, Katherine McMahon, Daniel Marcus, William Mohn, Len Pennacchio, and Ameet Pinto for permission to use unpublished sequence data produced through the DOE-JGI’s community sequencing program. We are grateful to Patrick O’Donoghue, Oscar Vargas-Rodriguez, and Jiqiang Ling for enlightening discussions and to Daniel Drell and Robert Stack for encouragement. This work was supported by grants from the National Institute for General Medical Sciences (GM22854 to D.S.) and from the Division of Chemical Sciences, Geosciences, and Biosciences, Office of Basic Energy Sciences of the Department of Energy (DE-FG02-98ER20311 to D.S.; for funding the genetic experiments). The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, was supported under Contract No. DE-AC02-05CH11231.
Supporting information for this article is given via a link at the end of the document.
Dr. Takahito Mukai, Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520 (USA)
Dr. Markus Englert, Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520 (USA)
Dr. H. James Tripp, Department of Energy Joint Genome Institute (DOE JGI), Walnut Creek, CA 94598 (USA)
Dr. Corwin Miller, Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520 (USA)
Dr. Natalia N. Ivanova, Department of Energy Joint Genome Institute (DOE JGI), Walnut Creek, CA 94598 (USA)
Dr. Edward M. Rubin, Department of Energy Joint Genome Institute (DOE JGI), Walnut Creek, CA 94598 (USA)
Dr. Nikos C. Kyrpides, Department of Energy Joint Genome Institute (DOE JGI), Walnut Creek, CA 94598 (USA)
Prof. Dieter Söll, Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520 (USA). Department of Chemistry, Yale University, New Haven, CT 06520 (USA)