Recoding a stop codon to an amino acid may afford orthogonal genetic systems for biosynthesizing new protein and organism properties. Although reassignment of stop codons has been found in extant organisms, a model organism is lacking to investigate the reassignment process and to direct code evolution. Complete reassignment of a stop codon is precluded by release factors (RFs), which recognize stop codons to terminate translation. Here we discovered that RF1 could be unconditionally knocked out from various Escherichia coli strains, demonstrating that the reportedly essential RF1 is generally dispensable for the E. coli species. The apparent essentiality of RF1 was found to be caused by the inefficiency of a mutant RF2 in terminating all UAA stop codons; a wild type RF2 was sufficient for RF1 knockout. The RF1-knockout strains were autonomous and unambiguously reassigned UAG to encode natural or unnatural amino acids (Uaas) at multiple sites, affording a previously unavailable model for studying code evolution and a unique host for exploiting Uaas to evolve new biological functions.
Although the canonical genetic code is preserved in almost all organisms, small deviations have been discovered, including the reassignment of sense codons from one amino acid to another and the reassignment between stop and sense codons.1,2 Stop codons are decoded by class I release factors (RFs).3 Whereas eukaryotes and archaea use a single RF to recognize all three stop codons,4,5 bacteria use two: RF1 is specific for UAA/UAG, and RF2 is specific for UAA/UGA.6 It is unknown why there are two class I RFs in bacteria while a single class I RF is sufficient for organisms from the other two domains. The process for stop codon reassignment and its potential association with RF evolution are also unclear. Natural code evolution occurs over millions of years. Extant organisms harboring altered genetic codes are at the end-point of the code evolution. There are no records of the initial causes of or responses to such altered genetic codes; further adaptations to and details of eventual fixation are completely unknown. To enable in-depth investigation of code change and any concurrent cellular adaptations in real time, it is necessary to generate a model organism that is able to undergo such evolutionary processes in the laboratory.
Synthetically recoding a genome may afford new properties to the organism through encoding unnatural amino acids (Uaas) and preventing cross-contamination with wild type life forms.7 For successful genome recoding, the target codon must be reassigned to the new meaning with high efficiency and without ambiguity. An attractive route is to reassign the UAG stop codon to a Uaa in bacteria. Orthogonal tRNA/synthetase pairs have been engineered to incorporate Uaas into proteins in response to UAG,8−12 yet the presence of RF1 makes the meaning of UAG ambiguous, being a stop signal and a Uaa simultaneously. RF1 competition limits Uaa incorporation to low efficiency even at a single UAG site; the addition of even a second UAG codon decreases protein yields precipitously.13 Although Uaas can be incorporated at more than one site into a protein,14,15 such low UAG-encoding efficiency prevents effective use of Uaas at multiple sites to explore novel protein and organismal properties through directed evolution. In addition, the ambiguity of the UAG codon may hinder the eventual fixation of an altered genetic code to an organism, because protein products truncated at the UAG sites can interfere with normal protein functions and have detrimental effects on the host cell, thus preventing advantageous coding from being inherited and selected in directed evolution. To exclude Uaa incorporation at legitimate termination sites specified by UAG, endogenous UAG codons in the E. coli genome can be replaced with a synonymous UAA stop codon through genome engineering.16 However, for complete reassignment of UAG to a sense codon, a necessary and critical step is to knock out RF1 from the E. coli genome.
In some eukaryotes such as ciliates and green algae, the reassignment of a stop codon to a sense codon is accompanied by convergent changes in eRF1.2 For instance, the eRF1 of Tetrahymena restricts its recognition to UGA, and UAA/UAG are reassigned to Gln; the eRF1 of Euplotes recognizes only UAA/UAG as stop codons, and UGA is used to encode Cys.17,18 In bacteria, Mycoplasma species have lost the RF2 gene, and the UGA codon encodes Trp instead.19 However, Mycoplasma are obligatory pathogens with highly reduced genomes. To date, no free-living bacterium has been found lacking either RF1 or RF2.19 For E. coli, RF1 has been reported to be essential,20 and only conditionally lethal knockouts have been described.21,22 Recently we managed to knock out RF1 from a special E. coli strain that has a reduced genome and a mutated RF2 gene.13 Herein we discovered that the dispensability of RF1 is a general property of wild type E. coli, arguing against the paradigm that RF1 is essential. We revealed the underlying mechanism for preventing RF1 knockout, and generated autonomous RF1 deletion strains valuable for studying code evolution and for genome recoding.
We attempted to replace the RF1-encoding gene, prfA, with the chloramphenicol acetyltransferase gene in a variety of E. coli strains (Figure 1) using the established λ red recombinase-based homologous recombination method.23 E. coli K-12 and B are the two progenitors from which most E. coli strains are derived.24 Reported attempts to knock out RF1 have used only E. coli K-12 strains and were unsuccessful.20,25 We initially attempted to delete RF1 in the three common K-12 strains MG1655, DH10β, and HT115. Knockout of RF1 was assayed by genomic PCR amplifying the prfA locus (Figure 1A). Consistent with previous reports, we could not knock out RF1 in any of these K-12 strains (Figure 1B).
We next attempted the RF1 knockout with two other K-12 strains that contain alterations related to translational termination. The BP5α strain harbors a glutaminyl amber suppressor tRNA, making the UAG coding ambiguous for either a stop signal or Gln. This strain would test if stop codon ambiguity is a factor to prime RF1 removal. The MDS42 strain has nearly 700 nonessential genes deleted26 and was used to assess if a reduced genome size and termination load could assist with RF1 removal. However, no RF1 knockouts were generated from these two strains either, suggesting that amber suppression and a minimal genome are insufficient for RF1 removal in K-12 derivatives.
All K-12 strains contain a peculiar A246T mutation in the RF2-encoding gene prfB, lowering the release activity of RF2 for UAA 5-fold.27,28 The UAA codon accounts for the termination of ~64% of E. coli genes.29 We reasoned that the A246T mutation might severely impair the ability of RF2 to recognize all UAA stop codons upon removal of RF1 and thus prevent the RF1 knockout. We therefore tested three common E. coli B strains because B strains encode wild type RF2 that contains Ala246.28 Indeed, RF1 knockout was successful with all three B strains tested; REL606, BL21, and BL21(DE3) were used to generate the knockout strains CW1.0, CW2.0, and JX1.0, respectively (Figure 1A). In addition, to determine if the A246T mutation in RF2 prevents RF1 removal in K-12 strains, we reverted the A246T mutation to Ala in the K-12 strain DH10β to generate the DH10βf strain. This strain also permitted the direct knockout of RF1 (Figure 1B). These results indicate that RF1 is not essential in E. coli, in contrast to the previous conclusions.20,25
Full genomic sequencing was performed on RF1 knockout strains JX1.0, CW1.0, and CW2.0 and compared to their respective parental strains.30,31 The RF1 deletion was verified in all cases. For CW1.0 and CW2.0, no other mutations were found throughout the genome. For JX1.0, only seven additional single nucleotide polymorphisms (SNPs) were found (Supporting Table S1). Six of these SNPs are silent mutations in two genes of phage origin. None of the SNPs correspond to known mutations that complement an RF1 deficiency.32−35 These results indicate that RF1 was knocked out from the parental E. coli strains without incurring compensatory mutations.
What would happen to the UAG site during protein translation in the absence of RF1? We mutated one and three tyrosine codons to TAG in the enhanced green fluorescent protein (EGFP) gene36 to generate the 1-TAG and 3-TAG EGFP reporters, respectively, and tested their expression in BL21(DE3) and the RF1 knockout strain JX1.0 (Figure 2A). As expected, BL21(DE3) cells expressing either the 1- or 3-TAG reporter showed no EGFP fluorescence, indicating that RF1 terminates EGFP translation at UAG normally. Surprisingly, JX1.0 showed a high level of fluorescence from the 1-TAG EGFP mutant and a small but reproducible level of fluorescence from the 3-TAG EGFP mutant (Figure 2B). These data suggest that some endogenous tRNAs can suppress the UAG codon in JX1.0.
To reveal the identity of these tRNAs, we purified EGFP protein expressed with the 1-TAG reporter in JX1.0 and identified the amino acid incorporated at the UAG site using mass spectrometry (Figure 2C). Tyr, Gln, and Trp were found at the UAG site through Fourier transform mass spectrometric analysis of the tryptic fragments of EGFP. To ensure that this is not from a suppressor mutation in the endogenous tRNAs, chromosomal tRNATyr, tRNAGln, and tRNATrp were resequenced, and all were confirmed to be wild type in JX1.0. The anticodons of these three tRNAs have only a single base mispairing with UAG, so these tRNAs can weakly misread the UAG codon.37 Misreading of UAG by these tRNAs was not obvious in the presence of RF1 in BL21(DE3) but became marked in JX1.0 after RF1 removal.
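The single-mismatch argument can be illustrated at the codon level. The following is a minimal sketch, assuming the standard genetic code and ignoring wobble pairing and tRNA modifications; the function name is hypothetical and is not part of the reported methods. It lists the sense codons that differ from UAG at exactly one position, whose tRNAs would therefore pair with UAG with a single mismatch.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code listed in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_mismatch_neighbors(codon):
    """Return {sense codon: amino acid} for codons one base away from `codon`."""
    neighbors = {}
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            candidate = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[candidate]
            if aa != "*":  # skip the other stop codons
                neighbors[candidate] = aa
    return neighbors


neighbors = single_mismatch_neighbors("UAG")
for codon, aa in sorted(neighbors.items()):
    print(codon, aa)
```

The resulting set includes the Tyr, Gln, and Trp codons, consistent with the residues detected at the UAG site. It also includes codons for other amino acids, so a single mismatch alone does not predict which tRNAs misread UAG in vivo.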
The unconditional knockout of RF1 opens up the possibility of reassigning the meaning of the UAG codon completely, so that the UAG codon does not ambiguously encode a stop signal and an amino acid simultaneously. We attempted to recruit UAG for an amino acid in JX1.0, so as to mimic a possible code evolution pathway implicated by RF constriction in stop codon recognition in Tetrahymena, Euplotes, and other eukaryotes.17,18 An orthogonal tRNA/synthetase pair, the tRNACUATyr/LW1RS pair, was introduced into JX1.0. This pair does not crosstalk with endogenous E. coli tRNA/synthetase pairs and functionally couples with E. coli’s protein translational machinery.8,38 The tRNACUATyr decodes the UAG codon specifically through its anticodon CUA, and LW1RS is engineered to charge the tRNACUATyr with the Uaa p-acetyl-l-phenylalanine (pActF).39 An EGFP gene containing 1-, 2-, 3-, or 10-TAG codons was co-expressed with tRNACUATyr/LW1RS in a single plasmid pAIO-EGFP(n-TAG) (Figure 3A).
EGFP expression was assayed by Western blotting (Figure 3B) and in-cell fluorescence (Figure 3C). When pActF was not added to the growth media, BL21(DE3) cells expressed a small amount of full-length EGFP with the 1-TAG reporter only, suggesting that the tRNACUATyr/LW1RS pair incorporates a natural amino acid with very low efficiency in the absence of the cognate pActF. No full-length EGFP was detected for the 2-, 3-, or 10-TAG reporters. In contrast, JX1.0 expressed full-length EGFP for 1-, 2-, 3-, and 10-TAG mutants, although the efficiency decreased with the number of UAG codons. We then purified the EGFP protein from JX1.0 expressing pAIO-EGFP(1-TAG) in the absence of pActF (Figure 3D) and analyzed it with mass spectrometry. Consistently, Tyr, Gln, and Trp were found at the UAG site as observed in Figure 2C, confirming misreading by endogenous tRNAs in JX1.0.
When pActF was supplied in the growth medium, BL21(DE3) cells showed efficient expression of the 1-TAG EGFP mutant, but EGFP expression decreased precipitously with the addition of each UAG codon due to the competition from RF1 termination. The use of 3 UAG codons in EGFP virtually abolished protein expression, and no protein could be detected at all in the 10-TAG mutant. In stark contrast, the RF1-deletion strain JX1.0 showed high expression of EGFP in all mutants in the presence of pActF, as indicated by Western (Figure 3B) and in-cell fluorescence (Figure 3C). EGFP proteins were purified with yields of 8.5 (±0.4), 7.1 (±0.4), 9.7 (±0.3), and 1.2 (±0.1) mg/L for the 1-, 2-, 3-, and 10-TAG EGFP samples, respectively (Figure 3D). There was no decrease in incorporation efficiency when the UAG codon was increased from 1 to 2 or 3, indicating that UAG is changed to a sense codon in JX1.0. The drop off in yield in the 10-TAG EGFP sample is likely because 10 pActF interferes with EGFP folding and/or stability, which is corroborated by the lack of fluorescence from 10-pActF EGFP (Figure 3C).
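The contrast between the two strains can be pictured with a simple competition model. The sketch below is an illustrative assumption, not an analysis performed in this work: it treats each UAG site as an independent contest between suppression by the orthogonal tRNA and termination by RF1, so the full-length fraction falls geometrically with the number of sites. The function name and the rate values are hypothetical and chosen only for illustration.

```python
def full_length_fraction(n_sites, suppression_rate, termination_rate):
    """Fraction of translation events that read through all n_sites UAG codons.

    Assumes each UAG site is an independent competition between suppression
    and termination (an illustrative simplification).
    """
    total = suppression_rate + termination_rate
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    per_site = suppression_rate / total
    return per_site ** n_sites


# Hypothetical rates: equal suppression and termination when RF1 is present,
# no termination at UAG when RF1 is absent.
for n in (1, 2, 3, 10):
    with_rf1 = full_length_fraction(n, suppression_rate=1.0, termination_rate=1.0)
    without_rf1 = full_length_fraction(n, suppression_rate=1.0, termination_rate=0.0)
    print(f"{n:2d} UAG sites: with RF1 {with_rf1:.4f}, without RF1 {without_rf1:.4f}")
```

Under this model, removing the termination term makes the full-length fraction independent of the number of UAG codons, which matches the flat yields observed for the 1-, 2-, and 3-TAG samples in JX1.0. The model does not capture folding or stability effects such as those proposed for the 10-TAG sample.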
Although JX1.0 had no RF1 to terminate protein translation at the UAG codon, a very small amount of protein that seemed to be truncated at the UAG site was detected on the Western blot but could not be detected with mass spectrometry (Figure 3B). We think that the decoding of the UAG codon as the Uaa pActF may not be as efficient as the decoding of canonical sense codons as natural amino acids. First, the pActF-specific LW1RS is less active than the wild type synthetases39,40 and thus may generate fewer aminoacylated orthogonal tRNAs. Second, the orthogonal tRNACUATyr has not been evolutionarily optimized for UAG decoding, whereas many natural tRNAs are posttranscriptionally modified through evolution for efficient codon recognition.41 Third, natural aminoacyl-tRNAs have also been fine-tuned for binding to the elongation factor Tu and the ribosome to achieve efficient translation,42,43 whereas the pActF-charged orthogonal tRNA has not been optimized for binding to either. All of these factors can lead to less efficient decoding of UAG as pActF and ribosome drop-off during translation, resulting in truncated protein products. The less efficient decoding of UAG to pActF can also have a cumulative effect that accounts for the decrease of protein yield for the 10-TAG mutant. Nonetheless, the underlying mechanism for generating these truncated products in the absence of RF1 is intriguing and warrants further studies.
To identify the amino acid incorporated in response to UAG in JX1.0, we purified EGFP expressed in JX1.0 using pAIO-EGFP(3-TAG) in the presence of pActF and analyzed it with mass spectrometry (Figure 3E). The monoisotopic masses of the tryptic peptides clearly showed that pActF was incorporated at all three UAG sites. No peaks corresponding to the incorporation of other amino acids at the UAG sites were detected. The precursor ions of the peptides containing the UAG sites were individually fragmented with an ion trap mass spectrometer. The fragment ion masses were unambiguously assigned, confirming that pActF was incorporated at the UAG sites (Supporting Figure S1). These results indicate that misreading of UAG in JX1.0 by endogenous tRNAs was outcompeted by the tRNACUATyr/LW1RS pair, which specifically decodes UAG as pActF.
To evaluate the initial response of E. coli to RF1 deletion and subsequent UAG reassignment, we assessed the health of JX1.0 using a growth assay. JX1.0 was healthy, cloneable, and stable in culture; no changes in phenotype or genotype were observed after growing over 200 generations. Compared to parental BL21(DE3), JX1.0 grew more slowly (Figure 4). The doubling time for JX1.0 was 91 min compared to 26 min for BL21(DE3). This difference suggests that RF1 knockout creates burdens for cells due to the lack of proper termination and relatively weak missuppression of UAGs by near-cognate tRNAs, yet the effect is not lethal as perceived before. Introduction of the pAIO plasmid expressing the orthogonal tRNACUATyr/LW1RS pair in the absence of pActF increased the doubling time of JX1.0 and BL21(DE3) to 135 and 41 min, respectively, possibly due to a general toxicity of expressing unacylated tRNAs. The addition of pActF to the growth medium together with the expression of tRNACUATyr/LW1RS further increased the doubling time to 253 min for JX1.0, whereas BL21(DE3) was not affected. This dramatic slowing of JX1.0 growth is most likely from the efficient incorporation of pActF at UAG positions throughout the proteome, reflecting the pressure generated by UAG reassignment to a sense codon. In contrast, BL21(DE3) expresses RF1, which competes at UAG positions for termination, thereby mitigating this pressure.
We show here that RF1 is nonessential for the E. coli species, in contrast to the previous conclusion that RF1 is essential. By reverting Thr246 to Ala and removing the autoregulation of RF2 expression, we recently knocked out RF1 from an engineered E. coli strain MDS42 that has a reduced genome.13 Because ~700 nonessential genes are deleted in MDS42 and multiple mutations are introduced into its RF2 gene, it is difficult to conclude what factors specifically contribute to RF1 knockout and whether RF1 dispensability is a general property of E. coli. Moreover, MDS42 is not a wild type E. coli and not suitable for the investigation of code evolution and adaptation since a variety of genes have been removed. By studying various nonengineered E. coli strains, we discovered here that a wild type RF2 is sufficient and necessary for RF1 removal without incurring compensatory mutations. As no other mutations are artificially introduced into these strains, our results clearly demonstrate that RF1 is nonessential for wild type E. coli. The successful RF1 knockout in multiple strains indicates that RF1 dispensability is a general property of E. coli without peculiar requirements.
Unconditional RF1 removal can be explained by the usage of UAG in the E. coli genome. UAG is the least-used stop codon in E. coli, terminating only ~7% of the total genes.29 Among 302 essential genes in E. coli,44 only 7 genes end with the UAG codon, and these 7 genes all have either a UGA or UAA stop codon a short distance downstream (Supporting Figure S2). If the UAG is read through, this second, different stop codon can ensure the termination of these proteins with only a small number of amino acids appended. The additional amino acids may not completely disable the function of these proteins, allowing E. coli to survive as observed in this study. The few UAG-ending essential genes and the presence of double stop codons can account for the nonessentiality of RF1. Consistently, we show here that a wild type RF2 but not the weakened A246T mutant permitted RF1 knockout, suggesting that efficient termination of the dominant UAA and UGA stop codons is required for E. coli survival.
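The consequence of read-through at a UAG-ending gene can be sketched computationally. The example below is a minimal illustration, assuming the input is the DNA immediately 3′ of the TAG stop codon and that only TAA and TGA still act as stops once RF1 is absent; the function name and the test sequence are hypothetical and are not taken from Supporting Figure S2.

```python
def readthrough_extension(downstream_dna, active_stops=("TAA", "TGA")):
    """Count codons translated after a read-through stop until the next active stop.

    `downstream_dna` starts at the first base after the read-through TAG codon,
    so scanning in steps of three stays in the original reading frame.
    Returns None if no in-frame active stop codon is found.
    """
    seq = downstream_dna.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in active_stops:
            return i // 3
    return None


# Hypothetical 3' region: GCT AAA TGA. The out-of-frame "TAA" at bases 3-5 is ignored.
extra_residues = readthrough_extension("GCTAAATGA")
print(extra_residues)
```

A small return value corresponds to the situation described above, in which a second stop codon a short distance downstream limits the extension to a few appended amino acids.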
Knockout of RF1 provides novel insights on the evolution of RFs and the genetic code. A major distinction of bacteria from eukaryotes and archaea in protein translation is that bacteria use two different RFs to recognize the stop codons, whereas a single RF is sufficient for organisms in other domains. This difference has served as evidence that the evolution of RF and translation termination between bacteria and eukaryotes/archaea is nonconserved.45,46 Our results show that an autonomous E. coli strain with a single class I RF can be generated. The viability of such a bacterium blurs the apparent two-vs-one RF distinction between prokaryotes and eukaryotes/archaea, suggesting that RF evolution in the three domains of life might be more similar than current differences suggest. In addition, certain eukaryotes have been found to reassign a stop codon to a sense codon and restrict the recognition of eRF1 to the rest of the stop codons.2,17,18 It is unclear whether stop codon reassignment caused eRF1 restriction or vice versa during the code evolution. We show here that bacteria, in addition to eukaryotes found in nature, can also undergo such stop codon reassignment accompanied by RF changes. Moreover, our results demonstrate that RF changes can synthetically drive the reassignment of a stop codon to an amino acid, providing experimental evidence for this hypothetical evolutionary pathway.
An autonomous RF1 knockout bacterium will enable new research for understanding code evolution. A major challenge in studying the evolution of the genetic code is that many questions are out of reach of direct experimentation.47 No organisms exist containing a primitive or intermediate genetic code for comparison, and natural code evolution takes millions of years. The RF1 knockout strain reported here now affords a previously unavailable model organism to study otherwise intractable questions on the codon reassignment process in real time. We show here that RF1 knockout and the introduction of an orthogonal tRNA/synthetase pair completely reassigned UAG from a stop signal to an amino acid, setting an initial stage for a bacterium to adopt this altered genetic code. Such a stage had been buried in the recess of evolution and experimentally inaccessible. Using the RF1 knockout strain we may now be able to address questions such as whether the altered code can eventually be fixed, how long this process would take, and what physiological changes are necessary for such adaptations. Answers to these questions would not only help us understand the fundamentals of code evolution but also provide rare empirical data to guide code optimization for specific synthetic purposes. In addition, it is hypothesized that the genetic code started with a set of primitive amino acids and that others were added until the total 20 was reached.48,49 It remains mysterious whether the addition of new amino acids to the repertoire affords evolutionary advantage to drive code expansion. This challenging fundamental question can now be investigated experimentally with strains reported here.
Our finding will guide genome recoding to reassign stop codons successfully. It is promising to generate an E. coli through genome engineering that has endogenous UAG codons replaced with the synonymous UAA stop codon.16 However, to reach the final goal of setting aside the UAG codon to encode a Uaa unambiguously, RF1 must be removed from the host strain. MG1655, a K-12 derivative strain, has been used in the current genome engineering effort,16 which will not permit RF1 knockout on the basis of our findings here. Our results provide a solution for this impasse: reversion of the Thr246 to Ala in the RF2 gene should enable RF1 knockout and clean recoding, which can be conveniently applied at any genome engineering stage before the final RF1 gene removal.
An autonomous RF1 knockout bacterium will afford a unique host for synthesizing and evolving new protein functions and biological properties through Uaa exploitation. To date, directed laboratory evolution of new biosynthesis ability using genetically encoded Uaas has not been feasible due to two demanding requirements: simultaneous Uaa incorporation at multiple sites and an autonomous host. Multisite Uaa incorporation enables synergy and maximizes the exploration of protein sequence space, and a self-sustaining host with such ability is necessary for experimental evolution. The RF1 deletion strains we generated here fulfill both requirements and thus should open the field for Uaa-based directed evolution. Since a variety of new functionalities can be introduced into proteins through Uaas, organisms containing an expanded genetic repertoire have greater potential for novel biosynthesis via directed evolution, such as biorenewable production of chemicals and fuels.
Knockout of the prfA gene was attempted using a chloramphenicol acetyltransferase (cat) cassette via established procedures.23 Briefly, 51 nucleotide overhangs homologous to the regions immediately 5′ and 3′ of prfA were appended to the cat gene. One microgram of this cassette was electroporated into various strains harboring the pKD46 plasmid, which expresses the phage λ red recombinase. Chloramphenicol-resistant clones were screened for knockout by genomic PCR using primers 5′-GGA TAA CGA ACG CCT GAA TA-3′ and 5′-TCC AGC AGG ATT TCA GCA TC-3′. Positive clones were verified by DNA sequencing and genomic sequencing.
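The cassette design can be sketched in silico. The following is a minimal illustration, assuming a linear toy sequence in place of the genome and a random placeholder in place of the cat gene; the function names, coordinates, and sequences are hypothetical and do not correspond to the actual prfA locus or to the software used in this work.

```python
import random


def build_knockout_cassette(genome, gene_start, gene_end, marker, arm_len=51):
    """Flank `marker` with arm_len-nt arms copied from either side of the target gene."""
    if gene_start < arm_len or gene_end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    left_arm = genome[gene_start - arm_len:gene_start]
    right_arm = genome[gene_end:gene_end + arm_len]
    return left_arm + marker + right_arm


def recombine(genome, cassette, arm_len=51):
    """Replace the region between the two homology arms with the cassette."""
    left_arm, right_arm = cassette[:arm_len], cassette[-arm_len:]
    start = genome.find(left_arm)
    if start == -1:
        raise ValueError("left homology arm not found in genome")
    end = genome.find(right_arm, start + arm_len)
    if end == -1:
        raise ValueError("right homology arm not found in genome")
    return genome[:start] + cassette + genome[end + arm_len:]


# Toy data (hypothetical): a 600-nt "genome" with a target gene at positions 200-400.
rng = random.Random(0)
toy_genome = "".join(rng.choice("ACGT") for _ in range(600))
toy_marker = "".join(rng.choice("ACGT") for _ in range(90))
cassette = build_knockout_cassette(toy_genome, 200, 400, toy_marker)
edited = recombine(toy_genome, cassette)
print(len(toy_genome), len(cassette), len(edited))
```

Because the edited locus differs in length from the parental locus whenever the marker and the target gene differ in size, primers flanking the locus can distinguish knockout clones from parental clones by amplicon size, which is the basis of the genomic PCR screen described above.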
DH10βf was constructed from DH10β as follows to revert the Thr246 in prfB to Ala. A knockin cassette was first generated containing the prfB gene from BL21(DE3) transcriptionally coupled to a kanamycin resistant (KanR) cassette. The KanR cassette was flanked on the 3′ end by a 51-nucleotide region homologous to the 3′ end of the endogenous prfB gene. One microgram of this cassette was electroporated into DH10β harboring the pKD46 plasmid. KanR clones were screened using PCR and sequence verified for mutation of position 246 to Ala.
All plasmids were assembled by standard cloning methods and confirmed by DNA sequencing. pAIO plasmids containing an EGFP gene with different TAG codons were synthesized as follows: EGFP cassettes with an N-terminal His×6 tag and TAG codons at various positions were created using overlapping PCRs. The following sites were used: Y182 for 1-TAG; Y39 and Y182 for 2-TAG; Y39, Y182, and Y151 for 3-TAG; Y39, K101, D102, E132, D133, K140, E172, D173, D190, and V193 for 10-TAG. These cassettes were first cloned into pBP-Blunt (Biopioneer, San Diego, CA) and then digested and ligated into pBK-AIO vectors containing the orthogonal tRNACUATyr and LW1RS39 using SpeI and BglII restriction sites. pBAD vectors containing the 1- or 3-TAG EGFP genes were constructed by inserting the EGFP cassettes into the pBAD/His vector (Invitrogen Life Technologies) using the NcoI and HindIII restriction sites.
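The codon substitutions can be planned in silico before designing the overlapping PCR primers. The sketch below is a minimal illustration, assuming 1-based residue numbering in which the start codon is residue 1; the function name and the toy coding sequence are hypothetical. Residue numbers such as Y182 follow the numbering used in this work, so mapping them onto a particular EGFP construct (for example, one carrying an N-terminal tag) requires checking the offset, which this sketch does not do.

```python
def introduce_tag_codons(cds, positions):
    """Return `cds` with the codons at the given 1-based residue positions set to TAG."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds), 3)]
    for pos in positions:
        if not 1 <= pos <= len(codons):
            raise ValueError(f"position {pos} is outside the coding sequence")
        codons[pos - 1] = "TAG"
    return "".join(codons)


# Toy coding sequence (hypothetical): Met-Tyr-Lys-stop.
toy_cds = "ATGTATAAATAA"
one_tag = introduce_tag_codons(toy_cds, [2])
print(one_tag)
```

In practice, the same call with a list of several positions would yield multi-TAG designs analogous to the 2-, 3-, and 10-TAG reporters.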
In-cell fluorescence intensity was determined using a FluoroLog-3 (Horiba Jobin Yvon). E. coli colonies were picked and grown in 2xYT medium for 16 h with or without Uaas. Cells were washed twice with PBS buffer and diluted in PBS to an OD600 of 0.1. The emission spectrum of EGFP was recorded from 503 to 560 nm using an excitation wavelength of 488 nm. The fluorescence intensity of each sample was compared using the intensity at the 511 nm emission peak.
A colony was picked for each E. coli strain and grown overnight in 2xYT medium with the appropriate antibiotics. Cells were normalized to an OD600 of 1 and diluted 1:50 in fresh 2xYT medium with antibiotics and 1 mM Uaa (when applicable). For BL21(DE3) strains, OD600 was then measured every 30 min for 10 h. For JX1.0 strains, OD600 was measured every 60 min for 48 h. Doubling times were calculated from the exponential growth phase in each strain.
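A doubling time can be estimated from exponential-phase readings by a log-linear fit. The following is a minimal sketch, assuming an ordinary least-squares fit of ln(OD600) against time; the fitting procedure is an assumption, since the exact calculation is not specified here, and the function name and data points are hypothetical.

```python
import math


def doubling_time(times_min, od600):
    """Estimate doubling time (min) from exponential-phase OD600 readings.

    Fits ln(OD600) = slope * t + intercept by least squares and returns
    ln(2) / slope.
    """
    if len(times_min) != len(od600) or len(times_min) < 2:
        raise ValueError("need at least two paired measurements")
    logs = [math.log(value) for value in od600]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    slope = sxy / sxx
    if slope <= 0:
        raise ValueError("culture is not growing over this interval")
    return math.log(2) / slope


# Synthetic readings (hypothetical) taken every 30 min from a culture that doubles every 30 min.
example = doubling_time([0, 30, 60, 90], [0.05, 0.10, 0.20, 0.40])
print(round(example, 1))
```

Only points from the exponential phase should be passed to such a fit, because lag-phase and stationary-phase readings flatten the slope and inflate the estimate.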
We thank W. Fischer for mass spectrometric analyses. This work was supported, in part, by the Howard Hughes Medical Institute and a grant from the Gordon and Betty Moore Foundation to J.R.E. L.W. acknowledges support from the Searle Scholar Program (06-L-119), Beckman Young Investigator Program, March of Dimes Foundation (#5-FY08-110), California Institute for Regenerative Medicine (RN1-00577-1) and National Institutes of Health (1DP2OD004744-01).
National Institutes of Health, United States
These authors contributed equally to this work.
Supporting methods for genome sequencing, Western analyses, protein purification, mass spectrometry, and supporting figures and table. This material is available free of charge via the Internet at http://pubs.acs.org.
The authors declare no competing financial interest.