We specifically sought genes within the yeast genome controlled by a non-conventional translation mechanism involving the stop codon. For this reason, we designed a computer program using the yeast database genomic regions, and seeking two adjacent open reading frames separated only by a unique stop codon (called SORFs). Among the 58 SORFs identified, eight displayed a stop codon bypass level ranging from 3 to 25%. For each of the eight sequences, we demonstrated the presence of a poly(A) mRNA. Using isogenic [PSI+] and [psi–] yeast strains, we showed that for two of the sequences the mechanism used is a bona fide readthrough. However, the six remaining sequences were not sensitive to the PSI state, indicating either a translation termination process independent of eRF3 or a new stop codon bypass mechanism. Our results demonstrate that the presence of a stop codon in a large ORF may not always correspond to a sequencing error, or a pseudogene, but can be a recoding signal in a functional gene. This emphasizes that genome annotation should take into account the fact that recoding signals could be more frequently used than previously expected.
It has been known since the 1980s that several mRNAs can be obtained from the same gene, by post-transcriptional modifications like editing and splicing (1–5). Alternatively, recoding events which act at the translational level can occur by subverting the normal decoding rules allowing the synthesis of two related proteins from the same mRNA (6). All of these events enhance the coding potential of complex genomes and may fill the gap between the number of genes and the number of polypeptides present in a given organism (7). In most cases, recoding events are used to graft a new biological function onto a protein. This is spectacularly illustrated in retroviruses, in which recoding is almost always necessary to produce the Gag-Pol polyprotein which bears, among others, the reverse transcriptase activity (8,9). In the vast majority of reported recoding events, either frameshifting or readthrough of stop codons is involved (6,10–12). Although the mechanisms are different, the end result is very similar, since both finish in the skipping of the natural stop codon and the synthesis of an extended protein.
Most genes controlled by recoding have thus far been observed in small autonomous genetic elements, such as RNA viruses and transposable elements, but very few in chromosomal genomes (13–19), with the exception of the Euplotes genome (20). This bias may be the result of a selective pressure favoring translational control in compact genomes. However, a simpler explanation could be that these events are more easily identifiable in small genomes, for which a large amount of information is available. In particular, the pattern of protein expression is often known and can be related to the nucleic acid sequence. This is obviously not the case for even the smallest chromosomal genomes, where there is a large knowledge gap between the nucleotide sequence and the pattern of protein expression. One may thus suppose that recoding events might be more frequent in large genomes than can be extrapolated from current data (10,21).
We have previously reported a computational analysis to identify readthrough-controlled genes in the yeast Saccharomyces cerevisiae, using a stop codon nucleotide context corresponding to a consensus readthrough motif (15). This allowed us to identify eight sequences, among which we have demonstrated that translational readthrough on the PDE2 gene induces a high instability of the Pde2p protein, which in turn affects the cAMP level in the cell. This mechanism could explain some physiological modifications associated with the presence of the yeast prion [PSI+] (22). Although this approach has been fruitful, one cannot exclude that we missed other readthrough events. Termination is still one of the least understood aspects of translation, especially in eukaryotes. In particular, the precise biochemical mechanisms involved remain to be elucidated, as was recently done in prokaryotes (23). In addition, other mechanisms like ribosome hopping/sliding can also lead, as does readthrough, to stop codon bypass (24,25).
In this study, we developed a strategy to identify genes in the yeast S.cerevisiae, whose expression is controlled by a stop codon bypass, without a priori knowledge of the mechanism involved. We used a computer program to determine the genomic regions where two adjacent open reading frames are separated simply by a unique stop codon, termed ‘semi-open reading frames’ (SORFs). A subset of 60 candidate regions was found. We then quantified the stop codon bypass efficiency of each. Eight regions showed a bypass efficiency 10-fold higher than background; of these, two corresponded to classical readthrough events and six to other mechanisms. These SORFs, showing a high level of stop codon bypass efficiency, were named ‘bypass of stop codon’ (BSC). Several of these SORFs are genes with known functions, which will allow further analysis of the physiological role of the recoding event. Overall, these results demonstrate that bypassing stop codons may be a more frequent event in eukaryotes than was thought.
The strains Y349 (MATa lys2Δ201 leu2-3,112 his3Δ200 ura3-52) and 74-D694 (MATa ade1-14 trp1-289 his3Δ200 leu2-3,112 ura3-52 [psi–] or [PSI+]) were used in this study. YNB (0.67% yeast nitrogen base; 2% glucose) missing the appropriate amino acids was used to prepare media for standard growth conditions. The 74-D694 [psi–] and [PSI+] strains carry the ade1-14 allele which corresponds to a nonsense UGA mutation. This allows direct visualization of the PSI state of the strain through the accumulation of a red pigment in the [psi–] and not in the [PSI+] strain. The PSI status of the strains was verified by the red or white color of the colonies they give and by routine determinations of readthrough efficiency directed by sensitive stop codons (26).
The pAC99 reporter plasmid has been previously described (15). Constructs were obtained by inserting a PCR fragment containing the stop codon into the cloning site, between the lacZ and luc genes in the plasmid pAC99. For readthrough measurements, an in-frame control was used (pAC-TQ) which allowed the production of 100% fusion protein (β-galactosidase-luciferase). The region including the inserted fragment was sequenced in the newly constructed plasmids.
The yeast strains were transformed with the reporter plasmids using the lithium acetate method according to Ito et al. (27). In each case, at least three transformants, cultured in the same conditions, were assayed. Cells were broken using acid-washed glass beads; luciferase and β-galactosidase activities were assayed in the same crude extract, as previously described (28). Readthrough frequency is defined as the ratio of luciferase activity to β-galactosidase activity. To establish the relative activities of β-galactosidase and luciferase when expressed in equimolar amounts, the ratio of luciferase activity to β-galactosidase from an in-frame control plasmid was taken as a reference. Readthrough frequency, expressed as percentage, was calculated by dividing the luciferase/β-galactosidase ratio obtained from each test construct by the same ratio obtained with the in-frame control construct (29).
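The normalization described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the software used in this study; the function names and the representation of each transformant as a (luciferase, β-galactosidase) pair are assumptions made for the example.

```python
from statistics import mean

def readthrough_percent(luc_test, bgal_test, luc_control, bgal_control):
    """Percent readthrough for one test construct.

    The luciferase/beta-galactosidase ratio of the test construct is divided
    by the same ratio measured with the in-frame control, then expressed as a
    percentage.
    """
    test_ratio = luc_test / bgal_test
    control_ratio = luc_control / bgal_control
    return 100.0 * test_ratio / control_ratio

def mean_readthrough_percent(test_pairs, control_pairs):
    """Average percent readthrough over several transformants.

    Both arguments are lists of (luciferase, beta_galactosidase) activity
    pairs. Averaging the control ratio first is an assumption of this sketch.
    """
    control_ratio = mean(luc / bgal for luc, bgal in control_pairs)
    return mean(100.0 * (luc / bgal) / control_ratio for luc, bgal in test_pairs)
```

Because both activities are measured in the same extract, dividing by β-galactosidase activity corrects for differences in expression level between transformants before the comparison with the in-frame control.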
Each SORF fragment was amplified from Y349 genomic DNA by PCR, using Pfu polymerase (Stratagene), and cloned into the pAC99 vector.
Total RNA was extracted, as described by Schmitt et al., from 5 ml of exponential yeast culture (30). Each RNA extraction was subjected to a DNA digestion with 10 U of RNase-free DNase I (Boehringer) at 37°C for 1 h. DNase I was inactivated by heating at 90°C for 5 min as recommended by the manufacturer. RNA was reverse-transcribed with oligo(dT) primer by AMV reverse transcriptase (Stratagene) for PCR amplification with Taq polymerase® (Amersham) in a GeneAmp 2600 thermocycler (Perkin Elmer). PCR fragments were visualized in a 1.5% agarose gel. The sequences of the primers used either in SORF amplification or RT–PCR experiments are shown in Tables S1 and S2, which are published as Supplementary Material.
The SORFs were identified automatically in the nucleotide sequence of the entire genome of S.cerevisiae, with a simple computer program dedicated to this search, written in Pascal and available upon request. The nucleotide sequence and the coordinates (start and stop codon locations) of all the ORFs were retrieved from the Stanford Genome Database (March 1999).
For each SORF, the program retrieves the following characteristics: (i) the length of the 3′ extension (ORF2); (ii) the position of the first ATG codon (if any) present in ORF2; (iii) the sequence surrounding (3 nt before and 6 nt after) the termination codon of ORF1; (iv) whether ORF2 overlaps an already annotated ORF (present in the SGD database).
This was done for each strand separately, and resulted in a list of all the SORFs being present in the S.cerevisiae genome. The maximal length observed for ORF2 was 3375 nt, the minimal length being 3 nt (when the stop codon of ORF1 is immediately followed by another stop codon). We observed that ~90% of the annotated ORFs exhibit an extension (ORF2) which is smaller or equal to 150 nt, and that 2% of the annotated ORFs are followed by an ORF longer than 200 nt. We pursued our analysis with the latter sequences in order to be able to perform experiments on all of them.
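The characteristics (i) to (iii) listed above can be illustrated with a short Python sketch. It is not the Pascal program used in this study; the function name, the dictionary keys and the 0-based coordinate convention are assumptions made for the example. ORF2 length is counted here up to and including its own closing stop codon, a choice made only so that the sketch reproduces the 3 nt minimum mentioned above. The overlap test (iv) is omitted, and the reverse strand would be handled by passing its reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def describe_sorf(seq, stop_pos):
    """Describe the in-frame 3' extension (ORF2) following an ORF1 stop codon.

    seq      -- DNA sequence of the coding strand
    stop_pos -- 0-based index of the first nucleotide of the ORF1 stop codon
    """
    seq = seq.upper()
    if seq[stop_pos:stop_pos + 3] not in STOP_CODONS:
        raise ValueError("stop_pos does not point at a stop codon")

    orf2_start = stop_pos + 3
    first_atg = None      # offset of the first in-frame ATG, relative to ORF2 start
    terminated = False    # True if an in-frame stop codon closes ORF2
    pos = orf2_start
    while pos + 3 <= len(seq):
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            terminated = True
            pos += 3      # count the closing stop codon in the ORF2 length
            break
        if codon == "ATG" and first_atg is None:
            first_atg = pos - orf2_start
        pos += 3

    return {
        "orf2_length": pos - orf2_start,
        "first_atg_offset": first_atg,
        "upstream_context": seq[max(0, stop_pos - 3):stop_pos],      # 3 nt before the stop
        "downstream_context": seq[stop_pos + 3:stop_pos + 9],        # 6 nt after the stop
        "terminated": terminated,
    }

# Toy sequence (hypothetical): ATG AAA TAA GCA ATG CCC TGA TT
info = describe_sorf("ATGAAATAAGCAATGCCCTGATT", 6)
```

On this toy sequence the extension runs over GCA ATG CCC TGA, so `orf2_length` is 12, `first_atg_offset` is 3, and the two contexts are `AAA` and `GCAATG`.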
The goal of this study was to identify ORFs exhibiting a genomic organization compatible with a translational readthrough-dependent mode of expression. We chose to restrict our search to ORFs already annotated, in order to investigate those followed by a sequence without a second immediate termination codon, in the same reading frame (Fig. 1). This 3′ extension was named ‘ORF2’, ORF1 being the annotated ORF. Those two ORFs were referred to as SORFs. We initially retrieved the entire S.cerevisiae DNA sequence and the coordinates of all the ORFs from the Stanford Genome Database. We then constructed a computer program to identify automatically all possible SORFs (see Materials and Methods). For each, we measured the ORF2 length and pursued our analysis only with SORFs containing an ORF2 with a minimal length of 200 nt (this limitation is discussed in Discussion).
Preliminary experiments have shown that fragments containing an in-frame ATG codon within the 50 nt after the stop codon sometimes promoted translation, possibly through translational reinitiation (data not shown). We thus excluded all fragments containing an in-frame ATG codon in the first 50 nt after the stop codon. Sixty SORFs corresponding to these criteria are listed in Table 1.
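The two selection criteria (an ORF2 of at least 200 nt and no in-frame ATG in the first 50 nt after the stop codon) amount to a simple filter, sketched below in Python. The function and parameter names are hypothetical, and treating an ATG that starts at an offset below 50 as falling "within the first 50 nt" is an assumption about the boundary.

```python
def is_candidate_sorf(orf2_length, first_atg_offset, min_orf2=200, atg_window=50):
    """Return True if a SORF passes both selection criteria.

    orf2_length      -- length of the 3' extension in nucleotides
    first_atg_offset -- offset (nt) of the first in-frame ATG after the stop
                        codon, or None if ORF2 contains no in-frame ATG
    """
    if orf2_length < min_orf2:
        return False  # extension too short to be retained
    if first_atg_offset is not None and first_atg_offset < atg_window:
        return False  # possible translational reinitiation; excluded
    return True
```

Only the first in-frame ATG needs to be examined: if it lies beyond the window, no later ATG can fall inside it.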
These 60 SORFs were identified without any criteria on stop codon nucleotide context. Therefore, it cannot be predicted whether ribosomes can actually bypass the unique stop codon present between ORF1 and ORF2. To quantify stop codon bypass frequency, each fragment (about 50 nt on either side of the stop codon) was amplified by PCR (see Supplementary Material Table S1 for the list of oligonucleotides used) from genomic DNA of a wild-type yeast strain (Y349) and cloned into the pAC99 dual reporter vector (15). Each construct was then sequenced to verify the presence of the stop codon and to verify that no error occurred during PCR amplification. Among the 60 SORFs analyzed, two (YCL014w and YJL162c) did not show the presence of the expected stop codon. Since the strain used here (Y349) is not the S288C strain that was used for the yeast sequencing project, either a sequencing/annotation error or a gene polymorphism could explain this discrepancy. Among the 58 remaining candidates, eight fragments displayed a stop codon bypass efficiency >10-fold above background (≥3%) and six displayed a bypass efficiency between 1 and 3% (Fig. 2). The eight sequences with a bypass efficiency ≥3% have been called BSC and were retained for further study.
To confirm the presence of an mRNA encompassing the leaky stop codon, we examined each of the eight BSC genes identified above by RT–PCR, using first a reverse transcriptase step with an oligo(dT) primer, and secondly a PCR step with an upper primer located in ORF1 and a lower primer located far beyond the stop codon in ORF2 (see Supplementary Material Table S2 for the list of oligonucleotides used). A unique specific amplification was obtained in each case only in the presence of reverse transcriptase (Fig. 3A). All of them have been sequenced (Fig. 3B).
These results demonstrate that the same molecule of mRNA carries both the ORFs for each BSC tested. Moreover, the amplification indicated that these mRNAs are polyadenylated. In these conditions, a readthrough product could in fact be produced from these BSC. An analysis of the polyadenylation signal in yeast has indicated that the mean length of 3′ non-coding sequence, before the poly(A) tail, is ~60 nt (31). Our analysis confirmed these data except for BSC genes which possess a long 3′ extension between the stop codon and the polyadenylation signal (Fig. 3 and data not shown).
We analyzed databases in order to determine if these BSC genes exhibit any similarity with known genes, or carried any known motifs. The results are shown in Table 2.
BSC5 (YNR069c) is referenced as a pseudogene in the database. However, we have demonstrated that a unique poly(A) mRNA exists, strongly suggesting that it can in fact be expressed.
BSC6 (YOL137w) has eight putative transmembrane segments in ORF1, and two in ORF2. The protein resulting from the fusion of the two ORFs is expected to have a structure different to that synthesized from ORF1 only. This novel structure would probably modify the specificity or the activity of the protein.
Finally, BSC1 (YDL037c) displays significant homology with Muc1p (cell surface flocculin) on both ORF1 and ORF2. It also bears two large nucleotide repetitions, one of 94 nt in ORF1 and one of 111 nt in ORF2. Furthermore, there are several imperfect nucleotide motif repeats around the stop codon between ORF1/ORF2. Strikingly, these imperfect nucleotide repeats correspond to two perfect repeats of nine amino acids, exactly spaced by 15 amino acids (Fig. 4A).
To determine whether these BSC genes are limited to S.cerevisiae, or are found in other yeast genomes, a BLAST comparison was performed against the five other yeast genomes that are nearly complete. Preliminary data suggest that homologous extensions can be found at least for some BSC genes (data not shown). However, the data available on the other yeast genomes are still too preliminary to draw any definitive conclusion.
Numerous experiments have demonstrated that the proximate nucleotide context, surrounding the stop codon, is a key determinant for readthrough efficiency (26,32–35). We have previously proposed a 3′ consensus pattern which yields high readthrough efficiency in yeast (26). This motif, STOP-CAA N(-A)A, very similar to the sequence first identified in the TMV virus, is found in BSC4 (Fig. 4B), but not in the other BSC. We postulated that the 3′ motif of BSC4 is actually responsible for the high readthrough level obtained, and to test this we cloned an 18 bp fragment from BSC4 (Fig. 4B) into the pAC99 vector. This fragment gave a readthrough efficiency of 9%, similar to that directed by the original 123 bp fragment. This result clearly indicates that the immediate nucleotide context of the BSC4 stop codon is sufficient to obtain a high readthrough level.
As noted above, BSC1 displays an unusual nucleotide structure around the stop codon with imperfect nucleotide repeats and perfect amino acid repeats (Fig. 4A). We have investigated the role of these repetitions in the stop codon bypass mechanism, by quantifying the stop codon bypass efficiency of several deletion mutants. Our results, shown in Figure 4A, indicate that this region plays an important role in stop codon bypass efficiency.
How the stop codon bypass occurs in the BSC is clearly understood only for BSC4, where we suspect a suppression by a natural tRNA, promoted by the stop codon proximal nucleotide context. For the other sequences, the mechanism remains unknown, since several events could lead to the same result. Among these are translational readthrough, splicing, editing, ribosome hopping, etc. As a first test to discriminate between these possibilities, the RT–PCR products obtained in the experiment shown in Figure 3 were sequenced. We confirmed the presence of in-frame stop codons, but in no case did we observe RNA post-transcriptional modification (data not shown).
We then tested the selected stop codon regions in isogenic [psi–] and [PSI+] strains. [PSI+] is an epigenetic element corresponding to intracellular aggregates of the Sup35p translation termination factor (36). In a [PSI+] strain, there is an overall decrease of translation termination efficiency, due to the depletion of soluble eRF3 termination factor which favors natural suppressor tRNAs (37,38).
The same constructs were used to transform isogenic [psi–] and [PSI+] strains. As shown in Figure 5, the basal level of readthrough was increased at least 3-fold in the 74-D694 [psi–] strain as compared to the Y349 strain used in the previous experiment. This increase is, however, moderate since the [psi–] 74-D694 strain ([ade–]), contrary to its [PSI+] derivative, gives rise to red colonies, characteristic of a low level of readthrough. Such limited variability in basal readthrough efficiency has already been observed (15,29) and is probably due to different genetic backgrounds. The most striking observation is that, for most of the BSC, the bypass frequency is unaffected by the [PSI+] context. Only IMP3 and BSC4 showed a significant increase of stop codon readthrough level when associated with the depletion of Sup35p (Fig. 5). We concluded from these data that only the stop codons from both these genes are bypassed by a mechanism depending on eRF3.
This work describes a comprehensive analysis of the S.cerevisiae genome which attempts to identify cellular recoding events occurring during translational termination. We have developed a genomic approach, seeking genes with an extended coding potential, through the stop codon, without prior bias from existing ideas on termination codon suppression mechanism. The candidate genes are composed of two ORFs separated by a unique stop codon named SORF. We fixed the minimal length of the second ORF (ORF2) at 200 nt, and only SORFs without an ATG codon in the first 50 nt after the stop codon were retained for analysis. This allowed us to analyze a reasonable number of candidates, and excluded from the analysis genes possibly expressed through internal ribosome entry or translational reinitiation. Our preliminary data suggest that such genes, controlled through translation initiation, could also be retrieved by this kind of approach (O.Namy, I.Hatin and J.P.Rousset, unpublished results).
Recently, Harrison et al. have published an analysis of the yeast genome seeking ‘disabled ORFs’, calling these dORF or mORF (39). Their results overlap only slightly with ours, because we did not base our analysis on homologies with known genes, and limited our search for 3′ extensions to official ORFs. They identified 11 of the SORFs characterized here (YDR082W, YIR044C, YER039C-a, YHR058C, YKL031W, YKL020C, YLR465C, YMR057C, YNR069C, YOR024W, YOR051C), two of which (YLR465C and YNR069C) display a significant stop codon bypass efficiency.
Our approach identified 58 SORFs in the yeast genome. Eight sequences displayed a stop codon bypass efficiency 10-fold higher than background. For each of these candidates, a unique mRNA covering both ORFs is present in the cell. Although it is only for those SORFs showing the highest bypass levels that one could expect to detect an mRNA editing mechanism, we did sequence the RT–PCR products for each SORF. No RNA post-transcriptional modification was identified. Moreover, from the amplification of the mRNA using an oligo(dT) primer at the reverse transcription step, we concluded that these mRNAs are polyadenylated and not rapidly degraded.
We quantified the stop codon bypass efficiency for each of the eight BSC genes in two isogenic [PSI+] and [psi–] yeast strains. In the [PSI+] strain, we expected that if the bypass mechanism is bona fide readthrough, the stop codon bypass efficiency should increase. In fact, we observed a stop codon bypass increase only for the genes BSC4 and IMP3. For the other sequences, no such difference between the [psi–] and the [PSI+] strains was observed; thus for these sequences, the termination process is less dependent on the concentration of the eRF3 release factor, or the stop codon is bypassed by another translational mechanism independent of eRF3. The reason for the discrepancy between the results obtained with the two wild-type [psi–] strains (Y349 and 74-D694) is not clear. In any case, these results indicate that the genetic background influences these unknown stop codon bypass mechanisms.
Recoding signals are usually composed of two elements, the sequence where the recoding takes place, and a stimulatory sequence. In a few examples, the stimulatory sequence is simply the immediate 3′ stop codon nucleotide context (26,34). However, most frequently the stimulatory sequence is a pseudoknot (or a stem–loop), and may be as far as 4 kb from the recoding site (40–44). A computational analysis of the stop codon 3′ nucleotide context of the BSC genes did not reveal any significant secondary structure in the vicinity of the stop codon. However, the BSC1 and BSC4 sequences displayed an unusual sequence pattern. The BSC1 sequence included four repeats of a nine amino acid motif that correspond to imperfect nucleotide repeats. This suggests that the role of these motifs is at the level of the protein and not of the mRNA. The exact role of these motifs in protein function is still unknown; however, our results indicate that this region is essential to obtain a high level of stop codon bypass, which suggests that the nucleotide motif is involved in the bypass mechanism. It is interesting to note that another recoding event, hopping, uses a codon repetition to promote stop codon bypassing (25,45). One can thus speculate that the mechanism active in BSC1 would not be readthrough, but hopping. This could explain the lack of effect of the PSI factor on the stop codon bypass efficiency. More experiments are necessary to elucidate the precise mechanism.
The study of BSC4 indicated that the stop codon is present in a typical readthrough context. This observation is coherent with the increase of the stop codon bypass efficiency observed in the [PSI+] strain. Our results demonstrate that the immediate stop codon context is in fact sufficient to promote a high level readthrough of the BSC4 stop codon.
Overall, our results emphasize that recoding events take place at the stop codon more often than expected. These events should be carefully sought during genome annotation, as shown recently in Euplotes (20). They also suggest that mechanisms other than readthrough are possibly used by cells to allow ribosomes to bypass the stop codon.
Supplementary Material is available at NAR Online.
We are especially indebted to Maryse Godon for genetics and molecular biology experiments. We thank Ian Stansfield, Laure Bidou and Guillaume Stahl for fruitful discussions. We are grateful to Ian Brierley and Sawsan Naphtine for their crucial help and comments on the manuscript. This work was supported in part by the Association pour la Recherche contre le Cancer (contract 4699 to J.P.R.), by the Groupe d’Intérêt Scientifique: Infections à prions (project A74), and by a CNES grant on the Preparatory Program Mars Sample Analysis.