Errors in protein synthesis disrupt cellular fitness, cause disease phenotypes, and shape gene and genome evolution. Experimental and theoretical results on this topic have accumulated rapidly in disparate fields such as neurobiology, protein biosynthesis and degradation, and molecular evolution, yet with limited communication between disciplines. Here, we review studies of error frequencies, their cellular and organismal consequences, and attendant long-range evolutionary responses. Measurements of error frequencies, from transcription through protein folding, remain in their infancy; we emphasize major areas where little is known, such as the failure rate of protein folding, or where technological innovations may enable imminent gains, such as translational missense error frequencies. Evolutionary responses to errors fall into two broad categories: adaptations that minimize errors and their attendant costs, and adaptations which exploit errors for the organism’s benefit. Given this wide spectrum of effects, it may be more useful to refer to synthesis outcomes as beneficial and deleterious rather than correct and erroneous.
Synthesis of a functional protein from genetic information is strikingly error-prone. For example, amino-acid misincorporations during translation are estimated to occur once in every 1,000 to 10,000 codons translated1,2. At this error rate, 15% of average-length protein molecules will contain at least one misincorporated amino acid. Polypeptide errors can induce protein misfolding, aggregation, and cell death (e.g. Ref. 3). Misfolded proteins underlie a broad array of neurodegenerative diseases, and misincorporation of amino acids during translation may be a causative factor in the pathology of multiple sclerosis and ALS4,5. Conversely, global defects in protein synthesis produce tissue-specific neurodegeneration linked to production of misfolded proteins3,6.
We define erroneous protein synthesis as any disruption in the conversion of a coding sequence into a functioning protein. Besides amino-acid misincorporations, sources of errors are transcription errors, aberrant splicing, premature termination, faulty posttranslational modifications, and kinetic missteps during folding (Figure 1). This definition explicitly includes correctly synthesized polypeptides that fail to fold into a functional protein.
We have previously hypothesized that major patterns of coding sequence evolution, conserved from bacteria to humans, arise from the selective pressure to minimize the cost of erroneous protein synthesis, including the failure of properly synthesized polypeptides to fold5. Such selection would act most strongly on highly expressed genes and, in animals, on genes expressed in neural tissues. Mathematical modeling and computer simulations predict biophysical adaptations that reduce this cost5,7–9, and several of these predictions have now been verified in a recent experimental evolution study10.
Together, these studies illuminate a pathway leading from the fidelity of protein production through cellular dysfunction and organismal fitness defects—exemplified by neurodegeneration—to adaptations whose imprints are visible in the evolution of coding sequences across taxa.
Here, we first review what is known about the frequencies of errors in the production of functional proteins, from transcription to protein folding. We do not attempt a comprehensive review of all measurements. Instead, we aim to create perspective and to motivate much-needed future studies by highlighting the diverse set of approaches taken. We then review the many ways in which organisms may have evolved to cope with errors in synthesis, either by selectively reducing error rates or by evolving tolerance to errors. Next, we examine how organisms exploit errors in synthesis to achieve biological and evolutionary ends that are inaccessible when synthesis is error-free. We conclude with a discussion of implications for future research.
Errors arise at all steps of protein synthesis, from transcription to protein folding, and have widespread phenotypic consequences. Yet surprisingly little is known about the exact error rates and error spectra.
The science of measuring error rates associated with protein synthesis remains in its infancy, even though the first attempts go back more than 45 years (e.g. Ref. 11). For example, the literature contains experimental measurements for the frequency of fewer than 5% of the 1,216 (64×19) possible codon-to-amino-acid errors in translation, with only a handful of estimates from the same species. Recent studies have made substantial progress on measuring error rates in specific cases (see e.g. Ref. 12), but current technological developments will likely soon give us the first comprehensive view of translation error frequencies in normal cells (Box 1).
Translation is the most error-prone step of protein synthesis. Therefore, accurate measurements of amino-acid misincorporation rates are crucial for a thorough understanding of synthesis errors. We can write all possible missense errors in the form of a 64×19 matrix with 1,216 independent entries. To date, only a small percentage of these entries has been measured, and only in a handful of organisms. The challenge in measuring missense error rates is that in a given sample, the abundance of error-free molecules is several orders of magnitude higher than that of any species of error-containing molecules, overloading most unbiased detection methods and forcing investigators to employ clever, but strongly biased, schemes to obtain any result at all.
Historical methods used to measure translational error rates fall into three broad categories. First, some groups have measured the amount of a specific amino acid in a protein that should not contain this amino acid. For example, Edelmann and Gallant measured the amount of cysteine in the normally cysteine-free protein flagellin of E. coli91. Second, some groups have measured the change in a protein’s isoelectric point due to amino-acid misincorporation92,93. Both of these approaches share the drawback that they average over many different elements in the ribosomal error matrix. A third approach builds on special reporter systems that produce a signal when a specific codon is mistranslated. For example, Kramer and Farabaugh studied misincorporation of lysine at various codons using fused luciferases, F-luc and R-luc, whose luminescence can be determined independently and with extreme accuracy. In F-luc, they replaced the codon for the essential lysine at position 529 by all near-cognate and several other codons12,94. With these constructs, they measured the frequency of mistranslation of specific codons into lysine by assaying the F-luc activity relative to R-luc activity.
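The reporter readout described above can be sketched as a ratio of ratios. This is a minimal illustration, not the published analysis: it assumes that the mutant F-luc is essentially inactive unless lysine is misincorporated at the substituted codon, and that the mutant F-luc/R-luc ratio is normalized to the ratio from a wild-type construct. The function name and all luminescence values are hypothetical.

```python
def misreading_frequency(fluc_mutant, rluc_mutant, fluc_wt, rluc_wt):
    """Apparent frequency of misreading a substituted codon as lysine.

    Assumption (not stated in the text): the F-luc/R-luc ratio of the mutant
    construct is normalized to the same ratio from the wild-type construct,
    so that expression differences between constructs cancel out.
    """
    if min(rluc_mutant, rluc_wt, fluc_wt) <= 0:
        raise ValueError("reference luminescence values must be positive")
    mutant_ratio = fluc_mutant / rluc_mutant
    wildtype_ratio = fluc_wt / rluc_wt
    return mutant_ratio / wildtype_ratio


# Hypothetical luminescence readings in arbitrary units, for illustration only.
freq = misreading_frequency(
    fluc_mutant=120.0, rluc_mutant=1.0e6,
    fluc_wt=4.0e5, rluc_wt=1.0e6,
)
print(f"apparent misreading frequency: {freq:.1e}")
```

Because R-luc is translated as part of the same fusion, dividing by its activity corrects for differences in expression between samples, which is what allows very small F-luc signals to be interpreted as error frequencies.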
Could an estimate of the entire 64×19 error matrix be obtained in a single experiment? In principle, yes. Massive gains in the sensitivity of quantitative tandem mass spectrometry (MS/MS) (e.g. Ref. 90) offer the tantalizing potential for detecting low-frequency errors against a background of wild-type molecules. Deep quantitative MS/MS probing of peptides generated from a purified target protein or proteins, using a detection database including all possible single amino-acid substitutions as well as the DNA-encoded sequence, could in principle detect both the type and position of amino-acid substitutions introduced by mistranscription and mistranslation. By encoding the target protein(s) with multiple instances of all 64 codons, each codon’s error spectrum could be estimated in multiple contexts, and single-molecule RNA sequencing the target gene’s transcripts could be used to assess the frequency and position of transcription errors, allowing translation errors due to misacylation and misreading to be disentangled. While such an experiment is technically demanding, it is within the reach of present-day methods and would, at a stroke, provide the first comprehensive view of the translation error spectrum in any organism.
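The detection database mentioned above, containing the DNA-encoded sequence plus every possible single amino-acid substitution, can be sketched for one peptide as follows. This is a minimal illustration under stated assumptions: the function name and the example peptide are hypothetical, only the 20 standard amino acids are considered, and no mass-spectrometry software or file format is implied.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitution_variants(peptide):
    """Return {variant sequence: (position, encoded residue, substituted residue)}.

    Positions are 1-based. The encoded (error-free) peptide is not included.
    """
    variants = {}
    for index, encoded in enumerate(peptide):
        for residue in AMINO_ACIDS:
            if residue == encoded:
                continue  # not a substitution
            variant = peptide[:index] + residue + peptide[index + 1:]
            variants[variant] = (index + 1, encoded, residue)
    return variants


# Hypothetical five-residue peptide, for illustration only.
peptide = "MKTAY"
database = single_substitution_variants(peptide)
print(len(database))  # 19 substitutions at each of the 5 positions
```

Each position contributes 19 variants, mirroring the 19 columns of the error matrix. In practice some variants cannot be told apart by mass alone (leucine and isoleucine have identical masses), so a real search would need to treat such pairs as a single observable.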
Table 1 provides estimates of error rates from transcription through protein folding, emphasizing the heterogeneous experimental approaches used and the patchy knowledge that has resulted. The central observation is that synthesis errors are orders of magnitude more frequent than DNA-replication errors. The E. coli genome is 4.6×10⁶ base-pairs long, such that at the typical mutation rate of approximately 10⁻⁹ per base pair, one bacterium in 200 will bear a mutation in its genome. By contrast, the average E. coli coding sequence is 335 codons long, and at a canonical per-codon missense error rate of 5×10⁻⁴, 15% of protein molecules will contain at least one error. At the bacterial scale, perfectly replicated genomes are commonplace, but perfectly synthesized proteomes never occur. The available evidence suggests that eukaryotes are no more or less accurate at protein synthesis than are prokaryotes13. All else equal, longer proteins necessarily accumulate more errors, leading to astonishing predictions: if canonical missense error rates hold, each molecule of the giant human muscle protein titin, consisting of 34,350 amino acids, would contain an average of 17 missense errors, and an average human sarcomere would contain no error-free titin molecules at all.
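The figures in this paragraph follow from a simple calculation, sketched below. The sketch assumes that errors occur independently at each site with a uniform rate, which is a simplification introduced here for illustration; the rates and lengths are the ones quoted in the text, and the function name is hypothetical.

```python
import math


def fraction_with_error(per_site_rate, n_sites):
    """Probability of at least one error among n_sites independent sites.

    Computes 1 - (1 - p)^n using log1p/expm1, which stays accurate
    when p is very small.
    """
    return -math.expm1(n_sites * math.log1p(-per_site_rate))


# Values quoted in the text.
GENOME_BP = 4.6e6        # E. coli genome length in base pairs
MUTATION_RATE = 1e-9     # replication errors per base pair
MISSENSE_RATE = 5e-4     # canonical missense errors per codon
AVERAGE_CODONS = 335     # average E. coli coding sequence
TITIN_RESIDUES = 34350   # human titin

p_genome = fraction_with_error(MUTATION_RATE, GENOME_BP)
p_protein = fraction_with_error(MISSENSE_RATE, AVERAGE_CODONS)
titin_mean_errors = MISSENSE_RATE * TITIN_RESIDUES
p_titin_error_free = 1.0 - fraction_with_error(MISSENSE_RATE, TITIN_RESIDUES)

print(f"genomes with a new mutation:  about 1 in {1 / p_genome:.0f}")
print(f"proteins with >= 1 error:     {p_protein:.1%}")
print(f"mean errors per titin:        {titin_mean_errors:.1f}")
print(f"error-free titin fraction:    {p_titin_error_free:.1e}")
```

Under these assumptions the error-free fraction of titin falls well below one molecule in a million, which is the basis for the statement that an average sarcomere would contain no error-free titin.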
Errors in posttranslational modification are likely important but their frequency and effects remain largely unknown. One of the most common modifications, glycosylation, is performed on more than 50% of proteins in a human cell14. Glycosylation is not template-driven and shows remarkable heterogeneity15. Oligosaccharides attached to glycosylation sites tend to vary from copy to copy of the same protein, and occupancy rates of glycosylation sites also vary, making it unclear to what extent heterogeneity in glycosylation should be considered erroneous. That not all heterogeneity is functionally normal is demonstrated by the often highly deleterious effects of glycosylation-altering mutations, which usually affect the efficiency of glycosylation or the composition of glycans without disrupting glycosylation altogether16. The extent and importance of misphosphorylation also remain poorly understood despite potentially major consequences. For example, misphosphorylation of the microtubule-binding protein tau is a pathological signature of all cases of Alzheimer’s disease and apparently contributes to tau misfolding and aggregation17.
Perhaps surprisingly, we know even less about the error rate in producing functional proteins. An early study reported that up to 30% of newly synthesized proteins were rapidly degraded, most of which were believed to be defective ribosomal products (DRiPs)18. Yet a later study using similar techniques found that most newly synthesized proteins were largely protected from degradation, even when unable to fold correctly due to misincorporation errors, and “at most a few percent” of newly synthesized proteins were rapidly degraded19. Thus, the failure rate of functional protein production, the ultimate expression of failures in protein synthesis, remains essentially unknown. Correspondingly, the proportion of those failures due to upstream synthesis errors versus errors in folding of correctly synthesized proteins also remains unclear.
Plentiful evidence demonstrates that errors in protein synthesis reduce organism fitness: disruption of translational fidelity with common antibiotics such as streptomycin and kanamycin kills bacteria; cells with impaired translational proofreading ability display altered morphologies20 and suffer severe fitness defects21, as do cells with elevated rates of transcription errors in an essential gene10; defects in translational fidelity and in protein folding cause disease phenotypes in mouse models3,6.
A single amino-acid substitution in the editing domain of an alanyl-tRNA synthetase—a mutation which causes misacylation, subsequent widespread translation errors, and protein misfolding—causes degeneration of Purkinje cells in the mouse cerebellum, ataxia, and death3. This result supports the possibility that disease conditions involving tissue-specific dysfunction arise from global errors in protein synthesis20. Neurons may be unusually sensitive to synthesis errors because of their long lifetimes, large surface-area-to-volume ratios with correspondingly abundant sites for membrane-induced aggregation22, branched morphologies which impede transport and damage responses23, fluctuating cell polarization, and protein quality control systems more likely to be overloaded by misfolded proteins17 (cf. Refs. 24, 25).
Fitness costs can arise by multiple different mechanisms. Protein synthesis errors will often lead to loss of function of the protein. A recent study demonstrated that disruption of folding and function of the antibiotic-resistance protein β-lactamase by transcriptional errors reduced cellular fitness, but could be compensated by increased expression and by stabilizing mutations in the protein sequence10.
Protein synthesis errors may also produce polypeptides displaying a gain of toxic function. In rare cases, the error may confer an alternate or pathological function on an otherwise normal, folded protein. More often, errors disrupt folding, and the misfolded molecule may be toxic. In this context, “toxic” simply means harmful and does not specify the modality or severity of the harm. Misfolded proteins may destabilize membranes26, steal quality-control bandwidth from essential proteins24,25, and induce chronic stress. The toxic effects of aminoglycoside antibiotics, which befoul ribosomes and lead to production of misfolded proteins, have been traced in part to misfolded-protein-induced signaling through the membrane receptor cpxA. The ultimate consequence is increased radical formation, membrane depolarization, and cell death27. Misfolded protein cytotoxicity has been studied extensively as a contributor to neurodegenerative disease. It has become increasingly clear that at the molecular level, misfolding-associated disease phenotypes often reflect gains of toxic function rather than losses of function3,17,22,23,25,26,28.
Synthesis and degradation of non-functional proteins may also be costly without being obviously harmful (clean-up costs, see e.g. Ref. 29). Ribosomal throughput dedicated to a polypeptide that will ultimately fail to function represents an opportunity cost, particularly for fast-growing organisms30. Expression of quality control systems, such as chaperones, to assist, rescue, or degrade polypeptides represents a further fitness cost acting in trans. Toxicity and clean-up costs may coexist: even if quality control systems ultimately detect and either degrade or refold all misfolded proteins, the latter may still wreak substantial toxic havoc, just as crime does not cease to be a problem even if all criminals are eventually caught.
Errors in the proteins responsible for reproduction of genetic and non-genetic material, particularly in translation and replication, may lead to reduced fidelity and subsequent dysfunction in succeeding generations. Such an effect, originally conceived as an error catastrophe by Orgel31, has been demonstrated in bacteria, where heritable mutations can arise from an editing defect in translation21,32.
To the extent that protein-synthesis errors produce harmful molecular species or waste valuable cellular resources, the severity of the resulting phenotypic effects will depend on the expression level of that gene. The more highly expressed a gene, the larger the amount of erroneously synthesized proteins produced, and thus the bigger the influence of these proteins on the organism’s phenotype. For example, the clean-up costs due to synthesis of non-functional protein will be proportional to the amount of protein produced. Many forms of misfolded protein toxicity, such as aggregation and interference with membranes, increase with absolute protein concentration and therefore with gene expression level. Note that if synthesis errors primarily act by reducing protein function, an effect from gene expression level is not expected; errors in a low-expression, but functionally critical, protein such as a DNA polymerase or transcription factor need not contribute any less to organism fitness than disruption of the activity of a high-abundance enzyme or structural molecule5,7.
Faced with costly protein-synthesis errors, organisms may evolve two high-level cost-reduction strategies: reduction of error frequencies (increased accuracy), and reduction of the costs of the remaining errors (increased tolerance or robustness). Because costs tend to increase with gene expression level, selection for cost reduction is often visible in differences between genes of low and high expression level.
The primary source of missense substitutions during protein synthesis is misincorporation of non-cognate tRNAs during translation. Codons corresponding to low-abundance tRNAs tend to be more error-prone than other codons12. Consequently, codon usage affects translation error frequencies. Selection pressure to use codons with low error rates is commonly referred to as selection for translational accuracy33. Accuracy selection should not, however, cause uniform usage of accurate codons along the gene. Instead, it should disproportionately affect those sites at which translation errors would have particularly severe effects on protein folding or function33. A common test for translational accuracy selection therefore assesses whether preferred codons associate with evolutionarily conserved sites5,33,34 or with sites that are known to be important for protein structure or function33,35. In general, these analyses show a moderate but highly significant tendency for preferred codons to coincide with sites at which translation errors are expected to be important, consistent with weak selection for increased translational accuracy.
Although changes in codon usage to improve accuracy come at little cost, ribosomes may also be made more accurate. However, increased ribosomal accuracy often comes at the cost of translation speed and energy efficiency36, due in part to the intrinsic physical implementation of increased accuracy through increased energy-dependent rejection of tRNAs. Consequently, organisms may evolve to balance ribosome speed, ribosome accuracy, and energetic costs.
A second codon-level selection pressure penalizes codons that have a high probability of being mistranslated into radically different amino acids. We refer to this selection pressure as selection for error mitigation. While not leading to a reduction of error frequencies per se, this selection pressure reduces the frequency of the most costly errors at the expense of a larger number of more benign errors. Several bioinformatics studies have found evidence for selection for error mitigation37–39. The genetic code itself also has error-mitigating properties40, and may have evolved specifically to minimize the effects of translation errors41.
It is likely that selection limits error frequencies at all steps of protein synthesis. Some simple predictions have yet to be tested, such as whether high-expression genes have lower transcriptional error rates. But aside from accuracy selection and error mitigation, little is known about the signatures that would indicate such selection pressures. One exception is the efficiency of splicing in fission yeast, as estimated by the proportion of intron-exon junctions retained in cellular mRNAs. It increases markedly with gene expression level42, presumably because missplicing becomes more costly when incorrectly spliced mRNAs are abundant.
Errors need not be eliminated altogether if instead organisms can tolerate a certain amount of errors without paying a significant fitness cost (Figure 2). Some tolerance is inherent in protein biochemistry. In vitro, proteins can be robust to many individual or multiple mutations43–48, although most mutations tend to reduce protein stability. Robustness can itself be modulated by mutations in the protein47,48. These observations suggest that proteins can evolve robustness to typical errors arising under translation, termed translational robustness7,8. Proteins that possess translational robustness can fold and function properly even if mistranslated. Mathematical and computational modeling predicts that this selection pressure will cause proteins to be more thermostable and to also be more tolerant to genetic mutations5,7–9. Recent experimental results confirm these predictions10.
But even if a gene is translated without any errors, the resulting protein may misfold, because of interactions with other proteins (e.g., other misfolded or aggregated proteins) or properties of the protein itself. Key among protein properties are thermodynamic stability, measured by the free energy of unfolding, and folding kinetics, measured by the rate of folding or unfolding. For most proteins, thermodynamics dictate whether a protein can ever attain a stable folded state, whereas kinetics determine how likely a thermodynamically stable protein is to complete folding before other processes, such as aggregation and degradation, derail it. Rapid folding and high stability tend to be correlated. We have previously hypothesized that selection reduces the propensity of proteins to misfold even when translated without errors5,7, but this hypothesis has not yet been tested experimentally. Because of the close relationship between thermodynamic stability and tolerance to mutations47–50, more translationally robust proteins may also be more kinetically stable and vice versa. A key difficulty at present is distinguishing stochastic misfolding from mistranslation-induced misfolding, as translation errors remain difficult to detect. Consistent with either translational-robustness selection or selection against stochastic misfolding is the observation that highly expressed genes are less aggregation-prone than genes of low expression level51–53.
Other adaptations besides robust protein folding may reduce the cost of synthesis errors. One is the efficient detection and degradation of mis-spliced products. The nonsense-mediated decay pathway degrades mRNAs that contain premature stop codons54. The introns of eukaryotes tend to either contain stop codons or alter the translational reading frame to reveal a downstream stop codon, leading to mRNA degradation of mis-spliced transcripts by the nonsense-mediated decay pathway55.
Broad patterns of coding-sequence evolution, such as the tendency for highly expressed proteins to evolve slowly, may reflect selection to reduce costs of protein misfolding5. Genome-wide analyses of evolutionary rates have consistently found that expression level is a major predictor of both synonymous and non-synonymous divergence in bacteria, fungi, plants, and animals5,56. Multivariate analyses find that quantities related to translation frequency make stronger contributions to evolutionary rate than do quantities linked primarily to gene function57–61. We have hypothesized that selection against protein misfolding, including misfolding of error-free polypeptides, imposes a strong constraint on coding-sequence evolution5. Many genomic patterns—covariation between evolutionary rates, expression level, codon-usage bias, and the transition–transversion ratio, as well as an association between optimal codons and evolutionarily conserved sites—can be reproduced in a model involving only selection against mistranslation-induced misfolding5. New genome-wide signals are needed to allow disentangling of selection pressures against costs of error-free versus error-induced protein misfolding.
In all extant models, we do not expect a one-to-one relationship between gene expression level and evolutionary conservation. Selection acts only on the deleterious outcomes of erroneous synthesis, which may vary from protein to protein. For example, protein alleles that are less likely to become toxic, or are more rapidly detected and shuttled toward degradation or refolding, should experience less evolutionary constraint. This pressure, and the resulting constraints on sequence change, should intensify with increasing expression level or for genes expressed in sensitive tissue types. Likewise, if a particular protein fold is highly tolerant to synthesis errors, then genes encoding proteins of this fold will experience little selection pressure to reduce costs, even if expressed at a relatively high level. By contrast, sensitive folds will experience much stronger selection pressure at comparable expression levels. Consistent with this reasoning, biophysical properties of the protein fold also influence the rate of sequence divergence62–65, and the relative contributions of expression level and protein structure to evolutionary conservation seem to be of comparable magnitude66.
Even though errors in protein synthesis tend to be deleterious on average, in numerous cases they can have direct benefits for organism fitness.
A wide array of organisms, from viruses to mammals, have evolved certain genes that depend on errors in protein synthesis. The best-known example is programmed frameshift, where the elongating ribosome shifts forward (+1) or back (−1) by a single nucleotide to enter a new reading frame67. E. coli DNA polymerase III subunits τ and γ, and eukaryotic ornithine decarboxylase antizyme, depend on frameshifting for proper protein expression68,69.
Programmed frameshifts can control gene expression (Figure 3A)70. Ornithine decarboxylase antizyme (OAZ) noncompetitively inhibits ornithine decarboxylase (ODC), an enzyme which catalyzes the first step in polyamine synthesis. Polyamines such as spermidine stimulate +1 frameshifting. In eukaryotes from fission yeast to mammals, the OAZ gene normally terminates at an early stop codon, yielding only a short peptide with no inhibitory activity, but a +1 frameshift yields full-length antizyme which inhibits ODC. At low polyamine levels, frameshifting occurs infrequently and little antizyme is produced, more ODC is active, and more polyamines accumulate70. As polyamine levels rise, frameshifting is stimulated, yielding more full-length antizyme which inhibits ODC and reduces polyamine production70. Thus, the polyamine-controlled frequency of a translation “error” has evolved to implement feedback regulation of polyamine levels.
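The feedback loop described above can be illustrated with a toy simulation. Everything quantitative here is an assumption made for illustration: the saturating dependence of frameshift frequency on polyamine level, the treatment of antizyme as proportional to the current frameshift frequency, the inhibition term, and all parameter values. None of these are measured quantities or taken from the cited work.

```python
def frameshift_frequency(polyamine, f_max=0.3, k_half=1.0):
    """Hypothetical saturating response of +1 frameshifting to polyamine level."""
    return f_max * polyamine / (k_half + polyamine)


def steady_state_polyamine(feedback=True, steps=20000, dt=0.01,
                           synthesis_max=10.0, decay=1.0,
                           inhibition_strength=20.0):
    """Integrate a toy polyamine model with forward Euler steps.

    With feedback, full-length antizyme is assumed proportional to the
    frameshift frequency and reduces ODC activity; without feedback,
    ODC always runs at its maximal rate.
    """
    polyamine = 0.0
    for _ in range(steps):
        shift = frameshift_frequency(polyamine) if feedback else 0.0
        odc_activity = synthesis_max / (1.0 + inhibition_strength * shift)
        polyamine += dt * (odc_activity - decay * polyamine)
    return polyamine


with_feedback = steady_state_polyamine(feedback=True)
without_feedback = steady_state_polyamine(feedback=False)
print(f"steady-state polyamine with frameshift feedback:    {with_feedback:.2f}")
print(f"steady-state polyamine without frameshift feedback: {without_feedback:.2f}")
```

In this sketch the polyamine level settles at a lower value when frameshifting responds to polyamines than when it does not, which is the qualitative behavior expected of negative feedback. The specific numbers carry no biological meaning.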
In baker’s yeast, gene expression is partly regulated by splicing efficiency71. Under amino-acid starvation, splicing is inhibited for the majority of intron-containing ribosomal genes71. The consequence of this regulation is likely a reduction in the number of functional ribosomes, and thus inhibition of translation. Some bacteriophages use spontaneous readthrough of stop codons in specific contexts to regulate gene expression of phage proteins72.
Certain picornaviruses carry, within a long polypeptide, a short sequence (~19 amino-acids) that induces eukaryotic ribosomes to skip a peptide bond73. This skip-inducing 2A sequence allows these viruses to encode multiple proteins using a single, compact sequence without paying the price of encoding a protease. Such ribosome skipping, in essence a bug in the translational hardware uncovered and exploited by viruses, is now being coopted by human biological engineers74.
Synthesis errors can suppress otherwise deleterious mutations, for example by reading through a premature stop codon in an important gene. Such nonsense suppression has long been studied in bacterial and cell-culture systems75. Recently, it has taken on increased importance as a therapy for genetic diseases in which a premature stop-codon mutation causes a disease phenotype, such as in cystic fibrosis and Duchenne muscular dystrophy76.
Drugs that interfere with translational fidelity in bacteria are commonly used as antibiotics. Bacterial mutants that depend on streptomycin for viability are readily isolated77 and tend to have hyperaccurate but slow ribosomes36. Streptomycin independence is often regained by mutations that decrease ribosomal fidelity78.
Synthesis errors can reveal cryptic genetic variation or produce novel phenotypes and thus allow an organism to either switch epigenetically between different phenotypes or pre-screen potentially beneficial mutations79, 80.
The yeast prion [PSI+] is an amyloid-forming conformation of the translation-termination factor Sup35p. It sequesters Sup35p and impairs translation termination (Figure 3B). The [PSI+]-state is self-propagating81, arising and resolving spontaneously with low probability82. In many environmental conditions [PSI+] strains grow better than strains without the prion83, due to genetic variation revealed upon readthrough of stop codons84. The prion domain of Sup35p is evolutionarily conserved and seems to have a beneficial effect over evolutionary timescales85. It may have adaptive value unrelated to [PSI+]. But the [PSI+] state alone is sufficient to maintain the prion domain if environmental conditions under which [PSI+] reveals adaptive genetic variation are encountered at least once every few million generations86,87.
More generally, organisms can take advantage of beneficial phenotypes generated by errors79,80. A mutation which increases the production of the erroneous but beneficial product will increase in frequency in the population. As a consequence, organisms may derive a direct benefit from mutations that increase the likelihood of additional beneficial mutations in the future. This mechanism has been termed the look-ahead effect80. A variant of the look-ahead mechanism may be the cause of adaptive mutability, i.e., an increase in beneficial mutations under conditions of environmental stress, observed in E. coli88. One of the experiments reported in Ref. 88 required a reversion of an inactivating frameshift in the lac operon. But even in the absence of the reversion mutation, rare accidental frameshifts during translation provide residual function to the inactivated gene; duplication of the inactivated gene leads to a commensurate increase in fitness derived from the residual function; at the same time, it increases the probability that a mutation corrects the frameshift89.
Our understanding of the fidelity of transcription, translation, and protein folding remains sketchy (Box 2). No comprehensive, or even representative, error spectra exist for cells under normal physiological conditions. Technological innovations such as single-molecule nucleic acid sequencing have given us a surprising portrait of rampant splicing errors in a eukaryotic genome42, and this technology in combination with deep-coverage quantitative mass spectrometry90 may soon provide a similar breakthrough in our understanding of transcriptional and translational error spectra (see Box 1). However, the frequency and types of errors in common posttranslational modifications such as glycosylation and phosphorylation remain almost completely unknown, as do the consequences of these errors for protein folding and function. Moreover, the relative fitness costs of loss of protein function, quality control, and gain of toxic function remain unknown, and considerable effort will be required to determine these as well (Box 2). Yet whatever the results of such studies, the existing evidence shows that protein synthesis is surprisingly error-prone, and that erroneous protein synthesis can differentially affect specific tissue types, impose substantial cellular fitness costs, and modulate the evolution of whole genomes.
In stark contrast with the rarity of DNA replication errors, the extraordinary frequency of protein synthesis errors in normal cells urges a different, perhaps unfamiliar, view of cellular operations. Cells are inherently noisy statistical ensembles, and the genotype is best understood as encoding the frequency of different outcomes rather than a single so-called correct state that is disrupted by errors. Notions of correct and erroneous may be subsumed by the more useful notions of beneficial and deleterious, with the important difference that supposed errors may be beneficial, even essential. For example, programmed +1 frameshifts and translational hops seem to have evolved by amplification of low-frequency translation errors67.
Recent single-molecule studies underscore the need to embrace the extraordinary molecular diversity arising from a single genotype. In fission yeast, the frequency of retained introns appears to exceed 90% for the vast majority of transcripts42. Are all these retained introns technical artifacts, errors whose deleterious effects are too small to be eliminated by natural selection, errors in transcripts destined for degradation by nonsense-mediated decay55, or an uneasy compromise resulting from energetic or kinetic costs associated with increased splicing fidelity? Or do some of these retained introns confer important benefits on the organism which would be suppressed by higher-fidelity splicing? Similarly, for some high-expression proteins, certain mistranslation-generated, biochemically similar molecular species are expected to exist at cellular abundances of 10–100 molecules per cell, sufficient for action as regulatory proteins. It seems unlikely that nature always fails to exploit the existence of these molecular subspecies, but they are difficult to hunt down; perhaps high-expression genes which change expression markedly in cells with hyperaccurate ribosomes may point to autoregulatory systems maintained by mistranslation. We believe that erroneous synthesis with its attendant modifiers and resulting adaptations, far from being a negligible nuisance, will play a central role in our understanding of molecular evolution.
This work was funded in part by NIH grants P50 GM068763 and R01 GM088344.
D. Allan Drummond is a Bauer Fellow at Harvard University’s FAS Center for Systems Biology. He obtained his Ph.D. in 2006 at The California Institute of Technology, USA. His research interests include protein quality control and molecular evolution.
Claus O. Wilke is an Assistant Professor at The University of Texas at Austin in the Section of Integrative Biology. He is also affiliated with the Center of Computational Biology and Bioinformatics and the Institute of Cell and Molecular Biology at The University of Texas at Austin. Wilke obtained his Ph.D. in Theoretical Physics from the Ruhr-University Bochum, Germany in 1999, and was a postdoctoral scholar at The California Institute of Technology from 2000 to 2004. Wilke works on computational evolutionary biology, molecular evolution, and protein folding.