Integrating genome-scale sequence, expression, structural and protein interaction data from E. coli, we establish an interaction between chaperone (GroEL) dependency and optimal codon usage. Highly expressed sporadic substrates of GroEL employ more optimal codons than expected, show enrichment for optimal codons at structurally sensitive sites and greater conservation of codon optimality under conditions of relaxed purifying selection. We suggest that highly expressed genes cannot routinely utilize GroEL for error control so that codon usage has evolved to provide complementary error limitation, whereas obligate GroEL substrates experience relaxed selection on codon usage. Our results support a critical role of misfolding prevention in gene evolution.
Errors during gene expression are relatively commonplace, which has prompted speculations that many features of gene and genome anatomy and organization have evolved to reduce or mitigate such errors. One type of error that can be particularly costly occurs when the polypeptide chain that emerges from the ribosome fails to fold into its native structure. Some aberrantly folded proteins, exposing hydrophobic residues that would normally be buried, may begin to promiscuously interact with other proteins, become toxic to the cell and thus pose a substantial fitness concern (Gregersen et al, 2006).
In trans, molecular chaperones have long been recognized to play crucial roles in misfolding prevention and remedy. In cis, it has recently been suggested that the use of optimal codons limits mistranslation-induced protein misfolding (Drummond and Wilke, 2008). Evidence for the latter is centred on the argument that synonymous codons differ in their propensity to cause mistranslation. Translationally optimal codons, typically represented by more abundant cognate tRNAs (Duret, 2000), are thought less likely to cause ribosomal stalling and/or incorporation of the wrong amino acid.
Here, we suggest that the role, if any, of error limitation in cis can be revealed by studying its interaction with well-established error management systems in trans (chaperones). If codon usage does indeed play a tangible role in misfolding prevention, we would expect selection on codon identity to vary with the degree to which a protein can rely on other error control mechanisms, namely chaperones. We use the E. coli chaperonin GroEL as a model system to explore whether there is any interaction between optimal codon usage and chaperone dependency.
Kerner et al (2005) had previously determined GroEL substrates on a genome-wide scale. Based on enrichment in GroEL complexes, the authors assigned ∼250 proteins to three classes reflecting GroEL dependency: class-I proteins, only a small fraction of which (<1%) associates with GroEL and which spontaneously regain some activity; class-II proteins, which only exhibit spontaneous refolding at more permissive temperatures; and class-III proteins, which are obligate substrates of GroEL and largely fail to refold even under more benign conditions. Notably, although on average less abundant than class-I/II proteins (‘sporadic clients’), class-III proteins (‘obligate clients’) occupy ∼80% of GroEL's capacity in vivo. Consequently, a higher proportion (∼100% versus ∼20% for class-II and ∼1% for class-I) of these proteins is routinely processed by the GroEL system.
We demonstrate that sporadic but not obligate clients of GroEL exhibit enhanced codon adaptation, carefully controlling for possible confounding factors, notably expression level and protein length (Figure 1). We also point out that genes that recently entered the E. coli genome via horizontal gene transfer will distort equilibrium analyses of codon usage in bacteria and should thus be routinely eliminated from analysis.
Building on earlier work by Zhou et al (2009), we further show that sporadic substrates are conspicuously enriched for optimal codons at structurally sensitive sites, consistent with more severe fitness implications of codon choice for these proteins.
Lastly, we reveal that codon optimality in sporadic clients is more highly conserved in Shigella dysenteriae. S. dysenteriae is an E. coli clone that has diverged relatively recently from the E. coli K12 strain and has adopted an intracellular lifestyle (Balbi et al, 2009). Concomitant with that lifestyle, Shigella has experienced a lower effective population size and therefore reduced efficiency of purifying selection. This has generated conditions where, overall, codon optimality has started to decay. However, when we followed the fate of ancestrally optimal codons at buried sites in the S. dysenteriae and E. coli K12 genomes, we found that a lower fraction of buried sites has lost codon optimality in sporadic substrates (Figure 4), again consistent with greater structural importance of codon choice in these substrates.
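The quantity compared here, the fraction of ancestrally optimal codons at buried sites that are no longer optimal in a derived genome, can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function name, the input layout (pre-aligned codon lists with a per-site burial flag) and the toy set of optimal codons are all hypothetical, and the sketch assumes the ancestral codon at each site has already been inferred.

```python
import math

def fraction_optimality_lost(ancestral, derived, buried, optimal):
    """Fraction of ancestrally optimal codons at buried sites that are
    non-optimal in the derived sequence.

    ancestral, derived: aligned lists of codons (hypothetical inputs).
    buried: list of booleans, True where the encoded residue is buried.
    optimal: set of codons treated as translationally optimal.
    """
    if not (len(ancestral) == len(derived) == len(buried)):
        raise ValueError("inputs must be aligned and of equal length")

    tracked = 0  # buried sites whose ancestral codon was optimal
    lost = 0     # of those, sites whose derived codon is non-optimal
    for anc, der, is_buried in zip(ancestral, derived, buried):
        if not is_buried or anc not in optimal:
            continue
        tracked += 1
        if der not in optimal:
            lost += 1

    # No qualifying sites: the fraction is undefined.
    return lost / tracked if tracked else math.nan

# Toy data only; the optimal-codon set is a placeholder, not a real table.
OPTIMAL = {"CTG", "GAA", "AAA", "GGT"}
ancestral = ["CTG", "GAA", "AAA", "GGT", "CTA"]
derived = ["CTG", "GAG", "AAA", "GGT", "CTA"]
buried = [True, True, False, True, True]

result = fraction_optimality_lost(ancestral, derived, buried, OPTIMAL)
print(f"{result:.3f}")
```

In this toy case three buried sites carry an ancestrally optimal codon and one of them has changed to a non-optimal codon. Computing the fraction separately for sporadic and obligate substrates would give the kind of comparison described above; a fuller treatment would also restrict the count to sites where the amino acid is unchanged.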
Based on these findings, we suggest the following explanation. As mentioned above, class-III substrates are defined not only by GroEL being critical for proper folding, but also by occupying most of GroEL's capacity (∼80%). With a high proportion of class-III protein passaged through the GroEL system, mistranslation errors in these proteins weigh less severely as GroEL can remedy at least some misfolding that ensues. In contrast, class-I and II genes are more highly expressed and cannot routinely rely on GroEL to rectify folding errors. Yet class-I/II proteins are clearly liable to misfold as testified by their sporadic association with GroEL. We argue that augmenting GroEL's capacity to address the misfolding propensity of these genes would be prohibitively costly to the organism and that, as an alternative strategy, these genes employ optimal codons to reduce the rate of misfolding error.
Our findings (a) reveal a cis–trans interaction between codon usage and chaperones in providing an integrated error management system, (b) provide independent evidence for a role of misfolding in shaping gene evolution and (c) suggest that the burden of deleterious mutations in long-term bottlenecking populations like that of the insect endosymbiont Buchnera not only comprises unfavourable amino-acid (Moran, 1996) but also synonymous substitutions.
It has recently been suggested that the use of optimal codons limits mistranslation-induced protein misfolding, yet evidence for this remains largely circumstantial. In contrast, molecular chaperones have long been recognized to play crucial roles in misfolding prevention and remedy. We propose that putative error limitation in cis can be elucidated by examining the interaction between codon usage and chaperoning processes. Using Escherichia coli as a model system, we find that codon optimality covaries with dependency on the chaperonin GroEL. Sporadic but not obligate substrates of GroEL exhibit higher average codon adaptation and are conspicuously enriched for optimal codons at structurally sensitive sites. Further, codon optimality of sporadic clients is more conserved in the E. coli clone Shigella dysenteriae. We suggest that highly expressed genes cannot routinely use GroEL for error control so that codon usage has evolved to provide complementary error limitation. These findings provide independent evidence for a role of misfolding in shaping gene evolution and highlight the need to co-characterize adaptations in cis and trans to unravel the workings of integrated molecular systems.