With synthetic gene services, molecular cloning is as easy as ordering a pizza. However, choosing the right RNA code for efficient protein production is less straightforward, more akin to deciding on the pizza toppings. The possibility to choose synonymous codons in the gene sequence has ignited a discussion that dates back 50 years: Does synonymous codon use matter? Recent studies indicate that replacing particular codons with synonymous codons can improve expression in homologous or heterologous hosts, although it is not always successful. Furthermore, it is increasingly apparent that membrane protein biogenesis can be codon-sensitive. Single synonymous codon substitutions can influence mRNA stability, mRNA structure, translational initiation, translational elongation and even protein folding. Synonymous codon substitutions therefore need to be carefully evaluated when membrane proteins are engineered for higher production levels, and further studies are needed to fully understand how to select the codons that are optimal for higher production.
The nature of the genetic code was deciphered 50 years ago. As RNA is made of four different nucleotides, there are 64 possible codons (4 × 4 × 4 triplets), of which 61 encode the 20 different amino acids and three signal termination. Different synonymous codons can therefore encode the same amino acid. For example, serine, arginine, and leucine are each encoded by six different synonymous codons (Fig. 1).
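The degeneracy described above can be checked directly. The following is a minimal sketch that builds the standard genetic code from a compact 64-letter string (codons enumerated in UCAG order, `*` marking stop codons) and counts how many codons encode each amino acid. The variable names are our own.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 RNA codons to its one-letter amino acid (or '*').
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons per amino acid (the degeneracy of the code).
degeneracy = Counter(CODON_TABLE.values())

for aa in "SRLMW*":
    print(aa, degeneracy[aa])
```

Serine, arginine and leucine each have six codons, whereas methionine and tryptophan have only one and therefore offer no synonymous choice.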
Synonymous codon use is not uniform. Some codons are frequently used whereas others are not; the latter are commonly referred to as rare codons. Synonymous codon use also varies between different genes and genomes [2–5], and different indices have been developed to describe this phenomenon (i.e. to distinguish frequent codons from rare codons). For instance, the Nc scale describes the use of a specific codon relative to the number of synonymous codons in a genome. Alternatively, codon usage can be described by the concentrations of the complementary tRNAs in the cell. Although these two scales correlate well [3, 7–9], more refined descriptions, such as the codon bias index (CBI) or the tRNA adaptation index (tAI), can be obtained by combining them. Finally, the codon adaptation index (CAI) computes statistical codon usage relative to codon usage in highly expressed genes as a prediction of protein expression levels. Recent findings indicate that growth conditions affect codon usage, and the kinetics of recharging tRNAs may also be important for describing codon usage [13, 14]. Clearly, it is not straightforward to develop an efficient description of synonymous codon usage.
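To make the CAI concrete, the sketch below follows its usual formulation: each codon receives a relative adaptiveness (its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon), and the CAI of a gene is the geometric mean of these values over its codons. The reference sequence here is a toy example invented for illustration, not real expression data. Skipping codons that are absent from the reference set is a simplification of ours; published implementations handle that case in other ways.

```python
import math
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
FAMILY_SIZE = Counter(CODON_TABLE.values())


def split_codons(mrna):
    """Split an in-frame mRNA string into codons, ignoring a trailing partial codon."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_mrnas):
    """w(codon) = count(codon) / count(most used synonymous codon) in the reference set."""
    counts = Counter(c for m in reference_mrnas for c in split_codons(m))
    weights = {}
    for codon, aa in CODON_TABLE.items():
        # Stop codons and single-codon amino acids (Met, Trp) carry no choice.
        if aa == "*" or FAMILY_SIZE[aa] == 1:
            continue
        family_max = max(counts[c] for c, a in CODON_TABLE.items() if a == aa)
        if family_max > 0:
            weights[codon] = counts[codon] / family_max
    return weights


def cai(mrna, weights):
    """Geometric mean of the weights of all scorable codons in the sequence."""
    # Simplification: codons with no weight or zero weight are skipped.
    logs = [math.log(weights[c]) for c in split_codons(mrna) if weights.get(c, 0) > 0]
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference set: leucine encoded three times by CUG and once by CUA.
reference = ["CUGCUGCUGCUA"]
w = relative_adaptiveness(reference)
print(cai("CUGCUG", w), cai("CUGCUA", w))
```

A gene that uses only the preferred codon scores 1, and every less preferred codon pulls the geometric mean down.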
The availability of synonymous codons means that a single protein can be encoded by a myriad of different DNA sequences. So does it matter which synonymous codon is used? In most situations synonymous codon choice is neutral; however, several studies indicate that synonymous codon changes can influence mRNA stability, mRNA structure, translational initiation, translational elongation and protein folding (reviewed in [15–18]). Thus the genetic code has the capacity to contain deeper layers of information than simply the amino acid sequence.
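The size of this sequence space follows from multiplying the number of synonymous codons at each position. The sketch below does that arithmetic for the standard genetic code; the function name and the example peptide are ours.

```python
import math
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
DEGENERACY = Counter(CODON_TABLE.values())


def count_synonymous_sequences(protein):
    """Number of distinct coding sequences that encode the given protein."""
    return math.prod(DEGENERACY[aa] for aa in protein)


# Met has one codon; Ser, Arg and Leu have six each: 1 * 6 * 6 * 6.
print(count_synonymous_sequences("MSRL"))
```

Because the count is a product over all residues, it grows exponentially with protein length, which is why exhaustive testing of synonymous variants is impractical for any real protein.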
There are numerous examples of membrane proteins whose expression levels are sensitive to synonymous codon use. For instance, a single synonymous codon change in FtsH, a membrane-bound protease in E. coli, increased the stability of mRNA structure around the ribosome-binding site and inhibited translational initiation, resulting in a considerable reduction in protein levels. In the human dopamine receptor D2, a synonymous codon change lowered the mRNA stability and caused a reduction in protein levels. Furthermore, synonymous codon changes in the E. coli outer membrane protein OmpA resulted in a 10-fold lowering of both mRNA and protein levels. Membrane protein folding can also be sensitive to synonymous codon use. A frequent-to-rare synonymous codon change in the human P-glycoprotein (an ATP-driven efflux pump) resulted in a protein with altered conformation and substrate specificity. In this study it was speculated that the synonymous codon change had affected the timing of translation and the co-translational folding of the protein.
Codon choice can also influence folding of soluble proteins. It has long been recognized that slowly translated regions can be localized downstream of protein domain boundaries and/or secondary structures, thus facilitating co-translational folding of protein domains. Clusters of rare codons (in this case defined as codons that are read by less abundant tRNAs) are also predicted to cause translational pausing at domain boundaries in the SufI protein in E. coli. When rare codons in these clusters are changed to more frequent synonymous codons, the protein folds incorrectly even though it has the same amino acid sequence. In another study, synonymous codon changes decreased the solubility of a fatty acid binding protein when expressed in E. coli. Similar effects of codon usage on protein folding have also been demonstrated in vitro (e.g. [27, 28]).
The effect of codon use on folding of membrane proteins is a poorly understood but important aspect of protein biogenesis, as it has implications for gene design and for understanding single nucleotide polymorphisms in disease states. If we are to effectively manipulate the genetic code for membrane protein production, we must first understand these deeper layers.
Codon use in membrane protein mRNAs differs from that in soluble protein mRNAs. The difference is predominantly a reflection of differences in amino acid usage, as membrane proteins are enriched in hydrophobic amino acids (i.e. F, M, I, L, V, C) [29–31]. Intriguingly, the codons for most of these hydrophobic amino acids usually contain a uracil (U) in the second position and a disproportionately higher number of U’s compared to codons for other amino acids (Fig. 1). Membrane proteins are also enriched in two hydrophilic amino acids (i.e. S and Y), whose codons also contain a disproportionately high number of U’s (Fig. 1). As a result, mRNAs encoding membrane proteins contain a high U-bias compared to mRNAs encoding soluble proteins. The U-bias phenomenon is more pronounced in bacteria than in eukaryotes and it has been speculated that it may be an evolutionary relic of an mRNA targeting pathway. In support of this hypothesis, it has been shown that mRNAs encoding two E. coli membrane proteins (LacY and Bgl) are localized to the inner membrane through the regions encoding the transmembrane helices (i.e. the regions that are most U-biased). Other membrane-protein-specific trends, such as GC-richness in the third codon position, have been noted. Whilst the U-bias and the 3rd-position-GC-richness phenomena are intriguing, the physiological relevance remains to be determined.
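The link between hydrophobic residues and uracil can be read off the standard genetic code. The sketch below lists the amino acids that have U in the second codon position and computes the mean number of U’s per codon for each amino acid. It also includes a small helper for the U fraction of an mRNA. All names are ours, and the calculation uses only the code table, not any genome data.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Amino acids that have at least one codon with U in the second position.
second_position_u = {aa for codon, aa in CODON_TABLE.items() if codon[1] == "U"}

# Mean number of U's per codon, averaged over each amino acid's synonymous codons.
u_counts = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        u_counts[aa].append(codon.count("U"))
mean_u = {aa: sum(counts) / len(counts) for aa, counts in u_counts.items()}


def u_fraction(mrna):
    """Fraction of nucleotides in an mRNA string that are U."""
    return mrna.count("U") / len(mrna) if mrna else 0.0


print(sorted(second_position_u))
for aa in "FLIMVCSY":
    print(aa, round(mean_u[aa], 2))
```

In the standard code, only F, L, I, M and V have codons with U in the second position, which is consistent with the statement that most, but not all, of the listed hydrophobic residues follow this pattern (cysteine does not).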
An interesting but poorly understood characteristic of all mRNAs is the presence of rare codon clusters, which can induce ribosomal pausing. Such clusters have been detected in membrane protein mRNAs from S. cerevisiae, E. nidulans, E. coli and B. subtilis. Our analysis of the E. coli data set indicates that there is little difference in the occurrence or location of the rare codon clusters between membrane protein mRNAs and soluble protein mRNAs. Approximately 76% of membrane protein mRNAs and 66% of soluble protein mRNAs contain at least one predicted rare codon cluster (an average of 1.54 and 1.37 per mRNA, respectively). As noted in our analysis and in earlier studies, the clusters are most often located at the 5′ end of the mRNA (Fig. 2A and [37–41]). This observation is in agreement with other experimental and bioinformatics studies, which indicate a universally conserved translation speed ramp at the 5′ end of genes [42, 43]. Such a ramp might serve to minimize ribosome collisions during the early stages of translation and increase overall translational efficiency. Previous studies have also shown that rare codon regions are more common in long proteins, as may be expected by chance alone. However, our analysis of the E. coli proteome indicated that rare codon regions occurring near the 5′ end of mRNAs are present in long and short proteins at similar frequencies (Fig. 2B).
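One simple way to predict rare codon clusters is a sliding window over the coding sequence. The sketch below is an illustration only and is not the method used for the analysis described above: the window size, the threshold and the set of rare codons are all hypothetical parameters chosen for the example.

```python
def find_rare_codon_clusters(mrna, rare_codons, window=5, min_rare=3):
    """Return (start, end) codon index ranges (end exclusive) enriched in rare codons.

    A window is flagged when it contains at least `min_rare` rare codons;
    overlapping or adjacent flagged windows are merged into one cluster.
    """
    usable = len(mrna) - len(mrna) % 3
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    is_rare = [codon in rare_codons for codon in codons]

    clusters = []
    for start in range(len(codons) - window + 1):
        if sum(is_rare[start:start + window]) >= min_rare:
            end = start + window
            if clusters and start <= clusters[-1][1]:
                # Extend the previous cluster instead of opening a new one.
                clusters[-1] = (clusters[-1][0], end)
            else:
                clusters.append((start, end))
    return clusters


# Hypothetical rare-codon set and toy mRNA with three rare codons near the 5' end.
rare = {"AGG", "AGA", "CUA"}
toy_mrna = "AUG" + "AGG" + "AGA" + "GCU" + "CUA" + "GCU" * 10
print(find_rare_codon_clusters(toy_mrna, rare))
```

In practice the rare-codon set would be derived from a codon usage table or tRNA abundance data for the organism of interest, and the window parameters would need to be calibrated against experimental pausing data.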
Whilst the bioinformatics analyses indicate that there should be instances of ribosome pausing during translation of many membrane proteins, pausing has to the best of our knowledge only been experimentally demonstrated outside the ‘5′ ramp’, viz. for the chloroplast CFo-1 subunit of the ATP synthase and the D1 subunit of Photosystem II [45, 46]. Given the scarcity of experimental examples it is difficult to fully understand how important pausing might be for membrane protein biogenesis. For the D1 subunit it was hypothesized that the pause was important for co-translational insertion of co-factors and therefore for correct folding of the protein. Furthermore (as mentioned above), a single synonymous codon change was sufficient to alter folding of P-glycoprotein. More experimental work is therefore required to tease out the sequence characteristics (or molecular code) in mRNA that govern the translation rate, and to determine the role that ribosomal pauses play in membrane protein folding.
One observation that may guide experimentation is that rare codon clusters are often located 45 or 70 codons downstream of a transmembrane spanning helix in S. cerevisiae [35, 36]. Since the ribosome exit tunnel can accommodate 30–72 amino acids (depending on secondary structure of the nascent polypeptide) [47, 48], it is hypothesized that the pause would often occur as a transmembrane helix is leaving the ribosome exit tunnel or the translocon (Fig. 3). One can speculate that increasing the time spent by a transmembrane helix in the translocon might influence (i) how efficiently it partitions into the surrounding membrane, (ii) how efficiently it interacts with more N-terminally located transmembrane helices, or (iii) how efficiently it is glycosylated on the regions flanking the transmembrane domain. In support of the last point, it has been shown that efficient glycosylation of tyrosinase (a type I membrane glycoprotein) is sensitive to the translation rate.
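The arithmetic behind this hypothesis can be written out as a short sketch. It assumes, for illustration only, that the distance is counted from the C-terminal end of the helix to the codon being decoded when the ribosome pauses, and it uses the 30–72 residue tunnel capacity quoted above as default bounds. The function name and category labels are ours.

```python
def helix_position_at_pause(codons_downstream, tunnel_min=30, tunnel_max=72):
    """Classify where the C-terminal end of a helix sits when the ribosome pauses.

    codons_downstream: codons between the end of the helix and the pause site,
    assumed equal to the number of residues synthesized after the helix.
    tunnel_min, tunnel_max: residues the exit tunnel can hold, depending on
    how compact the nascent chain is.
    """
    if codons_downstream < tunnel_min:
        return "inside the exit tunnel"
    if codons_downstream > tunnel_max:
        return "beyond the exit tunnel"
    return "at or near the tunnel exit, depending on nascent-chain compaction"


for distance in (45, 70):
    print(distance, helix_position_at_pause(distance))
```

Both reported distances fall within the capacity range of the tunnel, so whether the helix has emerged at the time of the pause depends on how compact the nascent chain is.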
Determining the structures of biomedically important proteins is central to understanding and modulating function. A first step towards this goal is to obtain milligram amounts of folded proteins for structural studies using X-ray crystallography, NMR and EM, as well as for biochemical and biophysical analysis. This is not a trivial process for membrane proteins as they are difficult to overexpress. A die-hard assumption is that rare codons cause low expression during overexpression, and that optimizing codon usage will improve production levels. However, a growing number of reports challenge this simplistic view [15, 50–52]. In the following section we have tried to make sense of the somewhat confusing reports relating codon use to membrane protein production. What have we learned?
Not surprisingly, there are reports that synonymous codon changes can influence membrane protein overexpression levels. A 6- to 9-fold increase in expression was observed when genes for the GluClα and GluClβ ion channels from C. elegans were codon optimized and expressed in E18 rat hippocampal neurons. Likewise, two G-protein coupled receptors were produced after codon-optimization [54, 55], although one of them appeared to be misfolded and ended up in inclusion bodies. Moreover, a recent multi-gene study reported an increase in expression success rate (from 39 to 50%) when 28 membrane proteins were optimized by multi-parameter gene optimization. In the study, codon quality, GC-content, sequence motifs and probability to form stable mRNA secondary structures were all concomitantly optimized. Similar multi-parameter codon optimization algorithms have been described elsewhere [14, 52, 57, 58]. The algorithm of Fath and co-workers was capable of improving the expression of 12 out of 14 membrane proteins, but only by 1–3 fold. An alternative strategy to codon optimization is to supplement the host organism with rare tRNAs [50, 59, 60].
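For orientation, the simplest form of codon optimization replaces every codon with a single preferred codon for that amino acid. The sketch below shows this naive scheme together with a GC-content calculation, one of the additional parameters mentioned above. It is not the multi-parameter algorithm of any of the cited studies, and the preferred-codon table is a hypothetical example covering only four amino acids.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical "one amino acid, one codon" table for a small example peptide.
PREFERRED = {"M": "AUG", "L": "CUG", "S": "AGC", "K": "AAA"}


def back_translate(protein, preferred):
    """Encode a protein using a single preferred codon per amino acid."""
    return "".join(preferred[aa] for aa in protein)


def translate(mrna):
    """Translate an in-frame mRNA string with the standard genetic code."""
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna), 3))


def gc_content(mrna):
    """Fraction of G and C nucleotides in the sequence."""
    return (mrna.count("G") + mrna.count("C")) / len(mrna)


optimized = back_translate("MLSK", PREFERRED)
print(optimized, translate(optimized), round(gc_content(optimized), 3))
```

A scheme like this changes GC content and removes every rare codon as a side effect, which is one reason why multi-parameter approaches score several sequence properties together rather than codon frequency alone.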
Whilst there are numerous examples where codon-engineering strategies have been effective for overexpression of membrane proteins [53–56, 61, 62], there are also several examples where they have failed [63, 64]. Significantly, analyses of large data sets have failed to find a correlation between codon usage and overexpression levels of membrane proteins in E. coli and S. cerevisiae [65, 66]. This conclusion was corroborated by Kudla et al., who were unable to find a correlation between codon use and overexpression levels in a library of synonymous GFP variants [15–18]. These examples indicate that there is still much to learn about codon optimization of membrane proteins.
One way to further our understanding is to analyze both successful and failed experiments. Unfortunately, the failed experiments are rarely published. In our laboratory, four membrane proteins that had been optimized by different commercial multi-parameter algorithms exhibited little or no improvement in overexpression in E. coli. In agreement with this observation, 8 out of 10 codon-optimized variants of a membrane transporter did not express better than the native construct in S. cerevisiae (David Drew, personal communication). Furthermore, the two that did express aggregated during purification. These observations, although under-represented in the literature, reinforce the point that rare codons may serve important roles and cannot per se be regarded as non-optimal (for recent reviews on this particular topic see [5, 51, 52]). The take-home message is that the deeper layers of the RNA code and the species-related differences have not been systematically studied, and are not yet well enough understood, to be effectively exploited for production of membrane proteins.
Adding nucleotide extensions to the gene-of-interest (i.e. non-coding leader sequences, protein-coding leader sequences, whole genes) is a generic solution that can improve the mRNA characteristics for membrane protein overexpression. For example, a translational fusion between GFP and a membrane subunit of the ATP synthase stabilized the mRNA, eliminated toxic effects and resulted in high-yield overexpression. Similarly, a 28-codon tag fused to the N-termini of a library of poorly expressed GFP codon-variants normalized expression to a high level. In our hands the same 28-codon tag was able to stimulate overexpression levels for approximately 30% of the membrane proteins tested in E. coli (Nørholm, von Heijne and Daley, manuscript in preparation).
Gene sequences are shaped by evolution and are rarely optimized for translational efficiency. High protein production is a need dictated by biotechnology, whereas nature most likely requires minimization of resources. In this article we have presented examples where manipulation of the genetic code has led to higher production of functionally active membrane proteins in heterologous hosts, and other examples where codon optimization has had no effect, or has led to misfolded and unstable products. Clearly there is still a lot to learn about codon use if we are to manipulate it for protein production. In membrane proteins, the relevance of mRNA structures and rare codon clusters has not been thoroughly explored. Whilst it seems reasonable to suggest that they encode programmed translational pauses, there has been no systematic experimental analysis of their effect on translation rates and protein biogenesis. In the never-ending quest for higher production levels of membrane proteins, these deeper layers need to be kept in mind.
There is a tremendous amount of literature on codon usage, and we would like to apologize to those whose work we could not cite. We would like to thank Shashi Bhushan for help with Fig. 3 and we acknowledge support from the NIH (R01 GM081827-01) and the Swedish Research Council. SL is supported through the Bioinformatics for Life Science platform.