This multi-gene study evaluates the influence of sequence optimization on the expression level of human genes in E. coli. Unlike previous reports describing individual expression studies,11–16 we analyzed 94 genes representing various protein classes. The genes were optimized using a standardized multi-parameter algorithm, and their expression was analyzed side by side with that of their wt counterparts. We expressed full-length proteins without systematically screening deletions, domain boundaries, tag positions or expression conditions. Such methods can increase the amount of expressed soluble protein22 and may be required when producing a single protein domain in high amounts,23 as shown here by high-level expression of TLR2 and Jak2 subdomains (Supporting Information Fig. 5). Besides optimization, de novo gene synthesis provides the accurate gene sequence, whereas only one third of the wt genes obtained from commercial sources showed the correct and complete DNA sequence compared to the corresponding EntrezGene entry. Owing to the freedom of sequence design, 99 of 100 sequence-optimized genes were successfully synthesized, whereas 6 wt genes failed. We developed an expression vector which contains a cleavable N-terminal His-tag and a tightly regulated T7 promoter. A fluorescence-based method for reliable protein quantification using the fluorescent dye Chromeo P503 was applied to quantify in vivo protein expression with a fluorescent imaging system. Membrane proteins and a few weakly expressed proteins were detected by Western blotting using a fluorescence-labeled primary antibody24 directed against the His tag.
In our opinion, the use of rationally designed synthetic genes has several advantages over wt cDNAs: (i) 99% in contrast to 34% of the desired constructs were readily available with a reliable, proven sequence, (ii) the chance to achieve expression of a protein was elevated from 69 to 78% in our study using sequence-optimized constructs, and (iii) the level of expression was enhanced 3.5-fold on average. Using sequence-optimized genes, we observed a success rate of 70%, defined as enhanced expression compared to the wt gene. This observation closely matches results previously reported for the expression of 30 sequence-optimized human short-chain dehydrogenase/reductase genes.11
Cells grew to higher densities when expressing sequence-optimized genes, probably because heterologous transcripts were translated more efficiently, which might accelerate cell growth in general. As postulated by Kudla et al., an mRNA transcribed from an optimized gene might sequester fewer ribosomes, resulting in smoother translation, which could increase total cellular protein synthesis and thus cell growth.25
In cases where optimized genes were expressed at lower levels than wt genes, OD values of the bacterial cultures also lagged behind. A negative influence of the accumulating protein within the bacterial cell, especially when using autoinduction medium, might be one explanation for the higher expression level measured for 19 wt genes (wt > opt). In three out of ten cases tested (kinase FYN and transcription factors NFκBIA and GATA1), expression level analysis after a shorter induction period showed an inverse result (opt > wt; data not shown). In addition to in vivo expression, all constructs were analyzed in an E. coli cell-free expression system (Supporting Information Table 4). Similar to the in vivo results, cell-free expression levels of wt genes were higher in 13 out of these 19 cases. With 5 out of 15 genes in vivo and 6 out of 15 in the cell-free expression system, this finding was most pronounced within the group of transcription factors. This might be explained by disturbing effects of these foreign DNA-binding proteins on the transcriptional regulation of the host cell machinery. Even though it is commonly believed that proteins toxic to the living cell can generally be expressed in the corresponding cell-free lysate,26 cellular functions such as transcription are clearly also required for in vitro transcription/translation systems. Whether DNA binding or other effects are responsible for the failure to express certain proteins such as transcription factors will have to be elucidated by more selective studies. Assuming that homologous expression of such proteins does not disturb the physiology of the host, or at least not as severely, the toxicity hypothesis is indirectly supported by the fact that in a similar study conducted in eukaryotic cells, expression of all transcription factors could be increased by gene optimization (Stephan Fath, Ralf Wagner et al., manuscript in preparation).
We observed a good correlation between cell-free and in vivo expression in E. coli, as has been reported in other studies,27,28 and confirm that cell-free expression is a convenient screening tool for small-scale protein expression.
We did not check the solubility of all proteins expressed from sequence-optimized constructs, but 34.2% of those tested could be purified under native conditions and an additional 30% could be purified under denaturing conditions. This degree of solubility is slightly higher than described in other studies.29 The exact yield was quantified for 24 proteins and varied between 1.2 and 80 mg/L bacterial culture. The expression ratios determined in crude cell lysates and the yield ratios calculated from amounts of purified protein show a good correlation. Therefore, sequence optimization results not only in enhanced expression levels in crude cell lysates but also in elevated amounts of protein that can be purified. Beyond that, the solubility of protein expressed from sequence-optimized constructs does not differ from that of protein derived from wt sequences.
It has been postulated that although heterologous expression can be improved by altering the codon preference, the effect can generally be achieved by introducing rare codon tRNAs into the host cell.11 Our results clearly show that an adapted codon bias is only one parameter which contributes to an enhanced expression level using sequence-optimized genes. In all cases analyzed here and in a study reported previously,4 the expression level of wt genes in bacterial strains supplemented with rare tRNAs was exceeded significantly by using sequence-optimized counterparts in nonsupplemented strains. This might be in accordance with the finding that simply choosing the codons most frequently used by an expression host will not ensure protein expression. Instead, the use of codons read by the tRNAs that are most efficiently recharged during translation seems to be important in situations of amino acid starvation.30 Recently, it was reported that the use of RosettaDE3 strains leads to improved purity of purified protein rather than to a great enhancement of protein expression levels.31
The mRNA stability at the 5′ terminus has an influence on the expression level of heterologously expressed genes in E. coli. A lower amount of free energy corresponds to weaker hairpins in the 5′ region of mRNAs and therefore enhances translation initiation efficiency.25,32 One of our algorithm parameters aims at avoiding hairpin-forming inverted repeats in the 5′ region, but we did not observe a correlation between lower ΔG values and enhanced expression levels (Supporting Information Table 3). However, all of our constructs contained a 24-nucleotide His tag in the 5′ region, which is known to elevate recombinant protein expression.27,33,34 We believe that this leader sequence attenuates mRNA secondary structure formation and allows efficient initiation of translation. Therefore, at least in the case of N-terminally tagged proteins, other factors discussed in this study seem to play a more important role than a low amount of free energy in the 5′ region.
Our data show that the mRNA amount clearly correlates with protein expression levels in the case of LCK and CK1. Factors like a prolonged half-life or a reduced sensitivity to RNA nucleases might explain elevated mRNA levels. At this point it is probably fair to state that, compared to the situation in mammalian expression systems, the impact of mRNA optimization in E. coli is not as well understood. Certainly, however, effects during translation elongation can also influence protein expression: the ratio of maximum mRNA levels of optimized to wt for CK1 is 36.8 whereas the protein ratio is only 3.9, and the ratios for LCK are 0.26 (mRNA) and 0.41 (protein). These data show that, despite the high success rate in increasing yields of full-length human proteins, details of the molecular mechanisms underlying translational control remain to be elucidated. The fact that 20% of the gene sequences calculated by our algorithm resulted in lower expression compared to the wt cDNA suggests a potential for further improvement in gene design for heterologous protein expression.
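The gap between the mRNA and protein ratios quoted above can be made explicit with a short calculation. The following Python sketch is an illustration only and was not part of the analysis reported here; the function names are hypothetical. It divides the optimized-to-wt protein ratio by the corresponding mRNA ratio to estimate how much protein is obtained per unit of transcript, and it checks whether mRNA and protein change in the same direction.

```python
# Optimized-to-wt ratios quoted in the text (maximum mRNA level and protein level).
RATIOS = {
    "CK1": {"mrna": 36.8, "protein": 3.9},
    "LCK": {"mrna": 0.26, "protein": 0.41},
}


def protein_per_transcript(mrna_ratio, protein_ratio):
    """Relative protein output per unit of mRNA (optimized vs. wt).

    A value below 1 means the protein gain is smaller than the mRNA gain.
    """
    return protein_ratio / mrna_ratio


def same_direction(mrna_ratio, protein_ratio):
    """True if mRNA and protein levels both rise or both fall (ratio of 1 = no change)."""
    return (mrna_ratio > 1.0) == (protein_ratio > 1.0)


def summarize(ratios):
    """Return both derived quantities for every gene in the input mapping."""
    summary = {}
    for gene, r in ratios.items():
        summary[gene] = {
            "protein_per_transcript": protein_per_transcript(r["mrna"], r["protein"]),
            "same_direction": same_direction(r["mrna"], r["protein"]),
        }
    return summary


if __name__ == "__main__":
    for gene, s in summarize(RATIOS).items():
        print(f"{gene}: protein per transcript = {s['protein_per_transcript']:.2f}, "
              f"same direction = {s['same_direction']}")
```

For both genes the mRNA and protein levels move in the same direction, but the magnitudes differ, which is consistent with the statement that steps after transcription also shape the final protein level.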
In summary, this multi-gene study supports the conclusion that an improved codon usage is not the only parameter which has to be considered when using an expression system like E. coli for heterologous protein production. In fact, an optimization strategy has to provide a balance of (i) an adapted codon choice, (ii) a balanced GC-content, (iii) avoidance of sequence repeats and other DNA motifs, and (iv) the avoidance of mRNA secondary structures, especially at the translation initiation region, all of which are accounted for in our algorithm. In our view, the gene redesign concept can be regarded as the method of choice for expression of recombinant proteins, since it not only guarantees the availability of an expression construct of correct sequence but also significantly increases the success rate, that is, the expression rate. This applies to heterologous expression of human genes in E. coli and may also hold true for homologous expression of human genes in human (HEK293) cells as well as for expression in insect cells (Stephan Fath, Ralf Wagner et al., manuscript in preparation). As the improved success rate applies to high-level production of full-length proteins, the concept promises to facilitate investigation of multi-protein complexes, an important future goal in biochemistry.
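The four sequence properties named above can be illustrated with a few simple per-sequence metrics. The following Python sketch is not the algorithm used in this study, whose details are not given here. All function names, the stem length, the minimum loop size and the example rare codon set are assumptions chosen for illustration, and a real design tool would weigh such metrics against each other rather than report them separately.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def has_inverted_repeat(seq, stem=6, min_loop=3):
    """True if a stem-length word is followed, after at least min_loop bases,
    by its reverse complement, i.e. the sequence could form a hairpin.

    This is a crude proxy for secondary structure; it computes no free energy.
    """
    seq = seq.upper()
    for i in range(len(seq) - stem + 1):
        partner = reverse_complement(seq[i:i + stem])
        if seq.find(partner, i + stem + min_loop) != -1:
            return True
    return False


def rare_codon_fraction(seq, rare_codons):
    """Fraction of in-frame codons that belong to the supplied rare codon set."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    return sum(codon in rare_codons for codon in codons) / len(codons)


if __name__ == "__main__":
    # Hypothetical toy sequence; AGG and AGA are arginine codons rarely used in E. coli.
    toy = "ATGAGGAGACTGGGGCCCAAAAGGGCCC"
    print("GC content:", round(gc_content(toy), 2))
    print("Hairpin-prone in first 30 nt:", has_inverted_repeat(toy[:30]))
    print("Rare codon fraction:", rare_codon_fraction(toy, {"AGG", "AGA"}))
```

Restricting `has_inverted_repeat` to the first few dozen nucleotides mirrors the emphasis on the translation initiation region; a free-energy calculation with a dedicated RNA folding program would be the more rigorous choice.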