Recent findings that genetic polymorphisms reflecting synonymous codon substitutions are not “silent”, and are implicated in the development of various disease states mediated through splicing defects, refute the long-held dogma that synonymous mutations are neutral [28]–[30]. Even a single synonymous codon substitution within a coding region can lead to proteins with altered substrate specificities [2] or enzymatic activities [1], indicating significant changes in protein structure. Thus, it is possible that subtle modulation of nucleotide sequence also serves to regulate protein structure and function, and that such sequences have experienced evolutionary pressure to produce fully functional proteins. Clearly, the present findings portend problems in applied biotechnology for heterologous expression of proteins from distantly related organisms with disparate codon bias. Consequently, we have developed an algorithm that recodes the target gene so that the tRNA isoacceptor availability encountered in the natural host is approximated in the expression host; this may be needed to provide optimal translational kinetics in the expression host.
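The idea of matching codon usage between hosts can be illustrated with a minimal sketch. It assumes one plausible reading of the approach: each codon is replaced by the synonymous codon whose usage fraction in the expression host is closest to the usage fraction of the original codon in the natural host. The function name `harmonize`, the table layout, and all usage values are hypothetical placeholders for illustration; they are not measured data and not necessarily the procedure used in this work.

```python
# Toy codon-usage tables: amino acid -> {codon: fraction among its synonyms}.
# All numbers are illustrative placeholders, NOT measured usage data.
NATIVE_USAGE = {
    "I": {"ATT": 0.50, "ATC": 0.10, "ATA": 0.40},
    "K": {"AAA": 0.80, "AAG": 0.20},
    "N": {"AAT": 0.85, "AAC": 0.15},
}
HOST_USAGE = {
    "I": {"ATT": 0.50, "ATC": 0.40, "ATA": 0.10},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "N": {"AAT": 0.45, "AAC": 0.55},
}


def codon_to_amino_acid(usage):
    """Invert a usage table into a codon -> amino acid lookup."""
    return {codon: aa for aa, codons in usage.items() for codon in codons}


def harmonize(seq, native, host):
    """Recode seq so each codon's usage fraction in the expression host
    is as close as possible to its usage fraction in the natural host."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    lookup = codon_to_amino_acid(native)
    recoded = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3].upper()
        aa = lookup[codon]            # KeyError if codon is not in the toy table
        target = native[aa][codon]    # usage fraction in the natural host
        # Pick the synonymous host codon with the nearest usage fraction;
        # sorting first makes ties resolve deterministically.
        best = min(sorted(host[aa]), key=lambda c: abs(host[aa][c] - target))
        recoded.append(best)
    return "".join(recoded)


def translate(seq, usage):
    """Translate seq using only the amino acids present in the toy table."""
    lookup = codon_to_amino_acid(usage)
    return "".join(lookup[seq[i:i + 3]] for i in range(0, len(seq), 3))
```

For example, `harmonize("ATAAAAAAC", NATIVE_USAGE, HOST_USAGE)` recodes the rare-in-host codon ATA while leaving the encoded peptide unchanged. A real application would need complete usage tables for both organisms and would probably treat link/end segments separately from the rest of the gene.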
The basic concepts underlying the “codon harmonization” algorithm are derived from evidence provided by Thanaraj et al., who analyzed codon usage frequency patterns from E. coli proteins of known structure [22]. They showed empirically that low-frequency (rare) codons tend to cluster within the regions of mRNA that encode the link/end segments separating elements of higher-order structure. These segments are approximately fifteen residues long and are encoded by clusters of infrequently used codons [22], [31] that are separated by 1–10 codons [23]. As few as two consecutive infrequently used codons can reduce the steady-state density of ribosomes on mRNA [32]. Slowing ribosomal transit through such regions may allow a structural element to be translated and to acquire ordered structure concurrently, with folding completed before synthesis of the next element begins.
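The observation that as few as two consecutive infrequently used codons can matter suggests a simple scan for such runs. The following is a minimal sketch under stated assumptions: the cutoff `RARE_THRESHOLD`, the table `TOY_USAGE`, and the function name `rare_codon_runs` are hypothetical, and the usage values are placeholders rather than measured frequencies.

```python
# Hypothetical cutoff below which a codon is treated as "rare".
RARE_THRESHOLD = 0.15

# Toy codon -> usage fraction table; values are placeholders, NOT real data.
TOY_USAGE = {
    "ATT": 0.50, "ATC": 0.40, "ATA": 0.10,
    "AAA": 0.75, "AAG": 0.25,
    "AGA": 0.05, "CGT": 0.40,
}


def rare_codon_runs(seq, usage, threshold=RARE_THRESHOLD, min_run=2):
    """Return (start_codon_index, run_length) for every run of at least
    min_run consecutive codons whose usage falls below threshold."""
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i + 3].upper() for i in range(0, usable, 3)]
    runs = []
    start = None
    # The None sentinel closes a run that reaches the end of the sequence.
    for idx, codon in enumerate(codons + [None]):
        is_rare = codon is not None and usage[codon] < threshold
        if is_rare and start is None:
            start = idx
        elif not is_rare and start is not None:
            if idx - start >= min_run:
                runs.append((start, idx - start))
            start = None
    return runs
```

Isolated rare codons are ignored by design; only runs of `min_run` or more are reported, which mirrors the "two consecutive codons" observation rather than any specific published scoring scheme.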
Studies of the prokaryotic ribosomal tunnel during protein synthesis support its role as an active modulator of nascent peptide secondary structure formation [33]. High-resolution electron micrographs of the 70S ribosome from E. coli show that the 50S subunit contains a bifurcating tunnel that is 85 Å to 110 Å in length, as measured from the amino peptidyl transferase center to the exit site on the distal surface [34], [35], and that can accommodate nascent peptides of 30 to 72 amino acids, depending on their secondary structure [7]. The diameter of the tunnel is sufficiently large to accommodate an alpha helix. The exit of the tunnel at the ribosomal surface, which is 25–30 Å in diameter, appears to accommodate chaperones such as Trigger Factor [36], [37], which, as necessary [38], can interact with partially folded nascent polypeptides to promote their complete folding. This interaction may also shield the nascent polypeptides from proteolysis prior to complete structure formation as they are extruded from the tunnel [39].
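The dependence of tunnel capacity on secondary structure can be checked with a back-of-the-envelope calculation. The sketch below assumes textbook approximations for the axial rise per residue (about 3.4 Å for a fully extended chain and about 1.5 Å for an alpha helix); these values and the function name `residues_spanning` are assumptions introduced for illustration and are not taken from the cited studies.

```python
# Approximate axial rise per residue in angstroms (textbook values, assumed here).
RISE_PER_RESIDUE = {
    "extended": 3.4,      # fully extended / beta-strand-like chain
    "alpha_helix": 1.5,   # alpha helix
}


def residues_spanning(length_angstrom, conformation):
    """Estimate how many residues are needed to span a given length."""
    return length_angstrom / RISE_PER_RESIDUE[conformation]


if __name__ == "__main__":
    for length in (85, 110):
        for conformation in RISE_PER_RESIDUE:
            n = residues_spanning(length, conformation)
            print(f"{length} A, {conformation}: about {n:.0f} residues")
```

Under these assumptions, an extended chain needs roughly 25 to 32 residues to span 85 to 110 Å, and a fully helical chain roughly 57 to 73, which is of the same order as the 30 to 72 residue range cited above.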
The benefits of this rational process of codon substitution are most dramatically shown by our limited mutagenesis approach, in which a single targeted synonymous codon replacement was made for I141 within the sequence encoding a putative link/end segment of the MSP1₄₂ (FVO) protein. This single base change, which produced the FMP003 protein, increased yields of soluble product to approximately 70 µg protein/g of wet cell paste, at least ten-fold more than was achieved with the native sequence. The MSP1₄₂ FMP003 antigen was subsequently produced under GMP conditions and shown to be highly immunogenic and efficacious against malaria challenge in an Aotus monkey study [25]. However, the yield of FMP003 protein was too low to be of practical use for vaccine development. We therefore decided to “harmonize” codons throughout the entire gene sequence for MSP1₄₂ (FVO), producing FMP010, and obtained a sixty-fold increase in expression over the level detected for FMP003.
Expression levels for soluble protein from the codon-harmonized MSP1₄₂ 3D7.2 and MSP1₄₂ Camp.2 genes equaled the levels produced for FMP010. Our successes with the -FVO and -Camp alleles are notable, as we detected no recombinant protein when the native P. falciparum gene sequences were used for E. coli expression. Thus, this approach has overcome a practical barrier, and recombinant proteins for these three genes are currently being evaluated in pre-clinical studies to determine their vaccine potential.
In addition to showing that the FMP003 protein produced a strong protective effect against malaria in vaccinated monkeys [25], we observed that FMP010 induced antibodies that inhibit malaria parasite growth in vitro at levels comparable with those observed for FMP003 (data not shown). Such antibodies are known to be directed to important conformational epitopes in the antigen [40], [41]. Details of the pre-clinical evaluation of FMP010 will be described elsewhere.
The improved expression of soluble protein was not a consequence of simply changing the G/C ratio in the P. falciparum target genes. As we show here with the LSA-NRC E gene fragment, codon optimization (synonymous substitution of high-frequency codons throughout the gene) for expression in E. coli yielded very little protein. Codon harmonization rectified this problem by preventing plasmid loss during exponential growth, which suggests that the LSA-NRC E expression product induced a deleterious host cell response. The MSP1₄₂ proteins from the FVO and 3D7 strains of P. falciparum have been expressed at high levels in E. coli from codon-optimized genes, but these proteins were insoluble and required refolding in vitro [42], [43].
Codon harmonization appears to offer excellent prospects for the design and expression of heterologous proteins, at least in E. coli; whether it will be useful for other expression hosts remains to be determined. If such adjustments for relative codon usage can improve the reliability of functional protein expression, this approach may represent a paradigm shift for heterologous protein expression, with important consequences for both structural biology and biotechnology. However, one may anticipate that control mechanisms other than the availability of tRNA isoacceptor molecules can also affect co-translational folding under different growth conditions or at different stages in the cell cycle. The studies presented here underscore the importance of continuing to seek general solutions to problems of heterologous protein expression. Advances based on the integration of proteomic and genomic analysis may not be fully realized until the target genes are matched to the synthetic potential of the expression organism; failing to achieve such a balance may leave many potential vaccine and biopharmaceutical products undiscovered.