The CAI has been used to provide a crude indication of the ‘favorability’ of a coding sequence towards protein expression in an organism of choice (9). In E.coli, highly expressed genes such as those encoding ribosomal proteins typically have CAI values ≥0.60, while poorly expressed genes have much smaller adaptation indices. The human cDNA sequences encoding PKB2, S6K1 and PDK1 (Figs 1S–3S) have slightly lower CAI values of 0.45, 0.50 and 0.55, respectively, suggesting that these genes could be further optimized for high-level protein expression in E.coli. The synthetic gene sequences of human PKB2, S6K1 and PDK1 obtained after codon optimization (Figs 1S–3S) have increased CAI values of 0.60, 0.66 and 0.69, similar to that of highly expressed genes in E.coli. In addition, the synthetic gene sequences were further engineered so as not to contain any recognition sites of restriction enzymes selected for cloning purposes. The synthetic gene sequences of human PKB2, S6K1 and PDK1 were used for the simultaneous design of oligonucleotide primer sets for both the TBC and the TBIO methods of PCR-based gene synthesis (Tables 1S–3S).
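As a rough illustration of how a CAI value is obtained, the sketch below computes the index as the geometric mean of the relative adaptiveness of each codon, which is the standard definition of the CAI. The codon counts in `TOY_REFERENCE_COUNTS` are invented placeholders covering only three amino acids and are not E.coli data, and the function names are hypothetical; a real calculation would use codon counts from a reference set of highly expressed genes.

```python
import math

# Hypothetical codon counts from a "highly expressed" reference set.
# These numbers are placeholders for illustration, not real E.coli data.
TOY_REFERENCE_COUNTS = {
    "K": {"AAA": 80, "AAG": 20},
    "E": {"GAA": 70, "GAG": 30},
    "L": {"CTG": 90, "CTC": 10, "TTA": 9},
}


def relative_adaptiveness(reference_counts):
    """Map each codon to w = count / max count among its synonymous codons."""
    weights = {}
    for codon_counts in reference_counts.values():
        best = max(codon_counts.values())
        for codon, count in codon_counts.items():
            weights[codon] = count / best
    return weights


def cai(coding_sequence, weights):
    """Geometric mean of w over the codons that appear in the weight table."""
    codons = [coding_sequence[i:i + 3]
              for i in range(0, len(coding_sequence) - 2, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


WEIGHTS = relative_adaptiveness(TOY_REFERENCE_COUNTS)
optimal = cai("AAAGAACTG", WEIGHTS)     # every codon is the preferred synonym
suboptimal = cai("AAGGAGCTC", WEIGHTS)  # every codon is a rarer synonym
```

With this toy table, a sequence built only from preferred codons scores 1.0, and replacing each codon with a rarer synonym lowers the score, which mirrors the direction of change reported for the codon-optimized genes.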
Design of primers for the TBC and TBIO methods of PCR-based gene synthesis
Computer programs that are currently available to automate the design of temperature-optimized overlapping primers often fail to obtain homogeneous Tm values for primer overlap regions. For example, the DNAWorks program obtained average Tm values ranging from 52 to 67°C for 11 different synthetic gene sequences (5). In addition, the range between the minimum and maximum Tm values about each of these average Tm values varied from 5 to 18°C. The wide variation in Tm values obtained by computer optimization resulted primarily from the design of short oligonucleotide primers (30–50 nt) for gene sequences that contained regions of widely varying GC% content. In order to avoid bias in the comparison of efficiency between the conventional TBC and the novel TBIO methods of PCR-based gene synthesis, a procedure was developed in which TBC and TBIO primer sets (Tables 1S–3S) were generated for each gene sequence using identical overlapping regions with a highly restricted optimized Tm value of 64 ± 2°C, which is recommended for the high-fidelity KOD proofreading pfu polymerase (Novagen). In this paper, the Vector NTI computer program (Informax, Inc.) was used to calculate Tm values during selection of the temperature-optimized overlapping nucleotide regions. However, it has come to our attention that computer programs have become available, such as HyTher™ (Peyret and SantaLucia, Wayne State University, Detroit, MI) and Visual OMP2® (DNA Software, Ann Arbor, MI), which offer much improved Tm accuracy and also provide improved models for predicting mis-hybridization and magnesium dependence.
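The selection of an overlap whose Tm falls inside the 64 ± 2°C window can be sketched as a simple length scan. This is only a minimal illustration: `approx_tm` uses a crude GC-content formula chosen here as an assumption, not the Vector NTI calculation used in this work, and the function names, length limits and test sequences are hypothetical.

```python
def approx_tm(seq):
    """Crude GC-content Tm estimate in deg C (an assumed stand-in formula)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def pick_overlap(template, start, target=64.0, tolerance=2.0,
                 min_len=15, max_len=40):
    """Return the shortest overlap starting at `start` with Tm in the window.

    Returns (overlap_sequence, tm), or None if no length between min_len
    and max_len gives a Tm within target +/- tolerance.
    """
    for length in range(min_len, max_len + 1):
        candidate = template[start:start + length]
        if len(candidate) < length:
            break  # ran off the end of the template
        tm = approx_tm(candidate)
        if abs(tm - target) <= tolerance:
            return candidate, tm
    return None


gc_rich = pick_overlap("GC" * 30, 0)     # 100% GC toy template
balanced = pick_overlap("ACGT" * 20, 0)  # 50% GC toy template
at_only = pick_overlap("AT" * 30, 0)     # 0% GC toy template
```

Under this assumed formula the GC-rich toy template reaches the window with a 16 nt overlap, the 50% GC template needs 30 nt, and the AT-only template never reaches it, which is the kind of low-GC case that forces a different primer layout.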
All TBC and TBIO primers were chosen to be 60 nt in length, a size that can be readily synthesized with high fidelity and at a low cost. The probability of formation of hairpins or primer–primer pairs between 60mers can essentially be neglected, since the growing template concentration approaches the concentration of primers. The TBC primer set consisted of sense- and antisense-strand primers, which coded for regions throughout the gene sequence (Fig. A). In contrast, the TBIO primer set consisted of sense-strand primers that coded only for regions in the N-terminal half of the gene sequence and antisense-strand primers that coded only for regions in the C-terminal half of the coding sequence (Fig. A). Each TBIO primer coded for the same nucleotide region as the corresponding TBC primer, except that half of the primers were taken as the reverse complement sequences.
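The difference between the two primer sets comes down to which strand is used for each region. The sketch below assigns strands for a shared list of regions, assuming that TBC primers simply alternate sense and antisense along the gene and that TBIO primers are sense in the 5′ half and antisense in the 3′ half. The toy gene, the 0-based half-open region coordinates and the function names are hypothetical and are not taken from Tables 1S–3S.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tbc_primers(gene, regions):
    """Alternate sense (S) and antisense (A) primers along the whole gene."""
    primers = []
    for i, (start, end) in enumerate(regions):
        sense = gene[start:end]
        if i % 2 == 0:
            primers.append(("S", sense))
        else:
            primers.append(("A", reverse_complement(sense)))
    return primers


def tbio_primers(gene, regions):
    """Sense primers in the 5' half, antisense primers in the 3' half."""
    midpoint = len(gene) / 2
    primers = []
    for start, end in regions:
        sense = gene[start:end]
        if (start + end) / 2 < midpoint:
            primers.append(("S", sense))
        else:
            primers.append(("A", reverse_complement(sense)))
    return primers


# Toy 27 nt "gene" with four overlapping 9 nt regions (hypothetical values).
TOY_GENE = "ATGGCCAAGCTTGGATCCGAATTCTAA"
TOY_REGIONS = [(0, 9), (6, 15), (12, 21), (18, 27)]
tbc = tbc_primers(TOY_GENE, TOY_REGIONS)
tbio = tbio_primers(TOY_GENE, TOY_REGIONS)
```

Both sets cover identical regions; only the strand labels differ (S, A, S, A for the TBC sketch versus S, S, A, A for the TBIO sketch).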
For both the TBC and TBIO primer sets (Tables 1S–3S), the number of base pairs in regions of primer overlap varied from 18 to 26 for PKB2, 19 to 33 for S6K1 and 16 to 38 for PDK1. The significant variations in GC% content across each of the gene sequences of PKB2, S6K1 and PDK1 clearly emphasize the importance of selecting primer lengths that can accommodate the large number of overlapping base pairs necessary to ensure the highly restricted optimal Tm of 64 ± 2°C. In rare instances, nucleotide regions were encountered for which the desired Tm value could not be achieved, owing to long stretches of low GC% content (e.g. C-S05 and C-A05 in PKB2, Table 1SA; C-S02 and C-A02 in S6K1, Table 2SA; and C-S10 and C-A13 in PDK1, Table 3S). For these three cases, two approaches were tested. For PKB2 and S6K1, one primer was extended beyond 60 nt and the subsequent primer was shortened in order to obtain an overlapping region that yielded the optimal Tm of 64 ± 2°C (Tables 1SA and 2SA). For PDK1, the 5′- and 3′-terminal overlapping regions were allowed to share a common region of overlap near the middle of the primer (green-colored nucleotides, Table 3S). As reported below, error-free double-stranded DNA fragments were synthesized across these indicated regions by both the TBC and TBIO methods, indicating that either approach can be utilized. Computer algorithms may be more easily adapted to the method used for PDK1 (Table 3S), which maintains a constant primer length of 60 nt but requires partial sharing of the 5′- and 3′-terminal overlapping regions.
TBC method of gene synthesis
Synthesis of sequential overlapping gene fragments. Table summarizes the results for TBC PCR-based synthesis of either the full-length sequence, two half fragments or four quarter fragments of the synthetic genes for PKB2, S6K1 and PDK1. For the assembly reaction of each gene fragment, the concentration of primers was varied at 20, 50, 100, 200 and 400 nM, and the number of PCR cycles was varied at 25, 35, 45 and 55 cycles for each concentration. For the 20 different assembly reaction conditions tested for each fragment, 0.5, 1, 3 and 5 µl of the assembly product were removed and used as the template for the PCR amplification reaction. The full- and half-length gene fragments of PKB2 (1494, 798 and 722 bp), S6K1 (1622, 894 and 748 bp) and PDK1 (1712, 866 and 866 bp) could not be obtained under any of the reaction conditions. In addition, the assembly reaction mixtures were subjected to agarose gel electrophoresis and the region corresponding to the correct molecular weight of the desired PCR product often showed staining, indicating the presence of correct-sized DNA. However, upon gel purification of these regions no full-length product could be PCR amplified with outside primers. In order to investigate the nature of these correct apparent molecular weight species, the gel-purified DNA species were subjected to further agarose gel electrophoresis. In such cases, the DNA species partitioned predominantly into lower molecular weight products, with a residual amount of the correct-sized species, suggesting that the higher molecular weight species consisted of non-covalent hybrids of lower molecular weight DNA. Cloning and sequencing of the gel-purified DNA indicated the presence of numerous random partial-sized fragments from throughout the gene sequence.
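The size of the condition matrix described above can be checked by enumerating it. The sketch below only restates the values given in the text; the variable names are hypothetical.

```python
from itertools import product

# Values taken from the text; names are illustrative only.
PRIMER_NM = [20, 50, 100, 200, 400]   # primer concentration in assembly PCR
CYCLES = [25, 35, 45, 55]             # number of assembly PCR cycles
TEMPLATE_UL = [0.5, 1, 3, 5]          # assembly product used as template

# Every (concentration, cycles) pair is one assembly reaction condition.
assembly_conditions = list(product(PRIMER_NM, CYCLES))

# Each assembly condition is then sampled at every template volume.
amplification_conditions = list(product(assembly_conditions, TEMPLATE_UL))
```

This gives the 20 assembly conditions mentioned in the text and, with four template volumes each, 80 amplification reactions per gene fragment.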
TBC synthesis of sequential overlapping fragments for PCR-based assembly of the codon-optimized gene sequences of PKB2, S6K1 and PDK1
In contrast to the larger full- and half-length fragments, all four quarter-length gene fragments (332–465 bp) could be obtained for each gene sequence under many of the reaction conditions tested (Table ). The reaction conditions that appeared optimal consisted of using 200 nM primers in the assembly reaction with 25–35 PCR cycles and using 0.5 µl of the assembly reaction product as the template for further PCR amplification. Nonetheless, adequate amounts of product were also obtained by using decreasing concentrations of primers with 25–35 PCR cycles and by using increasing volumes of the assembly reaction product as the template for further PCR amplification. The observation that the successful syntheses of the quarter-sized fragments were relatively insensitive to the array of PCR conditions tested suggests that further optimization of PCR conditions would be unlikely to yield full-length DNA products for the larger fragments. It is possible that the sequences in a number of specific TBC primers resulted in mis-priming events. Owing to the number of various random fragments generated by the TBC method, no specific primers could be identified. Alternatively, PCR assembly of full-length DNA sequences could be inhibited by cross-annealing reactions between partial fragments that share common nucleotide regions, as described above.
Although numerous gene sequences approaching 1000 bp have been successfully synthesized by the TBC approach (4), only gene fragments of 332–465 bp could be obtained for the synthetic gene sequences of PKB2, S6K1 and PDK1 (Table ), indicating a requirement for an efficient method of joining sequential fragments. The ability to synthesize gene sequences of defined regions can vary depending on the defined gene sequence. For synthesis of long gene sequences in which restriction enzyme digestion and ligation of multiple sequential fragments is required, it may be hard to predict optimal regions where unique restriction sites should be engineered. If a defined fragment cannot be fully synthesized, then new contiguous fragments must be engineered to contain unique restriction sites near the 5′ and 3′ termini of each new fragment. Thus, the approach of restriction enzyme digestion and ligation of sequential gene fragments generated by the TBC method has the potential to be overly laborious if pre-defined gene fragments cannot be generated. Alternatively, the TBC method can be used to generate multiple arrangements of sequential overlapping gene fragments (Table ). Then, the fragments can be joined to yield the full-length gene sequence by subsequent assembly and amplification PCRs.
PCR-based assembly of sequential overlapping gene fragments. The full-length gene sequences of PKB2, S6K1 and PDK1 can be generated by four possible combinations of joining the four quarter-sized fragments that were obtained (Table ). First, the full-length gene sequence may be obtained by simultaneous joining of all four fragments. Alternatively, the N-terminal two quarters and the C-terminal two quarters can first be joined in separate reactions, and the resulting two half-sized fragments can be joined to yield the full-length gene sequence. Finally, the quarter-sized fragments can be joined sequentially by proceeding either in the N- or C-terminal directions of the gene sequence. An array of PCR conditions was applied to all possible combinations of joining the four fragments (Table ). The concentration of gene fragments in each assembly reaction mixture was varied at 20, 50, 100 and 200 nM each. Either 0.5, 1, 3 or 5 µl of the product of the assembly reaction was diluted into a 50 µl PCR amplification reaction mixture, which contained 200 nM of the outermost 5′-sense- and 5′-antisense-strand primers.
PCR-based assembly of sequential overlapping fragments generated for the codon-optimized gene sequences of PKB2, S6K1 and PDK1 by the TBC method
Figure A shows agarose gel electrophoresis of the reaction products obtained after simultaneous PCR assembly of the four fragments generated for PKB2, S6K1 and PDK1. The results shown in Figure A were obtained using 20 nM fragments in the assembly PCR mixture, and no full-length gene sequence could be PCR amplified from these reaction mixtures. In all three cases, it appears that the primary reaction products consisted of a mixture of gene sequences in which only three of the four fragments were successfully joined. When the assembly PCRs were carried out using higher concentrations of the individual fragments, the primary reaction product consisted of a mixture of lower molecular weight species in the range expected for the quarter- and half-sized fragments, and no full-length gene sequence could be PCR amplified from these reaction mixtures. As described above for PCR synthesis of the full- and half-length fragments, the assembly reaction mixtures containing the four quarter-sized fragments were subjected to agarose gel electrophoresis (Fig. A) and the region corresponding to the correct molecular weight of the desired PCR product often showed staining (e.g. S6K1), indicating the presence of correct-sized DNA. However, upon gel purification of these regions no product could be PCR amplified. Again, the gel-purified DNA species were subjected to further agarose gel electrophoresis, and the DNA species partitioned predominantly into lower molecular weight products, with a residual amount of the correct-sized species. Thus, the higher molecular weight species consisted of a complex mixture of non-covalent hybrids of partially assembled DNA fragments.
Figure 3 (A) Simultaneous PCR assembly of the four sequential overlapping gene fragments generated by the TBC method for PKB2, S6K1 and PDK1. Agarose gel electrophoresis shows a complex mixture of partially assembled fragments (see text). No full-length gene ...
In order to determine why the full-length reaction products could not be obtained, each fragment was tested for its ability to be joined to its adjacent fragment in individual reactions. Table further illustrates the results of the various strategies for PCR-based assembly of the sequential overlapping fragments. There are three possible combinations of two quarter-sized fragments, which can yield either an N-terminal half fragment, a C-terminal half fragment or a central fragment. For PKB2, S6K1 and PDK1, the N-terminal half fragment, the C-terminal half fragment and the central fragment were obtained by joining the two respective quarter fragments. The optimal reaction conditions consisted of using 20 nM each of the two quarter-sized fragments in the assembly reaction with 30 PCR cycles and using 5 µl of the assembly reaction product as the template for further PCR amplification with the two outermost primers. The assembly reaction was inhibited by using higher concentrations of the two quarter-sized fragments. Although the N- and C-terminal half fragments were successfully obtained, the full-length product could not be obtained by joining the two half-sized fragments under any of the PCR conditions tested (Table ). Finally, the N- and C-terminal half-sized fragments were tested for their ability to be sequentially joined in individual reactions to the remaining two quarter-sized fragments. For PKB2, S6K1 and PDK1, the N- and the C-terminal half fragments could be joined to the adjacent third quarter-sized fragment by using the same conditions that yielded joining of the two quarter-sized fragments. However, in no case could the resulting three-quarter-length fragments be joined to the final quarter-sized fragment to yield the full-length gene sequence under any of the PCR conditions tested (Table ).
The results in Table suggest that successful PCR assembly of overlapping DNA fragments could be sensitive to the size of the fragments being joined. For all three gene sequences, the quarter-sized fragments (≤465 bp) could be joined either to another quarter-sized fragment or to a half-sized fragment (≤894 bp), but they could not be joined to the three-quarter-length fragments (≥1105 bp). In addition, the half-sized fragments (≥722 bp) could not be joined together. The results of these individual assembly PCRs are consistent with the results shown in Figure A in which the primary reaction products consisted of a mixture of gene sequences in which only three of the four fragments were successfully joined (Fig. B). The staining observed for higher molecular weight DNA resulted from non-covalent association between smaller fragments and overlapping regions. From these exhaustive studies, we suggest the following approach for more efficient TBC synthesis of long gene sequences. First, synthesize sequential overlapping gene fragments of 300–500 bp. Then, assemble three sequential overlapping gene fragments by PCR to yield fragments of 900–1500 bp. For longer gene sequences, it is preferable to perform restriction digestion and ligation of the PCR-assembled gene fragments. Owing to the unpredictable nature of generating defined gene fragments with site-specific restriction sites, we developed the following novel TBIO method of PCR-based gene synthesis, which is shown to generate long gene sequences without the necessity of restriction digestion and ligation.
TBIO method of gene synthesis
Optimization of primer conditions. Since TBIO synthesis involves systematic bidirectional elongation, the concentration of the inside to outside primers can be adjusted to yield the fully amplified DNA product in one PCR. The concentrations of primer pairs can be arranged either to be identical or to be in an increasing gradient from inside to outside. Therefore, the number of primer pairs, the concentrations of individual primer pairs, and the total concentration of all primers were tested in order to optimize the yield and purity of specific DNA products obtained by the TBIO method.
Figure B shows agarose gel electrophoresis of the DNA products obtained for TBIO PCR-based synthesis of the initial inside fragment of PKB2 using various combinations of primer concentrations. In this experiment, the concentrations of the first six pairs of inside primers were varied and assessed for their ability to generate the DNA product predicted to be 480 bp. In lanes 2, 4 and 6, the concentrations of primers IO-S01–S05 and IO-A01–A05 were held constant at 40, 80 and 120 nM, respectively, and the concentration of the outermost primer pair, IO-S06 and IO-A06, in each reaction mixture was 200 nM. The total concentration of all primers was 400 nM [= (5 × 40 nM) + 200 nM], 600 nM [= (5 × 80 nM) + 200 nM] and 800 nM [= (5 × 120 nM) + 200 nM] in lanes 2, 4 and 6, respectively. In lanes 1, 3 and 5, the concentrations of primers IO-S01–S06 and IO-A01–A06 were formulated with an increasing gradient leading up to a maximum concentration of 200 nM for the outermost primer pair. The gradients used in lanes 1, 3 and 5 were selected to yield total primer concentrations of 400, 600 and 800 nM, respectively, as previously selected for lanes 2, 4 and 6. Figure B shows that formation of the 480 bp DNA product could be detected in lanes 3, 5 and 6. However, the 480 bp DNA product was obtained in its most pure form under the conditions described for lane 3.
Similar optimization experiments were performed using four, five, seven, eight, nine and ten pairs of primers. After agarose gel electrophoresis of the reaction product mixtures, the region of the gel containing DNA of the predicted size of the fully elongated product was gel-purified and tested for its ability to be re-amplified with the two outermost primers. Variation of the total concentration and the arrangements of various concentration gradients for experiments utilizing seven or more pairs of primers failed to yield a gel-purified fully elongated DNA product, which could be re-amplified. Similar to the TBC method, excessive numbers of primers inhibited TBIO elongation. Thus, a concentration gradient of 40, 80, 120 and 200 nM was optimal for four pairs of primers; 40, 60, 80, 120 and 200 nM was optimal for five pairs of primers; and 40, 60, 80, 100, 120 and 200 nM was optimal for six pairs of primers. Using these optimized conditions, the TBIO method was used to synthesize the full-length gene sequences of PKB2, S6K1 and PDK1 by four sequential bidirectional elongation reactions using four to six pairs of primers in each step.
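The primer concentration schemes described above can be written out and totalled as a quick check of the arithmetic. This sketch follows the text's own bookkeeping, in which the total is the sum of one value per primer pair; the function and variable names are hypothetical.

```python
# Optimal gradients reported in the text, in nM per primer pair,
# ordered from the innermost pair to the outermost pair.
GRADIENTS_NM = {
    4: [40, 80, 120, 200],
    5: [40, 60, 80, 120, 200],
    6: [40, 60, 80, 100, 120, 200],
}


def constant_scheme(n_pairs, inner_nm, outer_nm=200):
    """Inner pairs held at one concentration, outermost pair at outer_nm."""
    return [inner_nm] * (n_pairs - 1) + [outer_nm]


def total_nm(per_pair_nm):
    """Total as summed in the text: one value per primer pair."""
    return sum(per_pair_nm)


# Constant schemes for six primer pairs (inner pairs at 40, 80 or 120 nM).
constant_totals = [total_nm(constant_scheme(6, c)) for c in (40, 80, 120)]
gradient_totals = {n: total_nm(g) for n, g in GRADIENTS_NM.items()}
```

The constant schemes reproduce the 400, 600 and 800 nM totals given for the six-pair experiment, and the six-pair gradient sums to 600 nM, the same total as the gradient condition that gave the purest 480 bp product.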
TBIO synthesis of PKB2, S6K1 and PDK1. The oligonucleotide primer set designed for TBIO synthesis of PKB2, which is the shortest of the three genes, is illustrated in Figure and Table 1S. The PKB2 primer set includes the coding- or ‘sense’-strand primers, IO-S01 to IO-S19 (Table 1SA), and the non-coding- or ‘antisense’-strand primers, IO-A01 to IO-A19 (Table 1SB). As illustrated in Figure A, the TBIO method of gene synthesis is initiated by using an optimized gradient of concentrations of the first five pairs of primers, IO-S01 and IO-A01 to IO-S05 and IO-A05, which generates efficient inside-out bidirectional elongation from the middle to both the beginning and end of the gene sequence region covered by the outside primers IO-S05 and IO-A05. IO-S01 is the 5′→3′ sequence of the sense strand from nucleotides 686 to 745, and IO-A01 is the 3′→5′ sequence of the antisense strand from nucleotides 727 to 786. The 3′-terminal 19 nt of IO-S01 and IO-A01 are complementary with a Tm of 65°C (nucleotide region 727–745) (Table 1S), which is optimal for the PCR annealing temperature of 60°C that is recommended for the High-Fidelity KOD Hot Start Polymerase (Novagen).
For TBIO elongation, IO-S01 and IO-A01 (40 nM each) anneal to each other and are extended in the 3′ directions to give the double-stranded PCR product containing base pairs 686–786 (Fig. A). This 101 bp fragment serves as the template for the IO-S02 (nucleotide region 646–705) and IO-A02 (nucleotide region 761–820) primers (60 nM each). The 3′-terminal 20 nt of IO-S02 are complementary to the 3′-terminal end of the newly synthesized antisense strand, while the 3′-terminal 26 nt of IO-A02 are complementary to the 3′-terminal end of the newly synthesized sense strand of the 101 bp fragment. PCR synthesis initially generates sense and antisense strands that terminate at the ends of the 101 bp template, and subsequent PCR cycles generate the full-length double-stranded 175 bp DNA fragment enclosed by the 5′ termini of IO-S02 and IO-A02 (nucleotide region 646–820) (Fig. A). In the same manner, oligonucleotide primer pairs, IO-A03 and IO-S03 (80 nM each), IO-A04 and IO-S04 (120 nM each), and IO-A05 and IO-S05 (200 nM each) provide for continued inside-out bidirectional elongation until the 406 bp DNA fragment is generated, as defined by the 5′ termini of IO-S05 and IO-A05 (nucleotide region 529–934 of PKB2) (Table 1S and Fig. A). Figure B illustrates that after the first PCR, the 406 bp fragment of PKB2 is gel-purified and used as the initiating template for further inside-out bidirectional elongation using the primer pairs IO-S06 and IO-A06 to IO-S10 and IO-A10 to generate the 785 bp DNA fragment as defined by the 5′ termini of IO-S10 and IO-A10 (nucleotide region 347–1131) (Table 1S). The process of inside-out bidirectional elongation and gel purification is continued until the full-length target sequence of PKB2 is achieved.
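The coordinate arithmetic of the first two elongation steps can be followed with a small sketch. It treats primers and fragments as 1-based inclusive gene coordinates and uses only the PKB2 positions quoted in the text for IO-S01/IO-A01 and IO-S02/IO-A02; the function names are hypothetical and the model ignores everything except positions.

```python
def span_length(region):
    """Length in bp of a 1-based inclusive (start, end) region."""
    return region[1] - region[0] + 1


def extend(fragment, sense_primer, antisense_primer):
    """One inside-out elongation step on gene coordinates.

    `fragment` is None for the first primer pair, where the two primers
    anneal to each other. Returns (new_fragment, sense_overlap_nt,
    antisense_overlap_nt), where each overlap is the number of 3'-terminal
    primer nucleotides that pair with the existing template.
    """
    s_start, s_end = sense_primer
    a_start, a_end = antisense_primer
    if fragment is None:
        overlap = s_end - a_start + 1
        sense_overlap = antisense_overlap = overlap
    else:
        f_start, f_end = fragment
        sense_overlap = s_end - f_start + 1
        antisense_overlap = f_end - a_start + 1
    if sense_overlap <= 0 or antisense_overlap <= 0:
        raise ValueError("primer does not overlap the template")
    return (s_start, a_end), sense_overlap, antisense_overlap


# PKB2 coordinates quoted in the text.
step1, s1_overlap, a1_overlap = extend(None, (686, 745), (727, 786))
step2, s2_overlap, a2_overlap = extend(step1, (646, 705), (761, 820))
```

Run on these coordinates, the first step gives the 101 bp fragment (686–786) from a 19 nt overlap, and the second step gives the 175 bp fragment (646–820) with overlaps of 20 and 26 nt, in agreement with the values stated above.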
Figure C shows agarose gel electrophoresis of the double-stranded DNA gene fragments generated by each step of the TBIO method of PCR-based gene synthesis of the codon-optimized gene for human PKB2. For PKB2 (Table 1S), the molecular weights predicted for primer pairs IO-S01 and IO-A01 to IO-S05 and IO-A05 (nucleotide region 529–934 or 406 bp), IO-S06 and IO-A06 to IO-S10 and IO-A10 (nucleotide region 347–1131 or 785 bp), IO-S11 and IO-A11 to IO-S15 and IO-A15 (nucleotide region 142–1328 or 1187 bp), and IO-S16 and IO-A16 to IO-S19 and IO-A19 (nucleotide region –12 to 1482 or 1494 bp) correspond to the apparent molecular weight of the predominant band in each given lane (Fig. C). Figure C also shows agarose gel electrophoresis of the double-stranded DNA gene fragments generated by each step of the TBIO method of PCR-based gene synthesis of the codon-optimized genes for human S6K1 and PDK1. For each gene sequence, the apparent molecular weight of the predominant band in each lane corresponds to the molecular weight predicted for generation of the full-length fragment of interest.
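The fragment sizes quoted for the four PKB2 elongation stages follow directly from the nucleotide regions. The sketch below recomputes them, assuming that the numbering has no position 0, so that position –1 is immediately followed by position 1. That convention is inferred here from the stated 1494 bp for the region –12 to 1482 and is not spelled out in the text; the function name is hypothetical.

```python
def region_length(start, end):
    """Inclusive length of a region, assuming there is no position 0.

    Assumption: upstream positions are numbered ..., -2, -1 and the coding
    sequence continues at 1, so a region spanning the boundary is one
    nucleotide shorter than end - start + 1.
    """
    if start == 0 or end == 0 or start > end:
        raise ValueError("invalid region")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # skip the non-existent position 0
    return length


# PKB2 nucleotide regions after each TBIO stage, as quoted in the text.
PKB2_STAGES = [(529, 934), (347, 1131), (142, 1328), (-12, 1482)]
stage_lengths = [region_length(s, e) for s, e in PKB2_STAGES]
```

Under that assumption the four stages come out at 406, 785, 1187 and 1494 bp.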
While few or no bands were detected for lower molecular weight fragments resulting from inefficient and incomplete synthesis, discrete bands were detected for higher molecular weight fragments (Fig. C), which correspond to the molecular weights of dimers and trimers of the predicted size fragment. The higher molecular weight species were gel-purified and subjected to agarose gel electrophoresis. Surprisingly, the predominant DNA band migrated as expected for a monomeric species, with a further similar partitioning into minor bands, which migrated as expected for multimeric species. In addition, the next round of TBIO elongation could be initiated from the gel-purified higher molecular weight species. Together, these results suggest that the higher molecular weight species result from equilibrium mixtures of non-covalent cross-hybridization between monomers to form multimers. Thus, any possible contamination of the monomeric species with the multimeric species upon gel purification does not hinder continued TBIO synthesis.
The high-fidelity KOD proofreading pfu polymerase (Novagen) generates blunt ends on the double-stranded DNA PCR products. Therefore, the codon-optimized genes for human PKB2, S6K1 and PDK1 (Figs 1S–3S) were cloned into the pCR®-Blunt II-TOPO® plasmid vector (Invitrogen, Inc.), and five plasmids shown to contain the correct-sized inserts for each given gene were chosen for DNA sequencing. Of the 15 genes that were sequenced, the number of errors that were incorporated by the TBIO PCR-based gene synthesis method ranged from zero to three. One clone of S6K1 was shown to contain the correct sequence and no further corrective mutagenesis was required. For both PKB2 and PDK1, clones were selected that contained only one mistake, which were each corrected by using the QuikChange® Single Site-Directed Mutagenesis Kit (Stratagene, Inc.).
Advantages of TBIO gene synthesis
The underlying strategy for the TBIO method of gene synthesis (Fig. ) is fundamentally distinct from that of the TBC method (Fig. ). For the TBC method, each primer is complementary to two other primers in the assembly reaction so that many different primer extension reactions occur simultaneously during assembly. As PCR assembly progresses, any number of combinations of fragments can form. These fragments may contain nucleotide regions that can anneal with complementary nucleotide regions on other fragments, which is the premise of this method of assembly. However, there exists the possibility that a single-stranded DNA molecule could simultaneously re-anneal to more than one complementary partner, resulting in the formation of a complex that inhibits primer extension. Thus, the assembly reaction can produce numerous random fragments for regions throughout the gene sequence. In contrast to the TBC method, the TBIO method involves complementation between the next pair of outside primers and the termini of a fully synthesized inside fragment. TBIO bidirectional elongation must be completed for a given outside primer pair before the next round of bidirectional elongation can take place. Thus, TBIO synthesis yields a well-defined and narrower range of products, in contrast to the numerous possible products that can result from TBC synthesis.