A large-scale, high-throughput means of generating numerous proteins depends critically on robust, reliable and routine methods of cloning genes and expressing proteins. An approach utilizing synthetic genes presents an attractive option. The strategy described in this report, utilizing the appropriate computer software, minimizes the effort involved in designing oligonucleotides used for PCR-based gene synthesis. In many cases, the time between identifying a protein sequence and obtaining an expression product can be as short as 1 week. Unfortunately, protein expression is strongly dependent on post-translational events. Thus, although the synthetic gene is optimized for expression, the yield of any particular protein may vary considerably.
The frequency of nucleotide errors is largely dependent on the quality of the oligonucleotides, rather than the fidelity of the polymerase. In published results of Pfu fidelity assays (28), it was found that 0.42% of transformants were lacI− after amplifying the lacI gene within a 1.9 kb EcoRI fragment. Assuming that within this fragment there are 349 sensitive nucleotides in the lacI gene which, when mutated, would cause the lacI gene product to become non-functional, it can be estimated that under standard operating conditions, Pfu-mediated PCR will minimally give rise to approximately 0.012 errors per kilobase of cloned PCR product [(0.0042 × 1000)/349]. Because errors could occur outside the sensitive sites without affecting lacI activity, a better estimate of total errors (both silent and destructive mutations) would be approximately 5.4 times higher, 0.065 errors per kilobase of cloned PCR product. The overall error rate seen from the data presented in Table is 1.8 errors per kilobase of sequenced synthetic gene product, roughly 28 times higher (1.8/0.065) than that estimated from PCR-mediated errors alone.
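The arithmetic behind these estimates can be retraced in a few lines. The following is a minimal sketch that uses only the figures quoted above; the variable names are illustrative and do not come from the cited fidelity assay.

```python
# Figures quoted in the text for the Pfu fidelity assay and the synthetic genes.
mutant_fraction = 0.0042   # 0.42% of transformants were lacI-
sensitive_sites = 349      # nucleotides whose mutation inactivates lacI
silent_scale = 5.4         # factor quoted in the text to include silent errors
observed_rate = 1.8        # errors per kb in sequenced synthetic genes

# Minimal PCR error rate: only mutations at sensitive sites are detected.
min_pcr_rate = mutant_fraction * 1000 / sensitive_sites

# Estimated total PCR error rate, including errors outside the sensitive sites.
total_pcr_rate = min_pcr_rate * silent_scale

# How much larger the observed synthetic-gene error rate is than the PCR estimate.
fold_over_pcr = observed_rate / total_pcr_rate

print(f"minimal PCR error rate: {min_pcr_rate:.3f} errors/kb")
print(f"total PCR error rate:   {total_pcr_rate:.3f} errors/kb")
print(f"observed / PCR-only:    {fold_over_pcr:.1f}-fold")
```

Keeping the intermediate values unrounded until printing avoids compounding rounding error when the ratio is taken.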
Theoretically, the likelihood of incorporating errors into oligonucleotides increases with the size of the oligonucleotides, and therefore the lengths of synthetic oligonucleotides were kept to a minimum. However, there was little correlation between the average size of the oligonucleotides and the frequency of error generation. There was also little correlation between the length of the gene and the number of errors. As shown with the hCXCR4 and PPP genes, longer genes did not contain any errors, while shorter genes, such as mMCP-5 and hLT, contained multiple errors per clone. Thus, we find that random sequence errors introduced during oligonucleotide synthesis depend mostly on the quality of the synthetic source, and not systematically on either the length of the oligonucleotides or the number of oligonucleotides needed for the synthetic gene.
Because most of the synthetic gene is constructed from two overlapping oligonucleotide chains, it is highly unlikely that the same error will occur in complementary oligonucleotides. Thus, any error that does arise in a single oligonucleotide has at most a 50% chance of being incorporated into the synthetic gene. Interestingly, in our experiments insertion errors occurred with the lowest frequency. Deletion and mismatch errors seemed to be dominant and arose in most of the transformants. This is probably due to the technical process of oligonucleotide synthesis. Most of the errors could be overcome by either screening a larger number of transformants or correcting the mistakes by site-directed mutagenesis performed on the original ‘faulty’ genes. However, for longer genes it might be best to correct multiple deletion errors by resynthesis of the oligonucleotides.
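To get a feel for how many transformants need to be screened, one can treat sequence errors as independent events occurring at a constant rate along the gene. This Poisson model is an assumption made here for illustration and is not part of the original analysis; the function names and the 95% confidence default are likewise hypothetical. The default rate of 1.8 errors per kilobase is the overall figure reported above.

```python
import math


def p_error_free(gene_length_bp, errors_per_kb=1.8):
    """Probability that one clone carries no errors.

    Assumes errors are independent and Poisson-distributed along the gene,
    which is a simplification, not a result from the paper.
    """
    expected_errors = errors_per_kb * gene_length_bp / 1000
    return math.exp(-expected_errors)


def clones_needed(gene_length_bp, confidence=0.95, errors_per_kb=1.8):
    """Smallest number of clones giving at least `confidence` probability
    that one or more of them is error-free."""
    p = p_error_free(gene_length_bp, errors_per_kb)
    if p >= 1.0:
        return 1  # no errors expected, so a single clone suffices
    # Solve 1 - (1 - p)**n >= confidence for the smallest integer n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


for length in (300, 500, 1000):
    print(length, "bp:", clones_needed(length), "clones")
```

Under this model the number of clones to screen grows quickly with gene length, which is consistent with the suggestion that long genes with multiple errors may be better repaired by mutagenesis or resynthesis than by screening alone.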
Mispriming (oligonucleotides priming at unintended sites) could arise in specific cases. In the genes investigated here, no mispriming was found. The use of longer oligonucleotides, at the expense of increasing errors, should minimize mispriming, allowing full gene synthesis in a single step. Based on our experience, we suggest that the oligonucleotide overlap melting temperatures should be set to at least 58°C and can be raised to as much as 70°C to minimize mispriming. Combining longer oligonucleotide overlaps and ‘touchdown’ PCR (29) can help eliminate mispriming. However, proteins containing multiple repeats of amino acid sequences would produce highly similar DNA sequences, increasing the chance of mispriming. In such cases, it would be beneficial to screen for possible mispriming before synthesizing the oligonucleotides. At present, this feature is not available in DNAWorks, but will be incorporated into future versions. However, possible mispriming can be analyzed with several other available software tools. Although the assembly protocol works relatively well for small genes (shorter than ~500 bp), in the case of longer genes some problems begin to arise. In such cases, PCR mispriming could become more prevalent, as reported in other studies (30). These problems can be overcome by dividing the gene synthesis into regions of 200–300 nt, and then amplifying the gene from combined purified gene fragments with the outer primers.
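A simple pre-synthesis screen for potential mispriming sites is to look for stretches of sequence that occur more than once in the designed gene, on either strand. The sketch below is not the DNAWorks algorithm and does not reproduce any of the other software tools mentioned; it is a hypothetical exact-match check, and the function names and the default window of 12 nt are assumptions chosen for illustration.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_repeats(seq, k=12):
    """Map each repeated k-mer to the forward-strand start positions where
    it, or its reverse complement, occurs.

    Each window is stored under a canonical key (the lexicographically
    smaller of the k-mer and its reverse complement), so direct and
    inverted repeats are grouped together. Only exact matches are found;
    near-identical repeats would need a mismatch-tolerant search.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        canonical = min(kmer, reverse_complement(kmer))
        positions[canonical].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Toy sequence (made up for this example) with one 12-nt direct repeat.
example = "ACGTTGCAAGGC" + "TTT" + "ACGTTGCAAGGC"
for kmer, starts in find_repeats(example).items():
    print(kmer, starts)
```

Any region flagged this way is a candidate for re-choosing codons or for placing in separate assembly fragments, so that the two copies never compete for the same oligonucleotide in one reaction.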
Overall, once the synthetic gene is designed using DNAWorks and the oligonucleotides are synthesized, it should take 3–4 days to clone the gene and submit it for sequencing. Based on the results described here, at least one of six initial transformants should have the correct sequence. Several features, currently not implemented in DNAWorks, will likely increase the utility of this program. Among these are the ability to insert restriction sites within the sequence to facilitate restriction analysis and mutagenesis work on a particular gene, and monitoring of long-range repeats to minimize mispriming in large genes (currently mispriming can be avoided by breaking the gene assembly step into smaller steps). The availability of a wider range of other multiple cloning system flanking sequences would give the user flexibility in cloning the synthetic gene directly from the PCR product. These and other extensions will be incorporated into future versions of DNAWorks.
The executable version of DNAWorks for Windows can be downloaded from the website http://mcl1.ncifcrf.gov/lubkowski.html.