Several methods for assembling genes from synthetic oligonucleotides have been developed [11]. Our effort to include gene synthesis in a gene-to-structure pipeline for non-automated structural genomics [16] prompted us to expand current methods of gene assembly for recombinant protein production. Our method is a modification of thermodynamically balanced inside-out (TBIO) PCR-based gene synthesis [12] and is referred to as "Sequential TBIO (SeqTBIO)" because it proceeds through individual, incremental steps. The principle is illustrated in Figure . An initial DNA fragment made from 2 central oligonucleotides is extended bidirectionally, one oligonucleotide pair at a time. The major differences from the TBIO method are the number of oligonucleotides per step (1 pair instead of 4 to 6), the number of cycles per step (only 7 compared to 25) and the absence of any gel purification step. Overall, despite a higher number of steps, our method is faster and more robust because the steps are simpler and require less manipulation.
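The inside-out order of addition described above can be sketched in a few lines of Python. This is only an illustration, assuming the oligonucleotides are listed in 5' to 3' order along the target gene and that there is an even number of them; the function name `inside_out_steps` and the labels `O1` to `O6` are hypothetical and do not come from our protocol.

```python
def inside_out_steps(oligos):
    """Group an ordered list of oligonucleotides into successive steps.

    Assumption: `oligos` is ordered 5'->3' along the target gene and has an
    even length. Step 0 is the central pair that forms the initial fragment;
    each later step adds the next pair outward, one oligonucleotide upstream
    and one downstream of the growing fragment.
    """
    if len(oligos) < 2 or len(oligos) % 2 != 0:
        raise ValueError("expected an even number of oligonucleotides (>= 2)")
    mid = len(oligos) // 2
    return [(oligos[mid - 1 - k], oligos[mid + k]) for k in range(mid)]


# Hypothetical labels for a six-oligonucleotide design.
steps = inside_out_steps(["O1", "O2", "O3", "O4", "O5", "O6"])
for number, pair in enumerate(steps, start=1):
    print(f"step {number}: {pair[0]} + {pair[1]}")
```

Each tuple corresponds to one reaction in which a single pair extends the product of the previous step in both directions.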
To demonstrate the simplicity of our improved methodology for synthesizing DNA fragments of different sizes, two novel nucleotide sequences, paz and polA, were assembled. The protein encoded by paz is a 15.41 kDa PAZ (Piwi/Argonaute/Zwille) domain, a siRNA-binding domain of an Argonaute protein homologue. The polA gene codes for a deletion mutant of a family A DNA polymerase lacking the 5'-3' exonuclease domain, with a molecular weight of 67.70 kDa. The paz sequence was assembled in just 4 steps (Figure ). Each reaction resulted in a single band clearly showing an incremental increase in DNA fragment size. The assembled synthetic gene fragment contained a total of 449 base pairs and included vector-homologous regions at its ends. Two clones were sequenced: one contained 2 point deletions, and the other was error-free and was subsequently used for protein expression.
Gene assembly results for polA are shown in Figure . Each reaction in the synthesis of polA resulted in a single band product. Because the yield began to decrease after about 20 reactions (reactions 20–22), the last reaction was repeated using more template (2 μl of the 21st reaction product instead of 0.9 μl) and more cycles (15 instead of 7), which was enough to obtain a pure single DNA product (lane F). The assembled synthetic gene fragment contained 1826 bp, including sequences at the 5' and 3' termini that were homologous to the expression vector sequence. After co-transformation with a linearized vector and plasmid isolation from cultures grown from individual colonies, plasmids isolated from 4 clones were sequenced. All contained the expected insert, and among the 4 clones, a total of 6 single-nucleotide mutations were detected, including 1 deletion, 3 transitions and 2 transversions. A clone with a single mutation (the deletion) was selected for error correction.
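As a rough worked example, the clone counts reported for paz and polA can be converted into errors per kb. The sketch below assumes that the full insert length (449 bp and 1826 bp) was sequenced in every clone, which is an assumption made for illustration; the function name `errors_per_kb` is hypothetical.

```python
def errors_per_kb(n_errors, n_clones, insert_bp):
    """Errors per kb, assuming each clone was sequenced over the full insert."""
    total_bp = n_clones * insert_bp
    return 1000.0 * n_errors / total_bp


# paz: 2 point deletions found across 2 sequenced clones of 449 bp.
paz_rate = errors_per_kb(n_errors=2, n_clones=2, insert_bp=449)

# polA: 6 single-nucleotide mutations across 4 sequenced clones of 1826 bp.
pola_rate = errors_per_kb(n_errors=6, n_clones=4, insert_bp=1826)

print(f"paz:  {paz_rate:.2f} errors per kb")
print(f"polA: {pola_rate:.2f} errors per kb")
```

With so few clones these figures are only indicative, but they give a sense of the order of magnitude involved.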
In vivo homologous recombination
We used in vivo homologous recombination to efficiently insert assembly products into a propagating plasmid vector and to rapidly correct synthesis errors. Gene cloning mediated by RecA-independent homologous recombination in E. coli is well documented [17], although it has not become a mainstream technique despite its simplicity and efficiency. It is based on the ability of many E. coli strains (including the RecA-deficient ones used in cloning) to perform in vivo intermolecular recombination between DNA fragments sharing homologous sequences at their ends. In our experiments, synthetic gene fragments can be quickly subcloned into a linearized target plasmid vector (Figure ) without restriction digest, ligation or other enzymatic manipulation. Virtually 100% of the resulting clones contained the correct insert, eliminating the need for screening. In addition, synthetic DNA fragments produced with our method can easily be assembled further into larger constructs by in vivo homologous recombination of overlapping fragments. We have successfully synthesized a 3.6 kb gene and a 6 kb gene by assembling two 1.8 kb and three 2 kb fragments respectively (data not shown).
Error correction by site-directed mutagenesis
Synthetic genes inherently contain errors, derived mainly from inaccuracies in oligonucleotide synthesis and to a lesser extent from the DNA polymerase-mediated assembly [14]. The error rates observed in our assembly products were consistent with the 1 to 3 errors per kb reported by others [20], and imply that, especially for larger genes, a prohibitively large number of sequencing reactions would be needed to have a high probability of finding a clone with the correct sequence [21]. Several approaches have been proposed to decrease the error rate, in particular using enzymes involved in mismatch recognition on renatured assembly products [20]. However, these may be difficult to implement due to the cost and availability of such enzymes. Therefore, for small-scale gene synthesis projects, it is often simpler to correct errors through site-directed mutagenesis (SDM). Numerous SDM techniques have been described over the last two decades, many of which involve more than one PCR step, the use of additional enzymes or further complex manipulations [23]. We applied an SDM method based on in vivo homologous recombination using a single PCR step, which is both simple and efficient [18]. In its simplest form, competent cells are directly transformed with PCR products from a single amplification step generating overlapping fragments.
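To illustrate why sequencing alone becomes impractical for larger genes, the sketch below estimates how many clones would need to be sequenced to find at least one error-free clone. It assumes that errors are independent and Poisson-distributed along the sequence, which is a simplifying model chosen for illustration and is not taken from the cited work; the function names are hypothetical.

```python
import math


def p_error_free(errors_per_kb, length_bp):
    """Probability that one clone has no errors (Poisson assumption)."""
    return math.exp(-errors_per_kb * length_bp / 1000.0)


def clones_needed(errors_per_kb, length_bp, confidence=0.95):
    """Smallest number of clones giving at least `confidence` probability
    that one or more of them is error-free."""
    p = p_error_free(errors_per_kb, length_bp)
    if p >= 1.0:
        return 1  # no errors expected, a single clone is enough
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# Fragment sizes from this work, at the lower bound of the reported error range.
for length_bp in (449, 1826):
    n = clones_needed(errors_per_kb=1.0, length_bp=length_bp)
    print(f"{length_bp} bp at 1 error/kb: sequence about {n} clones")
```

Under this model the required number of clones grows roughly exponentially with gene length and error rate, which is why correcting a nearly correct clone is attractive for longer constructs.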
The error correction strategy using in vivo homologous recombination is illustrated in Figure . We have approached it in two ways. First, primers are designed to produce corrected fragments of the gene assembly product (Figure ). In this case, two, three and four pairs of primers (F1-R1, F2-R2, F3-R3 and F4-R4) are required to correct one, two and three error sites respectively in separate reactions. The resulting PCR products are mixed with the linearized plasmid vector and used to transform competent cells in which in vivo homologous recombination is allowed to occur. In our hands, as many as 4 PCR amplified corrected fragments recombined accurately with the recipient vector and generated error corrected products.
The second approach involves the amplification of DNA fragments that include the plasmid vector (Figure , panels B-D). Typically, up to 3 point mutations are corrected at one time. In such a case, two correcting primers, reverse-complement of each other, are designed at each mutation site, with the correcting nucleotide being at the center of each primer. DNA fragments are amplified by PCR using primer pairs F1-R1, F2-R2 and F3-R3 respectively in 3 separate reactions. In the case of 2 point mutations, two pairs of primers are similarly used such that there will be only 2 separate reactions. When a single synthesis error needs to be corrected, a non-mutagenic primer set corresponding to a sequence in the vector backbone is used in addition to the correcting primer set such that 2 fragments are generated (as if 2 corrections were being made) in order to avoid using mutually annealing primers in a single reaction.
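The design of a pair of mutually complementary correcting primers can be sketched as follows. This is a minimal illustration, assuming the primers are taken directly from the desired (corrected) sequence with the correcting nucleotide at the center; the flank length of 15 nucleotides is an arbitrary placeholder and not a value from our protocol, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def correcting_primer_pair(corrected_seq, site, flank=15):
    """Return (forward, reverse) primers centered on the corrected position.

    corrected_seq: the intended, error-free sequence (5'->3').
    site: 0-based index of the correcting nucleotide in `corrected_seq`.
    flank: nucleotides on each side of the site (assumed value).
    """
    if site - flank < 0 or site + flank >= len(corrected_seq):
        raise ValueError("site is too close to an end for the requested flank")
    forward = corrected_seq[site - flank : site + flank + 1]
    return forward, reverse_complement(forward)


# Toy sequence for illustration only.
fwd, rev = correcting_primer_pair("ATGCGTACGTTAGC", site=6, flank=3)
print(fwd, rev)
```

Because the two primers anneal to each other, they would be used in separate reactions, as described above, each paired with a primer from a neighbouring correction site or from the vector backbone.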
To correct the point deletion in the polA gene synthesis product, the two strategies were pursued in parallel. First, 2 overlapping fragments of the same plasmid were amplified (illustrated in Figure , panel B), each with a primer correcting the deletion and a primer corresponding to a sequence in the vector backbone. The primers for each fragment were the reverse complements of the primers for the other fragment. In the second approach, only the gene synthesis product was amplified from the plasmid template in 2 fragments (each fragment being amplified with a correcting primer and a terminal oligonucleotide used for the original gene synthesis). In this case, the amplification products were mixed with the linearized vector, allowing in vivo recombination to occur between the 3 fragments (illustrated in Figure , panel A). In both cases, 40 pg/μl of plasmid template was used in the correcting PCR mixture. Between 10 and 100 colonies were obtained for each transformation.
A random selection of colonies was analyzed for successful recombinant inserts. With both correction procedures, the correction efficiency was 50%: half of the clones analyzed contained a completely accurate sequence of the recombinant insert. A corrected clone from the first approach was selected for protein expression to demonstrate the integrity of the plasmid in the face of possible site mutations in the vector sequence.
An obvious concern when plasmids are used as PCR templates is the risk that they compete with the desired recombination products after transformation. Even if the amplified linear DNA is several orders of magnitude more concentrated than the circular plasmid template in the PCR products, homologous recombination is a rare event, so even a modest amount of circular plasmid template may result in non-recombined clones among the colonies after transformation. Several strategies have been proposed to address this problem, including gel purification of the amplified DNA fragment, linearization of the plasmid template by restriction enzyme digestion prior to PCR [18] and treatment of PCR products with DpnI, which cleaves methylated DNA [26]. In our effort to minimize the number of steps and to simplify the method, we attempted to prevent plasmid carry-over by diluting the plasmid template in the correcting PCR. In the case of polA, this strategy succeeded, but the dilution level was not optimal since only half of the clones were recombination products.
When applying the same error correction technique to other genes, we found that further diluting the template to 3 pg/μl or less in the correcting PCR consistently resulted in 100% correction efficiency, effectively decreasing template carry-over to negligible levels. However, there were unexpected consequences as new mutations appeared in otherwise corrected clones. Despite using one of the highest-fidelity DNA polymerases commercially available, unintended mutations appeared when plasmid templates were diluted to 10 pg/μl or less. The mutation rate seemed to correlate with the level of template dilution, with over 1 mutation per kb when the template concentration was under 1 pg/μl. All new mutations involved a single nucleotide. A compilation of 79 mutations detected after sequencing 94 kb of corrected clones showed transitions to be prevalent (59% of all mutations observed) with an equal amount of type 1 (A to G and T to C), and type 2 (G to A and C to T), followed by deletions (28%), transversions (9%) and insertions (4%). Note that in all cases where new mutations were observed the template concentration was significantly lower than the DNA polymerase manufacturer's recommended 100 to 600 pg/μl of plasmid template input. In order to obtain reproducible 100% error correction while avoiding additional mutations, we found it necessary to limit the template dilution in the correcting PCR to the levels recommended by the enzyme's manufacturer and therefore to include a template removal step before transformation.
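The compiled mutation data above can be turned into an overall rate and approximate counts per mutation class. The sketch below only restates the reported figures; the per-class counts are reconstructed from rounded percentages and are therefore approximate, and the overall rate pools clones obtained at different template dilutions.

```python
total_mutations = 79
sequenced_kb = 94

# Overall rate pooled across all template dilutions.
overall_rate = total_mutations / sequenced_kb

# Reported shares of each mutation class (rounded percentages).
shares = {
    "transitions": 0.59,
    "deletions": 0.28,
    "transversions": 0.09,
    "insertions": 0.04,
}

# Approximate counts recovered from the rounded shares.
approx_counts = {name: round(total_mutations * share) for name, share in shares.items()}

print(f"overall: {overall_rate:.2f} mutations per kb")
for name, count in approx_counts.items():
    print(f"{name}: about {count}")
```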
Our observations of increased mutation rates when using highly diluted templates in PCR are consistent with a previous report that low copy number template can decrease the fidelity of both Taq and Pfu DNA polymerases [27]. To our knowledge, no explanation was ever proposed to account for such a phenomenon. However, it can have serious implications for PCR methods involving samples with limited availability such as ancient DNA, forensics or pre-implantation genetic diagnosis. Further studies are needed to confirm whether this phenomenon affects PCR in general and how the sensitivity to template concentration varies among different DNA polymerases or with varying reaction conditions.
Protein expression, crystallization and preliminary X-ray analysis
Recombinant PAZ and PolA were expressed by transforming an E. coli expression strain with the plasmids containing the error-free inserts. A random selection of the resulting colonies was screened for small-scale expression. Clones that showed recombinant expression by SDS-PAGE analysis were further cultured for large-scale preparation. The yields of both proteins were approximately 30 mg/L. Since both PAZ and PolA recombinant proteins were derived from hyperthermophilic microorganisms, they were heat resistant. Thus, the proteins could be purified to more than 70% immediately after a heat selection step. PAZ and PolA were purified to more than 90% homogeneity, based on SDS-PAGE analysis, after ion exchange and size exclusion chromatography (Figure ). The approximate sizes of the purified proteins were 15 kDa and 70 kDa for PAZ and PolA respectively. In both cases, protein crystals suitable for X-ray diffraction were obtained within 5 days, and their space groups and unit cell parameters were determined. PAZ crystals had an orthorhombic habit and grew as large as 0.5 mm in the longest dimension. The space group of the crystals was determined to be P2₁2₁2₁ with unit cell dimensions of a = 36.831 Å, b = 58.671 Å and c = 61.819 Å. Prismatic crystals of PolA grew as large as 0.4 mm in the longest dimension. Their space group was identified as P3₂21, with unit cell dimensions of a = b = 148.07 Å and c = 105.63 Å. The crystals of PAZ and PolA diffracted to between 2.5 and 8 Å resolution using a home laboratory X-ray source. The crystallographic structure determination of both proteins is in progress.
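As a hedged worked example, the unit cell volumes implied by the reported cell dimensions can be computed directly, and a Matthews coefficient can be estimated from them. The calculation below is an illustration rather than a reported result: it assumes the calculated molecular weights given earlier (15.41 kDa and 67.70 kDa) apply to the crystallized proteins, uses the general-position multiplicities of the two space groups (4 for P2₁2₁2₁, 6 for P3₂21), and treats the number of molecules per asymmetric unit as an unknown parameter.

```python
import math


def cell_volume(a, b, c, gamma_deg=90.0):
    """Unit cell volume in cubic angstroms for a cell with alpha = beta = 90 degrees."""
    return a * b * c * math.sin(math.radians(gamma_deg))


def matthews_coefficient(volume, mw_da, z, n_per_asu=1):
    """Crystal volume per dalton of protein (cubic angstroms per Da).

    z: number of asymmetric units per cell for the space group.
    n_per_asu: assumed number of molecules per asymmetric unit.
    """
    return volume / (z * n_per_asu * mw_da)


# PAZ: orthorhombic cell, all angles 90 degrees.
paz_volume = cell_volume(36.831, 58.671, 61.819)
paz_vm = matthews_coefficient(paz_volume, mw_da=15410, z=4, n_per_asu=1)

# PolA: trigonal cell on hexagonal axes, gamma = 120 degrees.
pola_volume = cell_volume(148.07, 148.07, 105.63, gamma_deg=120.0)

print(f"PAZ cell volume:  {paz_volume:.0f} A^3, V_M = {paz_vm:.2f} A^3/Da (1 molecule per ASU assumed)")
for n in (1, 2):
    vm = matthews_coefficient(pola_volume, mw_da=67700, z=6, n_per_asu=n)
    print(f"PolA with {n} molecule(s) per ASU: V_M = {vm:.2f} A^3/Da")
```

The actual number of molecules per asymmetric unit can only be established during structure determination, so these values are estimates under the stated assumptions.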
Figure 3 SDS-PAGE analysis and crystallization of PAZ and PolA. Purified recombinant proteins PAZ and PolA are shown in the upper panels (lanes 2) against standard molecular markers (lanes 1) containing myosin, phosphorylase, BSA, glutamic dehydrogenase, alcohol ...