Polymerases were screened for their ability to amplify GACTZP six-letter DNA using templates containing various numbers of Z nucleotides, both adjacent and spaced apart (). Consistent with the minor groove scanning hypothesis, and different from experiments with AEGIS components that do not present electron density to the minor groove,18 no polymerase rejected the Z:P pairs entirely. Further, polymerases accepting Z:P pairs at a single site also accepted Z:P pairs when spaced apart (data not shown).
However, some polymerases had difficulty accepting multiple consecutive Z nucleotides. For example, Deep Vent (exo+) accepted two consecutive template-P’s and Z’s but not three. Taq and Phusion, in contrast, incorporated dZTP opposite three and four consecutive P’s, and dPTP opposite three and four consecutive Z’s (Supplementary Figure S1). Phusion and Taq DNA polymerases also PCR-amplified templates containing three and four consecutive Z nucleotides with efficiencies only slightly lower than with standard DNA (, lanes 2, 3, 5, 7). Here, the retention of artificial bases in the PCR products obtained from Taq was verified by sequencing (see below, Supplementary Tables S1 and S2). Performance at this level has not been seen with any other artificial genetic system, including those developed in this laboratory.18, 25 The relative facility with which this performance is obtained might be viewed as being consistent with the Steitz-Joyce scanning hypothesis.
But what about fidelity? To follow the misincorporation of Z into standard sequences and the loss of Z:P pairs from GACTZP DNA, we exploited our observation that the Bsp120I restriction endonuclease does not cleave its recognition sequence (5′-GGGCCC-3′) if any C or G is replaced by Z. A DNA molecule containing a 5′-GGGCCC-3′ sequence was amplified by Taq PCR (at pH 8.8 or 8.0) with and without dZTP and dPTP. The amplicons were then treated with Bsp120I. In the presence of 0.2 mM of both dZTP and dPTP (, lane 4), after 1000-fold PCR amplification, 16% and 6% of the amplicon obtained at pH 8.8 and 8.0 (respectively) resisted digestion (see also Supplementary Fig. S2). These results indicate that Taq only slowly replaces C:G by Z:P pairs, with the error rate at the high, “error-generating” pH of 8.8 (ca. 0.25% per theoretical cycle per site; the pH is measured at room temperature) dropping to less than 0.1% per theoretical cycle per site at the lower pH of 8.0. The observed pH dependency suggested that mismatching arises predominantly from deprotonated dZTP pairing with G, which becomes significant at high pH (, left lane 3). In contrast, dPTP pairing with C is negligible at both pH values (, lane 2).
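The per-cycle, per-site figures quoted above can be approximated with a short calculation. The sketch below is not the authors' stated method; it assumes that a 1000-fold amplification corresponds to log2(1000) theoretical cycles, that all six positions of 5′-GGGCCC-3′ count as independent sites, and that the resistant fraction can be divided linearly across cycles and sites because the rates are small. The function names are hypothetical.

```python
import math


def theoretical_cycles(fold_amplification: float) -> float:
    """Number of perfect doublings needed to reach the given fold amplification."""
    return math.log2(fold_amplification)


def per_cycle_per_site_rate(resistant_fraction: float,
                            fold_amplification: float,
                            n_sites: int) -> float:
    """Linear (small-rate) estimate of the mutation rate per cycle per site.

    Assumption: the digestion-resistant fraction is spread evenly over all
    theoretical cycles and all positions of the recognition sequence.
    """
    return resistant_fraction / (theoretical_cycles(fold_amplification) * n_sites)


# Values from the text: 16% (pH 8.8) and 6% (pH 8.0) resistant after
# 1000-fold amplification; six positions in 5'-GGGCCC-3' (assumed count).
rate_ph88 = per_cycle_per_site_rate(0.16, 1000, 6)
rate_ph80 = per_cycle_per_site_rate(0.06, 1000, 6)

print(f"pH 8.8: {rate_ph88:.2%} per theoretical cycle per site")
print(f"pH 8.0: {rate_ph80:.2%} per theoretical cycle per site")
```

Under these assumptions the estimates come out near 0.27% and 0.10%, of the same order as the values given in the text; a different treatment of cycles or sites would shift them slightly.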
These fidelity results also demonstrated the existence of small amounts of “forward” mutation, where non-standard components enter a sequence during copying, rather than being lost (). In artificial genetic systems in general, polymerases show only a natural propensity to lose unnatural components. This is, we believe, the first example of forward mutation in any synthetic genetic system.
Mutation of T:A pairs to Z:P pairs was even rarer. To identify rare substitutions of this kind, we exploited our observation (unpublished) that DraI does not cut at its recognition sequence (5′-TTTAAA-3′) if any site contains Z or P. DNA containing a 5′-TTTAAA-3′ sequence was amplified 1000-fold with and without dZTP and dPTP at pH 8.8 and 8.0. No detectable fraction of the amplicon became resistant to cleavage (data not shown). This showed that T:A to Z:P forward mutation was less facile than mutation of C:G to Z:P and, if it occurred at all, did so at a level below one part in ca. 50,000.
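The logic of both restriction assays can be illustrated in a few lines. This is a minimal sketch, assuming that an amplicon is cleaved only when the intact recognition sequence appears in standard letters, and that checking one strand suffices because both sites are palindromic and the duplex is fully paired. The example sequences and function names are hypothetical, not taken from the study.

```python
BSP120I_SITE = "GGGCCC"  # recognition sequence given in the text
DRAI_SITE = "TTTAAA"     # recognition sequence given in the text


def is_cleavable(top_strand: str, site: str) -> bool:
    """True if an intact recognition site (standard letters only) is present."""
    return site in top_strand.upper()


def resistant_fraction(amplicons, site: str) -> float:
    """Fraction of amplicons lacking an intact site, i.e. resisting digestion."""
    if not amplicons:
        raise ValueError("no amplicons supplied")
    resistant = sum(1 for seq in amplicons if not is_cleavable(seq, site))
    return resistant / len(amplicons)


# Hypothetical population: two intact DraI sites, one with Z, one with P.
example_amplicons = [
    "ACGTTTAAAGC",
    "ACGTTZAAAGC",
    "ACGTTTAAAGC",
    "ACGTTTAPAGC",
]
print(resistant_fraction(example_amplicons, DRAI_SITE))
```

In the experiments the resistant fraction is read from a gel rather than from sequences; the sketch only makes explicit what a resistant band is taken to mean.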
To measure the rates of “reverse” mutation that convert Z:P pairs into C:G or T:A pairs, PCR amplification was performed under forcing conditions on a template in which the Bsp120I recognition sequence was disrupted by Z and P nucleotides. Here, double-stranded GACTZP DNA (Bsp-Z and Bsp-P, ), containing 5′-GGGCCZ-3′ and 3′-CCCGGP-5′, was amplified 1000-fold with low concentrations of dZTP and dPTP (5 μM to 20 μM each) and the four standard dNTPs (200 μM each). After endonuclease digestion, the fraction of cleavable amplicons was used as a metric to quantitate Z:P loss. As shown in , most amplicon was digested by Bsp120I, indicating reverse mutation of the Z:P pair to a C:G pair, recreating the recognition sequence. At pH 8.0, loss was less than 7% per theoretical cycle at 20 μM of dPTP and dZTP (, right lane 1).
To drive the loss of Z:P pairs more forcibly, DNA containing a 5′-GGGCCZ-3′ (Bsp-Z, ) or 3′-CCCGGP-5′ (Bsp-P, ) sequence was amplified without any dZTP or dPTP. Here, 95% of the PCR product was digested (, lane 1), indicating that Taq incorporates both dGTP (95%) and dATP (5%) opposite template Z in the absence of dPTP, mutating Z into C or T. In contrast, 70% of the PCR product resisted digestion (, lane 2), indicating that Taq incorporates both dTTP (70%) and dCTP (30%) opposite template P in the absence of dZTP, mutating P into A or G.
The observed mutation under forcing conditions and its pH dependency suggested mechanisms for mutation () and conditions that might maximize the fidelity of copying GACTZP DNA. At pH 8.0 (measured at room temperature), decreasing the concentration of dZTP from 0.2 mM to 0.05 mM significantly reduced the “forward mutation” (converting C:G pairs into Z:P pairs). This dZTP concentration (0.05 mM) is also sufficient to incorporate dZTP faithfully opposite template-P, and to prevent mispairing of dCTP and dTTP with template-P. This was verified by the subsequent sequencing results in Supplementary Figure S4. Next, increasing the concentration of dCTP (from 0.2 to 0.6 mM) essentially eliminated mispairing of dZTP with G (Supplementary Fig. S2b). Likewise, increasing the concentration of dPTP to 0.6 mM ensured that dPTP pairs with template-Z in competition with dGTP. Last, decreasing the concentrations of dATP, dTTP, and dGTP to 0.1 mM, and adjusting the ratio of standard to non-standard triphosphates, completed the optimization. Under these optimized conditions, retention of Z:P pairs averaged 99.8% per theoretical PCR cycle, while the loss and gain of Z:P pairs was ca. 0.2% per theoretical PCR cycle ( and Supplementary Fig. S2b). In contrast, under normal triphosphate concentrations (0.2 mM each, without optimization), the retention of one Z:P pair was 99.2% per theoretical PCR cycle, and mutation from natural to artificial base pairs was about 0.6% per theoretical cycle (for all Z/Ps in the recognition sequence) (Supplementary Fig. S2a).
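To put the two per-cycle retention figures in perspective, they can be compounded over an amplification. The sketch below assumes simple geometric compounding with independent cycles and takes the number of theoretical cycles as log2 of the fold amplification; the 1000-fold value is borrowed from the fidelity experiments above purely for illustration, and the function name is hypothetical.

```python
import math


def fraction_retained(per_cycle_retention: float, fold_amplification: float) -> float:
    """Fraction of Z:P pairs surviving, assuming independent theoretical cycles."""
    cycles = math.log2(fold_amplification)
    return per_cycle_retention ** cycles


# Per-cycle retention values quoted in the text.
optimized = fraction_retained(0.998, 1000)    # optimized triphosphate mix
unoptimized = fraction_retained(0.992, 1000)  # 0.2 mM of each triphosphate

print(f"optimized:   {optimized:.1%} retained after 1000-fold amplification")
print(f"unoptimized: {unoptimized:.1%} retained after 1000-fold amplification")
```

On this simple model, roughly 98% of Z:P pairs would survive a 1000-fold amplification under the optimized conditions, against roughly 92% without optimization.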
Any biotechnology based on an evolvable genetic molecule built from six nucleotide letters needs analytical tools to determine its sequence. Accordingly, we developed such a tool for GACTZP DNA, which we describe here and use to show that amplicons arising from known initial sequences retain Z and P at their proper positions.
This sequencing tool exploited both the power of high-throughput DNA sequencing technologies24 and the understanding of how Z:P pairs in duplex DNA might evolve to give C:G and/or T:A pairs during PCR amplification (). In contrast to our goal in developing high-fidelity six-letter PCR, which was to minimize the loss of Z, our goal in developing sequencing tools was to maximize the loss of Z. The sequencing procedure had these steps ( and Supplementary Figure S3):
- A sample of duplex GACTZP DNA is PCR-amplified using dNTPs (0.2 mM each) and just dPTP (0.2 mM), without dZTP. Lacking dZTP, any P in the template directs addition of either dTTP or dCTP to the primer. The presence of some dPTP allows any Z in any template to direct the incorporation of P in a product strand; the derived P subsequently directs incorporation of either T or C in the next copying step (Supplementary Figure S3b).
- The products of the “conversion” PCR reaction are then shotgun cloned.
- Individual DNA molecules from the clones are sequenced.
- The resulting sequences are aligned and compared.
- Sites in the alignment that hold both C and T in various aligned sequences are inferred to have arisen from sites in the parent sequence that held Z; sites in the alignment that hold both G and A are inferred to have arisen from sites in the parent sequence that held P. Sites in the alignment that consistently hold G, A, C, and T in all of the aligned sequences are inferred to have arisen from sites in the parent sequence that held G, A, C, and T, respectively.
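The inference rule in the last step can be expressed compactly. The following is a minimal sketch, assuming the clone sequences are already aligned, of equal length, and free of PCR or sequencing errors; a practical pipeline would need frequency thresholds rather than exact set matching. The function name, the example reads, and the use of "N" for uninterpretable columns are illustrative choices, not part of the published procedure.

```python
from typing import List


def infer_parent(aligned_reads: List[str]) -> str:
    """Infer the six-letter parent sequence from aligned converted clones.

    Columns holding both C and T are called Z; columns holding both G and A
    are called P; uniform columns keep their letter; anything else is "N".
    """
    if not aligned_reads:
        raise ValueError("no reads supplied")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    parent = []
    for column in zip(*aligned_reads):
        letters = {base.upper() for base in column}
        if letters == {"C", "T"}:
            parent.append("Z")
        elif letters == {"G", "A"}:
            parent.append("P")
        elif len(letters) == 1:
            parent.append(next(iter(letters)))
        else:
            parent.append("N")  # mixed column not explained by Z or P
    return "".join(parent)


# Hypothetical clones derived from a parent with Z at position 3 and P at 5.
reads = ["ACCGGT", "ACTGAT", "ACTGGT", "ACCGAT"]
print(infer_parent(reads))
```

Exact set matching keeps the rule readable; it would misclassify a standard site carrying a single PCR error in one clone, which is why a tolerance would be needed on real data.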
In more detail, sites that originally held P in the precursor would hold either G or A in the converted sequence as a result of steps that involved P:C and P:T mispairing (respectively) in the absence of dZTP. If the mismatching is balanced, the result will generate a “G” call in half of the sequences, and an “A” call in the other half. Similarly, sites that originally held Z will generate either a “C” call or a “T” call, through a first step involving Z:P pairing, and a second involving P:C and P:T mispairing (respectively). Sites that originally held G, A, C, and T will give uniform calls in all of the sequences returned through consistent G:C and T:A pairing (pace an occasional PCR error). Thus, the sequence of the precursor and the positions of Z in that sequence can be inferred ( and Supplementary Figure S3).
We found that Taq DNA polymerase supports the needed level of mismatching of template P against T or C at these concentrations in the total absence of dZTP (). In contrast, under the conditions that we examined, template-Z in the total absence of dPTP overwhelmingly directed the incorporation of G (leading to amplicons where Z is replaced by C), not the balanced mixture of G and A that would be most useful to infer a sequence.
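Why a balanced mixture is most useful can be seen from the chance that every sequenced clone happens to give the same call at a converted site, in which case the site would be mistaken for a standard nucleotide. The sketch below assumes independent clones and a fixed probability for the minority call; the clone count of five is arbitrary, and the 70:30 and 95:5 splits are borrowed from the forcing experiments reported earlier only as illustrative inputs. The function name is hypothetical.

```python
def prob_site_missed(n_clones: int, p_minor: float = 0.5) -> float:
    """Probability that all clones agree at a converted site.

    Assumption: each clone independently shows the minority call with
    probability p_minor and the majority call otherwise.
    """
    if n_clones < 1:
        raise ValueError("need at least one clone")
    if not 0.0 <= p_minor <= 0.5:
        raise ValueError("p_minor must lie between 0 and 0.5")
    p_major = 1.0 - p_minor
    return p_major ** n_clones + p_minor ** n_clones


for label, p_minor in [("balanced 50:50", 0.5),
                       ("skewed 70:30", 0.3),
                       ("skewed 95:5", 0.05)]:
    print(f"{label}: {prob_site_missed(5, p_minor):.3f} chance of a missed site")
```

With five clones, a balanced conversion leaves about a 6% chance of missing a site, a 70:30 split about 17%, and a 95:5 split more than three in four, which is consistent with the preference for routing conversion through template P.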
To demonstrate the use of “conversion PCR” to sequence GACTZP DNA, DNA molecules containing various consecutive and non-consecutive Z’s and P’s (Supplementary Table S1) were first amplified under optimized six-letter PCR conditions (Supplementary Figure S3a). To convert Z:P pairs in the PCR amplicon to a mixture of T:A and C:G pairs (Supplementary Figure S3b), a second PCR was performed in 1x ThermoPol reaction buffer (pH = 8.8, measured at room temperature) with Taq, standard dNTPs (0.2 mM each), no dZTP, and dPTP (0.2 mM) to further amplify the Z:P-containing PCR amplicon. The products of the second PCR were then cloned into the pCR® plasmid and transformed into E. coli (DH5α). Colonies were picked, plasmids were isolated and Sanger sequenced, and the separate sequencing results were compared (Supplementary Table S2).
As expected, at sites in the amplicon that originated as A:T or G:C pairs, all Sanger sequences concurred (Supplementary Table S2). However, at sites in the amplicon that originated as Z:P pairs, the sequences differed and showed a mixture of T:A and C:G pairs at those sites. Thus, the positions of the Z:P pairs in the parent amplicons could be inferred, and were found to be where they were placed in the original template that was PCR amplified. Control experiments amplifying targets that contained no Z:P pairs in a GACT sequence (Bsp-C, ) showed negligible false calls of T:A and C:G pairs.