We describe a novel single molecule nanopore-based sequencing by synthesis (Nano-SBS) strategy that can accurately distinguish four bases by detecting 4 different sized tags released from 5′-phosphate-modified nucleotides. The basic principle is as follows. As each nucleotide is incorporated into the growing DNA strand during the polymerase reaction, its tag is released and enters a nanopore in release order. This produces a unique ionic current blockade signature due to the tag's distinct chemical structure, thereby determining DNA sequence electronically at single molecule level with single base resolution. As proof of principle, we attached four different length PEG-coumarin tags to the terminal phosphate of 2′-deoxyguanosine-5′-tetraphosphate. We demonstrate efficient, accurate incorporation of the nucleotide analogs during the polymerase reaction, and excellent discrimination among the four tags based on nanopore ionic currents. This approach coupled with polymerase attached to the nanopores in an array format should yield a single-molecule electronic Nano-SBS platform.
DNA sequencing is a fundamental technology in the biological and medical sciences. Recently, several analytical methods have been developed to detect DNA or RNA at the single molecule level using chemical or physical microscopic technologies1,2,3. In particular, ion channels have been shown to detect individual DNA or RNA strands, leading to the promise of high-speed sequencing and analysis of DNA4,5,6,7,8,9,10,11,12,13.
In 1996, Kasianowicz et al.4 first demonstrated that the α-hemolysin (αHL) channel could be used to detect nucleic acids at the single molecule level. The αHL channel has a 1.5 nm-diameter limiting aperture14,15,16,17, and its voltage-dependent gating can be controlled, such that the pore remains open indefinitely17, making it an ideal candidate for nanopore-based detection and discrimination. Individual single-stranded polyanionic nucleic acids are driven through the pore by the applied electric field, and the polynucleotides cause well-defined, transient reductions in the pore conductance4,8,10,12. Because the residence time of the polynucleotide in the pore is proportional to the RNA or DNA contour length, it was suggested that a nanopore may be able to sequence DNA in a ticker-tape fashion if the current signatures of the four bases can be discriminated from each other4. In another approach towards the goal of nanopore sequencing4,13,18, an αHL channel with a covalently linked adaptor in the pore was used to identify unlabeled nucleoside-5′-monophosphates one by one following exonuclease cleavage19. However, a complete exonuclease-nanopore system based on this concept to sequence DNA has so far not been demonstrated.
Despite the ability of nanopores to detect and characterize some physical properties of DNA at the single molecule level, the more demanding goal of accurate base-to-base sequencing by passing a single stranded DNA through the nanopore has not yet been realized. Oxford Nanopore Technologies recently announced20 the ability to accomplish strand sequencing in a nanopore at 3-base resolution with an error rate of 4%. Another group reported single base resolution strand sequencing with a nanopore, but had difficulty correctly determining homopolymer sequences21.
The native αHL channel has an inherent ability for high-resolution molecular discrimination. For example, it can discriminate between aqueous H+ and D+ ions17, and Robertson et al.22 have recently demonstrated that the αHL channel can easily separate poly(ethylene glycol) (PEG) molecules at better than the single monomer level. In the latter study, a molecular mass or size spectrum estimated from the mean current caused by individual PEG molecules entering the pore easily resolves individual ethylene glycol repeat units. In addition, the mean residence time of the polymer in the pore increases with the PEG size23. Based on these previous investigations using nanopores to detect and distinguish molecules with different structures and the fact that DNA polymerase can recognize nucleotide analogs with extensive modification at the 5′-terminal phosphate group as efficient substrates24,25,26,27,28, we propose a novel nanopore-based sequencing by synthesis (Nano-SBS) strategy that will accurately differentiate each of the four different sized tags attached to the 5′-phosphate of each nucleotide at the single molecule level for sequence determination. The basic principle of the Nano-SBS approach is described as follows. As each nucleotide is incorporated into the growing DNA strand during the polymerase reaction, its tag is released by phosphodiester bond formation (Fig. 1). The tags will enter a nanopore in the order of release, producing unique ionic current blockade signatures due to their distinct chemical structures, thereby determining DNA sequence electronically at single molecule level with single base resolution. We demonstrated that the 5′-terminal phosphate position of the nucleotide is unique in its ability to tolerate sizable modifications by large tags based on PEG molecules without affecting polymerase recognition. 
This overcomes the inherent constraints imposed by the small differences among the 4 bases, a challenge which all other nanopore sequencing methods have faced for decades. Thus, the proposed Nano-SBS approach identifies individual bases by the detection and differentiation of the large tags released during the polymerase reaction instead of the small nucleotides themselves. The tags are large molecules that have slow diffusion rates, which greatly increase their chance of entering the nanopore and producing unique ionic current blockade signals. As proof-of-principle, we attached four different length coumarin-PEG tags to the terminal phosphate of 2′-deoxyguanosine-5′-tetraphosphate. We demonstrate efficient incorporation of the nucleotide analogs during the polymerase reaction, and better than baseline discrimination among the four tags based on their nanopore ionic current blockade signatures. This approach coupled with polymerase covalently attached to the nanopores in an array format should yield a single-molecule Nano-SBS platform.
In the single molecule electronic Nano-SBS system, shown schematically in Fig. 2, the DNA polymerase is bound in close proximity to the nanopore entrance. A template to be sequenced is added along with the primer. The four differently tagged nucleotides are then added to the bulk aqueous phase containing this template-primer complex. After polymerase catalyzed incorporation of the correct nucleotide, the tag-attached polyphosphate will be released and pass through the nanopore to generate a unique ionic current blockade signal, thereby identifying the added base electronically because the tags have distinct chemical structures. An example of four continuous nucleotide incorporation reactions with different tags for each base is shown in Supplementary Fig. S1. An array of nanopores, each with a covalently attached polymerase adjacent to the pore entrance, will allow single-molecule SBS.
This tag-based Nano-SBS system offers the following advantages over strand sequencing through nanopores: (1) it overcomes the inherent constraints imposed by the small differences among the 4 bases by instead using 4 large and distinct molecular tags, which are easily differentiated by a nanopore; and (2) there is no need to slow down the transit speed of the tag through the pore as long as the tag is detectable, because the polymerase extension and tag release rate is much slower than the tag interaction time with the pore. This would also eliminate phasing issues inherent to strand sequencing. Here, we describe the synthesis and efficient incorporation of a new class of nucleotide analogs with 5′-phosphate-attached tags. These tags consist of four different length PEGs and a coumarin moiety. We also demonstrate four distinct ionic current blockade patterns produced by these tags in an α-hemolysin channel at the single molecule level. This proof-of-principle study of the separate elements of the proposed Nano-SBS system demonstrates the feasibility of integrating them into a single molecule electronic SBS nanopore sequencer in the future.
The four 5′-phosphate tagged 2′-deoxyguanosine-5′-tetraphosphates (Fig. 3) were synthesized according to the generalized synthetic scheme shown in Fig. 4. First, 2′-deoxyguanosine-5′-triphosphate (dGTP) was converted to 2′-deoxyguanosine-5′-tetraphosphate (dG4P). Then, a diaminoheptane linker was added to the terminal phosphate of the tetraphosphate to produce dG4P-heptyl-NH2 (Product A), which provides the attachment point for the different length PEG tags. In a separate set of reactions, 6-methoxy-coumarin N-hydroxysuccinimidyl ester was reacted with one of four amino-PEGn-COOH molecules with 16, 20, 24 or 36 ethylene glycol units, to produce coumarin-PEGn-COOH molecules, which were subsequently converted to the corresponding NHS-esters (Product B). The coupling of dG4P-heptyl-NH2 (Product A) with the coumarin-PEGn-NHS esters (Product B) yields the four final nucleotide analogs, abbreviated coumarin-PEGn-dG4Ps (Fig. 4, n = 16, 20, 24, 36). The coumarin moiety was used as a prototype modifier to further tune the size of the tag as well as to track the purification of intermediates and the final nucleotide analogs. Synthesis of the expected coumarin-PEGn-dG4P molecules was confirmed by MALDI-TOF mass spectrometry (Supplementary Fig. S2).
We next tested the coumarin-PEGn-dG4P nucleotide analogs in polymerase extension reactions using the Therminator DNA polymerase. A primer-loop-template was designed where the next complementary base was a C, enabling dGMP to be added to the DNA primer (Supplementary Fig. S3). Coumarin-PEGn-triphosphate is released during the reaction (Supplementary Fig. S4). MALDI-TOF-MS confirmed that indeed each of the four coumarin-PEGn-dG4P nucleotide analogs gave the correct extension product with 100% incorporation efficiency, as shown by the appearance of a single peak at ~8,290 daltons in the mass spectra (Fig. 5). The absence of a primer peak at 7,966 daltons suggested that the reaction proceeded essentially to completion. An important feature of the Nano-SBS approach is that the extended DNA chains contain all natural nucleotides without any modifications, allowing SBS to continue over extensive lengths. All the extension products shown in Fig. 5 represented the incorporation of the coumarin-PEGn-dG4P nucleotide analogs, with no products derived from potential residual dGTP or dG4P, since the molecules were purified twice in an HPLC system that separates these molecules effectively with a retention time difference of more than 10 min between the two groups of compounds. To further exclude this possibility, we treated the purified coumarin-PEGn-dG4P nucleotide analogs with alkaline phosphatase, which would degrade any contaminating tri- or tetra-phosphate to the free nucleoside, and used the resulting HPLC-repurified coumarin-PEGn-dG4P nucleotide analogs in extension reactions.
The tags released during incorporation of the coumarin-PEGn-dG4P nucleotide analogs in polymerase reactions should be coumarin-PEGn-triphosphates (coumarin-PEGn-P3). To reduce the complexity of the charge on the tags, we treated the released tags (coumarin-PEGn-P3) with alkaline phosphatase, yielding coumarin-PEGn-NH2 tags (Supplementary Fig. S4), and then analyzed these tags for their nanopore current blockade effects. In further developing the Nano-SBS system, we can pursue such treatment of the released coumarin-PEGn-P3 tags with alkaline phosphatase, which would be attached to the entrance of the nanopores downstream of the polymerase, to generate coumarin-PEGn-NH2 tags. Alternatively, we can optimize the conditions for using nanopores to directly detect the released charged coumarin-PEGn-triphosphate tags. For the proof-of-principle studies reported here, in order to obtain large amounts of material for testing by MALDI-TOF MS and protein nanopores, we produced synthetic versions of the expected released tags (coumarin-PEGn-NH2) by acid hydrolysis of the four coumarin-PEGn-dG4P nucleotide analogs to cleave the P-N bond between the polyphosphate and heptylamine moiety (Supplementary Fig. S4). The expected coumarin-PEGn-NH2 molecules were confirmed by MALDI-TOF-MS analysis, following HPLC purification (Fig. 6). MALDI-TOF-MS results indicated that the coumarin-PEGn-NH2 tags generated by acid hydrolysis were identical to the tags produced by alkaline phosphatase treatment of the released coumarin-PEGn-triphosphate tags during the polymerase reaction.
To demonstrate the feasibility of our proposed electronic single molecule SBS approach, we measured a heterogeneous mixture of the four coumarin-PEGn-NH2 tags for their current blockade effects on a single αHL nanopore (Fig. 7). The top of Fig. 7 shows the profile of current blockade versus time. The lower left of Fig. 7 shows a representative subset of the time series data, indicating that inside the nanopore, PEG tags produce current blockades that are characteristic of their size. The relative frequency histogram of blockade events (⟨i⟩/⟨i_open⟩) shows four well separated and distinct peaks for the four coumarin-PEGn-NH2 tags (n = 36, 24, 20, and 16 from left to right respectively in Fig. 7, lower right). To highlight the wide separation of the peaks, and offer clear evidence that detection of a specific nucleotide might be accomplished by the unique blockade signal afforded by its released PEG tag, the peaks are fitted with single Gaussian functions and the corresponding 6 σ error distributions are shown (colored rectangles at top in Fig. 7, lower right). We also characterized separately each of the coumarin-PEGn-NH2 molecules with the pore (data not shown), which confirmed the identity of the different-sized PEG-related peaks shown in Fig. 7. These results suggest that a single base could be discriminated with an accuracy better than 1 in 5 × 10⁸ events when four different nucleotides carrying four different length PEGs, such as those tested here, are used for DNA sequencing; this is represented in Fig. 7, lower right, by the A, C, G and T designations.
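The quoted discrimination accuracy can be related to the Gaussian fits with a short calculation. The sketch below assumes that a miscall happens only when a blockade falls more than 6 σ from the mean of its own peak, and that the peaks are ideal Gaussians; the function name is hypothetical and this is an illustration of the arithmetic, not the analysis code used in this work.

```python
import math

def two_sided_gaussian_tail(k_sigma: float) -> float:
    """Probability that a normal variate lies more than k_sigma
    standard deviations away from its mean (both tails combined)."""
    return math.erfc(k_sigma / math.sqrt(2.0))

# Assumption: an event is misassigned only if it lands outside +/- 6 sigma.
p_outside = two_sided_gaussian_tail(6.0)
events_per_outlier = 1.0 / p_outside

print(f"P(|z| > 6 sigma) = {p_outside:.3e}")
print(f"roughly 1 event in {events_per_outlier:.2e}")
```

Under these assumptions the tail probability is about 2 × 10⁻⁹, which corresponds to roughly one outlying event in 5 × 10⁸.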
As described above, a single αHL ion channel can separate single molecules based on their size, and easily resolves a mixture of PEGs to better than the size of a single monomer unit (i.e., < 44 g/mol)16,18,22. This high resolution arises from the interactions between the PEG polymer, the electrolytes (mobile cations), and amino acid side chains that line the αHL channel's lumen16. These interactions allow the pore to be used as a nanometer-scale sensor that is specific to the size, charge and chemical property of an analyte.
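To put the single-monomer resolution in context, the mass spacing between the four tags can be worked out from the repeat unit. The sketch below uses standard atomic masses and the tag lengths given in this paper (n = 16, 20, 24, 36); the variable and function names are illustrative only.

```python
# Standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# One ethylene glycol repeat unit, -CH2-CH2-O-.
REPEAT_UNIT = {"C": 2, "H": 4, "O": 1}

def repeat_unit_mass() -> float:
    """Molar mass of a single ethylene glycol repeat unit."""
    return sum(ATOMIC_MASS[element] * count
               for element, count in REPEAT_UNIT.items())

TAG_LENGTHS = [16, 20, 24, 36]  # number of repeat units per tag
unit = repeat_unit_mass()

# Mass difference between each pair of neighbouring tag sizes.
spacings = {(a, b): (b - a) * unit
            for a, b in zip(TAG_LENGTHS, TAG_LENGTHS[1:])}

for (a, b), delta in spacings.items():
    print(f"PEG{a} -> PEG{b}: {b - a} units, {delta:.1f} g/mol")
```

The closest pair of tags differs by four repeat units (about 176 g/mol), which is several times the single-unit spacing that the pore has been shown to resolve.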
Here, such analysis is extended to PEGs with different chemical groups on either terminus. The single channel ionic current recording in Fig. 7 (top and lower left) illustrates the blockades caused by the four different sized coumarin-PEGn-NH2 molecules, one at a time. As with unmodified PEG, each of the current blockades is unimodal (i.e., described well with Gaussian distributions and well-defined mean values).
To accurately discriminate between the four bases (A, C, G and T) for strand nanopore sequencing, one or more of the following strategies need to be adopted: (1) enhance and differentiate the strength of the detection signals; (2) develop an effective method to discern and process the electronic blockade signals generated; (3) control the translocation rate of nucleic acids through the pore, e.g., by slowing down DNA movement; and (4) design and make new and more effective synthetic nanopores. As we demonstrated here, the Nano-SBS approach has transformed the problem of resolving the 4 individual bases to that of discriminating among 4 large well-differentiated tags, which essentially solves the first three problems.
DNA sequencing by synthesis is the dominant platform for genomics research and personalized medicine29,30,31,32,33. Kumar et al. first reported the modification of nucleoside-5′-triphosphates, either by introducing more phosphate groups to produce tetra- and penta-phosphates and attaching fluorophores directly to the terminal phosphate, or by inserting a linker between the terminal phosphate and the fluorophore24,25. Tetra- and penta-phosphates were shown to be better DNA polymerase substrates, and fluorophore-labeled phosphate nucleotides have been used widely for DNA sequencing26,27,28,34. Here, we have demonstrated a novel approach to enhance discrimination of the four nucleotides by modifying them at the terminal phosphate moiety with distinct large chemical tags for single molecule electronic SBS. The physical and chemical properties of the tag can be further adjusted to optimize the nanopore capture efficiency and measurement accuracy. For instance, the insertion of a positively charged linker consisting of four lysines or arginines between the polyphosphate and the PEG will produce precursors with a neutral charge and released tags with a net positive charge. Using the appropriate magnitude and sign of the potential23, the released tags, but not nucleotide substrates, will be transported through the pore.
The coumarin moiety on the tagged nucleotides can be replaced with other molecules of larger size or different charge to further enhance nanopore discrimination. Clearly, it is important that every tag released in a polymerase reaction be maintained in the proper order for real-time single molecule Nano-SBS. Even so, some unreacted nucleotide analogs might enter the pore. Thus, the ability to discriminate between cleaved tags and unreacted nucleotide analogs will be important; fortunately, these two groups of molecules should be easily differentiated by a nanopore due to their significant size and charge differences. In addition, it has not escaped us that the tagged nucleotide Nano-SBS approach can be implemented in a straightforward way by adding the four nucleotides (A, C, G and T) labeled with identical tags on the 5′-phosphate in a stepwise fashion to reduce the overall complexity of the system, analogous to pyrosequencing30 and the Ion Torrent approach33. However, unlike those methods, the Nano-SBS approach has the advantage of single molecule sensitivity without the requirement for DNA amplification, and hence no issues with sequencing through homopolymeric regions, since tags released at each position of the homopolymer are detected discretely by the nanopore at single-molecule level.
The single molecule electronic Nano-SBS approach described here should be applicable to either protein nanopores (e.g., αHL; Mycobacterium smegmatis porin A, MspA)35, or solid-state nanopores36,37,38,39,40,41,42. These options will provide nanopores with different properties that are appropriate for detecting a library of tags. To implement this novel strategy for DNA sequencing, an array of nanopores43 can be constructed on a planar surface to facilitate massively parallel DNA sequencing.
In conclusion, we have conducted proof-of-principle studies for a novel single molecule electronic Nano-SBS platform that will measure the tags released from the nucleotide substrates during the polymerase reaction, for sequence determination. In its full implementation in the future, it should be capable of long, accurate reads, and potentially offer very high throughput electronic single molecule DNA sequencing.
The synthesis of coumarin-PEGn-dG4P involves three steps (A, B, C) as shown in Fig. 4. All of the nucleotide analogs were purified by reverse-phase HPLC on a 150 × 4.6 mm column (Supelco), mobile phase: A, 8.6 mM Et3N/100 mM 1,1,1,3,3,3-hexafluoro-2-propanol in water (pH 8.1); B, methanol. Elution was performed from 100% A isocratic over 10 min followed by a linear gradient of 0–50% B for 20 min and then 50% B isocratic over another 30 min.
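The elution program above can be written as a piecewise function of time. The sketch below is one reading of the stated program (0% B for 10 min, a linear ramp to 50% B over the next 20 min, then 50% B for 30 min); the function name is hypothetical, and the value returned is the programmed composition at the pump, which ignores any instrument dwell volume.

```python
def percent_b(t_min: float) -> float:
    """Programmed percentage of mobile phase B (methanol) at time t_min.

    Segments, as read from the text:
      0-10 min   100% A isocratic (0% B)
      10-30 min  linear gradient from 0% to 50% B
      30-60 min  50% B isocratic
    """
    if t_min < 0 or t_min > 60:
        raise ValueError("time is outside the 60 min program")
    if t_min <= 10:
        return 0.0
    if t_min <= 30:
        return 50.0 * (t_min - 10.0) / 20.0
    return 50.0

# Programmed composition at a few time points, including the retention
# times reported later in this section for the four analogs.
for t in (5.0, 20.0, 31.7, 32.2, 33.0, 34.3):
    print(f"{t:5.1f} min -> {percent_b(t):4.1f}% B")
```

On this reading, all four reported retention times (31.7 to 34.3 min) fall shortly after the program reaches the 50% B isocratic segment.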
The synthesis of 2′-dG4P was carried out starting from 2′-dGTP. 2′-dGTP (300 µmol, triethylammonium salt) was converted to the tributylammonium salt by using 1.5 mmol (5 eq) of tributylamine in anhydrous pyridine (5 ml). The resulting solution was concentrated to dryness and co-evaporated twice with 5 ml of anhydrous DMF. The dGTP (tributylammonium salt) was dissolved in 5 ml anhydrous DMF, and 1.5 mmol 1,1′-carbonyldiimidazole (CDI) was added. The reaction was stirred for 6 hr, after which 12 µl methanol was added and stirring continued for 30 min. To this solution, 1.5 mmol phosphoric acid (tributylammonium salt, in DMF) was added and the reaction mixture was stirred overnight at room temperature. The reaction mixture was diluted with water and purified on a Sephadex-A25 column using a 0.1 M to 1 M TEAB gradient (pH 7.5). The dG4P elutes at the end of the gradient. The appropriate fractions were combined and further purified by reverse-phase HPLC to yield 175 µmol of the pure tetraphosphate (dG4P). 31P-NMR: δ, −10.7 (d, 1P, α-P), −11.32 (d, 1P, δ-P), −23.23 (dd, 2P, β, γ-P); ESI-MS (-ve mode): Calc. 587.2; Found 585.9 (M-2).
To 80 µmol dG4P in 2 ml water and 3.5 ml 0.1 M 1-methylimidazole-HCl (pH 6) were added 154 mg EDAC and 260 mg diaminoheptane. The pH of the resulting solution was adjusted to 6 with concentrated HCl and stirred at room temperature overnight. This solution was diluted with water and purified by Sephadex-A25 ion-exchange chromatography followed by reverse-phase HPLC to yield ~20 µmol dG4P-heptyl-NH2 (Product A), which was characterized by ESI-MS (-ve mode): calc. 699.1; Found 698.1 (M-1).
Amino-PEGn-acids (1 eq) [Amino-d(PEG)16, 20, 24, 36-acids; Quanta Biodesign] were dissolved in 0.1 M sodium carbonate-sodium bicarbonate buffer (pH 8.6), followed by addition of coumarin-NHS (1 eq) in DMF, and the reaction mixture was stirred overnight. The coumarin-PEGn-acids obtained were purified by silica-gel chromatography using a CH2Cl2-MeOH (5–15%) mixture and the appropriate fractions were combined. These compounds were analyzed by MALDI-TOF MS (Supplementary Table 1).
Reaction of the coumarin-PEGn-acids with 1.5 eq. of disuccinimidyl carbonate (DSC) and 2 eq. of triethylamine in anhydrous DMF for 2 h yields the corresponding coumarin-PEGn-NHS esters (Product B). The coumarin-PEGn-NHS esters, which move slightly faster than the corresponding acids on silica-gel plates, were purified by silica-gel chromatography using a CH2Cl2-MeOH (5-15%) mixture and then used in the next step.
dG4P-heptyl-NH2 (Product A) synthesized above was taken up in 0.1 M sodium carbonate-bicarbonate buffer (pH 8.6) and to this stirred solution was added 1 eq. of one of the coumarin-PEGn-NHS esters (Product B) in DMF. The resulting mixture was stirred overnight at room temperature and then purified on a silica-gel cartridge (15-25% MeOH in CH2Cl2 to remove unreacted coumarin-acid or -NHS ester and then 5:4:1 isopropanol/NH4OH/H2O). The crude product was further purified twice by reverse-phase HPLC to provide pure coumarin-PEGn-dG4P: coumarin-PEG16-dG4P (retention time, 31.7 min); coumarin-PEG20-dG4P (retention time, 32.2 min); coumarin-PEG24-dG4P (retention time, 33.0 min); coumarin-PEG36-dG4P (retention time, 34.3 min). The structure of all the molecules was confirmed by MALDI-TOF MS (Supplementary Table 2).
Extension reactions were performed using a template-loop-primer (5′-GATCGCGCCGCGCCTTGGCGCGGCGC-3′, M.W. 7966), in which the next complementary base on the template is a C, allowing extension by a single G (Supplementary Fig. S3). Each extension reaction was carried out in a GeneAmp PCR System 9700 thermal cycler (Applied Biosystems) at 65 °C for 25 minutes in 20 µl reactions consisting of 3 µM template-loop-primer, 1× Therminator γ buffer [50 mM KCl, 20 mM Tris-HCl, 5 mM MgSO4, 0.02% IGEPAL CA-630 (pH 9.2)], 2 units of Therminator γ DNA polymerase (New England Biolabs), and 15 µM of one of the coumarin-PEGn-dG4P nucleotide analogs. The DNA extension products were precipitated with ethanol, purified through C18 ZipTip columns (Millipore), and characterized by MALDI-TOF MS. As shown in Fig. 5 in the main text, four identical extension products (expected molecular weight 8,295) were obtained. These polymerase extension reactions for each coumarin-PEGn-dG4P were repeated and the released products (coumarin-PEGn-triphosphates, Supplementary Fig. S4) were treated with alkaline phosphatase (1 U at 37 °C for 15 min) to yield the coumarin-PEGn-NH2 tags, which were extracted into dichloromethane and characterized by MALDI-TOF-MS.
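The expected product mass and the reagent excess in these reactions follow from simple arithmetic. The sketch below assumes the standard free-acid mass of 2′-deoxyguanosine-5′-monophosphate (347.22 Da) and that one water is lost when the phosphodiester bond forms; the variable names are illustrative only.

```python
PRIMER_MASS = 7966.0     # template-loop-primer mass from the text (Da)
DGMP_FREE_ACID = 347.22  # dGMP free acid, C10H14N5O7P (Da), standard value
WATER = 18.02            # lost on phosphodiester bond formation (Da)

# Mass added to the primer by incorporating a single dGMP residue.
added_mass = DGMP_FREE_ACID - WATER
expected_product = PRIMER_MASS + added_mass

# Amounts per reaction: concentration (µM) x volume (µl) gives pmol.
reaction_volume_ul = 20.0
template_pmol = 3.0 * reaction_volume_ul
analog_pmol = 15.0 * reaction_volume_ul
molar_excess = analog_pmol / template_pmol

print(f"expected extension product: {expected_product:.1f} Da")
print(f"template: {template_pmol:.0f} pmol, analog: {analog_pmol:.0f} pmol "
      f"({molar_excess:.0f}-fold excess)")
```

The result agrees with the expected molecular weight of 8,295 quoted above, and shows that each nucleotide analog was supplied in 5-fold molar excess over the template-loop-primer.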
Single α-hemolysin (αHL) channels were inserted into solvent-free planar lipid bilayer membranes44 fabricated across an ~80 μm diameter hole in a 25 μm thick Teflon partition separating two electrolyte solution wells as described previously45. An electrolyte of 4 M KCl and 10 mM Tris, titrated to pH 7.2 with citric acid, was used throughout the experiment. Membranes were formed by first wetting the partition with 1% v/v hexadecane/pentane. A 10 mg/mL solution of diphytanoyl phosphatidylcholine (DPhyPC) in pentane was spread at both air-electrolyte solution interfaces with the solution levels well below the hole in the Teflon partition. After 10 min, the solution levels were raised above the hole and a membrane formed spontaneously. Approximately 0.5 μL of 0.5 mg/mL αHL was injected into the solution immediately adjacent to the membrane and the ionic current was observed until a single channel inserted into the membrane. The cis chamber contents were then exchanged with protein-free electrolyte solution to maintain a single channel for the duration of the experiment.
Coumarin-PEGn-NH2 molecules (n = 16, 20, 24 and 36) were added to the trans side of the pore (defined as the β-barrel side of the channel) to a final concentration between 0.4 μmol/L and 1 μmol/L of each component. Ionic current was recorded between two matched Ag/AgCl electrodes (3 M KCl) at a fixed potential (−40 mV) for approximately 15 min to achieve sufficient counting statistics. Data were recorded with a 4-pole Bessel filter at 10 kHz oversampled at 50 kHz.
Data were analyzed off-line with an in-house program written in LabVIEW (National Instruments) as described previously23. In brief, blockades were located with an event detector based on a simple threshold algorithm set at 5 σ of the current noise in the open state. When an event was detected, the points in the rise time and decay time were discarded (~60 μs and ~20 μs, respectively). The mean blockade depth was calculated from the remaining points and the open channel current was calculated from the mean of 0.8 ms of open channel data separated by 0.2 ms from the threshold crossing. The data were reported as a ratio of the means (⟨i⟩/⟨i_open⟩) and the nanopore spectra were calculated as a histogram of these values.
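The threshold procedure described above can be sketched in a few lines of Python. This is not the LabVIEW program used in this work: the function name and signature are hypothetical, the trace is assumed to hold current magnitudes (so that a blockade is a drop below the open level), the open-state mean and noise are assumed to be supplied by the caller, and the open-channel window is taken from just before each event.

```python
from statistics import mean

def detect_blockades(current, sample_rate_hz, open_mean, open_sigma,
                     n_sigma=5.0, rise_s=60e-6, decay_s=20e-6,
                     open_window_s=0.8e-3, open_gap_s=0.2e-3):
    """Return the ratio <i>/<i_open> for each blockade in the trace."""
    threshold = open_mean - n_sigma * open_sigma
    n_rise = round(rise_s * sample_rate_hz)      # samples dropped at the start
    n_decay = round(decay_s * sample_rate_hz)    # samples dropped at the end
    n_window = round(open_window_s * sample_rate_hz)
    n_gap = round(open_gap_s * sample_rate_hz)

    ratios = []
    i, n = 0, len(current)
    while i < n:
        if current[i] >= threshold:
            i += 1
            continue
        start = i
        while i < n and current[i] < threshold:
            i += 1
        end = i  # first sample back at or above the threshold
        core = current[start + n_rise:end - n_decay]
        window_end = start - n_gap
        window_start = window_end - n_window
        if not core or window_start < 0:
            continue  # event too short, or too close to the start of the trace
        i_open = mean(current[window_start:window_end])
        ratios.append(mean(core) / i_open)
    return ratios

# Synthetic example at 50 kHz: open level 100, one blockade at level 30.
trace = [100.0] * 100 + [30.0] * 20 + [100.0] * 100
ratios = detect_blockades(trace, 50_000, open_mean=100.0, open_sigma=1.0)
print(ratios)
```

At the 50 kHz sampling rate used here, the discarded rise and decay intervals correspond to 3 samples and 1 sample, and the open-channel window to 40 samples ending 10 samples before the threshold crossing. A histogram of the returned ratios would then give the nanopore spectrum. This sketch does not check whether an earlier blockade overlaps the open-channel window, which a production analysis would need to handle.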
JJ, CT, SK and ZL conceived the initial approach; SK synthesized the nucleotide analogs; MC, CT, BH and ZL characterized these molecules and carried out the SBS experiments; AB and JWFR performed and analyzed the nanopore experiments; JER and JWFR designed the data analysis algorithms used to process the data. JJ, SK, JJR, JJK and JWFR designed the experiments, assessed the results, and wrote the manuscript with MC.
This work was supported by NIH grant R01HG005109 (JJ) and in part by an NRC/NIST-NIH Research Fellowship (AB) and a grant from the NIST Office of Law Enforcement Standards (JJK, JWFR, JER). Names and parts for certain equipment are mentioned in the manuscript. This in no way represents an endorsement by NIST.