Nanopore sensors show great potential for rapid, single-molecule determination of DNA sequence information. Here, we develop an ionic current-based method for determining the positions of short sequence motifs in double-stranded DNA molecules with solid-state nanopores. Using the DNA methyltransferase M.TaqI and a biotinylated S-adenosyl-l-methionine cofactor analogue, we create covalently attached biotin labels at 5′-TCGA-3′ sequence motifs. Monovalent streptavidin is then added to bind to the biotinylated sites, giving rise to additional current blockade signals when the DNA passes through a conical quartz nanopore. We determine the relationship between translocation time and position along the DNA contour and find a minimum resolvable distance between two labeled sites of ~200 bp. We then characterize a variety of DNA molecules by determining the positions of bound streptavidin and show that two short genomes can be simultaneously detected in a mixture. Our method provides a simple, generic single-molecule detection platform enabling DNA characterization in an electrical format suited for portable devices for potential diagnostic applications.
Precise determination of DNA sequence information has underpinned many recent advances in biomedicine. Alongside direct determination of the sequence, accurate mapping of the positions of specific base pair motifs has become an important tool, for instance, in reconstructing large genomes.1 Mapping of short sequence motifs can also be used for species characterization by comparing measured maps with databases of known organisms.2 Until now, optical methods have been the primary technique employed for genome mapping.3 In these methods, the DNA is first stretched by shear flows or nanochannel confinement, and the positions of specific sequences are then determined by fluorescent labeling.3−7
Nanopores offer another attractive route to DNA identification through genome mapping, with the potential for integration into small, portable devices.8,9 Compared with optical microscopy, nanopore sensing requires smaller and less complex equipment. The principle of the sensing method is that single molecules can be characterized when they pass through a nanosized pore and block the flow of ions. The 1D threading of a long polymer such as DNA through such a nanoscale pore naturally leads to the possibility of reading sequence information. Synthetic or solid-state nanopores have potential advantages, such as robustness and amenability to wafer-scale integration, which make them attractive alternatives to biological nanopores.10 Singer et al.11,12 showed that bound peptide nucleic acid (PNA) probes could be detected for targeted identification of a single viral genome using ~4 nm diameter solid-state nanopores. Recent developments in solid-state nanopore technology have shown that the positions of single proteins13−16 and DNA hairpins17 on double-stranded DNA can also be detected.
In this paper, we build on these developments to demonstrate a generic solid-state nanopore method capable of mapping genomic DNA. We modify short sequence motifs with a biotin residue by using a DNA methyltransferase (MTase) and a synthetic cofactor analogue.18 The biotin sites are then labeled by tight binding of monovalent streptavidin,19 thereby creating additional, sequence-specific blockades as the DNA passes through the solid-state nanopore. We demonstrate the feasibility of our method by mapping a variety of DNA plasmid and bacteriophage genomes and find a minimum resolvable distance between sequence motifs of ~200 bp. Our results show the potential application of solid-state nanopores for DNA identification.
Double-stranded DNA (dsDNA) translocations through a solid-state nanopore show no detectable sequence information because the cross-section of a DNA double strand is uniform. Therefore, we sought a method for creating labels on double-stranded DNA that would increase the ionic current blockade at specific sequence motifs. A variety of methods have been demonstrated for labeling specific internal sequences on a double-stranded DNA molecule.20,21 Synthetic PNA binds to a double helix at highly specific sites ~8–15 bp in length.11,12,22 Similarly, nicking endonucleases can be used to create nicks in DNA for subsequent labeling by nick translation with fluorophore-labeled deoxyribonucleotides.5 Different variants of Cas9 can also be used for fluorescence labeling or to create additional ionic current blockades in solid-state nanopore measurements.23,24 The recognition sites of PNA and Cas9 range from 8 to 20 bp in length. As a result, labeling is usually targeted at identifying a specific genome, because sequence motifs of this length do not occur frequently enough to create unique fingerprints for different DNA samples. We sought a method that would label shorter sequence motifs and thereby provide an easy way to barcode many DNA molecules based on the distribution of these short sequences along the polymer.
A robust method for targeted labeling of DNA positions is “methyltransferase-directed Transfer of Activated Groups” (mTAG).25 In this technique, a DNA MTase is tricked into covalently attaching a functional group by supplying a cofactor in which the methyl group of the natural cofactor S-adenosyl-l-methionine (AdoMet) is replaced with another chemical entity. The mTAG method has been shown to have a yield close to 100% and can be performed in a simple one-step reaction.2,25 Figure 1a,b shows schematics of the DNA MTase-based strategy we employed here. We used the DNA MTase M.TaqI,26 which recognizes the four base pair sequence 5′-TCGA-3′, to covalently label the N6-atom of the adenine base of the recognition sequence with a biotin residue.
We initially tested our method with a DNA plasmid of 4361 bp (pBR322). This circular plasmid has seven 5′-TCGA-3′ recognition sites (Sites 1–7) for M.TaqI. After biotin labeling, the plasmid was completely protected against cleavage by R.TaqI, indicating that at least one adenine within each palindromic 5′-TCGA-3′ site is labeled with quantitative efficiency (Figure 1c). The biotinylated plasmid was then linearized by digestion with the restriction endonuclease R.AhdI, which cuts at a single site (Figure 1d). A 5-fold excess of monovalent streptavidin (over the total number of 5′-TCGA-3′ sites) was then added to bind to the biotinylated sites and thereby create additional ionic current blockades.27,28
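The site bookkeeping described above (finding 5′-TCGA-3′ motifs on a circular plasmid, re-indexing them after a single linearizing cut, and sizing a 5-fold streptavidin excess) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' pipeline: the helper names `find_motif_sites`, `linearize`, and `streptavidin_per_plasmid` are hypothetical, the 15 bp toy circle stands in for the real pBR322 sequence, and the R.AhdI cut position is passed in rather than computed. Because 5′-TCGA-3′ is its own reverse complement, scanning one strand finds every site on the duplex; in the circular case the first three bases are appended so that a site spanning the origin is not missed.

```python
# Hypothetical helpers: locate M.TaqI sites (5'-TCGA-3') and re-index them
# after a single linearizing cut. Toy data only, not the pBR322 sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_sites(seq: str, motif: str = "TCGA", circular: bool = True) -> list[int]:
    """0-based start positions of a *palindromic* motif on the top strand.

    For circular DNA the first len(motif) - 1 bases are appended so that a
    site spanning the origin is also found.
    """
    seq, motif = seq.upper(), motif.upper()
    search = seq + seq[: len(motif) - 1] if circular else seq
    sites = []
    start = search.find(motif)
    while start != -1:
        sites.append(start)
        start = search.find(motif, start + 1)
    return sites


def linearize(sites: list[int], cut: int, length: int) -> list[int]:
    """Positions of circular-plasmid sites measured from a single cut."""
    return sorted((p - cut) % length for p in sites)


def streptavidin_per_plasmid(n_sites: int, excess: float = 5.0) -> float:
    """Monovalent streptavidin molecules per plasmid at a given excess over sites."""
    return excess * n_sites


toy = "CGAAAATCGAGGGGT"  # 15 bp toy circle; one site crosses the origin
print(find_motif_sites(toy))                     # [6, 14]
print(linearize(find_motif_sites(toy), 10, 15))  # [4, 11]
print(streptavidin_per_plasmid(7))               # 35.0
```

For a non-palindromic motif the reverse-complement strand would also have to be scanned, so `find_motif_sites` as written is only valid for palindromes such as 5′-TCGA-3′.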
The sample was added to the reservoir outside a conical glass nanopore (~14 nm diameter; see the Supporting Information for nanopore fabrication and characterization), and the ionic current was detected upon applying a positive voltage (Figure 1e). Previous studies have shown that conical nanopores of this diameter enable full translocations of DNA and of DNA bound to proteins.13,29 A 4 M LiCl solution was used as the electrolyte because it is known to slow DNA translocation relative to other commonly used salts such as NaCl and KCl.13,30,31 A voltage of 600 mV was chosen to strike a balance between the requirements for a high capture rate and a slow translocation velocity.32 When entering the nanopore, DNA molecules cause a lowering of the current, and passage of streptavidin bound to the DNA creates additional current drops. Figure 1f shows an example of a recorded current trace and events caused by translocations of streptavidin-labeled pBR322 DNA. Because of the wide opening of the nanopores used here (~14 nm diameter), the DNA can translocate not only in a linear form but also in folded conformations. These folded conformations have been extensively characterized in the literature, and it is well-known that the predominant folded conformation has a single hairpin at the beginning.33,34 We wrote a selection algorithm to exclude events with a fold at the beginning, and only the subset of unfolded events (~30% of the total) was retained for further analysis (see Figures S2 and S3).
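The selection algorithm itself is described in Figures S2 and S3; the sketch below only illustrates one plausible criterion, under the assumption that a hairpin at the start of an event appears as a sustained, roughly double-depth blockade at the head of the event. The function names, the 1.5× level ratio, the 20% head window, and the 20-sample minimum run are all hypothetical parameters chosen for illustration, and the traces are synthetic.

```python
import numpy as np


def longest_run(mask) -> int:
    """Length of the longest run of consecutive True values."""
    longest = run = 0
    for flag in mask:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    return longest


def starts_with_fold(drop_pa, level_ratio=1.5, head_fraction=0.2, min_run=20) -> bool:
    """Hypothetical test for a hairpin fold at the start of one event.

    drop_pa holds the current drop (pA, larger = more blocked). The
    single-file DNA level is taken as the median drop; a folded start is
    flagged when the head of the event contains a sustained run of samples
    above level_ratio times that level. Short label spikes are rejected by
    the min_run requirement.
    """
    drop = np.asarray(drop_pa, dtype=float)
    single_level = np.median(drop)
    head = drop[: max(1, int(len(drop) * head_fraction))]
    return longest_run(head > level_ratio * single_level) >= min_run


unfolded = np.full(1000, 100.0)
unfolded[50:55] += 60.0  # a short streptavidin-like spike, not a fold
folded = np.concatenate([np.full(100, 200.0), np.full(900, 100.0)])
print(starts_with_fold(unfolded), starts_with_fold(folded))  # False True
```

Requiring a minimum run length is what separates a brief label spike from a hairpin-length double level in this sketch; real data would need these thresholds tuned against events inspected by eye.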
The DNA can pass through the pore in two orientations, denoted direction 1 (D1) and direction 2 (D2), as shown in Figure 1f. For these two directions, we measured the distribution of signal sizes for the peak caused by the seventh 5′-TCGA-3′ site along the plasmid (Figure 1d). The distributions were bimodal, showing that either one or two monovalent streptavidin molecules are bound at each 5′-TCGA-3′ site. Control experiments with an engineered DNA double strand containing a single biotin site showed only a unimodal distribution, indicating that the effect was not due to dimerization of streptavidin in 4 M LiCl (see Figure S5 for details).13 Therefore, the observation of two monovalent streptavidin molecules at some sites likely results from the labeling of both adenine residues within the palindromic 5′-TCGA-3′ sequence. The signal for a single streptavidin is ~60 pA irrespective of the DNA direction (Figure 1f), which is high compared to the typical noise of our nanopores of 6–10 pA RMS in a 50 kHz bandwidth (a signal-to-noise ratio of roughly 6–10).17
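A bimodal amplitude distribution of this kind can be separated with a two-component Gaussian mixture. The sketch below fits one by expectation-maximisation in NumPy; it is an illustration, not the analysis used for Figure 1f. The function names are hypothetical, and the synthetic amplitudes assume ~60 pA for one streptavidin and, as an assumption, about twice that for two.

```python
import numpy as np


def _component_pdf(x, mu, sigma):
    """Gaussian densities, shape (len(x), n_components)."""
    z = (x[:, None] - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))


def fit_two_gaussians(amplitudes_pa, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximisation.

    Returns (weights, means, sigmas), sorted by mean.
    """
    x = np.asarray(amplitudes_pa, dtype=float)
    mu = np.percentile(x, [25, 75])  # initial means at the quartiles
    sigma = np.full(2, x.std())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each amplitude
        r = w * _component_pdf(x, mu, sigma)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and widths
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        sigma = np.maximum(sigma, 1e-6)  # guard against a collapsed component
    order = np.argsort(mu)
    return w[order], mu[order], sigma[order]


def n_bound(amplitudes_pa, w, mu, sigma):
    """Assign 1 or 2 bound streptavidin per peak by the more probable component."""
    x = np.asarray(amplitudes_pa, dtype=float)
    return 1 + np.argmax(w * _component_pdf(x, mu, sigma), axis=1)


rng = np.random.default_rng(0)
synthetic = np.concatenate([rng.normal(60, 8, 600), rng.normal(120, 10, 400)])
w, mu, sigma = fit_two_gaussians(synthetic)
print(np.round(mu))                          # close to [60, 120]
print(n_bound([58.0, 125.0], w, mu, sigma))  # [1 2]
```

A fixed amplitude threshold between the modes would give similar assignments for well-separated peaks; the mixture fit additionally estimates the fraction of doubly labeled sites.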
After separating the data set into the two directions, we then used an algorithm to determine the positions of the intraevent peaks caused by the additional current blockade when each streptavidin label passed through the pore (Figure S4). This algorithm first corrects for the baseline DNA translocation and then uses a peak finding routine to determine the time points of each peak in the ionic current as a streptavidin passes through. From visual inspection, streptavidin labels that were separated by >300 bp in the pBR322 DNA were easily resolved whereas the two 5′-TCGA-3′ sites separated by 141 bp mostly showed up as a single peak.
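The two-stage intraevent analysis (baseline correction, then peak finding) can be sketched with SciPy's `scipy.ndimage.median_filter` and `scipy.signal.find_peaks`. This is an assumed implementation, not the algorithm of Figure S4: the running-median baseline, the 101-sample window, the 30 pA height threshold, the 20 µs minimum separation, and the 1 MHz sampling rate of the synthetic trace are all illustrative choices.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import find_peaks


def label_peak_times(drop_pa, fs_hz, baseline_window=101,
                     min_height_pa=30.0, min_sep_s=20e-6):
    """Times (s, from event start) of label peaks within one event.

    1. Baseline: a running median over a window much longer than one label
       peak follows the DNA blockade level but ignores the short spikes.
    2. Peak finding: local maxima of the baseline-corrected drop that exceed
       min_height_pa and are at least min_sep_s apart.
    """
    drop = np.asarray(drop_pa, dtype=float)
    baseline = median_filter(drop, size=baseline_window, mode="nearest")
    residual = drop - baseline
    min_sep = max(1, round(min_sep_s * fs_hz))
    peaks, _ = find_peaks(residual, height=min_height_pa, distance=min_sep)
    return peaks / fs_hz


fs = 1_000_000  # hypothetical sampling rate for this synthetic trace
n = np.arange(2000)
trace = 100.0 + 0.01 * n  # slowly drifting DNA blockade level (pA)
for centre in (300, 800, 1500):  # three synthetic label spikes
    trace += 60.0 * np.exp(-0.5 * ((n - centre) / 5.0) ** 2)
trace += np.random.default_rng(1).normal(0.0, 2.0, n.size)
print(label_peak_times(trace, fs) * 1e6)  # roughly [300, 800, 1500] microseconds
```

A median rather than a mean is used for the baseline because a few elevated label samples barely shift the median of a long window, so the spikes survive the subtraction.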
In Figure 2b, we show a histogram of the time points of all peaks measured with respect to the first peak in the translocation (Figure 2a). The data are an aggregate of 812 translocations for direction 1 only. Five peaks are distinguishable in the histogram because the two 5′-TCGA-3′ sites separated by 141 bp were not resolved. The width of each peak in this histogram increases as a function of time due to fluctuations as the DNA passes through the nanopore. Figure 2c shows a histogram of the number of peaks detected per translocation. Approximately 50% of the translocations were found to have the expected six peaks. Higher or lower numbers of detected peaks can be due to a variety of factors, such as errors in the baseline tracking, complex folds or knots in the DNA creating additional signals, or cases where the two peaks separated by 141 bp were in fact resolved. We note, however, that unlabeled sites are not thought to contribute, because the gel characterization in Figure 1c shows that all 5′-TCGA-3′ sites are labeled with at least one biotin.
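Building histograms like those of Figure 2b,c from per-event peak times amounts to re-referencing each event to its first peak and counting peaks per event. The helper names and the three toy events below are hypothetical; real inputs would be the peak times returned by an intraevent peak finder.

```python
from collections import Counter

import numpy as np


def relative_peak_times(events):
    """Pool peak times measured from the first peak of each event."""
    parts = [np.asarray(t, dtype=float) - t[0] for t in events if len(t) > 0]
    return np.concatenate(parts) if parts else np.empty(0)


def peak_count_summary(events, expected):
    """Histogram of peaks per event and the fraction with the expected count."""
    counts = Counter(len(t) for t in events)
    return counts, counts[expected] / len(events)


# toy peak times (ms) for three events; not measured data
events = [[0.10, 0.30, 0.50], [0.20, 0.40], [0.00, 0.20, 0.40]]
rel = relative_peak_times(events)
hist, edges = np.histogram(rel, bins=5, range=(0.0, 0.5))
counts, frac = peak_count_summary(events, expected=3)
print(counts, round(frac, 2))  # Counter({3: 2, 2: 1}) 0.67
```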
Analyzing only those translocations that have six peaks, Figure 2f shows how the mean translocation time between adjacent peaks (calculated from Gaussian fits to the histograms, Figure 2e) varies as a function of the label distance (computed from the known sequence). The data indicate that, to a good approximation, the intraevent velocity remains constant during a translocation. We recently analyzed the mean intraevent velocity using custom-designed DNA rulers and found a small velocity decrease of ~5% during the translocation of 7.2 kbp DNA through a nanopore.32 The smaller statistical sample and shorter DNA used here likely obscure this slight average velocity reduction, which is not visible in Figure 2f, and for our further analyses in this paper we assume a constant velocity. The velocity did not change significantly over several hours of measurement (Figure S6).
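Under the constant-velocity assumption, the mean interval between adjacent peaks is proportional to their separation in base pairs, t = d/v, so v can be fitted as a line through the origin. The sketch below is an illustration with made-up inputs: the function names are hypothetical, and it uses sample means and standard deviations of the intervals in place of the Gaussian fits described in the text. As a rough cross-check, the unlabeled pBR322 dwell time reported in this paper (4361 bp in ~1.1 ms) corresponds to ~4 kbp/ms, although a whole-event dwell time is only an approximate proxy for the intraevent velocity.

```python
import numpy as np


def interval_stats(six_peak_events):
    """Mean and standard deviation of each adjacent-peak interval.

    six_peak_events: array-like of shape (n_events, n_peaks) with peak
    times (ms). Sample moments stand in for the Gaussian fits in the text.
    """
    intervals = np.diff(np.asarray(six_peak_events, dtype=float), axis=1)
    return intervals.mean(axis=0), intervals.std(axis=0, ddof=1)


def fit_velocity(distances_bp, mean_times_ms):
    """Constant-velocity model t = d / v, least squares through the origin."""
    d = np.asarray(distances_bp, dtype=float)
    t = np.asarray(mean_times_ms, dtype=float)
    slope_ms_per_bp = (d @ t) / (d @ d)
    return 1.0 / slope_ms_per_bp  # bp per ms


v = fit_velocity([500, 1000, 2000], [0.125, 0.25, 0.5])  # hypothetical inputs
print(v)           # approximately 4000 bp/ms
print(4361 / 1.1)  # whole-event cross-check, roughly 3965 bp/ms
```

Forcing the fit through the origin encodes the physical expectation that two labels at zero separation produce zero delay; a free intercept would instead absorb any systematic offset in peak timing.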
Figure 2g shows the standard deviation of the time interval between peaks as a function of the mean time (both obtained from the Gaussian fits to the histograms in Figure 2e). The standard deviation reflects the uncertainty of position due to natural fluctuations as the DNA threads through the nanopore. These fluctuations set a fundamental physical limit on the accuracy with which the positions of the labeled sites can be determined (Figure 2b). The physical mechanism behind these fluctuations remains to be fully understood but may reflect different conformations of the DNA when it arrives at the pore mouth.32,35,36 Bell et al.32 quantified a standard deviation of ~500 bp for a 5 kbp distance. This is of the same order of magnitude as the estimated error of ~1.5 kbp for a 10 kbp distance in optical mapping due to DNA fluctuations in nanochannels.37 This fluctuation-induced uncertainty, together with the minimum distance required to resolve two peaks, quantifies the genome mapping capability of our system. DNA translocation time and detection performance were also characterized with different pores (see Figure S7); the percentage of unfolded events that had six peaks ranged from 30 to 50%.
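The comparison with optical mapping is easiest to see as relative errors, and a timing spread converts to a positional spread through the fitted velocity. The snippet below only does that arithmetic: the 500 bp over 5 kbp and 1.5 kbp over 10 kbp figures come from the text, while the 0.05 ms spread and the 4 kbp/ms velocity are hypothetical inputs, and the function names are illustrative.

```python
def position_sd_bp(sigma_t_ms: float, velocity_bp_per_ms: float) -> float:
    """Positional uncertainty implied by a timing spread at constant velocity."""
    return sigma_t_ms * velocity_bp_per_ms


def relative_error(sd_bp: float, distance_bp: float) -> float:
    """Standard deviation as a fraction of the measured distance."""
    return sd_bp / distance_bp


nanopore = relative_error(500, 5_000)    # Bell et al.: ~500 bp over 5 kbp
optical = relative_error(1_500, 10_000)  # nanochannel optical mapping: ~1.5 kbp over 10 kbp
print(nanopore, optical)                 # 0.1 0.15
print(position_sd_bp(0.05, 4000))        # hypothetical: 0.05 ms spread at 4 kbp/ms
```

Both approaches thus sit at roughly 10–15% relative uncertainty over multi-kilobase distances, consistent with the "same order of magnitude" statement.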
We also performed experiments to determine whether the presence of the protein labels significantly modifies the velocity of the DNA. We mixed unlabeled and labeled pBR322 and measured their translocations simultaneously with a single nanopore (Figure 2h). The two populations were separated by determining the number of intraevent peaks for unfolded DNA translocations. Figure 2g shows histograms comparing the translocation times for the labeled and unlabeled DNA (see Figures S8 and S9 for data analysis). The presence of the protein labels might be expected to change the DNA translocation time due to changes in the total charge, increased viscous drag, and possible surface interactions between the streptavidin and the nanopore. However, we observe only an increase from 1.1 ± 0.1 ms (μ ± σ from the Gaussian fit) for the unlabeled DNA to 1.2 ± 0.1 ms (μ ± σ) for the labeled DNA, indicating that the labels do not have a significant impact on the DNA velocity under our experimental conditions (Figure 2g).
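The labeled/unlabeled comparison reduces to splitting dwell times by intraevent peak count and comparing the two means. In the sketch below, the function names, the threshold (at least one peak counts as labeled), and the four toy dwell times are assumptions; the last two lines apply the reported values, showing that the 0.1 ms shift is about one standard deviation, or a relative change of roughly 9%.

```python
import numpy as np


def split_by_labels(dwell_ms, n_peaks, min_label_peaks=1):
    """Split dwell times into (unlabeled, labeled) by intraevent peak count."""
    dwell = np.asarray(dwell_ms, dtype=float)
    n = np.asarray(n_peaks)
    return dwell[n < min_label_peaks], dwell[n >= min_label_peaks]


def shift_in_sd(mu_a, mu_b, sigma):
    """Difference between two means in units of a common spread."""
    return (mu_b - mu_a) / sigma


unlabeled, labeled = split_by_labels([1.0, 1.2, 1.1, 1.3], [0, 6, 0, 6])  # toy data
print(unlabeled.mean(), labeled.mean())
print(shift_in_sd(1.1, 1.2, 0.1))  # reported values: about one sigma
print((1.2 - 1.1) / 1.1)           # relative slowdown, roughly 0.09
```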
To show that our method can be generally applied to map different genomes, we measured the signals after labeling the DNA of two bacteriophages, ΦX174 DNA (5353 bp after linearization with R.BaeI) and M13mp18 DNA (7249 bp after linearization with R.MscI). These two genomes have 10 and 12 5′-TCGA-3′ sequence motifs, respectively. Figure 3 shows schematics of the labeling of these two genomes and example translocation signals. A detailed analysis of the peak positions is given in the Supporting Information (Figures S10–S15). For the ΦX174 DNA, sites 3–7 are densely distributed over an overall distance of 194 bp and cause a wide, deep peak. However, we are able to identify separate peaks for sites separated by 231 bp (sites 8 and 9, Figure 3a,b) and 239 bp (sites 10 and 11, Figure 3c,d). In contrast, the signals for sites 1 and 2 in Figure 3a,b, which are separated by a smaller distance of 141 bp, are not fully resolved. Combined with the results for pBR322 DNA in Figure 2, this indicates a minimum resolvable distance between sites of ~200 bp. This value can be approximately understood from the geometry of our nanopores as characterized by scanning electron microscopy.17 Numerical simulations with COMSOL Multiphysics 4.4 (Figure S16) show that, because of the conical shape, the electric field strength 68 nm (~200 bp) into the pore drops to 37% of its highest value at the pore entrance, so that once a label has moved this far past the entrance, the baseline has recovered sufficiently for the next peak to be detected. The ~200 bp resolution is comparable to recent results with super-resolution microscopy.37
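The ~200 bp figure follows from converting the 68 nm sensing depth with the B-DNA rise of 0.34 nm per base pair, and 37% is close to 1/e, which suggests reading 68 nm as a decay length of the field (an interpretation offered here, not a result of the simulations). The sketch also shows how a minimum resolvable distance translates into an expected number of peaks: neighbouring sites closer than 200 bp are merged into one peak. The function names are hypothetical, and the site positions are made up, chosen only to reproduce the seven-site, one-141-bp-pair situation of pBR322.

```python
import math

RISE_NM_PER_BP = 0.34  # axial rise per base pair of B-form dsDNA


def nm_to_bp(length_nm: float) -> float:
    """Convert a contour length in nm to base pairs of B-form DNA."""
    return length_nm / RISE_NM_PER_BP


def group_sites(positions_bp, min_sep_bp=200):
    """Merge sorted label sites into peaks when neighbours are < min_sep_bp apart."""
    pos = sorted(positions_bp)
    if not pos:
        return []
    groups = [[pos[0]]]
    for p in pos[1:]:
        if p - groups[-1][-1] < min_sep_bp:
            groups[-1].append(p)  # too close to resolve: same peak
        else:
            groups.append([p])    # far enough: a new peak
    return groups


print(round(nm_to_bp(68)))     # 200
print(round(math.exp(-1), 2))  # 0.37
# hypothetical positions: 7 sites, one pair 141 bp apart -> 6 expected peaks
sites = [300, 441, 1200, 1900, 2600, 3300, 4000]
print(len(group_sites(sites)))  # 6
```

Because the merging is applied between neighbours, a dense cluster such as ΦX174 sites 3–7 (194 bp in total) collapses into a single peak regardless of how the sites are spaced within it.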
Finally, we investigated the potential for simultaneously mapping two DNA species present at the same time in the reservoir. We selected labeled pBR322 and labeled M13mp18 as the two genomes to be mapped. We also added unlabeled pBR322 to the sample mixture (Figure 4a) as an independent calibration for determining the velocity of bare DNA through the pore. We added this calibration because it is well established that the DNA velocity can vary slightly from one solid-state nanopore to the next due to differences in geometry; a calibration sample therefore allows us to accurately infer the time–distance relationship for a particular nanopore, assuming a constant velocity.13 Figure 4b shows a histogram of the event charge deficit (ECD), equal to the total charge excluded during a single translocation, for 9244 events in total (including folded conformations). It has previously been shown that the ECD gives a measure of DNA length.29 In Figure 4b, the left group of events is caused by the unlabeled and labeled pBR322 DNA and the right group by the labeled M13mp18 DNA. If all labels separated by less than 200 bp created a single peak, unlabeled pBR322, labeled pBR322, and labeled M13mp18 DNA would be expected to create 0, 6, and 11 peaks, respectively. We selected events with 0 peaks in the left group as a calibration, and events with 6 peaks in the left group and 11 peaks in the right group (Figure 4c) for analysis. Example events are shown in Figure 4d, and further details of the data analysis are given in the Supporting Information (Figures S17–S19). We estimated the distances between adjacent sites for the two labeled DNA samples based on the velocity computed for unlabeled pBR322 DNA, as shown in Figure 4e,f and Tables S1 and S2 (using the constant velocity assumption). The values are in good agreement with the expected ones, with the largest error of the estimated mean value around 10% (Tables S1 and S2).
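The quantities used in the two-genome analysis can be sketched as follows: the ECD integrates the current drop over an event, the calibration velocity is the known length of unlabeled pBR322 divided by its mean dwell time, and inter-site distances follow from peak-time differences at that velocity. The function names, the 1 MHz sampling rate, the rectangle-rule integration, and the peak times and expected gaps are hypothetical; only the 4361 bp length and the ~1.1 ms dwell time come from the text. Choosing the ECD threshold between the two event groups would be done from the histogram and is not shown.

```python
import numpy as np


def ecd_fc(drop_pa, fs_hz):
    """Event charge deficit in fC: summed current drop times sample interval.

    1 pA * 1 s = 1e-12 C = 1000 fC.
    """
    return float(np.sum(drop_pa)) / fs_hz * 1e3


def calibrated_velocity(length_bp, dwell_ms):
    """Mean velocity (bp/ms) of an unlabeled calibrant of known length."""
    return length_bp / float(np.mean(dwell_ms))


def estimate_distances(peak_times_ms, velocity_bp_per_ms):
    """Adjacent-site distances from peak times, assuming constant velocity."""
    return np.diff(np.asarray(peak_times_ms, dtype=float)) * velocity_bp_per_ms


def relative_errors(estimated_bp, expected_bp):
    """|estimate - expected| / expected for each inter-site distance."""
    est = np.asarray(estimated_bp, dtype=float)
    ref = np.asarray(expected_bp, dtype=float)
    return np.abs(est - ref) / ref


print(ecd_fc(np.full(1000, 100.0), 1_000_000))  # 100 pA for 1 ms -> about 100 fC
v = calibrated_velocity(4361, [1.1])             # unlabeled pBR322, ~1.1 ms dwell
d = estimate_distances([0.0, 0.5, 0.8], v)       # hypothetical peak times (ms)
print(relative_errors(d, [2000, 1200]))          # hypothetical expected gaps (bp)
```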
In summary, we have demonstrated a solid-state nanopore platform for mapping genomic DNA by detecting streptavidin bound to labeled short sequence motifs. Our method shows that labels as close as ~200 bp apart can be resolved. We extensively characterized the relationship between translocation time and distance between labels and showed the ability to simultaneously map two genomes. We envisage several potential avenues for further improvement of the current implementation. Fast voltage switching protocols could enable multiple reads of each individual molecule to reduce the error in localizing each peak.38 This could also help with reading longer DNA molecules, which currently give fewer unfolded translocations. The event throughput could also be increased by membrane tethering39 or dielectrophoretic concentration.40 We might also use DNA MTases that recognize different sequence motifs, in combination with individually addressable nanopores, to measure samples at different label densities. The ability to resolve closely spaced labels could also be improved by using nanopores in thin 2D membranes, because these pores have electric field sensing zones as small as several nanometers.
Overall our solid-state nanopore-based method for determining the positions of short sequence motifs by recording electrical signals can be used for DNA characterization with relatively few labeling and purification steps. In contrast to previous experiments with targeted Cas9 or PNA binding,11,12,24 the short sequence identification enables a generic platform for mapping sequence sites in different DNA molecules with a single workflow. The barcodes generated from labeling short sequence motifs could be correlated with genome databases to enable identification of the different samples present in solution.2 This technique could also straightforwardly be combined with recent developments in specific protein sensing with solid-state nanopores13,17 to provide both DNA and protein detection in a single measurement. The amenability of solid-state nanopores to integration in small chips also suggests potential for portable diagnostic devices.
We thank the lab of Mark Howarth at Oxford University for kindly providing monovalent streptavidin and K. Glensk for preparing M.TaqI.
(N.A.W.B.) Department of Chemistry, University of Oxford, Oxford, OX1 3TA, United Kingdom.
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.
K.C. acknowledges funding from the China Scholarship Council (201506210147). N.A.W.B. and U.F.K. acknowledge funding from an ERC consolidator grant (Designerpores 647144). E.W. acknowledges financial support from the German-Israeli Foundation for Scientific Research and Development (I-1196-195.9/2012).
The authors declare no competing financial interest.