Many types of cancer and neurodegenerative diseases are caused by abnormalities and variations in the genome. We have designed a high-resolution, high-throughput, low-cost imaging technique for determining structural variations of genes related to genetic diseases. We initially mapped all seven nicking sites of the Nb.BbvCI endonuclease on lambda DNA. We then resolved densely labeled patterns of 107 nicking sites on human BAC DNA nicked by the Nb.BsmI and Nb.BbvCI endonucleases. At this density, several dyes lay closer together than the diffraction limit. Overall, detailed mapping of DNA nicking sites with 100bp resolution was achieved, which has the potential to reveal information about genetic variance and to facilitate medical diagnosis of several genetic diseases.
It has become increasingly apparent that structural variation plays an important role in human health and common diseases.1,2 In general, these variations are defined as being longer than 500bp.3,4 Despite their importance, most genome-wide approaches for detecting copy number variations (CNVs) are indirect, depending on signal intensity differences between samples and controls to predict regions of variation in DNA. They therefore provide limited quantitative signal and positional information and cannot detect balanced events such as inversions and translocations. Nonuniform sensitivity, specificity, and probe density of these platforms often lead to conflicting results even with identical samples.5,6 This qualitative measurement requires further validation by low throughput detection methods, such as PCR and FISH.
Physical mapping of long single DNA molecules, either by using gaps generated by digestion with restriction endonucleases,7 or by using fluorescent tags bound to specific sequence-motif sites as landmarks,8,9 has provided new ways for comparatively rapid and direct whole-genome characterization and visualization for structural variation studies. However, because the mapping is optical, these methods cannot resolve motifs that are closer than about 1.5kbp.9 This generally requires selecting restriction sites that occur on average no more than once per 10kb, so that significant portions of the genome are not left out as stretches of unresolvable sites. The method described elsewhere by Ming Xiao and colleagues and being commercialized by BioNano Genomics overcomes some of these limitations by constraining DNA in nanochannels, allowing for more uniform and consistent measurement of single- or multicolor labeling and targeted hybridization-based labeling.11,33 These demonstrations of genome mapping using nanochannels confirmed the method’s sub-2kb resolution and ability to detect sub-100kb CNVs. Improving resolution further, for example, down to 100bp, would have advantages for detecting even smaller structural variation and increasing the information density of the output.
In this study, we describe a DNA mapping method based on localization of multiple sequence-motifs with 100bp resolution. We achieve this by employing two super-resolution techniques, both of which have been shown to have 10 nm resolution on DNA (and other samples).10,11 Specifically, they are two-color SHRImP (Single molecule-High Resolution Imaging with Photobleaching), and two-color SHREC (Single-molecule High-Resolution Co-localization), both corrected for chromatic aberration. SHRImP resolves adjacent fluorophores of the same color by using the quantal photobleaching behavior of single fluorescent dye molecules. SHREC uses two chromatically different fluorophores and images them with high-resolution in separate channels by using a dual-view system. Both SHRImP and SHREC can be extended to three or more dyes. Using more colors will increase the number of resolved distances between individual molecules.
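The photobleaching idea behind SHRImP can be sketched in a few lines. This is a minimal, noise-free illustration rather than the analysis used in this work: it localizes spots with an intensity-weighted centroid instead of a fitted point-spread function, and the pixel size, spot width, and dye positions are all hypothetical values chosen for the example.

```python
import numpy as np

def gaussian_spot(shape, center_xy, sigma_px):
    """Render a 2D Gaussian spot as a stand-in for one dye's point-spread function."""
    y, x = np.indices(shape)
    cx, cy = center_xy
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma_px ** 2))

def centroid_xy(image):
    """Intensity-weighted centroid (x, y) in pixels."""
    y, x = np.indices(image.shape)
    total = image.sum()
    return np.array([(x * image).sum() / total, (y * image).sum() / total])

def shrimp_distance_nm(pre_bleach, post_bleach, nm_per_pixel):
    """Distance between two same-color dyes from frames before and after one bleaches."""
    surviving = centroid_xy(post_bleach)               # dye that is still emitting
    bleached = centroid_xy(pre_bleach - post_bleach)   # difference image isolates the bleached dye
    return float(np.linalg.norm(surviving - bleached) * nm_per_pixel)

# Hypothetical imaging parameters (assumptions, not values from the text).
NM_PER_PIXEL = 100.0
SIGMA_PX = 1.3                     # spot width of about 130 nm
shape = (31, 31)

dye_a = gaussian_spot(shape, (15.0, 15.0), SIGMA_PX)
dye_b = gaussian_spot(shape, (15.6, 15.0), SIGMA_PX)   # 0.6 px = 60 nm away

pre_bleach = dye_a + dye_b         # both dyes emit: one unresolved blob
post_bleach = dye_b                # dye A has photobleached

distance_nm = shrimp_distance_nm(pre_bleach, post_bleach, NM_PER_PIXEL)
print(f"recovered separation: {distance_nm:.1f} nm")
```

With real data, shot noise and background set the localization precision, so a fitted Gaussian and many frames per bleaching step would normally replace the bare centroid used here.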
We successfully generated two-color sequence-motif maps of 180kb BAC clones (GCTGAGG and GCATTC) at 100bp resolution.
A 741bp dsDNA template was constructed by PCR with one Cy3 labeled primer (Figure 1a) at the 5′ end. Two additional Cy3 fluorophores were introduced by nick-labeling,10,11 one 94bp from the 5′ end and the other a further 172bp downstream (266bp from the 5′ end). After stretching and linearizing the DNA on a glass surface, we applied SHRImP and measured the distances between dyes. The measured distances were 27, 61, and 95 nm, in good agreement with the expected distances between Cy3 dyes of 32 nm (94bp), 58 nm (172bp), and 90 nm (266bp) (Figure 1b). The results also demonstrate that three fluorophores of the same color can be imaged simultaneously with SHRImP.
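The expected distances quoted above are consistent with a rise of 0.34 nm per base pair, the standard value for fully extended B-form DNA. The short sketch below reproduces that arithmetic and compares it with the measured values; the conversion factor is an assumption inferred from the quoted bp/nm pairs, and the function name is illustrative.

```python
NM_PER_BP = 0.34  # assumed rise per base pair (B-form DNA, fully stretched)

def bp_to_nm(separation_bp):
    """Convert a separation in base pairs to nanometers."""
    return separation_bp * NM_PER_BP

# Separation in bp -> measured distance in nm, as reported in the text.
measured_nm = {94: 27, 172: 61, 266: 95}

for separation_bp, measured in measured_nm.items():
    expected = bp_to_nm(separation_bp)
    print(f"{separation_bp:>3} bp: expected {expected:5.1f} nm, "
          f"measured {measured} nm, difference {measured - expected:+.1f} nm")
```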
To test the feasibility of simultaneously using SHRImP and SHREC, a similar 741bp dsDNA model system was constructed with a Cy5 labeled PCR primer at the 5′ end and two Cy3 molecules positioned 32 nm (94bp) and 90 nm (266bp) from the Cy5, that is, 58 nm (172bp) apart (Figure 2a). The positions of the dyes were localized using a dual-view imaging system as described in the methods. The distances for the two Cy3-Cy5 pairs were determined to be 34 ± 1 nm (32 nm expected) and 88 ± 1 nm (90 nm expected). The distance for the Cy3-Cy3 pair was 56 ± 2 nm (58 nm expected) (Figure 2b). This demonstrated that the combination of SHRImP and SHREC can resolve distances below the diffraction limit between multiple fluorophores of different colors.
Lambda DNA (48.5kb), which has seven Nb.BbvCI nicking sites along each molecule, was nick-labeled using Nb.BbvCI and Tamra-ddUTP. In a previous study by Xiao et al., four sites were resolved.9 The two nicking sites clustered at location B (Figure 3a) and the three nicking sites clustered at location C (Figure 3a) could not be resolved as multiple sites because they lie within the diffraction limit of one another. The seven nicking sites are thus reduced to four resolvable locations, and the distances between them were measured as 1.47, 3.27, and 4.27 μm, respectively (Figure 3c), in good agreement with the expected distances.
Using SHRImP, the clustered sites at B and C can be clearly resolved. Figure 3d shows the distance between the two nicking sites at location B to be 104 nm, which agrees well with the predicted distance of 108 nm (318bp). The distances between the three clustered nicking sites at location C were resolved as 101, 202, and 313 nm, in close agreement with the expected distances of 102, 208, and 310 nm (Figure 3c).
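The collapse of seven sites into four locations at conventional resolution, and their separation at 100bp resolution, can be illustrated by grouping sites whose neighbors lie closer than a resolution limit. This is a simplified sketch: the absolute coordinates below are hypothetical, and only the within-cluster spacings (318bp at B; roughly 300bp and 612bp at C, from the expected 102 and 208 nm at an assumed 0.34 nm/bp) follow the text. The function name and the 1.5kbp conventional limit used in the example are illustrative choices.

```python
def group_unresolved(sites_bp, resolution_bp):
    """Merge sorted sites into one group whenever the gap to the previous site is below the limit."""
    groups = []
    for site in sorted(sites_bp):
        if groups and site - groups[-1][-1] < resolution_bp:
            groups[-1].append(site)   # too close to the previous site: same apparent spot
        else:
            groups.append([site])     # far enough away: a new resolvable location
    return groups

# Hypothetical coordinates: one isolated site, a pair (B), a triplet (C), one isolated site.
lambda_sites_bp = [8000, 12300, 12618, 21900, 22200, 22812, 34500]

conventional = group_unresolved(lambda_sites_bp, resolution_bp=1500)
super_resolved = group_unresolved(lambda_sites_bp, resolution_bp=100)

print("conventional locations:", len(conventional), [len(g) for g in conventional])
print("super-resolved locations:", len(super_resolved))
```

Chaining neighbors in this way is only a first approximation; in practice the number of dyes that can be separated within one diffraction-limited spot also limits what is resolved.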
Nicking sequence-motifs for Nb.BsmI and Nb.BbvCI occur on average every 2kb across the human genome. At the resolution of conventional fluorescence microscopy, many sites within 2kb of each other would not be resolved even with two-color labeling of both motifs. Here we applied DNA mapping with two-color super-resolution techniques and constructed an Nb.BsmI and Nb.BbvCI sequence-motif map of a 180kb BAC clone containing human sequence. Nb.BsmI has 71 recognition sequences (GCATTC) and Nb.BbvCI has 36 recognition sequences (GCTGAGG) across the 180kb BAC clone.
Figure 4 shows nick-labeling of Nb.BsmI sites with the green dye Tamra and Nb.BbvCI sites with the red dye Cy5. The DNA backbone is stained with YOYO-1. Three-color images were generated by sequential excitation of Tamra (at 532 nm), Cy5 (at 642 nm), and YOYO-1 (at 488 nm). A few typical overlapping DNA fragments are shown in Figure 4a. The distances between neighboring spots of the same color are then calculated separately by SHRImP analysis (Figure 4b–d). To correlate the red and green channels with minimal chromatic aberration, Tamra and Cy5 labeled sites were analyzed together by SHREC analysis. For this, the Tamra and Cy5 channels were merged after correcting for chromatic aberration, using nanoholes 100 nm in diameter and 1.5 μm apart as fiducial markers (Figure 4e). A color spatial-correlation function was created for each image frame based on the nanohole fiducials; this color correlation function has 5 nm resolution. Using the color correlation function, all Cy5 spots were mapped to the Tamra channel, creating a true two-color super-resolution image with minimal chromatic aberration. Each DNA fragment was then mapped to the BAC clone reference sequence.
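One common way to register two color channels from fiducial positions is a least-squares affine fit, sketched below. The text does not specify the functional form of the color correlation function, so the affine model here is an assumption, as are the function names, the simulated distortion, and the example spot. Only the 1.5 μm nanohole spacing comes from the text.

```python
import numpy as np

def fit_affine(src_xy, dst_xy):
    """Least-squares affine map taking src (N, 2) points onto dst (N, 2) points."""
    src_xy = np.asarray(src_xy, dtype=float)
    dst_xy = np.asarray(dst_xy, dtype=float)
    design = np.hstack([src_xy, np.ones((len(src_xy), 1))])    # rows of [x, y, 1]
    coeffs, *_ = np.linalg.lstsq(design, dst_xy, rcond=None)   # (3, 2) coefficient matrix
    return coeffs

def apply_affine(coeffs, points_xy):
    """Apply a fitted affine map to (N, 2) points."""
    points_xy = np.asarray(points_xy, dtype=float)
    return np.hstack([points_xy, np.ones((len(points_xy), 1))]) @ coeffs

# Nanohole grid as seen in the Tamra channel, in nm (1.5 um pitch as in the text).
gx, gy = np.meshgrid(np.arange(4) * 1500.0, np.arange(4) * 1500.0)
tamra_holes = np.column_stack([gx.ravel(), gy.ravel()])

# Hypothetical chromatic distortion of the Cy5 channel: slight scale, rotation, shift.
distortion = np.array([[1.002, 0.001], [-0.001, 1.002]])
shift_nm = np.array([35.0, -20.0])
cy5_holes = tamra_holes @ distortion.T + shift_nm

# Fit the Cy5 -> Tamra map on the fiducials, then apply it to Cy5 localizations.
cy5_to_tamra = fit_affine(cy5_holes, tamra_holes)
mapped = apply_affine(cy5_to_tamra, cy5_holes)
residual_nm = np.abs(mapped - tamra_holes).max()
print(f"largest fiducial residual after correction: {residual_nm:.2e} nm")

cy5_spot = np.array([[2000.0, 3100.0]])            # an example Cy5 localization
print("Cy5 spot in Tamra coordinates:", apply_affine(cy5_to_tamra, cy5_spot))
```

Real optics often show distortions that vary across the field, in which case a higher-order polynomial or locally weighted mapping would replace the single affine transform.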
The predicted sequence-motif map was generated from the reference sequence, as shown in the top graph of Figure 5a. The experimentally derived histogram of the sequence-motif map is shown in the bottom graph of Figure 5a. The histogram was created with a bin size of 200 nm (the diffraction limit) from over 1000 DNA fragments, and its range covers the whole BAC length. Overall, one-third of the DNA fragments were used in the final map. The unused DNA fragments were mostly under-stretched or overstretched, judging by their uneven backbone stain. Experimentally localized nick-labels for Tamra and Cy5 dyes are shown in green and red, respectively. The experimental map agrees well with the reference sequence map (Figure 5a). The height of each peak correlates well with the density of the nicking sites: denser regions have higher peaks than other regions. Figure 5b shows two different regions analyzed with two-color SHRImP analysis. One region covers 44kb (15 μm) to 54kb (18.5 μm). It contains five Nb.BsmI sites labeled with Tamra and six Nb.BbvCI sites labeled with Cy5. The closest distances measured between labels of the same color (same sequence-motif) and of different colors (different sequence-motifs) are 134bp (46 nm) and 313bp (106 nm), respectively. This agrees well with the reference sequence. The other region shown in Figure 5b covers 19kb to 24kb; in this region, all four Nb.BsmI and three Nb.BbvCI nicking sites were resolved at their expected locations.
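Building a histogram of this kind amounts to pooling label positions from many aligned fragments into 200 nm bins along the BAC. The sketch below shows one way to do this with NumPy; the label positions are made-up illustrative values, the 0.34 nm/bp conversion is an assumption consistent with the bp/nm pairs quoted in the text, and the function name is hypothetical.

```python
import numpy as np

NM_PER_BP = 0.34                  # assumed rise per base pair
BIN_NM = 200                      # bin size from the text (diffraction limit)

bac_length_nm = round(180_000 * NM_PER_BP)     # 180kb BAC, about 61.2 um
n_bins = -(-bac_length_nm // BIN_NM)           # ceiling division
edges_nm = np.arange(n_bins + 1) * BIN_NM      # bin edges covering the whole BAC

def motif_histogram(positions_nm):
    """Count localized labels per 200 nm bin along the BAC."""
    counts, _ = np.histogram(positions_nm, bins=edges_nm)
    return counts

# Hypothetical Tamra localizations pooled from several aligned fragments (nm).
tamra_positions_nm = [15010, 15040, 15120, 15190, 18390, 18420, 30000]

counts = motif_histogram(tamra_positions_nm)
peak_bin = int(np.argmax(counts))
print(f"{n_bins} bins; tallest peak starts at {edges_nm[peak_bin]} nm "
      f"with {counts[peak_bin]} labels")
```

Peaks in such a histogram grow both with the number of fragments covering a region and with the local density of nicking sites, which is why the text compares peak heights with site density.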
Out of a total of 107 Nb.BsmI and Nb.BbvCI sites, the super-resolution map resolves 91 sites, compared to 65 sites with conventional DNA mapping at 2kb resolution. Some of the sites could not be resolved because of the 30 nm resolution limit or because more than two dyes lay within the diffraction limit. Supporting Information Table S1 shows the complete super-resolution map.
The full super-resolution two-color sequence-motif map with SHREC analysis is also shown in the Supporting Information Figure S4.
Despite recent advances in next-generation sequencing technologies, de novo genome assembly, structural variant, and haplotype analysis using “short read” shotgun sequencing remain challenging. Consequently, most medical resequencing projects rely on mapping the sequencing data to the reference human genome sequence to identify sequences and variants of clinical relevance.14 One approach to address the sequence assembly challenge is optical mapping, an approach pioneered by David Schwartz and colleagues. Optical mapping has been used to construct ordered restriction sites for whole genomes and has proven to be useful in providing scaffolds for shotgun sequence assembly and validation.7,15–20 However, the information content and mapping capabilities are limited by low resolution and use of only a single restriction enzyme.18 The resolution of optical mapping is traditionally limited by the optical resolution (diffraction limit). Small fragments, or neighboring motif sites below 2kb, are hard to measure,9,11,18 resulting in false negatives. Additionally, optical mapping is limited in practical use by low throughput, imprecise DNA length measurement, and high error rates.
Genome mapping using nanochannel technology overcomes some of these limitations by uniformly stretching and linearizing the DNA molecules in solution for multiple cycles of imaging, and by permitting multienzyme multicolor measurements (unpublished data). Such new genome mapping methods are now enabling numerous analyses in complex genomes, allowing visualization of a significantly larger portion of genomic variation than previously possible. Further improvements in resolution would increase the information content, decrease the possibility of desert regions in some genomes (genomic regions without sequence-motifs), and permit detection of genomic features smaller than CNVs as commonly defined.
We have shown here a multicolor super-resolution DNA mapping method, which provides more detailed DNA sequence information. Single-color SHRImP can measure up to three dyes within the diffraction limit.21,22 The two-color SHRImP method is shown to perform as efficiently as single-color SHRImP. To correlate the two color channels for precise distance measurement, we developed a modified SHREC procedure with nanohole fiducial markers to minimize chromatic aberration. The accuracy between different colors reaches 30 nm (100bp). For a given molecule length, super-resolution DNA mapping provides significantly higher uniqueness than existing optical DNA mapping technologies because denser sequence-motif information can be obtained. This not only helps in de novo sequence assembly and physical map generation, since smaller contigs and less overlap between molecules are needed, but also reduces the current requirement for sample preparation of extremely long DNA molecules. Moreover, the method can potentially generate sequence-motif maps for damaged DNA samples, such as formalin-fixed paraffin-embedded (FFPE) samples.
In our two-color nick-labeling scheme, the high specificity of sequence recognition is determined by both the enzymatic nicking reaction and the fluorescent nucleotide incorporation reaction. More colors could be incorporated with additional nicking endonucleases or in combination with other DNA labeling schemes (e.g., polyamides, Bis-PNA, methyltransferases).23–25 Using photoswitchable dyes could further increase the labeling density, though it would require a different dsDNA labeling chemistry if STORM-like dye pairs such as Cy3-Cy5 are used.26,27 Other photoswitching dyes may also be used.28,29 Improved labeling technologies, together with advances in multicolor super-resolution imaging techniques, can provide a DNA sequence-motif map of unprecedented detail. Such a map can be used to resolve smaller genetic variations over long distances, helping to resolve haplotypes, and can approach sequencing resolution.
This work was supported in part by NIH 068625 and NSF DBI-02-15869 and 082265 (P.R.S.); NIH R01-HG005946 (P. K., M.X.). We would also like to acknowledge support from the Network for Computational Nanotechnology at Illinois and nanohub.org. Preparation of silver nanoholes was carried out in part in the Frederick Seitz Materials Research Laboratory Central Facilities, University of Illinois.
The authors declare no competing financial interest.
Detailed descriptions of materials and methods, figures, and information on algorithms. References 12, 13, and 30–32 appear in the Supporting Information. This material is available free of charge via the Internet at http://pubs.acs.org.