Established techniques for global gene expression profiling, such as microarrays, face fundamental sensitivity constraints. Due to greatly increasing interest in examining minute samples from micro-dissected tissues, including single cells, unorthodox approaches, including molecular nanotechnologies, are being explored in this application. Here, we examine the use of single molecule, ordered restriction mapping, combined with AFM, to measure gene transcription levels from very low abundance samples. We frame the problem mathematically, using coding theory, and present an analysis of the critical error sources that may serve as a guide to designing future studies. We follow with experiments detailing the construction of high density, single molecule, ordered restriction maps from plasmids and from cDNA molecules, using two different enzymes, a result not previously reported. We discuss these results in the context of our calculations.
Global gene expression analysis is the quantification of gene transcription, across all genes, from a cell or tissue at the time of sampling [1, 2]. Detection of differences and modulation in global expression patterns has yielded a deeper appreciation for the interconnected circuitries of normal and diseased tissues, and is now commonplace in biomedical research and drug discovery; global gene expression profiling is also beginning to be used in the clinical setting, to aid in disease prediction, diagnostics and treatment [3–7]. Importantly, there is increasing demand for expression profiling of small samples, as large amounts of material can be difficult, if not impossible, to obtain in clinical and experimental settings [8–19]. Recent methodological advances make possible the global expression profiling of minute samples from micro-dissected tissues, including single cells, thereby avoiding the confounding biological effects of tissue heterogeneity. Fine needle aspirates and fine needle core biopsies offer practical clinical sampling procedures of limited material. Technologies that facilitate the isolation of individual, specialized cells, such as by laser capture micro-dissection, yield homogenous material for analysis [3, 12, 18, 19].
Each cell contains approximately 300 000 mRNA molecules, representing more than 3 × 10⁴ different species, while each low abundance species may be present in only a few copies per cell [18, 20, 21]. Genes transcribed at low levels, such as those encoding regulatory proteins, exert large biological effects from small changes in expression level [13, 18, 22]. Considerable work with enzymatic amplification, including PCR-based [23–33], multiple displacement amplification-based (MDA) [24, 34–55], and RNA polymerase-based [23, 56–63] protocols, has enabled the use of hybridization microarrays and sequence tag methods to characterize low abundance mRNA samples [64–72]. This includes several recent reports of message profiling of single cells [73–82]. Unfortunately, serious methodological drawbacks remain.
Recent studies of mRNA amplification protocols have found a significant decrease in correlation coefficients between low copy number species, pre- and post-amplification. Single cell transcript profiling studies have found that only large magnitude changes in expression can be quantified for moderate- and low-abundance transcripts [24, 27, 28, 30, 35, 41, 46, 57, 62, 63, 67, 83–88]. In several papers, Nygaard et al [85, 88] have argued strongly that fundamental, stochastic effects prevent reliable enzymatic amplification of all species from minute samples; they conclude that high and medium abundance species can be quantitatively amplified, but low copy number species will always be amplified unevenly. This may prove to be an enormous limitation, as the majority of transcripts in a cell can be ‘low abundance’, defined as having 1–5 copies per cell.
Non-amplified single molecule approaches provide the most direct solution to the above limitations, as they offer theoretically unlimited, unbiased sensitivity. They also require inherently fewer processing steps, reagent use is limited, and with parallelization, reasonably low amortized instrument costs are incurred [89–126]. Unfortunately, non-amplified, single molecule sequencing methods, such as nanopores or in situ synthesis-based chemistries, also face extremely challenging signal detection and sequencing chemistry hurdles, and despite extensive ongoing research, remain in the earliest stages of development; the same is true for single molecule high density oligonucleotide probe hybridization approaches [127–154].
In contrast, single molecule, high density ordered restriction mapping presents an interesting, but largely unexplored alternative in this application. Type II restriction enzymes, the most common variety, are unparalleled in their robustness and simplicity as detection systems for specific, short DNA sequences (usually less than eight bp). They bind and cleave their recognition sequences up to 10⁶ times more specifically than similar, non-cognate sequences. Over 3500 different Type II restriction enzymes exist and hundreds are available commercially. Single molecule ordered restriction mapping has been used extensively in small sample genome mapping studies using optical detection [156–165], and in a few cases using AFM [166–168]. For a combination of reasons, which we discuss in detail below, none of these studies demonstrated high resolution mapping of short DNA molecules (<2 kb) as would be required for identifying individual message transcripts. The current study examines high density, ordered restriction mapping using AFM as a method for gene expression profiling. We begin by defining the problem mathematically, using coding theory, to determine the relationship between molecule size, AFM sizing accuracy and site labelling (binding or cleavage) efficiency. We detail the construction of highly accurate, dense single molecule ordered restriction maps of actual cDNA molecules and short plasmids, using AFM and the in situ cleavage approach, a result not previously reported in the literature.
AFM images were acquired with both a Digital Instruments Bioscope AFM and a Dimension 3000 AFM, in tapping mode, using manufacturer-supplied TESP diving board cantilevers. Imaging was conducted at 22°C and ~30% relative humidity. DNA was processed and imaged on freshly cleaved mica derivatized with 3-aminopropyl triethoxysilane to provide a positive charge for DNA retention, as previously described. Nanoscope image processing software was used to flatten and plane-fit all AFM images and NIH Image was used to manually measure DNA backbone length profiles. We have found that the DNA molecule itself serves as a very good reference for scan quality and tip condition. The true width of DNA is ~2 nm and generally appears 8–15 nm wide in our images. Tip quality and scan parameters were assessed using the apparent DNA width.
DNA sizing studies used six linear fragments from pEYFP-C1 (Clontech), prepared by cleavage in solution, deposited as described above, skipping the surface cleavage and washing steps. The fragment sizes (nm/bp) were 191/579, 230/760, 447/1355, 589/1785, 788/2388, and 1561/4731. A constant 0.33 nm/bp derived from the calculated pitch of B-DNA was used as a nm-to-bp conversion factor. A truncated splice variant of CD44 in the pOTB7 plasmid vector was obtained from ATCC. The CD44 plus pOTB7 sample was produced by double cleavage with XhoI and EcoRI to release the cDNA insert from the pOTB7 vector. The CD44v cDNA sequence corresponds to Genbank accession number BC052287.1. Stretching DNA on surfaces has been researched extensively [170–181]. We used fluid flow to stretch cDNAs, a standard technique. Orientation and spacing of cDNAs on the substrate could be controlled by the direction of fluid flow and sample concentration during application, as observed elsewhere [157, 162, 163, 179].
Linear DNA molecules were elongated and deposited onto derivatized mica surfaces using capillary fluid flow as described previously. Surface bound molecules were exposed to aqueous salt buffer containing enzyme RsaI or PstI for 15–30 min at room temperature. Processed samples were washed with ultra pure water and dried under a stream of nitrogen gas. AFM images were taken from dried samples directly.
Breaks are scored simply from the local topography of the DNA backbone. Double-stranded DNA consistently appears between 0.5 and 1.0 nm tall and 8 and 15 nm wide in our AFM images. The height contrast between the backbone and the surrounding surface is more than sufficient to identify breaks via local threshold, i.e. the breaks appear to have height equal to the surrounding surface. Molecules with gaps larger than 100 nm are rejected. It is very unlikely that two different molecules will appear co-aligned on the surface with their ends 100 nm or less apart. However, to virtually eliminate such molecules from consideration, we only include a molecule if its fragments sum up to the known length of molecules in the sample. In a ‘real world’ sample one can easily label the ends of a molecule with a moiety detectable via AFM to disambiguate closely spaced ends. We make no attempt to automate gap finding or otherwise use more complicated criteria. This is an appropriate subject for further study, but our present criteria are sufficient given the scope of this work.
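The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study, where breaks were scored manually. It assumes a one-dimensional height profile sampled along the backbone; the function names, the 0.25 nm threshold and the 10% length tolerance are hypothetical choices, while the 100 nm gap limit comes from the text.

```python
from typing import List, Tuple

MAX_GAP_NM = 100.0  # molecules with wider gaps are rejected (from the text)


def find_breaks(heights_nm: List[float], pixel_nm: float,
                threshold_nm: float = 0.25) -> List[Tuple[float, float]]:
    """Return (start_nm, width_nm) for each interior run below threshold.

    Assumes the profile starts and ends on DNA; runs that touch either
    end of the profile are ignored rather than scored as breaks.
    The default threshold is a hypothetical value below the 0.5-1.0 nm
    apparent DNA height quoted in the text.
    """
    breaks = []
    run_start = None
    for i, h in enumerate(heights_nm):
        if h < threshold_nm:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if run_start > 0:
                breaks.append((run_start * pixel_nm,
                               (i - run_start) * pixel_nm))
            run_start = None
    return breaks


def accept_molecule(breaks: List[Tuple[float, float]],
                    fragment_lengths_nm: List[float],
                    expected_nm: float,
                    rel_tol: float = 0.10) -> bool:
    """Apply the two acceptance rules: no gap > 100 nm, and fragments
    summing to the known molecule length (tolerance is an assumption)."""
    if any(width > MAX_GAP_NM for _, width in breaks):
        return False
    return abs(sum(fragment_lengths_nm) - expected_nm) <= rel_tol * expected_nm


# Synthetic profile: DNA, a 5-pixel break, DNA again (3 nm pixels).
profile = [0.8] * 10 + [0.0] * 5 + [0.7] * 10
breaks = find_breaks(profile, pixel_nm=3.0)
```

A fixed global threshold is the simplest possible choice; a real image would likely need a baseline estimated locally from the surface next to each molecule.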
Both pOTB7 and CD44 are relatively short (594 nm/1800 bp), and loss or displacement of cleaved fragments during sample processing reduces the yield of measurable molecules. Molecules were deemed measurable if the ends were distinct, they contained one clear break, the fragments summed to full length, and the molecule was sufficiently elongated to manually follow the backbone contour. The cleavage rate (% cleavage/total sites) varied from image to image, generally ranging from 30% to 40%. The rate of false positives (false cuts) is largely a function of image quality, sample age, and similar variables. Other in situ restriction mapping studies suggest that false cuts are more likely to be non-specific breaks than the enzyme cutting in the wrong place [162, 163].
We calculated the probability of uniquely distinguishing cDNA molecules present in a sample containing many similar species using AFM-determined ordered restriction maps. In this analysis, we treat each map as a unique ‘molecular signature.’ The first step in determining this probability is to calculate the Hamming distance between molecular signatures, HamDist, assuming a total number of ‘good signatures’, S. Each signature is randomly selected from the set of all possible binary vectors, with a probability π. The computation of this probability proceeds as follows: start with a selected signature f0 from the set S, and compute all the possible signatures whose Hamming distances from f0 range between 1 and HamDist; there are
$$\mathrm{vol} = \sum_{i=1}^{\mathrm{HamDist}} \binom{M}{i}$$
such signatures, where M is the signature length in bits, and with high probability they do not contain even a single signature from the set S (probability > (1 − 10⁻¹²) > (1 − π)^vol). We compute the uniqueness of the identification probability, given a fixed sizing accuracy, α, enzyme recognition site frequency, pc, and cleavage rate, pd: we compute this probability as follows:
That is, we sum the probabilities that starting with a signature with (a + b) unit bits, exactly b unit bits are lost from the mapped signature as a consequence of incomplete cleavage. These calculations were performed with Mathematica and the code is available upon request.
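The two quantities described in this passage can be illustrated with a short Python sketch. It is not the authors' Mathematica code, and the exact expressions are assumptions: the count of neighbouring signatures is taken to be the standard Hamming-ball volume (a sum of binomial coefficients over an M-bit signature), and the bit-loss term assumes each site is cleaved independently with probability pd, so that a bit is lost with probability 1 − pd. All function names are hypothetical.

```python
from math import comb


def hamming_ball_volume(m_bits: int, ham_dist: int) -> int:
    """Number of m-bit vectors at Hamming distance 1..ham_dist from a
    fixed signature (the signature itself is excluded)."""
    return sum(comb(m_bits, i) for i in range(1, ham_dist + 1))


def prob_ball_empty(m_bits: int, ham_dist: int, pi: float) -> float:
    """Probability that none of the vectors in the ball is a 'good'
    signature, if each vector is selected independently with prob. pi.
    Fine for moderate m_bits; very large volumes would overflow a float."""
    return (1.0 - pi) ** hamming_ball_volume(m_bits, ham_dist)


def prob_bits_lost(a: int, b: int, p_d: float) -> float:
    """Probability that exactly b of (a + b) unit bits are lost to
    incomplete cleavage, assuming independent cleavage with rate p_d."""
    return comb(a + b, b) * (1.0 - p_d) ** b * p_d ** a


# Example: a 200-bit signature and neighbours within distance 2.
vol = hamming_ball_volume(200, 2)

# Example: eight true sites, 40% cleavage rate; distribution of losses.
loss_distribution = [prob_bits_lost(8 - b, b, 0.40) for b in range(9)]
```

Under the independence assumption the loss probabilities for a fixed number of true sites form a binomial distribution, so they sum to one, which gives a quick sanity check on any implementation.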
We examined the case of uniquely determining the identity of a population of mRNAs obtained from a limited sample, such as a single cell, using single molecule ordered restriction mapping. Our analysis takes into account molecule size, and the two primary sources of error: inaccuracy in locating the ordered restriction sites, and missing real sites (false negatives) due to noise, problems with the enzyme, and similar suboptimal situations. Mammalian mRNAs have an approximately log-normal length distribution, with the median length ~1.5 kb [182–185]. Roughly 80% of the species are within the size range 1–8 kb, while only a few per cent are shorter than 0.5 kb. mRNAs are converted to a double-stranded DNA form, cDNA, so that they become templates for restriction enzymes (a common technique). A single cell contains approximately 300 000 mRNA molecules, representing more than 3 × 10⁴ different species. The median size of the mRNAs and the number of distinct species dictates that whichever restriction enzyme is used, each of the molecules should contain several sites. The most practical choice is to use an enzyme with a 4 bp recognition sequence, called a 4-cutter. The 4-cutter has a recognition sequence every 256 bp on average, so the average cDNA molecule would contain seven or eight sites. The average spacing of sites on a linear molecule would be physically quite small, ~85 nm, which in a practical sense eliminates far-field optical detection methods. This is discussed further in the next section. The required site density is also one reason that oligonucleotide probes are inferior to restriction enzymes for this application. A short probe (4 or 6 bp), even using modified nucleotides such as PNA, would not bind strongly enough to achieve the required labelling efficiency.
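The site-density figures quoted above follow from simple arithmetic, sketched here in Python. The sketch assumes a random sequence with equal base frequencies and uses the 0.33 nm/bp conversion given in the methods; the function names are hypothetical.

```python
NM_PER_BP = 0.33  # B-DNA conversion factor used in the text


def site_probability(recognition_len_bp: int) -> float:
    """Chance that a given position starts a recognition site in a
    random sequence with equal base frequencies (an assumption)."""
    return 4.0 ** -recognition_len_bp


def expected_sites(length_bp: int, recognition_len_bp: int) -> float:
    """Expected number of sites on a molecule of the given length."""
    return length_bp * site_probability(recognition_len_bp)


def mean_spacing_bp(recognition_len_bp: int) -> float:
    """Average distance between neighbouring sites, in bp."""
    return 1.0 / site_probability(recognition_len_bp)


def mean_spacing_nm(recognition_len_bp: int) -> float:
    """Average distance between neighbouring sites, in nm."""
    return mean_spacing_bp(recognition_len_bp) * NM_PER_BP


four_cutter_bp = mean_spacing_bp(4)       # 256 bp
four_cutter_nm = mean_spacing_nm(4)       # about 84.5 nm
sites_on_2kb = expected_sites(2000, 4)    # about 7.8 sites
six_cutter_bp = mean_spacing_bp(6)        # 4096 bp
```

The same functions show why a 6-cutter is too sparse for this purpose: at one site per 4096 bp on average, most transcripts would carry no site at all.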
There are generally two experimental schemes for detecting the ordered restriction sites in a single molecule, and both require fixing the molecule to a flat surface for imaging. In the first method, which is the subject of this paper, the molecule is digested while fixed in situ and then the cleavage points are detected by AFM imaging (figure 1). In the second method, the enzyme is made to bind in place but not cut, and the whole molecule, with enzymes in place, is imaged. This amounts to ‘labelling’ in either case, and we will use the term ‘labelling efficiency’ to denote percentage of actual sites detected. Our mathematical analysis applies to both methods, and the relative merits of each approach will be discussed in the next section.
Several studies have examined the problem of sizing DNA molecules by AFM using backbone contour length as a metric, determined automatically in some cases and by hand in others. In spite of its simplicity, backbone contour length appears remarkably accurate for the molecule sizes tested (~300–20 000 bp) [167, 168, 186–189]. While the conditions varied among studies, single measurement sizing accuracy (defined here as population CV) better than ±2–5% was reported for most cases for distances larger than 1000 bp. Only three studies presented data for shorter distances and only Fang et al have reported analysis of fragments shorter than 500 bp. For these smaller fragments, the sizing accuracy appears to be between ±7% and ±10%. One reason for the lower accuracy of sizing small fragments is likely to be tip convolution effects, which have not been corrected for in the published studies. Our own sizing experiments, discussed below, agree with the data from Fang et al. Therefore, it is reasonable to assume that 7% accuracy is achievable with AFM, with some optimization. In the case of our analysis, a ±7% error on the average 256 bp spacing of 4-cutter restriction sites equates to about ±18 bp, or a 36 bp overall measurement window.
The cDNA molecule is represented as a digital binary signature (e.g., 00100110), in which each detected site is noted by a non-zero bit, and the distance between two neighbouring detected sites by the number of intervening consecutive zero-bits. In this analysis, the physical length represented by each bit is an integer number, and is determined by the precision with which one can measure the molecular fragments. This distance is a function of the AFM imaging resolution and the conversion factor used to calculate length in bp from molecular dimensions. Here we use the conversion of 0.33 nm/bp, derived from the known bp pitch of B-DNA (see section 2). As the molecule becomes shorter, or sizing resolution worsens, the signatures contain fewer coding bits, and as the digestion rate drops, the corrupted molecular signature deviates from the true signature. In each case, our ability to disambiguate pairs of cDNAs belonging to different species becomes progressively impaired. We assume a 4-cutter enzyme is used, which cleaves at any site in a random cDNA sequence with a probability pc = 4⁻⁴ = 1/256; thus a 2 kb molecule/signature would have about eight non-zero bits (cleavages) on average. To illustrate, consider a sample calculation that assumes a resolution α = 10 bp. The 2 kb molecule is then divided up into 200 bins of width 10 bp; therefore, the signatures are of length M = 200 bits. At this value of M, there are an enormous number of possible signatures: 2^M ≈ 1.61 × 10⁶⁰. In actuality, a mammalian cDNA sample would contain a very small subset of these possibilities. Following this logic, we can calculate the probability with which one could uniquely distinguish cDNA molecules present in a sample containing many similar species using AFM-determined restriction maps. Figures 2(a)–(d) show the number of unambiguously identifiable 2.5, 2, 1 and 0.5 kb cDNAs (>95% probability), for a given bp sizing accuracy, as a function of labelling efficiency.
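The signature encoding described above can be illustrated with a minimal Python sketch. The binning rule (integer division of site position by the resolution α) and the site positions in the example are assumptions made for illustration; the function names are hypothetical.

```python
from typing import List, Sequence


def encode_signature(site_positions_bp: Sequence[int],
                     length_bp: int, alpha_bp: int) -> List[int]:
    """Encode detected site positions as a binary signature.

    The molecule is split into bins of width alpha_bp; a bin holding at
    least one detected site is set to 1. Sites at the very end of the
    molecule are clamped into the last bin.
    """
    m_bits = length_bp // alpha_bp
    bits = [0] * m_bits
    for pos in site_positions_bp:
        bits[min(pos // alpha_bp, m_bits - 1)] = 1
    return bits


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length signatures differ."""
    if len(a) != len(b):
        raise ValueError("signatures must have equal length")
    return sum(x != y for x, y in zip(a, b))


# A 2 kb molecule at 10 bp resolution gives M = 200 bits.
true_sig = encode_signature([120, 455, 990], length_bp=2000, alpha_bp=10)
# The same molecule with one site missed by incomplete digestion.
observed_sig = encode_signature([120, 990], length_bp=2000, alpha_bp=10)

n_bits = len(true_sig)
n_possible = 2 ** n_bits
missed = hamming_distance(true_sig, observed_sig)
```

Each missed site moves the observed signature one unit of Hamming distance away from the true one, which is why low cleavage efficiency and coarse binning both erode the ability to tell species apart.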
The horizontal band, region A, indicates the approximate number of cDNA species of a specific size that might be expected per cell [182, 183, 185]. For cDNA of length 2.5 kb, as sizing resolution degrades from 50 to 90 bp, difficult-to-achieve labelling efficiency (>80%) is needed to distinguish many species (>10⁴). Conversely, as sizing resolution approaches 30 bp, 10⁵–10⁷ species can be detected, even at low labelling efficiencies (30%–50%).
As discussed above, published reports of single molecule ordered restriction mapping have used two different schemes: in situ digestion with wild type enzymes, or stable binding of the enzyme to the restriction site using modified enzymes or buffer conditions. Using the latter method, Allison et al reported in 1996 and 1997 accurate, AFM-based EcoRI maps of large molecules, plasmids ranging from 3200 to 6800 bp, a cosmid vector (35 000 bp) and the lambda phage genome (48 000 bp). Importantly, they used a special mutant version of EcoRI, obtained from Modrich, that binds with reasonably high affinity to its recognition sites, but does not cut. While interesting, their method is an unlikely candidate for cDNA restriction mapping for two reasons: first, EcoRI recognition sites occur too infrequently, on average every 4096 bp; and second, mutagenesis techniques that efficiently separate specific binding from cleavage, if applied to more frequent cutting restriction enzymes, are likely to prove to be very difficult; we refer the reader to several good works on the subject [155, 191, 192].
A more promising approach is to use wild type enzymes but eliminate the Mg++ cofactor that is required for cleavage. This has been demonstrated by Oana et al using fluorescently labelled wild type EcoRI to map restriction sites on single molecules of the lambda genome DNA. Here, the binding efficiency, while not thoroughly characterized, seemed to be too poor (~10%) for cDNA profiling, based on our above calculations. The role of divalent cations in restriction enzymology is currently an active area of research [191–194]. Recent work has shown that divalent cations, while being absolutely required for cleavage activity, also play a critical role in increasing enzyme binding avidity and their ability to distinguish cognate sites from similar sequences. Therefore, it remains unclear whether or not removal or replacement of Mg++ will prove to be a robust strategy for single molecule restriction mapping. Both Oana and Allison observed some non-specific binding, but the level was not well quantified.
The alternative approach of in situ digestion using wild type restriction enzymes has been studied extensively [156–164, 179, 195, 196]. In this method, genomic DNA molecules are elongated and fixed to a glass substrate, followed by in situ digestion and imaging. No sequence specific reporter is used; restriction cleavage sites are photographed directly on fluorescently stained DNA molecules. This technology has proven to be quite robust, enabling Schwartz and coworkers to map 6-cutter restriction sites across whole genomes of several microbes [159, 160]. It requires no amplification and the biochemistry is a single step and highly parallelizable. Only common, unmodified restriction enzymes are required, and most of those actually tested cleave with high efficiency [156, 159, 160, 162–164, 195]. Because the restriction sites are detected optically, this method, as reported, has difficulty resolving sites spaced closer than 2–5 kb apart. Unfortunately, most gene transcripts are shorter than 2 kb in length, and optical techniques can resolve at best one or two restriction sites on such short molecules, which is insufficient to discriminate more than a few species. AFM techniques can overcome this limitation; however, risk lies in the uncertainty that existing single molecule ordered restriction mapping methods can be adapted to work with much shorter molecules than used previously (<2 kb versus >30 kb), while accommodating the stringent sample preparation requirements of AFM.
Recent advances in AFM technology suggest that AFM can be used in high throughput applications, under certain circumstances. A recent report actually captured restriction enzyme cleavage of DNA in real time at a rate of 6 frames s⁻¹, though not in conditions compatible with ordered restriction mapping. Most high resolution studies of DNA by AFM use scanning speeds of ~3–5 µm s⁻¹. Multipurpose AFMs are not constructed for high-speed scanning, since they have to fulfill conflicting requirements, such as large vertical motion range and various modes of operation. AFMs designed for high-speed scanning can image with molecular resolution at speeds up to ~60–75 µm s⁻¹; some emerging designs may be able to image at a cm s⁻¹ rate [198–216]. High-speed AFMs have emerged over the past 5–7 years, and incorporate more compact scanner designs, smaller and piezo-actuated cantilevers, and improved feedback electronics. Viani et al imaged DNA on mica to high resolution, in liquid, at rates up to 1.7 s/image, and also recorded fast protein binding dynamics [217, 218]. Ando used a high frequency piezo scanner and tip (250 kHz+) to record 100 × 100 pixel images in 80 ms. Tip speeds in their study reached 600 µm s⁻¹ at 2 nm pixel resolution [207, 219]. Manalis has developed high-speed techniques where the cantilever itself is piezo-actuated. Rogers et al used actuated tips to image E. coli and mica steps at speeds up to 75 µm s⁻¹. Hobbs et al have developed VideoAFM, a design which replaces the cantilever with a high frequency tuning fork (micro-resonator) and circumvents the feedback control speed by using a passive technique. They have recorded 256 × 256 pixel images at rates exceeding 1 cm s⁻¹, though it is unclear if this technique can provide the resolution required to observe restriction cleavage sites in DNA [205, 213].
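To put these tip speeds in terms of imaging throughput, the time per frame can be estimated from the scan size, pixel size and linear tip speed. The Python sketch below is a rough estimate under assumptions that are not taken from the cited reports: a square raster scan with one line per pixel row, a trace and retrace pass on every line, and no turnaround or settling overhead. The function name is hypothetical.

```python
import math


def frame_time_s(scan_size_nm: int, pixel_nm: int,
                 tip_speed_um_s: float, retrace: bool = True) -> float:
    """Estimate seconds per square raster frame.

    Assumes one scan line per pixel row and, by default, that the tip
    travels each line twice (trace and retrace). Turnaround time and
    feedback settling are ignored, so this is a lower bound.
    """
    n_lines = math.ceil(scan_size_nm / pixel_nm)
    passes = 2 if retrace else 1
    path_um = n_lines * scan_size_nm * passes / 1000.0
    return path_um / tip_speed_um_s


# 1 um x 1 um image with 3 nm pixels at a conventional 3 um/s tip speed.
slow_frame = frame_time_s(1000, 3, 3.0)
# The same image at 75 um/s, the upper end quoted for high-speed designs.
fast_frame = frame_time_s(1000, 3, 75.0)
# A 100 x 100 pixel frame at 2 nm pixels and 500 um/s.
small_frame = frame_time_s(200, 2, 500.0)
```

Under these assumptions a conventional instrument needs a few minutes per 1 µm frame, while the faster designs bring this down to seconds, which is the scale of improvement that would matter for profiling many molecules.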
We conducted two series of experiments using the in situ cleavage approach, combined with AFM, to construct fine restriction maps of a short plasmid and actual cDNAs. In the first series of experiments, the recognition sequences of RsaI, a 4-cutter, were mapped to high resolution on a 3.5 kb linearized plasmid (see section 2). This plasmid contained nine 5′GTAC3′ sites, and produced ten fragments when fully digested. The shortest spacing between sites was 34 bp, and the largest 950 bp. Six partially digested molecules were imaged to high resolution, using a square pixel size of 1 nm and a linear scan rate of 1.5–3.0 µm s⁻¹. When the pattern of observed breaks was compared to the predicted RsaI map for the plasmid, five molecules aligned very well (figure 3). The one molecule that did not align appeared to have six spurious breaks, which we will discuss below. To align each molecule, the observed, ordered fragments were compared to the corresponding predicted fragments by size, based on the known sequence. The width of the breaks in the molecules ranged from 8 to 42 nm, with an average value of 17 nm. In two of the molecules, the small end fragments of 101 and 34 bp were missing and may have desorbed. Otherwise, the individual fragments remained stable in situ throughout processing. Across the twenty fragments measured, the median sizing error was 2%. Eighteen total cleavage sites were observed, out of a predicted 45; however, three sites appeared to be non-specific breaks, based on the map alignment. This leaves fifteen specific cleavages, indicating a cleavage efficiency of 33%. However, the two closely spaced restriction sites, 34 bp apart, produce an 11 nm long fragment that may be easily, and undetectably, desorbed. Correcting for this, the true cleavage efficiency would approach 40%.
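The efficiency figures in this paragraph can be reproduced with a short calculation, sketched below in Python. The counts come from the text; the correction shown is only one possible reading of it, namely that the two sites 34 bp apart are treated as a single detectable site, leaving eight scorable sites per molecule. That interpretation and the function name are assumptions.

```python
def cleavage_efficiency(observed_breaks: int, nonspecific_breaks: int,
                        sites_per_molecule: int, n_molecules: int) -> float:
    """Fraction of predicted sites seen as specific cleavages."""
    specific = observed_breaks - nonspecific_breaks
    return specific / (sites_per_molecule * n_molecules)


# Five aligned molecules, nine predicted RsaI sites each (45 in total),
# eighteen observed breaks of which three were judged non-specific.
raw = cleavage_efficiency(18, 3, sites_per_molecule=9, n_molecules=5)

# Assumed correction: count the 34 bp-spaced pair as one scorable site.
corrected = cleavage_efficiency(18, 3, sites_per_molecule=8, n_molecules=5)
```

The raw value is 15/45, or about 33%, and the assumed correction gives 15/40, or 37.5%, in line with the statement that the true efficiency would approach 40%.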
More non-specific breaks were observed than expected, and we speculate that this is a function of the high fluid shear required to fully elongate these molecules, rather than spurious cleavage by RsaI, which is not known to produce non-specific cleavages when the proper buffer conditions are used. Our previous optical mapping work has shown that ‘false cuts’ are much more likely to be non-specific breaks than the enzyme cutting in the wrong place (so-called ‘star activity’) [162, 163], which can be eliminated by using the correct digestion buffer conditions. Non-specific breaks in the DNA backbone, caused by excessive strain during deposition, chemical degradation, and other strain processes, degrade our ability to profile cDNAs. Image artefacts can also produce apparent ‘false cuts’. We have found, both here and in previous optical mapping work [156, 159, 160, 162, 163], that the rate of ‘false cuts’ is largely a function of sample handling, sample age, and image quality.
We also used AFM profiling to measure the components of a mixture containing one part DNA from the cancer-related human CD44 gene [222, 223] and one part linearized DNA plasmid pOTB7 with no insert. Both CD44 and pOTB7 molecules are approximately 1800 bp (594 nm) in length. While the enzyme used, PstI, is a 6-cutter with recognition sequence 5′CTGCAG3′, it produces smaller than average fragments in the two test molecules: pOTB7 contains a PstI recognition sequence 354 bp (117 nm) from its 5′ end, and CD44 contains a PstI site 169 bp (65 nm) from its 5′ end, and has an additional PstI site 1046 bp (345 nm) from its 5′ end (figure 4(a)). Molecules cleaved once with PstI rather than twice were chosen for measurement to increase yield. Images were collected using square 3 nm pixels and a linear scan rate of 2–4 µm s⁻¹. The frequency of 1-cut molecules was determined from a collection of fifty 1 µm × 1 µm AFM images (figure 4(b)). In the sample, molecules with a PstI site ~169 bp ±10% from one end, indicative of pOTB7, were approximately as prevalent as those with a site either 354 bp ±10% or 1046 bp ±10% from an end, indicative of CD44 (figure 4(b)). As the sample contained a pure mixture of two species, this distribution of 1-cut molecules is statistically significant and provides the expected frequency from a 1:1 mixture of the two molecules.
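The ±10% matching rule used to tally 1-cut molecules can be sketched as follows. This Python example is a hypothetical illustration rather than the analysis actually performed: it assumes the orientation of a molecule is unknown, so either fragment of an 1800 bp molecule may correspond to the distance from the 5′ end, and it reports only which reference site position was matched. The function name and the example measurements are invented for illustration.

```python
from collections import Counter
from typing import Optional, Sequence

TOTAL_BP = 1800                      # approximate length of both molecules
REFERENCE_SITES_BP = (169, 354, 1046)  # PstI site positions from the text


def match_site(fragment_bp: float,
               sites: Sequence[int] = REFERENCE_SITES_BP,
               total_bp: int = TOTAL_BP,
               tol: float = 0.10) -> Optional[int]:
    """Return the reference site matched by a 1-cut molecule, or None.

    Because orientation is assumed unknown, both the measured fragment
    and its complement (total_bp - fragment_bp) are tested against each
    reference position within a relative tolerance of +/- tol.
    """
    for candidate in (fragment_bp, total_bp - fragment_bp):
        for site in sites:
            if abs(candidate - site) <= tol * site:
                return site
    return None


# Invented fragment lengths (bp) standing in for AFM measurements.
measurements = [160, 350, 754, 600]
tally = Counter(match_site(m) for m in measurements)
```

With these tolerances the three acceptance windows and their complements do not overlap for an 1800 bp molecule, so each measurement matches at most one reference site.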
Returning to our mathematical calculations, in figure 2, region B indicates the parametric space accessible given the resolution and labelling efficiency inferred from published studies and from our experiments. For cDNAs 2.5 kb in length or longer, high density restriction mapping can distinguish >10⁶ different species in the best case (40% cleavage efficiency, 30 bp resolution). This decreases to <10⁴ species for cDNAs 2 kb in size (figure 2(b)). For 1 kb cDNAs (figure 2(c)), either an increase in cleavage efficiency, to 65%, or an increase in resolution, to 20 bp, is required to distinguish a minimum of 10⁴ species uniquely. For cDNAs 0.5 kb in length, both an increase in resolution to 20 bp and high cleavage efficiency (>75%) are required to distinguish at least 10³ species uniquely (figure 2(d)).
One key difference between the study by Fang et al and the current one is that in their work individual molecules were deposited from solution rather than being generated as in situ fragments from larger DNA molecules. In terms of surface chemistry, the current method differs in that it includes increased surface fixation avidity, which enables the use of higher ionic strength washing solutions. These differences in technique have been reported to alter the observed chain length of DNA, which is an anionic polymer. To better characterize measurement accuracy in our system, a series of six DNA fragments, 579 to 4731 bp, were measured using the AFM to determine backbone contour length. Assuming a conversion of 0.33 nm bp⁻¹, our results duplicated the reported data in sizing accuracy, defined as population CV. The linear regression slope coefficient for these data was 1.0154 with an R² of 0.9994 (figure 5). As a reference, the data also compared favourably with single DNA molecule sizing based on fluorescence, which is the closest comparable surface-biochemical system. The clear advantage of AFM in single fragment sizing is apparent in the lower sizing dispersion and the ability to accurately size very small molecules (<300 bp). The coefficient of variation was 8–10% for small fragments (<600 bp) and as low as 5% for the largest fragment (4700 bp) using AFM as compared to >16% using fluorescence. Also, fluorescence methods require internal references in each image to convert fluorescence intensity into molecular length, while AFM can directly convert backbone contour length accurately to bp.
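The two summary statistics used here, the population CV for each fragment and the regression of measured against expected size, can be computed as in the Python sketch below. It is a generic illustration using ordinary least squares; the replicate measurements are invented, and the function names are hypothetical. It does not reproduce the study's data or its reported slope and R².

```python
from statistics import mean, stdev
from typing import Sequence, Tuple

NM_PER_BP = 0.33  # conversion factor used in the text


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares fit; returns (slope, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Invented replicate contour lengths (nm) for a nominal 579 bp fragment.
replicates_nm = [175.0, 190.0, 205.0, 188.0, 197.0]
cv_small = coefficient_of_variation(replicates_nm)
mean_bp = mean(replicates_nm) / NM_PER_BP

# Expected sizes (bp) against invented mean measured sizes (bp).
expected_bp = [579.0, 1355.0, 1785.0, 2388.0, 4731.0]
measured_bp = [585.0, 1340.0, 1810.0, 2400.0, 4790.0]
slope, intercept, r_squared = fit_line(expected_bp, measured_bp)
```

A slope near one with a small intercept indicates that a single nm-to-bp conversion factor holds across the size range, while the per-fragment CV captures how much individual measurements scatter around that line.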
The substrates used in this study have to fulfill three conflicting criteria: (1) maximum smoothness, (2) stringent molecular adhesion, and (3) compatibility with normal activity of the restriction enzyme. First, for AFM imaging the surfaces must be smooth on a roughness scale that is less than the diameter of the DNA molecule (~2 nm). Second, surfaces used for in situ restriction digestion here must bind and retain small fragments (<1000 bp) in moderate ionic strength buffer. APTES silanization generates roughness that is generally proportional to the amount of silane molecules deposited on the surface [224, 225]. More adhesive surfaces that hold small fragments require more aminosilane to be adsorbed, which in turn creates greater roughness. Third, the surface must not bind the molecules so stringently as to sterically hinder the enzyme, which could cause incomplete digestion and/or nonspecific cleavage.
To address these requirements, we developed an APTES application protocol to produce an AFM-compatible surface that retains enough positive charge to bind and hold small DNA fragments. The contour profile of one of these substrates (figure 6) shows surface irregularities <1 nm in height and an RMS roughness of ~0.4 nm, which is smooth enough to resolve DNA molecules using AFM. Silane hydrolysis and surface adsorption kinetics indicate that polymerized aggregates of multifunctional silanes accumulate in solution rapidly after 10 min in aqueous solvent, and these adsorbed aggregates increase roughness on silanized silica substrates [225–227]. We therefore chose short derivatization reaction times of less than 1 h.
We determined that in situ DNA digestion increases surface roughness, resulting in reduced contrast AFM images, although this reduction was not sufficient to preclude sharp AFM imaging. One source of roughness is the restriction enzyme, which adheres to the positively charged surface, though less avidly than negatively charged DNA. Even without enzyme treatment, the contrast in AFM images was reduced after treatment with enzyme digestion buffer; we suspect that adsorption of salt from the restriction enzyme buffer or rearrangement of the APTES layer itself when exposed to aqueous solution may be responsible. After some trial and error, we were able to repeatedly produce molecularly smooth samples, using low silane concentrations, which were compatible with DNA stretching and enzyme digestion. The large body of organosilane research suggests numerous ways to improve the performance of these surfaces (durability, hydrolytic stability), including spin-coating, oven curing, applying mixtures of silanes, and potentially multi-layer coatings [160, 228–238].
In particular, it is possible that appropriate surface chemistry ‘tuning’ will improve the enzymatic digestion rate substantially. Based on our experience with this particular surface/biochemical system, we believe that it is possible to approach the 70–80% digest rates we have previously achieved in optical mapping studies [156, 159, 160, 162, 163]. The enzyme digestion rate will also improve as the incubation time is increased. The incubation time used in these studies was constrained because the APTES monolayer begins to degrade after 45 min to 1 h in aqueous buffer, which we attribute to partial hydrolysis [227, 238]. We have determined that simply increasing the thickness of the APTES layer greatly improves resistance to hydrolysis, but results in unacceptable surface roughness for AFM imaging. Reports suggest that the use of pre-cross-linked, or bis-silanes, among other options, will increase hydrolysis resistance substantially, while retaining monolayer smoothness. Benkoski et al produced a molecularly smooth bis-silane film of thickness 1–10 nm, roughly equivalent to the thickness of our APTES layers. The RMS roughness of these bis-silane layers varied from 0.15 to 0.4 nm for the roughest samples.
Our mathematical analysis of single molecule, ordered restriction mapping of cDNAs yielded benchmarks for sizing accuracy and labelling/cleavage efficiency that can guide future studies. The sizing accuracy and labelling efficiencies observed in our experiments, and inferred from the cited literature, suggest that AFM profiling of cDNA with restriction maps allows, in theory, unique identification of up to ~10⁴ individual species 2 kb long, and more than 10⁶ individual species 2.5 kb or longer. Given that roughly 40% of mammalian cDNAs are 2 kb or longer [182, 183, 185], this approach could, in principle, quantify a sizable fraction of the message transcripts within a single cell to single molecule precision. Relative to our experiment, achievable improvements in size resolution (20%) or cleavage efficiency (30%) would permit complete quantification of 1 kb or longer cDNAs, or roughly 80% of all transcripts [182, 183, 185]. Species ~0.5 kb are more difficult. A sufficient number of species in this category can be resolved with simultaneous improvements in cleavage efficiency (2×) and size resolution (1.5×).
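The dependence of species identification on sizing resolution can be illustrated with a toy sketch. It is not the coding-theory analysis referred to above: it assumes complete cleavage (no missed or spurious cuts), treats each ordered restriction map as a list of fragment lengths in bp, and calls two maps distinguishable when they differ in fragment count or when any pair of corresponding fragments differs by more than a relative sizing tolerance. The map names, fragment lengths and tolerance values are all hypothetical.

```python
def maps_distinguishable(map_a, map_b, rel_tol):
    """True if two ordered fragment-length maps can be told apart.

    Assumes complete digestion, so a differing fragment count is decisive.
    rel_tol is the relative sizing error, e.g. 0.10 for 10%.
    """
    if len(map_a) != len(map_b):
        return True
    return any(abs(a - b) > rel_tol * max(a, b) for a, b in zip(map_a, map_b))


def count_ambiguous_pairs(maps, rel_tol):
    """Number of species pairs that cannot be separated at this tolerance."""
    names = sorted(maps)
    return sum(
        1
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if not maps_distinguishable(maps[x], maps[y], rel_tol)
    )


# Hypothetical ordered restriction maps (fragment lengths in bp).
toy_maps = {
    "species_A": [500, 700, 800],
    "species_B": [520, 690, 790],
    "species_C": [300, 900, 800],
    "species_D": [1000, 1000],
}

ambiguous_coarse = count_ambiguous_pairs(toy_maps, rel_tol=0.10)
ambiguous_fine = count_ambiguous_pairs(toy_maps, rel_tol=0.02)
```

In this toy set, species_A and species_B merge at the coarser tolerance and separate at the finer one, which mirrors qualitatively why better size resolution enlarges the number of uniquely identifiable species. A realistic treatment would also have to model incomplete cleavage, as our analysis does.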
Improvements in sizing accuracy are likely to come from tip deconvolution techniques, which are especially relevant to molecules smaller than 1 kb. More sophisticated metrological methods, which take into account information beyond molecule contour length, such as apparent volume, or frictional information, may also improve sizing accuracy. It is harder to speculate on specific ways to improve labelling efficiency, because inter-molecular binding can be modified with so many chemistries.
Both the in situ cleavage approach and the bind-but-not-cut approach to single molecule, ordered restriction mapping appear viable; however, the former is much better characterized. AFM throughput was not a focus of this work, but it is clearly a critical issue. The most commonly used AFMs require careful operator attention to produce high quality data and to avoid damaging the tip or the sample. This is particularly true with many biological samples, such as cells, because they have highly variable shapes and material properties. In contrast, the samples used here are very smooth and the structures imaged, DNA strands, are all essentially identical. Because the samples are so smooth and regular, industrial automation techniques can be implemented. Automated AFM is used widely in the semiconductor fabrication industry for chip inspection, and is favoured for its ability to inspect nanometre features without damaging the sample. Industrial AFMs run unassisted for extremely long periods at high duty cycles, and can replace tips automatically. We have found that the DNA molecule itself serves as a very good reference for scan quality and tip condition.
Part of this work was supported by NIH grants R21GM074509 (JG, JR and MT), R21HG003714-01 (JG, JR and BM), R01CA74929 (MT), R01CA107300 (MT), PN2EY018228 (MT), the Margaret E. Early Medical Research Trust (MT), and CMISE (JG and MT), a NASA URETI Institute (NCC 2-1364). MT is a Scholar of the Leukemia and Lymphoma Society. BM was also supported by a USAMRMC grant (W81XWH-05-1-0026) and a NIST grant (no. 60NANB5D1199).
*Based on invited talk at the International Conference on Nanoscience and Technology 2006.