Exonucleases are integral components of many genetic repair and recombination pathways (1). Processive 5′→3′ exonucleases generate DNA intermediates with long 3′-overhangs involved in these pathways (1); in bacteriophage lambda, the Red α gene encodes the 24-kD subunit of such a nuclease (3). The structure of lambda exonuclease (λ-exo) consists of a toroidal homotrimer, with a tapered central channel wide enough to admit double-stranded DNA (dsDNA) at its entrance but only wide enough to pass single-stranded DNA (ssDNA) through its exit (4). A topological constraint may therefore underlie the high processivity of λ-exo (>3000 base pairs, where 1 bp corresponds to a helical rise of 0.338 nm along dsDNA) (5). λ-exo digestion requires Mg²⁺ but does not require ATP or GTP, and proceeds along DNA at rates of 2 to 3.5 nm/s, as determined by bulk biochemical studies (6).
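The lengths and rates quoted here mix base pairs and nanometers. As a minimal sketch, assuming only the 0.338 nm/bp helical rise stated in the text (the function names are illustrative and not part of the original work), the conversions can be written as:

```python
# Helical rise of dsDNA as quoted in the text (nm per base pair).
RISE_NM_PER_BP = 0.338


def bp_to_nm(bp):
    """Convert a length in base pairs to a contour length in nanometers."""
    return bp * RISE_NM_PER_BP


def nm_to_bp(nm):
    """Convert a contour length in nanometers to base pairs."""
    return nm / RISE_NM_PER_BP


if __name__ == "__main__":
    # Processivity of >3000 bp expressed as a length in nm.
    print(round(bp_to_nm(3000), 1))
    # Bulk rates of 2 to 3.5 nm/s expressed in nucleotides per second.
    print(round(nm_to_bp(2.0), 1), round(nm_to_bp(3.5), 1))
```

On this conversion, a processivity of 3000 bp corresponds to roughly 1 µm of DNA, and 2 to 3.5 nm/s corresponds to roughly 6 to 10 nucleotides per second.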
Here, we developed a novel high-resolution, single-molecule assay to study λ-exo motion by optical trapping nanometry. Besides measuring individual translocation rates, single-molecule studies provide insights into other processes that are obscured by bulk averages. For example, certain processive DNA enzymes pause for variable intervals, a behavior that may reflect off-pathway branches in the reaction cycle. Pausing behavior has been observed in single-molecule studies of transcriptional elongation by RNA polymerase (8), viral phage packaging by φ29 portal protein (9), and DNA unwinding by Rep helicase (10). The biological significance of pausing in such disparate enzyme systems is not yet well understood, however.
Our experimental assay consists of a surface-attached enzyme bound to a single DNA molecule (11). A His-tagged version of λ-exo was preincubated with a 7.1-kbp substrate (derived from M13 viral DNA) in the presence of Ca²⁺, which arrests the enzyme, and anchored stereospecifically with a His-specific antibody linkage to the cover glass surface inside a flow cell. Once Mg²⁺ was introduced into the buffer, digestion started and the enzyme moved processively. The distal end of the DNA molecule was attached to a small polystyrene bead, so that the DNA formed a tether. The tethered bead was captured and held under tension with an optical trap, and its position was monitored by back focal plane interferometry (12). We implemented a stage-based force clamp using a computer-driven 3D piezoelectric stage that maintains the bead at a fixed offset from the trap center (11).
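The idea behind a stage-based force clamp can be illustrated with a toy feedback loop. The sketch below is not the instrument's control code. It assumes a one-dimensional geometry, a taut and inextensible tether, a simple proportional controller, and hypothetical parameter values (gain, setpoint, time step), none of which are specified in the text. It shows only that, when the bead offset is held fixed, the stage displacement tracks the length of DNA digested.

```python
def simulate_force_clamp(n_steps=2000, dt_s=0.01, speed_nm_s=3.0,
                         tether_nm=2000.0, setpoint_nm=100.0, gain=0.2):
    """Toy 1D stage-based clamp (all parameters are illustrative assumptions).

    The trap center is at 0. The enzyme is anchored to the stage at stage_nm,
    and the bead sits one tether length away, at stage_nm + tether_nm.
    Returns the stage positions and tether lengths after every step.
    """
    # Start with the bead exactly at the desired offset from the trap center.
    stage_nm = setpoint_nm - tether_nm
    stage_trace, tether_trace = [], []
    for _ in range(n_steps):
        # Digestion shortens the tether at a constant speed.
        tether_nm -= speed_nm_s * dt_s
        # Bead offset from the trap center for a taut, inextensible tether.
        bead_nm = stage_nm + tether_nm
        # Proportional feedback: move the stage to cancel the offset error.
        error_nm = bead_nm - setpoint_nm
        stage_nm -= gain * error_nm
        stage_trace.append(stage_nm)
        tether_trace.append(tether_nm)
    return stage_trace, tether_trace


if __name__ == "__main__":
    stage, tether = simulate_force_clamp()
    # With the bead offset held fixed, stage travel approximates digested length.
    print(stage[-1] - stage[0], tether[0] - tether[-1])
```

In this simplified picture the enzyme position is read out from the stage motion rather than from the bead, because the bead offset, and hence the load, stays nearly constant.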
Fig. 1 (A) Experimental geometry of the stage-based optical force clamp (not to scale); bead displacement relative to the trap center (xbd) was maintained by stage motions (xs → xs′) that kept the load constant. The bead was positioned at a predetermined ...
Motion of λ-exo was not constant along the substrate, but displayed frequent pauses interspersed with periods of uniform movement. The locations and durations of these pauses could be accurately resolved by averaging data at 0.4 Hz under comparatively low load (1 pN) (inset). Higher loads, up to ~3 pN, had little effect on speeds (14).
When independent records of λ-exo motion were compared by eye, many of the longer pauses appeared at closely corresponding locations on the DNA substrate. A small vertical shift of individual traces sufficed to bring pauses at different DNA locations in separate records into alignment. To score pausing statistics and variations in digestion rate among individual λ-exo molecules, we focused on the region containing the dominant pause at ~900 nm (11). The 900-nm pause and nearby weaker pauses were used as fiducial marks to align the corresponding locations for each record. To perform the alignment on 40 records, an average offset of −2.1 ± 12 nm (mean ± SD) was necessary; more than half (7 nm) of the offset variability (12 nm) is due to variation in the bead diameter (11). The alignment demonstrated that pausing was sequence-dependent and allowed us to compute the pause probability and duration for the most prominent pause locations.
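One way to picture the fiducial alignment is as a search for the position where each record dwells longest near the expected pause, followed by a shift that moves that position to 900 nm. The sketch below is a simplified stand-in for the alignment procedure, not the method of reference (11). The 5-nm bin size, the ±50-nm search window, and the function names are assumptions made for illustration, and only a single fiducial is used.

```python
from statistics import mean, stdev


def dwell_histogram(positions_nm, dt_s, bin_nm=5.0):
    """Total time spent in each position bin, keyed by integer bin index."""
    hist = {}
    for x in positions_nm:
        b = int(x // bin_nm)
        hist[b] = hist.get(b, 0.0) + dt_s
    return hist


def fiducial_offset(positions_nm, dt_s, target_nm=900.0,
                    window_nm=50.0, bin_nm=5.0):
    """Shift (nm) that moves the longest dwell near target_nm onto target_nm."""
    hist = dwell_histogram(positions_nm, dt_s, bin_nm)
    # Keep only bins whose centers lie within the search window.
    candidates = {b: t for b, t in hist.items()
                  if abs((b + 0.5) * bin_nm - target_nm) <= window_nm}
    if not candidates:
        raise ValueError("record does not cover the search window")
    best_bin = max(candidates, key=candidates.get)
    return target_nm - (best_bin + 0.5) * bin_nm


def summarize_offsets(offsets_nm):
    """Mean and sample standard deviation of per-record offsets."""
    return mean(offsets_nm), stdev(offsets_nm)


if __name__ == "__main__":
    # Synthetic record: steady motion sampled every 0.1 s plus a 10-s pause.
    trace = [800.0 + 0.3 * i for i in range(667)] + [892.5] * 100
    print(fiducial_offset(trace, dt_s=0.1))
```

Applying `summarize_offsets` to the offsets from many records gives a mean and standard deviation comparable in kind to the −2.1 ± 12 nm reported above.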
Fig. 2 Sequence specificity of pausing. (A) Records of digestion in the forward direction for 28 λ-exo molecules, aligned at the dominant pause at 900 nm. For clarity, only a selection of pause sites of various relative strengths are indicated ...
Table 1 Statistics for 11 pauses of various durations located between 600 and 1200 nm and identified by the pause-finding algorithm. Positions are reported as Lnm (±2.3 nm, SEM) and as Lbp (±7 bp, SEM); these are related through Lbp = 6406 bp ...
Among the possible explanations for pausing are changes in the local properties of the DNA structure, such as twists or bends (15); variations in local DNA energetics or stability, such as a series of A·T or G·C base pairs; and stereochemical interactions between the λ-exo enzyme and specific DNA sequences. If the structural or energetic properties of dsDNA alone dominated, pausing would be expected to occur at (or near) the same positions, regardless of the direction of digestion. To test this possibility, we engineered substrates with opposite polarity, so that digestion took place from a 5′-terminal phosphate on the complementary strand in the reverse direction. For such constructs, the dominant pause at 900 nm disappeared, and no significant correlation was found between pause locations in the forward and backward directions (16). This finding implies that pausing is strand-specific, as well as sequence-dependent, and eliminates a class of candidate mechanisms, including such symmetric measures as the G·C content. Notably, for the DNA region between 675 and 1200 nm (over which data were most plentiful), there were significantly more long pauses found for digestion of the sense strand (forwards) than for the nonsense strand (backwards) (17).
Is pausing a stochastic or a deterministic property of enzymatic activity? The wide variation from molecule to molecule in pausing at characterized sites contrasts with the consistency in catalytic rates, reflected in the uniform speeds of digestion between pauses. This difference suggests that pausing is stochastic and is likely to represent an off-pathway state unrelated to any translocation events in the reaction cycle. The comparatively steady rate of λ-exo digestion also contrasts with the nearly fivefold variation in single-molecule speeds reported in experiments on RNA polymerase (8) and RecBCD (19). Is the propensity to pause a variable feature of each individual enzyme? It did not appear from an inspection of records that certain enzymes were more prone to pausing than others. Quantitatively, our data failed to display a correlation between the time spent at the dominant pause at 900 nm and at a pause at 950 nm, implying that pause durations at these two nearby sites are statistically independent. Moreover, pausing appears to be independent of the surface immobilization; if bound enzymes paused more frequently, their average rates of digestion would be slower. The mean speed of single-molecule digestion (including pauses) was 3.3 ± 0.5 nm/s [10 ± 2 nucleotides (nt) per s; mean ± SD; N = 40], which agrees within error with the digestion rate of 12 nt/s reported in a recent biochemical study under similar buffer and temperature conditions (7).
Fig. 3 Velocity and pause statistics. (A) Histogram of mean forward λ-exo digestion speeds, computed with (white bars) and without (gray bars) intervening pauses >1 s (N = 40). The average speeds with and without pauses were 3.3 ± 0.5 ...
To compare records with specific DNA sequences, we analyzed products of λ-exo digestion at steady state in the presence of an excess of 3′-³²P-labeled DNA by high-resolution polyacrylamide gel electrophoresis (11). In gel assays, the strand to be digested carried a 5′-phosphate, whereas digestion of the complementary strand was blocked by a 5′-OH group (6). Bands are produced at every base position, and pauses are characterized by darker bands; the density of any given band is related to the pause strength, which is the product of the duration and probability of pausing.
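The definition of pause strength as duration times probability can be made concrete with a short calculation. The sketch below is illustrative only. It assumes that each molecule contributes one dwell time at a given site and that a dwell counts as a pause when it lasts at least 1 s (a threshold borrowed from the Fig. 3 caption, not a stated rule of the analysis). The input values are made up for the example.

```python
def pause_strength(dwell_times_s, min_pause_s=1.0):
    """Return (probability, mean duration, strength) for one pause site.

    dwell_times_s holds one dwell time per molecule at the site; dwells
    shorter than min_pause_s are treated as passing without a pause.
    """
    if not dwell_times_s:
        raise ValueError("need at least one molecule")
    paused = [d for d in dwell_times_s if d >= min_pause_s]
    probability = len(paused) / len(dwell_times_s)
    mean_duration_s = sum(paused) / len(paused) if paused else 0.0
    # Strength is the product of pause probability and mean pause duration.
    return probability, mean_duration_s, probability * mean_duration_s


if __name__ == "__main__":
    # Hypothetical dwell times (s) for four molecules at one site.
    print(pause_strength([0.0, 4.0, 0.0, 6.0]))
```

Under these assumptions, a site where half of the molecules pause for 5 s on average has the same strength as one where every molecule pauses for 2.5 s, which is why band density alone cannot separate the two contributions.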
First, we set out to identify DNA sequences corresponding to the pauses identified in single-molecule records. Duplex DNA molecules (~300 bp) for several regions carrying pause sites were generated by polymerase chain reaction (PCR) (table S1), digested, and run out on gels. Analysis revealed a single prominent band at nucleotide 3738 in the M13 sequence, corresponding to a computed position of 902 nm along the DNA; this closely matches the dominant pause found at 900 nm. We estimate our positional error, after drift removal and fiducial alignment, at approximately 2 nm (6 bp) (21). The digestion product displayed a single band, rather than a series of adjacent bands, which implied that λ-exo paused at a unique position. None of the other briefer, sequence-dependent pauses detected in single-molecule records were resolved by DNA gel analysis; this may be attributable to the comparatively weaker relative pause strengths at these locations and the relatively poor signal-to-noise ratio (S/N) afforded by gel analysis (S/N = 5 ± 0.5 for the dominant 900-nm pause).
Fig. 4 Polyacrylamide gel analysis of pausing. (A) Exonuclease digestion of PCR products (~300 bp) flanking several pause sites identified in single-molecule records, designated Pnnn atop each lane, where nnn is the nominal position, in nanometers. Numbers (red) ...
Next, we sought to characterize the determinants responsible for the dominant pause. Oligonucleotides were designed in which 20-bp regions immediately before (FlipA) or after (FlipB) the pause site were exchanged from one strand to the other (table S2). Because single-molecule assays showed that pausing is strand-specific, we reasoned that swapping the full sequence specifying a pause from one strand to its complement would cause a new band to appear on gels digested in the backward direction, with concomitant loss of the band in the forward direction. Upon digestion, the FlipB substrate showed the expected behavior, and the pause site was successfully transferred to its corresponding position on the opposite strand, whereas for the FlipA substrate, the pause site was not transferred between strands. This finding restricts the sequence specifying a pause to a 20-bp region ahead of the active site of the enzyme. To further isolate the minimal sequence, a third set of oligonucleotides was designed that flipped the central 18 bp of the sequence (Flip18). This also caused the pause to exchange strands. Collectively, these results implicate a comparatively short region located up to 9 bp ahead of the enzyme active site, within the sequence 5′-GGCGATTCT-3′. Models of DNA interacting with the enzyme crystal structure suggest that 11 bp may be enclosed within the central channel (22), and nuclease protection assays suggest that the footprint of the enzyme spans ~13 to 14 bp (6). Our data therefore support a model where the DNA bases located within the central channel interact directly with the protein, as opposed to alternative possibilities, for example, hairpin formation in the ssDNA digestion product or the effect of upstream dsDNA sequences.
Does the sequence determining the dominant pause, or some sequence related to it, correlate with other pauses found in single-molecule assays? To investigate this question, we performed a cross-correlation analysis of the sequence in the DNA sense strand (the digested strand) against a function based on the average of single-molecule records, in an effort to identify the relevant motif (11). A histogram of dwell times was produced by binning aligned traces over a range of contour lengths from 590 to 1760 nm, with the use of a 5-nm window. A simple integer-scoring function was adopted to compare the running DNA sequence against a given candidate for the pause sequence. A trial oligomer of n nucleotides was selected for the candidate, with n ranging from 4 to 9 bases. This n-nucleotide oligomer was then compared with the substrate base sequence at all positions, with a score of +1 awarded for each exact base match, 0 for a mismatch, and an additional penalty of −1 assessed for any transversion (purine-pyrimidine interchange) altering the number of base-pair hydrogen bonds (i.e., for A↔C and G↔T). The running score was exponentiated (pause times are exponentially related to the underlying energetics) and smoothed. Finally, it was cross-correlated against the dwell time histogram and used to calculate a correlation coefficient, r. The value computed by this procedure supplies a numerical measure of how well the candidate sequence predicts the experimentally observed distribution of dwell times. Correlation coefficients for all possible n-mer sequences were computed and ranked (table S3).
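The scoring procedure described in this paragraph can be sketched in a few lines. What follows is a hedged illustration, not the analysis code of reference (11). It assumes a moving-average smoother with a hypothetical width, evaluates only the zero-lag Pearson coefficient rather than a full cross-correlation, and treats the dwell profile as already sampled once per base (the conversion from 5-nm bins to base positions is omitted). All function names are invented for the example.

```python
import math

# Mismatches that change the number of base-pair hydrogen bonds (A<->C, G<->T).
H_BOND_CHANGING = {("A", "C"), ("C", "A"), ("G", "T"), ("T", "G")}


def match_score(candidate, window):
    """+1 per exact match, 0 per mismatch, -1 for an A<->C or G<->T mismatch."""
    score = 0
    for c, w in zip(candidate, window):
        if c == w:
            score += 1
        elif (c, w) in H_BOND_CHANGING:
            score -= 1
    return score


def running_score(sequence, candidate):
    """Score the candidate against every full-length window of the sequence."""
    n = len(candidate)
    return [match_score(candidate, sequence[i:i + n])
            for i in range(len(sequence) - n + 1)]


def smooth(values, half_width=1):
    """Moving average over up to 2 * half_width + 1 neighboring points."""
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half_width), min(len(values), i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def predicted_dwell(sequence, candidate, half_width=1):
    """Exponentiate the running score, then smooth it."""
    return smooth([math.exp(s) for s in running_score(sequence, candidate)],
                  half_width)


def pearson_r(x, y):
    """Zero-lag Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)
```

A candidate would then be ranked by `pearson_r(predicted_dwell(sequence, candidate), dwell_profile)`, where `dwell_profile` is the measured dwell time at each window position. Repeating this over all oligomers of a given length yields a ranking of the kind reported in table S3.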
For the 9-nt oligomer GGCGATTCT corresponding to the dominant pause sequence identified by gel analysis, the correlation was quite high, r = 0.451; less than 1.5% of all 262,144 possible 9-nt oligomers produced a correlation exceeding even half this value (11). Of 1024 possible 5-nt oligomers, the sequence GGCGA yielded the single largest correlation, r = 0.296. For each n-nucleotide oligomer size, we also examined the 1% of sequences producing the greatest correlation values, e.g., the top-ranked 2621 of all 9-nt oligomer sequences, etc. Among these best-correlated sets of sequences, we scored the most frequently occurring imbedded sequences (of length m, with m < n). The 5-bp motif GGCGA consistently appeared as either the first or the second most common imbedded sequence found in every set of 6-, 7-, 8-, and 9-nt oligomers, with no other imbedded sequences appearing consistently in the rankings. The same 5-bp motif was returned by a variant of this analysis, in which we substituted the pause strength at all sites identified by the pause-finding algorithm (N = 124) for the dwell-time histogram. This analysis identifies GGCGA as the 5-base motif statistically most correlated with pausing within the 1170-nm (~3.5-kb) interval studied (table S3). Of particular note, the correlation analysis relies solely on single-molecule data, without a priori knowledge of the site identified by gel assays. These complementary approaches independently suggest that determinants for pausing are contained within GGCGATTCT. The 5-nt oligomer GGCGA is only imperfectly correlated with dwell locations, however, which raises the likelihood that other sequence motifs may also contribute to pausing signals. Exploring alternative scoring functions, we found that pause locations failed to correlate with purine content (r = 0.05) or with the energetics of DNA stability (r = 0.07), and correlated only weakly with GC content alone (r = 0.19) (table S4).
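The counts quoted in this paragraph follow from the four-letter DNA alphabet, and the search for imbedded motifs amounts to tallying short substrings across a ranked set. The sketch below is a minimal illustration under stated assumptions: each distinct substring is counted once per oligomer (the text does not say whether repeats within one oligomer were counted), and the three input sequences are made up for the example.

```python
from collections import Counter


def n_oligomers(n):
    """Number of distinct DNA oligomers of length n (four bases per position)."""
    return 4 ** n


def embedded_counts(sequences, m):
    """Count how many sequences contain each m-nt substring.

    Assumption: a substring is counted at most once per sequence.
    """
    counts = Counter()
    for seq in sequences:
        distinct = {seq[i:i + m] for i in range(len(seq) - m + 1)}
        counts.update(distinct)
    return counts


if __name__ == "__main__":
    print(n_oligomers(9), n_oligomers(9) // 100, n_oligomers(5))
    # Hypothetical top-ranked 6-nt oligomers.
    top = ["GGCGAT", "AGGCGA", "TTTTTT"]
    print(embedded_counts(top, 5).most_common(1))
```

With 4^9 = 262,144 possible 9-nt oligomers, the top 1% is the 2621 sequences mentioned above. Applying `embedded_counts` to such a set, for each m smaller than n, gives the frequency ranking of imbedded motifs.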
What features of the consensus sequence GGCGA slow the enzyme? The initial triplet of the sequence, GGC, has a high melting temperature, second only to GCG among all 3-nt oligomers (23). However, thermal stability of the DNA alone is unlikely to supply the explanation, because of the following argument (and also owing to the failure of DNA stability and GC content to correlate well with pausing, as discussed). Gel analysis shows that the dominant pause occurs reproducibly at a single base and not over a distribution of several nearby bases. Because the enzyme cleaves mononucleotides, the most parsimonious explanation is that λ-exo translocation and DNA strand separation occur one base at a time. Based on the structural model (4), when the 5′-terminal G is positioned at the enzyme active site, located 15 to 18 Å from the central axis, ~2 to 5 terminal nucleotides would be unpaired at this point, i.e., they would be melted before the pause and not concomitant with it.
We instead favor a model where pausing is conferred by interactions between λ-exo residues and specific nucleotides in the DNA substrate. The unpaired bases located in the “frayed” segment of ssDNA postulated to exist between the central axis and the enzyme active site represent obvious candidates for such interactions. We speculate that residue Trp24 of the protein, located on helix αB lining the central channel and directly facing the catalytic site [residues 22 to 26; (22)], may intercalate between two adjacent guanosine bases in this segment and form a tight ring-stacking interaction. A similar stacking arrangement between DNA bases and aromatic amino acids has been postulated in the mechanism of translocation in helicase (24). The sequence GGCGA contains four purines that may potentiate stacking interactions and is sufficiently short to be entirely in single-stranded form inside the enzyme channel. Clearly, however, additional interactions between the ssDNA and the enzyme would be necessary to confer the observed degree of sequence specificity. A more complete understanding of such interactions awaits enzyme-DNA cross-linking studies or a cocrystal structure of λ-exo bound to a strong pause sequence, such as the one identified here.
We speculate that the propensity of exonuclease to pause at specific sequences may explain a longstanding puzzle about the inhomogeneity of recombination rates in bacteriophage lambda. When lambda recombination occurs in the absence of DNA replication (and is therefore dependent on the lambda Red system, of which λ-exonuclease is a principal component), the frequency of recombination is much lower at the left end of lambda than at the right (25). In addition, a strong, nested pause signal containing both GGCG and GGCGA is found within the first nine bases of the left lambda cohesive end (cosNL), GGGCGGCGACCT (26). If the pausing of lambda exonuclease induced in vitro by these sites is reflected in its activity in vivo, then the slowing of digestion from the left end of lambda may singly, or in conjunction with lambda terminase (27), be able to account for the recombination deficit.
We anticipate that the improved spatiotemporal resolution achieved here, which made possible the nanometer-level identification of sequences, will not only facilitate further work on λ-exo, but also enable detailed studies of other DNA-based enzymes at the single-molecule level.