Nucleic acid hairpins provide a powerful model system for understanding macromolecular folding, with free energy landscapes that can be readily manipulated by changing the hairpin sequence. The full shapes of energy landscapes for the reversible folding of DNA hairpins under controlled loads exerted by an optical force clamp were obtained by deconvolution from high-resolution, single-molecule trajectories. The locations and heights of the energy barriers for hairpin folding could be tuned by adjusting the number and location of G:C base pairs, and the presence and position of folding intermediates were controlled by introducing single-nucleotide mismatches.
Elucidating the mechanisms by which proteins and nucleic acids fold into three-dimensional structures is key to developing insights into biomolecular function (1), improving predictive models (2, 3), and understanding the basis of diseases linked to misfolding (4). For over two decades, free energy landscape formalisms have provided the fundamental conceptual framework for describing folding (5). Numerous experimental and theoretical studies have probed specific features of folding landscapes, including the properties of transition states (6), intermediate states (7), and the ruggedness of the energy surface (8). Experiments, however, have characterized only limited aspects of the folding landscape, such as the locations and heights of energy barriers, and how these barriers change when perturbed by solvent substitutions, temperature jumps, substrate changes, or mutations (9). Direct measurements of the shape of an energy landscape at all points along the reaction coordinate have not been feasible. Here, we show how the full energy landscape for the formation of a nucleic acid hairpin can be derived from sufficiently high-resolution trajectories of single-molecule folding.
Single nucleic acid hairpins subjected to mechanical loads provide a powerful model system for investigating energy landscapes and understanding the effects of primary and secondary structure on folding (10-13). The molecular end-to-end extension is recorded during the folding transition, supplying a natural reaction coordinate that can be related directly to the number of bases paired in the hairpin stem. Previous work has characterized specific aspects of the folding landscape. In particular, short hairpins tend to fold as simple, two-state systems (10, 13), indicative of a single transition energy barrier. Conventional analysis of single-molecule records supplies the free energy difference between the folded and unfolded states, as well as the height and location of the barrier (10). For hairpins with random (unpatterned) stem sequences, the barrier is typically located close to the unfolded state, with a height controlled largely by the size of the loop (13). However, finer details of the folding landscape have been heretofore inaccessible, due to limited spatial and temporal resolution as well as instrumental baseline drift. Using a high-bandwidth, passive force clamp with an ultra-stable dumbbell assay (14), we have now been able to reconstruct the shape of the landscape.
Sets of DNA hairpins were synthesized in which the heights and locations of energy barriers were systematically varied, as well as the numbers and locations of any folding intermediates. Sequences were designed based on a model of the sequence-dependent energy landscape derived from the thermodynamic and mechanical properties of nucleic acids (13, 15). Both ends of the hairpins were attached to long handles of dsDNA (13) bound specifically to polystyrene beads held in a dumbbell configuration by two independently-controlled optical traps (Fig. 1A). A constant force, F, was applied with a force clamp (14), and high-resolution trajectories of the end-to-end extension (~0.1 nm/√Hz) were recorded for a range of forces. The extensions of folded, unfolded, and any intermediate states were measured directly from these records. The locations and heights of energy barriers between these states were computed from the force-dependence of the state lifetimes (10, 13). These measurements of specific points on the landscape were then taken as benchmarks for an experimental determination of the free energy at every point along the reaction coordinate, deconvolving the measured probability distribution of hairpin extension to correct for blurring effects arising from thermal motions associated with the beads and the DNA tether.
A typical record of extension under load (Fig. 1B) shows two-state folding behavior: two nearly Gaussian peaks in the extension histogram correspond to the folded and unfolded states. Here, F ≈ F1/2, the load at which the hairpin spends equal time in each state. The lifetimes of the folded (τf) and unfolded (τu) states depend exponentially on F according to τf(F) ∝ exp(−FΔxf‡/kBT), where Δxf‡ is the distance to the barrier from the folded state and kBT is the thermal energy (16); an analogous expression holds for τu(F). Previously, the transition state (TS) for folding a hairpin with an unpatterned stem sequence was found to involve the formation of 1−2 base pairs adjacent to the loop (13), resulting in an energy barrier near the unfolded state. In contrast, the barrier for the hairpin in Fig. 1B lies much closer to the folded (Δxf‡ = 5.4 ± 0.5 nm) than to the unfolded state (Δxu‡ = 13 ± 1 nm), implying that the TS requires the formation of ~15 base pairs. This difference in behavior is due to the particular sequence selected for the hairpin: a contiguous block of strong G:C base pairs placed near the base of an A:T-rich stem moves the barrier near to the folded state (Fig. 1B, inset).
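The exponential force dependence of the lifetimes means that the distance to the barrier can be read off from the slope of ln τ versus F. The following is a minimal sketch of that step, not the analysis code used in this work: the function name, the value of kBT (taken as 4.09 pN·nm near 23 °C), and the synthetic lifetimes are illustrative assumptions.

```python
import numpy as np

KBT_PN_NM = 4.09  # assumed thermal energy near 23 °C, in pN·nm


def barrier_distance(forces_pn, lifetimes_s, kbt=KBT_PN_NM):
    """Estimate |Δx‡| (nm) from the slope of ln(lifetime) versus force."""
    slope, _intercept = np.polyfit(forces_pn, np.log(lifetimes_s), 1)
    return abs(slope) * kbt


# Synthetic folded-state lifetimes generated with Δx_f‡ = 5.4 nm (hypothetical data).
forces = np.linspace(11.0, 14.0, 7)
tau_f = 0.05 * np.exp(-(forces - 12.5) * 5.4 / KBT_PN_NM)
dx_f = barrier_distance(forces, tau_f)
print(f"recovered distance to barrier: {dx_f:.2f} nm")
```

With real records, the lifetimes at each force would come from the measured dwell times, and the uncertainty in the fitted slope would set the error on Δx‡.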
We created a family of hairpins in which the TS position was systematically manipulated by moving the block of G:C base pairs to various locations within the stem (Table S1). Determining the barrier location for each hairpin, we found that the TS moved in concert with the G:C block, always located at the edge of the block nearest the loop (Fig. 1C; Table S2). The sum of Δxf‡ and Δxu‡ agreed well with the distance between folded and unfolded states (Δx), as expected for a pure two-state system. Measurements were in excellent agreement with landscape model predictions (Fig. 1C).
We also created a family of hairpins where the barrier position was fixed at the center of the stem using a G:C block, but the barrier height was altered by changing the overall stem G:C content (Table S1). State lifetimes measured at F1/2 for each hairpin (17) varied by a factor of ~600 over the entire family (Fig. 1D; Table S2). Assuming Arrhenius behavior for the lifetimes, τ1/2 = τ0 exp(ΔG‡1/2/kBT), where ΔG‡1/2 is the barrier height at F1/2 and 1/τ0 is the attempt rate at zero force, these results show that the barrier height changed by 6.4 ± 0.7 kBT, matching the variation predicted by the model, 6.9 ± 0.2 kBT. Taken together, the results of Fig. 1 confirm the remarkable level of control afforded by this system over the folding landscape: the TS can be placed at will along the reaction coordinate and its energy adjusted over a wide range simply by manipulating the hairpin sequence.
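The step from a ~600-fold range of lifetimes to a barrier change of about 6.4 kBT follows directly from the Arrhenius expression, provided the prefactor τ0 is the same for every hairpin in the family (an assumption made explicit here). A short worked check, with a hypothetical helper name:

```python
import math


def barrier_change_kbt(lifetime_ratio):
    """Barrier-height difference (in kBT) implied by a ratio of lifetimes,
    assuming both hairpins share the same Arrhenius prefactor tau_0."""
    if lifetime_ratio <= 0:
        raise ValueError("lifetime ratio must be positive")
    return math.log(lifetime_ratio)


delta_barrier = barrier_change_kbt(600.0)
print(f"{delta_barrier:.1f} kBT")  # prints "6.4 kBT"
```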
Extension records such as those in Fig. 1B and (10, 12, 13) have traditionally been interpreted in terms of two-state folding over a single energy barrier. However, not all hairpin sequences exhibit strict, two-state folding behavior. For example, it was recently reported that the nominal “folded state” may, in fact, consist of an ensemble of states comprised of the folded state plus a series of frayed states with one or more base pairs unzipped (13). Moreover, short-lived intermediate states may be present that are unobservable at the available temporal resolution (18). Simple two-state analysis ignores details of the trajectory between folded and unfolded states because the motion is taken to be instantaneous, and properties of the energy landscape are inferred only from characteristics of the two states. To induce the hairpin to spend more time between folded and unfolded states, and to observe intermediate properties during folding more clearly, we manipulated the sequence to produce a local potential well between the folded and unfolded states by inserting a single T:T mismatch at various positions along the hairpin stem.
When the mismatch was placed at the seventh base pair from the base of the stem, extension records at F ≈ F1/2 revealed a shoulder on the histogram peak at low extension (nominally, the folded state), indicative of a third peak representing an intermediate state (Fig. 2A). Such a shoulder was observed systematically in all records, but associated only with the low-extension peak. The existence of an intermediate state may also be inferred from the rapid fluctuation of ~5 nm amplitude recorded at low extensions (Fig. 2B). Similar results were obtained by introducing a G:A or a G:T mismatch, rather than T:T. Repeating these measurements for an entire family of hairpins with mismatches located 4−16 bp from the base of the stem (Table S1), we consistently observed the signature of an intermediate state, whose distance from the folded and unfolded states depended on the location of the mismatch (Fig. 2C; Table S2). Interpreting the hairpin extension in terms of the number of base pairs formed, the intermediate states correspond to hairpins partially folded up to the point of mismatch.
These results demonstrate the precision with which certain features of the folding landscape can be determined, but they define only a few key points on the energy landscape. Notably, they do not address more general features of the landscape, such as the widths and curvatures of the potential wells or barriers, which are known to affect folding (19). By further analyzing the folding trajectories, however, the entire landscape along the reaction coordinate can be reconstructed. The free energy at a given extension, ΔG(x), is related to the probability density, P(x), through ΔG(x) = −kBT ln[P(x)] (20). Although conceptually straightforward, this method of determining ΔG(x) requires accurate measurements of P(x) in the region between the states, where the hairpin spends little time (~100−300 μs, here). Hundreds to thousands of transitions must therefore be sampled at high bandwidth, necessitating exceptional instrumental stability. A second complication is that the measured extension represents that of a hairpin attached to dsDNA handles and beads, rather than an isolated hairpin. The thermal and mechanical properties of the trapped dumbbell smooth and dampen the apparent motions of the hairpin (13). The underlying energy landscape may be recovered from P(x), however, by a deconvolution process.
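The relation ΔG(x) = −kBT ln[P(x)] can be applied to an extension record in a few lines. The sketch below is illustrative only: the bin width, the function name, and the synthetic two-state record are assumptions, and it yields the apparent (still blurred) landscape before any deconvolution.

```python
import numpy as np


def landscape_from_extension(extension_nm, bin_width_nm=0.5):
    """Return bin centers (nm) and ΔG(x)/kBT = -ln P(x), offset so the minimum is 0."""
    edges = np.arange(extension_nm.min(), extension_nm.max() + bin_width_nm, bin_width_nm)
    counts, edges = np.histogram(extension_nm, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    occupied = counts > 0  # empty bins carry no free-energy estimate
    prob = counts[occupied] / counts.sum()
    dg = -np.log(prob)
    return centers[occupied], dg - dg.min()


# Hypothetical record: equal time in two states 18 nm apart, 3 nm of Gaussian noise.
rng = np.random.default_rng(0)
trace = np.concatenate([rng.normal(0.0, 3.0, 50_000), rng.normal(18.0, 3.0, 50_000)])
x, dg = landscape_from_extension(trace)
```

Because the barrier region is visited rarely, the statistical error in ΔG(x) is largest exactly where the landscape is most interesting, which is why long, stable records are needed.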
To reconstruct the full energy landscape, we measured the folding trajectory of single hairpins at F ≈ F1/2 at high bandwidth (50 kHz) for 5−15 min and created a histogram of the extension, P(x). Instrumental drift was typically ≤1 nm. The point spread function (PSF) for the deconvolution, S(x), was estimated from extension histograms of the folded state for a hairpin with 100% stem G:C content and found to be a Gaussian (Fig. S2) whose width is governed by the stiffness of the trapped dumbbell (14). The energy landscape was then determined by a constrained nonlinear iterative deconvolution (21) of the extension histogram. An initial guess for the potential, ΔG(0)(x), was constructed by assuming parabolic potential wells located at the histogram maxima, separated by a parabolic barrier whose height and position were determined from the measured, force-dependent rates (as in Fig. 1). The associated extension probability p(0)(x) was then convolved with the PSF and compared to P(x). The difference was subtracted from p(0)(x), constraining the probability to be between 0 and 1, and the process was iterated (15). The solutions, p(n)(x), and associated landscapes, ΔG(n)(x), are shown in Fig. 3, along with the measured P(x), ΔG(x), and residuals, for four different hairpin sequences designed to explore a range of barrier positions, heights, and possible intermediate states. Shown in Fig. 3 are a 20 bp stem with a TS located 18 bp from the base of the stem (Fig. 3A), a 20 bp stem with TS located 6 bp from the base of the stem (Fig. 3B), a 20 bp stem with T:T mismatch located 7 bp from the end of the stem (Fig. 3C), and an unpatterned stem sequence of length 30 bp (Fig. 3D). In all four cases, the deconvolution algorithm generated a stable solution with acceptably small residuals.
The subtle differences seen in P(x) and ΔG(x) were sharpened by the deconvolution procedure into recognizably different landscapes that reflected the underlying sequence and recapitulated the results in Figs. 1 and 2. In Fig. 3A, the barrier is located near the unfolded state, whereas in Fig. 3B it is nearer the folded state. The hairpin in Fig. 3C, which contains a mismatch, shows a clearly-resolved intermediate state, corresponding to partial folding up to the point of mismatch. These measurements go beyond the previous results, however, by revealing details of the well and barrier shapes. For example, the energy minimum for the folded state in Fig. 3A is significantly broader than that in Fig. 3B: the width of this well supplies direct evidence that the nominally folded state for this hairpin actually consists of an ensemble of states with up to ~4 bp unzipped. A similar situation is seen in Fig. 3D, although the slightly steeper well suggests that the fully folded state plays a more dominant role in this mixed ensemble than it does in Fig. 3A. The energy barriers in Figs. 3A and D are clearly different: the barrier in Fig. 3D is wider than in Fig. 3A, indicating a TS that is less well-defined and therefore more susceptible to experimental perturbation, e.g., by mutagenesis or solvent condition changes.
To explore the validity of these measurements, we compared the experimental landscapes with predictions of the model (Fig. 3). We found excellent agreement across the entire landscape for all hairpins studied (within <1 kBT), with two exceptions. At the lowest extensions, corresponding to regions deconvolved from physical compressions of the double helix (which can arise from thermal fluctuations) as well as elongations, the experimental potential is systematically less stiff than the model. This discrepancy may be attributable to an inaccurate description of the confining potential, somewhat arbitrarily taken to be a Morse potential (22). In addition, the barrier for exiting the unfolded state in Fig. 3B rises to the TS more slowly than predicted, lagging by up to 3 kBT at the point of greatest discrepancy. We speculate that this deviation may result from the large number of base pairs that must be formed to reach the TS from the unfolded state, which allows more opportunities for abortive refolding attempts involving misfolded base pairs. In principle, the sequence for this particular hairpin allows for a number of misfolded states containing short, 2−3 bp helices. Any such misfolding, neglected in the model, would tend to increase the probability of extensions near the unfolded state, exactly as observed.
The deconvolution approach described here has known limitations. To obtain adequate statistics, folding must occur sufficiently frequently that large numbers of transitions can be recorded. In the present case, this places a practical limit on the folding rate of ~0.1 s−1, which is faster than some slow folding transitions found in proteins or ribozymes. The numerical stability of any deconvolution process depends on the quality of the input data (both for the record being analyzed and the PSF employed). In practice, only a limited range of frequency information can be recovered by deconvolution, restricting the resolution of the reconstructed landscape, particularly at the shortest length scales (23). Moreover, experimental noise may become amplified by deconvolution, producing artifactual features that further complicate determinations of short-scale behavior (21). The challenges posed by deconvolution, however, may be mitigated by increasing the stiffness of the experimental system, which reduces the smoothing of trajectories (24). Improvements may be achieved by increasing the stiffness of the handles (e.g., by making them shorter or from materials other than dsDNA). Application of the approach described here to peptides or more complex nucleic acid sequences may supply further insights into how energy landscapes guide the folding process.
Supporting Online Material
Materials and Methods
Sample preparation. Hairpin constructs were made as described previously (S1). Briefly, these each consisted of a hairpin sequence connected to a 621-bp digoxigenin-labeled dsDNA handle on the 3' end, and a 1036-bp, biotin-labeled dsDNA handle on the 5' end. The hairpin sequence was separated from each handle by an abasic site (a deoxyribose spacer), inserted to reduce stacking interactions between the handles and the hairpin. These constructs were incubated with 600-nm diameter polystyrene beads labeled with avidin and 730-nm diameter beads labeled with anti-digoxigenin to produce “dumbbells,” as shown in Fig. 1A (S2). Dumbbells were diluted in assay buffer (50 mM MOPS pH 7.5, 200 mM KCl) with 2% oxygen scavenging system (250 mg/mL glucose, 37 mg/mL glucose oxidase, 1.7 mg/mL catalase), and introduced into a flow cell.
Measurement and analysis. All measurements were made at 23 ± 0.5 °C in an optical trap with two trapping beams and two detector beams, as described previously (S3). The position and intensity of the two orthogonally-polarized infrared trap beams were controlled independently by acousto-optic deflectors, while the positions of the two beads were detected independently by light from two orthogonally-polarized red detector beams that was scattered by the beads onto position sensitive diodes. The stiffness of the traps was calibrated using standard techniques, and the position of the beads in the traps was calibrated for each dumbbell (S4). Where not otherwise indicated, measurements were made using a passive force clamp, described elsewhere (S3), which maintains constant force during the motions of the hairpin. Records were measured at constant force for 10−3000 s, depending on the hairpin folding rate. Data were sampled at bandwidths ranging from 1−50 kHz and Bessel-filtered online at the appropriate Nyquist frequency. Data used for deconvolution of the energy landscape were not filtered offline, while the rest of the data were median-filtered with a 1−200 ms window. Folded and unfolded states for hairpins with two-state behavior were partitioned automatically using a software threshold adjusted to the extension midway between the two states.
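The threshold-based partitioning of two-state records can be sketched as follows. This is an illustrative reconstruction rather than the software used here: the function name and the toy record are assumptions, and the median filtering applied beforehand is omitted.

```python
import numpy as np


def dwell_times(extension_nm, threshold_nm, dt_s):
    """Split a record at a threshold; return dwell times (s) below and above it."""
    above = np.asarray(extension_nm) > threshold_nm
    # Sample indices at which the state changes.
    change = np.flatnonzero(np.diff(above.astype(np.int8))) + 1
    bounds = np.concatenate(([0], change, [above.size]))
    durations = np.diff(bounds) * dt_s
    states = above[bounds[:-1]]
    # Discard the first and last dwells, which are truncated by the record edges.
    durations, states = durations[1:-1], states[1:-1]
    return durations[~states], durations[states]


# Toy record sampled at 1 s intervals; threshold midway between 0 and 20 nm.
record = [0] * 3 + [20] * 5 + [0] * 4 + [20] * 2 + [0] * 6
folded_dwells, unfolded_dwells = dwell_times(record, threshold_nm=10.0, dt_s=1.0)
```

Averaging each list of dwell times gives the state lifetimes at that force.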
Hairpin sequences. All hairpins (except one) had a 20-bp stem, and all had a 4-nt polythymidine loop (a tetraloop chosen to minimize any intraloop stacking and structure). The sequences of the hairpins are listed in Table S1 below. The hairpins in the family with varying transition state locations were named “20TSxx/T4,” where “20” indicates the length of the stem, “TSxx” indicates a transition state located putatively “xx” bp from the end of the stem (predicted by the landscape model used in the design process), and “T4” indicates the sequence of the loop. The hairpins with varying barrier heights were named “20Hyy/T4”, where “Hyy” indicates a barrier height predicted by the landscape model to be “yy” kJ/mol. The hairpins with varying mismatch locations were named “20Mzz_nn/T4”, where “Mzz” indicates a mismatch located “zz” base pairs from the base of the stem and “nn” indicates the mismatch base identity. Two hairpins had sequences described previously: the hairpin in the ‘mismatch’ family that had no mismatch was hairpin 20R55/T4 from ref. S1 (where “R55” indicates a random stem sequence with 55% G:C content), while the hairpin with a 30-bp stem shown in Fig. 3D was hairpin 30R50/T4 from ref. S1.
Effects of force on folding landscape. The application of force constrains the unfolded state and partially unfolded intermediates to adopt a low-entropy, extended conformation, justifying the use of a one-dimensional description of the landscape. The externally-applied force tilts the energy landscape because of the work performed by the optical trap on the system. The precise effect of force on the landscape depends on how the force changes during the folding reaction itself, as described elsewhere (S3). For constant force, as in most measurements presented here, the situation is described by Fig. S1: the energy of the unfolded state decreases by FΔx while the energy of the transition state decreases by FΔx‡. We assume that the position of the transition state remains invariant over the small range of force applied here, an approximation valid in the limit that the curvature of the potential barrier is much greater than the change in the force.
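The tilt described above can be written compactly as follows. The zero-force quantities G_0, ΔG_0, and ΔG‡_{f,0} are notation introduced here for illustration, with x measured from the folded state; the last line additionally assumes the Arrhenius form for the lifetimes used in the main text.

```latex
\begin{aligned}
G(x;F) &= G_0(x) - F\,x, \\
\Delta G(F) &= \Delta G_0 - F\,\Delta x, \\
\Delta G^{\ddagger}_{f}(F) &= \Delta G^{\ddagger}_{f,0} - F\,\Delta x^{\ddagger}_{f}, \\
\tau_f(F) &= \tau_0 \exp\!\left[\frac{\Delta G^{\ddagger}_{f}(F)}{k_B T}\right]
  \;\propto\; \exp\!\left(-\frac{F\,\Delta x^{\ddagger}_{f}}{k_B T}\right).
\end{aligned}
```

The second and third lines restate the shifts of the unfolded state and the transition state relative to the folded state, and the fourth recovers the exponential force dependence of the folded-state lifetime.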
Landscape model. The model used to predict the hairpin folding behavior has been described in detail previously (S1). Briefly, this model calculates the free energy required to unzip each base pair using nearest neighbor energies from the program mfold 3.1 (S5), and adds to this the free energy required to stretch the unzipped ssDNA using a worm-like chain approximation (S6) and the work done by the trap, yielding the free energy as a function of the extension of the hairpin and the force applied by the trap, as follows:
A repulsive Morse potential, where ε represents the potential depth and a represents the well width, is used to approximate the resistance of the dsDNA helix to compression (S7), which can occur through thermal fluctuations. Lp is the persistence length and L0 is the contour length of the ssDNA in the unfolded hairpin. This free-energy function is then smoothed by an amount corresponding to the stiffness of the unzipped ssDNA, to incorporate the effect of ssDNA elasticity, resulting in model landscapes of the form shown in Fig. 3. The heights and locations of any energy barriers were determined directly from the modeled landscape. Extension histograms were simulated by calculating the probability distribution function for the extension from the free energy landscape and convolving this with a Gaussian of appropriate width based on the elastic stiffness of the dsDNA handles. Distances between folded and unfolded states were determined from these modeled histograms. The locations of any intermediate states were determined from the positions of local minima in the energy landscape.
We note that this model contains no free fitting parameters. The model employs, as fixed parameters, values for the persistence length and the contour length of ssDNA, and for the extension of ssDNA at given force, that fall in a range consistent with previous measurements.
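For readers unfamiliar with the worm-like chain term, the widely used Marko-Siggia interpolation formula (S6) relates force to extension as sketched below. The parameter values in the example (a 1 nm persistence length and a 26 nm contour length) are placeholders chosen for illustration, not the values used in the model, and the bisection-based inversion is simply one convenient way to obtain the extension at a given force.

```python
KBT_PN_NM = 4.09  # assumed thermal energy near 23 °C, in pN·nm


def wlc_force(extension_nm, contour_nm, persistence_nm, kbt=KBT_PN_NM):
    """Marko-Siggia interpolation: force (pN) holding a worm-like chain at a given extension."""
    z = extension_nm / contour_nm
    if not 0.0 <= z < 1.0:
        raise ValueError("extension must lie in [0, contour length)")
    return (kbt / persistence_nm) * (0.25 / (1.0 - z) ** 2 - 0.25 + z)


def wlc_extension(force_pn, contour_nm, persistence_nm, kbt=KBT_PN_NM):
    """Invert wlc_force by bisection (the force rises monotonically with extension)."""
    lo, hi = 0.0, contour_nm * (1.0 - 1e-9)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if wlc_force(mid, contour_nm, persistence_nm, kbt) < force_pn:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Placeholder parameters: 26 nm contour length, 1 nm persistence length, 12 pN load.
x_ss = wlc_extension(12.0, contour_nm=26.0, persistence_nm=1.0)
```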
Numerical data. Table S2 displays the numerical values of the experimental data and model results displayed in Figs. 1 and 2. Error bars on experimental data indicate the standard error on the mean added in quadrature to the estimated systematic error arising from calibration uncertainties and the known dispersion in bead sizes. Uncertainties in the model reflect the standard deviation of the results obtained from calculations using a range of input parameters consistent with previous work (see above).
Deconvolution procedure. To reconstruct the free energy landscape, we removed by deconvolution the effects of the thermal motion of the beads attached by elastic dsDNA handles to the hairpin (which blur the extension histogram), following the non-linear constrained iterative methods described by Jansson (S8). Given a point-spread function S(x) smoothing the true extension probability function to produce the measured extension probability P(x), we construct an initial solution p(0)(x) and approach the true distribution function iteratively:
where k refers to the index of the iteration. The relaxation function r [p(k)(x)] constrains the solution to remain within the physical boundaries 0 ≤ p(k)(x) ≤ 1, with the amplitude r0 controlling the speed of convergence. We used r0 = 2 with ~300 iterations. To reduce artifactual fluctuations in the deconvolution, P(x) was smoothed in a 1-nm window, as was the final solution p(n)(x).
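A minimal sketch of a Jansson-type constrained iteration is given below. It uses the textbook form of the update, p(k+1) = p(k) + r[p(k)]·{P − S⊗p(k)} with r[p] = r0(1 − 2|p − 1/2|), which is assumed here to correspond to the procedure described. Other assumptions made for the example: the histogram is normalized as per-bin probabilities, the measured histogram itself serves as the initial solution (rather than the parabolic-well guess described in the main text), the 1-nm smoothing is omitted, and the data are synthetic and noise-free.

```python
import numpy as np


def gaussian_psf(sigma_nm, dx_nm):
    """Unit-sum Gaussian kernel on the extension grid, truncated at 4 sigma."""
    half = int(np.ceil(4.0 * sigma_nm / dx_nm))
    offsets = np.arange(-half, half + 1) * dx_nm
    kernel = np.exp(-0.5 * (offsets / sigma_nm) ** 2)
    return kernel / kernel.sum()


def jansson_deconvolve(measured, psf, initial, r0=2.0, n_iter=300):
    """Iteratively sharpen `initial` so that psf (*) p approaches `measured`."""
    p = np.array(initial, dtype=float)
    for _ in range(n_iter):
        residual = measured - np.convolve(p, psf, mode="same")
        # Relaxation vanishes at p = 0 and p = 1, steering p away from the bounds.
        relax = r0 * (1.0 - 2.0 * np.abs(p - 0.5))
        p = np.clip(p + relax * residual, 0.0, 1.0)
    return p


# Synthetic test: two 1 nm wide states, 18 nm apart, blurred by a 2 nm PSF.
x = np.arange(-10.0, 30.25, 0.5)
truth = np.exp(-0.5 * x**2) + np.exp(-0.5 * (x - 18.0) ** 2)
truth /= truth.sum()
psf = gaussian_psf(2.0, 0.5)
measured = np.convolve(truth, psf, mode="same")
recovered = jansson_deconvolve(measured, psf, initial=measured)
```

Because the relaxation is small wherever p is small, the sparsely populated barrier region converges slowly in this form, so the choice of initial solution there matters; with noisy data, the iteration count also trades resolution against noise amplification.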
S(x) was estimated from histograms of the extension of a hairpin with 100% stem G:C content studied previously (hairpin 20R100/T4 from ref. S1). This hairpin is known to remain in the folded state for the forces used in the experiment of Fig. 3 (~11−14 pN). S(x) was determined to be a Gaussian with a characteristic width that depended on the stiffness of the trapped dumbbell (Fig. S2). To increase the resolution of the deconvolved landscape in Fig. 3A-C, the stiffness of the trapped dumbbell was increased by measuring without a force clamp (S3). In these measurements, therefore, the force on the hairpins changed somewhat as the hairpins folded and unfolded. This effect modifies the energy landscape by changing the integral for the mechanical work carried out by the trap in Eq. S1. A correction to the energy landscape was therefore calculated from the known local stiffness of the trap and applied, as described previously (S3).
Fig. S1. Effect of force on the energy landscape. A force F0 that is constant during the folding transition tilts the landscape uniformly, reducing the energy of the unfolded state (U) by F0Δx, and reducing the energy of the transition state by F0Δx‡f, where Δx is the distance between F (folded state) and U, and Δx‡f is the distance between F and the energy barrier.
Fig. S2. A histogram of the extension of a hairpin with 100% G:C content in the stem (sequence shown in inset), which remains fully folded under 11 pN load (solid black line). The data are well fit by a Gaussian (dashed red line), experimentally confirming the Gaussian form of the point spread function (PSF) used for the deconvolution procedure.
Table S1. Sequences of the hairpins measured. The hairpin nomenclature is described in the Materials and Methods. N is the number of hairpin molecules measured.
Table S2. Numerical results from experiment and model. Experimental data are listed above, with model results shown directly below in italics.
S1. M. T. Woodside et al., Proc. Natl. Acad. Sci. USA 103, 6190 (2006).
S2. J. W. Shaevitz, E. A. Abbondanzieri, R. Landick, S. M. Block, Nature 426, 684 (2003).
S3. W. J. Greenleaf, M. T. Woodside, E. A. Abbondanzieri, S. M. Block, Phys. Rev. Lett. 95, 208102 (2005).
S4. K. C. Neuman, S. M. Block, Rev. Sci. Instr. 75, 2787 (2004).
S5. M. Zuker, Nucl. Acids Res. 31, 3406−15 (2003).
S6. J. F. Marko, E. D. Siggia, Macromolecules 28, 209−12 (1995).
S7. M. Peyrard, A. R. Bishop, Phys. Rev. Lett. 62, 2755 (1989).
S8. P. A. Jansson, ed. Deconvolution of Images and Spectra, 2nd ed (Academic Press, New York, 1997).