Elucidating the mechanisms by which proteins and nucleic acids fold into three-dimensional structures is key to developing insights into biomolecular function (1), improving predictive models (2, 3), and understanding the basis of diseases linked to misfolding (4). For over two decades, free energy landscape formalisms have provided the fundamental conceptual framework for describing folding (5). Numerous experimental and theoretical studies have probed specific features of folding landscapes, including the properties of transition states (6), intermediate states (7), and the ruggedness of the energy surface (8). Experiments, however, have characterized only limited aspects of the folding landscape, such as the locations and heights of energy barriers, and how these barriers change when perturbed by solvent substitutions, temperature jumps, substrate changes, or mutations (9). Direct measurements of the shape of an energy landscape at all points along the reaction coordinate have not been feasible. Here, we show how the full energy landscape for the formation of a nucleic acid hairpin can be derived from sufficiently high-resolution trajectories of single-molecule folding.
Single nucleic acid hairpins subjected to mechanical loads provide a powerful model system for investigating energy landscapes and understanding the effects of primary and secondary structure on folding (10–13). The molecular end-to-end extension is recorded during the folding transition, supplying a natural reaction coordinate that can be related directly to the number of bases paired in the hairpin stem. Previous work has characterized specific aspects of the folding landscape. In particular, short hairpins tend to fold as simple, two-state systems (10, 13), indicative of a single transition energy barrier. Conventional analysis of single-molecule records supplies the free energy difference between the folded and unfolded states, as well as the height and location of the barrier (10). For hairpins with random (unpatterned) stem sequences, the barrier is typically located close to the unfolded state, with a height controlled largely by the size of the loop (13). However, finer details of the folding landscape have been heretofore inaccessible, due to limited spatial and temporal resolution as well as instrumental baseline drift. Using a high-bandwidth, passive force clamp with an ultra-stable dumbbell assay (14), we have now been able to reconstruct the shape of the landscape.
Sets of DNA hairpins were synthesized in which the heights and locations of energy barriers were systematically varied, as well as the numbers and locations of any folding intermediates. Sequences were designed based on a model of the sequence-dependent energy landscape derived from the thermodynamic and mechanical properties of nucleic acids (13, 15). Both ends of the hairpins were attached to long handles of dsDNA (13) bound specifically to polystyrene beads held in a dumbbell configuration by two independently-controlled optical traps (). A constant force, $F$, was applied with a force clamp (14), and high-resolution trajectories of the end-to-end extension (~0.1 nm/√Hz) were recorded for a range of forces. The extensions of folded, unfolded, and any intermediate states were measured directly from these records. The locations and heights of energy barriers between these states were computed from the force-dependence of the state lifetimes (10, 13). These measurements of specific points on the landscape were then taken as benchmarks for an experimental determination of the free energy at every point along the reaction coordinate, deconvolving the measured probability distribution of hairpin extension to correct for blurring effects arising from thermal motions associated with the beads and the DNA tether.
A typical record of extension under load () shows two-state folding behavior: two nearly Gaussian peaks in the extension histogram correspond to the folded and unfolded states. Here, $F \approx F_{1/2}$, the load at which the hairpin spends equal time in each state. The lifetimes of the folded ($\tau_f$) and unfolded ($\tau_u$) states depend exponentially on $F$ according to $\tau_f(F) \propto \exp(-F \Delta x_f^{\ddagger} / k_B T)$, where $\Delta x_f^{\ddagger}$ is the distance to the barrier from the folded state and $k_B T$ is the thermal energy (16); an analogous expression holds for $\tau_u(F)$. Previously, the transition state (TS) for folding a hairpin with an unpatterned stem sequence was found to involve the formation of 1–2 base pairs adjacent to the loop (13), resulting in an energy barrier near the unfolded state. In contrast, the barrier for the hairpin in lies much closer to the folded ($\Delta x_f^{\ddagger}$ = 5.4 ± 0.5 nm) than to the unfolded state ($\Delta x_u^{\ddagger}$ = 13 ± 1 nm), implying that the TS requires the formation of ~15 base pairs. This difference in behavior is due to the particular sequence selected for the hairpin: a contiguous block of strong G:C base pairs placed near the base of an A:T-rich stem moves the barrier near to the folded state (, inset).
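Because the lifetimes depend exponentially on force, the distance to the barrier can be read off as the slope of ln τ plotted against $F$. The following is a minimal sketch of such a fit, not the analysis code used for the measurements: the function name, the synthetic lifetimes, the force values, and the thermal energy of 4.11 pN·nm (which assumes a temperature near 25 °C) are all illustrative assumptions.

```python
import math

KBT_PN_NM = 4.11  # assumed thermal energy near 25 °C, in pN·nm (not from the text)


def barrier_distance_nm(forces_pn, lifetimes_s, kbt=KBT_PN_NM):
    """Estimate |Δx‡| from a least-squares line through ln(τ) versus F.

    For τ(F) ∝ exp(-F Δx‡ / kBT), the slope of ln τ against F is -Δx‡ / kBT.
    The absolute value is returned so the same helper can be used for the
    folded state (negative slope) and the unfolded state (positive slope).
    """
    n = len(forces_pn)
    log_tau = [math.log(t) for t in lifetimes_s]
    mean_f = sum(forces_pn) / n
    mean_y = sum(log_tau) / n
    sxx = sum((f - mean_f) ** 2 for f in forces_pn)
    sxy = sum((f - mean_f) * (y - mean_y) for f, y in zip(forces_pn, log_tau))
    slope = sxy / sxx
    return abs(slope) * kbt


# Synthetic folded-state lifetimes generated with Δx‡ = 5.4 nm (hypothetical data;
# the prefactor 1e6 is arbitrary and does not affect the fitted slope).
forces = [12.0, 12.5, 13.0, 13.5, 14.0]
tau_f = [1e6 * math.exp(-f * 5.4 / KBT_PN_NM) for f in forces]
print(round(barrier_distance_nm(forces, tau_f), 2))
```

With real records the lifetimes carry statistical error, so a weighted fit with uncertainty estimates would be preferable to this unweighted line.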
We created a family of hairpins in which the TS position was systematically manipulated by moving the block of G:C base pairs to various locations within the stem (Table S1). Determining the barrier location for each hairpin, we found that the TS moved in concert with the G:C block, always located at the edge of the block nearest the loop (; Table S2). The sum of $\Delta x_f^{\ddagger}$ and $\Delta x_u^{\ddagger}$ agreed well with the distance between folded and unfolded states ($\Delta x$), as expected for a pure two-state system. Measurements were in excellent agreement with landscape model predictions ().
We also created a family of hairpins where the barrier position was fixed at the center of the stem using a G:C block, but the barrier height was altered by changing the overall stem G:C content (Table S1). State lifetimes measured at $F_{1/2}$ for each hairpin (17) varied by a factor of ~600 over the entire family (; Table S2). Assuming Arrhenius behavior for the lifetimes, $\tau_{1/2} = \tau_0 \exp(\Delta G^{\ddagger}_{1/2} / k_B T)$, where $\Delta G^{\ddagger}_{1/2}$ is the barrier height at $F_{1/2}$ and $\tau_0^{-1}$ the attempt rate at zero force, these results show that the barrier height changed by 6.4 ± 0.7 $k_B T$, matching the variation predicted by the model, 6.9 ± 0.2 $k_B T$. Taken together, the results of confirm the remarkable level of control afforded by this system over the folding landscape: the TS can be placed at will along the reaction coordinate and its energy adjusted over a wide range simply by manipulating the hairpin sequence.
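As a rough consistency check on the quoted barrier change, and assuming the prefactor $\tau_0$ is the same for every hairpin in the family, the ratio of lifetimes for two hairpins converts directly into a difference in barrier heights. The labels $a$ and $b$ below are illustrative and simply denote the longest-lived and shortest-lived members of the family:

```latex
\frac{\tau_{1/2}^{(a)}}{\tau_{1/2}^{(b)}}
  = \exp\!\left(
      \frac{\Delta G^{\ddagger\,(a)}_{1/2} - \Delta G^{\ddagger\,(b)}_{1/2}}{k_B T}
    \right)
\quad\Longrightarrow\quad
\Delta\Delta G^{\ddagger}_{1/2}
  = k_B T \,\ln\!\left(\frac{\tau_{1/2}^{(a)}}{\tau_{1/2}^{(b)}}\right)
  \approx k_B T \,\ln(600) \approx 6.4\, k_B T
```

A lifetime ratio of ~600 therefore corresponds to the ~6.4 $k_B T$ change in barrier height reported above.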
Extension records such as those in and (10, 12, 13) have traditionally been interpreted in terms of two-state folding over a single energy barrier. However, not all hairpin sequences exhibit strict, two-state folding behavior. For example, it was recently reported that the nominal “folded state” may, in fact, consist of an ensemble of states comprising the folded state plus a series of frayed states with one or more base pairs unzipped (13). Moreover, short-lived intermediate states may be present that are unobservable at the available temporal resolution (18). Simple two-state analysis ignores details of the trajectory between folded and unfolded states because the motion is taken to be instantaneous, and properties of the energy landscape are inferred only from characteristics of the two states. To induce the hairpin to spend more time between folded and unfolded states, and to observe intermediate properties during folding more clearly, we manipulated the sequence to produce a local potential well between the folded and unfolded states by inserting a single T:T mismatch at various positions along the hairpin stem.
When the mismatch was placed at the seventh base pair from the base of the stem, extension records at $F \approx F_{1/2}$ revealed a shoulder on the histogram peak at low extension (nominally, the folded state), indicative of a third peak representing an intermediate state (). Such a shoulder was observed systematically in all records, but associated only with the low-extension peak. The existence of an intermediate state may also be inferred from the rapid fluctuation of ~5 nm amplitude recorded at low extensions (). Similar results were obtained by introducing a G:A or a G:T mismatch, rather than T:T. Repeating these measurements for an entire family of hairpins with mismatches located 4–16 bp from the base of the stem (Table S1), we consistently observed the signature of an intermediate state, whose distance from the folded and unfolded states depended on the location of the mismatch (; Table S2). Interpreting the hairpin extension in terms of the number of base pairs formed, the intermediate states correspond to hairpins partially folded up to the point of mismatch.
These results demonstrate the precision with which certain features of the folding landscape can be determined, but they define only a few key points on the energy landscape. Notably, they do not address more general features of the landscape, such as the widths and curvatures of the potential wells or barriers, which are known to affect folding (19). By further analyzing the folding trajectories, however, the entire landscape along the reaction coordinate can be reconstituted. The free energy at a given extension, $\Delta G(x)$, is related to the probability density, $P(x)$, through $\Delta G(x) = -k_B T \ln[P(x)]$ (20). Although conceptually straightforward, this method of determining $\Delta G(x)$ requires accurate measurements of $P(x)$ in the region between the states, where the hairpin spends little time (~100–300 μs, here). Hundreds to thousands of transitions must therefore be sampled at high bandwidth, necessitating exceptional instrumental stability. A second complication is that the measured extension represents that of a hairpin attached to dsDNA handles and beads, rather than an isolated hairpin. The thermal and mechanical properties of the trapped dumbbell smooth and dampen the apparent motions of the hairpin (13). The underlying energy landscape may be recovered from $P(x)$, however, by a deconvolution process.
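The relation between free energy and probability density can be illustrated with a short sketch that bins extension samples and takes the negative logarithm of the normalized counts. This is only an illustration under stated assumptions: the function name, the bin width, and the toy samples are hypothetical, energies are reported in units of $k_B T$ relative to the deepest well, and no correction for blurring by the handles and beads is applied.

```python
import math


def free_energy_profile_kbt(extensions_nm, bin_width_nm):
    """Histogram extension samples and return (bin_centers, ΔG in units of kBT).

    Implements ΔG(x) = -ln P(x), shifted so that the deepest well sits at zero.
    Bins that received no samples are omitted because ln(0) is undefined.
    """
    x_min = min(extensions_nm)
    counts = {}
    for x in extensions_nm:
        index = int((x - x_min) // bin_width_nm)
        counts[index] = counts.get(index, 0) + 1

    total = len(extensions_nm)
    indices = sorted(counts)
    energies = [-math.log(counts[i] / total) for i in indices]
    offset = min(energies)
    centers = [x_min + (i + 0.5) * bin_width_nm for i in indices]
    return centers, [g - offset for g in energies]


# Toy record (hypothetical): two equally populated states and a rarely visited
# region between them that is four times less likely.
samples = [0.0] * 400 + [5.0] * 100 + [10.0] * 400
centers, delta_g = free_energy_profile_kbt(samples, bin_width_nm=1.0)
for c, g in zip(centers, delta_g):
    print(f"x = {c:4.1f} nm   ΔG = {g:.2f} kBT")
```

The sparsely populated bins between the states dominate the statistical error in such a profile, which is why many transitions must be recorded.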
To reconstruct the full energy landscape, we measured the folding trajectory of single hairpins at $F \approx F_{1/2}$ at high bandwidth (50 kHz) for 5–15 min and created a histogram of the extension, $P(x)$. Instrumental drift was typically ≤1 nm. The point spread function (PSF) for the deconvolution, $S(x)$, was estimated from extension histograms of the folded state for a hairpin with 100% stem G:C content and found to be a Gaussian (Fig. S2) whose width is governed by the stiffness of the trapped dumbbell (14). The energy landscape was then determined by a constrained nonlinear iterative deconvolution (21) of the extension histogram. An initial guess for the potential, $\Delta G^{(0)}(x)$, was constructed by assuming parabolic potential wells located at the histogram maxima, separated by a parabolic barrier whose height and position were determined from the measured, force-dependent rates (as in ). The associated extension probability $p^{(0)}(x)$ was then convolved with the PSF and compared to $P(x)$. The difference was subtracted from $p^{(0)}(x)$, constraining the probability to be between 0 and 1, and the process was iterated (15). The solutions, $p^{(n)}(x)$, and associated landscapes, $\Delta G^{(n)}(x)$, are shown in , along with the measured $P(x)$, $\Delta G(x)$, and residuals, for four different hairpin sequences designed to explore a range of barrier positions, heights, and possible intermediate states. Shown in are a 20 bp stem with a TS located 18 bp from the base of the stem (), a 20 bp stem with TS located 6 bp from the base of the stem (), a 20 bp stem with T:T mismatch located 7 bp from the end of the stem (), and an unpatterned stem sequence of length 30 bp (). In all four cases, the deconvolution algorithm generated a stable solution with acceptably small residuals.
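The iterate-and-correct loop described above can be sketched in a few lines. This is a simplified residual-correction scheme with clipping to the interval [0, 1], written only to illustrate the idea; it is not the constrained nonlinear algorithm of (21), and it starts from the blurred histogram rather than from a model of parabolic wells and barrier. All function names, the grid, the PSF width, the gain, and the toy distribution are assumptions introduced for illustration.

```python
import math


def gaussian(n, center, sigma):
    """Normalized discrete Gaussian sampled on grid points 0..n-1."""
    values = [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n)]
    total = sum(values)
    return [v / total for v in values]


def blur(signal, psf):
    """Same-length convolution with a symmetric, odd-length PSF (zero-padded edges)."""
    half = len(psf) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(psf):
            k = i + j - half
            if 0 <= k < n:
                acc += signal[k] * weight
        out.append(acc)
    return out


def rms_residual(measured, estimate, psf):
    """Root-mean-square difference between the data and the re-blurred estimate."""
    diffs = [m - b for m, b in zip(measured, blur(estimate, psf))]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def deconvolve(measured, psf, iterations=200, gain=1.0):
    """Repeatedly correct the estimate by the residual, clipping it to [0, 1]."""
    estimate = list(measured)  # simplified starting guess: the blurred histogram
    for _ in range(iterations):
        blurred = blur(estimate, psf)
        # Adding (measured - blurred) is the same as subtracting (blurred - measured).
        estimate = [
            min(1.0, max(0.0, e + gain * (m - b)))
            for e, m, b in zip(estimate, measured, blurred)
        ]
    return estimate


# Toy example (hypothetical): two narrow, equally populated states blurred by a
# wider Gaussian PSF, then partially sharpened again by the iteration.
n = 60
true_p = [0.5 * a + 0.5 * b for a, b in zip(gaussian(n, 20, 1.5), gaussian(n, 40, 1.5))]
psf = gaussian(25, 12, 3.0)
measured = blur(true_p, psf)
recovered = deconvolve(measured, psf)

print("rms residual before:", rms_residual(measured, measured, psf))
print("rms residual after: ", rms_residual(measured, recovered, psf))
```

On noisy data a scheme like this amplifies noise as the iteration count grows, so the number of iterations and the gain act as regularization choices that would need to be validated against the residuals.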
The subtle differences seen in P(x) and ΔG(x) were sharpened by the deconvolution procedure into recognizably different landscapes that reflected the underlying sequence and recapitulated the results in and . In , the barrier is located near the unfolded state, whereas in it is nearer the folded state. The hairpin in , which contains a mismatch, shows a clearly-resolved intermediate state, corresponding to partial folding up to the point of mismatch. These measurements go beyond the previous results, however, by revealing details of the well and barrier shapes. For example, the energy minimum for the folded state in is significantly broader than that in : the width of this well supplies direct evidence that the nominally folded state for this hairpin actually consists of an ensemble of states with up to ~4 bp unzipped. A similar situation is seen in , although the slightly steeper well suggests that the fully folded state plays a more dominant role in this mixed ensemble than it does in . The energy barriers in are clearly different: the barrier in is wider than in , indicating a TS that is less well-defined and therefore more susceptible to experimental perturbation, e.g., by mutagenesis or solvent condition changes.
To explore the validity of these measurements, we compared the experimental landscapes with predictions of the model (). We found excellent agreement across the entire landscape for all hairpins studied (within <1 $k_B T$), with two exceptions. At the lowest extensions, corresponding to regions deconvolved from physical compressions of the double helix (which can arise from thermal fluctuations) as well as elongations, the experimental potential is systematically less stiff than the model. This discrepancy may be attributable to an inaccurate description of the confining potential, somewhat arbitrarily taken to be a Morse potential (22). In addition, the barrier for exiting the unfolded state in rises to the TS more slowly than predicted, lagging by up to 3 $k_B T$ at the point of greatest discrepancy. We speculate that this deviation may result from the large number of base pairs that must be formed to reach the TS from the unfolded state, which allows more opportunities for abortive refolding attempts involving misfolded base pairs. In principle, the sequence for this particular hairpin allows for a number of misfolded states containing short, 2–3 bp helices. Any such misfolding, neglected in the model, would tend to increase the probability of extensions near the unfolded state, exactly as observed.
The deconvolution approach described here has known limitations. To obtain adequate statistics, folding must occur sufficiently frequently that large numbers of transitions can be recorded. In the present case, this places a practical lower limit of ~0.1 s$^{-1}$ on the folding rate, which is faster than some slow folding transitions found in proteins or ribozymes. The numerical stability of any deconvolution process depends on the quality of the input data (both for the record being analyzed and for the PSF employed). In practice, only a limited range of frequency information can be recovered by deconvolution, restricting the resolution of the reconstructed landscape, particularly at the shortest length scales (23). Moreover, experimental noise may become amplified by deconvolution, producing artifactual features that further complicate determinations of short-scale behavior (21). The challenges posed by deconvolution, however, may be mitigated by increasing the stiffness of the experimental system, which reduces the smoothing of trajectories (24). Improvements may be achieved by increasing the stiffness of the handles (e.g., by making them shorter or by using materials other than dsDNA). Application of the approach described here to peptides or more complex nucleic acid sequences may supply further insights into how energy landscapes guide the folding process.