|Home | About | Journals | Submit | Contact Us | Français|
To explain the observed dynamics in equilibrium single-molecule measurements of biomolecules, the experimental observable is often chosen as a putative reaction coordinate along which kinetic behavior is presumed to be governed by diffusive dynamics. Here, we invoke the splitting probability as a test of the suitability of such a proposed reaction coordinate. Comparison of the observed splitting probability with that computed from the kinetic model provides a simple test to reject poor reaction coordinates. We demonstrate this test for a force spectroscopy measurement of a DNA hairpin.
A variety of new experimental techniques have made it possible to monitor the conformational fluctuations of single biological macromolecules under both equilibrium and nonequilibrium conditions. These experiments aim to probe the statistical dynamics and conformational substates relevant to folding and function. In a typical experiment, such as observation of the resonant energy transfer efficiency between two fluorophores incorporated into an RNA molecule , fluctuations of a spectroscopic observable in the absence of an external field are monitored. Other experiments allow the effect of an external biasing potential on the dynamics to be observed, as in an optical trap [2–4].
To describe the observed dynamics of the system, it is tempting to identify the observable with a reaction coordinate and construct a model in which the dynamics evolves by a diffusion process in an effective potential, such as by overdamped Langevin (also called “Brownian”) dynamics ,
Here, x(t) is the time-dependent motion along the resolved coordinate, D(x) is the diffusion constant (often assumed to be a constant independent of x), β (kBT )–1 is the inverse temperature, F(x) –kBT ln π(x) is the potential of mean force (PMF) defined in terms of the observed equilibrium probability density π(x), and R(t) is a Gaussian process with zero mean satisfying R(t)R(t′) = δ(t – t′).
Many physical systems such as biomolecules exhibit strong metastabilities in the conformational degrees of freedom, resulting in the presence of two or more discrete conformational states in which the system remains for a for long time before transitioning to another metastable state . While it is often easy to find an observable x that is a suitable order parameter that allows these metastable states to be discriminated to some degree, it is generally difficult to find a good reaction coordinate so that dynamics along the resolved coordinate are well-described by Eq. 1.
For data collected in a given single-molecule experiment, how can we determine whether the resolved coordinate provides a good reaction coordinate? Recent work on tests of reaction coordinate suitability in computer simulations has focused on the calculation of the committor or splitting probabilities, a concept dating back to Onsager . This quantity, now extensively used in simulation studies of protein folding , represents the probability that a trajectory first encounters one absorbing boundary placed along the reaction coordinate before another, given an initial microscopic state of the system. For suitable choices of reaction coordinate, the distribution of committor probabilities along an equilibrium ensemble of configurations restricted to a given value of the reaction coordinate will be closely grouped about a characteristic value [8–13]; indeed, ideal reaction coordinates organize committor isosurfaces in an ordered fashion along the reaction coordinate .
Unfortunately, tests based on evaluating distributions of committor values along cuts of the putative reaction coordinate are impossible to apply in a physical experiment, since there is no way to prepare the system in precisely the same microscopic configuration to probe the statistics of committor probabilities. Instead, we propose a simple alternative that is readily computable from observed equilibrium trajectories of the resolved coordinate: Comparison of the average committor along each value of the reaction coordinate evaluated by Eq. 1 with the empirical average committor from the observed trajectory.
Consider the placement of absorbing boundaries at a and b near the periphery of the observed range of the resolved coordinate x. For a diffusion process in one dimension governed by Eq. 1, the probability of first encountering a before b starting from x [a, b] can be shown to be [5, 14, 15],
The PMF along the resolved coordinate x, F(x), can be estimated from a single-molecule trajectory of sufficient length [4, 16] or from multiple trajectories under different equilibrium  or nonequilibrium [18, 19] conditions. The diffusion profile D(x) can be estimated in a number of ways (such as the Bayesian scheme of Best and Hummer that allows simultaneous computation of both PMF and diffusion constant ), though it is commonly assumed to be constant, in which case it cancels from both numerator and denominator.
An empirical estimate of the splitting probability A(x) can also be computed directly from an observed equilibrium trajectory x(t), (also recently noted in Ref. 20]),
where we have defined the hitting function cA(t) in terms of x(t) as,
where auxiliary functions τA(t) and τB(t) are defined as,
The hitting function cA(t) simply keeps track of whether x(t) will hit boundary a before b immediately following time t, and assumes the value of unity if so, and zero otherwise. In practice, the delta function δ(y – x(t)) is replaced by some kernel function of finite width, such as a histogram bin. An estimate of A(x) from multiple equilibrium trajectories can be produced by averaging the trajectories weighted by their lengths.
Our proposed test is simple: By comparing the splitting probability estimated from the PMF, pA(x), with the empirical estimate of the splitting probability from the trajectory, A(x), we can judge whether these quantities are obviously discrepant over the range x [a, b], which would indicate that x is a poor reaction coordinate. Note that this test is necessary, but not sufficient, for x to be a good reaction coordinate; agreement does not mean that the putative reaction coordinate is a true reaction coordinate. Nevertheless, the test may be sufficiently exacting to reject poor choices of reaction coordinate that are not immediately obvious by eye yet fail this comparison.
When the observed coordinate is determined to be a poor reaction coordinate, the consequences of assuming it to be an adequate reaction coordinate depend on the precise nature of the information extracted from the single-molecule data. The consequences could be as simple as underestimating the rate constant for a two-state process or as subtle as inferring an erroneous mechanism for more complex processes. The most obvious consequence is mistaking the location of the transition state—the point where the splitting probability pA = 0.5—to be displaced from the free energy barrier in the potential of mean force. For systems like DNA hairpins and proteins, this can have consequences for the interpretation of how “brittle” or “compliant” the conformational states are perceived to be. Notably, similar tests have been found to be useful in validating putative reaction coordinate choices in computer simulations, despite the ability to inspect the atomic coordinates directly [21, 22].
As an illustrative example, we consider the two-dimensional model system previously studied by Rhee and Pande ,
pictured here in the upper-left panel of Fig. 1. Two stable states are present, located roughly at (x, y)-coordinates (4, 1) and (1, 4). At kBT = 5, the PMFs along both x and y clearly show two distinct wells separated by a barrier, and yet these coordinates are expected to be poor reaction coordinates individually; the coordinate q = (x – y)/√2 (Fig. 1, upper-left panel, red line), however, which connects the two stable basins more directly, is known to be a good reaction coordinate at this temperature .
A Brownian dynamics trajectory of 106 steps was generated using the discretization of Eq. 1 by Ermak and Yeh [23, 24], with a diffusion constant of D = 1 and timestep Δt = 0.1. This trajectory was projected onto either poor choices of reaction coordinate x and y, or good reaction coordinate q (Supplementary Fig. 1). For each projection, the potential of mean force was estimated from an empirical histogram, e.g. F(x) ≈ –kBT ln p(x) for the projection onto x, using 100 equally-sized bins. The PMF-derived splitting probability pA(x) was computed from F(x) using Eq. 2, and the empirical splitting probability A(x) according to Eq. 3. To judge whether disagreement between these estimates was statistically meaningful, the statistical uncertainty in the empirical A(x) was estimated using by time-correlation analysis (see Supplementary Information).
The results of this comparison assuming a uniform diffusion constant are shown in Fig. 2. The poor suitability of x and y as reaction coordinates is easily seen by the large discrepancy between the the splitting probability pA computed from the PMF (dashed line) and the empirical splitting probability A estimated from the trajectories (solid line). However, the coordinate q = (x–y)/√2, previously identified by Rhee and Pande as being well-aligned with the true reaction coordinate at this temperature by sophisticated means not available to single-molecule experiments , agrees to within statistical error (shaded region).
To demonstrate the utility of our proposed splitting probability test in a real laboratory measurement, we performed the same analysis on a single-molecule trajectory of a DNA hairpin in a double optical trap, previously reported by Woodside et al. . The hairpin, referred to as 30R50/T4 due to the content of a 30 bp stem-forming sequence, is attached by means of dsDNA handles to two polystyrene beads held in a passive all-optical constant-force clamp  at an external force that encourages hopping among closed and open conformations over the course of the experiment. Bead displacements in the trap were recorded with a sampling frequency of 25 kHz , and the bead-to-bead extension trajectory was analyzed here.
Fig. 3 shows the observed trajectory of the molecular extension coordinate and corresponding splitting probability analysis for a uniform diffusion constant. From this analysis, it is evident there is poor agreement between pA(x) estimated from the PMF and the empirical A(x) estimated from the trajectory in the region of extensions between 535 and 545 nm. This suggests that, at this external force, dynamics would be poorly-described by Brownian dynamics along the total molecular extension coordinate using Eq. 1 and a uniform diffusion constant.
Recently, it has been suggested that non-uniformity of the diffusion constant along the resolved coordinate may have important ramifications for single-molecule biophysical experiments . Could strong position-dependence of the diffusion constant D(x) may be responsible for the observed discrepancy in in Fig. 3? To judge whether non-uniform diffusion significantly impacted our test of reaction coordinate suitability, we used the Bayesian inference scheme proposed by Best and Hummer  to simultaneously compute position-dependent diffusion constant D(x) and potential of mean force F(x) for the systems considered here (Supplementary Figs. 2 and 3). Notably, the diffusion constant varies markedly with the bead-to-bead extension (Fig. 4, left), and the agreement of the PMF-derived pA and empirical A (Fig. 4, right) improves substantially. By contrast, repeating the reaction coordinate test for the 2D model system allowing for a position-dependent diffusion constant reveals only relatively minor variations in the estimated diffusion constant that result in no substantial change in which reaction coordinates are rejected by the test (Supplementary Fig. 2). Taken together, these data suggest a significant role for position-dependent diffusion in the DNA hairpin system under force, in agreement with the theoretical findings of Best and Hummer .
We note that the reaction coordinate test presented here only allows us to test a condition that is necessary, but not sufficient, for Brownian dynamics to appropriately describe the observed dynamics on a one-dimensional landscape determined by the PMF. This does not rule out the possibility of pathological cases where poor reaction coordinates go unnoticed because the average splitting probability at a particular value of the resolved coordinate matches the PMF-derived model, but the splitting probability distribution is not tightly peaked about its average value. Additionally, if multiple reactive channels exist that are otherwise indistinguishable by this test, differences between the channels will not be resolvable.
Despite this, our test was able to discern good from poor choices of reaction coordinate in a model system, and reject the extension coordinate as a good choice of coordinate for a DNA hairpin unless a strongly position-dependent diffusion constant is permitted. Even then, there are statistically significant discrepancies between the observed splitting probability and the PMF-derived splitting probability that indicate this reaction coordinate choice is not ideal. We note that the presence of ~1 kb dsDNA handles tethering the DNA hairpin to the laser-trapped polystyrene beads is one potential source of the incomplete alignment of the extension coordinate with the reaction coordinate for hairpin unzipping. Shorter dsDNA handles have recently been suggested as a way to improve the signal-to-noise ratio , and may also improve the reaction coordinate quality. For proteins, techniques that allow the attachment of tethers at specific attachment points can be exploited to probe for improved reaction coordinate should the experimenter find that the current pulling coordinate under study is unsuitably poor . Finally, we note that though this test is able to test the suitability of the extension coordinate for a polymer under force, we cannot determine from the present analysis whether a good reaction coordinate in the presence of external force would also be a good reaction coordinate in the absence of force, or even under different biasing forces; this concern is still the subject of active study [27, 28].
The authors thank Michael Woodside (University of Alberta and National Institute for Nanotechnology, NRC), Phillip Elms and David Chandler (U Berkeley), Gerhard Hummer and Attila Szabo (NIH), Steven Block and Imran Haque (Stanford University), Felix Ritort (University of Barcelona), and the anonymous referees for their helpful feedback on this work. The authors are grateful to Michael Woodside and Steven M. Block (Stanford University) for kindly providing original single-molecule data. JDC acknowledges support through an NSF grant for Cyberinfrastructure (NSF CHE-0535616) and a California Institute of Quantitative Biosciences (QB3) Distinguished Postdoctoral Fellowship. VSP acknowledges support from NIH R01-GM062868, NSF-DMS-0900700, NSF-MCB-0954714, and NSF EF-0623664. Matlab code implementing the analysis procedure described here can be obtained from https://simtk.org/home/splitting.