|Home | About | Journals | Submit | Contact Us | Français|
Single-molecule manipulation studies can provide quantitative information about the physical properties of complex biological molecules without ensemble artifacts obscuring the measurements. We demonstrate computational techniques which aim at more fully utilizing the wealth of information contained in noisy experimental time series. The “noise” comes from multiple sources, e.g. inherent thermal motion, instrument measurement error, etc. The primary focus of this article is a methodology that uses time domain based methods to extract the effective molecular friction from single-molecule pulling data. We studied molecules composed of 8 tandem repeat titin I27 domains, but the modeling approaches have applicability to other single-molecule mechanical studies. The merits and challenges associated with applying such a computational approach to existing single-molecule manipulation data are also discussed.
An understanding of how individual molecules behave mechanically and chemically can be of importance to both nanotechnology and materials applications [1–3]. These studies are also important to understanding the fundamental biophysics of important molecules like proteins and nucleic acids [4–11]. Single-molecule experiments have allowed researches to explore small scale systems with high spatial and temporal resolution and have provided useful mechanical information about individual molecules. Single-molecule experimental techniques are making rapid advances and the time series resulting from these experiments contains a rich amount of physical information.
In this article we demonstrate how time domain based computational modeling techniques utilizing local maximum likelihood methods [12–15] can be used to extract kinetic information from single-molecule time series where a molecule is stretched by an external force. Specifically, we use an AFM tip to mechanically unfold/refold a molecule composed of 8 serially linked I27 domains from human cardiac titin using constant velocity pulling experiments. Although we focus on studying a protein with AFM, the computational methods we present are applicable to other systems, e.g. nucleic acids  or optical tweezer manipulation experiments [7, 16].
The primary interest in this article is in extracting the effective molecular friction induced by a protein molecule as a function of molecular extension. From a biological standpoint, the I27 domain of titin acts as a molecular “shock absorber” . The hydrogen bond network in the beta sandwich secondary structure is believed to help dissipate energy caused by rapid extension of biological materials such as cardiac muscle and to also allow extension of tissue without inducing rupture . The amount of energy damped depends on the effective molecular friction of the molecule. The elastic and frictional properties of molecules can be tuned and this shows great promise in nanotechnology and materials applications where one desires to design synthetic adhesives or spider-silk like materials [2, 17]. Single-molecule techniques provide one means to characterize these types of novel materials, but directly estimating the effective molecular friction from single-molecule data can be challenging due to the multiscale noise sources influencing these types of measurements [12, 18, 19].
Previous works [18–20] have demonstrated that the contribution of the effective friction due to the molecule attached to the AFM cantilever can be decoupled from that associated with the solvent interacting with the cantilever in experiments where the cantilever is oscillated at a constant frequency. Frequency domain techniques which utilize a simple harmonic oscillator model were used to quantify the effective friction due to the molecule. Here we demonstrate how time domain based methods [12, 13] can be used to estimate the effective molecular friction. We borrow many of the ideas laid out in [18, 21], but our techniques do differ in several important aspects which we highlight throughout. The primary motivations for providing an alternative computational method for extracting effective molecular friction from experimental times series are as follows: 1) Data coming from constant loading rate experiments  do not utilize an oscillating cantilever. The time domain based methods we demonstrate can utilize existing time series to estimate an effective friction; 2) In other circumstances, the use of an oscillating external force may complicate the analysis of rupture time distributions in various single-molecule experiments [8, 22]. Possessing computational tools which can estimate rupture time distributions, reaction rate constants, and state-dependent effective friction from the same experimental data are desirable because one can then make better use of the information content contained in experimental data [8, 22]. In addition to estimating the effective friction from experimental data, we also demonstrate how the models we use can quantify other sources of variation in single-molecule experiments. For example, we can quantitatively measure how the tip attachment location influences the measured effective friction.
The remainder of this article is organized as follows: Section II reviews the models, discusses computational details, and presents the experimental setup. Section III presents the results and discussion. Section IV concludes.
The time domain method we utilize was first presented in [12, 13]. We summarize the main features here and in Section III we present a new discussion expanding on our motivation for using this type of modeling framework. Nonlinear stochastic differential equations (SDEs) are fit to data coming from AFM experiments where an external force is added into the system [23, 24]. The global SDE [14, 15] representing the dynamics of a single time series is assumed to be a nonlinear diffusion of the form:
where zt represents the experimental observable value at time t (throughout we use the end-to-end extension of the titin molecules as this observable), Bt represents the standard Brownian motion, μ(·, ·) the time dependent drift function and σ2(·) represents the diffusion coefficient 1. The drift term is time dependent because we are adding an external force. The “instrument noise” (εti) does not allow us to directly observe zt, instead we observe yti = zti + εti where the subscript on the time index is used to stress that our observations are discrete.
In single-molecule systems, the complexity of the atomistic system often cannot be ignored and this causes problems in developing physically based, accurate parametric SDE models [25–28] from a priori considerations. For example, the “force-hump” associated with a titin unfolding intermediate [20, 29] cannot be predicted by simple coarse-grained polymer models. We assume that the global dynamics are completely unknown a priori, so appealing to a standard parametric estimation scheme is problematic. To overcome the difficulty of unknown global drift and diffusion functions. We use local models [14, 15, 28, 30] to fit the coefficients of polynomial SDEs whose functional form is motivated by the overdamped Langevin equation . We should stress that the basic local modeling idea presented here does not require an overdamped Langevin structure. If there is a compelling reason to try other local models structures, they can be entertained. The relevant expressions for the local models used here are:
where kBT represents Boltzmann’s constant times the system temperature, FExt the external force applied to the system, λ(t) is the pulling protocol (common to all experiments/simulations) we desire the observable monitored to follow, kpull is the spring constant associated with the harmonic constraint used to apply the external force, FInt the force due to internal molecular interactions, and θ (A, B, C, D) is the local parameter vector estimated by approximate maximum likelihood estimation . zo is a free parameter used only for estimation purposes. We set zo to the average (temporal) value observed in a given local time series window. The windows were formed by dividing a single global time series into M “time windows” 2. The modeling ideas behind the two-scale realized volatility estimator (TSRV)  were used to approximate the variance of the instrument noise process in each local window.
The information from a TSRV inspired analysis is used in conjunction with likelihood based methods to estimate quantities describing the z dynamics. 3. The fitting criterion used was motivated by the local linear maximum likelihood type method outlined in . The stage location λ(t) was denoised by using the Daubechies (5 vanishing moments) wavelet family to smooth the signal 4. All measurement noise was assumed to be contained in y after this smoothing procedure. The validity of the various assumptions (diffusive noise, local linearity, etc.) were tested in an a posteriori fashion using the probability integral transform based Q-test developed in Ref. .
To obtain a global SDE model, a penalized spline was used to stitch the piecewise polynomial models together . The full numerical details of this spline procedure are outlined in Ref. . Briefly, a sequence of estimated local θs measured along one experimental time series are used to construct the global functions (via smoothing splines ) needed to provide μ and σ. Information about the parameter uncertainty is used to determine a regularized global model from the local θs. The procedure is then repeated for each observed experimental time series. An additional discussion on the local windows is presented in Section III.
The overdamped Langevin equation in Eqn. 3 through 6 implicitly assumes that the diffusion coefficient is related to the effective friction via the Einstein relation: σ2 = kBT/ζ where the effective friction is denoted by ζ. Recall that we do test the validity of the local model assumptions using goodness-of-fit tests . After the global function was fit using our smoothing spline procedure, the resulting spline was subsequently used to compute the effective friction () from the Einstein relation; hats above a variable are used to remind us that the function come from time series estimates. Recall that our time domain based model accounted for the instrument measurement noise, but the effective friction that we compute has contributions coming from both the effective friction induced by the molecule (the quantity of interest) and the friction experienced by solvent bombarding the cantilever (“classical” thermal noise). In order to decouple these sources, we use the approach of Ref. , namely we assume that the friction acting on the system can be thought of as two parallel springs. Under this assumption, we can subtract the effective friction experienced by the cantilever/“protein containing some folded I27 domains” system, denoted by , from that associated with the cantilever/“protein with all I27 domains unfolded” system (Unfold) and use this difference to estimate the effective friction due to the protein containing folded I27 domains, i.e. Folded(z):= (z) − Unfold. We used a common estimate of the scalar Unfold for all computations.
10 μl of protein solution, 50–100 μg/ml containing eight serially linked repeats of the I27 domain of human cardiac titin (Athena ES) was incubated on a gold substrate at room temperature for 20 min. A Multimode AFM with Picoforce option (Veeco Instruments) was used for experimental measurements. Individual protein chains were attached to a silicon nitride cantilever tip with spring constant, kpull = 50 pN/nm. The spring constant was determined using methods appealing to the equipartition theorem [36, 37].
The experimental setup is illustrated in Fig. 1. The attached molecule was stretched to unfold several domains, allowed to relax back nearly to the substrate surface, and held at a constant position to allow the molecule to refold before repeating the cycle. The stretch and relax portions of the cycle were performed at a constant velocity of 50 nm/s, followed by a rest time of 30 sec. Discrete time series were recorded at the frequency of 20kHz.
Sample raw AFM data is shown in Fig. 2. In our experiments we attempted to retain the same molecule for multiple cycles. This was done in an effort to minimize variation due to the tip attachment point and other random factors not typically within our control in AFM pulling experiments. In the results presented, we analyze three batches of data. Each batch contains results from holding the same molecule for seven fold/refold cycles (the different traces shown in Fig. 2 are associated with a batch of cycles obtained with one molecule). In our analysis we only use the second and third peak for modeling because the first peak can contain artifacts from nonspecific binding . We did not attempt to analyze all eight peaks because of the experimental difficulty associated with retaining a single titin molecule for multiple force extension cycles. That is, if we successfully held onto a molecule and observed three unfolding events, we reversed the AFM stage direction in order to minimize the chances of the molecule detaching from the AFM tip as opposed to attempting to unfold all 8 domains.
Figure 3a plots the estimated effective internal force in and the effective friction coming from the titin molecule is shown in Fig. 3a–b. In these plots, 3 different titin molecules were subjected to 7 unfolding/refolding cycles. We refer to the 7 experiments where one molecule is retained as a “batch”; in each batch we only report the unfolding portion of the cycle. The term “Relative Extension” is used on the x-axis because we shifted each extension to start the coordinate system after the first force peak (see Materials and Methods) resulting in a common initial extension for each cycle observed.
Even within each batch there is significant variation and this variation likely has physical relevance in addition to the unavoidable sampling noise associated with a finite discretely sampled data set. Note that the force hump [6, 29] that is believed to correspond to an intermediate is not always detected. It is known that the likelihood of observing the signature of this intermediate decreases as the number of domains unfold  and the strength of this signal depends on unobserved conformation details. We did not attempt to align the force traces from different experiments (we only shifted the extension as discussed above), but as one can see there are clearly different slopes (and curve shapes) in the different force curves. The different slopes can physically be interpreted as a different “effective stiffness coefficients”. Also the presence of a force hump in only some of the force curves demonstrates that there are indeed significant differences in the force responses that alignment would not help resolve. The variation within each batch is large enough in the force curves so as to make the variation induced by the random attachment point negligible. This is not surprising given that several groups have been able to reproduce similar results with this model system . The variation induced by the randomness of the attachment point does seem to be slightly detectable when inspecting the effective internal friction, but further experiments are needed to determine if these differences in the batches are due to the attachment point and the underlying conformational details or to instrument artifacts.
Recall that our primary interest is in the effective friction due to the molecule. The time domain estimate of the friction is very noisy as can readily be seen by looking at the global smoothing splines plotted in Fig. 3 (in Section III C we discuss possible techniques for reducing the variance), but before getting into this technical discussion we prefer to focus on the physical interpretations of this result and compare our result to frequency domain results obtained elsewhere .
To facilitate our discussion we plot results “averaging” over the different unfolding events within each batch in Fig. 4 (see caption for details). The percentage of friction estimated due to the folded polymer construct where the aforementioned percentage is computed according to Folded(z)/Unfold×100+ where is a “shift constant” used to shift the percentage at zero relative extension to zero. This shift constant was used because we used one single Unfold for all data points and it is likely that this value is not appropriate for all experiments due to unavoidable experimental drift; the shift constant is only used to register the curves and facilitate making comparisons (the unshifted data using a common baseline is contained in Fig. 3).
First we discuss similarities with other experiments and theoretical predictions. In  a polymer model quantitatively explaining why a contribution to the effective friction should increase as a function of extension was presented and the authors supported their theoretical predictions with force clamp experiments. A wide degree of variation was observed between different experiment batches when an effective friction was measured ; note that the tip attachment point changes in each batch. We also observe that the effective friction appears to generally increase as a function of extension and that the tip attachment point appears to introduce significant variation. In another study probing the effective friction of a titin molecule containing 5 serially linked I27 domains , the authors found that effective friction tended to decrease as the number of domains unfolded by force increased. Our results in Figs. 3 and and44 also support this finding. A series of I27 domains acts as an effective “shock absorber” and the amount of energy this network can dissipate depends in a nontrivial fashion on the number of folded domains. Recall that the individual I27 domains each consist of a beta sandwich structure possessing a large hydrogen bond network which is believed to absorb/dissipate a significant amount of energy in various situations [1, 2, 29].
Now we discuss some of the differences between our experiments/analysis and previous studies. We studied titin molecules containing 8 repeat I27 domains instead of 5 as was done in the other viscoelastic studies we are comparing to. Also we do not oscillate the cantilever tip in our experiments and appeal to time domain/local maximum likelihood techniques using nonlinear SDEs accounting for state-dependent effective noise whereas the other studies use frequency domain methods that appeal to a linear harmonic oscillator model assuming constant effective noise. With this said, we proceed to present some of the more notable differences in results.
Our computations of the effective friction are noisier than the methods we compare to. This is one of the motivations for presenting the average curve in Fig. 4. After we aligned the curves and computed this average we notice that in each case the effective friction slightly “dips” around the 75% relative extension before it rapidly increases. The large amount of noise and the various approximations employed here are suspect, but the reproducibility of this feature and the known existence of an intermediate suggest that it is possible that the formation of the intermediate temporarily reduces the effective friction. Note that we did subject our models to hypothesis tests  appropriate for nonstationary time series and the models performed adequately. For example, out of 1400 local models (data coming from one batch of experiments) we rejected 6.07% of the local models on par with rejection rate that one would expect under the “null” (assuming the estimated maximum likelihood parameter gives the “true” model). An increased amount of experimental data samples and tests possessing better power properties may help identify/diagnosing potentially poor assumptions in our models using goodness-of-fit tests. The aforementioned additional experimental data can be obtained by various means, e.g. use a higher discrete time sampling rate or using slower pulling speeds. The availability of ‘omnibus” time series tests  is one appealing feature of the time domain methods we present.
In experiments where the cantilever was oscillated , the “dip” was not observed but these authors estimated effective friction curves that appear to contain less uncertainty. It would be interesting to see if the modeling techniques we use here detect a slight dip using the same data sets and/or if the signal to noise ratio in our estimated models is improved in oscillating cantilever experiments. Recall, that in some situations it might not be desirable to introduce an oscillating cantilever if the primary interest is not in the effective molecular friction, but instead in something like the net effective dissociation or unfolding rates . Possessing a technique which can utilize existing data sets to infer multiple physical quantities can help one in better characterizing natural or man-made systems, and our method offers this possibility.
Also, the absolute value of the effective friction we obtained for the second force peak is estimated to be roughly an order of magnitude larger than two studies carried about by other researchers referenced above. This could be in part due to the larger number of repeat I27 domains. Also, both the frequency domain methods and the time domain methods we use are subject to systematic bias. For example we estimate (z) and use this to compute the state-dependent friction. Even if a diffusive process truly generates the data, the measurement noise, finite number of samples and machine drift will make a nontrivial relationship between our estimate and the “truth”, i.e. (z) = σ(z)+b(z)+ε where b(z) is a bias function and ε is a zero mean process meant to account for unavoidable variation due to finite sample sizes. If b(z) > 0 everywhere we would tend to be underestimating the effective friction. This could happen if the instrument noise associated with the photon detector measuring the cantilever deflection “leaked” into the (z) which is only supposed to contain thermal noise sources due to solvent bombardment and molecular friction. If b(z) < 0 everywhere we would tend to overestimate the effective friction. This could happen in our models if we over-smoothed the stage location, λ(t), too much making the system dynamical noise seem smaller than it should be due to the underlying thermal noise. In practice b(z) will likely be a complicated function taking both signs and will be hard to determine from a finite set of samples when laboratory data is analyzed, but higher precision instrumentation, increased sampling rates, and goodness-of-fit tests can help us in approximating this bias in the future. These types of issues are discussed in the next section.
Figure 5 provides a schematic illustrating the basic idea behind our method. Each time series is divided into “M” total blocks (the variable m is used to index the blocks). In the titin cases presented, we selected M so that each block contained about 400 observations uniformly spaced in time. Within each block, there was an average state value associated with the time series, denoted by z0. The local internal force and σ(z) function were estimated using maximum likelihood type methods in this window . Effectively we obtained an estimate of the function (e.g. σ(z0) C) and its derivative (e.g. σ(z)/z|z0 D) at z0. We then used SDE simulations to approximate the covariance associated with the estimated local parameters (assuming a normal error distribution of the estimated parameters). This was carried out for all M blocks, and then a smoothing spline procedure  using all of this information to construct (z) which was our estimate of the unknown “truth” σ(z). We observed that this estimate was noisy; this noise is indirectly reflected in Fig 3(b). A more direct representation of the noise observed in (z) is presented in .
Increasing the rate at which we sample from the AFM device would likely improve this situation. Reducing the pulling speed would also allow us to obtain more data points in a given extension range which again may help our estimates. In addition, we could entertain the possibility of using overlapping blocks; i.e. blocks m = 1 and m = 2 could be constructed to share entries. The local estimation procedure would not change, but accurately accounting for the covariance in the uncertainty estimates used in the smoothing spline procedure would become more computationally intensive. This is one possibility for reducing the variance in our estimates. We could also entertain other methods for parameterizing our local SDEs (i.e. model the effective friction function directly as opposed to the diffusion term). If the problem under consideration requires high accuracy the extra computational load or alternative procedures may be worth pursuing.
Note that we also introduced smoothing into our approximation procedure at several points. We smoothed the stage location λ(t) and assumed the resulting smoothed estimate contained no measurement uncertainty. Different types of smoothing and/or including instrument noise explicitly in this quantity can moderately influence the computations we presented. For example, the absolute value of the effective friction changed by about 20% using a different smoothing method for λ(t) (though the percentage of friction due to the folded protein did not appreciably change because this smoothing influences both and Unfold). We reported the “best” results using the Q-test statistic  to guide our choice. This test is appealing to our situation because it does not require one to assume a stationary time series and it is easy to compute using the maximum likelihood approximation employed in estimating (A, B, C, D) in each local window. We also used spline smoothing in approximating the global estimates of σ(z) and FInt(z). Recall that the method presented utilizes “point function estimates” (e.g. A or C) and “derivatives or linear sensitivities” (B or D) in constructing the global functions . One could also entertain applying standard smoothing procedures to the linear sensitivities directly to assist in understanding physical features. For example, if the “polymer stiffness”  is a complicated function of extension , then one might want to obtain a smoothing spline approximating FInt(z)/z directly. In addition, one may want to consider local models that allow more flexibility than a simple harmonic oscillator or an overdamped Langevin equation. The local models can readily be changed in our setup, and the basic modeling ideas can be applied to other local model structures.
We demonstrated how an SDE modeling method [12, 13] accounting for instrument noise and inherent thermal noise can be applied to estimate the effective friction due to a single molecule in an AFM experiment. The findings were qualitatively in agreement with previous results using different experimental setups and analysis. One appealing feature of the method we presented is that it can be used to estimate effective friction without oscillating the cantilever. This has relevance to constant loading rate experiments [8, 22] which are becoming more popular in single-molecule studies; data generated in constant loading rate experiments can be re-analyzed using our methods and provide additional physical information from existing data. Another attractive feature of our tests is that we can subject our modeling assumptions to goodness-of-fit tests. We also discussed how estimating the derivative (linear sensitivity) of the effective diffusion and/or force could help in identifying state-dependence of these quantities. State-dependent noise is common in complex systems where we only observe a crude system observable like an end-to-end extension of the molecule [12, 13].
These types of computational tools can also help in identifying “outliers” in single-molecule time series. Atypical measurements of the effective friction as a function of extension can sometimes be attributed to unobserved collective conformational coordinates modulating the effective dynamics . Work of this type may help in identifying signatures associated with refolding events . Techniques for quantitatively characterizing the response of single-molecule systems can also assist our understanding of natural and man-made biomaterials .
CPC thanks NSF DMS 0240058, NSF ACI-0325081, and the computer support provided by NSF CNS-0421109 & a partnership between Rice AMD and Cray. The work of NCH and CPC was supported in part by a training fellowship from the Nanobiology Training Program of the W. M. Keck Center for Interdisciplinary Bioscience Training of the Gulf Coast Consortia (NIH T90DK70121-04 and 5 R90 DK71504-04). CHK thanks NSF DMR-0505814 and Welch Foundation C-1632 for support. DDC thanks NSF DMS-0505584.
1The thermal noise in the “internal system” has contributions coming from internal molecular fluctuations as well as solvent bombardment on the molecule and the cantilever tip. Methods for decoupling these noise sources are currently under investigation.
2The width of the time series window was determined by the length of the force trace time series, but each local model consisted of roughly 300 observations and each unfolding event was associated with 20-40 sets of local SDE model parameters.
3An analysis of the autocorrelation of the temporal differences (e.g. ) of the time series suggested that the assumption of white measurement noise was reasonable and the goodness-of-fit tests employed also suggested the white noise proxy was statistically acceptable for our observed time series (results available upon request).
4We used a “level six” wavelet approximation for smoothing the stage, wavelet coefficient thresholding could also be entertained.