|Home | About | Journals | Submit | Contact Us | Français|
We report a theoretical description and numerical tests of the extended-system adaptive biasing force method (eABF), together with an unbiased estimator of the free energy surface from eABF dynamics. Whereas the original ABF approach uses its running estimate of the free energy gradient as the adaptive biasing force, eABF is built on the idea that the exact free energy gradient is not necessary for efficient exploration, and that it is still possible to recover the exact free energy separately with an appropriate estimator. eABF does not directly bias the collective coordinates of interest, but rather fictitious variables that are harmonically coupled to them; therefore is does not require second derivative estimates, making it easily applicable to a wider range of problems than ABF. Furthermore, the extended variables present a smoother, coarse-grain-like sampling problem on a mollified free energy surface, leading to faster exploration and convergence. We also introduce CZAR, a simple, unbiased free energy estimator from eABF trajectories. eABF/CZAR converges to the physical free energy surface faster than standard ABF for a wide range of parameters.
Coordinate-based free energy simulation methods accomplish two tasks: sample a reduced-dimension space of generalized coordinates, and estimate the associated free energy landscape. Among these, the adaptive biasing force method (ABF)1−3 is an adaptive application of unconstrained thermodynamic integration (TI). In ABF, the running estimate of the free energy gradient from TI is used as the biasing force, hence combining the importance sampling and free energy estimation problems.
Here we investigate a combination of ABF with extended-system dynamics as in the Car–Parrinello metadynamics method.4,5 This combination was introduced by Lelièvre et al.,6,7 and is a part of the double-integration orthogonal space tempering method of Yang and co-workers, under the name Dynamic Reference Restraining.8 This extended-system ABF algorithm will henceforth be referred to as eABF. Instead of applying the ABF scheme to generalized variables that depend directly on atomic coordinates, eABF dynamics proceeds in a separate, explicit collective coordinate space with its own equations of motion, and these extended degrees of freedom are harmonically coupled to the collective variables in atomic coordinate space. As eABF does not rely on the free energy surface of interest for importance sampling, a separate estimator of the free energy is needed.
There are two possible benefits from the eABF approach. One is a gain in flexibility and efficiency due to separating the problems of sampling and free energy estimation, as has been investigated in recent studies.9,10 We investigate and discuss this question in detail in this work. The second benefit is bypassing the stringent technical requirements for collective variables to be usable for ABF. The multidimensional TI formalism11 used by our implementation of ABF12 entails some practical limitations: collective variables are required to be mutually orthogonal, and orthogonal to constraints; furthermore, a Jacobian term, which depends on second derivatives of the coordinates, must be calculated. While defining an effective, low-dimension collective coordinate space for molecular processes is often daunting to begin with, additional restrictions on the coordinates that can be used for ABF reduce the applicability of the method in the most challenging cases. eABF relaxes these requirements, making the method applicable to any set of variables whose individual values and gradients can be readily computed.
Below, we present the principles of eABF dynamics in one and multiple dimensions, as well its combination with standard ABF in what we term “semi-eABF”. We derive some properties of eABF-biased distributions of the coordinates, which are useful for understanding the sampling efficiency of the method. We then describe numerical tests and explain the effects of its parameters on convergence, most notably the strength of coupling between the geometric coordinate and the fictitious variable. We investigate the role of noise reduction through mollification of the measured forces by fluctuations of the coupling spring. We introduce the corrected z-averaged restraint (CZAR) estimator of the “true” free energy based on eABF trajectories. Finally, we test the convergence and accuracy of eABF/CZAR and compare it to ABF as well as an Umbrella-Integration-based estimator.
Let us recall the principles and notations of the ABF method,1 the direct parent of eABF. Consider a set of particles of coordinates , subject to constraints of the form σk(q) = 0, and whose physical distribution is the canonical ensemble associated with the potential V(q) at temperature T. Within that ensemble, we wish to sample a vector generalized coordinate (collective variable) of physical interest, z = ξ(q). ABF is an adaptively biased dynamics that produces, in the long-time limit, a uniform distribution of ξ.1,3,7 ABF also yields an unbiased estimate of the free energy profile (or potential of mean force, PMF) A(z), which is defined by
where ρ is the equilibrium probability distribution of ξ. The time-dependent biasing force in ABF is the running estimate of the free energy gradient.
Our implementation of ABF12 relies on multidimensional thermodynamic integration as expressed by Ciccotti et al.11 For each coordinate ξi, let vi be any vector field on atomic coordinates (, where N is the number of atoms) satisfying, for all j and k,
We call vi the inverse gradient of ξi.12 A possible choice is vi = ∑jGi,j–1 ξi, provided that the matrix G (ξi·ξj)i,j is invertible.6 This, however, is not often practical: in simpler cases intuition is sufficient to exhibit valid choices of v, whereas in more complex cases, obtaining v in closed form from this expression is difficult, if at all possible.
The ith partial derivative of the free energy surface A may then be calculated as the ξ-conditioned ensemble average of the instantaneous collective force Fiξ:11
Depending on the function ξ, calculation of the divergence ·vi, which typically implies second derivatives of ξi, may be onerous or impossible to express in closed form. It then becomes desirable to replace ABF dynamics on coordinate ξ with a dynamics that approximates it at a lower cost (including the cost of developing algorithms and production-quality code).
eABF consists in performing ABF dynamics on a fictitious coordinate that fluctuates around ξ(q). We shall now present eABF in detail, first on a scalar variable ξ for simplicity, then in higher dimension.
In eABF, we consider the extended system (q, λ), where λ is a fictitious, nonphysical degree of freedom with mass mλ. λ evolves in time according to Langevin dynamics at the same temperature as the rest of the system. To ensure that λ approximates ξ, we couple these quantities with the harmonic potential (k/2) (ξ(q) – λ)2. eABF can therefore be seen as a multiscale simulation method, where λt represents a coarse-grained dynamics coupled to the atomic dynamics (Figure Figure11). We then define eABF on coordinate ξ as standard ABF dynamics on the extended coordinate ξext, where ξext (q, λ) λ.
The obvious choice of inverse gradient v is equal to the gradient ξext, that is, a vector with null components for all q, and 1 for λ. This satisfies eq 2 and eq 3 because v has no support on coordinates q involved in constraints. The instantaneous force of eq 4 then reduces to the harmonic force from the coupling potential, which can be calculated regardless of the geometry of ξ, and without using the physical forces −V(q), in contrast to standard ABF.
The main convergence property of ABF applies, that is, the biased dynamics of λ is less metastable than the unbiased one, and its limiting probability distribution is uniform, yielding a form of “optimal” sampling. The key intuition behind eABF is that efficient sampling of λ will result in efficient sampling of ξ, given sufficient coupling between those two variables. We will show numerically that this is verified, and examine the requirement on the coupling strength.
In the absence of adaptive bias, the extended potential is
and the equilibrium marginal distribution of λ depends on k:
The corresponding PMF in λ is defined by
In the following, probability distributions are expressed up to a normalization constant, and free energy profiles up to an implicit additive constant, without loss of precision or generality.
The integral in (6) may be recast by inserting ∫δ(ξ(q) – z) dz = 1:
That is, the marginal distribution of the extended variable λ is that of ξ(q) convolved with a Gaussian kernel of variance σ2 (βk)−1. Therefore, in the tight-coupling limit (large k, small σ), ρk becomes arbitrarily close to ρ, and the extended PMF Ak, to the physical PMF A. A numerical example comparing these two quantities (Figure Figure22) illustrates Ak as a smoothed or mollified version of A.
Under eABF dynamics, the applied bias on λ is the running estimate of the average spring force:
where the angle brackets indicate a canonical average conditioned by λ = λ*. The system is driven by a time-dependent biased potential t(q, λ) that converges toward
which generates the following biased Boltzmann distribution:
Integrating over λ gives the biased marginal in z:
which can be summarized as
where Gσ stands for the Gaussian kernel of variance σ2, and * for convolution. In the high-k, low-σ limit, convolution by the kernel becomes the identity operator, and the correction factor becomes ρ(z)−1. Thus, the biased limiting distribution (z) becomes uniform, and eABF in the high-k limit recovers the behavior of standard ABF on coordinate z = ξ(q).
The approach outlined above may be extended to higher-dimension variables, following the principles of multidimensional ABF.12 Given a d-dimensional vector collective variable ξ(q) = (ξi(q))i, we augment the system with a vector of d extended variables λ = (λi)i, coupled through as many harmonic potentials (1/2) ki (ξ(q) – λi)2. Because the components ξi may have different physical dimensions, the spring constants ki are not generally commensurable.
To recover the multidimensional thermodynamic integration formalism recalled above, we define the inverse gradient vector vi as equal to the gradient (q, λ) ξiext, that is, the vector with component 1 on coordinate λi and zero elsewhere. This satisfies the conditions of eqs 2 and 3. Components of the adaptive biasing force are then
where the angle brackets indicate an average with respect to the canonical distribution of (q, λ) conditioned by λ = λ*. The case of a two-dimensional coordinate is illustrated in numerical tests below.
Consider the special case of two collective variables ξ1 and ξ2. One may then run a semi-eABF simulation, that is, a simulation in extended space (q, λ) with a harmonic potential coupling λ and ξ2(q), and an ABF bias on the two-dimensional coordinate ξext(q, λ) (ξ1(q), λ).
A vector field v1 needs to be defined, verifying v1(q) ·ξ1(q) = 1 for all q (eq 2), and orthogonal to constraints (eq 3). The mutual orthogonality condition of eq 2 is trivially fulfilled because the second coordinate is independent of q. The biasing forces may then be calculated based on
where the angle brackets indicate averages with respect to the canonical distribution of (q, λ) conditioned by (ξ1(q) = z1, λ = λ*).
More generally, one may run ABF dynamics on a vector variable involving any combination of geometric coordinates and fictitious variables. Numerical tests of the semi-eABF approach in two dimensions are presented below, together with an appropriate free-energy estimator.
The eABF free energy gradient estimator converges in the long-time limit to Ak ′(λ), which is not the “true”, physical free energy gradient A′(z). Here we discuss how to estimate the physical free energy gradient from an eABF simulation.
The simplest estimator uses Ak as an approximation to the physical PMF, A(z). This, however, is not asymptotically unbiased in the long-time limit, because of the convolution in eq 8, as can be seen in Figure Figure22. Ak does converge to A only in the high-k limit. Below, we describe and test two other estimators, one of which is asymptotically unbiased regardless of k. Another option, since Ak is related to A by convolution, is to apply a generic deconvolution algorithm;13 however, that approach again yields a biased estimator.
Zheng and Yang proposed a free energy estimator in the context of the orthogonal space tempering method,8 but it is applicable more generally to any eABF-like simulation. Considering that each λ-value is a state wherein z is sampled under a harmonic restraint, the free energy derivative can be written based on umbrella integration.14 The final estimate of the free energy derivative is a weighted average of the umbrella integration values from different “restraining ensembles” (λ-states):8
where (z|λ) is the observed, bias distribution of ξ(q) conditioned by λ. Umbrella integration further assumes that this distribution is nearly Gaussian, hence its log-derivative may be approximated based on its mean z λ and variance (σλz)2:
This assumption reduces the variance of the estimator, but introduces a bias that formally vanishes only in the high-k regime, when ρ(z|λ) becomes peaked and tends toward a Gaussian.
In the past years, we informally provided the community with an implementation of the Zheng/Yang estimator as a post-treatment of eABF trajectories, which was inefficient for large amounts of data. This was improved by an on-the-fly implementation that was published recently.15
We propose here an asymptotically unbiased estimator of A, named corrected z-averaged restraint (CZAR). Below, we introduce this estimator based on physical intuition (see Supporting Information for a formal derivation).
Consider the eABF-biased dynamics of coordinate z = ξ(q). This coordinate only feels the eABF bias through the spring force k(λ – z). Thus, at a given value of z, the biasing force on coordinate λ results in an effective average biasing force on z equal to k(λz – z) (where λz is the conditional average of λ at a given value of z). The free energy gradient is the log-derivative of the unbiased distribution ρ:
from which we can derive a relationship with the eABF-biased distribution :
The right-hand side can be estimated numerically from the time trajectory of (z, λ).
We note that eq 22 is related to eq 19 by swapping the integration with respect to λ and differentiation with respect to z. Thus, the main difference between the CZAR and Zheng/Yang approaches is that the latter relies on a Gaussian approximation for the conditional distributions of z. A practical benefit of CZAR is that it does not require defining discrete λ-states, which makes implementation simpler and removes a tunable parameter.
The CZAR estimator generalizes to eABF on a vector variable of arbitrary dimension, z = (zi)i = (ξi(q))i. Each component of the free energy gradient may be estimated as
The derivation is provided in the Supporting Information, in dimension 2 for the sake of simplicity. It extends to arbitrary dimension in a straightforward way.
The CZAR estimator is valid for standard ABF as well, if the z-averaged force is replaced with the average force of ABF for “non-extended” coordinates. The log-derivative of the ABF-biased distribution vanishes for long times as the distribution becomes uniform. Thus, CZAR may be used transparently in the semi-eABF case, as it provides unbiased estimates of the free energy gradient along both extended and nonextended variables. This is used in the numerical tests of semi-eABF below.
Our eABF implementation in the Collective Variables Module (Colvars)16 is essentially a combination of two pre-existing features: extended-system coordinates, and multidimensional ABF.12 The extended Lagrangian functionality of the colvars module includes separate velocity-Verlet or Langevin integrators, independent from that of the underlying MD engine, as well as harmonic restraints coupling each extended degree of freedom to its counterpart collective variable.
Activating extended dynamics introduces two tunable parameters for each extended coordinate: integrating the Langevin equation of motion requires that the coordinate be assigned a fictitious mass, and the harmonic spring is defined by an arbitrary force constant. Rather than choosing those parameters directly, the Colvars implementation relies on an equivalent pair of quantities whose choice is more intuitive: the “thermal width” of the coupling, σ, which is also the standard deviation of the Gaussian kernel in eq 8, and its time constant τ (the period of the oscillator). They are determined from the following expressions:
where m is the fictitious mass and k is the harmonic force constant. Broadly speaking, if λ is to approximate ξ(q), then a small value of the tolerance σ is desirable. Conversely, the oscillator period should be much larger than the MD time step for stability of the integrator. More refined criteria for choosing these parameters will emerge from the numerical tests described below.
One practical benefit of implementing eABF over standard ABF, besides doing away with geometric complexity, is estimating the free energy gradient without needing the values of atomic forces. In large-scale parallel applications where NAMD17 and LAMMPS18 are often used, this removes the need to gather these atomic forces on the computing node tasked with executing the ABF algorithm. The decrease in interprocess communications should yield scaling improvements when many particles are involved in the definition of the collective variable.
We provide an implementation of the new CZAR estimator within the Colvars Module.16 It is de facto available for any combination of MD engine and platform supported by the Colvars, most notably, NAMD17 and LAMMPS.18 The algorithm is lightweight and relies on two accumulators that are updated at every time step: the z-conditioned average forces kzk – λz, and the biased z marginal distribution, (z). These two quantities are only combined at output time, to yield the free energy gradient estimate according to eq 22. The implementation, as with most Colvars functionality, supports an arbitrary number of variables. In one-dimensional cases, the gradient is integrated automatically using the trapezoidal rule. In multidimensional cases, the gradient must be integrated in a postprocessing step to yield the free energy surface. To this end, we use our discrete Markov chain Monte Carlo algorithm12 in this work for practical reasons, but in principle any Poisson solver with adequate boundary conditions could be used.19
The test systems considered here were two small peptides in vacuum that were used in many studies before. The helical peptide deca-alanine was associated with its end-to-end distance (using the first and last α carbon atoms, in the absence of bond constraints). For two-dimensional tests, conformation of the dipeptide mimic N-acetyl-N-methyl-l-alanylamide (NANMA) was described by its central Ramachandran angles and ψ.
Simulations were run with the development version of NAMD 2.12,17 together with the Collective Variables Module16 version 2016-09-03, which will be included in NAMD 2.12. In vacuo systems were modeled by the CHARMM22 force field, and simulated with a time step of 0.5 fs with Langevin dynamics at 300 K and a damping coefficient of 5 ps–1. The Zheng and Yang umbrella integration estimator8,14 was used in its NAMD/Colvars implementation by Fu et al.15
Convergence of free energy gradient estimators was assessed as follows. For each set of parameters, 10 simulations were started using different random seeds and stochastic initial velocities, and run for 10 ns each. For assessing the convergence of true free energy estimators, sampling was increased to 20 simulations of 20 ns each to account for their higher variance. Values of the estimators to be tested were saved every nanosecond. The average of the final estimates of the free energy gradient from those replicas was the target for measuring “relative” convergence toward the limiting value of a given estimator under a given set of parameters. Reference ABF simulations were run under the same conditions except for the extended variables. The average ABF gradient from those reference simulations was the target to test for “absolute” convergence toward the true free energy gradient. For each individual test simulation, the root-mean-square distance from the estimated gradient to the target was calculated as a function of sampling time. The average and standard deviations of these distances over sets of simulations were used for comparing the accuracy and rate of convergence of estimators, and are plotted in Figures in the Results section.
When estimating the free energy profile A(z) in an eABF simulation, convergence occurs at two levels. At the level of eABF itself, the free energy gradient estimator converges toward Ak′(λ), and the observed distribution of λ becomes uniform. This depends on tunable eABF parameters σ and τ, and on the delay parameter fullSamples. The second level is convergence of the estimators of A(z) based on the eABF trajectory (zt, λt). The latter depends on the choice of estimator, and in the case of the Zheng/Yang and CZAR estimators described above, on good sampling of the joint distribution of z and λ.
Because the methods are intended for large-scale molecular simulations where sampling time is necessarily limited, we test the estimators and parameter sets based on their rate of convergence at short and moderate times (10 ns for the fast deca-alanine system).
The fullSamples parameter controls when enough sampling has been collected in one given bin for the adaptive biasing force to be introduced. To ensure continuity of the force as a function of time, the force is scaled by a linear ramp that goes from zero to one as the local sample count evolves between half the value of fullSamples and its full value.12 The top panel in Figure Figure33 shows convergence rates of test simulations as a function of fullSamples. While the deviation at the end of the 10 ns simulations is equivalent, it can be seen to affect the initial convergence rate. A value of zero leads to transient overcompensation of the underlying free-energy, with strong dispersion of the results in the initial 4 ns for this particular case. This relaxation time could be much longer for more slowly relaxing systems, however, which is why larger values are advisable. Even for this toy system, a value of 2000 samples gives the fastest short-time convergence, as well as nearly optimal accuracy in the intermediate regime (4–8 ns) until the accuracies of all simulation sets converge at 10 ns. The fullSamples parameter is not specific of eABF, yet its role is not necessarily identical in eABF as in standard ABF. Tests of ABF on the same system showed that a value of 0 for fullSamples did not impair convergence.20 This confirms that short-time scale relaxation properties of ABF and eABF are substantially different, as eABF dynamics is dominated by the relatively slow oscillations of the fictitious degree of freedom (Figures Figures11 and and4).4). On longer time scales, however, the limiting factor becomes orthogonal relaxation of slow degrees of freedom that are not captured by the chosen transition coordinate. That slow phenomenon occurs under similar circumstances in ABF and eABF. It is not well-captured by a small gas-phase system such as this one, yet slower systems would not lend themselves to precise statistical characterization.
Like fullSamples, the fluctuation period τ of the oscillator (defined in Equation 25) has no measurable influence on the deviation after 10 ns of simulation (Figure Figure33, middle). However, it does slightly influence the rate of convergence for shorter times. For this system, we find 100 fs to give the fastest convergence. A shorter time of 50 fs gives similarly fast convergence, yet this is only possible because we use a short MD integration time step of 0.5 fs in these gas-phase simulations. For condensed-phase simulations with longer timesteps, larger values of τ may be required to guarantee the accuracy of the extended-system trajectory: we recommend scaling it with the integration time step of the extended variable. Unless otherwise stated, other eABF simulations of the deca-alanine system are performed with fullSamples set to 2000 and a τ of 100 fs.
The key parameter of an eABF simulation is the coupling width σ, whose effect on convergence is illustrated by the lower panel of Figure Figure33. In the σ →0 limit, eABF is equivalent to standard ABF. The initial rate of convergence is simply an increasing function of σ: broader fluctuations of the fictitious degree of freedom appear to accelerate convergence. The slowest initial rate is obtained for standard ABF. This order is still valid at the end of the 10 ns interval, although error values have come closer together. Standard ABF and eABF at σ = 0.1 still have the highest variance. This result is surprising in the sense that higher values of σ make the simulation more “approximate” or “blurry”. However, this expected loss of accuracy leads to a gain in precision.
In ABF, fluctuations of the instantaneous force Fξ are dominated by high-frequency terms due to bonded interactions (Figure Figure44). In eABF, the mean square fluctuation of the spring force σF2 is of the order of k/β, which implies:
Thus, the choice of σ is a trade-off between fluctuations of the extended variable (bias of Ak as an estimator of A) and fluctuations of the extended force (variance of Ak): a high σ, low k will yield a limiting value of Ak that is very different from A but is estimated with low variance. However, the bias on Ak can be corrected by the estimators described below, which displaces the optimal compromise toward a soft (low k) rather than a stiff spring (high k). As evidenced numerically in Figure Figure44, even a moderately soft spring results in considerable noise reduction with respect to ABF.
The upper panel of Figure Figure55 shows the joint biased distribution (z, λ) observed for four different values of σ. These distributions observed numerically agree well with the prediction of eq 12 based on numerical estimates of A and Ak (data not shown). The common feature is that the marginal distribution in λ is nearly uniform, thanks to the eABF bias. Setting σ = 0.1 Å enforces near equality of the two variables, leading to a very narrow joint distribution and a nearly uniform z-marginal as well (lower panel of Figure Figure55). For looser coupling, the distribution broadens. At σ = 0.5, the high-probability region deviates visibly from the equality line, leading to undersampling of low values of z. At σ = 1, sampling along z becomes strongly nonuniform, with variations across 3 orders of magnitude. In this example, eABF sampling is most effective for σ between 0.2 and 0.5 Å, that is, where the mollification is significant but not sufficient to impede sampling along z.
Therefore, the drawback of a softer spring does not directly reside in the greater difference between A and Ak, but rather in the greater deviation of the limiting biased distribution of z from uniformity, eventually yielding poor sampling as high-free-energy regions in z may be missed despite complete sampling over λ.
Since the eABF bias differs from the optimal bias by Gaussian convolution (eq 13), deviations of (z) from uniformity are related to the difference between ρ and its convolution. Considering the convolved of ρ by a Gaussian of variance σ2 as an approximation to ρ(z) in the small-σ limit, it can be shown (see Supporting Information) that the error is bounded by ρ″L∞σ2/2. That is, deviation from uniform sampling depends on the curvature of ρ(z). One may see from Figure Figure55 that sampling of z is reduced in concave regions of the PMF (hence of ρ) and increased in convex regions. As a result, the optimal value of σ for a given application depends on the curvature of ρ.
Figure Figure66 shows the convergence toward the “true” free energy derivative A′(z) of the naive, Zheng/Yang (ZY), and CZAR estimators compared to standard ABF for different values of σ. The long-time bias of the naive estimator is clearly apparent for larger values of σ, but this estimator still proves faster-converging than standard ABF, and as accurate at t = 20 ns, for σ equal to 0.1 Å.
On this time scale, the asymptotic bias of the ZY estimator at finite σ is only visible for σ of 0.5 or 1 Å: on those conditions, the Gaussian assumption (20) for the conditional distribution is not verified. For lower values of σ, the bias is masked by the error due to variance: the mean error at 20 ns is equivalent to that of CZAR or standard ABF. Still, the convergence rate is slightly slower at 0.2 Å and at 0.1 Å, with a greater dispersion. Since the CZAR and ZY estimators share their mean force term, this could be ascribed to noise in the two-dimensional (z, λ) histogram, which the Gaussian approximation may not erase at short times in less-populated regions of the histogram. Another contributing factor could be the error due to discretization along λ.
The defining feature of CZAR is its robustness with respect to σ, directly deriving from its asymptotically unbiased nature. Its rate of convergence, however, depends on efficient sampling along z. For too large values of σ (here, 1 Å), enhanced sampling along λ leaves some metastability or unsampled regions along ξ (Figure Figure55), leading to slower convergence of the PMF estimator (blue line).
In constrast with convergence of the mollified eABF free energy (Figure Figure33), all estimators seem more accurate at small σ (0.1 Å), but the CZAR estimator shows much more robustness with respect to σ.
On the two-dimensional coordinate describing the NANMA peptide, the PMF estimates from eABF/CZAR and standard ABF are visually indistinguishable after 100 ns of sampling (Figure Figure77). A more quantitative view of the convergence is given by the upper panel of Figure Figure88. The optimal value of σ here is 5°. σ = 1° produces a high initial error, certainly due to the high variance of the 2D (, ψ) histogram at short sampling times. At 100 ns, this estimate has reached the error of standard ABF. Conversely, the σ = 10° simulation shows a high initial convergence rate, but at later times, its convergence is impeded by poor sampling along (, ψ) due to insufficient coupling to (λ, λψ), similar to the σ = 1 Å case for deca-alanine (Figure Figure55).
The semi-eABF case is a form of 2D-ABF dynamics where one of the variables is biased directly, while the other is coupled to a fictitious degree of freedom that is biased. The results are largely similar to the 2D-eABF case, demonstrating the practical effectiveness of the approach. There is one interesting exception: the loose coupling case (σ = 10°), which shows slower convergence in 2D: eABF converges well when only the angle ψ is treated with an extended variable. This results from the anisotropic shape of the 2D PMF, where the barriers are more pronounced along than along ψ (Figure Figure77).
The eABF approach is a generalization of ABF, when applied to fictitious degrees of freedom coupled to generalized coordinates of interest. In contrast to ABF, it imposes no requirements as to what combinations of coordinates may be used, and only their gradients, not their second derivatives, are needed. eABF sampling is more effective than ABF due to the noise reduction effect of the fluctuations of the extended variable, as well as the nonlocal effect of the biasing force. We introduce CZAR, a conceptually simple, asymptotically unbiased estimator of the free energy that converges faster than ABF for a broad range of coupling strengths.
Thus, we have shown that the deviation of Ak from A is not damaging, as long as sampling of the actual geometric coordinate is efficacious. This stems from the essential difference between eABF and ABF, that is, separating the force used for sampling from the free energy gradient estimator. In ABF, the free energy that is being estimated is used exactly to accelerate the dynamics. eABF decouples the sampling bias and free energy estimate, leaving some flexibility to use a biased estimator with reduced variance and fast convergence for sampling, while still ultimately benefiting from an unbiased free energy estimator. In this sense, eABF is similar in spirit to a recently proposed approach, combining metadynamics for exploration and a separate free energy estimator.10 Indeed, the convergence rate of eABF/CZAR as a free energy estimator compares favorably to that of standard ABF, essentially because it combines efficient exploration on a smoothed free energy surface with an unbiased estimator of the actual free energy.
eABF is applicable where ABF is not, in particular for elaborate generalized coordinates where calculation of second derivatives is cumbersome, and for sets of coordinates where orthogonality of the inverse gradient (eq 2) is difficult to satisfy. Its added algorithmic complexity (integrator for the extended equations of motion and free energy estimator) is offset by the reduced geometric calculations. There are two additional tunable parameters in eABF. However, we have shown that one of them, the oscillation time, has limited impact on convergence, and the other, the coupling width, has a range of safe values, particularly if a robust, asymptotically unbiased free energy estimator is used.
Based on these results, it may be argued that eABF/CZAR is always preferable to ABF for its rate of convergence, regardless of practical implementation considerations. This may become clearer when several large-scale, numerically challenging applications are documented.
The authors gratefully acknowledge support from the Laboratoire International Associé CNRS-UIUC, from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) ERC Grant Agreement number 614492, and from the French National Research Agency under grant ANR-14-CE23-0012 (COSMOS).
The authors declare no competing financial interest.