Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2700186

Formats

Article sections

Authors

Related links

J Comput Chem. Author manuscript; available in PMC 2010 August 1.

Published in final edited form as:

PMCID: PMC2700186

NIHMSID: NIHMS101614

The publisher's final edited version of this article is available at J Comput Chem

See other articles in PMC that cite the published article.

Replica exchange accelerated molecular dynamics (REXAMD) is a method that enhances conformational sampling while retaining at least one replica on the original potential, thus avoiding the statistical problems of exponential reweighting. In this paper we study three methods that can combine the data from the accelerated replicas to enhance the estimate of properties on the original potential: weighted histogram analysis method (WHAM), pairwise multistate Bennett acceptance ratio (PBAR), and multistate Bennett acceptance ratio (MBAR). We show that the method that makes the most efficient use of equilibrium data from REXAMD simulations is the MBAR method. This observation holds for both alchemical free energy and structural observable prediction. The combination of REXAMD and MBAR should allow for more efficient scaling of the REXAMD method to larger biopolymer systems.

Free energy is the driving force behind biochemical problems of great importance, from drug binding to protein function. A computational approach to free energy calculation affords complete control over the system and method, as well as atomistic detail of the results. The use of computational free energy calculation can theoretically provide free energies for very specific processes that are difficult to isolate in experiment. Among the necessary conditions for accurate free energy prediction are accurate sampling and efficient use of the collected data.

The sampling of computational molecular dynamics has been practically limited due to the topography of the potential energy surface, which requires femtosecond timesteps when propagating the system, and the in the case of biopolymers, sampling many isolated regions of the potential energy. These issues mean that a large number of molecular dynamics steps need to be computed in order to simulate systems with micro-to millisecond relaxation times, but the most heroic brute force approaches have yielded only tens of microseconds.^{1}^{–}^{3} Accurate sampling can also be achieved through methodological developments that increase the sampling efficiency. For example, temperature replica exchange molecular dynamics (TREMD) has enjoyed a surge in popularity following the seminal publication of Sugita and Okamoto.^{4} TREMD overcomes local energy barriers by simultaneously simulating the dynamics of a set of non-interacting systems at different temperatures. Conformations sampled at high temperatures are more likely to overcome conformational barriers and exhibit higher sampling efficiency than low temperatures. The sampling efficiency passes between replicas through periodic Metropolis Monte Carlo attempts, which retain the detailed balance of the system. Despite the advantage of conformational exchange between temperatures, TREMD increases the energy of all degrees of freedom instead of activating those specifically important to conformational sampling.

Hamiltonian modification schemes selectively modify the system potential energy and can greatly enhance sampling over TREMD.^{5} For example, by applying a harmonic bias along a reaction coordinate of interest, umbrella sampling increases the number of conformations sampled local to the bias minimum.^{6}^{,}^{7} Accelerated molecular dynamics (AMD), on the other hand, typically applies a bias along the dihedral degrees of freedom, decreasing the potential barrier between states while retaining the general shape of potential basins.^{8} Unlike umbrella sampling AMD does not require prior knowledge of the reaction coordinate, although parameterization of the amount of bias should be done to optimize the acceleration. AMD has been used to reproduce NMR residual dipolar couplings, measures of molecular motion on up to the millisecond time scale, in the third IgG-binding domain of protein G.^{9} Despite the promise of AMD to increase the computational efficiency of conformational sampling, each observable must be exponentially reweighted in order to recover the unbiased ensemble average, which Shen and Hamelberg showed yields large statistical uncertainties when applying to highly accelerated simulations.^{10}

In order to increase the statistical efficiency of AMD we recently developed a replica exchange accelerated molecular dynamics (REXAMD) scheme, which combines the selective activation of accelerated molecular dynamics and replica exchange.^{11} The resulting Hamiltonian replica exchange simultaneously simulates the dynamics on potentials of varying acceleration instead of varying temperature. We previously demonstrated the effectiveness of REXAMD for alchemical free energy calculations of small model systems.^{11} These free energy calculations only used the *V*/*λ* values sampled from the ground, or unaccelerated, state. This analysis strategy makes no use of data generated in the accelerated states and limits the computational efficiency of the method as a result. Furthermore, the computational efficiency will decrease as the system size increases because the number of replicas required for an efficient replica exchange will increase.^{12} In order to mitigate this decrease in computational efficiency we will briefly introduce and compare the performance of three methods of recombining multi-state data from REXAMD simulations: WHAM, MBAR, and PBAR.

The most widely used method to combine different biased simulations and produce an estimate of the unbiased result is the weighted histogram method (WHAM).^{13} Central to this approach is the definition of an observable of interest, for example *V*/*λ*. The bias from each of the multiple states is removed through an exponential reweighting of the biased probability density function of the observable. These unbiased probability density functions, one from each state, are then combined in a weighted sum at each value of the observable subject to the constraints that the weighted variance be minimized and the weights normalized. The WHAM formulation has been extended to REXAMD and applied to combine the (, ψ) distribution from accelerated potentials during a REXAMD simulation, yielding the unbiased (, ψ) distribution of various oligopeptides.^{14} While making use of the data generated in accelerated states improves the computational efficiency of the REXAMD method, WHAM suffers from an inherent systematic bias that plagues all histogram methods. The bias arises when the probability density of the observable varies greatly over the interval spanned by the bin, an observation rigorously derived by Kobrak.^{15} Decreasing the bin width attenuates the problem, but increases the statistical error of the histogram of the simulation data used to approximate the population density. The optimal bin width that balances these two effects should be found for each specific system, but the effects are always present.

The pairwise multistate Bennett acceptance ratio method (PBAR) method developed by Maragakis *et al.* extends the maximum likelihood derivation of the Bennett acceptance ratio method to handle multiple pairs of states simultaneously.^{16} The PBAR method is applicable to both equilibrium and non-equilibrium work data and requires that the work data between different pairs of states be independent. The independence criterion is a practical limitation when applying the PBAR method to the equilibrium samples generated from REMD simulations; the total amount of samples per state N* _{i}* must be split into independent samples for each pair of states

Shirts and Chodera developed a different multistate Bennett acceptance ratio method (MBAR).^{17} The MBAR approach requires equilibrium samples from each state and the energy of each sampled structure at every state. This lends itself naturally to a REMD approach when the temperature or Hamiltonian modification is straightforward and computing the energy of a structure at the different states is easy. In extreme cases post-processing of the equilibrium samples to compute the energy at different states can be expensive, such as when a soft-core alchemical potential is used.^{18}^{,}^{19} The MBAR analysis uses all of the equilibrium data for each pair of states, and therefore should scale better with the number of states than the PBAR method for equilibrium simulations. Validation was performed by constructing the potential of mean force of the extension of a DNA hairpin in an optical double trap from different constant force trajectories.

In this paper we will first describe the analytical model system and REXAMD simulation scheme. The WHAM, PBAR and MBAR techniques will be discussed in further depth, highlighting the main equations and practical implementation. The performance of WHAM and MBAR will then be compared against the previous ground state approach. The MBAR and PBAR methods will then be compared, and finally the extension of MBAR to computing equilibrium structural properties will be investigated.

A simple, analytically solvable model system provides a wealth of information against which to compare the simulation performance. We selected a linear four-atom molecule (pseudo-butane) with no van der Waals or electrostatics forces, leaving the dihedral angle as the sole degree of freedom. There are two stable conformations at ± 90 degrees (p-form and m-form, respectively) with different relative depths shown in Figure I. The alchemical change progressed from the dominant p-form to the dominant m-form according to a linear scaling of the potentials (Equation 1), and due to the symmetry of the endpoints the alchemical free energy change is zero. The energetic barrier between the two conformations is at least 8 kcal/mol for all *λ _{i}*, and thus requires aggressive acceleration in order to generate an equilibrium distribution.

$$\begin{array}{l}V(\lambda )=(1-\lambda ){V}_{0}+\lambda {V}_{1}\\ \frac{dV}{d\lambda}={V}_{1}-{V}_{0}\end{array}$$

Equation 1

The REXAMD simulations have four states each, ranging from un-accelerated (s00) to an acceleration that results in a completely flat potential energy surface along the torsional degree of freedom (s03). The exchange rate between these states is higher than 50%, and thus efficient mixing of the replicas occurs on a very short timescale.^{20} The exchange period of the REXAMD simulations is 1 ps and the simulations are coupled to a 300K Langevin thermostat with a collision frequency of 50 ps^{−1}. The instantaneous energy and *dV*/*dλ* values are taken from the exchanging structures. Systems corresponding to the endpoint lambda values (0.0 and 1.0) as well as the lambda values corresponding to a five-point Gaussian quadrature (0.04691, 0.23077, 0.50000, 0.76923, 0.95309) were simulated for five nanoseconds, and each *λ _{i}* REXAMD simulation was run four times.

The WHAM method applied to REXAMD relies on unbiasing the probability densities of a specific reaction coordinate for each state j through exponential reweighting (Equation 2). Equation 2 is exact in the limit of zero bin width, but
${P}_{\mathit{biased}}^{j}(Q)$ must be approximated by a finite bin width histogram of the observable Q. Kobrak showed analytically that the discretization of a continuous observable Q required by WHAM results in a competition between systematic and statistical error.^{15} In the formalism of WHAM for REXAMD the systematic error arises from the approximation of the
${P}_{\mathit{biased}}^{j}(Q)$ and
${\langle exp(\beta {V}_{\mathit{bias}}^{j})\rangle}_{\mathit{biased},j,Q}$ from a finite bin width and increases with increasing bin width. The discretized
${P}_{\mathit{biased}}^{j}(Q)$ and
${\langle exp(\beta {V}_{\mathit{bias}}^{j})\rangle}_{\mathit{biased},j,Q}$ are estimated from histograms of the sampled data, which introduces a statistical error that decreases with increasing bin width. The
${P}_{un\mathit{biased}}^{j}(Q)$ from each state are then combined according to Equation 3 with a set of weights *w _{j}* (

$${P}_{un\mathit{biased}}^{j}(Q)=\frac{{Z}_{\mathit{biased}}^{j}}{{Z}_{un\mathit{biased}}}{P}_{\mathit{biased}}^{j}(Q){\langle exp(\beta {V}_{\mathit{bias}}^{j})\rangle}_{\mathit{biased},j,Q}$$

Equation 2

$${P}_{un\mathit{biased}}(Q)=\sum _{j=1}^{K}{w}_{j}(Q){P}_{\mathit{biased}}^{j}(Q)$$

Equation 3

The MBAR method is rooted in the identity given in Equation 4 (Equation 5 in Reference ^{17}) where *q _{i}*(

$$\underset{\mathrm{\Gamma}}{\int}{q}_{i}(x){\alpha}_{ij}(x){q}_{j}(x)dx={Z}_{i}{\langle {\alpha}_{ij}{q}_{j}\rangle}_{i}={Z}_{j}{\langle {\alpha}_{ij}{q}_{i}\rangle}_{j}$$

Equation 4

$$\sum _{j=1}^{K}\frac{{Z}_{i}}{{N}_{i}}\sum _{n=1}^{{N}_{i}}{\alpha}_{ij}{q}_{j}({x}_{in})=\sum _{j=1}^{K}\frac{{Z}_{j}}{{N}_{j}}\sum _{n=1}^{{N}_{j}}{\alpha}_{ij}{q}_{i}({x}_{jn})$$

Equation 5

The Python implementation of MBAR, PyMBAR (https://simtk.org/home/pymbar), was used for all MBAR analysis. The statistical uncertainty of the free energies and expectation values are based on an estimate of the asymptotic covariance matrix of the provided data, which requires an uncorrelated dataset. Correlation can be removed from the molecular dynamics data by subsampling the original data set at an interval greater than or equal to the equilibrium relaxation time of the molecular dynamics system. For this work we used the subsampling technique implemented in PyMBAR.^{17}

The PBAR method extends the maximum-likelihood derivation of the Bennett acceptance ratio to multiple states. The log likelihood (Equation 6) involves the Fermi function *f*(*x*)=1/(1+*e ^{x}*)of the instantaneous work values between states

$$lnL=\sum _{i}^{K}\sum _{j\ne i}^{K}\sum _{{n}_{ij}}^{{N}_{ij}}f(-\beta ({W}_{ij}({x}_{{n}_{ij}})-\mathrm{\Delta}{F}_{ij}+{M}_{ij}))$$

Equation 6

The first application of the REXAMD method to alchemical free energy calculations used the instantaneous *dV*/*dλ* values taken from the un-accelerated, or ground state, to compute *dV*/*dλ** _{λi}* for use with Gaussian quadrature thermodynamic integration (Equation 7).

$$\mathrm{\Delta}F={\int}_{0}^{1}{\langle dV/d\lambda \rangle}_{{\lambda}^{\prime}}d{\lambda}^{\prime}\approx \sum _{i}^{N}{w}_{N}({\lambda}_{i}){\langle dV/d\lambda \rangle}_{{\lambda}_{i}}$$

Equation 7

In TI-WHAM the four replicas of a REXAMD simulation at a specific *λ _{i}* are combined to estimate

The *Individual Lamda* TI-MBAR also uses the four REXAMD replicas for each *λ _{i}* to compute

A more efficient use of the data is to simultaneously use all four states from all five *λ _{i}* and thus utilize 100,000 structures per

The TI-MBAR method was helpful in comparing the advantage of MBAR over TI-WHAM, but the PBAR and MBAR methods were developed to directly estimate the free energy difference between states. The *λ _{i}* endpoints (0.0 and 1.0) need to be simulated in order for a direct estimate of the free energy difference from MBAR and PBAR, and the results in Table III include data from all seven

Interestingly, the direct MBAR method also outperforms the PBAR results. Shirts and Chodera predicted that because the PBAR method requires independent sets of work values between each pair of states MBAR would make better use of equilibrium data.^{17} To get independent sets of work values the *N _{i}* samples per state must be distributed into

The MBAR Subsets results follow the procedure outlined above for sampling random subsets of the total *N _{i}* structures and reporting the average and standard deviation. The size of

The REXAMD method is not limited to free energy calculations, and has been used for determining equilibrium structural properties in conjunction with the WHAM method.^{14} The calculation of expectation values using MBAR instead of WHAM should avoid the bias introduced by discrete binning during the reconstruction process.^{15}^{,}^{17} The MBAR method naturally computes equilibrium expectation values, and this can be extended to computing population histograms and potential of mean force for observables. Figure II illustrates the distribution of the model system’s dihedral angle from a single simulation at *λ _{i}*= 0.5 where the potential energy surface is symmetric. The five thousand structures from the ground state method (Figure II) show an overpopulation of the p-form. All four REXAMD states at

Many important equilibrium conformational analysis techniques can be expressed as expectation values, and thus would greatly benefit form the combination of REXAMD and MBAR. For example, the covariance matrix and the related principal components analysis and quasi-harmonic entropy (for the mass-weighted covariance matrix), which are very sensitive to sampling, are defined in terms of expectation values (Equation 8). The root mean squared deviation and the root mean squared fluctuation are also expectation values that are frequently used in conformational analysis.

$$\mathit{Cov}({X}_{i},{X}_{j})=\langle ({X}_{i}-\langle {X}_{i}\rangle )({X}_{j}-\langle {X}_{j}\rangle )\rangle =\langle {X}_{i}{X}_{j}\rangle -\langle {X}_{i}\rangle \langle {X}_{j}\rangle $$

Equation 8

All REMD methods require an increasing number of replicas as the system size increases, and it becomes important to make efficient use of all of the replica data and not just the data at the desired temperature or acceleration. We set out to determine the best way to combine the biased data from REXAMD simulations to improve the accuracy of free energy calculations and structural analysis. The performance of WHAM and MBAR at computing *dV*/*dλ** _{λi}* from multiple acceleration states at specific

The MBAR and PBAR methods were then compared against each other, which required simulating the alchemical endpoints. The MBAR method was approximately an order of magnitude more precise than the PBAR method. The inefficiency of PBAR relative to MBAR was shown to be due to the requirement of independent data sets for PBAR, which reduced the amount of data available for each pair of states during the analysis. The most efficient way of combining equilibrium samples of REXAMD data is conclusively MBAR. We then demonstrated the usefulness of MBAR in combining multiple states to generate unbiased structural quantities. The combination of REXAMD and MBAR should allow large system sizes to be efficiently sampled and analyzed.

We thank Michael Shirts and John Chodera for making the PyMBAR code publicly available (https://simtk.org/home/pymbar). This material is based upon work supported under a National Science Foundation Graduate Research Fellowship (to MF). Funding by NIH GM31749, NSF MCB-0506593 and MCA93S013 (to J. A. M.) also supports this work. We also gratefully acknowledge additional support from the Howard Hughes Medical Institute, the Center for Theoretical Biological Physics (NSF PHY-0822283), and the National Biomedical Computation Resource (NIH P 41 RR08605).

1. Freddolino PL, Liu F, Gruebele M, Schulten K. Biophys J. 2008;94(10):L75–77. [PubMed]

2. Maragakis P, Lindorff-Larsen K, Eastwood MP, Dror RO, Klepeis JL, Arkin IT, Jensen MO, Xu H, Trbovic N, Friesner RA, Palmer AG, Shaw DE. The Journal of Physical Chemistry B. 2008;112(19):6155–6158. [PMC free article] [PubMed]

3. Mura C, McCammon JA. Nucleic Acids Res. 2008 Sep;36(15):4941–4955. [PMC free article] [PubMed]

4. Sugita Y, Okamoto Y. Chemical Physics Letters. 1999;314(1–2):141–151.

5. Faraldo-Gomez JD, Roux B. J Comput Chem. 2007;28(10):1634–1647. [PubMed]

6. Ciccotti G, Ferrario M, Hynes JT, Kapral R. Chemical Physics. 1989;129(2):241–251.

7. Torrie GM, Valleau JP. Journal of Computational Physics. 1977;23(2):187–199.

8. Hamelberg D, Mongan J, McCammon JA. J Chem Phys. 2004;120(24):11919–11929. [PubMed]

9. Markwick PRL, Bouvignies G, Blackledge M. J Am Chem Soc. 2007;129(15):4724–4730. [PubMed]

10. Shen T, Hamelberg D. J Chem Phys 2008 Jul. 21;129(3):034103. [PubMed]

11. Fajer M, Hamelberg D, McCammon JA. Journal of Chemical Theory and Computation. 2008;4(10):1565–1569. [PMC free article] [PubMed]

12. Fukunishi H, Watanabe O, Takada S. The Journal of Chemical Physics. 2002;116(20):9058–9067.

13. Kumar S, Bouzida D, Swendsen RH, Kollman PA, Rosenberg JM. J Comput Chem. 1992;13(8):1011–1021.

14. Xu C, Wang J, Liu H. Journal of Chemical Theory and Computation. 2008;4(8):1348–1359.

15. Kobrak MN. J Comput Chem. 2003 Sep;24(12):1437–1446. [PubMed]

16. Maragakis P, Spichty M, Karplus M. Phys Rev Lett 2006 Mar. 17;96(10):100602. [PubMed]

17. Shirts MR, Chodera JD. The Journal of Chemical Physics. 2008;129(12):124105. [PubMed]

18. Beutler TC, Mark AE, van Schaik RC, Gerber PR, van Gunsteren WF. Chemical Physics Letters. 1994;222(6):529–539.

19. Zacharias M, Straatsma TP, McCammon JA. The Journal of Chemical Physics. 1994;100(12):9025–9031.

20. Abraham MJ, Gready JE. Journal of Chemical Theory and Computation. 2008;4(7):1119–1128.

21. Tan Z. Journal of the American Statistical Association. 2004;99(468):1027–1036.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |