|Home | About | Journals | Submit | Contact Us | Français|
Accelerated molecular dynamics (aMD) has been shown to enhance conformational space sampling relative to classical molecular dynamics; however, the exponential reweighting of aMD trajectories, which is necessary for the calculation of free energies relating to the classical system, is oftentimes problematic, especially for systems larger than small poly peptides. Here, we propose a method of accelerating only the degrees of freedom most pertinent to sampling, thereby reducing the total acceleration added to the system and improving the convergence of calculated ensemble averages, which we term selective aMD. Its application is highlighted in two biomolecular cases. First, the model system alanine dipeptide is simulated with classical MD, all-dihedral aMD, and selective aMD, and these results are compared to the infinite sampling limit as calculated with metadynamics. We show that both forms of aMD enhance the convergence of the underlying free energy landscape by 5-fold relative to classical MD; however, selective aMD can produce improved statistics over all-dihedral aMD due to the improved reweighting. Then we focus on the pharmaceutically relevant case of computing the free energy of the decoupling of oseltamivir in the active site of neuraminidase. Results show that selective aMD greatly reduces the cost of this alchemical free energy transformation, whereas all-dihedral aMD produces unreliable free energy estimates.
Molecular dynamics (MD) simulations have become a crucial theoretical tool in advancing our understanding of the function of biological macromolecules.(1) Advances in algorithms(2) and computing power3−5 continue to allow for simulations of increasingly larger systems on longer and longer time scales, permitting the direct observation of all-atom protein folding,6,7 the observation of ion permeation through a transmembrane channel,(8) and the simulation of a complete virus.(9) Despite the remarkable progress that has been made in the field, simulation times still often fall far short of the miscosecond to millisecond time scales inherent in many biological processes. There have been several methodological advances which have aimed at simulating longer time scales within current computational power such as implicit solvation models,(10) multiple time stepping algorithms,(11) and improved treatment of long-range electrostatics.(12) Sampling of phase space may also be enhanced through the deformation of the underlying potential energy surface,(13) as has been done in hyperdynamics,(14) puddle jumping,(15) conformational flooding,(16) and the local boost method(17) (to name only a few). Our group recently developed a method to enhance the crossing of barriers and the sampling of phase space termed accelerated molecular dynamics (aMD),(18) which has been shown to enhance sampling of biomolecular systems as in the conformational switching of Ras,(19) improve the agreement between experimental and calculated chemical shifts for IκBα,(20) and accelerate the calculation of pKa values in lysozyme.(21)
An area of particular interest concerns using aMD simulations to calculate ensemble averages for physically relevant nonaccelerated systems. For systems in which the energy added to accelerate the system is low, trajectories may easily be reweighted; however, as the system size increases the boost energy required for significant acceleration increases, causing the exponential reweighting factor to produce ensemble averages that are dominated by only a few configurations with high weight, thereby decreasing the precision of thermodynamic quantities such as free energy changes.(22) Here, we propose limiting the acceleration to the degrees of freedom most responsible for conformational changes, thereby reducing the energy added to a system to enhance sampling and resulting in improved reweighting statistics. This work is an extension of a previous study in which aMD was limited to the dihedrals in the backbone of a peptide substrate bound to cyclophilin;(23) however, here we demonstrate that accelerated dihedrals may contain atoms which are also contained in nonaccelerated torsions (that is, individual molecules may contain both accelerated and nonaccelerated torsions). Two examples are highlighted. First, we show that even in the case of the model system alanine dipeptide, selectively targeting dihedrals in the molecule’s backbone results in similar acceleration levels while reducing the amount of energy added to the system. Then we turn our attention to the larger problem of calculating the binding energy of the clinically approved drug oseltamivir (marketed as Tamiflu by Roche Pharmaceuticals, Basel, Switzerland(24)) to the N1 flu protein neuraminidase.25,26 Using free energy perturbation with a modification to the Bennett acceptance ratio (to account for reweighting),(27) we show that the computational cost required to accurately calculate the binding energy may be reduced by as much as 70% while maintaining a similar level of precision.
To enhance phase space sampling, the original aMD applies an additional potential only when the potential energy, V(r), is below a specified criterion E to produce dynamics on the artificial landscape V*(r) such that
The form of the “boost” potential ΔV(r) is defined as
The formalism of this boost potential has several practical advantages: it produces a potential energy surface with a smooth first derivative, it does not require the definition of a “reaction coordinate” along which to enhance sampling, it reflects the shape of the original potential, and it is relatively simple with only two adjustable parameters (E and α).
Simulation results generated with aMD may be reweighted by the exponential of the boost potential, exp(βΔV(r)), to recover theoretically exact thermodynamic properties for the physically relevant unaccelerated system. In practice, however, the exponential dependence of the reweighting hinders convergence as trajectory averages become dominated by a smaller subset of their snapshots as the range of boost potentials increases. While this does not severely affect small systems such as alanine dipeptide, it does prevent the use of aMD in accurate free energy calculation in larger biomolecular systems. To improve the reweighting statistics and enhance sampling, several variants of aMD have been developed including “barrier lowering” aMD,(28) replica-exchange aMD,(29) and adaptive aMD (personal communication P. Markwick). In this paper we discuss an extension to aMD which may be incorporated into other aMD implementations, which is to selectively accelerate a user-defined subset of dihedrals most pertinent to sampling the relevant degrees of freedom, which we refer to as selective aMD. Selective aMD has the advantage that by only accelerating the degrees of freedom most important to sampling, lower overall boosts may be utilized to achieve a similar acceleration level, thus resulting in improved reweighting statistics. The idea of enhancing sampling along a user-defined manifold has been previously shown to improve the calculation of time-correlation functions for kinetics of multidimensional systems.(30)
Free energy perturbation (FEP) is a well-established technique used in free-energy calculations, specifically in the case of ligand binding and computational alchemy.31,13 In FEP, a nonphysical energy pathway is constructed between two physical end states, for example, a ligand bound in the active site of an enzyme (which we denote as λ = 0) and an active site without the ligand (λ = 1). The path between these two states is divided into a series of “windows” in which the Hamiltonian is transformed from state 0 to 1. Traditionally, free energy differences between successive windows are estimated by exponentially averaging the instantaneous work of going between the states, and the overall free energy is a sum of free energy differences between windows.(32) Shirts et al. showed that the Bennett acceptance ratio (BAR) was superior to exponential averaging in producing asymptotically unbiased free energy estimates between two states that could improve precision by an order of magnitude.27,33
For a series of work functions between two states in which individual works do not each have the same weight (as in aMD), the derivation of a weighted BAR follows that in Shirts et al. with the exception that their eqs 5 and 6, the probability of a single measurement of the work Wi for the forward and reverse work functions, are modified to
where the constant M is redefined as
Therefore, the value of ΔF that solves
is the optimal free energy estimate between adjacent windows. It has recently been shown that reweighting of states in BAR to account for non-Boltzmann sampling may have practical advantages outside of aMD simulations.(34)
MD simulations were performed with the MD package Desmond (version 2.2) developed by D. E. Shaw Research.(35) Both systems were built, solvated, and ionized with Schrödinger’s Maestro modeling suite such that there was a minimum of 12 Å of TIP3P(36) water buffer between the macromolecule and the periodic boundary and an ionic concentration of ~150 mM NaCl was present. The CHARMM22 force field with the CMAP correction was utilized (except where noted below in the neuraminidase calculations).(37) Following 10000 steps of minimization, systems were continuously heated to 300 K over 1.5 ps. All simulations used the Martyna−Tobias−Klein constant pressure and temperature algorithms (a combination of Nosè−Hoover constant temperature and piston constant pressure algorithms)38,39 with a reference temperature and pressure of 300 K and 1.01425 bar, respectively. Short-range nonbonded interactions were truncated at 12 Å, while long-range electrostatics were calculated with a particle-mesh Ewald algorithm using a sixth-order B-spline for interpolation and a grid spacing of <1 Å in each dimension.(40) A time step of 2 fs was employed, and the M-SHAKE algorithm was used for constraining all hydrogen-containing bonds.(41) A plugin was written for Desmond to perform aMD calculations on specified dihedrals.
An alanine dipeptide molecule based on a model compound was solvated in a (27 Å)3 box using Maestro and equilibrated for 5 ns following heating. Two sets of 50 ns aMD trajectories were run, one in which all dihedrals were accelerated with aMD and one in which only the two dihedrals defined as ϕ and ψ were accelerated (selective aMD, see Figure Figure1).1). For each setup, 16 simulations spanning the parameter space of E and α were run to optimize these parameters. For the all-dihedral simulations, parameters of E = 8 and α = 4 had an optimal fit to our metric χ (as discussed below), whereas for the selective aMD E = 1 and α = 0.75 were optimal. Additionally, a 250 ns classical MD simulation was performed.
To determine the accuracy of the aMD results, well-tempered metadynamics was performed to calculate the underlying two-dimensional free energy landscape in ϕ/ψ space.(42) In well-tempered metadynamics, the height of a Gaussian centered at position x is proportional to the Boltzmann weight of the metadynamics potential already present at x, that is the added Gaussian has a maximum value of ω0·exp (− Vt(x)/kΔT), with ω0 being the initial Gaussian height, −Vt(x) the metadynamics potential at x, and kΔT a user-defined energy which limits the explored energy range. We performed a 50 ns simulation in which gaussians were added every 0.2 ps with a width of 0.1 radians and a height determined by ω0 = 0.02 kcal/mol and kΔT = 2.4 kcal/mol. To quantitate convergence we define a metric χ as follows
For a given well (defined as i in Figure Figure2f)2f) this metric calculates the ratio of the population of states below the energy U (Pi,UaMD) with that expected from the metadynamics results (Pi,Uexact) and averages this over M energy values of U from 0.6 to 3 kcal/mol (in increments of 0.1 kcal/mol) for each of the N = 3 wells. If a well has a population greater than that expected from the metadynamics result, the inverse of the ratio is taken to equally account for over and under sampling of the well. This metric has several advantageous features: by averaging over multiple energy levels it selects for smooth population densities (as are observed in cMD and metadynamics but may not result when the trajectory is reweighted with aMD), it treats over and under sampling a well as equally poor, and it equally weights all three of the main energy wells. χ = 1 is considered an ideal reproduction of the exact population.
A monomer of neuraminidase bound to the inhibitor oseltamivir taken from the 2HU0 crystal structure was solvated in an approximately (70 Å)3 box. Following the minimization and heating protocol outlined above, the system was equilibrated for 5 ns with the AMBER99SB force field(43) before protein and water parameters were changed to the CHARMM22 force field and further equilibrated for 1 ns. Parameters for oseltamivir were previously developed for use with the AMBER99SB force field(44) and maintained throughout the CHARMM simulations. The switch to the CHARMM force field was performed after testing of aMD free energy calculations revealed that increased acceleration levels in the AMBER force field tended to disturb the electrostatic components of the free energy calculation; therefore, this hybrid AMBER (for the ligand) and CHARMM (for the remainder of the system) force field was utilized. While the authors concede this may produce incorrect absolute binding energies, the goal of this paper is to study convergence of accelerated free energy calculations to results obtained from unaccelerated (and longer) calculations.
Alchemical free energy calculations for the decoupling of the ligand in the protein’s active site were performed using 21 windows in which the electrostatics were decoupled over 10 windows followed by the Lennard−Jones interactions decoupled over 10 windows with a softcore potential using α = 0.5(45) (as in a previous study(46)). Free energies were calculated using BAR, with the modified BAR formulas described above used in the aMD calculations. A positional restraint of .8 kcal/mol was placed on a central carbon atom in oseltamivir to prevent the ligand from sampling nonactive site portions of the simulation box,(47) and calculations were performed for cMD, all-dihedral aMD (with E = 2600 and α = 400), and selective aMD (with E = 13 and α = 2). For cMD calculations, three sets of windows were run (as has been shown to improve calculated free energies(48)), each with the same initial coordinates but different velocities for 5 ns per window, whereas for the selective aMD the same three sets of windows were run with 200 ps of all-dihedral aMD (to quickly equilibrate the whole protein) followed by 1.5 ns of selective aMD per window. Note that the times indicated in the text include the simulation time spent in all-dihedral aMD; thus, a time of 500 ps represents 200 ps of all-dihedral and 300 ps of selective aMD. One set of FEP calculations was performed for the all-dihedral case for 1.75 ns window to illustrate the futility of using standard aMD in large-scale biomolecular FEP calculations.
The choice of dihedrals to accelerate was based on previous work in which the tetramer was simulated for 100 ns.(46) Acceleration was applied to those dihedrals which contained only heavy atoms, were in residues that had a heavy atom within 5 Å of the oseltamivir in the crystal structure, and had a multivariate distribution for a total of 29 dihedrals.
Work functions were decorrelated based on the statistical inefficiency using code provided by Shirts and Chodera.(49) For each BAR calculation, a bootstrap analysis was performed (with 50 independent calculations) for an error σB, which was combined with the variance of the three means (σV) to calculate an overall error estimate for the free energy by
The free energy landscape of alanine dipeptide can be described by the rotation of two torsional angles, ϕ and ψ, making it an ideal model system for methodological development that has been extensively studied (Figure (Figure1).1). The metadynamics, or “infinite sampling limit”, results (Figure (Figure2a)2a) show three distinct energy wells which we label for further discussion in Figure Figure2f.2f. The energy barrier between wells 1 and 2 is relatively low, and classical MD (cMD) simulations sample both sets of configurations on the 10 ns time scale (Figure (Figure2b).2b). Well 3 is substantially oversampled, which we attribute to the system becoming trapped in this state due to the higher energy barrier, thereby discouraging transitions to and from well 3. With 50 ns of simulation the sampling of all three wells are improved (Figure (Figure2c).2c). Further detail is shown in Table Table1,1, which compares the minimum energy of each well and probability of all states within 1.8 kcal/mol of that minimum to the theoretically exact answer derived from metadynamics. The 10 ns cMD shows good agreement for well 2 (it is identified as the global minimum and the population at 3kT is nearly identical to the metadynamics results); however, well 1 is undersampled by 39% whereas well 3 is oversampled by 292%, and the minimum energies are incorrect by .4 and .7 kcal/mol, respectively. With 50 ns of simulation time the sampling improves such that errors in the populations range from 11% to 20%, and by 250 ns of sampling the statistics agree much better with the metadynamics results for all three wells, although the populations of wells 1 and 3 are still off by >10%.
For comparison, the all-dihedral and selective aMD simulations sampled all three energy wells on the 10 ns time scale (Figure (Figure2d2d and and2e).2e). Free energy statistics indicate a maximum error in the minimum well energy estimate of 0.24 kcal/mol in the all-dihedral case and 0.06 kcal/mol in the selective aMD simulation, whereas the greatest disagreement in well populations was an undersampling of well 1 by 17% in the all-dihedral aMD and an undersampling of well 3 by 16% in the selective aMD. Extension of the simulations to 50 ns results in further improved statistics of the well populations, with well 1 only being undersampled by 8% in the all-dihedral simulation and by 6% in the selective aMD.
To further examine the convergence of the free energy statistics we defined the parameter χ to quantitate the difference in the two-dimensional energy profiles (as discussed in the methods) which has the property of a value of 1 representing ideal sampling of the wells as compared to the metadynamics results. In Figure Figure33 we compare the time course of this parameter between the cMD and the aMD simulations (note the different time scales for the two sets of simulations). The scores for both aMD are similar to those for cMD simulations of five times the length, with cMD simulations requiring 200 ns before consistently having values above 0.9, whereas the aMD simulations pass this value at 44 and 35 ns for all-dihedral and selective aMD, respectively.
A comparison of the boosts applied throughout the aMD simulations (Figure (Figure4a)4a) shows that higher boost potentials are applied throughout the all-dihedral simulation than in the selective aMD. For the all-dihedral case, ΔV varies from 0 to 4.2 kcal/mol, with an average value of 0.45 kcal/mol, whereas the selective aMD has a range of 0 to 0.54 kcal/mol with an average of 0.11 kcal/mol. This decrease in the applied boosts has a significant impact on reweighting as the maximum weight relative to an unaccelerated state is reduced from 1097 to 2.46. In Figure Figure4b4b the amount of total weight recovered from a simulation is plotted against the percentage of frames that contribute to that weight. In the case of all-dihedral aMD, very few frames contribute a substantial portion of the reweighting, 50% of the total weight comes from 4.8% of the trajectory, whereas 90% of the weight comes from 53.7% of the trajectory. This results in almost one-half the sampling (46.3%) contributing very little (less than 10%) to the calculated ensemble averages. In the selective aMD case the lower boosts result in more uniform weights, 50% of the weight comes from 37.9% of the trajectory and 90% from 87.6% of the trajectory (for comparison, in cMD configurations in the trajectory are uniformly weighted, so 50% of the weight comes from 50% of the trajectory). This increased reweighting efficiency improves the recovered statistics as high-boost configurations tend to dominate the ensemble average. For example, wells 1 and 2 appear significantly smoother in the energy landscape of the selective aMD relative to the all-dihedral aMD (Figure (Figure2e2e and and2d)2d) and χ is consistently higher in the selective case, both of which can be attributed to smoother statistics in the reweighted energy profiles.
The binding of oseltamivir to neuraminidase is an example of how highly accurate free energy calculations may be employed in the study and development of novel pharmaceutical compounds. In order to validate our aMD simulations of the decoupling of oseltamivir in the N1 active site, we performed extensive sampling at each step of the alchemical transformation with 5 ns of cMD simulation for each of the 21 λ values, which was repeated three times (with different initial velocities). An equilibration period is typically discarded from the BAR calculations, the length of which is determined by several factors, including molecular rigidity, the slow motions of loops which are relevant to free energy differences, and the amount of computational time available.(50) In Figure Figure5a5a we show the time evolution of the mean free energies for three equilibration times, 1000, 2000, and 3000 ps, along with the associated errors. The 3000 ps of equilibration curve is the only one which does not have a mean that changes appreciably with increased sampling, suggesting that for this initial configuration of this complex a full 3 ns per window is required for each of the λ windows to equilibrate to their new Hamiltonians. The free energy of 66.9 ± 1.2 kcal/mol is calculated by using 3 ns for equilibration and the remaining 2 ns for sampling, as shown in Table Table2.2. Note that this is not the free energy of binding in solution; rather it is only one leg of the thermodynamic cycle required for that calculation. Examination of individual BAR runs shows that increased sampling decreases both the variance between the three runs and the bootstrap errors associated with each (Figure S1, Supporting Information).
BAR results from simulations using selective aMD (with the first 200 ps utilizing all-dihedral aMD) are presented in Figure Figure5b.5b. As in the cMD calculations, we have chosen three equilibration times, 250, 500, and 750 ps, and the mean value of the free energy does not remain constant with increased sampling unless the longest of these equilibration times is discarded. With 750 ps of equilibration and 750 ps of sampling we obtain a free energy value of 67.3 ± 1.5 kcal/mol, identical (within error) to that from the longer cMD calculations. Results from individual aMD runs are shown in Figure S2, Supporting Information. A comparison of the time evolutions of the BAR results shows similar behavior for the aMD and cMD free energies (with the aMD on shorter time scales) for the short, medium, and long equilibration times (Table (Table22 and Figure S3, Supporting Information). For short equilibration periods (cMD, 1000 ps; aMD, 250 ps) the free energy is initially overestimated, and while it approaches the values obtained for longer equilibration, the bias introduced in this nonequilibrated period results in free energies ~1.5 kcal/mol too high. Medium length equilibration periods (cMD, 200 ps; aMD, 500 ps) suffer from this effect as well but are not quite as biased, whereas for long equilibration times (cMD, 3000 ps; aMD, 750 ps) the calculated free energies remain stable (within error) with increased sampling.
As a comparison, we also performed BAR calculations on a single set of windows run for 1750 ps with all-dihedral aMD (Figure S4, Supporting Information). The much larger range of weights resulted in very few configurations contributing to the BAR results and poor free energy estimates. For example, with 250 ps of sampling and 1250 ps of equilibration a free energy of −15.0 ± 17.0 kcal/mol was computed, whereas 750 ps of equilibration and 750 ps of sampling resulted in a free energy of decoupling of 36.9 ± 17.3 kcal/mol. Both of these values are wildly inaccurate compared to the values of 66.9 ± 1.2 and 67.3 ± 1.5 kcal·mol obtained from cMD and selectively applied aMD.
The method of accelerated molecular dynamics has been well established as a means of enhancing phase space sampling with minimal computational cost; however, the exponential reweighting required for the recovery of ensemble averages in the unaccelerated case introduces excessive noise such that it is often difficult, if not impossible, to recover accurate ensemble averages. Even in the case of the well-studied system alanine dipeptide this becomes evident. For example, in Figure Figure2d2d the free energy profile of well 1 appears discontinuous in the region of (ϕ, ψ) = (−50,150), due in large part to the fact that several of the conformations visited have weights 1−3 orders of magnitude below that of the maximum weighted conformation. In contrast, by limiting the acceleration to only those dihedrals which are most pertinent to phase space sampling (in this case, ϕ and ψ) the maximum weight was reduced from 1097 to 2.46, which not only increased the smoothness of the recovered free energy profile (Figure (Figure2e)2e) but also moderately improved the agreement to the infinite sampling limit (as calculated with metadynamics) results in Table Table11 and Figure Figure3.3. However, both aMD formalisms showed approximately a factor of 5-fold increase in efficiency relative to cMD for our order parameter χ.
Extension of this idea to the pharmaceutically relevant case of oseltamivir binding to neuraminidase shows the expensive free energy calculation of the decoupling of the ligand in the protein’s active site may be reduced by up to 70% (from 5 to 1.5 ns/window) without a loss in precision. While this may not be crucial in the case of studying the binding of only a single ligand to a protein, one could imagine that in the lead optimization stage of a drug-design effort, when highly accurate binding energies are necessary, the ability to examine three times the number of possible compounds at little extra computational cost may be highly desirable. In the case presented here, the accelerated dihedrals were chosen based upon extensive prior MD simulations; however, if this data were not available one could choose the accelerated dihedrals by residue type (accelerating dihedrals in residues with highly mobile side chains such as arginine and not accelerating dihedrals in aromatic rings), atom types (non-hydrogen containing), and proximity to the ligand. Additionally, in some cases where the ligand is bulky and has multiple torsions with high-energy barriers between local minima, one could accelerate dihedrals in the ligand molecules themselves, as was done in the case of cyclophilin.(23)
Selective aMD may easily be incorporated into other free energy algorithms. For example, the methods of one-step perturbation and envelope distribution sampling provide techniques for effectively calculating the binding of several ligands to a protein with a single extended MD simulation.51,52 They have, however, not been extensively utilized due to the fact that, depending on the system being studied, they may require simulations on the microseconds time scale.(53) Therefore, a reduction of the computational cost of 3- to 5-fold (as observed in this study) could reduce the necessary simulation length from the highly expensive 1−2 μs range into the manageable 200 ns time scale. These methods highlight that sampling of the partition function is, in general, a slow process requiring extensive calculations; therefore, methods such as selective aMD, which can enhance sampling of the relevant portions of phase space while not introducing excessive noise into the calculations, may prove useful in future applications.
We extend our thanks to M. Lawrenz, M. Fajer, P. Gasper, and Y. Wang for valuable discussion concerning the work presented here. The project described was supported by Award Number F32GM093581 from the National Institute of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Additional support has been provided by the NSF, NIH, HHMI, CTBP, NBCR, and the NSF Supercomputer Centers.
National Institutes of Health, United States
Plots of the free energies for oseltamivir decoupling as a function of sampling time for individual cMD and aMD simulations; comparisons between these cMD and aMD runs; and free energies calculated from a single all-dihedral aMD simulations. This material is available free of charge via the Internet at http://pubs.acs.org.