|Home | About | Journals | Submit | Contact Us | Français|
We present a general conformational-energy fitting procedure based on Monte Carlo simulated annealing (MCSA) for application in the development of molecular mechanics force fields. Starting with a target potential energy surface and an unparameterized molecular mechanics potential energy surface, an optimized set of either dihedral or grid-based correction map (CMAP) parameters is produced that minimizes the root mean squared error (RMSE) between the parameterized and targeted energies. The fitting is done using an MCSA search in parameter space and consistently converges to the same RMSE irrespective of the randomized parameters used to seed the search. Any number of dihedral parameters can be simultaneously parameterized, allowing for fitting to multi-dimensional potential energy scans. Fitting options for dihedral parameters include non-uniform weighting of the target data, constraining multiple optimized parameters to the same value, constraining parameters to be no greater than a user-specified maximum value, including all or only a subset of multiplicities defining the dihedral Fourier series, and optimization of phase angles in addition to force constants. The dihedral parameter fitting algorithm’s performance is characterized through multi-dimensional fitting of cyclohexane, tetrahydropyran, and hexopyranose monosaccharide energetics, with the latter case having a 30-dimensional parameter space. The CMAP fitting is applied in the context of polypeptides, and is used to develop a parameterization that simultaneously captures the , ψ energetics of the alanine dipeptide and the alanine tetrapeptide. Because the dihedral energy term is common to many force fields, we have implemented the dihedral-fitting algorithm in the portable Python scripting language and have made it freely available as Supplementary Material.
The continued increase in the speed of computers and the ease-of-use of computational chemistry software has led to the widespread application of computational methods to chemical and biological problems. Molecular mechanics (MM) force fields are now a routinely-used computational tool in the study of biological systems such are proteins, nucleic acids, carbohydrates, and lipids, and highly-optimized force-fields are available for these types of systems . However, force fields for condensed-phase simulations of small-molecules of medicinal interest lag behind, primarily because the wide span of chemical space requires that a very large number of parameters need to be developed in order to support the simulation of arbitrary chemical entities of medicinal interest. Whereas the nonbonded parameters for small molecules can often be well-assigned either by manual inspection and analogy to previously-parameterized molecules or in an automated fashion [2–4], the bonded parameters can pose more difficulty as the probability of having an unparameterized connectivity in the molecule of interest becomes successively larger for bonds (two consecutive atoms), valence angles (three consecutive atoms), and dihedral or torsion angles (four consecutive atoms).
The ease-of-use of many quantum mechanical (QM) software packages and the wide availability of computing power has made QM geometry optimization accessible to most interested researchers, and missing bond and angle parameters can be quickly developed by taking parameters for a similar connectivity and manually adjusting them to match the QM data. However, the development of dihedral parameters to match QM conformational scans is a more difficult task because multiple conformational geometries and energies must be simultaneously fit. Additionally, because multi-dimensional QM scans are now tractable and drug-like molecules often contain more than one dihedral degree-of-freedom about rotatable bonds, simultaneous fitting of multiple dihedral parameters becomes a desirable goal. Such a task would benefit from a general, well-characterized, and easy-to-use automated dihedral parameterization software, such as the one we present here. The methodology we describe and characterize includes conformational energy fitting to the CMAP grid-based cross-map energy term [5, 6], which was originally introduced in the context of the CHARMM protein force field  to further refine the energetics of the protein polypeptide backbone (i.e. , ψ or Ramachandran energy surface) relative to using only dihedral energies.
The dihedral angle energy of a molecule in a given conformation, Edihedral, in a molecular mechanics (MM) force-field representation is commonly determined by
The dihedral angle χj is defined by the bonded sequence of atoms 1-2-3-4, and the sum over multiplicities n is a Fourier series with coefficients of Kj,n and phase angles σj,n. Thus, in principle, it is possible to reproduce any periodic function, and therefore any rotational energy profile, using Equation 1.
Typical target data for fitting are the energies from QM adiabatic potential energy scans. For a hypothetical four-atom molecule with connectivity 1-2-3-4, a relaxed potential energy scan would be done for rotation about bond 2–3 in the QM representation. The same scan would then be done in the MM representation using all force field terms (bonds, angles, van der Waals, electrostatics, etc.) but with the Kj,n ’s of the dihedrals being optimized set to 0, and a difference potential would be calculated by subtraction of the QM and MM energy profiles. This difference potential would be fit exactly using Equation 1 and the resulting Kj,n ’s and σj,n ’s would be the dihedral parameters.
In practice, the series in Equation 1 is often truncated at n = 3, reflecting the 3-fold nature of the energy profile for rotation around the bond connecting two sp3-hybridized centers. The n = 1 and n = 2 terms are useful for reproducing local minima and barriers of different heights, as well as for capturing the energetics of scans in which one or both of the central two atoms is sp2-hybridized. Additionally, the n = 6 term can be useful in the case where the two central atoms are sp2- and sp3-hybridized, respectively, and the former has two and the latter has three identical substituents, as in e.g. benzyl sulfonate. Also in practice, the phase angle σj,n is typically constrained to be either 0° or 180°. This preserves the symmetry of the function about χj = 0, which in turn ensures that a molecule and its mirror image have the exact same dihedral energy for the same Kj,n ’s and σj,n ’s. A final comment about Equation 1 is that changing the sign of Kj,n yields the same-shaped curve as changing σj,n from 0° to 180°. Thus, a common convention in force-field parameters is to constrain all Kj,n ’s to be non-negative and allow for σj,n values of both 0° and 180°.
The truncation of the dihedral Fourier series and the constraint of symmetry about χj = 0 mean that arbitrary difference-potentials cannot be fit exactly. In place of an exact fit, a target function that measures the difference between the QM and MM energies is optimized. A common target function is the root-mean squared error between these energies, RMSE,
where the sum is over all conformations i of the molecule in the scan, is the QM energy of conformation i, is the total MM energy, including the energy of the dihedrals for which the parameters are being optimized (Equation 1), and c is a constant that vertically aligns the data as the optimization proceeds and is defined by
Any number of methods can be used to optimize RMSE. In the case where only the Kj,n ’s and not the σj,n ’s are to be fit, the problem can be expressed as a linear system of equations, with the Kj,n ’s the coefficients to be solved for. In such a case, this “least squares” approach is the most direct method and gives the optimal solution, and has been applied to dipeptide conformational energetics . Other possibilities, which allow for fitting of the σj,n’s, include systematic searching of parameter space , self-consistent iteration , genetic algorithms [9, 11], and the use of simplex, Fletcher-Powell, and Newton-Raphson minimizers [12, 13].
Here we present details of a general dihedral parameter fitting algorithm that uses Metropolis Monte Carlo  to optimize RMSE. In addition to being an efficient way of searching parameter space, the Monte Carlo method allows for the introduction of various additional options into the optimization process. One such option is a constraint on the maximum value of the Kj,n ’s so as to produce physically reasonable parameters. Another is the equivalencing of Kj,n ’s for two or more different values of j. Equivalencing is useful in the case of a linear molecule like 1-2-3-4-5 in which a two-dimensional scan consisting of rotations about bonds 2–3 and 3–4 is used as target data and one wishes to enforce K1–2–3–4,n = K2–3–4–5,n. Weighting of different conformations can be done by extending Equation 2 to
where wi is a weight factor for conformation i and can, for example, be used to favor more accurate fitting of low-energy conformations while sacrificing the fit of high-energy ones. Simultaneous fitting of multiple Kj,n ’s to data having a multiple number of dimensions is readily done and, as shown subsequently, RMSE convergence is achievable even for very-high dimensionality. Finally, the phase angles (σj,n) can be allowed to vary as part of the optimization process.
The above approach can also be extended to the parameterization of the grid-based cross map term (CMAP) employed in the CHARMM protein force field to accurately reproduce the conformational energetics of the polypeptide backbone [5, 6]. The CMAP energy is a function of two dihedral angles simultaneously. In the case of the polypeptide backbone, the CMAP energy is a function of the backbone dihedrals and ψ. The CMAP parameters are simply the difference potential energies between the QM and MM dipeptide surfaces calculated at 15° increments of and ψ, and an interpolation function is used for calculating the CMAP energies for off-grid /ψ values. This approach can almost exactly reproduce the target QM surface for the alanine dipeptide. However, exact reproduction of all energies is not possible if, for example, in addition to the dipeptide conformational energies, tetrapeptide conformational energies are also targeted.
In order to include both dipeptide and tetrapeptide conformational energies in the MCSA fitting, the quantity RMSECMAP is targeted:
Here, RMSEdipeptide and RMSEtetrapeptide are defined independently by Equation 4 for the dipeptide and tetrapeptide data. Importantly, the constant c in Equation 4 varies independently for the two sets of data. The weight factors wdipeptide and wtetrapeptide can be chosen to bias the fit toward either the dipeptide or tetrapeptide data.
QM energies were at the MP2/cc-pVTZ//MP2/6–31G(d) level for pyranose monosaccharides and at the MP2/cc-pVTZ level for cyclohexane and tetrahydropyran [15–17]. Alanine dipeptide and tetrapeptide energies were at the RI-MP2/cc-pVTZ//MP2/6–31G(d) level [18–20]. MP2 and RI-MP2 data used in dihedral parameter fitting were from relaxed potential energy scans calculated using the Gaussian03  and Q-Chem  software packages, respectively. MM energies were those from MM-optimized geometries (gradient < 10−3 kcal*mol−1*Å−1) with an infinite nonbonded cutoff and harmonic restraints with a force constants of 1000 kcal*mol−1*degree−2 on all dihedral angles that have parameters to be fit, and were computed using the CHARMM software  and the steepest descent  and conjugate gradient  optimizers implemented therein. Parameters for the MM calculations were the CHARMM22 all-atom protein set , additive CHARMM parameters for cyclohexane and ethers , and parameters under development for the monosaccharides (parameter set “combo*” in ).
Input for the dihedral fitting program consists of a file containing the QM energies, another file containing the MM energies calculated with the dihedral parameters to be fit set to 0, and a separate file for each unparameterized dihedral angle containing the values of that dihedral. The list of data in each file must represent the same ordering of conformations, and the first line in each of the dihedral angle files contains the four atom-types for that dihedral. Based on these atom types, the program automatically equivalences the parameters for all dihedrals that consist of the same atoms types. Thus, in fitting a two-dimensional scan of the C1C2C3C4 and C2C3C4C5 dihedrals in n-pentane, the parameters of the two dihedrals would automatically be constrained to be the same if the atom types were specified such that C1 = C5 and C2 = C3 = C4. The user can choose to further equivalence any other dihedrals even though they may have different atom types, and also decide what multiplicities (n in Equation 1) should be used for each equivalenced group. If non-uniform weighting of the conformations is desired, a file containing the weight-factor wi for each conformation i is read prior to starting the Monte Carlo search, allowing for example the application of Boltzmann weighting to all points. Two possible temperature schemes are available for the Monte Carlo procedure, a constant-temperature or a simulated-annealing  protocol with an exponential-cooling schedule of
where T0 is the starting temperature, m is the current Monte Carlo step number, mmax is the maximum number of Monte Carlo steps, and Tm is the temperature at step m. The difference in energy, ΔE, between step m and step m−1 used in the Metropolis exchange criterion is
The implementation of the dihedral fitting, called “fit_dihedral”, is in the Python scripting language (www.python.org) and uses only the “math”, “random”, “string”, and “sys” libraries, which are a standard part of the Python distribution. fit_dihedral is freely available through the internet at http://www.pharmacy.umaryland.edu/faculty/amackere/.
The CMAP fitting also is implemented in Python, but writes out a new CMAP parameter file after each Monte Carlo step and calls the CHARMM program to calculate the energy, in contrast to fit_dihedral, in which Equation 1 is implemented directly in Python and makes no calls to external programs.
Paramaterization of the ring C-C-C-C dihedrals in cyclohexane is an illustrative example of the automatic equivalencing of dihedral parameters. The energy surface for rotation about the χ1 dihedral is complicated due to a barrier-crossing at χ1= −15° as the molecule goes from the global energy-minimum boat conformation to the local energy-minimum chair conformation. During the χ1 = C1C2C3C4 scan from −100° to +100°, the five other C-C-C-C χ dihedrals also undergo changes in value. In the case of χ2 and χ6, this change spans over 100° as the molecule goes from a twist-boat conformation to a chair conformation (Figure 1a). Because each of the six χ dihedrals is composed of the same atom types, the algorithm automatically constrains the Kj,n ’s to be the same, where j runs from 1 to 6. Using only the n = 3 multiplicity and constraining Kj,3 to be in the range [−3:3] kcal/mol, five 5000-step simulated-annealing runs were seeded with random K values in this range. Since only a single multiplicity is used and equivalencing is in effect, only a single parameter K is being varied to optimize RMSE as a function of K and the values of the six simultaneously changing χ dihedrals. From a starting temperature of T0 = 1000 K, all five runs converge to the same parameter value of 0.19 kcal/mol with a phase angle of 180° and reduce the RMSE of 0.53 kcal/mol for K = 0 kcal/mol to an RMSE of 0.38 kcal/mol for K = 0.19 kcal/mol, resulting in correction of the barrier height and chair conformation (χ1 = 60°) energies, which were both too high by nearly 1 kcal/mol prior to parametrization (Figure 1b).
Tetrahydropyran, in which one of cyclohexane’s methylene groups is replaced by a ring ether, presents a case of simultaneous fitting. Taking the C-C-C-C dihedral parameter from cyclohexane leaves two pairs of equivalenced dihedrals, C1O1C5C4/C5O1C1C2 (χ1 /χ3) and O1C1C2C3/O1C5C5C3 (χ2/χ4), to be fit. QM scans of χ1 and of χ2 show that, like cyclohexane, the other χ values in the system vary simultaneously as these two are scanned (Figure 2). Inputting χ1, χ2, χ3, and χ4 leads to automatic equivalencing of χ1 to χ3 and χ2 to χ4 based on atom type, and the corresponding K’s are simultaneously optimized. Using the same optimization protocol as for cyclohexane leads to convergence to the same RMSE and nearly-identical K values in each of five annealing runs seeded with randomized K’s. The final optimized values in the five runs are K1,3 = K3,3 = 0.19, 0.20, 0.21, 0.20, and 0.19 kcal/mol and K2,3 = K4,3 = 0.33, 0.31, 0.33, 0.31, and 0.31 kcal/mol, and the respective phase angles are 0° and 180°. Using only 3-fold parameters leads to a modest reduction of RMSE from 0.98 kcal/mol to 0.92 kcal/mol, reflecting the good agreement with the target data prior to fitting. There are nonetheless specific conformations that benefit from the optimization process, in particular conformations with χ1 = 40, 50, χ2 = 10, 20, and χC1C2C3C4 = −50, −40, −30, which in the respective scans of these dihedrals come to match the QM energies post-optimization, compared to over-estimation of these energies by up to 1.1 kcal/mol prior to optimization (Figure 3, Table 1).
Fragment-based approaches to parameter development divide the molecule of interest into smaller fragments, thereby reducing the number of atoms as well as the number of dihedral degrees-of-freedom and making QM relaxed potential energy scans tractable. However, increasing computer power has made direct QM scans of many-atom molecules with multiple dihedral degrees-of-freedom possible. Dihedral parameters derived from these more complicated scans are preferable to relying on the transferability of dihedral parameters from smaller fragments since dihedral parameters are critically important to the conformational energetics of flexible molecules.
An illustrative case of the importance of QM scans of the complete molecule vs. smaller fragments is the diastereomers of the hexopyranose form of glucose. A chirality change at C1 converts α-D-glucose to β-D-glucose, while similar changes at the C2, C3, and C4 positions yield α-D-mannose, α-D-allose, and α-D-galactose, respectively (Figure 4). These changes place the hydroxyl groups in differing local chemical environments, which cannot be captured using a fragment-based approach, for example by using cyclohexanol as the model compound due to the extensive number of intramolecular hydrogen bonds between hydroxyls in the sugar. Additionally, rotation of the O5C5C6O6 dihedral and the C6 hydroxyl allow for hydrogen-bonding of this “exocyclic” hydroxyl with the O5 ring ether or the C4 hydroxyl, posing a further complication to a fragment-based approach.
To characterize the MCSA fitting algorithm, we apply it to fitting the energetics of 1860 hexopyranose conformations comprising hydroxyl, exocyclic group, and ring deformation scans of a variety of glucopyranose diastereomers (Table 2). The resultant parameters will be applicable to the various diastereomers so that, for example, glucose, galactose, and mannose will have the identical parameter set. Transferring dihedral parameters from alkanes, tetrahydropyran, and ethylene still leaves undetermined the parameters for 13 hexopyranose dihedrals (hydroxyl rotation: H1O1C1C2, H1O1C1O5, H2O2C2C1, H2O2C2C3, H3O3C3C2, H3O3C3C4, H4O4C4C3, H4O4C4C5, C5C6O6H6; ring deformation: O5C1C2O2, O1C1O5C5, O5C5C4O4; exocyclic-group rotation: O5C5C6O6). Automatic equivalencing based on atom-type (H2O2C2C3 = H3O3C3C2 = H3O3C3C4 = H4O4C4C3) reduces this to 10 unique dihedrals, and allowing for n = 1, 2, 3 multiplicity for each of these means 10 × 3 = 30 dihedral parameters to be simultaneously parameterized. Thus, this example represents an extreme case of fitting in multi-dimensional parameter space.
In contrast to the much simpler cases of cyclohexane and tetrahydropyran, in which the optimal parameters were determined in the first several hundred steps of 5000-step exponential-cooling Monte Carlo runs, the 30-dimensional fit to the pyranose energetics shows much slower convergence behavior. Using exponential cooling (Figure 5a), the maximum number of steps must be set to 50,000 in order to consistently converge to the same RMSE value in each of ten Monte Carlo runs, while runs of 500 or 5000 steps are insufficient (Figure 5c, e). Using a constant-temperature scheme at 35 K (Figure 5b), which yields a Monte Carlo acceptance ratio of 0.2 to 0.3, the results are the same in that consistent convergence is seen only to runs of 50,000 steps (Figure 5d, f). The advantage of the MCSA with exponential cooling as opposed to constant-temperature Monte Carlo is that the user does not have to take a trial-and-error approach to finding a temperature that yields a reasonable acceptance ratio.
While the search problem is much more difficult in this example versus the prior cases, it is nonetheless possible to achieve converged RMSE results when simultaneously fitting 30 dihedral parameters. Another contrast with the simpler systems is that, though converged behavior is achieved with respect to RMSE, the parameters themselves show significant variability, both in the magnitude of each Kj,n as well as whether the associated σj,n is 0° or 180°. For example, in the 10 independent 50,000-step MCSA runs that converge to RMSE values spanning only a 0.09 kcal/mol window (1.74 to 1.83 kcal/mol, Figures 6c and 6e), the n = 3 term for the H1O1C1O5 dihedral takes on parameter values ranging from K = 3.00 kcal/mol, σ = 0° to K = 0.53 kcal/mol, σ = 180° (i.e. K = −0.53 kcal/mol, σ = 0°). Likewise, the n = 1 term for the O1C1O5C5 dihedral takes on parameter values spanning the range K = 2.74 kcal/mol down to K = −1.83 kcal/mol. The complete set of K values for each of the ten runs is listed in Table 3, along with the corresponding RMSE for each run. The standard deviations in the fit parameters K are as large as 1.27 kcal/mol, in stark contrast the standard deviation of 0.03 kcal/mol for the RMSE values of the ten independent runs. Thus, parameter space for such a complicated case is populated with multiple minima in different regions of the parameter space, with each minimum having a near-identical RMSE value.
The data set of hexopyranose conformations is populated mostly with ring conformations in the 4C1 chair form (Figure 4). In order to properly capture the energetics of chair-to-boat conversion, which are influenced by the O5C1C2O2, O1C1O5C5, and O5C5C4O4 dihedral parameters, scans of these ring dihedrals were included in the fit (Table 2, “ring” and “all”). The number of non-chair conformations resulting from these scans is dwarfed by the number of chair conformations from the other scans. As a result, this data set is heavily weighted toward the development of optimized parameters that reproduce chair energetics at the potential expense of boat energetics. This indeed does turn out to be the case in practice for β-D-galactopyranose, where the O5C1C2O2 scan is qualitatively incorrect relative to the QM in the absence of weighting, with the chair and boat conformations being isoenergetic instead of separated by 5 kcal/mol (Figure 6).
The problem of under-represented conformations can be corrected simply by increasing the weight factors wi for these conformations (Equation 4). The choice of weight factors is an empirical task and may take several iterations of choosing different wi’s to get the desired results. In the present example, applying a weight factor of 5 to conformations in the scan with dihedral values of 75° to 150° is sufficient to correct their under-representation and achieve dramatic improvement in chair vs. boat energetics (Figure 6). As with the unweighted fitting, exponential cooling over 50,000 Monte Carlo steps is sufficient to converge the RMSE of ten independent runs. The RMSE of the best unweighted fit, 1.74 kcal/mol, increases negligibly to 1.78 kcal/mol with this weighting scheme. Weight factors must be applied judiciously so as to balance the effect of the increased weighting of some conformations on the energies of the other conformations. If a sizable minority of conformations is given large weight factors, the resultant near-exact fitting of the energetics of their respective conformations will come at the expense of the energetics of the rest of the conformations. For example, Boltzmann weighting based on the target QM energies often yields accurate parameterization of low-energy conformations. However, Boltzmann weighting can also cause inaccurate energies for conformations located at or near high-energy barriers, which in turn compromise the dynamics of the model. In our experience, wi’s of less than or equal to five are typically appropriate for the under-represented conformations, assuming wi’s of unity for the rest of the target data.
The previous two examples involved fitting exclusively the D forms of hexopyranose monosaccharides. Nonetheless, the parameters from those unweighted and weighted fits are transferable to the L forms of these sugars since the phase angles σj,n were constrained to 0°/180° so as to preserve the symmetry of Equation 1 about χj = 0°. It is possible to further refine the parameters by removing this constraint. Of course, the resultant increase in accuracy comes at the expense of decreased transferability of the parameters. In particular, a pair of enantiomers will require unique phase angles σj,n.
Taking the parameter set from the prior unweighted fit to 1860 hexopyranose conformational energies, we reoptimized just the O5C5C6O6 dihedral parameters that determine the energetics of exocylic-group rotation by allowing for −180° ≤ σj,n ≤ 180° and using β-D-galactopyranose O5C5C6O6 scans as target data. The additional degrees-of-freedom afforded by variability in the σj,n’s yields a significant improvement in the force field’s ability to reproduce the QM target data (Figure 7). This 6-dimensional fitting (j = O5C5C6O6, n = 1, 2, 3, both Kj,n and σj,n allowed to vary), like the low-dimension cyclohexane and tetrahydropyran fits, showed convergence both in RMSE as well as the actual values of all of the parameters in multiple 5000-step exponential cooling runs seeded with random parameter values and started at 1000 K.
Breaking the symmetry of Equation 1 by allowing phase-angle variability means that the L enantiomers of these β-D-galactopyranose conformations will have different energies using this parameter set, which is chemically incorrect. Thus, increased accuracy comes at the cost of decreased generality. If possible, it is preferable to improve the fit through the use of non-uniform weighting of conformations instead of removing the constraints on the phase-angle parameters. Nonetheless, there may be particular instances when allowing variable phase angles is desirable, especially in biological systems where often only one enantiomer is found (e.g. amino acids) or is relevant (e.g. chiral drugs). In such instances, the MCSA fitting approach is able to obtain converged optimization of both the σj,n ’s and the Kj,n ’s.
The CHARMM all-atom force field functional form was recently extended so as to better-reproduce the conformational energetics of the polypeptide backbone in proteins. The extension involved the introduction of a new energy term, CMAP, that is a grid-based energy correction map and is a function of the backbone /ψ angles [5, 6]. Just as the dihedral energy term in Equation 1 seeks to reproduce the difference energy between the MM surface with the target dihedrals set to zero and the QM surface as a function of the dihedral angle χ, the CMAP energy term reproduces the difference energy between the QM and MM surfaces as a function of the and ψ angles simultaneously. That is, ECMAP = f (,ψ) where f (,ψ) is constructed by two-dimensional bi-cubic interpolation through grid points located in /ψ space . These grid points are evenly placed at 15° increments of and ψ, and each grid point has associated with it a difference energy. The difference energies at these grid points are the CMAP parameters.
Using the CMAP energy term, it is possible to exactly reproduce any difference energy as a function of /ψ. Thus, in the case of e.g. alanine dipeptide, the entire adiabatic QM /ψ surface can be reproduced by the MM model, which is not the case using only dihedral terms for and ψ, as previously discussed . In practice, based on data from protein crystal simulations, it was found that an empirical adjustment to the exact QM dipeptide surface was required to better-capture the conformational properties of polypeptides .
In an effort to further refine the current CMAP parameterization , we have adapted the described MCSA fitting protocol to simultaneously fit CMAP parameters to alanine dipeptide and alanine tetrapeptide relative conformational energies. The target QM dipeptide and tetrapeptide data were single-point RI-MP2/cc-pVTZ energies calculated from MP2/6-31G(d)-optimized geometries. While the dipeptide data consisted of the entire /ψ surface, the tetrapeptide date consisted of 51 structurally distinct conformations derived by clustering conformations sampled by MM molecular dynamics simulations . For the purposes of fitting, the dipeptide and tetrapeptide data were given equal weighting (wdipeptide = wtetrapeptide in Eq 5), and the CMAP parameters (i.e. offset energies at the grid points) within a ±2 kcal/mol window were sampled. The starting CMAP parameters were those that exactly reproduced the dipeptide surface such that RMSEdipeptide = 0.
The 153 /ψ values sampled in the 51 alanine tertrapeptide conformations are illustrated in Figure 8a and populate all regions of /ψ space observed in protein crystal structures . Figure 8a also shows the CMAP grid points whose parameters were allowed to vary by ±2 kcal/mol during the fitting. To retain the smoothness of the surface, whenever one of these parameters was changed by δE, the parameters of all the adjacent grid points were changed by 0.5*δE. Adjacent grid points not part of the set shown in Figure 8a were allowed to vary at most by ±1 kcal/mol relative to their starting values. With the starting CMAP parameters, RMSEtetrapeptide = 1.53 kcal/mol, which reflects the considerable scatter in the MM tetrapeptide energies relative to the QM target energies and the common occurrence of errors as large as 2 kcal/mol (Figure 8b). In contrast, after MCSA fitting to combined tetrapeptide and dipeptide QM relative energies, the tetrapeptide MM energies are greatly improved, with most errors reduced to less than 0.5 kcal/mol (Figure 8b) and a final RMSEtetrapeptide of 0.57 kcal/mol.
The improvement in the tetrapeptide energies results from three subtle changes to the starting (i.e. exact QM) alanine dipeptide surface (Figures 8c and 8d). First, the local minimum at /ψ = −165°/165° in the extended-backbone region of /ψ space has been shifted slightly to −120°/135°. Second, the local minimum at −150°/30° in the helical region has been raised by ~1 kcal/mol and is no longer a local minimum. And third, the local minimum at 60°/−75° has been increased by less than 1 kcal/mol in energy. Qualitatively, the “before” and “after” dipeptide surfaces remain very similar, and the RMSE of the MCSA fit surface, RMSEdipeptide, is 0.33 kcal/mol compared to a starting value of 0 kcal/mol.
1000 MCSA steps were sufficient to achieve the improvements in the alanine tetrapeptide energies. The RMSECMAP (Eq 5) went from an initial value of 0.77 kcal/mol to a final value of 0.45 kcal/mol. Since changes to a single CMAP parameter only affect conformations with /ψ values very close to that grid point, and therefore lead to small changes in RMSECMAP and hence small ΔE’s (Eq 7), a starting temperature T0 (Eq 6) of 1 K gave good MC acceptance ratios during the annealing. The low-temperature annealing also makes the MCSA more akin to a minimization, which is appropriate given the fact that a single CMAP parameter only affects the energies of conformations nearby in /ψ space. This is in contrast to the dihedral parameter fitting, where changes in a dihedral parameter affect the energies of all conformations, necessitating a higher T0 so as to not get trapped in local RMSE minima while searching parameter space.
We have presented and characterized a Monte Carlo simulated annealing (MCSA) conformational-energy fitting algorithm for use in the development of molecular mechanics force fields. For the fitting of dihedral parameters, the algorithm consistently converges to the same optimized parameters, and therefore to the same value of the target function RMSE (Equation 4), when the number of parameters to be fit is small. In a case of very high dimensionality (30 dihedral parameters to be fit), multiple MCSA runs also converge to a very narrow range of RMSE. However, the parameters that yield near-identical optimized RMSE’s can be qualitatively different, illustrating that there are multiple minima in dihedral parameter space that offer “best fits” to the target data. Extending the algorithm to the fitting of the CHARMM force-field CMAP term shows that the MCSA approach is also an effective way to develop CMAP parameters that find a balance between, in this case, alanine dipeptide and alanine tetrapeptide energetics. This approach to CMAP and multi-dimensional dihedral fitting is expected to prove useful in future applications such as parameterizing the energetics of the nucleic acid backbone, of glycosyl linkages in polysaccharides, and of flexible drug-like small-molecules.
This work was supported by NIH GM070855 and GM051501 (ADM) and F32CA1197712 (OG). The authors wish to thank Professor Carlos Simmerling for sharing alanine tetrapeptide conformations, and to acknowledge generous grants of computer time from the National Cancer Institute Advanced Biomedical Computing Center and Department of Defense High Performance Computing.