Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2864003

Formats

Article sections

- Abstract
- Introduction
- Fitting Approach
- Computational Details
- Results and Discussion
- Conclusions
- Supplementary Material
- References

Authors

Related links

J Mol Model. Author manuscript; available in PMC 2010 May 4.

Published in final edited form as:

Published online 2008 May 6. doi: 10.1007/s00894-008-0305-0

PMCID: PMC2864003

NIHMSID: NIHMS196768

Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, 20 Penn St., HSF II-629, Baltimore, MD 21201, USA

The publisher's final edited version of this article is available at J Mol Model

See other articles in PMC that cite the published article.

We present a general conformational-energy fitting procedure based on Monte Carlo simulated annealing (MCSA) for application in the development of molecular mechanics force fields. Starting with a target potential energy surface and an unparameterized molecular mechanics potential energy surface, an optimized set of either dihedral or grid-based correction map (CMAP) parameters is produced that minimizes the root mean squared error (*RMSE*) between the parameterized and targeted energies. The fitting is done using an MCSA search in parameter space and consistently converges to the same *RMSE* irrespective of the randomized parameters used to seed the search. Any number of dihedral parameters can be simultaneously parameterized, allowing for fitting to multi-dimensional potential energy scans. Fitting options for dihedral parameters include non-uniform weighting of the target data, constraining multiple optimized parameters to the same value, constraining parameters to be no greater than a user-specified maximum value, including all or only a subset of multiplicities defining the dihedral Fourier series, and optimization of phase angles in addition to force constants. The dihedral parameter fitting algorithm’s performance is characterized through multi-dimensional fitting of cyclohexane, tetrahydropyran, and hexopyranose monosaccharide energetics, with the latter case having a 30-dimensional parameter space. The CMAP fitting is applied in the context of polypeptides, and is used to develop a parameterization that simultaneously captures the , ψ energetics of the alanine dipeptide and the alanine tetrapeptide. Because the dihedral energy term is common to many force fields, we have implemented the dihedral-fitting algorithm in the portable Python scripting language and have made it freely available as Supplementary Material.

The continued increase in the speed of computers and the ease-of-use of computational chemistry software has led to the widespread application of computational methods to chemical and biological problems. Molecular mechanics (MM) force fields are now a routinely-used computational tool in the study of biological systems such are proteins, nucleic acids, carbohydrates, and lipids, and highly-optimized force-fields are available for these types of systems [1]. However, force fields for condensed-phase simulations of small-molecules of medicinal interest lag behind, primarily because the wide span of chemical space requires that a very large number of parameters need to be developed in order to support the simulation of arbitrary chemical entities of medicinal interest. Whereas the nonbonded parameters for small molecules can often be well-assigned either by manual inspection and analogy to previously-parameterized molecules or in an automated fashion [2–4], the bonded parameters can pose more difficulty as the probability of having an unparameterized connectivity in the molecule of interest becomes successively larger for bonds (two consecutive atoms), valence angles (three consecutive atoms), and dihedral or torsion angles (four consecutive atoms).

The ease-of-use of many quantum mechanical (QM) software packages and the wide availability of computing power has made QM geometry optimization accessible to most interested researchers, and missing bond and angle parameters can be quickly developed by taking parameters for a similar connectivity and manually adjusting them to match the QM data. However, the development of dihedral parameters to match QM conformational scans is a more difficult task because multiple conformational geometries and energies must be simultaneously fit. Additionally, because multi-dimensional QM scans are now tractable and drug-like molecules often contain more than one dihedral degree-of-freedom about rotatable bonds, simultaneous fitting of multiple dihedral parameters becomes a desirable goal. Such a task would benefit from a general, well-characterized, and easy-to-use automated dihedral parameterization software, such as the one we present here. The methodology we describe and characterize includes conformational energy fitting to the CMAP grid-based cross-map energy term [5, 6], which was originally introduced in the context of the CHARMM protein force field [7] to further refine the energetics of the protein polypeptide backbone (i.e. , ψ or Ramachandran energy surface) relative to using only dihedral energies.

The dihedral angle energy of a molecule in a given conformation, *E*_{dihedral}, in a molecular mechanics (MM) force-field representation is commonly determined by

$${E}_{\text{dihedral}}=\sum _{j}^{\text{dihedrals}}\sum _{n}^{\text{multiplicities}}{K}_{j,n}[1+cos(n{\chi}_{j}-{\sigma}_{j,n})].$$

(1)

The dihedral angle *χ _{j}* is defined by the bonded sequence of atoms

Typical target data for fitting are the energies from QM adiabatic potential energy scans. For a hypothetical four-atom molecule with connectivity *1-2-3-4*, a relaxed potential energy scan would be done for rotation about bond *2–3* in the QM representation. The same scan would then be done in the MM representation using all force field terms (bonds, angles, van der Waals, electrostatics, etc.) but with the *K _{j}*

In practice, the series in Equation 1 is often truncated at *n* = 3, reflecting the 3-fold nature of the energy profile for rotation around the bond connecting two *sp*^{3}-hybridized centers. The *n* = 1 and *n* = 2 terms are useful for reproducing local minima and barriers of different heights, as well as for capturing the energetics of scans in which one or both of the central two atoms is *sp*^{2}-hybridized. Additionally, the *n* = 6 term can be useful in the case where the two central atoms are *sp*^{2}- and *sp*^{3}-hybridized, respectively, and the former has two and the latter has three identical substituents, as in e.g. benzyl sulfonate. Also in practice, the phase angle *σ _{j,n}* is typically constrained to be either 0° or 180°. This preserves the symmetry of the function about

The truncation of the dihedral Fourier series and the constraint of symmetry about *χ _{j}* = 0 mean that arbitrary difference-potentials cannot be fit exactly. In place of an exact fit, a target function that measures the difference between the QM and MM energies is optimized. A common target function is the root-mean squared error between these energies,

$$\mathit{RMSE}=\sqrt{\frac{{\displaystyle \sum _{i}}{({E}_{i}^{\text{QM}}-{E}_{i}^{\text{MM}}+c)}^{2}}{{\displaystyle \sum _{i}}1}},$$

(2)

where the sum is over all conformations *i* of the molecule in the scan,
${E}_{i}^{\text{QM}}$ is the QM energy of conformation *i*,
${E}_{i}^{\text{MM}}$ is the total MM energy, including the energy of the dihedrals for which the parameters are being optimized (Equation 1), and *c* is a constant that vertically aligns the data as the optimization proceeds and is defined by

$$\frac{\partial \mathit{RMSE}}{\partial c}=0.$$

(3)

Any number of methods can be used to optimize *RMSE*. In the case where only the *K _{j,n}* ’s and not the

Here we present details of a general dihedral parameter fitting algorithm that uses Metropolis Monte Carlo [14] to optimize *RMSE*. In addition to being an efficient way of searching parameter space, the Monte Carlo method allows for the introduction of various additional options into the optimization process. One such option is a constraint on the maximum value of the *K _{j}*

$$\mathit{RMSE}=\sqrt{\frac{{\displaystyle \sum _{i}}{w}_{i}{({E}_{i}^{\text{QM}}-{E}_{i}^{\text{MM}}+c)}^{2}}{{\displaystyle \sum _{i}}{w}_{i}}}$$

(4)

where *w _{i}* is a weight factor for conformation

The above approach can also be extended to the parameterization of the grid-based cross map term (CMAP) employed in the CHARMM protein force field to accurately reproduce the conformational energetics of the polypeptide backbone [5, 6]. The CMAP energy is a function of two dihedral angles simultaneously. In the case of the polypeptide backbone, the CMAP energy is a function of the backbone dihedrals and ψ. The CMAP parameters are simply the difference potential energies between the QM and MM dipeptide surfaces calculated at 15° increments of and ψ, and an interpolation function is used for calculating the CMAP energies for off-grid /ψ values. This approach can almost exactly reproduce the target QM surface for the alanine dipeptide. However, exact reproduction of all energies is not possible if, for example, in addition to the dipeptide conformational energies, tetrapeptide conformational energies are also targeted.

In order to include both dipeptide and tetrapeptide conformational energies in the MCSA fitting, the quantity *RMSE*_{CMAP} is targeted:

$${\mathit{RMSE}}_{\mathit{CMAP}}=\frac{({w}_{\mathit{dipeptide}}\ast {\mathit{RMSE}}_{\mathit{dipeptide}})+({w}_{\mathit{tetrapeptide}}\ast {\mathit{RMSE}}_{\mathit{tetrapeptide}})}{{w}_{\mathit{dipeptide}}+{w}_{\mathit{tetrapeptide}}}.$$

(5)

Here, *RMSE _{dipeptide}* and

QM energies were at the MP2/cc-pVTZ//MP2/6–31G(d) level for pyranose monosaccharides and at the MP2/cc-pVTZ level for cyclohexane and tetrahydropyran [15–17]. Alanine dipeptide and tetrapeptide energies were at the RI-MP2/cc-pVTZ//MP2/6–31G(d) level [18–20]. MP2 and RI-MP2 data used in dihedral parameter fitting were from relaxed potential energy scans calculated using the Gaussian03 [21] and Q-Chem [22] software packages, respectively. MM energies were those from MM-optimized geometries (gradient < 10^{−3} kcal*mol^{−1}*Å^{−1}) with an infinite nonbonded cutoff and harmonic restraints with a force constants of 1000 kcal*mol^{−1}*degree^{−2} on all dihedral angles that have parameters to be fit, and were computed using the CHARMM software [23] and the steepest descent [24] and conjugate gradient [25] optimizers implemented therein. Parameters for the MM calculations were the CHARMM22 all-atom protein set [7], additive CHARMM parameters for cyclohexane and ethers [26], and parameters under development for the monosaccharides (parameter set “combo*” in [27]).

Input for the dihedral fitting program consists of a file containing the QM energies, another file containing the MM energies calculated with the dihedral parameters to be fit set to 0, and a separate file for each unparameterized dihedral angle containing the values of that dihedral. The list of data in each file must represent the same ordering of conformations, and the first line in each of the dihedral angle files contains the four atom-types for that dihedral. Based on these atom types, the program automatically equivalences the parameters for all dihedrals that consist of the same atoms types. Thus, in fitting a two-dimensional scan of the C_{1}C_{2}C_{3}C_{4} and C_{2}C_{3}C_{4}C_{5} dihedrals in *n*-pentane, the parameters of the two dihedrals would automatically be constrained to be the same if the atom types were specified such that C_{1} = C_{5} and C_{2} = C_{3} = C_{4}. The user can choose to further equivalence any other dihedrals even though they may have different atom types, and also decide what multiplicities (*n* in Equation 1) should be used for each equivalenced group. If non-uniform weighting of the conformations is desired, a file containing the weight-factor *w _{i}* for each conformation

$${T}_{m}={T}_{0}exp(-4m/{m}_{max}),$$

(6)

where *T*_{0} is the starting temperature, *m* is the current Monte Carlo step number, *m*_{max} is the maximum number of Monte Carlo steps, and *T _{m}* is the temperature at step

$$\mathrm{\Delta}E={\mathit{RMSE}}_{m}-{\mathit{RMSE}}_{m-1}$$

(7)

and the *RMSE* at every step is recalculated using Equation 4 and the constraint Equation 3.

The implementation of the dihedral fitting, called “fit_dihedral”, is in the Python scripting language (www.python.org) and uses only the “math”, “random”, “string”, and “sys” libraries, which are a standard part of the Python distribution. fit_dihedral is freely available through the internet at http://www.pharmacy.umaryland.edu/faculty/amackere/.

The CMAP fitting also is implemented in Python, but writes out a new CMAP parameter file after each Monte Carlo step and calls the CHARMM program to calculate the energy, in contrast to fit_dihedral, in which Equation 1 is implemented directly in Python and makes no calls to external programs.

Paramaterization of the ring C-C-C-C dihedrals in cyclohexane is an illustrative example of the automatic equivalencing of dihedral parameters. The energy surface for rotation about the *χ*_{1} dihedral is complicated due to a barrier-crossing at *χ*_{1}= −15° as the molecule goes from the global energy-minimum boat conformation to the local energy-minimum chair conformation. During the *χ*_{1} = C_{1}C_{2}C_{3}C_{4} scan from −100° to +100°, the five other C-C-C-C *χ* dihedrals also undergo changes in value. In the case of *χ*_{2} and *χ*_{6}, this change spans over 100° as the molecule goes from a twist-boat conformation to a chair conformation (Figure 1a). Because each of the six *χ* dihedrals is composed of the same atom types, the algorithm automatically constrains the *K _{j}*

Tetrahydropyran, in which one of cyclohexane’s methylene groups is replaced by a ring ether, presents a case of simultaneous fitting. Taking the C-C-C-C dihedral parameter from cyclohexane leaves two pairs of equivalenced dihedrals, C_{1}O_{1}C_{5}C_{4}/C_{5}O_{1}C_{1}C_{2} (*χ*_{1} /*χ*_{3}) and O_{1}C_{1}C_{2}C_{3}/O_{1}C_{5}C_{5}C_{3} (*χ*_{2}/*χ*_{4}), to be fit. QM scans of *χ*_{1} and of *χ*_{2} show that, like cyclohexane, the other *χ* values in the system vary simultaneously as these two are scanned (Figure 2). Inputting *χ*_{1}, *χ*_{2}, *χ*_{3}, and *χ*_{4} leads to automatic equivalencing of *χ*_{1} to *χ*_{3} and *χ*_{2} to *χ*_{4} based on atom type, and the corresponding *K*’s are simultaneously optimized. Using the same optimization protocol as for cyclohexane leads to convergence to the same *RMSE* and nearly-identical *K* values in each of five annealing runs seeded with randomized *K*’s. The final optimized values in the five runs are *K*_{1,3} = *K*_{3,3} = 0.19, 0.20, 0.21, 0.20, and 0.19 kcal/mol and *K*_{2,3} = *K*_{4,3} = 0.33, 0.31, 0.33, 0.31, and 0.31 kcal/mol, and the respective phase angles are 0° and 180°. Using only 3-fold parameters leads to a modest reduction of *RMSE* from 0.98 kcal/mol to 0.92 kcal/mol, reflecting the good agreement with the target data prior to fitting. There are nonetheless specific conformations that benefit from the optimization process, in particular conformations with *χ*_{1} = 40, 50, *χ*_{2} = 10, 20, and *χ*_{C1C2C3C4} = −50, −40, −30, which in the respective scans of these dihedrals come to match the QM energies post-optimization, compared to over-estimation of these energies by up to 1.1 kcal/mol prior to optimization (Figure 3, Table 1).

Variation in the tetrahydropyran *χ*_{1}, *χ*_{2}, *χ*_{3}, and *χ*_{4} dihedrals during relaxed potential energy scans of *χ*_{1}(a) and *χ*_{2} (b). *χ*_{1} = C_{1}O_{1}C_{5}C_{4}, *χ*_{2} = O_{1}C_{1}C_{2}C_{3}, *χ*_{3} = C_{5}O_{1}C_{1}C_{2} (*χ*_{1}/*χ* **...**

Tetrahydropyran *χ*_{1} (C_{1}O_{1}C_{5}C_{4}) (a), *χ*_{2} (O_{1}C_{1}C_{2}C_{3}) (b), and *χ*_{C1C2C3C4} (c) relaxed potential energy scans. QM: MP2/cc-pVTZ (+), MM before fit: *K*_{1,3} = *K*_{3,3} = *K*_{2,3} = *K*_{4,3} = 0 kcal/mol (dotted line), MM after fit: *K*_{1,3} = *K*_{3,3} = 0.20 **...**

Fragment-based approaches to parameter development divide the molecule of interest into smaller fragments, thereby reducing the number of atoms as well as the number of dihedral degrees-of-freedom and making QM relaxed potential energy scans tractable. However, increasing computer power has made direct QM scans of many-atom molecules with multiple dihedral degrees-of-freedom possible. Dihedral parameters derived from these more complicated scans are preferable to relying on the transferability of dihedral parameters from smaller fragments since dihedral parameters are critically important to the conformational energetics of flexible molecules.

An illustrative case of the importance of QM scans of the complete molecule vs. smaller fragments is the diastereomers of the hexopyranose form of glucose. A chirality change at C_{1} converts α-*D*-glucose to β-*D*-glucose, while similar changes at the C_{2}, C_{3}, and C_{4} positions yield α-*D*-mannose, α-*D*-allose, and α-*D*-galactose, respectively (Figure 4). These changes place the hydroxyl groups in differing local chemical environments, which cannot be captured using a fragment-based approach, for example by using cyclohexanol as the model compound due to the extensive number of intramolecular hydrogen bonds between hydroxyls in the sugar. Additionally, rotation of the O_{5}C_{5}C_{6}O_{6} dihedral and the C_{6} hydroxyl allow for hydrogen-bonding of this “exocyclic” hydroxyl with the O_{5} ring ether or the C_{4} hydroxyl, posing a further complication to a fragment-based approach.

Epimers of α-*D*-glucose. Starting from the upper-right molecule and going clockwise around α-*D*-glucose (center molecule), a chirality change at C_{1} yields β-*D*-glucose, at C_{2} yields α-*D*-mannose, at C_{3} yields α-*D*-allose, **...**

To characterize the MCSA fitting algorithm, we apply it to fitting the energetics of 1860 hexopyranose conformations comprising hydroxyl, exocyclic group, and ring deformation scans of a variety of glucopyranose diastereomers (Table 2). The resultant parameters will be applicable to the various diastereomers so that, for example, glucose, galactose, and mannose will have the identical parameter set. Transferring dihedral parameters from alkanes, tetrahydropyran, and ethylene still leaves undetermined the parameters for 13 hexopyranose dihedrals (hydroxyl rotation: H_{1}O_{1}C_{1}C_{2}, H_{1}O_{1}C_{1}O_{5}, H_{2}O_{2}C_{2}C_{1}, H_{2}O_{2}C_{2}C_{3}, H_{3}O_{3}C_{3}C_{2}, H_{3}O_{3}C_{3}C_{4}, H_{4}O_{4}C_{4}C_{3}, H_{4}O_{4}C_{4}C_{5}, C_{5}C_{6}O_{6}H_{6}; ring deformation: O_{5}C_{1}C_{2}O_{2}, O_{1}C_{1}O_{5}C_{5}, O_{5}C_{5}C_{4}O_{4}; exocyclic-group rotation: O_{5}C_{5}C_{6}O_{6}). Automatic equivalencing based on atom-type (H_{2}O_{2}C_{2}C_{3} = H_{3}O_{3}C_{3}C_{2} = H_{3}O_{3}C_{3}C_{4} = H_{4}O_{4}C_{4}C_{3}) reduces this to 10 unique dihedrals, and allowing for *n* = 1, 2, 3 multiplicity for each of these means 10 × 3 = 30 dihedral parameters to be simultaneously parameterized. Thus, this example represents an extreme case of fitting in multi-dimensional parameter space.

In contrast to the much simpler cases of cyclohexane and tetrahydropyran, in which the optimal parameters were determined in the first several hundred steps of 5000-step exponential-cooling Monte Carlo runs, the 30-dimensional fit to the pyranose energetics shows much slower convergence behavior. Using exponential cooling (Figure 5a), the maximum number of steps must be set to 50,000 in order to consistently converge to the same *RMSE* value in each of ten Monte Carlo runs, while runs of 500 or 5000 steps are insufficient (Figure 5c, e). Using a constant-temperature scheme at 35 K (Figure 5b), which yields a Monte Carlo acceptance ratio of 0.2 to 0.3, the results are the same in that consistent convergence is seen only to runs of 50,000 steps (Figure 5d, f). The advantage of the MCSA with exponential cooling as opposed to constant-temperature Monte Carlo is that the user does not have to take a trial-and-error approach to finding a temperature that yields a reasonable acceptance ratio.

Fitting results for the hexopyranoses in the exponential cooling (a) or constant temperature (b) schemes. *RMSE* for ten runs as a function of the maximum number of steps and of exponential cooling (c) or constant temperature (d) Monte Carlo. Progress of **...**

While the search problem is much more difficult in this example versus the prior cases, it is nonetheless possible to achieve converged *RMSE* results when simultaneously fitting 30 dihedral parameters. Another contrast with the simpler systems is that, though converged behavior is achieved with respect to *RMSE*, the parameters themselves show significant variability, both in the magnitude of each *K _{j,n}* as well as whether the associated

Effect of weighting conformations in a ring-deformation scan of β-galactopyranose during fitting to 1860 hexopyranose (Table 2) MP2/cc-pVTZ//MP2/6–31G(d) (QM, +) conformational energies. The weighted fit had *w*_{i}’s (Equation 4) of **...**

The data set of hexopyranose conformations is populated mostly with ring conformations in the ^{4}C_{1} chair form (Figure 4). In order to properly capture the energetics of chair-to-boat conversion, which are influenced by the O_{5}C_{1}C_{2}O_{2}, O_{1}C_{1}O_{5}C_{5}, and O_{5}C_{5}C_{4}O_{4} dihedral parameters, scans of these ring dihedrals were included in the fit (Table 2, “ring” and “all”). The number of non-chair conformations resulting from these scans is dwarfed by the number of chair conformations from the other scans. As a result, this data set is heavily weighted toward the development of optimized parameters that reproduce chair energetics at the potential expense of boat energetics. This indeed does turn out to be the case in practice for β-*D*-galactopyranose, where the O_{5}C_{1}C_{2}O_{2} scan is qualitatively incorrect relative to the QM in the absence of weighting, with the chair and boat conformations being isoenergetic instead of separated by 5 kcal/mol (Figure 6).

The problem of under-represented conformations can be corrected simply by increasing the weight factors *w _{i}* for these conformations (Equation 4). The choice of weight factors is an empirical task and may take several iterations of choosing different

The previous two examples involved fitting exclusively the *D* forms of hexopyranose monosaccharides. Nonetheless, the parameters from those unweighted and weighted fits are transferable to the *L* forms of these sugars since the phase angles *σ _{j,n}* were constrained to 0°/180° so as to preserve the symmetry of Equation 1 about

Taking the parameter set from the prior unweighted fit to 1860 hexopyranose conformational energies, we reoptimized just the O_{5}C_{5}C_{6}O_{6} dihedral parameters that determine the energetics of exocylic-group rotation by allowing for −180° ≤ *σ _{j,n}* ≤ 180° and using β-

β-*D*-galactopyranose O_{5}C_{5}C_{6}O_{6} dihedral parameter fitting with constraints (constrained *σ*, dotted line) and without constraints (variable *σ*, solid line) on the phase angles. MP2/cc-pVTZ//MP2/6–31G(d) (QM, +) data for fitting **...**

Breaking the symmetry of Equation 1 by allowing phase-angle variability means that the *L* enantiomers of these β-*D*-galactopyranose conformations will have different energies using this parameter set, which is chemically incorrect. Thus, increased accuracy comes at the cost of decreased generality. If possible, it is preferable to improve the fit through the use of non-uniform weighting of conformations instead of removing the constraints on the phase-angle parameters. Nonetheless, there may be particular instances when allowing variable phase angles is desirable, especially in biological systems where often only one enantiomer is found (e.g. amino acids) or is relevant (e.g. chiral drugs). In such instances, the MCSA fitting approach is able to obtain converged optimization of both the *σ _{j}*

The CHARMM all-atom force field functional form was recently extended so as to better-reproduce the conformational energetics of the polypeptide backbone in proteins. The extension involved the introduction of a new energy term, CMAP, that is a grid-based energy correction map and is a function of the backbone /ψ angles [5, 6]. Just as the dihedral energy term in Equation 1 seeks to reproduce the difference energy between the MM surface with the target dihedrals set to zero and the QM surface as a function of the dihedral angle χ, the CMAP energy term reproduces the difference energy between the QM and MM surfaces as a function of the and ψ angles simultaneously. That is, *E _{CMAP}* =

Using the CMAP energy term, it is possible to exactly reproduce any difference energy as a function of /ψ. Thus, in the case of e.g. alanine dipeptide, the entire adiabatic QM /ψ surface can be reproduced by the MM model, which is not the case using only dihedral terms for and ψ, as previously discussed [6]. In practice, based on data from protein crystal simulations, it was found that an empirical adjustment to the exact QM dipeptide surface was required to better-capture the conformational properties of polypeptides [6].

In an effort to further refine the current CMAP parameterization [6], we have adapted the described MCSA fitting protocol to simultaneously fit CMAP parameters to alanine dipeptide and alanine tetrapeptide relative conformational energies. The target QM dipeptide and tetrapeptide data were single-point RI-MP2/cc-pVTZ energies calculated from MP2/6-31G(d)-optimized geometries. While the dipeptide data consisted of the entire /ψ surface, the tetrapeptide date consisted of 51 structurally distinct conformations derived by clustering conformations sampled by MM molecular dynamics simulations [29]. For the purposes of fitting, the dipeptide and tetrapeptide data were given equal weighting (*w _{dipeptide}* =

The 153 /ψ values sampled in the 51 alanine tertrapeptide conformations are illustrated in Figure 8a and populate all regions of /ψ space observed in protein crystal structures [30]. Figure 8a also shows the CMAP grid points whose parameters were allowed to vary by ±2 kcal/mol during the fitting. To retain the smoothness of the surface, whenever one of these parameters was changed by *δE*, the parameters of all the adjacent grid points were changed by 0.5**δE*. Adjacent grid points not part of the set shown in Figure 8a were allowed to vary at most by ±1 kcal/mol relative to their starting values. With the starting CMAP parameters, *RMSE _{tetrapeptide}* = 1.53 kcal/mol, which reflects the considerable scatter in the MM tetrapeptide energies relative to the QM target energies and the common occurrence of errors as large as 2 kcal/mol (Figure 8b). In contrast, after MCSA fitting to combined tetrapeptide and dipeptide QM relative energies, the tetrapeptide MM energies are greatly improved, with most errors reduced to less than 0.5 kcal/mol (Figure 8b) and a final

Simultaneous alanine tetrapeptide and dipeptide energetic fitting. a) /ψangles in 51 alanine tetrapeptide conformations (x’s) and the corresponding CMAP grid points (squares). b) Alanine tetrapeptide energies at the RI-MP2/cc-pVTZ//MP2/6–31G(d) **...**

The improvement in the tetrapeptide energies results from three subtle changes to the starting (i.e. exact QM) alanine dipeptide surface (Figures 8c and 8d). First, the local minimum at /ψ = −165°/165° in the extended-backbone region of /ψ space has been shifted slightly to −120°/135°. Second, the local minimum at −150°/30° in the helical region has been raised by ~1 kcal/mol and is no longer a local minimum. And third, the local minimum at 60°/−75° has been increased by less than 1 kcal/mol in energy. Qualitatively, the “before” and “after” dipeptide surfaces remain very similar, and the RMSE of the MCSA fit surface, *RMSE _{dipeptide}*, is 0.33 kcal/mol compared to a starting value of 0 kcal/mol.

1000 MCSA steps were sufficient to achieve the improvements in the alanine tetrapeptide energies. The *RMSE _{CMAP}* (Eq 5) went from an initial value of 0.77 kcal/mol to a final value of 0.45 kcal/mol. Since changes to a single CMAP parameter only affect conformations with /ψ values very close to that grid point, and therefore lead to small changes in

We have presented and characterized a Monte Carlo simulated annealing (MCSA) conformational-energy fitting algorithm for use in the development of molecular mechanics force fields. For the fitting of dihedral parameters, the algorithm consistently converges to the same optimized parameters, and therefore to the same value of the target function *RMSE* (Equation 4), when the number of parameters to be fit is small. In a case of very high dimensionality (30 dihedral parameters to be fit), multiple MCSA runs also converge to a very narrow range of *RMSE*. However, the parameters that yield near-identical optimized *RMSE*’s can be qualitatively different, illustrating that there are multiple minima in dihedral parameter space that offer “best fits” to the target data. Extending the algorithm to the fitting of the CHARMM force-field CMAP term shows that the MCSA approach is also an effective way to develop CMAP parameters that find a balance between, in this case, alanine dipeptide and alanine tetrapeptide energetics. This approach to CMAP and multi-dimensional dihedral fitting is expected to prove useful in future applications such as parameterizing the energetics of the nucleic acid backbone, of glycosyl linkages in polysaccharides, and of flexible drug-like small-molecules.

This work was supported by NIH GM070855 and GM051501 (ADM) and F32CA1197712 (OG). The authors wish to thank Professor Carlos Simmerling for sharing alanine tetrapeptide conformations, and to acknowledge generous grants of computer time from the National Cancer Institute Advanced Biomedical Computing Center and Department of Defense High Performance Computing.

1. MacKerell AD. J Comput Chem. 2004;25:1584–1604. [PubMed]

2. Halgren TA. J Comput Chem. 1996;17:490–519.

3. Wang JM, Wolf RM, Caldwell JW, Kollman PA, Case DA. J Comput Chem. 2004;25:1157–1174. [PubMed]

4. Wang JM, Wang W, Kollman PA, Case DA. J Mol Graphics Modell. 2006;25:247–260. [PubMed]

5. MacKerell AD, Feig M, Brooks CL. J Am Chem Soc. 2004;126:698–699. [PubMed]

6. MacKerell AD, Feig M, Brooks CL. J Comput Chem. 2004;25:1400–1415. [PubMed]

7. MacKerell AD, Bashford D, Bellott M, Dunbrack RL, Evanseck JD, Field MJ, Fischer S, Gao J, Guo H, Ha S, Joseph-McCarthy D, Kuchnir L, Kuczera K, Lau FTK, Mattos C, Michnick S, Ngo T, Nguyen DT, Prodhom B, Reiher WE, Roux B, Schlenkrich M, Smith JC, Stote R, Straub J, Watanabe M, Wiorkiewicz-Kuczera J, Yin D, Karplus M. J Phys Chem B. 1998;102:3586–3616. [PubMed]

8. Kaminski GA, Friesner RA, Tirado-Rives J, Jorgensen WL. J Phys Chem B. 2001;105:6474–6487.

9. Wang JM, Kollman PA. J Comput Chem. 2001;22:1219–1228.

10. Park S, Radmer RJ, Klein TE, Pande VS. J Comput Chem. 2005;26:1612–1616. [PubMed]

11. Okur A, Strockbine B, Hornak V, Simmerling C. J Comput Chem. 2003;24:21–31. [PubMed]

12. Maxwell DS, Tirado-Rives J. Fitpar. Yale University; New Haven, CT: 1994.

13. Norrby PO, Liljefors T. J Comput Chem. 1998;19:1146–1166.

14. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. J Chem Phys. 1953;21:1087–1092.

15. Moller C, Plesset MS. Phys Rev. 1934;46:618–622.

16. Hariharan PC, Pople JA. Theor Chim Acta. 1973;28:213–222.

17. Woon DE, Dunning TH., Jr J Chem Phys. 1993;98:1358–1371.

18. Feyereisen M, Fitzgerald G, Komornicki A. Chem Phys Lett. 1993;208:359–363.

19. Weigend F, Haser M. Theor Chem Acc. 1997;97:331–340.

20. Distasio RA, Steele RP, Rhee YM, Shao YH, Head-Gordon M. J Comput Chem. 2007;28:839–856. [PubMed]

21. Frisch MJ, Trucks GW, Schlegel HB, Scuseria GE, Robb MA, Cheeseman JR, Montgomery JA, Vreven T, Jr, Kudin KN, Burant JC, Millam JM, Iyengar SS, Tomasi J, Barone V, Mennucci B, Cossi M, Scalmani G, Rega N, Petersson GA, Nakatsuji H, Hada M, Ehara M, Toyota K, Fukuda R, Hasegawa J, Ishida M, Nakajima T, Honda K, Kitao O, Nakai H, Klene M, Li TW, Knox JE, Hratchian HP, Cross JB, Adamo C, Jaramillo J, Gomperts R, Stratmann RE, Yazyev O, Austin AJ, Cammi R, Pomelli C, Ochterski JW, Ayala PY, Morokuma K, Voth GA, Salvador P, Dannenberg JJ, Zakrzewski VG, Dapprich S, Daniels AD, Strain MC, Farkas O, Malick DK, Rabuck AD, Raghavachari K, Foresman JB, Ortiz JV, Cui Q, Baboul AG, Clifford S, Cioslowski J, Stefanov BB, Liu G, Liashenko A, Piskorz P, Komaromi I, Martin RL, Fox DJ, Keith T, Al-Laham MA, Peng CY, Nanayakkara A, Challacombe M, Gill PMW, Johnson B, Chen W, Wong MW, Gonzalez C, Pople JA. Gaussian 03. Gaussian, Inc; Pittsburgh, PA: 2003.

22. Shao Y, Molnar LF, Jung Y, Kussmann J, Ochsenfeld C, Brown ST, Gilbert ATB, Slipchenko LV, Levchenko SV, O’Neill DP, DiStasio RA, Lochan RC, Wang T, Beran GJO, Besley NA, Herbert JM, Lin CY, Van Voorhis T, Chien SH, Sodt A, Steele RP, Rassolov VA, Maslen PE, Korambath PP, Adamson RD, Austin B, Baker J, Byrd EFC, Dachsel H, Doerksen RJ, Dreuw A, Dunietz BD, Dutoi AD, Furlani TR, Gwaltney SR, Heyden A, Hirata S, Hsu CP, Kedziora G, Khalliulin RZ, Klunzinger P, Lee AM, Lee MS, Liang W, Lotan I, Nair N, Peters B, Proynov EI, Pieniazek PA, Rhee YM, Ritchie J, Rosta E, Sherrill CD, Simmonett AC, Subotnik JE, Woodcock HL, Zhang W, Bell AT, Chakraborty AK, Chipman DM, Keil FJ, Warshel A, Hehre WJ, Schaefer HF, Kong J, Krylov AI, Gill PMW, Head-Gordon M. Phys Chem Chem Phys. 2006;8:3172–3191. [PubMed]

23. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M. J Comput Chem. 1983;4:187–217.

24. Levitt M, Lifson S. J Mol Biol. 1969;46:269–279. [PubMed]

25. Fletcher R, Reeves C. Comput J. 1964;7:149–154.

26. Vorobyov I, Anisimov VM, Greene S, Venable RM, Moser A, Pastor RW, MacKerell AD. J Chem Theory Comput. 2007;3:1120–1133. [PubMed]

27. Guvench O, Greene SN, Kamath G, Brady JW, Venable RM, Pastor RW, MacKerell AD. in progress. [PMC free article] [PubMed]

28. Kirkpatrick S, Gelatt CD, Vecchi MP. Science. 1983;220:671–680. [PubMed]

29. Hornak V, Abel R, Okur A, Strockbine B, Roitberg A, Simmerling C. Proteins: Struct, Funct, Bioinf. 2006;65:712–725. [PMC free article] [PubMed]

30. Lovell SC, Davis IW, Adrendall WB, de Bakker PIW, Word JM, Prisant MG, Richardson JS, Richardson DC. Proteins: Struct, Funct, Bioinf. 2003;50:437–450. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |