|Home | About | Journals | Submit | Contact Us | Français|
Parametrization of the additive all-atom CHARMM force field for acyclic polyalcohols, acyclic carbohydrates and inositol is conducted. Initial parameters were transferred from the alkanes and hexopyranose carbohydrates, with subsequent development and optimization of parameters unique to the molecules considered in this study. Using the model compounds acetone and acetaldehyde, nonbonded parameters for carbonyls were optimized targeting quantum mechanical interaction data for solute-water pairs and pure solvent thermodynamic data. Bond and angle parameters were adjusted by comparing optimized geometries to small molecule crystal survey data and by performing vibrational analyses on acetone, acetaldehyde and glycerol. C-C-C-C, C-C-C-O, C-C-OH and O-C-C-O torsional parameters for polyol chains were fit to quantum mechanical dihedral potential energy scans comprising over 1500 RIMP2/cc-pVTZ//MP2/6-31G(d) conformations using an automated Monte Carlo simulated annealing procedure. Comparison of computed condensed-phase data, including crystal lattice parameters and densities, NMR proton-proton couplings, densities and diffusion coefficients of aqueous solutions, to experimental data validated the optimized parameters. Parameter development for these compounds proved particularly challenging because of the flexibility of the acyclic sugars and polyalcohols as well as the intramolecular hydrogen bonding between vicinal hydroxyls for all of the compounds. The newly optimized additive CHARMM force field parameters are anticipated to be of utility for atomic level of detail simulations of acyclic polyalcohols, acyclic carbohydrates and inositol in solution.
Glucose and related six-carbon monosaccharides exist in aqueous solution in equilibrium between the thermodynamically-favored cyclic pyranose form and the linear aldose form. Reduction of the aldehyde functionality in the aldose form to an alcohol yields a linear polyalcohol or polyol, also commonly referred to as an “alditol” or “sugar alcohol.” Such six-carbon sugar alcohols, as well as related cyclic and shorter-chain linear polyols (Figure 1), have both biological and industrial significance. One example is the conversion of the linear aldose sugar D-glucose to the alditol D-glucitol (also known as sorbitol) by reduction of the aldehyde group at C1 and subsequent oxidation at C2 to produce the linear ketose sugar D-fructose. Elevated conversion from glucose to sorbitol to fructose in humans is a factor in diabetes, and pharmacologic inhibition of the enzyme aldose reductase that catalyzes the reaction is a form of therapy.1-3 In addition to participating in metabolism, these compounds also participate in cell-signaling. Inositol 1,4,5-trisphosphate, a derivative of the cyclic polyol inositol (Figure 1), is produced from the hydrolysis of the membrane phospholipid phosphatidyl inositol 4,5-bisphosphate, and acts as a secondary messenger that signals a rapid release of calcium from intracellular stores.4,5 In addition to their biological significance, this class of compounds and their derivatives have industrial applications as sugar substitutes, surfactants, lubricants and even explosives, motivating their structural, dynamic, and thermodynamic characterization.
Molecular dynamics (MD) simulations are a powerful and flexible way of studying structure, dynamics, and thermodynamics at an atomic level of detail. However, MD simulation results of a compound are only as reliable as the force field used to describe its structural and energetic properties. Accordingly, there has been much effort toward improving carbohydrate force fields, with focus primarily on cyclic pyranoses such as glucose.6-13 Owing to the vast array of carbohydrates and their derivatives found in both biological and industrial settings, a comprehensive carbohydrate force field remains to be fully developed and validated. To maximize its utility, such a carbohydrate force field should also be compatible with available protein, lipid, and nucleic acid force fields since carbohydrates rarely participate in biological processes without interaction with these other three major biomolecular classes.
The present work describes the development of CHARMM force field parameters for linear polyalcohols, inositol, and linear sugars. These parameters have been developed for compatibility with the existing protein,14,15 lipid16,17 and nucleic acid18,19 CHARMM all-atom additive force fields20,21 and extend the library of available carbohydrates, which had been previously limited to cyclic hexopyranoses.22 Bond and angle internal parameters were transferred from previous work22 with some modifications based on a survey of the Cambridge Crystallographic Database (CSD)23 and vibrational analyses. Torsional parameters were fit to the relaxed quantum mechanical (QM) potential energy surfaces at the RIMP2/cc-pVTZ//MP2/6-31G(d) level using an automated Monte Carlo simulated annealing procedure.24 The nonbonded parameters for the linear polyols and inositol were transferred from the hexopyranoses,22 whereas the nonbonded parameters for the carbonyl in aldoses and ketoses were fit to reproduce heats of vaporization and molecular volumes for neat liquids of representative model compounds as part of the present work. Parameter validation involved pure solvent calculations to compare to experimental heats of vaporization and molecular volumes, crystal simulations to compare to crystal lattice parameters, and aqueous phase simulations to compare to experimental solution densities, diffusion constants and NMR J-coupling constants. Also addressed are the difficulties encountered during the parametrization process due to the flexibility of these compounds and the extensive intramolecular hydrogen bonding between the vicinal hydroxyl groups.
In Equation 1, Kb, Kθ, KUB, Kχ and Kimp are bond, valence angle, Urey-Bradley, dihedral angle, and improper dihedral angle force constants, respectively, while b, θ, S, χ and are the bond distance, valence angle, Urey-Bradley 1,3-distance, dihedral angle and improper dihedral angle, respectively, where the subscript zero represents the equilibrium value. In the dihedral potential energy term, n is the multiplicity and σ is the phase angle as in a Fourier series. The nonbonded interaction energy between atoms i and j is separated into two terms, the Lennard-Jones (LJ) 6-12 term and the Coulomb term. For the nonbonded terms, εij is the LJ well depth, Rmin,ij is the distance at the LJ energy minimum, qi and qj are the partial atomic charges, and rij is the distance between atoms i and j. For the LJ parameters, the Lorentz-Berthelot combining rules are applied.26
Hydrogen bonding water-solute pair interaction energies and distances were calculated using the standard additive CHARMM force field protocol, so as to maintain compatibility with existing CHARMM biomolecular force fields.19,21 Solute geometries were obtained from optimization at the MP2/6-31G(d) level of the conformation in the respective crystals obtained from the Cambridge Structural Database.23 Using the optimized geometry of the monomer, water-monomer pairs were constructed, with the water internal geometry identical to that of the TIP3P water model.27 Examples of these pair interactions are shown in Figure 2 where the water molecule is interacting with the terminal hydroxyl of allitol. In pairs 1, 2, 3 and 4 the hydroxyl oxygen is the hydrogen bond donor and in 5 and 6 the hydroxyl hydrogen is the hydrogen bond acceptor. For pairs 1, 2, 3 and 4 the hydrogen of the water molecule is directed at the COH bisector. In pairs 1 and 2 the water molecule lies in the COH plane whereas in pairs 3 and 4 the water molecule lies at a 120 degrees angle to the COH plane. Pairs 5 and 6 are different because the water molecule in these interaction pairs acts as the hydrogen bond acceptor; therefore, the COH hydroxyl is directed along the water HOH bisector. For pair 5, the HOH plane of the water is at a 90-degree angle to the COH plane, and for pair 6 the HOH and COH atoms are coplanar.
Reference data for comparison of molecular mechanics (MM) interaction energies and distances were generated by geometry optimization of the interaction distances at the QM HF/6-31G(d) level for each of the water-solute pairs above, with all other degrees of freedom constrained. The QM data cannot be targeted directly, but are instead empirically scaled to account for the fact that the MM force field needs to be able to account for many-body effects in the condensed phase. The CHARMM additive force field empirical scaling rules are well-established15,28 and are such that the MM target distance is the RQM − 0.2 Å and the MM target pair interaction energy (denoted “EQM”) is given by the expression 1.16*(EQM,pair − EQM,solute − EQM,water). The energy-scaling factor of 1.16 and the offset of the QM distance by 0.2 Å account for limitations in the potential energy function and in the QM level of theory, and these empirical corrections lead to good agreement with condensed phase properties, as shown in previous work.20,21
All of the C-C-C-C, C-C-C-O, O-C-C-O and C-C-O-H dihedral parameters are fit to relaxed QM potential energy scans. The Gaussian03 package29 is used to optimize geometries at the MP2/6-31G(d) level of theory followed by single point calculations performed at the RIMP2/cc-pVTZ level with the QCHEM program.30 This level of theory has previously been shown to be sufficiently accurate for a number of systems including carbohydrates.22,31 The target dihedral is scanned at 15° intervals from −180 to 165°, with the exception of inositol, which is scanned from 15 to 135° due to the constrained nature of the ring. The dihedral parameters are then fit to the QM dihedral scans using an automated Monte Carlo simulated annealing (MCSA) method.24 In the MCSA method the selected dihedral parameters are fit simultaneously to minimize the root mean squared error (RMSE),
where and are the QM and MM energies of conformation i, wi is a weighting factor for conformation i and c is a constant that aligns the QM and MM data to minimize the RMSE. All of the six-carbon (n=6) alditols are used in the fitting procedure (Figure 1) and all of the five-carbon (n=5) and four-carbon (n=4) alditols are used as the test set for the parameter validation. With inositol, the C-C-C-O, O-C-C-O and C-C-O-H dihedrals are transferred from the hexopyranose parameters and only the C-C-C-C is fit (independently from the n=6 alditols). For the dihedral parameters in the aldehyde and ketone groups in the linear carbohydrates D-allose and D-psicose (Figure 1), only the torsions containing non-hydrogen atoms and including the carbonyl atoms are parametrized and the other torsional parameters are transferred from the n=6 alditols.
Parameter optimization and validation of the parameters is performed via a number of condensed phase MD simulations. A cubic box containing TIP3P water molecules27,32 with periodic boundary conditions is used for all aqueous simulations. Particle Mesh Ewald (PME)33 with a 12 Å real space cutoff is used to treat the long-range Coulomb interactions and a force-switched smoothing function34 with a range of 10-12 Å is used for the Lennard-Jones interactions, with a long-range correction applied beyond the truncation distance.26 The SHAKE algorithm35 is used to constrain all hydrogen atom bonds to their equilibrium lengths and to maintain rigid water geometries. For the constant pressure – constant temperature (NPT) simulations the Nosé-Hoover thermostat36,37 is used to maintain the temperature and the Langevin piston barostat38 is used to maintain the pressure. A leapfrog integrator39 is used with a 1 fs timestep for all of the simulations.
Pure solvent simulations are performed with a periodic box of 125 solvent molecules. The box of solvent molecules is minimized and then equilibrated for 50 ps followed by five production runs performed for 1 ns. The heat of vaporization ΔHvap is calculated from the pure solvent simulation using the relation
Here, Umonomer is the average potential energy of the monomers calculated from five individual gas-phase simulations of all 125 molecules, with each simulation run for 500 ps. The Ubox term is the average potential energy of the periodic box. N is the number of molecules in the box, R is the universal gas constant for an ideal gas and T is the temperature.
The free energy of aqueous solvation ΔGsol is calculated from the difference in free energy of a molecule in aqueous solution compared to that in the gas phase. ΔGsol is calculated from the sum of nonpolar ΔGnp and electrostatic ΔGelec free energies40:
ΔGnp is the sum of the repulsive and dispersive contribution, which are calculated using the Weeks, Chandler, Anderson decomposition of the LJ potential.41 The repulsion term in the LJ potential is treated using a soft-core potential.42 In the aqueous phase, free energy calculations are performed using 1 molecule centered in a water box of 250 TIP3P water molecules. The aqueous system at each window is equilibrated for 50 ps and then simulated for 200 ps in the NPT ensemble. In the gas phase, Langevin dynamics are used with an infinite non-bond cutoff.26,43 Since the gas phase energies converge much more quickly, the gas phase system is equilibrated for 10 ps and the production run is simulated for 100 ps. The simulations are performed at a temperature of 298K and a pressure of 1 atm, which is consistent with experiment. The free energy calculations are analyzed using thermodynamic integration44 and the weighted-histogram analysis method45 (WHAM). Additional details for calculating the free energy have been described previously.40,46 Unlike all other condensed phase simulations in the present work, due to software limitations, the long-range pressure correction is not part of the MD protocol for the free energy simulations. Thus, the long-range contribution (LRC) from the LJ potential to the free energy of solvation is calculated as the difference in LJ energy of the aqueous system with a nonbond cutoff of 12 Å and a nonbond cutoff of 30 Å. The LRC is calculated from a 5 ps NPT simulation trajectory of the molecules in solution using coordinates saved every 100 fs and averaged over all values.
From the pure solvent simulation trajectories, the self-diffusion coefficient Dsim incorporates a system-size dependent finite-size correction developed by Yeh and Hummer47:
In Equation 5, DPBC is the diffusion coefficient calculated from a simulation with periodic boundary conditions to which the correction term is added. ζ is a constant of 2.837297, kB is the Boltzmann constant, T is the temperature, η is the viscosity and L is the length of the cubic simulation box. DPBC is calculated from the slope of the mean square displacement of the C1 atom of all solute molecules in the simulation box versus time.26 For diffusion coefficients of polyols in aqueous solutions of TIP3P water, Equation 5 is further modified to take into account the low viscosity of TIP3P water relative to experiment:
Here, the scaling factor of 0.375 is applied to correct for the underestimation of the viscosity of water by the TIP3P model. The scaling factor is calculated from ηTIP3P /ηw where ηTIP3P = 0.35 cP and ηw = 0.93 cP, the experimental viscosity of water. Equation 6b is the viscosity of a solution with the presence of a solute estimated by the Einstein formula,48 where ηTIP3P is the viscosity of TIP3P (0.35 cP) and ϕ is the volume fraction of the solute. The method for calculating the simulation diffusion coefficient for a polyol-water mixture is similar to that previously used for a system of polyethylene oxide and polyethylene glycol.49
Complete crystal unit cells, obtained from the Cambridge Structural Database,23 are used as starting structures for crystal simulations, with periodic boundary conditions applied in accordance with the length and angle parameters of the respective crystals. Each crystal system is minimized initially to remove bad contacts and is then equilibrated for 100 ps. After equilibration, the simulation is run for 2 ns. For all of the polyol crystals, the reference temperature is set to room temperature 298K, the temperature at which the crystals were obtained, and constant pressure is maintained at 1 atm by allowing independent variation in the crystal cell length parameters.
For the aqueous phase MD simulations, a box containing 1100 waters and the number of solute molecules based on the experimental concentration is set up and then minimized using harmonic restraints with a force constant of 1*(particle mass)*kcal*mol−1*Å−2*amu−1 on only the solute molecules. The system is equilibrated for 500 ps and then the equilibrated conformation is used as the starting conformations for five different unrestrained 1ns runs, using different initial velocities for each of the runs to achieve improved statistics. The reference pressure of the glucitol and mannitol systems is 1 atm, and the reference pressure of the galacitol, xylitol, erythritol, ribitol, glycerol and myoinositol system is 3.5 atm, in accordance with the experimental conditions. The density of each system is calculated using the following equations:
where V is the average volume calculated from all five runs. Nwater , Nsolute and NAvogadro are the number of water molecules, the number of polyol solute molecules and Avogadro's number respectively. MW is the average molecular weight of the system. Equation 7a is also used to calculate the density for neat liquids; however, in this case N is simply the number of molecules in the periodic box.
The J coupling constants for glucitol and mannitol are also calculated from the aqueous simulations described above. However, the coupling constants for arabitol, ribitol and xylitol are calculated from aqueous phase simulations at 1 atm and a molality of 0.5 mol*kg−1 using the same protocol for the aqueous simulations. The dihedral value for the proton-proton coupling is calculated every 1 ps for each of the production runs. Moreover, the dihedral value is calculated for each of the solute molecules in the respective systems; therefore, depending on the concentration of the simulation the amount of torsional data differs. The J coupling is then calculated from the dihedral values for each snapshot using the generalized Karplus equation 50
In Equation 9, Jobs is the observed coupling constant.
All parameter optimization was done in a self-consistent manner so that when one parameter was changed in a molecule all other parameters were tested and reoptimized as necessary. The presented empirical force field data reflect the final set of self-consistently optimized nonbonded and bonded parameters.
The non-bonded parameters for the aliphatic and hydroxyl moieties in the polyol compounds in Figure 1 were directly transferred from alkanes51 and hexopyranoses,22 and testing showed that no further optimization was required. This testing included the ability of the force field model to properly describe hydrogen bonding as compared to QM data, both in terms of hydrogen bonding strength and distance.
Taken as a group, hydrogen bonds in the water-solute pairs are well represented using the transferred nonbonded parameters. The average error over all 24 interaction pairs examined (Table 1) is −0.14 kcal/mol for the interaction energy and 0.03 Å for the interaction distance, which shows a slight systematic underestimation of the energy and overestimation of the distance. The mean absolute error for the interaction energy and distance is calculated to be 0.43 kcal/mol and 0.04 Å respectively. For allitol, both a terminal (O1) and non-terminal hydroxyl (O2) were investigated with good results for both types. Of particular note is that the interaction energies range from −2.81 kcal/mol all the way to −8.65 kcal/mol in the scaled QM representation, and the MM representation faithfully captures this diversity in hydrogen bond strength. MM water interactions for n=4 (threitol), n=5 (ribitol), n=6 (allitol and altritol) linear polyols as well as for inositol are all independently in good agreement with the QM results with the exception of a few weakly interacting pairs, i.e. pair interactions 1 and 2 for ribitol. However, in the cases where the weakly interacting pair has a large error, the strongly interacting pairs are in very good agreement. As the more favorable interactions dominate structural and dynamic properties, it is deemed more important to treat these interactions accurately. Of note is that all hydroxyls in all compounds have the same partial charges and LJ parameters assigned to them, which attests to their generality and transferability to similar compounds, and provides evidence that an accurate force field can be developed with a relatively parsimonious set of nonbonded parameters.
As an additional test of the hydroxyl nonbonded parameters, the pure solvent properties of glycerol were determined. The results show an error of 4.8% in the molecular volume and −14.2% in the heat of vaporization (Table 2). These errors are significantly beyond the typical error for CHARMM pure solvents, especially considering that the pure solvent properties of ethylene glycol are in good agreement with experiment.22 One possibility for the large error in the heat of vaporization is overly favorable intramolecular hydrogen bonding occurring between hydroxyl groups in the gas phase simulations resulting from the need to over polarize hydroxyls to obtain correct scaled water-solute interaction energies appropriate for a condensed phase additive force field. The resultant lowering of the value of Umonomer in Equation 3 would lead to a larger heat of vaporization. To test this possibility, the gas phase energy Umonomer was calculated using monomer trajectories extracted from the pure solvent simulation. In these trajectories it is expected that intramolecular hydrogen bonding would be diminished due to intermolecular hydrogen bonding of the monomers with surrounding molecules in the solvent environment. Using this approach to calculate Umonomer the resulting heat of vaporization is 22.34 kcal/mol, which is only a 2.0 % error with respect to experiment. Thus, it appears that the significant deviation in the heat of vaporization of neat glycerol is due to overestimation of intramolecular hydrogen bonding in the gas phase, although it should be noted that this effect will not influence the calculated molecular volume. Intramolecular hydrogen bonding in these compounds also complicates the parametrization of their conformational energetics, as discussed below.
The nonbonded parameters for aldehyde and ketone carbonyls were explicitly optimized for application to the linear aldose and ketose forms of monosaccharides. Acetaldehyde and acetone were selected as model compounds for the parameter development process. Methyl parameters were as previously published51 and only the carbonyl C and O atoms and, with acetaldehyde, the aldehydic H LJ parameters and charges were optimized, with the charges of the adjacent bonded methyl carbons adjusted to yield a total molecular charge of zero. Target data for the parameter optimization were water-solute pair interaction energies and distances, dipole moments, heats of vaporization, and molecular volumes, while the free energies of aqueous solvation and the diffusion constant were calculated using the final optimized parameters.
Optimization of the carbonyl LJ parameters and partial charges yielded a set of parameters capable of reproducing all the target data. Overall, interaction energies with water are in satisfactory agreement with the QM target data, though in some cases the balance between the in-plane and out-of-plane orientations is not ideal (Table 2 and Figure S1 of the supplemental materials). This problem is due to limitations in the form of the energy function and can, to a large extent, be corrected using an explicit representation of lone pairs.52 However, to maintain consistency with the remainder of the CHARMM additive force fields, such an addition was not made. The final charges overestimate the QM MP2/6-31G(d) dipole moments by 28% and 29% for acetaldehyde and acetone, respectively. Such overestimation is required in the additive model to account for the implicit over-polarization required to accurately predict the condensed phase properties.21 The corresponding partial atomic charges along with the appropriate LJ parameters lead to excellent agreement for the pure solvent properties for both molecules, with the agreement within 2.2% of experiment53-59 in all cases (Table 2).
All bond and angle parameters were initially transferred from similar existing CHARMM additive force field parameter values. Optimized geometries were compared to target data from a crystal database survey of the CSD. This analysis revealed systematic errors in geometries, in particular the bond lengths associated with C-C bonds, in which both carbons are hydroxylated. Accordingly, the respective geometric force field parameters were suitably optimized. A comparison of the optimized bond lengths and angles to the crystal survey data and QM data at the MP2/6-31G(d) level for compounds acetone and acetaldehyde is given in Table S2 in the supplemental materials. Vibrational analyses for acetone, acetaldehyde and glycerol (Table S3 in supplemental materials) were done both at the QM MP2/6-31G(d) level and in the MM representation, and the MOLVIB utility in CHARMM was used to assign the bond and angle contributions to internal normal modes as described by Pulay.60 MM force constants were optimized as required to reproduce the scaled QM frequencies61 to complete the bond and angle parameter optimization.
Parameters associated with methyl rotations in acetone and acetaldehyde were readily optimized to yield excellent agreement between QM and MM methyl dihedral scans (O=C-C-H) (supplemental materials, Figure S4). In the case of the C-C-C-C, C-C-C-O, O-C-C-O, and C-C-O-H dihedrals, the torsional parameters were fit to QM RIMP2/cc-pVTZ//MP2/6-31G(d) potential energy scans performed on linear polyols. These parameters were simultaneously fit to conformations with relative energies below a cutoff value of 15 kcal/mol for scan points on all of the n=6 linear polyols, yielding approximately 1500 different conformational energies. Prior to fitting, with the parameters for the targeted dihedrals set to zero, the total RMSE (Eq. 2) for all of the conformations, including those above the energy cutoff of 15 kcal/mol (~1730 different conformations) was 4.0 kcal/mol. Following fitting, which included the 1, 2 and 3 fold terms for each dihedral, sampling the force constant from 0 to 3 kcal/mol and sampling phase angles of either 0 or 180, the RMSE was 2.5 kcal/mol. Using the dihedral parameters fit to the n=6 sugar alcohols, the RMSE of the C-C-C-C dihedral for the n=5 linear polyols is calculated to be 1.60 kcal/mol and for the n=4 polyols to be 1.87 kcal/mol, as compared to values of 2.20 kcal/mol and 3.85 kcal/mol, respectively, with the targeted dihedral parameters set to zero. These data demonstrate both that the fitting procedure leads to significant improvements in the conformational energetics of the targeted n=6 compounds and also the transferability of the dihedral parameters to the shorter chain linear polyols.
Figure 3 shows the improvement in the MM conformational energies for the n=6 target compounds. The change is visually apparent in the relative energies (Figure 3a), and especially so in the difference in QM and MM energies for both the parametrized and unparametrized energies (Figure 3b). It should be noted that all the relative energies for all the compounds are offset to a constant c, from Eq 2, minimizing the total RMSE value. Figure 4 shows the QM and MM dihedral scans, using the optimized torsional parameters for a few representative compounds, and highlights the quality of the conformational energy fitting results. In all cases, the location of the minimum using the optimized dihedral parameters reproduces the QM results to within 15 degrees. There is, however, a general trend toward overestimation of the energy barriers by the MM model. Two factors contribute to this. One is the fact that the targeted dihedral parameters are fit simultaneously to all of the n=6 compounds, as required to maximize their generality and transferability (as evidenced by their applicability to the shorter polyols). The second is due to overestimation of intramolecular hydrogen bonding, again arising from the over-polarized hydroxyl groups that are required for proper condensed phase behavior in an effective pairwise additive force field; these intramolecular hydrogen bonding interactions are broken in the region of the energy barriers, exaggerating their energy differences relative to the minima.
Less extensive dihedral parameter optimization was required for the remaining compounds in Figure 1. The cyclic inositol was treated separately from the linear polyols, with parametrization of only the C-C-C-C dihedral, with the other dihedral parameters transferred from the hexopyranose force field. Using the MCSA method to fit the ring dihedral in inositol, an RMSE value of 3.27 kcal/mol was obtained for an energy scan in which the ring was deformed from the favored chair conformation, an improvement on the RMSE of 4.23 kcal/mol before optimization. Additionally, for allose and psicose, only the dihedrals involving the carbonyl atoms and heavy atoms (C-C-C=O, O-C-C=O) required parametrization, since all others could be transferred from those for the linear polyols and acetone or acetaldehyde. The RMSE for the optimized allose and psicose dihedral parameters are calculated to be 0.75 kcal/mol and 0.69 kcal/mol as compared to 1.47 kcal/mol and 2.95 kcal/mol, respectively, for the unparametrized torsions. Though only allose and psicose were considered, the current set of parameters is applicable to all related ketoses and aldoses, owing to the demonstrated generality and transferability of the polyol parameters.
To validate the optimized parameters, pure solvent, crystal, and aqueous phase MD simulations were performed to calculate both condensed phase properties and conformational distributions and to compare them with experimental results. As well as testing the accuracy of the force field parameters, these calculations also gave insight into complications that arise during the parametrization process.
Validation of the aldehyde and ketone parameters was based on the calculation of the aqueous solvation free energy ΔGsol and pure solvent diffusion coefficients for acetaldehyde and acetone. The calculated value for ΔGsol, including the long-range LJ correction, is in excellent agreement with the experimental value for acetaldehyde, with a difference of 0.3 kcal/mol (Table 2). The agreement for acetone is poorer, with the force field yielding a value that is 1.2 kcal/mol more favorable than the experimental value. While the level of agreement for acetone is not ideal, it is similar to that observed for a number of model compounds representative of amino acid sidechains.62,63 Furthermore, the diffusion coefficient for pure acetone is in excellent agreement and, combined with the water-solute interaction energy data (Table 2), indicates the model to have satisfactory energetic and dynamic properties.
Crystal simulations were performed using crystals obtained from the CSD23 to validate nonbonded parameters and conformational properties. The compounds included a number of tetritols (n=4), pentitols (n=5) and hexitols (n=6) as well as glycerol and inositol. The selected crystals provide a thorough investigation of the crystal lattice parameters and are chosen based on diversity, purity and resolution. To be consistent with experimental conditions, all of the crystals were simulated at room temperature (298K).
The average sizes of the crystal unit cells from the simulations are systematically too large relative to the experimental values (Table 3). The average percent error of the simulated unit cell lengths A, B and C for all of the selected polyols is 2.8%, 1.5% and 4.3%, respectively, and the average unit cell volume is calculated to have an error of 7.3%. This over prediction in the average volume is consistent to what has been seen in previous studies for the hexopyranoses22 and suggests limitations in the ability of a pairwise additive condensed-phase force field designed for liquid simulations to reproduce the crystal phase. The environments surrounding a molecule in the liquid versus the crystal state are considerably different, and the inability of the force field to quantitatively model the latter environment when parametrized to the former is not entirely surprising.
The heavy atom bonds, angles and dihedrals are monitored during the simulations and the averages reproduce the values in the crystal structure. This is particularly true for the bond lengths and angles because their equilibrium values were adjusted based on a crystal database survey. The differences between a selected set of experimental and simulated bond lengths, angles and dihedrals are given in Table S5 of the supplemental materials. However, some of the dihedrals involving hydroxyl oxygens contrast to the values found in the crystal structure. In some cases, these differences cause errors in the calculated crystal lattice parameters, as hydrogen bonding that may occur in the crystal is not maintained during the simulation. For instance, in the altritol unit cell, a terminal hydroxyl hydrogen in one of the altritol monomers is hydrogen bonded to a hydroxyl oxygen in another altritol in an adjacent unit cell; this leads to the relatively large differences in the O1-C1-C2-C3 and O1-C1-C2-O2 dihedrals. During the crystal simulation, rotation about the torsions involving the hydrogens on these hydroxyls causes a loss in hydrogen bonding capabilities. Subsequently, this causes an increase in unit cell length A leading to larger errors in the unit cell A and volume of the unit cell.
To test the behavior of the polyols in aqueous solution, the environment in which it is anticipated the force field will primarily be applied, densities were calculated for molal solutions of glucitol, mannitol, ribitol, xylitol, galacticol, erythritol, glycerol and myo-inositol varying in concetration from dilute (0.1 mol/kg) to highly concentrated (5 mol/kg) and compared to experimental values.64,65 For consistency with experimental conditions, glucitol and mannitol were simulated at a temperature and pressure of 298K and 1 atm, while all other compounds were simulated at 298K and 3.5 atm. All of the calculated molecular densities reproduce experimental values within 3% error across the entire 50-fold difference in concentration and at ambient and elevated pressures (Table 4). Moreover, as the concentration increases there is a trend of decreasing error. The overall average error in volume, 0.64%, is much better than the overall crystal volume, emphasizing the applicability of the parameter set to heterogeneous liquid systems.
Diffusion coefficients for polyol-water solutions show consistent correlation with experimental values (Table 5). Complicating the analysis is the fact that TIP3P water has a self-diffusion coefficient larger than the experimental value, which motivated the addition of correction terms and scaling factors to the diffusion equation. The diffusion coefficients are computed using Equation 6a, where the size-dependent correction term is calculated using the shear viscosity of TIP3P water model, which is 0.35cP66 (3.5E-4 kg/m/s), in conjunction with Equation 6b for all polyol-water mixtures. The volume fraction (Vmixture −Vwater box)/Vwater box is calculated from the average volume of the box for all five runs for the polyol aqueous phase simulations. DPBC is calculated at the given concentration for each of the five production runs and is averaged. The computed results are compared to experimental results67 and the percent error is given in Table 5. The diffusion coefficients are systematically too low with a total average error of−48%. This error may be due to the fact that the DPBC varies greatly between the five different production runs with a standard deviation of the average DPBC for each polyol calculated to be ~30%. The standard deviations are given in parentheses in Table 5. In addition, assumptions in the correction factors in Equation 6 may be limiting.
To investigate the ability of the model in treating the conformational properties of the alditols in solution, J coupling constants and conformer populations for all of the Hx – Hx−1 hydrogens were computed from the aqueous phase simulations of glucitol, mannitol, arabitol, ribitol, and xylitol. The dihedrals were computed for every solute molecule from snapshots taken every 1ps from the MD simulations. Since the simulations of glucitol and mannitol are run with varying concentrations, the sampling of the dihedrals is much larger. For glucitol, the number of points sampled is ~1.2 million and for mannitol it is 260,000. The number of torsions calculated for arabitol, ribitol and xylitol is 50,000. In the case of arabitol, ribitol and xylitol, performing 5 more runs with different starting velocities and obtaining similar dihedral probability distributions tested the convergence of sampling. J coupling values are calculated for each dihedral in each sample using the Karplus equation given in Equation 8. Subsequently, the coupling constant is calculated as the average over all the J coupling values for each dihedral.
The H-H trans conformer probability was determined from the dihedral probability distributions, which were determined with a bin width of 3.6 degrees between −180 and 180 degrees. A trans conformer was considered to have any torsional value less than −135 and greater than 135 degrees. The J coupling constants and the probability of H-H trans conformers Ftrans are given in Table 6. The H-H trans conformer distributions for glucitol are shown in Figure S6 of the supporting material. As shown in Table 6, for all of the molecules studied, some of the J coupling values and Ftrans values reproduce the experimental NMR data, although the differences are larger in some cases. For example, the average difference in Ftrans involving one terminal hydrogen and one interior hydrogen (i.e. 1,2; 1',2, 5,6 and 5',6) is calculated to be 7 and the average difference involving two interior hydrogen atoms is −29. Therefore, the Ftrans are better described for those dihedrals involving the terminal hydrogen. However, it is important to note that the parameters for the associated dihedrals (terminal/non-terminal) have the same values in order to retain the transferability and simplicity of the parameters for these compounds. Therefore, the parameters overestimate the trans conformer population involving the terminal hydrogens and underestimate the trans conformer population involving only internal hydrogens. While such a compromise may represent an inherent limitation in the force field, limitations in the conversion from dihedrals to J coupling values may also contribute. Such limitations may be due to the use of a simplified generalized Karplus equation in this study (Eq. 7). Given the different environments of the terminal and nonterminal hydrogens, it may be more appropriate to use equations that are more detailed (e.g. that account for electronegativity of substituents) or are optimized for the individual types of dihedrals. However, given the large number of compounds and classes of dihedrals, such a task is significant and beyond the scope of the present study.
The all-atom additive CHARMM force field for acyclic carbohydrates and polyalcohols and inositol is presented. A full list of the topology and parameter information is included in Table S7 of the supplemental material. The nonbonded parameters for the polyols are transferred from the hexopyranoses and are tested using QM water-polyol interactions. The parameters for the carbonyl groups in linear aldoses and ketoses are optimized using the model compounds acetaldehyde and acetone and target QM data, model-compound-water interactions and pure solvent properties. For all of the compounds in this study, the MM water interaction distances and energies are in good agreement with the scaled QM values. For the bond and angle parameters, initial optimization was performed using acetone, acetaldehyde, and glycerol, with the optimized parameters transferred to the larger acyclic carbohydrates. Concurrently, selected equilibrium bond distances were adjusted to fit CSD survey data. The dihedral parameters are optimized to fit an extensive set of QM dihedral energy scans on n=6 linear polyols. The resulting parameters were then shown to be directly transferable to the n=4 and n=5 polyols. Validation of the optimized parameters was performed using condensed phase simulations, including crystal simulations and aqueous phase simulations. In the crystal simulations, all heavy atom bond, angle and dihedral values reproduced the experimental crystal values. Crystal volumes calculated from the simulations are systematically too large by approximately 7%, consistent with a similar trend observed in crystals of hexopyanoses22. Although the crystal volumes are too large, molecular densities calculated from aqueous phase simulations concentrations ranging all the way from 0.1 mol/kg to 5 mol/kg reproduce experimental data very well to within approximately 1%. As the parameters are anticipated to be used primarily for investigations of these compounds in aqueous solution, reproduction of the solution phase results is deemed more important. Concerning the conformational properties in solution, trans conformer populations calculated using a generalized Karplus equation are overestimated for terminal dihedrals while being systematically underestimated for nonterminal dihedrals. Limitations in the applied Karplus equation may limit this analysis.
All of the compounds considered in the present study (Figure 1) are flexible molecules with multiple hydroxyls or carbonyl moieties that allow for extensive intramolecular hydrogen bonding. Such hydrogen bonding leads to difficulties in parametrizing a force field for these compounds. This is due to the need to overestimate the partial atomic charges in additive force field so as to account for the polarization of molecules that occurs in the condensed phase. Such increased charges, or over-polarization, is particulary problematic with these compounds, as it leads to the extensive gas-phase intramolecular hydrogen bonding in the molecules being systematically overestimated; in the condensed phase it is assumed that competition between inter- and intramolecular hydrogen bonding yields a proper representation of interactions with the environment. Overestimation of intramolecular hydrogen bonding leads to several problems presented above. With glycerol, calculation of the heat of vaporization is confounded as the gas phase intramolecular hydrogen bonding will tend to make the gas phase energy too favorable, leading to the heat of vaporization being systematically too unfavorable. Calculation of the heat of vaporization using the gas phase energy determined using the monomer conformations from the condensed phase simulation (which is equivalent to determination of ΔHvap based on only the intermolecular interactions in the condensed phase) yields much better agreement. This is consistent with previous results for ethylene glycol, where both the heat of vaporization and molecular volume were in good agreement with experiment.22 However, it does not explain the molecular volume of glycerol being too large by 4.8%, which may be due to limitations in the nonbond parameters or the subtle balance in the competition of the inter- and intramolecular hydrogen bonding not being ideal. Another limitation due to overestimation of intramolecular hydrogen bonding is its impact on the dihedral energy scans. This leads to the low energy conformations, which maximize intramolecular hydrogen bonds, being artificially too favorable, which manifests itself in the MM energy surfaces overestimating the barrier height in the plots (Figure 4). This limitation leads to the RMSE for the energy surfaces to be higher than anticipated and may impact the resulting conformational properties of these molecules in solution, leading to poorer agreement with the NMR data. As this problem is due to inherent assumptions in the additive force field used in this study, it cannot be solved, though we have attempted to alleviate this problem via targeting and validating the force field against a large body of target data. It is anticipated that polarizable force fields, in which the need to over polarize the gas phase charge distribution is eliminated,68 may overcome these problems.
Financial support from the NIH (R01GM070855 (ADM) and F32CA1197712 (OG)) and computational support from the Department of Defense High Performance Computing and the National Cancer Institute Advanced Biomedical Computing Center are acknowledged.
Supporting Information Available.
Acetone and acetaldehyde water interaction orientations; acetone and acetaldehyde geometric data; vibrational analysis for acetone, acetaldehyde and glycerol; acetone and acetaldehyde dihedral scans; comparison of crystal and calculated bond lengths, angle and dihedrals; conformer distributions of glucitol; topology and parameter files. This material is available free of charge via the Internet at http://pubs.acs.org