|Home | About | Journals | Submit | Contact Us | Français|
We present an extension of the CHARMM hexopyranose monosaccharide additive all-atom force field to enable modeling of glycosidic-linked hexopyranose polysaccharides. The new force field parameters encompass 1→1, 1→2, 1→3, 1→4, and 1→6 hexopyranose glycosidic linkages, as well as O-methylation at the C1 anomeric carbon, and are developed to be consistent with the CHARMM all-atom biomolecular force fields for proteins, nucleic acids, and lipids. The parameters are developed in a hierarchical fashion using model compounds containing the key atoms in the full carbohydrates, in particular O-methyl-tetrahydropyran and glycosidic-linked dimers consisting of two molecules of tetrahyropyran or one of tetrahydropyran and one of cyclohexane. Target data for parameter optimization include full two-dimensional energy surfaces defined by the Φ/Ψ glycosidic dihedral angles in the disaccharide analogs as determined by quantum mechanical MP2/cc-pVTZ single point energies on MP2/6-31G(d) optimized structures (MP2/cc-pVTZ//MP2/6-31G(d)). In order to achieve balanced, transferable dihedral parameters for the Φ/Ψ glycosidic dihedral angles, surfaces for all possible chiralities at the ring carbon atoms involved in the glycosidic linkages are considered, resulting in over 5000 MP2/cc-pVTZ//MP2/6-31G(d) conformational energies. Also included as target data are vibrational frequencies, pair interaction energies and distances with water molecules, and intramolecular geometries including distortion of the glycosidic valence angle as a function of the glycosidic dihedral angles. The model-compound optimized force field parameters are validated on full disaccharides through comparison of molecular dynamics results to available experimental data. Good agreement is achieved with experiment for a variety of properties including crystal cell parameters and intramolecular geometries, aqueous densities, and aqueous NMR coupling constants associated with the glycosidic linkage. The newly-developed parameters allow for the modeling of linear, branched, and cyclic hexopyranose glycosides both alone and in heterogenous systems including proteins, nucleic acids and/or lipids when combined with existing CHARMM biomolecular force fields.
Polysaccharide carbohydrates, composed of individual monosaccharide units that are connected together by glycosidic linkages, have numerous and varied roles in biology, where they serve as energy storage and transport molecules, structural scaffolds, and motifs for molecular recognition. An important subset of these glycosides has as its component monosaccharide hexopyranoses, such as glucose (Figure 1, compound 1), galactose, and mannose. Members of this subset include cellulose and starch, which are both composed exclusively of glucose, yet have dramatically different properties and functions arising from cellulose being composed exclusively of one epimer of glucose and starch exclusively of the other epimer.1 In addition to existing as isolated biological polymers, hexopyranose polysaccharides are commonly found covalently linked to other biomolecules. Attachment of hexopyranose polysaccharides to proteins yields glycoproteins and to lipids yields glycolipids, and these heterogeneous biomolecules play crucial roles in protein folding and molecular recognition.
Significant efforts spanning several decades have been made toward the development of molecular mechanics force field models for investigating the structure and energetics of polysaccharides.2–16 The large variety of component monosaccharides, the inherent conformational complexity of monosaccharides, and the additional degrees of freedom that need to be considered in parametrizing glycosidic linkages all present challenges to the development of comprehensive carbohydrate force fields. As such, the range of force field applicability (i.e. types of carbohydrates) is often limited, and the study of heterogeneous systems (e.g. glycolipids, glycoproteins, protein:carbohydrate complexes) is complicated by inconsistent treatment of 1,4 nonbonded interactions and differing force-field development protocols. Presently, the availability of high-quality experimental data and the ability to probe a larger variety of conformational energies on larger model systems using more accurate quantum mechanical methods allow for the inclusion of more and better target and validation data in the carbohydrate force field development process, as reflected in recent efforts.17–20 It is anticipated that such advances will lead to both more complete and more accurate carbohydrate force field models.
To enable the atomic-level modeling of a wide variety of carbohydrates in an aqueous environment and interacting with other biopolymers such as proteins, nucleic acids, and lipids, our laboratory has undertaken the development of a comprehensive force field for carbohydrates19,20 in the context of the CHARMM additive biomolecular force fields.21–33 The long-term goal of these efforts is to enable atomically-detailed investigation of carbohydrates in molecular recognition, including detailed understanding of the molecular basis of and physical insights into the interaction of carbohydrates with other biomolecules such as proteins and lipids. The present work builds on previously published work on model compounds, including linear and cyclic ethers,34 ethylene glycol,35 and 2-ethoxy tetrahydropyran,36 as well as all-atom pairwise additive force field development efforts for hexopyranose monosaccharides19 and acyclic polyols, acyclic carbohydrates, and inositol.20 Among the findings of these prior efforts is that, using the relatively simple functional form of the CHARMM all-atom additive force field equation, it is possible to develop carbohydrate force field parameters that give good agreement with gas phase, crystalline, and aqueous conformational properties as well as aqueous phase densities of very dilute to very concentrated aqueous solutions. In the present study, the parameters developed for hexopyranoses have been extended to a comprehensive collection of glycosidic linkages, i.e. 1→1, 1→2, 1→3, 1→4, and 1→6 glycosidic links with all possible combinations of chiralities for the two ring carbon atoms in each glycosidic link, as well as to both α- and β-anomers of 1-O-methyl hexopyranoses. Therefore, these parameters allow for the modeling of all possible unsubstituted linear, cyclic, and branched hexopyranose polysaccharides.
The present work represents a departure from previous parametrizations of hexopyranose glycosidic linkages in that extensive high-level quantum mechanical data on relatively large model compound systems are used in the parametrization procedure. Specifically, two molecules of tetrahydropyran or one molecule each of tetrahydropyran and cyclohexane connected by a glyclosidic linkage are employed as model compounds for hexopyranose disaccharides. Optimized two-dimensional scans at the MP2/6-31G(d) level with a resolution of 15° for the full range of the Φ/Ψ dihedral angles of a glycosidic linkage are performed, and MP2/cc-pVTZ single point energies are computed for all of these optimized structures. In sum, over 5250 MP2/cc-pVTZ//MP2/6-31G(d) conformational energies are used in the parametrization process, and include consideration of all possible chiralities at the ring carbon atoms involved in the glycosidic linkages. Particular care is taken in the parametrization of the C1-Olink-Cn’ valence angle distortion, where C1 corresponds to the anomeric carbon in the first ring, Olink the oxygen in the glycosidic linkage, and Cn’ the glycosidic ring carbon atom in the second ring, such that the force field accurately reproduces the quantum mechanical distortion of this angle, ranging nearly 30°, over the full range of Φ/Ψ dihedral values. Extensive validation of the parameter set is done by comparing results from molecular dynamics simulations of disaccharide crystals and aqueous solutions to experimental disaccharide crystal geometries, NMR J-coupling values, and aqueous solution densities.
All molecular mechanics calculations were performed with the CHARMM program,21,37,38 and using the same potential energy function as for the CHARMM protein,23–25 nucleic acid,26–28 and lipid all-atom additive force fields:29–33
In Eqn. 1, Kb, Kθ, KUB, Kχ and Kimp are bond, valence angle, Urey-Bradley, dihedral angle, and improper dihedral angle force constants, respectively. b, θ, S, χ and are the bond distance, valence angle, Urey-Bradley 1,3-distance, dihedral angle, and improper dihedral angle values, and the subscript 0 represents an equilibrium value. Additionally, for the dihedral term, n is the multiplicity and δ is the phase angle as in a Fourier series. The nonbonded interaction energy between pairs of atoms i and j consists of the Lennard-Jones (LJ) 6–12 term and the Coulomb term. εij is the LJ well depth, Rmin, ij is the interatomic distance at the LJ energy minimum, qi and qj are the partial atomic charges, and rij is the distance between atoms i and j. The Lorentz-Berthelot combining rules are applied to determine LJ parameters between different atom types.39
A modified version of the rigid three-site TIP3P model was used to represent water,40,41 and the SHAKE algorithm42 was applied to keep water molecules rigid and to constrain covalent bonds between hydrogen and heavy atoms to their equilibrium values. Gas-phase energies were calculated using infinite nonbonded cutoffs and gas-phase energy minimizations were done to a tolerance of 10−6 kcal*mol−1*Å−1. Aqueous and crystal simulations were performed using periodic boundary conditions,39 with a force-switched smoothing function43 applied to LJ interactions in the range of 10 to 12 Å and particle mesh Ewald44 with a real-space cutoff of 12 Å to compute the Coulomb interactions. A timestep of 1 fs was used for integration of the equations of motion with the “leapfrog” integrator,45 and all molecular dynamics (MD) employed Nosé-Hoover thermostating,46,47 Langevin piston barostating,48 and a long-range correction to the pressure to account for LJ interactions beyond 12 Å.39 Condensed-phase simulations were done at experimental temperature and pressure, with aqueous simulations done using a cube as the periodic unit cell and crystal simulations employing the appropriate experimental unit cell geometries. For aqueous simulations with a cubic unit cell, the cell edge lengths were varied isotropically to maintain the target pressure during simulation, whereas unit cell edge lengths in crystal simulations were allowed to vary independently. Angular crystal cell parameters of 90° were constrained to this value while those not 90° were allowed to vary independently. Data for crystal simulations were collected for 4 ns following 1 ns of equilibration.
QM calculations were performed using the Gaussian 03 program.49 Geometry optimizations and vibrational calculations were performed using the MP2/6-31G(d) model chemistry,50,51 with tight tolerances applied when optimizing structures for vibrational calculations. A scale factor of 0.9434 was applied to the QM frequencies, as required to account for limitations in the level of theory and reproduce experimental frequencies.52 Potential energy decomposition analysis was performed using the MOLVIB utility in CHARMM using internal coordinates as per Pulay et al.53 Relaxed potential energy scans were obtained by optimizing the geometry at the MP2/6-31G(d) level followed by MP2/cc-pVTZ single point calculations (MP2/cc-pVTZ//MP2/6-31G(d)).50,51,54 All potential energy scans were performed in 15° increments with only the scanned dihedral angles (e.g. Φ/Ψ or Φ/Ψ for the 2D surfaces) constrained.
QM calculations for interaction pairs consisting of a water molecule with a particular model compound followed the standard procedure for the CHARMM force field, thereby ensuring consistency of the nonbonded parameters with the remainder of the CHARMM additive biomolecular force fields.22 Per this procedure, the solute:water interaction distance was optimized at the HF/6-31G(d) level, with constraints on all other degrees of freedom. Following optimization, HF/6-31G(d) interaction energy target data were calculated as 1.16*(Epair − Esolute − Ewater), with no basis-set superposition-error correction and the empirical scaling factor of 1.16 introduced to yield parameters appropriate for a condensed phase force field.24,55 Target data for interaction distances were the QM-optimized distances minus 0.2 Å, again to yield parameters appropriate for a condensed phase force field. The water intramolecular geometry in both QM and MM calculations of pair interaction data was that of the TIP3P water model,40 and the model compound geometry was one that was previously gas-phase optimized in the MP2/6-31G(d) or the CHARMM representation, respectively.
Relaxed QM potential energy surfaces at the MP2/cc-pVTZ//MP2/6-31G(d) level of theory were used as target data for fitting the dihedral force constants in an automated manner using the freely-available Monte Carlo simulated annealing (MCSA) dihedral parameter fitting program fit_dihedral.py56 (available for downloading at http://mackerell.umaryland.edu). For each dihedral being fit, three multiplicities n of 1, 2, and 3 were included and the corresponding Kχ values (Eqn. 1) were optimized to minimize the root mean square error between the empirical and QM energies. In the MCSA approach, the adiabatic empirical energy surfaces are initially obtained with the force constants Kχ on the dihedral parameters being parametrized set to zero; restraining potentials on dihedral angles are used to maintain the values of the dihedral angles that are being scanned. The energy difference between the resulting empirical surface and the target QM surface is then determined and the dihedral parameters fit to reproduce that energy difference. Kχ values were constrained to be no more than 3 kcal/mol, and phase angles δ were limited to 0 and 180° to maintain symmetry of the dihedral potentials about χ = 0°, allowing for applicability of the parameters to both enantiomers of a chiral species.
An extension to the MCSA fitting program fit_dihedral.py was introduced to allow for independent root mean squared alignment of multiple MM surfaces to their corresponding QM surfaces during the simultaneous fitting of these surfaces. Previously, the target function to be optimized was the root-mean squared error RMSE between the QM conformational energies and the MM conformational energies :
where wi is a weight factor for conformation i and the constant c, defined by
optimally aligns the QM and MM data.56 When simultaneously fitting the same dihedral parameters in two or more different molecules (e.g. configurational isomers), it is sometimes desirable to allow for independent root-mean squared alignment of the QM and MM data on a per-molecule basis. Thus the target function for MCSA fitting was expanded to
where RMSEg is the RMSE for a grouping of data and is defined by Eqns. 2 and 3 for that grouping g. Each grouping of data is aligned independently, that is the c values are computed separately for each grouping, and wg is a weight factor applied to a particular grouping. Thus, for example, data for configurational isomers can be placed in separate groupings so as to optimally fit the conformational energies of the individual configurational isomers with a single set of dihedral parameters without requiring fitting of the QM ΔE values between configurational isomers.
Here, Nwater, Nsolute, and NAvogadro are the number of water molecules, solute molecules, and Avogadro’s number, respectively, and MW is the average molecular weight of the solution. For all aqueous density simulations, Nwater = 1100, while Nsolute was adjusted to give the appropriate density for each simulated system. These aqueous simulations were equilibrated for 0.5 ns followed by 5 ns of data collection, with the volume of the system calculated every 10 ps and averaged over the simulation time to give V.
NMR heteronuclear three-bond proton-carbon coupling constants (in Hz) for the glycosidic angle rotation 3 JCOCH were computed from the simulations using the modified Karplus equation developed by Tvaroska et al.57
as well as that by Cloran et al.58
where ϕ is the C-O-C-H dihedral angle of interest. Similarly, for three-bond carbon-carbon coupling constants 3 JCOCC, the modified Karplus equation developed by Bose et al.,59
was used, where ϕ is the C-O-C-C dihedral angle of interest. Bose et al. also developed a second simpler equation in which the cos ϕ term was excluded,59
and this second equation was also used to analyze the MD data. To compute J values from MD simulations the value of the dihedral angle of interest was tabulated for each snapshot of each disaccharide in the simulation. Each dihedral value was then used as ϕ in one of the above equations to get a J value for that disaccharide in that snapshot. The resultant J values for the dihedral angle of interest were then averaged over all disaccharides in the system and over all snapshots from the simulation to get an ensemble-averaged value for J for comparison with previously published experimental data. Simulations consisted of four disaccharides in 1100 molecules of water, corresponding to a concentration of 200 mM, which is approximately 10-fold more concentrated than the comparison NMR experiments though still sufficiently dilute so as minimize direct solute-solute interactions. Of the four disaccharide molecules, the chirality at the reducing-end anomeric carbon was α for one molecule and β for the remaining three, so as to reflect the anomeric ratio for glucopyranose in water (α:β = 1:260). In accord with the comparison NMR experiments performed using pure D2O, simulations were done at 298 K and 1 atm, and with no ions added. Simulations were equilibrated for 0.5 ns, followed by 20 ns of data collection during which a snapshot was taken every 1 ps.
The focus of the present work was the development of a transferable set of parameters for the glycosidic linkages between hexopyranoses. Applying parameters developed for hexopyranose monosaccharides and associated model compounds19 left as parameters to-be-determined only those involving the glycosidic linkage oxygen atom. Starting values for bond, angle, dihedral, Lennard-Jones and electrostatic parameters involving this atom were transferred from published ether parameters34 and further optimized as described below using the model compounds in Figure 1, and, to a limited extent, data from disaccharide crystals. Parametrization was done in a self-consistent fashion such that whenever one parameter was changed, properties were recomputed and additional parameters re-optimized if necessary21–23; all data presented throughout reflect the final set of self-consistently optimized parameters. The optimized parameters were subsequently applied to full disaccharides for validation by comparison of MD results to available experimental data.
Conformational energies for all compounds were studied using the MP2/ccpVTZ// MP2/6-31G(d) model chemistry, which provides a reasonable compromise between accuracy and computational expediency. Previously, it was seen that energies for hexopyranose monosaccharides using this model chemistry were in excellent agreement compared to MP2/cc-pVTZ optimized structures.19 Additional studies on ethylene glycol35 and 2-ethoxy tetrahydropyran36 using a variety of model chemistries support the use of the MP2/cc-pVTZ model chemistry for energy calculations on carbohydrates.
O-methylation at the C1 position in hexopyranoses leads to formation of an acetal, thereby preventing spontaneous isomerization between the α- and β-anomers, and is a common chemical modification in synthetic hexopyranose polysaccharides. Force field parameters were developed for this moiety because of its common occurrence and its similarity to glycosidic linkages between hexopyranoses. Compounds 2 and 3, which are O-methyl derivatives of tetrahydropyran, were used as model compound analogs for the α- and β-anomers of 1-O-methyl hexopyranoses. The location of O-methylation and the geometry of the tetrahydropyran ring were chosen to mimic a 1-O-methyl-d-hexopyranose molecule having the energetically-favored 4C1 ring conformation.
Tetrahydropyran parameters were those previously developed in the context of hexopyranoses19 and bond, angle, dihedral, Lennard-Jones, and partial charge parameters involving the O-methyl group were transferred from existing ether parameters.34 Following this transfer of parameters, those that remained missing included the Oring-C1-Omethyl valence angle and the Oring-C1-Omethyl-Cmethyl, C2-C1-Omethyl-Cmethyl, and C5-Oring-C1-Omethyl dihedral parameters. Also in question was the ability of the transferred nonbonded parameters on the Oring and Omethyl atoms to capture the energetics of hydrogen bonding in the context of these molecules.
QM MP2/cc-pVTZ//MP2/6-31G(d) scans of the Oring-C1-Omethyl-Cmethyl dihedral in both the α-anomer 2 and β-anomer 3 were done to characterize the potential energy surfaces and determine the location of the minimum energy conformations of the two anomers (Figure 2). The global minimum for both surfaces was at Oring-C1-Omethyl-Cmethyl = 60° for the α-anomer. The minimum on the β-anomer surface was 1.55 kcal/mol higher in energy and located at Oring-C1-Omethyl-Cmethyl = −60°. Although the O-methyl moiety is located in the axial position in the α-anomer, it is energetically more favorable than the equatorial substituted β-anomer. This well-known result is counter to the general trend that equatorial substitutions on saturated six-membered rings are more favorable than axial ones, a phenomenon referred to as the “anomeric effect.” The energy difference between the two anomers in the molecular mechanics framework is determined by the C5-Oring-C1-Omethyl dihedral, and ring deformation potential energy scans of this dihedral were also done for inclusion as target data during dihedral parameter fitting.
An initial set of parameters were developed by transferring existing ether C-C-O valence angle parameters to the Oring-C1-Omethyl valence angle and fitting the Oring-C1-Omethyl-Cmethyl, C2-C1-Omethyl-Cmethyl, and C5-Oring-C1-Omethyl dihedral parameters to reproduce the Oring-C1-Omethyl-Cmethyl and C5-Oring-C1-Omethyl QM potential energy scans. Using these initial parameters, both anomers were then fully geometry-optimized in the QM MP2/6-31G(d) and MM representations starting from Oring-C1-Omethyl-Cmethyl = 60° for the α-anomer and −60° for the β-anomer. Comparison of the MM geometries to the QM geometries showed errors in the C1-Omethyl bond and the Oring-C1-Omethyl and C2-C1-Omethyl angles. Additionally, pair interaction energies with water for these two conformations using the transferred partial charge of –0.34 e on Omethyl were consistently underestimated in the MM representation compared to the QM representation. Because of the errors in geometries and water pair interaction energies using transferred parameters from the linear ethers, further parameter optimization was undertaken.
A second iteration of parameter optimization was done to correct the QM water pair interaction energies while ensuring the molecular geometries and conformational energies were well reproduced. The C1-Omethyl geometry was improved by decreasing the transferred equilibrium bond length by 0.020 Å, and the Oring-C1-Omethyl and C2-C1-Omethyl geometries were improved by decreasing the transferred equilibrium value parameters by 1.5° to 110.0° and by 2.5° to 109.0°, respectively. Reproduction of the target vibrational frequencies was improved by increasing the Oring-C1-Omethyl force constant from 45 to 90 kcal*mol−1*radian−2. The partial charge on Omethyl was adjusted to −0.36 e, with +0.01 e added to the partial charges on C1 and Cmethyl to maintain charge neutrality, and the dihedral parameters were re-fit. With optimized partial charges, water pair interaction energies and distances in the MM representation compared very favorably to the QM target data, both for Omethyl and for Oring, which retained its initial partial charge of −0.40 e transferred from tetrahydropyran (Table 1, Figure 3). In order to use the same bonded parameters for both anomers, it was necessary to have the force field slightly underestimate Oring-C1-Omethyl and overestimate C2-C1-Omethyl for the α-anomer while doing the reverse for the β-anomer (Table 2). The valence angle parameters also yielded good agreement between the QM and MM vibrational frequencies having contributions from these degrees of freedom (Table 3). Conformational energies as well as the energy difference between the two anomers were very well reproduced with the fit dihedral parameters (Figure 2). In the conformational energy scans, the force field gives a ΔE of 1.53 kcal/mol between the minimum-energy conformations of the two anomers, in comparison to the reference MP2/cc-pVTZ//MP2/6-31G(d) value of 1.55 kcal/mol. In this case, the force field representation gives a significantly more accurate ΔE than the MP2/6-31G(d) value (2.25 kcal/mol), highlighting the importance of the choice of QM model chemistry for generation of target data.
Four different 1→1 glycosidic linkages can be formed between two hexopyranose monosaccharides depending on which anomers are linked together: α(1→1)α, α(1→1)β, β(1→1)α, and β(1→1)β. This number is reduced to three if the identities of the two component monosaccharides are the same, as the resultant α(1→1)β and β(1→1)α linked disaccharides constitute identical molecules. Accordingly, 4, 5, and 6 were used as model compound analogs of 1→1 linked hexopyranose disaccharides. These model compounds are dimers of tetrahydropyran connected by a glycosidic linkage and are analogous to α(1→1)β, α(1→1)α, and β(1→1)β hexopyranose disaccharides, respectively. Parameters to be determined for these linkages were those involving the glycosidic oxygen Olink. These included Olink-C1 bond parameters; C1-Olink-C1, Olink-C1-Oring, and Olink-C1-H valence angle parameters; C1-Olink-C1-X dihedral parameters (X = C2, Olink, or H), and nonbonded Olink parameters.
An attractive possibility for minimizing computer time and parametrization effort is to transfer analogous parameters from 2 and 3 to the full disaccharides, and this approach has been taken in other work. To test this approach, the QM Φ/Ψ two dimensional relaxed potential energy surface for the α(1→1)β analog 4 (Φ = Oring-C1-Olink-C1’, Ψ = C1-Olink-C1’-Oring’) was compared to the MM surface calculated using transferred parameters (Figure 4a,b). The obvious shortcoming of this approach is the incorrect ordering of the two minima on the QM surface at Φ/Ψ = 60°/−90° and 90°/60°; on the QM surface the global minimum is as 60°/−90° while on the MM surface it is at 90°/60°. Additional QM and MM Φ/Ψ scans done on 5 and 6 showed better agreement between the QM and MM surfaces, which is not surprising as they only have one low-lying minimum each (Figure 4d,e,g,h). However, in both instances the minima had notably steeper walls in the MM representation. Starting at the global minimum at 75°/75° for 5 (Figure 4d,e), the energy rises significantly faster moving along the valley toward 180°/90° in the MM as compared to the QM surface. Likewise for 6, starting at the global minimum at −75°/−75° and moving up the valley toward −180°/−60° the energy increases significantly more quickly in the MM as compared to the QM representation (Figure 4g,h). These problems reveal deficiencies in the transferred parameters when applied to disaccharide analogs.
Given the shortcoming of transferring parameters developed for 2 and 3 to the disaccharide analogs, parameters were fit directly to target QM Φ/Ψ data. The Oring-C1-Olink-C1 and C2-C1-Olink-C1 dihedral parameters that determine Φ/Ψ energetics were fit to 4, with extra weighting given to points in the two minima and ignoring points greater than 12 kcal/mol above the global minimum (in Eqn. 2, wi=20 for points in the minima, 0 for points with energy greater than 12 kcal/mol, and 1 otherwise). The optimized parameters based on 4 were then directly transferred to 5 and 6. Additionally, the C1-Olink-C1 valence angle parameters were optimized to reproduce distortion of this angle as a function of Φ/Ψ for 4–6. This required reduction of the force constant transferred from C1-Omethyl-Cmethyl from 95 kcal*mol−1*radian−2 to a value of 50 kcal*mol−1*radian−2 and increasing the equilibrium value from 109.7° to 111.5°.
Very good energies and C1-Olink-C1 valence angle geometries as a function of Φ/Ψ were achieved by the self-consistent optimization of the dihedral and valence angle parameters. The Φ/Ψ energy surface for 4 using these optimized parameters gives the correct ordering of the two minima (Figure 4a,c). The MM model also satisfactorily reproduces the energy surfaces for 5 and 6, with the walls of the minima being less steep than with the transferred parameters, in better agreement with QM data (Figure 4d,f and 4g,i). The MM model does an excellent job of reproducing the change in C1-Olink-C1 valence angle geometry as a function of Φ/Ψ for all three model compounds (Figure 5). The C1-Olink-C1 geometry varies by more than 20° across the Φ/Ψ surface for each model compound. Quite remarkably, the simple form of the potential energy function, with a harmonic term for valence angle distortion, is able to correctly reproduce such large magnitude valence angle distortions.
Water pair interactions energies and distances, intramolecular geometries of minimum energy conformations, and vibrational frequencies are also all well-reproduced using the optimized parameter set. The partial charge and Lennard-Jones parameters for Olink, transferred from the Omethyl atom in 2 and 3, capture the hydrogen bond energies and distances for 4–6 (Table 4). Of note is the fact that for the α(1→1)α analog 5, the hydrogen bond length is longer and the energy significantly less favorable than for the α(1→1)β and β(1→1)β analogs 4 and 6. This reflects the fact that water access to the Olink atom in the minimum energy geometry of 5 is sterically blocked by the two rings, whereas having at least one β anomer in the linkage opens up access to this atom. Minimum energy structures in the MM representation do a good job of reproducing bond, valence angle, and dihedral angle geometries relative to the QM MP2/6-31G(d) representation (Table 5); we note that the equilibrium length for the C1-Olink bonds in 4–6 was set to 1.415 Å, as is done for general linear ethers, compared to the value of 1.395 Å for compounds 2 and 3. The Φ and Ψ angles are particularly well represented by the MM model (Table 5), even though the same dihedral angle parameters are used for 4, 5, and 6. Finally, the good agreement between the MM and QM vibrational frequencies (Supplementary Material Figure S1) serves as additional confirmation of the appropriateness of the force field parameters.
Using two molecules of tetrahydropyran linked by a bridging ether as model compounds for the 1→2, 1→3, and 1→4 glycosidic linkages leads to a prohibitively large number of possibilities for all combinations of linkage type and linkage stereochemistry. Because the Oring atom in the second ring in these types of linkages is at least two atoms removed from Olink, a cyclohexane molecule was used to represent the second ring for the respective model compounds. For 1→2 linked pyranose disaccharides, the resultant model compounds, with all possibilities for stereochemistry at the C1 and C2’ positions, are compounds 7, 8, 9, and 10. Additionally, simply by renumbering the atoms on the cyclohexane ring, these four molecules correspond to all possible stereochemistries for 1→3 and 1→4 glycosidic linkage. Thus, energies and geometries of these four molecules were used to develop parameters for 1→2, 1→3, and 1→4 linked hexopyranose disaccharides. As with the 1→1 linkages, a single set of parameters was developed for all model compounds in the series.
Parameters except for the Oring-C1-Olink-C2’, C2-C1-Olink-C2’, C1-Olink-C2’-C1’, and C1-Olink-C2’-C3’ dihedral parameters were transferred from the 1→1 linkages, and cyclohexane parameters were those previously developed in the context of hexopyranoses.19 Automated fitting of these four dihedrals, with the C1-Olink-C2’-C1’ and C1-Olink-C2’-C3’ parameters equivalenced (i.e. constrained to be the same),56 was undertaken by including in the fit all points having QM energies less than 12 kcal/mol relative to the respective global minima on the four two-dimensional MP2/cc-pVTZ// MP2/6-31G(d) scans. While the fits were generally good, two issues arose: systematic deviation of the MM C1-Olink and Olink-C2’ bond and C1-Olink-C2’ angle values from the QM data, and difficulty reproducing the global energy minima on the surfaces for 9 and 10 where the energy is nearly constant over a large range of Ψ for a value of Φ = −60° (Φ = Oring-C1-Olink-C2’, Ψ = C1-Olink-C2’-C1’).
To overcome these limitations, a special weighting scheme was applied to the dihedral fitting, and the relevant bond and angle parameters were optimized to target the QM data. With regard to dihedral fitting, the automated dihedral fitting algorithm was extended to allow separate RMS alignment of each of the four MM surfaces to its corresponding QM surface during the simultaneous fitting of all four surfaces, as described in the Methods. This allowed the fitting to overcome any discrepancies in the MM ΔE values between any pair of surfaces as compared to the QM ΔE. Additionally, points with an energy value within 0.25 kcal/mol of the global minimum for each surface were given a weight wi of 1000, compared to a weight of 1 for other points below the 12 kcal/mol cutoff.56 This emphasis on a few low-lying points was helpful in fitting the flat global minima for compounds 9 and 10 (Figure 6). There were sufficiently few points with this high weighting such that, combined with optimized bond and angle parameters, the full shapes of the energy surfaces were all well reproduced (Figure 7). Bond optimization entailed adjusting the equilibrium values of the C1-Olink and Olink-C2’ bonds to 1.395 and 1.435 Å, respectively. Additionally, the final optimized C1-Olink-C2’ parameters (force constant 50 kcal*mol−1*radian−2 and equilibrium value of 109.2°) gave impressive reproduction of the angle distortion as a function of Φ/Ψ (Figure 8).
Intramolecular geometries and vibrational frequencies demonstrated the appropriateness of the final bonded parameters, and water pair interaction distances and energies confirmed the transferability of the Olink partial charge and Lennard-Jones nonbonded parameters from model compounds 4–6. Fully unconstrained MP2/6-31G(d) geometry optimizations done on the global minimum energy conformations from the two-dimensional Φ/Ψ scans showed little change in the value of the dihedral angles relative to the constrained values from the scans. Using the geometries from fully unconstrained QM optimizations, MM optimizations on compounds 7 and 8 gave structures in excellent agreement with the optimized QM structures (Table 6). However, for model compounds 9 and 10, the Ψ dihedral needed to be restrained to the QM value during the MM geometry optimization to prevent changes in this angle by +49.3° and +46.2° respectively. It is important to note that this simply reflects the very flat minimum energy well that these conformations exist in (Figure 6 and Figure 7e–h) and not a significant deficit in the force field. From Figure 6, it is apparent that Ψ values spanning 60° for 9 and 10 in the global minimum energy wells correspond to conformations within 0.5 kcal/mol of each other, with a majority of these being within 0.25 kcal/mol of each other. Because of the differences in the QM- versus unrestrained MM-optimized conformations for 9 and 10, vibrational frequencies for comparison were computed only for 7 and 8, and these had the same degree of accuracy as for model compounds 2–6 (Supplementary Material Figure S2).
Water pair interaction energies and distances for the QM and MM optimized conformations showed excellent agreement for 8 and 10, while hydrogen bonds were shorter and stronger in the MM representation relative to the QM for 7 and 9 (Table 7). Importantly, the hydrogen bonds for 8 and 10 are significantly shorter and more favorable in both representations than for 7 and 9, which have relatively long and weak hydrogen bonding interactions with the water molecule regardless of its orientation. Thus, the MM model is properly reproducing the relative hydrogen bond strengths for the different anomers. The disagreement for the weaker hydrogen bonding interactions may be ascribed to the inability of the water molecule to approach the Olink atom for 7 and 9 due to steric hinderance, and the MM model predicts a shorter interaction distance because it properly includes favorable dispersion interactions via the Lennard-Jones force field term whereas the Hartree-Fock level of theory used in the QM water pair interaction calculations does not model dispersion.
As discussed at the beginning of this section, the use of two molecules of tetrahydropyran linked by a bridging ether as model compounds for all possible 1→2, 1→3, and 1→4 glycosidic linkages would have been prohibitive. To confirm the appropriateness of the dihedral parameters developed using 7–10, MM scans using these parameters were done on analogs of 7–10 (7O, 8O, 9O, 10O) in which cyclohexane was replaced by a second molecule of tetrahydropyran, and compared to QM data. The scans were done at fixed values of Φ, chosen based on the global minima on the Φ/Ψ surfaces for 7–10 (Figure 9). In all four cases, the MM scans capture the features of the QM scans, and for 8O, 9O, and 10O (Figure 9b–d), the global minimum is correctly captured. In the case of 7O (Figure 9a), the global minimum is at Ψ = −90° in the QM representation as compared to Ψ = −150° in the MM. More than anything else, this is a reflection of the flat well surrounding the global minimum, also seen for compound 7 (Figure 7a). In the case of 7O, energies in this well that ranges from Ψ = −180° to Ψ = −75° are within 1.5 kcal/mol of each other. The MM representation correctly captures the span of this wide well and also the adjacent peak at Ψ = +15°. Thus, the dihedral parameters developed on 7–10 appear also to be appropriate when the cyclohexane ring is replaced by a second tetrahydropyran ring.
Compared to 1→1, 1→2, 1→3, and 1→4 linkages, 1→6 glycosidic linkages have an additional dihedral degree of freedom, making full QM characterization of an analogous model compound too computationally expensive. The three dihedral degrees of freedom in a 1→6 linkage (Φ = Oring-C1-Olink-C6’, Ψ = C1-Olink-C6’-C5’, Ω = Olink-C6’-C5’-Oring’) were thus parametrized based on the results for 2 and 3 as models for the Φ dihedral angle, and 11 and 12 as models for the Ψ and Ω dihedral angles. A single set of dihedral parameters gave excellent results for the Ψ/Ω surfaces for both 11 and 12 (Figure 10). These dihedral parameters along with optimization of the Olink-C6’-C5’ angle geometry also gave excellent minimum-energy geometries (Table 8) and good agreement with vibrational frequencies (Supplementary Material Figure S3), completing the parameter development for all the model compounds.
The parameters developed in the model compound analogs of disaccharides, along with existing parameters for hexopyranose monosaccharides,19 enabled the construction and simulation of disaccharides in the crystalline state (Table 9). Analysis of intramolecular geometries obtained from MD simulations of various disaccharide crystals showed several systematic differences in bond and valence angle geometries as determined by the model compound parameters, and these parameters were further optimized to correct for these systematic deviations. For C1-Olink-C1’ linked compounds, the model compound vs. full disaccharide parametrized equilibrium valence angle values were as follows: C2-C1-Olink 109.0° vs. 105.0°, and O5-C1-Olink 110.0° vs. 112.0°, with the model compound values followed by the full disaccharide values. For C1-Olink-Cn’ (n=2,3,4), the equilibrium bond and angle values were: Olink-Cn' 1.435 Å vs. 1.415 Å, C2-C1-Olink 109.0° vs. 105.0°, and O5-C1-Olink 110.0° vs. 112.0°. Of note is that the changes to the C2-C1-Olink and O5-C1-Olink angle parameters were the same for both C1-Olink-C1’ and C1-Olink-Cn’ (n=2,3,4) linked disaccharides relative to the model compounds. Perhaps it is not surprising that these two angle parameters required special attention; even in the case of model compounds 2 and 3, the changes in QM angle geometries going from the α- to the β-anomer resulted in MM errors having magnitudes of 1.5° to 1.8° but of differing signs depending on the anomer (Table 2), attesting to the challenge of developing general parameters involving the anomeric oxygen. The final set of parameters for all linkages between hexopyranoses is listed in Supplementary Material S4–S9.
A number of molecular dynamics (MD) simulations were undertaken to validate the development of bonded and nonbonded parameters for glycosidic linkages, and included a variety of disaccharides both in the aqueous phase and the crystalline phase. Crystal simulations were used primarily to confirm the ability of the force field to reproduce intramolecular geometries, thereby testing bond, angle, and dihedral parameters, with additional analysis of crystal cell volumes and their relevance to the nonbonded parameters. Aqueous phase simulations served to test the conformational properties of the various glycosidic linkages as determined by the dihedral parameters, as well as the ability of the force field model to capture aqueous solution density as a function of concentration. The results of these extensive simulations compared favorably to the available experimental data, thereby serving to validate the force field parametrization.
A comprehensive list of disaccharides representative of the parametrized linkages was simulated as infinite crystals using MD at constant temperature and pressure to test the ability of the force field to maintain intramolecular geometries and unit cell parameters. This list of 16 compounds, taken from the Cambridge Structural Database (CSD) of small molecules,61 included disaccharides containing not only glucose as a component hexopyranose monosaccharide but also allose, galactose, and mannose (Table 9). Additionally, a number of the disaccharides were O-methylated at the reducing end pyranose and/or contained water molecules in the crystal unit cell, both of which further tested the robustness of the force field. Finally, in all cases the simulated unit cell contained at least two independent disaccharide molecules, and a number of the simulations had four independent disaccharides. Overall, MD results of intramolecular geometries and unit cell parameters tabulated by averaging over snapshots from the trajectories, and over the independent disaccharide molecules in the case of intramolecular geometries, were quite consistent with the experimental values.
Errors in the disaccharide intramolecular geometries over the entire set of 16 crystals were impressively small for bonds, valence angles, and dihedral angles involving the glycosidic oxygen Olink (Table 10). Both bonds involving this atom had average errors no larger than 0.01 Å, and the spread in errors, as judged by the standard deviation in the errors, was also quite small, reflecting the fact that these bond lengths were well reproduced across the entire set of crystals. The situation was similar for valence angles, with all average errors for valence angles being less than 1.5° and having similarly small standard deviations in the errors. Of particular note is the C1-Olink-Cn’ valence angle with an average error of only −0.3°. Parameters for this angle were from careful parametrization aimed at reproducing not only the geometries of minimum-energy conformations of the disaccharide analog model compounds 4–10, but also the distortion of this angle with changing Φ/Ψ, including high-energy regions in the MP2/cc-pVTZ// MP2/6-31G(d) QM surfaces (Figure 4, Figure 5, Figure 7, and Figure 8). These parameters were directly used in the simulations of the full disaccharides and, along with C1-Olink-C6’ parameters transferred from linear ethers, yielded good reproduction of this glycosidic link angle across the full set of 1→1, 1→2, 1→3, 1→4, and 1→6 linked disaccharides (Figure 11). Finally, Φ, Ψ, and Ω dihedral angle values were well reproduced in the disaccharide crystal simulations using the dihedral parameters developed on model compounds 2–12 (Table 10).
Disaccharide crystal unit cell geometries and volumes were also tabulated from the MD simulations of the systems in Table 9, and were in good accord with the experimental values (Table 11). The crystal unit cell length parameters A, B, and C were well maintained for 15 of the 16 simulated systems. In those systems where the crystal unit cell angle parameter β was not 90°, it was allowed to vary during the simulation, and in all of the 9 total systems for which this was the case, β stayed close to the crystallographic value. The outlier with regard to the crystal unit cell parameter data was α,α-allo-trehalose trihydrate (YOXFUG). This case is the most challenging in the entire set in that it is a trihydrate, with 6 molecules of water and 2 of the disaccharide in the unit cell. Furthermore, it is the only crystal in the set to have P1 symmetry, thus all three angular unit cell parameters α, β, and γ were allowed to vary unconstrained during the simulation, whereas all other crystals had at most one unconstrained angular parameter (β) (Table 11). It is important to note that unit cell volumes are systematically overestimated across the entire set of compounds, a trend that has been noted in prior parametrizations of cyclic19 and linear carbohydrates.20 As noted in these prior works, this likely is a reflection of the difficulty in applying nonbonded parameters developed from simulations of neat liquid alcohols19 to the highly-directional hydrogen bonding networks in crystals. Also as noted in these prior works, when used in aqueous simulations, the parameters give excellent reproduction of solution densities, as discussed next.
Aqueous solutions at various concentrations of different disaccharides were simulated under the experimental conditions of 298 K and 1 atm to calculate densities (Eqs. 5 and 6), which compared favorably with experimental values across all solutes. For all disaccharides and all densities, the calculated solution densities are in good agreement with the experimental solution densities, with all errors within 1.5% (Table 12). For the fifteen aqueous solutions that were simulated, the average total error in the densities is 0.85% and the average total absolute error is 1.10%. Moreover, the force field model is able to reproduce experimental results both for dilute and concentrated solutions (maltose). Specifically, for simulations of dilute aqueous trehalose, cellobiose, gentiobiose, and melibiose, ranging in concentration from 0.1 to 0.3 mol/kg, errors range from 0.87% to 1.48%. For simulations of concentrated maltose, with concentrations between 1.5 to 2.6 mol/kg, errors range from −0.26% to −1.07%. For the most concentrated system, 2.6 molal maltose, nearly one third of the atoms in the system belong to solute molecules. As a result, there is significant direct contact between solute molecules, and it is not entirely surprising that the density is somewhat overestimated given that in the limiting case in which there are no water molecules, i.e. crystalline disaccharides, densities are systematically overestimated. More important though is that for all the aqueous systems, errors in density lie within a relatively small window, and the results are quite favorable given the 25-fold difference in concentrations between the most dilute versus the most concentrated systems studied.
Karplus-type equations have previously been developed for both vicinal 13C-1H and 13C-13C spin-couplings to relate the observed three-bond coupling constant values 3 J to dihedral angle values in the glycosidic linkage (see Methods).57–59 Such equations provide a means of comparing the conformational properties of the glycosidic linkage in MD simulations of disaccharides in water to NMR observables. Consequently, they can be used to help validate the dihedral parameters developed using gas-phase MP2/cc-pVTZ//MP2/6-31G(d) QM data from model compounds when applied to full disaccharides in aqueous solution.
The selection of disaccharides for MD J-coupling studies was based on the availability of experimental data as well as the desire to test all the different dihedral parameters. As described above, separate glycosidic dihedral parameters were developed for 1→1 linkages (model compounds 4–6; “Category 1”), for 1→2, 1→3, and 1→4 linkages (model compounds 7–10; “Category 2”), and for 1→6 linkages (model compounds 2–3 and 11–12; “Category 3”). Accordingly, disaccharides were chosen that fell into each of these three categories. α,α-trehalose was used to test Category 1 linkage parameters, maltose and cellobiose for Category 2 linkage parameters, and melibiose and gentiobiose for Category 3 parameters. In addition to the availability of recent NMR J-coupling data for these molecules,62,63 the tested Category 2 and 3 disaccharides are representative of both α and β linkages, which is important given that the same dihedral parameters are used for a given type of linkage regardless of the chiralities at the ring carbon atoms involved in the linkage. While Category 2 encompasses 1→2 and 1→3 linkages as well as 1→4 linkages, the 1→4 linked molecules maltose and cellobiose were chosen to represent this category because of their biological significance. Although the set of disaccharides does cover all of the dihedral parameters developed for glycosidic linkages based on the conformational properties of model compounds 2–12, it is worth noting that the configuration at the carbon on the reducing-end monosaccharide involved in the glycosidic linkage in all of these disaccharides is equatorial, reflecting a lack of available experimental data.
The referenced experimental glycosidic 3 JCOCH coupling constants for trehalose, maltose, cellobiose, melibiose, and gentiobiose span the range of 2.36 to 4.75 Hz (Table 13). Values for 3 JCOCH calculated from the MD simulations are representative of this range and correlate well with the experimental data. Interestingly, depending on whether Eqn. 7 or Eqn. 8 was used to convert the MD conformational data to 3 JCOCH, the calculated values for a given dihedral in a given simulation differ by as much as 0.85 Hz (cellobiose 3 J(C1OlinkCn’Hn’)). Furthermore, the variability of the calculated values can lead to underestimation when using one equation and overestimation when using the other, as is the case for both 3 J(H1C1OlinkCn’) and 3 J(C1OlinkCn’Hn’) for maltose. Readers interested in further detail are directed to Ref. 61, Table 1, which summarizes both older and more recent experimental and computed 3 JCOCH values, and shows that for e.g. maltose, experimental data can vary by as much as 2.2 Hz and computed data by 2.4 Hz depending on the source. In the case of 3 JCOCC values, the experimental data are limited to cellobiose and gentiobiose (Table 14). The reference data range from ≤0.5 up to 3.0 Hz, and both the low and the high 3 JCOCC are faithfully reproduced by the simulations. The overall closeness of the calculated data to the experimental data across all compounds, especially taking into account the inherent uncertainties introduced by using empirical equations to convert observed MD conformational data to NMR 3 J values, suggests the force field parameters are properly reproducing the true aqueous solution conformational behavior of this variety of glycosidic linkages.
The present work extends the developing CHARMM all-atom additive force field for carbohydrates19,20 to hexopyranose glycosides. The newly-developed and validated parameters allow for the modeling of all possible 1→1, 1→2, 1→3, 1→4, and 1→6 linkages between hexopyranoses, as well as O-methylation at the C1 position. The parameter development protocol and the force field functional form are consistent not only with recent CHARMM carbohydrate parametrizations,19,20 but also with the CHARMM all-atom additive force fields for proteins, nucleic acids, and lipids; the parameter set is therefore an extension of the CHARMM all-atom additive biomolecular force field. Given the comprehensive coverage of glycosidic linkages and the compatability with other CHARMM biomolecular force fields, the presented force field is expected be of utility for the simulation of linear, cyclic, and branched hexopyranose glycosides in solution, including in heterogeneous systems that include proteins, lipids, and/or nucleic acids.
The present work has focused on the development of a highly-optimized force field for the glycosidic linkages between hexopyranoses, with validation focusing on disaccharide crystalline intramolecular geometries and unit cell parameters, solution densities, and conformational properties in aqueous solution. Preliminary data on dynamic properties such as diffusion coefficients and NMR relaxation rates show promise with regard to experimental results, and a thorough analysis of simulated dynamic properties and existing experimental data is under preparation.
This work was supported by NIH GM070855 (ADM) and F32CA119771 (OG). The authors acknowledge computer time and resources from the National Cancer Institute Advanced Biomedical Computing Center, Department of Defense High Performance Computing, and the Pittsburgh Supercomputing Center.
Supplementary material includes three figures and force field parameters. This information is available free of charge via the Internet at http://pubs.acs.org. The force field topologies and parameters in CHARMM readable format may be obtained from the web site of ADM at http://mackerell.umaryland.edu.