|Home | About | Journals | Submit | Contact Us | Français|
Aromatic molecules with π electrons are commonly involved in chemical and biological recognitions. For example, nucleobases play central roles in DNA/RNA structure and their interactions with proteins. The delocalization of the π electrons is responsible for the high polarizability of aromatic molecules. In this work, the AMOEBA force field has been developed and applied to 5 regular nucleobases and 12 aromatic molecules. The permanent electrostatic energy is expressed as atomic multipole interactions between atom pairs, and many-body polarization is accounted for by mutually induced atomic dipoles. We have systematically investigated aromatic ring stacking and aromatic-water interactions for nucleobases and aromatic molecules, as well as base–base hydrogen-bonding pair interactions, all at various distances and orientations. van der Waals parameters were determined by comparison to the quantum mechanical interaction energy of these dimers and fine-tuned using condensed phase simulation. By comparing to quantum mechanical calculations, we show that the resulting classical potential is able to accurately describe molecular polarizability, molecular vibrational frequency, and dimer interaction energy of these aromatic systems. Condensed phase properties, including hydration free energy, liquid density, and heat of vaporization, are also in good overall agreement with experimental values. The structures of benzene liquid phase and benzene-water solution were also investigated by simulation and compared with experimental and PDB structure derived statistical results.
Molecular mechanics (MM) modeling has become an important tool in the study of complex biological and chemical systems,1,2 including their structure, interaction, energy landscape, and dynamic behavior, often at time and physical scales difficult for experimental methods to access.3−6 The growth in application of molecular modeling demands more accurate force fields in order to obtain more reliable physical and chemical prediction.7,8 Currently, the commonly used MM force fields, AMBER,9 CHARMM,10 and OPLS,11 among others, all use fixed atomic charges to represent electrostatic potentials. In recent years, there have been increasing efforts to improve the modeling electrostatic potentials in general force fields. This has mainly been attempted via the introduction of explicit electronic polarization, either via Drude oscillator12−15 or induced dipole schemes.16−20 Recently, two comprehensive reviews on polarizable force field have been published.21,22 Polarizable force fields offer a practical and effective way of capturing the nonadditive effect in electrostatics, which can change significantly in different chemical environments, such as in gas vs liquid phase, or when surrounded by different solvents. The fixed charge force field, in which the influence of polarization is included in an average fashion, has limited transferability and is thus parametrized for specific environments or applications.8 In addition, adding additional off-atom-center charge sites23 or higher order atomic multipole moments are also found necessary to accurately describe the anisotropic electrostatic potential around molecules.24,25 Both atomic multipole moments and induced-dipole based polarization are employed in the AMOEBA force field,26 which has been developed and applied extensively to study water,17 various small molecules,27,28 ions,29 and proteins.30
Nucleobase stacking, hydrogen bond pairing,31 and the solvation effect (interaction with water) contribute to the specific nucleic acid structures.32 Besides the well-known Watson–Crick base pairs, hydrogen bonding base pairs using the Hoogsteen edge and the sugar edge are also observed in the DNA/RNA structure database.33 With recent advancements in computing power, high-level QM methods have been applied to the study of base stacking energy and the potential energy surface.34,35 Molecular dynamics simulations of nucleic acids show the limitations in accuracy of the recent force fields,36 with the lack of polarization specifically cited as one of the problems.37,38 For these nucleobases, especially the large adenine and guanine, the delocalization of the π electrons leads to a high molecular polarizability,39 which was found to be related to the thermal stability of the DNA duplex.40 Multipole electrostatic interactions are also important for anisotropic π-systems. With the fixed charge model, it is difficult to capture correct parallel stacking and T-shape stacking structures at the same time.41 Thus, the major goal of this work is to construct a polarizable multipole-based classical potential for heteroaromatic molecules using the AMOEBA framework, which is a necessary step to develop a general nucleic acid force field.
While nucleobase properties and interaction energies in gas phase are readily available from QM calculations, experimental data of nuleobases in condensed phase are lacking. Therefore, 12 aromatic molecules (Figure Figure11), for which experimental data of pure liquid and/or solution are available, have been included to systematically derive the force field for aromatics, especially shared valence and vdW parameters, and study the condensed phase properties for π-systems. Benzene is the aromatic molecule that has been most thoroughly studied by experimental, ab initio, and molecular simulation methods. Previous studies have investigated molecular properties in gas phase, π–π stacking energy,42 pure liquid structure43 and thermodynamic properties,44,45 and structures in water solutions.46−49 Pyrrole, pyridine, and aniline are three types of π-systems with one-heteronitrogen. 2-Aminopyridine, pyrimidine, 2-pyridone, 1-methyl-2-pyridone, 1-methylimidizole, and indole all share similar functional groups with nucleobases. Methylindole and methylpyridine are included because of the existence of experimental condensed phase data. Methylindole, benzene, and 1-methylimidizole are also found in the side chains of amino acids.50 These heteroaromatic molecules are also the core fragments of many drug-like molecules. Thus, this work will not only contribute to a new polarizable force field for nucleobases but will also be useful for modeling and understanding aromatic rings in general chemical and biological recognition.51
For the five regular nucleobases (see structures in Figure S1) and these aromatic molecules, the molecular polarization, vibrational frequencies, electrostatic potentials, and conformational energies are investigated first by using QM methods. To accurately model the nonbonded interactions, a set of interacting pairs was designed, including aromatic-water pairs, stacking pairs, and hydration base pairs. We then proceeded to calculate the corresponding QM energies. Then, we present the parametrization and performance of the new AMOEBA force field for bases, including the comparison with all of the QM results, as well as experimental condensed phase properties such as liquid density, heat of vaporization, and hydration free energy. The pure benzene liquid and benzene-water solution structures were also analyzed and compared with previous experimental results.
Gaussian 0952 was used for the ab initio QM calculations unless otherwise specified. Structure optimizations were performed using MP2/cc-pVQZ. Molecular polarization and molecular vibrational frequency were calculated with the wB97xD/aug-cc-pVTZ method. For computation of out-of-plane bending or rotational conformation energies MP2/aug-cc-pVTZ was used. The permanent atomic multipole moments were derived using the same procedure used in parametrization of small organic molecules28 and proteins.30 The atomic multipole moments were first assigned from QM electron density calculated at the MP2/6-311G** level and using Stone’s distributed multipole analysis (GDMA v2.2).53 The “Switch 0” and “Radius H 0.65” options were used with GDMA to access the “original” DMA procedure.54 The DMA charges were kept fixed, while the atomic dipole and quadrupole values were subsequently optimized against the QM electrostatic potentials computed using an MP2/aug-cc-pVTZ basis set.
QCHEM 4.255 was used to compute interaction energy for all dimers. The RIMP2 method, which provides accurate energy values and is computational efficient,56 was used with the dual basis set. For aromatic ring stacking energy calculation, cc-pVTZ and racc-pVTZ were used, and for other configurations and systems, aug-cc-pVTZ and racc-pVTZ were used.
All force field based calculations were performed using TINKER 7.57 The VALENCE program in TINKER was used to calculate molecular frequencies. The TINKER POLARIZE program was used to compute molecular polarizabilities based on atomic polarizability parameters. TINKER program POTENTIAL was used to obtain electrostatic potentials (ESP) around a molecule from Gaussian 09 cube files or based on AMOEBA atomic multipole parameters. POTENTIAL was also used to optimize the electrostatic parameters (permanent atomic dipole and quadrupole moments) against QM electrostatic potential grids (MP2/aug-cc-pVTZ). The AMOEBA force field parameters for water17 from previous studies were used here. The TINKER MINIMIZE program was used to optimize molecular structures using the AMOEBA force field, with the convergence criteria set to 0.0001 kcal/mol/Å. The TINKER ANALYZE program was used to calculate the total potential energy and the energy components, as well as the molecular multipole moments and induced dipole moments.
The nonlinear least-squares optimization method “lsqnonlin” in MATLAB58 was used to fit various parameters against QM values, including out-of-plane bending or rotational conformation energies and interaction energy between two monomers in a pair. The inputs of “lsqnonlin” program are the objective function, initial parameters, and the parameter limits. The objective function calculates the mean square of the difference between MM and QM values of all the structures. For more details, refer to the MATLAB program code in the Supporting Information.
Molecular dynamics (MD) simulations were carried out using TINKER 7.57 The Bussi-Parrinello thermostat59 and RESPA integration60 method were used in all the simulations, with a 2 fs time-step. Spherical energy cutoffs for van der Waals and Ewald direct-sum were 12.0 and 7.0 Å, respectively. Default PME cutoff and grid sizes were used in the reciprocal space.
For each of the aromatic neat liquids, simulations were performed at four temperatures, including the melting temperature, room temperature, and the boiling temperature. For each temperature, a cubic box containing 300 molecules was built using PACKMOL.61 The initial box sizes were set to match the density to the corresponding experimental value. All systems were relaxed via a 1 ns NVT simulation. Subsequently, NPT simulations, utilizing a Berendsen barostat62 and a target pressure of 1 atm, were performed for 2 ns at each of the four temperatures to evaluate the density and heat of vaporization of each liquid. In heat of vaporization calculations at boiling temperature, the gas-phase potential energy was calculated via stochastic simulation63 on a single molecule using the default (91.0 ps–1) friction coefficient and 0.1 fs time step.
The Bennett acceptance ratio (BAR) method was used for hydration free energy calculation, using protocol similar to previous studies.64 The aromatic molecule was solvated in a 40 × 40 × 40 Å3 cubic box containing 2138 water molecules. After system equilibration using NPT simulations at 298 K for 1 ns, the electrostatic interaction between aromatic molecule and water molecules was turned off with a scaling factor λ from 1.0 to 0.0 at a constant interval of 0.1. Here, 1.0 means 100% electrostatic interaction strength between the solute and water, while 0.0 means there is no electrostatic interaction. Subsequently, the soft-core van der Waals (vdW) interactions were scaled down in a series of steps: 1.0, 0.9, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, 0.4, 0.3, 0.2, 0.1, 0.0, which are scaling factors for the vdW interactions between the solute and environment. NVT simulation was carried out at each λ value for 2 ns. In the gas-phase recharging process (growing the solute electrostatics back to 100 in gas phase), the λ interval was set to 0.1, and the stochastic integrator was used with a 0.1 fs time-step, with a run time of 10 ns.
The AMOEBA potential energy function comprises bonded (valence) and nonbonded interaction terms. Bond stretching, angle bending, out-of-plane bending, and torsional energy terms describe the valence potential in a molecule. van der Waals, permanent, and induced electrostatic interactions are the major factors for molecular interaction (see the functional forms in the SI or ref (26)). All the atoms in the studied molecules are reduced to 20 atomic classes (listed in Table 1) based on their chemical or environmental properties. The vdW and valence parameters in AMOEBA are determined according to atom “classes”, while electrostatic parameters are defined by more refined atom “types.” Different atom types can belong to the same atom class, which means atom classes are a super set of “atom types”. Carbon and nitrogen atoms on the 5-member ring, 6-member ring, and on the bridge of double-ring were put into different classes. Carboxyl carbon atoms were set to an individual class. Single rings were determined to hold both polar and nonpolar carbon types. Polar aromatic carbons on the single ring are defined as those linked with at least one nitrogen or carboxyl carbon on the ring. Carbons on the indole ring (except the bridge carbons) show similar properties with single-ring polar aromatic carbons, thus they were assigned to the same class. Nitrogen atoms on the aromatic ring with lone pair electron (pyridine nitrogen) and without lone pair (pyrrole nitrogen) belong to two different classes.
As with previous force field development for small organic molecules,27 all the parameters, including atom polarizability, multipole moments, van der Waals, out-of-plane bending, torsion, and valence, were determined based on corresponding QM calculations of single molecules or dimers and subsequently fine-tuned by condensed phase simulations. AMOEBA electrostatic interactions comprise permanent and induced components. The permanent multipole moments were derived from Distributed Multipole Analysis (DMA)53 and then optimized to the QM electrostatic potential surface, a procedure described in detail previously.27 In the ESP fitting process for each base, in addition to the gas-phase monomer structure, the structure of the corresponding QM-optimized Watson–Crick base pair (AT, AU, and GC) was also used to ensure the transferability of multipole parameters among different molecular geometries. Atomic polarizabilities were adjusted from previous canonical values28,30 to match the QM-derived molecular polarizability tensor. Three important interacting dimer configurations, hydrogen-bonding pairs (if available), stacking, and aromatic-water, were constructed with different orientations and distances. The QM interaction energy for these dimers, after basis set superposition error correction, was utilized to optimize the van der Waals parameters. Next, a series of distorted conformations were generated with out-of-plane bending or rotation operation to fit the out-of-plane bending and torsion parameters. The bond and angle force constants were then determined by fitting to QM frequencies of all vibrational modes. It was necessary to re-examine the distorted conformational energy after valence parameters were finalized. Condensed phase properties, hydration free energy, liquid density, and heat of vaporization were used for fine-tuning the vdW and multipole parameters. The force field parameters of water for aromatic-water and hydration free energy calculation were from our previous work.17
Electronic polarization is described in our model via Thole’s damped induction approach.65 Besides the molecules listed in Figure Figure11, three other molecules are also included in this study, imidazole, phenol, and naphthalene. Excluding benzene, the molecular polarizability values determined by using the previous parameters28 are systematically lower than the QM value. Thus, we determined that the atomic polarizability parameters of nitrogen, oxygen, and the bridge carbon of the double-ring needed separate types, and their values were optimized to reproduce the three eigenvalues of the QM molecular polarizability tensors. We found that the polarizability of the following two types of nitrogen atoms should be increased by 35% (from 1.037 Å3 to 1.450 Å3): the N within the aromatic ring and with an electron lone pair (e.g., N of pyridine) and the N of the amino group connecting to the aromatic ring. The “H” atom of the amino group has the same polarizability as the “H” in benzene (0.696 Å3). The aromatic carboxyl “O” atom needs 55% greater atomic polarizability (from 1.037 Å3 to 1.450 Å3), and the bridge C needs 28% greater atomic polarizability (from 1.750 Å3 to 2.250 Å3). Table 2 lists the molecular polarizabilities calculated using the old and new AMOEBA parameters, compared with QM values. Using the new parameters, the correlation coefficient between QM and AMOEBA molecular polarizabilities was improved from R2 = 0.975 to R2 = 0.986, and the relative root-mean-square error was improved from 6.0% to 4.7%. The Thole damping coefficient for all atom types is fixed at 0.390 as in previous work.27
The permanent electrostatic energy is expressed as the sum of multipole-multipole interactions between atom pairs, excluding nearby through-bond pairs. For atomic dipole and quadrupole moments, a local frame is defined at each atom based upon the neighboring atoms. Most atoms follow the “Z-then-X” convention. The heaviest bonded atom is selected to define the Z-axis, and another neighboring atom defines the ZX-plane and the positive X-direction. If the linking of an atom has 2-fold symmetry, like the carbon of benzene, the “bisector” convention was used for this atom. For methyl carbons, the “Z only” convention is used, where the X-axis and Y-axis are on the plane perpendicular to the Z-axis.
The electron density matrix was computed at the MP2/6-311G** level, from which the initial atomic multipole moments were obtained using Stone’s distributed multipole analysis (DMA).53 The electrostatic potential on a grid of points around a molecule is constructed from MP2/aug-cc-pVTZ calculations. Grid points of four shells (0.35 Å apart) were generated around the molecule with an offset of 1 Å from the vdW surface. Then the multipole parameters for this molecule were determined by fitting to the electrostatic potential grids of one or two conformers using the TINKER POTENTIAL program. Only the atomic dipole and quadrupole moments were allowed to deviate from the DMA values during the ESP optimization, which stops when a gradient of 0.1 kcal/mol/electron2 is achieved. The root-mean-square of the grid potential between the QM and MM potential was smaller than 0.30 kcal/mol per unit charge (except for guanine, see Table S1).
Water was used as a probe for the nonbonded interactions of a target molecule. Two kinds of base-water (Figure Figure22A) or aromatic-water (Figure S3) dimers were designed. In the first, a water molecule was put in the plane of the target molecule as a hydrogen bond donor, or acceptor, or both. Using HF/6-311G*, the dimer structure was optimized, and the distance between the two monomers was used as the equilibrium distance. The low level QM method was used here in order to broadly scan the potential energy surface at different distances and geometry, instead of finding the absolute minimum. In the second kind of dimer, the water molecule was positioned above or below the ring, forming a π-H hydrogen bond. After calculating the interaction energy at several discrete distances, the distance with minimum energy was assigned as the equilibrium distance.
Depending on the base edges participating in the interaction (Watson–Crick, Hoogsteen, or sugar edge) and the orientation of the glycosidic bonds relative to the hydrogen bonds (cis or trans), base pairs can be roughly divided into 12 classes (see UG pairs in Figure Figure33A).33 The regular Watson–Crick pair (for AU, AT, or GC) is a Watson–Crick edge–Watson–Crick edge pair in cis (WCcis for abbreviation). Hoogsteen pairs are mostly Watson–Crick edge–Hoogsteen edge pair (WCH for abbreviation). All of the 12 classes mentioned here are observed in RNA crystal structures and are important for the formation of different secondary and tertiary motifs of RNA and DNA. Among the pairs listed in Westhof lab’s work,35 all those with at least two hydrogen bonds and without protonation states were selected for studying base–base interaction and are shown in Figure S2. The hydrogen bonding pairs were also optimized using the simple HF/6-311G* method in Gaussian.
If two stacking bases are parallel, the angle of the two plane vector can be 0° (defined as cis) or 180° (defined as trans). Additional configurations occur through translation (rise, slide, and shift) and rotation (twist, roll, and tilt) as in nucleic acid helical stacking.35 We generated some configurations for vdW parameter fitting using the following process. For each of the 15 kinds of pairs (forming by any two of methylated A, C, G, T, and U), the initial conformation of cis or trans stacking was generated using a 3.3 Å rise. Then the dimer structure was optimized using HF/6-311G*. Since the potential surface of stacking is very flat, the optimization was stopped when the root-mean-square of force was smaller than 0.000300 au. This same stacking conformation generation method was used for all 11 aromatic rings (a-k in Figure Figure11). The stacking conformations for nucleobases and aromatic molecules are shown in Figure Figure44A and Figure S5, respectively.
After preparing the above three types of dimers (aromatic-water, ring–ring stacking, and hydrogen bonding) at equilibrium distance of an orientation, 5 or 4 dimer structures with different distances were generated by translation from the equilibrium. Any atom of a molecule within 4.5 Å of the partner molecule is defined as an interface atom. The translation vector is defined as the direction of the two center points of the interface atoms.
Using the QM (RIMP2/cc-pVTZ or RIMP2/aug-cc-pVTZ) interaction energy of these designed pairs as the targets, the vdW parameters (atom diameter D, the potential well depth σ, and reduction factor17,66 for hydrogens I) for the 14 vdW types (Table 1) were optimized. The target vdW potential energy was the total QM interaction energy minus the AMOEBA electrostatic energy. Similar atoms, such as polar N on the 5-member and 6-member rings are assigned the same vdW type. The hydrogens linked with N atoms are also treated as the same type. Using the MATLAB nonlinear optimization program, the mean square of the difference between MM and QM values of all the weighted dimers was minimized. For every dimer orientation, the lowest target vdW energy was found from a set of distances. Then, the weight of a dimer in this series was assigned based on its relative vdW energy from the minimum. The initial values of the vdW parameters were picked from the original values of AMOEBA (amoebapro13.prm30). The three kinds of parameters D, σ, and I should have different parameter “stiffness”. For example, the relative change in σ could be greater than the molecular size D. The parameter deviation from the original value was added to the objective function using a harmonic function, and the stiffness was expressed as the force constant. See the program code for parameter optimization in the Supporting Information.
The potential well depth σ for almost all of the heavy aromatic atoms are ~20% bigger than the initial values (Table 1). Smaller carbon and larger hydrogen radii seem to better reproduce the interaction of polar aromatic C–H. The RMSE for the three types of pairs are 0.69–1.64 kcal/mol (Tables S2–S4 and Tables S6–S7). The correlation coefficient between QM and AMOEBA interaction energies for a large number of molecular dimers is rather satisfactory. In general, for most cases, the AMOEBA dimer interaction energies with separation distance equal or greater than the equilibrium distance matched well with QM values, and those with shorter distance were somewhat more repulsive (Figures Figures22B, B,3B,3B, B,4B,4B, S4, and S6).
With thermal energy, atoms in the bases can bend out of the plane with different strength. This kind of distortion was described by three kinds of valence potential functions in AMOEBA: the out-of-plane bending, torsion, and π-orbital torsion. Only the 2-fold Fourier torsion terms, with 180° as the phase angle, were used for torsions angles formed by 4 linear arranged atoms on the plane of the aromatic ring. The 2-fold and 3-fold Fourier terms were used for the torsion of aromatic ring–methyl bond. A Wilson-Decius-Cross function is used at sp2-hybridized trigonal centers to restrain out-of-plane bending.67 π-orbital torsion describes the interaction of π-orbitals. Each sp2 carbon or nitrogen has a π-orbital which is perpendicular to the plane of the three linked atoms. The π-orbital torsion energy is a function of the torsion angle between the two adjacent π-orbitals.68
To investigate the bending energy of the out-of-plane distortion and the rotation energy of methyl and amino groups, different kinds of structures were generated from the most stable structure of an aromatic molecule (Figure Figure55). Out-of-plane distortion of one or a group of atoms are generated by rotating about a line (axis) defined by two atoms on the ring (see pyridine distortion in Figure Figure55C). The four types of distortion conformations are side atom/group out of plane, one ring atom out of plane, two ring atoms out of plane, and one ring out of plane in a double-ring molecule (the axes of these out-of-plane distortion conformations are shown in different colors in Figures Figures55B and and5E).5E). The rotation angles were 7.5°, 10.6°, 13.0°, 15.0°, 16.8°, and 18.4°. Totally we got 1176 distorted structures across all 17 molecules. Then the QM energy for each generated structure was calculated using MP2/aug-cc-pVTZ. Six types of AMOEBA interactions contribute to the total deformation energy, including torsion, out-of-plane bending, π-orbital torsion, angle bending, the intramolecular vdW, and intramolecular electrostatic potential. All the π-orbital torsions were divided into 7 types, and the force constants (see Table S8) were slightly modified from the original values of AMOEBA.30 The total QM deformation energy minus the latter four terms is the target energy for torsion and out-of-plane bending force constant fitting. The force constants of 120 out-of-plane bending and parameters for 194 torsion angles (see these in the AMOEBA parameter file) were optimized together using MATLAB nonlinear optimization. The correlation coefficient square of the distorted conformational energy between QM and MM is 0.92 (Figure S7). The distorted conformational energy of QM and AMOEBA for pyridine (PD) and 9-methylguanine (Gm) are shown in Figures Figures55A and and5D,5D, respectively.
The equilibrium bond lengths and bond angles were assigned as the average QM value in all molecules. The force constant parameters for bond stretching and angle bending were optimized, starting from typical values from prior AMOEBA force fields, to best match between the AMOEBA and QM vibrational frequencies. The frequencies were most strongly related to the force constants of hydrogen-containing bonds and angles. These parameters were optimized first. Then, these parameters were fixed, while the others were optimized. The bonds and angles were divided to several common types, with each type sharing the same force constant. A couple of iterations were needed to fully optimize all valence parameters. The average of unsigned error in vibrational frequencies for all these 16 molecules is 31.9 cm–1, about 1% error (see the comparison between AMOEBA and QM results in Figures Figures66 and S8).
The QM structures of 37 nucleobase pairs were optimized using the AMOEBA force field. The all-atom root-mean-square deviation (RMSD) between the QM structure and the AMOEBA structure is less than 0.3 Å, and the average of RMSD is only 0.11 Å (Table S9). The hydrogen bonding geometries of AU, GC Watson–Crick pairs, and AT Hoogsteen pairs reproduce the X-ray69 or neutron results70 (see Figure Figure77). The AMOEBA energy minima of these nucleobase pairs are also highly correlated with the QM values, and the correlation coefficient square is 0.972 (Table S9).
To fine-tune and validate the new AMOEBA force field parameters for nucleic acid bases and aromatic molecules, pure liquids and hydration have been simulated. The density of liquid at 4 temperatures, the heat of vaporization of liquid at the boiling, and the hydration free energy were calculated and compared to the experimental values. The experimental data of liquid density of 9 molecules and heat of vaporization of 8 liquids were obtained from the NIST/TRC Web Thermo Tables (WTT) database.71 Experimental hydration free energies are collected from the FreeSolv database72 (6 molecules) and other references.73−75 Vdw parameters were refined to better fit the experimental liquid density and heat of vaporization. For example, a size combination 3.74:3.08 Å of polar carbon and the linking hydrogen was better than the QM-fitted 3.70:3.14 Å for most related liquid densities. A smaller lone-pair nitrogen (from 3.69 to 3.64) also gave a better density for pyridine. For electrostatic parameters, only the point charge values of the carboxyl (C=O) with a methyl group were modified. The oxygen atomic charge becomes 5% more negative, and the positive charge on the carbon increased by the same amount. This change noticeably improves the liquid phase results for 1-methyl-2-pyridone (NM) in comparison with experiment (Table S11). With the final parameters, at room temperature, the errors of calculated densities are all within 2% of the experimental value (Figure Figure88 and Table S10). The differences of the calculated and experimental heat of vaporization for these 8 molecules were all within 1.20 kcal/mol, and the RMSE and correlation coefficient were 0.831 kcal/mol and R2 = 0.884, respectively (Table 3 and Figure Figure99A). The AMEOBA-calculated hydration free energies of the 10 molecules were also in good agreement with experimental measurements (Table 4 and Figure Figure99B).73 The RMSE between the calculated and experimental values was 0.907 kcal/mol, and the correlation coefficient was R2 = 0.950. The differences were all within 1.20 kcal/mol except methylindole (ML).
Structural features of pure liquid benzene and dilute benzene-water solution were obtained from 10 ns MD simulations and compared with experimental data or statistical results. Figure S9A shows the calculated ring-center—ring-center radial distribution functions in benzene liquid. Simulated benzene liquid shows a nearest neighbor coordination shell from approximately 4.0 to 7.5 Å, with a maxima at 5.65 Å. In the nearest neighbor shell, there are approximately 12 molecules. The orientational liquid structure was investigated using the radial distribution function of the angle between the normal to aromatic planes. Figure Figure1010A shows the calculated 2D angular-radial distribution function from benzene liquid simulation. For parallel stacking, the main maximum is at 5.6 Å, with a shoulder at 4.0 Å. The perpendicular geometry shows a higher peak at about 5.7 Å. Figure S9B shows that molecular orientations of the first shell of benzene are nearly isotropic, and when the molecular separation is less than 5.0 Å, the parallel configuration is favored; and when it is between 5.0 and 6.0 Å, the perpendicular configuration is favored. All these results are in accordance with the high–resolution neutron diffraction data43 and are as good as the best recent popular force fields (in ref (76), the benzene liquid RDF and angular RDF performances of 8 force fields were reported).76 The distribution of water molecules around aromatic rings in solution was investigated from the simulation trajectories. A 2D angular-radial distribution function was also plotted (Figure Figure1010B). The distance is defined as from the center of benzene to the oxygen of water, and the angle was defined as the angle between the benzene normal and the line of benzene-center—oxygen. The position at the normal line for water at 3.1–3.6 Å has the largest density. and the peak for the position on the plane was at about 5.0 Å. These features match well with the statistical results from protein structure database (PDB) and Cambridge Structural Database (CSD).47
In this work, we constructed a classical polarizable multipole based force field for common nucleic acid bases and their analogs. QM methods were applied to provide target interaction energies for base-water, base pairing and stacking at various distances and orientations. Although the MP2 method overestimates the energy of stacking with the aug-cc-pVTZ basis set,77,78 the MP2/cc-pVTZ energy of the ten stacked nucleobase dimers agreed well (RMSE = 0.65 kcal/mol) with the reference energy79 using a complete basis set and CCSD(T) correction, namely the CBS(T) (Table S5). Energy computed using the new AMOEBA force field results in slightly better RMSE (RMSE = 0.48 kcal/mol) with the CBS(T) energy of these ten dimers. Sherrill’s base–base stacking database35 was also used to validate the new force field, in which the QM energy was calculated using SCS(MI)-MP2/cc-pVTZ. The correlation coefficient (R2) between AMOEBA and the SCS(MI)-MP2 energy for the three kinds of conformation in the database, rise-twist, slide-shift, and roll-tilt, were 0.932, 0.964, and 0.903, respectively (Figure S11). GU and AC stacking pairs were studied in detail (Figures Figures1111 and S10, Table S12). For some cases, the RIMP2 and SCS(MI)-MP2 stacking energy did not agree, with RIMP2 predicting lower energy by up to 1 kcal/mol (using the same basis set, cc-pVTZ). Our model fits well with RIMP2 energies in all of the translation and rotation conformations. In some cases, AMBER greatly underestimated the stacking energy when compared to experiment, especially for the GU stacking pair.
The interaction energies between nucleobases and three metal ions, Na+, K+, and Mg++ were also examined. Eight metal ion binding positions around four bases (A, C, G, and U) were selected: four ion positions around carboxyl groups and four ions positions around nitrogen groups with lone pairs (Figure S12). We found that the QM interaction energy of the bases with Na+ and K+ at the four ion positions near carboxyl groups are well reproduced using AMOEBA, while the interaction energy of Mg++ near carboxyl groups was too weak. The AMOEBA interaction energy for all the three ion types at the four ion positions around nitrogen groups was too strong. So special pairwise vdW parameters (Table S13) were used for metal ions-N and magnesium-O, and the results are also shown in Figure S12.
We investigated in detail the out-of-plane bending of aromatic rings. Multiple conformations with different bending axes and bending angles were designed, and the conformational energies were computed using QM. The new AMOEBA force field can reproduce the QM conformational energy with a 0.51 kcal/mol RMSE for all the distorted structures investigated here.
The AMOEBA polarization energies of dimers at equilibrium distance are highly correlated with the SAPT0 induction energy (Table S14). Water is a notably strong polar solvent; the induced dipoles on the aromatic molecules in water solution are much larger than those in pure liquid (Figure Figure1212 and Table S15). Aromatic molecules adapt to different environments by changing the magnitude of their dipole moments, due to the high molecular polarizabilities of π-electron. Many drug molecules possess aromatic fragments, which play important roles in maintaining both desirable partition coefficients and binding affinity to the target biomolecules.80 The large polarization energy is likely important for such purposes. In DNA/RNA, the balance among base-water, base-pairing, and stacking interactions is the main driving force for forming biological structures such as duplexes and hairpins. In this study, we have successfully developed a new AMOEBA force field for nucleic acid bases and related aromatic analogs, based on comprehensive studies of molecular structures, energetics and liquid phase thermodynamic properties, and by systematic comparison with both QM calculations and experimental measurements.
The authors are grateful for support by the National Institutes of Health (R01GM106137 and R01GM114237 to J.W.P. and P.R.), the National Science Foundation (CHE 1152823 to J.W.P.), and the Robert A. Welch Foundation (F-1691 to P.R.). TACC and XSEDE (TG-MCB100057) provided high-performance computing resources.
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.
Funding was provided by the following: NIH R01GM106137 and R01GM114237; NSF CHE 1152823; and the Welch Foundation F-1691.
The authors declare no competing financial interest.