|Home | About | Journals | Submit | Contact Us | Français|
The principal challenge of using classical physics to model biomolecular interactions is capturing the nature of short-range interactions that drive biological processes from nucleic acid base stacking to protein-ligand binding. In particular most classical force fields suffer from an error in their electrostatic models that arises from an ability to account for the overlap between charge distributions occurring when molecules get close to each other, known as charge penetration. In this work we present a simple, physically motivated model for including charge penetration in the AMOEBA (Atomic Multipole Optimized Energetics for Biomolecular Applications) force field. With a function derived from the charge distribution of a hydrogen-like atom and a limited number of parameters, our charge penetration model dramatically improves the description of electrostatics at short range. On a database of 101 biomolecular dimers, the charge penetration model brings the error in the electrostatic interaction energy relative to the ab initio SAPT electrostatic interaction energy from 13.4 kcal/mol to 1.3 kcal/mol. The model is shown not only to be robust and transferable for the AMOEBA model, but also physically meaningful as it universally improves the description of the electrostatic potential around a given molecule.
A grand challenge of molecular mechanics (MM) force fields is modeling the physics of molecular interactions with an accuracy and efficiency that allows realistic, tractable simulations of large systems. The goal is not only to correctly capture the physics of molecular interactions, but also to be able to answer important practical questions posed by biology, materials science and a number of other fields. To do this, MM models make classical approximations to the 1st principles quantum mechanics driving the true dynamics of a molecular system. Typically, this is done via a set of classical harmonic potential terms describing the intramolecular interactions of bonded atoms in the system and a separate set of non-bonded terms to describe intermolecular interactions. In particular, the electrostatic nonbonded terms are especially important for accurately modeling both short and long range molecular interactions.1
The AMOEBA force field is unique in its treatment of these important intermolecular electrostatic interactions. Most MM force fields use point charges to approximate the charge distribution around atoms in a system and parameterize these point charges based on thermodynamic measurements. AMOEBA takes a more physically realistic approach. The AMOEBA model approximates the charge distribution around atoms as a point multipole expansion of the charge distribution obtained from ab initio quantum mechanics (QM) calculations.2, 3 Using a multipole expansion derived from ab initio QM calculations provides a much more accurate description of electrostatic interactions at medium-range (~2 to 4 times the vdW radius), and has been shown to yield satisfactory results for simulations of water, proteins, nucleic acids and small molecules.1, 2, 4, 5
The multipole approximation of electrostatics, however, starts to break down at short-range. While the multipole expansion is rigorously correct for interactions of atoms at sufficient distance, it is no longer strictly valid once the electron clouds of interacting atoms start to overlap. This phenomenon is known as charge penetration. Charge penetration is simply the change in the electrostatic interaction between two atoms due to their electron cloud overlap and the associated loss of nuclear screening. It is a simple accounting for the fact that atoms in a system are not points; they represent finite charge distributions. Accurately modeling electrostatics has been a priority with AMOEBA since its inception. The importance of these interactions was a key motivation for the original AMOEBA multipole model. Qualitatively, accounting for charge penetration is the logical next step in improving this model. As depicted in figure 1, the current model covers the accuracy of long- and medium-range electrostatic interactions. What is needed is a description of charge penetration to accurately model short-range interactions.
In addition to being physically relevant, charge penetration has been shown to be an important factor in many intermolecular interactions. A particularly instructive set of examples lies with what are commonly called “pi-pi” stacking interactions.6 The benzene sandwich dimer, as illustrated in figure 2, should classically be considered electrostatically repulsive since like charges are lined up across from one another. High level ab initio quantum mechanical calculations, however, show the counterintuitive result that the benzene sandwich dimer is electrostatically attractive.7 This is almost entirely due to charge penetration. Figure 2 shows that the overlap of electron clouds causes the electrostatic energy of the interaction to become more negative as the two monomers get closer together. This same phenomenon is observed with stacking interactions between nucleobases. Parker and Sherrill have recently shown that without charge penetration, it is difficult, if not impossible to accurately capture the electrostatics of interacting nucleobases.8 These considerations show that if AMOEBA is to be successful in accurately modeling biologically relevant interactions such as nucleic acid folding or ligand binding, we must account for the short-range electrostatics of charge penetration.
A number of studies have suggested functions for incorporating charge penetration into existing molecular mechanics force fields.9–20 The derivation of most of these functions has followed the same basic strategy. The electrostatic description of each atom in the system is split into two parts. The first is the core charge (often, but not necessarily simply the nuclear charge), treated as a point and second a smeared electron cloud charge representing the remaining charge of the atom. This splits what was a single interaction into four interactions, as illustrated in figure 3. The functions listed in table 1 are four methods suggested for how best to handle this four-part interaction between atoms. Tafipolsky and Engels took a more direct approach and calculated a numerical integral between spherical pro-molecule charge densities.17 This is similar in spirit to the approach of the GEM (Gaussian Electrostatic Model) force field, where hermite gaussians are used to reproduce the ab initio electron density.9, 21, 22 While being physically straightforward, these methods currently lack the efficiency needed for simulating large systems. The other three methods use damping functions to approximate how the electrostatic potential of an atom changes in its electron cloud and use those damping functions to approximate the value of the overlap integral for U4.
In a previous proof-of-principle study, we implemented the form of Piquemal and co-workers in the AMOEBA force field.23 The study showed that accounting for charge penetration can start to recover the true nature of short-range electrostatic interactions between molecules. A follow-up study extended the model for use with smooth particle mesh Ewald.24 In the present work we seek to develop a comprehensive model based on the previous work that best captures the physics of electrostatic intermolecular interactions and the aims of the AMOEBA force field. Given the potential improvement our previous work has shown possible in such a model, the question becomes: what features would we like the AMOEBA charge penetration model to have? In the work presented here we aim to implement a charge penetration function that best meets the following criteria:
In section 2, we present the physical derivation of the models that were considered and derive corresponding damping terms for higher-order multipoles. In section 3, the scheme for parameterizing the models is presented. Section 4 lays out results comparing the performance of the models. Section 5 shows validation that the charge penetration model is capturing physical reality. And lastly, section 6 draws our conclusions.
Stone illustrated the phenomenon of charge penetration with a simple example.25 Consider the interaction of a proton with a hydrogen-like atom with nuclear charge Z. From quantum mechanics we know that the wave function of a hydrogen-like atom is
This gives us the electron density of the atom,
This tells us how dense the electron distribution of the atom is as a function of the radial distance (r) from its nucleus. To get the potential this density generates, we must apply Poisson’s equation,
the familiar potential due to the electron density of a hydrogen-like atom. At large distances from the atom, the first term in Eq. 4 dominates the second term due to the second’s exponential decay and we have the classical point charge coulomb approximation of the potential. At closer distances, however, as shown in figure 4, the second term becomes non-negligible. This second term represents the charge penetration.
We can exploit the fact that V(r) converges to −1/r at large distances and rewrite Eq. 4 as
The potential in this form is represented simply as the point charge coulomb potential multiplied by a damping function. This is convenient because the damping function has the following straightforward properties:
To this point there are no approximations made in our derivation. Crucially, however, most atoms in systems of interest for molecular simulation are not strictly hydrogen-like. This means that fdamp for non-hydrogen-like atoms is not exactly given by Eq. 6. The properties and form of Eq. 6 are instructive, however. To capture the physics more generally, we introduce a parameter, α, in place of the 2Z and remove the prefactor in front of the exponential to obtain
This more general construction of fdamp retains all of the relevant damping function properties listed above and allows us to tune the parameter, α, to reproduce ab initio electrostatic energies. This is identical to the damping function proposed separately by both Gordon and co-workers11 and Piquemal and co-workers.10
Using the damping formulation of Eq. 7, we have now effectively changed the potential due to every atom in a given system. The potential at any point in the system is described by,
where the potential due to the nucleus is unchanged, but the potential due to the electrons now accounts for the charge penetration effect. This, however, is not quite enough to get the interaction energy between two atoms. Recall from figure 3 that although the second and third terms of the charge penetration corrected electrostatic interaction energy involve simple point charges interacting with the potential due to smeared charge distributions, the fourth term has two smeared charge distributions interacting with each other. In this unique case, we must derive a second “overlap” damping function to account for this interaction.
For the fourth, overlap term we are attempting to approximate the overlap integral between the two charge distributions,
where VA and VB are the charge penetration corrected potentials due to atoms A and B respectively. Gordon and co-workers approximate this integral using the one-center method given by Coulson26 to yield
where qA and qb are the total electron charges of atoms A and B, for the charge-charge portion of the interaction. Piquemal and co-workers take a two-center approach to approximating the integral,
where, as laid out in our previous work (Ref. 20), a second parameter is introduced to describe the overlap. While the derivations of these formulae are slightly different, mathematically these U4 overlap damping functions constitute the only functional difference between the models of Gordon and co-workers and Piquemal and co-workers. For simplicity’s sake, the approach of Eq. 10a will be referred to as model 1 and Eq. 10b as model 2. They can be implemented, however, in an identical manner. These overlap damping functions allow us to calculate the charge penetration corrected charge-charge electrostatic interaction between any two sites:
The AMOEBA model, however, has more than just charges on every atom. It uses a multipole expansion representing the charge distribution at every site. The energy between two AMOEBA multipole sites, i and j, is given by,
where Mi and Mj represent the multipole moments on atoms i and j respectively, and
is the classical point multipole interaction matrix. We can see in Eq. 13 that the interaction matrix, Tij, for AMOEBA without charge penetration is obtained simply by taking repeated derivatives of the classical coulomb potential, 1/r. To account for charge penetration, not just in charge-charge interactions, but in all multipole interactions up to arbitrary order, we simply insert the charge penetration damped potential in place of the classical potential. This yields the charge penetration corrected multipole interaction matrix,
where fdamp is either 1 (for nuclear-nuclear interactions), the damping function from Eq. 7 (for the second and third terms of the interaction energy), or the overlap damping function from Eqs. 10a or 10b (for the fourth term of the interaction energy). Using the charge penetration corrected multipole interaction matrices, we can express the new AMOEBA multipole interaction energy of any two sites as:
Eq. 15 allows us to account for the effects of charge penetration up to arbitrary order multipole expansion. For AMOEBA, which has multipole interactions up to quadrupole-quadrupole, this means that the charge penetration model can be made fully consistent with the multipole model. See Supplementary Material for explicit damping functions for all AMOEBA multipole interaction components.
The goal of including charge penetration in the AMOEBA model is to more accurately reproduce the energies of electrostatic interactions between molecules at short range. Because both models 1 and 2 contain empirical parameters, we will seek to optimize them by fitting to a database of relevant intermolecular electrostatic energies. In our previous work, the S101 and S101x7 databases where constructed for this purpose.23 The S101 database contains 101 unique pairs of both homodimers and heterodimers of common organic molecules. It contains the widely used S66 database27 along with some additional relevant biomolecular interactions. The S101x7 database is constructed by placing each dimer pair from the S101 database at 0.70, 0.80, 0.90, 0.95, 1.00, 1.05 and 1.10 times their equilibrium intermolecular distance. A schematic representation of all the dimer pairs included in the S101 database is shown in figure 5. In all of the parameterization that follows, the entire S101x7 database was used with the exception of interactions involving ethyne. The omission of ethyne allows direct comparison with the results from our previous work.
To parameterize the charge penetration models against the S101x7 database, accurate intermolecular electrostatic energies are needed for all dimer pairs. In the previous work, Symmetry Adapted Perturbation Theory (SAPT)28 calculations where performed to obtain these energies. SAPT calculations decompose intermolecular energies into physically meaningful components; the intermolecular energy between two monomers is broken down into electrostatic, induction, exchange-repulsion and dispersion energies. For the S101x7 database, SAPT2+ calculations29, 30, estimated at the complete basis set (CBS) limit as described in Ref. 22, were carried out to return the ab initio electrostatic interaction energy of each dimer pair.
The parameters of model 1 and model 2 were optimized by performing a nonlinear least squares fit to minimize the difference between the AMOEBA electrostatic energy (with charge penetration), , and the SAPT electrostatic energy, , for each dimer pair. For models 1 and 2, two methods of parameterizing are proposed. In the first method one parameter, α, is assigned per element. In the second, one α is assigned per charge penetration class. These classes, as listed in table 2, are simply chosen to allow for different descriptions of atoms of the same element but different physiochemical classifications. The choice of classes is based on the knowledge that the electronic structure of an sp2 hybridized carbon, for example, will be generally different than that of an aromatic carbon. While it is certainly true that differences in electron distribution exist even amongst atoms of the same charge penetration class (the electronic structure of every sp2 hybridized carbon is not exactly the same), the guiding principle is to include only the minimal level of atomic classification to allow the model to be easily transferable.
For model 2, the parameter, β, is fixed as a fraction of α,
where the parameter, γ, is taken to be universal to avoid over-fitting. Allowing β to float for every charge penetration class has the potential, of course, to improve the overall fit, but at the cost of losing physical meaningfulness. Recall from Eq. 10b that although the β parameter is specific to the overlap function in model 2, the two electron clouds that are overlapping are supposed to already be described by the parameter α. Allowing both α and β to float in the fit would allow two different parameters to describe essentially the same physics. Instead fitting one universal parameter γ simply describes how β should be generally related to α in approximating the overlap between molecules. It should be noted that the parameterization strategy here for model 2 differs slightly from previous work. It is chosen in this way to best fit the AMOEBA multipole model and provide for a direct comparison with model 1 on the same test set.
The results of fitting model 1 and model 2 are shown in table 2. Three fits were performed for each model. First the S101x7 database of intermolecular electrostatic energies was fit using only charge-charge damping with parameters assigned by element. Next, the same charge-charge damping fit was performed with parameters assigned by class. Then the database was fit using higher-order damping with damping of all AMOEBA multipole interactions (up to and including quadrupole-quadrupole).
In addition to parameterizing models 1 and 2, a third model, due to Wang and Truhlar18–20 has been parameterized as well. This model, developed for application in QM/MM calculations is included as a point of comparison. However, it is not developed any further than charge-charge damping using parameters assigned by element as it has several properties that make it unsuitable for implementation in AMOEBA. First, the model can be unstable with respect to the parameters of interacting atoms. If two closely interacting atoms have parameters that are close, but not identical, the overlap damping functions of the model breaks down. Second, expanding the model to include higher-order damping to make it fully consistent with the AMOEBA multipole model is computationally intractable with this model. The expressions that form the overlap damping functions, as seen in Eqs. 8 and 9 in Ref. 19 are much more complex functions of the radial distance between atoms, r. Taking the successive derivatives necessary for higher-order damping terms would produce expressions too expensive to calculate for our purposes. Third, even if such derivatives were deemed necessary, the model’s framework is incompatible with higher-order damping. The damping functions used in Wang and Truhlar’s model are meant to simulate the outer Slater-type orbitals of atoms. With this being the case, rather than treat all of an atom’s electrons as damped, the model only treats a maximum of 2 as damped. This treatment is acceptable for charge-charge damping since charge is spherically symmetric and one simply treats the remaining electrons as part of the “core”. This is, however, problematic for higher-order damping because there is no such simple partitioning of the electrons that make up an atom’s dipole and quadrupole moment. It would be nonsensical to apply the model’s damping terms meant for two electrons, to an atom’s dipole and quadrupole interactions.
In the following section the fits produced by the parameterization of all three models is presented. The fits of each model to the S101x7 database will be used along with some important validation tests and theoretical arguments to determine which model and which parameterization strategy to implement in AMOEBA.
To understand how charge penetration improves the electrostatic model of AMOEBA, we must understand how the current AMOEBA model without a charge penetration correction performs. Figure 6 shows how AMOEBA’s prediction of intermolecular electrostatic energies compares to the SAPT ab initio electrostatic energy values on the S101x7 database. Figure 6 reveals that using only a multipole expansion to describe the electrostatic interactions between molecules systematically overestimates the electrostatic energy at short range. The pervasive gap illustrated in figure 6 illustrates the need for including charge penetration in the electrostatic model of the AMOEBA force field.
The most naïve method of applying a charge penetration correction is to assign one parameter per element and damp only the charge-charge electrostatic interactions. As a first test of the theory, this strategy was implemented for models 1, 2 and 3. Each model was then parameterized by fitting to the S101x7 database. The overall results of assigning parameters by element and damping only the charge-charge electrostatic interactions are illustrated in the first cluster of columns in figure 7. It is clear that all three models perform much better than the current AMOEBA multipole only model. The RMS error of the multipole-only model for electrostatic energies on the S101x7 database is 13.4 kcal/mol. Models 1, 2 and 3 bring that error down to 2.1 kcal/mol, 2.1 kcal/mol and 4.5 kcal/mol respectively, showing that even a naïve damping strategy starts to capture the missing physics. It is also apparent that models 1 and 2 perform much better, even at this low level of implementation, than model 3. Additionally, note that despite having fewer parameters, model 1 performs nearly identically to model 2 for this implementation. Complete statistics for each of these fits, including a breakdown by intermolecular distance, are available in Supplementary Material.
While assigning parameters by element produces an improvement over the multipole-only AMOEBA model, it ignores some key physiochemical properties of elements in different bonding environments relevant to interpreting the α parameter. The α parameter with units, Å−1, can be understood as the inverse of the physical extent of the electron cloud of an atom. From ab initio electronic structure calculations we know that in general this property can change substantially based on the bonding environment of an atom. For this reason we fit models 1 and 2 with parameters assigned by class to the S101x7 as described in the preceding section. The overall results of assigning parameters by class and still damping only the charge-charge electrostatic interactions are illustrated in the second cluster of columns in figure 7. The first thing to note is the absence of a fit for model 3. Once the parameter set is expanded to include classes, model 3 becomes highly unstable. As noted before this is due to numerical instability when parameters in the model become close. This is practically unavoidable for class-based parameters, so model 3 is excluded from this point forward. More importantly, however, we notice also that splitting out different parameter classes improves the overall fit to the S101x7 database for models 1 and 2. Assigning parameters by class improves the performance on the RMS error. Again despite having fewer parameters, model 1 outperforms model 2 in this case. This improvement is largely due to allowing different classes for the same element. For example, table 2 shows that for model 1 the parameter for hydrogen in the element based fit splits quite significantly when one allows different classes to vary. The element parameter, 4.0 Å−1 splits into parameters of 3.4 Å−1, 3.9 Å−1 and 5.0 Å−1 for non-polar, aromatic and polar hydrogen respectively. This extra flexibility in the parameterization, rooted in basic physiochemical properties improves our overall description of the electrostatics. Again specific statistics for class-based fits can be found in the Supplementary Information.
Splitting out separate chemical classes for parameters improves the performance of our charge-charge damping charge penetration model, but it unfortunately does not meet the criteria of being fully consistent with the AMOEBA multipole electrostatic model. To test the fully integrated model we implemented charge penetration damping for all multipole interaction terms (up to and including quadrupole-quadrupole) for both models 1 and 2. We will refer to this model as “higher-order” damping. The overall results, illustrated in the third and final cluster of columns in figure 7, show the improvement that this model brings. Implementing a fully integrated higher-order damping model with class-based parameters brings the RMS error on the entire S101x7 database for models 1 and 2 down to 1.31 kcal/mol and 1.52 kcal/mol respectively. Full statistical analysis can be found in Supplementary Information. These numbers represent a dramatic improvement over the current AMOEBA multipole-only RMS error of 13.43 kcal/mol. More importantly they also improve on the errors from our charge-charge damping implementations. A significant portion of the improvement is due to improvement in the performance on the closest dimer pairs in the S101x7 database. Among dimers that are separated by 0.70 and 0.80 of their equilibrium distance, model 1 with higher-order damping reduced that error from 2.75 kcal/mol to 2.27 kcal/mol, and model 2 reduced it from 4.36 kcal/mol to 2.64 kcal/mol. Importantly, this improvement does not sacrifice the fit at more accessible distances. For model 1 the RMS error on dimers with intermolecular separations of 0.90 to 1.10 times their equilibrium distance dropped to under 1 kcal/mol compared with an error of over 4 kcal/mol for the current multipole-only model. Lastly, these fits give a slight edge to the simpler model 1 over model 2. Model 1 performs 16% better than model 2 on overall RMS errors in the S101x7 database when higher-order damping is included. The absolute percent error of model 2 on the electrostatic energies of the S101x7 database is 10%, while model 1 gives 7%.
Figure 7 lays out the overall performance of each of the implementations described above. It is clear from this data that model 1 with higher-order damping and parameters assigned by class gives the best fit to the electrostatics of the S101x7 database. The improvement this model gives on each individual dimer pair is shown in figure 8. Figure 8 shows that across the board model 1 with higher-order damping is superior to simple charge-charge damping, and represents a dramatic improvement over the current multipole-only model. This is bourn out in a handful of important and instructive examples. Figure 9 lays out the results for fitting the water dimer, figure 10 shows two important orientations of the benzene dimer and figure 11 shows the model’s performance on phosphate ions. These three examples represent important relevant biomolecular interactions that the current multipole-only model fails to accurately capture. Moreover, all three also show that an integrated higher-order damping model is needed to achieve the highest level of agreement with SAPT electrostatic data. These examples show that not only does the model generally improve the quality of electrostatics across a wide dataset, but it also performs well on individual examples, such as the benzene sandwich dimer, that inspired our investigation of the charge penetration phenomemon.
The fit to the S101x7 database with model 1 higher-order damping is a welcome result. The model dramatically improves the quality of the electrostatic fit for those electrostatic interactions over AMOEBA’s current multipole-only model and it outperforms all of the other relevant damping models proposed. There are, however, some considerations that need to be addressed to validate model 1 with higher-order damping as the best option for capturing the physics of charge penetration. First, we would like to show that in addition to giving the best fit, model 1 is also the most robust option. Second, we need to know to what extent this charge penetration model is independent of the AMOEBA multipole model. And most importantly, we must validate that this model is capturing a real physical phenomenon.
It is important our charge penetration model not only provides a good fit to ab initio electrostatic data, but also that the model is robust. To evaluate robustness we must evaluate the sensitivity of the model to small changes in the parameters. Model 3 does not pass this parameter sensitivity requirement. Figure 12 shows the behavior of the oxygen–sulfur electrostatic interaction in the DMSO–water dimer as the difference between oxygen and sulfur parameters gets smaller. Clearly model 3 breaks down as the two parameters get close to one another. Moreover, the problem is compounded as the intermolecular distance decreases. Since the zeta parameter multiplies the interatomic distance, r, everywhere in the damping function, the problem gets worse as monomers get closer together. Model 2 does not suffer from any such numerical instability, but it is sensitive to the parameter, γ, that determines the overlap damping function. Table 3 shows that if the closest dimers are left out of our fit to the electrostatic data, γ changes from 0.88 to 0.90. Moreover, if we use the γ that comes out of the fit where we leave out the closest points, the RMS error for the full S101x7 database jumps from 1.52 kcal/mol to 1.83 kcal/mol. Model 1 on the other hand does not suffer from any such sensitivity. If we leave out the closest dimer pairs and fit parameters to our model, table 3 shows that those parameters do almost as well as the parameters fit to the full S101x7 database. The RMS error for model 1 in this case goes up by less than 0.1 kcal/mol. By these tests model 1 shows the strength with respect to numerical stability and parameter transferability we expect a robust charge penetration model to have.
In addition to being the most robust option, model 1 also shows good model independence from the AMOEBA multipole model. AMOEBA follows a defined protocol for determining charge, dipole and quadrupole parameters for each monomer2 and we should expect that our model should, for the most part, be independent of that specific protocol. In other words the multipole model and the charge penetration model should not depend on each other. To test this we use the toy example, benzene. When determining the electrostatic parameters for benzene, multiple values for the opposing charges of the carbons and hydrogens will give nearly identical fits to the electrostatic potential on a grid of points around the molecule. Although the AMOEBA multipole protocol fixes those charge values semi-arbitrarily, we wanted to see if choosing otherwise would break our model 1 charge penetration model. Figure 13 demonstrates that model 1 accurately reproduces the electrostatic potential regardless of which potential-fitted charge-dipole-quadrupole model one chooses. This validates an important feature of the model: that it is independent of the specifics of potential fitting protocol for the AMOEBA multipole model.
Lastly, but most importantly, for our model to be valid, we must prove that it is capturing a real physical effect. At the heart of the charge penetration phenomenon is the fact that the electrostatic potential around an atom at short range cannot be reproduced by a simple point multipole approximation without accounting for the extent of the atom’s charge density. To validate that the model is describing this physics we tested to see if our charge penetration model, model 1 with higher-order damping, could accurately reproduce the ab initio electrostatic potential around a molecule at short range. Figure 14 shows that without exception the charge penetration model dramatically improves the electrostatic potential fit around every monomer in the S101 database. This is the validation we are looking for. Not only does our model correct the practical problem of bad intermolecular electrostatic energies at close range, but it does so by accurately capturing the physical reality of molecules’ finite charge distributions.
As stated in the introduction, charge penetration effects are important in a broad range of close-contact biomolecular interactions. One essential example is the stacking interactions of nucleobases in DNA and RNA sequences. Parker and Sherrill recently showed that without an explicit accounting for charge penetration, force fields struggle to accurately reproduce the ab initio electrostatic energies of these interactions.8 For instance in an AC:GT base step, the mean absolute errors (MAE) of the AMBER31, 32 and CHARMM33 force fields relative to the SAPT electrostatic energy were over 20 kcal/mol. Likewise, we find that AMOEBA without charge penetration gives an electrostatic energy MAE over 20 kcal/mol as well. However, when we apply our charge penetration function with parameters fixed to their values from the S101x7 fit, the MAE drops dramatically to nearly 2 kcal/mol. This improvement is not unique to the AC:GT base step. As shown in figure 15, the MAE of our AMOEBA model with charge penetration is significantly lower for every base step combination.
Moreover, this improvement in the electrostatic description of nucleobase stacking holds even for non-equilibrium stacking arrangements. Figure 16 shows that for the six structural parameters that define the stacking interaction, 34 the AMOEBA + charge penetration model does far better than AMBER, CHARMM or the current AMOEBA force field. These data confirm, as asserted by Parker and Sherrill, that including charge penetration is an absolute necessity for a robust nucleic acid force field model. This imperative is highlighted in two standout cases of the TA:TA base step. Figure 17 shows the performance of force field models against SAPT electrostatics versus the nucleobase rise. It is immediately clear that the AMOEBA + charge penetration model put forward here is the only model that accurately reproduces the electrostatic nature of this interaction. The same is seen in figure 18 where we examine the electrostatic energy as a function of the tilt parameter. Again, the model including charge penetration is the only model that agrees with the quantum mechanics. This same improvement persists across all structural parameters of the TA:TA base step. Figures for the other four parameters can be found in the supplementary information. It is worth noting that not only is this an important test case because of its direct relation to biomolecular applications for the force field. It is also important because it shows that the model, parameterized against a particular test set (S101x7) performs well on interactions well outside of that set. These results give us confidence in the transferability of our charge penetration model.
The goal of the AMOEBA force field is to model the physics of biomolecular interactions using approximations that make calculations on large systems tractable. Our work here shows that to accurately capture the physics of short-range intermolecular interactions, a charge penetration term is absolutely necessary. Without accounting for charge penetration, even an advanced point multipole model cannot accurately reproduce electrostatic interactions at short range. These discrepancies in intermolecular interactions crucial to biomolecular systems are large enough that they cannot be ignored. Fortunately, we have also shown that charge penetration can be corrected for with the implementation of a simple set of damping functions. This is not necessarily a new conclusion. Previous work on AMOEBA as well other classical force field models have demonstrated the efficacy of using damping functions to capture charge penetration. We have demonstrated here that the higher-order damping functions we have developed for model 1 represent the best, most integrated method for implementing charge penetration in the AMOEBA force field.
There are some key reasons why using model 1 with higher-order damping makes the most sense for AMOEBA. The first reason is the most obvious. On an extensive test set of relevant molecular dimers, model 1 with higher-order damping produced the most accurate results. We have shown that including higher-order damping provides a substantial increase in model accuracy and model 1 performs well at this purpose. The practical purpose of including charge penetration in the force field is to accurately describe intermolecular interactions and by this direct measure model 1 with higher-order damping does the best.
The model does more than simply give good numbers, however. Model 1 is derived from the fundamental physics of atomic charge distributions. The damping function that describes the electrostatic potential around an atom in this model comes directly from the charge distribution of a hydrogen-like atom. The overlap damping function comes directly from an approximation of the overlap integral between two hydrogen-like charge densities. The model does contain empirical parameters, but those parameters are given physical meaning by the derived functions they sit in.
A natural question is why the similar model 2 with one extra parameter does not give better results than model 1. The simple answer is that it appears the two models are intrinsically aligned with different mulitpole models. AMOEBA takes a two step approach to assigning multipole parameters. First distributed multipole analysis (DMA) is performed to obtain initial charge, dipole and quadruople parameters. Then, those parameters are optimized by fitting to the electrostatic potential on a grid of points around the molecule. Because the overlap function in model 1 is constructed starting from a simple one-electron potential, model 1 seems to align nicely with the electrostatic potential fit method for determining AMOEBA multipoles. In contrast it seems that the two-center integral method used by model 2 might perform better with multipoles that are not potential-fitted. This theory is borne out by the results of figure 19. Figure 19 illustrates that model 2 with its extra free parameter, does perform better on the S101x7 database when simple DMA multipoles are used instead of potential fitted ones. Using the AMOEBA potential fitted multipoles however does better overall and much better when paired with model 1. The origin of this difference between models 1 and 2 in instructive. It shows that despite its relative simplicity, model 1 seems to provide a better intrinsic fit for the AMOEBA force field.
Not only is the model conceptually aligned with the AMOEBA multipole model, but it is fully integrated with it as well. Prior charge penetration models have damped charge-charge interactions or a handful of higher order interactions13, 14, but here we have derived damping functions for multipole interactions up to arbitrary order. This does two important things. First, it improves the overall accuracy of our intermolecular electrostatic energies. And second, it gives us a fully integrated multipole electrostatic–charge penetration model. The charge, dipole, quadrupole moments of a multipole expansion are all functions of the underlying charge density distribution. Thus every interaction of these moments should be damped by the function that describes that charge density. Our higher-order charge penetration model satisfies this requirement and does so in a simple, straightforward way.
Importantly, the charge penetration model doesn’t just fit one set of data. We have demonstrated that it passes multiple validation tests. First, the model proved to be robust. There is no numerical instability and the parameters are not overly sensitive. Second, the model is independent of the multipole model. This means that even if a slightly different set of multipole moments that fit the electrostatic potential are chosen for a given molecule, our charge penetration model will still give the same improvement in the fit. These validation tests indicate not only that our model is viable, but that it is not beholden to the test set or the multipole model. In addition we have shown that our charge penetration model has some measure of predictive power. On the biologically significant test of electrostatics in nucleic acid base stacking, our charge penetration model accurately predicted the electrostatic energies of base stacking over a wide range of non-equilibrium structural parameters. This result displays the promise this model shows in its application to simulations of real biological systems.
Finally, our higher-order charge penetration model captures a real physical effect. The charge penetration phenomenon is a direct result of the fact that atoms have charge distributions representing their electron densities. We have shown that our charge penetration function captures exactly this physics. When we use our model to fit the electrostatic potential on a grid of point surrounding a molecule, the error in the electrostatic fit from the simple point multipole approximation goes down for every tested case. This gives us the highest degree of certainty that we are doing more than just adding in another degree of freedom to our electrostatic function. The damping functions derived for our higher-order damping model accurately describe the electrostatic environment around molecules, and since the effect is necessarily short-range, the computational cost of accounting for charge penetration in this way is minimal. The damping terms can be implemented utilizing a short-range cutoff, or can be computed for every pairwise interaction in the real-space portion of an Ewald summation approach. In either case, the additional cost beyond that of the standard AMOEBA electrostatic model is small. By describing this simple physics in a simple way, our model allows us to more accurately predict intermolecular interactions between biomolecules.
JWP and PR wish to thank the National Institutes of Health NIGMS for support of AMOEBA development via awards R01 GM106137 and R01 GM114237.