|Home | About | Journals | Submit | Contact Us | Français|
We used replica exchange molecular dynamics (REMD) simulations to evaluate four different AMBER force fields and three different implicit solvent models. Our aim was to determine if these physics-based models captured the correct secondary structures of two α-helical and two β-peptides: the 14-mer EK helix of Baldwin and co-workers, the C-terminal helix of ribonuclease, the 16-mer C-terminal hairpin of protein G, and the trpzip2 miniprotein. The different models gave different results, but generally we found that AMBER ff96 plus the implicit solvent model of Onufriev, Bashford, and Case gave reasonable structures, and is fairly well-balanced between helix and sheet. We also observed differences in the strength of ion pairing in the solvent models, we but found that the native secondary structures were retained even when salt bridges were prevented in the conformational sampling. Overall, this work indicates that some of these all-atom physics-based force fields may be good starting points for protein folding and protein structure prediction.
All-atom force fields, such as CHARMM,1 AMBER,2,3 and OPLS,4 are widely used for computer simulations of the properties of proteins and small peptides. However, such force fields are not yet commonly used to predict native protein structures, in part because of computational limitations in the conformational sampling of systems with many degrees of freedom, as in proteins, but also in part because of concerns that the force fields themselves might not be good enough. It has been difficult to distinguish whether physics-based protein structure prediction has been hindered by insufficient sampling or by limitations in the force fields.
However, a few tests on protein structure prediction have been possible so far because of mechanistically focused sampling or large dedicated computer resources. There are some notable successes indicating that modern physics-based force fields might be accurate enough to predict the native structures of proteins. Using supercomputer resources, Duan and Kollman performed a milestone microsecond molecular dynamics simulation of the 36-residue villin headpiece in explicit solvent starting from an unfolded conformation.5 Scheraga’s group, starting from a random configuration, folded the 46-residue protein A to within 3.5 Å root-mean-square deviation (rmsd) using Monte Carlo dynamics with an implicit solvation model.6 Another milestone was the folding of villin by the Pande group on Folding@Home, a distributed grid computing system running on the screen savers of volunteer computers worldwide.7
Recently, higher accuracies have also been achieved. High-resolution structures of villin have recently been reached by Pande et al.8 and Duan et al.9, 10 In addition, three groups have folded the 20-residue Trp-cage peptide to ~ 1 Å: Simmerling et al.,11 the IBM Blue Gene group of Pitera and Swope,12 and Duan et al.13 Recently, Lei and Duan folded the albumin binding domain, a 47-residue, three-helix bundle, to 2.0 Å.14 Furthermore, our group has found that AMBER ff96 in combination with a generalized Born/surface area implicit solvation model does well in identifying structures within about 3 Å from native for eight small single-domain globular proteins less than 75 amino acids in length.15 Such studies are encouraging for the application of physics-based potentials to the structure prediction problem.
Studies on shorter model peptides have enabled a more detailed look at differences among and potential deficiencies of force fields. Gnanakaran and Garcia performed 10 ns replica exchange molecular dynamics simulations of short alanine peptides with explicit water and a modified AMBER force field, and found that obtaining correct secondary structures was sensitive to backbone torsional energies.16 Okamoto and coworkers performed 10 ns replica exchange molecular dynamics (REMD) simulations on the C-terminal hairpin of protein G and the helical C peptide of ribonuclease A using six popular force fields with water molecules included explicitly.17,18 They found that AMBER ff94 and ff99 were too helical, GROMOS96 favored extended structures, and AMBER ff96, CHARMM22, and OPLS-AA were more balanced but did not reach a high level of accuracy in either peptide. Sakae and Okamoto also explored a method to optimize partial charge assignments and backbone torsional energies in these force fields using Protein Data Bank (PDB) structures, finding improved performance in simulations of short peptides.19–21
Much work has also focused on force fields used with implicit solvation, such as the generalized Born surface area (GBSA) method, as implicit models tend to be orders of magnitude faster for large protein simulations. Zhou22 compared the results of 3 ns explicit and implicit REMD simulations for the protein G hairpin, considering the AMBER ff94, ff96, ff99, and OPLS-AA force fields in combination with explicit (TIP3P for AMBER, SPC for OPLS) or implicit (GBSA for AMBER, surface generalized Born for OPLS) solvents. That study found that implicit solvation tended to destabilize the native structure, most prominently in the AMBER94 and AMBER99 force fields, which showed strong helical biases. Performing better in implicit solvent was AMBER ff96. More recently, Levy and co-workers examined the G hairpin and C peptide using OPLS-AA and a newer generalized Born model with a nonpolar solvation term;23 they found good agreement between experiment and their 4 ns REMD simulations for both peptides. Both the Zhou and Levy studies noted the overstabilized ion pairing they found in implicit solvent; non-native salt bridges significantly slowed conformational sampling or destabilized the native structure altogether. In contrast to the generalized Born approach, Luo and coworkers used a Poisson–Boltzmann implicit solvent to fold the G hairpin using AMBER ff94, ff03, and several variants of ff99, and found that ff03 was among the best performers.24 Interestingly, their simulations omitted the nonpolar surface area term, which they suggest can lead to non-native ion-pairing interactions.25
These detailed force field tests show that, despite their successes, two key issues remain unresolved in physics-based potentials: balance in backbone conformational preferences and implicit solvation models, particularly in ion-pairing interactions. We believe these are key to improving protein structure prediction with physical models. In this work, our aim is to perform an unbiased test of a family of modern force fields with implicit solvation, in order to assess the state of the art. We ask a very simple question: are these models capable of producing both basic α-helical and β-sheet structural motifs, where appropriate? In the interest of an unbiased performance assessment, here we do not evaluate based on the theoretical validity of the models or the manner in which they were derived.
We test several AMBER force fields (ff96, ff99, ff99SB, and ff03) and implicit solvation models (denoted here as igb1, igb5, igb7) for their abilities to stabilize under extensive sampling the small C-terminal protein G β-hairpin26 and the synthetic EK α-helix.27 For the best-performing force field/solvation pair, we then did additional tests on two other peptides: the helical C peptide of ribonuclease A28 and the synthetic trpzip2 hairpin.29 We believe the present simulations are extensive, with three independent 10 ns REMD trials for each peptide/force field/solvent combination. The goal of these simulations is to assess the stability of the native structures during the sampling procedure, in terms of resistance to unfolding during the finite simulation length. In addition, we perform a simple test of the role of ion pairs in the simulations by modifying the sampling so that they cannot form. We find that strong ion pairs in the implicit solvent models do not contribute significantly to stability, as the native secondary structures are still recovered when ion pairing is prevented.
The latest release of the AMBER package, version 9, contains several all-atom force fields. Each of these comprises a collection of parameters for bonded potential energy terms (stretching, bending, and torsion) and nonbonded interactions (charge and van der Waals). In the present study, we examine four AMBER force fields: ff96, ff99, ff99SB, and ff03. The original ff94 parameter set developed by the Kollman group30 is one of the earlier force fields adopted by many researchers. The atomic charges in ff94 were derived from Hartree–Fock calculations; nonelectrostatic parameters such as torsion and van der Waals interactions were fitted first to small molecule data and then to larger organic compounds. Because ff94 was found to overemphasize helical conformations, ff96 was developed using quantum calculations to improve on peptide backbone ϕ–ψ torsion parameters.31
Kollman and co-workers later attempted a broader systematic refinement of many of the torsional parameters in ff94/ff96.32 They used RESP charges derived from high-level ab initio calculations to parametrize energies in an 82-molecule training set. The subsequent force field is denoted ff99, and is intended for use both with and without polarization effects. A number of groups, however, have since noticed an inaccurate representation of glycine and a strong α-helical emphasis in ff99. Several modifications have been proposed; among them, Hornak et al. performed a careful reparametrization of backbone torsional terms using alanine and glycine tetrapeptides.33 That new version, called ff99SB, was designed to provide better balance between extended and helical conformations. In ff03, Duan et al. refitted the charges in ff99 altogether, using quantum calculations with a continuum dielectric, to better reproduce the effects of solvent.34
Implicit solvation methods are designed to reproduce the solvation free energy as a function of protein composition and conformation, in a term that can be added to the intramolecular Hamiltonian for the protein. Implicit models represent the solvent as a uniform dielectric medium, neglecting the geometry and detailed bonding patterns of the solvent, such as water’s hydrogen bonds. Short- and long-range components of this free energy are treated separately: short-range interactions include van der Waals attractions and the formation of the protein cavity, and are typically modeled as being proportional to the solvent-accessible surface area. This simple treatment has been the standard approach since Eisenberg and McLachlan’s work in 1986,35 although it has been suggested that an additional cavity volume term may be important as well.36
The long-range electrostatic component of the solvation free energy has been the subject of much work in the past two decades. Numerical solution of Poisson’s equation for the protein charge distribution provides a straightforward approach, but is impractical for molecular dynamics simulations of modest lengths. Thus, a number of researchers have focused on the development of fast, approximate solutions to the electrostatic equations that recover exact results in ideal limits. The generalized Born model of Still and co-workers forms the basis for a number of implicit solvation models.37 It provides an approximation to the electrostatic free energy when a charge distribution is created in a cavity of low dielectric (the protein, ε = 1) embedded in a uniform dielectric medium (water, εW = 80):
where the sum is over pairs of atoms i and j, qi is the charge on atom i, and Ri is the effective Born radius of atom i. Equation 1 can also be modified with a Debye–Hückel term to incorporate salt effects, although we do not consider that case in our work. The function f provides an effective interaction distance between charge pairs. The main approximation of Still and co-workers is the following functionality for f:
This particular form yields the correct asymptotic free energies for point charges in both limits rij →∞(isolated ions) and rij → 0 (point dipole), while remaining smoothly differentiable at intermediate distances.
The effective Born radius for each atom, Ri, is crucial to the calculation of the solvation free energy.38 In a broad physical sense, Ri measures the distance from a charge in the protein to the boundary with the solvent. Approximating the electric field in the solvated case with the Coulombic expression (the so-called Coulomb-field approximation39,40) gives, for Ri
Here ρi is the intrinsic radius of atom i and the integral I is performed over the entire protein cavity minus the sphere of radius ρi surrounding atom i. For deeply buried atoms I is large, while for surface atoms it is small and Ri is close to the intrinsic radius, ρi. The computation of I is particularly challenging since the integration domain is highly irregular and the integral must be performed for each atom. Moreover, this computation is an expensive part of MD simulations and must be updated every one or two time steps to avoid numerical instabilities. Importantly, the Born radii appear to be key to the accuracy of implicit solvation: a study by Onufriev, Case, and Bashford in 2002 found that when the “perfect” Poisson–Boltzmann-determined radii were used in GB models, the remaining GB free energies were an excellent approximation to the Poisson–Boltzmann model.38 Implicit solvation models in AMBER differ mainly in the manner in which I is calculated.
In this work, we use three solvation models: igb1, igb5, and igb7, in the AMBER terminology. The solvation model called igb1 comes from Hawkins, Cramer, and Truhlar (HCT), who suggested that the integration domain for I be parsed into the atomic spheres composing the protein itself, so that the integral could be calculated by summing the analytical pairwise contributions to I of an atom with every other solute atom.41 This neglects the contributions to I of interstitial regions between the atomic spheres, and it overcounts regions of atomic overlap. HCT addressed the latter problem by scaling down the intrinsic radii used to calculate I by an element-dependent empirical factor; they determined the scaling factors by fitting predictions to a test set of small organic compounds. HCT parameters suitable for biomolecules were later developed by Tsui and Case,42 and are incorporated with the generalized Born equations into AMBER under the option igb=1.
While the HCT approach works well for smaller molecules, it was found to consistently underestimate the Born radii of buried atoms in macromolecules, due to the neglect of the atomic interstitial regions in I.39 Furthermore, the HCT equations can have singularities in the calculated Born radii when I → ρ−1, which can lead to numerical instabilities. Onufriev, Bashford, and Case (OBC)43 suggested a correction to these problems in which the Born radii are scaled by the extent of their burial, measured by the dimensionless quantity Ψ = iI. Here I is determined by the usual HCT approach and the tilde indicates the scaled intrinsic radii. OBC chose a simple, three-parameter scaling function to modify eq 3:
The parameters α, β, and γ were optimized to reproduce Poisson–Boltzmann model predictions for a myoglobin unfolding trajectory. This approach largely eliminated the systematic errors for large molecules, while retaining good accuracy for smaller ones. Moreover, in a meta-study of implicit solvation models by Feig and co-workers in 2003,44 the OBC model yielded the best agreement with Poisson–Boltzmann calculations among pairwise GB models and was suggested as the best high-speed choice among the 23 methods and parametrizations studied. The OBC model is incorporated in AMBER under the option igb=5.
Mongan et al. recently attempted to improve the treatment of interstitials by including pairwise “neck” regions formed by the solvent-inaccessible space between two atoms fixed in space.45 Their model, called GBn, adds to the HCT value for I an additional summation over such neck regions for atom pairs. In these interstitials, the value of the integral is approximated by a Padé-like expression fitted to numerical solutions for test cases. Then, the Born radii are determined using the OBC methodology (eq 4) with a new set of values for α, β, and γ to accommodate the neck terms they have included. Using REMD simulations, Mongan et al. computed backbone free energies for the alanine decamer and found better agreement with explicit water simulations for GBn than for the older OBC model. The GBn model is incorporated in AMBER as the option igb=7.
We tested the performance of force fields and solvation models on four peptides. The peptides were chosen both because experiments show that they have substantial structure, and because they are short enough that we can do extensive conformational sampling. (1) We studied the 16-mer C-terminal hairpin of protein G, which has a near-native β-hairpin structure, when excised from the wild-type protein.26 NMR measurements estimate the population of the hairpin conformation to be about 40% in solution.26 The hairpin is stabilized by hydrophobic pairing of tyrosine/phenylalanine and tryptophan/valine on one side of the sheet, and contains four potential salt bridges, one of which does not disrupt the native strand pairing. (2) We examined the synthetic EK peptide of Baldwin and co-workers,27 which contains repeats of AEAAKA that tend to form α-helical structure due to pairing of the polar, oppositely charged glutamate and lysine residues along one side of the helix. We studied the 14-mer two-repeat version of this peptide; it has about 40% helicity according to circular dichroism measurements.
In addition, for the best-performing force fields and solvent models, we simulated two additional peptides. (3) We studied the Baldwin C peptide, a 13-residue helix excised from ribonuclease A that achieves partial helix formation around 0 °C and, based on the sensitivity of helicity to pH, appears to be at least partially stabilized by salt-bridge interactions.28 (4) Finally, we examined the 12-mer trpzip2 miniprotein engineered by Cochran, Skelton, and Starovasnik.29 trpzip2 forms a β-hairpin structure due to π-stacking interactions between four tryptophan residues on one side of the sheet. In addition, a Glu5–Lys8 salt bridge appears to stabilize the turn region of this peptide.
We tested four AMBER force fields: ff96, ff99, ff99SB, and ff03. In various combinations, we tested three solvation models: igb1 (HCT with Tsui–Case parameters), igb5 (OBC), and igb7 (GBn). Each of these GBSA models included the default surface area term specified in AMBER. The ionic radii used are the amber6, mbondi2, and bondi sets for igb1, igb5, and igb7, respectively; these follow the suggestions of the AMBER9 manual. We did not test igb1 with ff99 and ff03 because our preliminary comparisons showed that they would not differ much from igb5 in these cases (data not shown). The force fields and solvation models were selected using the tleap and sander programs of the AMBER package. Peptides were capped with ACE and NME blocking groups, for consistency with previous work18,22,23 and to minimize the possibility of salt-bridge traps that might prevent equilibration in the chosen simulation length.22,23
For all calculations, we simulated with replica exchange molecular dynamics (REMD).46 REMD enhances sampling by allowing configurations to heat up and cool down while maintaining proper Boltzmann distributions, in effect letting a conformational search process at higher temperatures inform that at lower ones. Each of our REMD simulations consisted of 20 replicas with temperatures exponentially spaced between 270 and 700 K. We used an implementation similar to that described in ref 47. All trajectories were generated using the sander program in the AMBER 9 package, while replica swaps and information collection was managed by a custom Python wrapper. Each peptide was simulated for a total of 10 ns, and analysis was performed on the last 1 ns using 1000 trajectory snapshots spaced every 1 ps.
We performed three REMD runs on each peptide to test the stability of the presumed native structure. Initial velocities for all three runs were different, and data from the three repeats for each initial structure were averaged to obtain the final results. We initiated the runs from a presumed native structure, taken from the PDB coordinates for the hairpins and the C peptide (1GB1, 1LE1, and 1FS3) and from the ideal helix for the EK peptide.
Root-mean-square-deviation (rmsd) and clustering analyses were performed on the 270 K trajectory from each REMD run. We also investigated results from the 300 K trajectories (not reported here), and found a uniform decrease in secondary structure content but otherwise no qualitative difference in the results. Secondary structures were assigned based on hydrogen bonding patterns and dihedral angles. Backbone rmsd values were computed for each analysis frame, taking reference configurations identical to the initial structures used. The rmsd trajectory tool for the VMD visualization program was used to perform these calculations.48 A Python-based clustering algorithm was also used to identify dominant configurations from the analysis frames. Multiple histogram analysis49 on all trajectories for T < 350 K was used to compute potential of mean force (PMF) curves at 270 K, using the final 2 ns of the 10 ns runs. A bin size of 0.2 Å was used, and the PMF curves were normalized such that when multiplied by −1/kBT, exponentiated, and integrated over their corresponding degree of freedom, the resulting value was unity.
Our assessment of each of the force fields is based on the extent to which they maintain correct basic hairpin and helical tendencies, under extensive conformational sampling. Because experimental measurements on these short peptides are difficult to convert directly into structural ensembles, we do not believe a more detailed analysis–comparing the fractional stabilities, for example–is particularly informative here. Instead, we focus primarily on the degree of balance between helical and β motifs, by examining the dominant structures produced in each simulation and comparing to a single, presumed native structure. Though certainly the peptides experience fluctuations away from these presumed structures in reality, these conformations still serve as a useful milepost for indicating proximity to the correct topology.
Our main results are shown in Figures 1–3 and in Table 1. Figure 1 shows the dominant conformations of the EK peptide after REMD simulation. These conformations come from the most populous cluster after the clustering algorithm is performed on the trajectory; of the three run repeats for each condition, the one whose top cluster contained the most secondary structure is shown. The simulations with ff96, ff99, and ff99SB with igb1 or igb5 perform well, all retaining a near-perfect helix. The strong helical tendency of ff99 has been seen before.45 On the other hand, whereas ff96 was previously found to give poor helices in explicit solvent,17,50 here we find that ff96 works well with these implicit solvent models. The difference may be due to the fact that the implicit models increase the helical propensity of ff96, relative to explicit water. Another difference may be greater conformational sampling in the 10 ns simulations using implicit solvent, which lacks viscosity as compared to previous explicit water simulations.
Other force field/solvent combinations are problematic. The ff03 force field unfolds the helix completely, forming a coil-dominated structure with igb5 and a hairpin with igb7. These results are consistent among all trials, indicating at least minor difficulties in achieving α-helical structures in ff03 with implicit solvent. A second weakness appears to be the igb7 solvent model. The results for igb7 consistently show significant deviations from helix: starting from the native structure, the helix essentially unfolds when igb7 is used with all force fields except ff99SB, where it partially unfolds. The variety of structures produced with different force fields suggests that igb7 does not adequately stabilize helical structures.
Figure 2 shows our tests on the protein G hairpin. Here, successful predictions appear to depend more on the force field than on the solvation model. It is clear that ff99 erroneously tends toward helical conformations, as previous researchers have seen. ff99SB gives slightly better structures than ff99, although the hairpin never fully develops in the former, and some helical tendency is still evident for igb5. However, ff96 performs well for all three solvent models, consistent with findings of Okamoto et al.17 The ff03 parameters also fare well; the particular combination of ff03 and igb5 generates an extremely stable hairpin with near-perfect alignment. Compared to its poor performance on the helical EK peptide, this might indicate that ff03’s dihedral torsions are slightly biased toward β-structure when used with implicit solvent.
In these visual assessments of structure, force field ff96 appears the most consistent and balanced. We find that igb5 performs well with it, and that igb1 is almost as good, but igb7 is problematic. Figure 3a presents a more quantitative comparison using a type of plot that we have found to be informative. This double-error plot shows the rmsd error in predicting the helix vs the rmsd error in predicting the β-hairpin. The rmsd is time-averaged over the last 1 ns of the simulation trajectories. This kind of diagnostic does not give detailed information about the ensemble of structures sampled in each simulation, but we believe it is at least a good indication of the basic extent to which helical and hairpin structures are sampled. Therefore, the plot in Figure 3a gives two simple types of information. First, points closer to the main diagonal, the y = x line, indicate that a force field is well-balanced, which is to say that errors in predicting β-hairpins are roughly equal to errors in predicting α-helical structures. Second, points closer to the origin indicate smaller overall errors of the model relative to the presumed native structures compared to points more distant from the origin. Hence the best force fields and solvent models give points on this graph that are closest to the main diagonal and closest to the origin.
By these criteria, the best overall force field/solvation pairs are ff96 + igb1, ff96 + igb5, and ff03 + igb5. We note that it is difficult to discriminate further, using small differences in average rmsd values (~1 Å), due to both the imperfect nature of the rmsd measurement to a presumed native structure and the run-to-run variations. However, we find that globally the most consistent force field, with multiple solvents, appears to be ff96. Similarly, we find igb5 to be the most consistent solvent when paired with multiple force fields. Therefore, we consider the choice of ff96 + igb5 to be a reasonable overall choice, with rmsd’s centered around 2.9 Å for the helix and 3.9 Å for the hairpin.
The error bars in Figure 3a show the minimum and maximum rmsd values of the three run repeats. Much of this variance is likely due to structural fluctuations around the stable conformation in each case, on the order of ~ 1–2 Å. On the other hand, two runs in particular exhibit fairly pronounced fluctuations: those involving ff96/igb1 and a modified version of ff96/igb5 that is described later in the paper. For those two cases, there is significant run-to-run variation among the three trials, and examining these further, we also found that each individual run could be characterized by significant structural fluctuations as the simulations progressed with time. This kind of behavior is typical of a rough folding landscape.
As a reference comparison, Figure 3b shows the correlation between averaged rmsd (as reported above) and the time-averaged fraction of native backbone hydrogen bonds formed in each peptide, compiled over all simulation test cases in our study. Both the EK peptide and protein G hairpin were reported to remain only partially stable in solution. Though a detailed comparison of structural ensembles with experiment is difficult (since it is not clear whether the reported fractional stabilities correspond to partial unfolding or temporal fluctuations), we can ask what kinds of fluctuations emerge from partially stable structures in the simulations. For the instructive case in which 50% of the native hydrogen bonding is present, Figure 3b suggests rmsd averages of 2.0 Å for the EK helix and 3.0 Å for the G hairpin. Thus, it is not unreasonable to expect that partially stable peptides experience rmsd fluctuations around a native structure that are on the order 2–3 Å, and that the fluctuations might be greater for the hairpin than the helix.
Due to the finite length of these simulations, it is indeed possible that equilibrium convergence is not fully attained. However, we believe that the dramatic differences among the systems studied in the 10 ns REMD runs provides a useful force field diagnostic. To illustrate some of these differences, Figure 4 shows some of the convergence properties of the simulations, showing a well- and poor-performing force field/solvent pair for each of the two peptides. A time progression of the rmsd from presumed native structure shows that the poor-performing pairs quickly depart the native basin, in under 1 ns of REMD simulation time. For these, the populations of the dominant clusters increase with time, indicating that the peptides find increasingly stable alternate conformations, away from the native. In contrast, the best-performing pairs largely remain a short distance from the native basin, although with time it appears that the rmsd fluctuations for these slightly increase. Moreover, in the well-performing simulations, the dominant cluster populations decrease during the run relative to the initial 2 ns. This could indicate that the systems locate slightly more diverse conformational ensembles than the native structure alone. On the time scale of the run, it appears these conformational populations start stabilizing around the 8 ns time point.
Several research groups have previously found that implicit solvent models give non-native salt-bridge-trap conformations.51 We looked for salt-bridge traps in our simulations. The G hairpin contains a single lysine residue that can pair with either of two aspartate or two glutamate residues. One of these four potential salt bridges, Lys7 and Asp10, does not disrupt the native hairpin conformation. Figure 5 compares the potential of mean force (PMF) for this native salt bridge with different solvent models for ff96. These PMF curves were extracted from the last 2 ns of each simulation, and averaged over three runs. In the strictest sense, these curves must be taken as approximate given the finite simulation durations and the limitations of the 2 ns data window, and are best taken as a measure of probabilities during the sampling procedure. However, error bars measured from differences among the independent runs do give an indication as to the variation of these computations, and they show that there are some statistically distinct features in these measurements for different force fields. Note that each force field/solvent pair produces a minimum around 2.8 Å corresponding to the formation of the salt bridge and a shallower minimum around 4.5 Å due to the second charged aspartate oxygen.
An analysis of the PMF curves shows the relative populations of ion pairs in each of the three solvents tested. The main minima in the PMFs in igb1 and igb5 are roughly equal in depth, indicating that the salt bridge is formed roughly the same fraction of time in both solvents. The heights of the barriers between igb1 and igb5 are different, however, which suggests that salt bridges in igb1 less readily dissociate once formed. In the igb7 case, the higher absolute position of the minimum means that salt bridges are less populated overall. Qualitatively, our simulations show a ranking of igb1, igb5, and igb7 from strongest to weakest salt-bridge stabilization in the protein G hairpin, in terms of populations of different salt-bridge states.
It is interesting to consider these results in light of the recent work of Lwin and Luo24 that found that salt-bridge populations are also significantly affected by the choice of force field. Indeed, our results show that the other force fields, beyond ff96, alter the strength of salt-bridge formation; in many of these cases, however, the effect is indirect, through a major altering of the secondary structure preference of the peptide. Still, it is challenging to rigorously separate force field and implicit solvent effects on ion pairing in these simulations. Lwin and Luo also suggest that the nonpolar surface area term contributes significantly to ion pairing.25 Our simulations clearly show an effect of the electrostatic part alone, since each of the three GB models is paired with the same surface area term. A useful future study might examine the effect of turning off the surface area component of the implicit models.
We find in the ff96 + igb5 REMD trajectories that non-native salt bridges are convergence traps, persisting for 100–1000 ps in run time at the lowest temperature replica, but not stable over longer times. In the protein G hairpin, the most frequently seen such decoy configuration entailed a one-residue shift in the strand alignment to accommodate the non-native Lys7–Asp11 pairing. This shifted alignment was more frequent in the ff96 + igb1 case, consistent with the relative salt-bridge stabilization of igb1 and igb5. For the EK peptide, we found non-native salt-bridge pairing occasionally led to a hairpin conformation from which it was difficult to reach the native helical state.
We performed a simple test to establish whether the salt bridges contributed significantly to the stability of the native structures in the force field. We excluded the possibility of salt-bridge formation during the simulations by removing the associated configurations from the sampling space. In our molecular dynamics runs, we introduced a repulsive potential felt only upon close approach between any two ion-pairing atoms, steep enough to prevent salt bridges but not so steep that it introduced numerical instabilities. The form of the potential is shown in Figure 6, chosen because it is an option in the AMBER 9 package. We chose parameters to create a steep repulsion that would mimic the effects of a hard overlap term: the potential is zero at large distances, rises harmonically from 6 to 4 Å with a force constant of 0.5 kcal/mol/Å2, and is continuously linear thereafter. For comparison, salt bridges typically form when the pairwise distance is less than 4 Å. We stress that this modification is not intended to be a reparametrization of the force field, but simply an easy way to exclude those portions of configuration space that correspond to salt-bridge formation. We have also verified that increasing the steepness of the repulsion does not have a statistically significant effect (data not shown).
We used the salt-bridge-free sampling on the best-performing force field, ff96 + igb5. Figure 7 shows the effect of removing the possibility of salt bridges on the PMF calculations for the protein G hairpin. The modification has two effects: as expected, it pushes the potential ion-pairing heavy atoms apart, but interestingly, it also increases the probability that the native contact is formed relative to the non-native salt bridge, as shown by the lower PMF for the former.
In all of the peptides we examined, we found that removing the possibility of salt bridges from the sampling did not significantly destabilize the native secondary structures. Figure 8 shows a comparison of the structures for the EK helix, protein G hairpin, Baldwin’s C peptide, and the trpzip2 mini protein. All four peptides contain potential salt bridges. Even without the repulsion, the ff96 + igb5 force field/solvent model combination yields reasonable results across all four systems. With the salt-bridge sampling, similar cluster structures were obtained, and in some cases, we found that convergence was accelerated. One minor exception is the case of the protein G hairpin, which exhibited a large run-to-run variation in the kinds of structures sampled when the salt bridge was not allowed to form (Figure 3a): this could be an indication that the native basin was broadened by the removal of salt bridges, since it resulted in both structures closer to and farther from the normal ff96/igb5 case. Overall, however, the fact that the conformations of the most populated cluster for each peptide did not change significantly during salt-bridge-free sampling suggests that the very close, strong association between charged ions is not critical for stability in these peptides.
We do caution that this salt-bridge-free sampling is a diagnostic, and not a general fix for ion-pairing overstabilization. In particular, it is possible that proteins with buried salt bridges, not investigated here, could be significantly destabilized by removal of salt-bridge interactions. Here, due to the small size of the systems studied, essentially every residue is exposed to solvent, and hence, all possible ion-pairing interactions are surface interactions. In contrast to this work, Simmerling and co-workers have attempted to address the ion-pairing overstabilization problem by rebalancing the force field, reducing the intrinsic (low-dieletric) radii of hydrogen atoms bound to charged nitrogen atoms.52
We also conducted a brief test of salt-bridge-free sampling on three larger proteins with surface salt-bridge pairs, to ensure it did not lead to unphysical artifacts in larger systems than peptides. We performed REMD simulations of three larger proteins, the 56-residue α/β protein G (PDB code 1GB1), 56-residue all-β src SH3 (1SRL), and the 46-residue all-α protein A (1BDD). Two simulations were performed for each, one with and one without the salt-bridge repulsion, and each run was initialized from the experimental native structure. We found that the rmsd-to-native of each protein after 10 ns of REMD remained the same in the salt-bridge-free case. The backbone rmsd’s for the final dominant cluster configurations in the normal/repulsion-applied cases were 1.2 Å/1.2 Å for protein G, 2.7 Å/2.6 Å for src SH3, and 2.2 Å/2.0 Å for protein A. Although such short REMD simulations for proteins of this size certainly do not reach equilibrium, these results do show that the added repulsion does not destabilize the native structure on this time scale.
We performed extensive REMD simulations of a series of candidate peptides to test four AMBER force fields with three generalized Born solvation models. We found that the combination of the AMBER ff96 force field with the Onufriev, Bashford, and Case solvent (igb5) was the most consistent in capturing the behavior of all four test peptides, with balance between strand and helical conformations. The combinations of ff03 with igb5 and ff96 with the older igb1 model of Hawkins, Cramer, and Truhlar were also consistent. In other cases, we found significant backbone torsional biases in some of the force fields when paired with GB models: ff99 had a strong tendency toward helical conformations, and while ff99SB was better, it still failed to predict the correct secondary structure of one peptide. These biases are likely due to a combination of force field and implicit solvation effects.
We also found that solvent models differed in the strength of ion-pairing interactions. Furthermore, even in the best-performing cases, we found that salt bridges could stabilize non-native contacts in the protein G C-terminal hairpin and the helical EK peptide. We introduced “salt-bridge-free sampling” to assess ion-pairing contributions to stability. We found that without these interactions the native peptide structures largely remained stable. Thus, the ion-pairing interactions in these peptides are not critical to maintaining the stability of the native structure, at least in the finite simulation time used.
Force fields and solvent models have well-known flaws. Is there a cancellation of errors in the combination of ff96 with igb5, whereby β-propensity in the force field is compensated by α-propensity in the implicit solvent? This is indeed a possibility. Since the focus of our study is force fields paired with implicit solvents, we have not performed comparisons with explicit simulations, and thus cannot directly comment on the corresponding performance of each force field in particular explicit water models. In the context of implicit models exclusively, however, there are some consistent features. For example, igb7 almost always destabilized the structures of each force field, while ff99 had a strong helical bias despite the solvation model used. On the other hand, igb5 and ff96 consistently found native or near-native structure when paired with multiple force fields or solvents, respectively. It is also interesting to note that other studies have shown balance in ff96: Okamoto et al. found ff96 to be balanced among the secondary structure types, using a different solvent model,17 and Zhou found that ff96 best reproduces the free energy surface of the protein G hairpin in comparing implicit to explicit solvation.22
In summary, we find that the AMBER ff96 force field and the implicit solvation model of Onufriev, Bashford, and Case best captures the correct balance of helical and β-hairpin tendencies of a few small peptides, of the AMBER force fields and implicit solvation models. We believe it may provide a starting point for predicting native protein structures and other properties of peptides and small globular proteins.
We thank Albert Wu for helpful discussions and for sharing with us his earlier force field tests, and Vincent Voelz and Banu Ozkan for helpful suggestions and ideas. We also thank John Chodera for providing references and helpful commentary. We appreciate the support of NIH Grants GM34993 and GM63592, and of the Air Force Materials Research Lab at Wright-Patterson Air Force Base, USAF-5408-04-SC-0003. This investigation was supported in part by the National Institutes of Health under the Ruth L. Kirschstein National Research Service Award GM08284. Its contents are solely the responsibility of the authors and do not necessarily represent the official view of the NIH.
M. Scott Shell, Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, California 93106-5080.
Ryan Ritterson, Graduate Group in Biophysics, University of California San Francisco, San Francisco, California 94143.
Ken A. Dill, Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California 94143.