|Home | About | Journals | Submit | Contact Us | Français|
We have developed a version of our surface generalized Born (SGB) model that employs a Gaussian surface, as opposed to the van der Waals surface used previously. The Gaussian surface is smooth and its properties are analytically differentiable with respect to the positions of atoms. A significant advantage of a solvent model based on this analytically differentiable surface is the availability of analytical gradients of the surface and solvation forces. An efficient and robust algorithm is designed to construct and triangulate the Gaussian surface for large biomolecules with arbitrary shapes, and to compute the various terms required for energy gradients. The Gaussian surface is shown to better mimic the boundary between the solute and solvent by properly addressing solvent accessibility, as is demonstrated by comparisons with standard Poisson–Boltzmann calculations for proteins of different sizes. These results also demonstrate that surface definition is a dominant contribution to differences between GB and PB calculations, especially if the system is large. Application of the new surface to prediction of long loop regions is presented, and significant improvement in the energetics is seen compared with results obtained using the van der Waals surface, even in the absence of optimized empirical correction terms that were used in the latter calculations.
Solvation is a fundamental phenomenon in biomolecular systems. It plays a significant role in energetics, dynamics, and structures of biomolecules, such as proteins, in solution. The development of accurate and efficient solvation models is thus crucial to understanding the chemical and physical properties that underlie many biomolecular processes. Explicit solvent models treat solvent molecules at an atomic level of detail. The high computational cost associated with explicit solvent simulations has been a bottleneck in their application to complex computational modeling tasks such as protein–ligand docking and protein folding. The inclusion of a large number of explicit solvent molecules significantly increases the computational expense, especially due to the extensive equilibration and averaging required to converge the solvation free energy.
Driven by these computational exigencies, a great deal of effort has been put into the development of implicit solvation models over the past 2 decades. Poisson–Boltzmann (PB) methods, which solve the PB equation to calculate solvation free energies, have been established as a standard approach. The PB equation can be derived using well-defined approximations, is tractable for large systems, and is amenable to parameterization against experimental data. However, the computational expense involved in solving the PB equation, whether by means of finite-difference,1–4 finite-element,5,6 or boundary element methods,7–10 is significantly larger than that of calculating the energy of the solute with a molecular mechanics force field, and it is also nontrivial to compute gradients of the solvation free energy for large systems.
Simpler implicit solvation models such as effective-dielectric models or solvent-exposed surface area models do not provide a robust description of electrostatics, and hence, have great difficulty in achieving the same accuracy as the PB methods.11 Among a variety of semianalytical approximations to the PB methods, the generalized Born (GB) model proposed by Still and coworkers12 appears quite promising because of its simplicity and ability to at least semiquantitatively reproduce PB results. In the GB model, the pair-interaction energy is described by an analytical function of atomic positions, which makes it easier to compute atom-based effective solvation forces. The major computational expense arises from calculating the self-interaction energy, or the effective Born radius, for each atom. A recent study shows that with “perfect” effective Born radii, defined as those obtained by solving the PB equation with only one atom charged at a time, the GB model gives a very good approximation to the underlying Poisson Equation (PE) theory for a variety of biomacromolecular structures and conformations.13 A variety of approaches to calculate the self-interaction energies have been developed by various laboratories such as the overlapping spheres approach by the groups of Cramer, Truhlar, Case, and Brooks,14–19 the asymptotic approach by the Still group,20 the analytical continuum electrostatics (ACE) model of the Karplus group,21 and the surface GB (SGB) model developed in the Friesner group.22
There are two main approximations associated with the calculation of the effective Born radii. The first approximation is the Coulomb field approximation (CFA). In the CFA, the contribution to the electric field from the induced polarization charge distribution on the dielectric boundary is neglected and the electric field is approximated as only the Coulombic field generated by internal atomic charges. The resultant electrostatic solvation energy has an inverse fourth-power distance dependence coming from the classical monopole-induced dipole interactions. This treatment is rigorously correct for a source charge lying at the center of a sphere (i.e., a single atom). However, the deviation can be very large when the shape of the macromolecule is far from spherical and the source charge approaches the surface. Parameter optimization can reduce the errors in the CFA, depending on the size of the system. Our group has introduced several empirical corrections terms to this, which have considerably improved the agreement between the GB model and the PB model.22 Recently, the Brooks group has developed an empirical correction term that involves the square root of a fifth-order distance dependence.18
The other approximation lies in the description of the boundary that delimits solute and solvent. One of the simplest approaches, which has dominated in work on GB models, is to represent each atom by a hard sphere with chosen radius. The whole biomolecule is hence a superposition of all atomic spheres. This surface is termed the van der Waals (VDW) surface. This simple surface is easy to implement and display graphically, but it overestimates the volume of space near molecules that water can access. As the size of the solute increases, many narrow invaginations or even cavities form, which are treated as having a high dielectric constant. However, in many cases such regions are physically inaccessible to solvent molecules because they are too narrow or enclosed.
To address this problem, the solvent-accessible surface was introduced by Lee and Richards.23 It is defined as the surface traced by the locus of a probe sphere rolling over the VDW surface. The radius of the probe sphere is chosen to approximate that of a water molecule. The obtained surface is displaced outward from the VDW surface by a distance equal to the radius of the probe sphere. Later, the definition of the molecular surface was presented by Richards.24 This surface is defined to be the smooth surface traced by the inward facing part of the probe sphere rolling over the VDW surface, and consists of two parts: the contact surface, the part of the VDW surface that is accessible to the probe sphere, and the reentrant surface, the part of the inward-facing surface of the probe sphere when it is simultaneously in contact with more than one atom. A variety of implementations of the molecular surface have been reported.25–28 Probably the most important of these algorithms is the efficient analytical implementation given by Connolly.28–30
Each of these definitions represents component atoms as hard spheres, which leads to either the unavailability of analytical electrostatic forces or discontinuities of analytical electrostatic solvation forces at the intersections of the atomic spheres.
An alternative description of the shape of molecules is the isocontour of electron density of molecules, which can be approximated using Gaussian functions. The Gaussian technology has been adopted as a simple solution to calculate the electron density in quantum chemistry.31 In this technology, the total electron density at a point in space is calculated as the sum of each atomic contribution, which is a three-dimensional Gaussian function.
Both the Connolly surface and a Gaussian surface of the type discussed above can be designed to exclude water molecules from regions where they obviously are unable to penetrate. The two surfaces will differ in their details, particularly in complex regions involving multiple points of contact for a probe water molecule. In such regions, it is far from clear which surface provides a more accurate description of the physical system, and in any case, both methods require extensive parameterization to properly describe first shell interactions of water molecules with polar or charged groups. However, the Gaussian surface does possess a significant practical advantage; it is smooth, that is, continuous and analytically differentiable. This property enables us to calculate the electrostatic solvation force analytically. Furthermore, a Gaussian function’s derivative is itself a Gaussian, which facilitates efficient computation of the electrostatic solvation force.
There have been some attempts to adopt the idea of the Gaussian surface, that is, isocontour of overlapping Gaussian functions, in molecular modeling applications. Walker et al.32 employed Gaussian-type functions to calculate electron density and describe molecular shape. Gabdoulline and Wade33 used Gaussian functions to calculate isocontour points to analyze molecular interaction properties by visually inspecting shape complementarities of the isocontours. Both of these studies generated only a discretized set of surface points, which is insufficient for carrying out volume or surface integration for calculating GB self-interaction energies. Friedrichs et al.34 constructed a molecular surface, but then used a Gaussian function to assign dielectric constants to grid points in their PB method. Duncan and Olson35 used Connolly’s triangulated molecular surface as a starting point and gradually moved these surface points towards the contour value along the direction of the gradient of the density until all surface points were within a predefined error tolerance of the chosen contour value. In principle, it would be possible to apply their approach to our goal of calculating GB self-energies, but we describe here a more efficient and robust surface generation method.
Another gaussian non-isocontour description of molecular shape has been used to calculate efficiently the volume or surface area of molecules, using analytical expressions.36 Recently, this Gaussian volume exclusion approach has been applied to generating a smooth dielectric function for PB methods37 and the development of a shape-based docking scoring function.38 It has also been used by Schaefer and Karplus21 to carry out the volume integral in the calculation of self-interaction energies in their GB method. However, these approaches cannot be used to carry out surface integrals because of the lack of local surface properties. Furthermore, the results were compared with the results from the VDW surface and excellent agreements were achieved. However, the VDW surface is not a good benchmark to choose because it is well known for its weakness in addressing solvent-accessibility, as was discussed above.
In this article, we explore how the surface definition impacts the calculated electrostatic solvation energies. In particular, we present an efficient and robust method to directly construct the Gaussian surface and develop a new-generation SGB model based on this surface. The electrostatic solvation force is subsequently derived and successfully applied to protein structure minimization. We choose the SGB approach to calculate the self-interaction energy of each atom in the GB model. A noteworthy point is that we do not include the correction terms from our previous SGB model22 in the new SGB model. The correction terms are specific to the underlying model, and will need to be reparameterized; we will pursue this direction in future publications. However, in the absence of any empirical terms of this type, the Gaussian SGB model appears to perform well in the initial tests we report here.
The article is organized as follows. In the next section we summarize the theory involved in the PB method and the SGB model. Then we describe the algorithm to construct the Gaussian surface and deduce the formulae to calculate the gradient of electrostatic solvation free energy with respect to atomic coordinates, that is, the electrostatic solvation force (details are presented in the Appendix). Then we present the new-generation Gaussian surface-based SGB model, comparing with the PB and prior SGB model results, for a number of proteins. After validating the new method by applying it to loop prediction cases where the prior SGB model has been successful, we then apply it to a number of long loop prediction problems for which the prior SGB model exhibited clear energy errors, using the gradient optimization capabilities to generate optimized structures. We then discuss the implications of the results, and conclude by summarizing our findings and presenting future directions.
The PB equation treats the aqueous solvent as a dielectric continuum. One solves eq. (1) below to find the resulting electrostatic potential, (), at point from the charge density, ρ(), and the dielectric constant, ε(), of the system:
A PB model is specified by the charge distribution (arising from the molecular mechanics potential function of the solute) and the functional form of the dielectric of the system. Typically, this is defined as a dielectric surface; all points in the system falling inside the surface (i.e., in the interior of the protein) are defined to have dielectric of unity (or whatever the internal dielectric is specified as—this itself is a subject of considerable controversy), whereas points falling outside of the surface are defined to have dielectric 80 (given that we are modeling aqueous solution).
To solve the PB equation, we use the DelPhi program of Honig and coworkers.2,39 – 41 DelPhi uses a finite difference approach to solving the PB equation, with a number of specialized numerical techniques that ensure both accuracy and high efficiency. We have tested the grid resolution in Delphi for the present problem and obtained converged results using 0.5 Å/grid (that is, finer grids minimally change the results presented below). For maximum consistency, we have input the OPLS-AA charges and van der Waals radii into DelPhi.
The SGB model is an approximation to the boundary element formulation of the PB equation, in which the polarization effects throughout the entire volume of the system can be exactly reproduced by an appropriate distribution of induced polarization charge at the surface of solute molecules. A derivation of this approximation has been presented previously,22 and was shown to be equivalent to the volume formulation of the GB method by Still and coworkers,12 assuming equivalent dielectric surfaces.
The total electrostatic solvation free energy can be divided into two terms—a self-term and a pair term:
The formulas to calculate the self-term and the pair term are given as
where qk is the charge on atom k, k is its coordinate, ε is the dielectric constant, is the coordinate of a point on the dielectric boundary, () is the normal at the point, αk is the Born alpha radius for atom k, rkl is the distance between a pair of atom k and l,
We define the dielectric boundary of the molecule using a Gaussian density function, constructed as a superposition of atomic functions. First, the biomolecular system is mapped onto a cubic grid. Although the grid resolution is specified as a parameter in the new SGB model, we have chosen for the calculations presented in this article the resolution of 0.5 Å/grid, which is more than enough to obtain converged results (that is, finer grids minimally change the results presented below). The Gaussian density D at each grid point is defined by summing the atomic Gaussian contributions:
where the sum is over all atoms i of the solute, i is the coordinate of atom i, is the coordinate of the grid point, Ri is the radius of atom i, and a is a parameter defining the smoothness of the surface. The radii we use are taken from the Lennard–Jones parameters of the OPLS-AA force field.42 We have experimented with the optimal value of a, which controls the “smoothness” of the surface, by comparing the shape and curvature of the Gaussian surface of a three-atom system (Fig. 3) and another 56-atom system (results not shown) with the molecular surface and chosen the value of 2.5. This represents a compromise between a surface that is too smooth to accurately define the shape (value of a too small), and a surface that is too close to a VDW surface, with the associated problem of inappropriately high solvent accessibility, for example, in the macromolecular interior (value of a too large). The choice of the water probe radius used for the molecular surface will affect our choice of a as well. The larger the water probe radius, the smoother the molecular surface, an effect that can be mimicked by decreasing the value of a. Here we have used the standard water probe radius of 1.4 Å. More quantitative tests can be imagined to determine the value of a, such as comparing the surface areas of the Gaussian surface and molecular surface on a representative set of structures. However, it is clear in our much simpler tests that the value of 2.5 for a is nearly optimal, and further refinement, if any is required, will take place in the context of parameter optimization with respect to experimental solvation free energies. As shown in Figure 4, the Gaussian surface of the large HIV-1-TMC125 complex (two chains, 979 residues in total) created using a = 2.5 mimics the molecular surface. In addition, our application of the new solvation model based on this Gaussian surface to protein loop prediction further validates that this choice of a is a reasonable one.
With a = 2.5, the contribution to D from atoms that are located more than twice their radius from the grid point in question is negligible, and neglected in our algorithm. The gradients of D with respect to the x component xi, y component yi, and z component zi of the i are calculated for the later computation of the effective electrostatic solvating force:
Having calculated the density D at the grid points, we can judge whether a grid point is located inside or outside of the surface by comparing its electron density with the chosen contour value for the Gaussian surface. If it is larger than (or equal to) the preset value, the grid point is inside (or on) the surface; if it is smaller than the preset value, the grid point is outside the surface. When one of two adjacent grid points has density larger than the preset value and the other has density smaller than the preset value, the surface must intersect the cube edge connecting these two points since the density is continuous.
The contour cutoff value Dcutoff for the Gaussian surface cannot be chosen arbitrarily once the value of a is chosen. Their relationship can be deduced from the case of a single atom, for which the Gaussian surface should be identical to the hard-sphere surface. Applying eq. (5) to the surface of this single atom, we have
Because a = 2.5, the value of Dcutoff is exp(−2.5), approximately 0.082085.
The position of the intersecting surface point can be approximated by linear interpolation:
where in and Din are the coordinate and the density of the grid point inside the surface, respectively, and out and Dout are the coordinate and the density of the grid point outside the surface. Actually, two coordinate components of in − out are 0 and the third component is l or −l, where l is the size of the grid, which reduces the computational cost by two-thirds. The partial derivative of with respect to xi is
Similarly, we have
There are eight grid points in each cube, and there are two states for each grid point, either outside or inside the surface. Thus, there are in all 28 = 256 ways in which a surface can intersect the cube. It is possible to triangulate the entire Gaussian surface by independently triangulating the intersecting surface component in each cube. However, enumerating a triangulation pattern for each of the 256 cases is tedious. Two different symmetries of the cube reduce the problem from 256 cases to 15 cases, as proposed in the “marching cube” algorithm by Lorensen et al.43 First, the topology of the triangulated surface is unchanged if the states (being inside or outside the Gaussian surface as mentioned above) of all eight vertices in a cube are inverted together. The only thing that changes is the surface normal. This inversion symmetry reduces the number of the cases from 256 to 128. The second symmetry, rotational symmetry, further reduces the problem to 15 basic cases, as shown in Figure 1a. However, it is found by Dürst that these 15 triangulation patterns are not self-consistent, that is, the obtained surface is not always enclosed.44 Figure 2 illustrates an example where an artificial gap can be observed. Heiden et al.45 proposed an additional triangulation pattern for the inverted case of basic case 3, 6, 7, and 4, respectively, to solve this problem. Actually, the additional triangulation pattern for the inverted case of basic case 4 is unnecessary from the point of view of self-consistency. With a slight modification of Heiden and coworkers’ patterns by eliminating the unnecessary inverted case of basic case 4, we have our improved marching cube triangulation patterns, 18 in all, as presented in Figure 1a and b. This algorithm always converges and is very accurate and efficient.
The center j,, area Aj, and unit normal j (pointing outward) of each surface triangle j, which are needed in the later computation of the effective Born radius in the SGB model, are subsequently calculated by
where j is the surface normal before normalization, I, II, and III are the positions of the three vertices of the surface triangle, numbered so that the direction of the unit normal j complies with the right-hand rule. With the knowledge of vector differentiation r = /r,, where = x + y + z and r = ||, the gradients for the center, normal, and surface area of a surface triangle can be easily derived:
Because of its length, this derivation has been placed in an Appendix.
We first investigate the question of whether the Gaussian surface as defined above properly addresses issues of solvent accessibility, from a qualitative point of view. To answer this question, first we study a simple system of three identical atoms, where each pair of atomic spheres is tangent to each other. The atomic radius is set to 2.0 Å, close to that of a heavy atom (C, N, O) in a biomolecular system. The VDW and Gaussian surfaces for this system are presented in Figure 3. In the VDW surface, as shown in Figure 3a, an enclosing cavity forms in the middle of the three atomic spheres. This cavity is assigned the high dielectric constant of the solvent in implicit solvent models. However, it is in practice solvent inaccessible and should be assigned a low dielectric constant. In the molecular surface, as shown in Figure 3b, this cavity no longer exists. The region occupied by the cavity is now treated as part of the solute. Furthermore, the solvent-inaccessible invagination area between each pair of atoms is corrected in the molecular surface by a reentrant surface. Similarly, in the Gaussian surface, the cavity in the middle of the system is eliminated, as shown in Figures 3c and 3d, and the narrow invagination areas between each pair of atoms are also corrected. Actually Figures 3b and 3d show that the Gaussian surface behaves very similarly to the molecular surface. The physical fact that the density at a point is a sum of the contributions from adjacent atoms ensures the correct treatment of solvent accessibility in the Gaussian surface, leading to the formation of this kind of reentrant surface. On the other hand, the Gaussian surface is smooth, as expected, which is vital for its application to the development of implicit solvent models with analytical electrostatic solvation forces.
To examine the same question for a large system, in Figure 4 we compare the molecular surface and Gaussian surface for the complex of HIV-1 and nonnucleoside reverse transcriptase inhibitor (NNRTI) TMC125-R165335 (etravirine), 1SV5 in the protein data bank (PDB),46 which has 979 residues. Clearly, both surfaces show similar encapsulation of the ligand and reveal three key cavities in HIV-1. In addition, while the shape and curvature of the Gaussian surface are similar to the molecular surface, the Gaussian surface appears to be smoother, and tends to remove the artificial sharp boundaries that occur at the intersections of atomic spheres in the molecular surface.
Quantitative comparisons are more difficult to evaluate in a rigorous fashion. Many previous workers have set as an objective for approximate continuum methods such as generalized Born approaches the reproduction of the solvation free energy obtained from accurate numerical solution of the PB equation. However, agreement with PB results is not by any means equivalent to agreement with experimental data. The PB results themselves are strongly dependent upon the definition of the dielectric surface (with regard to both functional form and the specific atomic parameters used to build the surface). It is far from clear, for example, whether the Connolly surface is a better or worse representation of physical reality than the Gaussian surface defined above, yet the numerical values of solvation free energy obtained from each can be quite different. Furthermore, we have recently shown that both PB and GB models deviate substantially from explicit solvent potentials of mean force when a single bridging water separates two charged groups.47 Hence, one cannot simply use detailed comparisons of a GB model with any particular PB calculation as a measure of quantitative accuracy of the predicted solvation free energy of the GB model in question. Direct comparison with experimental data (e.g., predicting the conformations of loops and side chains in protein structures) is ultimately the only important measure of the accuracy and relevance of the model.
However, we can extend the analysis initiated above with regard to eliminating qualitatively incorrect features of the dielectric surface (which are clearly wrong based on straightforward physical arguments), and correlate elimination of these features with closer agreement of PB and GB results. That is, we expect that when we use the Gaussian surface rather than the VDW surface in an SGB calculation, solvation free energy differences between different conformations become closer on average to the PB values as a result.
We denote our new-generation of SGB model based on the Gaussian surface the Gaussian surface-based SGB model (GSGB). Because the error in solvent-accessibility in the VDW surface becomes more serious as the size of the biomolecular system becomes large, we apply the new GSGB model to a series of conformations, generated by a minimization on a native structure, of three systems ranging from a small peptide to a large protein, and compare the GSGB results obtained with the results from molecular surface-based DelPhi and the original VDW surface-based SGB (VSGB) model. The absolute value of the electrostatic solvation energies for the same conformation differs across these three methods, and we do not expect to see the absolute results of VSGB and GSGB match well with DelPhi because we do not incorporate any correction terms to improve the errors introduced by the Coulomb field approximation used to calculate the effective Born radii. The differences are highly dependent upon parameterization, and, as argued above, it is not necessarily clear that the DelPhi values are correct compared with experimental results. Moreover, in most applications, only the relative energies are important. Therefore, we also set to zero in all three approaches the electrostatic solvation energy for the conformation with the lowest solvation energy in DelPhi and present the relative energies.
The absolute and relative results for the small peptide, PDB code 1ct6, which has 124 atoms, or nine residues, are shown in Figures 5 and and6.6. Both VSGB and GSGB agree reasonably well with DelPhi (noting again that below some threshold, quantitative differences are difficult to interpret). This system is so small that most of its atoms are substantially exposed to water. Few cavities form, and few atoms are buried in the interior of the system. In this case, the VDW surface does not overestimate the solvent-accessible surface, and thus the VSGB solvent model does rather well. Apparently the small deviation of VSGB and GSGB from DelPhi comes much more from the Coulomb field approximation in the GB formulation to PB rather than from the surface definition. As a matter of fact, VSGB is somewhat closer to the PB results than GSGB, which is quantified by the smaller RMSD of the VSGB points from DelPhi. However, the difference between the two is quite small.
As we increase the size of the system to that of 1cbh, 499 atoms or 36 residues, the VSGB results begin to deviate erratically from the DelPhi results, while the GSGB results stay reasonably close to the DelPhi results, as shown by the VSGB’s poorer correlation with DelPhi in Figure 7 and its larger RMSD in Figure 8. In this case, there are some cavities formed inside the protein and some atoms are buried in the interior, and the error in the definition of surface begins to surpass the error in the Coulomb field approximation in the GB formulation.
Last, we study the protein 1mo9. This is a large system, which consists of two protein chains with a total of 16,012 atoms or 1046 residues. The results are shown in Figures 9 and and10.10. It is clear that the VSGB results now deviate erratically from the DelPhi results while the GSGB results still agree well. When the system gets this large, the error in the definition of surface becomes dominant while that in the Coulomb field approximation becomes less important. In such a large system, many cavities exist inside the protein. Furthermore, many atoms, including charged atoms, are entirely buried in the interior of the protein. The self-interaction energies for these buried atoms are small, and their effective Born radius can be as large as 10 Å or more. However, in the VDW surface, all these cavities are treated as high dielectric, and many of the buried atoms are still treated as exposed to water to different extents. Hence, in the VSGB model the calculated self-interaction energies of buried atoms tend to be larger than their corresponding PB values and their effective Born radii tend to be systematically smaller than the effective PB values. In fact, the effective Born radii rarely exceed 10 Å in the VSGB model, as shown in Figure 11. On the other hand, the cavities and buried atoms are realistically treated in the Gaussian surface, which leads to good qualitative agreement of the relative energies in the GSGB model with DelPhi. In addition, the plot in Figure 11 indicates that the relationship between the effective Born radii of GSGB and DelPhi is systematically linear (on average; the deviation of individual atoms’ effective Born radii mostly comes from the Coulomb field approximation, which will cancel to a great extent in the calculation of relative energies, especially for large systems) while the relationship between VSGB and DelPhi is roughly linear for small Born radii but then deviates sharply, with a bias towards smaller radii for atoms in the interior of the system (Born radii >7 Å), as we expected. The success of the GSGB model on such a large system also highlights the robustness of our Gaussian surface construction algorithm.
To test the analytical electrostatic solvation forces, and obtain a preliminary assessment of the energetics in the GSGB model, we have integrated the model with the protein local optimization program (PLOP),48–53 a protein structure prediction package developed by the Jacobson and Friesner groups. The sampling and minimization algorithms in PLOP have been described previously, and this description will not be repeated here.50–52 Replacement of forces and energies from the previously used VSGB model with their GSGB equivalents was straightforward, as no changes were made in the underlying algorithms.
We have previously reported the use of PLOP for predicting the conformations of loops with lengths ranging from 4 residues to 12 residues, starting from the native protein structure.50 In general, the performance of PLOP and the VSGB model in these tests has been excellent, representing a clear advance in the state of the art compared to alternative results in the published literature. However, there are on occasion loops that exhibit substantial deviations from the native structure, and some of these outliers can be identified as failures of the energy model (as opposed to errors caused by incomplete sampling, which represent an entirely different class of problems). Such failures become more frequent as the length of the loop increases; if long loop prediction is to be made robust, the sources of these errors must be identified and eliminated.
We have hypothesized that problems associated with the VSGB dielectric surface are responsible in part for the long loop energy errors that we have observed. One can imagine that in long loop prediction, incorrect structures with small pockets or invaginations are created, and that these are incorrectly assigned more solvation free energy than they physically deserve by the VSGB surface. If this is indeed the case, we would expect that when such false positives are presented to the GSGB surface, their energy relative to the native structure should become less favorable. Note that we do not necessarily expect that the current version of the GSGB model will unerringly select the native conformation as best; previous experience indicates that some degree of empirical parameterization to experimental data is needed to obtain highly accurate and robust results, and we have not yet carried out such parameterization for the GSGB model (which has to be started from scratch). However, reduction of the large energy gaps seen in the VSGB false positives is a reasonable expectation.
As a first test suite, we selected two set of “loop decoys”: series of conformations for protein loops of various lengths ranging from native-like to highly nonnative structures. These decoy sets consist of all loop conformations generated during the testing of a previously published loop prediction algorithm that utilized the VSGB solvent model.50 The first decoy set contains 25 four-residue loops, each with around 300 decoys as well as relaxed native structures by minimization and by side-chain optimization followed by minimization. The second decoy set contains 12 12-residue loops, each with around 1100 decoys as well as relaxed native structures. We then compute the VSGB and GSGB energy of each conformation of each loop. In these calculations (and the following loop calculations as well), crystal packing effects are considered, that is, the simulated systems include multiple copies of the conformation described by the corresponding PDB file with the exact number of copies depending on the space group of protein unit cell. As we have reported previously, VSGB on average does an excellent job: the average RMSD of lowest-energy conformations for the four-residue decoy set is 0.27 Å with a standard deviation of 0.33 Å, and the average RMSD 0.60 Å for the 12-residue decoy set with a standard deviation of 0.45 Å. The GSGB does even slightly better. The GSGB average RMSD for the four-residue decoy set is 0.20 Å with standard deviation 0.13 Å, and 0.58 ± 0.39 Å for the 12-residue decoy set. The detailed results are shown in Tables 1 and and2.2. These results are encouraging because the VSGB model required extensive parameterization (adjustable atomic radii and several types of correction terms) to obtain these excellent results, while the GSGB model has not yet been subjected to parameter optimization.
Then we selected seven 13-residue loops (chosen from our test suite of 40 such loops) for which the VSGB solvent model scored nonnative conformations as having significantly lower energy than the native. These failures, which may reflect deficiencies of the energy function, are much more common in longer, more flexible loops. Details of each test case are listed in Table 3. We compute the GSGB energy of the 52 lowest energy conformations generated by VSGB. We also compute these energies for the relaxed native structures. We then minimize each conformation using the GSGB analytical gradient methods described above, which results in a substantial reduction in the GSGB gradient and energy.
The results for the various test cases are summarized in Table 3. The “energy error” in Table 3 is defined by taking the lowest energy native-like structure (choosing whichever of the minimized native structure, or side-chain optimized native structure, is lowest in energy, typically the latter) and comparing its energy with the lowest energy loop generated by PLOP. For the VSGB model, the energy errors are all negative; these relatively large negative values are, in fact, why the specific test cases were selected. We also plot the GSGB and VSGB energy vs. RMSD for the 1cnv test case in Figure 12 for all of the native-like and false positive conformations, to provide some illustrative detailed results.
It is clear that, as was hypothesized above, the GSGB model has substantially reduced the energy gaps between the false positive VSGB conformations and the native or native-like structures. In the cases of 1cnv, 1dpg, 1ojq, a low RMSD loop conformation is actually lowest in energy. In the case of 1mo9, a low RMSD structure is almost lowest in energy, with the energy error being reduced to only −1.91 kcal/mol. In the remaining cases of 1yge and 1ock, the energy gap is quantitatively reduced, opening the possibility that further parameter optimization will, in fact, enable the native structure to be selected as lowest in energy (although it should be noted that for 1yge and 1ock, the gap is still quite large, potentially indicating other significant problems with the model). Note that the present test cases are selected to have an extremely high degree of difficulty with regard to recognition of the native structure (compared to the most loop prediction test cases we have investigated, where VSGB works very well, or randomly generated structural perturbations, which do not seek out and maximize flaws in the energy model, and hence, are often trivial to rank as higher in energy than the native structure). Finally, it is possible that one or more of these cases is failing for reasons unrelated to the solvent model, or even the force field, for example, incorrect assignment of the protonation states of titrable residues on the loop or on neighboring side chains. Further investigation will be required to more rigorously assess the quality of the current model, and to discriminate among the various possible reasons for the remaining energy errors.
All of the aforementioned calculations were carried out on Intel PIII 1.40-GHz processors, and a summary of the computational performance of the new GSGB model compared to the VSGB model is presented in Table 4. In general, a single-point total energy calculation with the GSGB model takes about 1.4 times as long as with VSGB, and a single-point gradient calculation with GSGB takes about 2.1 times as long as the VSGB, which does not compute derivatives of the Born alphas. As the primary purpose of this article is to introduce and validate the new GSGB model, we used a resolution of 0.5 Å/grid, which may be finer than is required for many applications. In practice, a coarser resolution or multilevel resolution can be used to speed up the new model significantly. It turns out that when we used a resolution of 1.0 Å/grid, the GSGB model became as fast as or even faster than the VSGB model in single-point energy calculation and the results are within +2% of the results of 0.5 Å/grid resolution, as shown in Table 5.
It is clear from the results presented above that the definition of the boundary between solvent and solute in GB models has a large impact on their abilities to estimate the electrostatic solvation energy accurately. The Gaussian surface has been shown to qualitatively reduce the problem of spurious cavities and invaginations relative to a VDW surface. A new-generation SGB model based on the Gaussian surface, without any correction terms to the first Coulombic approximation, nor with any force field parameter optimizations, is demonstrated to qualitatively match PB results, even for a system with more than 1000 residues. Our results demonstrate that the surface definition dominates the qualitative differences between GB and PB results when the VDW surface is used for the former when the system gets large. The significance of the remaining differences in relative energies of different conformations requires further investigation, as does the question of how the details of surface definition (e.g., Connolly surface vs. Gaussian surface) affects accuracy compared to physical reality.
The present model does not incorporate any correction terms or parameter optimization designed to improve agreement with experimental data. Once a surface has been defined that qualitatively exhibits the correct behavior with regard to excluding water from unphysical regions, it is our view that other aspects of the model should be calibrated by direct fitting to experiment and/or explicit solvent simulations, as opposed to the use of a particular approximation (e.g., PB solutions using the Connolly surface) as a benchmark. We have recently shown that both GB and PB continuum solvation methods display serious errors in modeling ion pairs separated by a single bridging water; there was no evidence from these calculations that PB results are at all superior to GB results in comparison with potentials of mean force obtained using explicit solvent models.47 One productive direction to follow is optimization of the model to maximize the accuracy of protein side-chain prediction. We previously optimized the VSGB model using this approach, dramatically reducing overprediction of salt bridges via the use of empirical correction terms. However, such correction terms are specific to the underlying continuum model, so a new round of optimization will have to be carried out for the present GSGB model. The combination of correction terms and the use of the Gaussian surface should result in a model that is an improvement over previous efforts.
Significant progress can also be made with regard to computational performance. The efficiency of the GSGB model can be further improved by utilizing variable grid spacings, where a finer grid is used in the neighborhood of charged or polar atoms and a coarser grid is used in the neighborhood of uncharged or nonpolar atoms. Such an approach would speed up both the surface construction and subsequent surface integration to compute effective Born radii. Also, there is no need to compute the GSGB solvation force for every minimization step because the force does not change significantly within such a short period. With careful bookkeeping, we only need to update it once every several minimization steps. This way, the computational cost of minimization can be cut to a fraction of what it is now.
Contract/grant sponsor: the NSF; contract/grant number: MCB-0346399, to M.P.J.
Contract/grant sponsor: NIH; contract/grant numbers: GM071790 and GM56531
Contract/grant sponsor: HHMI Biomedical Research Support Program; contract/grant number: 5300246 to the UCSF School of Medicine
Contract/grant sponsor: NIH; contract/grant number: GM-52018, to R.A.F.
Z.Y. thanks Dr. David Pincus for providing the seven test cases of long loop conformations, Dr. David Rinaldo for helping with figure preparation, and Dr. Yixiang Cao at Schrodinger for helpful calculations.
The electrostatic solvation energy consists of two components: self-interaction energy and pair-interaction energy. Correspondingly, the electrostatic solvation force has two components: one from the gradient of the self-interaction energy, and the other from the gradient of the pair-interaction energy:
The gradients of the total self-interaction energy are the sum of the gradients of the individual self-interaction energy for each atom:
From the formula for the self-interaction energy of each atom, eq. (3), we have
In eq. (A.4), δ is the Delta function. Similarly, we have the formulae for the partial derivatives with regard to the y, z components
Then the gradient of the effective Born radius αk can be derived based on the gradient of the self-interaction energy:
Last, the gradient of the pair-interaction energy is derived based on the gradient of the effective Born radius:
Similarly, we have the gradients with regard to the other two coordinate components: