|Home | About | Journals | Submit | Contact Us | Français|
Structure-based predictions of binding affinities of ligands binding to proteins by coordination bonds with transition metals, covalent bonds, and bonds involving charge re-distributions are hindered by the absence of proper force fields. This shortcoming affects all methods which use force-field-based molecular simulation data on complex formation for affinity predictions. One of the most frequently used methods in this category is the Linear Response (LR) approach of Åquist, correlating binding affinities with van der Waals and electrostatic energies, as extended by Jorgensen’s inclusion of solvent-accessible surface areas. All these terms represent the differences, upon binding, in the ensemble averages of pertinent quantities, obtained from molecular dynamics (MD) or Monte Carlo simulations of the complex and of single components. Here we report a modification of the LR approach by: (1) the replacement of the two energy terms through the single-point QM/MM energy of the time-averaged complex structure from an MD simulation; and (2) a rigorous consideration of multiple modes (mm) of binding. The first extension alleviates the force-field related problems, while the second extension deals with the ligands exhibiting large-scale motions in the course of an MD simulation. The second modification results in the correlation equation that is nonlinear in optimized coefficients, but does not lead to an increase in the number of optimized coefficients. The application of the resulting mm QM/MM LR approach to the inhibition of zinc-dependent gelatinase B (matrix metalloproteinase 9) by 28 hydroxamate ligands indicates a significant improvement of descriptive and predictive abilities.
The continuously growing number of protein structures and protein structural models, resulting from the knowledge of genomes, represents enormous promise for development of drugs and other bioactive compounds. Full utilization of the potential requires an armory of computational methods for prediction of ligand-protein binding affinities. Structure-based prediction methods with various levels of accuracy and speed are available today: fast docking and scoring approaches [1, 2] for reduction of vast chemical libraries to several hundreds of probable binders, the second-pass methods selecting a dozen or so compounds with a high probability of success, and the most sophisticated methods like Free Energy Perturbation  and Thermodynamic Integration , which can e.g. discern the affinity of closely related analogs.
In recent years, most developments have been seen in the second category of methods, which are based on decomposition of the binding free energy. The methods usually use conformational sampling of the ligand-receptor complex and of free reactants by Molecular Dynamics (MD) or Monte Carlo (MC) simulations, although the use of multiple structures corresponding to energy minima  or single optimized structures to speed up the calculations have also been advocated [6, 7]. Other differences appear in the following areas: the treatment of solvent (explicit , continuum [6, 9], or in vacuo [6, 10]), estimation of the electrostatic component of solvation energies (linearized Poisson-Boltzman equation [5, 6, 10–13], the generalized Born model [9, 12, 14], or the pair-wise Coulomb relations in explicit solvent ), and the parameter optimization (used  or not used [5, 13]).
We prefer to use Linear Response (LR) method  and, especially, its extended variant , because they contain 3–5 adjustable coefficients, which, in our opinion, account for the differences between the simulated and experimental systems better than other approaches. The LR decomposition of the free energy of binding, ΔGb, reads:
where K is the association constant, R is the universal gas constant, T is temperature, E denotes the energy (van der Waals energy and electrostatic energy, scaled by separate coefficients α and β [8, 17] or overall QM/MM energy scaled by α ), PSA and NPSA represent polar and nonpolar solvent-accessible surface areas , and γ, δ, and κ are adjustable coefficients. The angle brackets denote the ensemble averages and Δ indicates the difference between the ensemble averages in the bound and free ligand states. The ensemble averages of the energies and the surface areas can be replaced by the respective quantities calculated for the time-averaged structures .
The MD simulations are often trapped in local minima close to the starting conformations. To obtain a better sampling of the conformational space, strategies as replica exchange MD  or local replica exchange MD  were devised. Alternatively, predominant states can be analyzed as in the approach called Mining Minima . Some ligands exhibit extensive movements and need much longer MD simulations to equilibrate than the rest of compounds. To reduce the cost, these ligands can be included in the LR correlations by treating them as assuming several binding modes, as described in our multi-mode (mm) approach .
A natural question emerges, whether the QM/MM energies can also be successfully used in the mm approach. To examine this possibility, the mm QM/MM LR method was applied to a set of 28 diverse hydroxamate inhibitors  of gelatinase B (MMP-9), a member of the family of matrix metalloproteinases (MMPs). MMPs participate in remodeling of extracellular matrix and are implicated in cancer, metastases, arthritis, and other diseases .
To provide the theoretical background for the correlation equation underlying the mm QM/MM LR approach, the description of binding equilibria as well as the replacement of the ensemble averages of van der Waals and electrostatic energies are briefly reviewed. The computational protocol is described.
Reversible 1:1 binding of the i-th ligand Li in m mutually exclusive orientations or conformations (modes) to the receptor site R can be schematically written as
Individual equilibria are characterized by the partial association constants Kij. The ligand is present as a single species in the receptor surroundings. The apparent association constant Ki for this process is, on the concentration basis, defined as
Eq. (3) is in accordance with the Mining Minima approach to calculation of free energies , as well as with other published analyses of formally analogous situations: the rigorous statistical thermodynamic description  and equilibrium treatment [26, 27] of the multi-mode interactions of ligands with proteins, and kinetic analyses of reversible uni-molecular reactions leading to different products  or isomers .
Each partial association constant Kij can be expressed using Eq. (1), with the same values of the adjustable coefficients α, β, γ, and κ. The apparent association constant Ki can then be correlated with the simulation results by a combination of Eq. (1), now with the subscript ij representing the j-th binding mode of the i-th compound), and Eq. (3):
In addition to predicting the binding affinity, Eq. (4) can also select the most dominant bound conformation(s) of the ligand out of the set of hypothetical conformations. Individual exponentials in Eq. (4) represent the partial association constants Kij of binding modes as defined in Eq. (2) and Eq. (3). After the coefficients in Eq. (4) are optimized, the Boltzmann probability of each bound conformation is easily calculated as Kij/ΣKij.
The use of molecular mechanics (MM) for the description of the ligand-receptor interactions requires similarity of chemical bonds in free and bound systems. Once the bonds change upon binding, most commonly when covalent or coordination bonds are formed or even when polarizability causes a different charge distribution for ligand-receptor complex, the use of standard force fields becomes problematic. Two possible solutions to this problem are represented by the development of more sophisticated force fields or by the use of quantum mechanical (QM) methods. For large ligand-receptor systems, combination QM/MM methods are frequently used, with the QM domain including the changing bonds and the MM domain covering the rest of the receptor molecules.
The use of the time-averaged structures in place of the ensemble averages in the LR equations makes the use of QM/MM methods straightforward: the energy of the relaxed time-averaged structures can be used instead of the van der Waals and electrostatic energies in Eq. (1). We have recently shown that this modification of the LR approach with electrostatic and van der Waals energies improves both descriptive and predictive abilities of the correlation .
The computation details were described previously [18, 22]. Briefly, the protocol consisted of: (1) FlexX docking of the ligands to the processed structure of MMP-9 taken from the Protein Data Bank  (file 1GKC ), with the pose selected primarily on the basis of acceptable distances between the catalytic zinc and the hydroxamate atoms; (2) QM/MM minimization of the selected poses, with the QM region including the ligand, tha catalytic zinc, Glu402 and the His triad as shown in Figure 1; (3) 200-ps MD simulation of the complex with the zinc bonds restricted; and (4) calculation of single-point QM/MM energies of the complex in the time-averaged structures from the simulations.
The van der Waals, electrostatic energies, and surface area terms were calculated using the corresponding time-averaged structures of the complex and the free ligand for eight 25-ps intervals of the 200-ps MD simulations. The time-averaged structure for each interval represented a binding mode: 0–25 ps - mode 1, 25–50 ps - mode 2, … 175–200 ps - mode 8. The structures of the compounds and inhibitory activities, both experimental  and calculated by individual equations, are given in supporting information (Table S1).
The QM/MM optimization of the best FlexX poses showed that the hydroxamate groups of all 28 derivatives created, along with the histidine triad of MMP-9, rather similar trigonal bipyramidal coordination spheres around the catalytic zinc  (Figure 1). Glu402 formed the hydrogen bonds with the nonionized hydroxyl group of the hydroxamates. The QM/MM approach, in general, handles the zinc-ligand charge transfer, bond length changes, polarization, and ionization of the ligands upon binding to zinc better than advanced force fields.
The ligand behavior differed more in the 200-ps simulations following 100-ps equilibrations with harmonically restrained zinc-binding groups (Figure 2). Most ligands were well equilibrated (as ligand 27, structure in Table S1 in supplementary information), but some ligands showed large-scale motions (e.g. ligands 22 and especially 7). The multi-mode approach was implemented to account for the instabilities. The LR terms in Eq. (1) and Eq. (4), which were calculated for the minimized and time-averaged structures from the MD data (supplementary information, Tables S2 and S3, respectively) did not exhibit colinearities: the highest mutual correlation coefficient was r2 = 0.193. As the last parameter, the single-point QM/MM energies for the time-averaged structures resulting from the 200-ps MD simulations were calculated. In all studied cases, the used time-averaged structures were close to at least one structure recorded in the simulation.
All statistically significant results of the fits of Eq. (1) and Eq. (4) to the data for the one-mode and multi-mode LR approaches with classical and QM/MM energy parameters are summarized in Table 1. In all cases, the dependent variable was ln (1/Ki) because the reported Ki values  are the inhibition constants, not the association constants. The quality of individual fits can be assessed using the statistical indices in Table 1 and also the plot of residuals vs. the experimental activities for individual equations (Figure 3). The FlexX docking, while providing good starting geometries, did not allow the FlexX scoring function to discriminate between individual binding affinities. For both MM and QM/MM minimizations (Eq. (1), lines 2 and 3 in Table 1), the QM/MM, van der Waals, and electrostatic terms were statistically not significant and the coefficient errors were higher than the optimized coefficient values. Moreover, the coefficients for the QM/MM and van der Waals terms had negative values. The optimized structures, regardless whether obtained by MM or QM/MM minimization, probably did not represent well the bound conformations and the results were discouraging (r2 = 0.508 and 0.514), in contrast to other reports .
Both MD and MD+QM/MM one-mode treatments resulted in the same equation (line 4 in Table 1), because all the energy-related coefficients were insignificant. The coefficient errors for the electrostatics and PSA terms were almost equal to the optimized coefficient values. The sign of the coefficient for the van der Waals term was negative and the error was higher than the coefficient. Inclusion of the statistically insignificant terms led to negligible or moderate improvements in the correlations: for MM minimization, r2=0.516; for QM/MM minimization, r2=0.520; and for the one-mode treatment, r2 = 0.625 (MD) and r2 = 0.630 (MD + QM/MM). Conformational sampling using 200-ps MD simulations did not improve the statistics significantly (r2 = 0.589). We showed previously [18, 32] that these correlations can be amended by a careful selection of the used intervals of MD simulations.
The interval selection is automated in the multi-mode approaches resulting in Eq. (4), which provide a real breakthrough in the performance, explaining 84% of the variance with the MD energy parameters, and 90% with the QM/MM parameter (Table 1, the last two lines). The improvement caused by the inclusion of the QM/MM energies into the multi-mode LR approach is significant and results in the best description of the data. All three terms included in Eq. (4) for the mm QM/MM LR approach exhibited significant contributions to the correlation. The residuals (Figure 3) are the lowest among the used approaches, and seem to be randomly distributed.
The robustness of the regression Eq. (1) and Eq. (4) and their predictive abilities were probed by leave-one-out (LOO) and leave-several-out (LSO) cross-validation procedures. The latter approach, using a random selection of a 6-member test set that was repeated 200 times, provided a thorough evaluation. The predictivity of the mm QM/MM LR approach is better than with other approaches (Table 1, the last line). The RMSE values using LOO (0.817) and LSO (0.827) cross-validations were comparable to the SD value (0.730), which is equivalent to the RMSE value without any compound omission.
The prevalences of individual simulation intervals representing the binding modes for the studied ligands can be calculated as Kij/ΣKij, where the partial association constants Kij, as defined in Eq. (2) and Eq. (3), correspond to individual exponentials in Eq. (4). The prevalences are summarized in Table S4 in the supporting information, and plotted against inhibitory activities in Figure 4. Apparently, the number of binding modes is not related to the activity, contrary to the notion that weak binders have a frustrated energy landscapes leading to multiple binding modes . It should be noted, however, that the least active compound 28 (structure in Table S1 in the supporting information) does not have a clearly dominating mode, and all the modes exhibit about equal (10–14%) contributions to the binding. All but one compounds have the major binding mode representing less than 50% of bound molecules. In 60% of the cases, mode 1 is the dominant binding mode. Major outliers in one-mode treatment (compounds 2, 3, 6, 15, 21, 22, 25, and 28, Table S1 in the supporting information) are predicted accurately by the multi-mode treatment.
The presented mm QM/MM LR approach, combining MD simulations and QM/MM methods, exhibits very good descriptive and predictive abilities for the studied inhibition of MMP-9 by hydroxamate ligands. If further studies confirm the applicability of the approach, it can become a useful addition to the armory of the approaches to the structure-based prediction of binding affinities. The two novel features of the mm QM/MM LR approach become especially handy in the following situations. The multi-mode treatment automates the selection of suitable simulation intervals for compounds undergoing significant large-scale movements. The QM/MM energy provides superior descriptions of the binding energies if the ligand-macromolecule complex is formed by coordination, covalent or polarizable bonds rather than by standard weak interactions.
This work was supported in part by the NIH NCRR grants 1P20RR15566 and 1P20RR16471, as well as by the access to resources of the Computational Chemistry and Biology Network and the Center for High Performance Computing, both at the North Dakota State University.
*Dedicated to Yvonne C. Martin on the occasion of her round birthday and retirement