|Home | About | Journals | Submit | Contact Us | Français|
To alleviate the problems in the receptor-based design of metalloprotein ligands due to inadequacies in the force-field description of coordination bonds, a four-tier approach was devised. Representative ligand-metalloprotein interaction energies are obtained by subsequent application of (1) docking with metal-binding-guided selection of modes; (2) optimization of the ligand-metalloprotein complex geometry by combined quantum mechanics and molecular mechanics (QM/MM) methods; (3) conformational sampling of the complex with constrained metal bonds by force-field based molecular dynamics (MD); and (4) a single point QM/MM energy calculation for the time-averaged structures. The QM/MM interaction energies are, in a linear combination with the desolvation-characterizing changes in the solvent-accessible surface areas, correlated with experimental data. The approach was applied to structural correlation of published binding free energies of a diverse set of 28 hydroxamate inhibitors to zinc-dependent matrix metalloproteinase 9 (MMP-9). Inclusion of step 3 and step 4 significantly improved both correlation and prediction. The two descriptors explained 90% of variance in inhibition constants of all 28 inhibitors, ranging from 0.08 to 349 nM, with the average unassigned error of 0.318 log units. The structural and energetic information obtained from the time-averaged MD simulation results helped understand the differences in binding modes of related compounds.
Metalloproteins play important roles in physiological processes (hemoglobin, cytochrome oxidase, catalase and superoxide dismutase), the receptor binding of potential drugs (carbonic anhydrases, matrix metalloproteinases, thermolysin, leucine aminopeptidase, phospolipase C, carboxypeptidase A and B, adenosine and cytidine deaminase), and drug metabolism (cytochromes P450 and methane monooxygenase), to give just a few examples.1–3 In drug design, a description of the ligand interactions with transition metals poses a challenge due to the possibility of multi-dentate coordination bonding that is most appropriately treated at the quantum mechanical level.4–11 This study presents an approach to a receptor-based estimation of binding affinities of metalloprotein ligands that is more reliable than standard ensemble-based techniques at the expense of a modest increase in the computing time.
The fastest and usually least precise descriptions of coordination bonds are obtained using molecular mechanical approaches. Several techniques are available with varying levels of sophistication. The nonbonded approach12,13 uses optimized electrostatics and van der Waals terms,14,15 occasionally using dummy cations placed around the metal atom16,17 to enforce the correct coordination geometry. The geometry enforcement is more stringent in the bonded model18–20 that utilizes the bond terms including bond stretching, angle bending, and torsional terms. These approaches require a predefined valence of the coordinating metal, in contrast to the directional force field YETI that is more flexible in selection of appropriate valence.21,22 Further force-field enhancements include addition of polarizable bonds,23,24 directionality based on orbital hybridization,25 and ligand field stabilization energy.26,27 Unfortunately, the more sophisticated force fields are not readily available for a routine use in modeling of ligand-metal interactions.
Combined quantum mechanical and molecular mechanical (QM/MM) methods represent an economical approach to characterization of macromolecular processes that include changes in the covalent bond status.28–31 The atoms directly involved in the chemical steps are typically included in the QM region, whereas the rest of the system is treated at the MM level. Initially developed for gas-phase calculations,32 the QM/MM approach was first applied to enzyme systems by Warshel & Levitt.28 QM/MM methods combine the efficiency of MM allowing for free energy simulations of macromolecules, with the accuracy and a systematic improvement potential provided by QM.33 The methods have been implemented in combination with semi empirical and ab initio molecular orbital theory and with density functional theory (DFT). QM/MM-based molecular dynamics (MD) simulations have been used for modeling of enzymatic reactions,31,34,35 but are computationally intensive and currently impractical for the prediction of binding affinities of a series of metalloprotein ligands. Alternatively, a large system as a whole can be treated at the QM level by fragmenting, linearly scaling approaches;36 however, the QM/MM treatment seems to provide a better focus by applying the most sophisticated methods to the key chemical steps and the fast methods to the less important noncovalent processes.
Several categories of the methods for prediction of binding affinities to the receptors with known structures are available. Free Energy Perturbation (FEP),37 Thermodynamic Integration,38 and similar techniques39 are the most sophisticated tools for free energy calculations. Extensive sampling resulting in extreme demands on computational resources and limitation to close homologs currently preclude their routine use in drug design. Computational costs can be reduced by partitioning of the binding energy into individual contributions and, in further simplification, by replacing the ensembles of structures generated in molecular simulations by single structures.
In ensemble-based partitioning approaches, only two states needs to be considered: complexes and free interaction partners. The approaches in this category can be classified according to the use of adjustable parameters in the final relation between the binding affinity and the calculated free energy contributions. The MM-PBSA40,41 and MM-GBSA42 methods form the parameter-free category while the Linear Response (LR, a.k.a. Linear Interaction Energy) method43–45 and its extended version (ELR)46–48 represent the parametrized category. The contributions to the binding free energy are expressed as the differences Δ between the solvated ligand in the bound and free states in the ensemble averages (denoted by angle brackets) of respective quantities.
In ELR type methods, binding free energy ΔGb is calculated as the linear combination of the differences Δ in the van der Waals energies, electrostatic energies,43–45 and the solvent-accessible surface areas46–48 (SASA) obtained from MD or Monte Carlo simulations:
The adjustable parameters α, β, and γ contain the protein-solvent and solvent-solvent interactions.49–51 The van der Waals parameter α depends on the used force field, as was shown e.g. for thrombin.47,49,52 Its magnitude was analyzed with respect to hydrophobicity of the binding site.49 The coulombic scaling coefficient β varies with the ligand nature and ligand surroundings (protein or water),53–55 although its equality to ½ was assumed initially based on the linear response of the surroundings to electric fields.43–45 Similar method for treatment of hydration of more complex molecules56 required the use H-bond donor and acceptor counts, in addition to the quantities present in Eq. 1. Continuing studies of more diverse ligands may reveal the need for further empirical corrections.
Single-structure-based partitioning approaches, represented by VALIDATE,57 the Free Energy Force Field approach,58,59 COMBINE analysis,60 and a single-structure version of the LR method using continuum electrostatics,61 replace the ensemble averages by a single configuration, usually obtained by a direct geometry optimization of the receptor-ligand complex. Scoring functions are simplified single-structure-based partitioning approaches that are categorized as force field-based methods,62,63 empirical free energy scoring functions,64–66 and knowledge-based scoring functions.67–69 They are mainly used in high-throughput virtual screening, in connection with fast docking procedures.
To overcome the limitations of the aforementioned methods in prediction of binding affinities to metalloproteins due to the difficulties in handling the transition metal atoms, we have combined docking, QM/MM calculations, and force-field-based MD methods into a coherent approach. The approach is tested using inhibitors of matrix metalloproteinases (MMPs),70 which belong to the most-studied metalloenzymes. Development of MMP inhibitors is complicated by structural relatedness of the MMP family,71 in which some members assume normal physiological roles and others are pathological, depending upon given concentration or activity.
Interactions of metalloproteins with ligands are often difficult to describe effectively due to the limited availability of appropriate force fields. To predict binding affinities in these cases, we devised a four-tier procedure consisting of: (1) docking with the selection of poses based upon appropriate metal binding; (2) QM/MM optimization of the best docked geometries; (3) MD simulation with the metal binding group of the ligand confined in the geometry from Step 2; and (4) QM/MM single point interaction energy calculation based on the time-averaged structures from Step 3:
The QM/MM interaction energies are correlated in an ELR-type approach, along with SASA crudely parametrizing desolvation, with experimental affinities. The use of the energies of the time-averaged structures in place of the ensemble averages of energies was previously shown to provide equivalent results in a parametrized partitioning approach to binding affinity correlations72 and in protein pKa prediction.73 The approach was tested using published data70 on inhibition of 28 hydroxamate inhibitors of MMP-9 (Table 1).
Docking of the inhibitors to the MMP-9 structure taken from the Protein Data Bank74 (PDB file 1GKC) was performed using FlexX.64,75 The ranking of poses was based upon the distance between catalytic zinc and both hydroxamate oxygens in the interval 1.5 – 2.5 Å as the primary criterion and the FlexX score as the secondary criterion.76 Docking provided a greater variety of the initial binding modes than superposition of the ligands with the ligand in the PDB file of the complex.77 Since the MD simulations routinely do not sample conformations too different from the starting conformations, this step was instrumental in selecting the binding mode explaining the experimental data for several compounds. To reduce the QM/MM convergence time, for the top complexes of each ligand, the mobile region within 5 Å of the ligand superposition was briefly optimized by the conjugate gradient minimization using OPLS-AA force field.
The charges and inter-atomic distances for all 28 ligands (Table 1) do not differ much; therefore we present them as average values in Figure 1. The maximum standard deviations were 8.1 % for charges and 2.5 % for distances. Charge transfer from hydroxamate group to zinc is significant, albeit very similar for all studied compounds (Figure 1). For the free/bound hydroxamates, the average bond lengths were (in Å): C=0 1.236/1.284, C-N 1.355/1.308, N–O 1.401/1.364, and O–H 0.982/1.590, respectively. In the QM/MM geometry optimization of the complex, the proton from the hydroxamate OH group was transferred to the Glu402 oxygen but remained H-bonded to the parent oxygen. The average C=0, C-N and N-O bond lengths for the bound state obtained here (1.284, 1.308 and 1.364 Å) are closer to deprotonated Zn-coordinated hydroxamate (1.294, 1.317 and 1.378 Å) than to neutral hydroxamate with respective distances of 1.279, 1.337 and 1.416 Å.78 The distance between Glu402 oxygen and oxygen of the reverse hydroxamate (2.70 Å) in MMP-9 and hydroxamate oxygen in MMP-779 and MMP-880,81 (2.6–2.8 Å) complex are in close agreement with the distance found in Step 2 (2.604 Å). These facts suggest that bound hydroxamate structure is, in the presence of proton-accepting Glu402, closer to deprotonated state than to neutral form. Hydroxamates approach the binding site as neutral molecules (pKa ≥ 8.9 for the studied compounds)82 and require ionized Glu402 for full potency.83
Two hydroxamate oxygens may bind to zinc in monodentate or bidentate configurations that result in four- or five-fold coordination, respectively. Among the structures of hydroxamates bound to MMPs in PDB, the bidentate zinc binding prevails, with trigonal bipyramidal (TB) configurations occurring about six times more frequently than the square-based pyramid (SP). After QM/MM optimization, the bidentate TB coordination was observed in the studied hydroxamate structures, as evidenced by the average geometries of all 28 complexes (Figure 1). The bond lengths of the two Zn-O bonds (Zn-O1 and Zn-O2) are 2.029±0.018 Å and 2.089±0.012 Å, respectively, which indicates that the two oxygens coordinate with zinc about equally. The average bond angles O1-Zn-N in His401, His405 and His411 in these complexes are 121.77±14.81°, 135.58±8.95°, and 82.06±1.65°, respectively (ideal values for TB: 120°, 120°, 90°; and for SP: 90°, 180°, 90°). The average bond angles O2-Zn-N in the same order are 101.55±1.89°, 95.07±2.13°, and 155.23±2.98°, respectively (both TB and SP: 90°, 90°, 180°). The average bond angle of O2-Zn-O1 is 79.66±0.61° , approaching the ideal value of 90° for both TB and SP configurations.
The QM/MM approach for the entire binding domain provided a more realistic picture than the studies using reduced systems5,10,78,84,85 that do not consider the protein surroundings of bound ligands. In this case, ionized Glu402 is crucial for a correct description of binding. The QM/MM approach handled the zinc-ligand charge transfer,36 bond length changes, polarization, and ionization25 upon binding to zinc, which would be difficult to describe by advanced force fields.21–27 Albeit very similar in the studied set, the electronic phenomena can be anticipated to play a more important role in a heterogeneous series of ligands.
The QM/MM optimized complexes were subjected to the MD simulation with the constrained zinc-hydroxamate oxygens bond lengths and angles, to obtain conformational sampling for the rest of the complex. To reduce the computational expense, only one MD simulation, for the enzyme-inhibitor complex, was performed.41 The energy terms for the enzyme and inhibitor taken from the MD simulation for the complex were comparable to those obtained from running separate simulations for the enzyme and inhibitor, as checked for several inhibitors. Inspection of the trajectories revealed that the secondary and tertiary structure of the ligand-receptor complex remained stable during the entire 200-ps simulation period (data not shown). The influence of zinc bond constrains on movement of other ligand parts is illustrated in Figure 2. The chiral carbon (Table 1) that is only 2–3 bonds apart from the constrained hydroxamate oxygens is comparatively rigid. The ring nitrogen and phosphorus are separated by 1–2 more bonds from the constrained part and exhibit amplitudes comparable to those of distant binding site parts like the oxygen of Glu402 that is H-bonded to hydroxamate oxygen. The ensemble averages of the van der Waals and electrostatic energies that were calculated for the time-averaged structures obtained after 5 ps simulations are listed in Table 1. Longer simulation times provided similar samplings that did not improve the correlations using Eq. 1 and Eq. 3 and the results are not shown. The ΔSASA terms were lower for the time-averaged structures from Step 3 than for the minimized structures from Step 2 in 16 cases (Table 1). For 12 ligands, however, the ΔSASA terms increased, indicating that MD sampling found better binding modes than minimization. The largest increases were observed for ligands 3, 5, 6, 8, 20, 22, 25, and 26.
For the time-averaged structures resulting from 5-ps MD simulations, single point QM/MM interaction energies were calculated according to Eq. 2 and are summarized in Table 1. The time-averaged structures were used to preserve conformational sampling obtained in Step 3 that should result in more realistic descriptions than using the single optimized structures.61 As can be expected, the QM/MM energy changes are much smaller for the time-averaged structures from Step 3 than for the minimized structures in Step 2. The QM/MM calculations properly treat the coordination bonds between zinc and the ligands. Therefore, the QM/MM interaction energies are expected to provide better energy estimates than the MM-based force field simulations.
For a comparison, the MD results were correlated with biological data using Eq. 1. The influence of the SASA’s polar and nonpolar components, as well as that of the constant term κ were examined. The resulting equations are not listed because they improved neither correlations nor predictions.
The fits of Eq. 1 and Eq. 3, with logKi on the left side, to the inhibition data for the 28 compounds for individual Steps are summarized in Table 2. Prediction of the MMP-9 inhibition constant based upon the linear correlation of the experimental data with the FlexX scores in Step 1 resulted in a poor correlation characterized by r2 = 0.044, with r being the correlation coefficient.
The QM/MM energies of the ligand-enzyme complexes with optimized geometries, obtained in Step 2, resulted in a slightly improved statistics (r2 = 0.504, Table 2). The correlation is dominated by the SASA term. Inclusion of the QM/MM energy for the optimized geometry did not improve the correlation. The pertinent regression parameter α has a negative sign and an inflated error, possibly due to a moderate cross-correlation between the QM/MM energy and the SASA term (r2=0.460).
The use of the energy terms from the MD conformational sampling in Step 3 pushed the correlation to r2=0.764 (Table 2). Compounds 3, 7, 20, 22, and 26 (Table 1) that saw the greatest description improvement in Step 3 (0.6 – 1.1 log units), exhibit a significant change in the conformation of the bound ligand upon MD sampling. The change in the binding conformation is accompanied by a large increase of ΔSASA, except compound 7. Descriptions for other compounds with ΔSASA increased upon MD sampling (5, 6, 8, and 25, Table 1) improved by 0.1 – 0.3 log units. For the van der Waals coefficient α, the error is almost equal to the parameter estimate, while for the coulombic coefficient β, the sign of the parameter estimate is negative and the error term larger than the parameter estimate. This misbehavior may be caused by the collinearity problem of the van der Waals and electrostatic terms (r2=0.712 – 0.839 for different simulation times). The low value of the van der Waals coefficient α might indicate a low hydrophobicity of the MMP 9 binding site, if the results of the original analysis49 are also valid for correlations with the explicit SASA term. Cross-correlation of van der Waals and SASA terms was very weak (r2=0.174).
The best model for Step 4 contained just the single point QM/MM energy for the time-averaged structures and the SASA term and was obtained using the data from a 5-ps MD simulation. This treatment was most beneficial for compounds 18 and 19, the residuals of which decreased from ~1.0 in Step 3 to ~0.3 log units in Step 4, and compounds 5 and 8, with residuals improving from ~ 0.6 to ~0.1 log units. The correlation for all 28 compounds is characterized by r2 = 0.900 and the standard deviation SD = 0.318 reflecting a good agreement between actual and calculated values (Table 2). For each parameter, the probability >F ratio was <0.0001, implying that the likelihood of a random occurrence of a significant parameter is negligible. The cross-correlation between the QM/MM energy and SASA is very weak as indicated by the r2 value of 0.140. The dominance of the SASA terms, clearly seen in Table 2, is probably reflecting the effect of burial of the inhibitor in the binding site. This phenomenon was described previously in the analysis of binding energies of several ligand-protein complexes.86 A plot of experimental activity as a linear combination of contributions from QM/MM energy and SASA is shown in Figure 3. The quality of correlations in Step 4 remained at about the same level with the increase in the MD simulation time for obtaining the time-averaged structures. Consequently, the simulation time of 5 ps seems to be sufficient for the binding energy analyses in the studied case, which is characteristic by constrained geometry of the zinc binding group in the complex and rigid protein structure outside the 5-Å region around the ligand superposition.
The adjustable parameter κ in Eq. 3 yields an attractive term of about −2.623 log units (Table 2), providing a base value for the inhibitors that is then modulated by the QM/MM interaction and SASA terms. The values of the QM/MM terms (Table 1) are negative and the associated positive coefficient (Table 2) implies that a strong interaction between the inhibitor and the binding site is important for inhibition. The SASA terms (Table 1) are negative, implying burial of the surface area upon binding. The associated parameter γ (Table 2) is positive so that the removal of mostly hydrophobic surface area from the contact with water upon binding promotes the binding, which simply reflects the hydrophobic effect.87 The obtained values of γ (Table 2: 0.00754-0.011 Å−2; multiplied by R×T×ln10 = 1.419 kcal/mol to account for the change of the dependent variable from free energy to log Ki as described in part Methods/Data Set) are in the same range as the slopes of the linear dependencies of solvation free energies on SASA: 0.007 kcal/(mol×Å2) for alkanes,88 and 0.01689 or 0.020 kcal/(mol×Å2)46 for various compounds.
The robustness of the regression equations and their predictive abilities were probed by cross-validation. The leave-one-out (LOO) procedure and especially the leave-several-out (LSO) procedure with a random selection of 6-member test set that was repeated 200 times provided a thorough evaluation. The predictive root mean squared error (RMSE) for Eq 3 obtained for the 5 ps MD simulation time is the lowest among all correlations. The RMSE values using LOO (0.331) and LSO (0.319) were comparable to that of the RMSE of the whole data set (0.315). Inclusion of all Steps in the correlation was warranted by the improvement in descriptive and predictive ability. The quality of correlations for individual Steps is documented in Figure 4.
The correlation described by Eq. 3 with the optimized parameters given in Table 2 is much better than our previous ELR results77 obtained from MD simulations with nonbonded zinc-ligand interactions. The predictive ability of the ELR model for all 28 compounds was characterized by RMSE from LSO cross-validation between 0.584 and 1.173, depending upon the simulation time (Table 6, model A in ref. 77). Comparison of these values with equivalent LSO RMSE values in the right-most column of Table 2 shows that current correlations were significantly better only for Step 4 (RMSE 0.319), where single-point QM/MM energies of the time-averaged structures were used. The results from MD simulations employing the same time-averaged structures (Step 3), although run with improved starting geometries of hydroxamate groups from the QM/MM minimization in Step 2, did not show any major progress: the best RMSE was 0.531 that is only slightly better than 0.584 resulting from a less constrained MD simulation. The QM/MM energy from the minimization procedure (Step 2) ended with the best RMSE of 0.751 that is not much better than RMSE of 0.785 that was obtained previously with a force-field minimization. Apparently, conformational sampling embodied in the time-averaged structures (Step 3) and a good description of the zinc coordination bonds (Step 4) are jointly required for a good correlation with experimental inhibitory potencies and none of these procedures alone is making a major breakthrough.
Eq. 3 and underlying structural information on bound ligands present a straightforward framework for understanding the trends in the observed activities. For a given series of hydroxamate derivatives in Table 1, the ranges for the contributions from QM/MM interactions and burial of solvent accessible surface area are 451.81 kcal/mol and 305.97 Å2, respectively. The SASA term contributes more to the computed activities than the QM/MM term. Representative binding modes of compound 22 (Table 1) obtained after FlexX docking, minimization, and MD simulation are illustrated in Figure 5. The differences between minimized structures and time-averaged structures after MD sampling in Step 3 were sometimes larger than those shown in Figure 5, especially for compounds 3, 7, 20, and 26 (Table 1). The S1 subsite is structurally defined by Asp185, Gly186, Leu187, Leu188, Ala189, His190, Ala191, and Phe192, while the S2’ subsite is defined by Ala417, Leu418, Met419, Tyr420, Pro421, Met422, and Tyr423. The P=O group of all the ligands except 2, 3, 13, 16, 20, and 25 forms hydrogen bonds with the backbone NH of Leu188. In general, the R1 substituents occupy the pocket lined with residues of the S1 subsite while R2 substituents occupy the S2’ subsite (Figure 6). Even compounds with longer R2 substituents (compounds 18–22) leave an empty pocket in the S2’ subsite. Since these are the most potent compounds in the given series, further lengthening of the R2 substituents may be helpful in designing more potent inhibitors.
Compound 1 is a potent inhibitor (Ki = 5.05 nM) while compound 2 exhibits only moderate activity (Ki = 349 nM). The compounds are structurally similar except the configuration on C-3 where 1 exhibits the R-configuration while in 2 it is the S-configuration. The importance of the R-configuration at the α carbon is well documented.90–94 Binding modes after MD simulation (Step 3) of the two inhibitors are shown in Figure 7. The zinc-binding group is tightly bound to zinc in both the compounds but the rest of the ligand is flipped by 180°. In case of 1, several hydrogen bonds can be discerned: between the oxygen atom of the ethyl ester group and the main chain NH of Leu188; between the hydroxamate OH and oxygen of Glu402 (and also an intramolecular H-bond with the phosphonamide oxygen). In contrast, compound 2 forms only one hydrogen bond with the Glu402 (Figure 7). The H-bond pattern is reflected in computed QM/MM interaction terms (Table 1) that are equal to −698.95 kcal/mol for 1 and −489.14 kcal/mol for 2. Additional benefit for 1 comes from the burial of more SASA (−389.34 Å2) than for 2 (−311.53 Å2).
Substitution of 4-methoxyphenyl in the R2 position in 1 with pyridine ring in 15 resulted in a severe loss of activity. The fact can be explained by the lower QM/MM energy values as well as a lower burial of SASA in 15 as compared to 1 (Table 1). Compounds 10 – 12 (Ki = 42.7, 41.8, and 51.9 nM, respectively) are structural analogs of compound 1 but exhibit lower activity as compared to 1 (Ki = 5.05 nM). The 4-methoxyphenyl group in R2 position of 1 is replaced with phenyl (10), 4-methylphenyl (11), and 4-fluorophenyl (12) substituents. Compounds 10 – 12 exhibit less burial of SASA and similar (11) or lower QM/MM energy values (10 and 12), as compared to 1 (Table 1).
Compound 19 (R2=Ph-Ph) differs from 20 (R2=Ph-O-Ph) in the interlinking oxygen atom and exhibits a lower activity. The ether linkage in 20 is expected to be better solvated in the unbound state of the ligand than in the bound state. However, this factor is not encoded in the overall ΔSASA value: greater ΔSASA (−604.77 Å2) of 20 than 19 (−423.94 Å2) merely reflects the size difference as both ligands bind in a similar way. Although the QM/MM energy term is more favorable for 19 (−940.95 kcal/mol) than for 20 (−834.28 kcal/mol), this contribution is negated by the dominance of the SASA term and consequently higher activity for compound 20.
Compounds 20 and 21 differ in isosteric R2 substituents (Ph-O-Ph and Ph-O-C6H5N, respectively). The QM/MM as well as SASA terms are more favorable for compound 20 as compared to compound 21 (Table 1). For compound 20, MD simulation in Step 3 found a new time-averaged binding mode resulting in a substantial increase in ΔSASA that explains higher activity of 20 as compared to 21. In this mode, the P=O group of 20 forms H-bond interaction with backbone NH of Ala189 that was not observed after QM/MM minimization in Step 2.
MD simulation also found a new time-averaged binding mode for compound 26, resulting in a significant increase in ΔSASA and improvement in description of activity after Step 3 as compared to Step 2. A closer look at the structural changes shows that the new mode also exhibits a different H-bonding interaction pattern: the P=0 group forms H-bond with backbone NH of Leu188, while after Step 2 the NH group in X4 was H-bonded with the backbone C=0 group of Gly186.
Published inhibitory potencies, characterized by the inhibition constants Ki at 37 °C, of a series of 28 hydroxamate derivatives70 towards MMP-9 were used (Table 1). The inhibition constants are the inverse values of the association constants K. The left sides of Eq. 1 and Eq. 3 are the binding free energies ΔGb = − R×T×lnK = R×T×ln10×logKi. The logKi values were used as dependent variables, in order to work with the dependent variables that are most common in medicinal chemistry. The optimized values of adjustable parameters α, β, γ, and κ (Table 2) can be recalculated for the ΔGb as independent variable by multiplication with R×T×ln10 = 1.419 kcal/mol.
Coordinates of the MMP-9 catalytic domain were taken from the recently reported x-ray crystal structure of the MMP-9 in the complex with N-formyl-N-hydroxy-2-(2-methylpropyl)-β-alanyl-N,3-dimethyl-L-valinamide, a ‘reverse hydroxamate’ inhibitor, as deposited in the PDB (file 1GKC).95
Three-dimensional structures of ligands were constructed using the SYBYL6.9 suite of programs96 running under Irix 6.5. Full geometry optimization and charges were calculated using DFT/B3LYP-6–31G** approach.97 The inhibitors were docked into the active site of MMP-9 using the FlexX program that considers ligand conformational flexibility by an incremental fragment placing technique.64,75 For each ligand, the best conformation in the active site was selected from the top 30 poses generated by FlexX using the distances in the interval 1.5 – 2.5 Å between both hydroxamate oxygens and the catalytic zinc atom as the primary criterion and the FlexX ranking as the secondary criterion.76 Step 1 was completed by a brief MM geometry optimization using the OPLS-AA force field with a distance-dependent dielectrics and conjugate gradient algorithm with a convergence criterion of 0.001 kcal/(mol×Å).
QM/MM Calculations98 were used for three purposes: optimization of the initial geometries of the inhibitor-enzyme complexes and estimation of their interaction energies (Step 2), as well as estimation of the interaction energies for the time-averaged structures (Step 4) obtained by MD simulation (Step 3). The QM region consisted of side chains of His405 and His411, the backbone atoms and side chains of His401 and Glu402, the entire inhibitor, and the zinc ion. The backbone atoms were included to obtain valid QM/MM cuts. The rest of the protein was treated with MM. The protein and water outside 5 Å of the ligand superposition after Step 1 were frozen.
The interface between QM and MM regions is mediated by frozen orbitals.99 The QM and MM regions interact via two mechanisms: electrostatic interactions between MM point charges and the QM wave function, and van der Waals interactions between QM and MM atoms. For the MM and QM parts of the QM/MM calculations, OPLS-AA force field100 and DFT functional B3LYP101 were deployed, respectively. All charges in the MM region were treated using the OPLS-AA force field. The 6–31G* basis set was used in the interface region between the QM and MM regions. The LAV3P** basis set was employed for geometry optimization: for Zn, S and P atoms, this means the Los Alamos effective core potential (ECP)102,103 with all the s functions and the last p and d gaussian uncontracted; for the remaining atoms, it implies 6–31G** basis set. The maximum number of iterations was set to 100 cycles and all calculations converged before reaching this limit: the root mean squared change in density matrix elements was less than the criterion of 5.0×10−6. B3LYP provides as good or better geometries and energies as those from correlated ab initio methods for the first-row transition metal complexes.5 We therefore selected B3LYP to optimize the structures of the complexes in Step 2. In Step 4, the interaction energies were calculated by subtracting the QM/MM energy of the ligand and the receptor from the QM/MM energy of the complex for the time-averaged structures (Eq. 2).
Molecular Dynamics Simulations were performed using SYBYL 6.9 under isothermal/isobaric (NPT) conditions with Tripos force field.104 The QM/MM-optimized bonds between the zinc-binding group (hydroxamate oxygens) of inhibitors, zinc and nitrogens of His401, His405, and His411, slightly different for each ligand, were restrained with a harmonic potential (the force constant 200, power 2) throughout the simulation. The Mulliken charges105 resulting from the QM/MM optimization were used for the QM region (ligand, catalytic zinc, coordinating histidine and glutamate residues). For the rest of the protein and water, Gasteiger-Hückel charges106 were used. For each system, only one set of simulations was performed with the ligand bound to the protein with a cap comprising two layers of water (TIP3P) molecules,104 surrounding the complex. The water molecules were then minimized for 10,000 cycles using conjugate gradient minimization keeping the protein and ligand atom fixed to its initial positions. The outer portion of the water cap was farther than 5 Å from the ligand superposition after Step 2 and was assigned to the frozen region. Subsequently MD simulation was performed for 20 ps for the mobile water molecules, keeping the protein, ligand, and crystal water and frozen water fixed. This solvent equilibration phase was performed in order for the solvent molecule to readjust to the potential field of the ligand-receptor complex.
After the 15-ps heating phase, when the temperature of the system was raised from 0 to 300 K, the equilibration run was performed for 100 ps. Finally, the production phase was carried out at 300 K for 200 ps. The time step of the simulations was 1 fs with a cutoff of 12 for nonbonded interactions. The nonbonded pairs were updated every 25 fs. All residues within 5 of any atom in the ligand superposition after Step 2 were allowed to move freely and the remaining part of the protein and water was kept frozen. This setting was identical for all ligands. The time-averaged structures, obtained from the readings in 100 fs intervals, of the complete mobile region were collected at appropriate times. These structures were briefly minimized, to relieve the worst conflicts, using the Tripos force field with a distance-dependent dielectrics and the Powell conjugate gradient algorithm with a convergence criterion of 0.001 kcal/(mol×Å). The minimization produces structures with standard bond length and angles, with the dihedrals representing the ensemble. The time-averaged structures were shown to provide similar LIE correlations as ensemble averages.72,73 The single-point QM/MM energy calculations on the time-averaged structures were used to estimate the zinc binding energies. The polar, non-polar and total solvent accessible surface area (SASA) terms were calculated using the ProsSat option in the Homology module of the Insight II modeling package.107
The least-square fits108 were based on Eq 1 and Eq 3 with the constant term. The robustness of the regression equations and their predictive abilities were probed by cross-validation. For this purpose, the fits to the potency data are generated leaving out one or more inhibitors from the calibration process. The resulting equation for each fit is used to predict the potencies of the omitted compounds. We used the leave-one-out (LOO) approach and the leave-several-out (LSO) approach, where 6 inhibitors were randomly omitted and the process was repeated 200 times. The correlations of the LOO and LSO predictions with the actual potencies were characterized by the root mean square errors (RMSE).
A computational approach combining docking, QM/MM calculations, and MD simulations was developed for prediction of binding affinities of ligands to metalloproteins. The use of QM/MM energies in the ELR-type correlations was facilitated by the use of time-averaged structures from MD simulations. The application of the approach to the MMP-9 inhibition by 28 hydroxamates resulted in an excellent correlation (r2 = 0.900) between experimental and calculated values for all tested compounds that exhibit ~4000-fold difference in binding affinity, with the inhibition constants Ki ranging from 0.08 to 349 nM. Prediction ability of the correlation is characterized by RMSE ~ 0.3 for the logKi values, as compared to RMSE > 0.6 if the QM/MM term is not used. The examination of the energetic and structural results provides a basis for understanding activity differences of individual inhibitors and for their rational design. The proposed approach improves the descriptive and predictive abilities for metalloprotein ligand affinity prediction as compared to the LR approach at the expense of about a four-fold increase in the computational time.
This work was supported in part by the NIH NCRR grants 1P20RR15566 and 1P20RR16471, as well as by the access to resources of the Computational Chemistry and Biology Network and the Center for High Performance Computing, both at the North Dakota State University.