|Home | About | Journals | Submit | Contact Us | Français|
Accurate prediction of the binding affinities of small-molecule ligands to their biological targets is fundamental for structure-based drug design but remains a very challenging task. In this paper, we have performed computational studies to predict the binding models of 31 small-molecule Smac (the second mitochondria-derived activator of caspase) mimetics to their target, the XIAP (X-linked inhibitor of apoptosis) protein, and their binding affinities. Our results showed that computational docking was able to reliably predict the binding models, as confirmed by experimentally determined crystal structures of some Smac mimetics complexed with XIAP. However, all the computational methods we have tested, including an empirical scoring function, two knowledge-based scoring functions and MM-GBSA (molecular mechanics generalized Born surface area), yield poor to modest prediction for binding affinities. The linear correlation coefficient (r2) value between the predicted affinities and the experimentally determined affinities was found to be between 0.21 and 0.36. Inclusion of ensemble protein-ligand conformations obtained from molecular dynamic simulations did not significantly improve the prediction. However, major improvement was achieved when the free-energy change for ligands between their free- and bound-states, or “ligand-reorganization free energy”, was included in the MM-GBSA calculation and the r2 value increased from 0.36 to 0.66. The prediction was validated using ten additional Smac mimetics designed and evaluated by an independent group. This study demonstrates that ligand reorganization free energy plays an important role in the overall binding free energy between Smac mimetics and XIAP. This term should be evaluated for other ligand-protein systems and included in the development of new scoring functions. To our best knowledge, this is the first computational study to demonstrate the importance of ligand reorganization free energy for the prediction of protein-ligand binding free energy.
Accurate prediction of the binding affinities of small-molecule ligands to their biological targets is fundamental to structure-based drug design but remains a very challenging task. Over the years, a number of scoring functions have been developed.1 Although these scoring functions can yield good predictions for the binding affinities of small-molecule ligands to their protein targets in some cases, they typically give poor to modest prediction when applied to a large number of proteins from different families and ligands with diverse chemical structures2,3. This is especially troubling even when experimentally determined high-resolution crystal structures of the protein-ligand complexes are used as the starting points for binding-affinity prediction. For example, our previous evaluations of 11 scoring functions showed that the best scoring functions were only able to yield an r2 value of 0.4 for 100 protein-ligand complexes, even when the experimentally determined crystal structures were used4. More recently, we have developed a new knowledge-based scoring function, named M-Score, based upon 2331 high-resolution protein-ligand crystal structures5. Evaluation of M-Score against 896 structurally diverse protein-ligand complexes not included in the training set yielded an overall correlation of r2 = 0.24 between the experimentally determined binding affinities and the calculated scores5. Further analysis of M-score with 17 protein families having more than 10 protein-ligand complex crystal structures for each protein showed that while we were able to achieve good to excellent correlation between the predicted scores and experimentally determined binding affinities with r2 values between 0.42 and 0.85 for six protein families, we obtained poor to modest correlation for nine protein families and anti-correlation for two protein families5. In general, our inability to accurately predict the binding affinities of small-molecule ligands to their biological targets is a major impediment to the success of many structure-based drug design projects. There is therefore a clear need to achieve a much better understanding of the factors that contribute to protein-ligand binding and to include them in the prediction of protein-ligand binding affinities.6
In this study, we have evaluated X-Score7, Drugscore8, M-Score5 and MM-GBSA9 for their ability to predict the binding affinities of 31 small-molecule Smac mimetics to XIAP. The 31 inhibitors include both peptides and peptidomimetics and have binding affinities (Ki) to XIAP ranging from 4 nM to 68 µM. We have found that all these four computational methods yield poor to modest prediction for binding affinities when using a single protein-ligand complex structure. The linear correlation coefficient (r2) value between the experimentally determined affinities and the predicted values was found to be between 0.21 and 0.36. Inclusion of an ensemble of protein-ligand conformations obtained from molecular dynamic (MD) simulation failed to improve the prediction significantly for three of the scoring functions. However, when the reorganization free energy for ligands was included in the MM-GBSA calculation, the r2 value increased from 0.36 to 0.66. The prediction was further validated using additional Smac mimetics from other laboratories.10,11
To our best knowledge, this is the first computational study to demonstrate the importance of ligand reorganization free energy for the prediction of protein-ligand binding free energy. Our present study clearly demonstrates that the reorganization free energy for ligands can make a significant contribution to the overall binding free energy between small-molecule ligands and their biological targets. Accordingly, this term should be included in the development of new computational methods for binding-affinity prediction of protein-ligand interaction.
The XIAP protein consists of three Baculoviral IAP repeat (BIR) domains. It is known that Smac-based peptides bind to both BIR2 and BIR3 domains but have much higher affinities to the BIR3 domain12. Accordingly, our current study has focused on the interaction between XIAP BIR3 domain and Smac mimetics. The XIAP-BIR3 protein used in our simulations consists of 102 residues (L256-E357) with the sequence:
The structure of the complex formed by XIAP-BIR3 and Smac protein was determined previously,13 and the coordinates were obtained from the Protein Data Bank14 (PDB entry: 1G73). Based upon the crystal structure, the first four residues (AVPI) in Smac mediate the interaction between Smac and XIAP BIR3. Thus, only a monomer unit (chain D) of XIAP and the first four residues (AVPI, chain A) of the Smac protein from the x-ray structure were used in the current study.
Models of all compounds were prepared and minimized with the Sybyl15 program. Docking simulations were performed using the GOLD 3.2 program16 to predict their binding models. The center of the binding site was set at T308 of XIAP BIR3 and its radius was defined as 13 Å, large enough to cover all the residues in the binding site. For each genetic algorithm (GA) run, a maximum number of 200,000 operations were performed on a population of 5 islands of 100 individuals. Operator weights for crossover, mutation and migration were set to 95, 95 and 10 respectively. The docking was terminated after 20 runs for each ligand. GoldScore, implemented in Gold 3.2, was used as the fitness function to evaluate the docked conformations. The twenty highest ranked conformations by each fitness function were saved for analysis. Since all the small-molecule inhibitors in our current study were designed to mimic the Smac AVPI peptide in its binding to XIAP BIR3, we used the following two criteria for the selection of the initial binding pose for each compound for subsequent MD simulations: (1) it should be a high ranked pose; and (2) the backbone atoms of the Smac mimetic should closely mimic the corresponding atoms of the Smac AVPI peptide in the crystal structure.
Amber (version 8)17 was used for the MD simulations. The Amber 99 force field parameters18 were used for the amino acids. Some parameters not found in the standard amino acid library are also required to perform the simulations. First, in XIAP-BIR3, a zinc ion is covalently bonded with His320, Cys300, Cys303, and Cys327 in a tetrahedral structure. Because the zinc ion only has a structural role and does not have direct interactions with the ligands, we used the force field parameters derived by Ryde19 and constrained the four residues in the zinc-coordinated center with modest harmonic forces (1 kcal/mol-Å2). Second, we used the Antechamber module in Amber to derive the force field parameters for all the compounds. The protocol for generating the point charge parameters is as follows: The docked pose of each compound was minimized at the RHF level using a 6–31G** basis set with Gaussian9820. The electrostatic field potential calculated from Gaussian98 was used to generate the point charges at each atom site based on the RESP fitting procedure.
To prepare the topology and coordinate files, counter ions were added to neutralize the charges in the complex before it was placed in a 10 Å cubic box of water. The TIP3P21 water model was used. After a 500-step minimization, a 0.5 ps simulation was performed to raise the temperature of the system to 150K, followed by another 1 ps of simulation to increase the temperature further to 298K. The system was then equilibrated for 8.5 ps at 298K. The production run was 1 ns. Conformations were saved from the trajectory at intervals of 0.4 ps. Conformations collected from 0.2 ns to 1 ns were used for the binding affinity prediction calculations. All the MD simulations were in the isothermal isobaric (NTP, T = 298K and P = 1 atm) ensemble. The SHAKE22 algorithm was used to fix bonds involving hydrogen. The PME method23 was used and the non-bonded cutoff distance was set at 10 Å. The time step was 2 fs, and the neighboring pairs list was updated in every 20 steps.
The unbound ligand conformation simulations were performed with the Generalized Born (GB) model.24 The force field parameters of the ligand were the same as those used in the protein-ligand complex simulation. When preparing the topology file, PBradii was set to mbondi2 and igb to 5. During simulations, the nonbonded cutoff distance was set to 999 Å, which was equivalent to no cutoff. Langevin dynamics was turned on with the collision frequency set to 5 ps−1. The temperature of the simulation was set to 300K. A time step of 1 fs was used in the simulation and the total length of simulation was 10 ns. Forty-nine conformations of each ligand from the 10 ns simulation were used for the free energy calculations. Convergence of the results was checked by using twice as many conformations from the same 10 ns simulations.
We evaluated three scoring functions (X-Score7, Drugscore8 and M-Score5) and the MM-GBSA7 method for binding affinity prediction. The ensemble average binding affinity predictions using X-Score, Drugscore and M-Score were calculated as follows: A snapshot of each single protein-ligand complex conformation from the trajectory was used to calculate a single score. It should be noted that the conformations of both the protein and the ligand are different in different snapshots. For every compound, 41 conformations, taken at intervals of 20 ps in the MD simulation (0.2 to 1 ns), were used to compute the average score and standard deviation to represent the binding affinity of each ligand. In the MM-GBSA calculation, the same 41 conformations corresponding to 20 ps intervals in the trajectory were used for the molecular mechanics calculations. Nine conformations (taken at intervals of 100 ps) from the 1 ns trajectory were chosen for the normal mode calculations for entropic contribution to the binding free energy. In the normal mode calculations, a distance-dependent dielectric constant ε = 4r was used, the maximum cycle was set to 60,000, and the convergence tolerance was 0.0002 kcal mol−1Å−1.
In general, the ligand reorganization free energy is defined as the free energy difference of the ligand with conformations at two different states, namely protein-bound and protein-free states. This quantity can be calculated using appropriate theories with consistent force field parameters. In this work, because we used the MM-GBSA model to estimate the relative binding free energy of the ligands and the protein, we used the same level of theory and force field parameters (i.e. MM-GBSA) to estimate the ligand reorganization free energy. Detailed theoretical models and calculations are as follows.
where Gcomplex(x), Gprotein(unbound) and Gligand(unbound) stand for the free energy of the protein-ligand complex, of the unbound protein and of the unbound ligand in solution. We will use (x) and (unbound) to denote conformations in the protein-ligand complex and the unbound states respectively. Individual terms of the free energy according to the MM-GBSA model are calculated as, Gsolute = EMM + Gsolv − TS, where EMM is the molecular mechanics energy, Gsolv is the solvation free energy and S is the entropy for the solute. A direct computation of the binding free energy using equation (1) will give the absolute binding free energy between the protein and its ligand. However, convergence of the absolute binding free energy typically requires a lengthy simulation. When the relative binding free energies of a series of compounds are needed, the one-state endpoint MM-GBSA model is frequently used in which case only a single trajectory of the protein-ligand complex simulation is needed. The connection between the two-state and one-state endpoint MM-GBSA models can be obtained by rearranging equation (1) as:
where δGprotein,re and δGligand,re are the protein and ligand reorganization free energy, respectively. When calculating the binding free energy of a series of compounds, the two-state endpoint model accounts for the differences of the structural changes in the protein and the ligand upon binding whereas the one-state endpoint model assumes that both δGprotein,re and δGligand,re contribute similarly to the binding free energy. In this assume δGprotein,re is the same for the series of compounds but explicitly included the δGligand,re term for each compound.
To calculate δGligand,re, we need to compute Gligand(x) and Gligand(unbound). Gligand(x) is the free energy of the ligand at the protein-bound state and was calculated directly from the MM-GBSA module in the Amber program suite using the simulated trajectories of the protein-ligand complex system. To be consistent with the force field parameters, Gligand(unbound) was calculated from the same MM-GBSA module using the simulated trajectories of the unbound ligand in the implicit GB solvent model. The choice of using the implicit GB solvent model to sample the unbound ligand conformations as opposed to the explicit solvent model results from considering the computational efficiency and consistency in force field parameters when calculating the free energy of the ligand. To extract Gligand(unbound) in the MM-GBSA module, snapshots of the ligand conformations from the unbound ligand simulations were used and the ligand was treated as a “receptor”. For both Gligand(x) and Gligand(unbound), enthalpic and entropic terms in the free energy were included as has been shown in the equation previously. Throughout the text, we have abbreviated and δGligand,re as ΔG and δΔGre.
The crystal structure of the XIAP BIR3 domain protein complexed with Smac protein13, and the NMR solution structure of XIAP BIR3 bound to a Smac-based peptide,12 reveal the detailed interactions between Smac and XIAP BIR3 and have provided a structural basis for the design of small-molecule Smac mimetics as inhibitors of XIAP.
The interactions between Smac and XIAP BIR3 are mediated by the Ala1-Val2-Pro3-Ile4 (A1-V2-P3-I4) four-residue binding motif in Smac and a well-defined groove on the surface of XIAP BIR3 protein.13 The methyl group of the Ala1 residue inserts into a small hydrophobic pocket, the free amino group forms strong hydrogen bonds to the Glu314 and Gln319 residues on the protein, and the backbone carbonyl group forms a suboptimal hydrogen bond to the indole NH group of Trp323. The amino and carbonyl groups of Val2 form optimal hydrogen bonds with the carbonyl and amino groups of Thr308, respectively, while the Val2 side chain is exposed to solvent and has no interaction with protein residues. The 5-membered ring of Pro3 has van der Waals contacts with the side chains of Trp323 and Tyr324 and finally, the amino group of the Ile4 residue forms a hydrogen bond with the carbonyl group of Gly306, and its hydrophobic side chain inserts into the hydrophobic pocket formed by the side chains of Leu292 and Val298 and the hydrophobic portion of the side chains in Lys297 and Lys299.
In addition to these two initial structures, seven crystal structures of small-molecule Smac peptidomimetics and non-peptidic mimetics complexed with XIAP BIR3 protein have been reported, and their coordinates are available in the Protein Data Bank (Supporting Information, Table S1). The availability of these experimental structures afforded us the opportunity to examine if these Smac mimetics have similar interactions with XIAP BIR3 and if ligand binding induces significant conformational change.
Superimposition of the nine available structures of XIAP BIR3 complexed with Smac protein, Smac peptide or small-molecule Smac mimetics showed that positional variation of the backbone atoms around the binding site (residues 292 to 324) in XIAP is minimal, with a root-mean-standard-deviation (RMSD) between 0.20Å and 0.75Å. Positional variations of the side chain of residues directly interacting with ligands (L292, K297, K299, L307, W310, E314, Q319, W323 and Y324) are also small and the RMSD value for non-hydrogen side chain atoms is between 0.28Å (W310) and 0.73Å (K297). Thus, the very similar conformations for the XIAP binding-site when bound to different ligands in these crystal structures suggest that the binding site may have limited conformational flexibility.
A possibility that the XIAP binding-site adopts similar conformations in the crystal structures is that all these small-molecule Smac mimetics were designed based upon the Smac AVPI peptide, and thus have very similar interactions with XIAP. To further assess the conformational flexibility of the XIAP binding site, we removed the peptide from the crystal structure of XIAP BIR3 complexed with the AVPI peptide and performed a lengthy (9 ns) MD simulation on the ligand-free XIAP BIR3 structure in the presence of explicit water molecules. This MD simulation showed that in the first 5.4 ns, the backbone atoms of the binding site (R285-P325) deviate very little from the ligand-bound conformation, with an RMSD around 0.8 Å for all non-hydrogen atoms (Supporting Information, Figure S1). After 5.4 ns, an increase in the RMSD of the backbone atoms to around 1.2 Å was observed. This increase in the backbone RMSD was found to be primarily attributable to the motion of the loop between K311 and D315. Since the hydrophobic side chain of W323 is exposed to solvent and is involved in interactions with the ligands, we expected that this residue might undergo a major conformational change to refold its hydrophobic side chain onto the binding site in the absence of the ligand. It was found however that throughout the entire 9 ns simulation, W323 remained in its open conformation.
Taken together, the MD simulations with the experimental structures of XIAP BIR3 complexed with different ligands indicate that XIAP BIR3 has a well-defined binding site with limited flexibility.
Binding models of these Smac mimetics, including both peptidomimetics and non-peptidic mimetics, to XIAP BIR3 were predicted using the GOLD program, starting from the high-resolution crystal structure of XIAP BIR3 complexed with Smac.13 The chemical structures of these Smac-based ligands and their binding affinities to XIAP BIR3 are provided in Table 1.
Analysis of the predicted binding models for these compounds showed that they all have very similar interactions with XIAP BIR3. The key charge-charge interactions, hydrogen bonding and hydrophobic contacts observed between the Smac AVPI peptide and XIAP BIR3 were essentially maintained in the predicted binding models for each of these compounds.
Since the binding model for compound 23 (SM-130) was predicted prior to the determination of its crystal structure in complex with XIAP BIR3, it provided an opportunity to directly validate the predicted binding model for this compound. Our analysis showed that the peptidic backbone conformation of compound 23 is the same in the predicted model and the crystal structure. The small binding cavity, in which Ala1 in Smac docks, has adequate room for the additional carbon atom of the ethyl group in compound 23. This ethyl group adopts two slightly different orientations in the model and in the crystal structure. The model also shows a more energetically favorable conformation for the peptide bond connecting the lactam and the biphenyl group as opposed to the distorted conformation in the crystal structure. Finally, the biphenyl group in compound 23 assumes the same orientation in the model and in the crystal structure.
The predicted binding model for 22 from our group and that for 32 reported by scientists from Genentech are shown in Figure 3 (B)–3(D). The crystal structure of compound 32 in the complex with an XIAP-ML-IAP BIR3 chimera has been reported (PDB ID: 2I3I), and closely mimics the predicted binding pose for this compound complexed with XIAP. The bicyclic core of compound 32 in the docked pose has less contact with the protein surface than that in the chimera crystal structure, which may be attributed to the fact that Phe324 in the chimera protein is replaced with a slightly bulkier and polar Tyr324 in XIAP BIR3.
Based upon the predicted binding models for Smac ligands, we have calculated their relative binding affinities using three different scoring functions, X-Score, Drugscore and M-Score, and correlated the calculated values with their experimentally determined Ki values binding to XIAP BIR3. We included compounds 1–31 in the correlation analysis since their binding affinities to XIAP BIR3 were determined under the same assay conditions.
As seen in Figure 4, the r2 between the calculated relative binding affinities or scores and experimentally determined Ki values are 0.16 by X-Score, 0.25 by Drugscore and 0.30 by M-Score, respectively. Hence, all these three scoring functions yield poor to modest accuracy in their prediction of the relative binding affinities for these 31 Smac ligands.
Although it has been a common practice to use a single predicted binding model for each compound to assess the relative binding-affinities between a set of compounds, this practice has its inherent limitations. For example, although the highest ranked binding pose based upon the scoring function in the GOLD program was selected as the predicted binding model for each ligand, there were a number of top ranked binding poses which were similar but not identical to the highest ranked binding pose. Furthermore, although our MD simulation of the XIAP BIR3 protein suggested that its binding site has limited flexibility, a number of polar/charged side chains of key binding residues exhibited significant conformational changes during the 9 ns simulation. Therefore, we tested whether inclusion of an ensemble of binding poses for each compound can improve the prediction of the relative binding affinities of Smac ligands to XIAP over the use of a single binding pose for each compound.
To generate an ensemble of conformations, MD simulation was performed for each protein-ligand complex in the presence of explicit water molecules starting from the predicted binding pose. The ensemble of conformations was then used to calculate an average score by X-score, Drugscore and M-score for each protein-ligand complex. A correlation analysis between the averaged scores and the experimentally determined Ki values is shown in Figure 5. As can be seen, using the average scores generated from an ensemble of conformations for protein-ligand complexes generally improved the correlation between the predicted scores and the experimental Ki values for all three scoring functions. However, the overall correlation obtained for each of the scoring functions is still modest. The largest improvement was observed for Drugscore, in which the r2 value improves from 0.25 to 0.36.
Since use of an ensemble of conformations still only yielded modest accuracy by Drugscore, M-score and X-score in the prediction of the relative binding affinities for these Smac ligands, we next evaluated the MM-GBSA method. In the context of using an ensemble of conformations for binding affinity assessment, the MM-GBSA method provides several advantages. First, the same force field parameters are used for simulation and post-processing for calculating binding affinity. Second, individual free-energy components can be analyzed. Third, other parameters related to the overall binding free energy, such as conformational free-energy changes for a protein and ligand that cannot be obtained from the protein-ligand complex structure alone, can be considered on the same theoretical basis.
We first performed energetic decomposition and correlated each term with the experimental Ki values. It was found that the most significant correlation is between the van der Waals interaction of ligands with protein and the experimentally determined Ki values, yielding an r2 value of 0.29. This correlation is similar to those obtained by Drugscore, M-score and X-score. Including all other terms in the MM-GBSA method (Table S3, Supporting Information), the r2 value increases from 0.29 to 0.36 [Figure 6(B)]. Hence, our MM-GBSA analysis showed that the dominant contribution to the binding affinity difference of these 31 compounds with XIAP BIR3 is from the van der Waals interaction, but the correlation is still modest.
In our design of Smac mimetics, we have applied the classical conformational constraining strategy on the Smac peptide to improve their binding affinities to XIAP by reducing the entropic cost. Therefore, we next investigated whether inclusion of ligand reorganization free energy, the free-energy change between protein-free and protein-bound states for the ligands, can significantly improve the correlation in the MM-GBSA calculation. Since the binding site of XIAP BIR3 adopted a very similar conformation when bound to different ligands in the crystal structures and based upon our computational studies, we made an assumption that the XIAP protein has a similar conformational free-energy change upon binding to different ligands and focused our investigation on the ligand conformational free energy.
We generated ligand ensemble conformations in the protein-bound and protein-free states through MD simulations, which were necessary to calculate the ligand reorganization free energy change. Because ligands bound to the protein occupy a restricted conformational space, a relatively short simulation (1 ns) was performed to sample the ligand conformations bound to the protein. Ligands in the protein-free state (i.e. in solution) are much less restricted by the environment and much longer simulation is needed to adequately sample their conformational space. Accordingly, we performed a 10 ns simulation for each ligand using the generalized Born (GB) inexplicit solvation model. Conformations of the ligand from both simulations were used to calculate the free energies of the ligand in the respective states and the difference between them is the ligand reorganization free energy.
This calculated ligand reorganization free energy was then included in the correlation analysis, together with other terms calculated using the MM-GBSA method, and the results are shown in Figure 6C. As can be seen, inclusion of the ligand reorganization free-energy change significantly improves the correlation between the calculated binding free energies and experimental Ki values for these 31 Smac ligands. The r2 value was increased from 0.36 to 0.66 and the standard deviation (SD) was improved by 1.0 kcal/mol, from 2.6 to 1.6 kcal/mol.
Analysis of the ligand reorganization free energy showed that between different ligands, there is a quite large variation in this term. The most positive contributions (most negative values for δΔGre) are −1.85 and −0.92 kcal/mol for compounds 9 and 26, respectively. The most negative contributions (most positive values for δΔGre) are 7.85, 7.29 and 5.24 kcal/mol for compounds 29, 28 and 23, respectively. The most potent compounds 26, 27 and 25 have a relatively small penalty in their ligand reorganization free energy; an exception is found in compound 22.
We decomposed the ligand reorganization free energy into the enthalpic and entropic terms. One major surprise to us was that there is a positive entropic contribution (negative value for −δΔTΔS) for most compounds but a negative enthalpic contribution for all the compounds except 9, 21 and 26. Most of the compounds gain some entropy (up to 1.62 kcal/mol) but lose more enthalpy (up to 8.44 kcal/mol) when they transition from the protein-free to protein-bound state (Figure 7B). On average, enthalpic loss dominates over entropic gain by a factor of 6, and while the reorganization enthalpic loss can change the total ΔH by about 25%, the entropic gain resulting from reorganization accounts for at most 9% of the total TΔS (cf. Supporting Information, Tables S3 and S4).
To quantify the ligand conformational changes between protein-bound and protein-free states, we use the radius of gyration, commonly used in protein folding studies, as an indicator. The average radius of gyration (Rg) of the ligands in protein-bound and protein-free states is found to be 4.68 and 4.22 Å, respectively (Table S5, Supporting Information), indicating that there is a 10% extension of the ligand geometry upon binding. The data are consistent with the structural information that these ligands adopt an extended conformation when bound to the protein, similar to the β-turn conformation of the Smac AVPI peptide that is observed in the crystal structure. In solution, they form more compact conformations, corresponding to the hydrophobic collapse of the side chains of A1, P3, and I4. Such conformational changes allow ligands to maximize hydrophobic contact with the protein and minimize their hydrophobic surface in solution.
We next performed the leave-one-out cross-validation to further examine the improvement in binding affinity prediction by inclusion of the ligand reorganization free energy in the MM-GBSA calculation. The results are provided in Table 2.
Without the ligand reorganization free-energy term (ΔG only), the cross-validated r2 is 0.36 ± 0.02 in the leave-one-out analysis. In comparison, inclusion of δΔGre in the leave-one-out analysis improves the cross-validated r2 value to 0.65 ± 0.02. Importantly, the average unsigned error was reduced to 1.03 kcal/mol from 1.78 kcal/mol by inclusion of the ligand reorganization free-energy term in the cross-validation analysis.
We next assessed the true predictive power of the MM-GBSA calculation with or without the ligand reorganization free energy for novel compounds not included in the correlation analysis.
where the number in parentheses is the fitting error of each parameter27 and the standard deviation of the fit is 1.61 kcal/mol.
When only ΔG is used, equation 2 was obtained:
Equation 3 and Equation 4 were then used to predict the binding affinities of ten Smac mimetics (compounds 32–41 in Figure 8), which were reported by scientists from Genentech.10,11,28,29 The crystal structures of compounds 32, 36 and 37 with the XIAP-ML-IAP chimera have been determined (PDB entries, 2I3I, 3F7H and 3F7I)10,11. Since the binding sites between XIAP-ML-IAP chimera and XIAP BIR3 are similar, these crystal structures can be used to validate our predicted binding poses for compounds 32–41 to XIAP BIR3. The docked pose and the final snapshots of the 1 ns MD simulations of compounds 32–41 are depicted in Figure 9, which show that there are substantial changes of the initial docked poses for these compounds upon MD simulations.
The predictions using equation 3 and equation 4 for compounds 32–41 are provided in Table 3 and Figure 10. Using only ΔG (equation 4), the average unassigned error of the predicted binding affinities for compounds 32–41 from the experimentally determined values is 3.26 kcal/mol. In comparison, using ΔG+δΔGre (equation 3), the average unassigned error of the predicted binding affinities for compounds 32–41 from the experimentally determined values is 1.11 kcal/mol. As can be seen from Figure 10, the predicted values for compounds 32–41 (depicted in red circle) all fall within the error ranges observed for compounds 1–31 (depicted in grey dots). The predicted r2 value was improved from 0.13 for compounds 32–41 when equation 2 was used, to 0.45 when equation 3 was used. The lower predicted r2 value of 0.45 for compounds 32–41 than the cross-validated r2 value of 0.65 for compounds 1–31 may be attributed, at least in part, to the fact that the binding affinity difference between compounds 32–41 is only 30 times. These data strongly suggest that inclusion of the ligand reorganization free energy in the MM-GBSA calculation not only significantly improves the correlation between the calculated values and the experimentally determined binding affinities for compounds in the training set, but also greatly enhances the prediction of the binding affinities for new compounds.
Despite significant progress, accurate prediction of the binding affinities (binding free-energies) of protein-ligand interaction remains a very challenging task. Poor prediction for protein-ligand affinities may arise from incorrectly predicted binding models, lack of consistent binding data obtained using different assay methods and conditions, and defects in computational methods for calculating the binding affinity of a protein-ligand complex. In this study, we have performed computational studies to investigate the strengths and weaknesses of current computational methods for the prediction of binding models, as well as binding affinities. We have chosen the XIAP protein and its ligands (which are also called Smac mimetics) as our model system due to the following reasons: (1). the availability of a number of high-resolution crystal structures for this system; (2). the availability of a relatively large number of designed ligands for XIAP, both peptides and non-peptidic in nature; (3). the availability of consistent experimental binding affinity data obtained using the same method and under the same assay conditions; and (4). The important role of this system in regulation of apoptosis.
Our studies showed that computational docking was able to reliably predict the binding models of these Smac mimetics to XIAP, as confirmed by crystal structures for some of these compounds. However, all of the four computational methods tested (X-Score, Drugscore, M-Score and MM-GBSA) yield poor to modest correlations between computational scores and experimentally determined binding affinities. Inclusion of the ensemble conformations for the protein-ligand complex structures failed to yield a significant improvement. A major improvement to the correlation by the MM-GBSA method was found when the ligand reorganization free energy change between protein-free and protein-bound states was included in the calculation, which improved the correlation r2 value from 0.36 to 0.66, and the unassigned error from 2.6 to 1.6 kcal/mol. Our leave-one-out cross-validation analysis further showed that inclusion of the ligand reorganization free energy change in the MM-GBSA calculation indeed significantly improves the prediction accuracy. Importantly, when tested for ten new compounds reported by another research group, significant improvement in binding affinity prediction accuracy was also observed with the inclusion of the ligand reorganization free energy change.
The majority of the current computational methods for binding affinity prediction include terms accounting for the interaction energy between protein and ligand and desolvation effect for both ligand and protein. However, the importance of the ligand reorganization free energy between protein-free and protein-bound states, as well as that of the protein reorganization free energy between ligand-free and ligand-bound states, for the overall binding free-energy between protein and ligand is not known. This is in part due to the experimental difficulty in obtaining the ensemble conformations for both ligand and protein in two different states.30,31 Using computational methods, our present study clearly shows that the ligand reorganization free energy change between protein-free and protein-bound states can play a very important role in the binding affinity between small-molecule inhibitors and their targets. This term should be evaluated in the binding affinity prediction for other protein-ligand systems and in the development of a new generation of scoring functions with improved accuracy. To our knowledge, this is the first computational study which examines the importance of the ligand reorganization free energy between protein-free and protein-bound states for the prediction of protein-ligand binding free energy.
In our present study, we have found that the XIAP protein experiences very limited conformational changes when binding to different ligands. Furthermore, the small-molecule inhibitors of XIAP in our study were all designed to mimic the Smac AVPI peptide and have very similar interactions with the protein, both in our predicted models and in the crystal structures. Therefore, we made an assumption that the XIAP protein reorganization free energy changes between ligand-free and ligand-bound states for these ligands are similar. This assumption may not be valid in cases in which a protein adopts very different conformations as it binds to different ligands and/or ligands vary significantly in their binding modes. In such cases, both the ligand and protein reorganization free energy terms may have to be included in order to significantly improve binding affinity prediction between proteins and ligands. We are currently investigating the importance of the protein reorganization free energy term in binding affinity prediction, and the results will be reported in due course.
We are grateful for financial support from the National Cancer Institute, National Institutes of Health, USA, (R01CA109025), the Breast Cancer Research Foundation, the Susan G. Komen Foundation, the University of Michigan Cancer Center Core grant (P30CA046592) and the National Center for Integrative Biomedical Informatics grant (U54DA021519).
Crystal structure information between nine ligands and XIAP BIR3, scores calculated by three scoring functions, binding free energy and ligand reorganization free energy calculated by the MM-GBSA method, radius of gyration of the ligands are provided free of charge via the Internet at http://pubs.acs.org.