The application of computational methods in drug discovery has been ever increasing with the growth of computational power and accumulation of new experimental data. During the past decade, various new methodologies have continued to improve in addressing the two fundamental steps of computational drug design: docking1–3
and scoring.4, 5
Docking generates and identifies native or native-like configurations of the receptor-ligand complexes. Then an appropriate scoring function approximates the binding affinities using their geometries. In a virtual screening process, these two steps are used to distinguish the hits from a large library of small molecules. While numerous docking methods are capable of producing near native docking poses,1–3
the development of a high-accuracy scoring function still remains a challenge. A good scoring function has several requirements. 1) It should be able to distinguish the near native poses from the non-native poses of a particular ligand molecule. 2) The predicted protein-ligand binding affinities should correlate with the experimentally derived values. 3) The scoring function should have a low computational cost, making it applicable to a large set of ligand molecules. Ferrara et al
. evaluated the performance of 9 different scoring functions using 189 protein-ligand complexes from the Ligand-Protein DataBase (LPDB).4
Several of the scoring functions, including CHARMm6
, and AutoDock9
, performed well in distinguishing near-native poses from mis-docked poses. However, the binding affinities predicted by most of the scoring functions correlated poorly with the experimental binding energies. Among them, ChemScore performed the best with R2
= 0.51. Using 800 protein-ligand complexes, Wang et al.
evaluated 14 different scoring functions for their ability to predict experimental binding affinities.5
They also concluded that the predicted and experimental binding affinities correlated only moderately. Among the scoring functions, X-Score10
, DrugScore, Sybyl::ChemScore11
, and Cerius2::PLP12
produced better correlations. Even in an ideal case where all of the binding geometries are predicted with perfect accuracy, an inaccurate scoring function for ranking compounds will still produce a large number of false positives in the screening process. This impairs the utility of virtual screening as a tool for drug discovery. Thus the development of a reliable and accurate scoring function has been the focus of many ongoing studies.
Force-field based linear interaction energy (LIE) methods have been widely employed to predict protein-ligand binding free energies with reasonable accuracy.13–18
Typical implementation of a LIE estimates the binding free energy by averaging the interaction energies of protein-ligand complexes extracted from molecular dynamics simulations. More recently this method has been successfully applied using only single-point energy minimizations along with high-accuracy continuum electrostatic solvent approaches (either Poisson-Boltzmann or Generalized-Born implicit solvation). Different interaction energy terms for van der Waals, electrostatic and solvation are scaled by empirical parameters that are optimized using a set of protein-ligand complexes. The greatest advantage of the LIE method compared to alchemical transformation methods (free energy perturbation and thermodynamic integration) is that it may be applied over thousands of ligands with dramatic differences in size and topology. In spite of this advantage, the application of the LIE method still suffers from the fact that the parameters may not be perfectly transferable across significant variations in ligand classes and functional groups, and over significant changes in binding sites.
Empirical scoring functions have been very useful for their simplicity and low computational cost. The derivation of most standard empirical scoring functions is based on the non-rigorous Kirkwood superposition approximation, namely that the binding affinity can be estimated by the addition of individual interaction terms. In many of these implementations, regression-based methods are applied to estimate the weights of the individual terms using experimentally derived configurations and binding affinities of a set of receptor-ligand complexes. ChemScore8
have been among the most popular empirical scoring functions. In spite of their success, poor transferability of parameters has been one of the major disadvantages of empirical scoring functions. As a general rule, the parameters that regulate the relative contributions of the individual terms are dependent on the training set used for the regression analysis. Several factors, including molecular surface area, hydrogen bonds, ligand rotational entropy, ionic interactions, hydrophobic interactions, and desolvation energies, were taken into account while constructing empirical scoring functions.8, 9, 19
The exploration and incorporation of other novel factors can also improve performance of empirical scoring approaches as experimental datasets grow larger and more diverse.20
For ligands that bind with reasonable affinity, the size of the ligand correlates directly with the protein-ligand binding affinity and thus it can be considered as a factor in the empirical scoring scheme. Kuntz et al.
surveyed the correlation between the sizes and the binding free energies of a large number of ligands.21
They demonstrated that each non-hydrogen ligand atom contributed a maximum of ~ −1.5 kcal/mol to the protein-ligand binding free energy. The total binding free energy reached a limit of ~ −15 kcal/mol for larger number of heavy atoms per molecule. They suggested that the decrease in the average contribution per atom with increasing number of ligand atoms was due to the shielding of van der Waals and hydrophobic interactions. Hajduk et al.
deconstructed 18 highly optimized drug leads into fragments and analyzed the consequent potency change.22
They observed a linear relationship between the binding affinity and the ligand molecular weight. Both of these studies suggest that as long as a given ligand is known to bind with a reasonable affinity, the binding free energy scales as a function of ligand size. In the spirit of these observations, we have developed a scoring function S1 based on a direct quantitative relationship between the binding free energy and the ligand size. For large ligands that bind strongly to a receptor, the contribution to ΔGbind
from ligand atoms that are not strongly interacting with receptor atoms are significantly reduced due to shielding effects.21
Therefore ligand atoms that form close contacts are the major contributors to the binding free energy. Thus, in our derivation of S1, we considered only these “interacting” ligand atoms in determining the “effective size” of the ligand. We have demonstrated that this method is successful in removing the non-linearity in the binding-affinity ligand-size relationship. We have further extended S1 in order to construct an empirical pair potential-based scoring function S2, which was initially trained and validated on a diverse set of 259 protein-ligand complexes from the Ligand Protein Database (LPDB),23
and then additionally assessed in the CSARdock 2010 benchmark test (NCR HiQ set) (reference). In our pair interaction energy scheme S2, we classified and counted only the directly interacting protein-ligand atomic pairs and used them as variables in our derivation of binding free energy. S2 is entirely empirical in nature, which is different from the knowledge-based pair potentials. Our primary objective of this manuscript is to assess the performance of S1 and S2. In order to do this, we also compared S1 and S2 results to other variations of these schemes (S3, S2h and S3w) in order to further understand the origin of good or bad performance. For example, as a variation on S2, we have also constructed S2h, which considers the interactions among only the heavy atoms.
Many knowledge-based scoring functions are developed by systematic analyses of atomic interactions between adjacent receptor and ligand surfaces. Using the crystal structures of 38 protein-ligand complexes, Wallqvist et al.
determined the statistical preferences of the atomic interactions between amino acids.24, 25
They constructed a model to estimate the binding free energy based on the classifications of the atomic pairs buried in the interfacial surface. Using the model, they could predict the binding free energies of 10 HIV protease complexes with an accuracy of ±1.5 kcal/mol. Verkhivker et al.
derived a knowledge-based distance-dependent pair potential using 30 HIV-1, HIV-2 and SIV protein-ligand complexes and were able to achieve a good correlation between the experimental and predicted ΔGbind
DeWitte et al.
developed an inter-atomic interaction-based scoring function using a large set of protein-ligand crystal structures.27
They incorporated this knowledge-based potential into a Metropolis Monte Carlo molecular growth algorithm, SMoG, for de novo ligand design. DrugScore and DrugScore(CSD)28
are the most widely used and validated distance dependent pair potential based scoring function for protein-ligand interactions. DrugScore has been successful in identifying native poses of ligands7
as well as in predicting their binding affinities28
. As a control, we have compared our S1 and S2 scoring schemes to a distance-dependent pair potential S3 that was derived from the same protein-ligand dataset that was used for regression analysis to develop S1 and S2.
We have compared the performance of these scoring functions against our implementation of LIE for several steps of the virtual drug screening process. The parameters of the LIE, S1, S2, and S2h scoring functions demonstrate reasonable transferability in a five-fold cross-validation for both the prediction of binding geometries and free energies. The performance of S2 is significantly better than the performance of LIE for predicting the binding free energies of a diverse set of 259 protein-ligand complexes from LPDB. We have evaluated the ability of the scoring functions to discriminate the well-docked native-like poses from the mis-docked poses and calculated the discriminative power
(DP) of the scoring functions to identify native poses. Finally, we have tested the scoring functions for their ability to distinguish active compounds from the non-active decoy compounds in a virtual screening process. We combined active and decoy ligands obtained from the directory of useful decoys (DUD)29
that are specific for representative proteins from four target classes. The results of this study demonstrate the usefulness of the S1, S2, and S2h scoring functions in various stages of virtual screening process, especially compared to the S3 and S3w variations.