|Home | About | Journals | Submit | Contact Us | Français|
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site; and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes, but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF1) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScoreCSD and ITScore/SE, and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp/) and the LigScore web server (http://salilab.org/ligscore/).
Molecular recognition between proteins and ligands plays an important role in many biological processes, such as membrane receptor signaling and enzyme catalysis. Predicting the structures of protein-ligand complexes and finding ligands by virtual screening of small molecule databases are two long-standing goals in molecular biophysics and medicinal chemistry.1, 2 Solving both problems requires the development of an accurate and efficient scoring function to assess protein-ligand interactions.
Much effort has been devoted to developing scoring functions for modeling protein-ligand interactions.3–12 These scoring functions can be divided into three categories13: potential or free energy functions based primarily on a molecular mechanics force field14–26, knowledge-based statistical potentials based on distributions of intermolecular features in large databases of protein-ligand complex structures27–40, and empirical-regression functions fitted to experimental binding constants of a training set of protein-ligand complexes.41–50
Energy functions based on molecular mechanics force field generally estimate the binding affinity by summing van der Waals, electrostatic, desolvation, and/or entropy terms. The weights for various terms are sometimes obtained by fitting the energy function to experimental binding constants for a training set of protein-ligand complexes. Because of the rugged energy landscape, minimization is often required prior to energy evaluation. The identification of the global minimum in the energy landscape generally requires extensive conformational and configurational sampling.
Statistical potentials are based on distributions of intermolecular structural features extracted from large databases, such as Protein Data Bank (PDB)51 and Cambridge Structural Database (CSD).52 Statistical potentials have been widely used because of their relative simplicity, accuracy, and computational efficiency.53–98 During the last decade, several statistical potentials have been developed to describe protein-ligand interactions, such as PMF,27 SMoG2001,33 and DrugScore.30, 35 Still, many aspects of statistical potentials for protein-ligand interactions have not yet been systematically explored.
Here, we are interested in the following questions. First, can a statistical potential be used for distinguishing between ligands and nonbinding molecules, in addition to recognizing native binding modes? Second, can the accuracy of a statistical potential be improved by adding “negative” information, such as geometric decoys of the true ligands? Third, what is the accuracy of scoring complexes with modeled protein structures relative to that with crystallographic structures? Finally, what are the differences between the reference states for protein-ligand and protein-protein statistical potentials?
We describe two distance-dependent atomic statistical potentials derived from PDB - one for predicting the binding pose of a known ligand (PoseScore), and the other one for identifying ligands through virtual screening (RankScore). We proceed in three steps. First, distance distributions for the protein-ligand atom-type pairs were calculated from a sample of native complex structures (structures in the training and testing sets are excluded). Second, the distance-dependent atomic potential was derived from these distance distributions, and trained to find the optimal set of parameters for binding pose prediction (PoseScore) and ligand enrichment (RankScore), respectively. Third, PoseScore and RankScore were evaluated with the aid of two widely used docking benchmarks.11, 99 The performance of PoseScore and RankScore was compared to that of a number of other scoring functions.
We begin by describing the theory used to derive PoseScore and RankScore, criteria to evaluate the accuracy of each statistical potential, as well as the procedures and data sets used to derive, train, and test the statistical potentials (Methods). We then describe the accuracy of PoseScore and RankScore for docking against crystal structures of proteins in comparison to 14 and 7 other scoring functions, respectively (Results). We proceed by describing (i) the effect of including both native and non-native conformations of small molecules in the derivation of the statistical potentials, (ii) the accuracy of scoring against modeled protein structures, as well as (iii) the distribution of atomic protein-ligand distances. Finally, we discuss the implications of the results, relative successes and failures, and answer the questions raised above (Discussion and Conclusions).
An atomic distance-dependent, statistical potential for protein-ligand complexes can be defined as the negative logarithm of the joint probability density function (pdf) of the atomic Cartesian coordinates, as suggested in a previous study100:
where m is the number of atoms in the protein and i are the Cartesian coordinates of protein atom i ; n is the number of atoms in the ligand and j are the Cartesian coordinates of ligand atom j. p (1, 2, …, m, 1, 2, …, n) is the probability that the structure defined by the Cartesian coordinates (1, 2, …, m, 1, 2, …, n) is the native structure. The joint pdf can be approximately modelled by a normalized product of pair pdfs:
where p (i) and p(j) are single-body pdfs of protein atom i and ligand atom j; a pair pdf p(i, j) depends only on the types and positions of the two atoms.
Next, we relate the pair pdf p (i, j) for an atom pair between a protein atom i of atom type P and a ligand atom j of atom type L to the distance pdf:
where is the distance distribution for the atom-type pair (P,L), derived directly from a sample of native complex structures. We define a “reference state” as uncorrelated positions of atoms of types P and L in a finite sphere of an appropriate size and centered at L. Combining Equations 1–3, we obtain:
In this formulation, the statistical potential for the protein-ligand complex is a sum of three terms, corresponding to the protein, ligand, and protein-ligand atomic distances. Here, we focus only on the protein-ligand term:
Given a sample of native complex structures, the distance distribution for a pair of atom types (P,L) can be estimated as:
where N (rP,L) is the number of observations of the pair (P,L) in a particular distance bin (r, r + Δr]. rmin and rmax are the minimal and maximal distance bounds of the distribution, respectively. To minimize the impact of the finite sample size on the accuracy of pnat (rP,L), we obtain a more accurate estimate of the distance distribution for a pair of atom types (P,L) by linearly combining pnat (rP,L) with the reference state pref (rP,L)101, 102:
where wref is an adjustable parameter to be optimized by training. For a pair of atom types (P,L), the reference state is the distance distribution derived from all conformations of a protein-ligand complex.
The calculation of pref (rP,L) is not straightforward, because it is not possible to enumerate all conformations of the protein-ligand complex. We approximated the reference state by deleting atom type labels in native complex structures:
where is the number of all pairs of atom types in a particular distance bin (r, r + Δr]. Because the reference state in Equation 8 is based on the native complex structures, it may still be different from the distribution for the native and all non-native structures. Therefore, we also adjust the reference state by linearly combining it with the uniform distribution:
where wuni is an adjustable parameter to be optimized by training, and Nbin is the total number of distance bins between rmin and rmax. Considering Equation 5, 7, and 9, the statistical potential for protein-ligand interactions is:
The bin size Δr is set to 0.1 Å. The minimal distance boundary rmin is set to 2 Å. The protein-ligand score for distances of less than 2 Å is calculated by a linear interpolation between Smax = 20.0 and S (rP,L) in the distance bin (2.0,2.1]. rmax is an adjustable parameter to be optimized by training.
The protein atom types were adapted from the DOPE scoring function,100 resulting in 158 residue-dependent atom types for non-hydrogen atoms . 26 atom types were used to represent non-hydrogen atoms in small molecules, derived from the SYBYL software (Tripos, Inc.) (Table 1).
The geometric accuracy of a ligand pose was measured by its all-atom root-mean-square-deviation (RMSD) from the crystal structure. The correct binding pose of a ligand was considered successfully recognized if the all-atom RMSD value of the best-scored pose is less than 2 Å.11, 30, 35, 39, 40
The accuracy of ranking ligands in molecular docking screens was evaluated by the enrichment for the known ligands among the entire docking library, as quantified by the area under the curve (logAUC) of the enrichment plot.103 A random selection of compounds from the mixture of actual ligands and decoys yields a logAUC of 14.5; a mediocre selection that picks twice as many ligands as a random selection has logAUC of 24.5; a highly accurate enrichment that produces ten times as many ligands than a random selection has logAUC of 47.7. For each compound, the DOCK-produced complex model104, 105 was re-ranked by the tested scoring function. The ligand enrichment was quantified using the area under the enrichment curve with the x-axis on the logarithmic scale (logAUC),103, 106 For each protein target, the ligand enrichment for the DOCK-produced ranking was compared to that generated by re-ranking the DOCK list with the statistical potential. A difference larger than 3 logAUC units between the two enrichment values was defined to be significant; otherwise the enrichment values were considered to be comparable. The value for this significance cutoff was chosen subjectively, based on a previous study.103
8,885 X-ray structures of protein-ligand complexes used for the calculation of PoseScore and RankScore were selected from the dataset used in previous automated docking screens.51, 106 Five conditions were applied: 1) only crystallographically determined complexes with a resolution better than or equal to 2.5 Å were used; 2) the protein receptor had to contain more than 50 non-hydrogen atoms; 3) the ligand had to contain at least one carbon or nitrogen atom; 4) at least one pair of protein-ligand non-hydrogen atoms had to have a distance between 2.0 and 4.0 Å; 5) no overlap between the complex structures (PDB entries)in the training and testing sets was allowed.
The training set used for PoseScore was constructed from the Astex diverse set that contains 85 crystallographically determined protein-ligand complexes.107 DOCK was employed to generate ligand poses for all complexes. In 70 out of the 85 cases, 100 poses were generated for the ligand, containing at least one pose with an all atom RMSD error of less than 2.0 Å (near-native solution). The training set (Astex_DOCK, Table S1) was formed by these 70 complexes. For each complex, Astex_DOCK included the crystal structure of the protein, the crystal binding pose of the ligand, and the 100 docking poses of the ligand (geometric decoys).
The Astex_DOCK training set was used to find optimal values for the three adjustable parameters in PoseScore, including the maximal distance boundary rmax, and the two smoothing parameters wref and wuni. rmax was optimized over 6 discrete values, including 4, 6, 8, 10, 12, and 14 Å. The search for optimal values of wref and wuni ranged from 0.0 to 0.9 with an increment of 0.1. First, we fixed wuni at 0.0 and scanned rmax and wref. The statistical potential was most accurate when rmax was 6 Å and wref was 0.4 (Figure 1a). The correct binding pose of the ligand, either the crystal structure or a docking pose with an all-atom RMSD error ≤ 2.0 Å, was detected for 64 (91%) targets. When the crystal structures of ligands were excluded from the training set, we were able to identify the correct binding pose for 53 (76%) targets. Second, the value of rmax was set to the optimized value of 6 Å, while the values of wref and wuni were explored. The statistical potential was most accurate when wref and wuni were both 0.3 (PoseScore). The correct binding pose was detected for 67 (96%) and 56 (80%) targets when the crystal structures of ligands were included and omitted, respectively.
The performance of the trained PoseScore was tested using the previously constructed data set of 100 protein-ligand complexes11 (Wang_AutoDock). For each complex, 100 docked conformations were generated using AutoDock.26 In 91 out of 100 cases, there is at least one near-native solution. The accuracy of PoseScore on Wang_AutoDock was compared to accuracies of 14 other scoring functions that were previously tested using the same data set 11, 35, 38–40,(Table 2).
38 crystallographically determined protein-ligand complexes were taken from “A Directory of Useful Decoys” (DUD) benchmark set99 and divided into two equally sized subsets. 19 complexes (DUD-1) were used in the training of RankScore (Table S2a), while the rest (DUD-2) were used to test RankScore (Table S2b). All compounds in DUD (annotated ligands and screening decoys) were screened against the 38 holo X-ray structures.103 The generated docking poses in DUD-1 and DUD-2 were used to train and test RankScore, respectively.
The DUD-1 training set was used to find optimal values for rmax, wref, and wuni. The training process used for PoseScore was employed, except that 3 values were explored for rmax, including 6, 10, and 14 Å. The statistical potential showed the highest accuracy in the rescoring of the DUD-1 training set when rmax, wref, and wuni were 6 Å, 0.4, and 0.0 (RankScore), respectively (Figure 1b). For 14 targets, enrichment against the entire DUD library (logAUC) was improved by rescoring, compared to the original enrichment by DOCK. For 1 target, the rescoring enrichment was comparable to that by DOCK. For the remaining 4 targets, lower enrichment was obtained after the rescoring procedure. Rescoring improved the average logAUC by 6.9.
The accuracy of the trained RankScore was tested using the DUD-2 set. We also rescored DUD-2 with 7 other scoring functions, including ITScore38, 39, DrugScorePDB 30, FlexX108, PMF27, PLP109, ScreenScore32, and the all-atom energy function in PLOP30, 32, 38, 39, 110. We did not test ITScore/SE and DrugScoreCSD because they are not publicly available. However, ITScore and DrugScorePDB should perform similarly as ITScore/SE and DrugScoreCSD, in terms of ligand enrichment (personal communication with XQ. Zou and G. Klebe, respectively). FlexX, PMF, PLP, and ScreenScore were calculated by a re-implementation of the original scoring functions (kindly provided by M. Stahl).32
For each of the 8,885 native complex structures, the crystal ligands were docked to the binding site using DOCK 3.6.104, 105 Overall, ligand docking poses were generated for 7,215 targets (the remaining 1670 targets did not produce any ligand docking pose during the fully automated docking). Out of these 7,215 cases, 4,059 produced at least one near-native solution, while 6,895 produced poses with all-atom RMSD errors of more than 2 Å (random solutions).
The influence of incorporating geometric decoys into deriving a protein-ligand statistical potential111–113 was investigated (Table S3). The near-native complex structures and the random complex structures were considered during the evaluation of Eqn. 5, in three different ways: (i) keeping pref unchanged, the near-native structures were combined with the native structures and used to calculate pnat ; (ii) keeping pnat unchanged, the random structures were used to represent the reference state and used to calculated pref ; and (iii) using pnat from (i) and pref from (ii). Each resulting statistical potential was subjected to the same training process that was used for PoseScore and RankScore.
PoseScore and RankScore were derived, trained, and tested for target protein structures determined by crystallography. In realistic applications, comparative models are often used to represent the receptor structure in both docking and virtual screening.114–117 Thus, we also investigated the accuracy of statistical potentials on docking and screening against models of target proteins (Table S3).
The 170 protein structures from Astex_DOCK and Wang_AutoDock were structurally perturbed using MODELLER118 in the absence of the crystal ligand. For each protein, binding site residues that have atoms within 10 Å from the bound ligand were simulated by 100 steps of molecular dynamics with simulated annealing (MD-SA) in which the temperature was reduced from 400 K to 100 K, and 100 steps of conjugate gradient minimization (CG). All-atom RMSD errors of the resulting models ranged from 0.5 to 1.5 Å. These perturbed structures have been used in the past as proxies for comparative models.119 Next, ligand poses were generated in the modeled binding site for each of the 170 structures. Of the 70 structures in Astex_DOCK and the 100 structures in Wang_AutoDock, 60 (Astex_DOCKmodel) and 85 (Wang_AutoDockmodel) produced 100 ligand docking poses including at least one near-native solution, respectively. Astex_DOCKmodel and Wang_AutoDockmodel were used as the training set and the testing set, respectively. The optimal values of the three adjustable parameters rmax, wref, and wuni were determined as 6 Å, 0.7, and 0.1, with the aid of the Astex_DOCKmodel training set.
The 38 protein structures from DUD-1 and DUD-2 were perturbed using the same approach as described above (DUD-1model and DUD-2model, respectively). All-atom RMSD errors of 38 resulting models also ranged from 0.5 to 1.5 Å. All compounds in the DUD library were docked against each of the 38 models. DUD-1model and DUD-2model were used as the training set and testing set, respectively. The optimal values of rmax, wref, and wuni were determined as 6 Å, 0.5, and 0, respectively, with the aid of the DUD-1model training set.
To investigate the reference state for a protein-ligand statistical potential, an additional protein-ligand statistical potential was derived from the same sample of complex structures that were used for PoseScore, employing the formula described for the DFIRE potential83:
where Nobs (i, j,r) is the number of (i, j) pairs within the distance shell (r, r + Δr] observed in the X-ray structures used to generate PoseScore. Two approximations similar to those in DFIRE were made. First, the number of pairs of ideal gas points in a finite protein-ligand sphere is proportional to rα in Eqn. 11. Second, the potential has a finite interaction range rcut fixed at 14 Å. That is, for r > rcut, ū(i, j, r) 0. Differently from DFIRE, we set the bin width Δr to 0.1 Å. We then generated a statistical potential with distinct values for the exponent α including 1, 2, 3, 4, 5, and, 6. To exclude the influence of distance boundary, for each α value, 5 different values (6, 8, 10, 12, 14 Å) for the maximal boundary rmax, beyond which atom pairs were not considered during scoring, were tested on the Astex_DOCK set.
The trained PoseScore in which rmax, wref, and wuni were set to 6 Å, 0.3, and 0.3, respectively, was assessed by the Wang_AutoDock testing set of 100 protein-ligand complexes. The correct binding pose was detected for 88 (88%) targets, of which 70 were crystal structures (Table 2). Furthermore, a correct binding pose was ranked the best, top 5, and top 10 by PoseScore for 88 (88%), 97(97%), and 99 (99%) targets in the testing set, respectively (Table 3). To mimic realistic applications, only the geometric decoys of the ligands were scored, resulting in the correct binding pose identification for 63 (69%) targets.
Among all the scoring functions under test, ITScore/SE, PoseScore, and DrugScoreCSD performed better than other scoring functions, showing the success rate of 91%, 88%, and 87% in the identification of the correct binding pose, respectively. When the crystal structures of ligands were excluded from the test, PLP, PoseScore, and F-Score were the three best performing scoring functions, with the success rate of 70%, 69%, and 68%, respectively.
PoseScore detected the correct binding pose for 88 (88%) targets in the testing set: four examples are shown in Figure 2. The 12 failures were investigated in detail. Out of the 12, 9 and 11 cases had correct binding poses ranked in top 5 and top 10, respectively (Table 3). The 12 targets can be divided into four classes: (i) water molecules play an important role in ligand binding, including five targets 1cla, 3cla, 4cla, 1rgl, and 3tmn; (ii) the ligand and the receptor forms a transition-state complex, including three targets 1tlp, 1zzz, and 2sns; (iii) the ligand is located in the neighborhood of a cofactor or another ligand, including two targets 1dr1 and 1tha; and (iv) no particular feature in the binding site is found to be responsible for the failure, including two targets 1tni and 1tnj. Next, we discuss one example from each class.
1cla is the crystal structure of chloramphenicol acetyltransferase, determined at the resolution of 2.34 Å.120, 121 The binding pocket accommodates the substrate chloramphenicol and several ordered water molecules (Figure 3). The residues lining the pocket are predominantly hydrophobic. The substrate adopts an eclipsed conformation and forms direct hydrogen bonds only with the water molecules. These water molecules were not included during the generation of docking solutions of the substrate. As a consequence, the crystal structure of chloramphenicol was only ranked 9. The best ranking pose is in a staggered conformation and has an all-atom RMSD error of 10.3 Å.
1zzz is the crystal structure of trypsin with a peptidyl aldehyde inhibitor CVS1694, determined at the resolution of 1.90 Å.122 In the crystal complex, the guanidinopiperidyl group of CVS1694 makes water-bridged hydrogen bonds with Asp189 and Gly219. The carbonyl oxygen of the aldehyde group is hydrogen bonding with Gly193N and Ser195N, while the carbonyl carbon forms a tetrahedral intermediate with Ser195OG. A consequence of the latter interaction is the covalent-bonding distance for C-Ser195OG (1.8 Å). The glycine residue and the six-member lactam ring of the inhibitor make hydrogen bonds with Ser214-Gly216, holding this part of the inhibitor close to trypsin. This crystal structure of CVS1694 was ranked 2. The best ranking pose has an all-atom RMSD error of 3.10 Å, deviating from the crystal structure in its aldehyde and lactam groups.
1dr1 is the crystal structure of dihydrofolate reductase (DHFR), solved as a complex with NADP+ and biopterin at the resolution of 2.20 Å.123 The closely packed cofactor and the substrate interact at an angle of 45º. Water molecules also hydrogen bond to both hydroxyls in the dihydroxypropyl and the pteridine group of biopterin. The cofactor and these water molecules were not included during the generation of docking solutions of the substrate. A near-native solution of biopterin (RMSD 1.02 Å) was ranked 2, while the crystal structure was ranked 3. The best ranking pose has an all-atom RMSD error of 4.17 Å. The pteridine ring of the best ranking pose connects the substrate and the cofactor binding sites, and is perpendicular to the plane of the pteridine ring in the crystal structure. The 2-amino group of the pteridine ring still hydrogen bonds to Glu30, as it does in the crystal structure.
1tni and 1tnj are crystal structures of bovine trypsin, solved with phenylbutylamine (PBA) and phenylethylamine (PEA) at the resolution of 1.90 Å and 1.80 Å, respectively.124. The binding affinity of PEA to trypsin (Ki 11.0 mM) is higher than that of PBA (Ki 20.0 mM). In the crystal complexes, the amine group in both inhibitors forms a salt-bridge with Asp189 and a hydrogen bond to Gly219. The difference in the binding affinity between the two compounds could be due to the position of the benzene ring, which is more solvent exposed in PBA than in PEA. In the case of 1tni, the crystal structure was ranked 22. The best ranking pose has an all-atom RMSD error of 5.72 Å. The benzene ring of the best ranking pose was close to the equivalent group of PEA in 1tnj structure. However, the amine group in the best ranking pose pointed in the direction opposite to the crystal structure and failed to form a hydrogen bond to Gly219. Interestingly, in the case of 1tnj, the crystal structure was ranked 2 while the all-atom RMSD errors of the 1st and 3rd ranking poses are 2.29 and 1.51 Å, respectively (poses not shown). The best ranking pose was in the same orientation as in the crystal structure and formed a hydrogen bond to Gly219.
The trained RankScore was assessed by the DUD-2 testing set of 19 targets. For 13 (68%) targets, the rescoring enhanced the enrichment against the entire DUD library (logAUC), with respect to the original enrichment by DOCK (Table 4). Rescoring improved the average logAUC by 6.8. Another enrichment indicator - the enrichment factor at 1% of the ranked docking library (EF1)99 - was also measured because the early enrichment is particularly important in realistic applications. RankScore significantly improved EF1 for 14 (74%) targets (of which 12 had an increased logAUC, Table 5). In particular, the rank of the best-scored ligand was enhanced by RankScore for 16 (84%) targets.
RankScore was more accurate in the rescoring than 7 other tested scoring functions (Table 4). Of these 7 scoring functions, only rescoring by ITScore and FlexX improved the enrichment relative to DOCK (the average logAUC increased by 3.6 and 0.9, respectively). For the 19 targets, increased enrichment was observed in 9, 8, 6, 9, 7, 7, and 8 cases with ITScore, DrugScorePDB, FlexX, PMF, PLP, ScreenScore, and PLOP, respectively.
Rescoring by RankScore worsened both logAUC and EF1 for 4 targets, including thrombin, cyclooxygenase-2 (COX-2), AmpC β-lactamase (AmpC) and hydroxymethylglutaryl-CoA reductase (HMGR). In 3 Of the 4 cases, including thrombin, AmpC and HMGR, the best rank of ligands produced by RankScore is better than that by DOCK. For COX2, the crystal structure at 3 Å resolution (PDB code: 1cx2) was determined with SC-558, a selective COX-2 inhibitor. However, the sulfonamide group in SC-558 is only 1.3 Å away from the guanidinium ion of Arg513. Many docking poses of ligands were close to that of the crystal ligand but deviated from the position of the sulfonamide. For the other three targets, we discuss AmpC in detail.
In the DUD benchmark, the A chain in the dimeric structure of AmpC (1xgj) was used.125, 126 Ligand enrichments in logAUC are 47.4 and 10.3 units by DOCK and the rescoring method proposed here, respectively. However, the H10 helix and the α-helix close to the binding site are unstructured in the A chain, potentially explaining poor accuracy of RankScore in this case. We tested this hypothesis by using the B chain structure, in which these two helices are well defined. As a result, both the enrichment by DOCK and that by rescoring were improved, reflected in logAUC of 53.8 and 19.3, respectively. The B chain structure was solved as a complex with a thiophene-carboxylate derivative (HTC). The docking pose of HTC was ranked 628 and 14 by DOCK and by RankScore, respectively. Another ligand CTC differs from HTC in the 4-carboxylate benzene ring on which HTC has a 2-hydroxyl group but CTC has a 3-chloride group (Figure 4a). The binding affinity of HTC to AmpC (Ki 1.0 μM) is higher than that of CTC (Ki 1.9 μM). As shown in Figure 4b, the thiophene-carboxylate in HTC hydrogen bonded to Thr316, Asn346 and Arg349 in the primary carboxylate binding site. The sulfonamide oxygen hydrogen bonded to Ser64, Tyr150, and Ala318 in the oxyanion hole and the hydroxyl binding site. The hydroxyl group on the benzene-carboxylate ring hydrogen bonded to other active site residues (Lys67 and Asn152). However, another AmpC ligand CTC with similar chemical structure and docking pose as HTC, was ranked 786 by DOCK but only 52,978 by RankScore. In CTC, the 2-hydroxyl on the carboxylate benzene ring of HTC was replaced by the 3-chloride, which forced this part of CTC to move away from the neighboring residue Tyr221, resulting in weaker thiophene-carboxylate and sulfonamide hydrogen bonds.
The near-native and random solutions were considered in the derivation of protein-ligand statistical potential. The resulting statistical potentials were subjected to the same training process as PoseScore and RankScore.
For ligand enrichment, the trained potentials still improved the enrichment with respect to the original result by DOCK, but performed worse than RankScore (the average logAUC improved by 6.8): (i) keeping pref unchanged but incorporating the near-native structures in pnat, the trained potential with the three adjustable parameters rmax, wref, and wuni optimized to be 6 Å, 0.2, and 0 improved the average logAUC by 6.0; (ii) keeping pnat unchanged but computing pref from random structures, the trained potential with rmax, wref, and wuni optimized to be 6 Å, 0, and 0 improved the average logAUC by 2.2; and (iii) using pnat from (i) and pref from (ii), the trained potential with rmax, wref, and wuni optimized to be 6 Å, 0, and 0 improved the average logAUC by 2.8.
In contrast, for ligand pose prediction, the trained potential (PoseScoredock) that was computed with the same adjustable parameters as PoseScore but incorporated random docking solutions in pref, showed better accuracy than PoseScore. Of the 100 proteins in the testing set, PoseScoredock detected the correct binding pose for 90 (90%) targets, among which 70 (70%) are crystal structures (88% and 70% by PoseScore, respectively). When the crystal structures were omitted, the near-native solution was identified for 67 (74%) targets (69% by PoseScore). The results of PoseScore and PoseScoredock are probably robust because both potentials were trained and assessed based on decoys constructed by different docking programs.
The trained potential (PoseScoremodel) detected the near-native solution for 66 (78%) targets in the Wang_AutoDockmodel testing set (69% by PoseScore for Wang_AutoDock). In comparison to Wang_AutoDock, more near-native solutions were included for 60 targets, 48 of which (80%) had the correct binding pose detected. Equal and less near-native solutions were included for 2 and 23 targets, respectively, among which the correct binding pose was detected for 18 (72%) targets. Clearly, the improvement in the accuracy is partially due to a better sampling of near-native solutions. Meanwhile, the perturbation applied to the original X-ray structures also contributed to the higher success rate. As an example, in both Wang_AutoDock and Wang_AutoDockmodel testing sets, the target 1bra contained 15 near-native solutions among 100 geometric decoys with the minimal RMSD error of 0.6 and 1.2 Å, respectively. In the crystal structure, the acetamidine group in the ligand benzamidine hydrogen bonded with Ser190 backbone, Gly219 backbone, and Asp226 sidechain. A docking pose of 1.8 Å RMSD was ranked the best when scoring against the crystal structure of 1bra, because many near-native poses were in different orientations favoring hydrogen bonds with Asp226 and/or Ser190. In the modeled structure of 1bra, the Asp226 sidechain became closer to the binding site, associated with the Ser190 backbone moving away. The ligand could form hydrogen bonding network with this binding site configuration, in an orientation close to that in the crystal structure. The docking pose with the minimal RMSD of 1.2 Å was selected.
An improved enrichment (logAUC) was observed for 10 targets in the DUD-2model testing set. For 3 and 6 targets, the rescoring enrichment was comparable to and lower than that by DOCK, respectively. Rescoring by RankScoremodel improved the average logAUC by 1.6. Interestingly, for three targets, dihydrofolate reductase (DHFR), glycinamide ribonucleotide transformylase (GART), and thrombin, the logAUC by DOCK using the modeled structure was significantly higher by 40.1, 21.5, and 24.1 than that using the ligand-bound crystal structure, respectively.
The protein-ligand statistical potential with the DFIRE reference state showed the best accuracy when rmax was 6 Å and α was 3 (Figure 5). The correct binding pose of the ligand, either the crystal structure or a docking pose with an all-atom RMSD error ≤ 2.0 Å, was detected for 54 (77%) targets. When the crystal structures of ligands were excluded from the training set, the correct binding pose was detected for 49 (70%) targets. For the 5 rmax values tested, the statistical potential always showed the maximal accuracy with an α value of 3 and/or 4.
Two key results emerge from this study. First, two different statistical potentials can be derived by statistical analysis of a database of known protein-ligand complex structures: PoseScore for ligand pose prediction, and RankScore for ligand discovery in virtual screeing. Second, PoseScore is as accurate as DrugScoreCSD and ITScore/SE in detecting the native structure of a protein-ligand complex, and superior to 12 other scoring functions tested; RankScore is more accurate than 7 other scoring functions in discriminating between true ligands and nonbinders.
We address three points here. First, we compare the distance distributions of atom pairs in protein-ligand complexes to those in proteins, and discuss the dataset used to derive the reference state. Second, we discuss the optimal parameters used in PoseScore and RankScore, including the maximal distance boundary rmax and the atom types. Finally, we suggest possible improvements of PoseScore and RankScore.
Proteins are finite systems. For a folded protein in the ligand-free state, the reference distribution should not increase in r2 as in an infinite system, but in rα, where α is smaller than 2. The optimal value of α was found empirically to be approximately 1.6 for protein structures.83 In protein-ligand complexes, however, each ligand atom is partly surrounded by other ligand atoms. As a result, the protein-ligand distance distributions are expected to be different from those for protein structures alone. In Figure 6, the protein is approximated as the outer sphere (solid line) and the ligand is completely embedded inside the protein as the inner sphere with a radius r. All protein-ligand pairwise interactions within the maximal distance Rcutoff of the ligand atom only occur inside the protein sphere. Thus, for the ligand atom positioned at distance d from the ligand center, the number of protein-ligand atom pairs within distance R (R ≤ Rcutoff ) is proportional to the partial volume V of the sphere with a radius R centered on the ligand atom that is not taken by the ligand:
The expected number of protein-ligand atom pairs in the distance bin (R, R + ΔR] is proportional to the derivative of V with respect to R. For distance R not larger than d + r, the distribution of atom pairs increases faster than R3. Clearly, there are additional geometrical arrangements, depending on the size of the protein, the size of the ligand, and the location of the ligand relative to the protein. Furthermore, the shapes of both protein and ligand are generally not ideal spheres. Nevertheless, this simplified example suggests that the number of protein-ligand atom pairs could increase much faster with distance R than the number of protein atom pairs does. The reference distributions observed in the X-ray and docking-generated structures of protein-ligand complexes (Figure 7) are better approximated by the distribution in eqn. 11 with an α value of 3 (not 2), thus supporting our hypothesis. Our approach to defining a reference state might be applicable for other multi-component systems, such as protein-peptide, protein-nucleotide, and even protein-protein interfaces.
In general, the goal of scoring functions for structure prediction is to distinguish the native from non-native states. The reference state used in a statistical potential should maximize this discrimination. Therefore, the choice of the optimal reference state depends on the native and non-native states. In particular, we hypothesize that a protein-ligand statistical potential for distinguishing ligand poses with an RMSD error of less than 2 Å (correct binding pose) from other poses (random solutions) should depend on a reference state that is maximally different from the native state and maximally similar to random solutions. For ligand pose prediction, a reference state computed from random solutions should improve the accuracy of the statistical potential, in comparison to the reference state computed from a sample of native complex structures with the same approach. PoseScoredock is more accurate than PoseScore, thus supporting our hypothesis. Similarly, for ligand discovery by virtual screening, a reference state including information about non-binders should improve a statistical potential relative to the reference state from true ligands alone. Therefore, it was not surprising that RankScoredock using the reference state derived from docking poses of true ligands did not show better accuracy than RankScore.
Different values had been used for the maximal distance boundary in previously developed protein-ligand statistical potentials. A distance bound of 12 Å was used in the potential of mean force (PMF) compiled from 697 complex structures in PDB, in conjunction with a correction term regarding the volume occupied by the ligand to incorporate solvent effects in the pairwise potential.27 In comparison to PMF, a shorter distance cutoff of 6 Å was used in several recently developed protein-ligand statistical potentials, including DrugScorePDB 30, DrugScoreCSD 35, ITScore38 and ITScore/SE40 that were based on distinct structure samples and solvation models. In this study, the distance cutoff was subjected to the optimization with the aid of a training set, for both ligand pose prediction and virtual screening. In both cases, the optimal value was found to be 6 Å. This distance cutoff is much smaller than those used in protein statistical potentials (e.g., 20 Å for RAPDF70, 14.5 Å for DFIRE83, and 15 Å for DOPE100). This result may indicate that non-covalent protein-ligand interactions are dominated by the specific interactions formed between a ligand and the first shell of the protein residues outlining the binding site pocket; examples of such interactions include hydrogen bonds and salt bridges.127
An atom type classification is needed for generating atom pair distributions. 16 protein atom types and 34 ligand atom types were employed in PMF. 17 and 18 atom types for both protein and ligand were employed in DrugScorePDB and DrugScoreCSD, respectively. 26 and 27 atom types for both protein and ligand were employed in ITScore and ITScore/SE, respectively. These atom types reflect both the type of an atom as well as its environment, similar to the Sybyl atom type.
For proteins, 158 residue-dependent atom types that were used in DOPE100 statistical potential were tested in this study. In addition, we explored combining the 158 protein atom types, to obtain a smaller number of protein atom classes and thus improve the statistical robustness (and therefore accuracy) of the resulting statistical potential. Two protein atom types could be combined without loss of information if each one of the 26 protein-ligand atom distance distribution comparisons had a sufficiently small χ2 (tail probability p = 0.05 ).128 Surprisingly, for the maximal distance bound of 6 Å, only 16 pairs of protein atom types were nearly identical to each other (Table S4). Therefore, we did not combine the individual protein atom types. For ligands, 26 atom types were used, based on the Sybyl atom type classification (Table 1). Although the chemical space of small molecules is much more diverse than that of the 20 standard amino acid residues, we did not explore a more fine-grained atom type classification for practical reasons.
For 12 of the 100 benchmark targets, a correct binding pose was not identified by PoseScore as the best scoring pose. For 10 of the 12 targets, the assessment of binding between protein and ligand was affected by the lack of considering crystal water molecules, a cofactor, or the transition state. This observation indicates that the current protein-ligand statistical potential could be improved by considering ligand-water interactions, ligand-cofactor interactions, and transition states. Currently, the derivation of statistical potentials that explicitly consider ligand-water and ligand-cofactor interactions is limited by the size and accuracy of the sample of known complex structures. The accuracy of a statistical potential that is derived from a sample of experimental structures clearly depends on the resolution of the experimental structures. More accurate structures result in a more accurate statistical potential, all other things being equal. In this study, we derived the statistical potentials from 2.5 Å structures. We also tested a resolution cutoff of 2 Å. In comparison to PoseScore and RankScore, the number of available structures decreased from 8885 to 5353, and the newly derived statistical potentials performed worse (data not shown). Clearly, the disadvantage of a smaller sample outweighed the advantage of more accurate structures in the sample, a problem that would only be larger for statistical potentials for ligand-water and ligand-cofactor interactions. This dilemma may be at least partially solved by deriving a statistical potential from small molecule crystals in CSD, which provides a rich source of interaction geometries for small molecules at high resolution (292,539 structures with R-factor < 0.050). One successful example of such a statistical potential is DrugScoreCSD that showed a substantial improvement compared to DrugScorePDB in the recognition of near-native ligand binding poses.35
As shown in eqn. 4, the statistical potential for the protein-ligand complex should be a sum of three terms, including those for the intramolecular protein interactions, intramolecular ligand interactions, and intermolecular protein-ligand interactions. In this study, we only developed statistical potentials for the protein-ligand intermolecular interactions. More accurate scoring could be achieved by combining PoseScore/RankScore with other scoring functions such as the DOPE potential that measure the intramolecular protein interactions.
We conclude by returning to the four questions posed in Introduction.
Yes. We developed two statistical potentials from a sample of X-ray structures of protein-ligand complexes: PoseScore was optimized for distinguish correct binding poses from geometric decoys and RankScore was optimized for distinguishing ligands from screening decoys. The reference states of PoseScore and RankScore are different, because of the differences in the values of three adjustable parameters. PoseScore scored a correct binding pose best among 100 decoys for 88% of all cases in the benchmark set containing 100 protein-ligand complexes. Furthermore, a correct binding pose was ranked the best, top 5, and top 10 by PoseScore for 88 (88%), 97 (97%), and 99 (99%) targets in the testing set, respectively. RankScore improved the ligand enrichment (logAUC) and early enrichment (EF1) by rescoring the results by DOCK for 13 and 14 targets, respectively. Furthermore, RankScore ranked at least one annotated ligand within the top 500 scored compounds for all targets.
Yes. PoseScoredock in which the reference state was computed from geometric decoys that had all-atom RMSD errors of more than 2 Å from crystal binding poses (random solution) showed higher accuracy (74%) in detecting near-native solutions from geometric decoys, in comparison to PoseScore (69%). However, the performance of RankScore was not improved by including the geometric decoys.
For ligand pose prediction, PoseScoremodel showed higher accuracy (78%) in detecting near-native solutions from geometric decoys, in comparison to PoseScore (69%) and PoseScoredock (74%). For ligand enrichment, RankScoremodel also improved the average logAUC by 1.6, relative to DOCK. However, this improvement was less than that by RankScore (6.8) applied to complexes with crystal structures.
Proteins are finite systems. Therefore, for a folded protein in the ligand-free state, the reference distribution should not increase in r2 as in an infinite system, but in rα where α is smaller than < 2. The optimal value of α was found empirically to be approximately 1.6. In contrast, in protein-ligand complexes, each ligand atom is partly surrounded by other ligand atoms. As a result, the number of protein-ligand atom pairs should increase faster with the distance r than the number of protein atom pairs. Correspondingly, the optimal value of α was found empirically to be around 3.
We thank Dr. Shaomeng Wang for the geometry test set, Dr. Xiaoqin Zou for ITScore, Dr. Gerhard Klebe for DrugScorePDB, Michael Mysinger and Yiqun Cao for discussions about the DUD database and the automated docking pipeline, Dr. Minyi Shen and Dr. Jiang Zhu for discussion about the theory used for computing statistical potentials, and Dr. Daniel Russel and Keren Lasker for implementation of the scoring function in IMP. We are supported by National Institutes of Health grants (GM71790, GM54762, and GM093342 to AS, GM59957 and GM71896 to BKS). We are also grateful to Ron Conway, Mike Homer, Hewlett-Packard, IBM, NetApp, and Intel for computing hardware gifts.
The list of PDB codes of 70 protein-ligand X-ray complexes used to train the PoseScore, the list of PDB codes of 19 protein-ligand X-ray complexes in DUD benchmark (DUD-1) used to train the RankScore, and the list of PDB codes of 19 protein-ligand X-ray complexes in DUD benchmark (DUD-2) used to test the RankScore. This information is available free of charge via the Internet at http://pubs.acs.org.