|Home | About | Journals | Submit | Contact Us | Français|
Flexible docking and scoring using the Internal Coordinate Mechanics software (ICM) was benchmarked for ligand binding mode prediction against the 85 co-crystal structures in the modified Astex data set. The ICM virtual ligand screening was tested against the 40 DUD target benchmarks and 11-target WOMBAT sets. The self-docking accuracy was evaluated for the top 1 and top 3 scoring poses at each ligand binding site with near native conformations below 2 Å RMSD found in 91% and 95% of the predictions, respectively. The virtual ligand screening using single rigid pocket conformations provided the median area under the ROC curves equal to 69.4 with 22.0% true positives recovered at 2% false positive rate. Significant improvements up to ROC AUC= 82.2 and ROC(2%)= 45.2 were achieved following our best practices for flexible pocket refinement and out-of-pocket binding rescore. The virtual screening can be further improved by considering multiple conformations of the target.
Over the past decade the number of protein structures solved by X-ray crystallography increased by more than four times allowing the use of atomic details derived from high resolution target structures as a standard procedure in many drug discovery projects. Structure-based drug design is typically used at several stages of the drug discovery and development pipeline, such as for the virtual ligand screening (VLS) of large electronic compound databases and initial hit identification, and for the lead optimization of the most promising drug candidates into new potent, selective and drug-like chemical entities . Docking-based methods rely on computer algorithms to generate possible small molecule binding modes within a protein pocket and appropriate scoring functions to estimate the strength of the protein-ligand interaction. Binding poses generated at each docking run sample translational, rotational and conformational degrees of freedom of the ligand (semi-flexible docking), and in some cases, additional conformational degrees of freedom of protein residues within the ligand binding site (fully-flexible docking). The multiple ligand poses are then ranked by a scoring function in an attempt to identify the most energetically favorable protein-bound conformation. Virtual screening of multiple ligands to the same protein site uses docking and scoring to generate a ranked list of compounds. While most software packages are able to predict experimental poses with reasonable accuracy, binding affinity predictions for a diverse set of molecules followed by ranking their predicted potencies is a much more challenging endeavor. The reasons behind this scoring problem have been well described in recent reviews [2, 3], and typically pointed out to protein flexibility and induced-fit upon ligand binding , oversimplification of energy terms commonly used in the scoring functions such as solvation  and entropy contributions , and specific non-covalent interactions not commonly present in scoring functions such as cation-π , n→π*  and weak hydrogen bonds . Prediction of the most stable tautomeric states for the ligands , water-mediated hydrogen-bonds , metal coordination [12, 13] and identification of pKa-based protonation microspecies  are additional challenges that need to be solved before further progress can be made in the field.
Because of the multiple approximations used in docking and scoring, benchmarking software packages is critically important in structure-based drug discovery. New algorithms are typically trained against a large number of high quality co-crystal structures collected from the PDB database  and evaluated for their ability to reproduce known binding modes. On the other hand, retrospective virtual screening benchmarks use test sets with annotated known binders for particular targets and presumed inactive decoys. Graphical representation of the true positive rate versus the false positive rate in receiver operating characteristic (ROC) plots is assumed to provide a measure of both the software performance and the ligand binding pocket quality for virtual screening purposes. Because the top scoring compounds will most likely be selected for further testing, early enrichment measures are important to identify successful VLS runs where active compounds are ranked ahead of decoys on a large score list .
The Internal Coordinate Mechanics (ICM) method has been extensively validated in bioinformatics and drug discovery projects. Prospective and retrospective studies demonstrate that ICM is able not only to reconstitute the most critical protein-ligand contacts, but also to successfully identify high affinity ligands for the most important classes of targets in drug discovery, such as enzymes [17–24], receptors [25–30], ion channels [31, 32] and transport proteins [33–35]. As part of the programmatic theme, “Docking and Scoring: A Review of Docking Programs”, ICM was benchmarked for self-docking accuracy and virtual ligand screening using the Astex , DUD  and WOMBAT  test sets, and the benchmarking results, current developments and future prospects were presented during a symposium at the 241st ACS National Meeting held in March 2011 in Anaheim, CA. This article summarizes the results discussed in Anaheim, adds calculation details, and proposes how to improve the performance further using our best practices.
Flexible ligand docking with the ICM software uses Monte Carlo simulations to globally optimize a set of ligand internal coordinates in the space of grid potential maps calculated for the protein pocket , according to the following procedure: (1) a random move is introduced to one of the rotational, translational or conformational variables of the ligand within the binding pocket; (2) differentiable terms of the energy function are minimized; (3) desolvation energy is calculated; (4) the Metropolis selection criterion is used to accept or reject the final minimized conformation  and the procedure is repeated until the maximal number of steps is achieved. An adaptive algorithm is used to determine the maximal number of steps, according to the number of rotatable bonds in the ligand multiplied by a user-defined thoroughness value. The thoroughness value represents the user-defined multiplier by which the automatically determined Monte Carlo run length is extended. Ligand sampling in a set of pre-calculated grid maps accounting for hydrogen bonding potential, van der Waals potential with carbon-, sulphur- and hydrogen-like probes, hydrophobic potential and electrostatic potential, significantly reduces the time required for calculation. The maps are generated in a rectangular box with 0.5 Å grid spacing centered at the ligand binding site. Each molecule is first submitted to a conformational analysis outside of the protein pocket and a stack of low energy conformations is collected and used as starting geometries for the grid docking. Ligand binding modes are scored according to the quality of the complex and a user-defined number of the top scoring poses is re-ranked using the full ICM scoring function. The predicted score is calculated as the weighted (α1 to α5) sum of ligand-target van der Waals interactions and internal force field energy of the ligand (ΔEIntFF), free energy changes due to conformational energy loss upon ligand binding (TΔSTor), hydrogen bonding interactions (ΔEHBond), hydrogen bond donor-acceptor desolvation energy (ΔEHBDesol), solvation electrostatic energy upon ligand binding (ΔESolEl), hydrophobic free energy gain (ΔEHPhob) and a size correction term proportional to the number of ligand atoms (QSize) [28, 41, 42]:
The Astex diverse set provides a collection of 85 high-resolution protein structures from representative target families co-crystalized with drug-like small molecules . The files used in this study were provided by the thematic ACS meeting organizers and prepared for docking with ICM version 3.7-2b using a small script written in the ICM language. For each target:
The directory of useful decoys (DUD), a benchmarking test set freely available on the internet (http://dud.docking.org/) , covers 40 different targets with 3950 active ligands and 36 decoys per active. Decoys have similar molecular weight, number of hydrogen bonding groups, logP and number of rotatable bonds compared to the active compounds, but their molecular topologies are different. DUD is currently the largest, and one of the most challenging test sets for virtual ligand screening benchmarks, however, because a significant number of the true binders are very close analogs, it has limited usefulness for scaffold hopping evaluations . The WOMBAT data sets derived from DUD by filtering non-lead-like compounds, clustering unique chemotypes and expanding the original test sets with additional active compounds, provide an alternative benchmark for a sub-set of 11 DUD targets .
Virtual ligand screening performance using the ICM method was evaluated with modified versions of the DUD and WOMBAT test sets provided by the ACS meeting organizers. Protein targets were prepared for docking using a similar procedure described above. Ligand binding pockets were initially defined as a rectangular box centered on a known co-crystal ligand, provided with the test set, and extended additional 4 Å in any direction. However, given the large diversity of ligands and protein pockets, box dimensions were subsequently adjusted manually after visual inspection of the protein-ligand complex and the list of known binders. For each target, active compounds and decoys were grouped into an annotated sdf file and used as provided. Semi-randomized pairings of small molecules and targets were used to detect potential chemical bias on the DUD test sets that might contribute for discrimination based on anything other than protein-ligand contacts. Target sites of approximately the same size but different families were matched as described by McGann  and their ligands included in the annotated sdf file for docking. For example, cyclooxygenase 2 actives and decoys were docked against neuraminidase, and vice-versa. WOMBAT ligands for a subset of 11 DUD targets were also included in this study. Three independent docking runs were performed with a thoroughness value set to 1 and the top 5 best quality complexes rescored. The top scoring pose was selected for each compound.
For each target the true positive rate was plotted as a function of the false positive rate for all positions of the ranked score list. Binding pockets with perfect discrimination, i.e. scoring all true positives at the top ranked positions, have ROC plots that pass through the upper left corner and area under the curve (AUC) equal to 100. Therefore the higher the AUC value in a ROC curve, the better the discrimination. Because successful VLS ranks active compounds early on a large score list, the fraction of actives recovered at 0.1%, 1% and 2% decoys recovered (abbreviated to ROC(0.1%), ROC(1%) and ROC(2%)) were used in this study as early recognition metrics. AUCs, early enrichment values and statistical analysis were calculated with appropriate ICM macros. Because there are no decoys available with the WOMBAT test sets, enrichment studies were performed using DUD decoys derived for the same targets. This approach allows comparison between benchmarks of different software packages but introduces some chemical bias to the results because, contrarily to DUD, WOMBAT actives are predominately lead-like molecules with lower molecular weights and logP, as well as less hydrogen bonding groups.
Virtual ligand screening with ICM was further improved using our best practices for binding rescore and induced fit analysis. The default ICM scores were rescored using predicted out-of-pocket protein-ligand contacts according to the following penalty function:
Near-native interactions important for productive binding were defined as the superset of contacts between the top 10 scoring actives (Ctop10) and protein atoms within a 3.5 Å cutoff. Contacts beyond this distance were considered to be out-of-pocket. For each ligand in the test set, the total number of contacts within a 3.5 Å cutoff (Cligand) was compared with Ctop10 to derive a subset of common protein-ligand interactions (Ctop10 ∩ Cligand). The fraction between common contacts and total contacts was then used in a function to increase the original ICM score whenever the molecule binds extensively to a new surrounding area. However, the penalty applied gradually decreases as the overlap with binding modes of known actives increases. Because large compounds have higher propensity to occupy additional surrounding areas of the pocket, a scaling factor was introduced based on the fraction between the average number of non-hydrogen atoms of active compounds in the test set (Āactives) and the number of non-hydrogen atoms of each ligand in the test set (Aligand). Therefore, the function assigns a larger penalty for out-of-pocket binding of small ligands than compounds with more heavy atoms.
Induced fit upon small molecule binding was simulated by fully-flexible ligand and protein docking. To this purpose, crystallographic coordinates of a single co-crystal ligand per target, provided with the DUD test sets, were used. For each target, residues with side chain atoms within a 5 Å cutoff from the seed ligand were allowed to randomly move using the ICM Biased Probability Monte Carlo algorithm, followed by full local energy minimization . Geometrically diverse low-energy conformations were saved in a conformational stack. The number of alternative conformations generated during this procedure was largely dependent on the pocket plasticity. The maximal number of structures in the conformational stack was set to 300 by adjusting the maximum angular RMSD per variable when two structures are still considered belonging to the same cluster. Calculations took less than 60 minutes per pocket in all cases using a Linux workstation (Intel Core i7 Processor 3.07 GHz, 12 GB RAM) running Fedora 12. The files were prepared for docking using the VLS procedure described before. For each new conformation, grid potential maps were calculated and three independent docking runs were performed with a thoroughness value set to 1 and the top 5 best quality complexes rescored.
The 85 X-ray co-crystal structures used in this study provide a representative ensemble of non-redundant ligand binding pockets for self-dock benchmarks. Detailed statistical analysis on ligand binding accuracy using the ICM method is provided in Table 1, whereas individual RMSD values for predictions on every ligand binding site are shown in Fig. 1.
ICM was found to predict the top 1 scoring poses below 2 Å RMSD in 91% of the sites with an average RMSD of 0.91 Å (median= 0.54 Å). Predictions below 1 Å and below 0.5 Å were found in 78% and 43% of the cases, respectively. When the lowest RMSD value among the top 3 solutions in the ranked score list was considered, near native conformations below 2 Å RMSD were found in 95% of the sites with an average RMSD value of 0.67 Å (median= 0.48 Å). In all cases, the highest RMSD prediction among the top 3 scoring poses was below 3.8 Å.
The top 1 scoring solutions provided the lowest RMSD among the top 3 in 76% of the sites. However, with PDB entries 1jje, 1gpk, 1sq5, 1gm8 and 1meh, illustrated in Fig. 2, predicted binding modes considering the top 3 ensemble provided geometries significantly closer to the co-crystal ligand. PDB entry 1jje (top 1 RMSD= 8.2 Å, top 3 RMSD= 0.4 Å, Fig. 2A) provided the highest RMSD value for the top 1 predictions in the whole set. Visual inspection of the protein-ligand complex revealed a 180° flip where most native contacts are captured. Because the ligand is highly symmetric, benzyl moieties and the succinyl acid group are perfectly aligned with the co-crystal structure. Furthermore, analysis of crystal packing interactions revealed the presence of charged residues Lys8 and Asp10 in the ligand binding pocket vicinity with stabilizing effect over the co-crystal binding mode. Highly symmetric protein-ligand hydrophobic interactions are also involved in a 180° flip found with the top 1 prediction of PDB entry 1gpk (top 1 RMSD= 3.6 Å, top 3 RMSD= 0.3 Å, Fig. 2B). The co-crystal ligand is additionally stabilized by a water-mediated hydrogen bond involving the amide nitrogen, Glu199 and Gly117. Water mediated hydrogen bonds between co-crystal ligands and at least one protein residue are also found in PDB entries 1gm8 (top 1 RMSD= 3.0 Å, top 3 RMSD= 1.5 Å, Fig. 2C) and 1meh (top 1 RMSD= 2.4 Å, top 3 RMSD= 0.6 Å, Fig. 2D). The β-lactam ring in PDB entry 1gm8 interacts with Ser386 through a water-mediated hydrogen bond and the contact is correctly predicted with the top 1 scoring pose. However, because water molecules were excluded from the calculation, the penicillin core is translated by approximately 3 Å (Fig. 2C). On the other hand, the heterocyclic moiety of mycophenolic acid in PDB entry 1meh is correctly predicted but the highly flexible aliphatic chain adopts an alternative conformation. The native binding mode is stabilized by hydrogen bonding contacts between the carboxylic acid and Ser263, and a water-mediated hydrogen bond involving Asp261. For this ligand ICM predicts a non-native charged interaction with Arg414 (Fig. 2D).
In a few cases ICM failed to predict the co-crystal binding mode below 2 Å RMSD within the top 3 scoring poses. PDB entry 1jd0 is predicted at 3.8 Å RMSD because of an alternative bi-dentate metal coordination geometry with zinc between the nitrogens of the sulfonamide and the thiadiazole ring (Fig. 2E). Furthermore, the native binding mode is stabilized by a water-mediated hydrogen bond between the thiadiazole ring and Pro201. Binding to multiple sites of PDB entry 1hvy is predicted at RMSD values between 2.2 Å and 2.8 Å. ICM provides good predictions for the heterocyclic ring moieties but the highly flexible aliphatic chain, stabilized in the X-ray co-crystal structure by a complex network of water-mediated hydrogen bonds between the terminal carboxylic acids and Lys77, Leu221, Ile307 and Met309, adopts an alternative conformation (Fig. 2F). Similar self-docking inaccuracies have been reported with PDB entries 1jje, 1hvy, 1gm8, 1jd0 and 1sq5 using alternative docking and scoring methods [36, 47–49].
ICM predicted similar binding poses for multiple sites of the same ligand, providing final docked conformations independent of the initial site selection. The average difference between the highest and the lowest RMSD predictions for multiple sites of the same ligand was 0.4 Å, and only 3 cases of RMSD differences above 2 Å were found, i.e. PDB entries 1w1p, 1tz8 and 1sq5 (Fig. 2). Out of two sites in PDB entry 1w1p, only one ICM prediction matches to the co-crystal binding mode. Visual inspection revealed the presence of glycerol molecules in only one of the pockets, establishing additional favorable interactions that correctly align the ligand, such as a hydrogen bond between the amide nitrogen and one of the primary glycerol hydroxyls and hydrophobic contacts between the ligand and the aliphatic chain of the second glycerol. The alternative binding mode predicted for the second site corresponds to a 180° flip (RMSD= 2.9 Å, Fig. 2G) that is fully compatible with the available electron density. Ligand binding to PBD entry 1tz8 was predicted at 2.9 Å RMSD in one out of 3 possible sites. However, analysis of symmetry related neighbors revealed an alternative ligand binding conformation that is predicted as top 1 scoring pose (RMSD= 1.3 Å, Fig. 2H). One out of four sites in PDB entry 1sq5 is incorrectly predicted at the top 1 scoring pose (top 1 RMSD= 3.3 Å, top 3 RMSD= 0.7 Å). Residue conformations are well conserved among different sites but a slight rotation of the Cα-Cβ bond in His177 causes a small displacement of the imidazole ring (0.4 Å translation of the ε2 nitrogen) that favors a strong non-native hydrogen bond with the top 1 prediction.
In all cases ICM provided identical binding mode predictions below 1.5 Å RMSD when alternative side chain conformations for residues located within the binding pocket were considered. On average, selecting the alternative A provided slightly better predictions than alternative B or any combination of alternatives A and B for cases where two variable residues were present.
Alternative binding modes available for ligands in PDB entries 1ig3, 1sg0 and 1tz8 were predicted below 2 Å RMSD within the top 3 scoring poses, as described previously for 1tz8 (Fig. 2H). This result highlights the good sampling performance of the ICM docking algorithm.
Virtual screening with the ICM method was benchmarked against 40 protein targets using single ligand binding site conformations and the DUD and WOMBAT test sets. Statistical analyses of the results obtained are provided in Table 2, Table 3 and Table S1. Individual values of ROC AUC, ROC(0.1%), ROC(1%) and ROC(2%) are reported in Fig. 3, Fig. 4 and Fig. S1.
Docking DUD test sets against the original ligand binding site coordinates provided by the ACS meeting organizers and scoring with the default ICM function provided good results for a significant number of targets. On average, the AUC in linear ROC plots was 71.6 (median= 69.4, Table 2A). However, individual performances were largely target-dependent (Fig. 3A.). Neuraminidase (na, AUC= 95.8), peroxisome proliferator activated receptor γ (ppar, AUC= 94.5), epidermal growth factor receptor kinase (egfr, AUC= 92.9), trypsin (AUC= 92.6) and glycinamide ribonucleotide transformylase (gart, AUC= 92.1) were among the best performing targets, whereas platelet derived growth factor receptor kinase (pdgfrb, AUC= 27.0), P38 mitogen activated protein kinase (p38, AUC= 41.4), glucocorticoid receptor (gr, AUC= 41.8) and fibroblast growth factor receptor kinase (fgfr1, AUC= 46.7) performed close to, or worse than random (Fig. 3A). As expected, semi-randomized target/ligand test sets for the null hypothesis testing provided ROC AUC values close to random (AUCnull= 46.7 on average, median= 43.4) suggesting minimal chemical bias for most of the test sets. However, exceptions were found when dihydrofolate reductase (dhfr) was used to dock glycinamide ribonucleotide transformylase molecules (gart), when the mineralocorticoid receptor (mr) was used to dock estrogen receptor agonists (er_agonist) and when thrombin was used to dock factor Xa compounds (fxa). These 3 cases provided abnormally high ROC AUCnull values of 85.1, 83.1 and 87.4, respectively (Fig. 3A). Factor Xa and thrombin are trypsin-like serine proteases with similar catalytic domains and many ligands are cross-active indeed [50, 51]. On the other hand, the nuclear receptors for estrogens and mineralocorticoids bind to a similar steroid scaffold.
True positive rates ROC(0.1%), ROC(1%) and ROC(2%) at 0.1%, 1% and 2% false positive rates were calculated as a metric of early enrichment. On average the ICM method provided good early enrichments identifying 7.3%, 21.0% and 26.6% of true positives (median= 3.8%, 14.8% and 22.0%), respectively, using the original pocket coordinates and the default scoring method (Table 2A). Also in this case, individual values were highly target-dependent (Fig. 3A), being maximal with the peroxisome proliferator activated receptor γ (ppar, ROC(2%)= 77.7%), neuraminidase (na, ROC(2%)= 77.6%) and glycogen phosphorylase β (gpb, ROC(2%)= 65.2%) and minimal with thymidine kinase (tk, ROC(2%)= 0%), human heat shock protein 90 (hsp90, ROC(2%)= 2.7%) and acetylcholine esterase (ache, ROC(2%)= 2.8%).
Given the fast computation time, rescoring default docking results is a popular approach to improve chemical recognition. Here we propose a simple rescoring approach based on a soft out-of-pocket penalty described in detail in the experimental section. Improvements in ROC AUC and early enrichment measures were found with most targets (Table S1A and Fig. S1A), such as with acetylcholine esterase (ache, AUCinitial= 67.8, AUCrescored= 77.8, ROC(2%)initial= 2.8%, ROC(2%)rescored= 49.5%), adenosine deaminase (ada, AUCinitial= 50.3, AUCrescored= 72.3, ROC(2%)initial= 7.7%, ROC(2%)rescored= 25.6%), poly(ADP-ribose) polymerase (parp, AUCinitial= 84.6, AUCrescored= 90.5, ROC(2%)initial= 17.1%, ROC(2%)rescored= 57.1%) and thymidine kinase (tk, AUCinitial= 74.7, AUCrescored= 80.1, ROC(2%)initial= 0%, ROC(2%)rescored= 36.4%).
Energy-based refinement accounting for pocket plasticity upon ligand binding provided consistent improvements for most DUD targets in terms of ROC AUC and early enrichments (Table S1B and Fig. S1B). Examples include catechol-O-methyltransferase (comt, AUCinitial= 83.0, AUCrefined= 85.7, ROC(2%)initial= 9.1%, ROC(2%)refined= 54.6%), cyclooxygenase 2 (cox2, AUCinitial= 75.1, AUCrefined= 82.1, ROC(2%)initial= 7.0%, ROC(2%)refined= 38.5%), fibroblast growth factor receptor kinase (fgfr1, AUCinitial= 46.7, AUCrefined= 73.2, ROC(2%)initial= 7.5%, ROC(2%)refined= 30.0%), poly(ADP-ribose) polymerase (parp, AUCinitial= 84.6, AUCrefined= 92.9, ROC(2%)initial= 17.1%, ROC(2%)refined= 77.1%), progesterone receptor (pr, AUCinitial= 62.0, AUCrefined= 68.7, ROC(2%)initial= 11.1%, ROC(2%)refined= 37.0%), tyrosine kinase SRC (src, AUCinitial= 69.3, AUCrefined= 82.7, ROC(2%)initial= 13.8%, ROC(2%)refined= 44.0%) and thymidine kinase (tk, AUCinitial= 74.7, AUCrefined= 79.7, ROC(2%)initial= 0%, ROC(2%)refined= 54.6%). Because a single seed ligand was used for the induced fit modeling of each target (co-crystal ligand provided by the ACS meeting organizers), the conformational sampling was somehow limited. Better performances are expected for this method if more co-crystal ligands are used.
Optimal virtual ligand screening performance was obtained combining pocket refinement with out-of-pocket rescoring (Table 2B, Fig. 3B). An average ROC AUC of 79.4 (median= 82.2) with 24.7%, 39.5% and 44.3% of true positives (median= 20.4%, 37.0% and 45.2%) identified at 0.1%, 1% and 2% false positive rates, respectively, was obtained using this combined approach. Moreover, ROC AUC values above 70 and ROC(2%) above 20 were found in 78% of the targets, being potential good candidates for virtual screening. While for most targets ligand recognition was improved after pocket refinement and out-of-pocket penalty, worse than random models such as pdgfrb, p38 and gr did not change significantly with any of the approaches used.
Virtual screening with actives from WOMBAT (combined with decoys for the same targets derived from DUD), provided an average ROC AUC value of 63.1 (median= 64.8) with 12.8% (median= 12.5%) true positives identified at a 2% false positive rate (Table 3A and Fig. 4A). Pocket refinement followed by out-of-pocket rescoring led to an average AUCrefined/rescored= 72.1 (median=69.8) and ROC(2%)refined/rescored= 28.7% (median= 22.8% Table 3B). Despite more conservative than DUD, these results confirm the good overall virtual ligand screening performance of ICM. Peroxisome proliferator activated receptor γ (ppar, AUCrefined/rescored= 92.7, ROC(2%)refined/rescored= 69.8%) and estrogen receptor antagonists (er_antag, AUCrefined/rescored= 89.0, ROC(2%)refined/rescored= 53.0%) were among the best performing targets (Fig. 4B).
Virtual ligand screening benchmarks using the WOMBAT test sets of lead-like, chemically diverse small molecules is inherently more challenging than DUD. This is particularly critical when using a single ligand binding pocket conformation because of potential induced fit mechanisms upon binding of different chemotypes. On the other hand, chemical bias was introduced combining WOMBAT actives with DUD decoys. Compared to WOMBAT actives, DUD decoys have higher molecular weights and more functional groups than WOMBAT, which in turn leads to additional favorable contacts and better scores.
In this study, docking and scoring with ICM was benchmarked for ligand binding mode accuracy and virtual screening against publicly available test sets. ICM was highly successful in generating multiple poses that include experimentally solved near native conformations. Known co-crystal binding modes were reproduced as the top 1 scoring pose in most sites. Visual inspection of a few cases with larger RMSD predictions revealed that the ligands were correctly positioned within the protein pocket and that the most critical contacts were captured. Difficult cases included PDB entries with protein-ligand water-mediated hydrogen bonds, metal coordination, highly symmetrical and flexible molecules. Our results demonstrate that ICM performs remarkably well in self-docking experiments; however, cross-docking tests where the extracted ligands are docked into protein structures from different complexes, would provide a more challenging and realistic assessment. Cross-docking pose prediction test sets using experimentally solved conformational ensembles of druggable binding pockets will be important for future software comparison benchmarks .
Virtual ligand screening using the default ICM scoring method, single pocket conformations per target and no additional modeling steps to account for protein plasticity, showed good overall performance, yet highly dependent on the target of interest. This finding highlights the importance of target and software validation before undergoing time- and cost-consuming screening of large compound databases followed by experimental evaluation of new molecules. Rescoring based on out-of-pocket contacts was an efficient and CPU-time inexpensive method to improve recognition in virtual ligand screening. On the other hand, an intrinsic complexity in the structure-based identification of new active compounds is related to the flexibility of the protein binding pockets. Co-crystal structures of identical ligand-binding domains bound to different chemotypes reveal more or less pronounced conformational changes to accommodate binding of ligands. Our results using energy-based pocket refinement, accounting for induced fit and protein plasticity, dramatically improved the overall discrimination and early enrichments using a single binding site model per target. However, because different chemotypes can induce alternative conformational changes to the ligand binding pocket, such as rotation of side chains and small loop rearrangements, each of them represents only a fraction of the total molecular chemical recognition properties and has somewhat limited potential for virtual screening of novel chemotypes. Increasing evidence, including our own results, suggests that pocket ensembles and 4D (multi-conformational) docking are more effective to recognize ligands in virtual screening and predict their binding geometries [53–55].
M.A.C. Neves thanks Fundação para a Ciência e a Tecnologia (FCT), Portugal, for a Post Doctoral grant (SFRH/BPD/64216/2009). The authors thank Irina Kufareva, Manuel Rueda, Winston Chen, Chayan Acharya and Chris Edwards for useful discussions and comments. This work was supported by National Institutes of Health [grant numbers R01 GM071872, U01 GM094612 and U54 GM094618].
Marco A. C. Neves, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA. Centro de Neurociências, Lab. Química Farmacêutica, Faculdade de Farmácia, Universidade de Coimbra, Pólo das Ciências da Saúde, 3000-548 Coimbra, Portugal.
Maxim Totrov, Molsoft L.L.C. 11199 Sorrento Valley Road, S209, San Diego, CA 92121, USA.
Ruben Abagyan, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA. Molsoft L.L.C. 11199 Sorrento Valley Road, S209, San Diego, CA 92121, USA.