Molecular-docking-based virtual screening is an important tool in drug discovery that is used to significantly reduce the number of possible chemical compounds to be investigated. In addition to the selection of a sound docking strategy with appropriate scoring functions, another technical challenge is to screen millions of compounds in silico in a reasonable time. To meet this challenge, it is necessary to use high performance computing (HPC) platforms and techniques. However, the development of an integrated HPC system that makes efficient use of its elements is not trivial.
We have developed an application termed DOVIS that uses AutoDock (version 3) as the docking engine and runs in parallel on a Linux cluster. DOVIS can efficiently dock large numbers (millions) of small molecules (ligands) to a receptor, screening 500 to 1,000 compounds per processor per day. Furthermore, in DOVIS, the docking session is fully integrated and automated in that the inputs are specified via a graphical user interface, the calculations are fully integrated with a Linux cluster queuing system for parallel processing, and the results can be visualized and queried.
DOVIS removes most of the complexities and organizational problems associated with large-scale high-throughput virtual screening, and provides a convenient and efficient solution for AutoDock users to use this software in a Linux cluster platform.
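The throughput quoted above (500 to 1,000 compounds per processor per day) can be turned into a rough wall-clock estimate. The sketch below assumes ideal linear scaling across processors; the library size and processor count are hypothetical values chosen for illustration and are not taken from the DOVIS work.

```python
def screening_days(n_compounds, n_processors, per_processor_per_day):
    """Estimate wall-clock days for a screen, assuming ideal linear scaling."""
    if n_processors <= 0 or per_processor_per_day <= 0:
        raise ValueError("processor count and throughput must be positive")
    return n_compounds / (n_processors * per_processor_per_day)


# Hypothetical campaign: 1,000,000 ligands on a 256-processor cluster.
N_LIGANDS, N_PROCS = 1_000_000, 256
slowest = screening_days(N_LIGANDS, N_PROCS, 500)    # lower reported throughput
fastest = screening_days(N_LIGANDS, N_PROCS, 1000)   # upper reported throughput
print(f"{fastest:.1f} to {slowest:.1f} days")
```

Under these assumptions a million-compound screen takes roughly four to eight days; queueing overhead and uneven ligand sizes would lengthen this in practice.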
Virtual screening is used to distinguish potential leads from inactive compounds in a database of chemical samples. One method for accomplishing this is by docking compounds into the structure of a receptor binding site in order to rank-order compounds by the quality of the interactions they form with the receptor. It is generally established that docking can be reasonably successful at generating good poses of a ligand in an active site. However, the scoring functions that are used with docking are typically not successful at correctly ranking ligands according to binding affinity or even distinguishing correct poses of a given ligand from incorrect ones.
We have developed a simple method for reducing the number of false positives in a virtual screen, meaning ligands which are scored highly by the docking program but do not bind well in reality. This method uses a docking program for pose generation without regard to scoring, followed by filtering with receptor-based pharmacophore searches. We applied it to three test-case targets: neuraminidase A, cyclin-dependent kinase 2, and the C1 domain of protein kinase C.
The pharmacophore filtering method can perform better than more traditional docking + scoring methods, and allows the advantages of both docking-based and pharmacophore-based approaches to virtual screening to be fully realized.
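The filtering idea (generate poses by docking, ignore the docking score, keep only ligands with a pose that satisfies a receptor-based pharmacophore) might be sketched as follows. The `Feature` type, the feature labels, and the 1.5 Å tolerance are illustrative assumptions; the abstract does not state the matching criteria that were actually used.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Feature:
    kind: str     # hypothetical labels, e.g. "donor", "acceptor", "hydrophobe"
    xyz: tuple    # (x, y, z) coordinates in angstroms


def pose_matches(pose_features, pharmacophore, tolerance=1.5):
    """True if every pharmacophore point has a same-kind pose feature nearby."""
    return all(
        any(f.kind == p.kind and dist(f.xyz, p.xyz) <= tolerance
            for f in pose_features)
        for p in pharmacophore
    )


def filter_ligands(poses_by_ligand, pharmacophore, tolerance=1.5):
    """Keep ligands for which at least one docked pose fits the pharmacophore."""
    return [ligand for ligand, poses in poses_by_ligand.items()
            if any(pose_matches(pose, pharmacophore, tolerance) for pose in poses)]


# Toy data: a two-point pharmacophore and one docked pose per ligand.
pharmacophore = [Feature("donor", (0.0, 0.0, 0.0)),
                 Feature("acceptor", (4.0, 0.0, 0.0))]
poses_by_ligand = {
    "ligA": [[Feature("donor", (0.5, 0.0, 0.0)), Feature("acceptor", (4.2, 0.3, 0.0))]],
    "ligB": [[Feature("donor", (0.2, 0.0, 0.0)), Feature("acceptor", (9.0, 0.0, 0.0))]],
}
kept = filter_ligands(poses_by_ligand, pharmacophore)
```

Here `ligB` is rejected because its acceptor lies 5 Å from the required position, regardless of how well the docking program scored it.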
Nature, especially the plant kingdom, is a rich source of novel bioactive compounds that can be used as lead compounds for drug development. In order to exploit this resource, two neural network-based virtual screening techniques, novelty detection with self-organizing maps (SOMs) and counterpropagation neural networks, were evaluated as tools for efficient lead structure discovery. As an application scenario, significant descriptors for acetylcholinesterase (AChE) inhibitors were determined and used for model building, theoretical model validation, and virtual screening. Top-ranked virtual hits from both approaches were docked into the AChE binding site to confirm the initial hits. Finally, in vitro testing of selected compounds led to the identification of forsythoside A and (+)-sesamolin as novel AChE inhibitors.
Natural products; drug discovery; acetylcholinesterase; virtual screening; counterpropagation network; spinne; novelty detection
The number of protein targets with a known or predicted three-dimensional structure and of drug-like chemical compounds is growing rapidly, and so is the need for new therapeutic compounds or chemical probes. Performing flexible structure-based virtual screening computations on thousands of targets with millions of molecules is neither tractable for most laboratories nor indeed desirable. Since shape complementarity is of primary importance for most protein-ligand interactions, we have developed a tool/protocol based on rigid-body docking to select compounds that fit well into binding sites.
Here we present an efficient multiple conformation rigid-body docking approach, MS-DOCK, which is based on the program DOCK. This approach can be used as the first step of a multi-stage docking/scoring protocol. First, we developed and validated the Multiconf-DOCK tool that generates several conformers per input ligand. Then, each generated conformer (bioactives and 37970 decoys) was docked rigidly using DOCK6 with our optimized protocol into seven different receptor-binding sites. MS-DOCK was able to significantly reduce the size of the initial input library for all seven targets, thereby facilitating subsequent more CPU demanding flexible docking procedures.
MS-DOCK can be easily used for the generation of multi-conformer libraries and for shape-based filtering within a multi-step structure-based screening protocol in order to shorten computation times.
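The shape-based filtering step can be pictured as: dock every conformer rigidly, keep the best conformer score per ligand, and pass only the top-ranked fraction to flexible docking. The sketch below is a generic illustration, not the MS-DOCK code; it assumes lower scores are better, and the retained fraction is a hypothetical parameter.

```python
import math


def best_conformer_scores(conformer_scores):
    """Map each ligand to the best (lowest) score among its docked conformers.

    conformer_scores: iterable of (ligand_id, score) pairs, one per conformer.
    """
    best = {}
    for ligand, score in conformer_scores:
        if ligand not in best or score < best[ligand]:
            best[ligand] = score
    return best


def keep_top_fraction(best, fraction):
    """Return the ligands in the best-scoring fraction of the library."""
    n_keep = max(1, math.ceil(fraction * len(best)))
    return sorted(best, key=best.get)[:n_keep]


# Toy rigid-docking output: several conformers per ligand.
raw = [("a", -30.0), ("a", -42.0), ("b", -10.0),
       ("c", -55.0), ("c", -20.0), ("d", -5.0)]
best = best_conformer_scores(raw)
shortlist = keep_top_fraction(best, 0.5)
```

Only the shortlist would then go on to the more CPU-demanding flexible docking stage.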
The need for fast and accurate scoring functions has been driven by the increased use of in silico virtual screening twinned with high-throughput screening as a method to rapidly identify potential candidates in the early stages of drug development. We examine the ability of some of the most common scoring functions (GOLD, ChemScore, DOCK, PMF, BLEEP and Consensus) to discriminate correctly and efficiently between active and non-active compounds among a library of ~3,600 diverse decoy compounds in a virtual screening experiment against heat shock protein 90 (Hsp90).
Firstly, we investigated two ranking methodologies, GOLDrank and BestScorerank. GOLDrank is based on ranks generated using GOLD. The various scoring functions, GOLD, ChemScore, DOCK, PMF, BLEEP and Consensus, are applied to the pose ranked number one by GOLD for that ligand. BestScorerank uses multiple poses for each ligand and independently chooses the best ranked pose of the ligand according to each different scoring function. Secondly, we considered the effect of introducing the Thr184 hydrogen bond tether to guide the docking process towards a particular solution, and its effect on enrichment. Thirdly, we considered normalisation to account for the known bias of scoring functions to select larger molecules. All the scoring functions gave fairly similar enrichments, with the exception of PMF which was consistently the poorest performer. In most cases, GOLD was marginally the best performing individual function; the Consensus score usually performed similarly to the best single scoring function. Our best results were obtained using the Thr184 tether in combination with the BestScorerank protocol and normalisation for molecular weight. For that particular combination, DOCK was the best individual function; DOCK recovered 90% of the actives in the top 10% of the ranked list; Consensus similarly recovered 89% of the actives in its top 10%.
Overall, we demonstrate the validity of virtual screening as a method for identifying new leads from a pool of ligands with similar physicochemical properties and we believe that the outcome of this study provides useful insight into the setting up of a suitable docking and scoring protocol, resulting in enrichment of 'target active' compounds.
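The difference between the two ranking protocols, and the "actives recovered in the top 10%" metric, can be illustrated with a short sketch. It assumes every score has been oriented so that higher is better (a hypothetical preprocessing step) and uses made-up numbers. The molecular-weight normalisation is not shown because the abstract does not give its formula.

```python
def score_top_gold_pose(poses, function):
    """GOLDrank-style: take the pose GOLD ranks first and read off `function`."""
    top = max(poses, key=lambda pose: pose["GOLD"])
    return top[function]


def score_best_pose(poses, function):
    """BestScorerank-style: let `function` choose its own best pose."""
    return max(pose[function] for pose in poses)


def fraction_of_actives_recovered(ranked_ids, actives, top_fraction):
    """Share of all actives found in the top `top_fraction` of a ranked list."""
    n_top = max(1, round(top_fraction * len(ranked_ids)))
    found = sum(1 for cid in ranked_ids[:n_top] if cid in actives)
    return found / len(actives)


# Two poses of one ligand, each scored by two functions (toy values).
poses = [{"GOLD": 60.0, "DOCK": 30.0}, {"GOLD": 55.0, "DOCK": 41.0}]
gold_rank_value = score_top_gold_pose(poses, "DOCK")
best_score_value = score_best_pose(poses, "DOCK")

# A ranked library of 20 compounds containing three actives.
ranked = [f"l{i}" for i in range(20)]
recovered = fraction_of_actives_recovered(ranked, {"l0", "l1", "l7"}, 0.10)
```

In this toy case the two protocols assign the ligand different DOCK scores (30 versus 41), which is why they can produce different ranked lists and enrichments.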
Poor performance of scoring functions is a well-known bottleneck in structure-based virtual screening, which is most frequently manifested in the scoring functions’ inability to discriminate between true ligands and known non-binders (therefore designated as binding decoys). This deficiency leads to a large number of false positive hits resulting from virtual screening. We have hypothesized that filtering out or penalizing docking poses recognized as non-native (i.e., pose decoys) should improve the performance of virtual screening in terms of improved identification of true binders. Using several concepts from the field of cheminformatics, we have developed a novel approach to identifying pose decoys from an ensemble of poses generated by computational docking procedures. We demonstrate that the use of a target-specific pose (-scoring) filter in combination with a physical force field-based scoring function (MedusaScore) leads to significant improvement of hit rates in virtual screening studies for 12 of the 13 benchmark sets from the clustered version of the Database of Useful Decoys (DUD). This new hybrid scoring function outperforms several conventional structure-based scoring functions, including XSCORE∷HMSCORE, ChemScore, PLP, and Chemgauss3, in six out of 13 data sets at the early stage of VS (up to 1% decoys of the screening database). We compare our hybrid method with several novel VS methods that were recently reported to have good performance on the same DUD data sets. We find that the ligands retrieved using our method are chemically more diverse in comparison with two ligand-based methods (FieldScreen and FLAP∷LBX). We also compare our method with FLAP∷RBLB, a high-performance VS method that also utilizes both the receptor and the cognate ligand structures. Interestingly, we find that the top ligands retrieved using our method are highly complementary to those retrieved using FLAP∷RBLB, hinting at effective directions for best VS applications.
We suggest that this integrative virtual screening approach combining cheminformatics and molecular mechanics methodologies may be applied to a broad variety of protein targets to improve the outcome of structure-based drug discovery studies.
Ligand-based and structure-based drug screening methods were integrated for in silico drug development by combining the maximum-volume overlap (MVO) method with a protein-compound docking program. The MVO method is used to select reliable docking poses by calculating volume overlaps between the docking pose in question and the known ligand docking pose, if at least a single protein-ligand complex structure is known. In the present study, the compounds in a database were docked onto a target protein that had a known protein-ligand complex structure. The new score is the summation of the docking score and the MVO score, which is the measure of the volume overlap between the docking poses of the compound in question and the known ligand. The compounds were sorted according to the new score. The in silico screening results were improved by comparing the MVO score to the original docking score only. The present method was also applied to some target proteins with known ligands, and the results demonstrated that it worked well.
virtual drug screening; structure-based drug screening; protein-compound docking.
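The rescoring rule described for the MVO method (new score = docking score + MVO score, then sort) is simple enough to sketch directly. The example assumes both scores are oriented so that higher is better and are on comparable scales; the compound names and values are made up, and the volume-overlap calculation itself is not reproduced here.

```python
def rank_by_combined_score(docking, mvo):
    """Rank compounds by docking score plus MVO (volume overlap) score.

    docking, mvo: dicts mapping compound id -> score, higher is better
    (an assumption made for this sketch).
    """
    combined = {cid: docking[cid] + mvo[cid] for cid in docking}
    return sorted(combined, key=combined.get, reverse=True)


# Toy scores for three hypothetical compounds.
docking = {"x": 7.0, "y": 8.0, "z": 6.5}
mvo = {"x": 3.0, "y": 0.5, "z": 1.0}

by_docking_only = sorted(docking, key=docking.get, reverse=True)
by_combined = rank_by_combined_score(docking, mvo)
```

Compound `x` overtakes `y` once its strong overlap with the known ligand pose is taken into account, which is the intended effect of the combined score.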
Structure-based virtual screening of NCI Diversity set II compounds was performed to identify novel inhibitor scaffolds of trypanothione reductase (TR) from Leishmania infantum. The top 50 ranked hits were clustered using the AuPoSOM tool. The majority of the top-ranked compounds were tricyclic. Clustering of hits yielded four major clusters, each comprising a varying number of subclusters differing in their mode of binding and orientation in the active site. Moreover, for the first time, we report selected alkaloids and dibenzothiazepines as inhibitors of Leishmania infantum TR. The mode of binding observed among the clusters also potentiates the probable in vitro inhibition kinetics and aids in defining key interactions which might contribute to the inhibition of enzymatic reduction of T[S]2. The method provides scope for automation and integration into the virtual screening process employing docking software, for clustering the small molecule inhibitors based upon protein-ligand interactions.
Virtual and high-throughput screens (HTS) should have complementary strengths and weaknesses, but studies that prospectively and comprehensively compare them are rare. We undertook a parallel docking and HTS screen of 197861 compounds against cruzain, a thiol protease target for Chagas disease, looking for reversible, competitive inhibitors. On workup, 99% of the hits were eliminated as false positives, yielding 146 well-behaved, competitive ligands. These fell into five chemotypes: two were prioritized by scoring among the top 0.1% of the docking-ranked library, two were prioritized by behavior in the HTS and by clustering, and one chemotype was prioritized by both approaches. Determination of an inhibitor/cruzain crystal structure and comparison of the high-scoring docking hits to experiment illuminated the origins of docking false-negatives and false-positives. Prioritizing molecules that are both predicted by docking and are HTS-active yields well-behaved molecules, relatively unobscured by the false-positives to which both techniques are individually prone.
Inhibitors of the transmembrane protein sarco/endoplasmic reticulum calcium ATPase (SERCA) are invaluable tools for the study of the enzyme’s physiological functions and they have been recognized as a promising new class of anticancer agents. For the discovery of novel enzyme inhibitors, small molecule docking for virtual screens of large compound libraries has become increasingly important. Since the performance of various docking routines varies considerably, depending on the target and the chemical nature of the ligand, we critically evaluated the performance of four frequently used programs – GOLD, AutoDock, Surflex-Dock, and FRED – for the docking of SERCA inhibitors based on the structures of thapsigargin, di-tert-butylhydroquinone, and cyclopiazonic acid. Evaluation criteria were docking accuracy using crystal structures as references, docking reproducibility, and correlation between docking scores and known bioactivities. The best overall results were obtained by GOLD and FRED. Docking runs with conformationally flexible binding sites produced no significant improvement of the results.
computational docking; scoring function; inhibitory potency; calcium pump; thapsigargin; di-tert-butylhydroquinone; cyclopiazonic acid; inhibitor binding site
The neuronal nicotinic acetylcholine receptor (nAChR) has been a target for drug development studies for over a decade. A series of mono- and bis-quaternary ammonium salts, known to be antagonists at nAChRs, were separated into 3 structural classes and evaluated using both self-organizing map (SOM) and genetic functional approximation (GFA) algorithm models. Descriptors from these compounds were used to create several nonlinear quantitative structure-activity relationships (QSARs). The SOM methodology was effective in appropriately grouping these compounds with diverse structures and activities. The GFA models were also able to predict the activities of these molecules. Charge distribution and the hydrophobic free energies were found to be important indicators of bioactivity for this particular class of molecules. These QSAR approaches may be useful to screen and select in silico new drug candidates from larger compound libraries to be further evaluated in in vitro biological assays.
self-organizing map; genetic functional approximation; neuronal nicotinic acetylcholine receptor
Molecular docking is routinely used for understanding drug-receptor interaction in modern drug design. Here, we describe the docking of 2,4-diamino-5-methyl-5-deazapteridine (DMDP) derivatives as inhibitors to human dihydrofolate reductase (DHFR). We docked 78 DMDP derivatives collected from the literature to DHFR and studied their specific interactions with DHFR. A new shape-based method, LigandFit, was used for docking DMDP derivatives into DHFR active sites. The results indicate that the molecular docking approach is reliable and produces a good correlation coefficient (r² = 0.499) for the 73 compounds between docking score and IC50 values (inhibitory activity). The chloro-substituted naphthyl ring of compound 63 makes significant hydrophobic contact with Leu 22, Phe 31 and Pro 61 of the DHFR active site, leading to enhanced inhibition of the enzyme. The docked complexes provide better insights to design more potent DHFR inhibitors prior to their
DHFR inhibitors; DMDP derivatives; molecular docking; drug; receptor
The increasing numbers of 3D compounds and protein complexes stored in databases contribute greatly to current advances in biotechnology, being employed in several pharmaceutical and industrial applications. However, screening and retrieving appropriate candidates as well as handling false positives presents a challenge for all post-screening analysis methods employed in retrieving therapeutic and industrial targets.
Using the TSCC method, virtually screened compounds were clustered based on their protein-ligand interactions, followed by structure clustering employing physicochemical features, to retrieve the final compounds. Based on the protein-ligand interaction profile (first stage), docked compounds can be clustered into groups with distinct binding interactions. Structure clustering (second stage) grouped similar compounds obtained from the first stage into clusters of similar structures; the lowest energy compound from each cluster being selected as a final candidate.
By representing interactions at the atomic level and including measures of interaction strength, better descriptions of protein-ligand interactions and a more specific analysis of virtual screening were achieved. The two-stage clustering approach enhanced our post-screening analysis, resulting in accurate performance in clustering, mining and visualizing compound candidates, thus improving virtual screening enrichment.
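The two-stage selection can be sketched as clustering by interaction profile, then clustering each group by structural features, and finally taking the lowest-energy member of each cluster. The clustering algorithm used by TSCC is not given in the text, so the sketch substitutes a simple greedy leader clustering on Tanimoto similarity; the thresholds, field names, and data are all hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(items, key, threshold):
    """Greedy leader clustering: join the first cluster whose leader is similar."""
    clusters = []
    for item in items:
        for cluster in clusters:
            if tanimoto(key(item), key(cluster[0])) >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def two_stage_select(compounds, t_interaction=0.6, t_structure=0.6):
    """Stage 1: interaction profiles. Stage 2: structure. Pick lowest energy."""
    picks = []
    for group in leader_cluster(compounds, lambda c: c["interactions"], t_interaction):
        for sub in leader_cluster(group, lambda c: c["features"], t_structure):
            picks.append(min(sub, key=lambda c: c["energy"])["id"])
    return picks


# Toy docked compounds (interaction and feature labels are illustrative).
compounds = [
    {"id": "c1", "interactions": {"ASP25:hb", "ILE50:vdw"}, "features": {"ring", "amide"}, "energy": -9.0},
    {"id": "c2", "interactions": {"ASP25:hb", "ILE50:vdw"}, "features": {"ring", "amide"}, "energy": -10.5},
    {"id": "c3", "interactions": {"ASP25:hb", "ILE50:vdw"}, "features": {"sulfonamide"}, "energy": -8.0},
    {"id": "c4", "interactions": {"ARG8:ion"}, "features": {"ring", "amide"}, "energy": -7.0},
]
final_candidates = two_stage_select(compounds)
```

Compounds `c1` and `c2` share both binding mode and structure, so only the lower-energy `c2` survives, while `c3` and `c4` are kept as representatives of a different scaffold and a different binding mode.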
Virtual compound screening using molecular docking is widely used in the discovery of new lead compounds for drug design. However, this method is not completely reliable and therefore unsatisfactory. In this study, we used massive molecular dynamics simulations of protein-ligand conformations obtained by molecular docking in order to improve the enrichment performance of molecular docking. Our screening approach employed the molecular mechanics/Poisson-Boltzmann and surface area method to estimate the binding free energies. For the top-ranking 1,000 compounds obtained by docking to a target protein, approximately 6,000 molecular dynamics simulations were performed using multiple docking poses in about a week. As a result, the enrichment performance of the top 100 compounds by our approach was improved by 1.6–4.0 times that of the enrichment performance of molecular dockings. This result indicates that the application of molecular dynamics simulations to virtual screening for lead discovery is both effective and practical. However, further optimization of the computational protocols is required for screening various target proteins.
Lead discovery is one of the most important processes in rational drug design. To improve the rate of the detection of lead compounds, various technologies such as high-throughput screening and combinatorial chemistry have been introduced into the pharmaceutical industry. However, since these technologies alone may not improve lead productivity, computational screening has become important. A central method for computational screening is molecular docking. This method generally docks many flexible ligands to a rigid protein and predicts the binding affinity for each ligand in a practical time. However, its ability to detect lead compounds is less reliable. In contrast, molecular dynamics simulations can treat both proteins and ligands in a flexible manner, directly estimate the effect of explicit water molecules, and provide more accurate binding affinity, although their computational costs and times are significantly greater than those of molecular docking. Therefore, we developed a special purpose computer “MDGRAPE-3” for molecular dynamics simulations and applied it to computational screening. In this paper, we report an effective method for computational screening; this method is a combination of molecular docking and massive-scale molecular dynamics simulations. The proposed method showed a higher and more stable enrichment performance than the molecular docking method used alone.
Progress in functional genomics and structural studies on biological macromolecules are generating a growing number of potential targets for therapeutics, adding to the importance of computational approaches for small molecule docking and virtual screening of candidate compounds. In this review, recent improvements in several public domain packages that are widely used in the context of drug development, including DOCK, AutoDock, AutoDock Vina and Screening for Ligands by Induced-fit Docking Efficiently (SLIDE) are surveyed. The authors also survey methods for the analysis and visualisation of docking simulations, as an important step in the overall assessment of the results. In order to illustrate the performance and limitations of current docking programs, the authors used the National Center for Toxicological Research (NCTR) oestrogen receptor benchmark set of 232 oestrogenic compounds with experimentally measured strength of binding to oestrogen receptor alpha. The methods tested here yielded a correlation coefficient of up to 0.6 between the predicted and observed binding affinities for active compounds in this benchmark.
drug discovery; small molecule docking; virtual screening; docking packages; visualisation of docking poses; oestrogen receptor; oestrogen activity prediction; SAR
Virtual screening by molecular docking has become a widely used approach to lead discovery in the pharmaceutical industry when a high resolution structure of the biological target of interest is available. The performance of three widely-used docking programs (Glide, GOLD, and DOCK) for virtual database screening is studied when they are applied to the same protein target and ligand set. Comparisons of the docking programs and scoring functions using a large and diverse data set of pharmaceutically interesting targets and active compounds are carried out. We focus on the problem of docking and scoring flexible compounds which are sterically capable of docking into a rigid conformation of the receptor. The Glide XP methodology is shown to consistently yield enrichments superior to the two alternative methods, while GOLD outperforms DOCK on average. The study also shows that docking into multiple receptor structures can decrease the docking error in screening a diverse set of active compounds.
A docking-rescoring method, based on per-residue van der Waals (VDW), electrostatic (ES), or hydrogen bond (HB) energies has been developed to aid discovery of ligands that have interaction signatures with a target (footprints) similar to that of a reference. Biologically useful references could include known drugs, inhibitors, substrates, transition states, or side-chains that mediate protein-protein interactions. Termed footprint similarity (FPS) score, the method, as implemented in the program DOCK, was validated and characterized using: (1) pose identification, (2) crossdocking, (3) enrichment, and (4) virtual screening. Improvements in pose identification (6–12%) were obtained using footprint-based (FPSVDW+ES) vs standard DOCK (DCEVDW+ES) scoring as evaluated on three large datasets (680–775 systems) from the SB2010 database. Enhanced pose identification was also observed using FPS (45.4% or 70.9%) compared with DCE (17.8%) methods to rank challenging crossdocking ensembles from carbonic anhydrase. Enrichment tests, for three representative systems, revealed FPSVDW+ES scoring yields significant early fold enrichment in the top 10% of ranked databases. For EGFR, top FPS poses are nicely accommodated in the molecular envelope defined by the reference in comparison with DCE which yields distinct molecular weight bias towards larger molecules. Results from a representative virtual screen of ca. 1 million compounds additionally illustrate how ligands with footprints similar to a known inhibitor can readily be identified from within large commercially available databases. By providing an alternative way to rank ligand poses in a simple yet directed manner we anticipate that FPS scoring will be a useful tool for docking and structure-based design.
Molecular Footprints; Molecular Fingerprints; Pose Comparison; Pose Rescoring; Docking; Virtual Screening; Enrichment; ROC Curves; Euclidean Distance; Pearson Correlation
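A footprint comparison of the kind described above can be sketched with the two measures named in the keywords, Euclidean distance and Pearson correlation, applied to per-residue energy vectors. The residue energies below are made-up numbers, and the way DOCK combines the VDW and ES terms is not reproduced here.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two per-residue energy footprints."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation between two per-residue energy footprints."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Per-residue VDW energies (kcal/mol) over the same four residues; toy values.
reference = [-3.0, -1.0, 0.0, -2.0]   # footprint of the reference ligand
candidate = [-1.5, -0.5, 0.0, -1.0]   # same pattern at half the magnitude

distance = euclidean(reference, candidate)
correlation = pearson(reference, candidate)
```

The example shows why the two measures differ: the candidate has the same interaction pattern (correlation of 1) but weaker contacts, which only the distance penalises.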
The RNA polymerase NS5B of hepatitis C virus (HCV) is a well-characterised drug target with an active site and four allosteric binding sites. This work presents a workflow for virtual screening and its application to screening DrugBank against the HCV RNA polymerase non-nucleoside binding sites. Potential polypharmacological drugs are sought with predicted inhibitory activity on viral replication and with proven positive pharmaco-clinical profiles. The approach adopted was receptor-based. Docking screens, guided by contact pharmacophores and neural-network activity prediction models on all allosteric binding sites, together with MD simulations, constituted our analysis workflow for identification of potential hits. Steps included: 1) a two-phase docking screen with Surflex and Glide XP; 2) ranking based on scores and important H interactions; 3) ranking with a machine-learning, target-trained artificial neural network PIC prediction model, which provided a better correlation of IC50 values of the training sets for each site with different docking scores and sub-scores; 4) filtering with interaction pharmacophores (common interaction modes), constructed through retrospective analysis of protein-inhibitor complex X-ray structures for the five non-nucleoside binding sites, so that hits were filtered according to the critical binding features of formerly reported inhibitors. This filtration process resulted in identification of potential new inhibitors as well as formerly reported ones for the thumb II and palm I (HCV-81) NS5B binding sites. Finally, molecular dynamics simulations were carried out, confirming the binding hypothesis and resulting in 4 hits.
The main functional components of green tea, such as epigallocatechin gallate (EGCG), epigallocatechin (EGC), epicatechin gallate (ECG) and epicatechin (EC), are found to have a broad antineoplastic activity. The discovery of their targets plays an important role in revealing the antineoplastic mechanism. Therefore, to identify potential target proteins for tea polyphenols, we have taken a comparative virtual screening approach using two reverse docking systems, one based on AutoDock software and the other on TarFisDock. Two separate in silico workflows were implemented to derive a set of target proteins related to human diseases and ranked by the binding energy score. Several conventional clinically important proteins with anti-tumor effects are screened out from the PDTD protein database as the potential receptors by both procedures. To further analyze the validity of the docking results, we study the binding mode of EGCG and the potential target protein leukotriene A4 hydrolase in detail. We show that electrostatic interactions and hydrogen bonds play a key role in ligand binding. EGCG binds to the enzyme with a certain orientation and conformation that is suitable for nucleophilic attacks by several electrical residues inside the enzyme’s activity cavity. This study provides useful information for studying the antitumor mechanism of tea’s functional components. The comparative reverse docking strategy presented generates a tractable set of antineoplastic proteins for future experimental validation as drug targets against tumors.
tea polyphenols; reverse docking; target protein; binding mode; virtual screening
Cytochrome P450 enzymes are responsible for metabolizing many endogenous and xenobiotic molecules encountered by the human body. It has been estimated that 75% of all drugs are metabolized by cytochrome P450 enzymes. Thus, predicting a compound's potential sites of metabolism (SOM) is highly advantageous early in the drug development process. We have combined molecular dynamics, AutoDock Vina docking, the neighboring atom type (NAT) reactivity model, and a solvent-accessible surface-area term to form a reactivity-accessibility model capable of predicting SOM for cytochrome P450 2C9 substrates. To investigate the importance of protein flexibility during the ligand binding process, the results of SOM prediction using a static protein structure for docking were compared to SOM prediction using multiple protein structures in ensemble docking. The results reported here indicate that ensemble docking increases the number of ligands that can be docked in a bioactive conformation (ensemble: 96%, static: 85%) but only leads to a slight improvement (49% vs. 44%) in predicting an experimentally known SOM in the top-1 position for a ligand library of 75 CYP2C9 substrates. Using ensemble docking, the reactivity-accessibility model accurately predicts SOM in the top-1 ranked position for 49% of the ligand library and considering the top-3 predicted sites increases the prediction success rate to approximately 70% of the ligand library. Further classifying the substrate library according to Km values leads to an improvement in SOM prediction for substrates with low Km values (57% at top-1). While the current predictive power of the reactivity-accessibility model still leaves significant room for improvement, the results illustrate the usefulness of this method to identify key protein-ligand interactions and guide structural modifications of the ligand to increase its metabolic stability.
CYP2C9; metabolism; docking; protein flexibility; computational chemistry
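The top-1 and top-3 success rates quoted for the reactivity-accessibility model can be computed as in the sketch below. The per-atom score used here (reactivity plus a weighted accessibility term) is an assumed functional form chosen for illustration, since the abstract does not give the actual equation; atom labels and values are made up.

```python
def rank_sites(atoms, accessibility_weight=1.0):
    """Rank candidate atoms, most likely site of metabolism first.

    atoms: dict atom_id -> (reactivity, accessibility); the additive score
    is an assumption of this sketch, not the published model.
    """
    def score(atom_id):
        reactivity, accessibility = atoms[atom_id]
        return reactivity + accessibility_weight * accessibility
    return sorted(atoms, key=score, reverse=True)


def top_k_success(predictions, known_soms, k):
    """Fraction of ligands with a known SOM among their top-k predicted atoms."""
    hits = sum(1 for ligand, ranked in predictions.items()
               if set(ranked[:k]) & known_soms[ligand])
    return hits / len(predictions)


# Toy ligand with three candidate atoms.
ranking = rank_sites({"C3": (0.9, 0.1), "C7": (0.5, 0.8), "N1": (0.2, 0.2)})

# Two hypothetical ligands with experimentally known sites.
predictions = {"lig1": ranking, "lig2": ["C2", "C9", "C4"]}
known_soms = {"lig1": {"C3"}, "lig2": {"C2"}}
top1 = top_k_success(predictions, known_soms, 1)
top3 = top_k_success(predictions, known_soms, 3)
```

For `lig1` the known site `C3` is ranked second, so it counts as a miss at top-1 but a hit at top-3, mirroring the gap between the 49% and roughly 70% figures reported above.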
The constitutive androstane receptor (CAR, NR1I3) is a xenobiotic sensor governing the transcription of numerous hepatic genes associated with drug metabolism and clearance. Recent evidence suggests that CAR also modulates energy homeostasis and cancer development. Thus, identification of novel human (h) CAR activators is of both clinical importance and scientific interest.
Docking and ligand-based structure-activity models were used for virtual screening of a database containing over 2000 FDA-approved drugs. Identified lead compounds were evaluated in cell-based reporter assays to determine hCAR activation. Potential activators were further tested in human primary hepatocytes (HPHs) for the expression of the prototypical hCAR target gene CYP2B6.
Nineteen lead compounds with optimal modeling parameters were selected for biological evaluation. Seven of the 19 leads exhibited moderate to potent activation of hCAR. Five out of the seven compounds translocated hCAR from the cytoplasm to the nucleus of HPHs in a concentration-dependent manner. These compounds also induce the expression of CYP2B6 in HPHs with rank-order of efficacies closely resembling that of hCAR activation.
These results indicate that our strategically integrated approaches are effective in the identification of novel hCAR modulators, which may function as valuable research tools or potential therapeutic molecules.
CAR; Pharmacophore; CYP2B6; Induction; Hepatocytes
In this study, we aimed to develop a new ligand-based virtual screening approach using an effective shape-overlapping procedure and a more robust scoring function (denoted by the HWZ score for convenience). The HWZ score-based virtual screening approach was tested against the compounds for 40 protein targets available in the Database of Useful Decoys (DUD) (dud.docking.org/jahn/), and the virtual screening performance was evaluated in terms of the area under the ROC curve (AUC), Enrichment Factor (EF), and Hit Rate (HR), demonstrating an improved overall performance compared to other popularly used approaches examined. In particular, the HWZ score-based virtual screening led to an average AUC value of 0.84 ± 0.02 (95% confidence interval) for the 40 targets. The average HR values at top 1% and 10% of the active compounds for the 40 targets were 46.3% ± 6.7% and 59.2% ± 4.7%, respectively. In addition, the performance of the HWZ score-based virtual screening approach is less sensitive to the choice of the target.
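Two of the metrics named above, the area under the ROC curve and the enrichment factor, can be computed from a scored list of actives and decoys as in the sketch below. It uses the standard pairwise definition of AUC and the usual definition of enrichment factor; the hit rate is omitted because the abstract does not spell out its exact definition. Scores and labels are toy values, with higher scores assumed to be better.

```python
def roc_auc(scores, labels):
    """AUC as the chance that a random active outscores a random decoy."""
    actives = [s for s, is_active in zip(scores, labels) if is_active]
    decoys = [s for s, is_active in zip(scores, labels) if not is_active]
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


def enrichment_factor(scores, labels, fraction):
    """Active rate in the top fraction divided by the active rate overall."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_top = max(1, round(fraction * len(scores)))
    hits = sum(labels[i] for i in order[:n_top])
    return (hits / n_top) / (sum(labels) / len(labels))


# Ten compounds, three of them active (label 1).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

auc = roc_auc(scores, labels)
ef_20 = enrichment_factor(scores, labels, 0.20)
```

The pairwise AUC is quadratic in library size, which is fine for illustration; a rank-based formula would be preferable for DUD-scale sets.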
Reverse docking approaches have been explored in previous drug discovery studies to overcome some problems in traditional virtual screening. However, current reverse docking approaches are limited in that the target spaces of those studies were rather small and their applications were restricted to identifying new drug targets. In this study, we expanded the target space to the set of all protein structures currently available and developed several new applications of the reverse docking method.
We generated a 2D matrix of docking scores between all available protein structures in yeast and human and 35 well-known drugs. By clustering the docking profile data and comparing the result with fingerprint-based clustering of the drugs, we first showed that the docking profiles contained accurate information on the chemical properties of the drugs. Next, we showed that our method could be used to predict the druggability of target proteins. We also showed that a combination of sequence similarity and docking profile similarity could predict enzyme EC numbers more accurately than sequence similarity alone. In two case studies, 5-fluorouracil and cycloheximide, we showed that our method can successfully identify target proteins.
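The idea of clustering drugs by their docking profiles can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: each drug is represented by its row of the docking-score matrix, the distance between two drugs is taken as 1 minus the Pearson correlation of their rows, and single-linkage agglomerative clustering is used. The drug names, score values, and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long score profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("profile has zero variance")
    return cov / sqrt(vx * vy)


def cluster_profiles(profiles, n_clusters):
    """Single-linkage agglomerative clustering of docking profiles.

    `profiles` maps a drug name to its docking scores against the same
    ordered list of proteins. Distance = 1 - Pearson correlation.
    """
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    clusters = [{name} for name in profiles]

    def linkage(i, j):
        # Single linkage: distance between the closest members.
        return min(dist[frozenset((a, b))]
                   for a in clusters[i] for b in clusters[j])

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(*ij))
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


# Hypothetical docking scores (more negative = better) against 4 proteins.
profiles = {
    "drug_A": [-9.0, -5.0, -7.0, -4.0],
    "drug_B": [-8.5, -5.5, -7.2, -4.1],
    "drug_C": [-4.0, -8.0, -5.0, -9.0],
    "drug_D": [-4.5, -7.5, -5.2, -8.8],
}
clusters = cluster_profiles(profiles, 2)
print(sorted(sorted(c) for c in clusters))
```

A correlation-based distance compares the shape of a profile across proteins rather than its absolute score level, which is one plausible choice when drugs differ systematically in size and therefore in raw docking score.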
By using a large number of protein structures, we improved the sensitivity of reverse docking and showed that using as many protein structures as possible is important for finding real binding targets.
Protein-protein docking, which aims to predict the structure of a protein-protein complex from its unbound components, remains an unresolved challenge in structural bioinformatics. An important step is the ranking of docked poses using a scoring function, for which many methods have been developed. There is a need to explore the differences and commonalities of these methods with each other, as well as with functions developed in the fields of molecular dynamics and homology modelling.
We present an evaluation of 115 scoring functions on an unbound docking decoy benchmark covering 118 complexes for which a near-native solution can be found, yielding top-10 success rates of up to 58%. Hierarchical clustering is performed to group together functions that identify near-natives in similar subsets of complexes. Three set-theoretic approaches are used to identify pairs of scoring functions capable of correctly scoring different complexes. This shows that functions in different clusters capture different aspects of binding and are likely to work together synergistically.
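A top-N success rate and a pairwise comparison of scoring functions can be sketched in a few lines of Python. The three measures shown here (union size, symmetric difference, and Jaccard overlap) are illustrative assumptions only; the text above does not specify which three set-theoretic measures were used. The data layout, function names, and toy rankings are likewise hypothetical.

```python
def top_n_successes(rankings, n=10):
    """Set of complexes with at least one near-native pose in the top n.

    `rankings` maps a complex id to a list of booleans, ordered from
    best-scored pose to worst, where True marks a near-native pose.
    """
    return {cid for cid, flags in rankings.items() if any(flags[:n])}


def complementarity(a, b):
    """Illustrative set-theoretic comparison of two success sets."""
    union = a | b
    return {
        "union": len(union),                 # complexes solved by either
        "symmetric_difference": len(a ^ b),  # solved by exactly one
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


# Toy rankings (hypothetical) for two scoring functions on four complexes.
func_x = {
    "c1": [True, False, False],
    "c2": [False, False, True],
    "c3": [False, True, False],
    "c4": [False, False, False],
}
func_y = {
    "c1": [False, False, True],
    "c2": [True, False, False],
    "c3": [False, True, False],
    "c4": [False, False, False],
}

solved_x = top_n_successes(func_x, n=2)
solved_y = top_n_successes(func_y, n=2)
print(len(solved_x) / len(func_x))          # top-2 success rate of func_x
print(complementarity(solved_x, solved_y))
```

A pair with a large union and a large symmetric difference succeeds on different complexes, which is the kind of pair the text describes as likely to work together synergistically.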
All functions designed specifically for docking perform well, indicating that functions are transferable between sampling methods. We also identify promising methods from the field of homology modelling. Further, differential success rates by docking difficulty and solution quality suggest a need for flexibility-dependent scoring. When pairs of scoring functions are investigated, the set-theoretic measures identify known scoring strategies as well as a number of novel approaches, indicating promising augmentations of traditional scoring methods. Such augmentation and parameter combination strategies are discussed in the context of the learning-to-rank paradigm.
Docking; Scoring functions; Binding energy; Ranking; SwarmDock
Virtual screening methods are becoming well established as effective approaches to identify hits, candidates and leads for drug discovery research. Among these, structure-based virtual screening (SBVS) approaches aim at docking collections of small compounds into the target structure to identify potent compounds. For SBVS, the identification of candidate pockets in protein structures is a key step, and recent years have seen increasing interest in developing methods for pocket and cavity detection on protein surfaces.
Fpocket is an open source pocket detection package based on Voronoi tessellation and alpha spheres, built on top of the publicly available package Qhull. The modular source code is organised around a central library of functions, which is the basis for three main programs: (i) Fpocket, to perform pocket identification, (ii) Tpocket, to organise pocket detection benchmarking on a set of known protein-ligand complexes, and (iii) Dpocket, to collect pocket descriptor values on a set of proteins. Fpocket is written in the C programming language, which makes it well suited to members of the scientific community who wish to develop new scoring functions and extract various pocket descriptors on a large scale. Fpocket 1.0, relying on a simple scoring function, is able to detect 94% and 92% of the pockets within the best three ranked pockets from the holo and apo proteins respectively, outperforming the standards of the field while being faster.
Fpocket provides a rapid, open source and stable basis for further developments related to protein pocket detection, efficient pocket descriptor extraction, or druggability prediction. Fpocket is freely available under the GNU GPL license.