Related Articles

1.  An Efficient Computational Method for Calculating Ligand Binding Affinities 
PLoS ONE  2012;7(8):e42846.
Virtual compound screening using molecular docking is widely used in the discovery of new lead compounds for drug design. However, the docking scores are not sufficiently precise to represent the protein-ligand binding affinity. Here, we developed an efficient computational method for calculating protein-ligand binding affinity, which is based on molecular mechanics generalized Born/surface area (MM-GBSA) calculations and the Jarzynski identity. The Jarzynski identity is an exact relation between free energy differences and the work done through a non-equilibrium process, and MM-GBSA is a semimacroscopic approach to calculate the potential energy. To calculate the work distribution when a ligand is pulled out of its binding site, multiple protein-ligand conformations are randomly generated as an alternative to performing an explicit single-molecule pulling simulation. We assessed the new method, multiple random conformation/MM-GBSA (MRC-MMGBSA), by evaluating ligand-binding affinities (scores) for four target proteins, and comparing these scores with experimental data. The calculated scores were qualitatively in good agreement with the experimental binding affinities, and the optimal docking structure could be determined by ranking the scores of the multiple docking poses obtained by the molecular docking process. Furthermore, the scores showed a strong linear response to experimental binding free energies, so that the free energy difference of the ligand binding (ΔΔG) could be calculated by linear scaling of the scores. The error of calculated ΔΔG was within ≈±1.5 kcal·mol⁻¹ of the experimental values. In particular, for flexible target proteins, the MRC-MMGBSA scores were more effective in ranking ligands than those generated by the MM-GBSA method using a single protein-ligand conformation.
The results suggest that, owing to its lower computational costs and greater accuracy, the MRC-MMGBSA offers efficient means to rank the ligands, in the post-docking process, according to their binding affinities, and to compare these directly with the experimental values.
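The Jarzynski identity named in this abstract is usually written as ΔG = −kT ln⟨exp(−W/kT)⟩, where W is the work done in each non-equilibrium realisation. The following is a minimal sketch of that exponential average only, not the MRC-MMGBSA procedure itself. The function name, the kcal/mol units, and the 300 K default are assumptions made for illustration.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def jarzynski_free_energy(work_values, temperature=300.0):
    """Estimate dG = -kT * ln(<exp(-W/kT)>) from sampled work values (kcal/mol)."""
    if not work_values:
        raise ValueError("need at least one work value")
    kt = K_B * temperature
    # Shift by the smallest work value so the exponentials cannot overflow.
    w_min = min(work_values)
    total = sum(math.exp(-(w - w_min) / kt) for w in work_values)
    return w_min - kt * math.log(total / len(work_values))
```

Because the average is exponential, the estimate always falls between the smallest sampled work value and the arithmetic mean, and it is dominated by the low-work samples.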
PMCID: PMC3423425  PMID: 22916168
2.  Application of Consensus Scoring and Principal Component Analysis for Virtual Screening against β-Secretase (BACE-1) 
PLoS ONE  2012;7(6):e38086.
In order to identify novel chemical classes of β-secretase (BACE-1) inhibitors, an alternative scoring protocol, Principal Component Analysis (PCA), was proposed to summarize most of the information from the original scoring functions and re-rank the results from the virtual screening against BACE-1.
Given a training set (50 BACE-1 inhibitors and 9950 inactive diverse compounds), three rank-based virtual screening methods (individual scoring, conventional consensus scoring, and PCA) were judged by the number of hits in the top 1% of the ranked list. The docking poses were generated by Surflex, and five scoring functions (Surflex_Score, D_Score, G_Score, ChemScore, and PMF_Score) were used for pose extraction. For each pose group, twelve scoring functions (Surflex_Score, D_Score, G_Score, ChemScore, PMF_Score, LigScore1, LigScore2, PLP1, PLP2, Jain, Ludi_1, and Ludi_2) were used for pose ranking. For the test set, 113,228 chemical compounds (Sigma-Aldrich® corporate chemical directory) were docked by Surflex and then ranked by the same three ranking methods mentioned above to select potential active compounds for experimental testing.
For the training set, the PCA approach yielded consistently superior rankings compared to conventional consensus scoring and single scoring. For the test set, the top 20 compounds according to conventional consensus scoring were experimentally tested, and no inhibitor was found. We then relied on the PCA scoring protocol to test a different set of top 20 compounds, and two low-micromolar inhibitors (S450588 and 276065) emerged from the BACE-1 fluorescence resonance energy transfer (FRET) assay.
The PCA method extends the conventional consensus scoring in a quantitative statistical manner and would appear to have considerable potential for chemical screening applications.
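The abstract does not spell out how the conventional consensus score is formed, so the sketch below assumes the common rank-averaging variant and then counts hits in the top fraction of the list, as in the top-1% criterion above. All function names are hypothetical. Higher scores are assumed to be better, so any function where lower is better would need its sign flipped first. The PCA step is not shown.

```python
def ranks(scores):
    """Return 1-based ranks for a list of scores, best (highest) score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def consensus_rank(score_table):
    """Average each compound's rank over all scoring functions.

    score_table maps a scoring-function name to a list of scores,
    one per compound, in the same compound order.
    """
    per_function = [ranks(scores) for scores in score_table.values()]
    n_compounds = len(per_function[0])
    return [sum(r[i] for r in per_function) / len(per_function)
            for i in range(n_compounds)]

def hits_in_top(mean_ranks, is_active, fraction=0.01):
    """Count actives among the top `fraction` of compounds (lowest mean rank first)."""
    n_top = max(1, int(len(mean_ranks) * fraction))
    order = sorted(range(len(mean_ranks)), key=lambda i: mean_ranks[i])
    return sum(1 for i in order[:n_top] if is_active[i])
```

For a toy table `{"f1": [9.0, 2.0, 5.0, 1.0], "f2": [8.0, 7.0, 3.0, 1.0]}`, `consensus_rank` gives mean ranks `[1.0, 2.5, 2.5, 4.0]`.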
PMCID: PMC3372491  PMID: 22701601
3.  PDTD: a web-accessible protein database for drug target identification 
BMC Bioinformatics  2008;9:104.
Target identification is important for modern drug discovery. With the advances in the development of molecular docking, potential binding proteins may be discovered by docking a small molecule to a repository of proteins with three-dimensional (3D) structures. To complete this task, a reverse docking program and a drug target database with 3D structures are necessary. To this end, we have developed a web server tool, TarFisDock (Target Fishing Docking), which has been used widely by others. Recently, we have constructed a protein target database, Potential Drug Target Database (PDTD), and have integrated PDTD with TarFisDock. This combination aims to assist target identification and validation.
PDTD is a web-accessible protein database for in silico target identification. It currently contains >1100 protein entries with 3D structures presented in the Protein Data Bank. The data are extracted from the literature and several online databases such as TTD, DrugBank and Thomson Pharma. The database covers diverse information on >830 known or potential drug targets, including protein and active-site structures in both PDB and mol2 formats, related diseases, biological functions, and associated regulating (signaling) pathways. Each target is categorized by both nosology and biochemical function. PDTD supports keyword searches, such as by PDB ID, target name, and disease name. Data sets generated by PDTD can be viewed with plug-ins for molecular visualization tools and can also be downloaded freely. Notably, PDTD is specially designed for target identification. In conjunction with TarFisDock, PDTD can be used to identify binding proteins for small molecules. The results can be downloaded in the form of a mol2 file with the binding pose of the probe compound and a list of potential binding targets according to their ranking scores.
PDTD serves as a comprehensive and unique repository of drug targets. Integrated with TarFisDock, PDTD is a useful resource to identify binding proteins for active compounds or existing drugs. Its potential applications include in silico drug target identification, virtual screening, and the discovery of the secondary effects of an old drug (i.e. new pharmacological usage) or an existing target (i.e. new pharmacological or toxic relevance), thus it may be a valuable platform for the pharmaceutical researchers. PDTD is available online at .
PMCID: PMC2265675  PMID: 18282303
4.  Implementation and Evaluation of a Docking-Rescoring Method using Molecular Footprint Comparisons 
Journal of computational chemistry  2011;10.1002/jcc.21814.
A docking-rescoring method, based on per-residue van der Waals (VDW), electrostatic (ES), or hydrogen bond (HB) energies has been developed to aid discovery of ligands that have interaction signatures with a target (footprints) similar to that of a reference. Biologically useful references could include known drugs, inhibitors, substrates, transition states, or side-chains that mediate protein-protein interactions. Termed footprint similarity (FPS) score, the method, as implemented in the program DOCK, was validated and characterized using: (1) pose identification, (2) crossdocking, (3) enrichment, and (4) virtual screening. Improvements in pose identification (6–12%) were obtained using footprint-based (FPS_VDW+ES) vs standard DOCK (DCE_VDW+ES) scoring as evaluated on three large datasets (680–775 systems) from the SB2010 database. Enhanced pose identification was also observed using FPS (45.4% or 70.9%) compared with DCE (17.8%) methods to rank challenging crossdocking ensembles from carbonic anhydrase. Enrichment tests, for three representative systems, revealed FPS_VDW+ES scoring yields significant early fold enrichment in the top 10% of ranked databases. For EGFR, top FPS poses are nicely accommodated in the molecular envelope defined by the reference in comparison with DCE which yields distinct molecular weight bias towards larger molecules. Results from a representative virtual screen of ca. 1 million compounds additionally illustrate how ligands with footprints similar to a known inhibitor can readily be identified from within large commercially available databases. By providing an alternative way to rank ligand poses in a simple yet directed manner we anticipate that FPS scoring will be a useful tool for docking and structure-based design.
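A footprint can be pictured as a vector of per-residue interaction energies, and this entry's keywords list Euclidean distance and Pearson correlation as comparison measures. The sketch below implements those two generic measures on plain Python lists. It is not DOCK's implementation, and the function names and input layout are assumptions.

```python
import math

def euclidean_distance(footprint, reference):
    """Euclidean distance between two per-residue energy vectors (smaller = more similar)."""
    if len(footprint) != len(reference):
        raise ValueError("footprints must cover the same residues")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(footprint, reference)))

def pearson_correlation(footprint, reference):
    """Pearson correlation between two per-residue energy vectors (closer to 1 = more similar)."""
    if len(footprint) != len(reference):
        raise ValueError("footprints must cover the same residues")
    n = len(footprint)
    mean_a = sum(footprint) / n
    mean_b = sum(reference) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(footprint, reference))
    var_a = sum((a - mean_a) ** 2 for a in footprint)
    var_b = sum((b - mean_b) ** 2 for b in reference)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant footprint")
    return cov / math.sqrt(var_a * var_b)
```

The two measures answer different questions: the distance is sensitive to the magnitude of each per-residue energy, while the correlation only compares the shape of the pattern across residues.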
PMCID: PMC3181325  PMID: 21541962
Molecular Footprints; Molecular Fingerprints; Pose Comparison; Pose Rescoring; Docking; Virtual Screening; Enrichment; ROC Curves; Euclidean Distance; Pearson Correlation
5.  Can the Energy Gap in the Protein-Ligand Binding Energy Landscape Be Used as a Descriptor in Virtual Ligand Screening? 
PLoS ONE  2012;7(10):e46532.
The ranking of scores of individual chemicals within a large screening library is a crucial step in virtual screening (VS) for drug discovery. Previous studies showed that the quality of protein-ligand recognition can be improved using spectrum properties and the shape of the binding energy landscape. Here, we investigate whether the energy gap, defined as the difference between the lowest energy pose generated by a docking experiment and the average energy of all other generated poses and inferred to be a measure of the binding energy landscape sharpness, can improve the separation power between true binders and decoys with respect to the use of the best docking score. We performed retrospective single- and multiple-receptor conformation VS experiments in a diverse benchmark of 40 domains from 38 therapeutically relevant protein targets. Also, we tested the performance of the energy gap on 36 protein targets from the Directory of Useful Decoys (DUD). The results indicate that the energy gap outperforms the best docking score in its ability to discriminate between true binders and decoys, and true binders tend to have larger energy gaps than decoys. Furthermore, we used the energy gap as a descriptor to measure the height of the native binding phase and obtained a significant increase in the success rate of near native binding pose identification when the ligand binding conformations within the boundaries of the native binding phase were considered. The performance of the energy gap was also evaluated on an independent test case of VS-identified PKR-like ER-localized eIF2α kinase (PERK) inhibitors. We found that the energy gap was superior to the best docking score in its ability to more highly rank active compounds from inactive ones. These results suggest that the energy gap of the protein-ligand binding energy landscape is a valuable descriptor for use in VS.
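The energy gap defined in this abstract (lowest-energy pose versus the average of all other poses) reduces to a few lines. The sketch below assumes one docking energy per pose, with lower values meaning better binding. The sign convention, chosen so that a sharper landscape gives a larger positive gap, is also an assumption.

```python
def energy_gap(pose_energies):
    """Mean energy of all other poses minus the lowest pose energy.

    Sign convention (an assumption): the result is positive when the best
    pose lies well below the rest, so a larger gap means a sharper landscape.
    """
    if len(pose_energies) < 2:
        raise ValueError("need at least two poses")
    ordered = sorted(pose_energies)
    lowest, others = ordered[0], ordered[1:]
    return sum(others) / len(others) - lowest
```

For example, `energy_gap([-10.0, -4.0, -6.0])` returns `5.0`, while a flat set of poses such as `[-7.0, -7.0, -7.0]` returns `0.0`.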
PMCID: PMC3468575  PMID: 23071584
6.  Scoring functions and enrichment: a case study on Hsp90 
BMC Bioinformatics  2007;8:27.
The need for fast and accurate scoring functions has been driven by the increased use of in silico virtual screening twinned with high-throughput screening as a method to rapidly identify potential candidates in the early stages of drug development. We examine the ability of some of the most common scoring functions (GOLD, ChemScore, DOCK, PMF, BLEEP and Consensus) to discriminate correctly and efficiently between active and non-active compounds among a library of ~3,600 diverse decoy compounds in a virtual screening experiment against heat shock protein 90 (Hsp90).
Firstly, we investigated two ranking methodologies, GOLDrank and BestScorerank. GOLDrank is based on ranks generated using GOLD. The various scoring functions, GOLD, ChemScore, DOCK, PMF, BLEEP and Consensus, are applied to the pose ranked number one by GOLD for that ligand. BestScorerank uses multiple poses for each ligand and independently chooses the best ranked pose of the ligand according to each different scoring function. Secondly, we considered the effect of introducing the Thr184 hydrogen bond tether to guide the docking process towards a particular solution, and its effect on enrichment. Thirdly, we considered normalisation to account for the known bias of scoring functions to select larger molecules. All the scoring functions gave fairly similar enrichments, with the exception of PMF which was consistently the poorest performer. In most cases, GOLD was marginally the best performing individual function; the Consensus score usually performed similarly to the best single scoring function. Our best results were obtained using the Thr184 tether in combination with the BestScorerank protocol and normalisation for molecular weight. For that particular combination, DOCK was the best individual function; DOCK recovered 90% of the actives in the top 10% of the ranked list; Consensus similarly recovered 89% of the actives in its top 10%.
Overall, we demonstrate the validity of virtual screening as a method for identifying new leads from a pool of ligands with similar physicochemical properties and we believe that the outcome of this study provides useful insight into the setting up of a suitable docking and scoring protocol, resulting in enrichment of 'target active' compounds.
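The difference between the two ranking methodologies described above comes down to which pose each scoring function gets to see. In the sketch below, each ligand is a list of poses and each pose is a dictionary mapping a scoring-function name to its score. That data layout, the function names, and the assumption that higher scores are better are all illustrative.

```python
def gold_rank_scores(poses, functions):
    """GOLDrank-style: every function scores only the pose that GOLD ranks first.

    poses is a list of dicts mapping scoring-function name to score;
    higher is assumed to be better for every function.
    """
    top_pose = max(poses, key=lambda pose: pose["GOLD"])
    return {name: top_pose[name] for name in functions}

def best_score_rank_scores(poses, functions):
    """BestScorerank-style: each function independently picks its own best pose."""
    return {name: max(pose[name] for pose in poses) for name in functions}
```

With poses `[{"GOLD": 60.0, "ChemScore": 20.0}, {"GOLD": 55.0, "ChemScore": 31.0}]`, the GOLDrank-style ChemScore is `20.0` (taken from GOLD's top pose), whereas the BestScorerank-style ChemScore is `31.0`.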
PMCID: PMC1790905  PMID: 17257425
7.  Validation of Molecular Docking Programs for Virtual Screening against Dihydropteroate Synthase 
Dihydropteroate synthase (DHPS) is the target of the sulfonamide class of antibiotics and has been a validated antibacterial drug target for nearly 70 years. The sulfonamides target the p-aminobenzoic acid (pABA) binding site of DHPS and interfere with folate biosynthesis and ultimately prevent bacterial replication. However, widespread bacterial resistance to these drugs has severely limited their effectiveness. This study explores the second and more highly conserved pterin binding site of DHPS as an alternative approach to developing novel antibiotics that avoid resistance. In this study, five commonly-used docking programs, FlexX, Surflex, Glide, GOLD, and DOCK, and nine scoring functions, were evaluated for their ability to rank-order potential lead compounds for an extensive virtual screening study of the pterin binding site of B. anthracis DHPS. Their performance in ligand docking and scoring was judged by their ability to reproduce a known inhibitor conformation and to efficiently detect known active compounds seeded into three separate decoy sets. Two other metrics were used to assess performance; enrichment at 1% and 2%, and Receiver Operating Characteristic (ROC) curves. The effectiveness of post-docking relaxation prior to rescoring and consensus scoring were also evaluated. Finally, we have developed a straightforward statistical method of including the inhibition constants of the known active compounds when analyzing enrichment results to more accurately assess scoring performance, which we call the ‘sum of the sum of log rank’ or SSLR. Of the docking and scoring functions evaluated, Surflex with Surflex-Score and Glide with GlideScore were the best overall performers for use in virtual screening against the DHPS target, with neither combination showing statistically significant superiority over the other in enrichment studies or pose selection. Post-docking ligand relaxation and consensus scoring did not improve overall enrichment.
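Two of the metrics named in this abstract, ROC curves and enrichment at a fixed percentage, can be sketched generically as follows. The area under the ROC curve is computed by pairwise comparison of actives and decoys. "Enrichment" is assumed here to mean the usual enrichment factor (active rate in the top fraction divided by the overall active rate), which the abstract does not state. The SSLR statistic is not reproduced because its formula is not given here. Function names are hypothetical and higher scores are assumed to be better.

```python
def roc_auc(scores, is_active):
    """Area under the ROC curve via pairwise comparison of actives and decoys."""
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5  # ties count as half a win
    return wins / (len(actives) * len(decoys))

def enrichment_factor(scores, is_active, fraction):
    """Active rate in the top `fraction` of the ranked list divided by the overall active rate."""
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("need at least one active")
    n_top = max(1, int(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_actives = sum(1 for i in order[:n_top] if is_active[i])
    return (top_actives / n_top) / (total_actives / len(scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to all actives ranked above all decoys. An enrichment factor of 1.0 likewise means no improvement over random selection.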
PMCID: PMC2788795  PMID: 19434845
8.  AMMOS: Automated Molecular Mechanics Optimization tool for in silico Screening 
BMC Bioinformatics  2008;9:438.
Virtual or in silico ligand screening combined with other computational methods is one of the most promising methods to search for new lead compounds, thereby greatly assisting the drug discovery process. Despite considerable progress in virtual screening methodologies, available computer programs do not easily address problems such as structural optimization of compounds in a screening library, receptor flexibility/induced fit, and accurate prediction of protein-ligand interactions. It has been shown that structural optimization of chemical compounds and post-docking optimization in multi-step structure-based virtual screening approaches help to further improve the overall efficiency of the methods. To address some of these points, we developed the program AMMOS for refining both the 3D structures of the small molecules present in chemical libraries and the predicted receptor-ligand complexes, by allowing partial to full atom flexibility through molecular mechanics optimization.
The program AMMOS carries out an automatic procedure that allows for the structural refinement of compound collections and energy minimization of protein-ligand complexes using the open source program AMMP. The performance of our package was evaluated by comparing the structures of small chemical entities minimized by AMMOS with those minimized with the Tripos and MMFF94s force fields. Next, AMMOS was used for fully flexible minimization of protein-ligand complexes obtained from a multi-step virtual screening. Enrichment studies of the selected pre-docked complexes containing 60% of the initially added inhibitors were carried out with or without final AMMOS minimization on two protein targets having different binding pocket properties. AMMOS was able to improve the enrichment after the pre-docking stage with 40 to 60% of the initially added active compounds found in the top 3% to 5% of the entire compound collection.
The open source AMMOS program can be helpful in a broad range of in silico drug design studies such as optimization of small molecules or energy minimization of pre-docked protein-ligand complexes. Our enrichment study suggests that AMMOS, designed to minimize a large number of ligands pre-docked in a protein target, can successfully be applied in a final post-processing step and that it can take into account some receptor flexibility within the binding site area.
PMCID: PMC2588602  PMID: 18925937
9.  Combining docking with pharmacophore filtering for improved virtual screening 
Virtual screening is used to distinguish potential leads from inactive compounds in a database of chemical samples. One method for accomplishing this is by docking compounds into the structure of a receptor binding site in order to rank-order compounds by the quality of the interactions they form with the receptor. It is generally established that docking can be reasonably successful at generating good poses of a ligand in an active site. However, the scoring functions that are used with docking are typically not successful at correctly ranking ligands according to binding affinity or even distinguishing correct poses of a given ligand from incorrect ones.
We have developed a simple method for reducing the number of false positives in a virtual screen, meaning ligands which are scored highly by the docking program but do not bind well in reality. This method uses a docking program for pose generation without regard to scoring, followed by filtering with receptor-based pharmacophore searches. We applied it to three test-case targets: neuraminidase A, cyclin-dependent kinase 2, and the C1 domain of protein kinase C.
The pharmacophore filtering method can perform better than more traditional docking + scoring methods, and allows the advantages of both docking-based and pharmacophore-based approaches to virtual screening to be fully realized.
PMCID: PMC3152774  PMID: 20298524
10.  Cheminformatics Meets Molecular Mechanics: A Combined Application of Knowledge-based Pose Scoring and Physical Force Field-based Hit Scoring Functions Improves the Accuracy of Structure-Based Virtual Screening 
Poor performance of scoring functions is a well-known bottleneck in structure-based virtual screening, which is most frequently manifested in the scoring functions’ inability to discriminate between true ligands versus known non-binders (therefore designated as binding decoys). This deficiency leads to a large number of false positive hits resulting from virtual screening. We have hypothesized that filtering out or penalizing docking poses recognized as non-native (i.e., pose decoys) should improve the performance of virtual screening in terms of improved identification of true binders. Using several concepts from the field of cheminformatics, we have developed a novel approach to identifying pose decoys from an ensemble of poses generated by computational docking procedures. We demonstrate that the use of a target-specific pose (-scoring) filter in combination with a physical force field-based scoring function (MedusaScore) leads to significant improvement of hit rates in virtual screening studies for 12 of the 13 benchmark sets from the clustered version of the Database of Useful Decoys (DUD). This new hybrid scoring function outperforms several conventional structure-based scoring functions, including XSCORE::HMSCORE, ChemScore, PLP, and Chemgauss3, in six out of 13 data sets at the early stage of VS (up to 1% decoys of the screening database). We compare our hybrid method with several novel VS methods that were recently reported to have good performances on the same DUD data sets. We find that the retrieved ligands using our method are chemically more diverse in comparison with two ligand-based methods (FieldScreen and FLAP::LBX). We also compare our method with FLAP::RBLB, a high-performance VS method that also utilizes both the receptor and the cognate ligand structures. Interestingly, we find that the top ligands retrieved using our method are highly complementary to those retrieved using FLAP::RBLB, hinting at effective directions for best VS applications.
We suggest that this integrative virtual screening approach combining cheminformatics and molecular mechanics methodologies may be applied to a broad variety of protein targets to improve the outcome of structure-based drug discovery studies.
PMCID: PMC3264743  PMID: 22017385
11.  pDOCK: a new technique for rapid and accurate docking of peptide ligands to Major Histocompatibility Complexes 
Immunome Research  2010;6(Suppl 1):S2.
Identification of antigenic peptide epitopes is an essential prerequisite in T cell-based molecular vaccine design. Computational (sequence-based and structure-based) methods are inexpensive and efficient compared to experimental approaches in screening numerous peptides against their cognate MHC alleles. In structure-based protocols, suited to alleles with limited epitope data, the first step is to identify high-binding peptides using docking techniques, which need improvement in speed and efficiency to be useful in large-scale screening studies. We present pDOCK: a new computational technique for rapid and accurate docking of flexible peptides to MHC receptors and primarily apply it on a non-redundant dataset of 186 pMHC (MHC-I and MHC-II) complexes with X-ray crystal structures.
We have compared our docked structures with experimental crystallographic structures for the immunologically relevant nonameric core of the bound peptide for MHC-I and MHC-II complexes. Primary testing for re-docking of peptides into their respective MHC grooves generated 159 out of 186 peptides with Cα RMSD of less than 1.00 Å, with a mean of 0.56 Å. Amongst the 25 peptides used for single and variant template docking, the Cα RMSD values were below 1.00 Å for 23 peptides. Compared to our earlier docking methodology, pDOCK shows up to a 2.5-fold improvement in accuracy and is ~60% faster. Results of validation against previously published studies represent a seven-fold increase in pDOCK accuracy.
The limitations of our previous methodology have been addressed in the new docking protocol, making it a rapid and accurate method to evaluate pMHC binding. pDOCK is a generic method and, although benchmarked against experimental structures, it can be applied to alleles with no structural data using sequence information. Our outcomes establish the efficacy of our procedure to predict highly accurate peptide structures, permitting conformational sampling of the peptide in the MHC binding groove. Our results also support the applicability of pDOCK for in silico identification of promiscuous peptide epitopes that are relevant to higher proportions of the human population with greater propensity to activate T cells, making them key targets for the design of vaccines and immunotherapies.
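The Cα RMSD figures quoted above compare matched Cα atoms of a docked peptide and its crystal structure. A minimal sketch of that calculation follows. It assumes the two coordinate lists are already in the same reference frame and in the same atom order, and it performs no superposition, which may differ from how the authors computed their values. The function name is hypothetical.

```python
import math

def ca_rmsd(docked, crystal):
    """RMSD between matched C-alpha coordinates, each an (x, y, z) tuple in angstroms.

    No superposition is performed; both structures are assumed to share one frame.
    """
    if len(docked) != len(crystal) or not docked:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(docked, crystal)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(docked))
```

For instance, two atoms that are each displaced by exactly 1 Å along one axis give an RMSD of `1.0`.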
PMCID: PMC2946780  PMID: 20875153
12.  Analysis of multiple compound–protein interactions reveals novel bioactive molecules 
The authors use machine learning of compound-protein interactions to explore drug polypharmacology and to efficiently identify bioactive ligands, including novel scaffold-hopping compounds for two pharmaceutically important protein families: G-protein coupled receptors and protein kinases.
We have demonstrated that machine learning of multiple compound–protein interactions is useful for efficient ligand screening and for assessing drug polypharmacology. This approach successfully identified novel scaffold-hopping compounds for two pharmaceutically important protein families: G-protein-coupled receptors and protein kinases. These bioactive compounds were not detected by existing computational ligand-screening methods in comparative studies. The results of this study indicate that data derived from chemical genomics can be highly useful for exploring chemical space, and this systems biology perspective could accelerate drug discovery processes.
The discovery of novel bioactive molecules advances our systems-level understanding of biological processes and is crucial for innovation in drug development. Perturbations of biological systems by chemical probes provide broader applications not only for analysis of complex systems but also for intentional manipulations of these systems. Nevertheless, the lack of well-characterized chemical modulators has limited their use. Recently, chemical genomics has emerged as a promising area of research applicable to the exploration of novel bioactive molecules, and researchers are currently striving toward the identification of all possible ligands for all target protein families (Wang et al, 2009). Chemical genomics studies have shown that patterns of compound–protein interactions (CPIs) are too diverse to be understood as simple one-to-one events. There is an urgent need to develop appropriate data mining methods for characterizing and visualizing the full complexity of interactions between chemical space and biological systems. However, no existing screening approach has so far succeeded in identifying novel bioactive compounds using multiple interactions among compounds and target proteins.
High-throughput screening (HTS) and computational screening have greatly aided in the identification of early lead compounds for drug discovery. However, the large number of assays required for HTS to identify drugs that target multiple proteins render this process very costly and time-consuming. Therefore, interest in using in silico strategies for screening has increased. The most common computational approaches, ligand-based virtual screening (LBVS) and structure-based virtual screening (SBVS; Oprea and Matter, 2004; Muegge and Oloff, 2006; McInnes, 2007; Figure 1A), have been used for practical drug development. LBVS aims to identify molecules that are very similar to known active molecules and generally has difficulty identifying compounds with novel structural scaffolds that differ from reference molecules. The other popular strategy, SBVS, is constrained by the number of three-dimensional crystallographic structures available. To circumvent these limitations, we have shown that a new computational screening strategy, chemical genomics-based virtual screening (CGBVS), has the potential to identify novel, scaffold-hopping compounds and assess their polypharmacology by using a machine-learning method to recognize conserved molecular patterns in comprehensive CPI data sets.
The CGBVS strategy used in this study was made up of five steps: CPI data collection, descriptor calculation, representation of interaction vectors, predictive model construction using training data sets, and predictions from test data (Figure 1A). Importantly, step 1, the construction of a data set of chemical structures and protein sequences for known CPIs, did not require the three-dimensional protein structures needed for SBVS. In step 2, compound structures and protein sequences were converted into numerical descriptors. These descriptors were used to construct chemical or biological spaces in which decreasing distance between vectors corresponded to increasing similarity of compound structures or protein sequences. In step 3, we represented multiple CPI patterns by concatenating these chemical and protein descriptors. Using these interaction vectors, we could quantify the similarity of molecular interactions for compound–protein pairs, despite the fact that the ligand and protein similarity maps differed substantially. In step 4, concatenated vectors for CPI pairs (positive samples) and non-interacting pairs (negative samples) were input into an established machine-learning method. In the final step, the classifier constructed using training sets was applied to test data.
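Steps 3 to 5 above can be illustrated with a toy sketch. Interaction vectors are built by concatenating a compound descriptor with a protein descriptor. Because this summary does not name the machine-learning method, a 1-nearest-neighbour rule is used purely as a stand-in classifier. The descriptors, function names, and classifier choice are all assumptions and not the authors' implementation.

```python
def interaction_vector(compound_descriptor, protein_descriptor):
    """Step 3: concatenate compound and protein descriptors into one vector."""
    return list(compound_descriptor) + list(protein_descriptor)

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def predict_interaction(training_pairs, query_vector):
    """Steps 4-5 with a 1-nearest-neighbour stand-in for the classifier.

    training_pairs is a list of (interaction_vector, label) tuples, where
    label is True for interacting pairs and False for non-interacting ones.
    """
    _, nearest_label = min(
        training_pairs,
        key=lambda pair: squared_distance(pair[0], query_vector),
    )
    return nearest_label
```

The point of the concatenation is that similarity is measured on the compound and the protein jointly, so a known interaction with one protein can inform predictions for a related compound paired with a related protein.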
To evaluate the predictive value of CGBVS, we first compared its performance with that of LBVS by fivefold cross-validation. CGBVS performed with considerably higher accuracy (91.9%) than did LBVS (84.4%; Figure 1B). We next compared CGBVS and SBVS in a retrospective virtual screening based on the human β2-adrenergic receptor (ADRB2). Figure 1C shows that CGBVS provided higher hit rates than did SBVS. These results suggest that CGBVS is more successful than conventional approaches for prediction of CPIs.
We then evaluated the ability of the CGBVS method to predict the polypharmacology of ADRB2 by attempting to identify novel ADRB2 ligands from a group of G-protein-coupled receptor (GPCR) ligands. We ranked the prediction scores for the interactions of 826 reported GPCR ligands with ADRB2 and then analyzed the 50 highest-ranked compounds in greater detail. Of 21 commercially available compounds, 11 showed ADRB2-binding activity and were not previously reported to be ADRB2 ligands. These compounds included ligands not only for aminergic receptors but also for neuropeptide Y-type 1 receptors (NPY1R), which have low protein homology to ADRB2. Most ligands we identified were not detected by LBVS and SBVS, which suggests that only CGBVS could identify this unexpected cross-reaction for a ligand developed as a target to a peptidergic receptor.
The true value of CGBVS in drug discovery must be tested by assessing whether this method can identify scaffold-hopping lead compounds from a set of compounds that is structurally more diverse. To assess this ability, we analyzed 11 500 commercially available compounds to predict compounds likely to bind to two GPCRs and two protein kinases. Functional assays revealed that nine ADRB2 ligands, three NPY1R ligands, five epidermal growth factor receptor (EGFR) inhibitors, and two cyclin-dependent kinase 2 (CDK2) inhibitors were concentrated in the top-ranked compounds (hit rate=30, 15, 25, and 10%, respectively). We also evaluated the extent of scaffold hopping achieved in the identification of these novel ligands. One ADRB2 ligand, two NPY1R ligands, and one CDK2 inhibitor exhibited scaffold hopping (Figure 4), indicating that CGBVS can use this characteristic to rationally predict novel lead compounds, a crucial and very difficult step in drug discovery. This feature of CGBVS is critically different from existing predictive methods, such as LBVS, which depend on similarities between test and reference ligands, and focus on a single protein or highly homologous proteins. In particular, CGBVS is useful for targets with undefined ligands because this method can use CPIs with target proteins that exhibit lower levels of homology.
In summary, we have demonstrated that data mining of multiple CPIs is of great practical value for exploration of chemical space. As a predictive model, CGBVS could provide an important step in the discovery of such multi-target drugs by identifying the group of proteins targeted by a particular ligand, leading to innovation in pharmaceutical research.
The discovery of novel bioactive molecules advances our systems-level understanding of biological processes and is crucial for innovation in drug development. For this purpose, the emerging field of chemical genomics is currently focused on accumulating large assay data sets describing compound–protein interactions (CPIs). Although new target proteins for known drugs have recently been identified through mining of CPI databases, using these resources to identify novel ligands remains unexplored. Herein, we demonstrate that machine learning of multiple CPIs can not only assess drug polypharmacology but can also efficiently identify novel bioactive scaffold-hopping compounds. Through a machine-learning technique that uses multiple CPIs, we have successfully identified novel lead compounds for two pharmaceutically important protein families, G-protein-coupled receptors and protein kinases. These novel compounds were not identified by existing computational ligand-screening methods in comparative studies. The results of this study indicate that data derived from chemical genomics can be highly useful for exploring chemical space, and this systems biology perspective could accelerate drug discovery processes.
PMCID: PMC3094066  PMID: 21364574
chemical genomics; data mining; drug discovery; ligand screening; systems chemical biology
13.  Strategies for Lead Discovery: Application of Footprint Similarity Targeting HIVgp41 
Bioorganic & medicinal chemistry  2013;22(1):651-661.
A highly-conserved binding pocket on HIVgp41 is an important target for development of anti-viral inhibitors. Holden et al. (Bioorg. Med. Chem. Lett. 2012) recently reported 7 experimentally-verified leads identified through a computational screen to the gp41 pocket in conjunction with a new DOCK scoring method (termed FPS scoring) developed in our laboratory. The method employs molecular footprints based on per-residue van der Waals interactions, electrostatic interactions, or the sum. In this work, we critically examine the gp41 screening results, prioritized using different scoring methods, in terms of two main criteria: (1) ligand pose properties which include footprint and energy score decompositions, MW, number of rotatable bonds, ligand efficiency, formal charge, and volume overlap, and (2) ligand pose stability which includes footprint stability (changes in footprint overlap) and rmsd stability (changes in geometry). Relative to standard DOCK scoring, pose property analyses demonstrate how FPS scoring can be used to identify ligands that mimic a known reference (derived here from the native gp41 substrate), while pose stability analyses demonstrate how FPS scoring can be used to enrich for compounds with greater overall stability during molecular dynamics (MD) simulations. Compellingly, of the 115 compounds tested experimentally, the 7 active compounds, as a group, more closely mimic the footprints made by the reference and show greater MD stability compared to the inactive group. Extensive studies using 116 protein-ligand complexes as controls reveal that ligands in their crystallographic binding pose also maintain higher FPS scores and smaller rmsds than do accompanying decoys, confirming that native poses are indeed “stable” under the same conditions and that monitoring FPS variability during compound prioritization is likely to be beneficial. 
Overall, the results suggest the new scoring method will complement current virtual screening approaches for both the identification (FPS-ranking) and prioritization (FPS-stability) of target-compatible molecules in a quantitative and logical way.
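The footprint comparison described in this abstract can be illustrated with a small sketch. It assumes a footprint is simply a list of per-residue interaction energies, and it compares a pose with a reference using Euclidean distance and Pearson correlation, two common ways of comparing such vectors. The function names and the energy values are hypothetical, and this is not the DOCK implementation of FPS scoring.

```python
import math

def footprint_distance(ref, pose):
    """Euclidean distance between two per-residue energy vectors (0 = identical)."""
    if len(ref) != len(pose):
        raise ValueError("footprints must cover the same residues")
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pose)))

def footprint_correlation(ref, pose):
    """Pearson correlation between two per-residue energy vectors."""
    if len(ref) != len(pose):
        raise ValueError("footprints must cover the same residues")
    n = len(ref)
    mean_r = sum(ref) / n
    mean_p = sum(pose) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(ref, pose))
    var_r = sum((r - mean_r) ** 2 for r in ref)
    var_p = sum((p - mean_p) ** 2 for p in pose)
    if var_r == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant footprint")
    return cov / math.sqrt(var_r * var_p)

# Hypothetical per-residue van der Waals energies (kcal/mol) for four residues.
reference = [-3.0, -1.0, 0.0, -2.0]
candidate = [-3.0, -1.0, 0.0, 0.0]   # loses the contact with the fourth residue

dist = footprint_distance(reference, candidate)
corr = footprint_correlation(reference, candidate)
print(dist, corr)
```

A smaller distance (or a correlation closer to 1) means the candidate pose reproduces the interaction pattern of the reference more closely.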
PMCID: PMC3913180  PMID: 24315195
HIV; gp41; Protein-protein interactions; Docking; Virtual screening; DOCK; Footprint similarity; Scoring functions; Molecular dynamics
14.  Docking of molecules identified in bioactive medicinal plants extracts into the p50 NF-kappaB transcription factor: correlation with inhibition of NF-kappaB/DNA interactions and inhibitory effects on IL-8 gene expression 
The transcription factor NF-kappaB is a very interesting target molecule for the design of anti-tumor, anti-inflammatory and pro-apoptotic drugs. However, the application of the widely used molecular docking computational method to the virtual screening of chemical libraries against NF-kappaB has not yet been reported in the literature. Docking studies on a dataset of 27 molecules from extracts of two different medicinal plants to NF-kappaB-p50 were performed with the purpose of developing a docking protocol fit for the target under study.
We enhanced the simple docking procedure by means of a combined target- and ligand-based drug design approach. Advantages of this combination strategy, based on a similarity parameter for the identification of weak binding chemical entities, are illustrated in this work with the discovery of a new lead compound for NF-kappaB. Further biochemical analyses based on EMSA were performed and biological effects were tested on the compound exhibiting the best docking score. All experimental analyses were in fairly good agreement with molecular modeling findings.
The results obtained support the concept that the docking performance is predictive of a biochemical activity. In this respect, this paper represents the first example of the successful identification, through molecular docking simulations, of a promising lead compound for the inhibition of NF-kappaB-p50 biological activity and modulation of the expression of the NF-kB regulated IL8 gene.
PMCID: PMC2543017  PMID: 18768082
15.  DOVIS: an implementation for high-throughput virtual screening using AutoDock 
BMC Bioinformatics  2008;9:126.
Molecular-docking-based virtual screening is an important tool in drug discovery that is used to significantly reduce the number of possible chemical compounds to be investigated. In addition to the selection of a sound docking strategy with appropriate scoring functions, another technical challenge is to in silico screen millions of compounds in a reasonable time. To meet this challenge, it is necessary to use high performance computing (HPC) platforms and techniques. However, the development of an integrated HPC system that makes efficient use of its elements is not trivial.
We have developed an application termed DOVIS that uses AutoDock (version 3) as the docking engine and runs in parallel on a Linux cluster. DOVIS can efficiently dock large numbers (millions) of small molecules (ligands) to a receptor, screening 500 to 1,000 compounds per processor per day. Furthermore, in DOVIS, the docking session is fully integrated and automated in that the inputs are specified via a graphical user interface, the calculations are fully integrated with a Linux cluster queuing system for parallel processing, and the results can be visualized and queried.
DOVIS removes most of the complexities and organizational problems associated with large-scale high-throughput virtual screening, and provides a convenient and efficient solution for AutoDock users to use this software in a Linux cluster platform.
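The throughput quoted in this abstract (500 to 1,000 compounds per processor per day) lends itself to a quick back-of-the-envelope estimate of screening time. The sketch below assumes perfect parallel scaling, which a real cluster will not reach; the library size and processor count are hypothetical, and the function is not part of DOVIS.

```python
import math

def screening_days(n_compounds, n_processors, per_processor_per_day):
    """Wall-clock days to dock n_compounds, assuming perfect parallel scaling."""
    if n_processors <= 0 or per_processor_per_day <= 0:
        raise ValueError("processor count and throughput must be positive")
    return math.ceil(n_compounds / (n_processors * per_processor_per_day))

# Hypothetical workload: one million compounds on 256 processors.
slow = screening_days(1_000_000, 256, 500)    # lower end of the quoted rate
fast = screening_days(1_000_000, 256, 1000)   # upper end of the quoted rate
print(slow, fast)
```

Under these assumptions the run would take roughly four to eight days, which shows why the number of processors matters as much as the speed of the docking engine.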
PMCID: PMC2267697  PMID: 18304355
16.  Tumor necrosis factor receptor superfamily 10B (TNFRSF10B): an insight from structure modeling to virtual screening for designing drug against head and neck cancer 
Head and neck cancer (HNC) belongs to a group of heterogeneous diseases with distinct patterns of behavior and presentation. TNFRSF10B is a tumor suppressor gene mapped to chromosome 8. Mutation in this candidate gene is responsible for the loss of the chromosome p arm, which is frequently observed in head and neck tumors. TNFRSF10B inhibits tumor formation through apoptosis, but its deregulation encourages metastasis, migration and invasion of tumor cells.
Structural modeling was performed using MODELLER (9v10). A suitable template [2ZB9] was retrieved from the Protein Data Bank, with query coverage and sequence identity of 84% and 30%, respectively. Evaluation of the predicted model with Rampage revealed 93.2% of residues in the favoured region, 5.7% in the allowed region and only 1 residue in the outlier region. ERRAT and ProSA indicated an overall quality of 51.85% and a Z-score of −1.08 for the predicted model. The Molecular Evolutionary Genetics Analysis (MEGA 5) tool was used to infer the evolutionary history of the TNFRSF10B candidate gene. Ortholog and paralog [TNFRSF10A & TNFRSF10D] protein sequences of the TNFRSF10B gene were retrieved to infer the ancestral relationship. In the tree topology, the TNFRSF10A gene was treated as the outgroup. The human and gorilla sequences shared more than 90% similarity, with conserved amino acid sequence. A virtual screening approach was applied for the identification of novel inhibitors. The Mcule library was screened for novel inhibitors, and the selected lead compounds were used for protein-ligand docking. The screened lead compounds were further investigated in molecular docking studies. The STRING server was employed to explore protein-protein interactions of the TNFRSF10B target protein. The TNFSF10 protein showed the highest confidence score (0.999) and was selected for protein-protein docking using the GRAMM-X server. In-silico docking results revealed I-58, S-90 and A-62 as the most active interacting residues of the TNFRSF10B receptor protein, with R-130, S-156 and R-130 of the TNFSF10B ligand protein.
The current research may provide a basis for understanding the structure and function of the TNFRSF10B protein. The designed novel inhibitors and predicted interactions might serve to inhibit the disease. Potent ligands that are effective in vitro are required, which will help in the future design of a drug against head and neck cancer. There is an urgent need for effective drug design for head and neck cancer, and computational tools for examining candidate genes more efficiently and accurately are required.
PMCID: PMC3691635  PMID: 23724937
Head and neck cancer; Modeling; Tumor necrosis factor; TNFRSF10B; Docking; MODELLER; Phylogenetic; Virtual screening; Inhibitors; Bioinformatics
17.  Prediction of protein-binding areas by small-world residue networks and application to docking 
BMC Bioinformatics  2011;12:378.
Protein-protein interactions are involved in most cellular processes, and their detailed physico-chemical and structural characterization is needed in order to understand their function at the molecular level. In-silico docking tools can complement experimental techniques, providing three-dimensional structural models of such interactions at atomic resolution. In several recent studies, protein structures have been modeled as networks (or graphs), where the nodes represent residues and the connecting edges their interactions. From such networks, it is possible to calculate different topology-based values for each of the nodes, and to identify protein regions with high centrality scores, which are known to positively correlate with key functional residues, hot spots, and protein-protein interfaces.
Here we show that this correlation can be efficiently used for the scoring of rigid-body docking poses. When integrated into the pyDock energy-based docking method, the new combined scoring function significantly improved the results of the individual components as shown on a standard docking benchmark. This improvement was particularly remarkable for specific protein complexes, depending on the shape, size, type, or flexibility of the proteins involved.
The network-based representation of protein structures can be used to identify protein-protein binding regions and to efficiently score docking poses, complementing energy-based approaches.
PMCID: PMC3189935  PMID: 21943333
protein interactions; small-world networks; binding site prediction; protein-protein docking; pyDock
18.  A Combination of Rescoring and Refinement Significantly Improves Protein Docking Performance 
Proteins  2008;72(1):270-279.
To determine the structures of protein-protein interactions, protein docking is a valuable tool that complements experimental methods to characterize protein complexes. While protein docking can often produce a near-native solution within a set of global docking predictions, there are sometimes predictions that require refinement to elucidate correct contacts and conformation. Previously, we developed the ZRANK algorithm to rerank initial docking predictions from ZDOCK, a docking program developed by our lab. In this study, we have applied the ZRANK algorithm toward refinement of protein docking models, in conjunction with the protein docking program RosettaDock. This was performed by reranking global docking predictions from ZDOCK, performing local side chain and rigid-body refinement using RosettaDock, and selecting the refined model based on ZRANK score. For comparison, we examined using RosettaDock score instead of ZRANK score, and a larger perturbation size for the RosettaDock search, and determined that the larger RosettaDock perturbation size with ZRANK scoring was optimal. This method was validated on a protein-protein docking benchmark. For refining docking benchmark predictions from the newest ZDOCK version, this led to improved structures of top-ranked hits in 20 of 27 cases, and an increase from 23 to 27 cases with hits in the top 20 predictions. Finally, we optimized the ZRANK energy function using refined models, which provides a significant improvement over the original ZRANK energy function. Using this optimized function and the refinement protocol, the numbers of cases with hits ranked at number one increased from 12 to 19 and from 7 to 15 for two different ZDOCK versions. This shows the effective combination of independently developed docking protocols (ZDOCK/ZRANK, and RosettaDock), indicating that using diverse search and scoring functions can improve protein docking results.
PMCID: PMC2696687  PMID: 18214977
19.  Protein docking by Rotation-Based Uniform Sampling (RotBUS) with fast computing of intermolecular contact distance and residue desolvation 
BMC Bioinformatics  2010;11:352.
Protein-protein interactions are fundamental for the majority of cellular processes and their study is of enormous biotechnological and therapeutic interest. In recent years, a variety of computational approaches to the protein-protein docking problem have been reported, with encouraging results. Most of the currently available protein-protein docking algorithms are composed of two clearly defined parts: the sampling of the rotational and translational space of the interacting molecules, and the scoring and clustering of the resulting orientations. Although this kind of strategy has shown some of the most successful results in the CAPRI blind test, more efforts need to be applied. Thus, the sampling protocol should generate a pool of conformations that include a sufficient number of near-native ones, while the scoring function should discriminate between near-native and non-near-native proposed conformations. On the other hand, protocols to efficiently include full flexibility on the protein structures are increasingly needed.
In this work we present new computational tools for protein-protein docking. We describe here the RotBUS (Rotation-Based Uniform Sampling) method to generate uniformly distributed sets of rigid-body docking poses, with a new fast calculation of the optimal contacting distance between molecules. We have tested the method on a standard benchmark of unbound structures and we can find near-native solutions in 100% of the cases. After applying a new fast filtering scheme based on residue-based desolvation, in combination with FTDock plus pyDock scoring, near-native solutions are found with rank ≤ 50 in 39% of the cases. Knowledge-based experimental restraints can be easily included to reduce computational times during sampling and improve success rates, and the method can be extended in the future to include flexibility of the side-chains.
This new sampling algorithm has the advantage of its high speed achieved by fast computing of the intermolecular distance based on a coarse representation of the interacting surfaces. In addition, a fast desolvation scoring permits the screening of millions of conformations at low computational cost, without compromising accuracy. The protocol presented here can be used as a framework to include restraints, flexibility and ensemble docking approaches.
PMCID: PMC2911459  PMID: 20584304
20.  An effective docking strategy for virtual screening based on multi-objective optimization algorithm 
BMC Bioinformatics  2009;10:58.
Development of a fast and accurate scoring function in virtual screening remains a hot issue in current computer-aided drug research. Different scoring functions focus on diverse aspects of ligand binding, and no single scoring can satisfy the peculiarities of each target system. Therefore, the idea of a consensus score strategy was put forward. Integrating several scoring functions, consensus score re-assesses the docked conformations using a primary scoring function. However, it is not really robust and efficient from the perspective of optimization. Furthermore, to date, the majority of available methods are still based on single objective optimization design.
In this paper, two multi-objective optimization methods, called MOSFOM, were developed for virtual screening, which simultaneously consider both the energy score and the contact score. Results suggest that MOSFOM can effectively enhance enrichment and performance compared with a single score. For three different kinds of binding sites, MOSFOM displays an excellent ability to differentiate active compounds through energy and shape complementarity. EFMOGA performed particularly well in the top 2% of database for all three cases, whereas MOEA_Nrg and MOEA_Cnt performed better than the corresponding individual scoring functions if the appropriate type of binding site was selected.
The multi-objective optimization method was successfully applied in virtual screening with two different scoring functions. It can yield reasonable binding poses and, furthermore, rank compounds using the potentially compromised conformations of each compound, abandoning those conformations that cannot satisfy the overall objective functions.
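The idea of weighing an energy score and a contact score at the same time can be sketched as a Pareto (non-dominated) filter over docking poses. This is a generic illustration of multi-objective selection and not the MOSFOM algorithm itself. It assumes both scores are expressed so that lower is better, and the pose names and values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(poses):
    """Return the names of poses that no other pose dominates (minimization)."""
    front = []
    for name, scores in poses.items():
        dominated = any(
            dominates(other, scores)
            for other_name, other in poses.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical (energy score, contact score) pairs; lower is better for both.
poses = {
    "pose_a": (-9.0, -120.0),
    "pose_b": (-7.5, -150.0),
    "pose_c": (-7.0, -110.0),   # worse than pose_a on both objectives
    "pose_d": (-9.5, -90.0),
}
front = pareto_front(poses)
print(front)
```

Only pose_c is discarded here, because pose_a beats it on both objectives; the remaining poses represent different trade-offs between the two scores.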
PMCID: PMC2753843  PMID: 19210777
21.  FINDSITELHM: A Threading-Based Approach to Ligand Homology Modeling 
PLoS Computational Biology  2009;5(6):e1000405.
Ligand virtual screening is a widely used tool to assist in new pharmaceutical discovery. In practice, virtual screening approaches have a number of limitations, and the development of new methodologies is required. Previously, we showed that remotely related proteins identified by threading often share a common binding site occupied by chemically similar ligands. Here, we demonstrate that across an evolutionarily related, but distant family of proteins, the ligands that bind to the common binding site contain a set of strongly conserved anchor functional groups as well as a variable region that accounts for their binding specificity. Furthermore, the sequence and structure conservation of residues contacting the anchor functional groups is significantly higher than those contacting ligand variable regions. Exploiting these insights, we developed FINDSITELHM that employs structural information extracted from weakly related proteins to perform rapid ligand docking by homology modeling. In large scale benchmarking, using the predicted anchor-binding mode and the crystal structure of the receptor, FINDSITELHM outperforms classical docking approaches with an average ligand RMSD from native of ∼2.5 Å. For weakly homologous receptor protein models, using FINDSITELHM, the fraction of recovered binding residues and specific contacts is 0.66 (0.55) and 0.49 (0.38) for highly confident (all) targets, respectively. Finally, in virtual screening for HIV-1 protease inhibitors, using similarity to the ligand anchor region yields significantly improved enrichment factors. Thus, the rather accurate, computationally inexpensive FINDSITELHM algorithm should be a useful approach to assist in the discovery of novel biopharmaceuticals.
Author Summary
As an integral part of drug development, high-throughput virtual screening is a widely used tool that could in principle significantly reduce the cost and time to discovery of new pharmaceuticals. In practice, virtual screening algorithms suffer from a number of limitations. The high sensitivity of all-atom ligand docking approaches to the quality of the target receptor structure restricts the selection of drug targets to those for which high-quality X-ray structures are available. Furthermore, the predicted binding affinity is typically strongly correlated with the molecular weight of the ligand, independent of whether or not it really binds. To address these significant problems, we developed FINDSITELHM, a novel threading-based approach that employs structural information extracted from weakly related proteins to perform rapid ligand docking and ranking that is very much in the spirit of homology modeling of protein structures. Particularly for low-quality modeled receptor structures, FINDSITELHM outperforms classical all-atom ligand docking approaches in terms of the accuracy of ligand binding pose prediction and requires considerably less CPU time. As an attractive alternative to classical molecular docking, FINDSITELHM offers the possibility of rapid structure-based virtual screening at the proteome level to improve and speed up the discovery of new biopharmaceuticals.
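The enrichment factors mentioned in the abstract above are usually computed from a ranked screening list. The sketch below uses one common definition: the fraction of actives in the top-ranked slice divided by the fraction of actives in the whole library. The ranked list is hypothetical and the function is not taken from FINDSITELHM.

```python
def enrichment_factor(ranked_is_active, top_fraction):
    """Enrichment of actives in the top-ranked fraction of a screened library.

    ranked_is_active: list of booleans, best-scored compound first.
    top_fraction: fraction of the library to inspect, for example 0.2.
    """
    n_total = len(ranked_is_active)
    actives_total = sum(ranked_is_active)
    if n_total == 0 or actives_total == 0:
        raise ValueError("need a non-empty library with at least one active")
    n_top = max(1, round(n_total * top_fraction))
    actives_top = sum(ranked_is_active[:n_top])
    return (actives_top / n_top) / (actives_total / n_total)

# Hypothetical ranking of 20 compounds, 4 of which are known actives.
ranked = [True, True, False, True] + [False] * 15 + [True]
ef_top20 = enrichment_factor(ranked, 0.2)
print(ef_top20)
```

A value of 1 corresponds to random selection; values above 1 indicate that the ranking concentrates actives near the top of the list.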
PMCID: PMC2685473  PMID: 19503616
22.  Expression cartography of human tissues using self organizing maps 
BMC Bioinformatics  2011;12:306.
Parallel high-throughput microarray and sequencing experiments produce vast quantities of multidimensional data which must be arranged and analyzed in a concerted way. One approach to addressing this challenge is the machine learning technique known as self organizing maps (SOMs). SOMs enable a parallel sample- and gene-centered view of genomic data combined with strong visualization and second-level analysis capabilities. The paper aims at bridging the gap between the potency of SOM-machine learning to reduce dimension of high-dimensional data on one hand and practical applications with special emphasis on gene expression analysis on the other hand.
The method was applied to generate a SOM characterizing the whole genome expression profiles of 67 healthy human tissues selected from ten tissue categories (adipose, endocrine, homeostasis, digestion, exocrine, epithelium, sexual reproduction, muscle, immune system and nervous tissues). SOM mapping reduces the dimension of expression data from tens of thousands of genes to a few thousand metagenes, each representing a minicluster of co-regulated single genes. Tissue-specific and common properties shared between groups of tissues emerge as a handful of localized spots in the tissue maps collecting groups of co-regulated and co-expressed metagenes. The functional context of the spots was discovered using overrepresentation analysis with respect to pre-defined gene sets of known functional impact. We found that tissue related spots typically contain enriched populations of genes related to specific molecular processes in the respective tissue. Analysis techniques normally used at the gene-level such as two-way hierarchical clustering are better represented and provide better signal-to-noise ratios if applied to the metagenes. Metagene-based clustering analyses aggregate the tissues broadly into three clusters containing nervous, immune system and the remaining tissues.
The SOM technique provides a more intuitive and informative global view of the behavior of a few well-defined modules of correlated and differentially expressed genes than the separate discovery of the expression levels of hundreds or thousands of individual genes. The program is available as R-package 'oposSOM'.
PMCID: PMC3161046  PMID: 21794127
23.  Mining SOM expression portraits: feature selection and integrating concepts of molecular function 
BioData Mining  2012;5:18.
Self organizing maps (SOM) enable the straightforward portraying of high-dimensional data of large sample collections in terms of sample-specific images. The analysis of their texture provides so-called spot-clusters of co-expressed genes which require subsequent significance filtering and functional interpretation. We address feature selection in terms of the gene ranking problem and the interpretation of the obtained spot-related lists using concepts of molecular function.
Different expression scores based either on simple fold change-measures or on regularized Student’s t-statistics are applied to spot-related gene lists and compared with special emphasis on the error characteristics of microarray expression data. The spot-clusters are analyzed using different methods of gene set enrichment analysis with the focus on overexpression and/or overrepresentation of predefined sets of genes. Metagene-related overrepresentation of selected gene sets was mapped into the SOM images to assign gene function to different regions. Alternatively we estimated set-related overexpression profiles over all samples studied using a gene set enrichment score. It was also applied to the spot-clusters to generate lists of enriched gene sets. We used the tissue body index data set, a collection of expression data of human tissues as an illustrative example. We found that tissue related spots typically contain enriched populations of gene sets well corresponding to molecular processes in the respective tissues. In addition, we display special sets of housekeeping and of consistently weak and high expressed genes using SOM data filtering.
The presented methods allow the comprehensive downstream analysis of SOM-transformed expression data in terms of cluster-related gene lists and enriched gene sets for functional interpretation. SOM clustering implies the ability to define either new gene sets using selected SOM spots or to verify and/or to amend existing ones.
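Overrepresentation of a predefined gene set in a spot-related gene list is often assessed with a one-sided hypergeometric test. The sketch below shows that calculation as an assumption about how such an analysis could be done; the abstract does not state which statistic the authors used, and the gene counts are hypothetical.

```python
from math import comb

def overrepresentation_p(n_universe, n_set, n_spot, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric null model.

    n_universe: all genes on the map
    n_set:      genes in the predefined gene set
    n_spot:     genes in the spot cluster
    n_overlap:  genes found in both the set and the spot
    """
    if not 0 <= n_overlap <= min(n_set, n_spot) <= n_universe:
        raise ValueError("inconsistent gene counts")
    total = comb(n_universe, n_spot)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_spot - k)
        for k in range(n_overlap, min(n_set, n_spot) + 1)
    )
    return tail / total

# Hypothetical toy case: 10 genes, a set of 4, a spot of 3, 2 genes shared.
p_value = overrepresentation_p(10, 4, 3, 2)
print(p_value)
```

In practice the p-values obtained for many gene sets would also need correction for multiple testing.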
PMCID: PMC3599960  PMID: 23043905
24.  An interaction-motif-based scoring function for protein-ligand docking 
BMC Bioinformatics  2010;11:298.
A good scoring function is essential for molecular docking computations. In conventional scoring functions, energy terms modeling pairwise interactions are cumulatively summed, and the best docking solution is selected. Here, we propose to transform protein-ligand interactions into three-dimensional geometric networks, from which recurring network substructures, or network motifs, are selected and used to provide probability-ranked interaction templates with which to score docking solutions.
A novel scoring function for protein-ligand docking, MotifScore, was developed. It is non-energy-based, and docking is, instead, scored by counting the occurrences of motifs of protein-ligand interaction networks constructed using structures of protein-ligand complexes. MotifScore has been tested on a benchmark set established by others to assess its ability to identify near-native complex conformations among a set of decoys. In this benchmark test, 84% of the highest-scored docking conformations had root-mean-square deviations (rmsds) below 2.0 Å from the native conformation, which is comparable with the best of several energy-based docking scoring functions. Many of the top motifs, which comprise a multitude of chemical groups that interact simultaneously and make a highly significant contribution to MotifScore, capture recurrent interacting patterns beyond pairwise interactions.
While providing quite good docking scores, MotifScore is quite different from conventional energy-based functions. MotifScore thus represents a new, network-based approach for exploring problems associated with molecular docking.
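The benchmark criterion quoted above (top-scored pose within 2.0 Å rmsd of the native conformation) can be sketched as follows. The code assumes the pose and the native ligand are already in the same receptor frame with atoms listed in the same order, so no superposition is performed. The coordinates and rmsd values are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

def success_rate(top_pose_rmsds, cutoff=2.0):
    """Fraction of complexes whose top-scored pose lies below the rmsd cutoff."""
    hits = sum(1 for value in top_pose_rmsds if value < cutoff)
    return hits / len(top_pose_rmsds)

# Hypothetical two-atom ligand shifted by 1 Å along z relative to the native pose.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
pose_rmsd = rmsd(native, docked)

# Hypothetical rmsds (Å) of the top-scored pose for four complexes.
rate = success_rate([0.8, 1.5, 2.4, 3.1])
print(pose_rmsd, rate)
```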
PMCID: PMC3098071  PMID: 20525216
25.  DockAnalyse: an application for the analysis of protein-protein interactions 
Is it possible to identify what the best solution of a docking program is? The usual answer to this question is the highest score solution, but interactions between proteins are dynamic processes, and many times the interaction regions are wide enough to permit protein-protein interactions with different orientations and/or interaction energies. In some cases, as in a multimeric protein complex, several interaction regions are possible among the monomers. These dynamic processes involve interactions with surface displacements between the proteins to finally achieve the functional configuration of the protein complex. Consequently, there is not a static and single solution for the interaction between proteins, but there are several important configurations that also have to be analyzed.
To extract those representative solutions from the docking output datafile, we have developed an unsupervised and automatic clustering application, named DockAnalyse. This application is based on the already existing DBscan clustering method, which searches for continuities among the clusters generated by the docking output data representation. The DBscan clustering method is very robust and, moreover, solves some of the inconsistency problems of the classical clustering methods, such as the treatment of outliers and the dependence on a previously defined number of clusters.
DockAnalyse makes the interpretation of the docking solutions through graphical and visual representations easier by guiding the user to find the representative solutions. We have applied our new approach to analyze several protein interactions and model the dynamic protein interaction behavior of a protein complex. DockAnalyse might also be used to describe interaction regions between proteins and, therefore, guide future flexible dockings. The application (implemented in the R package) is accessible.
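The density-based clustering that the abstract relies on can be sketched in a few lines. This is a minimal, pure-Python version of the DBSCAN algorithm and not the DockAnalyse code, which the abstract describes as an R application. It assumes each docking solution is summarized by a hypothetical 3D point, such as the position of the docked partner's center of mass. The `eps` and `min_pts` values are illustrative.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE for outliers."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster            # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue               # already assigned, do not expand again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # j is also a core point
        cluster += 1
    return labels

# Hypothetical docking solutions: two dense groups and one isolated outlier.
solutions = [
    (0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0),
    (10.0, 10.0, 10.0), (10.5, 10.0, 10.0), (10.0, 10.5, 10.0),
    (50.0, 50.0, 50.0),
]
labels = dbscan(solutions, eps=1.0, min_pts=3)
print(labels)
```

The number of clusters is not fixed in advance and the isolated solution is reported as noise, which are the two properties the abstract highlights.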
PMCID: PMC2987812  PMID: 20969768
