We report on the prediction accuracy of ligand-based (2D QSAR) and structure-based (MedusaDock) methods used both independently and in consensus for ranking the congeneric series of ligands binding to three protein targets (UK, ERK2, and CHK1) from the CSAR 2011 benchmark exercise. An ensemble of predictive QSAR models was developed using known binders of these three targets extracted from the publicly available ChEMBL database. Selected models were used to predict the binding affinity of CSAR compounds towards the corresponding targets and rank them accordingly; the overall ranking accuracy evaluated by Spearman correlation was as high as 0.78 for UK, 0.60 for ERK2, and 0.56 for CHK1, placing our predictions in the top 10% among all the participants. In parallel, MedusaDock, designed to predict reliable docking poses, was also used for ranking the CSAR ligands according to their docking scores; the resulting accuracies (Spearman correlation) for UK, ERK2, and CHK1 were 0.76, 0.31, and 0.26, respectively. In addition, the performance of several consensus approaches combining MedusaDock and QSAR predicted ranks was explored; the best approach yielded Spearman correlation coefficients for UK, ERK2, and CHK1 of 0.82, 0.50, and 0.45, respectively. This study shows that (i) externally validated 2D QSAR models were capable of ranking CSAR ligands at least as accurately as more computationally intensive structure-based approaches used both by us and by other groups and (ii) ligand-based QSAR models can complement structure-based approaches by boosting the prediction performance when used in consensus.
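The ranking evaluation described above can be illustrated with a minimal sketch. It computes Spearman correlation from average ranks and builds a consensus by averaging the ranks from two methods. The abstract does not specify which consensus schemes were tested, so mean-rank averaging is an assumption here, and the five compounds, their affinities, and all function names are hypothetical.

```python
def average_ranks(values, descending=False):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def consensus_rank(*rank_lists):
    """Mean rank per compound across methods (assumed consensus scheme)."""
    return [sum(r) / len(r) for r in zip(*rank_lists)]


# Hypothetical data for five compounds (not taken from the CSAR set).
experimental_pki = [9.1, 8.2, 7.5, 6.9, 6.0]      # higher = tighter binder
qsar_predicted_pki = [8.7, 8.9, 7.0, 7.2, 5.5]    # higher = tighter binder
docking_score = [-9.0, -7.5, -8.0, -6.0, -6.5]    # lower = better pose score

exp_ranks = average_ranks(experimental_pki, descending=True)
qsar_ranks = average_ranks(qsar_predicted_pki, descending=True)
docking_ranks = average_ranks(docking_score)
consensus = consensus_rank(qsar_ranks, docking_ranks)

print(spearman(exp_ranks, qsar_ranks))
print(spearman(exp_ranks, docking_ranks))
print(spearman(exp_ranks, consensus))
```

In this toy case each method misorders different pairs of compounds, so averaging the ranks cancels the errors; real data will not generally behave this cleanly.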
Ligand virtual screening is a widely used tool to assist in new pharmaceutical
discovery. In practice, virtual screening approaches have a number of
limitations, and the development of new methodologies is required. Previously,
we showed that remotely related proteins identified by threading often share a
common binding site occupied by chemically similar ligands. Here, we demonstrate
that across an evolutionarily related, but distant family of proteins, the
ligands that bind to the common binding site contain a set of strongly conserved
anchor functional groups as well as a variable region that accounts for their
binding specificity. Furthermore, the sequence and structure conservation of
residues contacting the anchor functional groups is significantly higher than
that of those contacting ligand variable regions. Exploiting these insights, we
developed FINDSITELHM that employs structural information extracted
from weakly related proteins to perform rapid ligand docking by homology
modeling. In large scale benchmarking, using the predicted anchor-binding mode
and the crystal structure of the receptor, FINDSITELHM outperforms
classical docking approaches with an average ligand RMSD from native of
∼2.5 Å. For weakly homologous receptor protein models, using
FINDSITELHM, the fraction of recovered binding residues and
specific contacts is 0.66 (0.55) and 0.49 (0.38) for highly confident (all)
targets, respectively. Finally, in virtual screening for HIV-1 protease
inhibitors, using similarity to the ligand anchor region yields significantly
improved enrichment factors. Thus, the rather accurate, computationally
inexpensive FINDSITELHM algorithm should be a useful approach to
assist in the discovery of novel biopharmaceuticals.
As an integral part of drug development, high-throughput virtual screening is a
widely used tool that could in principle significantly reduce the cost and time
to discovery of new pharmaceuticals. In practice, virtual screening algorithms
suffer from a number of limitations. The high sensitivity of all-atom ligand
docking approaches to the quality of the target receptor structure restricts the
selection of drug targets to those for which high-quality X-ray structures are
available. Furthermore, the predicted binding affinity is typically strongly
correlated with the molecular weight of the ligand, independent of whether or
not it really binds. To address these significant problems, we developed
FINDSITELHM, a novel threading-based approach that employs
structural information extracted from weakly related proteins to perform rapid
ligand docking and ranking that is very much in the spirit of homology modeling
of protein structures. Particularly for low-quality modeled receptor structures,
FINDSITELHM outperforms classical all-atom ligand docking
approaches in terms of the accuracy of ligand binding pose prediction and
requires considerably less CPU time. As an attractive alternative to classical
molecular docking, FINDSITELHM offers the possibility of rapid
structure-based virtual screening at the proteome level to improve and speed up
the discovery of new biopharmaceuticals.
Virtual or in silico ligand screening combined with other computational methods is one of the most promising approaches to search for new lead compounds, thereby greatly assisting the drug discovery process. Despite considerable progress in virtual screening methodologies, available computer programs do not easily address problems such as structural optimization of compounds in a screening library, receptor flexibility/induced fit, and accurate prediction of protein-ligand interactions. It has been shown that structural optimization of chemical compounds and post-docking optimization in multi-step structure-based virtual screening approaches help to further improve the overall efficiency of the methods. To address some of these points, we developed the program AMMOS for refining both the 3D structures of the small molecules present in chemical libraries and the predicted receptor-ligand complexes, allowing partial to full atom flexibility through molecular mechanics optimization.
The program AMMOS carries out an automatic procedure that allows for the structural refinement of compound collections and energy minimization of protein-ligand complexes using the open source program AMMP. The performance of our package was evaluated by comparing the structures of small chemical entities minimized by AMMOS with those minimized with the Tripos and MMFF94s force fields. Next, AMMOS was used for fully flexible minimization of protein-ligand complexes obtained from a multi-step virtual screening. Enrichment studies of the selected pre-docked complexes containing 60% of the initially added inhibitors were carried out with or without final AMMOS minimization on two protein targets having different binding pocket properties. AMMOS was able to improve the enrichment after the pre-docking stage, with 40 to 60% of the initially added active compounds found in the top 3% to 5% of the entire compound collection.
The open source AMMOS program can be helpful in a broad range of in silico drug design studies such as optimization of small molecules or energy minimization of pre-docked protein-ligand complexes. Our enrichment study suggests that AMMOS, designed to minimize a large number of ligands pre-docked in a protein target, can successfully be applied in a final post-processing step and that it can take into account some receptor flexibility within the binding site area.
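The enrichment figures quoted above (the share of initially added actives found in the top few percent of the ranked collection) can be computed as in the following minimal sketch. The ranked label list is hypothetical, the function names are illustrative, and the enrichment factor shown is the common textbook definition rather than a formula stated in the abstract.

```python
def top_count(n_total, top_fraction):
    """Number of compounds in the top slice (at least one)."""
    return max(1, round(n_total * top_fraction))


def actives_recovered(ranked_labels, top_fraction):
    """Fraction of all actives that appear in the top slice of the ranking."""
    n_top = top_count(len(ranked_labels), top_fraction)
    return sum(ranked_labels[:n_top]) / sum(ranked_labels)


def enrichment_factor(ranked_labels, top_fraction):
    """Hit rate in the top slice divided by the hit rate of the whole library."""
    n_top = top_count(len(ranked_labels), top_fraction)
    hit_rate_top = sum(ranked_labels[:n_top]) / n_top
    hit_rate_all = sum(ranked_labels) / len(ranked_labels)
    return hit_rate_top / hit_rate_all


# Hypothetical library of 100 compounds sorted best score first:
# 1 marks an active, 0 a decoy; 10 actives in total, 4 of them in the top 5.
ranked_labels = [1, 1, 0, 1, 1] + [0] * 45 + [1] * 6 + [0] * 44

print(actives_recovered(ranked_labels, 0.05))   # share of actives in top 5%
print(enrichment_factor(ranked_labels, 0.05))   # enrichment over random
```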
The number of protein targets with a known or predicted three-dimensional structure and of drug-like chemical compounds is growing rapidly and so is the need for new therapeutic compounds or chemical probes. Performing flexible structure-based virtual screening computations on thousands of targets with millions of molecules is neither tractable for most laboratories nor indeed desirable. Since shape complementarity is of primary importance for most protein-ligand interactions, we have developed a tool/protocol based on rigid-body docking to select compounds that fit well into binding sites.
Here we present an efficient multiple conformation rigid-body docking approach, MS-DOCK, which is based on the program DOCK. This approach can be used as the first step of a multi-stage docking/scoring protocol. First, we developed and validated the Multiconf-DOCK tool that generates several conformers per input ligand. Then, each generated conformer (bioactives and 37970 decoys) was docked rigidly using DOCK6 with our optimized protocol into seven different receptor-binding sites. MS-DOCK was able to significantly reduce the size of the initial input library for all seven targets, thereby facilitating subsequent more CPU demanding flexible docking procedures.
MS-DOCK can be easily used for the generation of multi-conformer libraries and for shape-based filtering within a multi-step structure-based screening protocol in order to shorten computation times.
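The filtering idea behind a multi-conformer rigid-body stage can be sketched as follows: each ligand is represented by its best-scoring conformer, and only the best-ranked fraction of ligands is passed on to flexible docking. This is an illustration under stated assumptions, not the MS-DOCK implementation; the scores, ligand names, and the `keep_fraction` parameter are hypothetical, and lower scores are assumed to be better.

```python
def best_score_per_ligand(conformer_scores):
    """Map each ligand to the best (lowest) score among its conformers."""
    best = {}
    for ligand_id, score in conformer_scores:
        if ligand_id not in best or score < best[ligand_id]:
            best[ligand_id] = score
    return best


def shape_filter(conformer_scores, keep_fraction):
    """Return the ligands retained for the next, more expensive docking stage."""
    best = best_score_per_ligand(conformer_scores)
    ranked = sorted(best, key=best.get)           # best score first
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Hypothetical rigid-docking scores: (ligand, score of one conformer).
scores = [
    ("A", -30.0), ("A", -42.0),
    ("B", -25.0),
    ("C", -40.0), ("C", -38.0),
    ("D", -10.0),
]

print(shape_filter(scores, keep_fraction=0.5))
```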
The success of ligand docking calculations typically depends on the quality of the receptor structure. Given improvements in protein structure prediction approaches, approximate protein models now can be routinely obtained for the majority of gene products in a given proteome. Structure-based virtual screening of large combinatorial libraries of lead candidates against theoretically modeled receptor structures requires fast and reliable docking techniques capable of dealing with structural inaccuracies in protein models. Here, we present Q-DockLHM, a method for low-resolution refinement of binding poses provided by FINDSITELHM, a ligand homology modeling approach. We compare its performance to that of classical ligand docking approaches in ligand docking against a representative set of experimental (both holo and apo) as well as theoretically modeled receptor structures. Docking benchmarks reveal that unlike all-atom docking, Q-DockLHM exhibits the desired tolerance to the receptor’s structure deformation. Our results suggest that the use of an evolution-based approach to ligand homology modeling followed by fast low-resolution refinement is capable of achieving satisfactory performance in ligand-binding pose prediction with promising applicability to proteome-scale applications.
Q-dock; ligand docking; homology modeling; low-resolution modeling; threading
Protein/receptor explicit flexibility has recently become an important feature of molecular docking simulations. Taking the flexibility into account brings the docking simulation closer to the receptor’s real behaviour in its natural environment. Several approaches have been developed to address this problem. Among them, modelling the full flexibility as an ensemble of snapshots derived from a molecular dynamics (MD) simulation of the receptor has proved very promising. Despite its potential, however, only a few studies have employed this method to probe its effect in molecular docking simulations. We hereby use ensembles of snapshots obtained from three different MD simulations of the InhA enzyme from M. tuberculosis (Mtb), namely the wild type (InhA_wt) and the InhA_I16T and InhA_I21V mutants, to model their explicit flexibility, and to systematically explore their effect in docking simulations with three different InhA inhibitors, namely, ethionamide (ETH), triclosan (TCL), and pentacyano(isoniazid)ferrate(II) (PIF).
The use of fully-flexible receptor (FFR) models of InhA_wt and the InhA_I16T and InhA_I21V mutants in docking simulations with the inhibitors ETH, TCL, and PIF revealed significant differences in the way they interact as compared to the rigid InhA crystal structure (PDB ID: 1ENY). In the latter, only up to five receptor residues interact with the three different ligands. Conversely, in the FFR models this number grows to an astonishing 80 different residues. The comparison between the rigid crystal structure and the FFR models showed that the inclusion of explicit flexibility, despite the limitations of the FFR models employed in this study, accounts substantially for the induced fit expected when a protein/receptor and ligand approach each other to interact in the most favourable manner.
Protein/receptor explicit flexibility, or FFR models, represented as an ensemble of MD simulation snapshots, can lead to a more realistic representation of the induced fit effect expected in the encounter and proper docking of receptors to ligands. The FFR models of InhA explicitly characterize the overall movements of the amino acid residues in helices, strands, loops, and turns, allowing the ligand to properly accommodate itself in the receptor’s binding site. Utilization of the intrinsic flexibility of Mtb’s InhA enzyme and its mutants in virtual screening via molecular docking simulation may provide a novel platform to guide the rational or dynamical-structure-based drug design of novel inhibitors for Mtb’s InhA. We have produced a short video sequence of each ligand (ETH, TCL and PIF) docked to the FFR models of InhA_wt. These videos are available at http://www.inf.pucrs.br/~osmarns/LABIO/Videos_Cohen_et_al_19_07_2011.htm.
Virtual compound screening using molecular docking is widely used in the discovery of new lead compounds for drug design. However, this method is not completely reliable and therefore unsatisfactory. In this study, we used massive molecular dynamics simulations of protein-ligand conformations obtained by molecular docking in order to improve the enrichment performance of molecular docking. Our screening approach employed the molecular mechanics/Poisson-Boltzmann and surface area method to estimate the binding free energies. For the top-ranking 1,000 compounds obtained by docking to a target protein, approximately 6,000 molecular dynamics simulations were performed using multiple docking poses in about a week. As a result, the enrichment performance of the top 100 compounds by our approach was improved by 1.6–4.0 times that of the enrichment performance of molecular dockings. This result indicates that the application of molecular dynamics simulations to virtual screening for lead discovery is both effective and practical. However, further optimization of the computational protocols is required for screening various target proteins.
Lead discovery is one of the most important processes in rational drug design. To improve the rate of the detection of lead compounds, various technologies such as high-throughput screening and combinatorial chemistry have been introduced into the pharmaceutical industry. However, since these technologies alone may not improve lead productivity, computational screening has become important. A central method for computational screening is molecular docking. This method generally docks many flexible ligands to a rigid protein and predicts the binding affinity for each ligand in a practical time. However, its ability to detect lead compounds is less reliable. In contrast, molecular dynamics simulations can treat both proteins and ligands in a flexible manner, directly estimate the effect of explicit water molecules, and provide more accurate binding affinity, although their computational costs and times are significantly greater than those of molecular docking. Therefore, we developed a special purpose computer “MDGRAPE-3” for molecular dynamics simulations and applied it to computational screening. In this paper, we report an effective method for computational screening; this method is a combination of molecular docking and massive-scale molecular dynamics simulations. The proposed method showed a higher and more stable enrichment performance than the molecular docking method used alone.
Identification of antigenic peptide epitopes is an essential prerequisite in T cell-based molecular vaccine design. Computational (sequence-based and structure-based) methods are inexpensive and efficient compared to experimental approaches in screening numerous peptides against their cognate MHC alleles. In structure-based protocols, suited to alleles with limited epitope data, the first step is to identify high-binding peptides using docking techniques, which need improvement in speed and efficiency to be useful in large-scale screening studies. We present pDOCK: a new computational technique for rapid and accurate docking of flexible peptides to MHC receptors and primarily apply it on a non-redundant dataset of 186 pMHC (MHC-I and MHC-II) complexes with X-ray crystal structures.
We have compared our docked structures with experimental crystallographic structures for the immunologically relevant nonameric core of the bound peptide for MHC-I and MHC-II complexes. Primary testing for re-docking of peptides into their respective MHC grooves generated 159 out of 186 peptides with Cα RMSD of less than 1.00 Å, with a mean of 0.56 Å. Amongst the 25 peptides used for single and variant template docking, the Cα RMSD values were below 1.00 Å for 23 peptides. Compared to our earlier docking methodology, pDOCK shows up to a 2.5-fold improvement in accuracy and is ~60% faster. Results of validation against previously published studies represent a seven-fold increase in pDOCK accuracy.
The limitations of our previous methodology have been addressed in the new docking protocol, making it a rapid and accurate method to evaluate pMHC binding. pDOCK is a generic method and, although benchmarked against experimental structures, it can be applied to alleles with no structural data using sequence information. Our outcomes establish the efficacy of our procedure to predict highly accurate peptide structures, permitting conformational sampling of the peptide in the MHC binding groove. Our results also support the applicability of pDOCK for in silico identification of promiscuous peptide epitopes that are relevant to higher proportions of the human population with greater propensity to activate T cells, making them key targets for the design of vaccines and immunotherapies.
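The Cα RMSD criterion used above to judge re-docking can be sketched as below. The sketch assumes the docked and crystallographic peptides are already expressed in the same receptor frame, so no superposition is performed; the coordinates are hypothetical and the function names are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def is_accurate(coords_docked, coords_crystal, cutoff=1.00):
    """Apply the 1.00 Å Cα RMSD cutoff quoted in the text."""
    return rmsd(coords_docked, coords_crystal) < cutoff


# Hypothetical Cα positions (in Å) for a three-residue fragment.
crystal = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
docked = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]

print(rmsd(docked, crystal))
print(is_accurate(docked, crystal))
```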
Incorporating receptor flexibility into molecular docking should improve results for flexible proteins. However, the incorporation of explicit all-atom flexibility with molecular dynamics for the entire protein chain may also introduce significant error and “noise” that could decrease docking accuracy and deteriorate the ability of a scoring function to rank native-like poses. We address this apparent paradox by comparing the success of several flexible receptor models in cross-docking and multiple receptor ensemble docking for p38α mitogen-activated protein (MAP) kinase. Explicit all-atom receptor flexibility has been incorporated into a CHARMM-based molecular docking method (CDOCKER) using both molecular dynamics (MD) and torsion angle molecular dynamics (TAMD) for the refinement of predicted protein-ligand binding geometries. These flexible receptor models have been evaluated, and the accuracy and efficiency of TAMD sampling are directly compared to those of MD sampling. Several flexible receptor models are compared, encompassing flexible side chains, flexible loops, multiple flexible backbone segments, and treatment of the entire chain as flexible. We find that although including side chain and some backbone flexibility is required for improved docking accuracy as expected, docking accuracy also diminishes as additional and unnecessary receptor flexibility is included in the conformational search space. Ensemble docking results demonstrate that including protein flexibility leads to improved agreement with binding data for 227 active compounds. This comparison also demonstrates that a flexible receptor model enriches high affinity compound identification without significantly increasing the number of false positives from low affinity compounds.
CDOCKER; CHARMM; Binding Pocket; Protein-Ligand Interactions; Flexible Docking; DFG-out; linear interaction energy
Computational tools are essential in the drug design process, especially in order to take advantage of the increasing numbers of solved X-ray and NMR protein–ligand structures. Nowadays, molecular docking methods are routinely used for prediction of protein–ligand interactions and to aid in selecting potent molecules as a part of virtual screening of large databases. The improvements and advances in computational capacity in the last decade have allowed for further developments in molecular docking algorithms to address more complicated aspects such as protein flexibility. The effects of incorporation of active site water molecules and implicit or explicit solvation of the binding site are other relevant issues to be addressed in the docking procedures. Using the right docking algorithm at the right stage of virtual screening is most important. We report a staged study to address the effects of various aspects of protein flexibility and inclusion of active site water molecules on docking effectiveness to retrieve (and to be able to predict) correct ligand poses and to rank docked ligands in relation to their biological activity, for CHK1, ERK2, LpxC and UPA. We generated multiple conformers for the ligand, and compared different docking algorithms that use a variety of approaches to protein flexibility, including rigid receptor, soft receptor, flexible side chains, induced-fit, and multiple structure algorithms. Docking accuracy varied from 1 to 84%, demonstrating that the choice of method is important.
protein sampling; ligand sampling; conformational sampling; molecular docking; active site waters; CSAR; CHK1; ERK2; LpxC and UPA
High resolution structures of antibody-antigen complexes are useful for analyzing the binding interface and for making rational choices in antibody engineering. When a crystallographic structure of a complex is unavailable, the structure must be predicted using computational tools. In this work, we illustrate a novel approach, named SnugDock, to predict high-resolution antibody-antigen complex structures by simultaneously optimizing the antibody-antigen rigid-body positions, the relative orientation of the antibody light and heavy chains, and the conformations of the six complementarity determining region loops. This approach is especially useful when the crystal structure of the antibody is not available, requiring allowances for inaccuracies in an antibody homology model which would otherwise frustrate rigid-backbone docking predictions. Local docking using SnugDock with the lowest-energy RosettaAntibody homology model produced more accurate predictions than standard rigid-body docking. SnugDock can be combined with ensemble docking to mimic conformer selection and induced fit, resulting in increased sampling of diverse antibody conformations. The combined algorithm produced four medium (Critical Assessment of PRediction of Interactions-CAPRI rating) and seven acceptable lowest-interface-energy predictions in a test set of fifteen complexes. Structural analysis shows that diverse paratope conformations are sampled, but docked paratope backbones are not necessarily closer to the crystal structure conformations than the starting homology models. The accuracy of SnugDock predictions suggests a new genre of general docking algorithms with flexible binding interfaces targeted towards making homology models useful for further high-resolution predictions.
Antibodies are proteins that are key elements of the immune system and increasingly used as drugs. Antibodies bind tightly and specifically to antigens to block their activity or to mark them for destruction. Three-dimensional structures of the antibody-antigen complexes are useful for understanding their mechanism and for designing improved antibody drugs. Experimental determination of structures is laborious and not always possible, so we have developed tools to predict structures of antibody-antigen complexes computationally. Computer-predicted models of antibodies, or homology models, typically have errors which can frustrate algorithms for prediction of protein-protein interfaces (docking), and result in incorrect predictions. Here, we have created and tested a new docking algorithm which incorporates flexibility to overcome structural errors in the antibody structural model. The algorithm allows both intramolecular and interfacial flexibility in the antibody during docking, resulting in improved accuracy approaching that when using experimentally determined antibody structures. Structural analysis of the predicted binding region of the complex will enable the protein engineer to make rational choices for better antibody drug designs.
Poor performance of scoring functions is a well-known bottleneck in structure-based virtual screening (VS), which is most frequently manifested in the scoring functions’ inability to discriminate between true ligands and known non-binders (therefore designated as binding decoys). This deficiency leads to a large number of false positive hits resulting from virtual screening. We have hypothesized that filtering out or penalizing docking poses recognized as non-native (i.e., pose decoys) should improve the performance of virtual screening in terms of improved identification of true binders. Using several concepts from the field of cheminformatics, we have developed a novel approach to identifying pose decoys from an ensemble of poses generated by computational docking procedures. We demonstrate that the use of a target-specific pose (-scoring) filter in combination with a physical force field-based scoring function (MedusaScore) leads to significant improvement of hit rates in virtual screening studies for 12 of the 13 benchmark sets from the clustered version of the Database of Useful Decoys (DUD). This new hybrid scoring function outperforms several conventional structure-based scoring functions, including XSCORE∷HMSCORE, ChemScore, PLP, and Chemgauss3, in six out of 13 data sets at the early stage of VS (up to 1% decoys of the screening database). We compare our hybrid method with several novel VS methods that were recently reported to perform well on the same DUD data sets. We find that the ligands retrieved using our method are chemically more diverse in comparison with two ligand-based methods (FieldScreen and FLAP∷LBX). We also compare our method with FLAP∷RBLB, a high-performance VS method that also utilizes both the receptor and the cognate ligand structures. Interestingly, we find that the top ligands retrieved using our method are highly complementary to those retrieved using FLAP∷RBLB, hinting at effective directions for best VS applications.
We suggest that this integrative virtual screening approach combining cheminformatics and molecular mechanics methodologies may be applied to a broad variety of protein targets to improve the outcome of structure-based drug discovery studies.
In silico molecular docking is an essential step in modern drug discovery when driven by a well defined macromolecular target. Hence, the process is called structure-based or rational drug design (RDD). In the docking step of RDD the macromolecule or receptor is usually considered a rigid body. However, we know from biology that macromolecules such as enzymes and membrane receptors are inherently flexible. Accounting for this flexibility in molecular docking experiments is not trivial. One possibility, which we call a fully-flexible receptor model, is to use a molecular dynamics simulation trajectory of the receptor to simulate its explicit flexibility. To benefit from this concept, which has been known since 2000, it is essential to develop and improve new tools that enable molecular docking simulations of fully-flexible receptor models.
We have developed a Flexible-Receptor Docking Workflow System (FReDoWS) to automate molecular docking simulations using a fully-flexible receptor model. In addition, it includes a snapshot selection feature to accelerate the virtual screening of ligands for well defined disease targets. The usefulness of FReDoWS is demonstrated by investigating the docking of four different ligands to flexible models of Mycobacterium tuberculosis’ wild type InhA enzyme and mutants I21V and I16T. We find that all four ligands bind effectively to this receptor, as expected from the literature on similar, but wet, experiments.
Work that would usually require the manual execution of many computer programs and the manipulation of thousands of files was efficiently and automatically performed by FReDoWS. Its friendly interface allows the user to change the docking and execution parameters. In addition, the snapshot selection feature accelerated the docking simulations. We expect FReDoWS to help us explore more of the role flexibility plays in receptor-ligand interactions. FReDoWS can be made available upon request to the authors.
Benchmarks for molecular docking have historically focused on re-docking the cognate ligand of a well-determined protein-ligand complex to measure geometric pose prediction accuracy, and measurement of virtual screening performance has been focused on increasingly large and diverse sets of target protein structures, cognate ligands, and various types of decoy sets. Here, pose prediction is reported on the Astex Diverse set of 85 protein ligand complexes, and virtual screening performance is reported on the DUD set of 40 protein targets. In both cases, prepared structures of targets and ligands were provided by symposium organizers. The re-prepared data sets yielded results not significantly different from previous reports of Surflex-Dock on the two benchmarks. Minor changes to protein coordinates resulting from complex pre-optimization had large effects on observed performance, highlighting the limitations of cognate ligand re-docking for pose prediction assessment. Docking protocols developed for cross-docking, which address protein flexibility and produce discrete families of predicted poses, produced substantially better performance for pose prediction. Virtual screening performance was shown to benefit from employing and combining multiple screening methods: docking, 2D molecular similarity, and 3D molecular similarity. In addition, use of multiple protein conformations significantly improved screening enrichment.
For many targets of pharmaceutical importance conformational changes of the receptor protein are relevant during the ligand binding process. A new docking approach, ReFlexIn (Receptor Flexibility by Interpolation), that combines receptor flexibility with the computationally efficient potential grid representation of receptor molecules has been evaluated on the retroviral HIV-1 (Human Immunodeficiency Virus 1) protease system. An approximate inclusion of receptor flexibility is achieved by using interpolation between grid representations of individual receptor conformations. For the retroviral protease the method was tested on an ensemble of protease structures crystallized in the presence of different ligands and on a set of structures obtained from morphing between the unbound and a ligand-bound protease structure. Docking was performed on ligands known to bind to the protease and several non-binders. For the binders the ReFlexIn method yielded in almost all cases ligand placements in similar or closer agreement with experiment than docking to any of the ensemble members without degrading the discrimination with respect to non-binders. The improved docking performance compared to docking to rigid receptors allows for systematic virtual screening applications at very small additional computational cost.
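The idea of interpolating between grid representations of receptor conformations can be illustrated with a minimal sketch. It assumes a simple linear blend of two precomputed potential grids controlled by a single parameter `lam`; the abstract does not give the actual ReFlexIn interpolation scheme, so this form, the flattened grid layout, and all names and values here are assumptions.

```python
def interpolate_grids(grid_a, grid_b, lam):
    """Blend two potential grids point by point: (1 - lam) * A + lam * B."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(grid_a) != len(grid_b):
        raise ValueError("grids must have the same number of points")
    return [(1.0 - lam) * a + lam * b for a, b in zip(grid_a, grid_b)]


def ligand_energy(grid, occupied_points):
    """Sum the grid potential over the grid points occupied by ligand atoms."""
    return sum(grid[index] for index in occupied_points)


# Hypothetical flattened grids for two receptor conformations (arbitrary units).
grid_conformation_a = [0.0, -2.0, 4.0]
grid_conformation_b = [2.0, -4.0, 0.0]

blended = interpolate_grids(grid_conformation_a, grid_conformation_b, lam=0.25)
print(blended)
print(ligand_energy(blended, occupied_points=[1, 2]))
```

Because the blend is evaluated on the same grid points as the rigid-receptor case, the extra cost per energy evaluation is small, which is consistent with the low overhead claimed in the text.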
The rapidly growing number of theoretically predicted protein structures requires robust methods that can utilize low-quality receptor structures as targets for ligand docking. Typically, docking accuracy falls off dramatically when apo or modeled receptors are used in docking experiments. Low-resolution ligand docking techniques have been developed to deal with structural inaccuracies in predicted receptor models. In this spirit, we describe the development and optimization of a knowledge-based potential implemented in Q-Dock, a low-resolution flexible ligand docking approach. Self-docking experiments using crystal structures reveal satisfactory accuracy, comparable with all-atom docking. All-atom models reconstructed from Q-Dock’s low-resolution models can be further refined by even a simple all-atom energy minimization. In decoy-docking against distorted receptor models with a root-mean-square deviation (RMSD) from native of ~3 Å, Q-Dock recovers on average 15–20% more specific contacts and 25–35% more binding residues than all-atom methods. To further improve docking accuracy against low-quality protein models, we propose a pocket-specific protein-ligand interaction potential derived from weakly homologous threading holo-templates. The success rate of Q-Dock employing a pocket-specific potential is 6.3 times higher than that previously reported for the Dolores method, another low-resolution docking approach.
Q-Dock; ligand docking; low-resolution docking; pocket-specific potential; protein models; threading
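The contact and binding-residue recovery figures quoted in the abstract above can be illustrated with a small sketch. The abstract does not define these metrics precisely, so the example assumes recovery is the fraction of native contacts (or native binding residues) that also appear in the docked model; the function names and the atom and residue labels are hypothetical.

```python
def contact_recovery(predicted_contacts, native_contacts):
    """Fraction of native (ligand atom, residue) contacts present in the prediction."""
    native = set(native_contacts)
    if not native:
        raise ValueError("native contact set is empty")
    return len(native & set(predicted_contacts)) / len(native)


def binding_residue_recovery(predicted_contacts, native_contacts):
    """Same idea, but counting residues regardless of which ligand atom touches them."""
    native_residues = {residue for _, residue in native_contacts}
    predicted_residues = {residue for _, residue in predicted_contacts}
    if not native_residues:
        raise ValueError("native contact set is empty")
    return len(native_residues & predicted_residues) / len(native_residues)


# Hypothetical contact lists for one complex.
native = [("C1", "ASP25"), ("O2", "GLY27"), ("N3", "ILE50"), ("C4", "VAL82")]
docked = [("C1", "ASP25"), ("N3", "ILE50"), ("C4", "ALA28"), ("C7", "GLY27")]
specific = contact_recovery(docked, native)
residues = binding_residue_recovery(docked, native)
```

Here the docked pose reproduces two of four specific contacts but three of four binding residues, which shows why the two numbers are reported separately.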
To find out whether linarin can be used as a potential natural inhibitor to target CDK4 in retinoblastoma using virtual screening studies.
Materials and Methods:
In this study, molecular modeling and protein structure optimization were performed on the crystal structure of CDK4 (PDB ID: 3G33), which was then subjected to a 10-nanosecond molecular dynamics (MD) simulation as a preparatory step for docking. The stable conformation obtained from the MD simulation was used for virtual screening against the library of natural compounds in the Indian Plant Anticancer Compounds Database (InPACdb) using AutoDock Vina. Finally, the best-docked ligands were revalidated individually through semi-flexible docking with AutoDock 4.0.
The CDK4 structure was stereochemically optimized to fix clashes and bad angles, which placed 96.4% of residues in the core region of the Ramachandran plot. The final structure of CDK4 that emerged after MD simulation was found to be highly stable according to several validation tools. Virtual screening and docking were carried out for CDK4 against optimized ligands from InPACdb using AutoDock Vina. This identified linarin (InPACdb AC. No. acd0073) as a potential therapeutic agent with a binding energy of -8.9 kJ/mol. It was also validated by the AutoDock 4.0 semi-flexible docking procedure, with a binding energy of -8.18 kJ/mol and a Ki value of 1.01 μM.
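The reported inhibition constant can be related to the reported AutoDock 4.0 binding energy through ΔG = RT ln Ki. The sketch below assumes a temperature of 298.15 K and assumes the energy value is expressed in kcal/mol, the unit in which AutoDock normally reports energies; neither assumption is stated in the text. Under those assumptions, an energy of -8.18 corresponds to a Ki of roughly 1.01 μM, in line with the value quoted above.

```python
import math

R_KCAL = 1.9872e-3  # gas constant in kcal/(mol*K)


def ki_from_binding_energy(delta_g_kcal_per_mol, temperature_k=298.15):
    """Return Ki in mol/L from a binding free energy, using dG = RT * ln(Ki)."""
    return math.exp(delta_g_kcal_per_mol / (R_KCAL * temperature_k))


# Reported AutoDock 4.0 energy for linarin, read as kcal/mol (assumption).
ki_molar = ki_from_binding_energy(-8.18)
ki_micromolar = ki_molar * 1e6
```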
The docking results indicate that linarin, a plant flavonoid, is a potential inhibitor of CDK4 when compared with some of the anticancer drugs currently used for retinoblastoma. This finding can be extended to experimental validation to assess the in vivo efficacy of the identified compound.
Cyclin-dependent kinase 4; InPACdb; molecular docking; molecular dynamics; retinoblastoma; virtual screening
The rapidly increasing number of high-resolution X-ray structures of G-protein coupled receptors (GPCRs) creates a unique opportunity to employ comparative modeling and docking to provide valuable insight into the function and ligand binding determinants of novel receptors, to assist in virtual screening, and to design and optimize drug candidates. However, low sequence identity between receptors, conformational flexibility, and chemical diversity of ligands present an enormous challenge to molecular modeling approaches. It is our hypothesis that rapid Monte-Carlo sampling of protein backbone and side-chain conformational space with Rosetta can be leveraged to meet this challenge. This study applies unbiased comparative modeling and docking methodologies to 14 distinct high-resolution GPCR structures and proposes knowledge-based filtering methods to improve sampling performance and the identification of correct ligand-receptor interactions. On average, top-ranked receptor models built on template structures with over 50% sequence identity are within 2.9 Å of the experimental structure, with an average root mean square deviation (RMSD) of 2.2 Å for the transmembrane region and 5 Å for the second extracellular loop. Furthermore, these models are consistently correlated with a low Rosetta energy score. To predict their binding modes, conformers of the 14 ligands co-crystallized with the GPCRs were docked against the top-ranked comparative models. In contrast to the comparative models themselves, however, it remains difficult to unambiguously identify correct binding modes by score alone. On average, sampling performance was improved 10^3-fold over random using knowledge-based and energy-based filters. In assessing the applicability of experimental constraints, we found that sampling performance increases by one order of magnitude for every 10 residues known to contact the ligand. Additionally, in the case of DOR, knowledge of a single specific ligand-protein contact improved sampling efficiency 7-fold. These findings offer specific guidelines that may lead to increased success in determining receptor-ligand complexes.
The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, protein design, and ligand docking. Many existing solutions model side-chain packing as a computational optimisation problem. Apart from the design of the search algorithm, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search finds the lowest energy, there is no certainty of obtaining protein structures with correct side chains.
We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers by rotamer minimisation to construct subrotamers, which reasonably alleviated the discreteness of the rotamer library.
We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of χ1 and 77.11% of χ1+2 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chains and individual residues. In this comprehensive benchmark, for 51.5% of the proteins up to 400 amino acids in length, the pacoPacker predictions were superior to those of both CIS-RR and SCWRL4. Finally, we also showed the advantage of using the subrotamer strategy. All results confirmed that our parallel approach is competitive with state-of-the-art solutions for packing side chains.
This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a framework for combining different inaccuracy/usefulness objective functions by designing parallel heuristic search algorithms.
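The side-chain dihedral accuracy figures above (first dihedral alone, and first and second dihedrals together, within 40° of the X-ray value) can be computed along the following lines. This is a minimal sketch under stated assumptions: angles are in degrees, each residue is a tuple of its first and second dihedral (the second is `None` when the residue has none), and symmetric side chains are not given special treatment. The function names and the data layout are hypothetical.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def chi_accuracy(predicted, reference, tolerance=40.0):
    """Return (first-dihedral accuracy, first-and-second-dihedral accuracy)."""
    chi1_ok = chi1_total = chi12_ok = chi12_total = 0
    for (p1, p2), (r1, r2) in zip(predicted, reference):
        first_ok = angular_difference(p1, r1) <= tolerance
        chi1_total += 1
        chi1_ok += first_ok
        if p2 is not None and r2 is not None:
            # The combined measure requires both dihedrals to be within tolerance.
            chi12_total += 1
            chi12_ok += first_ok and angular_difference(p2, r2) <= tolerance
    chi12 = chi12_ok / chi12_total if chi12_total else float("nan")
    return chi1_ok / chi1_total, chi12


# Three hypothetical residues: (first dihedral, second dihedral or None).
predicted = [(-60.0, 180.0), (175.0, None), (60.0, 90.0)]
reference = [(-65.0, -170.0), (-178.0, None), (-60.0, 80.0)]
chi1_acc, chi12_acc = chi_accuracy(predicted, reference)
```

The wrap-around in `angular_difference` matters because 175° and -178° are only 7° apart, even though their raw difference is 353°.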
protein; side chains; pack; ACO; parallel
As part of the SAMPL4 blind challenge, filtered AutoDock Vina ligand docking predictions and large-scale binding free energy calculations with the binding energy distribution analysis method (BEDAM) have been applied to the virtual screening of a focused library of candidate binders to the LEDGF site of the HIV integrase protein. The computational protocol leveraged docking and high-level atomistic models to improve enrichment. The enrichment factor of our blind predictions ranked best among all of the computational submissions, and second best overall. This work represents, to our knowledge, the first example of the application of an all-atom physics-based binding free energy model to large-scale virtual screening. A total of 285 parallel Hamiltonian replica exchange molecular dynamics absolute protein-ligand binding free energy simulations were conducted starting from docked poses. The setup of the simulations was fully automated, and the calculations were distributed over multiple computing resources and completed within a 6-week period. The accuracy of the docked poses and the inclusion of intramolecular strain and entropic losses in the binding free energy estimates were the major factors behind the success of the method. Lack of sufficient time and computing resources to investigate additional protonation states of the ligands was a major cause of mispredictions. The experiment demonstrated the applicability of binding free energy modeling to improve hit rates in challenging virtual screening of focused ligand libraries during lead optimization.
Binding free energy; Reorganization free energy; Free energy ligand screening; BEDAM; HIV Integrase
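The enrichment factor mentioned in the abstract above is commonly computed as the hit rate in the top-ranked fraction of a library divided by the hit rate of the whole library. The sketch below uses that common definition as an assumption; the exact definition and cutoff used in the SAMPL4 assessment are not given in the text, and the function name and example ranking are hypothetical.

```python
def enrichment_factor(ranked_labels, top_fraction):
    """Enrichment factor for a ranked list of 1 (active) / 0 (inactive) labels.

    ranked_labels must be ordered from best-scored to worst-scored compound.
    """
    n_total = len(ranked_labels)
    total_actives = sum(ranked_labels)
    if n_total == 0 or total_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(n_total * top_fraction))
    top_hit_rate = sum(ranked_labels[:n_top]) / n_top
    overall_hit_rate = total_actives / n_total
    return top_hit_rate / overall_hit_rate


# Hypothetical screen of ten compounds, three of them active.
ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
ef_top20 = enrichment_factor(ranking, 0.2)
```

A value of 1 means the ranking does no better than random selection; larger values mean that actives are concentrated near the top of the list.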
Virtual screening is becoming an important tool for drug discovery. However, the application of virtual screening has been limited by the lack of accurate scoring functions. Here, we present a novel scoring function, MedusaScore, for evaluating protein-ligand binding. MedusaScore is based on models of physical interactions that include van der Waals, solvation and hydrogen bonding energies. To ensure the best transferability of the scoring function, we do not use any protein-ligand experimental data for parameter training. We then test the MedusaScore for docking decoy recognition and binding affinity prediction and find superior performance compared to other widely used scoring functions. Statistical analysis indicates that one source of inaccuracy of MedusaScore may arise from the unaccounted entropic loss upon ligand binding, which suggests avenues of approach for further MedusaScore improvement.
The increasing resistance to current therapeutic agents in HIV drug regimens remains a major problem for effective acquired immune deficiency syndrome (AIDS) therapy. Many potential inhibitors have been developed that inhibit key cellular pathways in the HIV cycle. Inhibition of the HIV-1 reverse transcriptase associated ribonuclease H (RNase H) function provides a novel target for anti-HIV chemotherapy. Here we report on the applicability of conceptually different in silico approaches as virtual screening (VS) tools to efficiently identify RNase H inhibitors from large chemical databases. The methods used here include machine-learning algorithms (e.g. support vector machine, random forest and kappa nearest neighbor), shape similarity (rapid overlay of chemical structures), pharmacophore, molecular interaction fields-based fingerprints for ligands and protein (FLAP) and flexible ligand docking methods. The results show that receptor-based flexible docking experiments provide good enrichment (80–90%) compared to ligand-based approaches such as FLAP (74%), shape similarity (75%) and random forest (72%). Thus, this study suggests that flexible docking is the method of choice in terms of retrieving active from inactive compounds, efficiency, and efficacy. Moreover, shape similarity, machine learning and FLAP models could also be used for further validation or filtering in virtual screening processes. The best models could potentially be used for identifying structurally diverse and selective RNase H inhibitors from large chemical databases. In addition, pharmacophore models suggest that the distance between hydrogen bond acceptors plays a key role in inhibition of the RNase H domain through metal chelation.
Small molecule docking predicts the interaction of a small molecule ligand with a protein at atomic-detail accuracy, including the position and conformation of the ligand as well as conformational changes of the protein upon ligand binding. While successful in the majority of cases, docking algorithms including RosettaLigand fail in some cases to predict the correct protein/ligand complex structure. In this study we show that simultaneous docking of explicit interface water molecules greatly improves Rosetta’s ability to distinguish correct from incorrect ligand poses. This result holds true for both protein-centric water docking, wherein waters are located relative to the protein binding site, and ligand-centric water docking, wherein waters move with the ligand during docking. Protein-centric docking is used to model 99 HIV-1 protease/protease inhibitor structures. We find protease inhibitor placement improving at a ratio of 9∶1 when one critical interface water molecule is included in the docking simulation. Ligand-centric docking is applied to 341 structures from the CSAR benchmark of diverse protein/ligand complexes. Across this diverse dataset we see up to 56% recovery of failed docking studies when waters are included in the docking simulation.
Transcriptional regulation of some genes involved in xenobiotic detoxification and apoptosis is performed via the human pregnane X receptor (PXR), which in turn is activated by structurally diverse agonists including steroid hormones. Activation of PXR has the potential to initiate adverse effects, altering drug pharmacokinetics or perturbing physiological processes. Reliable computational prediction of PXR agonists would be valuable for pharmaceutical and toxicological research. There has been limited success with structure-based modeling approaches to predict human PXR activators. Slightly better success has been achieved with ligand-based modeling methods including quantitative structure-activity relationship (QSAR) analysis, pharmacophore modeling and machine learning. In this study, we present a comprehensive analysis focused on prediction of 115 steroids for ligand binding activity towards human PXR. Six crystal structures were used as templates for docking and ligand-based modeling approaches (two-, three-, four- and five-dimensional analyses). The best success at external prediction was achieved with 5D-QSAR. Bayesian models with FCFP_6 descriptors were validated after leaving a large percentage of the dataset out and using an external test set. Docking of ligands to the PXR structure co-crystallized with hyperforin had the best statistics for this method. Sulfated steroids (which are activators) were consistently predicted as non-activators, while poorly predicted steroids were docked in a reverse mode compared to 5α-androstan-3β-ol. Modeling of human PXR represents a complex challenge by virtue of the large, flexible ligand-binding cavity. This study emphasizes this aspect, illustrating modest success using the largest quantitative data set to date and multiple modeling approaches.
Promiscuous proteins generally bind a large array of diverse ligand structures. This may be facilitated by a very large binding site, multiple binding sites, or a flexible binding site that can adjust to the size of the ligand. These aspects also increase the complexity of predicting whether a molecule will bind or not to such proteins which frequently function as exogenous compound sensors to respond to toxic stress. For example, transporters may prevent absorption of some molecules, and enzymes may convert them to more readily excretable compounds (or alternatively activate them prior to further clearance by other detoxification enzymes). Nuclear hormone receptors may respond to ligands and then affect downstream gene expression to upregulate both enzymes and transporters to increase the clearance for the same or different molecules. We have assessed the ability of many different ligand-based and structure-based computational approaches to model and predict the activation of human PXR by steroidal compounds. We find the most effective computational approach to identify potential steroidal PXR agonists which are clinically relevant due to their widespread use in clinical medicine and the presence of mimics in the environment.
Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4).
Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers.
Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarckian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs and, when combined with MPI parallelization, can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.
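The benefit of reusing grid maps from docking to docking can be illustrated with a small caching sketch. This is not the mpAD4 code, which parallelizes AutoDock with MPI and OpenMP; it is a single-process Python stand-in that assumes the grid maps depend only on the receptor, so they need to be read once per worker rather than once per ligand. The function names, the placeholder map contents, and the load counter are hypothetical.

```python
from functools import lru_cache

# Counts how often the expensive read is actually performed (for illustration).
LOAD_COUNT = {"n": 0}


@lru_cache(maxsize=None)
def load_grid_maps(receptor_id):
    """Stand-in for reading a receptor's grid map files from disk."""
    LOAD_COUNT["n"] += 1
    return {"receptor": receptor_id, "maps": ("C", "N", "OA", "e", "d")}


def dock(receptor_id, ligand_id):
    """Placeholder docking job: fetches the cached maps, then would run the search."""
    maps = load_grid_maps(receptor_id)
    return (ligand_id, maps["receptor"])


def screen(receptor_id, ligand_ids):
    """Dock every ligand against one receptor, reusing the cached grid maps."""
    return [dock(receptor_id, ligand_id) for ligand_id in ligand_ids]


results = screen("receptor_A", ["lig1", "lig2", "lig3"])
```

After the three dockings the maps have been read only once, which mirrors the reduction in I/O traffic described above when many ligands are screened against the same receptor on a node.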