Prediction of physical protein-protein interactions represents a key challenge in computational systems biology. This study provides a proof-of-principle that high-throughput in silico protein docking results can be used to predict interaction partners.
Deciphering the whole network of protein interactions for a given proteome (‘interactome') is the goal of many experimental and computational efforts in Systems Biology. Separately the prediction of the structure of protein complexes by docking methods is a well-established scientific area. To date, docking programs have not been used to predict interaction partners. We provide a proof of principle for such an approach. Using a set of protein complexes representing known interactors in their unbound form, we show that a standard docking program can distinguish the true interactors from a background of 922 non-redundant potential interactors. We additionally show that true interactions can be distinguished from non-likely interacting proteins within the same structural family. Our approach may be put in the context of the proposed ‘funnel-energy model'; the docking algorithm may not find the native complex, but it distinguishes binding partners because of the higher probability of favourable models compared with a collection of non-binders. The potential exists to develop this proof of principle into new approaches for predicting interaction partners and reconstructing biological networks.
interactome; protein docking; protein–protein interaction
Protein-RNA interactions play fundamental roles in many biological processes. Understanding the molecular mechanism of protein-RNA recognition and formation of protein-RNA complexes is a major challenge in structural biology. Unfortunately, the experimental determination of protein-RNA complexes is tedious and difficult, both by X-ray crystallography and NMR. For many interacting proteins and RNAs the individual structures are available, enabling computational prediction of complex structures by computational docking. However, methods for protein-RNA docking remain scarce, in particular in comparison to the numerous methods for protein-protein docking.
We developed two medium-resolution, knowledge-based potentials for scoring protein-RNA models obtained by docking: the quasi-chemical potential (QUASI-RNP) and the Decoys As the Reference State potential (DARS-RNP). Both potentials use a coarse-grained representation for both RNA and protein molecules and are capable of dealing with RNA structures with posttranscriptionally modified residues. We compared the discriminative power of DARS-RNP and QUASI-RNP for selecting rigid-body docking poses with the potentials previously developed by the Varani and Fernandez groups.
In both bound and unbound docking tests, DARS-RNP showed the highest ability to identify native-like structures. Python implementations of DARS-RNP and QUASI-RNP are freely available for download at http://iimcb.genesilico.pl/RNP/
RNA; protein; RNP; macromolecular docking; complex modeling; structural bioinformatics
G protein-coupled receptors (GPCRs) represent a large family of signaling proteins that includes many therapeutic targets; however, progress in identifying new small molecule drugs has been disappointing. The past four years have seen remarkable progress in the structural biology of GPCRs, raising the possibility of applying structure-based approaches to GPCR drug discovery efforts. Of the various structure-based approaches that have been applied to soluble protein targets, such as proteases and kinases, in silico docking is among the most ready applicable to GPCRs. Early studies suggest that GPCR binding pockets are well suited to docking, and docking screens have identified potent and novel compounds for these targets. This review will focus on the current state of in silico docking for GPCRs.
Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations.
We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pair wise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts from performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by the second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering.
We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.
protein docking prediction; protein-protein interaction; interaction site prediction
Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computational demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality.
In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems.
The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improving the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design.
We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem.
GPU.proton.DOCK (Genuine Protein Ultrafast proton equilibria consistent DOCKing) is a state of the art service for in silico prediction of protein–protein interactions via rigorous and ultrafast docking code. It is unique in providing stringent account of electrostatic interactions self-consistency and proton equilibria mutual effects of docking partners. GPU.proton.DOCK is the first server offering such a crucial supplement to protein docking algorithms—a step toward more reliable and high accuracy docking results. The code (especially the Fast Fourier Transform bottleneck and electrostatic fields computation) is parallelized to run on a GPU supercomputer. The high performance will be of use for large-scale structural bioinformatics and systems biology projects, thus bridging physics of the interactions with analysis of molecular networks. We propose workflows for exploring in silico charge mutagenesis effects. Special emphasis is given to the interface-intuitive and user-friendly. The input is comprised of the atomic coordinate files in PDB format. The advanced user is provided with a special input section for addition of non-polypeptide charges, extra ionogenic groups with intrinsic pKa values or fixed ions. The output is comprised of docked complexes in PDB format as well as interactive visualization in a molecular viewer. GPU.proton.DOCK server can be accessed at http://gpudock.orgchm.bas.bg/.
Molecular docking is the most commonly used technique in the modern drug discovery process where computational approaches involving docking algorithms are used to dock small molecules into macromolecular target structures. Over the recent years several evaluation studies have been reported by independent scientists comparing the performance of the docking programs by using default ‘black box’ protocols supplied by the software companies. Such studies have to be considered carefully as the docking programs can be tweaked towards optimum performance by selecting the parameters suitable for the target of interest. In this study we address the problem of selecting an appropriate docking and scoring function combination (88 docking algorithm-scoring functions) for substrate specificity predictions for feruloyl esterases, an industrially relevant enzyme family. We also propose the ‘Key Interaction Score System’ (KISS), a more biochemically meaningful measure for evaluation of docking programs based on pose prediction accuracy.
Antibodies play an increasing pivotal role in both basic research and the biopharmaceutical sector, therefore technology for characterizing and improving their properties through rational engineering is desirable. This is a difficult task thought to require high-resolution x-ray structures, which are not always available. We, instead, use a combination of solution NMR epitope mapping and computational docking to investigate the structure of a human antibody in complex with the four Dengue virus serotypes. Analysis of the resulting models allows us to design several antibody mutants altering its properties in a predictable manner, changing its binding selectivity and ultimately improving its ability to neutralize the virus by up to 40 fold. The successful rational design of antibody mutants is a testament to the accuracy achievable by combining experimental NMR epitope mapping with computational docking and to the possibility of applying it to study antibody/pathogen interactions.
Protein-peptide interactions are vital for the cell. They mediate, inhibit or serve as structural components in nearly 40% of all macromolecular interactions, and are often associated with diseases, making them interesting leads for protein drug design. In recent years, large-scale technologies have enabled exhaustive studies on the peptide recognition preferences for a number of peptide-binding domain families. Yet, the paucity of data regarding their molecular binding mechanisms together with their inherent flexibility makes the structural prediction of protein-peptide interactions very challenging. This leaves flexible docking as one of the few amenable computational techniques to model these complexes. We present here an ensemble, flexible protein-peptide docking protocol that combines conformational selection and induced fit mechanisms. Starting from an ensemble of three peptide conformations (extended, a-helix, polyproline-II), flexible docking with HADDOCK generates 79.4% of high quality models for bound/unbound and 69.4% for unbound/unbound docking when tested against the largest protein-peptide complexes benchmark dataset available to date. Conformational selection at the rigid-body docking stage successfully recovers the most relevant conformation for a given protein-peptide complex and the subsequent flexible refinement further improves the interface by up to 4.5 Å interface RMSD. Cluster-based scoring of the models results in a selection of near-native solutions in the top three for ∼75% of the successfully predicted cases. This unified conformational selection and induced fit approach to protein-peptide docking should open the route to the modeling of challenging systems such as disorder-order transitions taking place upon binding, significantly expanding the applicability limit of biomolecular interaction modeling by docking.
Computer-based docking screens are now widely used to discover new ligands for targets of known structure; in the last two years alone, the discovery of ligands for over 20 proteins have been reported. Recently, investigators have also turned to predicting new substrates and for enzymes of unknown function, taking docking in a wholly new direction. Increasingly, the hit-rates, the true- and the false-positives from the docking screens are being compared to those from empirical, high-throughput screens, revealing the strengths, weaknesses and complementarities of both techniques. The recent efflorescence of GPCR structures has made these quintessential drug targets available to structure-based approaches. Consistent with their “druggability”, the docking screens have returned high hit-rates and potent molecules. Finally, in the last several years, an approach almost exactly opposite to docking has also appeared; this pharmacological network approach begins not with the structure of the target but rather those of drug molecules and asks, given a pattern of chemistry in the ligands, what targets may a particular drug bind to? This method, which returns to an older, pharmacology logic, has been surprisingly successful in predicting new “off-targets” for established drugs.
Despite continuous efforts of the international community to reduce the impact of malaria on developing countries, no significant progress has been made in the recent years and the discovery of new drugs is more than ever needed. Out of the many proteins involved in the metabolic activities of the Plasmodium parasite, some are promising targets to carry out rational drug discovery.
Recent years have witnessed the emergence of grids, which are highly distributed computing infrastructures particularly well fitted for embarrassingly parallel computations like docking. In 2005, a first attempt at using grids for large-scale virtual screening focused on plasmepsins and ended up in the identification of previously unknown scaffolds, which were confirmed in vitro to be active plasmepsin inhibitors. Following this success, a second deployment took place in the fall of 2006 focussing on one well known target, dihydrofolate reductase (DHFR), and on a new promising one, glutathione-S-transferase.
In silico drug design, especially vHTS is a widely and well-accepted technology in lead identification and lead optimization. This approach, therefore builds, upon the progress made in computational chemistry to achieve more accurate in silico docking and in information technology to design and operate large scale grid infrastructures.
On the computational side, a sustained infrastructure has been developed: docking at large scale, using different strategies in result analysis, storing of the results on the fly into MySQL databases and application of molecular dynamics refinement are MM-PBSA and MM-GBSA rescoring. The modeling results obtained are very promising. Based on the modeling results, In vitro results are underway for all the targets against which screening is performed.
The current paper describes the rational drug discovery activity at large scale, especially molecular docking using FlexX software on computational grids in finding hits against three different targets (PfGST, PfDHFR, PvDHFR (wild type and mutant forms) implicated in malaria. Grid-enabled virtual screening approach is proposed to produce focus compound libraries for other biological targets relevant to fight the infectious diseases of the developing world.
Over the last years, large scale proteomics studies have generated a wealth of information of biomolecular complexes. Adding the structural dimension to the resulting interactomes represents a major challenge that classical structural experimental methods alone will have difficulties to confront. To meet this challenge, complementary modeling techniques such as docking are thus needed. Among the current docking methods, HADDOCK (High Ambiguity-Driven DOCKing) distinguishes itself from others by the use of experimental and/or bioinformatics data to drive the modeling process and has shown a strong performance in the critical assessment of prediction of interactions (CAPRI), a blind experiment for the prediction of interactions. Although most docking programs are limited to binary complexes, HADDOCK can deal with multiple molecules (up to six), a capability that will be required to build large macromolecular assemblies. We present here a novel web interface of HADDOCK that allows the user to dock up to six biomolecules simultaneously. This interface allows the inclusion of a large variety of both experimental and/or bioinformatics data and supports several types of cyclic and dihedral symmetries in the docking of multibody assemblies. The server was tested on a benchmark of six cases, containing five symmetric homo-oligomeric protein complexes and one symmetric protein-DNA complex. Our results reveal that, in the presence of either bioinformatics and/or experimental data, HADDOCK shows an excellent performance: in all cases, HADDOCK was able to generate good to high quality solutions and ranked them at the top, demonstrating its ability to model symmetric multicomponent assemblies. Docking methods can thus play an important role in adding the structural dimension to interactomes. However, although the current docking methodologies were successful for a vast range of cases, considering the variety and complexity of macromolecular assemblies, inclusion of some kind of experimental information (e.g. from mass spectrometry, nuclear magnetic resonance, cryoelectron microscopy, etc.) will remain highly desirable to obtain reliable results.
Computational prediction of the 3D structures of molecular interactions is a challenging area, often requiring significant computational resources to produce structural predictions with atomic-level accuracy. This can be particularly burdensome when modeling large sets of interactions, macromolecular assemblies, or interactions between flexible proteins. We previously developed a protein docking program, ZDOCK, which uses a fast Fourier transform to perform a 3D search of the spatial degrees of freedom between two molecules. By utilizing a pairwise statistical potential in the ZDOCK scoring function, there were notable gains in docking accuracy over previous versions, but this improvement in accuracy came at a substantial computational cost. In this study, we incorporated a recently developed 3D convolution library into ZDOCK, and additionally modified ZDOCK to dynamically orient the input proteins for more efficient convolution. These modifications resulted in an average of over 8.5-fold improvement in running time when tested on 176 cases in a newly released protein docking benchmark, as well as substantially less memory usage, with no loss in docking accuracy. We also applied these improvements to a previous version of ZDOCK that uses a simpler non-pairwise atomic potential, yielding an average speed improvement of over 5-fold on the docking benchmark, while maintaining predictive success. This permits the utilization of ZDOCK for more intensive tasks such as docking flexible molecules and modeling of interactomes, and can be run more readily by those with limited computational resources.
Small molecule docking predicts the interaction of a small molecule ligand with a protein at atomic-detail accuracy including position and conformation the ligand but also conformational changes of the protein upon ligand binding. While successful in the majority of cases, docking algorithms including RosettaLigand fail in some cases to predict the correct protein/ligand complex structure. In this study we show that simultaneous docking of explicit interface water molecules greatly improves Rosetta’s ability to distinguish correct from incorrect ligand poses. This result holds true for both protein-centric water docking wherein waters are located relative to the protein binding site and ligand-centric water docking wherein waters move with the ligand during docking. Protein-centric docking is used to model 99 HIV-1 protease/protease inhibitor structures. We find protease inhibitor placement improving at a ratio of 9∶1 when one critical interface water molecule is included in the docking simulation. Ligand-centric docking is applied to 341 structures from the CSAR benchmark of diverse protein/ligand complexes . Across this diverse dataset we see up to 56% recovery of failed docking studies, when waters are included in the docking simulation.
Macromolecular complexes are the molecular machines of the cell. Knowledge at the atomic level is essential to understand and influence their function. However, their number is huge and a significant fraction is extremely difficult to study using classical structural methods such as NMR and X-ray crystallography. Therefore, the importance of large-scale computational approaches in structural biology is evident. This study combines two of these computational approaches, interface prediction and docking, to obtain atomic-level structures of protein-protein complexes, starting from their unbound components.
Here we combine six interface prediction web servers into a consensus method called CPORT (Consensus Prediction Of interface Residues in Transient complexes). We show that CPORT gives more stable and reliable predictions than each of the individual predictors on its own. A protocol was developed to integrate CPORT predictions into our data-driven docking program HADDOCK. For cases where experimental information is limited, this prediction-driven docking protocol presents an alternative to ab initio docking, the docking of complexes without the use of any information. Prediction-driven docking was performed on a large and diverse set of protein-protein complexes in a blind manner. Our results indicate that the performance of the HADDOCK-CPORT combination is competitive with ZDOCK-ZRANK, a state-of-the-art ab initio docking/scoring combination. Finally, the original interface predictions could be further improved by interface post-prediction (contact analysis of the docking solutions).
The current study shows that blind, prediction-driven docking using CPORT and HADDOCK is competitive with ab initio docking methods. This is encouraging since prediction-driven docking represents the absolute bottom line for data-driven docking: any additional biological knowledge will greatly improve the results obtained by prediction-driven docking alone. Finally, the fact that original interface predictions could be further improved by interface post-prediction suggests that prediction-driven docking has not yet been pushed to the limit. A web server for CPORT is freely available at http://haddock.chem.uu.nl/services/CPORT.
In modeling ligand-protein interactions, the representation and role of water is of great importance. We introduce a forcefield and hydration docking method that enables the automated prediction of waters mediating the binding of ligands with target proteins. The method presumes no prior knowledge of the apo or holo protein hydration state, and is potentially useful in the process of structure-based drug discovery. The hydration forcefield accounts for the entropic and enthalpic contributions of discrete waters to ligand binding, improving energy estimation accuracy and docking performance. The forcefield has been calibrated and validated on a total of 417 complexes (197 training set; 220 test set), then tested in cross-docking experiments, for a total of 1649 ligand-protein complexes evaluated. The method is computationally efficient and was used to model up to 35 waters during docking. The method was implemented and tested using unaltered AutoDock4 with new forcefield tables.
The community-wide GPCR Dock assessment is conducted to evaluate the status of molecular modeling and ligand docking for human G-protein coupled receptors. The present round of the assessment was based on the recent structures of dopamine D3 and CXCR4 chemokine receptors bound to small molecule antagonists and CXCR4 with a synthetic cyclopeptide. Thirty five groups submitted their receptor-ligand complex structure predictions prior to the release of the crystallographic coordinates. With closely related experimentally determined homology templates, as for dopamine D3 receptor, and with incorporation of biochemical and QSAR data, modern computational techniques predicted complex details with accuracy approaching experimental. In contrast, CXCR4 complexes that had less characterized interactions and only distant homology to the known GPCR structures still remained very challenging. The assessment results provide guidance for modeling and crystallographic communities in method development and target selection for further expansion of the structural coverage of the GPCR universe.
G-protein coupled receptor; chemokine receptor CXCR4; dopamine D3 receptor; homology modeling; docking; experimental restraints; structure prediction; atomic contacts; site-directed mutagenesis
Protein–protein docking algorithms aim to predict the structure of a complex given the atomic structures of the proteins that assemble it. The docking procedure usually consists of two main steps: docking candidate generation and their refinement. The refinement stage aims to improve the accuracy of the candidate solutions and to identify near-native solutions among them. During protein–protein interaction, both side chains and backbone change their conformation. Refinement methods should model these conformational changes in order to obtain a more accurate model of the complex. Handling protein backbone flexibility is a major challenge for docking methodologies, since backbone flexibility adds a huge number of degrees of freedom to the search space. FiberDock is the first docking refinement web server, which accounts for both backbone and side-chain flexibility. Given a set of up to 100 potential docking candidates, FiberDock models the backbone and side-chain movements that occur during the interaction, refines the structures and scores them according to an energy function. The FiberDock web server is free and available with no login requirement at http://bioinfo3d.cs.tau.ac.il/FiberDock/.
The number of protein targets with a known or predicted tri-dimensional structure and of drug-like chemical compounds is growing rapidly and so is the need for new therapeutic compounds or chemical probes. Performing flexible structure-based virtual screening computations on thousands of targets with millions of molecules is intractable to most laboratories nor indeed desirable. Since shape complementarity is of primary importance for most protein-ligand interactions, we have developed a tool/protocol based on rigid-body docking to select compounds that fit well into binding sites.
Here we present an efficient multiple conformation rigid-body docking approach, MS-DOCK, which is based on the program DOCK. This approach can be used as the first step of a multi-stage docking/scoring protocol. First, we developed and validated the Multiconf-DOCK tool that generates several conformers per input ligand. Then, each generated conformer (bioactives and 37970 decoys) was docked rigidly using DOCK6 with our optimized protocol into seven different receptor-binding sites. MS-DOCK was able to significantly reduce the size of the initial input library for all seven targets, thereby facilitating subsequent more CPU demanding flexible docking procedures.
MS-DOCK can be easily used for the generation of multi-conformer libraries and for shape-based filtering within a multi-step structure-based screening protocol in order to shorten computation times.
Protein design has many applications not only in biotechnology but also in basic science. It uses our current knowledge in structural biology to predict, by computer simulations, an amino acid sequence that would produce a protein with targeted properties. As in other examples of synthetic biology, this approach allows the testing of many hypotheses in biology. The recent development of automated computational methods to design proteins has enabled proteins to be designed that are very different from any known ones. Moreover, some of those methods mostly rely on a physical description of atomic interactions, which allows the designed sequences not to be biased towards known proteins. In this paper, we will describe the use of energy functions in computational protein design, the use of atomic models to evaluate the free energy in the unfolded and folded states, the exploration and optimization of amino acid sequences, the problem of negative design and the design of biomolecular function. We will also consider its use together with the experimental techniques such as directed evolution. We will end by discussing the challenges ahead in computational protein design and some of their future applications.
computational; design; proteins
Understanding the interaction between viral proteins and neutralizing antibodies at atomic resolution is hindered by a lack of experimentally solved complexes. Progress in computational docking has led to the prediction of increasingly high-quality model antibody-antigen complexes. The accuracy of atomic-level docking predictions is improved when integrated with experimental information and expert knowledge.
Binding affinity data associated with somatic mutations of a rotavirus-specific human adult antibody (RV6-26) are used to filter potential docking orientations of an antibody homology model with respect to the rotavirus VP6 crystal structure. The antibody structure is used to probe the VP6 trimer for candidate interface residues.
Three conformational epitopes are proposed. These epitopes are candidate antigenic regions for site-directed mutagenesis of VP6, which will help further elucidate antigenic function. A pseudo-atomic resolution RV6-26 antibody-VP6 complex is proposed consistent with current experimental information.
The use of mutagenesis constraints in docking calculations allows for the identification of a small number of alternative arrangements of the antigen-antibody interface. The mutagenesis information from the natural evolution of a neutralizing antibody can be used to discriminate between residue-scale models and create distance constraints for atomic-resolution docking. The integration of binding affinity data or other information with computation may be an advantageous approach to assist peptide engineering or therapeutic antibody design.
Modelling the interactions of biological molecules, or docking, is critical both to understanding basic life processes and to designing new drugs. The field programmable gate array (FPGA) based acceleration of a recently developed, complex, production docking code is described. The authors found that it is necessary to extend their previous three-dimensional (3D) correlation structure in several ways, most significantly to support simultaneous computation of several correlation functions. The result for small-molecule docking is a 100-fold speed-up of a section of the code that represents over 95% of the original run-time. An additional 2% is accelerated through a previously described method, yielding a total acceleration of 36× over a single core and 10× over a quad-core. This approach is found to be an ideal complement to graphics processing unit (GPU) based docking, which excels in the protein–protein domain.
Accurately predicting the binding affinities of large sets of diverse protein-ligand complexes is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for analysing the outputs of Molecular Docking, which is in turn an important technique for drug discovery, chemical biology and structural biology. Each scoring function assumes a predetermined theory-inspired functional form for the relationship between the variables that characterise the complex, which also include parameters fitted to experimental or simulation data, and its predicted binding affinity. The inherent problem of this rigid approach is that it leads to poor predictivity for those complexes that do not conform to the modelling assumptions. Moreover, resampling strategies, such as cross-validation or bootstrapping, are still not systematically used to guard against the overfitting of calibration data in parameter estimation for scoring functions.
We propose a novel scoring function (RF-Score) that circumvents the need for problematic modelling assumptions via non-parametric machine learning. In particular, Random Forest was used to implicitly capture binding effects that are hard to model explicitly. RF-Score is compared with the state of the art on the demanding PDBbind benchmark. Results show that RF-Score is a very competitive scoring function. Importantly, RF-Score’s performance was shown to improve dramatically with training set size and hence the future availability of more high quality structural and interaction data is expected to lead to improved versions of RF-Score.
Virtual or in silico ligand screening combined with other computational methods is one of the most promising methods to search for new lead compounds, thereby greatly assisting the drug discovery process. Despite considerable progresses made in virtual screening methodologies, available computer programs do not easily address problems such as: structural optimization of compounds in a screening library, receptor flexibility/induced-fit, and accurate prediction of protein-ligand interactions. It has been shown that structural optimization of chemical compounds and that post-docking optimization in multi-step structure-based virtual screening approaches help to further improve the overall efficiency of the methods. To address some of these points, we developed the program AMMOS for refining both, the 3D structures of the small molecules present in chemical libraries and the predicted receptor-ligand complexes through allowing partial to full atom flexibility through molecular mechanics optimization.
The program AMMOS carries out an automatic procedure that allows for the structural refinement of compound collections and energy minimization of protein-ligand complexes using the open source program AMMP. The performance of our package was evaluated by comparing the structures of small chemical entities minimized by AMMOS with those minimized with the Tripos and MMFF94s force fields. Next, AMMOS was used for full flexible minimization of protein-ligands complexes obtained from a mutli-step virtual screening. Enrichment studies of the selected pre-docked complexes containing 60% of the initially added inhibitors were carried out with or without final AMMOS minimization on two protein targets having different binding pocket properties. AMMOS was able to improve the enrichment after the pre-docking stage with 40 to 60% of the initially added active compounds found in the top 3% to 5% of the entire compound collection.
The open source AMMOS program can be helpful in a broad range of in silico drug design studies such as optimization of small molecules or energy minimization of pre-docked protein-ligand complexes. Our enrichment study suggests that AMMOS, designed to minimize a large number of ligands pre-docked in a protein target, can successfully be applied in a final post-processing step and that it can take into account some receptor flexibility within the binding site area.
Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur.
We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface C-αRMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases.
We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.