RosettaDock has been increasingly used in protein docking and design strategies in order to predict the structure of protein-protein interfaces. Here we test the capabilities of RosettaDock 3.2, part of the newly developed Rosetta v3.2 modeling suite, against Docking Benchmark 3.0, and compare it with RosettaDock v2.3, the latest version of the previous Rosetta software package. The benchmark contains a diverse set of 116 docking targets including 22 antibody-antigen complexes, 33 enzyme-inhibitor complexes, and 60 ‘other’ complexes. These targets were further classified by expected docking difficulty into 84 rigid-body targets, 17 medium targets, and 14 difficult targets. We carried out local docking perturbations for each target, using the unbound structures when available, in both RosettaDock v2.3 and v3.2. Overall, the performances of RosettaDock v2.3 and v3.2 were similar. RosettaDock v3.2 achieved 56 docking funnels, compared to 49 in v2.3. A breakdown of docking performance by protein complex type shows that RosettaDock v3.2 achieved docking funnels for 63% of antibody-antigen targets, 62% of enzyme-inhibitor targets, and 35% of ‘other’ targets. In terms of docking difficulty, RosettaDock v3.2 achieved funnels for 58% of rigid-body targets, 30% of medium targets, and 14% of difficult targets. For targets that failed, we carried out additional analyses to identify the cause of failure, which showed that binding-induced backbone conformation changes account for a majority of failures. We also present a bootstrap statistical analysis that quantifies the reliability of the stochastic docking results. Finally, we demonstrate the additional functionality available in RosettaDock v3.2 by incorporating small molecules and non-protein co-factors in docking of a smaller target set. This study marks the most extensive benchmarking of the RosettaDock module to date and establishes a baseline for future research in protein interface modeling and structure prediction.
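The bootstrap analysis mentioned above is not described in detail in the abstract. As a rough illustration of the general idea only, the following sketch resamples per-target outcomes (funnel or no funnel) with replacement and reports a percentile interval for the success rate. The choice to resample targets rather than decoys, the function name `bootstrap_success_rate`, and the resample count are all assumptions made for this example; only the 56-of-116 figure comes from the text.

```python
import random


def bootstrap_success_rate(outcomes, n_resamples=1000, seed=0):
    """Resample 0/1 outcomes with replacement.

    Returns (mean rate, lower bound, upper bound) of a 95% percentile interval.
    Hypothetical helper, not the procedure used in the study.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    rates = []
    for _ in range(n_resamples):
        # Draw n outcomes with replacement and record the success fraction.
        sample = [outcomes[rng.randrange(n)] for _ in range(n)]
        rates.append(sum(sample) / n)
    rates.sort()
    lower = rates[int(0.025 * n_resamples)]
    upper = rates[int(0.975 * n_resamples) - 1]
    mean = sum(rates) / n_resamples
    return mean, lower, upper


# 56 funnels out of 116 targets, as reported for RosettaDock v3.2.
outcomes = [1] * 56 + [0] * 60
mean, lower, upper = bootstrap_success_rate(outcomes)
print(f"success rate ~ {mean:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

The width of the interval gives a sense of how much a headline success rate could move under resampling of the target set.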
Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations.
We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies from case to case, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pairwise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts by performing docking prediction using the provided protein-protein binding interface prediction as constraints, followed by a second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering.
We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.
protein docking prediction; protein-protein interaction; interaction site prediction
Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is computationally very demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality.
In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems.
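The combination of Monte Carlo sampling with simulated annealing described above can be sketched on a toy problem. The example below is a minimal CPU-only illustration, not the GPU implementation from the study: the two-dimensional `toy_energy` function, the geometric cooling schedule, and all parameter values are assumptions chosen for this sketch, and a real docking run would replace them with the potential-based energy function and rigid-body or conformational moves.

```python
import math
import random


def toy_energy(x, y):
    """Hypothetical funnel-like energy with its minimum (0.0) at (1, -2)."""
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def anneal(energy, start, steps=5000, t_start=5.0, t_end=0.01,
           step_size=0.5, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    x, y = start
    e = energy(x, y)
    best = (x, y, e)
    for i in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        # Propose a small random displacement.
        nx = x + rng.uniform(-step_size, step_size)
        ny = y + rng.uniform(-step_size, step_size)
        ne = energy(nx, ny)
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-dE / T).
        if ne <= e or rng.random() < math.exp(-(ne - e) / t):
            x, y, e = nx, ny, ne
            if e < best[2]:
                best = (x, y, e)
    return best


best_x, best_y, best_e = anneal(toy_energy, start=(8.0, 8.0))
print(f"best point ({best_x:.2f}, {best_y:.2f}) with energy {best_e:.4f}")
```

Because independent annealing trajectories do not communicate, many of them can be run in parallel, which is one reason this kind of search maps naturally onto GPU hardware.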
The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improve the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design.
We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem.
Protein-peptide interactions are vital for the cell. They mediate, inhibit or serve as structural components in nearly 40% of all macromolecular interactions, and are often associated with diseases, making them interesting leads for protein drug design. In recent years, large-scale technologies have enabled exhaustive studies on the peptide recognition preferences for a number of peptide-binding domain families. Yet, the paucity of data regarding their molecular binding mechanisms together with their inherent flexibility makes the structural prediction of protein-peptide interactions very challenging. This leaves flexible docking as one of the few amenable computational techniques to model these complexes. We present here an ensemble, flexible protein-peptide docking protocol that combines conformational selection and induced fit mechanisms. Starting from an ensemble of three peptide conformations (extended, α-helix, polyproline-II), flexible docking with HADDOCK generates 79.4% of high quality models for bound/unbound and 69.4% for unbound/unbound docking when tested against the largest protein-peptide complexes benchmark dataset available to date. Conformational selection at the rigid-body docking stage successfully recovers the most relevant conformation for a given protein-peptide complex and the subsequent flexible refinement further improves the interface by up to 4.5 Å interface RMSD. Cluster-based scoring of the models results in a selection of near-native solutions in the top three for ∼75% of the successfully predicted cases. This unified conformational selection and induced fit approach to protein-peptide docking should open the route to the modeling of challenging systems such as disorder-order transitions taking place upon binding, significantly expanding the applicability limit of biomolecular interaction modeling by docking.
The CAPRI experiment (Critical Assessment of Predicted Interactions) simulates realistic and diverse docking challenges, each case having specific properties that may be exploited by docking algorithms. Motivated by the different CAPRI challenges, we developed and implemented a comprehensive suite of docking algorithms. These were incorporated into a dynamic docking protocol, consisting of four main stages: (1) Biological and bioinformatics research aiming to predict the binding site residues, to define distance constraints between interface atoms and to analyze the flexibility of molecules; (2) Rigid or flexible docking, performed by the PatchDock or FlexDock method, which utilizes the information gathered in the previous step. Symmetric complexes are predicted by the SymmDock method; (3) Flexible refinement and re-ranking of the rigid docking solution candidates, performed by FiberDock; and finally, (4) clustering and filtering the results based on energy funnels. We analyzed the performance of our docking protocol on a large benchmark and on recent CAPRI targets. The analysis has demonstrated the importance of biological information gathering prior to docking, which significantly increased the docking success rate, and of the refinement and re-scoring stage that significantly improved the ranking of the rigid docking solutions. Our failures were mostly a result of mishandling backbone flexibility, inaccurate homology modeling, or incorrect biological assumptions. Most of the methods are available at http://bioinfo3d.cs.tau.ac.il/.
High resolution structures of antibody-antigen complexes are useful for analyzing the binding interface and to make rational choices for antibody engineering. When a crystallographic structure of a complex is unavailable, the structure must be predicted using computational tools. In this work, we illustrate a novel approach, named SnugDock, to predict high-resolution antibody-antigen complex structures by simultaneously structurally optimizing the antibody-antigen rigid-body positions, the relative orientation of the antibody light and heavy chains, and the conformations of the six complementarity determining region loops. This approach is especially useful when the crystal structure of the antibody is not available, requiring allowances for inaccuracies in an antibody homology model which would otherwise frustrate rigid-backbone docking predictions. Local docking using SnugDock with the lowest-energy RosettaAntibody homology model produced more accurate predictions than standard rigid-body docking. SnugDock can be combined with ensemble docking to mimic conformer selection and induced fit resulting in increased sampling of diverse antibody conformations. The combined algorithm produced four medium (Critical Assessment of PRediction of Interactions-CAPRI rating) and seven acceptable lowest-interface-energy predictions in a test set of fifteen complexes. Structural analysis shows that diverse paratope conformations are sampled, but docked paratope backbones are not necessarily closer to the crystal structure conformations than the starting homology models. The accuracy of SnugDock predictions suggests a new genre of general docking algorithms with flexible binding interfaces targeted towards making homology models useful for further high-resolution predictions.
Antibodies are proteins that are key elements of the immune system and increasingly used as drugs. Antibodies bind tightly and specifically to antigens to block their activity or to mark them for destruction. Three-dimensional structures of the antibody-antigen complexes are useful for understanding their mechanism and for designing improved antibody drugs. Experimental determination of structures is laborious and not always possible, so we have developed tools to predict structures of antibody-antigen complexes computationally. Computer-predicted models of antibodies, or homology models, typically have errors which can frustrate algorithms for prediction of protein-protein interfaces (docking), and result in incorrect predictions. Here, we have created and tested a new docking algorithm which incorporates flexibility to overcome structural errors in the antibody structural model. The algorithm allows both intramolecular and interfacial flexibility in the antibody during docking, resulting in improved accuracy approaching that when using experimentally determined antibody structures. Structural analysis of the predicted binding region of the complex will enable the protein engineer to make rational choices for better antibody drug designs.
Structural details of protein–protein interactions are invaluable for understanding and deciphering biological mechanisms. Computational docking methods aim to predict the structure of a protein–protein complex given the structures of its single components. Protein flexibility and the absence of robust scoring functions pose a great challenge in the docking field. Due to these difficulties most of the docking methods involve a two-tier approach: coarse global search for feasible orientations that treats proteins as rigid bodies, followed by an accurate refinement stage that aims to introduce flexibility into the process. The FireDock web server, presented here, is the first web server for flexible refinement and scoring of protein–protein docking solutions. It includes optimization of side-chain conformations and rigid-body orientation and allows a high-throughput refinement. The server provides a user-friendly interface and a 3D visualization of the results. A docking protocol consisting of a global search by PatchDock and a refinement by FireDock was extensively tested. The protocol was successful in refining and scoring docking solution candidates for cases taken from docking benchmarks. We provide an option for using this protocol by automatic redirection of PatchDock candidate solutions to the FireDock web server for refinement. The FireDock web server is available at http://bioinfo3d.cs.tau.ac.il/FireDock/.
Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur.
We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface Cα RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases.
We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.
To determine the structures of protein-protein interactions, protein docking is a valuable tool that complements experimental methods to characterize protein complexes. While protein docking can often produce a near-native solution within a set of global docking predictions, there are sometimes predictions that require refinement to elucidate correct contacts and conformation. Previously, we developed the ZRANK algorithm to rerank initial docking predictions from ZDOCK, a docking program developed by our lab. In this study, we have applied the ZRANK algorithm toward refinement of protein docking models, in conjunction with the protein docking program RosettaDock. This was performed by reranking global docking predictions from ZDOCK, performing local side chain and rigid-body refinement using RosettaDock, and selecting the refined model based on ZRANK score. For comparison, we examined using RosettaDock score instead of ZRANK score, and a larger perturbation size for the RosettaDock search, and determined that the larger RosettaDock perturbation size with ZRANK scoring was optimal. This method was validated on a protein-protein docking benchmark. For refining docking benchmark predictions from the newest ZDOCK version, this led to improved structures of top-ranked hits in 20 of 27 cases, and an increase from 23 to 27 cases with hits in the top 20 predictions. Finally, we optimized the ZRANK energy function using refined models, which provides a significant improvement over the original ZRANK energy function. Using this optimized function and the refinement protocol, the numbers of cases with hits ranked at number one increased from 12 to 19 and from 7 to 15 for two different ZDOCK versions. This shows the effective combination of independently developed docking protocols (ZDOCK/ZRANK, and RosettaDock), indicating that using diverse search and scoring functions can improve protein docking results.
Protein-RNA interactions play an important role in many biological processes. The ability to predict the molecular structures of protein-RNA complexes from docking would be valuable for understanding the underlying chemical mechanisms. We have developed a novel non-redundant benchmark dataset for protein-RNA docking and scoring. The diverse dataset of 72 targets consists of 52 unbound-unbound test complexes, and 20 unbound-bound test complexes. Here, unbound-unbound complexes refer to cases in which both binding partners of the co-crystallized complex are either in apo form or in a conformation taken from a different protein-RNA complex, whereas unbound-bound complexes are cases in which only one of the two binding partners has another experimentally determined conformation. The dataset is classified into three categories according to the interface RMSD and the percentage of native contacts in the unbound structures: 49 easy, 16 medium, and 7 difficult targets. The bound and unbound cases of the benchmark dataset are expected to benefit the development and improvement of docking and scoring algorithms for the docking community. All the easy-to-view structures are freely available to the public at http://zoulab.dalton.missouri.edu/RNAbenchmark/.
Benchmarking; protein-RNA interactions; molecular docking; scoring function; molecular recognition
Protein-protein interactions depend on a host of environmental factors. Local pH conditions influence the interactions through the protonation states of the ionizable residues that can change upon binding. In this work, we present a pH-sensitive docking approach, pHDock, that can sample side-chain protonation states of five ionizable residues (Asp, Glu, His, Tyr, Lys) on-the-fly during the docking simulation. pHDock produces successful local docking funnels in approximately half (79/161) the protein complexes, including 19 cases where standard RosettaDock fails. pHDock also performs better than the two control cases comprising docking at pH 7.0 or using fixed, predetermined protonation states. On average, the top-ranked pHDock structures have lower interface RMSDs and recover more native interface residue-residue contacts and hydrogen bonds compared to RosettaDock. Addition of backbone flexibility using a computationally-generated conformational ensemble further improves native contact and hydrogen bond recovery in the top-ranked structures. Although pHDock is designed to improve docking, it also successfully predicts a large pH-dependent binding affinity change in the Fc–FcRn complex, suggesting that it can be exploited to improve affinity predictions. The approaches in the study contribute to the goal of structural simulations of whole-cell protein-protein interactions including all the environmental factors, and they can be further expanded for pH-sensitive protein design.
Protein-protein interactions are fundamental for biological function and are strongly influenced by their local environment. Cellular pH is tightly controlled and is one of the critical environmental factors that regulates protein-protein interactions. Three-dimensional structures of the protein complexes can help us understand the mechanism of the interactions. Since experimental determination of the structures of protein-protein complexes is expensive and time-consuming, computational docking algorithms are helpful to predict the structures. However, none of the current protein-protein docking algorithms account for the critical environmental pH effects. So we developed a pH-sensitive docking algorithm that can dynamically pick the favorable protonation states of the ionizable amino-acid residues. Compared to our previous standard docking algorithm, the new algorithm improves docking accuracy and generates higher-quality predictions over a large dataset of protein-protein complexes. We also use a case study to demonstrate efficacy of the algorithm in predicting a large pH-dependent binding affinity change that cannot be captured by the other methods that neglect pH effects. In principle, the approaches in the study can be used for rational design of pH-dependent protein inhibitors or industrial enzymes that are active over a wide range of pH values.
Similarly to protein folding, the association of two proteins is driven by a free energy funnel, determined by favorable interactions in some neighborhood of the native state. We describe a docking method based on stochastic global minimization of funnel-shaped energy functions in the space of rigid body motions (SE(3)) while accounting for flexibility of the interface side chains. The method, called semi-definite programming-based underestimation (SDU), employs a general quadratic function to underestimate a set of local energy minima and uses the resulting underestimator to bias further sampling. While SDU effectively minimizes functions with funnel-shaped basins, its application to docking in the rotational and translational space SE(3) is not straightforward due to the geometry of that space. We introduce a strategy that uses separate independent variables for side-chain optimization, center-to-center distance of the two proteins, and five angular descriptors of the relative orientations of the molecules. The removal of the center-to-center distance turns out to vastly improve the efficiency of the search, because the five-dimensional space now exhibits a well-behaved energy surface suitable for underestimation. This algorithm explores the free energy surface spanned by encounter complexes that correspond to local free energy minima and shows similarity to the model of macromolecular association that proceeds through a series of collisions. Results for standard protein docking benchmarks establish that in this space the free energy landscape is a funnel in a reasonably broad neighborhood of the native state and that the SDU strategy can generate docking predictions with less than 5 Å ligand interface Cα root-mean-square deviation while achieving an approximately 20-fold efficiency gain compared to Monte Carlo methods.
Protein–protein interactions play a central role in various aspects of the structural and functional organization of the cell, and their elucidation is crucial for a better understanding of processes such as metabolic control, signal transduction, and gene regulation. Genomewide proteomics studies, primarily yeast two-hybrid assays, will provide an increasing list of interacting proteins, but only a small fraction of the potential complexes will be amenable to direct experimental analysis. Thus, it is important to develop computational docking methods that can elucidate the details of specific interactions at the atomic level. Protein–protein docking generally starts with a rigid body search that generates a large number of docked conformations with good shape, electrostatic, and chemical complementarity. The conformations are clustered to obtain a manageable number of models, but the current methods are unable to select the most likely structure among these models. Here we describe a refinement algorithm that, applied to the individual clusters, improves the quality of the models. The better models are suitable for higher-accuracy energy calculation, thereby increasing the chances that near-native structures can be identified, and thus the refinement increases the reliability of the entire docking algorithm.
Computational tools are essential in the drug design process, especially in order to take advantage of the increasing numbers of solved X-ray and NMR protein–ligand structures. Nowadays, molecular docking methods are routinely used for prediction of protein–ligand interactions and to aid in selecting potent molecules as a part of virtual screening of large databases. The improvements and advances in computational capacity in the last decade have allowed for further developments in molecular docking algorithms to address more complicated aspects such as protein flexibility. The effects of incorporation of active site water molecules and implicit or explicit solvation of the binding site are other relevant issues to be addressed in the docking procedures. Using the right docking algorithm at the right stage of virtual screening is most important. We report a staged study to address the effects of various aspects of protein flexibility and inclusion of active site water molecules on docking effectiveness to retrieve (and to be able to predict) correct ligand poses and to rank docked ligands in relation to their biological activity, for CHK1, ERK2, LpxC and UPA. We generated multiple conformers for the ligand, and compared different docking algorithms that use a variety of approaches to protein flexibility, including rigid receptor, soft receptor, flexible side chains, induced-fit, and multiple structure algorithms. Docking accuracy varied from 1 to 84%, demonstrating that the choice of method is important.
protein sampling; ligand sampling; conformational sampling; molecular docking; active site waters; CSAR; CHK1; ERK2; LpxC and UPA
The intrinsic flexibility of DNA and the difficulty of identifying its interaction surface have long been challenges that prevented the development of efficient protein–DNA docking methods. We have demonstrated the ability of our flexible data-driven docking method HADDOCK to deal with these before, by using custom-built DNA structural models. Here we put our method to the test on a set of 47 complexes from the protein–DNA docking benchmark. We show that HADDOCK is able to predict many of the specific DNA conformational changes required to assemble the interface(s). Our DNA analysis and modelling procedure captures the bend and twist motions occurring upon complex formation and uses these to generate custom-built DNA structural models, more closely resembling the bound form, for use in a second docking round. We achieve throughout the benchmark an overall success rate of 94% of one-star solutions or higher (interface root mean square deviation ≤4 Å and fraction of native contacts >10%) according to CAPRI criteria. Our improved protocol successfully predicts even the challenging protein–DNA complexes in the benchmark. Finally, our method is the first to readily dock multiple molecules (N > 2) simultaneously, pushing the limits of what is currently achievable in the field of protein–DNA docking.
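The one-star criterion quoted above (interface RMSD ≤ 4 Å and fraction of native contacts > 10%) translates directly into a small check. The sketch below applies it to a few made-up models; the target names, the values, and the rule that a target counts as a success if any of its models passes are assumptions for illustration, not details taken from the study.

```python
def is_one_star_or_better(irmsd, fnat):
    """Thresholds as quoted in the text: iRMSD <= 4 A and fnat > 10%."""
    return irmsd <= 4.0 and fnat > 0.10


def success_rate(targets):
    """targets maps a target name to a list of (irmsd, fnat) tuples.

    A target counts as a success if at least one model passes
    (an assumption made for this sketch).
    """
    hits = sum(
        1
        for models in targets.values()
        if any(is_one_star_or_better(irmsd, fnat) for irmsd, fnat in models)
    )
    return hits / len(targets)


# Hypothetical values, for illustration only.
example = {
    "target_A": [(6.2, 0.05), (3.1, 0.42)],  # second model passes
    "target_B": [(4.0, 0.11)],               # passes at the iRMSD boundary
    "target_C": [(2.0, 0.10), (9.5, 0.30)],  # fnat of exactly 10% does not pass
}
rate = success_rate(example)
print(f"success rate: {rate:.2f}")
```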
Accurate prediction of protein–DNA complexes could provide an important stepping stone towards a thorough comprehension of vital intracellular processes. Few attempts were made to tackle this issue, focusing on binding patch prediction, protein function classification and distance constraints-based docking. We introduce ParaDock: a novel ab initio protein–DNA docking algorithm. ParaDock combines short DNA fragments, which have been rigidly docked to the protein based on geometric complementarity, to create bent planar DNA molecules of arbitrary sequence. Our algorithm was tested on the bound and unbound targets of a protein–DNA benchmark comprised of 47 complexes. With neither addressing protein flexibility, nor applying any refinement procedure, CAPRI acceptable solutions were obtained among the 10 top ranked hypotheses in 83% of the bound complexes, and 70% of the unbound. Without requiring prior knowledge of DNA length and sequence, and within <2 h per target on a standard 2.0 GHz single processor CPU, ParaDock offers a fast ab initio docking solution.
Identification of antigenic peptide epitopes is an essential prerequisite in T cell-based molecular vaccine design. Computational (sequence-based and structure-based) methods are inexpensive and efficient compared to experimental approaches in screening numerous peptides against their cognate MHC alleles. In structure-based protocols, suited to alleles with limited epitope data, the first step is to identify high-binding peptides using docking techniques, which need improvement in speed and efficiency to be useful in large-scale screening studies. We present pDOCK: a new computational technique for rapid and accurate docking of flexible peptides to MHC receptors and primarily apply it on a non-redundant dataset of 186 pMHC (MHC-I and MHC-II) complexes with X-ray crystal structures.
We have compared our docked structures with experimental crystallographic structures for the immunologically relevant nonameric core of the bound peptide for MHC-I and MHC-II complexes. Primary testing for re-docking of peptides into their respective MHC grooves generated 159 out of 186 peptides with Cα RMSD of less than 1.00 Å, with a mean of 0.56 Å. Amongst the 25 peptides used for single and variant template docking, the Cα RMSD values were below 1.00 Å for 23 peptides. Compared to our earlier docking methodology, pDOCK shows up to a 2.5-fold improvement in accuracy and is ~60% faster. Results of validation against previously published studies represent a seven-fold increase in pDOCK accuracy.
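For reference, the Cα RMSD used as the accuracy measure above can be computed as follows. This is a minimal sketch that assumes the docked and crystallographic coordinates are already in the same reference frame (for example, because the receptor was superimposed beforehand), so no fitting step is included; the coordinates are invented for the example.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    No superposition is performed; both sets are assumed to share one frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha positions: the docked peptide is shifted by 0.5 A in y.
crystal = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
docked = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
rmsd = ca_rmsd(crystal, docked)
print(f"C-alpha RMSD: {rmsd:.2f} A, below 1.00 A: {rmsd < 1.0}")
```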
The limitations of our previous methodology have been addressed in the new docking protocol, making it a rapid and accurate method to evaluate pMHC binding. pDOCK is a generic method and, although benchmarked against experimental structures, it can be applied to alleles with no structural data using sequence information. Our outcomes establish the efficacy of our procedure to predict highly accurate peptide structures permitting conformational sampling of the peptide in MHC binding groove. Our results also support the applicability of pDOCK for in silico identification of promiscuous peptide epitopes that are relevant to higher proportions of human population with greater propensity to activate T cells making them key targets for the design of vaccines and immunotherapies.
Biological complexes typically exhibit intermolecular interfaces of high shape complementarity. Many computational docking approaches use this surface complementarity as a guide in the search for predicting the structures of protein-protein complexes. Proteins often undergo conformational changes in order to create a highly complementary interface when associating. These conformational changes are a major cause of failure for automated docking procedures when predicting binding modes between proteins using their unbound conformations. Low resolution surfaces in which high frequency geometric details are omitted have been used to address this problem. These smoothed, or blurred, surfaces are expected to minimize the differences between free and bound structures, especially those that are due to side chain conformations or small backbone deviations.
In spite of the fact that this approach has been used in many docking protocols, there has yet to be a systematic study of the effects of such surface smoothing on the shape complementarity of the resulting interfaces. Here we investigate this question by computing shape complementarity of a set of 66 protein-protein complexes represented by multi-resolution blurred surfaces. Complexed and unbound structures are available for these protein-protein complexes. They are a subset of complexes from a non-redundant docking benchmark selected for rigidity (i.e. the proteins undergo limited conformational changes between their bound and unbound states). In this work we construct the surfaces by isocontouring a density map obtained by accumulating the densities of Gaussian functions placed at all atom centers of the molecule. The smoothness or resolution is specified by a Gaussian fall-off coefficient, termed “blobbyness”. Shape complementarity is quantified using a histogram of the shortest distances between two proteins' surface mesh vertices for both the crystallographic complexes and the complexes built using the protein structures in their unbound conformation.
The histograms calculated for the bound complex structures demonstrate that medium resolution smoothing (blobbyness=−0.9) can reproduce about 88% of the shape complementarity of atomic resolution surfaces. Complexes formed from the free component structures show a partial loss of shape complementarity (more overlaps and gaps) with the atomic resolution surfaces. For surfaces smoothed to low resolution (blobbyness=−0.3), we find more consistency of shape complementarity between the complexed and free cases. To further reduce bad contacts without significantly impacting the good contacts we introduce another blurred surface, in which the Gaussian densities of flexible atoms are reduced. From these results we discuss the use of shape complementarity in protein-protein docking.
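The Gaussian surface construction described above can be sketched as follows. The abstract specifies only that densities of Gaussian functions at the atom centers are accumulated and that a fall-off coefficient ("blobbyness") controls smoothness; the specific kernel exp(B(d²/r² − 1)) and the isovalue of 1 used below are assumptions taken from a commonly used form of such surfaces, and the two-atom geometry is hypothetical.

```python
import math

def gaussian_density(point, atoms, blobbyness):
    """Accumulated Gaussian density at `point`.

    atoms: iterable of (x, y, z, radius).
    blobbyness: negative fall-off coefficient B.
    Assumed kernel: exp(B * (d^2 / r^2 - 1)), so that a single atom
    contributes exactly 1 at its own radius.
    """
    total = 0.0
    for cx, cy, cz, radius in atoms:
        d2 = (point[0] - cx) ** 2 + (point[1] - cy) ** 2 + (point[2] - cz) ** 2
        total += math.exp(blobbyness * (d2 / radius ** 2 - 1.0))
    return total

ISOVALUE = 1.0  # assumed contour level defining the surface

# Two hypothetical atoms of radius 1.5 A, 4 A apart, leaving a small gap.
atoms = [(0.0, 0.0, 0.0, 1.5), (4.0, 0.0, 0.0, 1.5)]
midpoint = (2.0, 0.0, 0.0)

medium = gaussian_density(midpoint, atoms, -0.9)  # medium resolution
low = gaussian_density(midpoint, atoms, -0.3)     # low resolution

print(f"B=-0.9: density {medium:.3f}, inside surface: {medium >= ISOVALUE}")
print(f"B=-0.3: density {low:.3f}, inside surface: {low >= ISOVALUE}")
```

In this toy case the gap between the two atoms lies just outside the isosurface at blobbyness −0.9 but is enclosed at −0.3, which is the kind of loss of high-frequency detail that the low-resolution surfaces are meant to provide.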
Protein interactions; protein-protein docking; Gaussian surface; protein side-chain flexibility; protein interfaces; unbound-unbound docking; protein complexes; Blur surface; FlexBlur surface; enzyme-inhibitor complexes
Protein-DNA interactions are essential for many biological processes. X-ray crystallography can provide high-resolution structures, but protein-DNA complexes are difficult to crystallize and typically contain only small DNA fragments. Thus, there is a need for computational methods that can provide useful predictions to give insights into mechanisms and guide the design of new experiments. We used the program DOT, which performs an exhaustive, rigid-body search between two macromolecules, to investigate four diverse protein-DNA interactions. Here, we compare our computational results with subsequent experimental data on related systems. In all cases, the experimental data strongly supported our structural hypotheses from the docking calculations: a mechanism for weak, non-sequence-specific DNA binding by a transcription factor, a large DNA-binding footprint on the surface of the DNA-repair enzyme uracil-DNA-glycosylase, viral and host DNA-binding sites on the catalytic domain of HIV integrase, and a three-DNA-contact model of the linker histone bound to the nucleosome. In the case of uracil-DNA-glycosylase, the experimental design was based on the DNA-binding surface found by docking, rather than the much smaller surface observed in the crystallographic structure. These comparisons demonstrate that the DOT electrostatic energy gives a good representation of the distinctive electrostatic properties of DNA and DNA-binding proteins. The large, favorably-ranked clusters resulting from the dockings identify active sites, map out large DNA-binding sites, and reveal multiple DNA contacts with a protein. Thus, computational docking can not only help to identify protein-DNA interactions in the absence of a crystal structure, but also expand structural understanding beyond known crystallographic structures.
protein-DNA structure; HIV integrase; uracil DNA-glycosylase; linker histone; transcription factor; Poisson-Boltzmann electrostatics; hydrogen/deuterium exchange
The number of protein targets with a known or predicted three-dimensional structure and of drug-like chemical compounds is growing rapidly, and so is the need for new therapeutic compounds or chemical probes. Performing flexible structure-based virtual screening computations on thousands of targets with millions of molecules is neither tractable for most laboratories nor indeed desirable. Since shape complementarity is of primary importance for most protein-ligand interactions, we have developed a tool/protocol based on rigid-body docking to select compounds that fit well into binding sites.
Here we present an efficient multiple-conformation rigid-body docking approach, MS-DOCK, which is based on the program DOCK. This approach can be used as the first step of a multi-stage docking/scoring protocol. First, we developed and validated the Multiconf-DOCK tool that generates several conformers per input ligand. Then, each generated conformer (bioactives and 37970 decoys) was docked rigidly using DOCK6 with our optimized protocol into seven different receptor-binding sites. MS-DOCK was able to significantly reduce the size of the initial input library for all seven targets, thereby facilitating subsequent, more CPU-demanding flexible docking procedures.
MS-DOCK can be easily used for the generation of multi-conformer libraries and for shape-based filtering within a multi-step structure-based screening protocol in order to shorten computation times.
Accommodating backbone flexibility continues to be the most difficult challenge in computational docking of protein-protein complexes. Towards that end, we simulate four distinct biophysical models of protein binding in RosettaDock, a multi-scale Monte-Carlo based algorithm that uses a quasi-kinetic search process to emulate the diffusional encounter of two proteins and identify low energy complexes. The four binding models are: 1) key-lock model (KL) using rigid-backbone docking, 2) conformer selection model (CS) using a novel ensemble docking algorithm, 3) induced fit model (IF) using energy gradient-based backbone minimization, and 4) a combined conformer selection/induced fit model (CS/IF). Backbone flexibility was limited to the smaller partner of the complex, structural ensembles were generated using Rosetta refinement methods, and docking consisted of local perturbations around the complexed conformation using unbound component crystal structures for a set of 21 target complexes. The lowest-energy structure contained more than 30% of the native residue-residue contacts for 9, 13, 13, and 14 targets for KL, CS, IF and CS/IF docking respectively. When applied to 15 targets using NMR ensembles of the smaller protein, the lowest-energy structure recovered at least 30% native residue contacts in 3, 8, 4 and 8 targets for KL, CS, IF and CS/IF docking respectively. CS/IF docking of the NMR ensemble performed equally well or better than KL docking with the unbound crystal structure in 10 of 15 cases. The marked success of CS and CS/IF docking shows that ensemble docking can be a versatile and effective method for accommodating conformational plasticity in docking and serves as a demonstration for the conformer selection theory - that binding-competent conformers exist in the unbound ensemble and can be selected based on their favorable binding energies.
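The success criterion used above, recovery of at least 30% of the native residue-residue contacts, can be sketched as a simple set overlap. The abstract does not state how a contact is defined, so the sketch assumes contacts have already been computed as (receptor residue, ligand residue) pairs; the function name and the residue identifiers are hypothetical.

```python
def native_contact_fraction(model_contacts, native_contacts):
    """Fraction of native residue-residue contacts present in a model.

    Contacts are (receptor_residue, ligand_residue) pairs. How a contact
    is detected (e.g. a distance cutoff) is assumed to be handled upstream.
    """
    native = set(native_contacts)
    if not native:
        raise ValueError("native contact set must not be empty")
    return len(native & set(model_contacts)) / len(native)

# Hypothetical native interface with ten contacts.
native = [("A10", "B5"), ("A11", "B5"), ("A11", "B6"), ("A14", "B9"),
          ("A15", "B9"), ("A18", "B12"), ("A18", "B13"), ("A21", "B13"),
          ("A22", "B16"), ("A25", "B17")]
# Hypothetical docked model: four native contacts plus two non-native ones.
model = native[:4] + [("A30", "B2"), ("A31", "B3")]

fraction = native_contact_fraction(model, native)
meets_criterion = fraction >= 0.30  # threshold quoted in the text
print(f"native contacts recovered: {fraction:.0%}, passes: {meets_criterion}")
```

Non-native contacts in the model do not affect the score, since only the recovered share of the native set is counted.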
protein-protein docking; flexible docking; ensemble docking; conformer selection; NMR ensembles
Existing flexible docking approaches model the ligand and receptor flexibility either separately or in a loosely-coupled manner, which captures the conformational changes inefficiently. Here, we propose a flexible docking approach, MedusaDock, which models both ligand and receptor flexibility simultaneously with sets of discrete rotamers. We develop an algorithm to build the ligand rotamer library “on-the-fly” during docking simulations. MedusaDock benchmarks demonstrate a rapid sampling efficiency and high prediction accuracy in both self-docking (to the co-crystallized state) and cross-docking (to a state co-crystallized with a different ligand), the latter of which mimics the virtual-screening procedure in computational drug discovery. We also perform a virtual-screening test of four flexible kinase targets including cyclin-dependent kinase 2, vascular endothelial growth factor receptor 2, HIV reverse transcriptase, and HIV protease. We find significant improvements of virtual-screening enrichments when compared to rigid-receptor methods. The predictive power of MedusaDock in cross-docking and preliminary virtual-screening benchmarks highlights the importance of modeling both ligand and receptor flexibility simultaneously in computational docking.
Small molecule docking predicts the interaction of a small molecule ligand with a protein at atomic-detail accuracy, including the position and conformation of the ligand as well as conformational changes of the protein upon ligand binding. While successful in the majority of cases, docking algorithms including RosettaLigand fail in some cases to predict the correct protein/ligand complex structure. In this study we show that simultaneous docking of explicit interface water molecules greatly improves Rosetta’s ability to distinguish correct from incorrect ligand poses. This result holds true for both protein-centric water docking, wherein waters are located relative to the protein binding site, and ligand-centric water docking, wherein waters move with the ligand during docking. Protein-centric docking is used to model 99 HIV-1 protease/protease inhibitor structures. We find protease inhibitor placement improving at a ratio of 9∶1 when one critical interface water molecule is included in the docking simulation. Ligand-centric docking is applied to 341 structures from the CSAR benchmark of diverse protein/ligand complexes. Across this diverse dataset we see up to 56% recovery of failed docking studies when waters are included in the docking simulation.
Motivation: Predicting how proteins interact at the molecular level is a computationally intensive task. Many protein docking algorithms begin by using fast Fourier transform (FFT) correlation techniques to find putative rigid body docking orientations. Most such approaches use 3D Cartesian grids and are therefore limited to computing three-dimensional (3D) translational correlations. However, translational FFTs can speed up the calculation in only three of the six rigid body degrees of freedom, and they cannot easily incorporate prior knowledge about a complex to focus and hence further accelerate the calculation. Furthermore, several groups have developed multi-term interaction potentials and others use multi-copy approaches to simulate protein flexibility, which both add to the computational cost of FFT-based docking algorithms. Hence there is a need to develop more powerful and more versatile FFT docking techniques.
Results: This article presents a closed-form 6D spherical polar Fourier correlation expression from which arbitrary multi-dimensional multi-property multi-resolution FFT correlations may be generated. The approach is demonstrated by calculating 1D, 3D and 5D rotational correlations of 3D shape and electrostatic expansions up to polynomial order L=30 on a 2 GB personal computer. As expected, 3D correlations are found to be considerably faster than 1D correlations but, surprisingly, 5D correlations are often slower than 3D correlations. Nonetheless, we show that 5D correlations will be advantageous when calculating multi-term knowledge-based interaction potentials. When docking the 84 complexes of the Protein Docking Benchmark, blind 3D shape plus electrostatic correlations take around 30 minutes on a contemporary personal computer and find acceptable solutions within the top 20 in 16 cases. Applying a simple angular constraint to focus the calculation around the receptor binding site produces acceptable solutions within the top 20 in 28 cases. Further constraining the search to the ligand binding site gives up to 48 solutions within the top 20, with calculation times of just a few minutes per complex. Hence the approach described provides a practical and fast tool for rigid body protein-protein docking, especially when prior knowledge about one or both binding sites is available.
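The principle behind the translational FFT correlations mentioned in the Motivation can be shown with a one-dimensional toy example. This is not the 6D spherical polar Fourier method of the article: it is only a sketch of the correlation theorem on a circular 1D grid, with a small radix-2 FFT written out so that it runs on the standard library alone. All names and the occupancy grids are hypothetical.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_correlation(a, b):
    """Circular correlation c[k] = sum_i a[i] * b[(i + k) % n] via FFTs."""
    n = len(a)
    product = [x.conjugate() * y for x, y in zip(fft(a), fft(b))]
    return [(v / n).real for v in fft(product, inverse=True)]

def direct_correlation(a, b):
    """Same correlation computed by brute force, for comparison."""
    n = len(a)
    return [sum(a[i] * b[(i + k) % n] for i in range(n)) for k in range(n)]

# Hypothetical 1D occupancy grids for two molecules.
receptor = [0, 1, 1, 0, 0, 0, 0, 0]
ligand = [0, 0, 0, 0, 1, 1, 0, 0]

scores = fft_correlation(receptor, ligand)
best_shift = max(range(len(scores)), key=lambda k: scores[k])
print("best offset:", best_shift)
```

The FFT route evaluates all offsets at once in O(n log n) operations instead of the O(n²) of the direct sum, which is the saving that grid-based docking methods exploit in three translational dimensions.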
The interaction profile method is a useful method for processing rigid-body docking. After the docking process, the resulting set of docking poses can be classified by calculating similarities among them using these interaction profiles to search for near-native poses. However, there are some cases where the near-native poses are not included in this set of docking poses even when the bound-state structures are used. Therefore, we have developed a method for generating near-native docking poses by introducing a re-docking process. We devised a method for calculating the profile of interaction fingerprints by assembling protein complexes after determining certain core-protein complexes. For our analysis, we used 44 bound-state protein complexes selected from the ZDOCK benchmark dataset ver. 2.0, including some protein pairs none of which generated near-native poses in the docking process. Consequently, after the re-docking process we obtained profiles of interaction fingerprints, some of which yielded near-native poses. The re-docking process involved searching for possible docking poses in a restricted area using the profile of interaction fingerprints. If the profile includes interactions identical to those in the native complex, we obtained near-native docking poses. Accordingly, near-native poses were obtained for all bound-state protein complexes examined here. Application of interaction fingerprints to the re-docking process yielded structures with more native interactions, even when a docking pose, obtained following the initial docking process, contained only a small number of native amino acid interactions. Thus, utilization of the profile of interaction fingerprints in the re-docking process yielded more near-native poses.
We present version 3.0 of our publicly available protein-protein docking benchmark. This update includes 40 new test cases, representing a 48% increase from Benchmark 2.0. For all of the new cases, the crystal structures of both binding partners are available. As with Benchmark 2.0, SCOP (Structural Classification of Proteins) was used to remove redundant test cases. The 124 unbound-unbound test cases in Benchmark 3.0 are classified into 88 rigid-body cases, 19 medium difficulty cases, and 17 difficult cases, based on the degree of conformational change at the interface upon complex formation. In addition to providing the community with more test cases for evaluating docking methods, the expansion of Benchmark 3.0 will facilitate the development of new algorithms that require a large number of training examples. Benchmark 3.0 is available to the public at http://zlab.bu.edu/benchmark.
protein-protein docking; protein complexes; protein-protein interactions; complex structure