A good scoring function is essential for molecular docking computations. In conventional scoring functions, energy terms modeling pairwise interactions are cumulatively summed, and the best docking solution is selected. Here, we propose to transform protein-ligand interactions into three-dimensional geometric networks, from which recurring network substructures, or network motifs, are selected and used to provide probability-ranked interaction templates with which to score docking solutions.
A novel scoring function for protein-ligand docking, MotifScore, was developed. It is non-energy-based, and docking is, instead, scored by counting the occurrences of motifs of protein-ligand interaction networks constructed using structures of protein-ligand complexes. MotifScore has been tested on a benchmark set established by others to assess its ability to identify near-native complex conformations among a set of decoys. In this benchmark test, 84% of the highest-scored docking conformations had root-mean-square deviations (rmsds) below 2.0 Å from the native conformation, which is comparable with the best of several energy-based docking scoring functions. Many of the top motifs, which comprise a multitude of chemical groups that interact simultaneously and make a highly significant contribution to MotifScore, capture recurrent interacting patterns beyond pairwise interactions.
Although it performs well at scoring docking solutions, MotifScore differs fundamentally from conventional energy-based functions. MotifScore thus represents a new, network-based approach for exploring problems associated with molecular docking.
Two sets of ligand binding decoys have been constructed for the CSAR (Community Structure-Activity Resource) benchmark by using the MDock and DOCK programs for rigid-ligand and flexible-ligand docking, respectively. The decoys generated for each complex in the benchmark thoroughly cover the binding site and also contain a certain number of near-native binding modes. A few scoring functions have been evaluated on the ligand binding decoy sets for their ability to predict near-native binding modes. Among them, ITScore achieved a success rate of 86.7% for the rigid-ligand decoys and 79.7% for the flexible-ligand decoys, where a prediction is counted as successful if the top-scored binding mode has an RMSD < 2.0 Å from the native structure. The decoy sets, which are available at the CSAR website (http://www.csardock.org/), may serve as benchmarks for assessing the binding mode prediction of a scoring function.
molecular docking; scoring function; CSAR benchmark; binding mode; knowledge-based
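The success criterion described in the abstract above (top-scored binding mode with RMSD < 2.0 Å from the native structure) can be sketched as follows. This is a minimal illustration, not the benchmark's own tooling: the function names, the data layout, the example values, and the assumption that a lower score is better are all hypothetical.

```python
def top_scored_rmsd(poses, lower_is_better=True):
    """Return the RMSD (in Å) of the best-scored pose of one complex.

    poses: list of (score, rmsd_to_native) tuples (hypothetical layout).
    """
    if not poses:
        raise ValueError("each complex needs at least one pose")
    pick = min if lower_is_better else max
    return pick(poses, key=lambda p: p[0])[1]


def success_rate(decoy_sets, cutoff=2.0, lower_is_better=True):
    """Fraction of complexes whose top-scored pose has RMSD < cutoff."""
    if not decoy_sets:
        raise ValueError("need at least one complex")
    hits = sum(
        1
        for poses in decoy_sets.values()
        if top_scored_rmsd(poses, lower_is_better) < cutoff
    )
    return hits / len(decoy_sets)


# Made-up example data: (score, RMSD to native in Å) per decoy.
decoys = {
    "complex_a": [(-9.1, 0.8), (-7.4, 5.2), (-6.0, 1.5)],
    "complex_b": [(-8.3, 4.7), (-8.0, 1.1)],
    "complex_c": [(-5.5, 1.9), (-5.1, 3.3)],
}
rate = success_rate(decoys)
print(f"success rate: {rate:.1%}")
```

Here complex_b counts as a failure even though a near-native decoy exists in its set, because only the top-scored pose is considered.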
Molecular docking is widely used to obtain binding modes and binding affinities of a molecule to a given target protein. Despite considerable efforts, however, prediction of both properties by docking remains challenging, mainly due to protein structural flexibility and the inaccuracy of scoring functions. Here, an integrated approach has been developed to improve the accuracy of binding mode and affinity prediction, and tested on small-molecule MDM2 and MDMX antagonists. In this approach, initial candidate models selected from docking are subjected to equilibration MD simulations to further filter the models. Free energy perturbation molecular dynamics (FEP/MD) simulations are then applied to the filtered ligand models to improve the prediction of the near-native ligand conformation. The calculated binding free energies for MDM2 complexes are overestimated compared to experimental measurements, mainly due to the difficulties in sampling highly flexible apo-MDM2. Nonetheless, the FEP/MD binding free energy calculations are more promising for discriminating binders from nonbinders than docking scores. In particular, the comparison between the MDM2 and MDMX results suggests that apo-MDMX has lower flexibility than apo-MDM2. In addition, the FEP/MD calculations provide detailed information on the different energetic contributions to ligand binding, leading to a better understanding of the sensitivity and specificity of protein-ligand interactions.
Molecular dynamics simulation; free energy perturbation; protein-protein interaction; docking; computer-aided drug design
To determine the structures of protein-protein interactions, protein docking is a valuable tool that complements experimental methods to characterize protein complexes. While protein docking can often produce a near-native solution within a set of global docking predictions, there are sometimes predictions that require refinement to elucidate correct contacts and conformation. Previously, we developed the ZRANK algorithm to rerank initial docking predictions from ZDOCK, a docking program developed by our lab. In this study, we have applied the ZRANK algorithm toward refinement of protein docking models, in conjunction with the protein docking program RosettaDock. This was performed by reranking global docking predictions from ZDOCK, performing local side chain and rigid-body refinement using RosettaDock, and selecting the refined model based on ZRANK score. For comparison, we examined using RosettaDock score instead of ZRANK score, and a larger perturbation size for the RosettaDock search, and determined that the larger RosettaDock perturbation size with ZRANK scoring was optimal. This method was validated on a protein-protein docking benchmark. For refining docking benchmark predictions from the newest ZDOCK version, this led to improved structures of top-ranked hits in 20 of 27 cases, and an increase from 23 to 27 cases with hits in the top 20 predictions. Finally, we optimized the ZRANK energy function using refined models, which provides a significant improvement over the original ZRANK energy function. Using this optimized function and the refinement protocol, the numbers of cases with hits ranked at number one increased from 12 to 19 and from 7 to 15 for two different ZDOCK versions. This shows the effective combination of independently developed docking protocols (ZDOCK/ZRANK, and RosettaDock), indicating that using diverse search and scoring functions can improve protein docking results.
Near-native selections from docking decoys have proved challenging especially when unbound proteins are used in the molecular docking. One reason is that significant atomic clashes in docking decoys lead to poor predictions of binding affinities of near native decoys. Atomic clashes can be removed by structural refinement through energy minimization. Such an energy minimization, however, will lead to an unrealistic bias toward docked structures with large interfaces. Here, we extend an empirical energy function developed for protein design to protein–protein docking selection by introducing a simple reference state that removes the unrealistic dependence of binding affinity of docking decoys on the buried solvent accessible surface area of interface. The energy function called EMPIRE (EMpirical Protein-InteRaction Energy), when coupled with a refinement strategy, is found to provide a significantly improved success rate in near native selections when applied to RosettaDock and refined ZDOCK docking decoys. Our work underlines the importance of removing nonspecific interactions from specific ones in near native selections from docking decoys.
knowledge-based potential; energy score functions; reference state; binding affinity; docking decoys
3D-partner is a web tool that predicts interacting partners and binding models of a query protein sequence through structure complexes and a new scoring function. 3D-partner first utilizes IMPALA to identify homologous structures (templates) of a query from a heterodimer profile library. The interacting-partner sequence profiles of these templates are then used to search for interacting candidates of the query in protein sequence databases (e.g. SwissProt) by PSI-BLAST. We developed a new scoring function, which includes the contact-residue interacting score (e.g. the steric, hydrogen-bond, and electrostatic interactions) and the template consensus score (e.g. couple-conserved residue and the template similarity scores), to evaluate the interfaces between the query and interacting candidates. Based on this scoring function, 3D-partner provides the statistical significance, the binding models (e.g. hydrogen bonds and conserved amino acids) and functional annotations of interacting partners. The correlation between experimental energies and predicted binding affinities of our scoring function is 0.91 on 275 mutated residues from the ASEdb. The average precision of the server is 0.72 on 563 queries, and the execution time of this server for a query is ∼15 s on average. These results suggest that the 3D-partner server can be useful in protein-protein interaction predictions and binding model visualizations. The server is available online at: http://3D-partner.life.nctu.edu.tw.
The identification of near-native protein-protein complexes among a set of decoys remains highly challenging. A strategy for improving the success rate of near-native detection is to enrich near-native docking decoys in a small number of top-ranked decoys. Recently, we found that a combination of three scoring functions (energy, conservation and interface propensity) can predict the location of binding interface regions with reasonable accuracy. Here, these three scoring functions are modified and combined into a consensus scoring function called ENDES for enriching near-native docking decoys. We found that all individual scores result in enrichment for the majority of the 28 targets in the ZDOCK2.3 decoy set and the 22 targets in Benchmark 2.0. Among the three scores, the interface propensity score yields the highest enrichment in both sets of protein complexes. When these scores are combined into the ENDES consensus score, a significant increase in enrichment of near-native structures is found. For example, when 2000 docking decoys are reduced to 200 decoys by ENDES, the fraction of near-native structures in docking decoys increases by a factor of about six on average. ENDES was implemented in a computer program that is available for download at http://sparks.informatics.iupui.edu.
Protein Docking; Near Native Decoy Selection; Energy Score; Interface Propensity; Conservation Score
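The enrichment reported in the abstract above (the fraction of near-native structures before and after reducing a decoy set to its top-ranked subset) can be computed along these lines. This is a sketch under stated assumptions: the function name, the tuple layout, the synthetic data, and the convention that a higher score ranks first are illustrative and are not taken from ENDES.

```python
def enrichment_factor(decoys, keep):
    """Ratio of the near-native fraction in the top `keep` decoys
    to the near-native fraction in the full set.

    decoys: list of (score, is_near_native); higher score ranks first
            (an assumption made for this sketch).
    """
    if not 0 < keep <= len(decoys):
        raise ValueError("keep must be between 1 and the number of decoys")
    total_hits = sum(1 for _, flag in decoys if flag)
    if total_hits == 0:
        raise ValueError("no near-native decoys in the set")
    ranked = sorted(decoys, key=lambda d: d[0], reverse=True)
    kept_hits = sum(1 for _, flag in ranked[:keep] if flag)
    return (kept_hits / keep) / (total_hits / len(decoys))


# Synthetic set: 20 decoys scored 1..20, four of them near-native.
near_native_scores = {20, 19, 17, 3}
decoys = [(float(s), s in near_native_scores) for s in range(1, 21)]
ef = enrichment_factor(decoys, keep=5)
print(f"enrichment factor when keeping 5 of 20: {ef:.2f}")
```

Keeping the whole set always gives a factor of 1, which is a quick sanity check on any implementation.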
Exhaustive exploration of molecular interactions at the level of complete proteomes requires efficient and reliable computational approaches to protein function inference. Ligand docking and ranking techniques show considerable promise in their ability to quantify the interactions between proteins and small molecules. Despite the advances in the development of docking approaches and scoring functions, the genome-wide application of many ligand docking/screening algorithms is limited by the quality of the binding sites in theoretical receptor models constructed by protein structure prediction. In this study, we describe a new template-based method for the local refinement of ligand-binding regions in protein models using remotely related templates identified by threading. We designed a Support Vector Regression (SVR) model that selects correct binding site geometries in a large ensemble of multiple receptor conformations. The SVR model employs several scoring functions that impose geometrical restraints on the Cα positions, account for the specific chemical environment within a binding site and optimize the interactions with putative ligands. The SVR score is well correlated with the RMSD from the native structure; in 47% (70%) of the cases, the Pearson’s correlation coefficient is >0.5 (>0.3). When applied to weakly homologous models, the average heavy atom, local RMSD from the native structure of the top-ranked (best of top five) binding site geometries is 3.1 Å (2.9 Å) for roughly half of the targets; this represents a 0.1 (0.3) Å average improvement over the original predicted structure. Focusing on the subset of strongly conserved residues, the average heavy atom RMSD is 2.6 Å (2.3 Å). Furthermore, we estimate the upper bound of template-based binding site refinement using only weakly related proteins to be ~2.6 Å RMSD. This value also corresponds to the plasticity of the ligand-binding regions in distant homologues. 
The Binding Site Refinement (BSR) approach is available to the scientific community as a web server that can be accessed at http://cssb.biology.gatech.edu/bsr/.
Ligand-binding site refinement; protein threading; protein structure prediction; ligand-binding site prediction; ensemble docking; molecular function
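The evaluation described in the abstract above reports, per target, the Pearson correlation between a score and the RMSD from the native structure, and then the share of targets above a threshold. A minimal sketch of that bookkeeping follows. It does not reproduce the SVR model itself, and the function names and toy data are assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def fraction_above(per_target, threshold):
    """Share of targets whose score/RMSD correlation exceeds threshold.

    per_target: dict mapping target id -> (scores, rmsds).
    """
    values = [pearson(s, r) for s, r in per_target.values()]
    return sum(1 for v in values if v > threshold) / len(values)


# Toy data: one target with a perfectly correlated score, one without.
targets = {
    "t1": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    "t2": ([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0]),
}
print(fraction_above(targets, 0.5))
```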
Poor performance of scoring functions is a well-known bottleneck in structure-based virtual screening (VS), which is most frequently manifested in the scoring functions’ inability to discriminate between true ligands and known non-binders (therefore designated as binding decoys). This deficiency leads to a large number of false positive hits resulting from virtual screening. We have hypothesized that filtering out or penalizing docking poses recognized as non-native (i.e., pose decoys) should improve the performance of virtual screening in terms of improved identification of true binders. Using several concepts from the field of cheminformatics, we have developed a novel approach to identifying pose decoys from an ensemble of poses generated by computational docking procedures. We demonstrate that the use of a target-specific pose (-scoring) filter in combination with a physical force field-based scoring function (MedusaScore) leads to significant improvement of hit rates in virtual screening studies for 12 of the 13 benchmark sets from the clustered version of the Database of Useful Decoys (DUD). This new hybrid scoring function outperforms several conventional structure-based scoring functions, including XSCORE::HMSCORE, ChemScore, PLP, and Chemgauss3, in six out of 13 data sets at the early stage of VS (up to 1% decoys of the screening database). We compare our hybrid method with several novel VS methods that were recently reported to perform well on the same DUD data sets. We find that the ligands retrieved using our method are chemically more diverse than those retrieved by two ligand-based methods (FieldScreen and FLAP::LBX). We also compare our method with FLAP::RBLB, a high-performance VS method that also utilizes both the receptor and the cognate ligand structures. Interestingly, we find that the top ligands retrieved using our method are highly complementary to those retrieved using FLAP::RBLB, hinting at effective directions for best VS applications. 
We suggest that this integrative virtual screening approach combining cheminformatics and molecular mechanics methodologies may be applied to a broad variety of protein targets to improve the outcome of structure-based drug discovery studies.
Molecular docking is widely used to predict novel lead compounds for drug discovery. Success depends on the quality of the docking scoring function, among other factors. An imperfect scoring function can mislead by predicting incorrect ligand geometries or by selecting nonbinding molecules over true ligands. These false-positive hits may be considered “decoys”. Although these decoys are frustrating, they potentially provide important tests for a docking algorithm; the more subtle the decoy, the more rigorous the test. Indeed, decoy databases have been used to improve protein structure prediction algorithms and protein–protein docking algorithms. Here, we describe 20 geometric decoys in five enzymes and 166 “hit list” decoys–i.e., molecules predicted to bind by our docking program that were tested and found not to do so–for β-lactamase and two cavity sites in lysozyme. Especially in the cavity sites, which are very simple, these decoys highlight particular weaknesses in our scoring function. We also consider the performance of five other widely used docking scoring functions against our geometric and hit list decoys. Intriguingly, whereas many of these other scoring functions performed better on the geometric decoys, they typically performed worse on the hit list decoys, often highly ranking molecules that seemed to poorly complement the model sites. Several of these “hits” from the other scoring functions were tested experimentally and found, in fact, to be decoys. Collectively, these decoys provide a tool for the development and improvement of molecular docking scoring functions. Such improvements may, in turn, be rapidly tested experimentally against these and related experimental systems, which are well-behaved in assays and for structure determination.
The efficient and accurate quantification of protein-ligand interactions using computational methods is still a challenging task. Two factors strongly contribute to the failure of docking methods to predict free energies of binding accurately: the insufficient incorporation of protein flexibility coupled to ligand binding and the neglected dynamics of the protein-ligand complex in current scoring schemes. We have developed a new methodology, named the ‘ligand-model’ concept, to sample protein conformations that are relevant for binding structurally diverse sets of ligands. In the ligand-model concept, molecular-dynamics (MD) simulations are performed with a virtual ligand, represented by a collection of functional groups that binds to the protein and dynamically changes its shape and properties during the simulation. The ligand model essentially represents a large ensemble of different chemical species binding to the same target protein. Representative protein structures were obtained from the MD simulation, and docking was performed into this ensemble of protein conformations. Similar binding poses were clustered, and the averaged score was used to re-rank the poses. We demonstrate that the ligand-model approach yields significant improvements in predicting native-like binding poses and quantifying binding affinities compared to static docking and ensemble docking simulations into protein structures generated from an apo MD simulation.
Ligand-model concept; protein-ligand interactions; protein flexibility; induced-fit; docking; holo; apo
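The re-ranking step in the abstract above (cluster similar binding poses, then rank by the averaged score) might look like the sketch below. The abstract does not specify the clustering method, so the greedy leader clustering, the 2.0 Å cutoff, the assumption that all poses share one receptor frame and atom ordering, and the lower-is-better score convention are all choices made for this illustration only.

```python
import math


def rmsd(a, b):
    """RMSD between two poses given as equal-length lists of (x, y, z).

    No superposition is performed: poses are assumed to share a frame.
    """
    if len(a) != len(b) or not a:
        raise ValueError("poses must be non-empty and of equal length")
    sq = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(a, b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(sq / len(a))


def cluster_and_rerank(poses, cutoff=2.0):
    """Greedy leader clustering followed by ranking on the mean score.

    poses: list of (coords, score); lower score is assumed to be better.
    Returns a list of (representative_coords, mean_score, cluster_size).
    """
    clusters = []
    for coords, score in poses:
        for cluster in clusters:
            if rmsd(coords, cluster["rep"]) <= cutoff:
                cluster["scores"].append(score)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append({"rep": coords, "scores": [score]})
    summary = [
        (c["rep"], sum(c["scores"]) / len(c["scores"]), len(c["scores"]))
        for c in clusters
    ]
    return sorted(summary, key=lambda item: item[1])


# Toy single-atom "poses" docked into different receptor snapshots.
poses = [
    ([(0.0, 0.0, 0.0)], -8.0),
    ([(0.5, 0.0, 0.0)], -6.0),
    ([(10.0, 0.0, 0.0)], -7.5),
]
ranked = cluster_and_rerank(poses)
for rep, mean_score, size in ranked:
    print(rep, mean_score, size)
```

In this toy case the single best-scored pose (-8.0) belongs to a cluster whose average (-7.0) is worse than that of the other cluster, so averaging changes the ranking.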
Accommodating backbone flexibility continues to be the most difficult challenge in computational docking of protein-protein complexes. Towards that end, we simulate four distinct biophysical models of protein binding in RosettaDock, a multi-scale Monte-Carlo based algorithm that uses a quasi-kinetic search process to emulate the diffusional encounter of two proteins and identify low energy complexes. The four binding models are: 1) key-lock model (KL) using rigid-backbone docking, 2) conformer selection model (CS) using a novel ensemble docking algorithm, 3) induced fit model (IF) using energy gradient-based backbone minimization, and 4) a combined conformer selection/induced fit model (CS/IF). Backbone flexibility was limited to the smaller partner of the complex, structural ensembles were generated using Rosetta refinement methods, and docking consisted of local perturbations around the complexed conformation using unbound component crystal structures for a set of 21 target complexes. The lowest-energy structure contained more than 30% of the native residue-residue contacts for 9, 13, 13, and 14 targets for KL, CS, IF and CS/IF docking respectively. When applied to 15 targets using NMR ensembles of the smaller protein, the lowest-energy structure recovered at least 30% native residue contacts in 3, 8, 4 and 8 targets for KL, CS, IF and CS/IF docking respectively. CS/IF docking of the NMR ensemble performed equally well or better than KL docking with the unbound crystal structure in 10 of 15 cases. The marked success of CS and CS/IF docking shows that ensemble docking can be a versatile and effective method for accommodating conformational plasticity in docking and serves as a demonstration for the conformer selection theory - that binding-competent conformers exist in the unbound ensemble and can be selected based on their favorable binding energies.
protein-protein docking; flexible docking; ensemble docking; conformer selection; NMR ensembles
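The success measure used in the abstract above is the fraction of native residue-residue contacts recovered by a model, with 30% as the reported threshold. A minimal sketch, assuming contacts have already been extracted as residue pairs (the identifiers and the example contact lists are hypothetical):

```python
def fraction_native_contacts(native_contacts, model_contacts):
    """Fraction of native residue-residue contacts present in the model.

    Both arguments are iterables of (receptor_residue, ligand_residue)
    pairs; how a contact is defined is left to the caller.
    """
    native = set(native_contacts)
    if not native:
        raise ValueError("the native complex has no contacts")
    return len(native & set(model_contacts)) / len(native)


# Hypothetical residue identifiers of the form chain:number.
native = [("A:10", "B:5"), ("A:11", "B:5"), ("A:12", "B:7"),
          ("A:30", "B:8"), ("A:31", "B:9")]
model = [("A:10", "B:5"), ("A:12", "B:7"), ("A:40", "B:2")]

fnat = fraction_native_contacts(native, model)
print(f"fnat = {fnat:.2f}, meets 30% threshold: {fnat >= 0.30}")
```

Contacts present in the model but absent from the native structure do not lower this measure; it only counts recovered native contacts.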
As protein–protein interactions are crucial in most biological processes, it is valuable to understand how and where protein pairs interact. We developed a web server HOMCOS (Homology Modeling of Complex Structure, http://biunit.naist.jp/homcos) to predict interacting protein pairs and interacting sites by homology modeling of complex structures. Our server provides three services. The first is modeling heterodimers from two query amino acid sequences posted by users. The server performs BLAST searches to identify homologous templates in the latest representative dataset of heterodimer structures generated from the PQS database. Structure validity is evaluated by the combination of sequence similarity and knowledge-based contact potential energy as previously described. The server generates a sequence-replaced model PDB file and a MODELLER script to build full atomic models of complex structures. The second service is modeling homodimers from one query sequence. The third service is identification of potentially interacting proteins for one query sequence. The server searches the dataset of heterodimer structures for a homologous template and outputs candidate interacting sequences in the Uniprot database that are homologous to the interacting partner proteins of the template. These features are useful for a wide range of researchers to predict putative interaction sites and interacting proteins.
Computational small molecule docking into comparative models of proteins is widely used to query protein function and in the development of small molecule therapeutics. We benchmark RosettaLigand docking into comparative models for nine proteins built during CASP8 that contain ligands. We supplement the study with 21 additional protein/ligand complexes to cover a wider space of chemotypes. In 21 of the 30 cases, a full docking run with RosettaLigand successfully found a native-like binding mode among the top ten scoring binding modes. From the benchmark cases we find that careful template selection based on ligand occupancy provides the best chance of success, while overall sequence identity between template and target does not appear to improve results. We also find that binding energy normalized by atom number is often less than −0.4 in native-like binding modes.
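The last observation in the abstract above, that binding energy normalized by atom number is often below −0.4 in native-like binding modes, amounts to a simple per-atom normalization. The sketch below assumes that "atom number" means the number of ligand atoms, which the abstract does not state explicitly. The function names and example values are hypothetical, and the threshold is an observed tendency rather than a strict classifier.

```python
def normalized_binding_energy(binding_energy, n_atoms):
    """Binding energy divided by atom count (assumed: ligand atoms)."""
    if n_atoms <= 0:
        raise ValueError("atom count must be positive")
    return binding_energy / n_atoms


def below_threshold(binding_energy, n_atoms, threshold=-0.4):
    """True if the per-atom binding energy falls below the threshold."""
    return normalized_binding_energy(binding_energy, n_atoms) < threshold


# Made-up binding modes for a 25-atom ligand.
print(below_threshold(-12.5, 25))  # per-atom energy -0.5
print(below_threshold(-6.0, 25))   # per-atom energy -0.24
```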
Protein heterodimer complexes are often involved in catalysis, regulation, assembly, immunity and inhibition. This involves the formation of stable interfaces between the interacting partners. Hence, it is of interest to describe heterodimer interfaces using known structural complexes. We use a non-redundant dataset of 192 heterodimer complex structures from the Protein Data Bank (PDB) to identify interface residues and describe their interfaces using amino acid residue property preferences. Analysis of the dataset shows that the heterodimer interfaces are often abundant in polar residues. The analysis also shows the presence of two classes of interfaces in heterodimer complexes. The first class of interfaces (class A), with more polar residues than the core but fewer than the surface, is known. These interfaces are more hydrophobic than surfaces, where protein-protein binding is largely hydrophobic. The second class of interfaces (class B), with more polar residues than both the core and the surface, is shown here. These interfaces are more polar than surfaces, where binding is mainly polar. Thus, these findings provide insight into protein-protein interactions.
protein-protein interaction (PPI); heterodimer; interface; surface; core; polar abundance
Protein-RNA interactions play fundamental roles in many biological processes. Understanding the molecular mechanism of protein-RNA recognition and formation of protein-RNA complexes is a major challenge in structural biology. Unfortunately, the experimental determination of protein-RNA complexes is tedious and difficult, both by X-ray crystallography and NMR. For many interacting proteins and RNAs the individual structures are available, enabling computational prediction of complex structures by computational docking. However, methods for protein-RNA docking remain scarce, in particular in comparison to the numerous methods for protein-protein docking.
We developed two medium-resolution, knowledge-based potentials for scoring protein-RNA models obtained by docking: the quasi-chemical potential (QUASI-RNP) and the Decoys As the Reference State potential (DARS-RNP). Both potentials use a coarse-grained representation for both RNA and protein molecules and are capable of dealing with RNA structures with posttranscriptionally modified residues. We compared the discriminative power of DARS-RNP and QUASI-RNP for selecting rigid-body docking poses with the potentials previously developed by the Varani and Fernandez groups.
In both bound and unbound docking tests, DARS-RNP showed the highest ability to identify native-like structures. Python implementations of DARS-RNP and QUASI-RNP are freely available for download at http://iimcb.genesilico.pl/RNP/
RNA; protein; RNP; macromolecular docking; complex modeling; structural bioinformatics
With the development of many computational methods that predict the structural models of protein-protein complexes, there is a pressing need to benchmark their performance. As was the case for protein monomers, assessing the quality of models of protein complexes is not straightforward. An effective scoring scheme should be able to detect substructure similarity and estimate its statistical significance. Here, we focus on characterizing the similarity of the interfaces of the complex and introduce two scoring functions. The first, the interfacial Template Modeling score (iTM-score), measures the geometric distance between the interfaces, while the second, the Interface Similarity score (IS-score), evaluates their side chain contact similarity in addition to their geometric similarity. We first demonstrate that the IS-score is more suitable for assessing docking models than the iTM-score. The IS-score is then validated in a large-scale benchmark test on 1,562 dimeric complexes. Finally, the scoring function is applied to evaluate docking models submitted to the Critical Assessment of PRediction of Interactions (CAPRI) experiments. While the results according to the new scoring scheme are generally consistent with the original CAPRI assessment, the IS-score identifies models whose significance was previously underestimated.
docking; protein-protein interaction; protein-protein interface; structure prediction; TM-score; IS-score; CAPRI
While many structures of single protein components are becoming available, structural characterization of their complexes remains challenging. Methods for modeling assembly structures from individual components frequently suffer from large errors, due to protein flexibility and inaccurate scoring functions. However, when additional information is available, it may be possible to reduce the errors and compute near-native complex structures. One such type of information is a small angle X-ray scattering (SAXS) profile that can be collected in a high-throughput fashion from a small amount of sample in solution. Here, we present an efficient method for protein-protein docking with a SAXS profile (FoXSDock): generation of complex models by rigid global docking with PatchDock, filtering of the models based on the SAXS profile, clustering of the models, and refining the interface by flexible docking with FireDock. FoXSDock is benchmarked on 124 protein complexes with simulated SAXS profiles, as well as on 6 complexes with experimentally determined SAXS profiles. When induced fit is less than 1.5 Å interface Cα RMSD and the fraction of residues missing from the component structures is less than 3%, FoXSDock can find a model close to the native structure within the top 10 predictions in 77% of the cases; in comparison, docking alone succeeds in only 34% of the cases. Thus, the integrative approach significantly improves on molecular docking alone. The improvement arises from an increased resolution of rigid docking sampling and more accurate scoring.
Small Angle X-ray Scattering (SAXS); protein-protein docking; macromolecular assembly
RNA-binding proteins play many essential roles in the regulation of gene expression in the cell. Despite the significant increase in the number of structures for RNA–protein complexes in the last few years, the molecular basis of specificity remains unclear even for the best-studied protein families. We have developed a distance and orientation-dependent hydrogen-bonding potential based on the statistical analysis of hydrogen-bonding geometries that are observed in high-resolution crystal structures of protein–DNA and protein–RNA complexes. We observe very strong geometrical preferences that reflect significant energetic constraints on the relative placement of hydrogen-bonding atom pairs at protein–nucleic acid interfaces. A scoring function based on the hydrogen-bonding potential discriminates native protein–RNA structures from incorrectly docked decoys with remarkable predictive power. By incorporating the new hydrogen-bonding potential into a physical model of protein–RNA interfaces with full atom representation, we were able to recover native amino acids at protein–RNA interfaces.
Membrane proteins are of particular biological and pharmaceutical importance, and computational modeling and structure prediction approaches play an important role in studies of membrane proteins. Developing an accurate model quality assessment program is therefore important for the structure prediction of membrane proteins. Few such programs have been proposed that can be applied to a broad range of membrane protein classes and perform with high accuracy. We developed a new model scoring function, IQ, based on the analysis of four types of inter-residue interactions within the transmembrane domains of helical membrane proteins. This function was tested using three high-quality model sets: all 206 models of GPCR Dock 2008, all 284 models of GPCR Dock 2010, and all 92 helical membrane protein models of the HOMEP set. For the three sets, the scoring function selects the native structures among all of the models with success rates of 93%, 85%, and 100%, respectively. For comparison, the same three model sets were evaluated with a recently published model assessment program for membrane protein structures, ProQM, which gave success rates of 85%, 79%, and 92%, respectively. These results suggest that IQ outperforms ProQM when only the transmembrane regions of the models are considered. This scoring function should be useful for the computational modeling of membrane proteins.
membrane proteins; structure quality; inter-residue interactions; frequency score; average number of interactions
Molecular docking is an important method for studying protein-protein interaction and recognition. A protein can be considered as a network when the residues are treated as its nodes. With the contact energy between residues as link weight, a weighted residue network is constructed in this paper. Two weighted parameters (strength and weighted average nearest neighbors’ degree) are introduced into this model at the same time. The stability of a protein is characterized by its strength. The global topological properties of the protein-protein complex are reflected by the weighted average nearest neighbors’ degree. Based on this weighted network model and these two parameters, a new docking scoring function is proposed in this paper. The bound and unbound docking results of 42 systems are scored and ranked with this new scoring function. Comparing the results obtained from this new scoring function with those from the pair potentials scoring function, we found that the new scoring function performs similarly to the pair potentials on some measures and achieves a better success rate. The new scoring function is easy to calculate, and its scoring and ranking results are acceptable. This work can help us better understand the mechanisms of protein-protein interactions and recognition.
residue network; weighted parameter; protein-protein docking; scoring function
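The two weighted parameters named in the abstract above can be illustrated with a minimal sketch. The abstract does not give formulas, so this assumes the standard weighted-network definitions: the strength of a node is the sum of its link weights, and the weighted average nearest neighbors' degree is the weight-averaged degree of its neighbors. The function name, the toy residue labels, and the use of positive weights as stand-ins for contact energies are all hypothetical.

```python
from collections import defaultdict


def weighted_network_parameters(edges):
    """Compute strength and weighted average nearest-neighbor degree.

    edges: dict mapping an undirected residue pair (i, j) to a positive
    link weight (assumed here to be a contact-energy magnitude).
    Returns (strength, knn_w), two dicts keyed by node.
    """
    neighbors = defaultdict(dict)
    for (i, j), w in edges.items():
        neighbors[i][j] = w
        neighbors[j][i] = w

    # Degree = number of neighbors; strength = sum of incident link weights.
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}
    strength = {node: sum(nbrs.values()) for node, nbrs in neighbors.items()}

    # Weighted average nearest-neighbor degree:
    # knn_w(i) = (1 / s_i) * sum_j w_ij * k_j
    knn_w = {}
    for node, nbrs in neighbors.items():
        s = strength[node]
        knn_w[node] = (
            sum(w * degree[n] for n, w in nbrs.items()) / s if s else 0.0
        )
    return strength, knn_w


# Toy network with four hypothetical residues.
toy_edges = {("A", "B"): 2.0, ("A", "C"): 1.0, ("B", "C"): 3.0, ("C", "D"): 1.0}
strength, knn_w = weighted_network_parameters(toy_edges)
```

How the two parameters are combined into a docking score is not stated in the abstract, so the sketch stops at the per-residue values.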
Structural details of protein–protein interactions are invaluable for understanding and deciphering biological mechanisms. Computational docking methods aim to predict the structure of a protein–protein complex given the structures of its single components. Protein flexibility and the absence of robust scoring functions pose a great challenge in the docking field. Because of these difficulties, most docking methods involve a two-tier approach: a coarse global search for feasible orientations that treats proteins as rigid bodies, followed by an accurate refinement stage that aims to introduce flexibility into the process. The FireDock web server, presented here, is the first web server for flexible refinement and scoring of protein–protein docking solutions. It includes optimization of side-chain conformations and rigid-body orientation and allows high-throughput refinement. The server provides a user-friendly interface and a 3D visualization of the results. A docking protocol consisting of a global search by PatchDock and a refinement by FireDock was extensively tested. The protocol was successful in refining and scoring docking solution candidates for cases taken from docking benchmarks. We provide an option for using this protocol by automatic redirection of PatchDock candidate solutions to the FireDock web server for refinement. The FireDock web server is available at http://bioinfo3d.cs.tau.ac.il/FireDock/.
We generated docking poses for the FKBP-GPI complex using eight docking programs and compared their scoring functions with scoring based on NMR chemical shift perturbations (NMRScore). Because the chemical shift perturbation (CSP) is exquisitely sensitive to the orientation of the ligand inside the binding pocket, NMRScore offers an accurate and straightforward approach to scoring different poses. All scoring functions were assessed on two abilities: ranking native-like structures highly and separating them from the decoy poses generated for a protein-ligand complex. The overall performance of NMRScore is much better than that of the energy-based scoring functions associated with the docking programs in both respects. In summary, we find that combining docking programs with NMRScore yields an approach that can robustly determine the binding site structure of a protein-ligand complex, thereby providing a new tool to facilitate structure-based drug discovery.
DNA–protein interactions are involved in many essential biological
activities. Because there is no simple mapping code between DNA base pairs and
protein amino acids, the prediction of DNA–protein interactions is a
challenging problem. Here, we present a novel computational approach for
predicting the DNA-binding residues of a protein and its DNA–protein
interaction modes without knowing the protein's specific DNA target sequence.
Given the structure of a
DNA-binding protein, the method first generates an ensemble of complex
structures obtained by rigid-body docking with a nonspecific canonical B-DNA.
Representative models are subsequently selected through clustering and ranking
by their DNA–protein interfacial energy. Analysis of these encounter
complex models suggests that the recognition sites for specific DNA binding are
usually favorable interaction sites for the nonspecific DNA probe and that
nonspecific DNA–protein interaction modes exhibit some similarity to
specific DNA–protein binding modes. Although the method requires as
input the knowledge that the protein binds DNA, in benchmark tests, it achieves
better performance in identifying DNA-binding sites than three previously
established methods, which are based on sophisticated machine-learning
techniques. We further apply our method to protein structures predicted through
modeling and demonstrate that it performs satisfactorily on protein
models whose root-mean-square Cα deviation from their native structures is up
to 5 Å. This study provides valuable
structural insights into how a specific DNA-binding protein interacts with a
nonspecific DNA sequence. The similarity between the specific
DNA–protein interaction mode and nonspecific interaction modes may
reflect an important sampling step in the search for its specific DNA targets by a
DNA-binding protein.
Many essential biological activities require interactions between DNA and
proteins. These proteins usually use certain amino acids, called DNA-binding
sites, to recognize their specific DNA targets. To facilitate the search for its
specific DNA targets, a DNA-binding protein often associates with nonspecific
DNA and then diffuses along the DNA. Due to the weak interactions between
nonspecific DNA and the protein, structural characterization of nonspecific
DNA–protein complexes is experimentally challenging. This paper
describes a computational modeling study on nonspecific DNA–protein
complexes and comparative analysis with respect to specific
DNA–protein complexes. The study found that the specific DNA-binding
sites on a protein are typically favorable for nonspecific DNA and that
nonspecific and specific DNA–protein interaction modes are quite
similar. This similarity may reflect an important sampling step in the search
for the specific DNA target sequence by a DNA-binding protein. On the basis of
these observations, a novel method was proposed for predicting DNA-binding sites
and binding modes of a DNA-binding protein without knowing its specific DNA
target sequence. Ultimately, the combination of this method and protein
structure prediction may lead the way to high throughput modeling of
Accurate prediction of the structure of protein-protein complexes in computational docking experiments remains a formidable challenge. It has been recognized that identifying native or native-like poses among multiple decoys is the major bottleneck of the current scoring functions used in docking. We have developed a novel multi-body pose-scoring function that has no theoretical limit on the number of residues contributing to the individual interaction terms. We use a coarse-grained representation of a protein-protein complex in which each residue is represented by its side-chain centroid. We apply a computational geometry approach called Almost-Delaunay tessellation that transforms a protein-protein complex into a residue contact network, that is, an undirected graph in which residues are the vertices and contacts are the edges. Applying this treatment to a dataset of protein-protein complexes yields a family of interfacial graphs. We then employ a frequent subgraph mining approach to identify common interfacial residue patterns that appear in at least a subset of native protein-protein interfaces. The geometrical parameters and frequency of occurrence of each “native” pattern in the training set are used to develop the new SPIDER scoring function. SPIDER was validated using the standard “ZDOCK” benchmark dataset, which was not used in the development of SPIDER. We demonstrate that the SPIDER scoring function ranks native and native-like poses above geometrical decoys and that it outperforms the popular ZRANK scoring function. SPIDER was ranked among the top scoring functions in a recent round of the CAPRI (Critical Assessment of PRedicted Interactions) blind test of protein–protein docking methods.
Bioinformatics; Amino acids; Centroids; Statistical potential; Delaunay tessellation; Subgraph mining; Motifs; Coarse-grained; ZDOCK; CAPRI
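The pipeline described in the abstract above (centroid representation, residue contact graph, interfacial patterns, frequency-based scoring) can be illustrated with a deliberately simplified sketch. It is not the SPIDER implementation: a plain distance cutoff stands in for Almost-Delaunay tessellation, three-residue triangles stand in for the patterns found by frequent subgraph mining, and the geometrical parameters of the patterns are ignored. All function names, the 8.0 Å cutoff, and the frequency table are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def contact_edges(residues, cutoff=8.0):
    """Build an undirected contact graph from side-chain centroids.

    residues: list of (chain_id, residue_type, (x, y, z)).
    A distance cutoff is used here as a simple stand-in for
    Almost-Delaunay tessellation. Returns a set of index pairs (i, j), i < j.
    """
    edges = set()
    for i, j in combinations(range(len(residues)), 2):
        if math.dist(residues[i][2], residues[j][2]) <= cutoff:
            edges.add((i, j))
    return edges


def interfacial_triangles(residues, edges):
    """Count triangles that span more than one chain, keyed by residue types."""
    adjacency = {i: set() for i in range(len(residues))}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)

    counts = Counter()
    for i, j in edges:
        # Require k > j so that each triangle (i < j < k) is counted once.
        for k in adjacency[i] & adjacency[j]:
            if k > j:
                members = (i, j, k)
                if len({residues[n][0] for n in members}) > 1:
                    label = tuple(sorted(residues[n][1] for n in members))
                    counts[label] += 1
    return counts


def score_pose(pattern_counts, native_frequency):
    """Sum pattern occurrences weighted by a (hypothetical) native frequency."""
    return sum(n * native_frequency.get(p, 0.0) for p, n in pattern_counts.items())


# Toy pose: two chain-A residues and one chain-B residue in contact,
# plus one distant chain-B residue that forms no contacts.
pose = [
    ("A", "LEU", (0.0, 0.0, 0.0)),
    ("A", "ILE", (5.0, 0.0, 0.0)),
    ("B", "PHE", (2.5, 4.0, 0.0)),
    ("B", "ASP", (30.0, 0.0, 0.0)),
]
patterns = interfacial_triangles(pose, contact_edges(pose))
score = score_pose(patterns, {("ILE", "LEU", "PHE"): 0.5})
```

In a real setting, the frequency table would be derived from a training set of native interfaces, and patterns larger than triangles would be needed to reflect the multi-body character that the abstract emphasizes.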