1.  Protein-Protein Docking with F2Dock 2.0 and GB-Rerank 
PLoS ONE  2013;8(3):e51307.
Computational simulation of protein-protein docking can expedite the process of molecular modeling and drug discovery. This paper reports on our new F2 Dock protocol which improves the state of the art in initial stage rigid body exhaustive docking search, scoring and ranking by introducing improvements in the shape-complementarity and electrostatics affinity functions, a new knowledge-based interface propensity term with FFT formulation, a set of novel knowledge-based filters and finally a solvation energy (GBSA) based reranking technique. Our algorithms are based on highly efficient data structures including the dynamic packing grids and octrees which significantly speed up the computations and also provide guaranteed bounds on approximation error.
The improved affinity functions show superior performance compared to their traditional counterparts in finding correct docking poses at higher ranks. We found that the new filters and the GBSA based reranking, individually and in combination, significantly improve the accuracy of docking predictions with only a minor increase in computation time. We compared F2 Dock 2.0 with ZDock 3.0.2 and found improvements over it. Specifically, among 176 complexes in ZLab Benchmark 4.0, F2 Dock 2.0 finds a near-native solution as the top prediction for 22 complexes, whereas ZDock 3.0.2 does so for 13 complexes. F2 Dock 2.0 finds a near-native solution within the top 1000 predictions for 106 complexes as opposed to 104 complexes for ZDock 3.0.2. However, there are 17 complexes where F2 Dock 2.0 finds a solution but ZDock 3.0.2 does not, and 15 where the reverse holds, which indicates that the two docking protocols can also complement each other.
The docking protocol has been implemented as a server with a graphical client (TexMol) which allows the user to manage multiple docking jobs, and visualize the docked poses and interfaces. Both the server and client are available for download.
PMCID: PMC3590208  PMID: 23483883
2.  Protein-protein docking using region-based 3D Zernike descriptors 
BMC Bioinformatics  2009;10:407.
Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur.
We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation for using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing and are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface Cα RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases.
We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.
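The hit criterion quoted in this abstract (interface Cα RMSD ≤ 2.5 Å within the top N ranks) can be illustrated with a minimal Python sketch. It assumes the interface Cα atoms of the decoy and the native complex are already matched one to one and superposed in a common frame; the function names are hypothetical and are not taken from the paper.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) tuples.

    Assumption: both sets are already superposed in the same frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))

def first_hit_rank(ranked_irmsds, cutoff=2.5):
    """1-based rank of the first decoy whose interface RMSD is <= cutoff, else None."""
    for rank, value in enumerate(ranked_irmsds, start=1):
        if value <= cutoff:
            return rank
    return None

def hit_within(ranked_irmsds, top_n=1000, cutoff=2.5):
    """True if a near-native decoy appears within the top_n ranks."""
    rank = first_hit_rank(ranked_irmsds, cutoff)
    return rank is not None and rank <= top_n
```

Here `ranked_irmsds` is the list of interface RMSD values of the decoys in the order produced by the scoring function, so a benchmark success rate is simply the fraction of complexes for which `hit_within` returns `True`.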
PMCID: PMC2800122  PMID: 20003235
3.  The scoring of poses in protein-protein docking: current capabilities and future directions 
BMC Bioinformatics  2013;14:286.
Protein-protein docking, which aims to predict the structure of a protein-protein complex from its unbound components, remains an unresolved challenge in structural bioinformatics. An important step is the ranking of docked poses using a scoring function, for which many methods have been developed. There is a need to explore the differences and commonalities of these methods with each other, as well as with functions developed in the fields of molecular dynamics and homology modelling.
We present an evaluation of 115 scoring functions on an unbound docking decoy benchmark covering 118 complexes for which a near-native solution can be found, yielding top 10 success rates of up to 58%. Hierarchical clustering is performed, so as to group together functions which identify near-natives in similar subsets of complexes. Three set theoretic approaches are used to identify pairs of scoring functions capable of correctly scoring different complexes. This shows that functions in different clusters capture different aspects of binding and are likely to work together synergistically.
All functions designed specifically for docking perform well, indicating that functions are transferable between sampling methods. We also identify promising methods from the field of homology modelling. Further, differential success rates by docking difficulty and solution quality suggest a need for flexibility-dependent scoring. Investigating pairs of scoring functions, the set theoretic measures identify known scoring strategies as well as a number of novel approaches, indicating promising augmentations of traditional scoring methods. Such augmentation and parameter combination strategies are discussed in the context of the learning-to-rank paradigm.
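As a rough illustration of how top-N success rates and the complementarity of two scoring functions can be compared, the following Python sketch uses plain set operations. It is not the three set theoretic measures used in the study, which the abstract does not specify; the function names and the input layout (a mapping from complex identifier to the rank of its first near-native pose) are assumptions made for the example.

```python
def top_n_successes(first_hit_ranks, n=10):
    """Complexes whose first near-native pose is ranked within the top n.

    first_hit_ranks maps a complex id to a 1-based rank, or None if no hit.
    """
    return {cid for cid, rank in first_hit_ranks.items()
            if rank is not None and rank <= n}

def success_rate(successes, total_complexes):
    """Fraction of benchmark complexes counted as successes."""
    return len(successes) / total_complexes

def complementarity(successes_a, successes_b):
    """Split the successes of two scoring functions into shared and unique sets."""
    return {
        "both": successes_a & successes_b,
        "only_a": successes_a - successes_b,
        "only_b": successes_b - successes_a,
        "either": successes_a | successes_b,
    }
```

A pair of functions with small `both` and large `only_a` and `only_b` sets succeeds on different complexes, which is the situation in which combining them is most likely to help.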
PMCID: PMC3850738  PMID: 24079540
Docking; Scoring functions; Binding energy; Ranking; SwarmDock
4.  Shape Complementarity of Protein-Protein Complexes at Multiple Resolutions 
Proteins  2009;75(2):453-467.
Biological complexes typically exhibit intermolecular interfaces of high shape complementarity. Many computational docking approaches use this surface complementarity as a guide in the search for predicting the structures of protein-protein complexes. Proteins often undergo conformational changes in order to create a highly complementary interface when associating. These conformational changes are a major cause of failure for automated docking procedures when predicting binding modes between proteins using their unbound conformations. Low resolution surfaces in which high frequency geometric details are omitted have been used to address this problem. These smoothed, or blurred, surfaces are expected to minimize the differences between free and bound structures, especially those that are due to side chain conformations or small backbone deviations.
In spite of the fact that this approach has been used in many docking protocols, there has yet to be a systematic study of the effects of such surface smoothing on the shape complementarity of the resulting interfaces. Here we investigate this question by computing shape complementarity of a set of 66 protein-protein complexes represented by multi-resolution blurred surfaces. Complexed and unbound structures are available for these protein-protein complexes. They are a subset of complexes from a non-redundant docking benchmark selected for rigidity (i.e. the proteins undergo limited conformational changes between their bound and unbound states). In this work we construct the surfaces by isocontouring a density map obtained by accumulating the densities of Gaussian functions placed at all atom centers of the molecule. The smoothness or resolution is specified by a Gaussian fall-off coefficient, termed “blobbyness”. Shape complementarity is quantified using a histogram of the shortest distances between two proteins' surface mesh vertices for both the crystallographic complexes and the complexes built using the protein structures in their unbound conformation.
The histograms calculated for the bound complex structures demonstrate that medium resolution smoothing (blobbyness=−0.9) can reproduce about 88% of the shape complementarity of atomic resolution surfaces. Complexes formed from the free component structures show a partial loss of shape complementarity (more overlaps and gaps) with the atomic resolution surfaces. For surfaces smoothed to low resolution (blobbyness=−0.3), we find more consistency of shape complementarity between the complexed and free cases. To further reduce bad contacts without significantly impacting the good contacts we introduce another blurred surface, in which the Gaussian densities of flexible atoms are reduced. From these results we discuss the use of shape complementarity in protein-protein docking.
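The Gaussian density that is isocontoured to build these surfaces can be sketched in a few lines of Python. The abstract does not give the functional form, so the expression below, a sum of atom-centered Gaussians exp(B(d²/r² − 1)) with negative blobbyness B and an isovalue of 1, is an assumption based on a form commonly used for Gaussian molecular surfaces; the function name and the atom tuple layout are hypothetical.

```python
import math

def gaussian_density(point, atoms, blobbyness=-0.9):
    """Summed Gaussian density at a point.

    atoms: iterable of (x, y, z, radius) tuples.
    Assumed form: sum_i exp(blobbyness * (d_i^2 / r_i^2 - 1)), blobbyness < 0.
    For a single atom the isovalue 1.0 lies exactly on its radius.
    """
    px, py, pz = point
    total = 0.0
    for cx, cy, cz, radius in atoms:
        d2 = (px - cx) ** 2 + (py - cy) ** 2 + (pz - cz) ** 2
        total += math.exp(blobbyness * (d2 / radius ** 2 - 1.0))
    return total
```

Under this form, a blobbyness closer to zero (for example −0.3) makes each Gaussian decay more slowly, so neighbouring atoms merge into a smoother, lower-resolution surface, which matches the qualitative behaviour described in the abstract.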
PMCID: PMC2928789  PMID: 18837463
Protein interactions; protein-protein docking; Gaussian surface; protein side-chain flexibility; protein interfaces; unbound-unbound docking; protein complexes; Blur surface; FlexBlur surface; enzyme-inhibitor complexes
5.  A machine learning based method to improve docking scoring functions and its application to drug repurposing 
Docking scoring functions are notoriously weak predictors of binding affinity. They typically assign a common set of weights to the individual energy terms that contribute to the overall energy score; however, these weights should be gene family-dependent. In addition, they incorrectly assume that individual interactions contribute towards the total binding affinity in an additive manner. In reality, noncovalent interactions often depend on one another in a nonlinear manner. In this paper we show how support vector machines (SVMs), trained by associating sets of individual energy terms retrieved from molecular docking with the known binding affinity of each compound from high-throughput screening experiments, can be used to improve the correlation between known binding affinities and those predicted by the docking program eHiTS. We construct two prediction models: a regression model trained using IC50 values from BindingDB, and a classification model trained using active and decoy compounds from the Directory of Useful Decoys (DUD). Moreover, to address the issue of overrepresentation of negative data in high-throughput screening data sets, we have designed a multiple-planar SVM training procedure for the classification model. The increased performance that both SVMs give when compared with the original eHiTS scoring function highlights the potential for using nonlinear methods when deriving overall energy scores from their individual components. We apply the above methodology to train a new scoring function for direct inhibitors of M. tuberculosis (M.tb) InhA. By combining ligand binding site comparison with the new scoring function, we propose that phosphodiesterase inhibitors can potentially be repurposed to target M.tb InhA. Our methodology may be applied to other gene families for which target structures and activity data are available, as demonstrated in the work presented here.
PMCID: PMC3076728  PMID: 21291174
6.  DECK: Distance and environment-dependent, coarse-grained, knowledge-based potentials for protein-protein docking 
BMC Bioinformatics  2011;12:280.
Computational approaches to protein-protein docking typically include scoring aimed at improving the rank of the near-native structure relative to the false-positive matches. Knowledge-based potentials improve modeling of protein complexes by taking advantage of the rapidly increasing amount of experimentally derived information on protein-protein association. An essential element of knowledge-based potentials is defining the reference state for an optimal description of the residue-residue (or atom-atom) pairs in the non-interaction state.
The study presents a new Distance- and Environment-dependent, Coarse-grained, Knowledge-based (DECK) potential for scoring of protein-protein docking predictions. Training sets of protein-protein matches were generated based on bound and unbound forms of proteins taken from the DOCKGROUND resource. Each residue was represented by a pseudo-atom in the geometric center of the side chain. To capture the long-range and the multi-body interactions, residues in different secondary structure elements at protein-protein interfaces were considered as different residue types. Five reference states for the potentials were defined and tested. The optimal reference state was selected and the cutoff effect on the distance-dependent potentials was investigated. The potentials were validated on the docking decoy sets, showing better performance than the existing potentials used in scoring of protein-protein docking results.
A novel residue-based statistical potential for protein-protein docking was developed and validated on docking decoy sets. The results show that the scoring function DECK can successfully identify near-native protein-protein matches and thus is useful in protein docking. In addition to the practical application of the potentials, the study provides insights into the relative utility of the reference states, the scope of the distance dependence, and the coarse-graining of the potentials.
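The general idea of a distance-dependent, residue-level knowledge-based potential can be sketched as follows. This is a generic inverse-Boltzmann illustration, not the DECK potential: the reference state used here (the distance distribution pooled over all residue-type pairs), the 2 Å bin width, the pseudocount, and all function names are assumptions chosen for the example, whereas the study tested five reference states that the abstract does not describe.

```python
import math
from collections import Counter

def distance_bin(distance, bin_width=2.0):
    """Index of the distance bin that contains the given distance."""
    return int(distance // bin_width)

def derive_potential(observed_contacts, bin_width=2.0, pseudocount=1.0):
    """Build E(pair, bin) = -ln(N_obs / N_exp) from interface contacts.

    observed_contacts: iterable of (type_a, type_b, distance) tuples.
    Assumed reference state: N_exp = N_pair * N_bin / N_total.
    """
    pair_bin, pair_total, bin_total = Counter(), Counter(), Counter()
    grand_total = 0
    for type_a, type_b, distance in observed_contacts:
        pair = tuple(sorted((type_a, type_b)))
        k = distance_bin(distance, bin_width)
        pair_bin[(pair, k)] += 1
        pair_total[pair] += 1
        bin_total[k] += 1
        grand_total += 1
    potential = {}
    for (pair, k), n_obs in pair_bin.items():
        n_exp = pair_total[pair] * bin_total[k] / grand_total
        potential[(pair, k)] = -math.log((n_obs + pseudocount) / (n_exp + pseudocount))
    return potential

def score_pose(contacts, potential, bin_width=2.0):
    """Sum the potential over the contacts of one docking pose (unseen terms count 0)."""
    total = 0.0
    for type_a, type_b, distance in contacts:
        key = (tuple(sorted((type_a, type_b))), distance_bin(distance, bin_width))
        total += potential.get(key, 0.0)
    return total
```

Pairs seen more often than the reference state predicts receive negative (favourable) energies, so lower `score_pose` values indicate interfaces that resemble the training complexes.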
PMCID: PMC3145612  PMID: 21745398
7.  Modeling T cell receptor recognition of CD1-lipid and MR1-metabolite complexes 
BMC Bioinformatics  2014;15(1):319.
T cell receptors (TCRs) can recognize diverse lipid and metabolite antigens presented by MHC-like molecules CD1 and MR1, and the molecular basis of many of these interactions has not been determined. Here we applied our protein docking algorithm TCRFlexDock, previously developed to perform docking of TCRs to peptide-MHC (pMHC) molecules, to predict the binding of αβ and γδ TCRs to CD1 and MR1, starting with the structures of the unbound molecules.
Evaluating against TCR-CD1d complexes with crystal structures, we achieved near-native structures in the top 20 models for two out of four cases, and an acceptable-rated prediction for a third case. We also predicted the structure of an interaction between a MAIT TCR and MR1-antigen that has not been structurally characterized, yielding a top-ranked model that agreed remarkably with a characterized TCR-MR1-antigen structure that has a nearly identical TCR α chain but a different β chain, highlighting the likely dominance of the conserved α chain in MR1-antigen recognition. Docking performance was improved by re-training our scoring function with a set of TCR-pMHC complexes, and for a case with an outlier binding mode, we found that alternative docking start positions improved predictive accuracy. We then performed unbound docking with two mycolyl-lipid specific TCRs that recognize lipid-bound CD1b, which represent a class of interactions that is not structurally characterized. Highly-ranked models of these complexes showed remarkable agreement between their binding topologies, as expected based on their shared germline sequences, while differences in residue-level interactions with their respective antigens point to possible mechanisms underlying their distinct specificities.
Together these results indicate that flexible docking simulations can provide accurate models and atomic-level insights into TCR recognition of MHC-like molecules presenting lipid and other small molecule antigens.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-319) contains supplementary material, which is available to authorized users.
PMCID: PMC4261541  PMID: 25260513
TCR; MAIT; CD1d; CD1b; GEM; MHC-like
8.  Structural Modeling of Protein Interactions by Analogy: Application to PSD-95 
PLoS Computational Biology  2006;2(11):e153.
We describe comparative patch analysis for modeling the structures of multidomain proteins and protein complexes, and apply it to the PSD-95 protein. Comparative patch analysis is a hybrid of comparative modeling based on a template complex and protein docking, with a greater applicability than comparative modeling and a higher accuracy than docking. It relies on structurally defined interactions of each of the complex components, or their homologs, with any other protein, irrespective of its fold. For each component, its known binding modes with other proteins of any fold are collected and expanded by the known binding modes of its homologs. These modes are then used to restrain conventional molecular docking, resulting in a set of binary domain complexes that are subsequently ranked by geometric complementarity and a statistical potential. The method is evaluated by predicting 20 binary complexes of known structure. It is able to correctly identify the binding mode in 70% of the benchmark complexes compared with 30% for protein docking. We applied comparative patch analysis to model the complex of the third PSD-95, DLG, and ZO-1 (PDZ) domain and the SH3-GK domains in the PSD-95 protein, whose structure is unknown. In the first predicted configuration of the domains, PDZ interacts with SH3, leaving both the GMP-binding site of guanylate kinase (GK) and the C-terminus binding cleft of PDZ accessible, while in the second configuration PDZ interacts with GK, burying both binding sites. We suggest that the two alternate configurations correspond to the different functional forms of PSD-95 and provide a possible structural description for the experimentally observed cooperative folding transitions in PSD-95 and its homologs. More generally, we expect that comparative patch analysis will provide useful spatial restraints for the structural characterization of an increasing number of binary and higher-order protein complexes.
Protein–protein interactions play a crucial role in many cellular processes. An important step towards a mechanistic description of these processes is a structural characterization of the proteins and their complexes. The authors developed a new approach to modeling the structure of protein complexes and multidomain proteins. The approach, called comparative patch analysis, complements the two currently existing approaches for structural modeling of protein complexes, comparative modeling, and protein docking. It limits the configurations refined by molecular docking to the structurally defined interactions of each of the complex components, or their homologs, with any other protein, irrespective of its fold; the final prediction corresponds to the best-scoring refined configuration. The authors applied comparative patch analysis to predict the structure of the core fragment of PSD-95, a five-domain protein that plays a major role in the postsynaptic density at neuronal synapses. The study suggests two alternate configurations of the core fragment that potentially correspond to the different functional forms of PSD-95. This finding provides a possible structural explanation for the experimentally observed cooperative folding transitions in PSD-95 and its homologs.
PMCID: PMC1635541  PMID: 17096593
9.  Benchmarking and Analysis of Protein Docking Performance in Rosetta v3.2 
PLoS ONE  2011;6(8):e22477.
RosettaDock has been increasingly used in protein docking and design strategies in order to predict the structure of protein-protein interfaces. Here we test capabilities of RosettaDock 3.2, part of the newly developed Rosetta v3.2 modeling suite, against Docking Benchmark 3.0, and compare it with RosettaDock v2.3, the latest version of the previous Rosetta software package. The benchmark contains a diverse set of 116 docking targets including 22 antibody-antigen complexes, 33 enzyme-inhibitor complexes, and 60 ‘other’ complexes. These targets were further classified by expected docking difficulty into 84 rigid-body targets, 17 medium targets, and 14 difficult targets. We carried out local docking perturbations for each target, using the unbound structures when available, in both RosettaDock v2.3 and v3.2. Overall the performances of RosettaDock v2.3 and v3.2 were similar. RosettaDock v3.2 achieved 56 docking funnels, compared to 49 in v2.3. A breakdown of docking performance by protein complex type shows that RosettaDock v3.2 achieved docking funnels for 63% of antibody-antigen targets, 62% of enzyme-inhibitor targets, and 35% of ‘other’ targets. In terms of docking difficulty, RosettaDock v3.2 achieved funnels for 58% of rigid-body targets, 30% of medium targets, and 14% of difficult targets. For targets that failed, we carry out additional analyses to identify the cause of failure, which showed that binding-induced backbone conformation changes account for a majority of failures. We also present a bootstrap statistical analysis that quantifies the reliability of the stochastic docking results. Finally, we demonstrate the additional functionality available in RosettaDock v3.2 by incorporating small-molecules and non-protein co-factors in docking of a smaller target set. This study marks the most extensive benchmarking of the RosettaDock module to date and establishes a baseline for future research in protein interface modeling and structure prediction.
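A bootstrap analysis of stochastic docking output can be sketched in Python as below. The abstract does not state the success criterion, so the one used here (at least `min_hits` near-native decoys among the `top_k` lowest-scoring decoys) is an assumption made for illustration, as are the function names and the decoy layout of `(score, is_near_native)` tuples.

```python
import random

def top_k_hits(decoys, top_k=5):
    """Number of near-native decoys among the top_k lowest-scoring decoys."""
    best = sorted(decoys, key=lambda decoy: decoy[0])[:top_k]
    return sum(1 for _, is_near_native in best if is_near_native)

def bootstrap_success_probability(decoys, n_resamples=1000, top_k=5,
                                  min_hits=3, seed=0):
    """Fraction of bootstrap resamples that satisfy the assumed success criterion.

    Each resample draws len(decoys) decoys with replacement.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_resamples):
        sample = [rng.choice(decoys) for _ in decoys]
        if top_k_hits(sample, top_k) >= min_hits:
            successes += 1
    return successes / n_resamples
```

A value near 1 suggests that the outcome is robust to which decoys happened to be sampled, while intermediate values flag targets whose apparent success or failure depends on a few decoys.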
PMCID: PMC3149062  PMID: 21829626
10.  Probing Molecular Docking in a Charged Model Binding Site 
Journal of molecular biology  2006;357(5):1449-1470.
A model binding site was used to investigate charge–charge interactions in molecular docking. This simple site, a small (180 Å³) engineered cavity in cytochrome c peroxidase (CCP), is negatively charged and completely buried from solvent, allowing us to explore the balance between electrostatic energy and ligand desolvation energy in a system where many of the common approximations in docking do not apply. A database with about 5300 molecules was docked into this cavity. Retrospective testing with known ligands and decoys showed that overall the balance between electrostatic interaction and desolvation energy was captured. More interesting were prospective docking screens that looked for novel ligands, especially those that might reveal problems with the docking and energy methods. Based on screens of the 5300 compound database, both high-scoring and low-scoring molecules were acquired and tested for binding. Out of 16 new, high-scoring compounds tested, 15 were observed to bind. All of these were small heterocyclic cations. Binding constants were measured for a few of these; they ranged between 20 μM and 60 μM. Crystal structures were determined for ten of these ligands in complex with the protein. The observed ligand geometry corresponded closely to that predicted by docking. Several low-scoring alkyl amino cations were also tested and found to bind. The low docking score of these molecules was due to the relatively high charge density of the charged amino group and the corresponding high desolvation penalty. When the complex structures of those ligands were determined, a bound water molecule was observed interacting with the amino group and a backbone carbonyl group of the cavity. This water molecule mitigates the desolvation penalty and improves the interaction energy relative to that of the “naked” site used in the docking screen. Finally, six low-scoring neutral molecules were also tested, with a view to looking for false negative predictions. Whereas most of these did not bind, two did (phenol and 3-fluorocatechol). Crystal structures for these two ligands in complex with the cavity site suggest reasons for their binding. That these neutral molecules do, in fact, bind contradicts previous results in this site and, along with the alkyl amines, provides instructive false negatives that help identify weaknesses in our scoring functions. Several improvements of these are considered.
PMCID: PMC3025978  PMID: 16490206
molecular docking; electrostatic; solvation; cytochrome c peroxidase; X-ray crystallography
11.  A Unified Conformational Selection and Induced Fit Approach to Protein-Peptide Docking 
PLoS ONE  2013;8(3):e58769.
Protein-peptide interactions are vital for the cell. They mediate, inhibit or serve as structural components in nearly 40% of all macromolecular interactions, and are often associated with diseases, making them interesting leads for protein drug design. In recent years, large-scale technologies have enabled exhaustive studies on the peptide recognition preferences for a number of peptide-binding domain families. Yet, the paucity of data regarding their molecular binding mechanisms together with their inherent flexibility makes the structural prediction of protein-peptide interactions very challenging. This leaves flexible docking as one of the few amenable computational techniques to model these complexes. We present here an ensemble, flexible protein-peptide docking protocol that combines conformational selection and induced fit mechanisms. Starting from an ensemble of three peptide conformations (extended, α-helix, polyproline-II), flexible docking with HADDOCK generates 79.4% of high quality models for bound/unbound and 69.4% for unbound/unbound docking when tested against the largest protein-peptide complexes benchmark dataset available to date. Conformational selection at the rigid-body docking stage successfully recovers the most relevant conformation for a given protein-peptide complex and the subsequent flexible refinement further improves the interface by up to 4.5 Å interface RMSD. Cluster-based scoring of the models results in a selection of near-native solutions in the top three for ∼75% of the successfully predicted cases. This unified conformational selection and induced fit approach to protein-peptide docking should open the route to the modeling of challenging systems such as disorder-order transitions taking place upon binding, significantly expanding the applicability limit of biomolecular interaction modeling by docking.
PMCID: PMC3596317  PMID: 23516555
12.  Molecular Modeling-Based Evaluation of hTLR10 and Identification of Potential Ligands in Toll-Like Receptor Signaling 
PLoS ONE  2010;5(9):e12713.
Toll-like receptors (TLRs) are pattern recognition receptors that recognize pathogens based on distinct molecular signatures. The human (h)TLR1, 2, 6 and 10 belong to the hTLR1 subfamilies, which are localized in the extracellular regions and activated in response to diverse ligand molecules. Due to the unavailability of the hTLR10 crystal structure, the understanding of its homo- and heterodimerization with hTLR2 and hTLR1 and of the ligand responsible for its activation is limited. To improve our understanding of the TLR10 receptor-ligand interaction, we used homology modeling to construct a three dimensional (3D) structure of hTLR10 and refined the model through molecular dynamics (MD) simulations. We utilized the optimized structures for molecular docking in order to identify the potential sites of interaction in the homodimer and the heterodimers (hTLR10/2 and hTLR10/1). The docked complexes were then used for interaction with ligands (Pam3CSK4 and PamCysPamSK4) using MOE-Dock and ASEDock. Our docking studies have shown the binding orientation of the hTLR10 heterodimers to be similar to that of other TLR2 family members. However, the binding orientation of the hTLR10 homodimer is different from that of the heterodimers due to the presence of negatively charged surfaces at LRR11-14, thereby providing a specific cavity for ligand binding. Moreover, the multiple protein-ligand docking approach revealed that Pam3CSK4 might be the ligand for the hTLR10/2 complex and PamCysPamSK4, a di-acylated peptide, might activate the hTLR10/1 heterodimer and the hTLR10 homodimer. Therefore, the current modeled complexes can be a useful tool for further experimental studies on TLR biology.
PMCID: PMC2943521  PMID: 20877634
13.  An interaction-motif-based scoring function for protein-ligand docking 
BMC Bioinformatics  2010;11:298.
A good scoring function is essential for molecular docking computations. In conventional scoring functions, energy terms modeling pairwise interactions are cumulatively summed, and the best docking solution is selected. Here, we propose to transform protein-ligand interactions into three-dimensional geometric networks, from which recurring network substructures, or network motifs, are selected and used to provide probability-ranked interaction templates with which to score docking solutions.
A novel scoring function for protein-ligand docking, MotifScore, was developed. It is non-energy-based, and docking is, instead, scored by counting the occurrences of motifs of protein-ligand interaction networks constructed using structures of protein-ligand complexes. MotifScore has been tested on a benchmark set established by others to assess its ability to identify near-native complex conformations among a set of decoys. In this benchmark test, 84% of the highest-scored docking conformations had root-mean-square deviations (rmsds) below 2.0 Å from the native conformation, which is comparable with the best of several energy-based docking scoring functions. Many of the top motifs, which comprise a multitude of chemical groups that interact simultaneously and make a highly significant contribution to MotifScore, capture recurrent interacting patterns beyond pairwise interactions.
While providing quite good docking scores, MotifScore is quite different from conventional energy-based functions. MotifScore thus represents a new, network-based approach for exploring problems associated with molecular docking.
PMCID: PMC3098071  PMID: 20525216
14.  DrugScorePPI Knowledge-Based Potentials Used as Scoring and Objective Function in Protein-Protein Docking 
PLoS ONE  2014;9(2):e89466.
The distance-dependent knowledge-based DrugScorePPI potentials, previously developed for in silico alanine scanning and hot spot prediction on given structures of protein-protein complexes, are evaluated as a scoring and objective function for the structure prediction of protein-protein complexes. When applied for ranking “unbound perturbation” (“unbound docking”) decoys generated by Baker and coworkers, a 4-fold (1.5-fold) enrichment of acceptable docking solutions in the top ranks compared to a random selection is found. When applied as an objective function in FRODOCK for bound protein-protein docking on 97 complexes of the ZDOCK benchmark 3.0, DrugScorePPI/FRODOCK finds up to 10% (15%) more high accuracy solutions in the top 1 (top 10) predictions than the original FRODOCK implementation. When used as an objective function for global unbound protein-protein docking, fair docking success rates are obtained, which improve by ∼2-fold to 18% (58%) for an at least acceptable solution in the top 10 (top 100) predictions when performing knowledge-driven unbound docking. This suggests that DrugScorePPI balances several different types of interactions important for protein-protein recognition well. The results are discussed in view of the influence of crystal packing and the type of protein-protein complex docked. Finally, a simple criterion is provided with which to estimate a priori if unbound docking with DrugScorePPI/FRODOCK will be successful.
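Enrichment relative to random selection, as quoted in this abstract, is commonly computed as the hit fraction in the top-ranked subset divided by the hit fraction in the whole decoy set. The Python sketch below uses that common definition as an assumption; the abstract does not state the exact formula used in the study, and the function name is hypothetical.

```python
def enrichment_factor(ranked_labels, top_k):
    """Enrichment of acceptable decoys in the top_k ranks versus random picking.

    ranked_labels: booleans in ranking order (True = acceptable solution).
    Returns (hits_in_top_k / top_k) / (total_hits / total_decoys).
    """
    n_decoys = len(ranked_labels)
    total_hits = sum(ranked_labels)
    if not 0 < top_k <= n_decoys:
        raise ValueError("top_k must be between 1 and the number of decoys")
    if total_hits == 0:
        raise ValueError("enrichment is undefined without any acceptable decoy")
    top_hits = sum(ranked_labels[:top_k])
    return (top_hits / top_k) / (total_hits / n_decoys)
```

A value of 1 means the ranking does no better than random selection, so a 4-fold enrichment corresponds to four times as many acceptable decoys in the top ranks as chance would place there.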
PMCID: PMC3931789  PMID: 24586799
15.  Scoring Protein Interaction Decoys using Exposed Residues (SPIDER): A Novel Multi-Body Interaction Scoring Function based on Frequent Geometric Patterns of Interfacial Residues 
Proteins  2012;80(9):2207-2217.
Accurate prediction of the structure of protein-protein complexes in computational docking experiments remains a formidable challenge. It has been recognized that identifying native or native-like poses among multiple decoys is the major bottleneck of the current scoring functions used in docking. We have developed a novel multi-body pose-scoring function that has no theoretical limit on the number of residues contributing to the individual interaction terms. We use a coarse-grained representation of a protein-protein complex where each residue is represented by its side chain centroid. We apply a computational geometry approach called Almost-Delaunay tessellation that transforms protein-protein complexes into a residue contact network, or an undirected graph in which residues are vertices connected by edges. This treatment forms a family of interfacial graphs representing a dataset of protein-protein complexes. We then employ a frequent subgraph mining approach to identify common interfacial residue patterns that appear in at least a subset of native protein-protein interfaces. The geometrical parameters and frequency of occurrence of each “native” pattern in the training set are used to develop the new SPIDER scoring function. SPIDER was validated using the standard “ZDOCK” benchmark dataset, which was not used in the development of SPIDER. We demonstrate that the SPIDER scoring function ranks native and native-like poses above geometrical decoys and that it outperforms the popular ZRANK scoring function. SPIDER was ranked among the top scoring functions in a recent round of the CAPRI (Critical Assessment of PRedicted Interactions) blind test of protein–protein docking methods.
PMCID: PMC3409293  PMID: 22581643
Bioinformatics; Amino acids; Centroids; Statistical potential; Delaunay tessellation; Subgraph mining; Motifs; Coarse-grained; ZDOCK; CAPRI
16.  SnugDock: Paratope Structural Optimization during Antibody-Antigen Docking Compensates for Errors in Antibody Homology Models 
PLoS Computational Biology  2010;6(1):e1000644.
High resolution structures of antibody-antigen complexes are useful for analyzing the binding interface and to make rational choices for antibody engineering. When a crystallographic structure of a complex is unavailable, the structure must be predicted using computational tools. In this work, we illustrate a novel approach, named SnugDock, to predict high-resolution antibody-antigen complex structures by simultaneously structurally optimizing the antibody-antigen rigid-body positions, the relative orientation of the antibody light and heavy chains, and the conformations of the six complementarity determining region loops. This approach is especially useful when the crystal structure of the antibody is not available, requiring allowances for inaccuracies in an antibody homology model which would otherwise frustrate rigid-backbone docking predictions. Local docking using SnugDock with the lowest-energy RosettaAntibody homology model produced more accurate predictions than standard rigid-body docking. SnugDock can be combined with ensemble docking to mimic conformer selection and induced fit resulting in increased sampling of diverse antibody conformations. The combined algorithm produced four medium (Critical Assessment of PRediction of Interactions-CAPRI rating) and seven acceptable lowest-interface-energy predictions in a test set of fifteen complexes. Structural analysis shows that diverse paratope conformations are sampled, but docked paratope backbones are not necessarily closer to the crystal structure conformations than the starting homology models. The accuracy of SnugDock predictions suggests a new genre of general docking algorithms with flexible binding interfaces targeted towards making homology models useful for further high-resolution predictions.
Author Summary
Antibodies are proteins that are key elements of the immune system and increasingly used as drugs. Antibodies bind tightly and specifically to antigens to block their activity or to mark them for destruction. Three-dimensional structures of the antibody-antigen complexes are useful for understanding their mechanism and for designing improved antibody drugs. Experimental determination of structures is laborious and not always possible, so we have developed tools to predict structures of antibody-antigen complexes computationally. Computer-predicted models of antibodies, or homology models, typically have errors which can frustrate algorithms for prediction of protein-protein interfaces (docking), and result in incorrect predictions. Here, we have created and tested a new docking algorithm which incorporates flexibility to overcome structural errors in the antibody structural model. The algorithm allows both intramolecular and interfacial flexibility in the antibody during docking, resulting in improved accuracy approaching that when using experimentally determined antibody structures. Structural analysis of the predicted binding region of the complex will enable the protein engineer to make rational choices for better antibody drug designs.
PMCID: PMC2800046  PMID: 20098500
17.  Protein Docking by the Underestimation of Free Energy Funnels in the Space of Encounter Complexes 
PLoS Computational Biology  2008;4(10):e1000191.
Similarly to protein folding, the association of two proteins is driven by a free energy funnel, determined by favorable interactions in some neighborhood of the native state. We describe a docking method based on stochastic global minimization of funnel-shaped energy functions in the space of rigid body motions (SE(3)) while accounting for flexibility of the interface side chains. The method, called semi-definite programming-based underestimation (SDU), employs a general quadratic function to underestimate a set of local energy minima and uses the resulting underestimator to bias further sampling. While SDU effectively minimizes functions with funnel-shaped basins, its application to docking in the rotational and translational space SE(3) is not straightforward due to the geometry of that space. We introduce a strategy that uses separate independent variables for side-chain optimization, center-to-center distance of the two proteins, and five angular descriptors of the relative orientations of the molecules. The removal of the center-to-center distance turns out to vastly improve the efficiency of the search, because the five-dimensional space now exhibits a well-behaved energy surface suitable for underestimation. This algorithm explores the free energy surface spanned by encounter complexes that correspond to local free energy minima and shows similarity to the model of macromolecular association that proceeds through a series of collisions. Results for standard protein docking benchmarks establish that in this space the free energy landscape is a funnel in a reasonably broad neighborhood of the native state and that the SDU strategy can generate docking predictions with less than 5 Å ligand interface Cα root-mean-square deviation while achieving an approximately 20-fold efficiency gain compared to Monte Carlo methods.
Author Summary
Protein–protein interactions play a central role in various aspects of the structural and functional organization of the cell, and their elucidation is crucial for a better understanding of processes such as metabolic control, signal transduction, and gene regulation. Genomewide proteomics studies, primarily yeast two-hybrid assays, will provide an increasing list of interacting proteins, but only a small fraction of the potential complexes will be amenable to direct experimental analysis. Thus, it is important to develop computational docking methods that can elucidate the details of specific interactions at the atomic level. Protein–protein docking generally starts with a rigid body search that generates a large number of docked conformations with good shape, electrostatic, and chemical complementarity. The conformations are clustered to obtain a manageable number of models, but the current methods are unable to select the most likely structure among these models. Here we describe a refinement algorithm that, applied to the individual clusters, improves the quality of the models. The better models are suitable for higher-accuracy energy calculation, thereby increasing the chances that near-native structures can be identified, and thus the refinement increases the reliability of the entire docking algorithm.
PMCID: PMC2538569  PMID: 18846200
18.  CPORT: A Consensus Interface Predictor and Its Performance in Prediction-Driven Docking with HADDOCK 
PLoS ONE  2011;6(3):e17695.
Macromolecular complexes are the molecular machines of the cell. Knowledge at the atomic level is essential to understand and influence their function. However, their number is huge and a significant fraction is extremely difficult to study using classical structural methods such as NMR and X-ray crystallography. Therefore, the importance of large-scale computational approaches in structural biology is evident. This study combines two of these computational approaches, interface prediction and docking, to obtain atomic-level structures of protein-protein complexes, starting from their unbound components.
Methodology/Principal Findings
Here we combine six interface prediction web servers into a consensus method called CPORT (Consensus Prediction Of interface Residues in Transient complexes). We show that CPORT gives more stable and reliable predictions than each of the individual predictors on its own. A protocol was developed to integrate CPORT predictions into our data-driven docking program HADDOCK. For cases where experimental information is limited, this prediction-driven docking protocol presents an alternative to ab initio docking, the docking of complexes without the use of any information. Prediction-driven docking was performed on a large and diverse set of protein-protein complexes in a blind manner. Our results indicate that the performance of the HADDOCK-CPORT combination is competitive with ZDOCK-ZRANK, a state-of-the-art ab initio docking/scoring combination. Finally, the original interface predictions could be further improved by interface post-prediction (contact analysis of the docking solutions).
The current study shows that blind, prediction-driven docking using CPORT and HADDOCK is competitive with ab initio docking methods. This is encouraging since prediction-driven docking represents the absolute bottom line for data-driven docking: any additional biological knowledge will greatly improve the results obtained by prediction-driven docking alone. Finally, the fact that original interface predictions could be further improved by interface post-prediction suggests that prediction-driven docking has not yet been pushed to the limit. A web server for CPORT is freely available at
PMCID: PMC3064578  PMID: 21464987
19.  A flexible-protein molecular docking study of the binding of ruthenium complex compounds to PIM1, GSK-3β, and CDK2/Cyclin A protein kinases 
Journal of molecular modeling  2012;19(1):371-382.
We employ ensemble docking simulations to characterize the interactions of two enantiomeric forms of a Ru-complex compound (1-R and 1-S) with three protein kinases, namely PIM1, GSK-3β, and CDK2/cyclin A. We show that our ensemble docking computational protocol adequately models the structural features of these interactions and discriminates between competing conformational clusters of ligand-bound protein structures. Using the determined X-ray crystal structure of PIM1 complexed to the compound 1-R as a control, we discuss the importance of including the protein flexibility inherent in the ensemble docking protocol, for the accuracy of the structure prediction of the bound state. A comparison of our ensemble docking results suggests that PIM1 and GSK-3β bind the two enantiomers in similar fashion, through two primary binding modes: conformation I, which is very similar to the conformation presented in the existing PIM1/compound 1-R crystal structure; conformation II, which represents a 180° flip about an axis through the NH group of the pyridocarbazole moiety, relative to conformation I. In contrast, the binding of the enantiomers to CDK2 is found to have a different structural profile including a suggested bound conformation, which lacks the conserved hydrogen bond between the kinase and the ligand (i.e., ATP, staurosporine, Ru-complex compound). The top scoring conformation of the inhibitor bound to CDK2 is not present among the top-scoring conformations of the inhibitor bound to either PIM1 or GSK-3β and vice-versa. Collectively, our results help provide atomic-level insights into inhibitor selectivity among the three kinases.
PMCID: PMC3537894  PMID: 22926267
Small molecular kinase inhibitor; Protein kinase; Inhibitor selectivity; Ruthenium-based organometalic compound; Molecular dynamics simulation; Molecular docking; Protein flexibility; Ensemble molecular docking
20.  Dissecting the retinoid-induced differentiation of F9 embryonal stem cells by integrative genomics 
We reveal how the RXRα−RARγ heterodimer upon activation by ATRA sets up a sequence of temporally controlled events that generate different subsets of primary and secondarily induced gene networks. We established RARγ and RXRα chromatin immunoprecipitation (ChIP) analyses coupled with massive parallel sequencing (ChIP-seq) together with the corresponding microarray transcriptomics at five time points during differentiation using pan-RAR and RAR isotype-selective ligands. Gene-regulatory decisions were inferred in silico from the dynamic changes of the transcriptomics patterns that correlated with the expression of RXRα−RARγ and other annotated transcription factors (TFs). Our analysis provides a temporal view of retinoic acid (RA) signalling during F9 cell differentiation, reveals RA receptor (RAR) heterodimer dynamics and promiscuity, and predicts decisions that diversify the RA signal into distinct gene-regulatory programs.
Nuclear receptors are ligand-inducible transcription factors, which upon induction by their cognate ligand induce complex temporally controlled physiological programs. Retinoic acid (RA) and its receptors are key regulators of multiple physiological processes, including embryogenesis, organogenesis, immune functions, reproduction and organ homeostasis. While insight into (some of) the physiological functions of the various RA receptor (RAR) and retinoid X receptor (RXR) subtypes has been obtained by exploiting mouse genetics (for a review, see Mark et al, 2006), we are far from an understanding of the molecular circuitries and gene networks that are at the basis of these physiological events.
RAs act by interacting with a complex receptor system that comprises heterodimers formed by one of the three RXR (RXRα, β and γ) and RAR (RARα, β and γ) isotypes. While insight into the role of heterodimerization on response element preference and contribution of RAR and RXR to transcription activation of model genes has been obtained (for review, see Gronemeyer et al, 2004), very little is known about the role and dynamics of target gene interaction of the various RXR–RAR heterodimers at a global scale in the context of a biological program.
More fundamentally, in order to develop a systems biology of nuclear receptors we need to establish approaches that reveal how the initial event, the information embedded in the chemical structure of a small molecular weight compound, is propagated through binding to cognate receptor(s), recruitment of co-regulatory factors, epigenetic modulators and additional complexes/machineries to establish temporally controlled gene programs. In this respect, a recent study has revealed the impact of epigenetic modulator crosstalk in the setting up of subprograms for oestrogen receptor signalling (Ceschin et al, 2011).
In the present study, we have used mouse F9 EC cells, a homogeneous cell system which is known to differentiate upon RA exposure and require RARγ for this response (Taneja et al, 1996), in order to integrate at a genome-wide scale (i) the dynamics of RXRα and RARγ binding by chromatin immunoprecipitation (ChIP) analyses coupled with massive parallel sequencing (ChIP-seq), (ii) the correlated temporal regulation of gene programs by global transcriptomics analyses, including (iii) the response to isotype-selective RAR ligands (Box 1). Our study revealed an unexpected highly dynamic association of the RXRα–RARγ with target chromatin and an unexpected dynamics of the heterodimer composition itself, which is indicative of partner swapping.
Inspired by early works on the dynamics of Drosophila puffing patterns during ecdysone-induced metamorphosis (Ashburner et al, 1974) our working hypothesis was that diversification of gene programming is achieved by the sequential activation of separable gene cohorts that constitute the various facets of differentiation, such as altered proliferation, cell physiology, signalling and finally terminal apoptogenic differentiation. To identify these temporally activated subroutines within the overall program, we inferred gene-regulatory decisions in silico from dynamically altered global gene expression patterns that occurred due to the action of RXRα−RARγ and other annotated TFs (Ernst et al, 2007). This dynamic regulatory map was used to reconstruct RXRα–RARγ signalling networks by integration of functional co-citation. Altogether we present a genome-wide view of the temporal gene-regulatory events and the corresponding gene programs elicited by the RXRα–RARγ during F9 cell differentiation. Our study deciphers some of the mechanisms by which the chemical information encoded in RA is diversified to regulate different cohorts of genes.
Retinoic acid (RA) triggers physiological processes by activating heterodimeric transcription factors (TFs) comprising retinoic acid receptor (RARα, β, γ) and retinoid X receptor (RXRα, β, γ). How a single signal induces highly complex temporally controlled networks that ultimately orchestrate physiological processes is unclear. Using an RA-inducible differentiation model, we defined the temporal changes in the genome-wide binding patterns of RARγ and RXRα and correlated them with transcription regulation. Unexpectedly, both receptors displayed a highly dynamic binding, with different RXRα heterodimers targeting identical loci. Comparison of RARγ and RXRα co-binding at RA-regulated genes identified putative RXRα–RARγ target genes that were validated with subtype-selective agonists. Gene-regulatory decisions during differentiation were inferred from TF-target gene information and temporal gene expression. This analysis revealed six distinct co-expression paths of which RXRα–RARγ is associated with transcription activation, while Sox2 and Egr1 were predicted to regulate repression. Finally, RXRα–RARγ regulatory networks were reconstructed through integration of functional co-citations. Our analysis provides a dynamic view of RA signalling during cell differentiation, reveals RAR heterodimer dynamics and promiscuity, and predicts decisions that diversify the RA signal into distinct gene-regulatory programs.
This study provides a dynamic view of retinoic acid signalling during cell differentiation, reveals RAR/RXR heterodimer dynamics and promiscuity, and predicts decisions that diversify the RA signal into distinct gene-regulatory programs.
PMCID: PMC3261707  PMID: 21988834
ChIP-seq; retinoic acid-induced differentiation; RXR–RAR heterodimers; temporal control of gene networks; transcriptomics
21.  An Integrated Suite of Fast Docking Algorithms 
Proteins  2010;78(15):3197-3204.
The CAPRI experiment (Critical Assessment of Predicted Interactions) simulates realistic and diverse docking challenges, each case having specific properties that may be exploited by docking algorithms. Motivated by the different CAPRI challenges, we developed and implemented a comprehensive suite of docking algorithms. These were incorporated into a dynamic docking protocol, consisting of four main stages: (1) Biological and bioinformatics research aiming to predict the binding site residues, to define distance constraints between interface atoms and to analyze the flexibility of molecules; (2) Rigid or flexible docking, performed by the PatchDock or FlexDock method, which utilizes the information gathered in the previous step. Symmetric complexes are predicted by the SymmDock method; (3) Flexible refinement and re-ranking of the rigid docking solution candidates, performed by FiberDock; and finally, (4) clustering and filtering the results based on energy funnels. We analyzed the performance of our docking protocol on a large benchmark and on recent CAPRI targets. The analysis has demonstrated the importance of biological information gathering prior to docking, which significantly increased the docking success rate, and of the refinement and re-scoring stage that significantly improved the ranking of the rigid docking solutions. Our failures were mostly a result of mishandling backbone flexibility, inaccurate homology modeling, or incorrect biological assumptions. Most of the methods are available at
PMCID: PMC2952695  PMID: 20607855
22.  The auxin signalling network translates dynamic input into robust patterning at the shoot apex 
We provide a comprehensive expression map of the different genes (TIR1/AFBs, ARFs and Aux/IAAs) involved in the signalling pathway regulating gene transcription in response to auxin in the shoot apical meristem (SAM). We demonstrate a relatively simple structure of this pathway using a high-throughput yeast two-hybrid approach to obtain the Aux/IAA-ARF full interactome. The topology of the signalling network was used to construct a model for auxin signalling and to predict a role for the spatial regulation of auxin signalling in patterning of the SAM. We used a new sensor to monitor the input in the auxin signalling pathway and to confirm the model prediction, thus demonstrating that auxin signalling is essential to create robust patterns at the SAM.
The plant hormone auxin is a key morphogenetic signal involved in the control of cell identity throughout development. A striking example of auxin action is at the shoot apical meristem (SAM), a population of stem cells generating the aerial parts of the plant. Organ positioning and patterning depends on local accumulations of auxin in the SAM, generated by polar transport of auxin (Vernoux et al, 2010). However, it is still unclear how auxin is distributed at cell resolution in tissues and how the hormone is sensed in space and time during development. A complex ensemble of 29 Aux/IAAs and 23 ARFs is central to the regulation of gene transcription in response to auxin (for review, see Leyser, 2006; Guilfoyle and Hagen, 2007; Chapman and Estelle, 2009). Protein–protein interactions govern the properties of this transduction pathway (Del Bianco and Kepinski, 2011). Limited interaction studies suggest that, in the absence of auxin, the Aux/IAA repressors form heterodimers with the ARF transcription factors, preventing them from regulating target genes. In the presence of auxin, the Aux/IAA proteins are targeted to the proteasome by an SCF E3 ubiquitin ligase complex (Chapman and Estelle, 2009; Leyser, 2006). In this process, auxin promotes the interaction between Aux/IAA proteins and the TIR1 F-box of the SCF complex (or its AFB homologues) that acts as an auxin co-receptor (Dharmasiri et al, 2005a, 2005b; Kepinski and Leyser, 2005; Tan et al, 2007). The auxin-induced degradation of Aux/IAAs would then release ARFs to regulate transcription of their target genes. This includes activation of most of the Aux/IAA genes themselves, thus establishing a negative feedback loop (Guilfoyle and Hagen, 2007). Although this general scenario provides a framework for understanding gene regulation by auxin, the underlying protein–protein network remains to be fully characterized.
In this paper, we combined experimental and theoretical analyses to understand how this pathway contributes to sensing auxin in space and time (Figure 1). We first analysed the expression patterns of the ARFs, Aux/IAAs and TIR1/AFBs genes in the SAM. Our results demonstrate a general tendency for most of the 25 ARFs and Aux/IAAs detected in the SAM: a differential expression with low levels at the centre of the meristem (where the stem cells are located) and high levels at the periphery of the meristem (where organ initiation takes place). We also observed a similar differential expression for TIR1/AFB co-receptors. To understand the functional significance of the distribution of ARFs and Aux/IAAs in the SAM, we next investigated the global structure of the Aux/IAA-ARF network using a high-throughput yeast two-hybrid approach and uncover a rather simple topology that relies on three basic generic features: (i) Aux/IAA proteins interact with themselves, (ii) Aux/IAA proteins interact with ARF activators and (iii) ARF repressors have no or very limited interactions with other proteins in the network.
The results of our interaction analysis suggest a model for the Aux/IAA-ARF signalling pathway in the SAM, where transcriptional activation by ARF activators would be negatively regulated by two independent systems, one involving the ARF repressors, the other the Aux/IAAs. The presence of auxin would remove the inhibitory action of Aux/IAAs, but leave the ARF repressors to compete with ARF activators for promoter-binding sites. To explore the regulatory properties of this signalling network, we developed a mathematical model to describe the transcriptional output as a function of the signalling input, that is, the combinatorial effect of auxin concentration and of its perception. We then used the model and a simplified view of the meristem (where the same population of Aux/IAAs and ARFs exhibits a low expression at the centre and a high expression in the peripheral zone) to investigate the role of auxin signalling in SAM function. We show that in the model, for a given ARF activator-to-repressor ratio, the gene induction capacity increases with the absolute levels of ARF proteins. We thus predict that the differential expression of the ARFs generates differences in auxin sensitivities between the centre (low sensitivity) and the periphery (high sensitivity), and that the expression of TIR1/AFB participates in this regulation (prediction 1). We also use the model to analyse the transcriptional response to rapidly changing auxin concentrations. By simulating situations equivalent either to the centre or the periphery of our simplified representation of the SAM, we predict that the signalling pathway buffers its response to the auxin input via the balance between ARF activators and repressors, in turn generated by their differential spatial distributions (prediction 2).
To test the predictions from the model experimentally, we needed to assess both the input (auxin level and/or perception) and the output (target gene induction) of the signalling cascade. For measuring the transcriptional output, the widely used DR5 reporter is perfectly adapted (Figure 5) (Ulmasov et al, 1997; Sabatini et al, 1999; Benkova et al, 2003; Heisler et al, 2005). For assaying pathway input, we designed DII-VENUS, a novel auxin signalling sensor that comprises a constitutively expressed fusion of the auxin-binding domain (termed domain II or DII) (Dreher et al, 2006; Tan et al, 2007) of an IAA to a fast-maturating variant of YFP, VENUS (Figure 5). The degradation patterns from DII-VENUS indicate a high auxin signalling input both in flower primordia and at the centre of the SAM. This is in contrast to the organ-specific expression pattern of DR5::VENUS (Figure 5). These results indicate that the signalling pathway limits gene activation in response to auxin at the meristem centre and confirm the differential sensitivity to auxin between the centre and the periphery (prediction 1). We further confirmed the buffering capacities of the signalling pathway (prediction 2) by carrying out live imaging experiments to monitor DII-VENUS and DR5::VENUS expression in real time (Figure 5). This analysis reveals the presence of important temporal variations of DII-VENUS fluorescence, while DR5::VENUS does not show such global variations. Our approach thus provides evidence that the Aux/IAA-ARF pathway has a key role in patterning in the SAM, alongside the auxin transport system. Our results illustrate how the tight spatio-temporal regulation of both the distribution of a morphogenetic signal and the activity of the downstream signalling pathway provides robustness to a dynamic developmental process.
A comprehensive expression and interaction map of auxin signalling factors in the Arabidopsis shoot apical meristem is constructed and used to derive a mathematical model of auxin signalling, from which key predictions are experimentally confirmed.
The plant hormone auxin is thought to provide positional information for patterning during development. It is still unclear, however, precisely how auxin is distributed across tissues and how the hormone is sensed in space and time. The control of gene expression in response to auxin involves a complex network of over 50 potentially interacting transcriptional activators and repressors, the auxin response factors (ARFs) and Aux/IAAs. Here, we perform a large-scale analysis of the Aux/IAA-ARF pathway in the shoot apex of Arabidopsis, where dynamic auxin-based patterning controls organogenesis. A comprehensive expression map and full interactome uncovered an unexpectedly simple distribution and structure of this pathway in the shoot apex. A mathematical model of the Aux/IAA-ARF network predicted a strong buffering capacity along with spatial differences in auxin sensitivity. We then tested and confirmed these predictions using a novel auxin signalling sensor that reports input into the signalling pathway, in conjunction with the published DR5 transcriptional output reporter. Our results provide evidence that the auxin signalling network is essential to create robust patterns at the shoot apex.
PMCID: PMC3167386  PMID: 21734647
auxin; biosensor; live imaging; ODE; signalling
23.  Scoring docking conformations using predicted protein interfaces 
BMC Bioinformatics  2014;15:171.
Since proteins function by interacting with other molecules, analysis of protein-protein interactions is essential for comprehending biological processes. Whereas understanding of atomic interactions within a complex is especially useful for drug design, limitations of experimental techniques have restricted their practical use. Despite progress in docking predictions, there is still room for improvement. In this study, we contribute to this topic by proposing T-PioDock, a framework for detection of a native-like docked complex 3D structure. T-PioDock supports the identification of near-native conformations among the 3D models produced by docking software by scoring those models using binding interfaces predicted by the interface predictor Template based Protein Interface Prediction (T-PIP).
First, exhaustive evaluation of interface predictors demonstrates that T-PIP, whose predictions are customised to target complexity, is a state-of-the-art method. Second, comparative study between T-PioDock and other state-of-the-art scoring methods establishes T-PioDock as the best performing approach. Moreover, there is good correlation between T-PioDock performance and quality of docking models, which suggests that progress in docking will lead to even better results at recognising near-native conformations.
Accurate identification of near-native conformations remains a challenging task. Although availability of 3D complexes will benefit from template-based methods such as T-PioDock, we have identified specific limitations which need to be addressed. First, docking software is still not able to produce native-like models for every target. Second, current interface predictors do not explicitly consider pairwise residue interactions between proteins and their interacting partners, which leaves ambiguity when assessing the quality of complex conformations.
PMCID: PMC4057934  PMID: 24906633
Protein-protein interaction; Interface prediction; Homology modelling; Docking; Model scoring; Model ranking
24.  Surface-histogram, a new shape descriptor for protein-protein docking 
Proteins  2011;80(1):221-238.
Determining the structure of protein-protein complexes remains a difficult and lengthy process, either by NMR or by X-ray crystallography. Several computational methods based on docking have been developed to support and even serve as possible alternatives to these experimental methods. In this paper, we introduce a new protein-protein docking algorithm, shDock, based on shape complementarity. We characterize the local geometry on each protein surface with a new shape descriptor, the surface-histogram. We measure the complementarity between two surface-histograms, one on each protein, using a modified Manhattan distance. When a match is found between two local protein surfaces, a model is generated for the protein complex, which is then scored by checking for collision between the two proteins. We have tested our algorithm on Version 3 of the ZDOCK protein-protein docking benchmark. We found that for 110 out of the 124 test cases of bound docking in the benchmark, our algorithm was able to generate a model in the top 3600 candidates for the protein complex within an RMSD of 2.5 Å from its native structure. For unbound docking predictions, we found a model within 2.5 Å in the top 3600 models in 54 out of 124 test cases. A comparison with other shape-based docking algorithms demonstrates that our approach gives significantly improved performance for both bound and unbound docking test cases.
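As a rough illustration of comparing two surface-histograms, the sketch below computes the plain Manhattan (L1) distance between two histograms with the same number of bins. The abstract does not describe how shDock modifies this distance or how the histograms are built, so the function name, the bin layout, and the example values are assumptions.

```python
def manhattan_distance(hist_a, hist_b):
    """Plain Manhattan (L1) distance between two equally sized histograms.

    This is the unmodified distance; the modification used by shDock is not
    given in the abstract and is not reproduced here.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


# Hypothetical four-bin histograms for one surface patch on each protein.
patch_on_receptor = [3, 5, 2, 0]
patch_on_ligand = [2, 5, 4, 1]
distance = manhattan_distance(patch_on_receptor, patch_on_ligand)
```

In a docking search of this kind, pairs of patches with a small distance would be kept as candidate matches and passed on to model generation and collision checking.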
PMCID: PMC3240741  PMID: 22072544
protein-protein docking; protein surface; shape descriptor; surface-histogram
25.  Evaluation of multiple protein docking structures using correctly predicted pairwise subunits 
BMC Bioinformatics  2012;13(Suppl 2):S6.
Many functionally important proteins in a cell form complexes with multiple chains. Therefore, computational prediction of multiple protein complexes is an important task in bioinformatics. In the development of multiple protein docking methods, it is important to establish a metric for evaluating prediction results in a reasonable and practical fashion. However, since only a few works have developed methods for multiple protein docking, no study has investigated how accurate structural models of multiple protein complexes should be to allow scientists to gain biological insights.
We generated a series of predicted models (decoys) of various accuracies by our multiple protein docking pipeline, Multi-LZerD, for three multi-chain complexes with 3, 4, and 6 chains. We analyzed the decoys in terms of the number of correctly predicted pair conformations in the decoys.
Results and conclusion
We found that pairs of chains with the correct mutual orientation exist even in the decoys with a large overall root mean square deviation (RMSD) to the native. Therefore, in addition to a global structure similarity measure, such as the global RMSD, the quality of models for multiple chain complexes can be better evaluated by using the local measurement, the number of chain pairs with correct mutual orientation. We termed the fraction of correctly predicted pairs (RMSD at the interface of less than 4.0Å) as fpair and propose to use it for evaluation of the accuracy of multiple protein docking.
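The fpair measure can be sketched in a few lines. The example assumes that an interface RMSD has already been computed for every chain pair under evaluation and that the denominator is the number of those pairs; the abstract does not specify which pairs are counted, and the function name and the example values are hypothetical.

```python
def fpair(pair_interface_rmsd, cutoff=4.0):
    """Fraction of chain pairs with interface RMSD below `cutoff` (in angstroms).

    `pair_interface_rmsd` maps a chain pair, e.g. ("A", "B"), to the interface
    RMSD of that pair in the decoy relative to the native complex. Counting
    every supplied pair in the denominator is an assumption.
    """
    if not pair_interface_rmsd:
        raise ValueError("no chain pairs to evaluate")
    correct = sum(1 for rmsd in pair_interface_rmsd.values() if rmsd < cutoff)
    return correct / len(pair_interface_rmsd)


# Hypothetical decoy of a three-chain complex: two of three pairs are correct.
decoy = {("A", "B"): 1.8, ("A", "C"): 12.4, ("B", "C"): 3.9}
score = fpair(decoy)
```

A decoy like this one can have a large global RMSD because of the misplaced chain C while still scoring two thirds on fpair, which is the situation the measure is meant to capture.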
PMCID: PMC3377905  PMID: 22536869

