The analysis of results from CAPRI (Critical Assessment of Predicted Interactions), the first communitywide experiment devoted to protein docking, shows that all successful methods consist of multiple stages. The methods belong to three classes: global methods based on fast Fourier transforms or geometric matching, medium range Monte Carlo methods, and the restraint-guided HADDOCK program. Although these classes of methods require very different amounts of information in addition to the structures of component proteins, they all share the same four computational steps: (1) simplified and/or rigid body search; (2) selecting the region(s) of interest; (3) refinement of docked structures; and (4) selecting the best models. While each method is optimal for a specific class of docking problems, combining computational steps from different methods can improve the reliability and accuracy of results.
The challenge for predictive protein docking is to start with the coordinates of the unbound component molecules and to obtain computationally a model of the bound complex [1,2]. Protein docking methods have substantially improved during the past few years. This has been demonstrated by the results of CAPRI (Critical Assessment of Predicted Interactions), the first communitywide experiment devoted to protein docking. In 16 rounds of CAPRI up to 63 participating groups tested their methods in blind predictions of 37 target protein-protein complexes. The predictions were grouped into highly accurate, medium accuracy, acceptable, and incorrect categories on the basis of the fraction of native contacts, the backbone root mean square deviation of the ligand (L_RMS) from the reference ligand structure after superimposing the receptor structures, and the backbone RMSD of the interface residues (I_RMS). The calculation of these measures and the exact definitions of categories are given in the first CAPRI evaluation paper; here we note only that for the highly accurate, medium accuracy, acceptable, and incorrect models the ligand RMSD is given by L_RMS < 1 Å, 1 Å < L_RMS < 5 Å, 5 Å < L_RMS < 10 Å, and L_RMS > 10 Å, respectively. Each participating group was entitled to submit ten predictions for each target. The assessors considered all ten models, and the results for each group include the number of predictions in each of the four categories.
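As a rough illustration of the L_RMS thresholds quoted above, the following sketch maps a ligand RMSD to an accuracy label and tallies a set of submitted models. It is a deliberate simplification: the actual CAPRI categories also depend on the fraction of native contacts and on I_RMS, which are not modeled here. The function names, the short labels, and the treatment of values that fall exactly on a threshold are assumptions made for this example.

```python
def category_from_l_rms(l_rms):
    """Map a ligand RMSD (in angstroms) to a CAPRI-style accuracy label.

    Only the L_RMS thresholds quoted in the text are used. Values exactly
    on a threshold go to the less accurate category (an assumption).
    """
    if l_rms < 0:
        raise ValueError("RMSD cannot be negative")
    if l_rms < 1.0:
        return "high"
    if l_rms < 5.0:
        return "medium"
    if l_rms < 10.0:
        return "acceptable"
    return "incorrect"


def tally(l_rms_values):
    """Count how many submitted models fall into each category."""
    counts = {"high": 0, "medium": 0, "acceptable": 0, "incorrect": 0}
    for value in l_rms_values:
        counts[category_from_l_rms(value)] += 1
    return counts
```

Calling `tally` on the ten L_RMS values of one group's submission for a target would give the per-category counts of the kind summarized in the evaluation.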
Results for CAPRI rounds 1–11 with 28 targets have been “officially” evaluated [4–6]. Two targets (22 and 23) were cancelled due to the early release of X-ray structures. Table 1 shows the summary of results for the six groups that submitted acceptable or better predictions for at least 10 targets. The table was obtained by summing the results for each group from the three separate CAPRI evaluation meetings [4–6]. The numbers of medium and high accuracy models submitted by these groups are also shown. We note that the Bonvin group also performed extremely well in rounds 3–11 using the HADDOCK program, but they did not participate in the first two rounds, and hence their overall score was lower than that of the six groups listed in Table 1.
The number of CAPRI targets, and more generally, the number of test cases for protein-protein docking are rather small for drawing conclusions on a statistically significant basis. Nevertheless, the results so far suggest three observations. First, according to Table 1, all successful methods consist of multiple stages. Second, these methods belong to the three general classes shown in Table 2 that primarily differ in terms of the information which is required in addition to the structures of the component proteins. Third, although the CAPRI rules allow submitting ten models for each target, even the best methods have a success rate of only about 50%, and thus at this point it is advisable to incorporate experimental information to improve the reliability of predictions.
Although the results from each group in CAPRI seem quite close to each other, we note that the three classes of methods are most successful for different classes of problems. The global methods were generally the best for those CAPRI targets where neither of the component proteins underwent conformational change of more than 2 Å, particularly if no a priori information on the complex was available, requiring the search of the entire conformational space. The medium-range methods, particularly RosettaDock, yielded excellent results for a number of targets for which side chain repacking was crucial, e.g., when one of the component protein structures was a homology model. HADDOCK produced the best results if sufficient information on a number of the interface residues was available, even when the binding caused large conformational change, possibly including the backbone. Independent of the method, docking is relatively easy for enzyme-inhibitor complexes that usually can be determined with reasonable accuracy, possibly within a few alternative structures. Results are less predictable for antigen-antibody pairs, and are generally poor for small signaling complexes of weakly interacting proteins. The most difficult targets are the transient complexes that have a large interface area and are subject to substantial conformational change. While HADDOCK was able to generate meaningful models in some cases, no acceptable predictions were submitted for several CAPRI targets of this type [4–6].
In this review we focus on two issues. First, we show that although the three classes of methods in Table 2 use the additional information in very different ways, the main computational steps are common and rather similar in essentially all docking methods. Second, we argue that each class of methods is optimal for a specific class of docking problems, but the reliability of results can be further improved by combining computational steps from different methods.
The four steps that seem to occur in most docking algorithms are as follows: (1) simplified and/or rigid body search; (2) selecting the region(s) of interest; (3) refinement of docked structures; and (4) global discrimination, i.e., selecting the best models. Figure 1 shows these steps and the typical number of conformations retained in each step for the most general case which starts with a global search for the orientation and position of one component protein relative to the other. However, as shown in Table 2, the search may be restricted to a region of the conformational space, simplifying the selection of structures in Steps 2 and 4.
Due to computational constraints, truly global searches over the entire rotational/translational space are routinely carried out only by rigid body methods that use either fast Fourier transforms (FFT) [7, 8] or geometric matching [12,18]. FFT based methods systematically evaluate billions of docked conformations on a grid using correlation-type scoring functions. The original scoring function, based only on shape complementarity, has been expanded to include electrostatic and solvation terms, and more recently structure-based interaction potentials [20, 21], substantially improving the accuracy of the method. In all scoring functions the shape complementarity term allows for overlaps, thereby accounting for the differences between bound and unbound (separately crystallized) structures.
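To make the idea of a correlation-type scoring function concrete, the following toy sketch scores every cyclic translation of a "ligand" against a "receptor" on a one-dimensional grid, once by direct summation and once through the Fourier correlation theorem. Everything in it is an illustrative assumption: the grid values (+1 for surface cells, -15 for core cells), the one-dimensional grid, and the naive O(N^2) transform that stands in for a real FFT library. Actual docking programs work on three-dimensional grids, repeat the calculation for many rotations, and use more elaborate scoring terms.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for x, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * cmath.pi * k * x / n)
        out.append(total / n if inverse else total)
    return out


def correlation_direct(receptor, ligand):
    """Score each cyclic shift t as sum over x of receptor[x] * ligand[x - t]."""
    n = len(receptor)
    return [sum(receptor[x] * ligand[(x - t) % n] for x in range(n))
            for t in range(n)]


def correlation_fourier(receptor, ligand):
    """Same scores for all shifts at once via the correlation theorem."""
    r_hat = dft(receptor)
    l_hat = dft(ligand)
    product = [r * l.conjugate() for r, l in zip(r_hat, l_hat)]
    return [c.real for c in dft(product, inverse=True)]


# Toy grid (assumed values): surface cells reward contact, core cells penalize overlap.
receptor = [0, 0, 1, 1, -15, -15, 1, 0]
ligand = [1, 1, 0, 0, 0, 0, 0, 0]

scores = correlation_fourier(receptor, ligand)
best_shift = max(range(len(scores)), key=lambda t: scores[t])
```

Both routes give the same score for every shift. The benefit of the Fourier route appears only with a true FFT, which evaluates all translations in O(N log N) rather than O(N^2) operations per rotation.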
ICM-DISCO and RosettaDock start with rigid body Monte Carlo minimization runs in the rotational/translational space from random initial structures around the known or hypothetical receptor binding site, and thus generally explore only certain regions of the conformational space. In the first stage of RosettaDock the proteins are represented as backbones plus side chain centroids, and the search is guided by a residue-scale interaction potential. Benefiting from the simplified protein representation, the method was recently extended to account for loop flexibility. However, due to the increased computational burden loop search was feasible only in local rather than global docking, further restricting the search region. HADDOCK (High Ambiguity Driven biomolecular DOCKing) starts with rigid body energy minimization from completely random initial states, typically retaining 1000 complex structures. However, HADDOCK utilizes extra information in the form of a number of active residues (which are supposed to be part of the interface) and passive residues (surface neighbors of active residues). Ambiguous interaction restraints are defined between any atom of the active residues and all atoms of active and passive residues on the partner protein. The interaction restraints are incorporated into the scoring function and guide the search toward regions of the conformational space in which the restraints are satisfied.
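The ambiguous interaction restraints described above can be sketched as follows. The sketch assumes the commonly described sum-averaged effective distance, d_eff = (sum of d^-6 over all atom pairs)^(-1/6), which is short whenever at least one atom pair is close. The 2 Å upper bound and the plain quadratic penalty are simplifications chosen for this example, and the function names are hypothetical rather than part of any HADDOCK interface.

```python
import math


def effective_distance(atoms_a, atoms_b):
    """Sum-averaged distance between two atom sets: (sum of d^-6)^(-1/6).

    atoms_a: coordinates (x, y, z) of the atoms of one active residue.
    atoms_b: coordinates of all atoms of the active and passive residues
             on the partner protein. Atoms must not coincide exactly.
    """
    total = 0.0
    for a in atoms_a:
        for b in atoms_b:
            total += math.dist(a, b) ** -6
    return total ** (-1.0 / 6.0)


def restraint_penalty(atoms_a, atoms_b, upper_bound=2.0):
    """Zero when the restraint is satisfied, quadratic otherwise (assumed form)."""
    excess = effective_distance(atoms_a, atoms_b) - upper_bound
    return excess * excess if excess > 0 else 0.0
```

Because of the inverse sixth power, the effective distance is never larger than the shortest individual atom-atom distance, so a single close contact is enough to satisfy the restraint. This is what makes the restraint "ambiguous": it does not specify which partner residue the active residue must touch.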
HADDOCK applications generally involve ambiguous interaction restraints based on 10 to 25 active residues on the two sides of the interface. These residues may be selected using biochemical and/or biophysical information such as chemical shift perturbation data or the results of mutagenesis experiments, but predicted interface residues were also used for some of the CAPRI targets.
Since the rigid body searches rely on “soft” scoring functions that allow for overlaps, the accuracy is limited. The refinement of structures requires some level of protein flexibility, and due to the higher computational costs the number of structures must be reduced. ICM and HADDOCK simply retain a few hundred low energy conformations. In RosettaDock generally the centers of low energy clusters are selected. In the web-based docking server ClusPro we cluster the low energy conformations and rank the clusters according to their size. Lorenzen and Zhang compared the performance of clustering algorithms for selecting near-native docking conformations among structures generated by four FFT-based protein–protein docking methods and showed that although the performance of clustering depends on the quality and structural distribution of the decoys, the ranking based on clustering is better than that by the inherent scoring functions. Large scale docking studies by Vakser and co-workers have shown that the number of distinct energy basins is generally small and correlated with known binding modes.
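Clustering low energy conformations and ranking the clusters by size can be sketched with a simple greedy scheme. This is an illustration under stated assumptions rather than the exact procedure of any of the programs named above: the input is taken to be a precomputed symmetric matrix of pairwise distances (for example, ligand RMSD values), and both the function name and the `radius` parameter are hypothetical.

```python
def cluster_by_size(distance, radius):
    """Greedy clustering from a symmetric pairwise distance matrix.

    Repeatedly picks the structure with the most remaining neighbors within
    `radius` as a cluster center, removes that cluster, and continues.
    Returns a list of (center_index, member_indices), largest cluster first.
    """
    remaining = set(range(len(distance)))
    clusters = []
    while remaining:
        best_center, best_members = None, []
        for i in sorted(remaining):
            members = [j for j in sorted(remaining) if distance[i][j] <= radius]
            # Strict comparison keeps the lowest index when sizes are tied.
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)
    return clusters
```

Because each later center is chosen from a shrinking pool, the clusters come out in order of non-increasing size, so the position in the returned list already serves as the size-based rank.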
In ICM-DISCO the retained solutions are further optimized with flexible interface ligand side chains using a biased probability Monte Carlo procedure. In RosettaDock the Monte Carlo minimization in translational and rotational coordinates is integrated with repacking the interface side chains using a backbone-dependent rotamer library. More recently the method has been extended to dock proteins with backbone conformational changes by combining the rigid-body search with modeling of some variable loops [11, 24]. Although the method gave excellent results for some CAPRI targets, the search had to be restricted to even smaller regions of the rotational/translational space, and this was a disadvantage in others. Chaudhury and Gray also improved the ability of RosettaDock to consider backbone flexibility by adding ensemble docking and induced fit capabilities.
After global docking one may have to refine structures in many clusters, and hence efficiency is important. The FireDock refinement algorithm shows that it is enough to remove a few side chain clashes to substantially improve ranking in rigid body docking. SDU (Semi-Definite programming based Underestimation), an efficient stochastic optimization algorithm, is based on the assumption that the free energy is a funnel-like function within the region defined by each cluster. In HADDOCK the refinement starts with simulated annealing procedures that allow the interface side chains and the backbone to move, and proceeds with energy minimization and molecular dynamics simulations in a shell of TIP3P water molecules.
Since the accuracy of energy functions is limited, selecting the best predictions is not at all trivial, and it is not always clear how the individual predictors select the 10 models for CAPRI submission. It appears that the lowest energy structures are selected from the ICM-DISCO runs. In RosettaDock and HADDOCK, clusters of low energy structures are chosen. Following the refinement by SDU we rank the clusters according to the energies of their lowest energy structures. However, the energy function is not globally discriminatory in any of these methods. This is generally not a major problem in HADDOCK, because the search is restrained by the additional information, but improvement is needed in the other two classes of methods. Analyzing structures generated by semi-global runs of RosettaDock, London and Schueler-Furman applied a machine learning algorithm to distinguish ensembles of low-energy conformations around the native conformation from other low-energy ensembles. By applying recursive feature elimination, the starting 42 features were reduced to seven, all with well defined biophysical interpretation, thereby reducing the possibility of overfitting, a serious problem for machine learning methods. The resulting classifier, FunHunt, identified the native orientation in 50/52 protein complexes in a test set, and showed that the energy decrease of trajectories toward near-native orientations is significantly larger than for other orientations.
Each class of methods in Table 2 has its specific strengths and limitations. In principle, global methods can be used without any information beyond the structures of the component proteins. However, at present experimental information is generally required for reliable docking results. The usage pattern of the docking server ClusPro illustrates this point. Since its release in December 2004, the server has performed close to 20,000 docking calculations for more than 2000 users, resulting in over 100 publications. In most studies the server was used to generate putative complex conformations, and the best models were selected and validated using a variety of experimental techniques, including site-directed mutagenesis, cross-linking, FRET, enzymatic proteolysis, or radiolytic protein footprinting. The advantage of this approach is that the putative models can be effectively used to design the most appropriate validation experiments. To emphasize the importance of such validation we note that, due to the rigid body step, global methods tend to fail if the bound and unbound protein structures substantially differ, and that selecting final models for weak complexes is very unreliable.
Side chains can be relatively easily adjusted within the Monte Carlo steps of the translational/rotational search. As implemented in ICM-DISCO, and particularly in RosettaDock, the repacking of side chains allows for some conformational change upon binding and improves the accuracy of models. According to the CAPRI results, both programs generated respectable predictions for a few targets that were beyond the scope of the FFT-based methods. However, the CAPRI results also showed that without any information on the binding mode the search may be performed in a wrong region of the conformational space. The success of extending RosettaDock for dealing with backbone flexibility so far appears to be inconclusive, as the increased degrees of freedom tend to increase the number of false positive predictions, and due to the increased computational efforts the search becomes even more local. HADDOCK is the ideal docking method if substantial and reliable information is available from mutagenesis, mass spectrometry, or NMR. With appropriate restraints, the program can provide good results even for proteins with substantial change in side chain and backbone conformations. However, restraints based on incorrect information are likely to lead to incorrect structures. Nevertheless, in the latest rounds of CAPRI, HADDOCK was able to yield good results for several targets using the information which was available in the literature, and neural network based predictions of the interface residues.
The accuracy and reliability of docking results can be improved by combining different classes of methods. For example, we have studied the 30 clusters generated by FFT-based docking by starting RosettaDock runs from random points around the cluster centers, and observing whether a certain fraction of trajectories converge to a small region within the cluster. A cluster was considered stable if such a strong attractor existed and contained a low energy structure. It was shown that all clusters close to the native structure are stable, and that restricting considerations to stable clusters eliminates around half of the false positives. In similar spirit, Pierce and Weng refined global docking predictions from ZDOCK using RosettaDock, and selected the best models based on their ZRANK score. Refining docking benchmark predictions from ZDOCK led to improved structures of top ranked hits in 20 of 27 cases, and an increase from 23 to 27 cases with hits in the top 20 predictions. In addition, the ZRANK energy function was optimized using the refined models. With the new energy function, the numbers of cases with hits ranked at number one increased from 12 to 19 and from 7 to 15 for two different ZDOCK versions. These results show that combinations of independently developed docking protocols (ZDOCK/ZRANK and RosettaDock) can substantially improve protein docking results.
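The cluster stability criterion described at the start of this paragraph can be expressed as a short check on the end points of the refinement trajectories. The sketch below is one possible reading of that criterion, not the published protocol: the parameters `radius`, `min_fraction`, and `energy_cutoff` are hypothetical, the text does not give their values, and the distance measure (for example, an RMSD between poses) must be supplied by the caller.

```python
def is_stable_cluster(endpoints, energies, distance, radius,
                      min_fraction, energy_cutoff):
    """Return True if a strong attractor with a low energy structure exists.

    endpoints: final poses of trajectories started around one cluster center.
    energies:  energy of each final pose, in the same order.
    distance:  function(pose_a, pose_b) -> float, e.g. an RMSD.
    A strong attractor is a region of size `radius` around some end point
    that collects at least `min_fraction` of all trajectories.
    """
    n = len(endpoints)
    if n == 0:
        return False
    for i in range(n):
        members = [j for j in range(n)
                   if distance(endpoints[i], endpoints[j]) <= radius]
        converged = len(members) / n >= min_fraction
        has_low_energy = any(energies[j] <= energy_cutoff for j in members)
        if converged and has_low_energy:
            return True
    return False
```

Applying such a test to every cluster from the global search and discarding the unstable ones is the filtering step that the text reports as removing around half of the false positives.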
Lorenzen and Zhang refined initial docking estimates of protein complex structures, generated by an FFT-based method, using a Monte Carlo approach including rigid-body moves and side-chain optimization. During the simulation they gradually shifted from a smoothed van der Waals potential, which prevented trapping in local energy minima, to the standard Lennard-Jones potential. Following the simulation, the conformations were clustered to obtain the final predictions. The refinement procedure was able to generate near-native structures (interface RMSD < 2.5 Å) as first model in 14 of 59 cases in the benchmark set. More generally, improving model accuracy using Monte Carlo methods enables the use of potentials that are more accurate but also more sensitive to structural errors. It may also be useful to combine HADDOCK with Monte Carlo methods if the extra information used is not fully reliable. In such cases the dependence of results on the restraints was tested by randomly removing 25% of data in docking trials. Since RosettaDock has a highly accurate scoring function (at least in a neighborhood of the native state) and performs complete repacking of side chains, it may be less biased to generate candidate models using HADDOCK, and to explore the “stability” of these models by Monte Carlo simulations without any restraints.
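The test of restraint dependence mentioned above, in which 25% of the data are removed at random in each docking trial, amounts to simple subsampling. The sketch below shows one way to generate such reduced restraint sets. The function names and the representation of a restraint as an arbitrary list item are assumptions made for illustration and do not reflect how any docking program stores its restraints.

```python
import random


def drop_restraints(restraints, rng, fraction=0.25):
    """Return a copy of `restraints` with a random `fraction` removed.

    The order of the retained restraints is preserved.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    n_drop = round(len(restraints) * fraction)
    dropped = set(rng.sample(range(len(restraints)), n_drop))
    return [r for i, r in enumerate(restraints) if i not in dropped]


def restraint_trials(restraints, n_trials, fraction=0.25, seed=0):
    """Build one reduced restraint set per docking trial (seeded for repeatability)."""
    rng = random.Random(seed)
    return [drop_restraints(restraints, rng, fraction) for _ in range(n_trials)]
```

If the docked models change little across trials run with different reduced sets, the result does not hinge on any single restraint, which is the property one wants when some of the input data may be wrong.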
The analysis of docking predictions for the 28 CAPRI targets evaluated so far shows that similar success rates have been achieved by three classes of methods that require very different amounts of information in addition to the structures of the component proteins. In spite of this substantial difference, all methods include very similar computational steps. However, each method is optimal for a specific class of docking problems. Global methods can provide valid predictions without any additional information, although experimental validation is highly recommended even in this case. Due to the repacking of side chains, Monte Carlo based methods can yield highly accurate models, but the search is restricted to a neighborhood of the starting structures. Finally, HADDOCK is the ideal method if substantial and reliable interaction information is available to guide the search. We suggest that reliable docking results can be obtained for a broader class of problems by combining computational steps from different methods. With increasing computing power such combined approaches become increasingly feasible, and can more efficiently utilize the information from a given set of experimental data.
At present the major unsolved problem in docking is the treatment of proteins with substantial backbone conformational change. In spite of attempts to introduce loop prediction and backbone adjustment in Monte Carlo based methods, it appears that the most successful method is still HADDOCK, of course assuming that appropriate interaction information is available. In order to avoid futile attempts at docking flexible proteins using rigid methods, it is necessary to develop methods that can predict protein flexibility [37–39]. In addition, predicting the hinge regions in one of the component proteins enables the use of special methods such as FlexDock, which deals with localized backbone flexibility by docking the rigid parts of the flexible molecule, and builds consistent configurations of the entire protein from these candidate parts.
This work has been supported by grant GM061867 from the National Institutes of Health.
Conflict of Interest
The authors declare no conflict of interest.
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest