Bioinformatics. 2012 December; 28(24): 3282–3289.
Published online 2012 October 23. doi:  10.1093/bioinformatics/bts628
PMCID: PMC3519461

A method for integrative structure determination of protein-protein complexes


Motivation: Structural characterization of protein interactions is necessary for understanding and modulating biological processes. On one hand, X-ray crystallography and NMR spectroscopy provide atomic-resolution structures, but the data collection process is typically long and the success rate is low. On the other hand, computational methods for modeling assembly structures from individual components frequently suffer from a high false-positive rate, rarely resulting in a unique solution.

Results: Here, we present a combined approach that computationally integrates data from a variety of fast and accessible experimental techniques for rapid and accurate structure determination of protein–protein complexes. The integrative method uses atomistic models of two interacting proteins and one or more datasets from five accessible experimental techniques: a small-angle X-ray scattering (SAXS) profile, 2D class average images from negative-stain electron microscopy micrographs (EM), a 3D density map from single-particle negative-stain EM, residue type content of the protein–protein interface from NMR spectroscopy and chemical cross-linking detected by mass spectrometry. The method is tested on a docking benchmark consisting of 176 known complex structures and simulated experimental data. The near-native model is the top-scoring one for up to 61% of benchmark cases, depending on the included experimental datasets, compared with 10% for standard computational docking. We also collected SAXS, 2D class average images and a 3D density map from negative-stain EM to model the PCSK9 antigen–J16 Fab antibody complex, followed by validation of the model by a subsequently available X-ray crystallographic structure.


Contact: dina@salilab.org or sali@salilab.org

Supplementary information: Supplementary data are available at Bioinformatics online.


Biologists are identifying components of macromolecular assemblies and networks (Krogan et al., 2006). To understand how these assemblies and networks underpin essential biological processes and to modulate them for therapeutic purposes, we need to describe the structures of both natural and engineered protein interactions (Robinson et al., 2007). Owing to the difficulty of determining the atomic structures of protein complexes by X-ray crystallography and NMR spectroscopy as well as inaccuracy of alternative methods, such as protein–protein docking, new techniques are necessary (Alber et al., 2008).

One major computational approach to predicting structures of protein complexes relies on molecular docking of unbound single-component structures. Even for complexes with two proteins, the docking problem remains challenging despite recent advances (Lensink and Wodak, 2010b). The major bottlenecks include dealing with protein flexibility and the absence of an accurate scoring function (Ritchie, 2008). Pairwise protein docking methods can be divided into three classes based on their configurational sampling algorithm (Vajda and Kozakov, 2009): (i) global methods using a fast Fourier transform (FFT) (Eisenstein and Katchalski-Katzir, 2004) or geometric matching (Schneidman-Duhovny et al., 2005); (ii) medium-range methods such as Monte Carlo sampling (Fernandez-Recio et al., 2003; Gray et al., 2003); and (iii) methods guided by data, such as complex refinement based on NMR restraints, cross-linking, interface prediction or site-directed mutagenesis (Dominguez et al., 2003; Sivasubramanian et al., 2006). It is common to begin docking two proteins with an unbiased global search followed by refinement of the best-scoring models (Mashiach et al., 2010b).

Characterizing the structures of multi-subunit complexes benefits from using varied experimental datasets (Alber et al., 2007a,b; Russel et al., 2012). In this hybrid or integrative approach, the datasets are encoded into a scoring function used to evaluate candidate models generated by a sampling method. Integrative structure determination typically iterates through the following stages: (i) gathering information, (ii) designing model representation and evaluation, (iii) sampling good models, and (iv) analyzing models and information.

Here, we present an integrative approach to pairwise protein docking. First, data from one or more of five different experiment types are translated into the corresponding scoring function terms. These data include (i) the pair-distance distribution function of the complex from a small-angle X-ray scattering (SAXS) profile, (ii) 2D class average images of the complex from negative-stain electron microscopy micrographs (EM2D), (iii) a 3D density map of the complex from single-particle negative-stain electron microscopy micrographs (EM3D), (iv) residue type content at the protein interface from NMR spectroscopy (NMR-RTC) (Reese and Dötsch, 2003), and (v) chemical cross-linking detected by mass spectrometry (CXMS). These five experimental methods were selected because of their feasibility and efficiency of data collection: a SAXS profile of the complex in solution can be collected in several minutes (Hura et al., 2009); a 3D EM density map can be reconstructed from a smaller sample amount than that for SAXS, but the data collection process is significantly longer (Stahlberg and Walz, 2008); 2D class averages can be computed from micrographs more easily and rapidly than performing a full 3D reconstruction; the composition of interface residues from NMR (Reese and Dötsch, 2003) provides information about the interaction interface, unlike the SAXS and EM data; and cross-linking data (Rappsilber, 2011) provide information at intermediate resolution by imposing an upper distance bound on inter-molecular pairs of residues. Second, complex models are sampled, relying on efficient global search methods developed for pairwise protein docking, followed by filtering based on fit to the experimental data, conformational refinement and composite scoring. Third, good-scoring representatives of clusters of models are picked as final models.

To validate this approach, we apply the integrative method in two contexts. First, we test the method on a large benchmark for protein docking (Hwang et al., 2010) with simulated experimental data and known complex structures. This test allows a robust assessment of the value of the individual types of experimental data for specific types of proteins. Second, we also collected SAXS, EM2D and EM3D data to model the PCSK9 antigen–J16 Fab antibody complex, followed by validation of the model by a subsequently available X-ray crystallographic structure. This second test highlights the advantages of the integrative method that allows computing an accurate model in a timely manner.


2.1 Integrative docking method summary

Given the atomic structures of two proteins and one or more datasets from SAXS, EM2D, EM3D, NMR-RTC and CXMS, we compute the 3D structure of their complex. The approach involves four major stages (Fig. 1, Supplementary Material):

Fig. 1.
Schematic representation of the integrative docking method. The number of possible configurations for two docked proteins is on the order of ~10^11 (three rotational degrees of freedom sampled at five-degree intervals and three translational degrees ...

2.1.1 Stage 1: Global search

A global search in the space of all possible docking models is performed using geometry-based molecular docking (Duhovny et al., 2002). The configurational sampling precision is increased significantly compared with the default settings (Supplementary Table S1: from 4.5·10^3 to 212·10^3 models) to ensure the interface and global shape of the complex are sampled with precision commensurate with that of the data.

2.1.2 Stage 2: Data-guided filtering

Each available experimental dataset is used independently to score the models and to filter out those inconsistent with the data. To account for noise in the data, we convert the data into soft restraints. For a SAXS profile, a model is filtered out if its radius of gyration is in significant disagreement with the experimentally derived one (Schneidman-Duhovny et al., 2011). For EM2D class averages, there is no filter. For an EM3D density map, a model is filtered out only if it significantly protrudes out of the density map. For NMR-RTC data, a model is filtered out if it does not satisfy at least half of the specified residue type frequencies. For the cross-linking data, a model is filtered out if it does not satisfy any of the cross-links. For each data type, the scores of the remaining models are normalized using the average and standard deviation of their scores (SData). This normalization facilitates combining and comparing scores for different data types with different noise levels. The models are clustered, and the cluster representative with the best fit to the data is selected. The 5000 top-scoring cluster representatives are processed further. This number of models usually guarantees that near-native models are not excluded even in the case of noisy data (Supplementary Table S1).
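The filtering and normalization steps of this stage can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the function names, the 30 Å cross-link distance bound, the use of the population standard deviation and the convention that lower scores are better are all assumptions made for illustration, and the clustering step is omitted.

```python
import math
from statistics import mean, pstdev


def satisfies_any_crosslink(coords_a, coords_b, crosslinks, max_dist=30.0):
    """Return True if at least one cross-link is within the distance bound.

    coords_a / coords_b map a residue id to an (x, y, z) tuple for each protein;
    crosslinks is a list of (residue_in_a, residue_in_b) pairs.
    max_dist (in Angstrom) is a hypothetical upper bound, not a value from the paper.
    """
    return any(
        math.dist(coords_a[i], coords_b[j]) <= max_dist for i, j in crosslinks
    )


def z_scores(scores):
    """Normalize raw scores by their average and standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def filter_and_normalize(models, keep=5000):
    """Drop filtered models, z-score the rest, and keep the best `keep`.

    models is a list of (model_id, raw_score, passes_filter) tuples.
    Assumption: a lower score means a better fit to the data.
    """
    kept = [(model_id, score) for model_id, score, ok in models if ok]
    if not kept:
        return []
    normalized = z_scores([score for _, score in kept])
    ranked = sorted(zip(kept, normalized), key=lambda pair: pair[1])
    return [(model_id, z) for (model_id, _), z in ranked[:keep]]
```

Because each data type is normalized separately, the resulting values are on a common scale and can be compared or summed across data types.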

2.1.3 Stage 3: Conformational refinement

The goal of this stage is to compute an interface energy score. Because rigid docking models may contain steric clashes, the side-chain conformations as well as relative positions and orientations of the model components are refined, and an interface energy score (SEnergy) is computed (Andrusier et al., 2007; Mashiach et al., 2010a).

2.1.4 Stage 4: Composite scoring

The final models are scored and ranked by a composite score consisting of a normalized interface energy term and the fit to the data:

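[Equation not recovered from the source: composite score combining the normalized interface energy term (SEnergy) with the normalized data terms (SData)]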
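Since the exact form of the composite score is not reproduced here, the following Python sketch only illustrates the general idea of combining a normalized energy term with normalized data terms. The unweighted sum, the optional `weights` parameter and the lower-is-better convention are assumptions, not the published formula.

```python
def composite_score(z_energy, z_data, weights=None):
    """Combine a normalized energy score with normalized data scores.

    z_energy: normalized interface energy of one model.
    z_data:   list of normalized data-fit scores, one per data type.
    weights:  hypothetical per-data-type weights; equal weights are assumed
              when none are given (the actual weighting is not known here).
    """
    if weights is None:
        weights = [1.0] * len(z_data)
    if len(weights) != len(z_data):
        raise ValueError("one weight is required per data term")
    return z_energy + sum(w * z for w, z in zip(weights, z_data))


def rank_models(models):
    """Return model ids ordered from best to worst composite score.

    models maps a model id to a (z_energy, z_data) tuple.
    Assumption: lower composite scores are better.
    """
    return sorted(models, key=lambda model_id: composite_score(*models[model_id]))
```

For example, `rank_models({"a": (-1.0, [-0.5]), "b": (0.2, [-3.0])})` places model `b` first, because its strong fit to the data outweighs its weaker energy term under equal weights.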

2.2 Benchmark using simulated experimental data

Pairwise protein docking benchmark 4.0 (Hwang et al., 2010) is used to validate the integrative docking method. This benchmark contains 176 complexes and their corresponding unbound structures, classified into 121 low-difficulty or rigid-body cases, 30 medium-difficulty cases and 25 high-difficulty cases, based on the degree of conformational change at the interface upon complex formation. For testing EM2D and EM3D, only a subset of 27 complexes with >675 residues is used (EM benchmark). These complexes are divided into 16 rigid-body, 4 medium-difficulty and 7 difficult cases. For each of the benchmark complexes, SAXS, EM2D, EM3D, NMR-RTC and CXMS data were simulated based on its native complex structure (Supplementary Material). We have also tested the method on three experimental SAXS datasets from the pyDockSAXS benchmark (Niemann et al., 2008; Pinotsis et al., 2008; Pons et al., 2010; Schubert et al., 2002). Integrative docking is performed on each of these cases starting from the unbound structures, and the predicted complex models are compared with the native complex. Each model is assessed for accuracy by two measurements: orientation and interface accuracy, similar to CAPRI (Lensink et al., 2007; Lensink and Wodak, 2010a). Orientation accuracy (high, medium, acceptable or incorrect) is based on RMSD criteria (Supplementary Material), whereas interface accuracy is based on the fraction of correctly predicted interface residues (Supplementary Material). In line with previous docking papers, we define a near-native model as a model of high, medium or acceptable accuracy. The success rate is the percentage of benchmark cases with at least one near-native model in the top N predictions (N is typically 10, referred to as top 10).
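The top N success rate defined above can be computed as in the following short Python sketch. The input layout (a dictionary of per-case boolean lists ordered by rank) and the function name are illustrative assumptions.

```python
def success_rate(cases, n=10):
    """Percentage of benchmark cases with a near-native model in the top n.

    cases maps a benchmark case id to a list of booleans ordered by model rank,
    where True marks a near-native model (high, medium or acceptable accuracy).
    """
    if not cases:
        raise ValueError("at least one benchmark case is required")
    hits = sum(1 for labels in cases.values() if any(labels[:n]))
    return 100.0 * hits / len(cases)


# Toy example with four hypothetical benchmark cases.
toy_cases = {
    "case1": [False, True],            # near-native model at rank 2
    "case2": [False] * 12 + [True],    # near-native model at rank 13
    "case3": [True],                   # near-native model at rank 1
    "case4": [False, False, False],    # no near-native model
}
top1 = success_rate(toy_cases, n=1)
top10 = success_rate(toy_cases, n=10)
top100 = success_rate(toy_cases, n=100)
```

In this toy example the success rate grows from 25% (top 1) to 50% (top 10) and 75% (top 100), mirroring how relaxing N raises the reported rates.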


3.1 Docking benchmark results

3.1.1 Docking accuracy increases significantly for individual datasets

The integrative docking method shows a 2-fold increase in the top 10 success rate compared with standard docking (PatchDock-FireDock protocol) for SAXS and NMR-RTC, an almost 3-fold increase for CXMS and a 4-fold increase for EM2D and EM3D (Table 1, Fig. 2A). The standard docking protocol succeeds in ranking a near-native model among the top 10 scoring models in 24% of benchmark cases. When SAXS data are used, this number rises to 51%. If we consider only ~65 rigid-body cases with <3% missing residues (unbound structures compared with complex), the success rate increases to 77% (Schneidman-Duhovny et al., 2011). For EM2D and EM3D, the success rate is 82% and 79%, respectively, roughly four times the 19% success rate of standard docking for the 27 complexes in the EM benchmark. For NMR-RTC, the success rate is 47%. With up to three cross-links, the success rate is 65%. If we consider only the top-scoring model, there is a 2-fold increase in the success rate for SAXS and NMR-RTC (22% and 18% versus 10%), an almost 4-fold increase for CXMS (36% versus 10%) and an almost 5-fold increase for EM2D and EM3D (33% versus 7%).

Table 1.
Success rate of integrative docking using individual experimental filters
Fig. 2.
Success rate of integrative docking for Benchmark 4.0. (A) Success rate in prediction of orientation (top, top10) and interface (top-I, top10-I) for standard docking and docking restrained by NMR-RTC, CXMS, SAXS, EM2D and EM3D. (B) Success rate for predicting ...

Although using any type of data significantly improves the results relative to standard docking, we are still far from the upper limit on the success rate, given the initial sampling by finer docking (97% of the benchmark cases have a near-native model sampled by a global search). When we allow for a near-native model in the top 100 instead of top 10 models, the success rate increases to 71–89%, depending on the data types. For the failing benchmark cases, the near-native model is usually among the top 1000 models.

The success rate depends on the difficulty of the benchmark cases, but there is a significant increase when compared with standard docking, independent of the difficulty (Supplementary Tables S2 and S3). The success rate also increases when only high or medium accuracy models are considered as near-natives (Supplementary Table S4).

3.1.2 Interface prediction accuracy

We find that the top-scoring model has a correctly predicted interface (Fig. 2A, Supplementary Material) in 50–68% of cases compared with 32% for standard docking. Although NMR-RTC performs worse than other data types in orientation prediction, the success rate in interface prediction is comparable with that for other data types. Based on these benchmark results, the probability of a correctly predicted interface in the top-scoring model is 50–70% (depending on data type used) versus 32% for standard docking, and it increases to 84–91% if top 10 models are considered.

3.1.3 Dependence of success rate on complex size

For each of the five data types, we test the dependence of the success rate on the complex size. The 176 benchmark complexes were divided into four groups according to complex size, with the fourth group corresponding to the 27 complexes in the EM benchmark (Fig. 2B). Different data types are most informative and applicable for different complex sizes. In particular, the success rate of standard docking decreases with increasing complex size, from 34% for small complexes to 19% for the EM benchmark. The reason is that the number of configurations and flexibility increase with protein size. The success rate for NMR-RTC drops sharply for complexes with >300 residues (for complexes with >675 residues, there is no significant difference between standard docking and docking with NMR-RTC). The reason is that the number of potential interfaces (i.e. the size of the search space) increases with protein size. In contrast, the success rate of SAXS is not sensitive to complex size. Unsurprisingly, the success rate for CXMS decreases slightly for the larger and more challenging complexes.

3.1.4 Dependence of success rate on protein shapes

Protein complexes were classified into oblate, spherical and prolate based on the eigenvalues of the gyration tensor (Pons et al., 2010) (Fig. 2C). The success rate of standard docking is highest for oblate proteins and lowest for prolate proteins. The reason is that oblate proteins have larger interfaces and better shape complementarity. The success rate of NMR-RTC, CXMS and SAXS is not sensitive to protein shape owing to the combination of data and energy scores. The most significant increase in the success rate compared with standard docking is for prolate proteins: a 3-fold increase for NMR-RTC, a 5-fold increase for CXMS and a 4-fold increase for SAXS. No analysis was performed for EM, owing to the small size of the EM benchmark.

3.1.5 Combining different experimental datasets increases the success rate

We tested pairwise combinations of the five experimental data types. The top 10 success rate increases from 42–82% for individual data types to 63–82% for pairwise combinations (Table 2). More important is the increase in the top 1 success rate from 17–36% to 26–52%. CXMS data complements all other data types, with most significant improvement for the top-scoring model, where the success rate increases from 36% for CXMS alone to 47–52% for all four pairwise data type combinations. The top 10 success rate for CXMS combined with SAXS or NMR-RTC is 80% and 81%, respectively, and is comparable with the success rate of EM data types. Another successful pairwise combination is SAXS–NMR-RTC, improving the success rate for the whole benchmark from 47–51% for SAXS and NMR-RTC separately to 68% when both data types are used. No significant improvement in the top 10 success rate is obtained by combining EM (2D or 3D) with other data types, as their independent success rate is already high (79–82%). For the EM–NMR-RTC combinations, there is even a slight decrease in the success rate because the NMR-RTC data is not informative for large protein complexes in the EM benchmark. When all five data types are combined, the top 10 success rate is similar to that for EM (83%), but more important is the increase in the top 1 success rate to 61% from 33% for EM alone.

Table 2.
Success rate of integrative docking using combined experimental filters

3.2 Application to an antibody–antigen complex

To test the applicability of the integrative method for determining pairwise protein complexes in a biopharmaceutical setting, we applied it to an antibody–antigen complex with experimentally generated datasets. In a typical biopharmaceutical discovery project, antibodies for a specific target can be generated by mouse immunization or from phage-display libraries. The next step is selecting an optimal antibody out of several candidates for further development into a drug. Knowledge of the epitope is an important factor in the antibody selection process. Therefore, a method that can model antibody–antigen complexes rapidly and accurately would be extremely useful.

In our case, the antigen PCSK9 plays a major regulatory role in cholesterol homeostasis and it is an important drug target (Horton et al., 2007). PCSK9 binds to the EGF-A domain of the low-density lipoprotein receptor (LDLR) and induces LDLR degradation. Reduced LDLR levels result in decreased metabolism of low-density lipoproteins, which may lead to hypercholesterolemia. The antibody J16 inhibits the action of PCSK9 by preventing LDLR binding (Liang et al., 2012). Recently, a crystal structure of PCSK9 in complex with J16 Fab showed that J16 is a competitive inhibitor of LDLR binding (Liang et al., 2012).

3.2.1 Complex structure modeling

The atomic structure of the unbound PCSK9 has been available since the beginning of this study (Protein Data Bank code 2P4E) (Cunningham et al., 2007). For the J16 Fab, 20 comparative models corresponding to two different elbow angles (136° and 168°) and 10 different CDR loop conformations were selected based on the fit to the J16 Fab SAXS profile (Supplementary Material). In addition, the missing loops, N-termini, C-termini and His tags were added for PCSK9 and the J16 Fab with MODELLER-9v8, to better model the SAXS data. The integrative docking protocol was applied to PCSK9 and 20 J16 Fab models; the final clustering considered all complex models simultaneously.

The structure of the complex was determined by X-ray crystallography during the course of this project, but was made available to this study only after the model of the complex was computed. Therefore, this application corresponds to a real-life antibody discovery scenario, where the unbound structures of the drug target and the antibody are known, but the structure of the complex is not available.

3.2.2 Assessment against X-ray structure

Because the accuracy of a docking prediction highly depends on the accuracy of the input structures, we first assess the accuracy of our input structures. The Cα-RMSD between the bound and unbound PCSK9 structures is 1.4 Å and between the bound and modeled J16 Fabs is 1.0 and 3.0 Å for elbow angles of 136° and 168°, respectively. Thus, there are no major PCSK9 conformational changes on binding. The elbow angle of the J16 Fab in the complex X-ray structure is 137.6°. Therefore, the prediction of the elbow angle based on the Fab SAXS profile was correct (Supplementary Fig. S1). We have also tested the fit of the X-ray structure of the complex against each data type (Supplementary Fig. S2) and observed high-quality fits for SAXS (χ of 2.24), EM2D (cross-correlation coefficient of 0.87) and EM3D (cross-correlation coefficient of 0.78).
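The quality of a SAXS fit such as the χ value quoted above is commonly measured by the normalized residual between the experimental and computed profiles after fitting a scale factor. The sketch below uses that common definition; the source does not spell out its exact formula, so the least-squares scale factor, the absence of any offset or hydration parameters and the function name are assumptions.

```python
import math


def saxs_chi(i_exp, i_model, sigma):
    """Return (chi, c) for a computed SAXS profile against experimental data.

    chi = sqrt( (1/M) * sum_i ((i_exp[i] - c * i_model[i]) / sigma[i])**2 ),
    where c is the scale factor that minimizes chi (weighted least squares).
    All three inputs are equal-length sequences sampled at the same q values.
    """
    if not (len(i_exp) == len(i_model) == len(sigma)) or not i_exp:
        raise ValueError("profiles must be non-empty and of equal length")
    # Closed-form weighted least-squares scale factor.
    numerator = sum(e * m / s**2 for e, m, s in zip(i_exp, i_model, sigma))
    denominator = sum(m * m / s**2 for m, s in zip(i_model, sigma))
    if denominator == 0:
        raise ValueError("model profile is identically zero")
    c = numerator / denominator
    chi_squared = sum(
        ((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_model, sigma)
    ) / len(i_exp)
    return math.sqrt(chi_squared), c
```

A model profile that differs from the data only by a constant factor gives χ = 0, whereas χ near 1 indicates residuals comparable to the experimental errors.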

Next, we analyze the accuracy of the best-scored models in terms of orientation and interface accuracy for different datasets. The best-scored models with acceptable accuracy were ranked 14, 2, 2 and 2 for SAXS, EM2D, EM3D and all three datasets combined, respectively (Supplementary Table S5). The best-scored models with a correct epitope were ranked 3, 2, 1 and 1, respectively (Supplementary Table S5). Docking results are slightly better for models with the elbow angle of 136° than for 168°, with acceptable accuracy models ranked 5, 2, 1 and 2 for SAXS, EM2D, EM3D and all three data types combined, respectively (Supplementary Table S5).

3.2.3 Data-guided filtering and funnel analysis

Ideally, the normalized fitting scores would correlate strongly with the accuracy of the model over a broad range of accuracy (i.e. I-RMSD of 0-5 Å or L-RMSD of 0-15 Å). We now examine whether or not such a ‘funnel’ exists for each type of data (London and Schueler-Furman, 2008) and how these funnels relate to specific complex structures (Fig. 3A). The three experimental datasets indeed result in pronounced funnels, revealing similar complex structures (Fig. 3B). Typically, there are three or four funnels associated with complex structures related by the pseudosymmetry of the antibody (i.e. light chain versus heavy chain) and the triangular shape symmetry of PCSK9 (Fig. 3C).
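One simple way to quantify how funnel-like a score is over the accuracy range mentioned above is the correlation between score and L-RMSD for models within 15 Å of the native structure. The Python sketch below is only an illustration: the choice of Pearson correlation and the function names are assumptions, and the study itself inspects the funnels visually (Fig. 3A).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def funnel_correlation(l_rmsd, scores, max_rmsd=15.0):
    """Correlate score with L-RMSD for models with L-RMSD <= max_rmsd.

    With lower-is-better scores, a value close to +1 suggests a funnel:
    the score improves steadily as models approach the native structure.
    """
    pairs = [(r, s) for r, s in zip(l_rmsd, scores) if r <= max_rmsd]
    if len(pairs) < 2:
        raise ValueError("need at least two models within the RMSD cutoff")
    return pearson([r for r, _ in pairs], [s for _, s in pairs])
```

Restricting the calculation to models below the cutoff keeps distant, unrelated funnels (for example those arising from pseudosymmetry) from dominating the correlation.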

Fig. 3.
Modeling of the PCSK9–J16 Fab complex. (A) Scoring funnels as a function of L-RMSD for different experimental filters. (B) Top-scoring cluster representatives (red, green, gold and yellow) for integrative docking with SAXS, EM2D and EM3D filters, ...

The SAXS dataset produces four funnels. One of them includes the near-native models, although this funnel is the least pronounced among the four funnels. The EM2D dataset produces three funnels with comparable scores. One of the funnels is centered close to the native structure, demonstrating the predictive power of EM2D. The EM3D dataset produces four funnels similar to those from the SAXS dataset. In contrast to SAXS, the funnel with near-native models has the best EM3D scores, although this funnel is not centered on the native complex structure (its center is ~11 Å RMSD away from the native structure). While the EM3D dataset is best in selecting the correct funnel, the EM2D score is better in picking the highest accuracy model once the correct funnel has been selected. The shift in the near-native EM3D funnel relative to the near-native EM2D funnel can be explained by a distortion of the 3D density map that results from inaccuracies in the initial density map used for the 3D reconstruction that was obtained from the 2D class averages by the random conical tilt method.


We developed an integrative method for docking two protein structures by combining protein docking techniques with data from five experimental methods including SAXS, EM2D, EM3D, NMR-RTC and CXMS. To assess the accuracy of the integrative method, we used a benchmark of 176 complex structures with simulated experimental data. We also applied the method to an antibody–antigen complex, relying on experimental datasets collected specifically for this study.

Additional information, such as sequence conservation and impact of site-directed mutagenesis on complex formation, has been used previously to increase the accuracy of pairwise protein docking (Lensink and Wodak, 2010b; Mashiach et al., 2010b). Here, we analyze the improved docking success rates afforded by data from five accessible experimental methods. Our integrative framework can be modularly extended to support additional types of experimental data, such as those from footprinting, site-directed mutagenesis, FRET spectroscopy and atomic force microscopy (Trinh et al., 2012). In addition to the data types tested here, binding site residues and distance constraints, if available, can be added directly to the PatchDock input. In principle, experimental datasets can be used either to filter docking models or directly to drive the sampling. We select the first approach, because it allows seamless integration of any combination of datasets and we can rely on efficient global search methods already developed for pairwise protein docking. Moreover, driving the docking with global shape data, such as SAXS and EM2D, is algorithmically challenging.

4.1 Experimental datasets and their impact on docking

The experimental methods were chosen for their utility in a biotherapeutics discovery context where multiple artificial binding proteins (such as antibodies) are engineered to bind a specific drug target and rapid tools for epitope prediction are required. According to our large benchmark analysis, EM (2D and 3D) are the most informative of all datasets (Table 1). However, collecting experimental information to generate a 3D map is generally possible only for complexes larger than approximately 100 kDa and requires a relatively large amount of work. Here, we show that 2D class averages, which can be obtained significantly faster and for a wider range of samples, can provide the same information for pairwise docking as 3D density maps. In contrast to EM, SAXS has the advantage of being able to collect and analyze multiple samples in a few hours. As a result, purification, SAXS data collection and docking of multiple antibodies binding the same target could be performed in a matter of days. Automation of collecting EM data is more challenging than that for SAXS, although recent advances in data acquisition and an increase in computing power have allowed more streamlined processes in single-particle EM (Lyumkis et al.; Wu et al., 2012). While cross-linking with mass spectrometry is informative on its own, it also complements all other data types. With recent advances in data collection (Rappsilber, 2011), it is becoming a method of choice for combination with shape informative methods, such as SAXS and EM.

While validation of integrative docking by a large benchmark using simulated data has allowed a robust statistical analysis (Fig. 2), data collection and application to a specific target with real data has highlighted advantages and challenges of the integrative docking approach. Unlike NMR-RTC, which depends on protein expression in a cell-free expression system, both SAXS and EM gave useful data for the PCSK9–J16 Fab complex. In general, larger size and higher symmetry of a complex simplify EM data acquisition and interpretation. The larger mass of an IgG (150 kDa) compared with a Fab fragment (50 kDa) would simplify the data acquisition and image processing. However, the flexibility of an IgG may result in a conformationally heterogeneous complex sample, favoring the use of the more rigid Fab fragment. Although the EM3D data were most informative for identifying the near-native cluster of models and predicting the epitope, more accurate structural models could be selected by the EM2D score. Despite the relatively low information content of the SAXS profile, the SAXS score predicted the same clusters as the EM-based scores. Additionally, the J16 Fab SAXS profile was useful in predicting the Fab structure and its elbow angle.

4.2 Improvement compared with standard docking

Whereas the integrative protocol includes a near-native model among the top 10 models in 42–82% of the cases (Table 1), state-of-the-art docking methods succeed in only 30–40% of the cases, depending on the benchmark and accuracy criterion. ZDOCK-ZRANK ranks a model with I-RMSD < 4.0 Å among the top 10 models in 35–40% of the rigid-body cases of Benchmark 2.0 (Pierce and Weng, 2008). A recently developed residue potential, SIPPER (Pons et al., 2011), succeeds in ranking a model with L-RMSD < 10 Å in 28% of the 81 Benchmark 3.0 complexes for which at least one model with L-RMSD < 10 Å was generated by FTDock. In a recent CAPRI evaluation, an acceptable accuracy model was submitted by at least one participating group for 11 out of 13 complexes (Lensink and Wodak, 2010b). However, the top 8 predictors could correctly predict only 6 out of 13 complex structures. While predictors can use additional information manually, a fully automated method, ClusPro (Comeau et al., 2004), correctly predicted five targets.

4.3 Comparison to other hybrid docking methods

Docking has been previously combined with additional data. HADDOCK (Dominguez et al., 2003) benefits from a consensus interface predictor CPORT (de Vries and Bonvin, 2011), ranking an acceptable accuracy model among the top 10 models for ~19% of the Benchmark 2.0 complexes. pyDockSAXS (Pons et al., 2010), which combines FTDock sampling with the pyDock scoring function and a SAXS profile, succeeds in ranking a model with L-RMSD < 10 Å in 43% of the Benchmark 2.0 complexes (i.e. for 70 of the 84 complexes with similar molecular mass for bound and unbound structures). In comparison, our approach applied to the same benchmark with a SAXS profile only achieves a significantly higher success rate of 63%. The increase is due to the increased precision of configurational sampling and the higher accuracy of the interface energy score in FireDock (Supplementary Tables S6 and S7). The integrative approach had a similar performance for three experimental SAXS datasets from the pyDockSAXS benchmark (Niemann et al., 2008; Pinotsis et al., 2008; Pons et al., 2010; Schubert et al., 2002) (Supplementary Table S7).

4.4 Sampling versus scoring

The current work highlights the challenges in protein docking. Sufficient sampling is required in the global search stage to maximize model accuracy, hit rate and the quality-of-fit to the experimental data (Supplementary Tables S1 and S6). Our integrative approach is designed to benefit from the increasingly focused molecular docking search space afforded by consideration of experimental data. While an acceptable accuracy model is contained among the ~200 000 models generated by a global search for 97% of benchmark complexes, our integrative protocol succeeds in ranking such models among the top 10 scoring models in only 42–82% of the test cases, depending on the data used (Tables 1 and 2); correct binding sites are identified among the top 10 scoring models in 84–91% of the cases (Fig. 2A). We suggest that a combination of finer sampling methods (including flexible docking) and improved scoring functions with physico-chemical and/or statistical terms can be helpful for further improving the success rate of pairwise protein docking. Integrative docking, such as that described here, may provide the best compromise between the relative expediency and inaccuracy of standard docking on one hand and the relative complexity and accuracy of experimental structure determination by X-ray crystallography or NMR spectroscopy on the other hand.

Software. The package is downloadable from SAXS, EM2D, EM3D, NMR-RTC and CXMS scoring functions are implemented in our open source Integrative Modeling Platform (IMP; PatchDock and FireDock are available at Docking with a SAXS profile can also be done via a webserver at


We wish to thank Javier Chaparro-Riggers for engineering J16 Fab and Michael Chin for the PCSK9 and J16 Fab samples. We thank Carles Pons, Juan Fernandez-Recio and Dmitri Svergun for kindly providing access to the experimental SAXS datasets.

Funding: DSD has been funded by the Weizmann Institute Advancing Women in Science Postdoctoral Fellowship. We also acknowledge support from NIH R01 GM083960, NIH U54 RR022220 and Rinat (Pfizer) Inc. The SIBYLS beamline at Lawrence Berkeley National Laboratory is supported by the DOE program Integrated Diffraction Analysis Technologies (IDAT). We are also grateful for computer hardware gifts from Ron Conway, Mike Homer, Intel, Hewlett-Packard, IBM and NetApp.

Conflict of Interest: none declared.


  • Alber F., et al. Determining the architectures of macromolecular assemblies. Nature. 2007a;450:683–694.
  • Alber F., et al. The molecular architecture of the nuclear pore complex. Nature. 2007b;450:695–701.
  • Alber F., et al. Integrating diverse data for structure determination of macromolecular assemblies. Annu. Rev. Biochem. 2008;77:443–477.
  • Andrusier N., et al. FireDock: fast interaction refinement in molecular docking. Proteins. 2007;69:139–159.
  • Comeau S.R., et al. ClusPro: an automated docking and discrimination method for the prediction of protein complexes. Bioinformatics. 2004;20:45–50.
  • Cunningham D., et al. Structural and biophysical studies of PCSK9 and its mutants linked to familial hypercholesterolemia. Nat. Struct. Mol. Biol. 2007;14:413–419.
  • de Vries S.J., Bonvin A.M. CPORT: a consensus interface predictor and its performance in prediction-driven docking with HADDOCK. PLoS One. 2011;6:e17695.
  • Dominguez C., et al. HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J. Am. Chem. Soc. 2003;125:1731–1737.
  • Duhovny D., et al. Efficient unbound docking of rigid molecules. In: Guigó R., Gusfield D., editors. Second International Workshop, WABI 2002. Rome, Italy: Springer Berlin/Heidelberg; 2002. pp. 185–200.
  • Eisenstein M., Katchalski-Katzir E. On proteins, grids, correlations, and docking. C. R. Biol. 2004;327:409–420.
  • Fernandez-Recio J., et al. ICM-DISCO docking by global energy optimization with fully flexible side-chains. Proteins. 2003;52:113–117.
  • Gray J.J., et al. Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. J. Mol. Biol. 2003;331:281–299.
  • Horton J.D., et al. Molecular biology of PCSK9: its role in LDL metabolism. Trends Biochem. Sci. 2007;32:71–77.
  • Hura G.L., et al. Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS). Nat. Methods. 2009;6:606–612.
  • Hwang H., et al. Protein-protein docking benchmark version 4.0. Proteins. 2010;78:3111–3114.
  • Krogan N.J., et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006;440:637–643.
  • Lensink M.F., et al. Docking and scoring protein complexes: CAPRI 3rd edition. Proteins. 2007;69:704–718.
  • Lensink M.F., Wodak S.J. Blind predictions of protein interfaces by docking calculations in CAPRI. Proteins. 2010a;78:3085–3095.
  • Lensink M.F., Wodak S.J. Docking and scoring protein interactions: CAPRI 2009. Proteins. 2010b;78:3073–3084.
  • Liang H., et al. PCSK9 Antagonism reduces LDL-cholesterol in statin-treated hypercholesterolemic non-human primates. J. Pharmacol. Exp. Ther. 2012;340:228–236.
  • London N., Schueler-Furman O. Funnel hunting in a rough terrain: learning and discriminating native energy funnels. Structure. 2008;16:269–279.
  • Lyumkis D., et al. Automation in single-particle electron microscopy connecting the pieces. Methods Enzymol. 2010;483:291–338.
  • Mashiach E., et al. FiberDock: Flexible induced-fit backbone refinement in molecular docking. Proteins. 2010a;78:1503–1519.
  • Mashiach E., et al. An integrated suite of fast docking algorithms. Proteins. 2010b;78:3197–3204.
  • Niemann H.H., et al. X-ray and neutron small-angle scattering analysis of the complex formed by the Met receptor and the Listeria monocytogenes invasion protein InlB. J. Mol. Biol. 2008;377:489–500.
  • Pierce B., Weng Z. A combination of rescoring and refinement significantly improves protein docking performance. Proteins. 2008;72:270–279.
  • Pinotsis N., et al. Molecular basis of the C-terminal tail-to-tail assembly of the sarcomeric filament protein myomesin. EMBO J. 2008;27:253–264.
  • Pons C., et al. Structural characterization of protein-protein complexes by integrating computational docking with small-angle scattering data. J. Mol. Biol. 2010;403:217–230.
  • Pons C., et al. Scoring by intermolecular pairwise propensities of exposed residues (SIPPER): a new efficient potential for protein-protein docking. J. Chem. Inf. Model. 2011;51:370–377.
  • Rappsilber J. The beginning of a beautiful friendship: cross-linking/mass spectrometry and modelling of proteins and multi-protein complexes. J. Struct. Biol. 2011;173:530–540.
  • Reese M.L., Dötsch V. Fast mapping of protein-protein interfaces by NMR spectroscopy. J. Am. Chem. Soc. 2003;125:14250–14251.
  • Ritchie D.W. Recent progress and future directions in protein-protein docking. Curr. Protein. Pept. Sci. 2008;9:1–15.
  • Robinson C.V., et al. The molecular sociology of the cell. Nature. 2007;450:973–982.
  • Russel D., et al. Putting the pieces together: integrative modeling platform software for structure determination of macromolecular assemblies. PLoS Biol. 2012;10:e1001244.
  • Schneidman-Duhovny D., et al. Macromolecular docking restrained by a small angle X-ray scattering profile. J. Struct. Biol. 2011;173:461–471.
  • Schneidman-Duhovny D., et al. Geometry-based flexible and symmetric protein docking. Proteins. 2005;60:224–231.
  • Schubert W.D., et al. Structure of internalin, a major invasion protein of Listeria monocytogenes, in complex with its human receptor E-cadherin. Cell. 2002;111:825–836.
  • Sivasubramanian A., et al. Structural model of the mAb 806-EGFR complex using computational docking followed by computational and experimental mutagenesis. Structure. 2006;14:401–414.
  • Stahlberg H., Walz T. Molecular electron microscopy: state of the art and current challenges. ACS Chem. Biol. 2008;3:268–281.
  • Trinh M.H., et al. Computational reconstruction of multidomain proteins using atomic force microscopy data. Structure. 2012;20:113–120.
  • Vajda S., Kozakov D. Convergence and combination of methods in protein-protein docking. Curr. Opin. Struct. Biol. 2009;19:164–170.
  • Wu S., et al. Fabs enable single particle cryoEM studies of small proteins. Structure. 2012;20:582–592.

Articles from Bioinformatics are provided here courtesy of Oxford University Press