While many structures of single protein components are becoming available, structural characterization of their complexes remains challenging. Methods for modeling assembly structures from individual components frequently suffer from large errors, due to protein flexibility and inaccurate scoring functions. However, when additional information is available, it may be possible to reduce the errors and compute near-native complex structures. One such type of information is a small angle X-ray scattering (SAXS) profile that can be collected in a high-throughput fashion from a small amount of sample in solution. Here, we present an efficient method for protein-protein docking with a SAXS profile (FoXSDock): generation of complex models by rigid global docking with PatchDock, filtering of the models based on the SAXS profile, clustering of the models, and refining the interface by flexible docking with FireDock. FoXSDock is benchmarked on 124 protein complexes with simulated SAXS profiles, as well as on 6 complexes with experimentally determined SAXS profiles. When induced fit is less than 1.5Å interface Cα RMSD and the fraction of residues missing from the component structures is less than 3%, FoXSDock can find a model close to the native structure within the top 10 predictions in 77% of the cases; in comparison, docking alone succeeds in only 34% of the cases. Thus, the integrative approach significantly improves on molecular docking alone. The improvement arises from an increased resolution of rigid docking sampling and more accurate scoring.
Many proteins are components of complexes, interacting with other proteins to deliver their functions, such as signal transduction, transport, and catalysis (Krogan et al., 2006; Robinson et al., 2007). Thus, structural description of protein complexes is important for understanding these processes. However, the number of solved complex structures remains relatively low, even while the number of experimentally solved single protein structures increases (Dutta and Berman, 2005). This gap can be bridged by hybrid or integrative methods (Alber et al., 2008; Alber et al., 2007; Steven and Baumeister, 2008). Integrative methods determine complex architectures by computationally combining information from different methods, such as X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy of component structures, electron microscopy of whole complexes, chemical cross-linking of components detected by mass spectrometry, and small angle X-ray scattering (SAXS) of complexes.
The computational docking problem, which aims to predict a binary complex starting from the structures of unbound components, has been studied for more than three decades (Katchalski-Katzir et al., 1992; Wodak and Janin, 1978). Docking methods can be classified into three classes based on the sampling algorithms (Ritchie, 2008; Vajda and Kozakov, 2009): global search methods using the fast Fourier transform (FFT) (Eisenstein and Katchalski-Katzir, 2004) or geometric shape matching (Schneidman-Duhovny et al., 2003), medium-range Monte Carlo methods (Fernandez-Recio et al., 2003; Gray et al., 2003), and restraint-guided methods (van Dijk et al., 2005). Each class of methods is suitable for a specific docking sub-problem. Global methods are required for an adequate coverage of the search space, medium-range methods are best for local search and refinement, and restraint-guided methods perform well when additional information is available and can be translated into spatial restraints.
Docking methods have been systematically and prospectively evaluated at the Critical Assessment of PRedictions of Interactions (CAPRI), relying on target complexes without available structures at the time of prediction (Janin, 2005). It is clear that the state-of-the-art docking methods can successfully (within top 10 predictions) predict the complex structure of two components with limited conformational change upon binding (induced fit that involves rotations of a few side chains), a standard size interface area (change in solvent accessibility area upon complex formation is between 1400 Å² and 2000 Å²), and significant hydrophobic interaction (solvation free energy of complex formation is less than −4 kcal/mol) (Vajda, 2005). Predictions can also be accurate if additional experimental information about the interaction is available, such as mutations and cross-linking that help identify binding site residues. However, docking methods still suffer from a relatively high rate of incorrect prediction, due to protein flexibility and lack of a reliable scoring function (Lensink et al., 2007; Mendez et al., 2003; Mendez et al., 2005).
SAXS measurement is emerging as a rapid and effective way for obtaining low-resolution (10-30Å) structural information about macromolecular structures in solution (Petoukhov and Svergun, 2007; Putnam et al., 2007). The scattering curve resulting from the subtraction of the buffer from the sample, (SAXS profile, I(q)), is radially symmetric (isotropic) due to the randomly-oriented distribution of particles in solution. The profile can be converted into a radial distribution function of the molecule via a Fourier transform. Unlike electron microscopy, NMR spectroscopy, and X-ray crystallography, SAXS experiments can be performed under a wide variety of solution conditions, including near physiological conditions. The measurement is performed with ~1.0 mg/ml of a macromolecular sample in a ~15 μl volume, and usually takes only a few minutes on a well-equipped synchrotron beam line (Hura et al., 2009; Tsuruta and Irving, 2008).
Computational approaches for modeling a macromolecular structure based on its SAXS profile can be classified into ab initio and rigid body modeling methods (Putnam et al., 2007). On the one hand, the ab initio methods search for coarse shapes represented by dummy atoms (beads) that fit the experimental SAXS profile (Chacon et al., 1998; Svergun, 1999; Svergun et al., 2001). On the other hand, rigid body approaches search for an atomic model of the molecule with a computed SAXS profile that fits the experimental profile (Förster et al., 2008; Pelikan et al., 2009; Petoukhov and Svergun, 2005). Therefore, rigid body modeling can be used only if an approximate structure of the studied molecule or its components is available, as is the case in protein-protein docking.
There are several methods for rigid docking with a SAXS profile. DIMFOM, GLOBSYMM and SASREF (Petoukhov and Svergun, 2005) are based on the CRYSOL program (Svergun et al., 1995) for SAXS profile fitting with a simplified sampling algorithm, where the structure of one monomer is rolled over the surface of the other; however, no interface optimization is performed. In another method, the scoring function combines SAXS and simple interface complementarity terms, sampled by a local search method that requires a relatively accurate initial configuration (Förster et al., 2008); in the absence of the initial configuration, the method starts from 1000 random orientations. A number of analyses of specific biological systems relied on docking followed by filtering of models based on a fit to a SAXS profile (Covaceuszach et al., 2008; Filgueira de Azevedo et al., 2003; Sondermann et al., 2005).
Here, we present a hybrid approach that computes a model of a complex for two given component structures, by simultaneously satisfying physicochemical complementarity between the components as well as a fit to a SAXS profile. The SAXS profile makes it possible to increase the configurational sampling precision and decrease the number of inaccurate models with good scores. Moreover, while docking methods optimize interface shape complementarity, a SAXS profile provides information about the global complex shape. In many cases, especially if the proteins are elongated, small changes in the interface can lead to large changes in the global complex shape. Therefore, it is necessary to increase the sampling resolution to sample the complex accurately in terms of its interface as well as global shape. We test the method on 124 cases with simulated SAXS profiles and six cases with experimental SAXS profiles. The hybrid approach significantly improves on molecular docking alone: When induced fit is less than 1.5Å interface Cα RMSD and the fraction of residues missing from the component structures is less than 3%, FoXSDock can find a model close to the native structure within the top 10 predictions in 77% of the cases; in comparison, docking alone succeeds in only 34% of the cases.
The method presented here addresses the docking problem restrained by a SAXS profile: Given two structures of molecules (referred to as a receptor and a ligand) and the SAXS profile of their complex, the goal is to find the complex structure; only minor conformational changes, such as side chain repacking, are explicitly modeled.
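The docking protocol involves five steps (Figure 1): global rigid docking, coarse SAXS filtering by the radius of gyration, SAXS profile scoring, clustering, and refinement with composite scoring.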
PatchDock is used for global rigid docking (Duhovny et al., 2002; Schneidman-Duhovny et al., 2005b). PatchDock is an efficient rigid docking method that maximizes geometric shape complementarity. To account for surface flexibility in real-life docking involving unbound component structures, the geometric shape complementarity scoring function allows a small amount of steric clashes at the interface. The molecular docking is similar to assembling a jigsaw puzzle. Given two molecules, their surfaces are divided into patches based on their shape: convex, flat, and concave. Once the patches are defined, a pair of neighboring patches on one molecule is superimposed with a pair of neighboring patches on the other molecule, using Geometric Hashing (Lamdan and Wolfson, 1988). Next, the resulting models are clustered, filtered for severe steric clashes, and scored by shape complementarity. The configurational sampling precision can be controlled by the resolution of the surface representation (minimal distance between surface points used to generate docking models) and clustering parameters. Usually, docking methods balance configurational sampling precision against the accuracy and efficiency of scoring function, with the goal of retaining a sufficiently accurate model within a sufficiently small fraction of the best scoring models.
Here, the configurational sampling precision is increased to ensure the complex is sampled accurately in terms of its interface as well as global shape. The final clustering of rigid docking models is performed with a 2Å cut-off on the ligand interface Cα RMSD (compared to the default of 4Å; the ligand interface Cα RMSD is computed using the ligand Cα atoms within 10Å from the receptor in the docked configuration) and the resolution of surface representation of the ligand is decreased by 0.5Å to 1Å. These changes result in an average of 1.7 × 10⁵ rigid docking models per complex, compared to 8.2 × 10³ for the default parameter values. In addition, near-native models (ligand Cα RMSD (L-RMSD) < 10Å or interface Cα RMSD (I-RMSD) < 4Å as defined below in the assessment criteria) are observed in 94% of the benchmark cases, compared to 80% for default parameters.
Symmetric cases are docked with SymmDock (Schneidman-Duhovny et al., 2005a), a docking algorithm for the prediction of cyclically symmetric complexes (Cn) given the structure of its asymmetric unit and symmetry order n. SymmDock a priori restricts its transformational search space only to symmetric transformations, and thus gains both in efficiency and accuracy. In the case of dihedral symmetry (D2 tetramer is a dimer of dimers), SymmDock is applied first to generate dimers. Next, D2 tetramers are constructed by combining dimer pairs with perpendicular symmetry axes.
For a SAXS profile, the radius of gyration (RGexp) is computed from the slope of the Guinier plot of the profile (Guinier and Fournet, 1955). For a protein structure, the radius of gyration (RG3D) is computed as $R_G^{3D} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} |r_k - r_c|^2}$, where rk is the position of atom k, rc is the centroid of the structure, and N is the number of atoms.
A docking model is filtered out if its radius of gyration is 10% smaller or 4% larger than the radius of gyration computed from the SAXS profile (0.9RGexp ≤ RG3D ≤ 1.04RGexp); the larger tolerance for the lower bound results from ignoring the hydration layer in the radius of gyration calculation.
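The coarse filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the FoXSDock implementation: the function names are hypothetical, the centroid is taken as the unweighted mean of atom positions, and the Guinier estimate assumes the input contains only low-q points where ln I(q) is linear in q² with slope −RG²/3.

```python
import math

def radius_of_gyration(coords):
    """RG3D: root-mean-square distance of atoms from their (unweighted) centroid."""
    n = len(coords)
    centroid = [sum(c[i] for c in coords) / n for i in range(3)]
    sq_sum = sum(sum((c[i] - centroid[i]) ** 2 for i in range(3)) for c in coords)
    return math.sqrt(sq_sum / n)

def guinier_rg(q, intensity):
    """RGexp from a least-squares line through ln I(q) versus q^2 (slope = -RG^2 / 3)."""
    x = [qi * qi for qi in q]
    y = [math.log(v) for v in intensity]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return math.sqrt(-3.0 * slope)

def passes_rg_filter(rg_3d, rg_exp, lower=0.90, upper=1.04):
    """Keep a docking model only if 0.9 RGexp <= RG3D <= 1.04 RGexp."""
    return lower * rg_exp <= rg_3d <= upper * rg_exp
```

Because the filter needs only one number per model, it can be applied to every rigid docking model before the more expensive profile computation.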
For a given structure or a model, the SAXS profile is computed by FoXS (Schneidman-Duhovny et al.), based on the Debye formula (Debye, 1915):
$$I(q) = \sum_{i=1}^{N} \sum_{j=1}^{N} f_i(q) f_j(q) \frac{\sin(q d_{ij})}{q d_{ij}} \qquad (1)$$
where the intensity, I(q), is a function of the momentum transfer, q = (4π sin θ) / λ; 2θ is the scattering angle and λ is the wavelength of the incident X-ray beam; fi(q) is the form factor of an atom i, dij is the distance between atoms i and j, and N is the number of atoms in the system. In the FoXS model, the form factor fi(q) takes into account the displaced solvent as well as the hydration layer:
$$f_i(q) = f_v(q) - c_1 f_s(q) + c_2 s_i f_w(q) \qquad (2)$$
where fv(q) is the atomic form factor in vacuo (Svergun et al., 1995), fs(q) is the form factor of the dummy atom that represents the displaced solvent (Fraser et al., 1978), si is the fraction of the solvent accessible surface of the atom i (Connolly, 1983), and fw(q) is the water form factor. The parameter c1 is used to adjust the total excluded volume of the atoms (default value is 1.0) and c2 is used to adjust the density of the water in the hydration layer (default value is 0.0). In this work, the default values for c1 and c2 are used, because we want to rank docking models based on their SAXS fitting scores calculated under identical conditions.
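A direct evaluation of the Debye formula can be sketched as follows. This is a simplified illustration rather than the FoXS code: the function name is hypothetical, and each atom is given a constant, q-independent form factor instead of the solvent-corrected fi(q) described above.

```python
import math

def debye_profile(coords, form_factors, q_values):
    """I(q) = sum_i sum_j f_i f_j sin(q d_ij) / (q d_ij), with constant f_i (assumption)."""
    n = len(coords)
    profile = []
    for q in q_values:
        total = 0.0
        for i in range(n):
            # Diagonal term (i == j): d_ij = 0 and sin(x)/x -> 1.
            total += form_factors[i] ** 2
            for j in range(i + 1, n):
                x = q * math.dist(coords[i], coords[j])
                sinc = 1.0 if x == 0.0 else math.sin(x) / x
                # Factor 2 accounts for both (i, j) and (j, i).
                total += 2.0 * form_factors[i] * form_factors[j] * sinc
        profile.append(total)
    return profile
```

The double loop makes the cost quadratic in the number of atoms for every q value, which is why avoiding repeated work across docking models matters.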
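The SAXS profile computed from the structure is fitted to the experimental SAXS profile by minimizing Χ:
$$\chi = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left( \frac{I_{exp}(q_i) - c\, I(q_i)}{\sigma(q_i)} \right)^2} \qquad (3)$$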
where Iexp(q) and I(q) are the experimental and computed profiles, respectively, σ(q) is the error of the experimental profile, M is the number of points in the profile, and c is the scaling factor.
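The fit score can be illustrated with a short sketch. The text does not state how the scaling factor c is obtained; the code below assumes it is the weighted least-squares value that minimizes Χ for a given pair of profiles, which has a closed form. The function name is hypothetical.

```python
import math

def fit_chi(i_exp, i_calc, sigma):
    """Return (chi, c) for a computed profile scaled onto an experimental one.

    Assumption: c is chosen by weighted linear least squares,
    c = sum(Iexp * I / sigma^2) / sum(I^2 / sigma^2).
    """
    weights = [1.0 / (s * s) for s in sigma]
    numerator = sum(w * e * m for w, e, m in zip(weights, i_exp, i_calc))
    denominator = sum(w * m * m for w, m in zip(weights, i_calc))
    c = numerator / denominator
    chi_sq = sum(w * (e - c * m) ** 2
                 for w, e, m in zip(weights, i_exp, i_calc)) / len(i_exp)
    return math.sqrt(chi_sq), c
```

With this choice, two profiles that differ only by a constant factor give Χ = 0, so the score reflects differences in shape rather than in absolute intensity.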
For rigid binary docking, additional speed-up is achieved by pre-computing rigid body profiles (IA, IB), made possible by constant distances for atom pairs within a rigid body. Only the contribution of inter-rigid body distances to the complex profile (IAB) is computed for each docking model by iterating over inter-molecular atom pairs in Equation 1. The profile of the docked complex is computed as the sum of three profiles: Icomplex = IA + IB + IAB.
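The decomposition into IA, IB, and IAB can be checked numerically with a small sketch. The helper names are hypothetical and, as a simplification, atoms carry constant form factors that do not change upon docking.

```python
import math

def sinc(x):
    return 1.0 if x == 0.0 else math.sin(x) / x

def self_term(coords, f, q):
    """Full double sum over atom pairs within one set of atoms (I_A or I_B)."""
    n = len(coords)
    return sum(f[i] * f[j] * sinc(q * math.dist(coords[i], coords[j]))
               for i in range(n) for j in range(n))

def cross_term(coords_a, f_a, coords_b, f_b, q):
    """I_AB: inter-body pairs only; factor 2 covers both (i, j) and (j, i)."""
    return 2.0 * sum(fa * fb * sinc(q * math.dist(a, b))
                     for a, fa in zip(coords_a, f_a)
                     for b, fb in zip(coords_b, f_b))

def complex_intensity(i_a, i_b, coords_a, f_a, coords_b, f_b, q):
    """I_complex = I_A + I_B + I_AB, reusing precomputed rigid body terms."""
    return i_a + i_b + cross_term(coords_a, f_a, coords_b, f_b, q)
```

In a docking run, `self_term` would be evaluated once per component and per q value, and only `cross_term` would be recomputed for each transformed ligand.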
For symmetric complexes, even higher speed-up can be achieved, because the symmetric complex contains multiple copies of the symmetry unit. For dihedral symmetry D2, the profile is given by Icomplex = 4IA + 2IAB + 2IAC + 2IAD (Figure 2a). For cyclic symmetry Cn, all distances between the symmetry units can be computed based on the distances between the first unit and n/2 other units in the complex. The complex profile is computed as $I_{complex} = n I_{U_1} + n \sum_{i=2}^{\lfloor n/2 \rfloor} I_{U_1 U_i} + c\, n\, I_{U_1 U_{\lfloor n/2 \rfloor + 1}}$, where Ui is unit i in the symmetric complex, c=1 if n is odd, and c=0.5 if n is even (Figure 2b).
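The bookkeeping for cyclic symmetry can be made concrete by counting how many times each unit-to-unit cross-term profile occurs in a Cn complex. The sketch below is an illustration under the same convention as IAB in binary docking (each cross term already includes both orderings of the atom pair); the function name is hypothetical.

```python
def cyclic_pair_weights(n):
    """Weights of the cross-term profiles I(U1, U_{1+k}) in a C_n complex.

    k runs over the n // 2 distinct separations around the ring. Each
    separation occurs n times, except the 'opposite' separation k = n / 2
    in an even ring, which occurs only n / 2 times (c = 0.5).
    """
    weights = {}
    for k in range(1, n // 2 + 1):
        c = 0.5 if (n % 2 == 0 and k == n // 2) else 1.0
        weights[k] = c * n
    return weights
```

As a sanity check, the weights sum to n(n−1)/2, the number of distinct unit pairs in the complex, while only n/2 (rounded down) cross-term profiles have to be computed.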
The models are clustered iteratively, as follows. The clustering starts with the docking model that has the lowest Χ score. This model becomes a representative of the current cluster and the Cα atoms in the binding site of its ligand (ie, the ligand Cα atoms within 10Å from the receptor in the docked configuration) provide the frame of reference for calculating the ligand interface Cα RMSD for each one of the remaining (unclustered) models. All models with a ligand interface Cα RMSD below 4Å are assigned to the current cluster. When the cluster can no longer be expanded, the docking model with the lowest Χ score from the unclustered set of models initiates a new cluster.
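The iterative clustering can be sketched as a greedy loop. This is an illustration only: `rmsd` is a hypothetical callback that returns the ligand interface RMSD of a model measured in the frame of the cluster representative, and cluster membership is decided against the representative alone.

```python
def greedy_cluster(scores, rmsd, cutoff=4.0):
    """Greedy clustering seeded by the best-scoring unclustered model.

    scores: one fit score per model (lower is better).
    rmsd(rep, other): hypothetical callback, RMSD of model `other`
        relative to the representative model `rep`.
    Returns a list of clusters; each cluster lists model indices,
    with the representative first.
    """
    unclustered = sorted(range(len(scores)), key=lambda i: scores[i])
    clusters = []
    while unclustered:
        rep = unclustered[0]  # lowest score among the remaining models
        members = [rep] + [m for m in unclustered[1:] if rmsd(rep, m) < cutoff]
        clusters.append(members)
        assigned = set(members)
        unclustered = [m for m in unclustered if m not in assigned]
    return clusters
```

Only the first entry of each cluster needs to be carried forward to refinement, which is how clustering reduces the number of models.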
The steric clashes, introduced by PatchDock, are removed with FireDock (Andrusier et al., 2007; Mashiach et al., 2008) that refines side chain positions and relative protein orientations. After steric clashes are removed, an energy-like function is used to rank the docking models. This interface energy score is a weighted combination of softened van der Waals, desolvation, electrostatics, hydrogen bonding, disulfide bonding, π-stacking, aliphatic interactions, and rotamer preferences (Andrusier et al., 2007).
The interface energy score and SAXS profile fitting scores (Χ values) of the final docking models are rescaled independently to the [0, 1] interval and the composite score is computed as: SComposite = SEnergy + 0.3 SSAXS, where SEnergy and SSAXS are the rescaled scores and 0.3 is the weight of the SAXS term. This weight was determined by enumerating a range of weight values to maximize the number of cases with a near-native model within the 10 top scoring models. A randomly selected half of the Benchmark 1 cases was used to determine the weight and the other half was used for validation.
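The composite score can be sketched as below. The text does not specify the rescaling method; this example assumes a linear min-max rescaling over the final set of models, with lower values being better for both terms. The function names are hypothetical.

```python
def rescale(values):
    """Linear min-max rescaling to [0, 1] (assumed; constant input maps to 0)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def composite_scores(energies, chis, saxs_weight=0.3):
    """S_Composite = S_Energy + 0.3 * S_SAXS on independently rescaled scores."""
    return [e + saxs_weight * s
            for e, s in zip(rescale(energies), rescale(chis))]

def rank_models(energies, chis, saxs_weight=0.3):
    """Model indices ordered from best (lowest composite score) to worst."""
    combined = composite_scores(energies, chis, saxs_weight)
    return sorted(range(len(combined)), key=lambda i: combined[i])
```

Rescaling each term separately keeps the 0.3 weight meaningful even though interface energies and Χ values are on very different numeric scales.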
We test the method with two types of data. First, each test case consists of unbound component structures and a simulated SAXS profile for their complex. Second, each test case consists of bound component structures and an experimentally obtained SAXS profile for their complex.
Protein-protein docking benchmark 3.0 (Hwang et al., 2008) is used for method validation with computed SAXS profiles. This benchmark contains 124 unbound-unbound test cases, classified into 88 rigid-body cases (I-RMSD ≤ 1.5Å), 19 medium-difficulty cases (1.5Å < I-RMSD ≤ 2.2Å), and 17 difficult cases (I-RMSD > 2.2Å). The complexes are also classified into three biochemical categories: enzyme–inhibitor (35 cases), antigen–antibody (25 cases), and others (64 cases). A SAXS profile is simulated using the co-crystallized structure of the complex for a q range from 0 to 0.5 Å⁻¹. For Χ calculations involving only computed profiles, the relative error is calculated from the Poisson distribution with λ of 10 and bound to 5%.
Experimental SAXS profiles and associated relative errors for 6 complexes (Table 1; Figure 3, left column) from the BIOISIS database are used (Hura et al., 2009). These cases include three symmetric dimers with cyclic symmetry, two tetramers with dihedral symmetry, and one decamer with dihedral symmetry. The dimers are docked with SymmDock starting from the monomer structure. The tetramers are also docked with SymmDock by exhaustive enumeration of C2 symmetric models (Methods). For the decamer, we start with the dimer structure and apply SymmDock to build a pentamer of dimers. BIOISIS entries include structures with modeled missing residues. These residues are used for SAXS calculations, but not in docking.
An assessment criterion similar to that from CAPRI is used (Lensink et al., 2007). A docking model is considered acceptable (one star) if the ligand Cα RMSD (L-RMSD) after superposition of the receptor is below 10Å or the interface Cα RMSD (I-RMSD) is below 4Å. A docking model is of medium accuracy (two stars) if L-RMSD < 5Å or I-RMSD < 2Å, and of high accuracy (three stars) if L-RMSD < 1Å or I-RMSD < 1Å. A docking model of acceptable or better accuracy is referred to as near-native. For symmetric complexes, Cα RMSD is computed after least-squares-fit superposition of the model on the native complex. A symmetric docking model is considered near-native if its Cα RMSD is below 5Å.
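The star thresholds above translate directly into a small classifier. The function names are hypothetical; RMSD values are in Å.

```python
def accuracy_stars(l_rmsd, i_rmsd):
    """Return 3 (high), 2 (medium), 1 (acceptable) or 0 (incorrect)."""
    if l_rmsd < 1.0 or i_rmsd < 1.0:
        return 3
    if l_rmsd < 5.0 or i_rmsd < 2.0:
        return 2
    if l_rmsd < 10.0 or i_rmsd < 4.0:
        return 1
    return 0

def is_near_native(l_rmsd, i_rmsd):
    """A model of acceptable or better accuracy counts as near-native."""
    return accuracy_stars(l_rmsd, i_rmsd) >= 1
```

Checking the strictest category first ensures that a model satisfying several thresholds is assigned the highest accuracy it qualifies for.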
We begin by assessing the accuracy of the radius of gyration computed from a SAXS profile, followed by quantifying the match between an experimental SAXS profile and a SAXS profile computed for the native complex. Finally, we assess FoXSDock by its performance on the two benchmarks.
We first assess to what degree the radius of gyration (RGexp) computed from the SAXS profile of the complex fits the radius of gyration (RG3D) of the complex structure. This analysis is used to find the threshold values for coarse SAXS filtering stage. In Benchmark 1 with simulated SAXS profiles, we compared the RGexp to the RG3D of the best possible docking models (Table S1). The best possible docking model is constructed by superposing unbound components to the complex structure. In Benchmark 2 with experimental SAXS profiles, the RGexp is compared with the RG3D of the complex structures (Table 1). In Benchmark 1, the RGexp is predicted with 2.18% accuracy (average) for cases with less than 3% missing residues (81 cases out of 124). The fractional difference in the number of residues in the complexes with bound and unbound structures is referred to as the fraction of missing residues. We conclude that RG measure is not very sensitive to conformational changes upon binding. Thus, it is possible to compute an accurate RG3D even when using unbound components for docking. In Benchmark 2, the RGexp can be up to ~7% larger than the RG3D. One possible explanation is that the hydration layer of a protein is not taken into account when computing RG3D from the coordinates of protein atoms.
Based on the numbers above, the thresholds for coarse SAXS filtering by RGexp are set to 0.9RGexp and 1.04RGexp (ie, a docking model is filtered out if its RG3D is more than 10% smaller or 4% larger than the RGexp). The RG3D of 119 (out of 124) complexes of Benchmark 1 and all the complexes of Benchmark 2 is within these thresholds (0.9RGexp ≤ RG3D ≤ 1.04RGexp). In the remaining 5 cases of Benchmark 1, the fraction of missing residues is more than 5% or large conformational changes are involved (I-RMSD > 8Å).
For Benchmark 1, the profile computed from the complex structure is compared to the profile computed from the best possible docking model of unbound components (Table S1). The best possible docking model is constructed by superposing unbound components to the complex structure. The accuracy of the profile fit is assessed as a function of the fraction of missing residues and the I-RMSD between the bound and unbound component structures (Figure 4). As expected, Χ increases with the increase in the fraction of missing residues and I-RMSD.
For Benchmark 2, experimental SAXS profiles are compared with profiles computed from the complex structures. In all cases, except 1YEM and 2E2G, a good fit is observed (ie, the experimental and computed profiles overlap for q < 0.2 Å⁻¹; Figure 3a). The difference between the experimental and computed profiles for 1YEM might be explained by the modeling error for the residues missing in the crystallographic structure as well as by the difference between the solution and crystal structures. For 2E2G, an additional possible cause for the profile mismatch includes the differences between the experimental profile measured for PF1033 from P. furiosus and the profile computed from the homologous structure 2E2G (57% sequence identity).
Next, we assess each stage of the method to gain a better appreciation of the contribution of each stage to the final accuracy. The goal of each stage is to output as many good scoring near-native models as possible, while eliminating as many non-native models as possible. However, the emphasis on these two aspects changes with the progress through the flowchart. In the initial stages, the priority is to produce as many near-native models as possible, while in the later stages the priority is to rank them highly.
The average frequency of near-native models among the output models is 0.0026%, varying from 0 to 1755 models, with the average of 331 near-native models per case (Table S2). The average is higher in the rigid-body cases (413 models) than in the medium and difficult cases (217 and 30 models, respectively). Near-native models are found for all benchmark cases, except for one medium difficulty case and six difficult cases (ie, 117 out of 124 benchmark cases have a model of acceptable or better accuracy after rigid docking with PatchDock). Moreover, 96 cases include at least one model of medium accuracy and 58 cases include a model of high accuracy.
About one third of all models are eliminated in this stage. Nevertheless, in most cases, near-native models are not filtered out and the average hit rate increases to 0.0037% (Table S2). The average number of near-native models does not change relative to the global search stage (316 versus 331). All near-native models are filtered out only in three cases (1I4D, 1I2M, 1R8S), due to a large error in the computed RGexp resulting from a high number of missing residues (Table S1). In practice, missing residues can be accounted for by decreasing RGexp thresholds and the weight of the SAXS component in the composite score.
Ideally, the profile fitting score (Χ) should be correlated with the accuracy of the model below some usefully large radius of convergence (ie, I-RMSD of ~5Å or L-RMSD of ~10Å). We examine whether or not such a “funnel” exists for each case in Benchmark 1 (Figure 5, first column; Figure S1). Some cases show a clear funnel, such as the first two cases in Figure 5 (1BVN and 1DFJ) with low Χ value models corresponding to near-native structures. Others have additional model clusters with low Χ values (Figure 5, 1CGI and 1TMQ), resulting from widely different configurations with similar overall shapes. For example, if the ligand has a globular shape, all the complexes with the correct ligand-binding site on the receptor have a low Χ, irrespective of the ligand orientation (Figure 5, 1CGI and 1E6E). If the receptor is symmetric, there are a number of ligand clusters with a low Χ value (eg, three clusters for the triangular receptor shape; Figure 5, 2O8V). There are also cases with no funnel at all (Figure 5, 1E6E and 2O8V). However, even in these cases, the scores of near-native complexes are significantly lower than average, so the profile still provides valuable information that eliminates a large number of non-native complexes.
We also examine the accuracy of coarse filtering by RGexp compared to that by Χ. Plots of the Χ score versus RG3D colored by the corresponding accuracy of the model show that both SAXS-based criteria eliminate many non-native models, while retaining the near-native ones (Figures 5 and S1).
Clustering eliminates more than 80% of models; the average number of models after clustering is 19,860. The filtered models are ranked according to the Χ value. Overall, there is a top scoring model of acceptable or better accuracy in 24 (19%) cases of the benchmark (Tables 2, S3). Considering the top 10 ranked models, 54 (44%) cases correspond to an acceptable or better model. In 79 (64%) cases, there is a near-native model among the top 100 predictions. In the remaining cases, the rank is in the range from 100 to 5000 (35 cases); a near-native model is not found at all in only 10 cases (in 7 cases it is not produced by docking, and in 3 cases it is eliminated by the RGexp filtering). Out of the 124 cases in the benchmark, 21 and 63 include high and medium accuracy models, respectively.
We refine 5000 models with the lowest Χ scores after clustering. The models are re-ranked according to the composite score, corresponding to the sum of the energy-based score and Χ. Energy-based scoring brings new information into the protocol, improving model ranking compared to the previous stage. There is a top-scoring model of acceptable or better accuracy in 32 (26%) cases of the benchmark. In 67 (54%) cases, there is an acceptable or better model among the top 10 predictions (Table 2). In 88 cases, there is a near-native model among the top 100 predictions; in 22 cases, the rank is worse or no near-native model is found (14 cases). The accuracy of the models also improves; there are 27 cases with high accuracy models after refinement.
Next, we examine the success of the protocol in different complex categories.
The best performance is obtained for enzyme-inhibitor complexes (26 out of 35 cases have a near-native model among the 10 best scoring models), followed by antibody-antigen complexes (17 out of 25 cases have a near-native model among the 10 best scoring models). For other protein-protein complexes, the success rate is lower (only 24 out of 64 cases include near-native models among the 10 best scoring models). Many of these cases have high numbers of missing residues (more than 3% of residues are missing in 33 cases). In such cases, additional improvement might be achieved by accurate modeling of the missing residues.
If we consider only 88 rigid-body cases of the benchmark, the success rate is higher than the overall average. There is a near-native model among the 10 best scoring models in 66% of the cases. Difficult cases require explicit modeling of the backbone flexibility that was not performed in this work. However, the protocol presented here can in principle process docking models from flexible docking as well.
65 of the 88 rigid-body cases have less than 3% missing residues. For this subset, our success rate is the highest, with a near-native model among the 10 best scoring models in 77% of the cases.
We compare FoXSDock to the standard docking protocol by PatchDock (Duhovny et al.) and FireDock without SAXS profiles (Andrusier et al., 2007). In standard docking, rigid docking models are created with PatchDock and 5000 top scoring models are refined and re-ranked by FireDock. The only difference between the two protocols is that FoXSDock uses a higher configurational sampling precision, resulting in an increase from 8.2 × 10³ to 1.6 × 10⁵ sampled models per complex (it would be computationally too expensive to refine all of these models by FireDock). The success rate of FoXSDock is much higher than that of standard docking (Table 2). The top-scoring model is near-native in only 12 cases for standard docking, compared to 32 cases for FoXSDock. The number of cases with a near-native model among the top 10 models doubles from 33 to 67. The accuracy is also improved; there are 27 cases with high accuracy models compared to 14 without using a SAXS profile.
The performance of FoXSDock on Benchmark 2 is qualitatively similar to that for Benchmark 1 (Table 3). As expected, rigid docking finds a near-native model in all cases (Columns 2-4 in Table 3); the hit rates are higher compared to Benchmark 1 cases, because the search space is restricted to symmetric complexes only. As before, coarse SAXS filtering by RGexp significantly enriches for near-native models (Columns 5-7 in Table 3). There is a strong funnel in the plot of Χ versus Cα RMSD for three cases (Figure 4; 2DVM, 2G4J, 1DQK). Coarse SAXS filtering keeps most of the near-native models (indicated by grey horizontal lines in Figure 4). After Χ scoring for cases with funnels, a near-native model scores best for 2 cases (2DVM, 2G4J) and 2nd best for 1 case (1DQK), while no near-native models are ranked highly for the remaining 3 cases. The rank improves significantly after refinement by FireDock. Five out of six cases have a near-native model among the top 10 models, three of them at the very top. In one case (2E2G, already discussed above), in which the docked structure is a homolog of the protein for which the experimental profile was measured, FoXSDock fails to rank a near-native model among the top 10. It is possible that this case requires additional rigid docking sampling or accurate comparative modeling, because there are only six near-native models among the initial SymmDock-produced models.
Three key points emerge from this study. First, incorporation of a SAXS profile into rigid docking requires increasing configurational sampling precision of docking. Second, a SAXS profile helps to achieve a significant improvement over a standard docking protocol. Third, while helpful, a SAXS profile still provides only limited information about the complex shape; thus, the accuracy of the scoring function for selecting near-native models is a major remaining problem. We discuss each of these points in turn.
External information, such as a SAXS profile, binding site residues, symmetry, and distance restraints, is ideally incorporated directly into the configurational and/or conformational search algorithm (Andre et al., 2007; Schneidman-Duhovny et al., 2005a; van Dijk et al., 2005). While a SAXS profile provides information about the atomic distance distribution within the complex, generating all docking models with a specified tolerance around a given distance distribution is a challenging problem. Therefore, we take an alternative approach in this work. The configurational sampling precision of rigid docking is increased, followed by filtering out the models that are not consistent with the SAXS profile. The five search and rank stages of FoXSDock are designed to benefit from the increasingly focused molecular docking search space afforded by the knowledge of the SAXS profile. In contrast, most other molecular docking protocols include external information in a single filtering step following the global search.
The method is tested on a large benchmark with computed profiles and six cases with experimental SAXS profiles. Including a SAXS profile helps to achieve a significant improvement over a standard docking protocol: the number of cases with a near-native model at the top almost triples (from 10% to 26%), and the number doubles if we accept a near-native model within the top 10 scoring models (from 27% to 54%). In rigid-body cases with less than 3% missing residues, FoXSDock can find a model close to the native structure within the top 10 predictions in 77% of the cases, compared to 34% for docking alone. Increasing the configurational sampling precision also helps to improve model accuracy: there are almost twice as many cases with high-accuracy models (27 compared to 14 without using a SAXS profile).
There are three major reasons for failure to produce a near-native model within the top 10 scoring models. First, large conformational changes upon binding do not allow finding a near-native model in the global search (6% of the cases). Second, a SAXS profile is sensitive to missing residues: the fraction of interatomic distances missing from a SAXS profile is almost double the fraction of the structure that is missing. Third, the scoring function cannot rank a near-native model high, i.e., the rank of a near-native model is between 10 and 5000 (35% of the cases; Table 2).
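One way to see why the missing distances are almost double the missing part is to count atom pairs. This is our reading of the statement rather than a derivation given in the text: if a fraction f of the atoms is absent, the fraction of pairs that survives is roughly (1 - f)^2, so the fraction lost is 2f - f^2, which is close to 2f for small f. The function names below are hypothetical.

```python
def missing_pair_fraction(f):
    """Approximate fraction of atom pairs lost when a fraction f of atoms is missing.

    Assumes a large number of atoms, so the pair count scales as N**2.
    """
    return 1.0 - (1.0 - f) ** 2

def missing_pair_fraction_exact(n_total, n_missing):
    """Exact fraction of unordered atom pairs lost for finite atom counts."""
    n_present = n_total - n_missing
    total_pairs = n_total * (n_total - 1) // 2
    present_pairs = n_present * (n_present - 1) // 2
    return 1.0 - present_pairs / total_pairs

# With 3% of the atoms missing, about 5.9% of the pairwise distances are lost.
approx = missing_pair_fraction(0.03)
exact = missing_pair_fraction_exact(1000, 30)
```

Under this counting argument, the 3% threshold on missing residues used above corresponds to roughly 6% of the distances being absent from a computed profile.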
In summary, we present a method that efficiently combines molecular docking and fitting to a SAXS profile. We expect FoXSDock to be useful in a variety of applications, such as docking with comparative models, which is becoming increasingly common at CAPRI (Janin, 2007), flexible docking (Schneidman-Duhovny et al., 2007), and determining structures of multi-component macromolecular assemblies (Inbar et al., 2005; Karaca et al., 2010; Lasker et al., 2009). FoXSDock can also be applied when a SAXS profile is measured for a mixture of the complex and unbound components. In this case, coarse filtering by RGexp is not possible, and the SAXS scoring stage has to fit a weighted sum of the computed profiles of the unbound components and a docking model to the experimental profile (Konarev et al., 2003).
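The mixture fit can be sketched as follows. This is an illustrative sketch only, under several assumptions: the unbound components are represented by a single pre-summed profile, the mixing weight w is found by a simple grid search, the overall scale factor c is fitted by weighted least squares, and χ is taken as the root mean square of the error-normalized residuals. The function names and the toy intensities are hypothetical, and this is not the procedure of Konarev et al. (2003).

```python
import math

def chi(exp_i, model_i, sigma):
    """Chi between an experimental and a model profile.

    The model is first multiplied by the scale factor c that minimizes
    the weighted squared residuals (closed-form least squares).
    """
    num = sum(e * m / s ** 2 for e, m, s in zip(exp_i, model_i, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(model_i, sigma))
    c = num / den
    resid = sum(((e - c * m) / s) ** 2 for e, m, s in zip(exp_i, model_i, sigma))
    return math.sqrt(resid / len(exp_i))

def fit_mixture(exp_i, sigma, complex_i, unbound_i, steps=100):
    """Grid-search the weight w of the complex in w*complex + (1-w)*unbound.

    Returns (best_w, best_chi). `steps` sets the grid resolution.
    """
    best_w, best_chi = None, float("inf")
    for k in range(steps + 1):
        w = k / steps
        mix = [w * a + (1.0 - w) * b for a, b in zip(complex_i, unbound_i)]
        score = chi(exp_i, mix, sigma)
        if score < best_chi:
            best_w, best_chi = w, score
    return best_w, best_chi

# Toy profiles (made-up intensities at five q values).
complex_profile = [100.0, 80.0, 50.0, 20.0, 5.0]
unbound_profile = [60.0, 55.0, 40.0, 25.0, 10.0]
experimental = [0.7 * a + 0.3 * b for a, b in zip(complex_profile, unbound_profile)]
errors = [1.0] * len(experimental)
w_fit, chi_fit = fit_mixture(experimental, errors, complex_profile, unbound_profile)
```

In this noise-free toy case the grid search recovers the weight of 0.7 used to build the synthetic profile. With real data, each docking model would supply its own computed complex profile, and the models would be ranked by the best χ obtained over w.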
Because the accuracy of the scoring function is still a major bottleneck, FoXSDock can be improved further by integrating more external information in addition to the SAXS profile. This information includes binding site residues determined by mutation, conservation analysis, NMR spectroscopy and other approaches; distance restraints determined from cross-linking, hydrogen-deuterium exchange, NMR spectroscopy or FRET spectroscopy; and a density map from electron microscopy. In this way, FoXSDock will contribute to maximizing the accuracy, precision, coverage, and efficiency of the structural characterization of macromolecular assemblies.
Source code and executables for SAXS profile calculation and fitting are available as part of the IMP software package (http://salilab.org/imp). A standalone version of FoXS is also available for download and as a web-server from http://salilab.org/foxs. The two benchmarks and the protocol scripts will be available at http://salilab.org/foxs. PatchDock and FireDock are available from http://bioinfo3d.cs.tau.ac.il.
We thank Hiro Tsuruta, David Agard, Bill Weis, and Dmitry Svergun for discussions about SAXS, as well as Ben Webb and Daniel Russel for help with IMP. DSD has been funded by the Weizmann Institute Advancing Women in Science Postdoctoral Fellowship. We acknowledge support from NIH R01 GM083960, NIH U54 RR022220, NIH PN2 EY016525, and Rinat (Pfizer) Inc. SIBYLS beamline at Lawrence Berkeley National Laboratory is supported by the DOE program Integrated Diffraction Analysis Technologies (IDAT). We are also grateful for computer hardware gifts from Ron Conway, Mike Homer, Intel, Hewlett-Packard, IBM, and Netapp.