J Mol Biol. Author manuscript; available in PMC 2009 October 17.

Published in final edited form as:

Published online 2008 July 31. doi: 10.1016/j.jmb.2008.07.074

PMCID: PMC2745287

NIHMSID: NIHMS71752

Friedrich Förster,^{1,}^{*} Benjamin Webb,^{1} Kristin A. Krukenberg,^{2} Hiro Tsuruta,^{3} David A. Agard,^{4,}^{*} and Andrej Sali^{1,}^{*}

The publisher's final edited version of this article is available at J Mol Biol

A major challenge in structural biology is to determine the configuration of domains and proteins in multidomain proteins and assemblies, respectively. To maximize the accuracy and precision of these models, all available data should be considered. Small angle X-ray scattering (SAXS) efficiently provides low-resolution experimental data about the shapes of proteins and their assemblies. Thus, we integrated SAXS profiles into our software for modeling proteins and their assemblies by satisfaction of spatial restraints. Specifically, we model the quaternary structures of multidomain proteins with structurally defined rigid domains as well as quaternary structures of binary complexes of structurally defined rigid proteins. In addition to SAXS profiles and the component structures, we employ stereochemical restraints and an atomic distance-dependent statistical potential. The scoring function is optimized by a biased Monte Carlo protocol, including quasi-Newton and simulated annealing schemes. The final prediction corresponds to the best scoring solution in the largest cluster of many independently calculated solutions. To quantify how well the quaternary structures are determined based on their SAXS profiles, we used a benchmark of 12 simulated examples as well as an experimental SAXS profile of the homo-tetramer D-xylose isomerase. Optimization of the SAXS-dependent scoring function generally results in accurate models, if sufficiently precise approximations for the constituent rigid bodies are available; otherwise, the best scoring models can have significant errors. Thus, SAXS profiles can play a useful role in the structural characterization of proteins and assemblies, if they are combined with additional data and used judiciously. Our integration of a SAXS profile into modeling by satisfaction of spatial restraints will facilitate further integration of different kinds of data for structure determination of proteins and their assemblies.

A comprehensive structural description of proteins, nucleic acids, and their assemblies will help us discover the principles that underlie cellular processes and bridge the gaps between genome sequencing, functional genomics, proteomics, and systems biology ^{1}^{; }^{2}. While X-ray crystallography and NMR spectroscopy can provide accurate high-resolution structures, these methods are limited by the difficulties in protein purification, stability of large complexes, crystallization (X-ray), and size (NMR). Single particle cryo-electron microscopy (cryo-EM) generally does not provide atomic-resolution structures and currently cannot be applied to systems smaller than approximately 250 kDa. While efficient, computational protein structure prediction methods are limited by their accuracy. These difficulties may be overcome by computational methods that effectively combine experimental, theoretical, and statistical information ^{2}^{; }^{3}.

Small angle X-ray scattering (SAXS) can rapidly provide low-resolution information about the shape of a macromolecule or a complex in solution ^{4}^{; }^{5}^{; }^{6}^{; }^{7}. A SAXS measurement determines the molecule's rotationally averaged scattering intensity as a function of spatial frequency, *I*(*q*), typically at 1-3 nm resolution ^{5}^{; }^{7}. This profile can be readily transformed into an electron pair distance distribution function *P*(*r*), which is essentially a histogram of all pairwise distances *r* of the electrons in the sample. Due to the rotational averaging, the information content of a SAXS profile is dramatically reduced compared to an X-ray crystallographic diffraction pattern or even a density map from cryo-EM. However, one of the advantages of SAXS is its applicability to a wide range of assemblies; its applications range from DNA fragments ^{8} to whole virions ^{9}. Moreover, data collection and processing are very rapid (typically, from seconds to minutes), allowing high-throughput analyses of a large number of samples at many conditions. These advantages are in stark contrast to crystallography, cryo-EM, and NMR spectroscopy. The ease of altering solution conditions makes SAXS ideal for mapping structural differences between varied conformational states of a macromolecular system; if the structure of one conformational state is known, even the relatively sparse information content of a SAXS profile can be sufficient to determine structural rearrangements, such as hinge-motions in proteins ^{10}^{; }^{11}^{; }^{12}.

Information from SAXS can in principle be incorporated into the modeling process in two ways. First, a SAXS profile can be used to assess different models that were produced based on other considerations. For example, experimental SAXS profiles have been used to choose one of the many different quaternary structure arrangements produced by molecular docking of the son-of-sevenless domains ^{13}. Similarly, simulations indicated that SAXS profiles can be used to choose a close-to-native solution from a large set of alternative homology models of a given protein ^{14}.

Second, a SAXS profile can be used during the model building stage itself. The first such calculation of a model based on a SAXS profile relied on representing a macromolecular surface using spherical harmonics ^{15}. However, this representation has a relatively low resolution, thus leading to the development of alternative methods. Due to the limited information content of SAXS profiles, virtually all subsequently developed methods have aimed to integrate additional information into structure determination to reduce the manifold of solutions consistent with a given SAXS profile to a usefully small number.

Depending on the nature and resolution of additional information, different representations have been proposed. Early approaches modeled the molecule's envelope enforcing compactness ^{16}. Early coarse-grained approaches represented the macromolecule as an assembly of unconnected beads on a grid ^{17}^{; }^{18}^{; }^{19}. This representation enforces an overall mass by using a required number of beads; geometrical symmetry may also be incorporated through symmetric sampling. In addition, compactness of the models is ensured by restricting the sampling to the vicinity of a compact initial model ^{17}^{; }^{18} or by including appropriate terms into the scoring function ^{19}^{; }^{20}. Other modeling approaches represent a protein as a chain of beads rather than a set of disconnected beads on a grid ^{20}. In a recent application, atomic models were fitted into 6-fold symmetric bead reconstructions to gain insights into domain rearrangements of the AAA-ATPase p97 during its functional cycle ^{10}. Higher resolution *a priori* structural information about some parts of the protein can also be integrated into the modeling process by focusing the conformational sampling only on the undefined segments, such as loops ^{21}, or on the configuration of structurally defined domains and their flexible linkers ^{22}. For example, rigid body modeling was applied to give qualitative insights into the conformation of the polypyrimidine binding protein ^{23}. Recently, SAXS profiles have been incorporated into the modeling of protein structures based on NMR-derived restraints, which significantly increased the accuracy of models for multidomain proteins compared to the models based on NMR profiles alone ^{24}. In addition, the inclusion of simulated SAXS profiles into folding simulations led to models of small helical proteins ^{25}.

Here, we describe the newly developed SAXS module in *Integrative Modeling Platform* (IMP), our software platform for modeling macromolecular assemblies by satisfaction of spatial restraints (http://salilab.org/imp) ^{3}^{; }^{26}. This integration in turn allowed us to combine SAXS profiles with various other types of data already used by IMP. Specifically, we present a protocol for modeling multidomain proteins and complexes. In particular, the protocol calculates the quaternary structures of multidomain proteins with structurally defined rigid domains as well as quaternary structures for complexes of structurally defined rigid proteins. In addition to a SAXS profile and rigid-body constraints, we employ stereochemical restraints derived from a molecular mechanics force field ^{27}, a simple non-bonded atom pair term ^{28}, the atomistic distance-dependent statistical potential DOPE ^{29}, and an optional symmetry-enforcing term ^{26}. We quantify the performance of the protocol using a benchmark of simulated examples. Finally, to test the method in a realistic setting, we also model the quaternary structure of the homotetramer D-xylose isomerase (XI) based on an experimental SAXS profile. The method already revealed large domain rearrangements between the nucleotide-free and the nucleotide-bound forms of *Escherichia coli* Hsp90 ^{12}.

We developed a method to calculate atomic models of proteins and their assemblies that are consistent with experimental SAXS profiles as well as other spatial restraints. The solution is found by optimizing a scoring function that quantifies how consistent a model is with the SAXS profile and the other restraints. Next, the method is described by specifying its three components: (*i*) the representation of the modeled system, (*ii*) the terms that contribute to the scoring function, and (*iii*) the optimization protocol. We also describe how to evaluate the ensemble of models obtained from independent optimizations of the scoring function.

The system is represented by its *N*_{at} non-hydrogen atoms. A major problem that needs to be overcome is the large size of the search space compared to the amount of input data. To reduce the number of degrees of freedom, the model is partitioned into

The scoring function is defined as:

$$S={S}_{\mathrm{stereo}}+{S}_{\mathrm{overlap}}+{S}_{\mathrm{DOPE}}+{S}_{\mathrm{sym}}+{S}_{\mathrm{SAXS}}.$$

(1)

*S*_{stereo} accounts for the inter-rigid body stereochemical features of the atoms in the protein (

SAXS experiments measure the rotationally averaged X-ray scattering of the macromolecule under scrutiny. The measured quantity is a one-dimensional curve that gives the scattered intensity *I*_{exp} as a function of the momentum transfer *q*=(4π/λ)sin(*θ*), where λ is the wavelength of the incident X-ray beam and 2*θ* is the scattering angle.
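As a quick numeric illustration of the definition of *q*, the following minimal sketch converts a scattering angle into momentum transfer. The function name and the example angle are hypothetical; the wavelength in the example is the value quoted for the XI measurements in this article.

```python
import math

def momentum_transfer(two_theta_deg, wavelength_angstrom):
    """Return q = (4*pi/lambda) * sin(theta) in inverse angstroms.

    two_theta_deg is the full scattering angle 2*theta, in degrees.
    """
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi / wavelength_angstrom * math.sin(theta)

# Example (hypothetical angle): 2*theta = 1 degree at lambda = 1.381 angstrom.
q = momentum_transfer(1.0, 1.381)
```

For the small angles relevant to SAXS, sin(*θ*) is close to *θ*, so *q* grows almost linearly with the scattering angle.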

Similar to previous approaches to modeling a macromolecule based on its SAXS profile ^{17}^{; }^{18}^{; }^{20}^{; }^{24}, we score a model based on the deviation between the calculated (*I*_{m}) and experimental SAXS profiles (*I*_{exp}):

$${\chi}^{2}=\frac{1}{Q}\sum _{k=1}^{Q}\frac{1}{{\sigma}_{\text{exp}}^{2}({q}_{k})}\cdot {\left({I}_{\text{exp}}({q}_{k})-c\cdot {I}_{\text{m}}({q}_{k})\right)}^{2},$$

(2)

where *k* denotes the index of the measured frequency *q*, *Q* is the total number of frequencies, and *σ*_{exp} is the experimental error. The relative scaling of *I*_{m} with respect to *I*_{exp} cannot be determined precisely because the protein concentrations generally cannot be measured with sufficient accuracy. Thus, the profile *I*_{m} is scaled by a constant *c*, which is chosen by minimizing *χ*^{2} (Supplementary Theory and Methods).
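The fit in Eq. (2) can be sketched in a few lines. Because *χ*^{2} is quadratic in *c*, setting its derivative with respect to *c* to zero gives the standard weighted least-squares scale factor used below. This is a minimal sketch, not the IMP implementation: the function names are hypothetical, and the Supplementary procedure may differ in detail.

```python
def optimal_scale(i_exp, i_model, sigma):
    """Weighted least-squares scale c that minimizes chi^2 of Eq. (2)."""
    num = sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_model, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(i_model, sigma))
    return num / den

def chi_square(i_exp, i_model, sigma, c=None):
    """chi^2 of Eq. (2): error-weighted squared residual averaged over Q points."""
    if c is None:
        c = optimal_scale(i_exp, i_model, sigma)
    q_count = len(i_exp)
    return sum(((e - c * m) / s) ** 2
               for e, m, s in zip(i_exp, i_model, sigma)) / q_count
```

Calling `chi_square` without `c` scores a model profile at its best-fitting scale, so two models are compared independently of the unknown protein concentration.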

Moreover, we require *S*_{SAXS} to be comparable in size to the other four types of terms in

$${S}_{\mathrm{SAXS}}={N}_{\mathrm{at}}{k}_{B}\cdot T\cdot {\chi}^{2}$$

(3)

where *k*_{B} is the Boltzmann constant and

The experimental SAXS profile of a macromolecule in solution is the difference between the scattering of the solution with and without the protein. Thus, the profile is approximately equal to the difference between the scattering intensities of the macromolecule and the solvent in the same volume. This model neglects the approximately 3 Å thin hydration layer around the macromolecule, which has a slightly higher density than bulk water. We neglected the hydration layer because the density difference between the hydration layer and bulk water is relatively small (<0.060 e^{-}Å^{-3}) compared to the densities of water (0.334 e^{-}Å^{-3}) and protein (approximately 0.440 e^{-}Å^{-3}) and because the volume of the hydration layer is small compared to that of the macromolecule. We assess the error due to neglecting the hydration layer in Results by comparing our profiles to those using the program CRYSOL, which does account for the hydration layer.

We utilize the Debye Formula ^{30} to calculate the SAXS profile *I*_{m} of a given atomic model of a protein or assembly:

$${I}_{\text{m}}(q)=\sum _{i=1}^{{N}_{A}}\sum _{j=1}^{{N}_{A}}{f}_{i}(q)\phantom{\rule{0.2em}{0ex}}{f}_{j}(q)\frac{\text{sin}(q{d}_{ij})}{q{d}_{ij}},$$

(4)

where *f*_{i}(
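A direct evaluation of the Debye formula (Eq. (4)) can be sketched as follows. This is a minimal illustration under stated assumptions, not the IMP code: the function names are hypothetical, the form factors are passed in as a precomputed table `form_factors[i][k]` for atom *i* at frequency *q*_{k} (how they are obtained is not modeled here), and the diagonal terms use the limit sin(*x*)/*x* = 1 at *x* = 0.

```python
import math

def sinc(x):
    """sin(x)/x with its limiting value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def debye_profile(coords, form_factors, q_values):
    """Evaluate I_m(q_k) of Eq. (4) by summing over all atom pairs.

    coords:       list of (x, y, z) positions in angstroms
    form_factors: form_factors[i][k] is f_i(q_k) (assumed given)
    q_values:     list of q_k in inverse angstroms
    """
    n = len(coords)
    profile = []
    for k, q in enumerate(q_values):
        total = 0.0
        for i in range(n):
            for j in range(n):
                d = math.dist(coords[i], coords[j])
                total += form_factors[i][k] * form_factors[j][k] * sinc(q * d)
        profile.append(total)
    return profile
```

The cost of this direct evaluation grows with the square of the number of atoms times the number of frequencies, which is why the calculation is described as computationally demanding.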

For optimization of *S*, we need the gradient of *χ*^{2}. After some algebra, we obtain the partial derivative of *χ*^{2} (Eq. (2)) with respect to the *x*-coordinate of atom *i* using Eq. (4):

$$\frac{\partial}{\partial {x}_{i}}{\chi}^{2}=\frac{4c}{Q}\cdot \sum _{j\ne i}^{{N}_{A}}\frac{{x}_{i}-{x}_{j}}{{d}_{ij}^{2}}\sum _{k=1}^{Q}\frac{{I}_{\text{exp}}({q}_{k})-c\cdot {I}_{\text{m}}({q}_{k})}{{\sigma}_{\text{exp}}^{2}({q}_{k})}\cdot {f}_{i}({q}_{k})\cdot {f}_{j}({q}_{k})\cdot \left(\frac{\text{sin}({q}_{k}{d}_{ij})}{{q}_{k}{d}_{ij}}-\text{cos}({q}_{k}{d}_{ij})\right).$$

(5)

The derivatives with respect to *y* and *z* are equivalent to those for *x*.
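The analytic gradient can be checked numerically. The sketch below differentiates Eq. (2) through Eq. (4) and can be compared against finite differences. It is an illustration under stated assumptions: all names are hypothetical, form factors are supplied as a table `f[i][k]`, the 1/*Q* normalization of Eq. (2) is kept in both the score and its gradient, and *c* is held fixed during differentiation (this is legitimate when *c* minimizes *χ*^{2}, because the derivative of *χ*^{2} with respect to *c* then vanishes).

```python
import math

def _sinc(x):
    return 1.0 if x == 0.0 else math.sin(x) / x

def model_profile(coords, f, qs):
    """Debye sum of Eq. (4) for every q_k."""
    n = len(coords)
    return [sum(f[i][k] * f[j][k] * _sinc(q * math.dist(coords[i], coords[j]))
                for i in range(n) for j in range(n))
            for k, q in enumerate(qs)]

def chi2(coords, f, qs, i_exp, sigma, c):
    """chi^2 of Eq. (2) at a fixed scale factor c."""
    i_m = model_profile(coords, f, qs)
    return sum(((e - c * m) / s) ** 2
               for e, m, s in zip(i_exp, i_m, sigma)) / len(qs)

def chi2_gradient(coords, f, qs, i_exp, sigma, c):
    """Analytic d(chi^2)/d(x_i, y_i, z_i) for every atom i."""
    i_m = model_profile(coords, f, qs)
    # Error-weighted residual at each q_k, shared by all atom pairs.
    w = [(e - c * m) / s ** 2 for e, m, s in zip(i_exp, i_m, sigma)]
    n, grad = len(coords), []
    for i in range(n):
        g = [0.0, 0.0, 0.0]
        for j in range(n):
            if j == i:
                continue
            d = math.dist(coords[i], coords[j])
            radial = sum(w[k] * f[i][k] * f[j][k]
                         * (_sinc(q * d) - math.cos(q * d))
                         for k, q in enumerate(qs))
            for axis in range(3):
                g[axis] += (coords[i][axis] - coords[j][axis]) / d ** 2 * radial
        grad.append([4.0 * c / len(qs) * x for x in g])
    return grad
```

Because every pair contributes equal and opposite terms to its two atoms, the gradient components sum to zero over all atoms, reflecting the translational invariance of the SAXS profile.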

The calculation of both *χ*^{2} and its derivatives is computationally demanding. However, we can substantially shorten the computation time for both quantities by approximating the modeled system with atoms of different scattering masses but equal shape (Supplementary Theory and Methods). Using this approximation, the computation time is reduced by a factor equal to the number of frequencies at which *I*(*q*) is sampled, typically 100. Even at the resolution of *q*_{max}=1Å

The rigidity constraints imply that atoms belonging to the same rigid body *b*_{l} are not allowed to move with respect to each other. Thus, all forces within

In IMP, we provide the option to add more than one SAXS profile term to *S* (Supplementary Theory and Methods). In addition to the profile of the macromolecule under scrutiny, SAXS profiles of the gold-labeled macromolecule or profiles of subsets of the original system can be acquired. When the corresponding structures are conserved in these constructs, these profiles provide additional information about the macromolecule.

We optimize the model with respect to *S* to obtain an ensemble of good-scoring solutions (Fig. 1). To sample different minima of *S*, many independent optimizations are carried out starting from different random initial configurations. We differentiate between the global search mode, in which we sample 1000 different initial configurations, and the local search mode, in which we sample 100 initial configurations in the vicinity of a specific configuration. In both cases, the resulting models are clustered to make the final prediction.

Flowchart for the modeling protocol. As input, we need a SAXS profile, an initial model, the definition of rigid bodies, and optionally the symmetry. Initialization yields *N* random configurations, which are subsequently optimized in three stages. For **...**

Starting from an initial model of a monomeric protein or a complex (*e.g.*, a crystal structure or a homology model), the domains and the connecting linker residues (monomeric protein) or the proteins (complex) are individually rotated and translated by random values *ΔΦ* and *ΔT*, respectively. Depending on the magnitude of these parameters, the protocol explores the immediate neighborhood of the initial model (local search mode) or the whole space of solutions (global search mode). We chose *ΔT*=40Å and *ΔΦ*=180° for global sampling and *ΔT*=1Å and *ΔΦ*=2° for local sampling. In the case of monomers, the domains are initially brought into relative vicinity by optimization with respect to *S*_{stereo} using the method of conjugate gradients
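The random initialization of a rigid body can be sketched as below. This is a minimal illustration, not the IMP protocol: the function names are hypothetical, and the sampling scheme (a uniformly random axis, an angle drawn uniformly from [-*ΔΦ*, *ΔΦ*], and a per-axis shift drawn uniformly from [-*ΔT*, *ΔT*]) is an assumption, since the text specifies only the magnitudes. The two presets reuse the values stated above.

```python
import math
import random

GLOBAL_SEARCH = {"max_shift": 40.0, "max_angle_deg": 180.0}  # delta-T (angstrom), delta-Phi
LOCAL_SEARCH = {"max_shift": 1.0, "max_angle_deg": 2.0}

def random_rotation_matrix(max_angle_deg, rng):
    """Rotation by a random angle about a random axis (Rodrigues formula)."""
    axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(a * a for a in axis))
    ux, uy, uz = (a / norm for a in axis)
    t = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    c, s, v = math.cos(t), math.sin(t), 1.0 - math.cos(t)
    return [[c + ux * ux * v, ux * uy * v - uz * s, ux * uz * v + uy * s],
            [uy * ux * v + uz * s, c + uy * uy * v, uy * uz * v - ux * s],
            [uz * ux * v - uy * s, uz * uy * v + ux * s, c + uz * uz * v]]

def perturb_rigid_body(coords, max_shift, max_angle_deg, rng):
    """Rotate a rigid body about its centroid, then translate it."""
    n = len(coords)
    center = [sum(p[a] for p in coords) / n for a in range(3)]
    rot = random_rotation_matrix(max_angle_deg, rng)
    shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
    moved = []
    for p in coords:
        rel = [p[a] - center[a] for a in range(3)]
        moved.append(tuple(sum(rot[a][b] * rel[b] for b in range(3))
                           + center[a] + shift[a] for a in range(3)))
    return moved
```

A call such as `perturb_rigid_body(domain_coords, rng=random.Random(0), **GLOBAL_SEARCH)` yields one randomized placement; internal distances within the body are unchanged, consistent with the rigidity constraints.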

Next, in Stage I of optimization, a coarse relaxation of the model is carried out with 10 cycles of a quasi-Newton method ^{33}, relying on the scoring function consisting only of *S*_{stereo},

In Stage II, we further optimize the model with respect to the sum of *S*_{stereo},

In Stage III, an additional 25 Monte Carlo steps are carried out to optimize the models with respect to the complete *S*, including *S*_{DOPE}.

On average, more than 200 different models are generated for each of the 1,000 initial independent optimizations, resulting in a total of ~200,000 evaluated models. On a Linux-based cluster of 100 processors, the computations take from a few hours to two days, depending on the size of the modeled system.

Because each optimization of a random starting configuration samples only a fraction of the configuration space, it generally ends in a local minimum. Thus, we rank all solutions by *S*_{SAXS} and retain only the top 10% for further analysis. These solutions are hierarchically clustered
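The filtering and clustering step can be sketched as follows. This is a minimal illustration with hypothetical names: the linkage criterion (complete linkage here) and the distance cutoff are assumptions, and the pairwise distance matrix (for example, Cα RMSD between models) is assumed to be precomputed.

```python
def top_fraction(models, scores, fraction=0.10):
    """Keep the best-scoring fraction of models (lower score is better)."""
    keep = max(1, int(len(models) * fraction))
    order = sorted(range(len(models)), key=lambda i: scores[i])
    return [models[i] for i in order[:keep]]

def cluster_complete_linkage(dist, cutoff):
    """Agglomerative clustering on a symmetric distance matrix.

    Repeatedly merges the two clusters whose farthest members are closest,
    and stops once that distance exceeds the cutoff.
    Returns clusters as lists of indices, largest first.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > cutoff:
            break
        _, a, b = best
        clusters[a] += clusters.pop(b)
    return sorted(clusters, key=len, reverse=True)
```

The largest cluster is then the first element of the returned list, from which a representative model can be picked by score.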

To assess the method with statistical rigor, we applied it to a benchmark set of 12 known structures with calculated SAXS profiles. The benchmark includes 9 multidomain proteins (Tables 1, S1); their domains are treated as rigid bodies in our calculations. The domain definitions for the native structures were taken from the CATH database. Domains for five of the 9 proteins were represented by experimentally determined structures with identical sequences in a different state. Domains for the other 4 proteins were modeled by comparative modeling based on related structures. The alignments for comparative modeling were obtained from our comprehensive database of structural alignments, DBAli ^{35}. The models were built with the ‘automodel’ class in MODELLER-9.0. Except for 1o0vA, the models cover the whole sequence. The benchmark also includes 3 protein complexes, each consisting of 2 proteins (Table 2), which were obtained from ‘Docking Benchmark 2.0’ ^{36}. Here, the rigid bodies for each protein corresponded to crystallographic structures of the same sequence in a different state ^{36}.

Benchmark of multidomain proteins. ^{a} We modeled 9 different proteins (targets) using rigid bodies for the specified amino-acid residue ranges, connected by flexible linkers. ^{b} The rigid bodies corresponded to comparative models built using the specified **...**

We aimed to map the accuracy of the predicted configurations as a function of rigid body accuracy. We modeled the configurations of rigid bodies as described above, employing the global and local search modes. For comparison, we also created an initial model with all rigid bodies superposed onto their counterparts in the native structure (RBNC, rigid bodies in the native configuration) using Chimera ^{37} and optimized the linker segments with respect to *S*_{stereo} and

The models were compared to the native state using two measures: (i) Cα RMSD with respect to the native structure and (ii) rigid-body translation *Δr* and rotation *Δ*α of the rigid bodies relative to their positions in the native state. The Cα RMSD was calculated using the ‘superpose’ command of MODELLER-9.0. For calculation of *Δr* and *Δ*α, the reference frame was defined by first superposing the largest rigid body from the model on its equivalent fragment in the native structure. Next, each of the remaining rigid bodies was rotated around its center of mass and subsequently translated such that it superposed with the equivalent part of the native structure. The corresponding rotation and translation define *Δ*α and *Δr*, respectively. If the model consisted of more than two rigid bodies, we computed the mean values of *Δ*α and *Δr* to characterize the similarity between two configurations of rigid bodies, always using the largest rigid body to define the reference frame.
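Once the superposition of each remaining rigid body onto the native structure has been determined, the two measures reduce to simple quantities of the rotation matrix and translation vector. The sketch below (hypothetical names) assumes those transforms are already available from a separate least-squares superposition, which is not shown.

```python
import math

def rotation_angle_deg(rot):
    """Rotation angle of a 3x3 rotation matrix, from its trace."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    cos_a = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against round-off
    return math.degrees(math.acos(cos_a))

def mean_rigid_body_deviation(transforms):
    """Mean rotation (degrees) and translation (angstroms) over rigid bodies.

    transforms: one (rotation_matrix, translation_vector) pair for each
    rigid body other than the largest one, which defines the reference frame.
    """
    angles = [rotation_angle_deg(rot) for rot, _ in transforms]
    shifts = [math.sqrt(sum(x * x for x in t)) for _, t in transforms]
    return sum(angles) / len(angles), sum(shifts) / len(shifts)
```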

We prepared XI solutions with concentrations of 0.55, 1.1, 2.7, and approximately 20 mg/ml from XI crystals (Hampton Research, Aliso Viejo, CA) and a buffer solution containing 10 mM Hepes (*p*H 7.4) and 150 mM NaCl. SAXS profiles were recorded at Beam Line 4-2 at the Stanford Synchrotron Radiation Laboratory ^{38}: Each solution was placed in a cuvette, which was maintained at 20 °C and located 2.5 m from a MarCCD165 detector (MarUSA, Evanston, IL). Twenty 15-second exposures (X-ray wavelength λ=1.381 Å) were acquired in series for each concentration. For a range of concentrations, we obtained approximately constant radii of gyration *R*_{G} in the *R*_{G}·*q*_{min}-*R*_{G}·*q*_{max} range of 0.37-1.27, indicating no protein aggregation or changes in the homotetramer quaternary structure: *R*_{G} of 32.7±0.4 Å at 0.55 mg/ml, 32.9±0.2 Å at 1.1 mg/ml, and 32.8±0.1 Å at 2.7 mg/ml; these *R*_{G} values are similar to those reported previously ^{39}. SAXS profiles were computed from the scattering images using MarParse ^{38}, and profiles recorded at 2.7 mg/ml (0.01 Å^{-1} < *q* < 0.10 Å^{-1}) and 20 mg/ml (0.055 Å^{-1} < *q* < 0.27 Å^{-1}) concentrations were scaled and merged using Primus ^{40}. The complete profiles ranged from *q*=0.01 Å^{-1} to *q*=0.27 Å^{-1}. We finally resampled the profile on a uniform grid of 100 mesh points using linear interpolation.
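The final resampling step can be sketched as follows: a merged profile on an irregular *q* grid is interpolated linearly onto a uniform grid. The function name is hypothetical, and the sketch assumes strictly increasing *q* values; it is not the code used for the measurements.

```python
def resample_uniform(q, intensity, n_points=100):
    """Linearly interpolate a profile onto n_points evenly spaced q values.

    q must be strictly increasing; the new grid spans q[0] to q[-1].
    Returns (grid, resampled_intensity).
    """
    q_min, q_max = q[0], q[-1]
    step = (q_max - q_min) / (n_points - 1)
    grid, resampled, j = [], [], 0
    for k in range(n_points):
        # Pin the last point to q_max to avoid round-off overshoot.
        x = q_min + k * step if k < n_points - 1 else q_max
        # Advance to the input interval [q[j], q[j+1]] that contains x.
        while j < len(q) - 2 and q[j + 1] < x:
            j += 1
        t = (x - q[j]) / (q[j + 1] - q[j])
        grid.append(x)
        resampled.append(intensity[j] + t * (intensity[j + 1] - intensity[j]))
    return grid, resampled
```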

We determined that it is not necessary to include the hydration shell of a protein in the Debye model for calculating SAXS profiles because the inclusion of the hydration shell has a much smaller impact on *χ*^{2} than the errors in an experimentally measured SAXS profile. Specifically, we compared the experimentally measured lysozyme SAXS profile with the profiles computed by our method and CRYSOL ^{41} (PDB 6lyz) (Fig. 2). The profiles calculated by our program and by CRYSOL agree with the experimental profile within its error, though the CRYSOL profile is a slightly better fit (*χ*^{2}=0.20 *versus χ*^{2}=0.26). We also determined that using a single consensus atomic shape has no notable effect on the accuracy of calculated SAXS profiles; that is, the *χ*^{2} difference between profiles calculated with the consensus and specific atomic shapes is approximately 0.001 (Supplementary Theory and Methods).

Accuracy of the Debye model for calculating SAXS profiles. We compare an experimental profile of lysozyme (black, obtained from http://www.embl-hamburg.de/ExternalInfo/Research/Sax/crysol.html) to the profiles calculated using our methodology (green) **...**

We first illustrate the method by its application to monomeric Diphtheria toxin (Fig. 3A, PDB code 1mdt). We calculated the SAXS profile of the crystallographic structure: The *q* values ranged from *q*_{min}=0.02 Å

Modeling Diphtheria toxin using a simulated SAXS profile. A: Native monomeric Diphtheria toxin (PDB code 1mdtA) has two domains (blue and red). B: We approximated these domains by their structures in the dimeric form (PDB code 2ddtA). C: The sum of *S*_{DOPE} **...**

In our approach, the predicted quaternary structure depends primarily on the experimental *S*_{SAXS} term and the modeling terms

We further analyzed the models obtained from the optimization of *S*. Using Cα RMSD, we hierarchically clustered the 10% of models with the best *χ*^{2} (Fig. 4A). The models clearly cluster into 3 distinct groups, separated by more than 12.5 Å Cα RMSD from each other. Each cluster is represented by the model with the best *S* (Fig. 4B-D). Although all models correspond to distinct configurations, their shapes and calculated profiles are similar, as expected from the nature of the SAXS restraint. The model from cluster III (Fig. 4D) has the lowest *S* and the lowest SAXS penalty (*χ*^{2}=1.3 compared to 1.5 and 1.4 for the models from clusters I and II, respectively; Fig. 4E) and is closest to the native state in terms of its Cα RMSD (1.4 Å compared to 10.4 and 14.0 Å, respectively). Interestingly, cluster III does not constitute the largest cluster nor does it include the models with the lowest *S*_{DOPE} (Supplementary Table S1). Thus, in this case

To assess the modeling protocol with statistical rigor, we applied it to our benchmark set of 9 multidomain proteins (Theory and Methods). The benchmark comprised proteins with 2 domains (5 cases), 3 domains (3 cases), and 4 domains (1 case) (Tables 1, S1). For five proteins, the domains were represented by experimentally determined structures, which are identical in sequence but part of a different assembly. The rigid bodies of the remaining four proteins were modeled by comparative modeling based on related template structures. Thus, the benchmark covered different scenarios in terms of protein size and available structural information.

After optimization and clustering, the cluster with the most accurate model (in terms of Cα RMSD) was termed the most accurate cluster. In the global search mode, the best-scoring models in the most accurate cluster were close to the native state (Cα RMSD < 4 Å) for 6 proteins, of medium accuracy for one protein (Cα RMSD = 12.8 Å), and of low accuracy in only two cases (Cα RMSD > 18 Å) (Tables 1, S1). When crystallographic domain structures were used, the resulting configurations were always highly accurate. In contrast, when comparative models were used, high accuracy configurations could only be obtained for 1cb6 and medium accuracy configurations for 1iknA. In our benchmark, these proteins also possessed the highest sequence identity to their template structures (63% and 46%, respectively). It is well established that the accuracy of a comparative model is correlated with the sequence identity on which it is based ^{42}. Therefore, not surprisingly, the accuracy of whole protein models produced by our method is correlated with the sequence identity of the structural template for the modeling of the individual domains.

In real applications, the native structure of the modeled system is not available. Thus, we do not know which of the generated models is most accurate and must select one based on some criterion. Possible criteria include the *S* and *S*_{SAXS} (

In almost all cases, the local search produced more accurate models than the global search. Only for 1mdtA did the local search result in a less accurate model than the global search, consistent with the largest distortion in the initial structure relative to the native state (Cα RMSD > 20 Å) among all benchmark cases. Remarkably, for the four-domain protein 1cb6 (Cα RMSD of the initial model is 8.2 Å), local sampling yielded more accurate results than global sampling, although global sampling resulted in models with lower scores. Thus, in this case, the near-native configurations are within the radius of convergence for the local sampling, whereas the global minimum, which is not a near-native configuration, is not. Based on the examples of 1mdtA and 1cb6, we conclude that the radius of convergence for local sampling is on the order of 10 Å.

We also applied our protocol to 3 binary protein complexes (Theory and Methods). The accuracy of binary complex models is generally lower than that for the individual two-domain proteins (Table 2). The highest accuracy was achieved for 1avxAB, whose best-scoring models had Cα RMSD of approximately 10 Å. This target was classified as ‘easy’ in Docking Benchmark 2.0 ^{36}. The high Cα RMSD is due largely to inaccurate relative orientation of the two proteins (*Δ*α = 81.6°); in comparison, the relative position is quite accurate (*Δr* = 5.1 Å). Interestingly, local sampling in the vicinity of the RBNC state resulted in a model similar to the configuration using global sampling, which also scored better than the RBNC state according to *S* and *χ*^{2}, but not *S*_{DOPE} (Table S2).

The results for target 1ibr are similar to those for 1avx; the best scoring models were different from the native state and scored better than the RBNC state. The target 1ibr is classified as ‘difficult’ in Docking Benchmark 2.0 ^{36}.

For the third example, 1fq1 (‘difficult’ in Docking Benchmark 2.0 ^{36}), the largest three clusters produced by the global sampling were of similar size and different from the native state (Cα RMSD > 30 Å). Moreover, the individual *S*, *S*_{SAXS}, and

If we assemble the native protein structures instead of their models, the local sampling around the native state always finds a configuration close to the native state (Cα RMSD < 1.2 Å); moreover, the optimized *S*, *S*_{SAXS}, and

To test our methodology on experimental rather than simulated profiles, we modeled the quaternary structure of D-xylose isomerase (XI) from *Streptomyces rubiginosus*. The model was calculated using three different approximations for the monomers: (*i*) the native subunit structure in the complex, (*ii*) a comparative model based on 4xim as a template (67% sequence identity), and (*iii*) a comparative model derived from 1a0d (28% sequence identity). The best possible superpositions of four copies of the monomer comparative models onto the native structure of the XI complex, *i.e.*, the Cα RMSD values of the RBNCs, were 2.7 Å and 5.1 Å for 4xim and 1a0d, respectively.

To reproduce the 222 symmetry of the XI tetramer consisting of identical subunits A, B, C, and D, we added a symmetry term *S*_{sym} to

We acquired an experimental XI SAXS profile ranging from *q*_{min}=0.01 Å

Modeling of the XI tetramer, using monomer models based on a range of sequence identity with the templates. ^{a}: Clusters of all models are characterized by their relative sizes and the minimum Cα RMSD's. ^{b}: From each cluster, the best-scoring model **...**

For the native monomer, the largest cluster contains the models with the lowest score *S* as well as the most native-like models (Cα RMSD = 3.5Å), indicating an accurate prediction. However, if we only consider *S*_{SAXS}, the best scoring models are not closest to the native state (Fig. 5A-D), illustrating the positive role played by the other terms in the combined scoring function

Models of XI and their calculated SAXS profiles compared to the experimental profile. To facilitate visual comparison of the different models, chain A (red) of all models is oriented identically. A: Native XI (χ^{2}=17.7; PDB code 1xib). B: The XI **...**

The calculation with the 4xim-based monomer models follows the same trends as those with the native subunit (Table 3 and Fig. 5E). Therefore, in a realistic setting, our modeling protocol would have determined correctly the quaternary arrangement of four subunits of the XI monomer, using its experimental SAXS profile and a high accuracy subunit model based on 67% sequence identity to the template structure.

In contrast, the results for the comparative model based on the remotely related 1a0d are qualitatively different. In this case, the largest cluster does not contain the most native-like models (Cα RMSD = 49.1 Å). If we select the cluster according to *S* or *S*_{DOPE}, the model is closer to the native state, albeit far from it in terms of Cα RMSD (15.3 Å). Nevertheless, this model still predicts correctly many of the residues at the subunit interfaces: 14% of the native contacts (the number of correctly predicted residue–residue contacts in the model divided by the number of contacts in the native structure) are identified correctly, which is considered ‘acceptable’ in blind assessments of protein docking methods at CAPRI meetings
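The fraction of native contacts defined in parentheses above can be computed with simple set arithmetic. The sketch below uses hypothetical names and assumes that the interface contacts have already been listed as residue pairs; the distance criterion that defines a contact is not specified here and is left to the caller.

```python
def fraction_native_contacts(native_contacts, model_contacts):
    """Correctly predicted contacts divided by the number of native contacts.

    Each contact is a pair of residue identifiers, e.g. (("A", 10), ("B", 55));
    the order of the two residues within a pair does not matter.
    """
    native = {frozenset(pair) for pair in native_contacts}
    model = {frozenset(pair) for pair in model_contacts}
    return len(native & model) / len(native)
```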

We now analyze the extent to which model accuracy is limited by sampling and scoring; sampling is limiting when configurations close to the native state are not generated during optimization and scoring is limiting when the most native-like configurations do not correspond to the global minimum of *S*.

For three of our benchmark cases (1ha0, 1ibr, and 1ko9), we plotted the best score (*S*_{min}), the corresponding Cα RMSD (RMSD(*S*_{min})), and the minimum Cα RMSD (RMSD_{min}) for a set of structures resulting from global sampling against the number of independent optimizations (Fig. 6). For the two-domain protein 1ha0, *S*_{min}, RMSD(*S*_{min}), and RMSD_{min} do not improve substantially beyond 100 independent optimizations. Moreover, global sampling performs as well as local sampling starting from the RBNC (Fig. 6A; Supplementary Table S1). For the protein complex 1ibr, *S*_{min} and RMSD(*S*_{min}) reach a plateau at approximately 100 optimizations. RMSD(*S*_{min}) asymptotically reaches the value for local sampling around the RBNC, which is well above 10 Å, showing that a non-native configuration scores better than the RBNC (Fig. 6B; Supplementary Table S2). The values for RMSD_{min} asymptotically approach a value of approximately 3 Å (Cα RMSD of RBNC is 2.3 Å), indicating that near-native configurations are sampled, but do not score well. In summary, we observed for all two-component systems (two-domain proteins and binary complexes) that the best-scoring models scored approximately the same as the refined RBNC configurations; therefore, sampling does not appear to be limiting the accuracy of our predictions in these cases.
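The three quantities tracked in this analysis can be computed from a list of optimization results as in the following sketch. It assumes that each independent optimization yields one final score and one Cα RMSD to the native state, and that the runs are taken in the order given; the function name and the toy numbers are hypothetical.

```python
def convergence_curves(scores, rmsds):
    """Running S_min, RMSD(S_min) and RMSD_min versus number of optimizations.

    scores[i] and rmsds[i] are the final score and the C-alpha RMSD (in A) of
    the i-th independent optimization. Entry n-1 of each returned list
    summarizes the first n optimizations.
    """
    if len(scores) != len(rmsds):
        raise ValueError("scores and rmsds must have the same length")
    s_min, rmsd_at_s_min, rmsd_min = [], [], []
    best_score = float("inf")
    best_score_rmsd = None
    best_rmsd = float("inf")
    for score, rmsd in zip(scores, rmsds):
        if score < best_score:
            best_score, best_score_rmsd = score, rmsd
        best_rmsd = min(best_rmsd, rmsd)
        s_min.append(best_score)
        rmsd_at_s_min.append(best_score_rmsd)
        rmsd_min.append(best_rmsd)
    return s_min, rmsd_at_s_min, rmsd_min


# Toy data resembling a scoring-limited case: a near-native model (3 A) is
# sampled, but the best-scoring model stays far from the native state.
scores = [5.0, 3.0, 4.0, 2.5]
rmsds = [12.0, 15.0, 3.0, 14.0]
print(convergence_curves(scores, rmsds))
```

A gap between RMSD(*S*_{min}) and RMSD_{min} that persists as more optimizations are added points to the scoring function, rather than the sampling, as the limiting factor.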

Scoring *versus* sampling. A: The minimum score *S*_{min} (left), the corresponding Cα RMSD (middle), and the minimum Cα RMSD for all models (RMSD_{min}; right) are plotted as a function of the number of independent optimizations for the benchmark **...**

For the three-domain protein 1ko9, *S*_{min}, RMSD(*S*_{min}) and RMSD_{min} improve slowly with an increase in the number of independent optimizations, beginning to reach a plateau at 400 independent optimizations. *S*_{min} from global optimization exceeds *S*_{min} of the models obtained by refining the RBNC, indicating that in this case the accuracy of the predicted quaternary structures is limited by sampling. Similar results are obtained for the other three- and four-domain proteins; the global minimum of *S* could only be reached using global sampling if we performed at least 1,000 independent optimizations. However, when a sufficiently accurate starting configuration is available (*eg*, from a similar template protein), highly accurate configuration models can be obtained using only 100 independent optimizations (Cα RMSD < 2 Å) (Supplementary Table S1).

We incorporated information from a SAXS profile into protein structure modeling by satisfaction of spatial restraints, implemented in our *Integrative Modeling Platform* (IMP) ^{3; 26}. We then benchmarked IMP by modeling quaternary structures of multidomain proteins and protein assemblies using rigid domains and proteins, respectively. We discuss here (i) the relationship between our method and those of others, (ii) the benefits of integrating protein structure modeling and SAXS fitting, (iii) the limitations arising from inaccurate scoring, imperfect sampling, and errors in rigid bodies, as well as (iv) the scope for integration of additional information into the modeling process.

Recently, experimental SAXS profiles have also been used to calculate atomic models of proteins by BUNCH ^{22} and CNS, which relies on NMR-derived restraints as well as a SAXS profile ^{24}. We now outline similarities and differences between these two approaches and ours.

Our SAXS penalty is similar to the score implemented previously in CNS ^{24}. Both scores employ the Debye formula to treat the excluded solvent and rely on χ^{2} as the SAXS penalty. Moreover, they both calculate the partial derivatives, allowing them to use gradient-based optimization techniques. However, the computation of the SAXS penalty and its derivative in our approach is significantly faster than in the original implementation because we employ the electron pair distribution function *P*(*r*) for the calculation of *χ*^{2} and its derivative (Supplementary Theory and Methods). The gain in efficiency depends on the granularity of *I*(*q*) sampling. For example, we reduce the computation time by two orders of magnitude for a dataset with 100 data points, and by more for finer-sampled profiles. This gain in computational speed does not reduce the precision of the calculated SAXS profiles by more than the typical precision of experimental SAXS profiles. Moreover, we can gain additional efficiency relative to CNS through the use of rigid bodies, which required changes in the calculation of the partial derivatives of *S*_{SAXS} (Supplementary Material). The gains in computational efficiency are important because they allowed us to sample the space of possible solutions more exhaustively.
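The source of the speed-up can be sketched in Python as follows. The sketch is illustrative only: it assumes constant (*q*-independent) form factors, ignores the excluded solvent, and uses a simple scaled χ^{2}; the actual formulation is given in the Supplementary Theory and Methods, and all function names here are hypothetical. The direct Debye sum visits every atom pair for every *q* value, whereas the *P*(*r*) route visits the atom pairs once to build a distance histogram and then only sums over histogram bins for each *q*.

```python
import math


def sinc(x):
    """sin(x)/x with the limit 1 at x = 0."""
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x


def debye_direct(coords, f, qs):
    """Direct Debye sum: I(q) = sum_ij f_i f_j sin(q r_ij) / (q r_ij)."""
    n = len(coords)
    return [
        sum(f[i] * f[j] * sinc(q * math.dist(coords[i], coords[j]))
            for i in range(n) for j in range(n))
        for q in qs
    ]


def pair_distribution(coords, f, bin_width=0.1):
    """Histogram of f_i f_j over pair distances, computed once per structure."""
    hist = {}
    n = len(coords)
    for i in range(n):
        hist[0] = hist.get(0, 0.0) + f[i] * f[i]      # self terms, r = 0
        for j in range(i + 1, n):
            k = round(math.dist(coords[i], coords[j]) / bin_width)
            hist[k] = hist.get(k, 0.0) + 2.0 * f[i] * f[j]
    return hist


def profile_from_pr(hist, qs, bin_width=0.1):
    """I(q) from the binned pair distribution; cost per q scales with bins."""
    return [sum(w * sinc(q * k * bin_width) for k, w in hist.items())
            for q in qs]


def chi_square(i_exp, sigma, i_calc):
    """Mean squared weighted residual after a least-squares scale factor c."""
    c = (sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
         / sum(m * m / s ** 2 for m, s in zip(i_calc, sigma)))
    return sum(((e - c * m) / s) ** 2
               for e, m, s in zip(i_exp, i_calc, sigma)) / len(i_exp)


coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 0.0, 7.1)]
f = [1.0] * len(coords)
qs = [0.0, 0.1, 0.3, 0.5]
direct = debye_direct(coords, f, qs)
binned = profile_from_pr(pair_distribution(coords, f), qs)
print(max(abs(a - b) for a, b in zip(direct, binned)))
```

The bin width controls the trade-off between speed and precision: a coarser histogram has fewer bins to sum over but shifts each pair distance by up to half a bin width.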

Our SAXS penalty is different from the penalty in BUNCH ^{22}, which is calculated by CRYSOL ^{41} and up-weights high frequency components in χ^{2}. While we also tested such a scoring function, it did not result in a significant improvement for our benchmark. For example, we modeled the diphtheria toxin using a χ^{2} that weights frequencies according to *q*^{2}. The corresponding *S*_{SAXS} term did not result in a more accurate model if used only in conjunction with

Our optimization protocol consists of independent minimizations of the scoring function from many random starting configurations, with the aim to sample the entire configuration space (for the global optimization mode) (Fig. 1). For each minimization, we use a simulated annealing biased-Monte Carlo algorithm; each Monte Carlo step is followed by a local quasi-Newton relaxation, for which the first derivatives of the scoring function are needed. Thus, the biased-Monte Carlo process samples only local minima. In contrast, BUNCH ^{22} employs a conventional simulated annealing Monte Carlo protocol, in which the sampling is not restricted to local minima. However, in many applications, such as X-ray crystallography ^{46}, NMR spectroscopy ^{47}, comparative protein structure modeling ^{28}, and *ab initio* structure prediction of proteins ^{48} and assemblies ^{49}, optimization methods that employ the first derivatives are known to be significantly more efficient than Monte Carlo schemes. Thus, we efficiently implemented and used the first derivatives in our optimization.
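The structure of one such minimization can be illustrated on a toy problem. The sketch below is not the IMP implementation: it uses a one-dimensional double-well function in place of the scoring function, a fixed-step gradient descent as a stand-in for the quasi-Newton relaxation, and a geometric temperature schedule; all names and parameter values are hypothetical. It keeps the feature described above, namely that every Monte Carlo move is relaxed to a local minimum before the Metropolis test, so only local minima are compared.

```python
import math
import random


def toy_score(x):
    """Double-well toy score: global minimum near -1.30, local near 1.13."""
    return x ** 4 - 3.0 * x ** 2 + x


def toy_gradient(x):
    return 4.0 * x ** 3 - 6.0 * x + 1.0


def local_minimize(grad, x, step=0.01, max_iter=2000, tol=1e-9):
    """Fixed-step gradient descent (stand-in for a quasi-Newton relaxation)."""
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


def monte_carlo_minimization(f, grad, x0, steps, t_start, t_end, rng):
    """Simulated annealing over local minima; returns the best (x, score)."""
    x = local_minimize(grad, x0)
    e = f(x)
    best_x, best_e = x, e
    for k in range(steps):
        # Geometric cooling from t_start to t_end.
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        # Random move, kept inside [-2.5, 2.5], followed by local relaxation.
        proposal = min(2.5, max(-2.5, x + rng.gauss(0.0, 1.0)))
        trial = local_minimize(grad, proposal)
        e_trial = f(trial)
        # Metropolis criterion applied to the relaxed configurations.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / t):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


rng = random.Random(0)
print(monte_carlo_minimization(toy_score, toy_gradient, 1.5, 50, 1.0, 0.01, rng))
```

Starting at 1.5, the local relaxation alone would stop in the shallower well near 1.13; the Monte Carlo moves allow the search to cross the barrier into the deeper well.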

A perennial problem in structure modeling is whether or not an optimization scheme finds all the good scoring solutions. To at least partly address this problem, we run many minimizations in parallel and independently on a large computer cluster (*ie*, hundreds of nodes). The resulting large sample of solutions is then clustered, to present them more parsimoniously for subsequent analysis. By construction, the structures in one cluster are generally distinct from the structures in another cluster; they involve dissimilar interfaces and have Cα-RMSD values worse than 12 Å (Figs. 4, 5). An analysis of the sampling shows that our protocol usually finds the global minimum for up to four rigid bodies (below); thus, most good-scoring local minima are also expected to be sampled in these cases. A large computer cluster with at least 100 processors is currently needed for an efficient use of our method, perhaps limiting its practical utility. Nevertheless, such computing clusters are becoming increasingly available to many users. In addition, our software is being adapted to run on graphics processing units, such as Nvidia's Tesla with 240 processors (http://www.nvidia.com/object/tesla_c1060.html), which might enable efficient application on a single desktop computer.
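The clustering and selection step can be sketched as follows. This is an assumption-laden illustration rather than the procedure used in the paper: it applies single-linkage clustering at a 12 Å cutoff, chosen here because it guarantees that models in different clusters are more than 12 Å apart, and then returns the best-scoring model of the largest cluster. The function names and the toy RMSD matrix are hypothetical.

```python
def cluster_models(rmsd, cutoff=12.0):
    """Single-linkage clusters from a symmetric pairwise RMSD matrix.

    Two models end up in the same cluster if they are connected by a chain of
    pairs with RMSD <= cutoff. Clusters are returned largest first.
    """
    n = len(rmsd)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if rmsd[i][j] <= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


def select_model(rmsd, scores, cutoff=12.0):
    """Index of the best-scoring (lowest score) model in the largest cluster."""
    largest = cluster_models(rmsd, cutoff)[0]
    return min(largest, key=lambda i: scores[i])


# Toy example: models 0-2 are mutually similar, models 3-4 form a second group.
rmsd = [
    [0.0, 4.0, 5.0, 30.0, 31.0],
    [4.0, 0.0, 3.0, 29.0, 30.0],
    [5.0, 3.0, 0.0, 32.0, 33.0],
    [30.0, 29.0, 32.0, 0.0, 4.0],
    [31.0, 30.0, 33.0, 4.0, 0.0],
]
scores = [3.0, 2.0, 2.5, 1.0, 4.0]
print(cluster_models(rmsd), select_model(rmsd, scores))
```

In the toy data, model 3 has the lowest score overall but belongs to the smaller cluster, so model 1 is selected.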

Integrative computational methods can exploit various kinds of spatial information to determine the assembly structures at higher accuracy and precision than is possible based on each individual type of data ^{3; 26}; in conjunction, pieces of data that are relatively uninformative by themselves can still result in accurate and precise models of proteins and assemblies.

Here, we combined a SAXS profile with information about protein structures that can be calculated only from their sequence. Specifically, we supplemented the SAXS term (*S*_{SAXS}) by the penalties for steric clashes (

Our method does not necessarily predict unique best-scoring solutions; due to the low information content of the input restraints, models from different clusters can have comparable scores. For example, *S*_{SAXS} is completely invariant to rotations of a spherical rigid body. For the benchmark case 1cb6, models from the near-native cluster and a non-native cluster have similar scores (Supplementary Table S1). Nevertheless, due to the combination of different types of information, the number of distinct configurations compatible with the input data is generally much smaller compared to using only a single type of information (

Our benchmark allows us to assess the limitations of the protocol and highlight opportunities for future research. Modeling by optimization depends on two conditions: (*i*) the scoring function needs to have a global minimum at the native or near-native state and (*ii*) the sampling needs to be sufficiently thorough to find the global or near-global minimum. Therefore, we tested the degree to which our method is limited by the accuracy of the scoring function and the thoroughness of sampling. We also asked how accurate the rigid bodies need to be so that the scoring function still has the global minimum at the native state. The assessment of sampling allowed us to judge when a global search without reliance on a suitable initial structure can be successful; or, conversely, when we need a sufficiently accurate initial model so that at least a near-native state can be found by local sampling alone. We also analyzed the accuracy of the method as a function of the number of rigid bodies, which allows us to further qualify sampling and scoring limitations.

For systems of two rigid bodies (*ie*, two-domain proteins and binary protein complexes), even the global sampling produced numerous configurations close to the global minimum of the scoring function *S* (Fig. 6A, B; Supplementary Tables S1 and S2). Thus, the accuracy of these models is largely determined by the accuracy of *S*. The global minimum of *S* corresponded to the native or near-native state only if the rigid bodies were not too distorted (*ie*, Cα RMSD of less than 3 Å). Therefore, as expected, the accuracy of the rigid bodies crucially influences the landscape of *S*.

Among the individual terms of *S*, it is not surprising that *S*_{DOPE} sometimes favors non-native configurations if sufficiently distorted rigid bodies are used. Errors in the positions of exposed atoms interfere with their packing, which is the aspect scored by

For three or more rigid bodies, the typical number of independent optimizations we used (1,000) was insufficient to reliably find the global minimum of *S* in a global search (Fig. 6C). For four domains (benchmark case 1cb6), we increased the number of initial configurations to 5,000, requiring 3 days on 200 CPUs. Such exhaustive sampling is impractical without a large computer cluster. However, configurations close to the global minimum of *S* could be found at dramatically reduced computational cost employing local sampling, if the template (initial) configuration was sufficiently close to the native state (Cα RMSD < 10 Å). In such a case, the local search is more efficient than the global search because no computing time is wasted on searching far from the native configuration. In the future, the development of computationally more expensive and sophisticated sampling strategies may allow sampling the configurations of five or more domains with a larger radius of convergence than that of our present optimization.
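As a rough worked estimate of the cost quoted for 1cb6, the following sketch converts the reported wall-clock time into CPU time per optimization. It assumes that all 200 CPUs were fully used for the entire 3 days, which the text does not state, so the result is an upper bound of sorts; the function name is hypothetical.

```python
def cpu_hours_per_optimization(n_optimizations, wall_days, n_cpus):
    """Average CPU hours per optimization, assuming full use of every CPU."""
    total_cpu_hours = wall_days * 24 * n_cpus
    return total_cpu_hours / n_optimizations


# 5,000 initial configurations, 3 days, 200 CPUs (values from the text).
print(cpu_hours_per_optimization(5000, 3, 200))
```

Under this assumption the run amounts to 14,400 CPU hours in total, or roughly 2.9 CPU hours per independent optimization.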

To overcome the limitations on prediction accuracy arising from rigid body errors, we probably need to abandon the rigid body approximation. In the field of protein-protein docking, simultaneous sampling of different component conformations and their configuration has been described ^{49}. Similar approaches could be used for fitting a configuration to a SAXS profile, requiring both more sophisticated scoring functions and sampling algorithms than those described here. The use of high-angle scattering data might allow us to compute atomic structures more accurately, but this goal would require flexible modeling of the atomic structures. Structural changes at the domain level result in signals at frequencies beyond *q* = 0.5 Å^{-1}, which we did not consider in our calculations because we represented domains and proteins as rigid bodies.

Our protocol currently relies on the sample protein or complex existing in a single state. However, proteins and complexes can exist in equilibrium among different states, corresponding to varied packing between secondary structure segments, domains, and proteins as well as variations in unstructured regions such as long domain linkers. An approach to score a given ensemble of models has been proposed recently ^{51}. Our protocol could also be extended to optimize an ensemble of models using a similar score. However, the increased computational effort as well as the limited information in SAXS profiles will limit such approaches.

Given the relatively low information content of a SAXS profile and the limitations in our protein structure modeling terms, incorporating additional information into the scoring function *S* is desirable. Using supplementary information is further justified by the sensitivity of *S* to rigid body errors and possible systematic experimental errors in a SAXS profile (*eg*, due to aggregation of macromolecules in solution). Further integration is facilitated by our implementation of SAXS fitting in IMP, which can already produce models by simultaneously satisfying a large variety of other spatial restraints. In addition to the SAXS and modeling terms used here, IMP can incorporate (*i*) restraints implied by an alignment between the modeled sequence and related structures ^{28}, (*ii*) restraints implied by an alignment of the modeled sequence and many short segments of known structure, (*iii*) bioinformatics analysis of protein interaction modes ^{52}, (*iv*) protein-protein docking that is restrained by the composition of interacting surfaces determined by NMR spectroscopy ^{53}, (*v*) symmetry and density from a cryo-EM map of the assembly ^{54}, and (*vi*) proximity of subunits inferred from immuno-affinity purifications, yeast two-hybrid system, footprinting, and chemical cross-linking ^{26}. The global shape information from SAXS is especially complementary to local restraints, such as the atomic distance restraints derived from chemical cross-linking detected by mass spectrometry ^{55}. Another attractive application is the integration of SAXS for structural characterization of component structures of a large assembly whose overall density is determined by cryo-EM.

An accurate quaternary structure model of a protein or a protein assembly can be obtained using only a SAXS profile, stereochemistry restraints from a molecular mechanics force field, and an atomic distance-dependent statistical potential, provided sufficiently accurate approximations for the constituent domain and protein structures are available. Otherwise, the predictions are generally ambiguous and have large errors. Our integration of a SAXS profile into modeling by satisfaction of spatial restraints will facilitate further integration of different kinds of data for structure determination of proteins and their assemblies.

We thank Maya Topf, Narayanan Eswar, Frank Alber, Fred Davis, Min-Yi Shen, and Marc Marti-Renom for fruitful discussions. FF is grateful for a long-term fellowship from the Human Frontier Science Project Organization (HFSPO). KAK was supported by an NDSEG Graduate Fellowship. SSRL is funded by DOE BES, and the SSRL Structural Molecular Biology Program is supported by DOE, OBER and NIH NCRR BTP Grant (P41 RR001209). DAA has been supported by the Howard Hughes Medical Institute. DAA and AS have been supported by a UC Discovery Grant (bio03-10401/Agard). AS has also been supported by The Sandler Family Supporting Foundation, NIH (R01 GM54762, R01 GM083960, U54 RR022220, and PN2 EY016525), NSF (EIA-032645 and IIS-0705196), Hewlett-Packard, NetApps, IBM, and Intel.

**Publisher's Disclaimer: **This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

1. Sali A, Glaeser R, Earnest T, Baumeister W. From words to literature in structural proteomics. Nature. 2003;422:216–25.

2. Robinson CV, Sali A, Baumeister W. The molecular sociology of the cell. Nature. 2007;450:973–82.

3. Alber F, Förster F, Korkin D, Topf M, Sali A. Integrating Diverse Data for Structure Determination of Macromolecular Assemblies. Annu Rev Biochem. 2008 in press.

4. Doniach S. Changes in biomolecular conformation seen by small angle X-ray scattering. Chem Rev. 2001;101:1763–78.

5. Koch MH, Vachette P, Svergun DI. Small-angle scattering: a view on the properties, structures and structural changes of biological macromolecules in solution. Q Rev Biophys. 2003;36:147–227.

6. Putnam CD, Hammel M, Hura GL, Tainer JA. X-ray solution scattering (SAXS) combined with crystallography and computation: defining accurate macromolecular structures, conformations and assemblies in solution. Vol. 40. Cambridge Journals Online; 2007.

7. Svergun D, Koch M. Small-angle scattering studies of biological macromolecules in solution. Rep Prog Phys. 2003;66:1735–1782.

8. Das R, Kwok LW, Millett IS, Bai Y, Mills TT, Jacob J, Maskel GS, Seifert S, Mochrie SGJ, Thiyagarajan P, Doniach S, Pollack L, Herschlag D. The fastest global events in RNA folding: Electrostatic relaxation and tertiary collapse of the Tetrahymena ribozyme. J Mol Biol. 2003;332:311–319.

9. Canady MA, Tsuruta H, Johnson JE. Analysis of rapid, large-scale protein quaternary structural changes: time-resolved X-ray solution scattering of Nudaurelia capensis omega virus (NomegaV) maturation. J Mol Biol. 2001;311:803–14.

10. Davies JM, Tsuruta H, May AP, Weis WI. Conformational changes of p97 during nucleotide hydrolysis determined by small-angle X-Ray scattering. Structure. 2005;13:183–95.

11. Yamagata A, Tainer JA. Hexameric structures of the archaeal secretion ATPase GspE and implications for a universal secretion mechanism. EMBO J. 2007;26:878–90. Epub 2007 Jan 25.

12. Krukenberg KA, Förster F, Rice L, Sali A, Agard DA. A novel conformation of E. coli Hsp90 in solution: insights into the conformational dynamics of Hsp90. Structure. 2008 in press.

13. Sondermann H, Nagar B, Bar-Sagi D, Kuriyan J. Computational docking and solution x-ray scattering predict a membrane-interacting role for the histone domain of the Ras activator son of sevenless. Proc Natl Acad Sci USA. 2005;102:16632–7.

14. Zheng W, Doniach S. Fold recognition aided by constraints from small angle X-ray scattering data. Protein Eng Des Sel. 2005;18:209–19. Epub 2005 Apr 21.

15. Stuhrmann H. Interpretation of small-angle scattering functions of dilute solutions and gases. A representation of the structures related to a one-particle scattering function. Acta Crystallogr A. 1970;26:297–306.

16. Svergun D, Stuhrmann H. New developments in direct shape determination from small-angle scattering 1. Theory and model calculations. Acta Crystallogr A. 1991;47:736–44.

17. Chacon P, Moran F, Diaz JF, Pantos E, Andreu JM. Low-resolution structures of proteins in solution retrieved from X-ray scattering with a genetic algorithm. Biophys J. 1998;74:2760–75.

18. Walther D, Cohen FE, Doniach S. Reconstruction of low-resolution three-dimensional density maps from one-dimensional small-angle X-ray solution scattering data for biomolecules. J Appl Crystallogr. 2000;33:350–363.

19. Svergun DI. Restoring low resolution structure of biological macromolecules from solution scattering using simulated annealing. Biophys J. 1999;76:2879–86.

20. Svergun DI, Petoukhov MV, Koch MH. Determination of domain structure of proteins from X-ray solution scattering. Biophys J. 2001;80:2946–53.

21. Petoukhov MV, Eady NA, Brown KA, Svergun DI. Addition of missing loops and domains to protein models by x-ray solution scattering. Biophys J. 2002;83:3113–25.

22. Petoukhov MV, Svergun DI. Global Rigid Body Modeling of Macromolecular Complexes against Small-Angle Scattering Data. Biophys J. 2005;89:1237–50.

23. Petoukhov MV, Monie TP, Allain FH, Matthews S, Curry S, Svergun DI. Conformation of polypyrimidine tract binding protein in solution. Structure. 2006;14:1021–7.

24. Grishaev A, Wu J, Trewhella J, Bax A. Refinement of multidomain protein structures by combination of solution small-angle X-ray scattering and NMR data. J Am Chem Soc. 2005;127:16621–8.

25. Wu Y, Tian X, Lu M, Chen M, Wang Q, Ma J. Folding of small helical proteins assisted by small-angle X-ray scattering profiles. Structure. 2005;13:1587–97.

26. Alber F, Dokudovskaya S, Veenhoff LM, Zhang W, Kipper J, Devos D, Suprapto A, Karni-Schmidt O, Williams R, Chait BT, Rout MP, Sali A. Determining the architectures of macromolecular assemblies. Nature. 2007;450:683–694.

27. MacKerell AD Jr, Bashford D, Bellott M, Dunbrack RL Jr, Evanseck J, Field MJ, Fischer S, Gao J, Guo H, Ha S, Joseph D, Kuchnir L, Kuczera K, Lau FTK, Mattos C, Michnick S, Ngo T, Nguyen DT, Prodhom B, Reiher WE III, Roux B, Schlenkrich M, Smith J, Stote R, Straub J, Watanabe M, Wiorkiewicz-Kuczera J, Yin D, Karplus M. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J Phys Chem B. 1998;102:3586–616.

28. Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol. 1993;234:779–815.

29. Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci. 2006;15:2507–24.

30. Debye P. Zerstreuung von Roentgenstrahlen. Ann Phys. 1915;46:809–23.

31. Fraser R, MacRae T, Suzuki E. An improved method for calculating the contribution of solvent to the X-ray diffraction pattern of biological molecules. J Appl Crystallogr. 1978;11:693–694.

32. Shanno D, Phua K. Remark on algorithm 500. ACM Trans Math Software. 1980;6:618–622.

33. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Numerical Recipes in FORTRAN: The art of scientific computing. 2nd ed. Cambridge University Press; Cambridge: 1992.

34. Johnson SC. Hierarchical Clustering Schemes. Psychometrika. 1967;32:241–254.

35. Marti-Renom MA, Ilyin VA, Sali A. DBAli: a database of protein structure alignments. Bioinformatics. 2001;17:746–7.

36. Janin J, Henrick K, Moult J, Eyck LT, Sternberg MJ, Vajda S, Vakser I, Wodak SJ. CAPRI: a Critical Assessment of PRedicted Interactions. Proteins. 2003;52:2–9.

37. Goddard TD, Huang CC, Ferrin TE. Visualizing density maps with UCSF Chimera. J Struct Biol. 2007;157:281–287.

38. Smolsky IL, Liu P, Niebuhr M, Ito L, Weiss TM, Tsuruta H. Biological small-angle X-ray scattering facility at the Stanford Synchrotron Radiation Laboratory. J Appl Crystallogr. 2007;40:s453–8.

39. Kozak M. Direct comparison of the crystal and solution structure of glucose/xylose isomerase from Streptomyces rubiginosus. Protein Pept Lett. 2005;12:547–50.

40. Konarev PV, Volkov VV, Sokolova AV, Koch MHJ, Svergun DI. PRIMUS: a Windows PC-based system for small angle scattering data analysis. J Appl Crystallogr. 2003;36:1277–82.

41. Svergun D, Barberato C, Koch M. CRYSOL - A program to evaluate x-ray solution scattering of biological macromolecules from atomic coordinates. J Appl Crystallogr. 1995;28:768–773.

42. Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A. Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct. 2000;29:291–325.

43. Alber F, Kim MF, Sali A. Structural characterization of assemblies from overall shape and subcomplex compositions. Structure. 2005;13:435–45.

44. Mendez R, Leplae R, De Maria L, Wodak SJ. Assessment of blind predictions of protein-protein interactions: current status of docking methods. Proteins. 2003;52:51–67.

45. Zuo X, Tiede DM. Resolving conflicting crystallographic and NMR models for solution-state DNA with solution X-ray diffraction. J Am Chem Soc. 2005;127:16–7.

46. Brunger AT, Kuriyan J, Karplus M. Crystallographic R factor refinement by molecular dynamics. Science. 1987;235:458–460.

47. Brunger AT, Clore GM, Gronenborn AM, Karplus M. Three-dimensional structure of proteins determined by molecular dynamics with interproton distance restraints: Application to crambin. Proc Natl Acad Sci USA. 1986;83:3801–5.

48. Bradley P, Misura KM, Baker D. Toward high-resolution de novo structure prediction for small proteins. Science. 2005;309:1868–71.

49. Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, Baker D. Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. J Mol Biol. 2003;331:281–99.

50. Stone JE, Phillips JC, Freddolino PL, Hardy DJ, Trabuco LG, Schulten K. Accelerating molecular modeling applications with graphics processors. J Comput Chem. 2007;28:2618–40.

51. Bernado P, Mylonas E, Petoukhov MV, Blackledge M, Svergun DI. Structural characterization of flexible proteins using small-angle X-ray scattering. J Am Chem Soc. 2007;129:5656–64. Epub 2007 Apr 6.

52. Korkin D, Davis FP, Alber F, Luong T, Shen MY, Lucic V, Kennedy MB, Sali A. Structural modeling of protein interactions by analogy: application to PSD-95. PLoS Comput Biol. 2006;2:e153.

53. Kim MF, Sali A, Dotsch V, Rees M. 2008 xxx. in preparation.

54. Topf M, Lasker K, Webb B, Wolfson H, Chiu W, Sali A. Protein structure fitting and refinement guided by cryo-EM density. Structure. 2008;16:295–307.

55. Seebacher J, Mallick P, Zhang N, Eddes JS, Aebersold R, Gelb MH. Protein cross-linking analysis using mass spectrometry, isotope-coded cross-linkers, and integrated computational data processing. J Proteome Res. 2006;5:2270–82.
