X-ray diffraction plays a pivotal role in the understanding of biological systems by revealing atomic structures of proteins, nucleic acids, and their complexes, with much recent interest in very large assemblies like the ribosome. Since crystals of such large assemblies often diffract weakly (resolution worse than 4 Å), we need methods that work at such low resolution. In macromolecular assemblies, some of the components may be known at high resolution, while others are unknown: current refinement methods fail as they require a high-resolution starting structure for the entire complex1. Determining such complexes, which are often of key biological importance, should be possible in principle as the number of independent diffraction intensities at a resolution below 5 Å generally exceeds the number of degrees of freedom. Here we introduce a new method that adds specific information from known homologous structures but allows global and local deformations of these homology models. Our approach uses the observation that local protein structure tends to be conserved as sequence and function evolve. Cross-validation with Rfree determines the optimum deformation and influence of the homology model. For test cases at 3.5 – 5 Å resolution with known structures at high resolution, our method gives significant improvements over conventional refinement in the model coordinate accuracy, the definition of secondary structure, and the quality of electron density maps. For re-refinements of a representative set of 19 low-resolution crystal structures from the PDB, we find similar improvements. Thus, a structure derived from low-resolution diffraction data can have quality similar to a high-resolution structure. Our method is applicable to studying weakly diffracting crystals using X-ray micro-diffraction2 as well as data from new X-ray light sources3.
Use of homology information is not restricted to X-ray crystallography and cryo-electron microscopy: as optical imaging advances to sub-nanometer resolution4,5, it can use similar tools.
A grand challenge in structural biology is to determine atomic structures of large macromolecular complexes. Unfortunately, growth of the well-ordered crystals needed for high-resolution X-ray crystallography is often precluded by inherent flexibility, disordered solvent, lipids, and other essential components; diffraction is often weak and anisotropic, with an effective resolution worse than ~4 Å. Atomic interpretation of the resulting electron density maps is limited to fitting rigid models. There is a need for accurate atomic structures from low-resolution diffraction data to reach mechanistic conclusions that critically depend on individually resolved residues.
X-ray crystal structures can achieve “super-resolution” where the estimated coordinate accuracy is better than the resolution limit of the diffraction data (typically, by 10x), by imposing constraints when interpreting observed diffraction data and electron density maps. Super-resolution arises from the excluded volumes of atoms: the scattering objects are always further apart than half of the wavelength of X-ray radiation typically used (1–2 Å). This atomicity leads to a solution of the phase problem for small molecule crystals6, and it allows estimation of coordinate errors7. Assuming polymers have standard chemical bond lengths and bond angles extends this concept to the resolution characteristic of macromolecular crystallography8,9.
Low-resolution X-ray diffraction data at 5 Å contains, in principle, sufficient information to determine the true structure (the “target structure”) since the number of observable diffracted intensities exceeds the number of torsion-angle degrees of freedom of a macromolecule10. Although an exhaustive conformational search in torsion-angle space against the diffraction data should lead to an accurate structure at 5 Å resolution, such a search is computationally intractable. Our approach aids the search by adding known information to the observed data at low resolution. Instead of adding generic information about macromolecular stereochemistry (idealized chemical bond lengths, bond angles, and atom sizes that heralded the era of reciprocal-space restrained refinement8,9), we add specific information for the particular macromolecule(s) or complex, deriving this information from known structures of homologous proteins or domains (the “reference model”).
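The counting argument above can be illustrated with a rough estimate. The sketch below assumes space group P1 and ignores anomalous scattering, so the number of unique reflections to a resolution limit d is about half the number of reciprocal-lattice points inside a sphere of radius 1/d. The function name, the unit-cell volume, and the torsion-angle count are hypothetical values chosen only for illustration; they are not taken from any structure discussed here.

```python
import math


def unique_reflections_p1(cell_volume_A3, d_min_A):
    """Approximate number of unique reflections to d_min for space group P1.

    The reciprocal-lattice points inside a sphere of radius 1/d_min number
    about (4/3) * pi * V / d_min**3; Friedel mates share one intensity
    (anomalous scattering ignored), so the unique count is half of that.
    """
    n_lattice_points = (4.0 / 3.0) * math.pi * cell_volume_A3 / d_min_A ** 3
    return n_lattice_points / 2.0


# Hypothetical example values (assumptions, not data from the paper).
cell_volume = 500_000.0  # unit-cell volume in cubic angstroms
n_torsions = 1_500       # torsion-angle degrees of freedom

for d_min in (3.5, 4.0, 4.5, 5.0):
    n_obs = unique_reflections_p1(cell_volume, d_min)
    print(f"d_min = {d_min} A: ~{n_obs:.0f} reflections, "
          f"ratio to torsions = {n_obs / n_torsions:.1f}")
```

Higher crystal symmetry reduces the number of unique reflections, so a real estimate would need the actual space group and solvent content.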
The target structure often differs from the reference model by large-scale deformations, related to the approximate conservation of local polypeptide geometry as sequence and function evolve. How can such deformations be mathematically described? An early approach11 used low-frequency normal modes, shown to reproduce large-scale collective changes in structures with very few degrees of freedom12; it has been used to refine protein structures with low-resolution X-ray or cryo-electron microscopy data13,14. Here we take a very different approach. Instead of choosing special collective degrees of freedom, we use an extension of our Deformable Elastic Network (DEN) approach15. DEN has been used to fit models into cryo-electron density maps, allowing large deformations such as hinge bending. DEN defines springs between selected atom pairs using the reference model as the template. The equilibrium distance of each spring (the distance at which its potential energy is minimum) is initially set to the distance between these atoms in the starting structure for refinement. As torsion angle molecular dynamics against a combined target function (comprising diffraction data, DEN, and energy, Eq. 1) proceeds, the equilibrium lengths of the DEN network are adjusted to incorporate the distance information from the reference model. The degree of this adjustment is controlled by a parameter, γ (Online Methods). Here we extend DEN to homology models, or more generally, any reference model, such as a predicted structure.
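The spring network and its adjustable equilibrium lengths can be sketched as follows. The exact functional form is deferred to the Online Methods, so this is an assumed form rather than the published one: a harmonic energy over the selected atom pairs, and an update that moves each equilibrium length toward a γ-weighted mix of the current model distance and the reference-model distance. The relaxation rate `kappa` and both function names are hypothetical. The form is chosen so that γ = 1 ignores the reference model, consistent with how γ = 1 is interpreted in the results.

```python
def den_energy(distances, equilibrium):
    """Harmonic DEN energy: sum of squared deviations from equilibrium lengths."""
    return sum((d - d0) ** 2 for d, d0 in zip(distances, equilibrium))


def update_equilibrium(equilibrium, current, reference, gamma, kappa=0.1):
    """Relax each equilibrium length toward a mix of current and reference distances.

    gamma = 1 follows the current model only (reference ignored);
    gamma = 0 pulls the equilibrium lengths toward the reference model.
    kappa (assumed) sets how fast the equilibrium lengths are allowed to change.
    """
    return [
        (1.0 - kappa) * d0 + kappa * (gamma * d + (1.0 - gamma) * d_ref)
        for d0, d, d_ref in zip(equilibrium, current, reference)
    ]


# Hypothetical distances (angstroms) for three atom pairs.
start = [5.0, 7.5, 10.0]       # distances in the starting structure
reference = [5.5, 7.0, 12.0]   # distances in the reference model
equilibrium = list(start)      # springs start at the starting-structure distances

for _ in range(50):
    equilibrium = update_equilibrium(equilibrium, start, reference, gamma=0.0)

print(equilibrium)
print(den_energy(start, equilibrium))
```

In an actual refinement the current distances would change every dynamics step; holding them fixed here only isolates the effect of γ on the equilibrium lengths.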
We first tested our method on a model system, the protein penicillopepsin whose structure had been determined to dmin=1.8 Å resolution (PDB ID 3app)16. Synthetic low-resolution data sets were generated at 3.5, 4.0, 4.5, and 5.0 Å resolution (Online Methods). Optimum values for the γ and wDEN parameters used for DEN refinement were obtained by a grid search against Rfree (Fig. 1a for refinement at 4.5 Å resolution). With this standard protocol, referred to here as “DEN”, the Rfree optimum is found at (γ, wDEN) = (0,10) (marked by a black ellipse). As a control, we performed a refinement using exactly the same protocol but with the DEN potential set to zero; this corresponds to a second standard protocol, referred to here as “noDEN”. We assess the quality of the resulting models by comparing the structures resulting from the DEN and noDEN refinements to the target structure (the 1.8 Å resolution crystal structure of penicillopepsin, 3app). Fig. 1b shows a contour plot of the all-atom root-mean-square difference (RMSD) between 3app and the corresponding DEN refined structures from Fig. 1a. The RMSD shows good agreement with the Rfree values. Thus, the lowest Rfree value should be a good predictor for the (γ, wDEN) pair that gives the optimum structure in cases where a high-resolution target structure is not known. The resulting electron density maps (Supplementary Fig. 1) are greatly improved, showing better connectivity and side-chain definition compared to noDEN refinement.
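The parameter selection described above amounts to running the refinement at every (γ, wDEN) grid point and keeping the pair with the lowest Rfree. A minimal sketch follows; `toy_refine` is a hypothetical stand-in for a full refinement run, and its formula and the grid values are invented only so that the example executes.

```python
from itertools import product


def grid_search(refine, gammas, weights):
    """Return (best_rfree, gamma, w_den) for the grid point with the lowest Rfree."""
    results = [(refine(g, w), g, w) for g, w in product(gammas, weights)]
    return min(results, key=lambda r: r[0])


def toy_refine(gamma, w_den):
    # Hypothetical stand-in for a refinement that returns Rfree.
    # The toy surface has its minimum at gamma = 0, w_den = 10.
    return 0.30 + 0.05 * gamma + 0.0001 * (w_den - 10.0) ** 2


gammas = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]      # assumed grid values
weights = [0.0, 3.0, 10.0, 30.0, 100.0]      # assumed grid values
best = grid_search(toy_refine, gammas, weights)
print(best)
```

Including wDEN = 0 in the grid makes the noDEN control one of the candidates, so the search can never do worse than noDEN as judged by Rfree.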
DEN refinement dramatically improves the structure compared to noDEN over a wide range of low resolutions (Figs. 1c to 1e, Table 1), and with and without experimental phase information (compare Fig. 1 and Supplementary Fig. 2). The DEN Rfree values (Fig. 1c) are nearly independent of the limiting resolution of the synthetic data sets (black), whereas they steadily increase for noDEN (red). For the data set at 5 Å resolution, DEN improves Rfree by 0.1 (black double-arrow). The GDT(<1Å) score17 measures the fraction of atoms that fit the target structure well and thus focuses on the more accurate part of the structure (Fig. 1d). For data sets at dmin>4 Å, the GDT scores dramatically worsen for the structures refined without DEN: the resulting GDT score is worse than that of the initial model (dashed line). In contrast, the GDT score of the DEN refined models is consistently high. The RMSD to the target structure (3app) (Fig. 1e) is also significantly smaller with DEN. These improvements persist even when refinement cycles are added to the protocol without DEN (i.e., with wDEN set to zero) (Supplementary Fig. 3).
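The two accuracy measures used here can be illustrated with a simplified calculation. The sketch assumes the model and target coordinates are already superimposed and list the same atoms in the same order. The actual GDT score also involves a search over superpositions, so `fraction_within` is only a simplified stand-in for GDT(<1Å). All coordinates are hypothetical.

```python
import math


def rmsd(model, target):
    """Root-mean-square difference between matched, pre-superimposed atoms."""
    squared = [sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(model, target)]
    return math.sqrt(sum(squared) / len(squared))


def fraction_within(model, target, cutoff=1.0):
    """Fraction of atoms closer than `cutoff` angstroms to their target position."""
    distances = [math.dist(p, q) for p, q in zip(model, target)]
    return sum(d < cutoff for d in distances) / len(distances)


# Hypothetical coordinates (angstroms): three atoms fit exactly, one is off by 2 A.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 2.0)]

print(rmsd(model, target))             # 1.0
print(fraction_within(model, target))  # 0.75
```

The example shows why the two measures complement each other: a single badly placed atom dominates the RMSD, while the fraction-based score still reports that most of the model fits well.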
In a broader test, we applied our method to 19 existing structures for which only low-resolution X-ray data are available (worse than 4 Å). To focus on DEN’s core strengths, we chose to re-refine the existing low-resolution structures with the help of a reference model that contains higher-resolution information. To minimize bias, we automated the re-refinement, which is expected to limit structure improvement; as discussed below, much better results could be obtained by an investigator familiar with the structure and its differences from the reference model.
For each selected PDB structure, a reference model was built by homology modeling on templates manually selected to satisfy simultaneously the three criteria of high sequence identity, high resolution, and a large number of matched residues (Supplementary Tables 1 & 2). On average, 86% of the residues could be modeled. In some extreme cases (PDB 1av1, 2vkz, and 2bf1), the main-chain RMSD of the template to the corresponding low-resolution PDB structure was around 10 Å, in which case structural similarity is likely to be limited and significant improvement is not expected. We included these cases to see if DEN can lead to improvements (2vkz and 2bf1, see below), and to show that even in the worst case (1av1) DEN does not lead to a deterioration of the structure.
The Rfree values of the DEN refined structures (Fig. 2a, Table 2, Supplementary Fig. 4) all improved relative to the noDEN structures. Eleven structures show an improvement of over 0.01, four an improvement of over 0.02, and the best an improvement of 0.058 (1xxi), a 12% improvement. The difference between R and Rfree is on average 0.018 smaller for DEN vs. noDEN (Table 2); this indicates that overfitting is significantly reduced by DEN. Both the minimum and the maximum Rfree values are generally lower for DEN than for noDEN (Supplementary Table 3), indicating that relevant, low-Rfree regions of conformational space are better sampled.
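The gap between R and Rfree used above as an overfitting indicator can be made concrete with the conventional R-factor, the sum of absolute amplitude differences divided by the sum of observed amplitudes, evaluated separately on the working and test reflections. The sketch below assumes the calculated amplitudes are already on the same scale as the observed ones; the function names and amplitude values are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum(|Fobs - Fcalc|) / sum(Fobs), amplitudes pre-scaled."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def r_work_and_free(f_obs, f_calc, is_free):
    """Split reflections by the test-set flag and return (R, Rfree)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor(*zip(*work))
    r_free = r_factor(*zip(*test))
    return r_work, r_free


# Hypothetical amplitudes; the last reflection belongs to the test set.
f_obs = [10.0, 20.0, 30.0, 40.0]
f_calc = [9.0, 22.0, 27.0, 48.0]
is_free = [False, False, False, True]

r_work, r_free = r_work_and_free(f_obs, f_calc, is_free)
print(r_work, r_free, r_free - r_work)
```

Because the test reflections are excluded from refinement, a large Rfree minus R difference suggests that the model fits noise in the working set.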
The Ramachandran Score shows that DEN refinement generally improves the secondary structure compared to noDEN (Fig. 2b and Table 2) with an average increase of 0.05. The largest improvement (0.23 or 37%) is again seen for 1xxi. There is a high correlation between the Rfree and Ramachandran Score improvements (Fig. 2c). The four cases where the Ramachandran Score has slightly worsened (1av1, 1xdv, 2a62, 2bf1) are all cases with an optimal value of γ=1.0 (Supplementary Table 4). In these (and five additional cases with γ=1.0) the reference model is ignored, as it does not provide useful distances. As expected, the average Rfree improvement in these nine cases is small (0.0061, Supplementary Table 4). In contrast, for the ten cases with γ<1, the average Rfree improvement is significant (0.022, Supplementary Table 4). These ten successful cases cover a variety of differences between the reference model and the crystal structure, including large (sub-)domain motions, hinge motions, local structural differences, or differences throughout (Table 2 and Supplementary Fig. 5).
We calculated electron density maps from experimental intensities combined with model phases from the DEN and noDEN refined structures. In the three cases shown (Fig. 3) the noDEN backbone density is broken in several places (red), making it difficult to correctly trace the backbone. In contrast, the DEN maps show a continuous backbone density (blue). The DEN refined coordinates also show clear improvements, e.g. with DEN, Pro114 in the 1ye1 structure (Fig. 3c & 3d) is shifted by 3.2 Å into well-defined electron density (blue); very little density is visible for noDEN (red). Such improved interpretability of electron density maps indicates that the phases calculated from DEN refined structures are superior to those from noDEN refined structures.
How does DEN increase the accuracy of the refined structure? For the penicillopepsin test case at 4.5 Å resolution we analyzed the distances between atom pairs not well defined by the diffraction data, specifically those with large root-mean-square fluctuations (RMSF) between the ten models of the noDEN refinement repeats (Fig. 4 Inset). These distances are much closer to the distances in the target structure (3app) for DEN compared to noDEN, showing that DEN provides information for distances that are not well defined by the diffraction data.
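The analysis above relies on the fluctuation of an atom-pair distance across repeated refinements. One simple way to compute such a quantity is the population standard deviation of the pair distance over the models, sketched below. The source does not spell out its exact RMSF definition, so this choice, the function name, and the coordinates are assumptions.

```python
import math
from statistics import pstdev


def pair_distance_rmsf(models, i, j):
    """RMS fluctuation of the distance between atoms i and j across models.

    `models` is a list of coordinate lists, one per refinement repeat.
    """
    distances = [math.dist(model[i], model[j]) for model in models]
    return pstdev(distances)


# Hypothetical two-atom models from two refinement repeats (angstroms).
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]

print(pair_distance_rmsf(models, 0, 1))  # 1.0
```

Pairs with a large value are those the diffraction data constrain poorly, which are the pairs where reference-model distances would be expected to help most.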
Performance can be much improved by manually selecting cutoff criteria and structural elements used for DEN. For the unligated SIV gp120 structure18 (PDB 2bf1) we restricted the DEN network to the main chain and Cβ-atoms of the reference model (HIV gp120-antibody complex at 2.0 Å resolution19, PDB 2nxz) and to regions of the structure considered reliable predictors of SIV gp120 structure (at least 35.8 % local sequence identity, Supplementary Table 2). Refinement with optimum DEN parameters resulted in a 4% lower Rfree value and 8% higher Ramachandran Score. With such judicious manual choice of the network, DEN used the reference model distances (γ=0.4, rather than γ=1 for automated DEN), and produced a more accurate structure as assessed by Rfree.
Cross-validation with Rfree allows determination of the optimum parameter values (particularly γ), yielding more accurate models at low resolution even when no high-resolution model is available. DEN can be applied to predicted structures, which have shown promise in molecular replacement20, and to RNA/DNA. DEN can be easily modified in future developments: for example, individual atomic weights could account for model error, variations in a family of homologous structures, or predicted loop conformations. Criteria for selection of distances can also be modified, as done manually for 2bf1.
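The total energy function consists of a weighted sum of three terms:

$$E_{\mathrm{total}} = E_{\mathrm{geometric}} + w_a E_{\mathrm{ML}} + w_{\mathrm{DEN}} E_{\mathrm{DEN}}(\gamma) \qquad (1)$$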
where Egeometric is a “geometric” or stereochemical energy function commonly used for macromolecular crystal structure refinement21, EML is a maximum likelihood target function that incorporates experimental X-ray amplitude (and optionally phase) information22–24, EDEN(γ) is the DEN potential (Online Methods), and wa and wDEN are relative weights. Such combined energy functions have been used for refinement of macromolecules since their first introduction for energy refinement25 and application to X-ray refinement9. The refinement protocol uses repeats of torsion angle dynamics26 against Etotal and B-factor refinement (Online Methods).
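The weighted sum of the three terms can be written as a one-line function. The sketch treats each term as an already evaluated number; the function name and all numerical values are hypothetical. It also shows that setting wDEN to zero recovers the noDEN control.

```python
def total_energy(e_geometric, e_ml, e_den, w_a, w_den):
    """Weighted sum of the stereochemical, maximum-likelihood, and DEN terms."""
    return e_geometric + w_a * e_ml + w_den * e_den


# Hypothetical term values and weights.
with_den = total_energy(e_geometric=1.0, e_ml=2.0, e_den=3.0, w_a=0.5, w_den=10.0)
no_den = total_energy(e_geometric=1.0, e_ml=2.0, e_den=3.0, w_a=0.5, w_den=0.0)
print(with_den, no_den)  # 32.0 2.0
```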
For DEN, the target sequence must be sufficiently close to a homologous sequence (sequence identity of at least 30%), which means that the target and homolog will be structurally similar. It also requires that the homolog structure was determined at sufficiently high resolution (3.5 Å or better), so that it will contain useful specific high-resolution information about the target. Homology models for the target sequence were constructed using standard, well-accepted methods such as SegMod27 or MODELLER28. Often, multiple homology models were combined to cover the entire target structure even when it consists of multiple domains and polypeptide chains.
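The two requirements on the homolog can be expressed as a simple filter over candidate templates. The sketch below uses the thresholds stated above (sequence identity of at least 30%, resolution of 3.5 Å or better). The dictionary keys, the candidate entries, and the ranking by identity and then resolution are assumptions made for illustration, not part of the published protocol.

```python
def usable_templates(candidates, min_identity=30.0, max_resolution=3.5):
    """Keep templates meeting both thresholds, best candidates first."""
    kept = [
        c for c in candidates
        if c["identity"] >= min_identity and c["resolution"] <= max_resolution
    ]
    # Assumed ranking: higher sequence identity first, then better resolution.
    return sorted(kept, key=lambda c: (-c["identity"], c["resolution"]))


# Hypothetical candidates: identity in percent, resolution in angstroms.
candidates = [
    {"name": "template_A", "identity": 45.0, "resolution": 2.0},
    {"name": "template_B", "identity": 25.0, "resolution": 1.8},  # identity too low
    {"name": "template_C", "identity": 60.0, "resolution": 4.2},  # resolution too low
    {"name": "template_D", "identity": 45.0, "resolution": 1.5},
]

print([c["name"] for c in usable_templates(candidates)])
# ['template_D', 'template_A']
```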
Our approach is a major advance over conventional modeling of low-resolution X-ray diffraction data by fitting rigid bodies29, since it accounts for deformations of the models while using a minimal set of variables (the single-bond torsion angles). For five cases, our re-refinement achieved a substantial improvement in Rfree over rigid-body refined structures (Supplementary Table 1). Optionally, we turn off the DEN potential during the last refinement repeats to assess the robustness of the improvement achieved by DEN. The radius of convergence of DEN refinement is very large: in tests, automatic correction of polypeptide chain register in α-helices was observed, a notoriously difficult problem for macromolecular refinement.
We thank Paul Adams, Stephen Harrison, and Tim Fenn for discussions and the National Science Foundation for computing resources (CNS-0619926), the National Institutes of Health for a Roadmap Grant PN2 (EY016525) to ML (GM072970), the National Institutes of Health for a grant to ML (GM63718), and the Deutsche Forschungsgemeinschaft (DFG) for support to GFS.
AUTHOR CONTRIBUTIONS
GFS developed the computational algorithms, GFS and ATB designed the computational experiments, performed all calculations and analysis. All authors wrote the paper.
The authors declare no competing financial interests.