A grand challenge in structural biology is to determine atomic structures of large macromolecular complexes. Unfortunately, the growth of well-ordered crystals needed for high-resolution X-ray crystallography is often precluded by inherent flexibility, disordered solvent, lipids, and other essential components; diffraction is often weak and anisotropic, with an effective resolution worse than ~4 Å. Atomic interpretation of the resulting electron density maps is then limited to fitting rigid models. Accurate atomic structures from low-resolution diffraction data are needed to reach mechanistic conclusions that critically depend on individually resolved residues.
X-ray crystal structures can achieve “super-resolution”, where the estimated coordinate accuracy is better than the resolution limit of the diffraction data (typically by a factor of 10), by imposing constraints when interpreting the observed diffraction data and electron density maps. Super-resolution arises from the excluded volumes of atoms: the scattering objects are always further apart than half of the wavelength of the X-ray radiation typically used (1–2 Å). This atomicity leads to a solution of the phase problem for small-molecule crystals [6], and it allows estimation of coordinate errors [7]. Assuming that polymers have standard chemical bond lengths and bond angles extends this concept to the resolution typical of macromolecular crystallography [8,9].
Low-resolution X-ray diffraction data at 5 Å contain, in principle, sufficient information to determine the true structure (the “target structure”), since the number of observable diffracted intensities exceeds the number of torsion-angle degrees of freedom of a macromolecule [10]. Although an exhaustive conformational search in torsion-angle space against the diffraction data should lead to an accurate structure at 5 Å resolution, such a search is computationally intractable. Our approach aids the search by adding known information to the observed low-resolution data. Instead of adding generic information about macromolecular stereochemistry (idealized chemical bond lengths, bond angles, and atom sizes, which heralded the era of reciprocal-space restrained refinement [8,9]), we add information specific to the particular macromolecule(s) or complex, deriving it from known structures of homologous proteins or domains (the “reference model”).
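The counting argument above can be made concrete with a rough estimate. The sketch below is a back-of-the-envelope calculation under assumptions that are ours, not the study's: about 135 Å³ of protein volume per residue, 50% solvent content, only Friedel symmetry, and roughly four torsion angles per residue (φ, ψ, and about two side-chain χ angles on average). The function name `unique_reflections_per_residue` is hypothetical.

```python
import math

# Assumed: phi and psi per residue plus about two side-chain chi angles on average.
TORSIONS_PER_RESIDUE = 4.0

def unique_reflections_per_residue(d_min, vol_per_residue=135.0, solvent_fraction=0.5):
    """Approximate unique reflections per residue out to resolution d_min (Å).

    Reciprocal-lattice points inside a sphere of radius 1/d_min number about
    (4*pi/3) * V_cell / d_min**3, and Friedel symmetry roughly halves this.
    Per residue, V_cell is replaced by the crystal volume occupied by one
    residue (its protein volume plus its share of solvent).
    All default values are illustrative assumptions.
    """
    crystal_vol_per_residue = vol_per_residue / (1.0 - solvent_fraction)
    n_sphere = (4.0 / 3.0) * math.pi * crystal_vol_per_residue / d_min ** 3
    return n_sphere / 2.0

for d in (3.5, 4.0, 4.5, 5.0):
    n = unique_reflections_per_residue(d)
    print(f"{d:.1f} Å: {n:5.1f} reflections/residue, "
          f"ratio to torsions {n / TORSIONS_PER_RESIDUE:.2f}")
```

Under these assumptions, a 5 Å data set gives roughly 4.5 unique reflections per residue, only modestly more than the assumed torsion-angle count; the margin grows quickly at higher resolution because the count scales as 1/dmin³.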
The target structure often differs from the reference model by large-scale deformations, related to the approximate conservation of local polypeptide geometry as sequence and function evolve. How can such deformations be described mathematically? An early approach [11] used low-frequency normal modes, which were shown to reproduce large-scale collective changes in structures with very few degrees of freedom [12]; normal modes have been used to refine protein structures against low-resolution X-ray or cryo-electron microscopy data [13,14]. Here we take a very different approach. Instead of choosing special collective degrees of freedom, we use an extension of our Deformable Elastic Network (DEN) approach [15], which fits models into cryo-electron microscopy density maps while allowing large deformations such as hinge bending. DEN defines springs between selected atom pairs, using the reference model as the template. The equilibrium distance of each spring (the distance at which its potential energy is minimal) is initially set to the distance between these atoms in the starting structure for refinement. As torsion-angle molecular dynamics against a combined target function (comprising diffraction-data, DEN, and energy terms; Eq. 1) proceeds, the equilibrium lengths of the DEN network are adjusted to incorporate the distance information from the reference model. The degree of this adjustment is controlled by a parameter, γ (Online Methods). Here we extend DEN to homology models or, more generally, to any reference model, such as a predicted structure.
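To illustrate how γ could steer the spring equilibrium distances, here is a minimal sketch. It assumes a harmonic DEN energy, a sum over springs of (dij − d0ij)², which would enter the combined target weighted by wDEN, and an update rule of the form d0 ← d0 + κ[γ(dcurrent − d0) + (1 − γ)(dref − d0)]. Both forms and the relaxation rate κ = 0.1 are assumptions here; the exact definitions are in Eq. 1 and the Online Methods. The function names are hypothetical.

```python
import numpy as np

def den_energy(coords, pairs, d0):
    """Harmonic DEN restraint energy, sum over springs of (d_ij - d0_ij)**2 (assumed form).

    coords: (n_atoms, 3) array; pairs: (n_pairs, 2) integer array of atom indices;
    d0: (n_pairs,) current equilibrium distances.
    """
    d = np.linalg.norm(coords[pairs[:, 0]] - coords[pairs[:, 1]], axis=1)
    return float(np.sum((d - d0) ** 2))

def update_equilibrium(d0, d_current, d_ref, gamma, kappa=0.1):
    """One update of the DEN equilibrium distances (assumed update rule).

    gamma = 0 pulls d0 toward the reference-model distances d_ref;
    gamma = 1 pulls d0 toward the current model, so d_ref is ignored.
    kappa (assumed value) sets how far d0 moves per update.
    """
    return d0 + kappa * (gamma * (d_current - d0) + (1.0 - gamma) * (d_ref - d0))

# Toy usage: two springs, current model held fixed, gamma = 0.
d_cur = np.array([5.0, 8.0])   # distances in the current (starting) model
d_ref = np.array([6.0, 7.0])   # distances in the reference model
d0 = d_cur.copy()              # equilibrium distances start at the starting model
for _ in range(100):
    d0 = update_equilibrium(d0, d_cur, d_ref, gamma=0.0)
print(d0)  # close to d_ref
```

With γ = 1 the equilibrium distances simply track the current model, which is why the reference model has no influence in that limit.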
We first tested our method on a model system, the protein penicillopepsin, whose structure had been determined at dmin = 1.8 Å resolution (PDB ID 3app) [16]. Synthetic low-resolution data sets were generated at 3.5, 4.0, 4.5, and 5.0 Å resolution (Online Methods). Optimum values of the γ and wDEN parameters used for DEN refinement were obtained by a grid search against Rfree (for refinement at 4.5 Å resolution). With this standard protocol, referred to here as “DEN”, the Rfree optimum is found at (γ, wDEN) = (0, 10) (marked by a black ellipse). As a control, we performed a refinement using exactly the same protocol but with the DEN potential set to zero; this second standard protocol is referred to here as “noDEN”. We assess the quality of the resulting models by comparing the structures from the DEN and noDEN refinements to the target structure (the 1.8 Å resolution crystal structure of penicillopepsin, 3app). A contour plot of the all-atom root-mean-square difference (RMSD) between 3app and the corresponding DEN-refined structures from the grid search shows good agreement with the Rfree values. Thus, the lowest Rfree value should be a good predictor of the (γ, wDEN) pair that gives the optimal structure when no high-resolution target structure is known. The resulting electron density maps (Supplementary Fig. 1) are greatly improved, showing better connectivity and side-chain definition than those from noDEN refinement.
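The parameter selection described above amounts to an arg-min of Rfree over a two-dimensional grid. The sketch below shows that logic only. The grid values are hypothetical (the scanned values are given in the Online Methods), and `refine` stands for a hypothetical wrapper around one DEN refinement run that returns Rfree; here it is replaced by a made-up toy surface, not real data.

```python
import itertools
import math

# Hypothetical grids; the values actually scanned are given in the Online Methods.
GAMMAS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
W_DENS = [3, 10, 30, 100, 300]

def grid_search(refine, gammas=GAMMAS, w_dens=W_DENS):
    """Run `refine` at every (gamma, w_den) grid point and return
    (lowest Rfree, (gamma, w_den)) for the point with the lowest Rfree.

    `refine(gamma, w_den) -> rfree` is a hypothetical callable wrapping
    one DEN refinement run.
    """
    rfree = {(g, w): refine(g, w) for g, w in itertools.product(gammas, w_dens)}
    best = min(rfree, key=rfree.get)
    return rfree[best], best

# Toy stand-in for a refinement program: Rfree rises with gamma and with the
# distance of w_den from 10 on a log scale. Not real data.
def toy_refine(gamma, w_den):
    return 0.30 + 0.05 * gamma + 0.01 * abs(math.log10(w_den / 10))

print(grid_search(toy_refine))
```

In practice each grid point would be refined several times from different random seeds, since a single run is noisy; the selection rule stays the same.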
Figure: Results for the penicillopepsin test calculations using the MLHL target function (experimental phase information).
DEN refinement dramatically improves the structure compared to noDEN over a wide range of low resolutions, both with and without experimental phase information (compare with Supplementary Fig. 2). The DEN Rfree values are nearly independent of the limiting resolution of the synthetic data sets (black), whereas they steadily increase for noDEN (red). For the data set at 5 Å resolution, DEN improves Rfree by 0.1 (black double-arrow). The GDT(<1 Å) score [17] measures the fraction of atoms that fit the target structure well and thus focuses on the more accurate part of the structure. For data sets at dmin > 4 Å, the GDT scores of the structures refined without DEN worsen dramatically, falling below that of the initial model (dashed line). In contrast, the GDT scores of the DEN-refined models are consistently high. The RMSD to the target structure (3app) is also significantly smaller with DEN. These improvements persist even when additional refinement cycles without DEN (i.e., with wDEN set to zero) are appended to the protocol (Supplementary Fig. 3).
Table: DEN refinement improves structures refined against four synthetic data sets of penicillopepsin.
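The two model-quality measures used above, all-atom RMSD and the GDT(<1 Å) fraction, can be sketched as follows. This is a simplification under our own assumptions: it uses a single least-squares (Kabsch) superposition, whereas the GDT score searches over many superpositions to maximise the fraction of atoms under the cutoff. Because refined models and the target share the same crystal frame, the superposition can also be switched off. All function names are hypothetical.

```python
import numpy as np

def kabsch_superpose(mobile, target):
    """Least-squares superposition (Kabsch) of mobile onto target; both (n, 3).
    Returns the transformed copy of mobile."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    P, Q = mobile - mc, target - tc
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return P @ R.T + tc

def rmsd(a, b):
    """All-atom root-mean-square difference between matched coordinate sets."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def gdt_fraction(model, target, cutoff=1.0, superpose=True):
    """Fraction of atoms within `cutoff` Å of the target.

    Simplification: one least-squares superposition instead of the
    superposition search performed by the GDT score.
    """
    moved = kabsch_superpose(model, target) if superpose else model
    dist = np.linalg.norm(moved - target, axis=1)
    return float(np.mean(dist < cutoff))

# Usage: a rigidly rotated and shifted copy of the target scores perfectly.
rng = np.random.default_rng(0)
target = rng.normal(scale=10.0, size=(50, 3))
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
model = target @ Rz.T + np.array([1.0, 2.0, 3.0])
print(rmsd(kabsch_superpose(model, target), target), gdt_fraction(model, target))
```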
In a broader test, we applied our method to 19 existing structures for which only low-resolution X-ray data (worse than 4 Å) are available. To focus on DEN’s core strengths, we chose to re-refine the existing low-resolution structures with the help of a reference model that contains higher-resolution information. To minimize bias, we automated the re-refinement, which is expected to limit the structural improvement; as discussed below, much better results could be obtained by an investigator familiar with the structure and its differences from the reference model.
For each selected PDB structure, a reference model was built by homology modeling on templates selected manually to satisfy three criteria simultaneously: high sequence identity, high resolution, and a large number of matched residues (Supplementary Tables 1 & 2). On average, 86% of the residues could be modeled. In some extreme cases (PDB 1av1, 2vkz, and 2bf1), the main-chain RMSD of the template to the corresponding low-resolution PDB structure was around 10 Å; in such cases structural similarity is likely to be limited and significant improvement is not expected. We included these cases to see whether DEN can still lead to improvements (2vkz and 2bf1; see below), and to show that even in the worst case (1av1) DEN does not degrade the structure.
The Rfree values of the DEN-refined structures (Supplementary Fig. 4) all improved relative to the noDEN structures. Eleven structures show an improvement of more than 0.01, four an improvement of more than 0.02, and the best an improvement of 0.058 (1xxi), a 12% improvement. The difference between R and Rfree is on average 0.018 smaller for DEN than for noDEN, indicating that DEN significantly reduces overfitting. Both the minimum and the maximum Rfree values are generally lower for DEN than for noDEN (Supplementary Table 3), indicating that relevant, low-Rfree regions of conformational space are better sampled.
Figure: Re-refinement of nineteen low-resolution PDB structures.

Table: DEN refinement improves low-resolution structures in the PDB.
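Both R and Rfree use the same formula, R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|; Rfree is evaluated only on a test set of reflections excluded from refinement, so the gap between them is a measure of overfitting. The sketch below assumes Fcalc is already on the same scale as Fobs (real programs also apply scaling and bulk-solvent corrections), and its numbers are a toy example, not data from this study. The function names are hypothetical.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).
    Assumes f_calc is already on the same scale as f_obs."""
    f_obs, f_calc = np.abs(f_obs), np.abs(f_calc)
    return float(np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs))

def r_work_free(f_obs, f_calc, test_mask):
    """R on the working reflections and Rfree on the held-out test reflections."""
    work = ~test_mask
    return (r_factor(f_obs[work], f_calc[work]),
            r_factor(f_obs[test_mask], f_calc[test_mask]))

# Toy usage: four reflections, the last two held out for cross-validation.
f_obs = np.array([10.0, 10.0, 10.0, 10.0])
f_calc = np.array([9.0, 11.0, 10.0, 13.0])
test_mask = np.array([False, False, True, True])
r_work, r_free = r_work_free(f_obs, f_calc, test_mask)
print(r_work, r_free, r_free - r_work)
```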
The Ramachandran Score shows that DEN refinement generally improves the secondary structure compared to noDEN, with an average increase of 0.05. The largest improvement (0.23, or 37%) is again seen for 1xxi. The Rfree and Ramachandran Score improvements are highly correlated. The four cases in which the Ramachandran Score slightly worsened (1av1, 1xdv, 2a62, 2bf1) all have an optimal value of γ = 1.0 (Supplementary Table 4). In these cases (and in five additional cases with γ = 1.0), the reference model is ignored because it does not provide useful distances. As expected, the average Rfree improvement in these nine cases is small (0.0061; Supplementary Table 4). In contrast, for the ten cases with γ < 1, the average Rfree improvement is significant (0.022; Supplementary Table 4). These ten successful cases cover a variety of differences between the reference model and the crystal structure, including large (sub-)domain motions, hinge motions, local structural differences, and differences throughout the structure (Supplementary Fig. 5).
We calculated electron density maps from the experimental intensities combined with model phases from the DEN- and noDEN-refined structures. In the three cases shown, the noDEN backbone density is broken in several places (red), making it difficult to trace the backbone correctly. In contrast, the DEN maps show continuous backbone density (blue). The DEN-refined coordinates also show clear improvements: for example, with DEN, Pro114 in the 1ye1 structure is shifted by 3.2 Å into well-defined electron density (blue), whereas very little density is visible for noDEN (red). Such improved interpretability of the electron density maps indicates that the phases calculated from DEN-refined structures are superior to those from noDEN-refined structures.
Figure: Electron density map improvement upon DEN refinement for three structures: 3dmk, 1ye1, and 1xxi.
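Why better model phases give more interpretable maps can be seen in a toy calculation. The sketch below combines observed amplitudes with model phases and Fourier-transforms them back to a density grid, using plain Fobs·exp(iφmodel) coefficients; the weighting actually used for the maps above is not specified here, and this unweighted form is our assumption. On a random test density, the same amplitudes give a near-perfect map with correct phases and an essentially uncorrelated map with random phases. Function names are hypothetical.

```python
import numpy as np

def map_from_model_phases(f_obs_amp, f_model):
    """Density map from observed amplitudes combined with model phases.

    f_obs_amp: real array of |Fobs| on an FFT grid; f_model: complex model
    structure factors on the same grid. Plain Fobs*exp(i*phi_model)
    coefficients are used here (no figure-of-merit or sigmaA weighting).
    """
    coeffs = f_obs_amp * np.exp(1j * np.angle(f_model))
    return np.real(np.fft.ifftn(coeffs))

def map_correlation(a, b):
    """Pearson correlation between two maps on the same grid."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

# Toy demonstration: identical amplitudes, correct versus random phases.
rng = np.random.default_rng(1)
rho_true = rng.random((16, 16, 16))
f_true = np.fft.fftn(rho_true)
amps = np.abs(f_true)
good = map_from_model_phases(amps, f_true)
bad = map_from_model_phases(amps, np.exp(1j * rng.uniform(0, 2 * np.pi, amps.shape)))
print(map_correlation(good, rho_true), map_correlation(bad, rho_true))
```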
How does DEN increase the accuracy of the refined structure? For the penicillopepsin test case at 4.5 Å resolution, we analyzed the distances between atom pairs that are not well defined by the diffraction data, specifically those with large root-mean-square fluctuations (RMSF) between the ten models from repeated noDEN refinements. With DEN, these distances are much closer to the corresponding distances in the target structure (3app) than with noDEN, showing that DEN provides information for distances that are not well defined by the diffraction data.
Figure: DEN provides information for degrees of freedom that are weakly defined by the experimental diffraction data.
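The distance analysis above can be sketched as follows. We assume that the RMSF refers to the fluctuation of each pair distance across the ten repeat models, and the "top 10% of pairs" selection threshold is hypothetical. Once the poorly defined pairs are found, their distances in DEN and noDEN models can be compared with the target using `mean_distance_error`. The toy data make atom 0 poorly determined while all other atoms are tight.

```python
import numpy as np

def pair_distances(coords, pairs):
    """Distances for atom pairs; coords (..., n_atoms, 3), pairs (n_pairs, 2)."""
    diff = coords[..., pairs[:, 0], :] - coords[..., pairs[:, 1], :]
    return np.linalg.norm(diff, axis=-1)

def most_fluctuating_pairs(models, pairs, top_fraction=0.1):
    """Rank atom pairs by the RMS fluctuation of their distance across repeat models.

    models: (n_models, n_atoms, 3). Returns (indices of the top pairs, rmsf per pair).
    """
    d = pair_distances(models, pairs)                        # (n_models, n_pairs)
    rmsf = np.sqrt(np.mean((d - d.mean(axis=0)) ** 2, axis=0))
    n_top = max(1, int(round(top_fraction * len(pairs))))
    return np.argsort(rmsf)[::-1][:n_top], rmsf

def mean_distance_error(model, target, pairs):
    """Mean |d_model - d_target| over the given pairs (single structures)."""
    return float(np.mean(np.abs(pair_distances(model, pairs) - pair_distances(target, pairs))))

# Toy usage: atom 0 is poorly determined across ten repeats, the rest are tight.
rng = np.random.default_rng(2)
target = rng.normal(scale=8.0, size=(20, 3))
noise = rng.normal(scale=0.01, size=(10, 20, 3))
noise[:, 0, :] = rng.normal(scale=2.0, size=(10, 3))
models = target + noise
pairs = np.array([(i, j) for i in range(20) for j in range(i + 1, 20)])
top, rmsf = most_fluctuating_pairs(models, pairs)
print(pairs[top])
```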
Performance can be improved considerably by manually selecting the cutoff criteria and structural elements used for DEN. For the unligated SIV gp120 structure [18] (PDB 2bf1), we restricted the DEN network to the main-chain and Cβ atoms of the reference model (an HIV gp120-antibody complex at 2.0 Å resolution [19], PDB 2nxz) and to regions of the structure considered reliable predictors of the SIV gp120 structure (at least 35.8% local sequence identity; Supplementary Table 2). Refinement with optimum DEN parameters resulted in a 4% lower Rfree value and an 8% higher Ramachandran Score. With this judicious manual choice of the network, DEN used the reference-model distances (γ = 0.4, rather than γ = 1 for automated DEN) and produced a more accurate structure, as assessed by Rfree.
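A restricted network like the one used for 2bf1 could be assembled in two steps: find the residues whose local sequence identity to the reference reaches the 35.8% threshold, then keep only main-chain and Cβ atoms in those residues and select spring pairs by reference-model distance. The sketch below is one possible way to do this, not the procedure actually used. The window size (11 alignment columns), the 3–15 Å distance range, and the optional random subsampling are hypothetical choices, as are all function names.

```python
import random
import numpy as np

MAIN_CHAIN_AND_CB = {"N", "CA", "C", "O", "CB"}

def reliable_residues(aligned_target, aligned_ref, window=11, threshold=0.358):
    """1-based target residue numbers whose local sequence identity, over a
    window of alignment columns centred on the residue, is >= threshold.
    Inputs are pre-aligned strings of equal length with '-' marking gaps."""
    if len(aligned_target) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    n, half = len(aligned_target), window // 2
    keep, resnum = set(), 0
    for k, aa in enumerate(aligned_target):
        if aa == "-":
            continue
        resnum += 1
        lo, hi = max(0, k - half), min(n, k + half + 1)
        same = sum(t == r != "-" for t, r in zip(aligned_target[lo:hi], aligned_ref[lo:hi]))
        if same / (hi - lo) >= threshold:
            keep.add(resnum)
    return keep

def select_den_pairs(atoms, coords, allowed_residues, r_lo=3.0, r_hi=15.0,
                     max_pairs=None, seed=0):
    """Candidate DEN springs from the reference model.

    atoms: list of (residue_number, atom_name); coords: (n_atoms, 3) reference
    coordinates. Keeps main-chain and CB atoms in allowed residues and pairs
    whose reference distance lies in [r_lo, r_hi]; optionally subsamples.
    Returns a list of (i, j, reference_distance).
    """
    idx = [i for i, (resi, name) in enumerate(atoms)
           if name in MAIN_CHAIN_AND_CB and resi in allowed_residues]
    pairs = []
    for a, i in enumerate(idx):
        for j in idx[a + 1:]:
            d = float(np.linalg.norm(coords[i] - coords[j]))
            if r_lo <= d <= r_hi:
                pairs.append((i, j, d))
    if max_pairs is not None and len(pairs) > max_pairs:
        pairs = random.Random(seed).sample(pairs, max_pairs)
    return pairs

# Toy usage: the side-chain CG atom is excluded, and only one CA-CA pair
# falls inside the distance window.
allowed = reliable_residues("ACDEFGHIKL", "ACDEFGHIKL")
atoms = [(1, "CA"), (1, "CG"), (2, "CA"), (3, "CA")]
coords = np.array([[0.0, 0, 0], [1.5, 0, 0], [3.8, 0, 0], [20.0, 0, 0]])
print(select_den_pairs(atoms, coords, allowed))
```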
Cross-validation with Rfree allows determination of the optimum parameter values (particularly γ), yielding more accurate models at low resolution even when no high-resolution model is available. DEN can also be applied to predicted structures, which have shown promise in molecular replacement [20], and to RNA and DNA. DEN can easily be modified in future developments: for example, individual atomic weights could account for model error, variations in a family of homologous structures, or predicted loop conformations. The criteria for selecting distances can also be modified, as was done manually for 2bf1.