The general principles behind the macromolecular crystal structure refinement program REFMAC5 are described.
This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low-resolution refinement tools such as secondary-structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, ‘jelly-body’ restraints and the use of novel long-range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high-resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.
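To make the idea of a Kullback–Leibler-based ADP restraint concrete, the following is a minimal sketch that computes a symmetrized Kullback–Leibler divergence between two zero-mean isotropic Gaussian displacement distributions. It is not REFMAC5's implementation: the function names, the restriction to isotropic B values and the choice to symmetrize by summing both directions are assumptions made purely for illustration.

```python
import math


def kl_isotropic(b_p: float, b_q: float, dim: int = 3) -> float:
    """KL(p || q) for zero-mean isotropic Gaussians in `dim` dimensions.

    The variances are taken to be proportional to the B values, so only
    the ratio b_p / b_q matters (assumption: isotropic ADPs only).
    """
    if b_p <= 0 or b_q <= 0:
        raise ValueError("B values must be positive")
    ratio = b_p / b_q
    return 0.5 * dim * (ratio - 1.0 - math.log(ratio))


def symmetric_kl(b1: float, b2: float, dim: int = 3) -> float:
    """Symmetrized divergence KL(p || q) + KL(q || p).

    The logarithmic terms cancel, leaving 0.5 * dim * (r + 1/r - 2)
    with r = b1 / b2, which is zero only when the two B values agree.
    """
    return kl_isotropic(b1, b2, dim) + kl_isotropic(b2, b1, dim)
```

A penalty of this kind depends on the ratio of the two B values rather than on their difference, so a pair of atoms with B values of 20 and 40 Å² is penalized exactly as much as a pair with 40 and 80 Å².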
The automated building of a protein model into an electron density map remains a challenging problem. In the ARP/wARP approach, model building is facilitated by initially interpreting a density map with free atoms of unknown chemical identity; all structural information for such chemically unassigned atoms is discarded. Here, this is remedied by applying restraints between free atoms, and between free atoms and a partial protein model. These are based on geometric considerations of protein structure and tentative (conditional) assignments for the free atoms. Restraints are applied in the REFMAC5 refinement program and are generated on an ad hoc basis, allowing them to fluctuate from step to step. A large set of experimentally phased and molecular replacement structures showcases individual structures where automated building is improved drastically by the conditional restraints. The concept and implementation we present can also find application in restraining geometries, such as hydrogen bonds, in low-resolution refinement.
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered Rfree and a decreased gap between Rwork and Rfree.
macromolecular crystallography; low resolution; refinement; automation
The structures of large macromolecular complexes in different functional states can be determined by cryo-electron microscopy, which yields electron density maps of low to intermediate resolutions. The maps can be combined with high-resolution atomic structures of components of the complex, to produce a model for the complex that is more accurate than the formal resolution of the map. To this end, methods have been developed to dock atomic models into density maps rigidly or flexibly, and to refine a docked model so as to optimize the fit of the atomic model into the map. We have developed a new refinement method called YUP.SCX. The electron density map is converted into a component of the potential energy function to which terms for stereochemical restraints and volume exclusion are added. The potential energy function is then minimized (using simulated annealing) to yield a stereochemically-restrained atomic structure that fits into the electron density map optimally. We used this procedure to construct an atomic model of the 70S ribosome in the pre-accommodation state. Although some atoms are displaced by as much as 33 Å, they divide themselves into nearly rigid fragments along natural boundaries with smooth transitions between the fragments.
Electron microscopy; simulated annealing; structural refinement
Local structural similarity restraints (LSSR) provide a novel method for exploiting NCS or structural similarity to an external target structure. Two examples are given where BUSTER re-refinement of PDB entries with LSSR produces marked improvements, enabling further structural features to be modelled.
Maximum-likelihood X-ray macromolecular structure refinement in BUSTER has been extended with restraints facilitating the exploitation of structural similarity. The similarity can be between two or more chains within the structure being refined, thus favouring NCS, or to a distinct ‘target’ structure that remains fixed during refinement. The local structural similarity restraints (LSSR) approach considers all distances less than 5.5 Å between pairs of atoms in the chain to be restrained. For each, the difference from the distance between the corresponding atoms in the related chain is found. LSSR applies a restraint penalty on each difference. A functional form that reaches a plateau for large differences is used to avoid the restraints distorting parts of the structure that are not similar. Because LSSR are local, there is no need to separate out domains. Some restraint pruning is still necessary, but this has been automated. LSSR have been available to academic users of BUSTER since 2009 with the easy-to-use -autoncs and -target target.pdb options. The use of LSSR is illustrated in the re-refinement of PDB entries 5rnt, where -target enables the correct ligand-binding structure to be found, and 1osg, where -autoncs contributes to the location of an additional copy of the cyclic peptide ligand.
BUSTER; NCS restraints; target-structure restraints; local structural similarity restraints
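The LSSR procedure described above (select atom pairs closer than 5.5 Å, compare each distance with the corresponding distance in the related chain, and penalize the difference with a function that reaches a plateau) can be sketched as follows. This is a minimal illustration and not BUSTER's code: the abstract does not give the functional form, so the plateau function used here, the `scale` parameter and the function names are assumptions.

```python
import numpy as np


def plateau_penalty(delta, scale=1.0):
    """Penalty that grows like delta**2 for small |delta| and levels off
    at scale**2 for large |delta| (assumed form, chosen for illustration)."""
    d2 = np.square(delta)
    return d2 / (1.0 + d2 / scale**2)


def lssr_penalty(chain, related, cutoff=5.5, scale=1.0):
    """Sum the plateau penalty over all atom pairs in `chain` that are
    closer than `cutoff` (in Å).

    `chain` and `related` are (N, 3) coordinate arrays in which row k of
    one array corresponds to row k of the other.
    """
    chain = np.asarray(chain, dtype=float)
    related = np.asarray(related, dtype=float)
    if chain.shape != related.shape:
        raise ValueError("chains must contain the same number of atoms")
    i, j = np.triu_indices(len(chain), k=1)          # each pair once
    d_chain = np.linalg.norm(chain[i] - chain[j], axis=1)
    d_related = np.linalg.norm(related[i] - related[j], axis=1)
    near = d_chain < cutoff                          # local pairs only
    return float(plateau_penalty(d_chain[near] - d_related[near], scale).sum())
```

Because only interatomic distances enter the penalty, a rigid-body rotation or translation of the related chain leaves the result unchanged, which is consistent with the statement that no domain separation is needed.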
Zinc metalloenzymes play an important role in biology. However, owing to the limitations of the molecular force-field energy restraints used in X-ray refinement at medium or low resolution, the precise geometry of the zinc coordination environment can be difficult to determine from ambiguous electron density maps. Because accurate force fields for metal ions are difficult to define, the QM/MM (quantum-mechanical/molecular-mechanical) method provides an attractive and more general alternative for the study and refinement of metalloprotein active sites. Herein we present three examples that indicate that QM/MM-based refinement yields a superior description of the crystal structure based on R and Rfree values and on inspection of the zinc coordination environment. It is concluded that QM/MM refinement is a useful general tool for the improvement of the metal coordination sphere in metalloenzyme active sites.
The deformable elastic network (DEN) method for reciprocal-space crystallographic refinement improves crystal structures, especially at resolutions lower than 3.5 Å. The DEN web service presented here intends to provide structural biologists with access to resources for running computationally intensive DEN refinements.
Deformable elastic network (DEN) restraints have proved to be a powerful tool for refining structures from low-resolution X-ray crystallographic data sets. Unfortunately, optimal refinement using DEN restraints requires extensive calculations and is often hindered by a lack of access to sufficient computational resources. The DEN web service presented here intends to provide structural biologists with access to resources for running computationally intensive DEN refinements in parallel on the Open Science Grid, the US cyberinfrastructure. Access to the grid is provided through a simple and intuitive web interface integrated into the SBGrid Science Portal. Using this portal, refinements combined with full parameter optimization that would take many thousands of hours on standard computational resources can now be completed in several hours. An example of the successful application of DEN restraints to the human Notch1 transcriptional complex using the grid resource, and summaries of all submitted refinements, are presented as justification.
deformable elastic network restraints; low-resolution refinement; DEN refinement
We report substantial improvements to the previously introduced automated NOE assignment and structure determination protocol known as PASD. The improved protocol includes extensive analysis of input spectral data to create a low-resolution contact map of residues expected to be close in space. This map is used to obtain reasonable initial guesses of NOE assignment likelihoods which are refined during subsequent structure calculations. Information in the contact map about which residues are predicted to not be close in space is applied via conservative repulsive distance restraints which are used in early phases of the structure calculations. In comparison with the previous protocol, the new protocol requires significantly less computation time. We show results of running the new PASD protocol on six proteins and demonstrate that useful assignment and structural information is extracted on proteins of more than 220 residues. We show that useful assignment information can be obtained even in the case in which a unique structure cannot be determined.
automated structure determination; automated NOE assignment; Xplor-NIH
A brief summary of the types of restraint defined in refinement dictionaries.
At the resolution available from most macromolecular crystals, the X-ray data alone are insufficient to lead to a chemically reasonable structure, so stereochemical restraints are essential. These usually restrain bond lengths, bond angles, planes and chiral volumes. The definition of these restraints and where the values come from are described. A dictionary entry contains information about the atom types, their connectivity and all the appropriate restraints. Torsion angles are not usually restrained, but they do have optimum values. In the special case of flexible five- and six-membered rings, including pentose and hexose sugars, the ring pucker is defined by combinations of torsion angles and the pucker affects the position of substituents.
stereochemistry; restraints; bond lengths; bond angles; protein structure; crystallographic refinement
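As an illustration of how a dictionary restraint enters a refinement target, the sketch below evaluates a sum of squared, sigma-weighted deviations of bond lengths from their ideal values. The least-squares form, the function name and the numerical values in the usage note are assumptions for illustration; they are not taken from any particular dictionary.

```python
import math


def bond_restraint_term(coords, bonds):
    """Sum of ((d - ideal) / sigma)**2 over the restrained bonds.

    coords: sequence of (x, y, z) positions in Å.
    bonds:  iterable of (i, j, ideal, sigma) tuples, where i and j index
            into `coords` and ideal/sigma are the dictionary target
            distance and its standard deviation in Å.
    """
    total = 0.0
    for i, j, ideal, sigma in bonds:
        d = math.dist(coords[i], coords[j])   # current bond length
        total += ((d - ideal) / sigma) ** 2
    return total
```

For example, with a hypothetical target of 1.33 Å and sigma of 0.02 Å, a bond currently measuring 1.35 Å deviates by one sigma and contributes approximately 1.0 to the sum. Angle, plane and chiral-volume restraints can be weighted in the same way.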
We developed an analysis pipeline enabling population studies of HARDI data, and applied it to map genetic influences on fiber architecture in 90 twin subjects. We applied tensor-driven 3D fluid registration to HARDI, re-sampling the spherical fiber orientation distribution functions (ODFs) in appropriate Riemannian manifolds, after ODF regularization and sharpening. Fitting structural equation models (SEM) from quantitative genetics, we evaluated genetic influences on the Jensen-Shannon divergence (JSD), a novel measure of fiber spatial coherence, and on the generalized fiber anisotropy (GFA), a measure of fiber integrity. With random-effects regression, we mapped regions where diffusion profiles were highly correlated with subjects' intelligence quotient (IQ). Fiber complexity was predominantly under genetic control, and higher in more highly anisotropic regions; the proportion of genetic versus environmental control varied spatially. Our methods show promise for discovering genes affecting fiber connectivity in the brain.
The CCP4 template-restraint library defines restraints for biopolymers, their modifications and ligands that are used in macromolecular structure refinement. JLigand is a graphical editor for generating descriptions of new ligands and covalent linkages.
Biological macromolecules are polymers and therefore the restraints for macromolecular refinement can be subdivided into two sets: restraints that are applied to atoms that all belong to the same monomer and restraints that are associated with the covalent bonds between monomers. The CCP4 template-restraint library contains three types of data entries defining template restraints: descriptions of monomers and their modifications, both used for intramonomer restraints, and descriptions of links for intermonomer restraints. The library provides generic descriptions of modifications and links for protein, DNA and RNA chains, and for some post-translational modifications including glycosylation. Structure-specific template restraints can be defined in a user’s additional restraint library. Here, JLigand, a new CCP4 graphical interface to LibCheck and REFMAC that has been developed to manage the user’s library and generate new monomer entries is described, as well as new entries for links and associated modifications.
macromolecular refinement; restraint library; molecular graphics
Structural studies of large proteins and protein assemblies are a difficult and pressing challenge in molecular biology. Experiments often yield only low-resolution or sparse data which are not sufficient to fully determine atomistic structures. We have developed a general geometry-based algorithm that efficiently samples conformational space under constraints imposed by low-resolution density maps obtained from electron microscopy or X-ray crystallography experiments. A deformable elastic network (DEN) is used to restrain the sampling to prior knowledge of an approximate structure. The DEN restraints dramatically reduce over-fitting, especially at low resolution. Cross-validation is used to optimally weight the structural information and experimental data. Our algorithm is robust even for noise-added density maps and has a large radius of convergence for our test case. The DEN restraints can also be used to enhance reciprocal space simulated annealing refinement.
We consider the inverse electrocardiographic problem of computing epicardial potentials from a body-surface potential map. We study how to improve numerical approximation of the inverse problem when the finite element method is used. Being ill-posed, the inverse problem requires different discretization strategies from its corresponding forward problem. We propose refinement guidelines that specifically address the ill-posedness of the problem. The resulting guidelines necessitate the use of hybrid finite elements composed of tetrahedra and prism elements. Also, in order to maintain consistent numerical quality when the inverse problem is discretized at different scales, we propose a new family of regularizers using the variational principle underlying finite element methods. These variational-formed regularizers serve as an alternative to the traditional Tikhonov regularizers, but preserve the L2 norm and thereby achieve consistent regularization in multi-scale simulations. The variational formulation also enables a simple construction of the discrete gradient operator over irregular meshes, which is difficult to define in traditional discretization schemes. We validated our hybrid element technique and the variational regularizers by simulations on a realistic 3D torso/heart model with empirical heart data. Results show that discretization based on our proposed strategies mitigates the ill-conditioning and improves the inverse solution, and that the variational formulation may benefit a broader range of potential-based bioelectric problems.
forward/inverse electrocardiographic problem; hybrid finite element method; variational formulation; regularization
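For orientation, the traditional baseline mentioned above, zero-order Tikhonov regularization, can be sketched in a few lines. This is only the conventional approach that the variational regularizers are proposed as an alternative to, not the method of the paper; the function name and the dense normal-equations solve are assumptions chosen for brevity.

```python
import numpy as np


def tikhonov_solve(A, b, lam):
    """Minimize ||A x - b||^2 + lam^2 ||x||^2 via the normal equations.

    A:   (m, n) transfer matrix mapping source potentials to measurements.
    b:   (m,) vector of measured potentials.
    lam: regularization parameter; lam = 0 gives ordinary least squares
         when A has full column rank.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)
```

Increasing `lam` shrinks the solution towards zero, trading fidelity to the data for stability against the ill-conditioning of `A`.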
Ribonucleic acid structure determination by NMR spectroscopy relies primarily on local structural restraints provided by 1H-1H NOEs and J-couplings. When employed loosely, these restraints are broadly compatible with A- and B-like helical geometries and give rise to calculated structures that are highly sensitive to the force fields employed during refinement. A survey of recently reported NMR structures reveals significant variations in helical parameters, particularly the major groove width. Although helical parameters observed in high-resolution X-ray crystal structures of isolated A-form RNA helices are sensitive to crystal packing effects, variations among the published X-ray structures are significantly smaller than those observed in NMR structures. Here we show that restraints derived from aromatic 1H-13C residual dipolar couplings (RDCs) and residual chemical shift anisotropies (RCSAs) can overcome NMR restraint and force field deficiencies and afford structures with helical properties similar to those observed in high-resolution X-ray structures.
NMR; RNA Structure Determination; Isotope Labeling; Residual Dipolar Coupling; Residual Chemical Shift Anisotropy
The crystal structure of a Z-DNA hexamer duplex d(CGCGCG)2, determined at an ultra-high resolution of 0.55 Å and refined without restraints, displays a high degree of regularity and rigidity in its stereochemistry, in contrast to the more flexible B-DNA duplexes. The estimated standard uncertainties of all individually refined parameters, obtained by full-matrix least-squares optimization, are comparable with values that are typical for small-molecule crystallography. The Z-DNA model generated with ultra-high-resolution diffraction data can be used to revise the stereochemical restraints applied in lower resolution refinements. Detailed comparison of the stereochemical library values with the present accurate Z-DNA parameters shows generally good agreement, but also reveals significant discrepancies in the description of guanine-sugar valence angles and in the geometry of the phosphate groups.
Prediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3–4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is, without exploiting information about potential homologues of known structure.
We introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that Cα trace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, relying on the sequence and on evolutionary information; and one template-based, in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates. Furthermore, we test both ab initio and template-based 8 Å predictions on the CASP7 targets using a pre-CASP7 PDB, and find that both predictors are state-of-the-art, with the template-based one far outperforming the best CASP7 systems if templates with sequence identity to the query of 10% or better are available. Although this is not the main focus of this paper, we also report on reconstructions of Cα traces based on both ab initio and template-based 4-class map predictions, showing that the latter are generally more accurate even when homology is dubious.
Accurate predictions of multi-class maps may provide valuable constraints for improved ab initio and template-based prediction of protein structures, naturally incorporate multiple templates, and yield state-of-the-art binary maps. Predictions of protein structures and 8 Å contact maps based on the multi-class distance map predictors described in this paper are freely available to academic users at the url .
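The difference between a binary contact map and a multi-class distance map can be illustrated with a short sketch that derives both from Cα coordinates. The 8 Å contact threshold is the one quoted above; the remaining class boundaries in `edges` and all function names are hypothetical placeholders, since the abstract does not state how the four classes are delimited.

```python
import numpy as np


def distance_matrix(ca):
    """Pairwise Euclidean distances between Cα positions, shape (N, N)."""
    ca = np.asarray(ca, dtype=float)
    diff = ca[:, None, :] - ca[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def contact_map(ca, threshold=8.0):
    """Binary map: True where two residues are closer than `threshold` Å."""
    return distance_matrix(ca) < threshold


def multiclass_map(ca, edges=(8.0, 13.0, 19.0)):
    """Integer map with len(edges) + 1 distance classes.

    Class 0 holds distances below edges[0], class 1 holds distances in
    [edges[0], edges[1]), and so on. The second and third edges are
    hypothetical values chosen only to produce four classes.
    """
    return np.digitize(distance_matrix(ca), edges)
```

With the first edge set to 8 Å, class 0 of the multi-class map coincides with the binary contact map, while the higher classes retain coarse distance information that the binary map discards.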
The structure of the lipid-enveloped Sindbis virus has been determined by fitting atomic resolution crystallographic structures of component proteins into an 11-Å resolution cryoelectron microscopy map. The virus has T=4 quasisymmetry elements that are accurately maintained between the external glycoproteins, the transmembrane helical region, and the internal nucleocapsid core. The crystal structure of the E1 glycoprotein was fitted into the cryoelectron microscopy density, in part by using the known carbohydrate positions as restraints. A difference map showed that the E2 glycoprotein was shaped similarly to E1, suggesting a possible common evolutionary origin for these two glycoproteins. The structure shows that the E2 glycoprotein would have to move away from the center of the trimeric spike in order to expose enough viral membrane surface to permit fusion with the cellular membrane during the initial stages of host infection. The well-resolved E1-E2 transmembrane regions form α-helical coiled coils that were consistent with T=4 symmetry. The known structure of the capsid protein was fitted into the density corresponding to the nucleocapsid, revising the structure published earlier.
The anisotropic spin interactions measured for membrane proteins in weakly oriented micelles and in oriented lipid bilayers provide independent and potentially complementary high-resolution restraints for structure determination. Here we show that the membrane protein CHIF adopts a similar structure in lipid micelles and bilayers, allowing the restraints from micelle and bilayer samples to be combined in a complementary fashion to enhance the structural information. Back-calculation and assignment of the NMR spectrum of CHIF in oriented lipid bilayers, from the structure determined in micelles, provides additional restraints for structure determination as well as the global orientation of the protein in the membrane. The combined use of solution and solid-state NMR restraints also affords cross-validation for the structural analysis.
An increasing number of structural studies of large macromolecular complexes, both in X-ray crystallography and electron cryomicroscopy, have resulted in intermediate resolution (5–10 Å) structures. Despite being limited in resolution, significant structural and functional information may be extractable from these maps. To aid in the analysis and annotation of these complexes, we have developed SSEhunter, a tool for the quantitative detection of α-helices and β-sheets. Based on density skeletonization, local geometry calculations and a template-based search, SSEhunter has been tested and validated on a variety of simulated and authentic subnanometer resolution density maps. The result is a robust, user-friendly approach that allows users to quickly visualize, assess and annotate intermediate resolution density maps. Beyond secondary structure element identification, the skeletonization algorithm in SSEhunter provides secondary structure topology, potentially useful in leading to structural models of individual molecular components directly from the density.
The Ambiguous Restraints for Iterative Assignment (ARIA) approach is widely used for NMR structure determination. It is based on simultaneously calculating structures and assigning NOE through an iterative protocol. The final solution consists of a set of conformers and a list of most probable assignments for the input NOE peak list.
ARIA was extended with a series of graphical tools to facilitate a detailed analysis of the intermediate and final results of the ARIA protocol. These additional features provide (i) an interactive contact map, serving as a tool for the analysis of assignments, and (ii) graphical representations of structure quality scores and restraint statistics. The interactive contact map between residues can be clicked to obtain information about the restraints and their contributions. Profiles of quality scores are plotted along the protein sequence, and contact maps provide information of the agreement with the data on a residue pair level.
The graphical tools and outputs described here significantly extend the validation and analysis possibilities of NOE assignments given by ARIA as well as the analysis of the quality of the final structure ensemble. These tools are included in the latest version of ARIA, which is available at . The Web site also contains an installation guide, a user manual and example calculations.
This paper describes an approach for making use of the components of the experimentally determined rotational diffusion tensor derived from NMR relaxation measurements in macromolecular structure determination. The parameters of the rotational diffusion tensor describe the shape and size of the macromolecule or macromolecular complex and are therefore complementary to traditional NMR restraints. The structural information contained in the rotational diffusion tensor is not dissimilar to that present in the small angle region of the solution X-ray scattering profiles. We demonstrate the utility of rotational diffusion tensor restraints for protein structure refinement using the N-terminal domain of enzyme I (EIN) as an example and validate the results by solution small angle X-ray scattering. We also show how rotational diffusion tensor restraints can be used for docking complexes using the dimeric HIV-1 protease and the EIN-HPr complexes as examples. In the former case, the rotational diffusion tensor restraints are sufficient in their own right to determine the position of one subunit relative to another. In the latter case, rotational diffusion tensor restraints complemented by highly ambiguous distance restraints derived from chemical shift perturbation mapping and a hydrophobic contact potential are sufficient to correctly dock EIN to HPr. In each case, the cluster containing the lowest energy structure corresponds to the correct solution.
A simple rule of thumb based on resolution is not adequate to identify the best treatment of atomic displacements in macromolecular structural models. The choice to use isotropic B factors, anisotropic B factors, TLS models or some combination of the three should be validated through statistical analysis of the model refinement.
In choosing and refining any crystallographic structural model, there is tension between the desire to extract the most detailed information possible and the necessity to describe no more than what is justified by the observed data. A more complex model is not necessarily a better model. Thus, it is important to validate the choice of parameters as well as validating their refined values. One recurring task is to choose the best model for describing the displacement of each atom about its mean position. At atomic resolution one has the option of devoting six model parameters (a ‘thermal ellipsoid’) to describe the displacement of each atom. At medium resolution one typically devotes at most one model parameter per atom to describe the same thing (a ‘B factor’). At very low resolution one cannot justify the use of even one parameter per atom. Furthermore, this aspect of the structure may be described better by an explicit model of bulk displacements, the most common of which is the translation/libration/screw (TLS) formalism, rather than by assigning some number of parameters to each atom individually. One can sidestep this choice between atomic displacement parameters and TLS descriptions by including both treatments in the same model, but this is not always statistically justifiable. The choice of which treatment is best for a particular structure refinement at a particular resolution can be guided by general considerations of the ratio of model parameters to the number of observations and by specific statistics such as the Hamilton R-factor ratio test.
atomic displacements; B factors; TLS models; model parameters
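The general consideration of model parameters versus observations can be made concrete with a small counting sketch. The displacement-parameter counts (six per atom for a thermal ellipsoid, one per atom for a B factor) follow the text above; the three positional parameters per atom, the 20 refinable parameters per TLS group and the function names are assumptions added for illustration, and restraints, which act as additional observations, are ignored.

```python
TLS_PARAMS_PER_GROUP = 20  # assumed: 6 (T) + 6 (L) + 8 (S) refinable values

# Parameters per atom: 3 positional coordinates plus displacement terms.
PER_ATOM = {
    "xyz_only": 3,      # no individual displacement parameter
    "isotropic": 4,     # xyz + one B factor
    "anisotropic": 9,   # xyz + six-parameter thermal ellipsoid
}


def parameter_count(n_atoms, model, n_tls_groups=0):
    """Total number of refined parameters for a given displacement model."""
    if model not in PER_ATOM:
        raise ValueError(f"unknown model: {model!r}")
    return n_atoms * PER_ATOM[model] + TLS_PARAMS_PER_GROUP * n_tls_groups


def observations_per_parameter(n_reflections, n_params):
    """Ratio of unique reflections to refined parameters."""
    return n_reflections / n_params
```

For a hypothetical 1000-atom model with 18 000 unique reflections, the anisotropic treatment leaves two observations per parameter, whereas positional parameters plus two TLS groups leave almost six. A ratio of this kind is only a first screen; the text above points to statistics such as the Hamilton R-factor ratio test for deciding whether the extra parameters are justified.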
Radiation hybrid (RH) maps are considered to be a tool of choice for fine mapping of closely linked loci, because the resolution of linkage maps is determined by the number of informative meioses and recombination events, which may require very large mapping populations. Accurately defining the marker order on chromosomes is crucial for correct identification of quantitative trait loci (QTL), haplotype map construction and refinement of candidate gene searches.
A 12 k radiation hybrid map of bovine chromosome 14 was constructed using 843 single nucleotide polymorphism markers. The resulting map was aligned with the latest version of the bovine assembly (Btau_3.1) as well as other previously published RH maps. The resulting map identified distinct regions on bovine chromosome 14 where discrepancies between this RH map and the bovine assembly occur. A major region of discrepancy was found near the centromere involving the arrangement and order of the scaffolds from the assembly. The map further confirms previously published conserved synteny blocks with human chromosome 8. It also identifies an extra breakpoint and a conserved synteny block previously undetected owing to lower marker density. This conserved synteny block is in a region where markers between the RH map presented here and the latest sequence assembly are in very good agreement.
The increase of publicly available markers shifts the rate limiting step from marker discovery to the correct identification of their order for further use by the research community. This high resolution map of bovine chromosome 14 will facilitate identification of regions in the sequence assembly where additional information is required to resolve marker ordering.
The analysis of results from CAPRI (Critical Assessment of Predicted Interactions), the first communitywide experiment devoted to protein docking, shows that all successful methods consist of multiple stages. The methods belong to three classes: global methods based on fast Fourier transforms or geometric matching, medium range Monte Carlo methods, and the restraint-guided HADDOCK program. Although these classes of methods require very different amounts of information in addition to the structures of component proteins, they all share the same four computational steps: (1) simplified and/or rigid body search; (2) selecting the region(s) of interest; (3) refinement of docked structures; and (4) selecting the best models. While each method is optimal for a specific class of docking problems, combining computational steps from different methods can improve the reliability and accuracy of results.
Macromolecular structures are modeled by conformational optimization within experimental and knowledge-based restraints. Discrete restraint-based sampling generates high-quality structures within these restraints and facilitates further refinement in a continuous all-atom energy landscape. This approach has been used successfully for protein loop modeling, comparative modeling and electron density fitting in X-ray crystallography.
Here we present a software toolkit (Rappertk) which generalizes discrete restraint-based sampling for use in structural biology. Modular design and multi-layered architecture enable Rappertk to sample conformations of any macromolecule at many levels of detail and within a variety of experimental restraints. Performance against a Cα-tracing benchmark shows that the efficiency has not suffered despite the overhead required by this flexibility. We demonstrate the toolkit's capabilities by building high-quality β-sheets and by introducing restraint-driven sampling. RNA sampling is demonstrated by rebuilding a protein-RNA interface. The ability to construct arbitrary ligands is used in sampling protein-ligand interfaces within electron density. Finally, secondary structure and shape information derived from EM are combined to generate multiple conformations of a protein consistent with the observed density.
Through its modular design and ease of use, Rappertk enables exploration of a wide variety of interesting avenues in structural biology. This toolkit, with illustrative examples, is freely available to academic users from .