The CCP4 template-restraint library defines restraints for biopolymers, their modifications and ligands that are used in macromolecular structure refinement. JLigand is a graphical editor for generating descriptions of new ligands and covalent linkages.
Biological macromolecules are polymers and therefore the restraints for macromolecular refinement can be subdivided into two sets: restraints that are applied to atoms that all belong to the same monomer and restraints that are associated with the covalent bonds between monomers. The CCP4 template-restraint library contains three types of data entries defining template restraints: descriptions of monomers and their modifications, both used for intramonomer restraints, and descriptions of links for intermonomer restraints. The library provides generic descriptions of modifications and links for protein, DNA and RNA chains, and for some post-translational modifications including glycosylation. Structure-specific template restraints can be defined in a user’s additional restraint library. Here, JLigand is described: a new CCP4 graphical interface to LibCheck and REFMAC that has been developed to manage the user’s library and to generate new monomer entries, as well as new entries for links and associated modifications.
macromolecular refinement; restraint library; molecular graphics
The general principles behind the macromolecular crystal structure refinement program REFMAC5 are described.
This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low-resolution refinement tools such as secondary-structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, ‘jelly-body’ restraints and the use of novel long-range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high-resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.
Low-resolution refinement tools implemented in REFMAC5 are described, including the use of external structural restraints, helical restraints and regularized anisotropic map sharpening.
Two aspects of low-resolution macromolecular crystal structure analysis are considered: (i) the use of reference structures and structural units for provision of structural prior information and (ii) map sharpening in the presence of noise and the effects of Fourier series termination. The generation of interatomic distance restraints by ProSMART and their subsequent application in REFMAC5 is described. It is shown that the use of such external structural information can enhance the reliability of derived atomic models and stabilize refinement. The problem of map sharpening is considered as an inverse deblurring problem and is solved using Tikhonov regularizers. It is demonstrated that this type of map sharpening can automatically produce a map with more structural features whilst maintaining connectivity. Tests show that both of these directions are promising, although more work needs to be performed in order to further exploit structural information and to address the problem of reliable electron-density calculation.
low-resolution refinement; REFMAC5
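The idea of treating map sharpening as a regularized deblurring problem can be illustrated with a minimal sketch. It assumes an isotropic Gaussian blur exp(-B s²/4) acting on the Fourier coefficients and the simplest (zeroth-order) Tikhonov penalty; the function name, the parameters `b_blur` and `alpha`, and the isotropic model are illustrative assumptions and are not taken from the REFMAC5 implementation.

```python
import numpy as np

def tikhonov_sharpen(f_obs, s, b_blur, alpha):
    """Regularized inverse of a Gaussian blur applied to Fourier coefficients.

    f_obs  : observed (blurred, noisy) Fourier coefficients
    s      : reciprocal-space radius 1/d of each coefficient (1/Angstrom)
    b_blur : assumed overall blurring B value (Angstrom^2), hypothetical input
    alpha  : Tikhonov weight; alpha = 0 reduces to plain inverse sharpening
    """
    k = np.exp(-b_blur * s**2 / 4.0)   # blurring factor in reciprocal space
    # Minimizer of |k*F - f_obs|^2 + alpha*|F|^2 for each coefficient
    return f_obs * k / (k**2 + alpha)
```

With `alpha = 0` the high-resolution terms are amplified without limit, which also amplifies noise; a positive `alpha` caps the amplification at 1/(2·sqrt(alpha)), which is the sense in which the regularizer trades sharpening against noise.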
A brief summary of the types of restraint defined in refinement dictionaries.
At the resolution available from most macromolecular crystals, the X-ray data alone are insufficient to lead to a chemically reasonable structure, so stereochemical restraints are essential. These usually restrain bond lengths, bond angles, planes and chiral volumes. The definition of these restraints and where the values come from are described. A dictionary entry contains information about the atom types, their connectivity and all the appropriate restraints. Torsion angles are not usually restrained, but they do have optimum values. In the special case of flexible five- and six-membered rings, including pentose and hexose sugars, the ring pucker is defined by combinations of torsion angles and the pucker affects the position of substituents.
stereochemistry; restraints; bond lengths; bond angles; protein structure; crystallographic refinement
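As a minimal sketch of how such restraints are evaluated, the following computes a bond length, a chiral volume (the signed triple product of the three bonds leaving a chiral centre) and a squared, sigma-weighted deviation from a target. The function names and the numerical target and sigma in the usage note are illustrative assumptions, not values from any dictionary.

```python
import numpy as np

def bond_length(a, b):
    """Distance between two atoms given as Cartesian coordinates (Angstrom)."""
    return float(np.linalg.norm(np.asarray(b, float) - np.asarray(a, float)))

def chiral_volume(center, n1, n2, n3):
    """Signed volume spanned by the three bonds from a chiral centre.

    Swapping any two neighbours flips the sign, which is how the
    restraint distinguishes the two hands of the centre.
    """
    c = np.asarray(center, float)
    v1, v2, v3 = (np.asarray(p, float) - c for p in (n1, n2, n3))
    return float(np.dot(v1, np.cross(v2, v3)))

def restraint_residual(value, target, sigma):
    """Squared deviation from the target in units of sigma."""
    return ((value - target) / sigma) ** 2
```

For example, `restraint_residual(bond_length(a, b), 1.53, 0.02)` would score a bond against a hypothetical target of 1.53 Å with a standard deviation of 0.02 Å; a refinement program sums terms of this kind over all restraints.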
The automated building of a protein model into an electron density map remains a challenging problem. In the ARP/wARP approach, model building is facilitated by initially interpreting a density map with free atoms of unknown chemical identity; all structural information for such chemically unassigned atoms is discarded. Here, this is remedied by applying restraints between free atoms, and between free atoms and a partial protein model. These are based on geometric considerations of protein structure and tentative (conditional) assignments for the free atoms. Restraints are applied in the REFMAC5 refinement program and are generated on an ad hoc basis, allowing them to fluctuate from step to step. A large set of experimentally phased and molecular replacement structures showcases individual structures where automated building is improved drastically by the conditional restraints. The concept and implementation we present can also find application in restraining geometries, such as hydrogen bonds, in low-resolution refinement.
SHELXL2013 contains improvements over the previous versions that facilitate the refinement of macromolecular structures against neutron data. This article highlights several features of particular interest for this purpose and includes a list of restraints for H-atom refinement.
Some of the improvements in SHELX2013 make SHELXL convenient to use for refinement of macromolecular structures against neutron data without the support of X-ray data. The new NEUT instruction adjusts the behaviour of the SFAC instruction as well as the default bond lengths of the AFIX instructions. This work presents a protocol on how to use SHELXL for refinement of protein structures against neutron data. It includes restraints extending the Engh & Huber [Acta Cryst. (1991), A47, 392–400] restraints to H atoms and discusses several of the features of SHELXL that make the program particularly useful for the investigation of H atoms with neutron diffraction. SHELXL2013 is already adequate for the refinement of small molecules against neutron data, but there is still room for improvement, like the introduction of chain IDs for the refinement of macromolecular structures.
single-crystal neutron diffraction; macromolecular structure refinement; hydrogen restraints; SHELXL2013
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered Rfree and a decreased gap between Rwork and Rfree.
macromolecular crystallography; low resolution; refinement; automation
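Torsion-angle restraints of the kind described above depend on two basic operations: measuring a dihedral angle from four atomic positions and comparing it with a target while respecting the 360° periodicity. The sketch below shows both; the function names are illustrative and this is not the phenix.refine implementation.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle (degrees, in (-180, 180]) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)          # unit vector along the central bond
    v = b0 - np.dot(b0, b1) * b1          # b0 projected onto the plane normal to b1
    w = b2 - np.dot(b2, b1) * b1          # b2 projected onto the same plane
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def torsion_deviation(angle, target):
    """Smallest signed difference angle - target, wrapped to [-180, 180)."""
    return (angle - target + 180.0) % 360.0 - 180.0
```

A reference-model restraint could then penalize `torsion_deviation(model_angle, reference_angle)`, so that 170° and -170° are treated as 20° apart rather than 340°.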
The majority of previously deposited X-ray structures can be improved by applying current refinement methods.
Structural biology, homology modelling and rational drug design require accurate three-dimensional macromolecular coordinates. However, the coordinates in the Protein Data Bank (PDB) have not all been obtained using the latest experimental and computational methods. In this study a method is presented for automated re-refinement of existing structure models in the PDB. A large-scale benchmark with 16 807 PDB entries showed that they can be improved in terms of fit to the deposited experimental X-ray data as well as in terms of geometric quality. The re-refinement protocol uses TLS models to describe concerted atom movement. The resulting structure models are made available through the PDB_REDO databank (http://www.cmbi.ru.nl/pdb_redo/). Grid computing techniques were used to overcome the computational requirements of this endeavour.
X-ray crystallography; refinement; structure validation; Protein Data Bank; grid computing
We have developed the program PERMOL for semi-automated homology modeling of proteins. It is based on restrained molecular dynamics using a simulated annealing protocol in torsion angle space. The main restraints defining the optimal local geometry of the structure are weighted mean dihedral angles and their standard deviations, which are calculated with an algorithm described earlier by Döker et al. (1999, BBRC, 257, 348–350). The overall long-range contacts are established via a small number of distance restraints between atoms involved in hydrogen bonds and backbone atoms of conserved residues. Using the restraints generated by PERMOL, three-dimensional structures are obtained with standard molecular dynamics programs such as DYANA or CNS.
To test this modeling approach, it was used to predict the structure of the histidine-containing phosphocarrier protein HPr from E. coli and the structure of the human peroxisome proliferator activated receptor γ (Ppar γ). The divergence between the modeled HPr and the previously determined X-ray structure was comparable to the divergence between the X-ray structure and the published NMR structure. The modeled structure of Ppar γ was also very close to the previously solved X-ray structure, with an RMSD of 0.262 nm for the backbone atoms.
In summary, we present a new method for homology modeling capable of producing high-quality structure models. An advantage of the method is that it can be used in combination with incomplete NMR data to obtain reasonable structure models in accordance with the experimental data.
For B-DNA, the strong linear correlation observed by nuclear magnetic resonance (NMR) between the 31P chemical shifts (δP) and three recurrent internucleotide distances demonstrates the tight coupling between phosphate motions and helicoidal parameters. This correlation allows δP to be translated into distance restraints that are directly exploitable in structural refinement. It even provides a new method for refining DNA oligomers with restraints exclusively inferred from δP. Combined with molecular dynamics in explicit solvent, these restraints lead to a structural and dynamical view of the DNA as detailed as that obtained with conventional and more extensive restraints. Tests with the Jun-Fos oligomer show that this δP-based strategy can provide a simple and straightforward method to capture DNA properties in solution from routine NMR experiments on unlabeled samples.
Macromolecular structures are modeled by conformational optimization within experimental and knowledge-based restraints. Discrete restraint-based sampling generates high-quality structures within these restraints and facilitates further refinement in a continuous all-atom energy landscape. This approach has been used successfully for protein loop modeling, comparative modeling and electron density fitting in X-ray crystallography.
Here we present a software toolkit (Rappertk) which generalizes discrete restraint-based sampling for use in structural biology. Its modular design and multi-layered architecture enable Rappertk to sample conformations of any macromolecule at many levels of detail and within a variety of experimental restraints. Performance against a Cα-tracing benchmark shows that the efficiency has not suffered despite the overhead required by this flexibility. We demonstrate the toolkit's capabilities by building high-quality β-sheets and by introducing restraint-driven sampling. RNA sampling is demonstrated by rebuilding a protein-RNA interface. The ability to construct arbitrary ligands is used in sampling protein-ligand interfaces within electron density. Finally, secondary structure and shape information derived from EM are combined to generate multiple conformations of a protein consistent with the observed density.
Through its modular design and ease of use, Rappertk enables exploration of a wide variety of interesting avenues in structural biology. This toolkit, with illustrative examples, is freely available to academic users.
Acireductone dioxygenase (ARD) from Klebsiella ATCC 8724 is a metalloenzyme that is capable of catalyzing different reactions with the same substrates (acireductone and O2) depending upon the metal bound in the active site. A model for the solution structure of the paramagnetic Ni2+-containing ARD has been refined using residual dipolar couplings (RDCs) measured in two media. Additional dihedral restraints based on chemical shift (TALOS) were included in the refinement, and backbone structure in the vicinity of the active site was modeled from a crystallographic structure of the mouse homolog of ARD. The incorporation of residual dipolar couplings into the structural refinement alters the relative orientations of several structural features significantly, and improves local secondary structure determination. Comparisons between the solution structures obtained with and without RDCs are made, and structural similarities and differences between mouse and bacterial enzymes are described. Finally, the biological significance of these differences is considered.
metalloenzyme; magnetic alignment; nickel enzyme; paramagnetic; homology modeling
A script was created to allow SHELXL to use the new CDL v.1.2 stereochemical library which defines the target values for main-chain bond lengths and angles as a function of the residue’s ϕ/ψ angles. Test refinements using this script show that the refinement behavior of structures at resolutions even better than 1 Å is substantially enhanced by the use of the new conformation-dependent ideal geometry paradigm.
To utilize a new conformation-dependent backbone-geometry library (CDL) in protein refinements at atomic resolution, a script was written that creates a restraint file for the SHELXL refinement program. It was found that the use of this library allows models to be created that have a substantially better fit to main-chain bond angles and lengths without degrading their fit to the X-ray data even at resolutions near 1 Å. For models at much higher resolution (∼0.7 Å), the refined model for parts adopting single well occupied positions is largely independent of the restraints used, but these structures still showed much smaller r.m.s.d. residuals when assessed with the CDL. Examination of the refinement tests across a wide resolution range from 2.4 to 0.65 Å revealed consistent behavior supporting the use of the CDL as a next-generation restraint library to improve refinement. CDL restraints can be generated using the service at http://pgd.science.oregonstate.edu/cdl_shelxl/.
stereochemical libraries; refinement; conformation-dependent library
Restrained molecular dynamics simulations are a robust, though perhaps underused, tool for the end-stage refinement of biomolecular structures. We demonstrate their utility—using modern simulation protocols, optimized force fields, and inclusion of explicit solvent and mobile counterions—by re-investigating the solution structures of two RNA hairpins that had previously been refined using conventional techniques. The structures, both domain 5 group II intron ribozymes from yeast ai5γ and Pylaiella littoralis, share a nearly identical primary sequence yet the published 3D structures appear quite different. Relatively long restrained MD simulations using the original NMR restraint data identified the presence of a small set of violated distance restraints in one structure and a possibly incorrect trapped bulge nucleotide conformation in the other structure. The removal of problematic distance restraints and the addition of a heating step yielded representative ensembles with very similar 3D structures and much lower pairwise RMSD values. Analysis of ion density during the restrained simulations helped to explain chemical shift perturbation data published previously. These results suggest that restrained MD simulations, with proper caution, can be used to “update” older structures or aid in the refinement of new structures that lack sufficient experimental data to produce a high quality result. Notable cautions include the need for sufficient sampling, awareness of potential force field bias (such as small angle deviations with the current AMBER force fields), and a proper balance between the various restraint weights.
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-012-9642-5) contains supplementary material, which is available to authorized users.
RNA structure; Molecular dynamics; Residual dipolar coupling restraints; Bulge structure; Force fields; Ion binding
The elastic network model (ENM) is a widely used method to study native protein dynamics by normal mode analysis (NMA). In ENM we need information about all pairwise distances, and the distance between contacting atoms is restrained to the native value. Therefore ENM requires O(N²) information to realize its dynamics for a protein consisting of N amino acid residues. To see if (or to what extent) such a large amount of specific structural information is required to realize native protein dynamics, here we introduce a novel model based on only O(N) restraints. This model, named the ‘contact number diffusion’ model (CND), includes specific distance restraints for only local (along the amino acid sequence) atom pairs, and semi-specific non-local restraints imposed on each atom, rather than atom pairs. The semi-specific non-local restraints are defined in terms of the non-local contact numbers of atoms. The CND model exhibits dynamic characteristics comparable to those of ENM and is better correlated with explicit-solvent molecular dynamics simulation than ENM. Moreover, unrealistic surface fluctuations often observed in ENM were suppressed in CND. On the other hand, in some ligand-bound structures CND showed larger fluctuations of buried protein atoms interacting with the ligand compared to ENM. In addition, fluctuations from CND and ENM show comparable correlations with the experimental B-factor. Although there are some indications of the importance of some specific non-local interactions, the semi-specific non-local interactions are mostly sufficient for reproducing the native protein dynamics.
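To make the pairwise-contact bookkeeping of an elastic network concrete, the sketch below builds the connectivity (Kirchhoff) matrix of a Gaussian network model, one common ENM variant, from a set of coordinates. Its diagonal holds the contact number of each atom. This is not the CND model itself; the cutoff value and function names are illustrative assumptions.

```python
import numpy as np

def kirchhoff(coords, cutoff=7.0):
    """Connectivity matrix: -1 for each contacting pair, contact number on the diagonal."""
    coords = np.asarray(coords, float)
    # All pairwise distances: this is the O(N^2) information an ENM relies on
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (d < cutoff) & ~np.eye(len(coords), dtype=bool)
    gamma = -contact.astype(float)
    np.fill_diagonal(gamma, contact.sum(axis=1))
    return gamma

def relative_fluctuations(gamma):
    """Relative mean-square fluctuations from the pseudo-inverse diagonal."""
    return np.diag(np.linalg.pinv(gamma))
```

For a short straight chain with only nearest-neighbour contacts, `relative_fluctuations` gives larger values at the chain ends than in the middle, the expected qualitative behaviour of weakly connected sites.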
X-ray crystallography typically uses a single set of coordinates and B-factors to describe macromolecular conformations. Refinement of multiple copies of the entire structure has been previously used in specific cases as an alternative means of representing structural flexibility. Here, we systematically validate this method using simulated diffraction data, and find ensemble refinement produces better representations of the distributions of atomic positions in the simulated structures than single conformer refinements. Comparison of principal components calculated from the refined ensembles and simulations shows that concerted motions are captured locally, but correlations dissipate over long distances. Ensemble refinement is also used on 50 experimental structures of varying resolution, and leads to decreases in R-free, implying that improvements in the representation of flexibility observed for the simulated structures may apply to real structures. These gains are essentially independent of resolution or data-to-parameter ratio, suggesting even structures at moderate resolution can benefit from ensemble refinement.
An extension is proposed to the rigid-bond description of atomic thermal motion in crystals.
The rigid-bond model [Hirshfeld (1976). Acta Cryst. A32, 239–244] states that the mean-square displacements of two atoms are equal in the direction of the bond joining them. This criterion is widely used for verification (as intended by Hirshfeld) and also as a restraint in structure refinement as suggested by Rollett [Crystallographic Computing (1970), edited by F. R. Ahmed et al., pp. 167–181. Copenhagen: Munksgaard]. By reformulating this condition, so that the relative motion of the two atoms is required to be perpendicular to the bond, the number of restraints that can be applied per anisotropic atom is increased from about one to about three. Application of this condition to 1,3-distances in addition to the 1,2-distances means that on average just over six restraints can be applied to the six anisotropic displacement parameters of each atom. This concept is tested against very high resolution data of a small peptide and employed as a restraint for protein refinement at more modest resolution (e.g. 1.7 Å).
rigid-bond test; refinement restraints; anisotropic displacement parameters
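The original Hirshfeld criterion can be written down in a few lines: project each atom's displacement tensor onto the bond direction and compare the two values. The sketch below assumes the anisotropic displacement parameters are supplied as symmetric 3×3 Cartesian tensors in Å²; the function names are illustrative, and the extended (perpendicular relative motion) formulation of the paper is not reproduced here.

```python
import numpy as np

def msd_along(u, direction):
    """Mean-square displacement of tensor u along a given direction."""
    n = np.asarray(direction, float)
    n = n / np.linalg.norm(n)
    return float(n @ np.asarray(u, float) @ n)

def rigid_bond_delta(u_a, u_b, r_a, r_b):
    """Difference in mean-square displacement of atoms A and B along the A-B bond.

    A value near zero satisfies the rigid-bond condition.
    """
    bond = np.asarray(r_b, float) - np.asarray(r_a, float)
    return msd_along(u_a, bond) - msd_along(u_b, bond)
```

Used as a restraint, the squared value of `rigid_bond_delta` would be added to the refinement target with a suitable weight; used as a test, large values flag suspect displacement parameters.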
The structures of large macromolecular complexes in different functional states can be determined by cryo-electron microscopy, which yields electron density maps of low to intermediate resolutions. The maps can be combined with high-resolution atomic structures of components of the complex, to produce a model for the complex that is more accurate than the formal resolution of the map. To this end, methods have been developed to dock atomic models into density maps rigidly or flexibly, and to refine a docked model so as to optimize the fit of the atomic model into the map. We have developed a new refinement method called YUP.SCX. The electron density map is converted into a component of the potential energy function to which terms for stereochemical restraints and volume exclusion are added. The potential energy function is then minimized (using simulated annealing) to yield a stereochemically-restrained atomic structure that fits into the electron density map optimally. We used this procedure to construct an atomic model of the 70S ribosome in the pre-accommodation state. Although some atoms are displaced by as much as 33 Å, they divide themselves into nearly rigid fragments along natural boundaries with smooth transitions between the fragments.
Electron microscopy; simulated annealing; structural refinement
The quality of model structures generated by contemporary protein structure prediction methods strongly depends on the degree of similarity between the target and available template structures. Therefore, the importance of improving template-based model structures beyond the accuracy available from template information has been emphasized in the structure prediction community. The GalaxyRefine web server, freely available at http://galaxy.seoklab.org/refine, is based on a refinement method that has been successfully tested in CASP10. The method first rebuilds side chains and performs side-chain repacking and subsequent overall structure relaxation by molecular dynamics simulation. According to the CASP10 assessment, this method showed the best performance in improving the local structure quality. The method can improve both global and local structure quality on average, when used for refining the models generated by state-of-the-art protein structure prediction servers.
The quality of protein structures determined by nuclear magnetic resonance (NMR) spectroscopy is contingent on the number and quality of experimentally-derived resonance assignments, distance and angular restraints. Two key features of protein NMR data have posed challenges for the routine and automated structure determination of small to medium sized proteins: (1) spectral resolution, especially of crowded nuclear Overhauser effect spectroscopy (NOESY) spectra, and (2) the reliance on a continuous network of weak scalar couplings as part of most common assignment protocols. In order to facilitate NMR structure determination, we developed a semi-automated strategy that utilizes non-uniform sampling (NUS) and multidimensional decomposition (MDD) for optimal data collection and processing of selected, high resolution multidimensional NMR experiments, combined this with an ABACUS protocol for sequential and side chain resonance assignments, and streamlined the procedure to execute structure and refinement calculations in CYANA and CNS, respectively. Two graphical user interfaces (GUIs) were developed to facilitate efficient analysis and compilation of the data and to guide automated structure determination. This integrated method was implemented and refined on over 30 high quality structures of proteins ranging from 5.5 to 16.5 kDa in size.
NMR data collection and processing; Chemical shift assignment; Protein structure determination and refinement; Structure validation
Protein Inter-atomic Distance Distributions (PIDD) is a dedicated database and structural bioinformatics system for distance-based protein modeling. The database was developed to host and analyze statistical data on protein inter-atomic distances, based on their distributions in databases of known protein structures such as the Protein Data Bank (PDB). PIDD is capable of generating, caching, and displaying the statistical distributions of distances of various types and ranges. The collected information can be used to extract geometric restraints or mean-force potentials for protein structure determination, including nuclear magnetic resonance structure determination and comparative model refinement. PIDD has a user-friendly web interface so that users can easily specify the distance types and ranges, and retrieve, visualize or download the distance distributions as desired. PIDD is freely accessible at
Most current crystallographic structure refinements augment the diffraction data with a priori information consisting of bond, angle, dihedral, planarity restraints and atomic repulsion based on the Pauli exclusion principle. Yet, electrostatics and van der Waals attraction are physical forces that provide additional a priori information. Here we assess the inclusion of electrostatics for the force field used for all-atom (including hydrogen) joint neutron/X-ray refinement. Two DNA and a protein crystal structure were refined against joint neutron/X-ray diffraction data sets using force fields without electrostatics or with electrostatics. Hydrogen bond orientation/geometry favors the inclusion of electrostatics. Refinement of Z-DNA with electrostatics leads to a hypothesis for the entropic stabilization of Z-DNA that may partly explain the thermodynamics of converting the B form of DNA to its Z form. Thus, inclusion of electrostatics assists joint neutron/X-ray refinements, especially for placing and orienting hydrogen atoms.
I-TASSER is an automated pipeline for protein tertiary structure prediction using multiple threading alignments and iterative structure assembly simulations. In CASP9 experiments, two new algorithms, QUARK and FG-MD, were added to the I-TASSER pipeline for improving the structural modeling accuracy. QUARK is a de novo structure prediction algorithm used for structure modeling of proteins that lack detectable template structures. For distantly homologous targets, QUARK models are found useful as a reference structure for selecting good threading alignments and guiding the I-TASSER structure assembly simulations. FG-MD is an atomic-level structural refinement program that uses structural fragments collected from the PDB structures to guide molecular dynamics simulation and improve the local structure of predicted model, including hydrogen-bonding networks, torsion angles and steric clashes. Despite considerable progress in both the template-based and template-free structure modeling, significant improvements on protein target classification, domain parsing, model selection, and ab initio folding of beta-proteins are still needed to further improve the I-TASSER pipeline.
protein structure prediction; threading; contact prediction; ab initio folding; CASP
When refining the fit of component atomic structures into electron microscopic reconstructions, use of a resolution-dependent atomic density function makes it possible to jointly optimize the atomic model and imaging parameters of the microscope. Atomic density is calculated by one-dimensional Fourier transform of atomic form factors convoluted with a microscope envelope correction and a low-pass filter, allowing refinement of imaging parameters such as resolution, by optimizing the agreement of calculated and experimental maps. A similar approach allows refinement of atomic displacement parameters, providing indications of molecular flexibility even at low resolution. A modest improvement in atomic coordinates is possible following optimization of these additional parameters. Methods have been implemented in a Python program that can be used in stand-alone mode for rigid-group refinement, or embedded in other optimizers for flexible refinement with stereochemical restraints. The approach is demonstrated with refinements of virus and chaperonin structures at resolutions of 9 through 4.5 Å, representing regimes where rigid-group and fully flexible parameterizations are appropriate. Through comparisons to known crystal structures, flexible fitting by RSRef is shown to be an improvement relative to other methods and to generate models with all-atom rms accuracies of 1.5–2.5 Å at resolutions of 4.5–6 Å.
Fitting; Optimization; Structure; Resolution; Restraint; B-factor; Flexibility
Differences and quotients can be defined using Friedel pairs of reflections and applied in refinement to enable absolute structure to be determined precisely even for light atom crystal structures.
Several methods for absolute structure refinement were tested using single-crystal X-ray diffraction data collected using Cu Kα radiation for 23 crystals with no element heavier than oxygen: conventional refinement using an inversion twin model, estimation using intensity quotients in SHELXL2012, estimation using Bayesian methods in PLATON, estimation using restraints consisting of numerical intensity differences in CRYSTALS and estimation using differences and quotients in TOPAS-Academic where both quantities were coded in terms of other structural parameters and implemented as restraints. The conventional refinement approach yielded accurate values of the Flack parameter, but with standard uncertainties ranging from 0.15 to 0.77. The other methods also yielded accurate values of the Flack parameter, but with much higher precision. Absolute structure was established in all cases, even for a hydrocarbon. The procedures in which restraints are coded explicitly in terms of other structural parameters enable the Flack parameter to correlate with these other parameters, so that it is determined along with those parameters during refinement.
intensity quotients; absolute structure refinement
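The relation between Friedel quotients and the Flack parameter x can be sketched as follows. For an inversion twin, I_obs(h) = (1 − x)·I_calc(h) + x·I_calc(−h), so the Friedel difference scales by (1 − 2x) while the Friedel sum is unchanged, giving Q_obs = (1 − 2x)·Q_calc with Q = (I⁺ − I⁻)/(I⁺ + I⁻). The unweighted least-squares estimate below is a simplified illustration under that model only; the programs named above use their own weighting and error treatment, and the function names are hypothetical.

```python
import numpy as np

def friedel_quotients(i_plus, i_minus):
    """Quotient (I+ - I-) / (I+ + I-) for each Friedel pair."""
    i_plus = np.asarray(i_plus, float)
    i_minus = np.asarray(i_minus, float)
    return (i_plus - i_minus) / (i_plus + i_minus)

def flack_from_quotients(q_obs, q_calc):
    """Unweighted least-squares fit of Q_obs = (1 - 2x) * Q_calc, returning x."""
    q_obs = np.asarray(q_obs, float)
    q_calc = np.asarray(q_calc, float)
    slope = np.dot(q_obs, q_calc) / np.dot(q_calc, q_calc)
    return 0.5 * (1.0 - slope)
```

A result near 0 indicates that the model has the correct absolute structure and a result near 1 indicates the inverted structure. Because the quotients of light-atom structures are small, the precision of x depends strongly on how the individual pairs are weighted.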