The CCP4 template-restraint library defines restraints for biopolymers, their modifications and ligands that are used in macromolecular structure refinement. JLigand is a graphical editor for generating descriptions of new ligands and covalent linkages.
Biological macromolecules are polymers and therefore the restraints for macromolecular refinement can be subdivided into two sets: restraints that are applied to atoms that all belong to the same monomer and restraints that are associated with the covalent bonds between monomers. The CCP4 template-restraint library contains three types of data entries defining template restraints: descriptions of monomers and their modifications, both used for intramonomer restraints, and descriptions of links for intermonomer restraints. The library provides generic descriptions of modifications and links for protein, DNA and RNA chains, and for some post-translational modifications including glycosylation. Structure-specific template restraints can be defined in a user’s additional restraint library. Here, JLigand is described: a new CCP4 graphical interface to LibCheck and REFMAC that has been developed to manage the user’s library and to generate new monomer entries, as well as new entries for links and associated modifications.
macromolecular refinement; restraint library; molecular graphics
The general principles behind the macromolecular crystal structure refinement program REFMAC5 are described.
This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low-resolution refinement tools such as secondary-structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, ‘jelly-body’ restraints and the use of novel long-range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high-resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.
A brief summary of the types of restraint defined in refinement dictionaries.
At the resolution available from most macromolecular crystals, the X-ray data alone are insufficient to lead to a chemically reasonable structure, so stereochemical restraints are essential. These usually restrain bond lengths, bond angles, planes and chiral volumes. The definition of these restraints and where the values come from are described. A dictionary entry contains information about the atom types, their connectivity and all the appropriate restraints. Torsion angles are not usually restrained, but they do have optimum values. In the special case of flexible five- and six-membered rings, including pentose and hexose sugars, the ring pucker is defined by combinations of torsion angles and the pucker affects the position of substituents.
stereochemistry; restraints; bond lengths; bond angles; protein structure; crystallographic refinement
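As a rough illustration of how two of the restraint types listed above can be evaluated, the following Python sketch computes a weighted bond-length residual and a signed chiral volume. The function names, the least-squares form of the residual and the example numbers are assumptions made for illustration; they are not taken from any particular dictionary or refinement program.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    """Scalar product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def bond_residual(p1, p2, ideal, sigma):
    """Squared deviation of a bond length from its target, in units of sigma."""
    d = math.dist(p1, p2)
    return ((d - ideal) / sigma) ** 2

def chiral_volume(centre, a1, a2, a3):
    """Signed triple product of the three bond vectors leaving the chiral centre.

    Swapping any two substituents flips the sign, which is what makes the
    quantity usable as a chirality restraint.
    """
    v1, v2, v3 = sub(a1, centre), sub(a2, centre), sub(a3, centre)
    return dot(v1, cross(v2, v3))

# Illustrative values only: a 1.50 A bond restrained to 1.53 A with sigma 0.02 A.
example_residual = bond_residual((0.0, 0.0, 0.0), (0.0, 0.0, 1.5), 1.53, 0.02)
```

A dictionary entry would supply the target value and its standard deviation for each such restraint; the sketch only shows how a model is scored against them.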
Low-resolution refinement tools implemented in REFMAC5 are described, including the use of external structural restraints, helical restraints and regularized anisotropic map sharpening.
Two aspects of low-resolution macromolecular crystal structure analysis are considered: (i) the use of reference structures and structural units for provision of structural prior information and (ii) map sharpening in the presence of noise and the effects of Fourier series termination. The generation of interatomic distance restraints by ProSMART and their subsequent application in REFMAC5 is described. It is shown that the use of such external structural information can enhance the reliability of derived atomic models and stabilize refinement. The problem of map sharpening is considered as an inverse deblurring problem and is solved using Tikhonov regularizers. It is demonstrated that this type of map sharpening can automatically produce a map with more structural features whilst maintaining connectivity. Tests show that both of these directions are promising, although more work needs to be performed in order to further exploit structural information and to address the problem of reliable electron-density calculation.
low-resolution refinement; REFMAC5
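To make the idea of sharpening as a regularized inverse deblurring problem more concrete, the sketch below applies the simplest (zeroth-order) Tikhonov filter to Fourier coefficients that are assumed to have been attenuated by an isotropic factor exp(-B s^2 / 4). The blur model, the identity regularizer and all names are assumptions for illustration and are not necessarily the regularizers used in REFMAC5.

```python
import math

def blur_factor(s_sq, b_blur):
    """Isotropic attenuation k(s) = exp(-B * s^2 / 4), with s = 1/d in 1/Angstrom."""
    return math.exp(-b_blur * s_sq / 4.0)

def tikhonov_gain(s_sq, b_blur, alpha):
    """Multiplier that minimizes |k*F - F_obs|^2 + alpha*|F|^2 for one coefficient.

    With alpha = 0 this is plain inverse filtering (1/k), which amplifies
    high-resolution noise without bound; alpha > 0 caps the gain at
    1 / (2 * sqrt(alpha)).
    """
    k = blur_factor(s_sq, b_blur)
    return k / (k * k + alpha)

def sharpen(coefficients, b_blur, alpha):
    """Apply the regularized gain to an iterable of (s_sq, F) pairs."""
    return [(s_sq, f * tikhonov_gain(s_sq, b_blur, alpha))
            for s_sq, f in coefficients]
```

The regularization parameter `alpha` controls the trade-off: small values recover more high-resolution detail, larger values suppress noise amplification.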
The automated building of a protein model into an electron density map remains a challenging problem. In the ARP/wARP approach, model building is facilitated by initially interpreting a density map with free atoms of unknown chemical identity; all structural information for such chemically unassigned atoms is discarded. Here, this is remedied by applying restraints between free atoms, and between free atoms and a partial protein model. These are based on geometric considerations of protein structure and tentative (conditional) assignments for the free atoms. Restraints are applied in the REFMAC5 refinement program and are generated on an ad hoc basis, allowing them to fluctuate from step to step. A large set of experimentally phased and molecular replacement structures showcases individual structures where automated building is improved drastically by the conditional restraints. The concept and implementation we present can also find application in restraining geometries, such as hydrogen bonds, in low-resolution refinement.
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered Rfree and a decreased gap between Rwork and Rfree.
macromolecular crystallography; low resolution; refinement; automation
We have developed the program PERMOL for semi-automated homology modeling of proteins. It is based on restrained molecular dynamics using a simulated annealing protocol in torsion angle space. The main restraints defining the optimal local geometry of the structure are weighted mean dihedral angles and their standard deviations, which are calculated with an algorithm described earlier by Döker et al. (1999, BBRC, 257, 348–350). The overall long-range contacts are established via a small number of distance restraints between atoms involved in hydrogen bonds and backbone atoms of conserved residues. Using the restraints generated by PERMOL, three-dimensional structures are obtained with standard molecular dynamics programs such as DYANA or CNS.
To test this modeling approach, it was used to predict the structure of the histidine-containing phosphocarrier protein HPr from E. coli and the structure of the human peroxisome proliferator activated receptor γ (Ppar γ). The divergence between the modeled HPr and the previously determined X-ray structure was comparable to the divergence between the X-ray structure and the published NMR structure. The modeled structure of Ppar γ was also very close to the previously solved X-ray structure, with an RMSD of 0.262 nm for the backbone atoms.
In summary, we present a new method for homology modeling capable of producing high-quality structure models. An advantage of the method is that it can be used in combination with incomplete NMR data to obtain reasonable structure models in accordance with the experimental data.
The majority of previously deposited X-ray structures can be improved by applying current refinement methods.
Structural biology, homology modelling and rational drug design require accurate three-dimensional macromolecular coordinates. However, the coordinates in the Protein Data Bank (PDB) have not all been obtained using the latest experimental and computational methods. In this study a method is presented for automated re-refinement of existing structure models in the PDB. A large-scale benchmark with 16 807 PDB entries showed that they can be improved in terms of fit to the deposited experimental X-ray data as well as in terms of geometric quality. The re-refinement protocol uses TLS models to describe concerted atom movement. The resulting structure models are made available through the PDB_REDO databank (http://www.cmbi.ru.nl/pdb_redo/). Grid computing techniques were used to overcome the computational requirements of this endeavour.
X-ray crystallography; refinement; structure validation; Protein Data Bank; grid computing
For B-DNA, the strong linear correlation observed by nuclear magnetic resonance (NMR) between the 31P chemical shifts (δP) and three recurrent internucleotide distances demonstrates the tight coupling between phosphate motions and helicoidal parameters. This allows δP to be translated into distance restraints that are directly exploitable in structural refinement. It even provides a new method for refining DNA oligomers with restraints exclusively inferred from δP. Combined with molecular dynamics in explicit solvent, these restraints lead to a structural and dynamical view of the DNA as detailed as that obtained with conventional and more extensive restraints. Tests with the Jun-Fos oligomer show that this δP-based strategy can provide a simple and straightforward method to capture DNA properties in solution, from routine NMR experiments on unlabeled samples.
Macromolecular structures are modeled by conformational optimization within experimental and knowledge-based restraints. Discrete restraint-based sampling generates high-quality structures within these restraints and facilitates further refinement in a continuous all-atom energy landscape. This approach has been used successfully for protein loop modeling, comparative modeling and electron density fitting in X-ray crystallography.
Here we present a software toolkit (Rappertk) which generalizes discrete restraint-based sampling for use in structural biology. Its modular design and multi-layered architecture enable Rappertk to sample conformations of any macromolecule at many levels of detail and within a variety of experimental restraints. Performance against a Cα-tracing benchmark shows that the efficiency has not suffered despite the overhead required by this flexibility. We demonstrate the toolkit's capabilities by building high-quality β-sheets and by introducing restraint-driven sampling. RNA sampling is demonstrated by rebuilding a protein-RNA interface. The ability to construct arbitrary ligands is used in sampling protein-ligand interfaces within electron density. Finally, secondary structure and shape information derived from EM are combined to generate multiple conformations of a protein consistent with the observed density.
Through its modular design and ease of use, Rappertk enables exploration of a wide variety of interesting avenues in structural biology. This toolkit, with illustrative examples, is freely available to academic users from .
A script was created to allow SHELXL to use the new CDL v.1.2 stereochemical library which defines the target values for main-chain bond lengths and angles as a function of the residue’s ϕ/ψ angles. Test refinements using this script show that the refinement behavior of structures at resolutions even better than 1 Å is substantially enhanced by the use of the new conformation-dependent ideal geometry paradigm.
To utilize a new conformation-dependent backbone-geometry library (CDL) in protein refinements at atomic resolution, a script was written that creates a restraint file for the SHELXL refinement program. It was found that the use of this library allows models to be created that have a substantially better fit to main-chain bond angles and lengths without degrading their fit to the X-ray data even at resolutions near 1 Å. For models at much higher resolution (∼0.7 Å), the refined model for parts adopting single well occupied positions is largely independent of the restraints used, but these structures still showed much smaller r.m.s.d. residuals when assessed with the CDL. Examination of the refinement tests across a wide resolution range from 2.4 to 0.65 Å revealed consistent behavior supporting the use of the CDL as a next-generation restraint library to improve refinement. CDL restraints can be generated using the service at http://pgd.science.oregonstate.edu/cdl_shelxl/.
stereochemical libraries; refinement; conformation-dependent library
X-ray crystallography typically uses a single set of coordinates and B-factors to describe macromolecular conformations. Refinement of multiple copies of the entire structure has been previously used in specific cases as an alternative means of representing structural flexibility. Here, we systematically validate this method using simulated diffraction data, and find ensemble refinement produces better representations of the distributions of atomic positions in the simulated structures than single conformer refinements. Comparison of principal components calculated from the refined ensembles and simulations shows that concerted motions are captured locally, but correlations dissipate over long distances. Ensemble refinement is also used on 50 experimental structures of varying resolution, and leads to decreases in R-free, implying that improvements in the representation of flexibility observed for the simulated structures may apply to real structures. These gains are essentially independent of resolution or data-to-parameter ratio, suggesting even structures at moderate resolution can benefit from ensemble refinement.
The structures of large macromolecular complexes in different functional states can be determined by cryo-electron microscopy, which yields electron density maps of low to intermediate resolutions. The maps can be combined with high-resolution atomic structures of components of the complex, to produce a model for the complex that is more accurate than the formal resolution of the map. To this end, methods have been developed to dock atomic models into density maps rigidly or flexibly, and to refine a docked model so as to optimize the fit of the atomic model into the map. We have developed a new refinement method called YUP.SCX. The electron density map is converted into a component of the potential energy function to which terms for stereochemical restraints and volume exclusion are added. The potential energy function is then minimized (using simulated annealing) to yield a stereochemically-restrained atomic structure that fits into the electron density map optimally. We used this procedure to construct an atomic model of the 70S ribosome in the pre-accommodation state. Although some atoms are displaced by as much as 33 Å, they divide themselves into nearly rigid fragments along natural boundaries with smooth transitions between the fragments.
Electron microscopy; simulated annealing; structural refinement
Acireductone dioxygenase (ARD) from Klebsiella ATCC 8724 is a metalloenzyme that is capable of catalyzing different reactions with the same substrates (acireductone and O2) depending upon the metal bound in the active site. A model for the solution structure of the paramagnetic Ni2+-containing ARD has been refined using residual dipolar couplings (RDCs) measured in two media. Additional dihedral restraints based on chemical shift (TALOS) were included in the refinement, and backbone structure in the vicinity of the active site was modeled from a crystallographic structure of the mouse homolog of ARD. The incorporation of residual dipolar couplings into the structural refinement alters the relative orientations of several structural features significantly, and improves local secondary structure determination. Comparisons between the solution structures obtained with and without RDCs are made, and structural similarities and differences between mouse and bacterial enzymes are described. Finally, the biological significance of these differences is considered.
metalloenzyme; magnetic alignment; nickel enzyme; paramagnetic; homology modeling
Protein Inter-atomic Distance Distributions (PIDD) is a dedicated database and structural bioinformatics system for distance-based protein modeling. The database was developed to host and analyze statistical data on protein inter-atomic distances, based on their distributions in databases of known protein structures such as the Protein Data Bank (PDB). PIDD is capable of generating, caching, and displaying the statistical distributions of distances of various types and ranges. The collected information can be used to extract geometric restraints or mean-force potentials for protein structure determination, including nuclear magnetic resonance structure determination and comparative model refinement. PIDD provides a user-friendly web interface so that users can easily specify the distance types and ranges, and retrieve, visualize or download the distributions of the distances as they desire. PIDD is freely accessible at
The quality of model structures generated by contemporary protein structure prediction methods strongly depends on the degree of similarity between the target and available template structures. Therefore, the importance of improving template-based model structures beyond the accuracy available from template information has been emphasized in the structure prediction community. The GalaxyRefine web server, freely available at http://galaxy.seoklab.org/refine, is based on a refinement method that has been successfully tested in CASP10. The method first rebuilds side chains and performs side-chain repacking and subsequent overall structure relaxation by molecular dynamics simulation. According to the CASP10 assessment, this method showed the best performance in improving the local structure quality. The method can improve both global and local structure quality on average, when used for refining the models generated by state-of-the-art protein structure prediction servers.
Restrained molecular dynamics simulations are a robust, though perhaps underused, tool for the end-stage refinement of biomolecular structures. We demonstrate their utility—using modern simulation protocols, optimized force fields, and inclusion of explicit solvent and mobile counterions—by re-investigating the solution structures of two RNA hairpins that had previously been refined using conventional techniques. The structures, both domain 5 group II intron ribozymes from yeast ai5γ and Pylaiella littoralis, share a nearly identical primary sequence yet the published 3D structures appear quite different. Relatively long restrained MD simulations using the original NMR restraint data identified the presence of a small set of violated distance restraints in one structure and a possibly incorrect trapped bulge nucleotide conformation in the other structure. The removal of problematic distance restraints and the addition of a heating step yielded representative ensembles with very similar 3D structures and much lower pairwise RMSD values. Analysis of ion density during the restrained simulations helped to explain chemical shift perturbation data published previously. These results suggest that restrained MD simulations, with proper caution, can be used to “update” older structures or aid in the refinement of new structures that lack sufficient experimental data to produce a high quality result. Notable cautions include the need for sufficient sampling, awareness of potential force field bias (such as small angle deviations with the current AMBER force fields), and a proper balance between the various restraint weights.
RNA structure; Molecular dynamics; Residual dipolar coupling restraints; Bulge structure; Force fields; Ion binding
Differences and quotients can be defined using Friedel pairs of reflections and applied in refinement to enable absolute structure to be determined precisely even for light atom crystal structures.
Several methods for absolute structure refinement were tested using single-crystal X-ray diffraction data collected using Cu Kα radiation for 23 crystals with no element heavier than oxygen: conventional refinement using an inversion twin model, estimation using intensity quotients in SHELXL2012, estimation using Bayesian methods in PLATON, estimation using restraints consisting of numerical intensity differences in CRYSTALS and estimation using differences and quotients in TOPAS-Academic where both quantities were coded in terms of other structural parameters and implemented as restraints. The conventional refinement approach yielded accurate values of the Flack parameter, but with standard uncertainties ranging from 0.15 to 0.77. The other methods also yielded accurate values of the Flack parameter, but with much higher precision. Absolute structure was established in all cases, even for a hydrocarbon. The procedures in which restraints are coded explicitly in terms of other structural parameters enable the Flack parameter to correlate with these other parameters, so that it is determined along with those parameters during refinement.
intensity quotients; absolute structure refinement
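The relationship between Friedel-pair quotients and the Flack parameter can be sketched as follows. The sketch assumes the usual inversion-twin definition I_obs(h) = (1 - x) I_calc(h) + x I_calc(-h), under which the quotient Q = [I(h) - I(-h)] / [I(h) + I(-h)] satisfies Q_obs = (1 - 2x) Q_calc. The unweighted least-squares fit and all function names are illustrative assumptions; this is not the procedure implemented in any of the programs named above.

```python
def quotient(i_plus, i_minus):
    """Friedel quotient Q = (I(h) - I(-h)) / (I(h) + I(-h))."""
    return (i_plus - i_minus) / (i_plus + i_minus)

def twin_mix(i_plus, i_minus, x):
    """Intensities of a Friedel pair for an inversion twin with Flack parameter x."""
    return ((1 - x) * i_plus + x * i_minus,
            (1 - x) * i_minus + x * i_plus)

def flack_from_quotients(obs_pairs, calc_pairs):
    """Unweighted least-squares estimate of x from Q_obs = (1 - 2x) * Q_calc."""
    q_obs = [quotient(p, m) for p, m in obs_pairs]
    q_calc = [quotient(p, m) for p, m in calc_pairs]
    slope = (sum(o * c for o, c in zip(q_obs, q_calc))
             / sum(c * c for c in q_calc))
    return (1 - slope) / 2

# Made-up calculated intensities for three Friedel pairs, mixed with x = 0.25.
calc = [(100.0, 90.0), (50.0, 55.0), (200.0, 190.0)]
obs = [twin_mix(p, m, 0.25) for p, m in calc]
x_est = flack_from_quotients(obs, calc)
```

Because the sum I(h) + I(-h) is unchanged by inversion twinning, the quotient isolates the small anomalous difference from the much larger average intensity, which is the reason such quantities can be more precise than a conventional twin refinement for light-atom structures.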
An evaluation of validation and real-space intervention possibilities for improving existing automated (re-)refinement methods.
The deposition of X-ray data along with the customary structural models defining PDB entries makes it possible to apply large-scale re-refinement protocols to these entries, thus giving users the benefit of improvements in X-ray methods that have occurred since the structure was deposited. Automated gradient refinement is an effective method to achieve this goal, but real-space intervention is most often required in order to adequately address problems detected by structure-validation software. In order to improve the existing protocol, automated re-refinement was combined with structure validation and difference-density peak analysis to produce a catalogue of problems in PDB entries that are amenable to automatic correction. It is shown that re-refinement can be effective in producing improvements, which are often associated with the systematic use of the TLS parameterization of B factors, even for relatively new and high-resolution PDB entries, while the accompanying manual or semi-manual map analysis and fitting steps show good prospects for eventual automation. It is proposed that the potential for simultaneous improvements in methods and in re-refinement results be further encouraged by broadening the scope of depositions to include refinement metadata and ultimately primary rather than reduced X-ray data.
Torsion-angle sampling, as implemented in the Protein Local Optimization Program (PLOP), is used to generate multiple structurally variable single-conformer models which are in good agreement with X-ray data. An ensemble-refinement approach to differentiate between positional uncertainty and conformational heterogeneity is proposed.
Modeling structural variability is critical for understanding protein function and for modeling reliable targets for in silico docking experiments. Because of the time-intensive nature of manual X-ray crystallographic refinement, automated refinement methods that thoroughly explore conformational space are essential for the systematic construction of structurally variable models. Using five proteins spanning resolutions of 1.0–2.8 Å, it is demonstrated how torsion-angle sampling of backbone and side-chain libraries with filtering against both the chemical energy, using a modern effective potential, and the electron density, coupled with minimization of a reciprocal-space X-ray target function, can generate multiple structurally variable models which fit the X-ray data well. Torsion-angle sampling as implemented in the Protein Local Optimization Program (PLOP) has been used in this work. Models with the lowest Rfree values are obtained when electrostatic and implicit solvation terms are included in the effective potential. HIV-1 protease, calmodulin and SUMO-conjugating enzyme illustrate how variability in the ensemble of structures captures structural variability that is observed across multiple crystal structures and is linked to functional flexibility at hinge regions and binding interfaces. An ensemble-refinement procedure is proposed to differentiate between variability that is a consequence of physical conformational heterogeneity and that which reflects uncertainty in the atomic coordinates.
automated refinement; multiple models; conformational heterogeneity; torsion-angle sampling
The biopolymer chain elasticity (BCE) approach and the new molecular modelling methodology presented previously are used to predict the three-dimensional backbones of DNA and RNA hairpin loops. The structures of eight remarkably stable DNA or RNA hairpin molecules closed by a mispair, recently determined in solution by NMR and deposited in the PDB, are shown to verify the predicted trajectories by an analysis automated for large numbers of PDB conformations. They encompass: one DNA tetraloop, -GTTA-; three DNA triloops, -AAA- or -GCA-; and four RNA tetraloops, -UUCG-. Folding generates no distortions and bond lengths and bond angles of main atoms of the sugar–phosphate backbone are well restored upon energy refinement. Three different methods (superpositions, distance of main chain atoms to the elastic line and RMSd) are used to show a very good agreement between the trajectories of sugar–phosphate backbones and between entire molecules of theoretical models and of PDB conformations. The geometry of end conditions imposed by the stem is sufficient to dictate the different characteristic DNA or RNA folding shapes. The reduced angular space, consisting of the new parameter, angle Ω, together with the χ angle offers a simple, coherent and quantitative description of hairpin loops.
We have obtained the 13Cα chemical shift tensors for each amino acid in the protein GB1. We then developed a CST force field and incorporated this into the Xplor-NIH structure determination program. GB1 structures obtained by using CST restraints had improved precision over those obtained in the absence of CST restraints, and were also more accurate. When combined with isotropic chemical shifts, distance and vector angle restraints, the root-mean squared error with respect to existing x-ray structures was better than ~1.0 Å. These results are of broad general interest since they show that chemical shift tensors can be used in protein structure refinement, improving both structural accuracy and precision, opening up the way to accurate de novo structure determination.
Ribonucleic acid structure determination by NMR spectroscopy relies primarily on local structural restraints provided by 1H-1H NOEs and J-couplings. When employed loosely, these restraints are broadly compatible with A- and B-like helical geometries and give rise to calculated structures that are highly sensitive to the force fields employed during refinement. A survey of recently reported NMR structures reveals significant variations in helical parameters, particularly the major groove width. Although helical parameters observed in high-resolution X-ray crystal structures of isolated A-form RNA helices are sensitive to crystal packing effects, variations among the published X-ray structures are significantly smaller than those observed in NMR structures. Here we show that restraints derived from aromatic 1H-13C residual dipolar couplings (RDCs) and residual chemical shift anisotropies (RCSAs) can overcome NMR restraint and force field deficiencies and afford structures with helical properties similar to those observed in high-resolution X-ray structures.
NMR; RNA Structure Determination; Isotope Labeling; Residual Dipolar Coupling; Residual Chemical Shift Anisotropy
An extension is proposed to the rigid-bond description of atomic thermal motion in crystals.
The rigid-bond model [Hirshfeld (1976). Acta Cryst. A32, 239–244] states that the mean-square displacements of two atoms are equal in the direction of the bond joining them. This criterion is widely used for verification (as intended by Hirshfeld) and also as a restraint in structure refinement as suggested by Rollett [Crystallographic Computing (1970), edited by F. R. Ahmed et al., pp. 167–181. Copenhagen: Munksgaard]. By reformulating this condition, so that the relative motion of the two atoms is required to be perpendicular to the bond, the number of restraints that can be applied per anisotropic atom is increased from about one to about three. Application of this condition to 1,3-distances in addition to the 1,2-distances means that on average just over six restraints can be applied to the six anisotropic displacement parameters of each atom. This concept is tested against very high resolution data of a small peptide and employed as a restraint for protein refinement at more modest resolution (e.g. 1.7 Å).
rigid-bond test; refinement restraints; anisotropic displacement parameters
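The original rigid-bond criterion described above can be checked numerically with a short Python sketch. It assumes that each atom's anisotropic displacement parameters are given as a symmetric 3×3 tensor U in Cartesian coordinates (Å^2); the function names and example tensors are hypothetical. It covers only the classic Hirshfeld test along the bond, not the extended perpendicular-motion restraint proposed in the paper.

```python
import math

def msd_along(u, n):
    """Mean-square displacement n^T U n of an atom along the unit vector n."""
    return sum(n[i] * u[i][j] * n[j] for i in range(3) for j in range(3))

def rigid_bond_delta(pos_a, u_a, pos_b, u_b):
    """Difference of the two atoms' mean-square displacements along the A-B bond.

    For an ideally rigid bond this difference is zero; a restraint would
    penalize its deviation from zero.
    """
    d = [b - a for a, b in zip(pos_a, pos_b)]
    length = math.sqrt(sum(x * x for x in d))
    n = [x / length for x in d]
    return msd_along(u_a, n) - msd_along(u_b, n)

# Two atoms bonded along x; the tensors differ only perpendicular to the bond.
u_a = [[0.020, 0.0, 0.0], [0.0, 0.030, 0.0], [0.0, 0.0, 0.040]]
u_b = [[0.020, 0.0, 0.0], [0.0, 0.050, 0.0], [0.0, 0.0, 0.010]]
delta = rigid_bond_delta((0.0, 0.0, 0.0), u_a, (1.5, 0.0, 0.0), u_b)
```

In this example the bond passes the test even though the two tensors differ strongly in the perpendicular directions, which illustrates why the classic criterion supplies only about one restraint per atom.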
A robust method for determining bulk-solvent and anisotropic scaling parameters for macromolecular refinement is described. A maximum-likelihood target function for determination of flat bulk-solvent model parameters and overall anisotropic scale factor is also proposed.
A reliable method for the determination of bulk-solvent model parameters and an overall anisotropic scale factor is of increasing importance as structure determination becomes more automated. Current protocols require the manual inspection of refinement results in order to detect errors in the calculation of these parameters. Here, a robust method for determining bulk-solvent and anisotropic scaling parameters in macromolecular refinement is described. The implementation of a maximum-likelihood target function for determining the same parameters is also discussed. The formulas and corresponding derivatives of the likelihood function with respect to the solvent parameters and the components of anisotropic scale matrix are presented. These algorithms are implemented in the CCTBX bulk-solvent correction and scaling module.
bulk-solvent correction; anisotropic scaling
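For orientation, the flat bulk-solvent model with overall anisotropic scaling is commonly written as F_model = k_overall · exp(-¼ sᵀ B_cart s) · [F_calc + k_sol · exp(-B_sol s²/4) · F_mask]. The sketch below evaluates this expression for a single reflection. The parameterization of the anisotropic term by a Cartesian B-like matrix, the argument names and the example numbers are assumptions for illustration; this is not the CCTBX API.

```python
import math

def total_structure_factor(f_calc, f_mask, s, k_overall, b_cart, k_sol, b_sol):
    """Scaled model structure factor including a flat bulk-solvent contribution.

    f_calc, f_mask : complex structure factors of the atomic model and solvent mask
    s              : reciprocal-space vector (1/Angstrom), |s| = 1/d
    b_cart         : symmetric 3x3 anisotropic scale matrix (Angstrom^2)
    k_sol, b_sol   : flat-solvent scale and smearing B factor
    """
    s_sq = sum(x * x for x in s)
    quad = sum(s[i] * b_cart[i][j] * s[j] for i in range(3) for j in range(3))
    aniso_scale = math.exp(-0.25 * quad)
    solvent_scale = k_sol * math.exp(-b_sol * s_sq / 4.0)
    return k_overall * aniso_scale * (f_calc + solvent_scale * f_mask)

# Illustrative call: at s = 0 both exponential factors equal one.
zero_b = [[0.0] * 3 for _ in range(3)]
f000 = total_structure_factor(10 + 0j, -4 + 0j, (0.0, 0.0, 0.0),
                              2.0, zero_b, 0.35, 46.0)
```

The solvent term decays quickly with resolution, so it mainly affects low-resolution reflections; determining k_sol, B_sol and the anisotropic scale matrix robustly is the problem the abstract addresses.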