phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. It has several automation features and is also highly flexible. Several hundred parameters enable extensive customizations for complex use cases. Multiple user-defined refinement strategies can be applied to specific parts of the model in a single refinement run. An intuitive graphical user interface is available to guide novice users and to assist advanced users in managing refinement projects. X-ray or neutron diffraction data can be used separately or jointly in refinement. phenix.refine is tightly integrated into the PHENIX suite, where it serves as a critical component in automated model building, final structure refinement, structure validation and deposition to the wwPDB. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
structure refinement; PHENIX; joint X-ray/neutron refinement; maximum likelihood; TLS; simulated annealing; subatomic resolution; real-space refinement; twinning; NCS
The general principles behind the macromolecular crystal structure refinement program REFMAC5 are described.
This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low-resolution refinement tools such as secondary-structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, ‘jelly-body’ restraints and the use of novel long-range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high-resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.
The highly automated PHENIX AutoBuild wizard is described. The procedure can be applied equally well to phases derived from isomorphous/anomalous and molecular-replacement methods.
The PHENIX AutoBuild wizard is a highly automated tool for iterative model building, structure refinement and density modification using RESOLVE model building, RESOLVE statistical density modification and phenix.refine structure refinement. Recent advances in the AutoBuild wizard and phenix.refine include automated detection and application of NCS from models as they are built, extensive model-completion algorithms and automated solvent-molecule picking. Model-completion algorithms in the AutoBuild wizard include loop building, crossovers between chains in different models of a structure and side-chain optimization. The AutoBuild wizard has been applied to a set of 48 structures at resolutions ranging from 1.1 to 3.2 Å, resulting in a mean R factor of 0.24 and a mean free R factor of 0.29. The R factor of the final model is dependent on the quality of the starting electron density and is relatively independent of resolution.
model building; model completion; macromolecular models; Protein Data Bank; structure refinement; PHENIX
Three-dimensional RNA models fitted into crystallographic density maps exhibit pervasive conformational ambiguities, geometric errors, and steric clashes. To address these problems, we present Enumerative Real-space Refinement ASsisted by Electron density under Rosetta (ERRASER), coupled to PHENIX (Python-based Hierarchical Environment for Integrated Xtallography) diffraction-based refinement. On 24 datasets, ERRASER automatically corrects the majority of MolProbity-assessed errors, improves average Rfree factor, resolves functionally important discrepancies in non-canonical structure, and refines low-resolution models to better match higher resolution models.
Low-resolution refinement tools implemented in REFMAC5 are described, including the use of external structural restraints, helical restraints and regularized anisotropic map sharpening.
Two aspects of low-resolution macromolecular crystal structure analysis are considered: (i) the use of reference structures and structural units for provision of structural prior information and (ii) map sharpening in the presence of noise and the effects of Fourier series termination. The generation of interatomic distance restraints by ProSMART and their subsequent application in REFMAC5 is described. It is shown that the use of such external structural information can enhance the reliability of derived atomic models and stabilize refinement. The problem of map sharpening is considered as an inverse deblurring problem and is solved using Tikhonov regularizers. It is demonstrated that this type of map sharpening can automatically produce a map with more structural features whilst maintaining connectivity. Tests show that both of these directions are promising, although more work needs to be performed in order to further exploit structural information and to address the problem of reliable electron-density calculation.
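The deblurring view of map sharpening can be illustrated with a minimal sketch. It assumes an isotropic blur of the form exp(-B s²/4) acting on each structure-factor amplitude and the simplest (zeroth-order) Tikhonov regularizer; the anisotropic treatment and the particular regularizers used in REFMAC5 are not reproduced here, and the function names and the parameter `alpha` are hypothetical.

```python
import math


def blur_factor(b_factor: float, s: float) -> float:
    """Attenuation exp(-B s^2 / 4) of an amplitude at reciprocal spacing s = 1/d."""
    return math.exp(-b_factor * s * s / 4.0)


def tikhonov_gain(s: float, b_factor: float, alpha: float) -> float:
    """Regularized inverse of the blur: k / (k^2 + alpha).

    With alpha = 0 this is the naive inverse 1/k, which amplifies noise
    without bound at high resolution; alpha > 0 caps the gain.
    """
    k = blur_factor(b_factor, s)
    return k / (k * k + alpha)


def tikhonov_sharpen(f_obs: float, s: float, b_factor: float, alpha: float) -> float:
    """Sharpen one blurred amplitude with the regularized inverse filter."""
    return tikhonov_gain(s, b_factor, alpha) * f_obs


if __name__ == "__main__":
    # Illustrative values only: B = 50 A^2, alpha = 0.05.
    for d in (8.0, 4.0, 2.0, 1.0):
        s = 1.0 / d
        naive = 1.0 / blur_factor(50.0, s)
        regularized = tikhonov_gain(s, 50.0, 0.05)
        print(f"d = {d:4.1f} A  naive gain = {naive:12.2f}  regularized gain = {regularized:6.3f}")
```

The regularized gain never exceeds 1/(2√alpha), which is why this kind of sharpening can enhance features without letting high-resolution noise dominate the map.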
low-resolution refinement; REFMAC5
The deformable elastic network (DEN) method for reciprocal-space crystallographic refinement improves crystal structures, especially at resolutions lower than 3.5 Å. The DEN web service presented here intends to provide structural biologists with access to resources for running computationally intensive DEN refinements.
Deformable elastic network (DEN) restraints have proved to be a powerful tool for refining structures from low-resolution X-ray crystallographic data sets. Unfortunately, optimal refinement using DEN restraints requires extensive calculations and is often hindered by a lack of access to sufficient computational resources. The DEN web service presented here intends to provide structural biologists with access to resources for running computationally intensive DEN refinements in parallel on the Open Science Grid, the US cyberinfrastructure. Access to the grid is provided through a simple and intuitive web interface integrated into the SBGrid Science Portal. Using this portal, refinements combined with full parameter optimization that would take many thousands of hours on standard computational resources can now be completed in several hours. An example of the successful application of DEN restraints to the human Notch1 transcriptional complex using the grid resource, and summaries of all submitted refinements, are presented as justification.
deformable elastic network restraints; low-resolution refinement; DEN refinement
Systematic investigation of a large number of trial rigid-body refinements leads to an optimized multiple-zone protocol with a larger convergence radius.
Rigid-body refinement is the constrained coordinate refinement of one or more groups of atoms that each move (rotate and translate) as a single body. The goal of this work was to establish an automatic procedure for rigid-body refinement which implements a practical compromise between runtime requirements and convergence radius. This has been achieved by analysis of a large number of trial refinements for 12 classes of random rigid-body displacements (that differ in magnitude of introduced errors), using both least-squares and maximum-likelihood target functions. The results of these tests led to a multiple-zone protocol. The final parameterization of this protocol was optimized empirically on the basis of a second large set of test refinements. This multiple-zone protocol is implemented as part of the phenix.refine program.
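The definition of a rigid-body move given above can be sketched as follows. This is only an illustration of the parameterization (three rotational and three translational parameters per group), not the multiple-zone protocol itself; the helper names are hypothetical and the rotation is assumed to act about the centroid of the group.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Matrix = List[List[float]]


def rotation_matrix(axis: Vec, angle_deg: float) -> Matrix:
    """Rotation matrix for a rotation by angle_deg about axis (Rodrigues formula)."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    t = 1.0 - c
    return [
        [t * x * x + c, t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, t * y * y + c, t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
    ]


def centroid(coords: Sequence[Vec]) -> Vec:
    """Unweighted centre of a group of atoms."""
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def move_rigid_group(coords: Sequence[Vec], rot: Matrix, shift: Vec) -> List[Vec]:
    """Rotate a group about its centroid, then translate it as a single body."""
    cx, cy, cz = centroid(coords)
    moved = []
    for x, y, z in coords:
        dx, dy, dz = x - cx, y - cy, z - cz
        rx = rot[0][0] * dx + rot[0][1] * dy + rot[0][2] * dz
        ry = rot[1][0] * dx + rot[1][1] * dy + rot[1][2] * dz
        rz = rot[2][0] * dx + rot[2][1] * dy + rot[2][2] * dz
        moved.append((rx + cx + shift[0], ry + cy + shift[1], rz + cz + shift[2]))
    return moved
```

Because every atom in the group receives the same rotation and translation, all interatomic distances within the group are preserved, which is what makes the refinement "constrained".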
rigid-body refinement; multiple-zone protocols
A brief summary of the types of restraint defined in refinement dictionaries.
At the resolution available from most macromolecular crystals, the X-ray data alone are insufficient to lead to a chemically reasonable structure, so stereochemical restraints are essential. These usually restrain bond lengths, bond angles, planes and chiral volumes. The definition of these restraints and where the values come from are described. A dictionary entry contains information about the atom types, their connectivity and all the appropriate restraints. Torsion angles are not usually restrained, but they do have optimum values. In the special case of flexible five- and six-membered rings, including pentose and hexose sugars, the ring pucker is defined by combinations of torsion angles and the pucker affects the position of substituents.
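Two of the restraint types listed above can be sketched in a few lines. The sketch assumes the common least-squares form ((model - ideal)/sigma)² for a bond-length restraint and the signed triple-product definition of a chiral volume; the function names are hypothetical and any numerical values passed in are illustrative rather than taken from a dictionary.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def distance(a: Vec, b: Vec) -> float:
    """Euclidean distance between two atoms."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def bond_restraint(d_model: float, d_ideal: float, sigma: float) -> float:
    """Least-squares penalty for a bond length deviating from its ideal value."""
    return ((d_model - d_ideal) / sigma) ** 2


def chiral_volume(center: Vec, a: Vec, b: Vec, c: Vec) -> float:
    """Signed volume (a - center) . ((b - center) x (c - center)).

    The sign flips when the handedness of the three substituents is
    inverted, which is what a chiral-volume restraint monitors.
    """
    u = tuple(p - q for p, q in zip(a, center))
    v = tuple(p - q for p, q in zip(b, center))
    w = tuple(p - q for p, q in zip(c, center))
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]
```

Swapping any two substituents in the call to `chiral_volume` changes the sign of the result, so a restraint on the signed value penalizes an inverted centre even when all bond lengths and angles look ideal.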
stereochemistry; restraints; bond lengths; bond angles; protein structure; crystallographic refinement
Local structural similarity restraints (LSSR) provide a novel method for exploiting NCS or structural similarity to an external target structure. Two examples are given where BUSTER re-refinement of PDB entries with LSSR produces marked improvements, enabling further structural features to be modelled.
Maximum-likelihood X-ray macromolecular structure refinement in BUSTER has been extended with restraints facilitating the exploitation of structural similarity. The similarity can be between two or more chains within the structure being refined, thus favouring NCS, or to a distinct ‘target’ structure that remains fixed during refinement. The local structural similarity restraints (LSSR) approach considers all distances less than 5.5 Å between pairs of atoms in the chain to be restrained. For each, the difference from the distance between the corresponding atoms in the related chain is found. LSSR applies a restraint penalty on each difference. A functional form that reaches a plateau for large differences is used to avoid the restraints distorting parts of the structure that are not similar. Because LSSR are local, there is no need to separate out domains. Some restraint pruning is still necessary, but this has been automated. LSSR have been available to academic users of BUSTER since 2009 with the easy-to-use -autoncs and -target target.pdb options. The use of LSSR is illustrated in the re-refinement of PDB entries 5rnt, where -target enables the correct ligand-binding structure to be found, and 1osg, where -autoncs contributes to the location of an additional copy of the cyclic peptide ligand.
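The LSSR idea described above can be sketched as follows. The 5.5 Å cutoff comes from the text; the plateau function used here (a Geman-McClure-style form δ²/(δ² + c²)) is an assumption chosen only because it levels off for large differences, and is not claimed to be the functional form used in BUSTER. Atom correspondence is assumed to be by list index, and all names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def distance(a: Vec, b: Vec) -> float:
    """Euclidean distance between two atoms."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def plateau_penalty(delta: float, scale: float = 1.0) -> float:
    """Assumed robust penalty: quadratic near zero, approaching 1 for large |delta|."""
    return delta * delta / (delta * delta + scale * scale)


def lssr_penalty(
    chain: Sequence[Vec],
    related: Sequence[Vec],
    cutoff: float = 5.5,
    scale: float = 1.0,
) -> float:
    """Sum of penalties on differences between corresponding short distances.

    chain   : atoms of the chain being restrained
    related : corresponding atoms of the NCS-related chain or fixed target
    """
    total = 0.0
    for i in range(len(chain)):
        for j in range(i + 1, len(chain)):
            d = distance(chain[i], chain[j])
            if d < cutoff:  # only local atom pairs are restrained
                delta = d - distance(related[i], related[j])
                total += plateau_penalty(delta, scale)
    return total
```

Because only interatomic distances enter the penalty, no superposition of the two chains is needed, and the plateau means that a region which genuinely differs contributes a bounded amount rather than dragging the structure towards the reference.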
BUSTER; NCS restraints; target-structure restraints; local structural similarity restraints
Macromolecular structures are modeled by conformational optimization within experimental and knowledge-based restraints. Discrete restraint-based sampling generates high-quality structures within these restraints and facilitates further refinement in a continuous all-atom energy landscape. This approach has been used successfully for protein loop modeling, comparative modeling and electron density fitting in X-ray crystallography.
Here we present a software toolkit (Rappertk) which generalizes discrete restraint-based sampling for use in structural biology. Modular design and multi-layered architecture enables Rappertk to sample conformations of any macromolecule at many levels of detail and within a variety of experimental restraints. Performance against a Cα-tracing benchmark shows that the efficiency has not suffered despite the overhead required by this flexibility. We demonstrate the toolkit's capabilities by building high-quality β-sheets and by introducing restraint-driven sampling. RNA sampling is demonstrated by rebuilding a protein-RNA interface. Ability to construct arbitrary ligands is used in sampling protein-ligand interfaces within electron density. Finally, secondary structure and shape information derived from EM are combined to generate multiple conformations of a protein consistent with the observed density.
Through its modular design and ease of use, Rappertk enables exploration of a wide variety of interesting avenues in structural biology. This toolkit, with illustrative examples, is freely available to academic users from .
The implementation of crystallographic structure-refinement procedures that include both X-ray and neutron data (separately or jointly) in the PHENIX system is described.
Approximately 85% of the structures deposited in the Protein Data Bank have been solved using X-ray crystallography, making it the leading method for three-dimensional structure determination of macromolecules. One of the limitations of the method is that the typical data quality (resolution) does not allow the direct determination of H-atom positions. Most hydrogen positions can be inferred from the positions of other atoms and therefore can be readily included in the structure model as a priori knowledge. However, this may not be the case in biologically active sites of macromolecules, where the presence and position of hydrogen are crucial to the enzymatic mechanism. This makes the application of neutron crystallography in biology particularly important, as H atoms can be clearly located in experimental neutron scattering density maps. Without exception, when a neutron structure is determined the corresponding X-ray structure is also known, making it possible to derive the complete structure using both data sets. Here, the implementation of crystallographic structure-refinement procedures that include both X-ray and neutron data (separately or jointly) in the PHENIX system is described.
structure refinement; neutrons; joint X-ray and neutron refinement; PHENIX
The combination of algorithms from the structure-modeling field with those of crystallographic structure determination can broaden the range of templates that are useful for structure determination by the method of molecular replacement. Automated tools in phenix.mr_rosetta simplify the application of these combined approaches by integrating Phenix crystallographic algorithms and Rosetta structure-modeling algorithms and by systematically generating and evaluating models with a combination of these methods. The phenix.mr_rosetta algorithms can be used to automatically determine challenging structures. The approaches used in phenix.mr_rosetta are described along with examples that show roles that structure-modeling can play in molecular replacement.
Molecular replacement; Automation; Macromolecular crystallography; Rosetta; Phenix
The structures of large macromolecular complexes in different functional states can be determined by cryo-electron microscopy, which yields electron density maps of low to intermediate resolutions. The maps can be combined with high-resolution atomic structures of components of the complex, to produce a model for the complex that is more accurate than the formal resolution of the map. To this end, methods have been developed to dock atomic models into density maps rigidly or flexibly, and to refine a docked model so as to optimize the fit of the atomic model into the map. We have developed a new refinement method called YUP.SCX. The electron density map is converted into a component of the potential energy function to which terms for stereochemical restraints and volume exclusion are added. The potential energy function is then minimized (using simulated annealing) to yield a stereochemically-restrained atomic structure that fits into the electron density map optimally. We used this procedure to construct an atomic model of the 70S ribosome in the pre-accommodation state. Although some atoms are displaced by as much as 33 Å, they divide themselves into nearly rigid fragments along natural boundaries with smooth transitions between the fragments.
Electron microscopy; simulated annealing; structural refinement
Acireductone dioxygenase (ARD) from Klebsiella ATCC 8724 is a metalloenzyme that is capable of catalyzing different reactions with the same substrates (acireductone and O2) depending upon the metal bound in the active site. A model for the solution structure of the paramagnetic Ni2+-containing ARD has been refined using residual dipolar couplings (RDCs) measured in two media. Additional dihedral restraints based on chemical shift (TALOS) were included in the refinement, and backbone structure in the vicinity of the active site was modeled from a crystallographic structure of the mouse homolog of ARD. The incorporation of residual dipolar couplings into the structural refinement alters the relative orientations of several structural features significantly, and improves local secondary structure determination. Comparisons between the solution structures obtained with and without RDCs are made, and structural similarities and differences between mouse and bacterial enzymes are described. Finally, the biological significance of these differences is considered.
metalloenzyme; magnetic alignment; nickel enzyme; paramagnetic; homology modeling
The CCP4 template-restraint library defines restraints for biopolymers, their modifications and ligands that are used in macromolecular structure refinement. JLigand is a graphical editor for generating descriptions of new ligands and covalent linkages.
Biological macromolecules are polymers and therefore the restraints for macromolecular refinement can be subdivided into two sets: restraints that are applied to atoms that all belong to the same monomer and restraints that are associated with the covalent bonds between monomers. The CCP4 template-restraint library contains three types of data entries defining template restraints: descriptions of monomers and their modifications, both used for intramonomer restraints, and descriptions of links for intermonomer restraints. The library provides generic descriptions of modifications and links for protein, DNA and RNA chains, and for some post-translational modifications including glycosylation. Structure-specific template restraints can be defined in a user’s additional restraint library. Here, JLigand, a new CCP4 graphical interface to LibCheck and REFMAC that has been developed to manage the user’s library and generate new monomer entries is described, as well as new entries for links and associated modifications.
macromolecular refinement; restraint library; molecular graphics
X-ray diffraction plays a pivotal role in the understanding of biological systems by revealing atomic structures of proteins, nucleic acids, and their complexes, with much recent interest in very large assemblies like the ribosome. Since crystals of such large assemblies often diffract weakly (resolution worse than 4 Å), we need methods that work at such low resolution. In macromolecular assemblies, some of the components may be known at high resolution, while others are unknown: current refinement methods fail as they require a high-resolution starting structure for the entire complex [1]. Determining such complexes, which are often of key biological importance, should be possible in principle as the number of independent diffraction intensities at a resolution below 5 Å generally exceeds the number of degrees of freedom. Here we introduce a new method that adds specific information from known homologous structures but allows global and local deformations of these homology models. Our approach uses the observation that local protein structure tends to be conserved as sequence and function evolve. Cross-validation with Rfree determines the optimum deformation and influence of the homology model. For test cases at 3.5–5 Å resolution with known structures at high resolution, our method gives significant improvements over conventional refinement in the model coordinate accuracy, the definition of secondary structure, and the quality of electron density maps. For re-refinements of a representative set of 19 low-resolution crystal structures from the PDB, we find similar improvements. Thus, a structure derived from low-resolution diffraction data can have quality similar to a high-resolution structure. Our method is applicable to studying weakly diffracting crystals using X-ray micro-diffraction [2] as well as data from new X-ray light sources [3].
Use of homology information is not restricted to X-ray crystallography and cryo-electron microscopy: as optical imaging advances to sub-nanometer resolution [4,5], it can use similar tools.
X-ray crystallography; homology modeling; cross-validation; Rfree value; refinement
The automated building of a protein model into an electron density map remains a challenging problem. In the ARP/wARP approach, model building is facilitated by initially interpreting a density map with free atoms of unknown chemical identity; all structural information for such chemically unassigned atoms is discarded. Here, this is remedied by applying restraints between free atoms, and between free atoms and a partial protein model. These are based on geometric considerations of protein structure and tentative (conditional) assignments for the free atoms. Restraints are applied in the REFMAC5 refinement program and are generated on an ad hoc basis, allowing them to fluctuate from step to step. A large set of experimentally phased and molecular replacement structures showcases individual structures where automated building is improved drastically by the conditional restraints. The concept and implementation we present can also find application in restraining geometries, such as hydrogen bonds, in low-resolution refinement.
The quality of X-ray crystallographic models for biomacromolecules refined from data obtained at high resolution is assured by the data itself. However, at low resolution (>3.0 Å), additional information is supplied by a forcefield coupled with an associated refinement protocol. The resulting structures are often of lower quality and thus unsuitable for downstream activities like structure-based drug discovery.
An X-ray crystallography refinement protocol that enhances standard methodology by incorporating energy terms from the HINT (Hydropathic INTeractions) empirical forcefield is described. This protocol was tested by refining synthetic low-resolution structural data derived from 25 diverse high-resolution structures, and referencing the resulting models to these structures. The models were also evaluated with global structural quality metrics, e.g., Ramachandran score and MolProbity clashscore. Three additional structures, for which only low-resolution data are available, were also re-refined with this methodology.
The enhanced refinement protocol is most beneficial for reflection data at resolutions of 3.0 Å or worse. At the low-resolution limit, ≥4.0 Å, the new protocol generated models whose Cα RMSDs to the reference high-resolution structure were 0.18 Å lower, whose Ramachandran scores were improved by 13%, and whose clashscores were improved by 51%, all in comparison to models generated with the standard refinement protocol. The hydropathic forcefield terms are at least as effective as Coulombic electrostatic terms in maintaining polar interaction networks, and significantly more effective in maintaining hydrophobic networks, as synthetic resolution is decremented. Even at resolutions ≥4.0 Å, these latter networks are generally native-like, as measured with a hydropathic interactions scoring tool.
Zinc metalloenzymes play an important role in biology. However, owing to the limitations of the molecular force-field energy restraints used in X-ray refinement at medium or low resolution, the precise geometry of the zinc coordination environment can be difficult to discern from ambiguous electron density maps. Because accurate force fields for metal ions are difficult to define, the QM/MM (Quantum-Mechanical/Molecular-Mechanical) method provides an attractive and more general alternative for the study and refinement of metalloprotein active sites. Herein we present three examples that indicate that QM/MM-based refinement yields a superior description of the crystal structure based on R and Rfree values and on the inspection of the zinc coordination environment. It is concluded that QM/MM refinement is a useful general tool for the improvement of the metal coordination sphere in metalloenzyme active sites.
Application of phenix.model_vs_data to the contents of the Protein Data Bank shows that the vast majority of deposited structures can be automatically analyzed to reproduce the reported quality statistics. However, the small fraction of structures that elude automated re-analysis highlights areas where new software developments can help retain valuable information for future analysis.
phenix.model_vs_data is a high-level command-line tool for the computation of crystallographic model and data statistics, and the evaluation of the fit of the model to data. Analysis of all Protein Data Bank structures that have experimental data available shows that in most cases the reported statistics, in particular R factors, can be reproduced within a few percentage points. However, there are a number of outliers where the recomputed R values are significantly different from those originally reported. The reasons for these discrepancies are discussed.
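The central statistic mentioned above, the crystallographic R factor, can be written as R = Σ||Fobs| − k|Fcalc|| / Σ|Fobs|. The following minimal sketch computes it from two lists of amplitudes. The single least-squares scale factor k is a simplifying assumption; real programs typically also model bulk solvent and anisotropic scaling before comparing amplitudes, and this sketch is not the phenix.model_vs_data implementation. The function names are hypothetical.

```python
from typing import Optional, Sequence


def least_squares_scale(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Scale k minimizing sum((Fobs - k * Fcalc)^2)."""
    num = sum(fo * fc for fo, fc in zip(f_obs, f_calc))
    den = sum(fc * fc for fc in f_calc)
    return num / den


def r_factor(
    f_obs: Sequence[float],
    f_calc: Sequence[float],
    scale: Optional[float] = None,
) -> float:
    """R = sum(| |Fobs| - k |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    k = least_squares_scale(f_obs, f_calc) if scale is None else scale
    num = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den
```

Calling `r_factor` separately on the working reflections and on the test-set reflections gives R-work and R-free respectively; the split into the two sets is part of the deposited data rather than something this function decides.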
PHENIX; Protein Data Bank; data quality; model quality; structure validation; R factors
The challenge in protein structure prediction using homology modeling is the lack of reliable methods to refine low-resolution homology models. Unconstrained all-atom molecular dynamics (MD) does not serve well for structure refinement owing to its limited conformational search. We have developed and tested a constrained MD method, based on the Generalized Newton-Euler Inverse Mass Operator (GNEIMO) algorithm, for protein structure refinement. In this method, the high-frequency degrees of freedom are replaced with hard holonomic constraints and a protein is modeled as a collection of rigid-body clusters connected by flexible torsional hinges. This allows larger integration time steps and enhances the conformational search space. In this work, we have demonstrated the use of a constraint-free GNEIMO method for protein structure refinement that starts from low-resolution decoy sets derived from homology methods. For eight proteins with three decoys each, we observed an improvement of ~2 Å in the RMSD to the known experimental structures of these proteins. The GNEIMO method also showed enrichment in the population density of native-like conformations. In addition, we demonstrated structural refinement using a “Freeze and Thaw” clustering scheme within the GNEIMO framework as a viable tool for enhancing localized conformational search. We have derived a robust protocol based on the GNEIMO replica exchange method for protein structure refinement that can be readily extended to other proteins and may be applicable to high-throughput protein structure refinement.
Constrained MD; GNEIMO; Structure Refinement; Decoys
Methods and resources for obtaining chemically plausible starting models and restraint sets for refinement of ligand complexes are described and some of the potential pitfalls are discussed.
Model building and refinement of complexes between biomacromolecules and small molecules requires sensible starting coordinates as well as the specification of restraint sets for all but the most common non-macromolecular entities. Here, it is described why this is necessary, how it can be accomplished and what pitfalls need to be avoided in order to produce chemically plausible models of the low-molecular-weight entities. A number of programs, servers, databases and other resources that can be of assistance in the process are also discussed.
refinement; model building; ligand complexes; restraint sets; macromolecular crystallography
The PHENIX software for macromolecular structure determination is described.
Macromolecular X-ray crystallography is routinely applied to understand biological processes at a molecular level. However, significant time and effort are still required to solve and complete many of these structures because of the need for manual interpretation of complex numerical data using many software packages and the repeated use of interactive three-dimensional graphics. PHENIX has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on the automation of all procedures. This has relied on the development of algorithms that minimize or eliminate subjective input, the development of algorithms that automate procedures that are traditionally performed by hand and, finally, the development of a framework that allows a tight integration between the algorithms.
PHENIX; Python; macromolecular crystallography; algorithms
The crystal structure of a Z-DNA hexamer duplex d(CGCGCG)2, determined at an ultra-high resolution of 0.55 Å and refined without restraints, displays a high degree of regularity and rigidity in its stereochemistry, in contrast to the more flexible B-DNA duplexes. The estimated standard uncertainties of all individually refined parameters, obtained by full-matrix least-squares optimization, are comparable with values that are typical for small-molecule crystallography. The Z-DNA model generated with ultra-high-resolution diffraction data can be used to revise the stereochemical restraints applied in lower-resolution refinements. Detailed comparisons of the stereochemical library values with the present accurate Z-DNA parameters show good agreement in general, but also reveal significant discrepancies in the description of guanine-sugar valence angles and in the geometry of the phosphate groups.
The majority of previously deposited X-ray structures can be improved by applying current refinement methods.
Structural biology, homology modelling and rational drug design require accurate three-dimensional macromolecular coordinates. However, the coordinates in the Protein Data Bank (PDB) have not all been obtained using the latest experimental and computational methods. In this study a method is presented for automated re-refinement of existing structure models in the PDB. A large-scale benchmark with 16 807 PDB entries showed that they can be improved in terms of fit to the deposited experimental X-ray data as well as in terms of geometric quality. The re-refinement protocol uses TLS models to describe concerted atom movement. The resulting structure models are made available through the PDB_REDO databank (http://www.cmbi.ru.nl/pdb_redo/). Grid computing techniques were used to overcome the computational requirements of this endeavour.
X-ray crystallography; refinement; structure validation; Protein Data Bank; grid computing