Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered Rfree and a decreased gap between Rwork and Rfree.
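As an illustration of what a torsion-angle restraint involves, the following minimal sketch computes a signed dihedral angle from four atomic positions and evaluates a simple harmonic penalty against a target angle, with the angular difference wrapped to the periodic range. The function names, the harmonic form and the default sigma are assumptions made for illustration only; this is not the target function implemented in phenix.refine.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def wrapped_difference(angle, target):
    """Difference angle - target (degrees), wrapped into [-180, 180)."""
    return (angle - target + 180.0) % 360.0 - 180.0

def torsion_penalty(angle, target, sigma=10.0):
    """Hypothetical harmonic penalty; sigma (degrees) is an assumed value."""
    delta = wrapped_difference(angle, target)
    return (delta / sigma) ** 2
```

Wrapping the difference matters because torsion angles are periodic: a model angle of -175° is only 10° away from a target of 175°, not 350°.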
macromolecular crystallography; low resolution; refinement; automation
The highly automated PHENIX AutoBuild wizard is described. The procedure can be applied equally well to phases derived from isomorphous/anomalous and molecular-replacement methods.
The PHENIX AutoBuild wizard is a highly automated tool for iterative model building, structure refinement and density modification using RESOLVE model building, RESOLVE statistical density modification and phenix.refine structure refinement. Recent advances in the AutoBuild wizard and phenix.refine include automated detection and application of NCS from models as they are built, extensive model-completion algorithms and automated solvent-molecule picking. Model-completion algorithms in the AutoBuild wizard include loop building, crossovers between chains in different models of a structure and side-chain optimization. The AutoBuild wizard has been applied to a set of 48 structures at resolutions ranging from 1.1 to 3.2 Å, resulting in a mean R factor of 0.24 and a mean free R factor of 0.29. The R factor of the final model is dependent on the quality of the starting electron density and is relatively independent of resolution.
model building; model completion; macromolecular models; Protein Data Bank; structure refinement; PHENIX
The combination of algorithms from the structure-modeling field with those of crystallographic structure determination can broaden the range of templates that are useful for structure determination by the method of molecular replacement. Automated tools in phenix.mr_rosetta simplify the application of these combined approaches by integrating Phenix crystallographic algorithms and Rosetta structure-modeling algorithms and by systematically generating and evaluating models with a combination of these methods. The phenix.mr_rosetta algorithms can be used to automatically determine challenging structures. The approaches used in phenix.mr_rosetta are described along with examples that show roles that structure-modeling can play in molecular replacement.
Molecular replacement; Automation; Macromolecular crystallography; Rosetta; Phenix
The implementation of crystallographic structure-refinement procedures that include both X-ray and neutron data (separately or jointly) in the PHENIX system is described.
Approximately 85% of the structures deposited in the Protein Data Bank have been solved using X-ray crystallography, making it the leading method for three-dimensional structure determination of macromolecules. One of the limitations of the method is that the typical data quality (resolution) does not allow the direct determination of H-atom positions. Most hydrogen positions can be inferred from the positions of other atoms and therefore can be readily included in the structure model as a priori knowledge. However, this may not be the case in biologically active sites of macromolecules, where the presence and position of hydrogen is crucial to the enzymatic mechanism. This makes the application of neutron crystallography in biology particularly important, as H atoms can be clearly located in experimental neutron scattering density maps. Without exception, when a neutron structure is determined the corresponding X-ray structure is also known, making it possible to derive the complete structure using both data sets. Here, the implementation of crystallographic structure-refinement procedures that include both X-ray and neutron data (separately or jointly) in the PHENIX system is described.
structure refinement; neutrons; joint X-ray and neutron refinement; PHENIX
Systematic investigation of a large number of trial rigid-body refinements leads to an optimized multiple-zone protocol with a larger convergence radius.
Rigid-body refinement is the constrained coordinate refinement of one or more groups of atoms that each move (rotate and translate) as a single body. The goal of this work was to establish an automatic procedure for rigid-body refinement which implements a practical compromise between runtime requirements and convergence radius. This has been achieved by analysis of a large number of trial refinements for 12 classes of random rigid-body displacements (that differ in magnitude of introduced errors), using both least-squares and maximum-likelihood target functions. The results of these tests led to a multiple-zone protocol. The final parameterization of this protocol was optimized empirically on the basis of a second large set of test refinements. This multiple-zone protocol is implemented as part of the phenix.refine program.
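The defining property of a rigid-body move, that every atom in a group shares one rotation and one translation, can be sketched as follows. For brevity the sketch rotates about the z axis only, through the group centroid; a general rigid body has three rotational and three translational parameters. All names are hypothetical and the code is not taken from phenix.refine.

```python
import math

def centroid(coords):
    """Geometric centre of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))

def rigid_body_move(coords, angle_deg, translation):
    """Rotate a group about the z axis through its centroid, then translate."""
    cx, cy, _ = centroid(coords)
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    moved = []
    for x, y, z in coords:
        dx, dy = x - cx, y - cy
        moved.append((cx + cos_t * dx - sin_t * dy + translation[0],
                      cy + sin_t * dx + cos_t * dy + translation[1],
                      z + translation[2]))
    return moved

# Example: interatomic distances are unchanged by the move.
group = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
moved = rigid_body_move(group, 30.0, (1.0, -2.0, 0.5))
```

Because only six parameters per group are refined, the ratio of observations to parameters is far more favorable than in individual-atom refinement, which is why rigid-body refinement remains usable at low resolution.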
rigid-body refinement; multiple-zone protocols
Application of phenix.model_vs_data to the contents of the Protein Data Bank shows that the vast majority of deposited structures can be automatically analyzed to reproduce the reported quality statistics. However, the small fraction of structures that elude automated re-analysis highlights areas where new software developments can help retain valuable information for future analysis.
phenix.model_vs_data is a high-level command-line tool for the computation of crystallographic model and data statistics, and the evaluation of the fit of the model to data. Analysis of all Protein Data Bank structures that have experimental data available shows that in most cases the reported statistics, in particular R factors, can be reproduced within a few percentage points. However, there are a number of outliers where the recomputed R values are significantly different from those originally reported. The reasons for these discrepancies are discussed.
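For readers unfamiliar with the statistic being recomputed, the conventional crystallographic R factor is the sum of absolute differences between observed and scaled calculated structure-factor amplitudes, divided by the sum of the observed amplitudes. The sketch below implements that definition directly. The function name and the default scale factor (ratio of amplitude sums) are simplifying assumptions; it does not reproduce the bulk-solvent correction and scaling that a program such as phenix.model_vs_data would apply.

```python
def r_factor(f_obs, f_calc, scale=None):
    """R = sum(| |Fobs| - k*|Fcalc| |) / sum(|Fobs|) over reflections.

    f_obs, f_calc: sequences of non-negative amplitudes in matching order.
    scale: overall scale k; if None, a crude ratio-of-sums estimate is used
    (an assumption made for this sketch).
    """
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("amplitude lists must be non-empty and equal in length")
    if scale is None:
        scale = sum(f_obs) / sum(f_calc)
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)
```

The same formula gives Rwork when evaluated over the reflections used in refinement and Rfree when evaluated over the reflections set aside as a test set.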
PHENIX; Protein Data Bank; data quality; model quality; structure validation; R factors
Three-dimensional RNA models fitted into crystallographic density maps exhibit pervasive conformational ambiguities, geometric errors, and steric clashes. To address these problems, we present Enumerative Real-space Refinement ASsisted by Electron density under Rosetta (ERRASER), coupled to PHENIX (Python-based Hierarchical Environment for Integrated Xtallography) diffraction-based refinement. On 24 datasets, ERRASER automatically corrects the majority of MolProbity-assessed errors, improves average Rfree factor, resolves functionally important discrepancies in non-canonical structure, and refines low-resolution models to better match higher resolution models.
The PHENIX software for macromolecular structure determination is described.
Macromolecular X-ray crystallography is routinely applied to understand biological processes at a molecular level. However, significant time and effort are still required to solve and complete many of these structures because of the need for manual interpretation of complex numerical data using many software packages and the repeated use of interactive three-dimensional graphics. PHENIX has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on the automation of all procedures. This has relied on the development of algorithms that minimize or eliminate subjective input, the development of algorithms that automate procedures that are traditionally performed by hand and, finally, the development of a framework that allows a tight integration between the algorithms.
PHENIX; Python; macromolecular crystallography; algorithms
Ten measures of experimental electron-density-map quality are examined and the skewness of electron density is found to be the best indicator of actual map quality. A Bayesian approach to estimating map quality is developed and used in the PHENIX AutoSol wizard to make decisions during automated structure solution.
Estimates of the quality of experimental maps are important in many stages of structure determination of macromolecules. Map quality is defined here as the correlation between a map and the corresponding map obtained using phases from the final refined model. Here, ten different measures of experimental map quality were examined using a set of 1359 maps calculated by re-analysis of 246 solved MAD, SAD and MIR data sets. A simple Bayesian approach to estimation of map quality from one or more measures is presented. It was found that a Bayesian estimator based on the skewness of the density values in an electron-density map is the most accurate of the ten individual Bayesian estimators of map quality examined, with a correlation between estimated and actual map quality of 0.90. A combination of the skewness of electron density with the local correlation of r.m.s. density gives a further improvement in estimating map quality, with an overall correlation coefficient of 0.92. The PHENIX AutoSol wizard carries out automated structure solution based on any combination of SAD, MAD, SIR or MIR data sets. The wizard is based on tools from the PHENIX package and uses the Bayesian estimates of map quality described here to choose the highest quality solutions after experimental phasing.
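The skewness measure referred to above is the standardized third moment of the density values. A minimal sketch, assuming the map is available as a flat list of grid-point values (the function name is hypothetical and this is not the AutoSol implementation):

```python
def density_skewness(values):
    """Skewness of density values: third central moment / variance**1.5."""
    n = len(values)
    if n == 0:
        raise ValueError("no density values supplied")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0.0:
        return 0.0  # a perfectly flat map has no skew
    return m3 / m2 ** 1.5
```

The intuition is that a map with correct phases tends to show a small number of strong positive peaks at atomic positions against a comparatively flat background, giving a positively skewed distribution, whereas a map with poor phases has a more symmetric distribution of values.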
structure solution; scoring; Protein Data Bank; phasing; decision-making; PHENIX; experimental electron-density maps
The general principles behind the macromolecular crystal structure refinement program REFMAC5 are described.
This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low-resolution refinement tools such as secondary-structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, ‘jelly-body’ restraints and the use of novel long-range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high-resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.
The foundations and current features of a widely used graphical user interface for macromolecular crystallography are described.
A new Python-based graphical user interface for the PHENIX suite of crystallography software is described. This interface unifies the command-line programs and their graphical displays, simplifying the development of new interfaces and avoiding duplication of function. With careful design, graphical interfaces can be displayed automatically, instead of being manually constructed. The resulting package is easily maintained and extended as new programs are added or modified.
macromolecular crystallography; graphical user interfaces; PHENIX
X-ray crystallography is a critical tool in the study of biological systems. It is able to provide information that has been a prerequisite to understanding the fundamentals of life. It is also a method that is central to the development of new therapeutics for human disease. Significant time and effort are required to determine and optimize many macromolecular structures because of the need for manual interpretation of complex numerical data, often using many different software packages, and the repeated use of interactive three-dimensional graphics. The Phenix software package has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on automation. This has required the development of new algorithms that minimize or eliminate subjective input in favour of built-in expert-systems knowledge, the automation of procedures that are traditionally performed by hand, and the development of a computational framework that allows a tight integration between the algorithms. The application of automated methods is particularly appropriate in the field of structural proteomics, where high throughput is desired. Features in Phenix for the automation of experimental phasing with subsequent model building, molecular replacement, structure refinement and validation are described and examples given of running Phenix from both the command line and graphical user interface.
Macromolecular Crystallography; Automation; Phenix; X-ray; Diffraction; Python
Torsion-angle sampling, as implemented in the Protein Local Optimization Program (PLOP), is used to generate multiple structurally variable single-conformer models which are in good agreement with X-ray data. An ensemble-refinement approach to differentiate between positional uncertainty and conformational heterogeneity is proposed.
Modeling structural variability is critical for understanding protein function and for modeling reliable targets for in silico docking experiments. Because of the time-intensive nature of manual X-ray crystallographic refinement, automated refinement methods that thoroughly explore conformational space are essential for the systematic construction of structurally variable models. Using five proteins spanning resolutions of 1.0–2.8 Å, it is demonstrated how torsion-angle sampling of backbone and side-chain libraries with filtering against both the chemical energy, using a modern effective potential, and the electron density, coupled with minimization of a reciprocal-space X-ray target function, can generate multiple structurally variable models which fit the X-ray data well. Torsion-angle sampling as implemented in the Protein Local Optimization Program (PLOP) has been used in this work. Models with the lowest Rfree values are obtained when electrostatic and implicit solvation terms are included in the effective potential. HIV-1 protease, calmodulin and SUMO-conjugating enzyme illustrate how variability in the ensemble of structures captures structural variability that is observed across multiple crystal structures and is linked to functional flexibility at hinge regions and binding interfaces. An ensemble-refinement procedure is proposed to differentiate between variability that is a consequence of physical conformational heterogeneity and that which reflects uncertainty in the atomic coordinates.
automated refinement; multiple models; conformational heterogeneity; torsion-angle sampling
The deformable elastic network (DEN) method for reciprocal-space crystallographic refinement improves crystal structures, especially at resolutions lower than 3.5 Å. The DEN web service presented here intends to provide structural biologists with access to resources for running computationally intensive DEN refinements.
Deformable elastic network (DEN) restraints have proved to be a powerful tool for refining structures from low-resolution X-ray crystallographic data sets. Unfortunately, optimal refinement using DEN restraints requires extensive calculations and is often hindered by a lack of access to sufficient computational resources. The DEN web service presented here intends to provide structural biologists with access to resources for running computationally intensive DEN refinements in parallel on the Open Science Grid, the US cyberinfrastructure. Access to the grid is provided through a simple and intuitive web interface integrated into the SBGrid Science Portal. Using this portal, refinements combined with full parameter optimization that would take many thousands of hours on standard computational resources can now be completed in several hours. An example of the successful application of DEN restraints to the human Notch1 transcriptional complex using the grid resource, and summaries of all submitted refinements, are presented as justification.
deformable elastic network restraints; low-resolution refinement; DEN refinement
A stereochemical library which defines the target values for main-chain bond lengths and angles as a function of the residue’s ϕ/ψ angles was tested in refinement. Use of this library allows the construction of models that conform to ideal geometry much better than previous libraries without degrading their fit to the diffraction data.
The major macromolecular crystallographic refinement packages restrain models to ideal geometry targets defined as single values that are independent of molecular conformation. However, ultrahigh-resolution X-ray models of proteins are not consistent with this concept of ideality and have been used to develop a library of ideal main-chain bond lengths and angles that are parameterized by the ϕ/ψ angle of the residue [Berkholz et al. (2009), Structure, 17, 1316–1325]. Here, it is first shown that the new conformation-dependent library does not suffer from poor agreement with ultrahigh-resolution structures, whereas current libraries have this problem. Using the TNT refinement package, it is then shown that protein structure refinement using this conformation-dependent library results in models that have much better agreement with library values of bond angles with little change in the R values. These tests support the value of revising refinement software to account for this new paradigm.
conformation-dependent stereochemical library; refinement; ideal geometry; restraints
The application of a new normal-mode-based X-ray crystallographic refinement method to a total of eight structures of moderate resolution is illustrated.
The structural refinement of large complexes at the lower resolution limit is often difficult and inefficient owing to the limited number of reflections and the frequently high-level structural flexibility. A new normal-mode-based X-ray crystallographic refinement method has recently been developed that enables anisotropic B-factor refinement using a drastically smaller number of thermal parameters than even isotropic refinement. Here, the method has been systematically tested on a total of eight systems in the resolution range 3.0–3.9 Å. This series of tests established the most applicable scenarios for the method, the detailed procedures for its application and the degree of structural improvement. The results demonstrated substantial model improvement at the lower resolution limit, especially in cases in which other methods such as the translation–libration–screw (TLS) model were not applicable owing to the poorly converged isotropic B-factor distribution. It is expected that this normal-mode-based method will be a useful tool for structural refinement, in particular at the lower resolution limit, in the field of X-ray crystallography.
conformational flexibility; anisotropic thermal parameters; moderate resolution
A new software system for automated ligand coordinate and restraint generation is presented.
The electronic Ligand Builder and Optimization Workbench (eLBOW) is a program module of the PHENIX suite of computational crystallographic software. It is designed to be a flexible procedure that uses simple and fast quantum-chemical techniques to provide chemically accurate information for novel and known ligands alike. A variety of input formats and options allow the attainment of a number of diverse goals including geometry optimization and generation of restraints.
ligands; coordinates; restraints; Python; object-oriented programming
According to several studies, some nuclear magnetic resonance (NMR) structures are of lower quality, less reliable and less suitable for structural analysis than high-resolution X-ray crystallographic structures. We present a public database of 2405 refined NMR solution structures [statistical torsion angle potentials (STAP) refinement of the NMR database, http://psb.kobic.re.kr/STAP/refinement] from the Protein Data Bank (PDB). A simulated annealing protocol was employed to obtain refined structures with target potentials, including the newly developed STAP. The refined database was extensively analysed using various quality indicators from several assessment programs to determine the nuclear Overhauser effect (NOE) completeness, Ramachandran appearance, χ1-χ2 rotamer normality, various parameters for protein stability and other indicators. Most quality indicators are improved in our protocol mainly due to the inclusion of the newly developed knowledge-based potentials. This database can be used by the NMR structure community for further development of research and validation tools, structure-related studies and modelling in many fields of research.
A protein crystallographic data collection using the PETRA II source sets the record in crystallographic resolution for a biological macromolecule (0.48 Å) for the small protein crambin and increases available data for refinement by a factor of 1.5. The quality of the data collected is evident by comparing the merging R factor between two different crambin data sets, collected from different crystals and using different beamlines and protocols (6%) and the final crystallographic R factor of 12.7% for the model refinement.
With the development of highly brilliant and extremely intense synchrotron X-ray sources, extreme high-resolution limits for biological samples are now becoming attainable. Here, a study is presented that sets the record in crystallographic resolution for a biological macromolecule. The structure of the small protein crambin was determined to 0.48 Å resolution on the PETRA II ring before its conversion to a dedicated synchrotron-radiation source. The results reveal a wealth of details in electron density and demonstrate the possibilities that are potentially offered by a high-energy source. The question now arises as to what the true limits are in terms of what can be seen at such high resolution. From what can be extrapolated from the results using crystals of crambin, this limit would be at approximately 0.40 Å, which approaches that for smaller compounds.
PETRA II; crambin; high resolution
The crystallization and preliminary X-ray diffraction analysis at 1.25 Å resolution of ligand-free arginine kinase from the Pacific whiteleg shrimp Litopenaeus vannamei are reported. Crystals belong to space group P2₁2₁2₁, phases were determined by molecular replacement and refinement was performed with Phenix.
Crystals of an unligated monomeric arginine kinase from the Pacific whiteleg shrimp Litopenaeus vannamei (LvAK) were successfully obtained using the microbatch method. Crystallization conditions and preliminary X-ray diffraction analysis to 1.25 Å resolution are reported. Data were collected at 100 K on NSLS beamline X6A. The crystals belonged to space group P2₁2₁2₁, with unit-cell parameters a = 56.5, b = 70.2, c = 81.7 Å. One monomer per asymmetric unit was found, with a Matthews coefficient (VM) of 2.05 Å³ Da⁻¹ and 40% solvent content. Initial phases were determined by molecular replacement using a homology model of LvAK as the search model. Refinement was performed with PHENIX, with final Rwork and Rfree values of 0.15 and 0.19, respectively. Biological analysis of the structure is currently in progress.
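The Matthews coefficient and solvent content quoted above follow from the unit-cell volume, the number of molecules in the cell and the molecular mass. The sketch below shows the arithmetic for an orthorhombic cell. The molecular mass of about 40 kDa is an assumption, since the abstract does not state it; the factor 1.23 is the commonly used constant for proteins, and the function names are hypothetical.

```python
def matthews_coefficient(a, b, c, molecules_per_cell, molecular_mass_da):
    """V_M in cubic angstroms per dalton for an orthorhombic cell (V = a*b*c)."""
    cell_volume = a * b * c
    return cell_volume / (molecules_per_cell * molecular_mass_da)

def solvent_fraction(v_m):
    """Approximate solvent fraction for a protein crystal: 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / v_m

# P2(1)2(1)2(1) has four symmetry operators, so one monomer per asymmetric
# unit gives four molecules per cell. The 40 000 Da mass is an assumed value.
v_m = matthews_coefficient(56.5, 70.2, 81.7, 4, 40000.0)
solvent = solvent_fraction(v_m)
```

With the assumed mass, the result is roughly 2.0 Å³ Da⁻¹ and about 39% solvent, close to the reported values; the small difference reflects the uncertainty in the assumed molecular mass.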
arginine kinases; Litopenaeus vannamei
DEN refinement and automated model building with AutoBuild were used to determine the structure of a putative succinyl-diaminopimelate desuccinylase from C. glutamicum. This difficult case of molecular-replacement phasing shows that the synergism between DEN refinement and AutoBuild outperforms standard refinement protocols.
Phasing by molecular replacement remains difficult for targets that are far from the search model or in situations where the crystal diffracts only weakly or to low resolution. Here, the process of determining and refining the structure of Cgl1109, a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum, at ∼3 Å resolution is described using a combination of homology modeling with MODELLER, molecular-replacement phasing with Phaser, deformable elastic network (DEN) refinement and automated model building using AutoBuild in a semi-automated fashion, followed by final refinement cycles with phenix.refine and Coot. This difficult molecular-replacement case illustrates the power of including DEN restraints derived from a starting model to guide the movements of the model during refinement. The resulting improved model phases provide better starting points for automated model building and produce more significant difference peaks in anomalous difference Fourier maps to locate anomalous scatterers than does standard refinement. This example also illustrates a current limitation of automated procedures that require manual adjustment of local sequence misalignments between the homology model and the target sequence.
reciprocal-space refinement; DEN refinement; real-space refinement; automated model building; succinyl-diaminopimelate desuccinylase
Protein structure refinement refers to the process of improving the quality of protein structures during structure modeling to bring them closer to their native states. Structure refinement has been drawing increasing attention in the community-wide Critical Assessment of techniques for Protein Structure Prediction (CASP) experiments since its addition in the 8th CASP experiment. During the 9th and the recently concluded 10th CASP experiments, consistent growth in the number of refinement targets and participating groups was witnessed. Yet protein structure refinement remains a largely unsolved problem, with the majority of participating groups in the CASP refinement category failing to consistently improve the quality of the structures issued for refinement. To address this need, we developed a completely automated and computationally efficient protein 3D structure refinement method, i3Drefine, based on an iterative and highly convergent energy minimization algorithm with a powerful all-atom composite physics- and knowledge-based force field and a hydrogen bonding (HB) network optimization technique. In the recent community-wide blind experiment, CASP10, i3Drefine (as ‘MULTICOM-CONSTRUCT’) was ranked as the best method in the server section according to the official CASP10 assessment. Here we provide the community with free access to the i3Drefine software, systematically analyse the performance of i3Drefine in strict blind mode on the refinement targets issued in the CASP10 refinement category and compare it with other state-of-the-art refinement methods participating in CASP10. Our analysis demonstrates that i3Drefine is the only fully automated server participating in CASP10 that exhibits consistent improvement over the initial structures in both global and local structural quality metrics. An executable version of i3Drefine is freely available at http://protein.rnet.missouri.edu/i3drefine/.
The decision-making algorithms and software used in PDB_REDO to re-refine and rebuild crystallographic protein structures in the PDB are presented and discussed.
Developments of the PDB_REDO procedure that combine re-refinement and rebuilding within a unique decision-making framework to improve structures in the PDB are presented. PDB_REDO uses a variety of existing and custom-built software modules to choose an optimal refinement protocol (e.g. anisotropic, isotropic or overall B-factor refinement, TLS model) and to optimize the geometry versus data-refinement weights. Next, it proceeds to rebuild side chains and peptide planes before a final optimization round. PDB_REDO works fully automatically without the need for intervention by a crystallographic expert. The pipeline was tested on 12 000 PDB entries and the great majority of the test cases improved both in terms of crystallographic criteria such as Rfree and in terms of widely accepted geometric validation criteria. It is concluded that PDB_REDO is useful to update the otherwise ‘static’ structures in the PDB to modern crystallographic standards. The publicly available PDB_REDO database provides better model statistics and contributes to better refinement and validation targets.
validation; refinement; model building; automation; PDB
In X-ray crystallography, molecular replacement and subsequent refinement are challenging at low resolution. We compared refinement methods using synchrotron diffraction data of photosystem I at 7.4 Å resolution, starting from different initial models with increasing deviations from the known high-resolution structure. Standard refinement spoiled the initial models, moving them further away from the true structure and leading to high Rfree values. In contrast, DEN refinement improved even the most distant starting model as judged by Rfree, atomic root-mean-square differences to the true structure, significance of features not included in the initial model, and connectivity of electron density. The best protocol was DEN refinement with initial segmented rigid-body refinement. For the most distant initial model, the fraction of atoms within 2 Å of the true structure improved from 24% to 60%. We also found a significant correlation between Rfree values and the accuracy of the model, suggesting that Rfree is useful even at low resolution.
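Two of the accuracy measures mentioned above, the root-mean-square difference to a reference structure and the fraction of atoms within a distance cutoff, can be sketched as follows. The sketch assumes that the two coordinate lists are already superposed in a common frame and list the same atoms in the same order; the function names are hypothetical and no claim is made that the study computed its statistics this way.

```python
import math

def rmsd(model, reference):
    """Root-mean-square distance between paired (x, y, z) coordinates."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and paired")
    total = sum(math.dist(m, r) ** 2 for m, r in zip(model, reference))
    return math.sqrt(total / len(model))

def fraction_within(model, reference, cutoff=2.0):
    """Fraction of atoms lying within `cutoff` angstroms of their reference."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and paired")
    close = sum(1 for m, r in zip(model, reference) if math.dist(m, r) <= cutoff)
    return close / len(model)
```

The cutoff-based fraction is less sensitive than the r.m.s. value to a few badly misplaced atoms, which makes the two measures complementary when comparing refinement protocols.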
DEN refinement; membrane protein; low-resolution refinement; simulated annealing; free R value
An evaluation of validation and real-space intervention possibilities for improving existing automated (re-)refinement methods.
The deposition of X-ray data along with the customary structural models defining PDB entries makes it possible to apply large-scale re-refinement protocols to these entries, thus giving users the benefit of improvements in X-ray methods that have occurred since the structure was deposited. Automated gradient refinement is an effective method to achieve this goal, but real-space intervention is most often required in order to adequately address problems detected by structure-validation software. In order to improve the existing protocol, automated re-refinement was combined with structure validation and difference-density peak analysis to produce a catalogue of problems in PDB entries that are amenable to automatic correction. It is shown that re-refinement can be effective in producing improvements, which are often associated with the systematic use of the TLS parameterization of B factors, even for relatively new and high-resolution PDB entries, while the accompanying manual or semi-manual map analysis and fitting steps show good prospects for eventual automation. It is proposed that the potential for simultaneous improvements in methods and in re-refinement results be further encouraged by broadening the scope of depositions to include refinement metadata and ultimately primary rather than reduced X-ray data.