X-ray crystallography is a critical tool in the study of biological systems. It is able to provide information that has been a prerequisite to understanding the fundamentals of life. It is also a method that is central to the development of new therapeutics for human disease. Significant time and effort are required to determine and optimize many macromolecular structures because of the need for manual interpretation of complex numerical data, often using many different software packages, and the repeated use of interactive three-dimensional graphics. The Phenix software package has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on automation. This has required the development of new algorithms that minimize or eliminate subjective input in favour of built-in expert-systems knowledge, the automation of procedures that are traditionally performed by hand, and the development of a computational framework that allows a tight integration between the algorithms. The application of automated methods is particularly appropriate in the field of structural proteomics, where high throughput is desired. Features in Phenix for the automation of experimental phasing with subsequent model building, molecular replacement, structure refinement and validation are described and examples given of running Phenix from both the command line and graphical user interface.
Macromolecular Crystallography; Automation; Phenix; X-ray; Diffraction; Python
A density-based procedure is described for improving a homology model that is locally accurate but differs globally. The model is deformed to match the map and refined, yielding an improved starting point for density modification and further model-building.
An approach is presented for addressing the challenge of model rebuilding after molecular replacement in cases where the placed template is very different from the structure to be determined. The approach takes advantage of the observation that a template and target structure may have local structures that can be superimposed much more closely than can their complete structures. A density-guided procedure for deformation of a properly placed template is introduced. A shift in the coordinates of each residue in the structure is calculated based on optimizing the match of model density within a 6 Å radius of the center of that residue with a prime-and-switch electron-density map. The shifts are smoothed and applied to the atoms in each residue, leading to local deformation of the template that improves the match of map and model. The model is then refined to improve the geometry and the fit of model to the structure-factor data. A new map is then calculated and the process is repeated until convergence. The procedure can extend the routine applicability of automated molecular replacement, model building and refinement to search models with over 2 Å r.m.s.d. representing 65–100% of the structure.
molecular replacement; automation; macromolecular crystallography; structure similarity; modeling; Phenix; morphing
In scientific computing, Fortran was the dominant implementation language throughout most of the second part of the 20th century. The many tools accumulated during this time have been difficult to integrate with modern software, which is now dominated by object-oriented languages.
Driven by the requirements of a large-scale scientific software project, we have developed a Fortran to C++ source-to-source conversion tool named FABLE. This enables the continued development of new methods even while switching languages. We report the application of FABLE in three major projects and present detailed comparisons of Fortran and C++ runtime performances.
Our experience suggests that most Fortran 77 codes can be converted with an effort that is minor (measured in days) compared to the original development time (often measured in years). With FABLE it is possible to reuse and evolve legacy work in modern object-oriented environments, in a portable and maintainable way. FABLE is available under a nonrestrictive open source license. In FABLE the analysis of the Fortran sources is separated from the generation of the C++ sources. Therefore parts of FABLE could be reused for other target languages.
Fortran; C++; Source-to-source conversion; Python; Test-driven development
The foundations and current features of a widely used graphical user interface for macromolecular crystallography are described.
A new Python-based graphical user interface for the PHENIX suite of crystallography software is described. This interface unifies the command-line programs and their graphical displays, simplifying the development of new interfaces and avoiding duplication of function. With careful design, graphical interfaces can be displayed automatically, instead of being manually constructed. The resulting package is easily maintained and extended as new programs are added or modified.
macromolecular crystallography; graphical user interfaces; PHENIX
phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. It has several automation features and is also highly flexible. Several hundred parameters enable extensive customizations for complex use cases. Multiple user-defined refinement strategies can be applied to specific parts of the model in a single refinement run. An intuitive graphical user interface is available to guide novice users and to assist advanced users in managing refinement projects. X-ray or neutron diffraction data can be used separately or jointly in refinement. phenix.refine is tightly integrated into the PHENIX suite, where it serves as a critical component in automated model building, final structure refinement, structure validation and deposition to the wwPDB. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
structure refinement; PHENIX; joint X-ray/neutron refinement; maximum likelihood; TLS; simulated annealing; subatomic resolution; real-space refinement; twinning; NCS
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered R
free and a decreased gap between R
work and R
macromolecular crystallography; low resolution; refinement; automation
The combination of algorithms from the structure-modeling field with those of crystallographic structure determination can broaden the range of templates that are useful for structure determination by the method of molecular replacement. Automated tools in phenix.mr_rosetta simplify the application of these combined approaches by integrating Phenix crystallographic algorithms and Rosetta structure-modeling algorithms and by systematically generating and evaluating models with a combination of these methods. The phenix.mr_rosetta algorithms can be used to automatically determine challenging structures. The approaches used in phenix.mr_rosetta are described along with examples that show roles that structure-modeling can play in molecular replacement.
Molecular replacement; Automation; Macromolecular crystallography; Rosetta; Phenix
The implementation of crystallographic structure-refinement procedures that include both X-ray and neutron data (separate or jointly) in the PHENIX system is described.
Approximately 85% of the structures deposited in the Protein Data Bank have been solved using X-ray crystallography, making it the leading method for three-dimensional structure determination of macromolecules. One of the limitations of the method is that the typical data quality (resolution) does not allow the direct determination of H-atom positions. Most hydrogen positions can be inferred from the positions of other atoms and therefore can be readily included into the structure model as a priori knowledge. However, this may not be the case in biologically active sites of macromolecules, where the presence and position of hydrogen is crucial to the enzymatic mechanism. This makes the application of neutron crystallography in biology particularly important, as H atoms can be clearly located in experimental neutron scattering density maps. Without exception, when a neutron structure is determined the corresponding X-ray structure is also known, making it possible to derive the complete structure using both data sets. Here, the implementation of crystallographic structure-refinement procedures that include both X-ray and neutron data (separate or jointly) in the PHENIX system is described.
structure refinement; neutrons; joint X-ray and neutron refinement; PHENIX
iotbx.cif is a comprehensive toolbox for the development of applications that make use of the CIF format.
iotbx.cif is a new software module for the development of applications that make use of the CIF format. Comprehensive tools are provided for input, output and validation of CIFs, as well as for interconversion with high-level cctbx [Grosse-Kunstleve, Sauter, Moriarty & Adams (2002). J. Appl. Cryst.
35, 126–136] crystallographic objects. The interface to the library is written in Python, whilst parsing is carried out using a compiled parser, combining the performance of a compiled language (C++) with the benefits of using an interpreted language.
iotbx.cif; cctbx; CIF; computer programs
A reference table of exact direct-space asymmetric units for the 230 crystallographic space groups is presented, based on a new geometric notation for asymmetric unit conditions.
It is well known that the direct-space asymmetric unit definitions found in the International Tables for Crystallography, Volume A, are inexact at the borders. Face- and edge-specific sub-conditions have to be added to remove parts redundant under symmetry. This paper introduces a concise geometric notation for asymmetric unit conditions. The notation is the foundation for a reference table of exact direct-space asymmetric unit definitions for the 230 crystallographic space-group types. The change-of-basis transformation law for the conditions is derived, which allows the information from the reference table to be used for any space-group setting. We also show how the vertices of an asymmetric unit can easily be computed from the information in the reference table.
asymmetric unit; direct space; space groups
A new software system for automated ligand coordinate and restraint generation is presented.
The electronic Ligand Builder and Optimization Workbench (eLBOW) is a program module of the PHENIX suite of computational crystallographic software. It is designed to be a flexible procedure that uses simple and fast quantum-chemical techniques to provide chemically accurate information for novel and known ligands alike. A variety of input formats and options allow the attainment of a number of diverse goals including geometry optimization and generation of restraints.
ligands; coordinates; restraints; Python; object-oriented programming
Central to crystallographic structure solution is obtaining accurate phases in order to build a molecular model, ultimately followed by refinement of that model to optimize its fit to the experimental diffraction data and prior chemical knowledge. Recent advances in phasing and model refinement and validation algorithms make it possible to arrive at better electron density maps and more accurate models.
Application of phenix.model_vs_data to the contents of the Protein Data Bank shows that the vast majority of deposited structures can be automatically analyzed to reproduce the reported quality statistics. However, the small fraction of structures that elude automated re-analysis highlight areas where new software developments can help retain valuable information for future analysis.
phenix.model_vs_data is a high-level command-line tool for the computation of crystallographic model and data statistics, and the evaluation of the fit of the model to data. Analysis of all Protein Data Bank structures that have experimental data available shows that in most cases the reported statistics, in particular R factors, can be reproduced within a few percentage points. However, there are a number of outliers where the recomputed R values are significantly different from those originally reported. The reasons for these discrepancies are discussed.
PHENIX; Protein Data Bank; data quality; model quality; structure validation; R factors
An X-ray structural model can be reassigned to a higher symmetry space group using the presented framework if its noncrystallographic symmetry operators are close to being exact crystallographic relationships. About 2% of structures in the Protein Data Bank can be reclassified in this way.
Up to 2% of X-ray structures in the Protein Data Bank (PDB) potentially fit into a higher symmetry space group. Redundant protein chains in these structures can be made compatible with exact crystallographic symmetry with minimal atomic movements that are smaller than the expected range of coordinate uncertainty. The incidence of problem cases is somewhat difficult to define precisely, as there is no clear line between underassigned symmetry, in which the subunit differences are unsupported by the data, and pseudosymmetry, in which the subunit differences rest on small but significant intensity differences in the diffraction pattern. To help catch symmetry-assignment problems in the future, it is useful to add a validation step that operates on the refined coordinates just prior to structure deposition. If redundant symmetry-related chains can be removed at this stage, the resulting model (in a higher symmetry space group) can readily serve as an isomorphous replacement starting point for re-refinement using re-indexed and re-integrated raw data. These ideas are implemented in new software tools available at http://cci.lbl.gov/labelit.
underassigned rotational symmetry; LABELIT; validation
The PHENIX software for macromolecular structure determination is described.
Macromolecular X-ray crystallography is routinely applied to understand biological processes at a molecular level. However, significant time and effort are still required to solve and complete many of these structures because of the need for manual interpretation of complex numerical data using many software packages and the repeated use of interactive three-dimensional graphics. PHENIX has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on the automation of all procedures. This has relied on the development of algorithms that minimize or eliminate subjective input, the development of algorithms that automate procedures that are traditionally performed by hand and, finally, the development of a framework that allows a tight integration between the algorithms.
PHENIX; Python; macromolecular crystallography; algorithms
Systematic investigation of a large number of trial rigid-body refinements leads to an optimized multiple-zone protocol with a larger convergence radius.
Rigid-body refinement is the constrained coordinate refinement of one or more groups of atoms that each move (rotate and translate) as a single body. The goal of this work was to establish an automatic procedure for rigid-body refinement which implements a practical compromise between runtime requirements and convergence radius. This has been achieved by analysis of a large number of trial refinements for 12 classes of random rigid-body displacements (that differ in magnitude of introduced errors), using both least-squares and maximum-likelihood target functions. The results of these tests led to a multiple-zone protocol. The final parameterization of this protocol was optimized empirically on the basis of a second large set of test refinements. This multiple-zone protocol is implemented as part of the phenix.refine program.
rigid-body refinement; multiple-zone protocols
Ten measures of experimental electron-density-map quality are examined and the skewness of electron density is found to be the best indicator of actual map quality. A Bayesian approach to estimating map quality is developed and used in the PHENIX AutoSol wizard to make decisions during automated structure solution.
Estimates of the quality of experimental maps are important in many stages of structure determination of macromolecules. Map quality is defined here as the correlation between a map and the corresponding map obtained using phases from the final refined model. Here, ten different measures of experimental map quality were examined using a set of 1359 maps calculated by re-analysis of 246 solved MAD, SAD and MIR data sets. A simple Bayesian approach to estimation of map quality from one or more measures is presented. It was found that a Bayesian estimator based on the skewness of the density values in an electron-density map is the most accurate of the ten individual Bayesian estimators of map quality examined, with a correlation between estimated and actual map quality of 0.90. A combination of the skewness of electron density with the local correlation of r.m.s. density gives a further improvement in estimating map quality, with an overall correlation coefficient of 0.92. The PHENIX AutoSol wizard carries out automated structure solution based on any combination of SAD, MAD, SIR or MIR data sets. The wizard is based on tools from the PHENIX package and uses the Bayesian estimates of map quality described here to choose the highest quality solutions after experimental phasing.
structure solution; scoring; Protein Data Bank; phasing; decision-making; PHENIX; experimental electron-density maps
A procedure for carrying out iterative model building, density modification and refinement is presented in which the density in an OMITregion is essentially unbiased by an atomic model. Density from a set of overlapping OMIT regions can be combined to create a composite ‘iterative-build’ OMIT map that is everywhere unbiased by an atomic model but also everywhere benefiting from the model-based information present elsewhere in the unit cell. The procedure may have applications in the validation of specific features in atomic models as well as in overall model validation. The procedure is demonstrated with a molecular-replacement structure and with an experimentally phased structure and a variation on the method is demonstrated by removing model bias from a structure from the Protein Data Bank.
An OMIT procedure is presented that has the benefits of iterative model building density modification and refinement yet is essentially unbiased by the atomic model that is built.
A procedure for carrying out iterative model building, density modification and refinement is presented in which the density in an OMIT region is essentially unbiased by an atomic model. Density from a set of overlapping OMIT regions can be combined to create a composite ‘iterative-build’ OMIT map that is everywhere unbiased by an atomic model but also everywhere benefiting from the model-based information present elsewhere in the unit cell. The procedure may have applications in the validation of specific features in atomic models as well as in overall model validation. The procedure is demonstrated with a molecular-replacement structure and with an experimentally phased structure and a variation on the method is demonstrated by removing model bias from a structure from the Protein Data Bank.
model building; model validation; macromolecular models; Protein Data Bank; refinement; OMIT maps; bias; structure refinement; PHENIX
The highly automated PHENIX AutoBuild wizard is described. The procedure can be applied equally well to phases derived from isomorphous/anomalous and molecular-replacement methods.
The PHENIX AutoBuild wizard is a highly automated tool for iterative model building, structure refinement and density modification using RESOLVE model building, RESOLVE statistical density modification and phenix.refine structure refinement. Recent advances in the AutoBuild wizard and phenix.refine include automated detection and application of NCS from models as they are built, extensive model-completion algorithms and automated solvent-molecule picking. Model-completion algorithms in the AutoBuild wizard include loop building, crossovers between chains in different models of a structure and side-chain optimization. The AutoBuild wizard has been applied to a set of 48 structures at resolutions ranging from 1.1 to 3.2 Å, resulting in a mean R factor of 0.24 and a mean free R factor of 0.29. The R factor of the final model is dependent on the quality of the starting electron density and is relatively independent of resolution.
model building; model completion; macromolecular models; Protein Data Bank; structure refinement; PHENIX
The presence of pseudosymmetry can cause problems in structure determination and refinement. The relevant background and representative examples are presented.
It is not uncommon for protein crystals to crystallize with more than a single molecule per asymmetric unit. When more than a single molecule is present in the asymmetric unit, various pathological situations such as twinning, modulated crystals and pseudo translational or rotational symmetry can arise. The presence of pseudosymmetry can lead to uncertainties about the correct space group, especially in the presence of twinning. The background to certain common pathologies is presented and a new notation for space groups in unusual settings is introduced. The main concepts are illustrated with several examples from the literature and the Protein Data Bank.
pathology; twinning; pseudosymmetry
Modelling deformation electron density using interatomic scatters is simpler than multipolar methods, produces comparable results at subatomic resolution and can easily be applied to macromolecules.
A study of the accurate electron-density distribution in molecular crystals at subatomic resolution (better than ∼1.0 Å) requires more detailed models than those based on independent spherical atoms. A tool that is conventionally used in small-molecule crystallography is the multipolar model. Even at upper resolution limits of 0.8–1.0 Å, the number of experimental data is insufficient for full multipolar model refinement. As an alternative, a simpler model composed of conventional independent spherical atoms augmented by additional scatterers to model bonding effects has been proposed. Refinement of these mixed models for several benchmark data sets gave results that were comparable in quality with the results of multipolar refinement and superior to those for conventional models. Applications to several data sets of both small molecules and macromolecules are shown. These refinements were performed using the general-purpose macromolecular refinement module phenix.refine of the PHENIX package.
structure refinement; subatomic resolution; deformation density; interatomic scatterers; PHENIX
A description is given of Phaser-2.1: software for phasing macromolecular crystal structures by molecular replacement and single-wavelength anomalous dispersion phasing.
Phaser is a program for phasing macromolecular crystal structures by both molecular replacement and experimental phasing methods. The novel phasing algorithms implemented in Phaser have been developed using maximum likelihood and multivariate statistics. For molecular replacement, the new algorithms have proved to be significantly better than traditional methods in discriminating correct solutions from noise, and for single-wavelength anomalous dispersion experimental phasing, the new algorithms, which account for correlations between F
+ and F
−, give better phases (lower mean phase error with respect to the phases given by the refined structure) than those that use mean F and anomalous differences ΔF. One of the design concepts of Phaser was that it be capable of a high degree of automation. To this end, Phaser (written in C++) can be called directly from Python, although it can also be called using traditional CCP4 keyword-style input. Phaser is a platform for future development of improved phasing methods and their release, including source code, to the crystallographic community.
computer programs; molecular replacement; SAD phasing; likelihood; structural genomics
Heterogeneity in ensembles generated by independent model rebuilding principally reflects the limitations of the data and of the model-building process rather than the diversity of structures in the crystal.
Automation of iterative model building, density modification and refinement in macromolecular crystallography has made it feasible to carry out this entire process multiple times. By using different random seeds in the process, a number of different models compatible with experimental data can be created. Sets of models were generated in this way using real data for ten protein structures from the Protein Data Bank and using synthetic data generated at various resolutions. Most of the heterogeneity among models produced in this way is in the side chains and loops on the protein surface. Possible interpretations of the variation among models created by repetitive rebuilding were investigated. Synthetic data were created in which a crystal structure was modelled as the average of a set of ‘perfect’ structures and the range of models obtained by rebuilding a single starting model was examined. The standard deviations of coordinates in models obtained by repetitive rebuilding at high resolution are small, while those obtained for the same synthetic crystal structure at low resolution are large, so that the diversity within a group of models cannot generally be a quantitative reflection of the actual structures in a crystal. Instead, the group of structures obtained by repetitive rebuilding reflects the precision of the models, and the standard deviation of coordinates of these structures is a lower bound estimate of the uncertainty in coordinates of the individual models.
model building; model completion; coordinate errors; models; Protein Data Bank; convergence; reproducibility; heterogeneity; precision; accuracy
A robust method for determining bulk-solvent and anisotropic scaling parameters for macromolecular refinement is described. A maximum-likelihood target function for determination of flat bulk-solvent model parameters and overall anisotropic scale factor is also proposed.
A reliable method for the determination of bulk-solvent model parameters and an overall anisotropic scale factor is of increasing importance as structure determination becomes more automated. Current protocols require the manual inspection of refinement results in order to detect errors in the calculation of these parameters. Here, a robust method for determining bulk-solvent and anisotropic scaling parameters in macromolecular refinement is described. The implementation of a maximum-likelihood target function for determining the same parameters is also discussed. The formulas and corresponding derivatives of the likelihood function with respect to the solvent parameters and the components of anisotropic scale matrix are presented. These algorithms are implemented in the CCTBX bulk-solvent correction and scaling module.
bulk-solvent correction; anisotropic scaling