The solvent-picking procedure in phenix.refine has been extended and combined with Phaser anomalous substructure completion and analysis of coordination geometry to identify and place elemental ions.
Many macromolecular model-building and refinement programs can automatically place solvent atoms in electron density at moderate-to-high resolution. This process frequently builds water molecules in place of elemental ions, the identification of which must be performed manually. The solvent-picking algorithms in phenix.refine have been extended to build common ions based on an analysis of the chemical environment as well as physical properties such as occupancy, B factor and anomalous scattering. The method is most effective for heavier elements such as calcium and zinc, for which a majority of sites can be placed with few false positives in a diverse test set of structures. At atomic resolution, it is observed that it can also be possible to identify tightly bound sodium and magnesium ions. A number of challenges that contribute to the difficulty of completely automating the process of structure completion are discussed.
refinement; ions; PHENIX
With the implementation of a molecular-replacement likelihood target that accounts for translational noncrystallographic symmetry, it became possible to solve the crystal structure of a protein with seven tetrameric assemblies arrayed translationally along the c axis. The new algorithm found 56 protein molecules in reduced symmetry (P1), which was used to resolve space-group ambiguity caused by severe twinning.
Translational noncrystallographic symmetry (tNCS) is a pathology of protein crystals in which multiple copies of a molecule or assembly are found in similar orientations. Structure solution is problematic because this breaks the assumptions used in current likelihood-based methods. To cope with such cases, new likelihood approaches have been developed and implemented in Phaser to account for the statistical effects of tNCS in molecular replacement. Using these new approaches, it was possible to solve the crystal structure of a protein exhibiting an extreme form of this pathology with seven tetrameric assemblies arrayed along the c axis. To resolve space-group ambiguities caused by tetartohedral twinning, the structure was initially solved by placing 56 copies of the monomer in space group P1 and using the symmetry of the solution to define the true space group, C2. The resulting structure of Hyp-1, a pathogenesis-related class 10 (PR-10) protein from the medicinal herb St John’s wort, reveals the binding modes of the fluorescent probe 8-anilino-1-naphthalene sulfonate (ANS), providing insight into the function of the protein in binding or storing hydrophobic ligands.
maximum likelihood; translational noncrystallographic symmetry; molecular replacement; commensurate modulation; pseudo-symmetry
A software system for automated protein–ligand crystallography has been implemented in the Phenix suite. This significantly reduces the manual effort required in high-throughput crystallographic studies.
High-throughput drug-discovery and mechanistic studies often require the determination of multiple related crystal structures that only differ in the bound ligands, point mutations in the protein sequence and minor conformational changes. If performed manually, solution and refinement requires extensive repetition of the same tasks for each structure. To accelerate this process and minimize manual effort, a pipeline encompassing all stages of ligand building and refinement, starting from integrated and scaled diffraction intensities, has been implemented in Phenix. The resulting system is able to successfully solve and refine large collections of structures in parallel without extensive user intervention prior to the final stages of model completion and validation.
protein–ligand complexes; automation; crystallographic structure solution and refinement
A procedure for model building is described that combines morphing a model to match a density map, trimming the morphed model and aligning the model to a sequence.
A procedure termed ‘morphing’ for improving a model after it has been placed in the crystallographic cell by molecular replacement has recently been developed. Morphing consists of applying a smooth deformation to a model to make it match an electron-density map more closely. Morphing does not change the identities of the residues in the chain, only their coordinates. Consequently, if the true structure differs from the working model by containing different residues, these differences cannot be corrected by morphing. Here, a procedure that helps to address this limitation is described. The goal of the procedure is to obtain a relatively complete model that has accurate main-chain atomic positions and residues that are correctly assigned to the sequence. Residues in a morphed model that do not match the electron-density map are removed. Each segment of the resulting trimmed morphed model is then assigned to the sequence of the molecule using information about the connectivity of the chains from the working model and from connections that can be identified from the electron-density map. The procedure was tested by application to a recently determined structure at a resolution of 3.2 Å and was found to increase the number of correctly identified residues in this structure from the 88 obtained using phenix.resolve sequence assignment alone (Terwilliger, 2003 ▶) to 247 of a possible 359. Additionally, the procedure was tested by application to a series of templates with sequence identities to a target structure ranging between 7 and 36%. The mean fraction of correctly identified residues in these cases was increased from 33% using phenix.resolve sequence assignment to 47% using the current procedure. The procedure is simple to apply and is available in the Phenix software package.
morphing; model building; sequence assignment; model–map correlation; loop-building
The functionality of the molecular-replacement pipeline phaser.MRage is introduced and illustrated with examples.
Phaser.MRage is a molecular-replacement automation framework that implements a full model-generation workflow and provides several layers of model exploration to the user. It is designed to handle a large number of models and can distribute calculations efficiently onto parallel hardware. In addition, phaser.MRage can identify correct solutions and use this information to accelerate the search. Firstly, it can quickly score all alternative models of a component once a correct solution has been found. Secondly, it can perform extensive analysis of identified solutions to find protein assemblies and can employ assembled models for subsequent searches. Thirdly, it is able to use a priori assembly information (derived from, for example, homologues) to speculatively place and score molecules, thereby customizing the search procedure to a certain class of protein molecule (for example, antibodies) and incorporating additional biological information into molecular replacement.
molecular replacement; pipeline; automation; phaser.MRage
A function for estimating the effective root-mean-square deviation in coordinates between two proteins has been developed that depends on both the sequence identity and the size of the protein and is optimized for use with molecular replacement in Phaser. A top peak translation-function Z-score of over 8 is found to be a reliable metric of when molecular replacement has succeeded.
The estimate of the root-mean-square deviation (r.m.s.d.) in coordinates between the model and the target is an essential parameter for calibrating likelihood functions for molecular replacement (MR). Good estimates of the r.m.s.d. lead to good estimates of the variance term in the likelihood functions, which increases signal to noise and hence success rates in the MR search. Phaser has hitherto used an estimate of the r.m.s.d. that only depends on the sequence identity between the model and target and which was not optimized for the MR likelihood functions. Variance-refinement functionality was added to Phaser to enable determination of the effective r.m.s.d. that optimized the log-likelihood gain (LLG) for a correct MR solution. Variance refinement was subsequently performed on a database of over 21 000 MR problems that sampled a range of sequence identities, protein sizes and protein fold classes. Success was monitored using the translation-function Z-score (TFZ), where a TFZ of 8 or over for the top peak was found to be a reliable indicator that MR had succeeded for these cases with one molecule in the asymmetric unit. Good estimates of the r.m.s.d. are correlated with the sequence identity and the protein size. A new estimate of the r.m.s.d. that uses these two parameters in a function optimized to fit the mean of the refined variance is implemented in Phaser and improves MR outcomes. Perturbing the initial estimate of the r.m.s.d. from the mean of the distribution in steps of standard deviations of the distribution further increases MR success rates.
Phaser; maximum likelihood; molecular replacement
A genetic algorithm has been developed to optimize the phases of the strongest reflections in SIR/SAD data. This is shown to facilitate density modification and model building in several test cases.
Experimental phasing of diffraction data from macromolecular crystals involves deriving phase probability distributions. These distributions are often bimodal, making their weighted average, the centroid phase, improbable, so that electron-density maps computed using centroid phases are often non-interpretable. Density modification brings in information about the characteristics of electron density in protein crystals. In successful cases, this allows a choice between the modes in the phase probability distributions, and the maps can cross the borderline between non-interpretable and interpretable. Based on the suggestions by Vekhter [Vekhter (2005 ▶), Acta Cryst. D61, 899–902], the impact of identifying optimized phases for a small number of strong reflections prior to the density-modification process was investigated while using the centroid phase as a starting point for the remaining reflections. A genetic algorithm was developed that optimizes the quality of such phases using the skewness of the density map as a target function. Phases optimized in this way are then used in density modification. In most of the tests, the resulting maps were of higher quality than maps generated from the original centroid phases. In one of the test cases, the new method sufficiently improved a marginal set of experimental SAD phases to enable successful map interpretation. A computer program, SISA, has been developed to apply this method for phase improvement in macromolecular crystallography.
experimental phasing; density modification; genetic algorithms
The statistical effects of translational noncrystallographic symmetry can be characterized by maximizing parameters describing the noncrystallographic symmetry in a likelihood function, thereby unmasking the competing statistical effects of twinning.
In the case of translational noncrystallographic symmetry (tNCS), two or more copies of a component in the asymmetric unit of the crystal are present in a similar orientation. This causes systematic modulations of the reflection intensities in the diffraction pattern, leading to problems with structure determination and refinement methods that assume, either implicitly or explicitly, that the distribution of intensities is a function only of resolution. To characterize the statistical effects of tNCS accurately, it is necessary to determine the translation relating the copies, any small rotational differences in their orientations, and the size of random coordinate differences caused by conformational differences. An algorithm to estimate these parameters and refine their values against a likelihood function is presented, and it is shown that by accounting for the statistical effects of tNCS it is possible to unmask the competing statistical effects of twinning and tNCS and to more robustly assess the crystal for the presence of twinning.
translational noncrystallographic symmetry; intensity statistics; twinning; maximum likelihood
A density-based procedure is described for improving a homology model that is locally accurate but differs globally. The model is deformed to match the map and refined, yielding an improved starting point for density modification and further model-building.
An approach is presented for addressing the challenge of model rebuilding after molecular replacement in cases where the placed template is very different from the structure to be determined. The approach takes advantage of the observation that a template and target structure may have local structures that can be superimposed much more closely than can their complete structures. A density-guided procedure for deformation of a properly placed template is introduced. A shift in the coordinates of each residue in the structure is calculated based on optimizing the match of model density within a 6 Å radius of the center of that residue with a prime-and-switch electron-density map. The shifts are smoothed and applied to the atoms in each residue, leading to local deformation of the template that improves the match of map and model. The model is then refined to improve the geometry and the fit of model to the structure-factor data. A new map is then calculated and the process is repeated until convergence. The procedure can extend the routine applicability of automated molecular replacement, model building and refinement to search models with over 2 Å r.m.s.d. representing 65–100% of the structure.
molecular replacement; automation; macromolecular crystallography; structure similarity; modeling; Phenix; morphing
DEN refinement and automated model building with AutoBuild were used to determine the structure of a putative succinyl-diaminopimelate desuccinylase from C. glutamicum. This difficult case of molecular-replacement phasing shows that the synergism between DEN refinement and AutoBuild outperforms standard refinement protocols.
Phasing by molecular replacement remains difficult for targets that are far from the search model or in situations where the crystal diffracts only weakly or to low resolution. Here, the process of determining and refining the structure of Cgl1109, a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum, at ∼3 Å resolution is described using a combination of homology modeling with MODELLER, molecular-replacement phasing with Phaser, deformable elastic network (DEN) refinement and automated model building using AutoBuild in a semi-automated fashion, followed by final refinement cycles with phenix.refine and Coot. This difficult molecular-replacement case illustrates the power of including DEN restraints derived from a starting model to guide the movements of the model during refinement. The resulting improved model phases provide better starting points for automated model building and produce more significant difference peaks in anomalous difference Fourier maps to locate anomalous scatterers than does standard refinement. This example also illustrates a current limitation of automated procedures that require manual adjustment of local sequence misalignments between the homology model and the target sequence.
reciprocal-space refinement; DEN refinement; real-space refinement; automated model building; succinyl-diaminopimelate desuccinylase
An overview of the CCP4 software suite for macromolecular crystallography is given.
The CCP4 (Collaborative Computational Project, Number 4) software suite is a collection of programs and associated data and software libraries which can be used for macromolecular structure determination by X-ray crystallography. The suite is designed to be flexible, allowing users a number of methods of achieving their aims. The programs are from a wide variety of sources but are connected by a common infrastructure provided by standard file formats, data objects and graphical interfaces. Structure solution by macromolecular crystallography is becoming increasingly automated and the CCP4 suite includes several automation pipelines. After giving a brief description of the evolution of CCP4 over the last 30 years, an overview of the current suite is given. While detailed descriptions are given in the accompanying articles, here it is shown how the individual programs contribute to a complete software package.
CCP4; macromolecular crystallography; software; collaboration; automation; macromolecular structure determination
The molecular-replacement model-improvement program Sculptor is described, with an analysis of the algorithms used.
In molecular replacement, the quality of models can be improved by transferring information contained in sequence alignment to the template structure. A family of algorithms has been developed that make use of the sequence-similarity score calculated from residue-substitution scores smoothed over nearby residues to delete or downweight parts of the model that are unreliable. These algorithms have been implemented in the program Sculptor, together with well established methods that are in common use for model improvement. An analysis of the new algorithms has been performed by studying the effect of algorithm parameters on the quality of models. Benchmarking against existing techniques shows that models from Sculptor compare favourably, especially if the alignment is unreliable. Carrying out multiple trials using alternative models created from the same structure but using different algorithm parameters can significantly improve the success rate.
molecular replacement; model improvement; residue-substitution score
SAD data can be used in Phaser to solve novel structures, supplement molecular-replacement phase information or identify anomalous scatterers from a final refined model.
Phaser is a program that implements likelihood-based methods to solve macromolecular crystal structures, currently by molecular replacement or single-wavelength anomalous diffraction (SAD). SAD phasing is based on a likelihood target derived from the joint probability distribution of observed and calculated pairs of Friedel-related structure factors. This target combines information from the total structure factor (primarily non-anomalous scattering) and the difference between the Friedel mates (anomalous scattering). Phasing starts from a substructure, which is usually but not necessarily a set of anomalous scatterers. The substructure can also be a protein model, such as one obtained by molecular replacement. Additional atoms are found using a log-likelihood gradient map, which shows the sites where the addition of scattering from a particular atom type would improve the likelihood score. An automated completion algorithm adds new sites, choosing optionally among different atom types, adds anisotropic B-factor parameters if appropriate and deletes atoms that refine to low occupancy. Log-likelihood gradient maps can also identify which atoms in a refined protein structure are anomalous scatterers, such as metal or halide ions. These maps are more sensitive than conventional model-phased anomalous difference Fouriers and the iterative completion algorithm is able to find a significantly larger number of convincing sites.
SAD phasing; likelihood; molecular replacement
The pitfalls of experimental phasing are described.
Developments in protein crystal structure determination by experimental phasing are reviewed, emphasizing the theoretical continuum between experimental phasing, density modification, model building and refinement. Traditional notions of the composition of the substructure and the best coefficients for map generation are discussed. Pitfalls such as determining the enantiomorph, identifying centrosymmetry (or pseudo-symmetry) in the substructure and crystal twinning are discussed in detail. An appendix introduces combined real–imaginary log-likelihood gradient map coefficients for SAD phasing and their use for substructure completion as implemented in the software Phaser. Supplementary material includes animated probabilistic Harker diagrams showing how maximum-likelihood-based phasing methods can be used to refine parameters in the case of SIR and MIR; it is hoped that these will be useful for those teaching best practice in experimental phasing methods.
enantiomers; handedness; absolute configuration; chirality; twinning; experimental phasing
The PHENIX software for macromolecular structure determination is described.
Macromolecular X-ray crystallography is routinely applied to understand biological processes at a molecular level. However, significant time and effort are still required to solve and complete many of these structures because of the need for manual interpretation of complex numerical data using many software packages and the repeated use of interactive three-dimensional graphics. PHENIX has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on the automation of all procedures. This has relied on the development of algorithms that minimize or eliminate subjective input, the development of algorithms that automate procedures that are traditionally performed by hand and, finally, the development of a framework that allows a tight integration between the algorithms.
PHENIX; Python; macromolecular crystallography; algorithms
Ten measures of experimental electron-density-map quality are examined and the skewness of electron density is found to be the best indicator of actual map quality. A Bayesian approach to estimating map quality is developed and used in the PHENIX AutoSol wizard to make decisions during automated structure solution.
Estimates of the quality of experimental maps are important in many stages of structure determination of macromolecules. Map quality is defined here as the correlation between a map and the corresponding map obtained using phases from the final refined model. Here, ten different measures of experimental map quality were examined using a set of 1359 maps calculated by re-analysis of 246 solved MAD, SAD and MIR data sets. A simple Bayesian approach to estimation of map quality from one or more measures is presented. It was found that a Bayesian estimator based on the skewness of the density values in an electron-density map is the most accurate of the ten individual Bayesian estimators of map quality examined, with a correlation between estimated and actual map quality of 0.90. A combination of the skewness of electron density with the local correlation of r.m.s. density gives a further improvement in estimating map quality, with an overall correlation coefficient of 0.92. The PHENIX AutoSol wizard carries out automated structure solution based on any combination of SAD, MAD, SIR or MIR data sets. The wizard is based on tools from the PHENIX package and uses the Bayesian estimates of map quality described here to choose the highest quality solutions after experimental phasing.
structure solution; scoring; Protein Data Bank; phasing; decision-making; PHENIX; experimental electron-density maps
A procedure for carrying out iterative model building, density modification and refinement is presented in which the density in an OMITregion is essentially unbiased by an atomic model. Density from a set of overlapping OMIT regions can be combined to create a composite ‘iterative-build’ OMIT map that is everywhere unbiased by an atomic model but also everywhere benefiting from the model-based information present elsewhere in the unit cell. The procedure may have applications in the validation of specific features in atomic models as well as in overall model validation. The procedure is demonstrated with a molecular-replacement structure and with an experimentally phased structure and a variation on the method is demonstrated by removing model bias from a structure from the Protein Data Bank.
An OMIT procedure is presented that has the benefits of iterative model building density modification and refinement yet is essentially unbiased by the atomic model that is built.
A procedure for carrying out iterative model building, density modification and refinement is presented in which the density in an OMIT region is essentially unbiased by an atomic model. Density from a set of overlapping OMIT regions can be combined to create a composite ‘iterative-build’ OMIT map that is everywhere unbiased by an atomic model but also everywhere benefiting from the model-based information present elsewhere in the unit cell. The procedure may have applications in the validation of specific features in atomic models as well as in overall model validation. The procedure is demonstrated with a molecular-replacement structure and with an experimentally phased structure and a variation on the method is demonstrated by removing model bias from a structure from the Protein Data Bank.
model building; model validation; macromolecular models; Protein Data Bank; refinement; OMIT maps; bias; structure refinement; PHENIX
The highly automated PHENIX AutoBuild wizard is described. The procedure can be applied equally well to phases derived from isomorphous/anomalous and molecular-replacement methods.
The PHENIX AutoBuild wizard is a highly automated tool for iterative model building, structure refinement and density modification using RESOLVE model building, RESOLVE statistical density modification and phenix.refine structure refinement. Recent advances in the AutoBuild wizard and phenix.refine include automated detection and application of NCS from models as they are built, extensive model-completion algorithms and automated solvent-molecule picking. Model-completion algorithms in the AutoBuild wizard include loop building, crossovers between chains in different models of a structure and side-chain optimization. The AutoBuild wizard has been applied to a set of 48 structures at resolutions ranging from 1.1 to 3.2 Å, resulting in a mean R factor of 0.24 and a mean free R factor of 0.29. The R factor of the final model is dependent on the quality of the starting electron density and is relatively independent of resolution.
model building; model completion; macromolecular models; Protein Data Bank; structure refinement; PHENIX
Heterogeneity in ensembles generated by independent model rebuilding principally reflects the limitations of the data and of the model-building process rather than the diversity of structures in the crystal.
Automation of iterative model building, density modification and refinement in macromolecular crystallography has made it feasible to carry out this entire process multiple times. By using different random seeds in the process, a number of different models compatible with experimental data can be created. Sets of models were generated in this way using real data for ten protein structures from the Protein Data Bank and using synthetic data generated at various resolutions. Most of the heterogeneity among models produced in this way is in the side chains and loops on the protein surface. Possible interpretations of the variation among models created by repetitive rebuilding were investigated. Synthetic data were created in which a crystal structure was modelled as the average of a set of ‘perfect’ structures and the range of models obtained by rebuilding a single starting model was examined. The standard deviations of coordinates in models obtained by repetitive rebuilding at high resolution are small, while those obtained for the same synthetic crystal structure at low resolution are large, so that the diversity within a group of models cannot generally be a quantitative reflection of the actual structures in a crystal. Instead, the group of structures obtained by repetitive rebuilding reflects the precision of the models, and the standard deviation of coordinates of these structures is a lower bound estimate of the uncertainty in coordinates of the individual models.
model building; model completion; coordinate errors; models; Protein Data Bank; convergence; reproducibility; heterogeneity; precision; accuracy