In an effort to better understand the control of the formation of branched fatty acids in Micrococcus luteus, the structure of β-ketoacyl-ACP synthase III, which catalyzes the initial step of fatty-acid biosynthesis, has been determined.
Micrococcus luteus is a Gram-positive bacterium that produces iso- and anteiso-branched alkenes by the head-to-head condensation of fatty-acid thioesters [coenzyme A (CoA) or acyl carrier protein (ACP)]; this activity is of interest for the production of advanced biofuels. In an effort to better understand the control of the formation of branched fatty acids in M. luteus, the structure of FabH (MlFabH) was determined. FabH, or β-ketoacyl-ACP synthase III, catalyzes the initial step of fatty-acid biosynthesis: the condensation of malonyl-ACP with an acyl-CoA. Analysis of the MlFabH structure provides insights into its substrate selectivity with regard to length and branching of the acyl-CoA. The most structurally divergent region of FabH is the L9 loop region located at the dimer interface, which is involved in the formation of the acyl-binding channel and thus limits the substrate-channel size. The residue Phe336, which is positioned near the catalytic triad, appears to play a major role in branched-substrate selectivity. In addition to structural studies of MlFabH, transcriptional studies of M. luteus were also performed, focusing on the increase in the ratio of anteiso:iso-branched alkenes that was observed during the transition from early to late stationary phase. Gene-expression microarray analysis identified two genes involved in leucine and isoleucine metabolism that may explain this transition.
biofuels; β-ketoacyl-ACP synthase III; iso- and anteiso-branched alkenes; microarray
The statistical effects of translational noncrystallographic symmetry can be characterized by maximizing parameters describing the noncrystallographic symmetry in a likelihood function, thereby unmasking the competing statistical effects of twinning.
In the case of translational noncrystallographic symmetry (tNCS), two or more copies of a component in the asymmetric unit of the crystal are present in a similar orientation. This causes systematic modulations of the reflection intensities in the diffraction pattern, leading to problems with structure determination and refinement methods that assume, either implicitly or explicitly, that the distribution of intensities is a function only of resolution. To characterize the statistical effects of tNCS accurately, it is necessary to determine the translation relating the copies, any small rotational differences in their orientations, and the size of random coordinate differences caused by conformational differences. An algorithm to estimate these parameters and refine their values against a likelihood function is presented, and it is shown that by accounting for the statistical effects of tNCS it is possible to unmask the competing statistical effects of twinning and tNCS and to more robustly assess the crystal for the presence of twinning.
translational noncrystallographic symmetry; intensity statistics; twinning; maximum likelihood
A density-based procedure is described for improving a homology model that is locally accurate but differs globally. The model is deformed to match the map and refined, yielding an improved starting point for density modification and further model-building.
An approach is presented for addressing the challenge of model rebuilding after molecular replacement in cases where the placed template is very different from the structure to be determined. The approach takes advantage of the observation that a template and target structure may have local structures that can be superimposed much more closely than can their complete structures. A density-guided procedure for deformation of a properly placed template is introduced. A shift in the coordinates of each residue in the structure is calculated based on optimizing the match of model density within a 6 Å radius of the center of that residue with a prime-and-switch electron-density map. The shifts are smoothed and applied to the atoms in each residue, leading to local deformation of the template that improves the match of map and model. The model is then refined to improve the geometry and the fit of model to the structure-factor data. A new map is then calculated and the process is repeated until convergence. The procedure can extend the routine applicability of automated molecular replacement, model building and refinement to search models with over 2 Å r.m.s.d. representing 65–100% of the structure.
molecular replacement; automation; macromolecular crystallography; structure similarity; modeling; Phenix; morphing
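The smoothing and application of per-residue shifts described above can be illustrated with a minimal sketch. The moving-average smoothing, the window size and the function names `smooth_shifts` and `apply_shifts` are assumptions made for illustration only; the abstract does not specify how the shifts are smoothed, and this is not the Phenix implementation.

```python
import numpy as np

def smooth_shifts(shifts, half_window=2):
    """Smooth per-residue shift vectors with a moving average along the chain.

    shifts: array of shape (n_residues, 3). The moving average is an assumed
    smoothing scheme, chosen only to illustrate the idea.
    """
    shifts = np.asarray(shifts, dtype=float)
    n = len(shifts)
    smoothed = np.empty_like(shifts)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        smoothed[i] = shifts[lo:hi].mean(axis=0)
    return smoothed

def apply_shifts(atom_xyz, atom_residue_index, residue_shifts):
    """Move every atom by the (smoothed) shift of the residue it belongs to."""
    xyz = np.asarray(atom_xyz, dtype=float)
    index = np.asarray(atom_residue_index)
    return xyz + residue_shifts[index]
```

In a full procedure the shifted model would then be refined and a new map calculated before the next cycle; those steps are outside the scope of this sketch.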
phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
phenix.refine is a program within the PHENIX package that supports crystallographic structure refinement against experimental data with a wide range of upper resolution limits using a large repertoire of model parameterizations. It has several automation features and is also highly flexible. Several hundred parameters enable extensive customizations for complex use cases. Multiple user-defined refinement strategies can be applied to specific parts of the model in a single refinement run. An intuitive graphical user interface is available to guide novice users and to assist advanced users in managing refinement projects. X-ray or neutron diffraction data can be used separately or jointly in refinement. phenix.refine is tightly integrated into the PHENIX suite, where it serves as a critical component in automated model building, final structure refinement, structure validation and deposition to the wwPDB. This paper presents an overview of the major phenix.refine features, with extensive literature references for readers interested in more detailed discussions of the methods.
structure refinement; PHENIX; joint X-ray/neutron refinement; maximum likelihood; TLS; simulated annealing; subatomic resolution; real-space refinement; twinning; NCS
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered Rfree and a decreased gap between Rwork and Rfree.
macromolecular crystallography; low resolution; refinement; automation
DEN refinement and automated model building with AutoBuild were used to determine the structure of a putative succinyl-diaminopimelate desuccinylase from C. glutamicum. This difficult case of molecular-replacement phasing shows that the synergism between DEN refinement and AutoBuild outperforms standard refinement protocols.
Phasing by molecular replacement remains difficult for targets that are far from the search model or in situations where the crystal diffracts only weakly or to low resolution. Here, the process of determining and refining the structure of Cgl1109, a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum, at ∼3 Å resolution is described using a combination of homology modeling with MODELLER, molecular-replacement phasing with Phaser, deformable elastic network (DEN) refinement and automated model building using AutoBuild in a semi-automated fashion, followed by final refinement cycles with phenix.refine and Coot. This difficult molecular-replacement case illustrates the power of including DEN restraints derived from a starting model to guide the movements of the model during refinement. The resulting improved model phases provide better starting points for automated model building and produce more significant difference peaks in anomalous difference Fourier maps to locate anomalous scatterers than does standard refinement. This example also illustrates a current limitation of automated procedures that require manual adjustment of local sequence misalignments between the homology model and the target sequence.
reciprocal-space refinement; DEN refinement; real-space refinement; automated model building; succinyl-diaminopimelate desuccinylase
The implementation of crystallographic structure-refinement procedures that include both X-ray and neutron data (separately or jointly) in the PHENIX system is described.
Approximately 85% of the structures deposited in the Protein Data Bank have been solved using X-ray crystallography, making it the leading method for three-dimensional structure determination of macromolecules. One of the limitations of the method is that the typical data quality (resolution) does not allow the direct determination of H-atom positions. Most hydrogen positions can be inferred from the positions of other atoms and therefore can be readily included into the structure model as a priori knowledge. However, this may not be the case in biologically active sites of macromolecules, where the presence and position of hydrogen is crucial to the enzymatic mechanism. This makes the application of neutron crystallography in biology particularly important, as H atoms can be clearly located in experimental neutron scattering density maps. Without exception, when a neutron structure is determined the corresponding X-ray structure is also known, making it possible to derive the complete structure using both data sets. Here, the implementation of crystallographic structure-refinement procedures that include both X-ray and neutron data (separately or jointly) in the PHENIX system is described.
structure refinement; neutrons; joint X-ray and neutron refinement; PHENIX
A new software system for automated ligand coordinate and restraint generation is presented.
The electronic Ligand Builder and Optimization Workbench (eLBOW) is a program module of the PHENIX suite of computational crystallographic software. It is designed to be a flexible procedure that uses simple and fast quantum-chemical techniques to provide chemically accurate information for novel and known ligands alike. A variety of input formats and options allow the attainment of a number of diverse goals including geometry optimization and generation of restraints.
ligands; coordinates; restraints; Python; object-oriented programming
Here, the crystal structure of an endoglucanase, Cel9A, from Alicyclobacillus acidocaldarius (Aa_Cel9A) is reported which displays a modular architecture composed of an N-terminal Ig-like domain connected to the catalytic domain. This paper describes the overall structure and the detailed contacts between the two modules.
The production of biofuels using biomass is an alternative route to support the growing global demand for energy and to also reduce the environmental problems caused by the burning of fossil fuels. Cellulases are likely to play an important role in the degradation of biomass and the production of sugars for subsequent fermentation to fuel. Here, the crystal structure of an endoglucanase, Cel9A, from Alicyclobacillus acidocaldarius (Aa_Cel9A) is reported which displays a modular architecture composed of an N-terminal Ig-like domain connected to the catalytic domain. This paper describes the overall structure and the detailed contacts between the two modules. Analysis suggests that the interaction involving the residues Gln13 (from the Ig-like module) and Phe439 (from the catalytic module) is important in maintaining the correct conformation of the catalytic module required for protein activity. Moreover, the Aa_Cel9A structure shows three metal-binding sites that are associated with the thermostability and/or substrate affinity of the enzyme.
endoglucanases; thermoacidophiles; cellulases; biofuels
The PHENIX software for macromolecular structure determination is described.
Macromolecular X-ray crystallography is routinely applied to understand biological processes at a molecular level. However, significant time and effort are still required to solve and complete many of these structures because of the need for manual interpretation of complex numerical data using many software packages and the repeated use of interactive three-dimensional graphics. PHENIX has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on the automation of all procedures. This has relied on the development of algorithms that minimize or eliminate subjective input, the development of algorithms that automate procedures that are traditionally performed by hand and, finally, the development of a framework that allows a tight integration between the algorithms.
PHENIX; Python; macromolecular crystallography; algorithms
Conventional and free R factors and their difference, as well as the ratio of the number of measured reflections to the number of atoms in the crystal, were studied as functions of the resolution at which the structures were reported. When the resolution was taken uniformly on a logarithmic scale, the most frequent values of these functions were quasi-linear over a large resolution range.
Predictions of the possible model parameterization and of the values of model characteristics such as R factors are important for macromolecular refinement and validation protocols. One of the key parameters defining these and other values is the resolution of the experimentally measured diffraction data. The higher the resolution, the larger the number of diffraction data Nref, the larger its ratio to the number Nat of non-H atoms, the more parameters per atom can be used for modelling and the more precise and detailed a model can be obtained. The ratio Nref/Nat was calculated for models deposited in the Protein Data Bank as a function of the resolution at which the structures were reported. The most frequent values for this distribution depend essentially linearly on resolution when the latter is expressed on a uniform logarithmic scale. This defines simple analytic formulae for the typical Matthews coefficient and for the typically allowed number of parameters per atom for crystals diffracting to a given resolution. This simple dependence makes it possible in many cases to estimate the expected resolution of the experimental data for a crystal with a given Matthews coefficient. When expressed using the same logarithmic scale, the most frequent values for R and Rfree factors and for their difference are also essentially linear across a large resolution range. The minimal R-factor values are practically constant at resolutions better than 3 Å, below which they begin to grow sharply. This simple dependence on the resolution allows the prediction of expected R-factor values for unknown structures and may be used to guide model refinement and validation.
resolution; logarithmic scale; R factor; data-to-parameter ratio
Averaged kick maps are the sum of a series of individual kick maps, where each map is calculated from atomic coordinates modified by random shifts. These maps offer the possibility of an improved and less model-biased map interpretation.
Use of reliable density maps is crucial for rapid and successful crystal structure determination. Here, the averaged kick (AK) map approach is investigated, its application is generalized and it is compared with other map-calculation methods. AK maps are the sum of a series of kick maps, where each kick map is calculated from atomic coordinates modified by random shifts. As such, they are a numerical analogue of maximum-likelihood maps. AK maps can be unweighted or maximum-likelihood (σA) weighted. Analysis shows that they are comparable and correspond better to the final model than σA and simulated-annealing maps. The AK maps were challenged by a difficult structure-validation case, in which they were able to clarify the problematic region in the density without the need for model rebuilding. The conclusion is that AK maps can be useful throughout the entire progress of crystal structure determination, offering the possibility of improved map interpretation.
kick maps; OMIT maps; density-map calculation; model bias; maximum likelihood
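The construction of an averaged kick map can be illustrated with a one-dimensional toy. This is a sketch under strong assumptions and not the published implementation: the density is a sum of Gaussians on a periodic grid, each kick map combines fixed "observed" amplitudes with phases from a randomly shifted model, and all names (`model_density`, `kick_map`, `averaged_kick_map`) and parameter values are hypothetical. The mean is used instead of the sum, which only changes the overall scale.

```python
import numpy as np

def model_density(positions, n_grid=128, width=1.5):
    """Toy 1D density: one Gaussian per atom on a periodic grid."""
    grid = np.arange(n_grid)
    pos = np.asarray(positions, dtype=float)
    # Shortest periodic distance from every grid point to every atom.
    d = (grid[None, :] - pos[:, None] + n_grid / 2) % n_grid - n_grid / 2
    return np.exp(-0.5 * (d / width) ** 2).sum(axis=0)

def kick_map(f_obs_amplitudes, positions, max_kick, rng, n_grid=128):
    """One kick map: observed amplitudes with phases from a kicked model."""
    pos = np.asarray(positions, dtype=float)
    kicked = pos + rng.uniform(-max_kick, max_kick, size=len(pos))
    phases = np.angle(np.fft.fft(model_density(kicked, n_grid)))
    return np.fft.ifft(f_obs_amplitudes * np.exp(1j * phases)).real

def averaged_kick_map(f_obs_amplitudes, positions, max_kick=1.0,
                      n_maps=50, seed=0, n_grid=128):
    """Average a series of kick maps computed with independent random shifts."""
    rng = np.random.default_rng(seed)
    maps = [kick_map(f_obs_amplitudes, positions, max_kick, rng, n_grid)
            for _ in range(n_maps)]
    return np.mean(maps, axis=0)

# Toy usage: amplitudes from a "true" structure, phases from a perturbed model.
true_atoms = np.array([30.0, 60.0, 90.0])
f_obs = np.abs(np.fft.fft(model_density(true_atoms)))
start_model = true_atoms + np.array([0.8, -0.5, 0.3])
ak_map = averaged_kick_map(f_obs, start_model)
```

A real application would use three-dimensional structure factors and, optionally, σA weighting of each kick map.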
X-ray and neutron crystallographic data have been combined in a joint structure-refinement procedure that has been developed using recent advances in modern computational methodologies, including cross-validated maximum-likelihood target functions with gradient-based optimization and simulated annealing.
X-ray and neutron crystallographic techniques provide complementary information on the structure and function of biological macromolecules. X-ray and neutron (XN) crystallographic data have been combined in a joint structure-refinement procedure that has been developed using recent advances in modern computational methodologies, including cross-validated maximum-likelihood target functions with gradient-based optimization and simulated annealing. The XN approach for complete (including hydrogen) macromolecular structure analysis provides more accurate and complete structures, as demonstrated for diisopropyl fluorophosphatase, photoactive yellow protein and human aldose reductase. Furthermore, this method has several practical advantages, including the easier determination of the orientation of water molecules, hydroxyl groups and some amino-acid side chains.
joint X-ray and neutron crystallography; structure refinement
Ten measures of experimental electron-density-map quality are examined and the skewness of electron density is found to be the best indicator of actual map quality. A Bayesian approach to estimating map quality is developed and used in the PHENIX AutoSol wizard to make decisions during automated structure solution.
Estimates of the quality of experimental maps are important in many stages of structure determination of macromolecules. Map quality is defined here as the correlation between a map and the corresponding map obtained using phases from the final refined model. Here, ten different measures of experimental map quality were examined using a set of 1359 maps calculated by re-analysis of 246 solved MAD, SAD and MIR data sets. A simple Bayesian approach to estimation of map quality from one or more measures is presented. It was found that a Bayesian estimator based on the skewness of the density values in an electron-density map is the most accurate of the ten individual Bayesian estimators of map quality examined, with a correlation between estimated and actual map quality of 0.90. A combination of the skewness of electron density with the local correlation of r.m.s. density gives a further improvement in estimating map quality, with an overall correlation coefficient of 0.92. The PHENIX AutoSol wizard carries out automated structure solution based on any combination of SAD, MAD, SIR or MIR data sets. The wizard is based on tools from the PHENIX package and uses the Bayesian estimates of map quality described here to choose the highest quality solutions after experimental phasing.
structure solution; scoring; Protein Data Bank; phasing; decision-making; PHENIX; experimental electron-density maps
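The skewness of the density values, identified above as the most informative single measure, can be computed as the third central moment divided by the cube of the standard deviation. The following is a minimal sketch using NumPy on synthetic grids; the function name `density_skewness` and the toy maps are assumptions for illustration and do not reproduce the Bayesian estimator used in AutoSol.

```python
import numpy as np

def density_skewness(density):
    """Return <(rho - <rho>)^3> / sigma^3 for the values of a density grid."""
    rho = np.asarray(density, dtype=float).ravel()
    centered = rho - rho.mean()
    sigma = centered.std()
    if sigma == 0.0:
        return 0.0  # a perfectly flat map has no skew
    return float(np.mean(centered ** 3) / sigma ** 3)

# Toy comparison: Gaussian noise versus a flat background with sparse
# positive peaks (hypothetical stand-ins for a poor and a good map).
rng = np.random.default_rng(0)
noise_map = rng.normal(size=(16, 16, 16))
peaked_map = 0.1 * rng.normal(size=(16, 16, 16))
peaked_map[::4, ::4, ::4] += 5.0  # atom-like peaks on a coarse lattice

skew_noise = density_skewness(noise_map)
skew_peaked = density_skewness(peaked_map)
```

A map dominated by a few strong positive peaks has a clearly positive skewness, whereas a noise-like map has a skewness near zero, which is why the measure is useful as an indicator of map quality.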
The representation of crystallographic model characteristics in the form of a polygon allows the quick comparison of a model with a set of previously solved structures.
A crystallographic macromolecular model is typically characterized by a list of quality criteria, such as R factors, deviations from ideal stereochemistry and average B factors, which are usually provided as tables in publications or in structural databases. In order to facilitate a quick model-quality evaluation, a graphical representation is proposed. Each key parameter such as R factor or bond-length deviation from ‘ideal values’ is shown graphically as a point on a ‘ruler’. These rulers are plotted as a set of lines with the same origin, forming a hub and spokes. Different parts of the rulers are coloured differently to reflect the frequency (red for a low frequency, blue for a high frequency) with which the corresponding values are observed in a reference set of structures determined previously. The points for a given model marked on these lines are connected to form a polygon. A polygon that is strongly compressed or dilated along some axes reveals unusually low or high values of the corresponding characteristics. Polygon vertices in ‘red zones’ indicate parameters which lie outside typical values.
model quality; PDB; validation; refinement; PHENIX
An OMIT procedure is presented that has the benefits of iterative model building, density modification and refinement yet is essentially unbiased by the atomic model that is built.
A procedure for carrying out iterative model building, density modification and refinement is presented in which the density in an OMIT region is essentially unbiased by an atomic model. Density from a set of overlapping OMIT regions can be combined to create a composite ‘iterative-build’ OMIT map that is everywhere unbiased by an atomic model but also everywhere benefiting from the model-based information present elsewhere in the unit cell. The procedure may have applications in the validation of specific features in atomic models as well as in overall model validation. The procedure is demonstrated with a molecular-replacement structure and with an experimentally phased structure and a variation on the method is demonstrated by removing model bias from a structure from the Protein Data Bank.
model building; model validation; macromolecular models; Protein Data Bank; refinement; OMIT maps; bias; structure refinement; PHENIX
The highly automated PHENIX AutoBuild wizard is described. The procedure can be applied equally well to phases derived from isomorphous/anomalous and molecular-replacement methods.
The PHENIX AutoBuild wizard is a highly automated tool for iterative model building, structure refinement and density modification using RESOLVE model building, RESOLVE statistical density modification and phenix.refine structure refinement. Recent advances in the AutoBuild wizard and phenix.refine include automated detection and application of NCS from models as they are built, extensive model-completion algorithms and automated solvent-molecule picking. Model-completion algorithms in the AutoBuild wizard include loop building, crossovers between chains in different models of a structure and side-chain optimization. The AutoBuild wizard has been applied to a set of 48 structures at resolutions ranging from 1.1 to 3.2 Å, resulting in a mean R factor of 0.24 and a mean free R factor of 0.29. The R factor of the final model is dependent on the quality of the starting electron density and is relatively independent of resolution.
model building; model completion; macromolecular models; Protein Data Bank; structure refinement; PHENIX
The presence of pseudosymmetry can cause problems in structure determination and refinement. The relevant background and representative examples are presented.
It is not uncommon for proteins to crystallize with more than a single molecule per asymmetric unit. When more than a single molecule is present in the asymmetric unit, various pathological situations such as twinning, modulated crystals and pseudo translational or rotational symmetry can arise. The presence of pseudosymmetry can lead to uncertainties about the correct space group, especially in the presence of twinning. The background to certain common pathologies is presented and a new notation for space groups in unusual settings is introduced. The main concepts are illustrated with several examples from the literature and the Protein Data Bank.
pathology; twinning; pseudosymmetry
Modelling deformation electron density using interatomic scatterers is simpler than multipolar methods, produces comparable results at subatomic resolution and can easily be applied to macromolecules.
A study of the accurate electron-density distribution in molecular crystals at subatomic resolution (better than ∼1.0 Å) requires more detailed models than those based on independent spherical atoms. A tool that is conventionally used in small-molecule crystallography is the multipolar model. Even at upper resolution limits of 0.8–1.0 Å, the number of experimental data is insufficient for full multipolar model refinement. As an alternative, a simpler model composed of conventional independent spherical atoms augmented by additional scatterers to model bonding effects has been proposed. Refinement of these mixed models for several benchmark data sets gave results that were comparable in quality with the results of multipolar refinement and superior to those for conventional models. Applications to several data sets of both small molecules and macromolecules are shown. These refinements were performed using the general-purpose macromolecular refinement module phenix.refine of the PHENIX package.
structure refinement; subatomic resolution; deformation density; interatomic scatterers; PHENIX
Heterogeneity in ensembles generated by independent model rebuilding principally reflects the limitations of the data and of the model-building process rather than the diversity of structures in the crystal.
Automation of iterative model building, density modification and refinement in macromolecular crystallography has made it feasible to carry out this entire process multiple times. By using different random seeds in the process, a number of different models compatible with experimental data can be created. Sets of models were generated in this way using real data for ten protein structures from the Protein Data Bank and using synthetic data generated at various resolutions. Most of the heterogeneity among models produced in this way is in the side chains and loops on the protein surface. Possible interpretations of the variation among models created by repetitive rebuilding were investigated. Synthetic data were created in which a crystal structure was modelled as the average of a set of ‘perfect’ structures and the range of models obtained by rebuilding a single starting model was examined. The standard deviations of coordinates in models obtained by repetitive rebuilding at high resolution are small, while those obtained for the same synthetic crystal structure at low resolution are large, so that the diversity within a group of models cannot generally be a quantitative reflection of the actual structures in a crystal. Instead, the group of structures obtained by repetitive rebuilding reflects the precision of the models, and the standard deviation of coordinates of these structures is a lower bound estimate of the uncertainty in coordinates of the individual models.
model building; model completion; coordinate errors; models; Protein Data Bank; convergence; reproducibility; heterogeneity; precision; accuracy
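The spread of coordinates within a set of independently rebuilt models can be quantified per atom as the root-mean-square deviation from the mean position. The sketch below assumes that the models are already superimposed and share the same atom ordering; the function name `coordinate_spread` and the array layout are illustrative assumptions.

```python
import numpy as np

def coordinate_spread(models):
    """Per-atom r.m.s. deviation from the mean position across an ensemble.

    models: array of shape (n_models, n_atoms, 3) holding superimposed
    coordinates with identical atom ordering (an assumption of this sketch).
    """
    xyz = np.asarray(models, dtype=float)
    mean_xyz = xyz.mean(axis=0)                      # (n_atoms, 3)
    sq_dev = ((xyz - mean_xyz) ** 2).sum(axis=2)     # (n_models, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))              # (n_atoms,)

# Two toy models: the first atom differs by 2 Å along x, the second is identical.
ensemble = [
    [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]],
    [[2.0, 0.0, 0.0], [5.0, 5.0, 5.0]],
]
spread = coordinate_spread(ensemble)
```

Following the conclusion above, such values are best read as a lower-bound estimate of coordinate uncertainty rather than as a measure of structural diversity in the crystal.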
An automated ligand-fitting procedure is applied to (Fo − Fc)exp(iϕc) difference density for 200 commonly found ligands from macromolecular structures in the Protein Data Bank to identify ligands from density maps.
A procedure for the identification of ligands bound in crystal structures of macromolecules is described. Two characteristics of the density corresponding to a ligand are used in the identification procedure. One is the correlation of the ligand density with each of a set of test ligands after optimization of the fit of that ligand to the density. The other is the correlation of a fingerprint of the density with the fingerprint of model density for each possible ligand. The fingerprints consist of an ordered list of correlations of each of the test ligands with the density. The two characteristics are scored using a Z-score approach in which the correlations are normalized to the mean and standard deviation of correlations found for a variety of mismatched ligand-density pairs, so that the Z scores are related to the probability of observing a particular value of the correlation by chance. The procedure was tested with a set of 200 of the most commonly found ligands in the Protein Data Bank, collectively representing 57% of all ligands in the Protein Data Bank. Using a combination of these two characteristics of ligand density, ranked lists of ligand identifications were made for representative (Fo − Fc)exp(iϕc) difference density from entries in the Protein Data Bank. In 48% of the 200 cases, the correct ligand was at the top of the ranked list of ligands. This approach may be useful in identification of unknown ligands in new macromolecular structures as well as in the identification of which ligands in a mixture have bound to a macromolecule.
model building; model completion; shape analysis
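The Z-score normalization described above can be sketched as follows. Only the normalization of a single correlation against mismatched ligand-density pairs is shown; how the two characteristics are combined is not specified in the abstract and is not attempted here. The function names, the ligand codes and the numerical values are illustrative assumptions.

```python
import numpy as np

def z_score(correlation, mismatched_correlations):
    """Normalize a correlation to the mean and s.d. of mismatched pairs."""
    mismatched = np.asarray(mismatched_correlations, dtype=float)
    mean = mismatched.mean()
    sd = mismatched.std(ddof=1)  # sample standard deviation
    return float((correlation - mean) / sd)

def rank_ligands(fit_correlations, mismatched_correlations):
    """Rank candidate ligands by Z score, highest first.

    fit_correlations: dict mapping ligand name -> correlation with the density.
    """
    scores = {name: z_score(cc, mismatched_correlations)
              for name, cc in fit_correlations.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical numbers, for illustration only.
mismatched = [0.3, 0.4, 0.5]
ranking = rank_ligands({"ATP": 0.8, "GOL": 0.5}, mismatched)
```

Normalizing in this way puts correlations obtained for ligands of different sizes on a common scale, so that a high score reflects a fit that is unlikely to arise by chance.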
An automated ligand-fitting procedure has been developed and tested on 9327 ligands and (Fo − Fc)exp(iϕc) difference density from macromolecular structures in the Protein Data Bank.
A procedure for fitting of ligands to electron-density maps by first fitting a core fragment of the ligand to density and then extending the remainder of the ligand into density is presented. The approach was tested by fitting 9327 ligands over a wide range of resolutions (most are in the range 0.8–4.8 Å) from the Protein Data Bank (PDB) into (Fo − Fc)exp(iϕc) difference density calculated using entries from the PDB without these ligands. The procedure was able to place 58% of these 9327 ligands within 2 Å (r.m.s.d.) of the coordinates of the atoms in the original PDB entry for that ligand. The success of the fitting procedure was relatively insensitive to the size of the ligand in the range 10–100 non-H atoms and was only moderately sensitive to resolution, with the percentage of ligands placed near the coordinates of the original PDB entry for fits in the range 58–73% over all resolution ranges tested.
model building; model completion; shape analysis
A robust method for determining bulk-solvent and anisotropic scaling parameters for macromolecular refinement is described. A maximum-likelihood target function for determination of flat bulk-solvent model parameters and overall anisotropic scale factor is also proposed.
A reliable method for the determination of bulk-solvent model parameters and an overall anisotropic scale factor is of increasing importance as structure determination becomes more automated. Current protocols require the manual inspection of refinement results in order to detect errors in the calculation of these parameters. Here, a robust method for determining bulk-solvent and anisotropic scaling parameters in macromolecular refinement is described. The implementation of a maximum-likelihood target function for determining the same parameters is also discussed. The formulas and corresponding derivatives of the likelihood function with respect to the solvent parameters and the components of anisotropic scale matrix are presented. These algorithms are implemented in the CCTBX bulk-solvent correction and scaling module.
bulk-solvent correction; anisotropic scaling