A method is presented for the automatic building of nucleotide chains into electron density which is fast enough to be used in interactive model-building software. Likely nucleotides lying in the vicinity of the current view are located and then grown into connected chains in a fraction of a second. When this development is combined with existing tools, assisted manual model building is as simple as or simpler than for proteins.
The crystallographic structure solution of nucleotides and nucleotide complexes is now commonplace. The resulting electron-density maps are often poorer than for proteins, and as a result interpretation in terms of an atomic model can require significant effort, particularly in the case of large structures. While model building can be performed automatically, as with proteins, the process is time-consuming, taking minutes to days depending on the software and the size of the structure. A method is presented for the automatic building of nucleotide chains into electron density which is fast enough to be used in interactive model-building software, with extended chain fragments built around the current view position in a fraction of a second. The speed of the method arises from the determination of the ‘fingerprint’ of the sugar and phosphate groups in terms of conserved high-density and low-density features, coupled with a highly efficient scoring algorithm. Use cases include the rapid evaluation of an initial electron-density map, addition of nucleotide fragments to prebuilt protein structures, and in favourable cases the completion of the structure while automated model-building software is still running. The method has been incorporated into the Coot software package.
nucleic acid chain tracing; Coot
Algorithms and geometrical properties are described for the automated building of nucleic acids in experimental electron density.
Medium- to high-resolution X-ray structures of DNA and RNA molecules were investigated to find geometric properties useful for automated model building in crystallographic electron-density maps. We describe a simple method, starting from a list of electron-density ‘blobs’, for identifying backbone phosphates and nucleic acid bases based on properties of the local electron-density distribution. This knowledge should be useful for the automated building of nucleic acid models into electron-density maps. We show that the distances and angles involving C1′ and the P atoms, using the pseudo-torsion angles η′ and θ′ that describe the …P—C1′—P—C1′… chain, provide a promising basis for building the nucleic acid polymer. These quantities show reasonably narrow distributions with asymmetry that should allow the direction of the phosphate backbone to be established.
nucleic acids; autobuilding; geometric properties; electron-density distribution
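The pseudo-torsion angles along the P, C1′, P, C1′ chain mentioned above are ordinary dihedral angles computed over four consecutive backbone points. The following is a minimal sketch of that calculation, not the authors' code; the function names `dihedral` and `pseudo_torsions` and the input layout (a list of alternating P and C1′ coordinates) are assumptions made for illustration.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / b2_len
    return math.degrees(math.atan2(y, x))

def pseudo_torsions(chain):
    """Torsions over every four consecutive points of a hypothetical
    list of alternating P and C1' coordinates."""
    return [dihedral(*chain[i:i + 4]) for i in range(len(chain) - 3)]
```

Because the angle is signed, traversing the same chain in the opposite direction generally gives a different set of values, which is the kind of asymmetry that could help establish backbone direction.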
A novel method that uses the conformational distribution of Cα atoms in known structures is used to build short missing regions (‘loops’) in protein models. An initial tree of possible loop paths is pruned according to structural and electron-density criteria and the most likely loop conformation(s) are selected and built.
One of the most cumbersome and time-demanding tasks in completing a protein model is building short missing regions or ‘loops’. A method is presented that uses structural and electron-density information to build the most likely conformations of such loops. Using the distribution of angles and dihedral angles in pentapeptides as the driving parameters, a set of possible conformations for the Cα backbone of loops was generated. The most likely candidate is then selected in a hierarchical manner: new and stronger restraints are added while the loop is built. The weight of the electron-density correlation relative to geometrical considerations is gradually increased until the most likely loop is selected on map correlation alone. To conclude, the loop is refined against the electron density in real space. This is started by using structural information to trace a set of models for the Cα backbone of the loop. Only in later steps of the algorithm is the electron-density correlation used as a criterion to select the loop(s). Thus, this method is more robust in low-density regions than an approach using density as a primary criterion. The algorithm is implemented in a loop-building program, Loopy, which can be used either alone or as part of an automatic building cycle. Loopy can build loops of up to 14 residues in length within a couple of minutes. The average root-mean-square deviation of the Cα atoms in the loops built during validation was less than 0.4 Å. When implemented in the context of automated model building in ARP/wARP, Loopy can increase the completeness of the built models.
model building; loop modelling; Loopy
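The hierarchical selection described above, in which the weight of the density correlation rises until the final choice rests on map correlation alone, can be illustrated with a toy ranking loop. This is a sketch under stated assumptions rather than the Loopy implementation: the tuple layout, the weight schedule and the `keep` fraction are all hypothetical.

```python
def select_loop(candidates, weights=(0.0, 0.5, 1.0), keep=0.5):
    """Prune candidate loops in rounds.

    candidates: list of (name, geometry_score, density_correlation),
    where higher is better for both scores (hypothetical layout).
    weights: density weight per round; the last round uses 1.0, so the
    final ranking depends on density correlation alone.
    keep: fraction of the pool retained after each round (assumed).
    """
    pool = list(candidates)
    for w in weights:
        pool.sort(key=lambda c: (1 - w) * c[1] + w * c[2], reverse=True)
        pool = pool[:max(1, int(len(pool) * keep))]
    return pool[0]

loops = [("a", 0.9, 0.2), ("b", 0.8, 0.7), ("c", 0.1, 0.95), ("d", 0.7, 0.6)]
best = select_loop(loops)
```

In this example candidate `c` has the highest density correlation but is discarded in the first round on geometry, which mirrors the point that density is not the primary criterion early on.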
Noncrystallographic symmetry is automatically detected and used to achieve higher completeness and greater accuracy of automatically built protein structures at resolutions of 2.3 Å or poorer.
A novel method is presented for the automatic detection of noncrystallographic symmetry (NCS) in macromolecular crystal structure determination which does not require the derivation of molecular masks or the segmentation of density. It was found that throughout structure determination the NCS-related parts may be differently pronounced in the electron density. This often results in the modelling of molecular fragments of variable length and accuracy, especially during automated model-building procedures. These fragments were used to identify NCS relations in order to aid automated model building and refinement. In a number of test cases higher completeness and greater accuracy of the obtained structures were achieved, specifically at a crystallographic resolution of 2.3 Å or poorer. In the best case, the method allowed the building of up to 15% more residues automatically and a tripling of the average length of the built fragments.
noncrystallographic symmetry; automated model building
Automatic modeling methods using cryo-electron microscopy (cryoEM) density maps as constraints are promising approaches to building atomic models of individual proteins or protein domains. However, their application to large macromolecular assemblies has not been possible largely due to computational limitations inherent to such unsupervised methods. Here we describe a new method, EM-IMO, for building, modifying and refining local structures of protein models using cryoEM maps as a constraint. As a supervised refinement method, EM-IMO allows users to specify parameters derived from inspections, so as to guide, and as a consequence, significantly speed up the refinement. An EM-IMO-based refinement protocol is first benchmarked on a data set of 50 homology models using simulated density maps. A multi-scale refinement strategy that combines EM-IMO-based and molecular dynamics (MD)-based refinement is then applied to build backbone models for the seven conformers of the five capsid proteins in our near-atomic resolution cryoEM map of the grass carp reovirus (GCRV) virion, a member of the aquareovirus genus of the Reoviridae family. The refined models allow us to reconstruct a backbone model of the entire GCRV capsid and provide valuable functional insights that are described in the accompanying publication. Our study demonstrates that the integrated use of homology modeling and a multi-scale refinement protocol that combines supervised and automated structure refinement offers a practical strategy for building atomic models based on medium- to high-resolution cryoEM density maps.
cryo-electron microscopy; density fitting; homology modeling; structure refinement; protein structure prediction
The PDB_REDO pipeline aims to improve macromolecular structures by optimizing the crystallographic refinement parameters and performing partial model building. Here, algorithms are presented that allowed a web-server implementation of PDB_REDO, and the first user results are discussed.
The refinement and validation of a crystallographic structure model is the last step before the coordinates and the associated data are submitted to the Protein Data Bank (PDB). The success of the refinement procedure is typically assessed by validating the models against geometrical criteria and the diffraction data, and is an important step in ensuring the quality of the PDB public archive [Read et al. (2011), Structure, 19, 1395–1412]. The PDB_REDO procedure aims for ‘constructive validation’, aspiring to consistent and optimal refinement parameterization and pro-active model rebuilding, not only correcting errors but striving for optimal interpretation of the electron density. A web server for PDB_REDO has been implemented, allowing thorough, consistent and fully automated optimization of the refinement procedure in REFMAC and partial model rebuilding. The goal of the web server is to help practicing crystallographers to improve their model prior to submission to the PDB. For this, additional steps were implemented in the PDB_REDO pipeline, both in the refinement procedure, e.g. testing of resolution limits and k-fold cross-validation for small test sets, and as new validation criteria, e.g. the density-fit metrics implemented in EDSTATS and ligand validation as implemented in YASARA. Innovative ways to present the refinement and validation results to the user are also described, which together with auto-generated Coot scripts can guide users to subsequent model inspection and improvement. It is demonstrated that using the server can lead to substantial improvement of structure models before they are submitted to the PDB.
PDB_REDO; validation; model optimization
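The k-fold cross-validation mentioned above can be pictured as partitioning the reflections into k groups, each of which serves once as the test set while the rest form the work set. The sketch below shows only that partitioning step; it is not PDB_REDO code, and the function name `k_fold_sets`, the use of plain integer indices for reflections and the fixed random seed are assumptions.

```python
import random

def k_fold_sets(n_reflections, k, seed=0):
    """Yield (work, test) index lists, one pair per fold.

    Reflections are represented by hypothetical integer indices;
    every index appears in exactly one test set.
    """
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        work = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield work, test

for work, test in k_fold_sets(10, 5):
    print(len(work), len(test))
```

A free R value would then be computed for each test set after refinement against the matching work set, and the k values combined, which gives a more stable estimate than a single small test set.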
Circuitry mapping of metazoan neural systems is difficult because canonical neural regions (regions containing one or more copies of all components) are large, regional borders are uncertain, neuronal diversity is high, and potential network topologies so numerous that only anatomical ground truth can resolve them. Complete mapping of a specific network requires synaptic resolution, canonical region coverage, and robust neuronal classification. Though transmission electron microscopy (TEM) remains the optimal tool for network mapping, the process of building large serial section TEM (ssTEM) image volumes is rendered difficult by the need to precisely mosaic distorted image tiles and register distorted mosaics. Moreover, most molecular neuronal class markers are poorly compatible with optimal TEM imaging. Our objective was to build a complete framework for ultrastructural circuitry mapping. This framework combines strong TEM-compliant small molecule profiling with automated image tile mosaicking, automated slice-to-slice image registration, and gigabyte-scale image browsing for volume annotation. Specifically we show how ultrathin molecular profiling datasets and their resultant classification maps can be embedded into ssTEM datasets and how scripted acquisition tools (SerialEM), mosaicking and registration (ir-tools), and large slice viewers (MosaicBuilder, Viking) can be used to manage terabyte-scale volumes. These methods enable large-scale connectivity analyses of new and legacy data. In well-posed tasks (e.g., complete network mapping in retina), terabyte-scale image volumes that previously would require decades of assembly can now be completed in months. Perhaps more importantly, the fusion of molecular profiling, image acquisition by SerialEM, ir-tools volume assembly, and data viewers/annotators also allows ssTEM to be used as a prospective tool for discovery in nonneural systems and a practical screening methodology for neurogenetics.
Finally, this framework provides a mechanism for parallelization of ssTEM imaging, volume assembly, and data analysis across an international user base, enhancing the productivity of a large cohort of electron microscopists.
Building an accurate neural network diagram of the vertebrate nervous system is a major challenge in neuroscience. Diverse groups of neurons that function together form complex patterns of connections often spanning large regions of brain tissue, with uncertain borders. Although serial-section transmission electron microscopy remains the optimal tool for fine anatomical analyses, the time and cost of the undertaking has been prohibitive. We have assembled a complete framework for ultrastructural mapping using conventional transmission electron microscopy that tremendously accelerates image analysis. This framework combines small-molecule profiling to classify cells, automated image acquisition, automated mosaic formation, automated slice-to-slice image registration, and large-scale image browsing for volume annotation. Terabyte-scale image volumes requiring decades or more to assemble manually can now be automatically built in a few months. This makes serial-section transmission electron microscopy practical for high-resolution exploration of all complex tissue systems (neural or nonneural) as well as for ultrastructural screening of genetic models.
A framework for analysis of terabyte-scale serial-section transmission electron microscopic (ssTEM) datasets overcomes computational barriers and accelerates high-resolution tissue analysis, providing a practical way of mapping complex neural circuitry and an effective screening tool for neurogenetics.
A procedure for iterative model-building, statistical density modification and refinement at moderate resolution (up to about 2.8 Å) is described.
An iterative process for improving the completeness and quality of atomic models automatically built at moderate resolution (up to about 2.8 Å) is described. The process consists of cycles of model building interspersed with cycles of refinement and combining phase information from the model with experimental phase information (if any) using statistical density modification. The process can lead to substantial improvements in both the accuracy and completeness of the model compared with a single cycle of model building. For eight test cases solved by MAD or SAD at resolutions ranging from 2.0 to 2.8 Å, the fraction of models built and assigned to sequence was 46–91% (mean of 65%) after the first cycle of building and refinement, and 78–95% (mean of 87%) after 20 cycles. In an additional test case, an incorrect model of gene 5 protein (PDB code 2gn5; r.m.s.d. of main-chain atoms from the more recent refined structure 1vqb at 1.56 Å) was rebuilt using only structure-factor amplitude information at varying resolutions from 2.0 to 3.0 Å. Rebuilding was effective at resolutions up to about 2.5 Å. The resulting models had 60–80% of the residues built and an r.m.s.d. of main-chain atoms from the refined structure of 0.20 to 0.62 Å. The algorithm is useful for building preliminary models of macromolecules suitable for an experienced crystallographer to extend, correct and fully refine.
density modification; model building; refinement
The highly automated PHENIX AutoBuild wizard is described. The procedure can be applied equally well to phases derived from isomorphous/anomalous and molecular-replacement methods.
The PHENIX AutoBuild wizard is a highly automated tool for iterative model building, structure refinement and density modification using RESOLVE model building, RESOLVE statistical density modification and phenix.refine structure refinement. Recent advances in the AutoBuild wizard and phenix.refine include automated detection and application of NCS from models as they are built, extensive model-completion algorithms and automated solvent-molecule picking. Model-completion algorithms in the AutoBuild wizard include loop building, crossovers between chains in different models of a structure and side-chain optimization. The AutoBuild wizard has been applied to a set of 48 structures at resolutions ranging from 1.1 to 3.2 Å, resulting in a mean R factor of 0.24 and a mean free R factor of 0.29. The R factor of the final model is dependent on the quality of the starting electron density and is relatively independent of resolution.
model building; model completion; macromolecular models; Protein Data Bank; structure refinement; PHENIX
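The R factor and free R factor quoted above both follow the standard crystallographic definition, the sum of absolute differences between observed and calculated structure-factor amplitudes divided by the sum of observed amplitudes; the free R factor is the same quantity evaluated over test-set reflections excluded from refinement. A minimal sketch, assuming the calculated amplitudes have already been scaled to the observed ones and using a hypothetical helper name `r_factor`:

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes f_calc is already on the same scale as f_obs.
    """
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

# Illustrative amplitudes only, not data from the study.
r = r_factor([100.0, 50.0, 25.0], [90.0, 55.0, 25.0])
```

Applying the same function to the work-set and test-set reflections separately gives the R and free R values respectively.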
DEN refinement and automated model building with AutoBuild were used to determine the structure of a putative succinyl-diaminopimelate desuccinylase from C. glutamicum. This difficult case of molecular-replacement phasing shows that the synergism between DEN refinement and AutoBuild outperforms standard refinement protocols.
Phasing by molecular replacement remains difficult for targets that are far from the search model or in situations where the crystal diffracts only weakly or to low resolution. Here, the process of determining and refining the structure of Cgl1109, a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum, at ∼3 Å resolution is described using a combination of homology modeling with MODELLER, molecular-replacement phasing with Phaser, deformable elastic network (DEN) refinement and automated model building using AutoBuild in a semi-automated fashion, followed by final refinement cycles with phenix.refine and Coot. This difficult molecular-replacement case illustrates the power of including DEN restraints derived from a starting model to guide the movements of the model during refinement. The resulting improved model phases provide better starting points for automated model building and produce more significant difference peaks in anomalous difference Fourier maps to locate anomalous scatterers than does standard refinement. This example also illustrates a current limitation of automated procedures that require manual adjustment of local sequence misalignments between the homology model and the target sequence.
reciprocal-space refinement; DEN refinement; real-space refinement; automated model building; succinyl-diaminopimelate desuccinylase
A description is given of new tools to facilitate model building and refinement into electron cryo-microscopy reconstructions.
The recent rapid development of single-particle electron cryo-microscopy (cryo-EM) now allows structures to be solved by this method at resolutions close to 3 Å. Here, a number of tools to facilitate the interpretation of EM reconstructions with stereochemically reasonable all-atom models are described. The BALBES database has been repurposed as a tool for identifying protein folds from density maps. Modifications to Coot, including new Jiggle Fit and morphing tools and improved handling of nucleic acids, enhance its functionality for interpreting EM maps. REFMAC has been modified for optimal fitting of atomic models into EM maps. As external structural information can enhance the reliability of the derived atomic models, stabilize refinement and reduce overfitting, ProSMART has been extended to generate interatomic distance restraints from nucleic acid reference structures, and a new tool, LIBG, has been developed to generate nucleic acid base-pair and parallel-plane restraints. Furthermore, restraint generation has been integrated with visualization and editing in Coot, and these restraints have been applied to both real-space refinement in Coot and reciprocal-space refinement in REFMAC.
model building; refinement; electron cryo-microscopy reconstructions; LIBG
A method for automated macromolecular side-chain model building and for aligning the sequence to the map is described.
An algorithm is described for automated building of side chains in an electron-density map once a main-chain model is built and for alignment of the protein sequence to the map. The procedure is based on a comparison of electron density at the expected side-chain positions with electron-density templates. The templates are constructed from average amino-acid side-chain densities in 574 refined protein structures. For each contiguous segment of main chain, a matrix with entries corresponding to an estimate of the probability that each of the 20 amino acids is located at each position of the main-chain model is obtained. The probability that this segment corresponds to each possible alignment with the sequence of the protein is estimated using a Bayesian approach and high-confidence matches are kept. Once side-chain identities are determined, the most probable rotamer for each side chain is built into the model. The automated procedure has been implemented in the RESOLVE software. Combined with automated main-chain model building, the procedure produces a preliminary model suitable for refinement and extension by an experienced crystallographer.
model building; template matching
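The alignment step described above can be illustrated with a simplified Bayesian calculation: given, for each main-chain position, an estimated probability of each amino acid, the likelihood of every ungapped placement of the segment on the sequence is the product of the matching entries, and normalizing over placements (a uniform prior) gives a posterior. This is a sketch of the idea only, not the RESOLVE algorithm; the function name, the dictionary layout and the uniform prior are assumptions.

```python
def alignment_posteriors(prob_matrix, sequence):
    """Posterior probability of each ungapped offset of a segment.

    prob_matrix: one dict per main-chain position mapping one-letter
    amino-acid codes to estimated probabilities (hypothetical layout).
    sequence: the protein sequence as a string.
    """
    n = len(prob_matrix)
    likelihoods = []
    for offset in range(len(sequence) - n + 1):
        likelihood = 1.0
        for i, probs in enumerate(prob_matrix):
            likelihood *= probs.get(sequence[offset + i], 0.0)
        likelihoods.append(likelihood)
    total = sum(likelihoods)
    if total == 0.0:
        return [0.0] * len(likelihoods)
    return [value / total for value in likelihoods]

segment = [{"G": 0.8, "A": 0.2}, {"A": 0.9, "G": 0.1}]
posteriors = alignment_posteriors(segment, "AGAG")
```

Only offsets whose posterior exceeds a chosen confidence threshold would be kept; here the middle offset, where the segment reads G then A, dominates.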
A method for automated macromolecular main-chain model building is described.
An algorithm for the automated macromolecular model building of polypeptide backbones is described. The procedure is hierarchical. In the initial stages, many overlapping polypeptide fragments are built. In subsequent stages, the fragments are extended and then connected. Identification of the locations of helical and β-strand regions is carried out by FFT-based template matching. Fragment libraries of helices and β-strands from refined protein structures are then positioned at the potential locations of helices and strands and the longest segments that fit the electron-density map are chosen. The helices and strands are then extended using fragment libraries consisting of sequences three amino acids long derived from refined protein structures. The resulting segments of polypeptide chain are then connected by choosing those which overlap at two or more Cα positions. The fully automated procedure has been implemented in RESOLVE and is capable of model building at resolutions as low as 3.5 Å. The algorithm is useful for building a preliminary main-chain model that can serve as a basis for refinement and side-chain addition.
model building; template matching; fragment extension
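The connection step described above, joining segments that overlap at two or more Cα positions, can be sketched as a tail-to-head comparison of two coordinate lists. The code below is an illustration rather than the RESOLVE implementation; the function name `join_fragments` and the 1.0 Å coincidence tolerance are assumptions.

```python
import math

def join_fragments(a, b, min_overlap=2, tol=1.0):
    """Join fragment b onto the end of fragment a.

    a, b: lists of C-alpha coordinates as (x, y, z) tuples.
    The last k positions of a must each lie within tol (assumed, in
    angstroms) of the first k positions of b, with k >= min_overlap.
    Returns the merged list, or None if no such overlap exists.
    """
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if all(math.dist(p, q) <= tol for p, q in zip(a[-k:], b[:k])):
            return a + b[k:]
    return None

frag_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
frag_b = [(3.9, 0.0, 0.0), (7.5, 0.1, 0.0), (11.4, 0.0, 0.0)]
merged = join_fragments(frag_a, frag_b)
```

The search starts from the longest possible overlap so that the most strongly supported junction is preferred; the merged chain keeps the coordinates of the first fragment in the shared region.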
Coot is a molecular-graphics program designed to assist in the building of protein and other macromolecular models. The current state of development and available features are presented.
Coot is a molecular-graphics application for model building and validation of biological macromolecules. The program displays electron-density maps and atomic models and allows model manipulations such as idealization, real-space refinement, manual rotation/translation, rigid-body fitting, ligand search, solvation, mutations, rotamers and Ramachandran idealization. Furthermore, tools are provided for model validation as well as interfaces to external programs for refinement, validation and graphics. The software is designed to be easy to learn for novice users, which is achieved by ensuring that tools for common tasks are ‘discoverable’ through familiar user-interface elements (menus and toolbars) or by intuitive behaviour (mouse controls). Recent developments have focused on providing tools for expert users, with customisable key bindings, extensions and an extensive scripting interface. The software is under rapid development, but has already achieved very widespread use within the crystallographic community. The current state of the software is presented, with a description of the facilities available and of some of the underlying methods employed.
Coot; model building
Electron cryo-microscopy (cryo-EM) has played an increasingly important role in elucidating the structure and function of macromolecular assemblies in near native solution conditions. Typically, however, only non-atomic resolution reconstructions have been obtained for these large complexes, necessitating computational tools for integrating and extracting structural details. With recent advances in cryo-EM, maps at near-atomic resolutions have been achieved for several macromolecular assemblies from which models have been manually constructed. In this work, we describe a new interactive modeling toolkit called Gorgon targeted at intermediate to near-atomic resolution density maps (10–3.5 Å), particularly from cryo-EM. Gorgon's de novo modeling procedure couples sequence-based secondary structure prediction with feature detection and geometric modeling techniques to generate initial protein backbone models. Beyond model building, Gorgon is an extensible interactive visualization platform with a variety of computational tools for annotating a wide variety of 3D volumes. Examples from cryo-EM maps of Rotavirus and Rice Dwarf Virus are used to demonstrate its applicability to modeling protein structure.
cryo-EM; Gorgon; modeling; protein structure; near-atomic resolution
The optimization of WbdD crystals using a novel dehydration protocol and experimental phasing at 3.5 Å resolution by cross-crystal averaging followed by molecular replacement of electron density into a non-isomorphous 3.0 Å resolution native data set are reported.
WbdD is a bifunctional kinase/methyltransferase that is responsible for regulation of lipopolysaccharide O antigen polysaccharide chain length in Escherichia coli serotype O9a. Solving the crystal structure of this protein proved to be a challenge because the available crystals belonging to space group I23 only diffracted to low resolution (>95% of the crystals diffracted to resolution lower than 4 Å and most only to 8 Å) and were non-isomorphous, with changes in unit-cell dimensions of greater than 10%. Data from a serendipitously found single native crystal that diffracted to 3.0 Å resolution were non-isomorphous with a lower (3.5 Å) resolution selenomethionine data set. Here, a strategy for improving poor (3.5 Å resolution) initial phases by density modification and cross-crystal averaging with an additional 4.2 Å resolution data set to build a crude model of WbdD is described. Using this crude model as a mask to cut out the 3.5 Å resolution electron density yielded a successful molecular-replacement solution of the 3.0 Å resolution data set. The resulting map was used to build a complete model of WbdD. The hydration status of individual crystals appears to underpin the variable diffraction quality of WbdD crystals. After the initial structure had been solved, methods to control the hydration status of WbdD were developed and it was thus possible to routinely obtain high-resolution diffraction (to better than 2.5 Å resolution). This novel and facile crystal-dehydration protocol may be useful for similar challenging situations.
WbdD; crystal dehydration
Many Protein Data Bank (PDB) users assume that the deposited structural models are of high quality but forget that these models are derived from the interpretation of experimental data. The accuracy of atom coordinates is not homogeneous between models or throughout the same model. To avoid basing a research project on a flawed model, we present a tool for assessing the quality of ligands and binding sites in crystallographic models from the PDB.
The Validation HElper for LIgands and Binding Sites (VHELIBS) is software that aims to ease the validation of binding site and ligand coordinates for non-crystallographers (i.e., users with little or no crystallography knowledge). Using a convenient graphical user interface, it allows one to check how ligand and binding site coordinates fit to the electron density map. VHELIBS can use models from either the PDB or the PDB_REDO databank of re-refined and re-built crystallographic models. The user can specify threshold values for a series of properties related to the fit of coordinates to electron density (Real Space R, Real Space Correlation Coefficient and average occupancy are used by default). VHELIBS will automatically classify residues and ligands as Good, Dubious or Bad based on the specified limits. The user is also able to visually check the quality of the fit of residues and ligands to the electron density map and reclassify them if needed.
VHELIBS allows inexperienced users to examine the binding site and the ligand coordinates in relation to the experimental data. This is an important step to evaluate models for their fitness for drug discovery purposes such as structure-based pharmacophore development and protein-ligand docking experiments.
Electron density map; Binding site structure validation; Ligand structure validation; Protein structure validation; PDB; PDB_REDO
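The threshold-based classification described above can be sketched as a two-tier check on the three default properties (Real Space R, Real Space Correlation Coefficient and average occupancy). This is an illustration only and not VHELIBS code: the function name, the two-tier rule and every numeric limit below are assumptions chosen for the example, since the abstract states only that the user specifies the thresholds.

```python
def classify(rsr, rscc, occupancy,
             good=(0.2, 0.9, 1.0), dubious=(0.4, 0.8, 0.8)):
    """Label a residue or ligand as 'Good', 'Dubious' or 'Bad'.

    Each limit tuple is (max RSR, min RSCC, min average occupancy).
    All default values are hypothetical, not VHELIBS defaults.
    """
    def passes(limits):
        max_rsr, min_rscc, min_occ = limits
        return rsr <= max_rsr and rscc >= min_rscc and occupancy >= min_occ

    if passes(good):
        return "Good"
    if passes(dubious):
        return "Dubious"
    return "Bad"

label = classify(rsr=0.3, rscc=0.85, occupancy=0.9)
```

An automatic label of this kind is only a starting point; as the abstract notes, the user can inspect the fit visually and reclassify.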
We recently determined the crystal structure of the functional core of human U1 snRNP, consisting of nine proteins and one RNA, based on a 5.5 Å resolution electron density map. At 5–7 Å resolution, α helices and β sheets appear as rods and slabs, respectively, hence it is not possible to determine protein fold de novo. Using inverse beam geometry, accurate anomalous signals were obtained from weakly diffracting and radiation sensitive P1 crystals. We were able to locate anomalous scatterers with positional errors below 2 Å. This enabled us not only to place protein domains of known structure accurately into the map but also to trace an extended polypeptide chain, of previously undetermined structure, using selenomethionine derivatives of single methionine mutants spaced along the sequence. This method of Se-Met scanning, in combination with structure prediction, is a powerful tool for building a protein of unknown fold into a low resolution electron density map.
Although accurate details in RNA structure are of great importance for understanding RNA function, the backbone conformation is difficult to determine, and most existing RNA structures show serious steric clashes (≥0.4 Å overlap) when hydrogen atoms are taken into account. We have developed a program called RNABC (RNA Backbone Correction) that performs local perturbations to search for alternative conformations that avoid those steric clashes or other local geometry problems. Its input is an all-atom coordinate file for an RNA crystal structure (usually from the MolProbity web service), with problem areas specified. RNABC rebuilds a suite (the unit from sugar to sugar) by anchoring the phosphorus and base positions, which are clearest in crystallographic electron density, and reconstructing the other atoms using forward kinematics. Geometric parameters are constrained within user-specified tolerance of canonical or original values, and torsion angles are constrained to ranges defined through empirical database analyses. Several optimizations reduce the time required to search the many possible conformations. The output results are clustered and presented to the user, who can choose whether to accept one of the alternative conformations.
Two test evaluations show the effectiveness of RNABC, first on the S-motifs from 42 RNA structures, and second on the worst problem suites (clusters of bad clashes, or serious sugar pucker outliers) in 25 unrelated RNA structures. Among the 101 S-motifs, 88 had diagnosed problems, and RNABC produced clash-free conformations with acceptable geometry for 71 of those (about 80%). For the 154 worst problem suites, RNABC proposed alternative conformations for 72. All but 8 of those were judged acceptable after examining electron density (where available) and local conformation. Thus, even for these worst cases, nearly half the time RNABC suggested corrections suitable to initiate further crystallographic refinement. The program is available from http://kinemage.biochem.duke.edu.
kinematic chain; RNA backbone conformation; RNA backbone adjustment; RNA crystallography; automated rebuilding; steric clash; S-motifs; all-atom contacts; structure validation
The practical limits of molecular replacement can be extended by using several specifically designed protein models based on fold-recognition methods and by exhaustive searches performed in a parallelized pipeline. Updated results from the JCSG MR pipeline, which to date has solved 33 molecular-replacement structures with less than 35% sequence identity to the closest homologue of known structure, are presented.
The success rate of molecular replacement (MR) falls considerably when search models share less than 35% sequence identity with their templates, but can be improved significantly by using fold-recognition methods combined with exhaustive MR searches. Models based on alignments calculated with fold-recognition algorithms are more accurate than models based on conventional alignment methods such as FASTA or BLAST, which are still widely used for MR. In addition, by designing MR pipelines that integrate phasing and automated refinement and allow parallel processing of such calculations, one can effectively increase the success rate of MR. Here, updated results are presented from the JCSG MR pipeline, which to date has solved 33 MR structures with less than 35% sequence identity to the closest homologue of known structure. By using difficult MR problems as examples, it is demonstrated that successful MR phasing is possible even in cases where the similarity between the model and the template can only be detected with fold-recognition algorithms. In the first step, several search models are built based on all homologues found in the PDB by fold-recognition algorithms. The models resulting from this process are used in parallel MR searches with different combinations of input parameters of the MR phasing algorithm. The putative solutions are subjected to rigid-body and restrained crystallographic refinement and ranked based on the final values of free R factor, figure of merit and deviations from ideal geometry. Finally, crystal packing and electron-density maps are checked to identify the correct solution. If this procedure does not yield a solution with interpretable electron-density maps, further alternative models are prepared. The structurally variable regions of a protein family are identified based on alignments of sequences and known structures from that family and appropriate trimmings of the models are proposed. All combinations of these trimmings are applied to the search models and the resulting set of models is used in the MR pipeline. It is estimated that with the improvements in model building and exhaustive parallel searches with existing phasing algorithms, MR can be successful for more than 50% of recognizable homologues of known structures below the threshold of 35% sequence identity. This implies that about one-third of the proteins in a typical bacterial proteome are potential MR targets.
molecular replacement; sequence-alignment accuracy; homology modeling; parameter-space screening; structural genomics
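The exhaustive search described in this abstract (every search model, with every combination of trimmings, under every combination of MR input parameters, followed by ranking of the refined solutions) can be illustrated with a short Python sketch. This is not the JCSG pipeline code: the function names, the `RefinedSolution` fields and the parameter names are hypothetical, and the lexicographic ranking order (free R factor first, then figure of merit, then geometry) is an assumption, since the abstract does not state how the three criteria are combined.

```python
from dataclasses import dataclass
from itertools import chain, combinations, product


def trimming_combinations(trimmings):
    """Return every subset of the proposed trimmings, including the empty one."""
    items = list(trimmings)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )


def enumerate_jobs(models, trimmings, parameter_grid):
    """Cross every model with every trimming subset and every parameter setting.

    parameter_grid maps a (hypothetical) MR parameter name to the list of
    values to screen; each job is a (model, trimmings, parameters) tuple that
    could be dispatched to a parallel worker.
    """
    names = sorted(parameter_grid)
    settings = [
        dict(zip(names, values))
        for values in product(*(parameter_grid[name] for name in names))
    ]
    return [
        (model, trims, params)
        for model in models
        for trims in trimming_combinations(trimmings)
        for params in settings
    ]


@dataclass
class RefinedSolution:
    label: str
    r_free: float       # free R factor after refinement, lower is better
    fom: float          # figure of merit, higher is better
    rmsd_bonds: float   # deviation from ideal geometry, lower is better


def rank_solutions(solutions):
    """Assumed ordering: free R factor, then figure of merit, then geometry."""
    return sorted(solutions, key=lambda s: (s.r_free, -s.fom, s.rmsd_bonds))
```

With two models, two proposed trimmings (four subsets) and a 2 x 2 parameter grid, `enumerate_jobs` produces 32 independent jobs, which shows how quickly the search space grows and why parallel execution matters. The top-ranked solutions would still need the manual check of crystal packing and electron-density maps that the abstract describes.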
Details of the RNA polymerase I crystal structure determination provide a framework for solution of the structures of other multi-subunit complexes. Simple crystallographic experiments are described to extract relevant biological information such as the location of the enzyme active site.
Knowing the structure of multi-subunit complexes is critical to understanding basic cellular functions. However, when crystals of these complexes can be obtained, they rarely diffract beyond 3 Å resolution, which complicates X-ray structure determination and refinement. The crystal structure of RNA polymerase I, an essential cellular machine that synthesizes the precursor of ribosomal RNA in the nucleolus of eukaryotic cells, has recently been solved. Here, the crucial steps that were undertaken to build the atomic model of this multi-subunit enzyme are reported, emphasizing how simple crystallographic experiments can be used to extract relevant biological information. In particular, this report discusses the combination of poor molecular-replacement and experimental phases, the application of multi-crystal averaging and the use of anomalous scatterers as sequence markers to guide tracing and to locate the active site. The methods outlined here will likely serve as a reference for future structure determination of large complexes at low resolution.
low-resolution structure determination; multi-subunit complexes; transcription; RNA polymerase I
Three-dimensional RNA models fitted into crystallographic density maps exhibit pervasive conformational ambiguities, geometric errors, and steric clashes. To address these problems, we present Enumerative Real-space Refinement ASsisted by Electron density under Rosetta (ERRASER), coupled to PHENIX (Python-based Hierarchical Environment for Integrated Xtallography) diffraction-based refinement. On 24 datasets, ERRASER automatically corrects the majority of MolProbity-assessed errors, improves the average Rfree factor, resolves functionally important discrepancies in non-canonical structure, and refines low-resolution models to better match higher-resolution models.
The solvent-picking procedure in phenix.refine has been extended and combined with Phaser anomalous substructure completion and analysis of coordination geometry to identify and place elemental ions.
Many macromolecular model-building and refinement programs can automatically place solvent atoms in electron density at moderate-to-high resolution. This process frequently builds water molecules in place of elemental ions, the identification of which must be performed manually. The solvent-picking algorithms in phenix.refine have been extended to build common ions based on an analysis of the chemical environment as well as physical properties such as occupancy, B factor and anomalous scattering. The method is most effective for heavier elements such as calcium and zinc, for which a majority of sites can be placed with few false positives in a diverse test set of structures. At atomic resolution, it can also be possible to identify tightly bound sodium and magnesium ions. A number of challenges that contribute to the difficulty of completely automating the process of structure completion are discussed.
refinement; ions; PHENIX
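The idea of combining the chemical environment with physical properties to decide whether a modelled water is more likely to be an ion can be sketched as follows. This is an illustration only and not the phenix.refine algorithm: the `Atom` class, the `looks_like_ion` function, the two-out-of-three voting rule and all numerical thresholds are hypothetical placeholders chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    element: str
    xyz: tuple            # Cartesian coordinates in angstroms
    b_factor: float = 20.0


def neighbours(site, atoms, cutoff):
    """Atoms other than the site itself that lie within cutoff angstroms."""
    return [
        a for a in atoms
        if a is not site and math.dist(site.xyz, a.xyz) <= cutoff
    ]


def looks_like_ion(site, atoms, anomalous_peak, *, short_contact=2.6,
                   min_ligands=4, b_ratio=0.6, min_anomalous=3.0):
    """Flag a solvent site as a candidate ion (all thresholds are assumptions).

    Three independent pieces of evidence are collected:
      coordination  several O/N/S atoms at contacts shorter than is usual
                    for a hydrogen-bonded water molecule
      low_b_factor  a B factor well below that of the surrounding atoms,
                    as expected when a heavier atom is refined as water
      anomalous     a peak (in sigma) in an anomalous difference map
    The site is flagged when at least two of the three are present.
    """
    ligands = [
        a for a in neighbours(site, atoms, short_contact)
        if a.element in {"O", "N", "S"}
    ]
    shell = neighbours(site, atoms, 4.0)
    mean_b = (
        sum(a.b_factor for a in shell) / len(shell) if shell else site.b_factor
    )
    evidence = {
        "coordination": len(ligands) >= min_ligands,
        "low_b_factor": site.b_factor < b_ratio * mean_b,
        "anomalous": anomalous_peak >= min_anomalous,
    }
    return sum(evidence.values()) >= 2, evidence
```

Returning the individual pieces of evidence alongside the decision keeps the result inspectable, which matters because, as the abstract notes, lighter ions such as sodium and magnesium give little anomalous signal and are much harder to distinguish from water.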
Interpretation of low-resolution X-ray crystallographic data can prove to be a difficult task. The challenges faced in electron-density interpretation, the strategies that have been employed to overcome them and developments to automate the process are reviewed.
The interpretation of low-resolution X-ray crystallographic data proves to be challenging even for the most experienced crystallographer. Ambiguity in the electron-density map makes main-chain tracing and side-chain assignment difficult. However, the number of structures solved at resolutions poorer than 3.5 Å is growing rapidly and the structures are often of high biological interest and importance. Here, the challenges faced in electron-density interpretation, the strategies that have been employed to overcome them and developments to automate the process are reviewed. The methods employed in model generation from electron microscopy, which share many of the same challenges in providing high-confidence models of macromolecular structures and assemblies, are also considered.
model building; low-resolution data
The structure of Ca2+-bound EF-hand protein S100A2 was determined by calcium and sulfur SAD at a wavelength of 0.90 Å.
Human S100A2 is an EF-hand protein and acts as a major tumour suppressor, binding and activating p53 in a Ca2+-dependent manner. Ca2+-bound S100A2 was crystallized and its structure was determined based on the anomalous scattering provided by six S atoms from methionine residues and four calcium ions present in the asymmetric unit. Although the diffraction data were recorded at a wavelength of 0.90 Å, which is usually not assumed to be suitable for calcium/sulfur SAD, the anomalous signal was satisfactory. A nine-atom substructure was determined at 1.8 Å resolution using SHELXD, and SHELXE was used for density modification and phase extension to 1.3 Å resolution. The electron-density map obtained was readily interpretable and could be used for automated model building by ARP/wARP.
S100A2; EF-hands; calcium; sulfur SAD