A novel method that uses the conformational distribution of Cα atoms in known structures is used to build short missing regions (‘loops’) in protein models. An initial tree of possible loop paths is pruned according to structural and electron-density criteria and the most likely loop conformation(s) are selected and built.
One of the most cumbersome and time-demanding tasks in completing a protein model is building short missing regions or ‘loops’. A method is presented that uses structural and electron-density information to build the most likely conformations of such loops. Using the distribution of angles and dihedral angles in pentapeptides as the driving parameters, a set of possible conformations for the Cα backbone of loops was generated. The most likely candidate is then selected in a hierarchical manner: new and stronger restraints are added while the loop is built. The weight of the electron-density correlation relative to geometrical considerations is gradually increased until the most likely loop is selected on map correlation alone. To conclude, the loop is refined against the electron density in real space. This is started by using structural information to trace a set of models for the Cα backbone of the loop. Only in later steps of the algorithm is the electron-density correlation used as a criterion to select the loop(s). Thus, this method is more robust in low-density regions than an approach using density as a primary criterion. The algorithm is implemented in a loop-building program, Loopy, which can be used either alone or as part of an automatic building cycle. Loopy can build loops of up to 14 residues in length within a couple of minutes. The average root-mean-square deviation of the Cα atoms in the loops built during validation was less than 0.4 Å. When implemented in the context of automated model building in ARP/wARP, Loopy can increase the completeness of the built models.
model building; loop modelling; Loopy
Algorithms and geometrical properties are described for the automated building of nucleic acids in experimental electron density.
Medium- to high-resolution X-ray structures of DNA and RNA molecules were investigated to find geometric properties useful for automated model building in crystallographic electron-density maps. We describe a simple method, starting from a list of electron-density ‘blobs’, for identifying backbone phosphates and nucleic acid bases based on properties of the local electron-density distribution. This knowledge should be useful for the automated building of nucleic acid models into electron-density maps. We show that the distances and angles involving C1′ and the P atoms, using the pseudo-torsion angles and that describe the …P—C1′—P—C1′… chain, provide a promising basis for building the nucleic acid polymer. These quantities show reasonably narrow distributions with asymmetry that should allow the direction of the phosphate backbone to be established.
nucleic acids; autobuilding; geometric properties; electron-density distribution
Noncrystallographic symmetry is automatically detected and used to achieve higher completeness and greater accuracy of automatically built protein structures at resolutions of 2.3 Å or poorer.
A novel method is presented for the automatic detection of noncrystallographic symmetry (NCS) in macromolecular crystal structure determination which does not require the derivation of molecular masks or the segmentation of density. It was found that throughout structure determination the NCS-related parts may be differently pronounced in the electron density. This often results in the modelling of molecular fragments of variable length and accuracy, especially during automated model-building procedures. These fragments were used to identify NCS relations in order to aid automated model building and refinement. In a number of test cases higher completeness and greater accuracy of the obtained structures were achieved, specifically at a crystallographic resolution of 2.3 Å or poorer. In the best case, the method allowed the building of up to 15% more residues automatically and a tripling of the average length of the built fragments.
noncrystallographic symmetry; automated model building
Automatic modeling methods using cryo-electron microscopy (cryoEM) density maps as constrains are promising approaches to building atomic models of individual proteins or protein domains. However, their application to large macromolecular assemblies has not been possible largely due to computational limitations inherent to such unsupervised methods. Here we describe a new method, EM-IMO, for building, modifying and refining local structures of protein models using cryoEM maps as a constraint. As a supervised refinement method, EM-IMO allows users to specify parameters derived from inspections, so as to guide, and as a consequence, significantly speed up the refinement. An EM-IMO-based refinement protocol is first benchmarked on a data set of 50 homology models using simulated density maps. A multi-scale refinement strategy that combines EM-IMO-based and molecular dynamics (MD)-based refinement is then applied to build backbone models for the seven conformers of the five capsid proteins in our near-atomic resolution cryoEM map of the grass carp reovirus (GCRV) virion, a member of the aquareovirus genus of the Reoviridae family. The refined models allow us to reconstruct a backbone model of the entire GCRV capsid and provide valuable functional insights that are described in the accompanying publication. Our study demonstrates that the integrated use of homology modeling and a multi-scale refinement protocol that combines supervised and automated structure refinement offers a practical strategy for building atomic models based on medium- to high-resolution cryoEM density maps.
cryo-electron microscopy; density fitting; homology modeling; structure refinement; protein structure prediction
DEN refinement and automated model building with AutoBuild were used to determine the structure of a putative succinyl-diaminopimelate desuccinylase from C. glutamicum. This difficult case of molecular-replacement phasing shows that the synergism between DEN refinement and AutoBuild outperforms standard refinement protocols.
Phasing by molecular replacement remains difficult for targets that are far from the search model or in situations where the crystal diffracts only weakly or to low resolution. Here, the process of determining and refining the structure of Cgl1109, a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum, at ∼3 Å resolution is described using a combination of homology modeling with MODELLER, molecular-replacement phasing with Phaser, deformable elastic network (DEN) refinement and automated model building using AutoBuild in a semi-automated fashion, followed by final refinement cycles with phenix.refine and Coot. This difficult molecular-replacement case illustrates the power of including DEN restraints derived from a starting model to guide the movements of the model during refinement. The resulting improved model phases provide better starting points for automated model building and produce more significant difference peaks in anomalous difference Fourier maps to locate anomalous scatterers than does standard refinement. This example also illustrates a current limitation of automated procedures that require manual adjustment of local sequence misalignments between the homology model and the target sequence.
reciprocal-space refinement; DEN refinement; real-space refinement; automated model building; succinyl-diaminopimelate desuccinylase
Circuitry mapping of metazoan neural systems is difficult because canonical neural regions (regions containing one or more copies of all components) are large, regional borders are uncertain, neuronal diversity is high, and potential network topologies so numerous that only anatomical ground truth can resolve them. Complete mapping of a specific network requires synaptic resolution, canonical region coverage, and robust neuronal classification. Though transmission electron microscopy (TEM) remains the optimal tool for network mapping, the process of building large serial section TEM (ssTEM) image volumes is rendered difficult by the need to precisely mosaic distorted image tiles and register distorted mosaics. Moreover, most molecular neuronal class markers are poorly compatible with optimal TEM imaging. Our objective was to build a complete framework for ultrastructural circuitry mapping. This framework combines strong TEM-compliant small molecule profiling with automated image tile mosaicking, automated slice-to-slice image registration, and gigabyte-scale image browsing for volume annotation. Specifically we show how ultrathin molecular profiling datasets and their resultant classification maps can be embedded into ssTEM datasets and how scripted acquisition tools (SerialEM), mosaicking and registration (ir-tools), and large slice viewers (MosaicBuilder, Viking) can be used to manage terabyte-scale volumes. These methods enable large-scale connectivity analyses of new and legacy data. In well-posed tasks (e.g., complete network mapping in retina), terabyte-scale image volumes that previously would require decades of assembly can now be completed in months. Perhaps more importantly, the fusion of molecular profiling, image acquisition by SerialEM, ir-tools volume assembly, and data viewers/annotators also allow ssTEM to be used as a prospective tool for discovery in nonneural systems and a practical screening methodology for neurogenetics. Finally, this framework provides a mechanism for parallelization of ssTEM imaging, volume assembly, and data analysis across an international user base, enhancing the productivity of a large cohort of electron microscopists.
Building an accurate neural network diagram of the vertebrate nervous system is a major challenge in neuroscience. Diverse groups of neurons that function together form complex patterns of connections often spanning large regions of brain tissue, with uncertain borders. Although serial-section transmission electron microscopy remains the optimal tool for fine anatomical analyses, the time and cost of the undertaking has been prohibitive. We have assembled a complete framework for ultrastructural mapping using conventional transmission electron microscopy that tremendously accelerates image analysis. This framework combines small-molecule profiling to classify cells, automated image acquisition, automated mosaic formation, automated slice-to-slice image registration, and large-scale image browsing for volume annotation. Terabyte-scale image volumes requiring decades or more to assemble manually can now be automatically built in a few months. This makes serial-section transmission electron microscopy practical for high-resolution exploration of all complex tissue systems (neural or nonneural) as well as for ultrastructural screening of genetic models.
A framework for analysis of terabyte-scale serial-section transmission electron microscopic (ssTEM) datasets overcomes computational barriers and accelerates high-resolution tissue analysis, providing a practical way of mapping complex neural circuitry and an effective screening tool for neurogenetics.
A method for automated macromolecular main-chain model building is described.
An algorithm for the automated macromolecular model building of polypeptide backbones is described. The procedure is hierarchical. In the initial stages, many overlapping polypeptide fragments are built. In subsequent stages, the fragments are extended and then connected. Identification of the locations of helical and β-strand regions is carried out by FFT-based template matching. Fragment libraries of helices and β-strands from refined protein structures are then positioned at the potential locations of helices and strands and the longest segments that fit the electron-density map are chosen. The helices and strands are then extended using fragment libraries consisting of sequences three amino acids long derived from refined protein structures. The resulting segments of polypeptide chain are then connected by choosing those which overlap at two or more Cα positions. The fully automated procedure has been implemented in RESOLVE and is capable of model building at resolutions as low as 3.5 Å. The algorithm is useful for building a preliminary main-chain model that can serve as a basis for refinement and side-chain addition.
model building; template matching; fragment extension
The highly automated PHENIX AutoBuild wizard is described. The procedure can be applied equally well to phases derived from isomorphous/anomalous and molecular-replacement methods.
The PHENIX AutoBuild wizard is a highly automated tool for iterative model building, structure refinement and density modification using RESOLVE model building, RESOLVE statistical density modification and phenix.refine structure refinement. Recent advances in the AutoBuild wizard and phenix.refine include automated detection and application of NCS from models as they are built, extensive model-completion algorithms and automated solvent-molecule picking. Model-completion algorithms in the AutoBuild wizard include loop building, crossovers between chains in different models of a structure and side-chain optimization. The AutoBuild wizard has been applied to a set of 48 structures at resolutions ranging from 1.1 to 3.2 Å, resulting in a mean R factor of 0.24 and a mean free R factor of 0.29. The R factor of the final model is dependent on the quality of the starting electron density and is relatively independent of resolution.
model building; model completion; macromolecular models; Protein Data Bank; structure refinement; PHENIX
A procedure for iterative model-building, statistical density modification and refinement at moderate resolution (up to about 2.8 Å) is described.
An iterative process for improving the completeness and quality of atomic models automatically built at moderate resolution (up to about 2.8 Å) is described. The process consists of cycles of model building interspersed with cycles of refinement and combining phase information from the model with experimental phase information (if any) using statistical density modification. The process can lead to substantial improvements in both the accuracy and completeness of the model compared with a single cycle of model building. For eight test cases solved by MAD or SAD at resolutions ranging from 2.0 to 2.8 Å, the fraction of models built and assigned to sequence was 46–91% (mean of 65%) after the first cycle of building and refinement, and 78–95% (mean of 87%) after 20 cycles. In an additional test case, an incorrect model of gene 5 protein (PDB code 2gn5; r.m.s.d. of main-chain atoms from the more recent refined structure 1vqb at 1.56 Å) was rebuilt using only structure-factor amplitude information at varying resolutions from 2.0 to 3.0 Å. Rebuilding was effective at resolutions up to about 2.5 Å. The resulting models had 60–80% of the residues built and an r.m.s.d. of main-chain atoms from the refined structure of 0.20 to 0.62 Å. The algorithm is useful for building preliminary models of macromolecules suitable for an experienced crystallographer to extend, correct and fully refine.
density modification; model building; refinement
The optimization of WbdD crystals using a novel dehydration protocol and experimental phasing at 3.5 Å resolution by cross-crystal averaging followed by molecular replacement of electron density into a non-isomorphous 3.0 Å resolution native data set are reported.
WbdD is a bifunctional kinase/methyltransferase that is responsible for regulation of lipopolysaccharide O antigen polysaccharide chain length in Escherichia coli serotype O9a. Solving the crystal structure of this protein proved to be a challenge because the available crystals belonging to space group I23 only diffracted to low resolution (>95% of the crystals diffracted to resolution lower than 4 Å and most only to 8 Å) and were non-isomorphous, with changes in unit-cell dimensions of greater than 10%. Data from a serendipitously found single native crystal that diffracted to 3.0 Å resolution were non-isomorphous with a lower (3.5 Å) resolution selenomethionine data set. Here, a strategy for improving poor (3.5 Å resolution) initial phases by density modification and cross-crystal averaging with an additional 4.2 Å resolution data set to build a crude model of WbdD is desribed. Using this crude model as a mask to cut out the 3.5 Å resolution electron density yielded a successful molecular-replacement solution of the 3.0 Å resolution data set. The resulting map was used to build a complete model of WbdD. The hydration status of individual crystals appears to underpin the variable diffraction quality of WbdD crystals. After the initial structure had been solved, methods to control the hydration status of WbdD were developed and it was thus possible to routinely obtain high-resolution diffraction (to better than 2.5 Å resolution). This novel and facile crystal-dehydration protocol may be useful for similar challenging situations.
WbdD; crystal dehydration
A method for automated macromolecular side-chain model building and for aligning the sequence to the map is described.
An algorithm is described for automated building of side chains in an electron-density map once a main-chain model is built and for alignment of the protein sequence to the map. The procedure is based on a comparison of electron density at the expected side-chain positions with electron-density templates. The templates are constructed from average amino-acid side-chain densities in 574 refined protein structures. For each contiguous segment of main chain, a matrix with entries corresponding to an estimate of the probability that each of the 20 amino acids is located at each position of the main-chain model is obtained. The probability that this segment corresponds to each possible alignment with the sequence of the protein is estimated using a Bayesian approach and high-confidence matches are kept. Once side-chain identities are determined, the most probable rotamer for each side chain is built into the model. The automated procedure has been implemented in the RESOLVE software. Combined with automated main-chain model building, the procedure produces a preliminary model suitable for refinement and extension by an experienced crystallographer.
model building; template matching
We recently determined the crystal structure of the functional core of human U1 snRNP, consisting of nine proteins and one RNA, based on a 5.5 Å resolution electron density map. At 5–7 Å resolution, α helices and β sheets appear as rods and slabs, respectively, hence it is not possible to determine protein fold de novo. Using inverse beam geometry, accurate anomalous signals were obtained from weakly diffracting and radiation sensitive P1 crystals. We were able to locate anomalous scatterers with positional errors below 2 Å. This enabled us not only to place protein domains of known structure accurately into the map but also to trace an extended polypeptide chain, of previously undetermined structure, using selenomethionine derivatives of single methionine mutants spaced along the sequence. This method of Se-Met scanning, in combination with structure prediction, is a powerful tool for building a protein of unknown fold into a low resolution electron density map.
Interpretation of low-resolution X-ray crystallographic data can prove to be a difficult task. The challenges faced in electron-density interpretation, the strategies that have been employed to overcome them and developments to automate the process are reviewed.
The interpretation of low-resolution X-ray crystallographic data proves to be challenging even for the most experienced crystallographer. Ambiguity in the electron-density map makes main-chain tracing and side-chain assignment difficult. However, the number of structures solved at resolutions poorer than 3.5 Å is growing rapidly and the structures are often of high biological interest and importance. Here, the challenges faced in electron-density interpretation, the strategies that have been employed to overcome them and developments to automate the process are reviewed. The methods employed in model generation from electron microscopy, which share many of the same challenges in providing high-confidence models of macromolecular structures and assemblies, are also considered.
model building; low-resolution data
X-ray diffraction plays a pivotal role in understanding of biological systems by revealing atomic structures of proteins, nucleic acids, and their complexes, with much recent interest in very large assemblies like the ribosome. Since crystals of such large assemblies often diffract weakly (resolution worse than 4 Å), we need methods that work at such low resolution. In macromolecular assemblies, some of the components may be known at high resolution, while others are unknown: current refinement methods fail as they require a high-resolution starting structure for the entire complex1. Determining such complexes, which are often of key biological importance, should be possible in principle as the number of independent diffraction intensities at a resolution below 5 Å generally exceed the number of degrees of freedom. Here we introduce a new method that adds specific information from known homologous structures but allows global and local deformations of these homology models. Our approach uses the observation that local protein structure tends to be conserved as sequence and function evolve. Cross-validation with Rfree determines the optimum deformation and influence of the homology model. For test cases at 3.5 – 5 Å resolution with known structures at high resolution, our method gives significant improvements over conventional refinement in the model coordinate accuracy, the definition of secondary structure, and the quality of electron density maps. For re-refinements of a representative set of 19 low-resolution crystal structures from the PDB, we find similar improvements. Thus, a structure derived from low-resolution diffraction data can have quality similar to a high-resolution structure. Our method is applicable to studying weakly diffracting crystals using X-ray micro-diffraction2 as well as data from new X-ray light sources3. Use of homology information is not restricted to X-ray crystallography and cryo-electron microscopy: as optical imaging advances to sub-nanometer resolution4,5, it can use similar tools.
X-ray crystallography; homology modeling; cross-validation; Rfree value; refinement
Motivation: Owing to the size and complexity of large multi-component biological assemblies, the most tractable approach to determining their atomic structure is often to fit high-resolution radiographic or nuclear magnetic resonance structures of isolated components into lower resolution electron density maps of the larger assembly obtained using cryo-electron microscopy (cryo-EM). This hybrid approach to structure determination requires that an atomic resolution structure of each component, or a suitable homolog, is available. If neither is available, then the amount of structural information regarding that component is limited by the resolution of the cryo-EM map. However, even if a suitable homolog cannot be identified using sequence analysis, a search for structural homologs should still be performed because structural homology often persists throughout evolution even when sequence homology is undetectable, As macromolecules can often be described as a collection of independently folded domains, one way of searching for structural homologs would be to systematically fit representative domain structures from a protein domain database into the medium/low resolution cryo-EM map and return the best fits. Taken together, the best fitting non-overlapping structures would constitute a ‘mosaic’ backbone model of the assembly that could aid map interpretation and illuminate biological function.
Result: Using the computational principles of the Scale-Invariant Feature Transform (SIFT), we have developed FOLD-EM—a computational tool that can identify folded macromolecular domains in medium to low resolution (4–15 Å) electron density maps and return a model of the constituent polypeptides in a fully automated fashion. As a by-product, FOLD-EM can also do flexible multi-domain fitting that may provide insight into conformational changes that occur in macromolecular assemblies.
Availability and implementation: FOLD-EM is available at: http://cs.stanford.edu/~mitul/foldEM/, as a free open source software to the structural biology scientific community.
firstname.lastname@example.org or email@example.com
Supplementary data are available at Bioinformatics online.
Coot is a molecular-graphics program designed to assist in the building of protein and other macromolecular models. The current state of development and available features are presented.
Coot is a molecular-graphics application for model building and validation of biological macromolecules. The program displays electron-density maps and atomic models and allows model manipulations such as idealization, real-space refinement, manual rotation/translation, rigid-body fitting, ligand search, solvation, mutations, rotamers and Ramachandran idealization. Furthermore, tools are provided for model validation as well as interfaces to external programs for refinement, validation and graphics. The software is designed to be easy to learn for novice users, which is achieved by ensuring that tools for common tasks are ‘discoverable’ through familiar user-interface elements (menus and toolbars) or by intuitive behaviour (mouse controls). Recent developments have focused on providing tools for expert users, with customisable key bindings, extensions and an extensive scripting interface. The software is under rapid development, but has already achieved very widespread use within the crystallographic community. The current state of the software is presented, with a description of the facilities available and of some of the underlying methods employed.
Coot; model building
Electron cryo-microscopy (cryo-EM) has played an increasingly important role in elucidating the structure and function of macromolecular assemblies in near native solution conditions. Typically, however, only non-atomic resolution reconstructions have been obtained for these large complexes, necessitating computational tools for integrating and extracting structural details. With recent advances in cryo-EM, maps at near-atomic resolutions have been achieved for several macromolecular assemblies from which models have been manually constructed. In this work, we describe a new interactive modeling toolkit called Gorgon targeted at intermediate to near-atomic resolution density maps (10-3.5 Å), particularly from cryo-EM. Gorgon's de novo modeling procedure couples sequence-based secondary structure prediction with feature detection and geometric modeling techniques to generate initial protein backbone models. Beyond model building, Gorgon is an extensible interactive visualization platform with a variety of computational tools for annotating a wide variety of 3D volumes. Examples from cryo-EM maps of Rotavirus and Rice Dwarf Virus are used to demonstrate its applicability to modeling protein structure.
cryo-EM; Gorgon; modeling; protein structure; near-atomic resolution
The general principles behind the macromolecular crystal structure refinement program REFMAC5 are described.
This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low-resolution refinement tools such as secondary-structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, ‘jelly-body’ restraints and the use of novel long-range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high-resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.
Biological complexes typically exhibit intermolecular interfaces of high shape complementarity. Many computational docking approaches use this surface complementarity as a guide in the search for predicting the structures of protein-protein complexes. Proteins often undergo conformational changes in order to create a highly complementary interface when associating. These conformational changes are a major cause of failure for automated docking procedures when predicting binding modes between proteins using their unbound conformations. Low resolution surfaces in which high frequency geometric details are omitted have been used to address this problem. These smoothed, or blurred, surfaces are expected to minimize the differences between free and bound structures, especially those that are due to side chain conformations or small backbone deviations.
In spite of the fact that this approach has been used in many docking protocols, there has yet to be a systematic study of the effects of such surface smoothing on the shape complementarity of the resulting interfaces. Here we investigate this question by computing shape complementarity of a set of 66 protein-protein complexes represented by multi-resolution blurred surfaces. Complexed and unbound structures are available for these protein-protein complexes. They are a subset of complexes from a non-redundant docking benchmark selected for rigidity (i.e. the proteins undergo limited conformational changes between their bound and unbound states). In this work we construct the surfaces by isocontouring a density map obtained by accumulating the densities of Gaussian functions placed at all atom centers of the molecule. The smoothness or resolution is specified by a Gaussian fall-off coefficient, termed “blobbyness”. Shape complementarity is quantified using a histogram of the shortest distances between two proteins' surface mesh vertices for both the crystallographic complexes and the complexes built using the protein structures in their unbound conformation.
The histograms calculated for the bound complex structures demonstrate that medium resolution smoothing (blobbyness=−0.9) can reproduce about 88% of the shape complementarity of atomic resolution surfaces. Complexes formed from the free component structures show a partial loss of shape complementarity (more overlaps and gaps) with the atomic resolution surfaces. For surfaces smoothed to low resolution (blobbyness=−0.3), we find more consistency of shape complementarity between the complexed and free cases. To further reduce bad contacts without significantly impacting the good contacts we introduce another blurred surface, in which the Gaussian densities of flexible atoms are reduced. From these results we discuss the use of shape complementarity in protein-protein docking.
Protein interactions; protein-protein docking; Gaussian surface; protein side-chain flexibility; protein interfaces; unbound-unbound docking; protein complexes; Blur surface; FlexBlur surface; enzyme-inhibitor complexes
Many Protein Data Bank (PDB) users assume that the deposited structural models are of high quality but forget that these models are derived from the interpretation of experimental data. The accuracy of atom coordinates is not homogeneous between models or throughout the same model. To avoid basing a research project on a flawed model, we present a tool for assessing the quality of ligands and binding sites in crystallographic models from the PDB.
The Validation HElper for LIgands and Binding Sites (VHELIBS) is software that aims to ease the validation of binding site and ligand coordinates for non-crystallographers (i.e., users with little or no crystallography knowledge). Using a convenient graphical user interface, it allows one to check how ligand and binding site coordinates fit to the electron density map. VHELIBS can use models from either the PDB or the PDB_REDO databank of re-refined and re-built crystallographic models. The user can specify threshold values for a series of properties related to the fit of coordinates to electron density (Real Space R, Real Space Correlation Coefficient and average occupancy are used by default). VHELIBS will automatically classify residues and ligands as Good, Dubious or Bad based on the specified limits. The user is also able to visually check the quality of the fit of residues and ligands to the electron density map and reclassify them if needed.
VHELIBS allows inexperienced users to examine the binding site and the ligand coordinates in relation to the experimental data. This is an important step to evaluate models for their fitness for drug discovery purposes such as structure-based pharmacophore development and protein-ligand docking experiments.
Electron density map; Binding site structure validation; Ligand structure validation; Protein structure validation; PDB; PDB_REDO
The automated building of a protein model into an electron density map remains a challenging problem. In the ARP/wARP approach, model building is facilitated by initially interpreting a density map with free atoms of unknown chemical identity; all structural information for such chemically unassigned atoms is discarded. Here, this is remedied by applying restraints between free atoms, and between free atoms and a partial protein model. These are based on geometric considerations of protein structure and tentative (conditional) assignments for the free atoms. Restraints are applied in the REFMAC5 refinement program and are generated on an ad hoc basis, allowing them to fluctuate from step to step. A large set of experimentally phased and molecular replacement structures showcases individual structures where automated building is improved drastically by the conditional restraints. The concept and implementation we present can also find application in restraining geometries, such as hydrogen bonds, in low-resolution refinement.
The practical limits of molecular replacement can be extended by using several specifically designed protein models based on fold-recognition methods and by exhaustive searches performed in a parallelized pipeline. Updated results from the JCSG MR pipeline, which to date has solved 33 molecular-replacement structures with less than 35% sequence identity to the closest homologue of known structure, are presented.
The success rate of molecular replacement (MR) falls considerably when search models share less than 35% sequence identity with their templates, but can be improved significantly by using fold-recognition methods combined with exhaustive MR searches. Models based on alignments calculated with fold-recognition algorithms are more accurate than models based on conventional alignment methods such as FASTA or BLAST, which are still widely used for MR. In addition, by designing MR pipelines that integrate phasing and automated refinement and allow parallel processing of such calculations, one can effectively increase the success rate of MR. Here, updated results from the JCSG MR pipeline are presented, which to date has solved 33 MR structures with less than 35% sequence identity to the closest homologue of known structure. By using difficult MR problems as examples, it is demonstrated that successful MR phasing is possible even in cases where the similarity between the model and the template can only be detected with fold-recognition algorithms. In the first step, several search models are built based on all homologues found in the PDB by fold-recognition algorithms. The models resulting from this process are used in parallel MR searches with different combinations of input parameters of the MR phasing algorithm. The putative solutions are subjected to rigid-body and restrained crystallographic refinement and ranked based on the final values of free R factor, figure of merit and deviations from ideal geometry. Finally, crystal packing and electron-density maps are checked to identify the correct solution. If this procedure does not yield a solution with interpretable electron-density maps, then even more alternative models are prepared. The structurally variable regions of a protein family are identified based on alignments of sequences and known structures from that family and appropriate trimmings of the models are proposed. All combinations of these trimmings are applied to the search models and the resulting set of models is used in the MR pipeline. It is estimated that with the improvements in model building and exhaustive parallel searches with existing phasing algorithms, MR can be successful for more than 50% of recognizable homologues of known structures below the threshold of 35% sequence identity. This implies that about one-third of the proteins in a typical bacterial proteome are potential MR targets.
molecular replacement; sequence-alignment accuracy; homology modeling; parameter-space screening; structural genomics
We demonstrate that it is feasible to determine high-resolution protein structures by electron crystallography of three-dimensional crystals in an electron cryo-microscope (CryoEM). Lysozyme microcrystals were frozen on an electron microscopy grid, and electron diffraction data collected to 1.7 Å resolution. We developed a data collection protocol to collect a full-tilt series in electron diffraction to atomic resolution. A single tilt series contains up to 90 individual diffraction patterns collected from a single crystal with tilt angle increment of 0.1–1° and a total accumulated electron dose less than 10 electrons per angstrom squared. We indexed the data from three crystals and used them for structure determination of lysozyme by molecular replacement followed by crystallographic refinement to 2.9 Å resolution. This proof of principle paves the way for the implementation of a new technique, which we name ‘MicroED’, that may have wide applicability in structural biology.
X-ray crystallography has been used to work out the atomic structure of a large number of proteins. In a typical X-ray crystallography experiment, a beam of X-rays is directed at a protein crystal, which scatters some of the X-ray photons to produce a diffraction pattern. The crystal is then rotated through a small angle and another diffraction pattern is recorded. Finally, after this process has been repeated enough times, it is possible to work backwards from the diffraction patterns to figure out the structure of the protein.
The crystals used for X-ray crystallography must be large to withstand the damage caused by repeated exposure to the X-ray beam. However, some proteins do not form crystals at all, and others only form small crystals. It is possible to overcome this problem by using extremely short pulses of X-rays, but this requires a very large number of small crystals and ultrashort X-ray pulses are only available at a handful of research centers around the world. There is, therefore, a need for other approaches that can determine the structure of proteins that only form small crystals.
Electron crystallography is similar to X-ray crystallography in that a protein crystal scatters a beam to produce a diffraction pattern. However, the interactions between the electrons in the beam and the crystal are much stronger than those between the X-ray photons and the crystal. This means that meaningful amounts of data can be collected from much smaller crystals. However, it is normally only possible to collect one diffraction pattern from each crystal because of beam induced damage. Researchers have developed methods to merge the diffraction patterns produced by hundreds of small crystals, but to date these techniques have only worked with very thin two-dimensional crystals that contain only one layer of the protein of interest.
Now Shi et al. report a new approach to electron crystallography that works with very small three-dimensional crystals. Called MicroED, this technique involves placing the crystal in a transmission electron cryo-microscope, which is a fairly standard piece of equipment in many laboratories. The normal ‘low-dose’ electron beam in one of these microscopes would normally damage the crystal after a single diffraction pattern had been collected. However, Shi et al. realized that it was possible to obtain diffraction patterns without severely damaging the crystal if they dramatically reduced the normal low-dose electron beam. By reducing the electron dose by a factor of 200, it was possible to collect up to 90 diffraction patterns from the same, very small, three-dimensional crystal, and then—similar to what happens in X-ray crystallography—work backwards to figure out the structure of the protein. Shi et al. demonstrated the feasibility of the MicroED approach by using it to determine the structure of lysozyme, which is widely used as a test protein in crystallography, with a resolution of 2.9 Å. This proof-of principle study paves the way for crystallographers to study protein that cannot be studied with existing techniques.
electron crystallography; electron diffraction; electron cryomicroscopy (cryo-EM); microED; protein structure; microcrystals; None
Recently, electron microscopy measurement of single particles has enabled us to reconstruct a low-resolution 3D density map of large biomolecular complexes. If structures of the complex subunits can be solved by x-ray crystallography at atomic resolution, fitting these models into the 3D density map can generate an atomic resolution model of the entire large complex. The fitting of multiple subunits, however, generally requires large computational costs; therefore, development of an efficient algorithm is required. We developed a fast fitting program, “gmfit”, which employs a Gaussian mixture model (GMM) to represent approximated shapes of the 3D density map and the atomic models. A GMM is a distribution function composed by adding together several 3D Gaussian density functions. Because our model analytically provides an integral of a product of two distribution functions, it enables us to quickly calculate the fitness of the density map and the atomic models. Using the integral, two types of potential energy function are introduced: the attraction potential energy between a 3D density map and each subunit, and the repulsion potential energy between subunits. The restraint energy for symmetry is also employed to build symmetrical origomeric complexes. To find the optimal configuration of subunits, we randomly generated initial configurations of subunit models, and performed a steepest-descent method using forces and torques of the three potential energies. Comparison between an original density map and its GMM showed that the required number of Gaussian distribution functions for a given accuracy depended on both resolution and molecular size. We then performed test fitting calculations for simulated low-resolution density maps of atomic models of homodimer, trimer, and hexamer, using different search parameters. The results indicated that our method was able to rebuild atomic models of a complex even for maps of 30 Å resolution if sufficient numbers (eight or more) of Gaussian distribution functions were employed for each subunit, and the symmetric restraints were assigned for complexes with more than three subunits. As a more realistic test, we tried to build an atomic model of the GroEL/ES complex by fitting 21-subunit atomic models into the 3D density map obtained by cryoelectron microscopy using the C7 symmetric restraints. A model with low root mean-square deviations (14.7 Å) was obtained as the lowest-energy model, showing that our fitting method was reasonably accurate. Inclusion of other restraints from biological and biochemical experiments could further enhance the accuracy.
The solvent-picking procedure in phenix.refine has been extended and combined with Phaser anomalous substructure completion and analysis of coordination geometry to identify and place elemental ions.
Many macromolecular model-building and refinement programs can automatically place solvent atoms in electron density at moderate-to-high resolution. This process frequently builds water molecules in place of elemental ions, the identification of which must be performed manually. The solvent-picking algorithms in phenix.refine have been extended to build common ions based on an analysis of the chemical environment as well as physical properties such as occupancy, B factor and anomalous scattering. The method is most effective for heavier elements such as calcium and zinc, for which a majority of sites can be placed with few false positives in a diverse test set of structures. At atomic resolution, it is observed that it can also be possible to identify tightly bound sodium and magnesium ions. A number of challenges that contribute to the difficulty of completely automating the process of structure completion are discussed.
refinement; ions; PHENIX