Algorithms and geometrical properties are described for the automated building of nucleic acids in experimental electron density.
Medium- to high-resolution X-ray structures of DNA and RNA molecules were investigated to find geometric properties useful for automated model building in crystallographic electron-density maps. We describe a simple method, starting from a list of electron-density ‘blobs’, for identifying backbone phosphates and nucleic acid bases based on properties of the local electron-density distribution. This knowledge should be useful for the automated building of nucleic acid models into electron-density maps. We show that the distances and angles involving C1′ and the P atoms, together with the two pseudo-torsion angles that describe the …P—C1′—P—C1′… chain, provide a promising basis for building the nucleic acid polymer. These quantities show reasonably narrow distributions with asymmetry that should allow the direction of the phosphate backbone to be established.
nucleic acids; autobuilding; geometric properties; electron-density distribution
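The pseudo-torsion angles mentioned in the abstract above are ordinary dihedral angles evaluated on four consecutive atoms of the alternating P/C1′ chain. A minimal sketch of that calculation follows; the function names and the input layout (a list of alternating P and C1′ coordinates in Å) are assumptions made for illustration and are not taken from the paper.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_deg(p1, p2, p3, p4):
    """Signed dihedral angle in degrees about the p2-p3 axis."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal of the plane p1, p2, p3
    n2 = _cross(b2, b3)          # normal of the plane p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def chain_torsions(points):
    """Torsions for every window of four consecutive points.

    `points` is assumed to alternate P, C1', P, C1', ... (hypothetical
    layout), so successive windows alternate between the two
    pseudo-torsion types.
    """
    return [torsion_deg(*points[i:i + 4]) for i in range(len(points) - 3)]
```

Collecting these values over many refined structures would give the kind of distributions the abstract refers to; how the authors binned or weighted them is not stated here.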
Noncrystallographic symmetry is automatically detected and used to achieve higher completeness and greater accuracy of automatically built protein structures at resolutions of 2.3 Å or poorer.
A novel method is presented for the automatic detection of noncrystallographic symmetry (NCS) in macromolecular crystal structure determination which does not require the derivation of molecular masks or the segmentation of density. It was found that throughout structure determination the NCS-related parts may be differently pronounced in the electron density. This often results in the modelling of molecular fragments of variable length and accuracy, especially during automated model-building procedures. These fragments were used to identify NCS relations in order to aid automated model building and refinement. In a number of test cases higher completeness and greater accuracy of the obtained structures were achieved, specifically at a crystallographic resolution of 2.3 Å or poorer. In the best case, the method allowed the building of up to 15% more residues automatically and a tripling of the average length of the built fragments.
noncrystallographic symmetry; automated model building
MAIN is software designed to interactively perform the complex tasks of macromolecular crystal structure determination and validation. The features of MAIN and its tools for electron-density map calculations, model building, refinement in real and reciprocal space, and validation exploiting noncrystallographic symmetry in single and multiple crystal forms are presented.
MAIN is software that has been designed to interactively perform the complex tasks of macromolecular crystal structure determination and validation. Using MAIN, it is possible to perform density modification, manual and semi-automated or automated model building and rebuilding, real- and reciprocal-space structure optimization and refinement, map calculations and various types of molecular structure validation. The prompt availability of various analytical tools and the immediate visualization of molecular and map objects allow a user to efficiently progress towards the completed refined structure. The extraordinary depth perception of molecular objects in three dimensions that is provided by MAIN is achieved by the clarity and contrast of colours and the smooth rotation of the displayed objects. MAIN allows simultaneous work on several molecular models and various crystal forms. The strength of MAIN lies in its manipulation of averaged density maps and molecular models when noncrystallographic symmetry (NCS) is present. Using MAIN, it is possible to optimize NCS parameters and envelopes and to refine the structure in single or multiple crystal forms.
molecular modelling; molecular graphics; macromolecular crystal structure determination; map calculation; computer programs
ARP/wARP is a software suite to build macromolecular models in X-ray crystallography electron density maps. Structural genomics initiatives and the study of complex macromolecular assemblies and membrane proteins all rely on advanced methods for 3D structure determination. ARP/wARP meets these needs by providing the tools to obtain a macromolecular model automatically, with a reproducible computational procedure. ARP/wARP 7.0 tackles several tasks: iterative protein model building including a high-level decision-making control module; fast construction of the secondary structure of a protein; building flexible loops in alternate conformations; fully automated placement of ligands, including a choice of the best fitting ligand from a “cocktail”; and finding ordered water molecules. All protocols are easy to handle by a non-expert user through a graphical user interface or a command line. The time required is typically a few minutes although iterative model building may take a few hours.
Structural genomics; X-ray crystallography; software; model building; ligand placement
Coot is a molecular-graphics program designed to assist in the building of protein and other macromolecular models. The current state of development and available features are presented.
Coot is a molecular-graphics application for model building and validation of biological macromolecules. The program displays electron-density maps and atomic models and allows model manipulations such as idealization, real-space refinement, manual rotation/translation, rigid-body fitting, ligand search, solvation, mutations, rotamers and Ramachandran idealization. Furthermore, tools are provided for model validation as well as interfaces to external programs for refinement, validation and graphics. The software is designed to be easy to learn for novice users, which is achieved by ensuring that tools for common tasks are ‘discoverable’ through familiar user-interface elements (menus and toolbars) or by intuitive behaviour (mouse controls). Recent developments have focused on providing tools for expert users, with customisable key bindings, extensions and an extensive scripting interface. The software is under rapid development, but has already achieved very widespread use within the crystallographic community. The current state of the software is presented, with a description of the facilities available and of some of the underlying methods employed.
Coot; model building
Three-dimensional RNA models fitted into crystallographic density maps exhibit pervasive conformational ambiguities, geometric errors, and steric clashes. To address these problems, we present Enumerative Real-space Refinement ASsisted by Electron density under Rosetta (ERRASER), coupled to PHENIX (Python-based Hierarchical Environment for Integrated Xtallography) diffraction-based refinement. On 24 datasets, ERRASER automatically corrects the majority of MolProbity-assessed errors, improves average Rfree factor, resolves functionally important discrepancies in non-canonical structure, and refines low-resolution models to better match higher resolution models.
Interpretation of low-resolution X-ray crystallographic data can prove to be a difficult task. The challenges faced in electron-density interpretation, the strategies that have been employed to overcome them and developments to automate the process are reviewed.
The interpretation of low-resolution X-ray crystallographic data proves to be challenging even for the most experienced crystallographer. Ambiguity in the electron-density map makes main-chain tracing and side-chain assignment difficult. However, the number of structures solved at resolutions poorer than 3.5 Å is growing rapidly and the structures are often of high biological interest and importance. Here, the challenges faced in electron-density interpretation, the strategies that have been employed to overcome them and developments to automate the process are reviewed. The methods employed in model generation from electron microscopy, which share many of the same challenges in providing high-confidence models of macromolecular structures and assemblies, are also considered.
model building; low-resolution data
DEN refinement and automated model building with AutoBuild were used to determine the structure of a putative succinyl-diaminopimelate desuccinylase from C. glutamicum. This difficult case of molecular-replacement phasing shows that the synergism between DEN refinement and AutoBuild outperforms standard refinement protocols.
Phasing by molecular replacement remains difficult for targets that are far from the search model or in situations where the crystal diffracts only weakly or to low resolution. Here, the process of determining and refining the structure of Cgl1109, a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum, at ∼3 Å resolution is described using a combination of homology modeling with MODELLER, molecular-replacement phasing with Phaser, deformable elastic network (DEN) refinement and automated model building using AutoBuild in a semi-automated fashion, followed by final refinement cycles with phenix.refine and Coot. This difficult molecular-replacement case illustrates the power of including DEN restraints derived from a starting model to guide the movements of the model during refinement. The resulting improved model phases provide better starting points for automated model building and produce more significant difference peaks in anomalous difference Fourier maps to locate anomalous scatterers than does standard refinement. This example also illustrates a current limitation of automated procedures that require manual adjustment of local sequence misalignments between the homology model and the target sequence.
reciprocal-space refinement; DEN refinement; real-space refinement; automated model building; succinyl-diaminopimelate desuccinylase
The general principles behind the macromolecular crystal structure refinement program REFMAC5 are described.
This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low-resolution refinement tools such as secondary-structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, ‘jelly-body’ restraints and the use of novel long-range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high-resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.
Electron cryo-microscopy (cryo-EM) has played an increasingly important role in elucidating the structure and function of macromolecular assemblies in near native solution conditions. Typically, however, only non-atomic resolution reconstructions have been obtained for these large complexes, necessitating computational tools for integrating and extracting structural details. With recent advances in cryo-EM, maps at near-atomic resolutions have been achieved for several macromolecular assemblies from which models have been manually constructed. In this work, we describe a new interactive modeling toolkit called Gorgon targeted at intermediate to near-atomic resolution density maps (10-3.5 Å), particularly from cryo-EM. Gorgon's de novo modeling procedure couples sequence-based secondary structure prediction with feature detection and geometric modeling techniques to generate initial protein backbone models. Beyond model building, Gorgon is an extensible interactive visualization platform with a variety of computational tools for annotating a wide variety of 3D volumes. Examples from cryo-EM maps of Rotavirus and Rice Dwarf Virus are used to demonstrate its applicability to modeling protein structure.
cryo-EM; Gorgon; modeling; protein structure; near-atomic resolution
A number of techniques for the location of small and medium-sized model fragments in experimentally phased electron-density maps are explored. The application of one of these techniques to automated model building is discussed.
Molecular replacement is a powerful tool for the location of large models using structure-factor magnitudes alone. When phase information is available, it becomes possible to locate smaller fragments of the structure ranging in size from a few atoms to a single domain. The calculation is demanding, requiring a six-dimensional rotation and translation search. A number of approaches have been developed to this problem and a selection of these are reviewed in this paper. The application of one of these techniques to the problem of automated model building is explored in more detail, with particular reference to the problem of sequencing a protein main-chain trace.
model fragments; electron-density maps; model building
A method for automated macromolecular side-chain model building and for aligning the sequence to the map is described.
An algorithm is described for automated building of side chains in an electron-density map once a main-chain model is built and for alignment of the protein sequence to the map. The procedure is based on a comparison of electron density at the expected side-chain positions with electron-density templates. The templates are constructed from average amino-acid side-chain densities in 574 refined protein structures. For each contiguous segment of main chain, a matrix with entries corresponding to an estimate of the probability that each of the 20 amino acids is located at each position of the main-chain model is obtained. The probability that this segment corresponds to each possible alignment with the sequence of the protein is estimated using a Bayesian approach and high-confidence matches are kept. Once side-chain identities are determined, the most probable rotamer for each side chain is built into the model. The automated procedure has been implemented in the RESOLVE software. Combined with automated main-chain model building, the procedure produces a preliminary model suitable for refinement and extension by an experienced crystallographer.
model building; template matching
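The sequence-alignment step in the abstract above can be illustrated with a small Bayesian calculation. The sketch below assumes a uniform prior over alignment offsets and independence between positions; both are simplifications chosen for illustration, and the function name and data layout are hypothetical rather than those of RESOLVE.

```python
def alignment_posteriors(prob_rows, sequence):
    """Posterior probability of each offset of a segment along a sequence.

    prob_rows: one dict per main-chain position, mapping a one-letter
               amino-acid code to the estimated probability that this
               residue type sits at that position (from density matching).
    sequence:  the protein sequence as a string.
    Assumes a uniform prior over offsets (illustrative simplification).
    """
    seg_len = len(prob_rows)
    likelihoods = []
    for offset in range(len(sequence) - seg_len + 1):
        like = 1.0
        window = sequence[offset:offset + seg_len]
        for row, residue in zip(prob_rows, window):
            like *= row.get(residue, 0.0)
        likelihoods.append(like)
    total = sum(likelihoods)
    if total == 0.0:
        return [0.0] * len(likelihoods)
    return [like / total for like in likelihoods]
```

An offset would then be accepted only if its posterior exceeds some confidence threshold; the threshold actually used is not given in the abstract.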
Recent advances in cryo-electron microscopy and single-particle reconstruction (collectively referred to as “cryoEM”) have made it possible to determine the three-dimensional (3D) structures of several macromolecular complexes at near-atomic resolution (~3.8 – 4.5 Å). These achievements were accomplished by overcoming challenges in sample handling, instrumentation, image processing, and model building. At near-atomic resolution, many detailed structural features can be resolved, such as the turns and deep grooves of helices, strand separation in β sheets, and densities for loops and bulky amino acid side chains. Such structural data of the cytoplasmic polyhedrosis virus (CPV), the Epsilon 15 bacteriophage and the GroEL complex have provided valuable constraints for atomic model building using integrative tools, thus significantly enhancing the value of the cryoEM structures. The CPV structure revealed a drastic conformational change from a helix to a β hairpin associated with RNA packaging and replication, coupling of RNA processing and release, and the long sought-after polyhedrin-binding domain. These latest advances in single-particle cryoEM provide exciting opportunities for the 3D structural determination of viruses and macromolecular complexes that are either too large or too heterogeneous to be investigated by conventional X-ray crystallography or nuclear magnetic resonance (NMR) methods.
The molecular viewer ArpNavigator allows easy execution of ARP/wARP model-building routines while model-update steps are shown in real time, rendering the whole process transparent to the user.
Automated model-building software aims at the objective interpretation of crystallographic diffraction data by means of the construction or completion of macromolecular models. Automated methods have rapidly gained in popularity as they are easy to use and generate reproducible and consistent results. However, the process of model building has become increasingly hidden and the user is often left to decide on how to proceed further with little feedback on what has preceded the output of the built model. Here, ArpNavigator, a molecular viewer tightly integrated into the ARP/wARP automated model-building package, is presented that directly controls model building and displays the evolving output in real time in order to make the procedure transparent to the user.
model building; ARP/wARP; molecular graphics
Ultrahigh-resolution protein diffraction data allow valence electron density modelling and calculations of experimental electrostatic properties. Protein–ligand interaction energy may therefore be estimated.
With an increasing number of biological macromolecular crystal structures measured at ultrahigh resolution (1 Å or better), it is necessary to extend to large systems the experimental valence electron density modelling that is applied to small molecules. A database of average multipole populations has been built, describing the electron density of chemical groups in all 20 amino acids found in proteins. It allows calculation of atomic aspherical scattering factors, which are the starting point for refinement of the protein electron density, using the MoPro software. It is shown that the use of non-spherical scattering factors has a major impact on crystallographic statistics and results in a more accurate crystal structure, notably in terms of thermal displacement parameters and bond distances involving H atoms. It is also possible to obtain a realistic valence electron density model, which is used in the calculation of the electrostatic potential and energetic properties of proteins.
electron density; protein refinement; high-resolution crystallography
A genetic algorithm has been developed to optimize the phases of the strongest reflections in SIR/SAD data. This is shown to facilitate density modification and model building in several test cases.
Experimental phasing of diffraction data from macromolecular crystals involves deriving phase probability distributions. These distributions are often bimodal, making their weighted average, the centroid phase, improbable, so that electron-density maps computed using centroid phases are often non-interpretable. Density modification brings in information about the characteristics of electron density in protein crystals. In successful cases, this allows a choice between the modes in the phase probability distributions, and the maps can cross the borderline between non-interpretable and interpretable. Based on the suggestions by Vekhter [Vekhter (2005), Acta Cryst. D61, 899–902], the impact of identifying optimized phases for a small number of strong reflections prior to the density-modification process was investigated while using the centroid phase as a starting point for the remaining reflections. A genetic algorithm was developed that optimizes the quality of such phases using the skewness of the density map as a target function. Phases optimized in this way are then used in density modification. In most of the tests, the resulting maps were of higher quality than maps generated from the original centroid phases. In one of the test cases, the new method sufficiently improved a marginal set of experimental SAD phases to enable successful map interpretation. A computer program, SISA, has been developed to apply this method for phase improvement in macromolecular crystallography.
experimental phasing; density modification; genetic algorithms
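To make the target function in the abstract above concrete, the sketch below computes the skewness of a set of map values and evaluates it on a one-dimensional toy "map" synthesized from a few Fourier terms. The toy map, its grid size and the function names are assumptions for illustration only; the genetic algorithm itself is not reproduced.

```python
import math

def skewness(values):
    """Third standardized moment of the map values (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0.0:
        return 0.0
    return m3 / m2 ** 1.5

def toy_map(amplitudes, phases, n_grid=64):
    """1D density rho(x) = sum_h F_h * cos(2*pi*h*x - phi_h), h = 1, 2, ...

    A hypothetical stand-in for a real three-dimensional map.
    """
    rho = []
    for i in range(n_grid):
        x = i / n_grid
        rho.append(sum(a * math.cos(2.0 * math.pi * (h + 1) * x - p)
                       for h, (a, p) in enumerate(zip(amplitudes, phases))))
    return rho
```

In this toy setting, phases that concentrate the density into a sharp positive peak give a higher skewness than phases that smear it out, which is the property a search over candidate phases can exploit.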
A cross-validation-based method for bias reduction in ‘classical’ iterative density modification of experimental X-ray crystallography maps provides significantly more accurate phase-quality estimates and leads to improved automated model building.
Density modification often suffers from an overestimation of phase quality, as seen by escalated figures of merit. A new cross-validation-based method to address this estimation bias by applying a bias-correction parameter ‘β’ to maximum-likelihood phase-combination functions is proposed. In tests on over 100 single-wavelength anomalous diffraction data sets, the method is shown to produce much more reliable figures of merit and improved electron-density maps. Furthermore, significantly better results are obtained in automated model building iterated with phased refinement using the more accurate phase probability parameters from density modification.
reliable figure-of-merit estimates; density modification; maximum likelihood; bias reduction
Cryo-electron microscopy can visualize large macromolecular assemblies at resolutions often below 10 Å and recently as good as 3.8–4.5 Å. These density maps provide important insights into the biological functioning of molecular machineries such as viruses or the ribosome, in particular if atomic-resolution crystal structures or models of individual components of the assembly can be placed into the density map. The present work introduces a novel algorithm termed BCL::EM-Fit that accurately fits atomic-detail structural models into medium-resolution density maps. In an initial step, a “geometric hashing” algorithm provides a short list of likely placements. In a follow-up Monte Carlo/Metropolis refinement step, the initial placements are optimized by their cross-correlation coefficient. The resolution of density maps for a reliable fit was determined to be 10 Å or better using tests with simulated density maps. The algorithm was applied to the fitting of capsid proteins into experimental cryoEM density maps of human adenovirus at resolutions of 6.8 and 9.0 Å, and to the fitting of the GroEL protein at 5.4 Å. In the process, the handedness of the cryoEM density map was unambiguously identified. The BCL::EM-Fit algorithm offers an alternative to the established Fourier/real-space fitting programs. BCL::EM-Fit is free for academic use and available from a webserver or as a downloadable binary file at http://www.meilerlab.org.
Cryo-electron microscopy; cryoEM; geometric hashing; real space; Monte Carlo Metropolis; fitting; docking
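The refinement step described in the abstract above, Metropolis sampling scored by a cross-correlation coefficient, can be sketched in one dimension. Everything below is a toy under stated assumptions: the "model" is a Gaussian profile with a single translational degree of freedom, the step size and temperature are arbitrary, and none of the names correspond to the BCL::EM-Fit code.

```python
import math
import random

def correlation(a, b):
    """Pearson cross-correlation coefficient between two sampled profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def gaussian_profile(center, n_grid, width=2.0):
    """Toy 'simulated density' of a model placed at `center`."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n_grid)]

def metropolis_fit(target, start, steps=2000, temperature=0.01, seed=0):
    """Refine a 1D placement by Metropolis sampling on the correlation.

    Returns the best position seen and its correlation coefficient.
    """
    rng = random.Random(seed)
    n_grid = len(target)
    pos = start
    cc = correlation(gaussian_profile(pos, n_grid), target)
    best_pos, best_cc = pos, cc
    for _ in range(steps):
        trial = pos + rng.gauss(0.0, 1.0)      # random perturbation
        trial_cc = correlation(gaussian_profile(trial, n_grid), target)
        delta = trial_cc - cc
        # Always accept improvements; accept worse moves with
        # Boltzmann-like probability exp(delta / temperature).
        if delta >= 0.0 or rng.random() < math.exp(delta / temperature):
            pos, cc = trial, trial_cc
            if cc > best_cc:
                best_pos, best_cc = pos, cc
    return best_pos, best_cc
```

In the published method the starting placements come from the geometric-hashing step and the moves are full rigid-body rotations and translations; the toy only shows the acceptance rule and the scoring function.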
The application of molecular replacement (MR) in macromolecular crystallography can be limited by the “model bias” problem. Here we propose a strategy to reduce model bias when only part of a new structure is known: after the MR search, structure determination of the unknown part of the new structure can be facilitated by cross-crystal averaging of the known part of the new structure with the search model. This strategy dramatically improves electron density in the unknown part of the new structure. It has enabled us to determine the structures of two coronavirus receptor-binding domains, each complexed with its receptor, at moderate resolutions. In a test case, it also enabled automated model building when over 50% of an antigen-antibody complex was absent. These results suggest that this averaging strategy can be routinely used after MR to enhance the interpretability of electron density associated with the missing parts of the model.
The automated building of a protein model into an electron density map remains a challenging problem. In the ARP/wARP approach, model building is facilitated by initially interpreting a density map with free atoms of unknown chemical identity; all structural information for such chemically unassigned atoms is discarded. Here, this is remedied by applying restraints between free atoms, and between free atoms and a partial protein model. These are based on geometric considerations of protein structure and tentative (conditional) assignments for the free atoms. Restraints are applied in the REFMAC5 refinement program and are generated on an ad hoc basis, allowing them to fluctuate from step to step. A large set of experimentally phased and molecular replacement structures showcases individual structures where automated building is improved drastically by the conditional restraints. The concept and implementation we present can also find application in restraining geometries, such as hydrogen bonds, in low-resolution refinement.
With single-particle electron cryomicroscopy (cryo-EM), it is possible to visualize large, macromolecular assemblies in near-native states. Although subnanometer resolutions have been routinely achieved for many specimens, state-of-the-art cryo-EM has pushed to near-atomic (3.3–4.6 Å) resolutions. At these resolutions, it is now possible to construct reliable atomic models directly from the cryo-EM density map. In this study, we describe our recently developed protocols for performing the three-dimensional reconstruction and modeling of Mm-cpn, a group II chaperonin, determined to 4.3 Å resolution. This protocol, utilizing the software tools EMAN, Gorgon and Coot, can be adapted for use with nearly all specimens imaged with cryo-EM that target beyond 5 Å resolution. Additionally, the feature recognition and computational modeling tools can be applied to any near-atomic resolution density maps, including those from X-ray crystallography.
The structures of large macromolecular complexes in different functional states can be determined by cryo-electron microscopy, which yields electron density maps of low to intermediate resolutions. The maps can be combined with high-resolution atomic structures of components of the complex, to produce a model for the complex that is more accurate than the formal resolution of the map. To this end, methods have been developed to dock atomic models into density maps rigidly or flexibly, and to refine a docked model so as to optimize the fit of the atomic model into the map. We have developed a new refinement method called YUP.SCX. The electron density map is converted into a component of the potential energy function to which terms for stereochemical restraints and volume exclusion are added. The potential energy function is then minimized (using simulated annealing) to yield a stereochemically-restrained atomic structure that fits into the electron density map optimally. We used this procedure to construct an atomic model of the 70S ribosome in the pre-accommodation state. Although some atoms are displaced by as much as 33 Å, they divide themselves into nearly rigid fragments along natural boundaries with smooth transitions between the fragments.
Electron microscopy; simulated annealing; structural refinement
The identification and modelling of ligands into macromolecular models is important for understanding a molecule's function and for designing inhibitors to modulate its activities. We describe new algorithms for the automated building of ligands into electron density maps in crystal structure determination. Location of the ligand-binding site is achieved by matching numerical shape features describing the ligand to those of density clusters using a “fragmentation-tree” density representation. The ligand molecule is built using two distinct algorithms exploiting free atoms with inter-atomic connectivity and Metropolis-based optimisation of the conformational state of the ligand, producing an ensemble of structures from which the final model is derived. The method was validated on several thousand entries from the Protein Data Bank. In the majority of cases, the ligand-binding site could be correctly located and the ligand model built with a coordinate accuracy of better than 1 Å. We anticipate that the method will be of routine use to anyone modelling ligands, lead compounds or even compound fragments as part of protein functional analyses or drug design efforts.
electron density map; small-molecule binders; shape; hybrid approach; drug design
A method for automated macromolecular main-chain model building is described.
An algorithm for the automated macromolecular model building of polypeptide backbones is described. The procedure is hierarchical. In the initial stages, many overlapping polypeptide fragments are built. In subsequent stages, the fragments are extended and then connected. Identification of the locations of helical and β-strand regions is carried out by FFT-based template matching. Fragment libraries of helices and β-strands from refined protein structures are then positioned at the potential locations of helices and strands and the longest segments that fit the electron-density map are chosen. The helices and strands are then extended using fragment libraries consisting of sequences three amino acids long derived from refined protein structures. The resulting segments of polypeptide chain are then connected by choosing those which overlap at two or more Cα positions. The fully automated procedure has been implemented in RESOLVE and is capable of model building at resolutions as low as 3.5 Å. The algorithm is useful for building a preliminary main-chain model that can serve as a basis for refinement and side-chain addition.
model building; template matching; fragment extension
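The final connection step in the abstract above, joining segments that overlap at two or more Cα positions, can be sketched as follows. The distance tolerance, the end-to-start overlap test and the function names are assumptions made for illustration; the abstract does not specify how RESOLVE decides that two Cα positions coincide.

```python
import math

def overlap_length(frag_a, frag_b, tol=1.0):
    """Largest k such that the last k CA atoms of frag_a coincide, within
    `tol` angstroms (assumed tolerance), with the first k of frag_b."""
    max_k = min(len(frag_a), len(frag_b))
    for k in range(max_k, 0, -1):
        pairs = zip(frag_a[-k:], frag_b[:k])
        if all(math.dist(p, q) <= tol for p, q in pairs):
            return k
    return 0

def connect(frag_a, frag_b, min_overlap=2, tol=1.0):
    """Join two CA traces if they overlap at `min_overlap` or more
    positions; return None otherwise. Coordinates of frag_a are kept
    in the overlapping region."""
    k = overlap_length(frag_a, frag_b, tol)
    if k < min_overlap:
        return None
    return frag_a + frag_b[k:]
```

Each fragment is a list of (x, y, z) tuples in Å. A fuller implementation would also have to choose between competing connections, which this sketch does not attempt.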
A method for rapid model building of α-helices at moderate resolution is presented.
A method for the identification of α-helices in electron-density maps at low resolution followed by interpretation at moderate to high resolution is presented. Rapid identification is achieved at low resolution, where α-helices appear as tubes of density. The positioning and direction of the α-helices is obtained at moderate to high resolution, where the positions of side chains can be seen. The method was tested on a set of 42 experimental electron-density maps at resolutions ranging from 1.5 to 3.8 Å. An average of 63% of the α-helical residues in these proteins were built and an average of 76% of the residues built matched helical residues in the refined models of the proteins. The overall average r.m.s.d. between main-chain atoms in the modeled α-helices and the nearest atom with the same name in the refined models of the proteins was 1.3 Å.
structure solution; model building; Protein Data Bank; α-helices; PHENIX; experimental electron-density maps