A procedure for model building is described that combines morphing a model to match a density map, trimming the morphed model and aligning the model to a sequence.
A procedure termed ‘morphing’ for improving a model after it has been placed in the crystallographic cell by molecular replacement has recently been developed. Morphing consists of applying a smooth deformation to a model to make it match an electron-density map more closely. Morphing does not change the identities of the residues in the chain, only their coordinates. Consequently, if the true structure differs from the working model by containing different residues, these differences cannot be corrected by morphing. Here, a procedure that helps to address this limitation is described. The goal of the procedure is to obtain a relatively complete model that has accurate main-chain atomic positions and residues that are correctly assigned to the sequence. Residues in a morphed model that do not match the electron-density map are removed. Each segment of the resulting trimmed morphed model is then assigned to the sequence of the molecule using information about the connectivity of the chains from the working model and from connections that can be identified from the electron-density map. The procedure was tested by application to a recently determined structure at a resolution of 3.2 Å and was found to increase the number of correctly identified residues in this structure from the 88 obtained using phenix.resolve sequence assignment alone (Terwilliger, 2003 ▶) to 247 of a possible 359. Additionally, the procedure was tested by application to a series of templates with sequence identities to a target structure ranging between 7 and 36%. The mean fraction of correctly identified residues in these cases was increased from 33% using phenix.resolve sequence assignment to 47% using the current procedure. The procedure is simple to apply and is available in the Phenix software package.
morphing; model building; sequence assignment; model–map correlation; loop-building
A procedure for iterative model-building, statistical density modification and refinement at moderate resolution (up to about 2.8 Å) is described.
An iterative process for improving the completeness and quality of atomic models automatically built at moderate resolution (up to about 2.8 Å) is described. The process consists of cycles of model building interspersed with cycles of refinement and combining phase information from the model with experimental phase information (if any) using statistical density modification. The process can lead to substantial improvements in both the accuracy and completeness of the model compared with a single cycle of model building. For eight test cases solved by MAD or SAD at resolutions ranging from 2.0 to 2.8 Å, the fraction of models built and assigned to sequence was 46–91% (mean of 65%) after the first cycle of building and refinement, and 78–95% (mean of 87%) after 20 cycles. In an additional test case, an incorrect model of gene 5 protein (PDB code 2gn5; r.m.s.d. of main-chain atoms from the more recent refined structure 1vqb at 1.56 Å) was rebuilt using only structure-factor amplitude information at varying resolutions from 2.0 to 3.0 Å. Rebuilding was effective at resolutions up to about 2.5 Å. The resulting models had 60–80% of the residues built and an r.m.s.d. of main-chain atoms from the refined structure of 0.20 to 0.62 Å. The algorithm is useful for building preliminary models of macromolecules suitable for an experienced crystallographer to extend, correct and fully refine.
density modification; model building; refinement
The practical limits of molecular replacement can be extended by using several specifically designed protein models based on fold-recognition methods and by exhaustive searches performed in a parallelized pipeline. Updated results from the JCSG MR pipeline, which to date has solved 33 molecular-replacement structures with less than 35% sequence identity to the closest homologue of known structure, are presented.
The success rate of molecular replacement (MR) falls considerably when search models share less than 35% sequence identity with their templates, but can be improved significantly by using fold-recognition methods combined with exhaustive MR searches. Models based on alignments calculated with fold-recognition algorithms are more accurate than models based on conventional alignment methods such as FASTA or BLAST, which are still widely used for MR. In addition, by designing MR pipelines that integrate phasing and automated refinement and allow parallel processing of such calculations, one can effectively increase the success rate of MR. Here, updated results from the JCSG MR pipeline are presented, which to date has solved 33 MR structures with less than 35% sequence identity to the closest homologue of known structure. By using difficult MR problems as examples, it is demonstrated that successful MR phasing is possible even in cases where the similarity between the model and the template can only be detected with fold-recognition algorithms. In the first step, several search models are built based on all homologues found in the PDB by fold-recognition algorithms. The models resulting from this process are used in parallel MR searches with different combinations of input parameters of the MR phasing algorithm. The putative solutions are subjected to rigid-body and restrained crystallographic refinement and ranked based on the final values of free R factor, figure of merit and deviations from ideal geometry. Finally, crystal packing and electron-density maps are checked to identify the correct solution. If this procedure does not yield a solution with interpretable electron-density maps, then even more alternative models are prepared. The structurally variable regions of a protein family are identified based on alignments of sequences and known structures from that family and appropriate trimmings of the models are proposed. All combinations of these trimmings are applied to the search models and the resulting set of models is used in the MR pipeline. It is estimated that with the improvements in model building and exhaustive parallel searches with existing phasing algorithms, MR can be successful for more than 50% of recognizable homologues of known structures below the threshold of 35% sequence identity. This implies that about one-third of the proteins in a typical bacterial proteome are potential MR targets.
molecular replacement; sequence-alignment accuracy; homology modeling; parameter-space screening; structural genomics
DEN refinement and automated model building with AutoBuild were used to determine the structure of a putative succinyl-diaminopimelate desuccinylase from C. glutamicum. This difficult case of molecular-replacement phasing shows that the synergism between DEN refinement and AutoBuild outperforms standard refinement protocols.
Phasing by molecular replacement remains difficult for targets that are far from the search model or in situations where the crystal diffracts only weakly or to low resolution. Here, the process of determining and refining the structure of Cgl1109, a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum, at ∼3 Å resolution is described using a combination of homology modeling with MODELLER, molecular-replacement phasing with Phaser, deformable elastic network (DEN) refinement and automated model building using AutoBuild in a semi-automated fashion, followed by final refinement cycles with phenix.refine and Coot. This difficult molecular-replacement case illustrates the power of including DEN restraints derived from a starting model to guide the movements of the model during refinement. The resulting improved model phases provide better starting points for automated model building and produce more significant difference peaks in anomalous difference Fourier maps to locate anomalous scatterers than does standard refinement. This example also illustrates a current limitation of automated procedures that require manual adjustment of local sequence misalignments between the homology model and the target sequence.
reciprocal-space refinement; DEN refinement; real-space refinement; automated model building; succinyl-diaminopimelate desuccinylase
Statistical density modification can make use of local patterns of density found in protein structures to improve crystallographic phases.
A method for improving crystallographic phases is presented that is based on the preferential occurrence of certain local patterns of electron density in macromolecular electron-density maps. The method focuses on the relationship between the value of electron density at a point in the map and the pattern of density surrounding this point. Patterns of density that can be superimposed by rotation about the central point are considered equivalent. Standard templates are created from experimental or model electron-density maps by clustering and averaging local patterns of electron density. The clustering is based on correlation coefficients after rotation to maximize the correlation. Experimental or model maps are also used to create histograms relating the value of electron density at the central point to the correlation coefficient of the density surrounding this point with each member of the set of standard patterns. These histograms are then used to estimate the electron density at each point in a new experimental electron-density map using the pattern of electron density at points surrounding that point and the correlation coefficient of this density to each of the set of standard templates, again after rotation to maximize the correlation. The method is strengthened by excluding any information from the point in question from both the templates and the local pattern of density in the calculation. A function based on the origin of the Patterson function is used to remove information about the electron density at the point in question from nearby electron density. This allows an estimation of the electron density at each point in a map, using only information from other points in the process. The resulting estimates of electron density are shown to have errors that are nearly independent of the errors in the original map using model data and templates calculated at a resolution of 2.6 Å. Owing to this independence of errors, information from the new map can be combined in a simple fashion with information from the original map to create an improved map. An iterative phase-improvement process using this approach and other applications of the image-reconstruction method are described and applied to experimental data at resolutions ranging from 2.4 to 2.8 Å.
density modification; pattern matching
The PDB_REDO pipeline aims to improve macromolecular structures by optimizing the crystallographic refinement parameters and performing partial model building. Here, algorithms are presented that allowed a web-server implementation of PDB_REDO, and the first user results are discussed.
The refinement and validation of a crystallographic structure model is the last step before the coordinates and the associated data are submitted to the Protein Data Bank (PDB). The success of the refinement procedure is typically assessed by validating the models against geometrical criteria and the diffraction data, and is an important step in ensuring the quality of the PDB public archive [Read et al. (2011 ▶), Structure, 19, 1395–1412]. The PDB_REDO procedure aims for ‘constructive validation’, aspiring to consistent and optimal refinement parameterization and pro-active model rebuilding, not only correcting errors but striving for optimal interpretation of the electron density. A web server for PDB_REDO has been implemented, allowing thorough, consistent and fully automated optimization of the refinement procedure in REFMAC and partial model rebuilding. The goal of the web server is to help practicing crystallographers to improve their model prior to submission to the PDB. For this, additional steps were implemented in the PDB_REDO pipeline, both in the refinement procedure, e.g. testing of resolution limits and k-fold cross-validation for small test sets, and as new validation criteria, e.g. the density-fit metrics implemented in EDSTATS and ligand validation as implemented in YASARA. Innovative ways to present the refinement and validation results to the user are also described, which together with auto-generated Coot scripts can guide users to subsequent model inspection and improvement. It is demonstrated that using the server can lead to substantial improvement of structure models before they are submitted to the PDB.
PDB_REDO; validation; model optimization
An OMIT procedure is presented that has the benefits of iterative model building density modification and refinement yet is essentially unbiased by the atomic model that is built.
A procedure for carrying out iterative model building, density modification and refinement is presented in which the density in an OMIT region is essentially unbiased by an atomic model. Density from a set of overlapping OMIT regions can be combined to create a composite ‘iterative-build’ OMIT map that is everywhere unbiased by an atomic model but also everywhere benefiting from the model-based information present elsewhere in the unit cell. The procedure may have applications in the validation of specific features in atomic models as well as in overall model validation. The procedure is demonstrated with a molecular-replacement structure and with an experimentally phased structure and a variation on the method is demonstrated by removing model bias from a structure from the Protein Data Bank.
model building; model validation; macromolecular models; Protein Data Bank; refinement; OMIT maps; bias; structure refinement; PHENIX
EM-Fold was used to build models for nine proteins in the maps of GroEL (7.7 Å resolution) and ribosome (6.4 Å resolution) in the ab initio modeling category of the 2010 cryoEM modeling challenge. EM-Fold assembles predicted secondary structure elements (SSEs) into regions of the density map that were identified to correspond to either α-helices or β-strands. The assembly uses a Monte Carlo algorithm where loop closure, density-SSE length agreement, and strength of connecting density between SSEs are evaluated. Top scoring models are refined by translating, rotating and bending SSEs to yield better agreement with the density map. EM-Fold produces models that contain backbone atoms within secondary structure elements only. The RMSD values of the models with respect to native range from 2.4 Å to 3.5 Å for six of the nine proteins. EM-Fold failed to predict the correct topology in three cases. Subsequently Rosetta was used to build loops and side chains for the very best scoring models after EM-Fold refinement. The refinement within Rosetta’s force field is driven by a density agreement score that calculates a cross correlation between a density map simulated from the model and the experimental density map. All-atom RMSDs as low as 3.4 Å are achieved in favorable cases. Values above 10.0 Å are observed for two proteins with low overall content of secondary structure and hence particularly complex loop modeling problems. RMSDs over residues in secondary structure elements range from 2.5 Å to 4.8 Å.
Exhaustive exploration of molecular interactions at the level of complete proteomes requires efficient and reliable computational approaches to protein function inference. Ligand docking and ranking techniques show considerable promise in their ability to quantify the interactions between proteins and small molecules. Despite the advances in the development of docking approaches and scoring functions, the genome-wide application of many ligand docking/screening algorithms is limited by the quality of the binding sites in theoretical receptor models constructed by protein structure prediction. In this study, we describe a new template-based method for the local refinement of ligand-binding regions in protein models using remotely related templates identified by threading. We designed a Support Vector Regression (SVR) model that selects correct binding site geometries in a large ensemble of multiple receptor conformations. The SVR model employs several scoring functions that impose geometrical restraints on the Cα positions, account for the specific chemical environment within a binding site and optimize the interactions with putative ligands. The SVR score is well correlated with the RMSD from the native structure; in 47% (70%) of the cases, the Pearson’s correlation coefficient is >0.5 (>0.3). When applied to weakly homologous models, the average heavy atom, local RMSD from the native structure of the top-ranked (best of top five) binding site geometries is 3.1 Å (2.9 Å) for roughly half of the targets; this represents a 0.1 (0.3) Å average improvement over the original predicted structure. Focusing on the subset of strongly conserved residues, the average heavy atom RMSD is 2.6 Å (2.3 Å). Furthermore, we estimate the upper bound of template-based binding site refinement using only weakly related proteins to be ~2.6 Å RMSD. This value also corresponds to the plasticity of the ligand-binding regions in distant homologues. The Binding Site Refinement (BSR) approach is available to the scientific community as a web server that can be accessed at http://cssb.biology.gatech.edu/bsr/.
Ligand-binding site refinement; proteinthreading; protein structure prediction; ligand-binding site prediction; ensemble docking; molecular function
Automatic modeling methods using cryo-electron microscopy (cryoEM) density maps as constrains are promising approaches to building atomic models of individual proteins or protein domains. However, their application to large macromolecular assemblies has not been possible largely due to computational limitations inherent to such unsupervised methods. Here we describe a new method, EM-IMO, for building, modifying and refining local structures of protein models using cryoEM maps as a constraint. As a supervised refinement method, EM-IMO allows users to specify parameters derived from inspections, so as to guide, and as a consequence, significantly speed up the refinement. An EM-IMO-based refinement protocol is first benchmarked on a data set of 50 homology models using simulated density maps. A multi-scale refinement strategy that combines EM-IMO-based and molecular dynamics (MD)-based refinement is then applied to build backbone models for the seven conformers of the five capsid proteins in our near-atomic resolution cryoEM map of the grass carp reovirus (GCRV) virion, a member of the aquareovirus genus of the Reoviridae family. The refined models allow us to reconstruct a backbone model of the entire GCRV capsid and provide valuable functional insights that are described in the accompanying publication. Our study demonstrates that the integrated use of homology modeling and a multi-scale refinement protocol that combines supervised and automated structure refinement offers a practical strategy for building atomic models based on medium- to high-resolution cryoEM density maps.
cryo-electron microscopy; density fitting; homology modeling; structure refinement; protein structure prediction
Achieving atomic-level accuracy in comparative protein models is limited by our ability to refine the initial, homolog-derived model closer to the native state. Despite considerable effort, progress in developing a generalized refinement method has been limited. In contrast, methods have been described that can accurately reconstruct loop conformations in native protein structures. We hypothesize that loop refinement in homology models is much more difficult than loop reconstruction in crystal structures, in part, because side-chain, backbone, and other structural inaccuracies surrounding the loop create a challenging sampling problem; the loop cannot be refined without simultaneously refining adjacent portions. In this work, we single out one sampling issue in an artificial but useful test set and examine how loop refinement accuracy is affected by errors in surrounding side-chains. In 80 high-resolution crystal structures, we first perturbed 6–12 residue loops away from the crystal conformation, and placed all protein side chains in non-native but low energy conformations. Even these relatively small perturbations in the surroundings made the loop prediction problem much more challenging. Using a previously published loop prediction method, median backbone (N-Cα-CO) RMSD’s for groups of 6, 8, 10, and 12 residue loops are 0.3/0.6/0.4/0.6 Å, respectively, on native structures and increase to 1.1/2.2/1.5/2.3 Å on the perturbed cases. We then augmented our previous loop prediction method to simultaneously optimize the rotamer states of side chains surrounding the loop. Our results show that this augmented loop prediction method can recover the native state in many perturbed structures where the previous method failed; the median RMSD’s for the 6, 8, 10, and 12 residue perturbed loops improve to 0.4/0.8/1.1/1.2 Å. Finally, we highlight three comparative models from blind tests, in which our new method predicted loops closer to the native conformation than first modeled using the homolog template, a task generally understood to be difficult. Although many challenges remain in refining full comparative models to high accuracy, this work offers a methodical step toward that goal.
comparative; homology; modeling; refinement; loop prediction; molecular mechanics; force field
The highly automated PHENIX AutoBuild wizard is described. The procedure can be applied equally well to phases derived from isomorphous/anomalous and molecular-replacement methods.
The PHENIX AutoBuild wizard is a highly automated tool for iterative model building, structure refinement and density modification using RESOLVE model building, RESOLVE statistical density modification and phenix.refine structure refinement. Recent advances in the AutoBuild wizard and phenix.refine include automated detection and application of NCS from models as they are built, extensive model-completion algorithms and automated solvent-molecule picking. Model-completion algorithms in the AutoBuild wizard include loop building, crossovers between chains in different models of a structure and side-chain optimization. The AutoBuild wizard has been applied to a set of 48 structures at resolutions ranging from 1.1 to 3.2 Å, resulting in a mean R factor of 0.24 and a mean free R factor of 0.29. The R factor of the final model is dependent on the quality of the starting electron density and is relatively independent of resolution.
model building; model completion; macromolecular models; Protein Data Bank; structure refinement; PHENIX
This paper presents a generalized learning based framework for improving both speed and accuracy of the existing deformable registration method. The key of our framework involves the utilization of a support vector regression (SVR) to learn the correlation between brain image appearances and their corresponding shape deformations to a template, for helping significantly cut down the computation cost and improve the robustness to local minima by using the learned correlation to instantly predict a good subject-specific deformation initialization for any given subject under registration. Our framework consists of three major parts: 1) training of SVR models based on the statistics of image samples and their shape deformations to capture intrinsic image-deformation correlations, 2) deformation prediction for a new subject with the trained SVR models to generate a subject-resemblance intermediate template by warping the original template with the predicted deformations, and 3) estimating of the residual deformation from the intermediate template to the subject for refined registration. Any existing deformable registration methods can be easily employed for training the SVR models and estimating the residual deformation. We have tested in this paper the two widely used deformable registration algorithms, i.e., HAMMER  and diffeomorphic demons , for demonstration of our proposed frameowrk. Experimental results show that, compared to the registration using the original methods (with no deformation prediction), our framework achieves a significant speedup (6X faster than HAMMER, and 3X faster than diffeomorphic demons), while maintaining comparable (or even slighly better) registration accuracy.
A method for automated macromolecular side-chain model building and for aligning the sequence to the map is described.
An algorithm is described for automated building of side chains in an electron-density map once a main-chain model is built and for alignment of the protein sequence to the map. The procedure is based on a comparison of electron density at the expected side-chain positions with electron-density templates. The templates are constructed from average amino-acid side-chain densities in 574 refined protein structures. For each contiguous segment of main chain, a matrix with entries corresponding to an estimate of the probability that each of the 20 amino acids is located at each position of the main-chain model is obtained. The probability that this segment corresponds to each possible alignment with the sequence of the protein is estimated using a Bayesian approach and high-confidence matches are kept. Once side-chain identities are determined, the most probable rotamer for each side chain is built into the model. The automated procedure has been implemented in the RESOLVE software. Combined with automated main-chain model building, the procedure produces a preliminary model suitable for refinement and extension by an experienced crystallographer.
model building; template matching
MAIN is interactive software designed to interactively perform the complex tasks of macromolecular crystal structure determination and validation. The features of MAIN and its tools for electron-density map calculations, model building, refinement in real and reciprocal space, and validation exploiting noncrystallographic symmetry in single and multiple crystal forms are presented.
MAIN is software that has been designed to interactively perform the complex tasks of macromolecular crystal structure determination and validation. Using MAIN, it is possible to perform density modification, manual and semi-automated or automated model building and rebuilding, real- and reciprocal-space structure optimization and refinement, map calculations and various types of molecular structure validation. The prompt availability of various analytical tools and the immediate visualization of molecular and map objects allow a user to efficiently progress towards the completed refined structure. The extraordinary depth perception of molecular objects in three dimensions that is provided by MAIN is achieved by the clarity and contrast of colours and the smooth rotation of the displayed objects. MAIN allows simultaneous work on several molecular models and various crystal forms. The strength of MAIN lies in its manipulation of averaged density maps and molecular models when noncrystallographic symmetry (NCS) is present. Using MAIN, it is possible to optimize NCS parameters and envelopes and to refine the structure in single or multiple crystal forms.
molecular modelling; molecular graphics; macromolecular crystal structure determination; map calculation; computer programs
Accurate sequence alignment is required in many bioinformatics applications but, when sequence similarity is low, it is difficult to obtain accurate alignments based on sequence similarity alone. The accuracy improves when the structures are available, but current structure-based sequence alignment procedures still mis-align substantial numbers of residues. In order to correct such errors, we previously explored the possibility of replacing the residue-based dynamic programming algorithm in structure alignment procedures with the Seed Extension algorithm, which does not use a gap penalty. Here, we describe a new procedure called RSE (Refinement with Seed Extension) that iteratively refines a structure-based sequence alignment.
RSE uses SE (Seed Extension) in its core, which is an algorithm that we reported recently for obtaining a sequence alignment from two superimposed structures. The RSE procedure was evaluated by comparing the correctly aligned fractions of residues before and after the refinement of the structure-based sequence alignments produced by popular programs. CE, DaliLite, FAST, LOCK2, MATRAS, MATT, TM-align, SHEBA and VAST were included in this analysis and the NCBI's CDD root node set was used as the reference alignments. RSE improved the average accuracy of sequence alignments for all programs tested when no shift error was allowed. The amount of improvement varied depending on the program. The average improvements were small for DaliLite and MATRAS but about 5% for CE and VAST. More substantial improvements have been seen in many individual cases. The additional computation times required for the refinements were negligible compared to the times taken by the structure alignment programs.
RSE is a computationally inexpensive way of improving the accuracy of a structure-based sequence alignment. It can be used as a standalone procedure following a regular structure-based sequence alignment or to replace the traditional iterative refinement procedures based on residue-level dynamic programming algorithm in many structure alignment programs.
A procedure for carrying out iterative model building, density modification and refinement is presented in which the density in an OMITregion is essentially unbiased by an atomic model. Density from a set of overlapping OMIT regions can be combined to create a composite ‘iterative-build’ OMIT map that is everywhere unbiased by an atomic model but also everywhere benefiting from the model-based information present elsewhere in the unit cell. The procedure may have applications in the validation of specific features in atomic models as well as in overall model validation. The procedure is demonstrated with a molecular-replacement structure and with an experimentally phased structure and a variation on the method is demonstrated by removing model bias from a structure from the Protein Data Bank.
A method for automated macromolecular main-chain model building is described.
An algorithm for the automated macromolecular model building of polypeptide backbones is described. The procedure is hierarchical. In the initial stages, many overlapping polypeptide fragments are built. In subsequent stages, the fragments are extended and then connected. Identification of the locations of helical and β-strand regions is carried out by FFT-based template matching. Fragment libraries of helices and β-strands from refined protein structures are then positioned at the potential locations of helices and strands and the longest segments that fit the electron-density map are chosen. The helices and strands are then extended using fragment libraries consisting of sequences three amino acids long derived from refined protein structures. The resulting segments of polypeptide chain are then connected by choosing those which overlap at two or more Cα positions. The fully automated procedure has been implemented in RESOLVE and is capable of model building at resolutions as low as 3.5 Å. The algorithm is useful for building a preliminary main-chain model that can serve as a basis for refinement and side-chain addition.
model building; template matching; fragment extension
Molecular replacement (MR) is the method of choice for X-ray crystallography structure determination when structural homologues are available in the Protein Data Bank (PDB). Although the success rate of MR decreases sharply when the sequence similarity between template and target proteins drops below 35% identical residues, it has been found that screening for MR solutions with a large number of different homology models may still produce a suitable solution where the original template failed. Here we present the web tool CaspR, implementing such a strategy in an automated manner. On input of experimental diffraction data, of the corresponding target sequence and of one or several potential templates, CaspR executes an optimized molecular replacement procedure using a combination of well-established stand-alone software tools. The protocol of model building and screening begins with the generation of multiple structure–sequence alignments produced with T-COFFEE, followed by homology model building using MODELLER, molecular replacement with AMoRe and model refinement based on CNS. As a result, CaspR provides a progress report in the form of hierarchically organized summary sheets that describe the different stages of the computation with an increasing level of detail. For the 10 highest-scoring potential solutions, pre-refined structures are made available for download in PDB format. Results already obtained with CaspR and reported on the web server suggest that such a strategy significantly increases the fraction of protein structures which may be solved by MR. Moreover, even in situations where standard MR yields a solution, pre-refined homology models produced by CaspR significantly reduce the time-consuming refinement process. We expect this automated procedure to have a significant impact on the throughput of large-scale structural genomics projects. CaspR is freely available at http://igs-server.cnrs-mrs.fr/Caspr/.
The challenge in protein structure prediction using homology modeling is the lack of reliable methods to refine the low resolution homology models. Unconstrained all-atom molecular dynamics (MD) does not serve well for structure refinement due to its limited conformational search. We have developed and tested the constrained MD method, based on the Generalized Newton-Euler Inverse Mass Operator (GNEIMO) algorithm for protein structure refinement. In this method, the high-frequency degrees of freedom are replaced with hard holonomic constraints and a protein is modeled as a collection of rigid body clusters connected by flexible torsional hinges. This allows larger integration time steps and enhances the conformational search space. In this work, we have demonstrated the use of a constraint free GNEIMO method for protein structure refinement that starts from low-resolution decoy sets derived from homology methods. In the eight proteins with three decoys for each, we observed an improvement of ~2 Å in the RMSD to the known experimental structures of these proteins. The GNEIMO method also showed enrichment in the population density of native-like conformations. In addition, we demonstrated structural refinement using a “Freeze and Thaw” clustering scheme with the GNEIMO framework as a viable tool for enhancing localized conformational search. We have derived a robust protocol based on the GNEIMO replica exchange method for protein structure refinement that can be readily extended to other proteins and possibly applicable for high throughput protein structure refinement.
Constrained MD; GNEIMO; Structure Refinement; Decoys
The quality of model structures generated by contemporary protein structure prediction methods strongly depends on the degree of similarity between the target and available template structures. Therefore, the importance of improving template-based model structures beyond the accuracy available from template information has been emphasized in the structure prediction community. The GalaxyRefine web server, freely available at http://galaxy.seoklab.org/refine, is based on a refinement method that has been successfully tested in CASP10. The method first rebuilds side chains and performs side-chain repacking and subsequent overall structure relaxation by molecular dynamics simulation. According to the CASP10 assessment, this method showed the best performance in improving the local structure quality. The method can improve both global and local structure quality on average, when used for refining the models generated by state-of-the-art protein structure prediction servers.
A new module, Guided Ligand Replacement (GLR), has been developed in Phenix to increase the ease and success rate of ligand placement when prior protein-ligand complexes are available.
The process of iterative structure-based drug design involves the X-ray crystal structure determination of upwards of 100 ligands with the same general scaffold (i.e. chemotype) complexed with very similar, if not identical, protein targets. In conjunction with insights from computational models and assays, this collection of crystal structures is analyzed to improve potency, to achieve better selectivity and to reduce liabilities such as absorption, distribution, metabolism, excretion and toxicology. Current methods for modeling ligands into electron-density maps typically do not utilize information on how similar ligands bound in related structures. Even if the electron density is of sufficient quality and resolution to allow de novo placement, the process can take considerable time as the size, complexity and torsional degrees of freedom of the ligands increase. A new module, Guided Ligand Replacement (GLR), was developed in Phenix to increase the ease and success rate of ligand placement when prior protein–ligand complexes are available. At the heart of GLR is an algorithm based on graph theory that associates atoms in the target ligand with analogous atoms in the reference ligand. Based on this correspondence, a set of coordinates is generated for the target ligand. GLR is especially useful in two situations: (i) modeling a series of large, flexible, complicated or macrocyclic ligands in successive structures and (ii) modeling ligands as part of a refinement pipeline that can automatically select a reference structure. Even in those cases for which no reference structure is available, if there are multiple copies of the bound ligand per asymmetric unit GLR offers an efficient way to complete the model after the first ligand has been placed. In all of these applications, GLR leverages prior knowledge from earlier structures to facilitate ligand placement in the current structure.
ligand placement; guided ligand-replacement method; GLR
Motivation: Single-particle cryo electron microscopy (cryoEM) typically produces density maps of macromolecular assemblies at intermediate to low resolution (∼5–30 Å). By fitting high-resolution structures of assembly components into these maps, pseudo-atomic models can be obtained. Optimizing the quality-of-fit of all components simultaneously is challenging due to the large search space that makes the exhaustive search over all possible component configurations computationally unfeasible.
Results: We developed an efficient mathematical programming algorithm that simultaneously fits all component structures into an assembly density map. The fitting is formulated as a point set matching problem involving several point sets that represent component and assembly densities at a reduced complexity level. In contrast to other point matching algorithms, our algorithm is able to match multiple point sets simultaneously and not only based on their geometrical equivalence, but also based on the similarity of the density in the immediate point neighborhood. In addition, we present an efficient refinement method based on the Iterative Closest Point registration algorithm. The integer quadratic programming method generates an assembly configuration in a few seconds. This efficiency allows the generation of an ensemble of candidate solutions that can be assessed by an independent scoring function. We benchmarked the method using simulated density maps of 11 protein assemblies at 20 Å, and an experimental cryoEM map at 23.5 Å resolution. Our method was able to generate assembly structures with root-mean-square errors <6.5 Å, which have been further reduced to <1.8 Å by the local refinement procedure.
Availability: The program is available upon request as a Matlab code package.
Contact: email@example.com and firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics Online.
Modeling advances using Rosetta structure prediction to aid in solving difficult molecular-replacement problems are discussed.
Recent work has shown the effectiveness of structure-prediction methods in solving difficult molecular-replacement problems. The Rosetta protein structure modeling suite can aid in the solution of difficult molecular-replacement problems using templates from 15 to 25% sequence identity; Rosetta refinement guided by noisy density has consistently led to solved structures where other methods fail. In this paper, an overview of the use of Rosetta for these difficult molecular-replacement problems is provided and new modeling developments that further improve model quality are described. Several variations to the method are introduced that significantly reduce the time needed to generate a model and the sampling required to improve the starting template. The improvements are benchmarked on a set of nine difficult cases and it is shown that this improved method obtains consistently better models in less running time. Finally, strategies for best using Rosetta to solve difficult molecular-replacement problems are presented and future directions for the role of structure-prediction methods in crystallography are discussed.
structure prediction; molecular replacement; model building
A new real-space refinement method for low-resolution X-ray crystallography is presented. The method is based on the molecular dynamics flexible fitting protocol targeted at addressing large-scale deformations of the search model to achieve refinement with minimal manual intervention. An explanation of the method is provided, augmented by results from the refinement of both synthetic and experimental low-resolution data, including an independent electrophysiological verification of the xMDFF-refined crystal structure of a voltage-sensor protein.
X-ray crystallography remains the most dominant method for solving atomic structures. However, for relatively large systems, the availability of only medium-to-low-resolution diffraction data often limits the determination of all-atom details. A new molecular dynamics flexible fitting (MDFF)-based approach, xMDFF, for determining structures from such low-resolution crystallographic data is reported. xMDFF employs a real-space refinement scheme that flexibly fits atomic models into an iteratively updating electron-density map. It addresses significant large-scale deformations of the initial model to fit the low-resolution density, as tested with synthetic low-resolution maps of d-ribose-binding protein. xMDFF has been successfully applied to re-refine six low-resolution protein structures of varying sizes that had already been submitted to the Protein Data Bank. Finally, via systematic refinement of a series of data from 3.6 to 7 Å resolution, xMDFF refinements together with electrophysiology experiments were used to validate the first all-atom structure of the voltage-sensing protein Ci-VSP.
xMDFF; molecular dynamics flexible fitting