All-atom models derived from moderate-resolution protein crystal structures contain a high frequency of close nonbonded contacts, independent of the major refinement program used for structure determination. All-atom refinement with PrimeX corrects many of these problematic interactions, producing models that are better suited for use in computational chemistry and related applications.
All-atom models are essential for many applications in molecular modeling and computational chemistry. Nonbonded atomic contacts much closer than the sum of the van der Waals radii of the two atoms (clashes) are commonly observed in such models derived from protein crystal structures. A set of 94 recently deposited protein structures in the resolution range 1.5–2.8 Å were analyzed for clashes by the addition of all H atoms to the models followed by optimization and energy minimization of the positions of just these H atoms. The results were compared with the same set of structures after automated all-atom refinement with PrimeX and with nonbonded contacts in protein crystal structures at a resolution equal to or better than 0.9 Å. The additional PrimeX refinement produced structures with reasonable summary geometric statistics and similar Rfree values to the original structures. The frequency of clashes at less than 0.8 times the sum of van der Waals radii was reduced over fourfold compared with that found in the original structures, to a level approaching that found in the ultrahigh-resolution structures. Moreover, severe clashes at less than or equal to 0.7 times the sum of atomic radii were reduced 15-fold. All-atom refinement with PrimeX produced improved crystal structure models with respect to nonbonded contacts and yielded changes in structural details that dramatically affected the interpretation of some protein–ligand interactions.
H atoms; van der Waals radii; restraints; nonbonded contacts; clashes; molecular geometry; model quality; force fields; refinement; riding H atoms; electrostatics; hydrogen bonds
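The clash criterion described above (a nonbonded contact shorter than 0.8 times the sum of the two van der Waals radii) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the radii table, the function name `find_clashes` and the way bonded pairs are supplied are hypothetical choices for this sketch, not the procedure or parameters used in the study.

```python
import math
from itertools import combinations

# Illustrative van der Waals radii in Å (an assumption for this sketch,
# not the radii used in the study).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def find_clashes(atoms, bonded, ratio=0.8):
    """Return (i, j, distance, distance / radius_sum) for every nonbonded
    pair closer than `ratio` times the sum of its van der Waals radii.

    atoms  -- list of (element, (x, y, z)) tuples, coordinates in Å
    bonded -- set of frozenset({i, j}) index pairs to skip (covalently
              linked atoms; 1-3 and 1-4 pairs would also belong here)
    """
    clashes = []
    for i, j in combinations(range(len(atoms)), 2):
        if frozenset((i, j)) in bonded:
            continue
        (elem_i, pos_i), (elem_j, pos_j) = atoms[i], atoms[j]
        radius_sum = VDW_RADII[elem_i] + VDW_RADII[elem_j]
        distance = math.dist(pos_i, pos_j)
        if distance < ratio * radius_sum:
            clashes.append((i, j, distance, distance / radius_sum))
    return clashes


# Toy example: a C-H bond (skipped) with an oxygen placed too close.
atoms = [("C", (0.0, 0.0, 0.0)), ("H", (1.09, 0.0, 0.0)),
         ("O", (2.0, 0.0, 0.0)), ("N", (6.0, 0.0, 0.0))]
clashes = find_clashes(atoms, bonded={frozenset((0, 1))})
```

Calling the function again with `ratio=0.7` would select the more severe contacts discussed in the abstract (the abstract uses "less than or equal to" for that threshold, whereas this sketch uses a strict comparison throughout).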
Protein loops, the flexible short segments connecting two stable secondary
structural units in proteins, play a critical role in protein structure and
function. Constructing chemically sensible conformations of protein loops that
seamlessly bridge the gap between the anchor points without introducing any
steric collisions remains an open challenge. A variety of algorithms have been
developed to tackle the loop closure problem, ranging from inverse kinematics to
knowledge-based approaches that utilize pre-existing fragments extracted from
known protein structures. However, many of these approaches focus on the
generation of conformations that mainly satisfy the fixed end point condition,
leaving the steric constraints to be resolved in subsequent post-processing
steps. In the present work, we describe a simple solution that simultaneously
satisfies not only the end point and steric conditions, but also chirality and
planarity constraints. Starting from random initial atomic coordinates, each
individual conformation is generated independently by using a simple alternating
scheme of pairwise distance adjustments of randomly chosen atoms, followed by
fast geometric matching of the conformationally rigid components of the
constituent amino acids. The method is conceptually simple, numerically stable
and computationally efficient. Very importantly, additional constraints, such as
those derived from NMR experiments, hydrogen bonds or salt bridges, can be
incorporated into the algorithm in a straightforward and inexpensive way, making
the method ideal for solving more complex multi-loop problems. The remarkable
performance and robustness of the algorithm are demonstrated on a set of protein
loops of length 4, 8, and 12 that have been used in previous studies.
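The pairwise distance adjustment step can be illustrated with a small Python sketch. It covers only that half of the alternating scheme: a randomly chosen atom pair is moved along its connecting line whenever its distance violates a lower or upper bound. The rigid-fragment matching step, chirality and planarity handling are omitted, and the function names, the full-correction update and the toy bounds are assumptions made for illustration rather than details taken from the paper.

```python
import math
import random


def adjust_distances(coords, bounds, n_steps=20000, seed=0):
    """Repeatedly pick a random constraint and, if the pair distance lies
    outside [lower, upper], move both atoms along their connecting line
    until the nearest bound is met.

    coords -- list of [x, y, z] lists, modified in place
    bounds -- list of (i, j, lower, upper) distance constraints in Å
    """
    rng = random.Random(seed)
    for _ in range(n_steps):
        i, j, lower, upper = rng.choice(bounds)
        d = math.dist(coords[i], coords[j])
        if d < 1e-9 or lower <= d <= upper:
            continue  # coincident atoms or constraint already satisfied
        target = lower if d < lower else upper
        # Each atom takes half of the correction, in opposite directions.
        shift = 0.5 * (target - d) / d
        for k in range(3):
            delta = shift * (coords[i][k] - coords[j][k])
            coords[i][k] += delta
            coords[j][k] -= delta
    return coords


def max_violation(coords, bounds):
    """Largest amount (Å) by which any distance constraint is violated."""
    worst = 0.0
    for i, j, lower, upper in bounds:
        d = math.dist(coords[i], coords[j])
        worst = max(worst, lower - d, d - upper)
    return worst


# Toy problem: three atoms with two fixed distances and one distance range.
init_rng = random.Random(1)
coords = [[init_rng.uniform(0.0, 10.0) for _ in range(3)] for _ in range(3)]
bounds = [(0, 1, 1.5, 1.5), (1, 2, 1.5, 1.5), (0, 2, 2.4, 2.6)]
adjust_distances(coords, bounds)
```

End-point and steric conditions both fit this representation: fixed anchor geometry becomes tight distance bounds, and clash avoidance becomes a lower bound with a large upper bound.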
Protein loops play an important role in protein function, such as ligand binding,
recognition, and allosteric regulation. However, due to their flexibility, it is
notoriously difficult to determine their 3D structures using traditional
experimental techniques. As a result, one can often find protein structures with
missing loops in the Protein Data Bank. Their sequence variability also presents
a particular challenge for homology modeling methods, which can only yield good
overall structures given sufficient sequence identity and good experimental
reference structures. Despite extensive research, the construction of protein
loop 3D structures remains an open problem, since a sensible conformation should
seamlessly bridge the anchor points without introducing steric clashes within
the loop itself or between the loop and its surrounding environment. Here, we
present a conceptually simple, mathematically straightforward, numerically
robust and computationally efficient approach for building protein loop
conformations that simultaneously satisfy end-point, steric, planar and chiral
constraints. More importantly, additional constraints derived from experimental
sources can be incorporated in a straightforward manner, allowing the processing
of more complex structures involving multiple interlocking loops.
Protein structure prediction approaches usually perform modeling simulations based on a reduced representation of protein structures. For biological applications, constructing full atomic models from the reduced structure decoys is an important step. Most current full-atomic model reconstruction procedures have shortcomings: they either cannot completely remove the steric clashes among backbone atoms or generate final atomic models whose topology is less similar to the native structure than that of the reduced models. In this work, we develop a new protocol, called REMO, to generate full atomic protein models by optimizing the hydrogen-bonding network with basic fragments matched from a newly constructed backbone isomer library of solved protein structures. The algorithm is benchmarked on 230 non-homologous proteins with reduced structure decoys generated by I-TASSER simulations. The results show that REMO has a significant ability to remove steric clashes while retaining the good topology of the reduced model. The hydrogen-bonding network of the final models is dramatically improved during the procedure. The REMO algorithm was used in the recent CASP8 experiment, which demonstrated significant improvements of the I-TASSER models in both atomic-level structural refinement and hydrogen-bonding network construction.
Protein structure prediction; reduced modeling; protein structure refinement; hydrogen-bonding network; structure clustering; steric clash
Although accurate details in RNA structure are of great importance for understanding RNA function, the backbone conformation is difficult to determine, and most existing RNA structures show serious steric clashes (≥ 0.4 Å overlap) when hydrogen atoms are taken into account. We have developed a program called RNABC (RNA Backbone Correction) that performs local perturbations to search for alternative conformations that avoid those steric clashes or other local geometry problems. Its input is an all-atom coordinate file for an RNA crystal structure (usually from the MolProbity web service), with problem areas specified. RNABC rebuilds a suite (the unit from sugar to sugar) by anchoring the phosphorus and base positions, which are clearest in crystallographic electron density, and reconstructing the other atoms using forward kinematics. Geometric parameters are constrained within user-specified tolerance of canonical or original values, and torsion angles are constrained to ranges defined through empirical database analyses. Several optimizations reduce the time required to search the many possible conformations. The output results are clustered and presented to the user, who can choose whether to accept one of the alternative conformations.
Two test evaluations show the effectiveness of RNABC, first on the S-motifs from 42 RNA structures, and second on the worst problem suites (clusters of bad clashes, or serious sugar pucker outliers) in 25 unrelated RNA structures. Among the 101 S-motifs, 88 had diagnosed problems, and RNABC produced clash-free conformations with acceptable geometry for 71 of those (about 80%). For the 154 worst problem suites, RNABC proposed alternative conformations for 72. All but 8 of those were judged acceptable after examining electron density (where available) and local conformation. Thus, even for these worst cases, nearly half the time RNABC suggested corrections suitable to initiate further crystallographic refinement. The program is available from http://kinemage.biochem.duke.edu.
kinematic chain; RNA backbone conformation; RNA backbone adjustment; RNA crystallography; automated rebuilding; steric clash; S-motifs; all-atom contacts; structure validation
Motivation: Increasing use of structural modeling for understanding structure–function relationships in proteins has led to the need to ensure that the protein models being used are of acceptable quality. Quality of a given protein structure can be assessed by comparing various intrinsic structural properties of the protein to those observed in high-resolution protein structures.
Results: In this study, we present tools to compare a given structure to high-resolution crystal structures. We assess packing by calculating the total void volume, the percentage of unsatisfied hydrogen bonds, the number of steric clashes and the scaling of the accessible surface area. We assess covalent geometry by determining bond lengths, angles, dihedrals and rotamers. The statistical parameters for the above measures, obtained from high-resolution crystal structures enable us to provide a quality-score that points to specific areas where a given protein structural model needs improvement.
Availability and Implementation: We provide these tools that appraise protein structures in the form of a web server Gaia (http://chiron.dokhlab.org). Gaia evaluates the packing and covalent geometry of a given protein structure and provides quantitative comparison of the given structure to high-resolution crystal structures.
Supplementary information: Supplementary data are available at Bioinformatics online.
The ability to manipulate protein binding affinities is important for the development of proteins as biosensors, industrial reagents, and therapeutics. We have developed a structure-based method to rationally predict single mutations at protein-protein interfaces that enhance binding affinities. The protocol is based on the premise that increasing buried hydrophobic surface area and/or reducing buried hydrophilic surface area will generally lead to enhanced affinity if large steric clashes are not introduced and buried polar groups are not left without a hydrogen bond partner. The procedure selects affinity-enhancing point mutations at the protein-protein interface using three criteria: 1) the mutation must be from a polar amino acid to a non-polar amino acid or from a non-polar amino acid to a larger non-polar amino acid, 2) the free energy of binding as calculated with the Rosetta protein modeling program should be more favorable than the free energy of binding calculated for the wild type complex and 3) the mutation should not be predicted to significantly destabilize the monomers. The Rosetta energy function emphasizes short-range interactions: steric repulsion, van der Waals forces, hydrogen bonding, and an implicit solvation model that penalizes placing atoms adjacent to polar groups. The performance of the computational protocol was experimentally tested on two separate protein complexes: Gαi1 from the heterotrimeric G-protein system bound to the RGS14 GoLoco motif, and the E2, UbcH7, bound to the E3, E6AP, from the ubiquitin pathway. Twelve single-site mutations that were predicted to be stabilizing were synthesized and characterized in the laboratory. Nine of the 12 mutations successfully increased binding affinity, with 5 of these increasing binding by over 1.0 kcal/mol. To further assess our approach we searched the literature for point mutations that pass our criteria and have experimentally determined binding affinities. 
Of the 8 mutations identified, 5 were accurately predicted to increase binding affinity, further validating the method as a useful tool to increase protein-protein binding affinities.
Computational Protein Design; Protein-Protein Interactions; Protein Binding Hotspots; Rosetta Molecular Modeling Software; Hydrophobic Effect
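The three selection criteria listed above amount to a simple filter over candidate mutations, sketched below in Python. The energies are assumed to have been computed elsewhere (for example with Rosetta) and passed in as plain numbers. The residue classification, the side-chain volumes used to decide "larger", the `Mutation` class and the `max_monomer_ddg` threshold are all hypothetical choices made for illustration, not values or interfaces from the paper or from Rosetta.

```python
from dataclasses import dataclass

# Illustrative residue classes and approximate side-chain volumes in Å^3
# (assumptions chosen for this sketch, not taken from the paper).
NONPOLAR_VOLUME = {"G": 60.1, "A": 88.6, "V": 140.0, "L": 166.7,
                   "I": 166.7, "M": 162.9, "F": 189.9, "W": 227.8}
POLAR = set("STNQDEKRHYC")


@dataclass
class Mutation:
    wild_type: str      # one-letter code of the original residue
    mutant: str         # one-letter code of the replacement residue
    ddg_binding: float  # mutant minus wild-type binding energy, kcal/mol
    ddg_monomer: float  # mutant minus wild-type monomer energy, kcal/mol


def passes_filter(m, max_monomer_ddg=1.0):
    """Apply the three selection criteria to one candidate mutation.

    max_monomer_ddg is a hypothetical tolerance for monomer destabilization.
    """
    if m.mutant not in NONPOLAR_VOLUME:
        return False
    # Criterion 1: polar -> non-polar, or non-polar -> larger non-polar.
    polar_to_nonpolar = m.wild_type in POLAR
    larger_nonpolar = (
        m.wild_type in NONPOLAR_VOLUME
        and NONPOLAR_VOLUME[m.mutant] > NONPOLAR_VOLUME[m.wild_type]
    )
    criterion_1 = polar_to_nonpolar or larger_nonpolar
    # Criterion 2: binding predicted to be more favorable than wild type.
    criterion_2 = m.ddg_binding < 0.0
    # Criterion 3: monomer not predicted to be significantly destabilized.
    criterion_3 = m.ddg_monomer <= max_monomer_ddg
    return criterion_1 and criterion_2 and criterion_3


candidates = [
    Mutation("S", "L", -1.2, 0.3),  # polar -> non-polar, favorable
    Mutation("A", "V", -0.5, 0.0),  # non-polar -> larger non-polar
    Mutation("L", "A", -0.5, 0.0),  # non-polar -> smaller: rejected
    Mutation("S", "L", 0.4, 0.0),   # binding less favorable: rejected
    Mutation("S", "L", -1.0, 2.5),  # destabilizes monomer: rejected
]
selected = [m for m in candidates if passes_filter(m)]
```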
No universal strategy exists for humanizing mouse antibodies, and most approaches are based on primary sequence alignment and
grafting. Although this strategy theoretically decreases the immunogenicity of mouse antibodies, it addresses neither
conformational changes nor steric clashes that arise due to grafting of human germline frameworks to accommodate mouse CDR
regions. To address these issues, we created and tested a structure-based biologic design approach using a de novo homology
model to aid in the humanization of 17 unique mouse antibodies. Our approach included building a structure-based de novo
homology model from the primary mouse antibody sequence, mutation of the mouse framework residues to the closest human
germline sequence and energy minimization by simulated annealing on the humanized homology model. Certain residues
displayed force field errors and revealed steric clashes upon closer examination. Therefore, further mutations were introduced to
rationally correct these errors. In conclusion, use of de novo antibody homology modeling together with simulated annealing
improved the ability to predict conformational and steric clashes that may arise due to conversion of a mouse antibody into the
humanized form and would prevent its neutralization when administered in vivo. This design provides a robust path towards the
development of a universal strategy for humanization of mouse antibodies using computationally derived antibody homologous
antibodies; antibody humanization; antibody engineering; antibody design; structure-based homology model; simulated annealing; PIGS; Rosetta
The I-TASSER algorithm for protein 3D structure prediction was tested in CASP8, with the procedure fully automated in both the Server and Human sections. The quality of the server models is close to that of the human ones, but incorporating more diverse templates from other servers improves the results of human predictions in the distant homology category. For the first time, the sequence-based contact predictions from machine learning techniques are found helpful for both template-based modeling (TBM) and template-free modeling (FM). In TBM, although the average accuracy of the sequence-based contact predictions is lower than that from template-based ones, the novel contacts in the sequence-based predictions, which are complementary to the threading templates in the weakly aligned or unaligned regions, are important for improving the global and local packing of these regions. Moreover, the newly developed atomic structural refinement algorithm was tested in CASP8 and found to improve the hydrogen-bonding networks and the overall TM-score, mainly because of its ability to remove steric clashes so that the models can be generated from cluster centroids. Nevertheless, one of the major issues of the I-TASSER pipeline is model selection: the best models could not be appropriately recognized when the correct templates were detected by only a minority of the threading algorithms. There are also problems related to domain splitting and mirror-image recognition, which mainly influence the performance of I-TASSER modeling in the FM-based structure predictions.
Protein structure prediction; threading; I-TASSER; CASP8; contact prediction; free modeling
Common structural biology methods (e.g., NMR and molecular dynamics) often produce ensembles of molecular structures. Consequently, averaging of 3D coordinates of molecular structures (proteins and RNA) is a frequent approach to obtain a consensus structure that is representative of the ensemble. However, when the structures are averaged, artifacts can result in unrealistic local geometries, including unphysical bond lengths and angles.
Herein, we describe a method to derive representative structures while limiting the number of artifacts. Our approach is based on a Monte Carlo simulation technique that drives a starting structure (an extended or a 'close-by' structure) towards the 'averaged structure' using a harmonic pseudo energy function. To assess the performance of the algorithm, we applied our approach to Cα models of 1364 proteins generated by the TASSER structure prediction algorithm. The average RMSD of the refined model from the native structure for the set becomes worse by a mere 0.08 Å compared to the average RMSD of the averaged structures from the native structure (3.28 Å for refined structures and 3.36 Å for the averaged structures). However, the percentage of atoms involved in clashes is greatly reduced (from 63% to 1%); in fact, the majority of the refined proteins had zero clashes. Moreover, a small number (38) of refined structures resulted in lower RMSD to the native protein versus the averaged structure. Finally, compared to PULCHRA, our approach produces representative structures of similar RMSD quality, but with far fewer clashes.
The benchmarking results demonstrate that our approach for removing averaging artifacts can be very beneficial for the structural biology community. Furthermore, the same approach can be applied to almost any problem where averaging of 3D coordinates is performed. For example, structure averaging is also commonly performed in RNA secondary structure prediction, which could also benefit from our approach.
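A minimal Python sketch of the idea follows: a Metropolis Monte Carlo walk pulls a physically reasonable starting chain towards the averaged coordinates through a harmonic pseudo-energy, while rejecting any move that breaks a geometric requirement. Everything specific here is an assumption for illustration, including the single-atom move set, the hard window on consecutive Cα distances, the temperature and step size, and the function names; the published method may differ in all of these.

```python
import math
import random


def pseudo_energy(coords, target, k=1.0):
    """Harmonic pseudo-energy: k times the summed squared distance to target."""
    return k * sum(math.dist(a, b) ** 2 for a, b in zip(coords, target))


def bonds_ok(coords, low=3.6, high=4.0):
    """True if every consecutive pair lies within the distance window (Å)."""
    return all(low <= math.dist(coords[i], coords[i + 1]) <= high
               for i in range(len(coords) - 1))


def refine_toward_average(start, target, n_steps=5000, step=0.3,
                          temperature=0.5, seed=0):
    """Perturb one atom at a time; reject moves that break the bond window
    and accept the remainder with the Metropolis criterion."""
    rng = random.Random(seed)
    coords = [list(p) for p in start]
    energy = pseudo_energy(coords, target)
    for _ in range(n_steps):
        i = rng.randrange(len(coords))
        old = coords[i][:]
        coords[i] = [c + rng.uniform(-step, step) for c in old]
        new_energy = pseudo_energy(coords, target)
        delta = new_energy - energy
        accept = bonds_ok(coords) and (
            delta <= 0 or rng.random() < math.exp(-delta / temperature))
        if accept:
            energy = new_energy
        else:
            coords[i] = old  # restore the previous position
    return coords, energy


# Toy example: an extended chain with 3.8 Å spacing is driven towards an
# "averaged" chain whose 3.0 Å spacing is an unphysical averaging artifact.
start = [(3.8 * i, 0.0, 0.0) for i in range(5)]
target = [(3.0 * i, 2.0, 0.0) for i in range(5)]
refined, final_energy = refine_toward_average(start, target)
```

Because moves that violate the distance window are never accepted, the refined chain approaches the averaged coordinates only as far as the geometric requirement allows, which is how the artifact is kept out of the result.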
Several applications in biology—e.g., incorporation of protein flexibility in ligand docking algorithms, interpretation of fuzzy X-ray crystallographic data, and homology modeling—require computing the internal parameters of a flexible fragment (usually, a loop) of a protein in order to connect its termini to the rest of the protein without causing any steric clash inside the loop and with the rest of the protein. One must often sample many such conformations in order to explore and adequately represent the conformational range of the studied loop. While sampling must be fast, it is made difficult by the fact that two conflicting constraints—kinematic closure and clash avoidance—must be satisfied concurrently. This paper describes two efficient and complementary sampling algorithms to explore the space of closed clash-free conformations of a flexible protein loop. The “seed sampling” algorithm samples broadly from this space, while the “deformation sampling” algorithm uses seed conformations as starting points to explore the conformation space around them at a finer grain. Computational results are presented for various loops ranging from 5 to 25 residues. More specific results also show that the combination of the sampling algorithms with a functional site prediction software (FEATURE) makes it possible to compute and recognize calcium-binding loop conformations. The sampling algorithms are implemented in a toolkit, called LoopTK, which is available at https://simtk.org/home/looptk.
Protein kinematics; protein loop structure; conformation sampling; deformation sampling; inverse kinematics; calcium-binding proteins
I-TASSER is an automated pipeline for protein tertiary structure prediction using multiple threading alignments and iterative structure assembly simulations. In the CASP9 experiments, two new algorithms, QUARK and FG-MD, were added to the I-TASSER pipeline to improve the structural modeling accuracy. QUARK is a de novo structure prediction algorithm used for structure modeling of proteins that lack detectable template structures. For distantly homologous targets, QUARK models are found useful as a reference structure for selecting good threading alignments and guiding the I-TASSER structure assembly simulations. FG-MD is an atomic-level structural refinement program that uses structural fragments collected from the PDB structures to guide molecular dynamics simulation and improve the local structure of the predicted model, including hydrogen-bonding networks, torsion angles and steric clashes. Despite considerable progress in both template-based and template-free structure modeling, significant improvements in protein target classification, domain parsing, model selection, and ab initio folding of beta-proteins are still needed to further improve the I-TASSER pipeline.
protein structure prediction; threading; contact prediction; ab initio folding; CASP
Non-covalent interactions hold the key to understanding many chemical, biological, and technological problems. Describing these non-covalent interactions accurately, including their positions in real space, constitutes a first step in the process of decoupling the complex balance of forces that define non-covalent interactions. Because of the size of macromolecules, the most common approach has been to assign van der Waals interactions (vdW), steric clashes (SC), and hydrogen bonds (HBs) based on pairwise distances between atoms according to their van der Waals radii. We recently developed an alternative perspective, derived from the electronic density: the Non-Covalent Interactions (NCI) index [J. Am. Chem. Soc. 2010, 132, 6498]. This index has the dual advantages of being generally transferable to diverse chemical applications and being very fast to compute, since it can be calculated from promolecular densities. Thus, NCI analysis is applicable to large systems, including proteins and DNA, where analysis of non-covalent interactions is of great potential value. Here, we describe the NCI computational algorithms and their implementation for the analysis and visualization of weak interactions, using both self-consistent fully quantum-mechanical, as well as promolecular, densities. A wide range of options for tuning the range of interactions to be plotted is also presented. To demonstrate the capabilities of our approach, several examples are given from organic, inorganic, solid state, and macromolecular chemistry, including cases where NCI analysis gives insight into unconventional chemical bonding. The NCI code and its manual are available for download at http://www.chem.duke.edu/~yang/software.htm
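For orientation, the quantities behind the NCI index can be written compactly. The expressions below are the standard definitions from the NCI literature, stated here as background rather than quoted from this abstract: the reduced density gradient $s$ computed from the electron density $\rho$, and the promolecular density built as a sum of free-atom densities $\rho_A^{\mathrm{free}}$.

```latex
s(\mathbf{r}) = \frac{1}{2\,(3\pi^{2})^{1/3}}\,
                \frac{\lvert \nabla \rho(\mathbf{r}) \rvert}{\rho(\mathbf{r})^{4/3}},
\qquad
\rho^{\mathrm{pro}}(\mathbf{r}) = \sum_{A} \rho_{A}^{\mathrm{free}}(\mathbf{r})
```

Regions of low density and low $s$ mark non-covalent contacts, and the sign of the second eigenvalue $\lambda_2$ of the density Hessian, combined with $\rho$, is commonly used to separate attractive contacts such as hydrogen bonds ($\lambda_2 < 0$) from steric clashes ($\lambda_2 > 0$).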
RCrane is a new tool for the partially automated building of RNA crystallographic models into electron-density maps of low or intermediate resolution. This tool helps crystallographers to place phosphates and bases into electron density and then automatically predicts and builds the detailed all-atom structure of the traced nucleotides.
RNA crystals typically diffract to much lower resolutions than protein crystals. This low-resolution diffraction results in unclear density maps, which cause considerable difficulties during the model-building process. These difficulties are exacerbated by the lack of computational tools for RNA modeling. Here, RCrane, a tool for the partially automated building of RNA into electron-density maps of low or intermediate resolution, is presented. This tool works within Coot, a common program for macromolecular model building. RCrane helps crystallographers to place phosphates and bases into electron density and then automatically predicts and builds the detailed all-atom structure of the traced nucleotides. RCrane then allows the crystallographer to review the newly built structure and select alternative backbone conformations where desired. This tool can also be used to automatically correct the backbone structure of previously built nucleotides. These automated corrections can fix incorrect sugar puckers, steric clashes and other structural problems.
RCrane; RNA model building
Hydrogen atoms constitute nearly half of all atoms in proteins, and their positions are essential for analyzing hydrogen-bonding interactions and refining atomic-level structures. However, most protein structures determined by experiments or computer prediction lack hydrogen coordinates. We present a new algorithm, HAAD, to predict the positions of hydrogen atoms based on the positions of heavy atoms. The algorithm is built on the basic rules of orbital hybridization followed by the optimization of steric repulsion and electrostatic interactions. We tested the algorithm using three independent data sets: ultra-high-resolution X-ray structures, structures determined by neutron diffraction, and NOE proton-proton distances. Compared with the widely used programs CHARMM and REDUCE, HAAD has a significantly higher accuracy, with the average RMSD of the predicted hydrogen atoms to the X-ray and neutron diffraction structures decreased by 26% and 11%, respectively. Furthermore, hydrogen atoms placed by HAAD have more matches with the NOE restraints and fewer clashes with heavy atoms. The average CPU cost of HAAD is 18 and 8 times lower than that of CHARMM and REDUCE, respectively. The significant advantage of HAAD in both the accuracy and the speed of hydrogen addition should make HAAD a useful tool for the detailed study of protein structure and function. Both an executable and the source code of HAAD are freely available at http://zhang.bioinformatics.ku.edu/HAAD.
For template-based modeling in the CASP8 Critical Assessment of Techniques for Protein Structure Prediction, this work develops and applies six new full-model metrics. They are designed to complement and add value to the traditional template-based assessment by GDT (Global Distance Test) and related scores (based on multiple superpositions of Cα atoms between target structure and predictions labeled “model 1”). The new metrics evaluate each predictor group on each target, using all atoms of their best model with above-average GDT. Two metrics evaluate how “protein-like” the predicted model is: the MolProbity score used for validating experimental structures, and a mainchain reality score using all-atom steric clashes, bond length and angle outliers, and backbone dihedrals. Four other new metrics evaluate match of model to target for mainchain and sidechain hydrogen bonds, sidechain end positioning, and sidechain rotamers. Group-average Z-score across the six full-model measures is averaged with group-average GDT Z-score to produce the overall ranking for full-model, high-accuracy performance.
Separate assessments are reported for specific aspects of predictor-group performance, such as robustness of approximately correct template or fold identification, and self-scoring ability at identifying the best of their models. Fold identification is distinct from but correlated with group-average GDT Z-score if target difficulty is taken into account, while self-scoring is done best by servers and is uncorrelated with GDT performance. Outstanding individual models on specific targets are identified and discussed. Predictor groups excelled at different aspects, highlighting the diversity of current methodologies. However, good full-model scores correlate robustly with high Cα accuracy.
homology modeling; protein structure prediction; all-atom contacts; full-model assessment
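The way the overall ranking combines scores can be illustrated with a short Python sketch. It is a simplification under stated assumptions: Z-scores are computed here across groups from already averaged values, whereas the assessment describes per-target evaluation followed by group averaging, and the function names, the equal weighting and the "higher is better" orientation of every metric are illustrative choices.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-score of each value relative to the whole set (population SD)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]


def overall_ranking(gdt, full_model):
    """Rank groups by the mean of their GDT Z-score and their average
    Z-score over the full-model metrics.

    gdt        -- {group: group-average GDT}
    full_model -- {group: [metric_1, ..., metric_n]}, higher is better
    """
    groups = sorted(gdt)
    gdt_z = dict(zip(groups, z_scores([gdt[g] for g in groups])))
    n_metrics = len(full_model[groups[0]])
    # One list of Z-scores per metric, in the same group order.
    per_metric = [z_scores([full_model[g][m] for g in groups])
                  for m in range(n_metrics)]
    combined = {}
    for idx, g in enumerate(groups):
        full_z = mean([column[idx] for column in per_metric])
        combined[g] = 0.5 * (gdt_z[g] + full_z)
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy example with three hypothetical groups and two full-model metrics.
gdt = {"A": 80.0, "B": 70.0, "C": 60.0}
full_model = {"A": [3.0, 3.0], "B": [2.0, 2.0], "C": [1.0, 1.0]}
ranking = overall_ranking(gdt, full_model)
```

A metric where lower is better, such as a clash-based score, would need its sign flipped before being passed in.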
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered Rfree and a decreased gap between Rwork and Rfree.
macromolecular crystallography; low resolution; refinement; automation
The RosettaAntibody server (http://antibody.graylab.jhu.edu) predicts the structure of an antibody variable region given the amino-acid sequences of the respective light and heavy chains. In an initial stage, the server identifies and displays the most sequence homologous template structures for the light and heavy framework regions and each of the complementarity determining region (CDR) loops. Subsequently, the most homologous templates are assembled into a side-chain optimized crude model, and the server returns a picture and coordinate file. For users requesting a high-resolution model, the server executes the full RosettaAntibody protocol which additionally models the hyper-variable CDR H3 loop. The high-resolution protocol also relieves steric clashes by optimizing the CDR backbone torsion angles and by simultaneously perturbing the relative orientation of the light and heavy chains. RosettaAntibody generates 2000 independent structures, and the server returns pictures, coordinate files, and detailed scoring information for the 10 top-scoring models. The 10 models enable users to use rational judgment in choosing the best model or to use the set as an ensemble for further studies such as docking. The high-resolution models generated by RosettaAntibody have been used for the successful prediction of antibody–antigen complex structures.
The prediction of loop structures is considered one of the main challenges in the protein folding problem. Regardless of the dependence of the overall algorithm on the protein data bank, the flexibility of loop regions dictates the need for special attention to their structures. In this article, we present algorithms for loop structure prediction with fixed stem and flexible stem geometry. In the flexible stem geometry problem, only the secondary structure of three stem residues on either side of the loop is known. In the fixed stem geometry problem, the structure of the three stem residues on either side of the loop is also known. Initial loop structures are generated using a probability database for the flexible stem geometry problem, and using torsion angle dynamics for the fixed stem geometry problem. Three rotamer optimization algorithms are introduced to alleviate steric clashes between the generated backbone structures and the side chain rotamers. The structures are optimized by energy minimization using an all-atom force field. The optimized structures are clustered using a clustering algorithm based on the traveling salesman problem. The structures in the densest clusters are then utilized to refine dihedral angle bounds on all amino acids in the loop. The entire procedure is carried out for a number of iterations, leading to improved structure prediction and refined dihedral angle bounds. The algorithms presented in this article have been tested on 3190 loops from the PDBSelect25 data set and on targets from the recently concluded CASP9 community-wide experiment.
Protein Structure Prediction; ASTRO-FOLD; all-atom potential; improved bound generation
PROSESS (PROtein Structure Evaluation Suite and Server) is a web server designed to evaluate and validate protein structures generated by X-ray crystallography, NMR spectroscopy or computational modeling. While many structure evaluation packages have been developed over the past 20 years, PROSESS is unique in its comprehensiveness, its capacity to evaluate X-ray, NMR and predicted structures as well as its ability to evaluate a variety of experimental NMR data. PROSESS integrates a variety of previously developed, well-known and thoroughly tested methods to evaluate both global and residue-specific: (i) covalent and geometric quality; (ii) non-bonded/packing quality; (iii) torsion angle quality; (iv) chemical shift quality and (v) NOE quality. In particular, PROSESS uses VADAR for coordinate, packing, H-bond, secondary structure and geometric analysis, GeNMR for calculating folding, threading and solvent energetics, ShiftX for calculating chemical shift correlations, RCI for correlating structure mobility to chemical shift and PREDITOR for calculating torsion angle-chemical shift agreement. PROSESS also incorporates several other programs including MolProbity to assess atomic clashes, Xplor-NIH to identify and quantify NOE restraint violations and NAMD to assess structure energetics. PROSESS produces detailed tables, explanations, structural images and graphs that summarize the results and compare them to values observed in high-quality or high-resolution protein structures. Using a simplified red–amber–green coloring scheme PROSESS also alerts users about both general and residue-specific structural problems. PROSESS is intended to serve as a tool that can be used by structural biologists as well as database curators to assess and validate newly determined protein structures. PROSESS is freely available at http://www.prosess.ca.
Homology models of amidase-03 from Bacillus anthracis were constructed using Modeller (9v2). Modeller constructs protein models using an automated approach for comparative protein structure modeling by the satisfaction of spatial restraints. A template structure of Listeria monocytogenes bacteriophage PSA endolysin PlyPSA (PDB ID: 1XOV) was selected from the Protein Data Bank (PDB) using BLASTp with the BLOSUM62 sequence alignment scoring matrix. We generated five models using the Modeller default routine, in which initial coordinates are randomized and evaluated by pseudo-energy parameters. The protein models were validated using PROCHECK and energy minimized using the steepest descent method in GROMACS 3.2 (flexible SPC water model in cubic box of size 1 Å instead of rigid SPC model). We used the G43a1 force field in GROMACS for energy calculations, and the generated structure was subsequently analyzed using the VMD software for stereochemistry, atomic clashes and misfolding. A detailed analysis of the amidase-03 model structure from Bacillus anthracis will provide insight into the molecular design of suitable inhibitors as drug candidates.
Homology modeling; modeller; amidase-03; hydrolase enzyme; Bacillus anthracis
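The steepest descent minimization mentioned above can be sketched in a few lines. This is a generic illustration on a toy energy function, not the GROMACS implementation: the function name `steepest_descent`, the accept/reject step-size rule, and the factors 1.2 and 0.2 are assumptions chosen for the example.

```python
def steepest_descent(energy, gradient, x0, step=0.01, max_steps=1000, tol=1e-6):
    """Minimize energy(x) by moving along the negative gradient.

    The displacement is normalized by the largest gradient component, so
    `step` is the maximum change of any coordinate in one move. The step
    grows after an accepted move and shrinks after a rejected one.
    """
    x = list(x0)
    e = energy(x)
    for _ in range(max_steps):
        g = gradient(x)
        g_max = max(abs(gi) for gi in g)
        if g_max < tol:
            break  # converged: the largest force component is negligible
        trial = [xi - step * gi / g_max for xi, gi in zip(x, g)]
        e_trial = energy(trial)
        if e_trial < e:
            x, e = trial, e_trial  # accept the move and be bolder next time
            step *= 1.2
        else:
            step *= 0.2  # reject the move and retry with a smaller step
    return x, e


# Toy usage: a harmonic well centered at (1, -2) stands in for a force field.
target = [1.0, -2.0]
toy_energy = lambda x: sum((xi - ti) ** 2 for xi, ti in zip(x, target))
toy_gradient = lambda x: [2.0 * (xi - ti) for xi, ti in zip(x, target)]
minimum, e_min = steepest_descent(toy_energy, toy_gradient, [0.0, 0.0])
```

Steepest descent is robust for relieving bad contacts in a freshly built model but converges slowly near the minimum, which is why it is often used only as a first relaxation stage.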
Three-dimensional RNA models fitted into crystallographic density maps exhibit pervasive conformational ambiguities, geometric errors, and steric clashes. To address these problems, we present Enumerative Real-space Refinement ASsisted by Electron density under Rosetta (ERRASER), coupled to PHENIX (Python-based Hierarchical Environment for Integrated Xtallography) diffraction-based refinement. On 24 datasets, ERRASER automatically corrects the majority of MolProbity-assessed errors, improves average Rfree factor, resolves functionally important discrepancies in non-canonical structure, and refines low-resolution models to better match higher resolution models.
The NMSim web server implements a three-step approach for multiscale modeling of protein conformational changes. First, the protein structure is coarse-grained using the FIRST software. Second, a rigid cluster normal-mode analysis provides low-frequency normal modes. Third, these modes are used to extend the recently introduced idea of constrained geometric simulations by biasing backbone motions of the protein, whereas side chain motions are biased toward favorable rotamer states (NMSim). The generated structures are iteratively corrected regarding steric clashes and stereochemical constraint violations. The approach allows performing three simulation types: unbiased exploration of conformational space; pathway generation by a targeted simulation; and radius of gyration-guided simulation. On a data set of proteins with experimentally observed conformational changes, the NMSim approach has been shown to be a computationally efficient alternative to molecular dynamics simulations for conformational sampling of proteins. The generated conformations and pathways of conformational transitions can serve as input to docking approaches or more sophisticated sampling techniques. The web server output is a trajectory of generated conformations, Jmol representations of the coarse-graining and a subset of the trajectory and data plots of structural analyses. The NMSim webserver, accessible at http://www.nmsim.de, is free and open to all users with no login requirement.
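The radius of gyration that guides one of the simulation types above is a standard measure of compactness. The following is a minimal sketch of how it can be computed from atomic coordinates; the function name `radius_of_gyration` and the choice of unit masses by default are assumptions for illustration, not details of the NMSim implementation.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted root-mean-square distance of atoms from their center of mass.

    coords : list of (x, y, z) tuples
    masses : optional list of atomic masses; unit masses are assumed if omitted
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    # Center of mass, one component at a time.
    center = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    # Mass-weighted sum of squared distances from the center of mass.
    weighted_sq = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)
```

A guided simulation could, for example, favor conformations whose radius of gyration moves toward a chosen target value, which steers sampling toward more compact or more extended states.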
We present a new protein–protein docking approach to model heterodimeric structures based on the conformations of the monomeric units. The conventional modeling method relies on superimposing two monomeric structures onto the crystal structure of a homologous protein dimer. The resulting structure may exhibit severe backbone clashes at the dimeric interface depending on the backbone dissimilarity between the target and template proteins. Our method overcomes the backbone clashing problem and requires no a priori knowledge of the dimeric structure of a homologous protein. Here we used human Cystic Fibrosis Transmembrane conductance Regulator (CFTR), a chloride channel whose dysfunction causes cystic fibrosis, for illustration. The two intracellular nucleotide-binding domains (NBDs) of CFTR control the opening and closing of the channel. Yet, the structure of CFTR’s NBD1–NBD2 complex has not been experimentally determined. Thus, correct modeling of this heterodimeric structure is valuable for understanding CFTR functions and would have potential applications for drug design for cystic fibrosis treatment. Based on the crystal structure of human CFTR’s NBD1, we constructed a model of the NBD1–NBD2 complex. The constructed model is consistent with the dimeric mode observed in the crystal structures of other ABC transporters. To verify our structural model, an ATP substrate was docked into the nucleotide-binding site. The predicted binding mode is consistent with related crystallographic findings and CFTR functional studies. Finally, genistein, an agent that enhances CFTR activity by a mechanism that remains unclear, was docked to the model. Our predictions agreed with genistein’s bell-shaped dose-response relationship. Potential mutagenesis experiments were proposed for understanding the potentiation mechanism of genistein and for informing drug design targeting CFTR. The method used in this study can be applied to modeling studies of other dimeric protein structures.
Molecular modeling; Molecular docking; CFTR
Near-native selections from docking decoys have proved challenging especially when unbound proteins are used in the molecular docking. One reason is that significant atomic clashes in docking decoys lead to poor predictions of binding affinities of near native decoys. Atomic clashes can be removed by structural refinement through energy minimization. Such an energy minimization, however, will lead to an unrealistic bias toward docked structures with large interfaces. Here, we extend an empirical energy function developed for protein design to protein–protein docking selection by introducing a simple reference state that removes the unrealistic dependence of binding affinity of docking decoys on the buried solvent accessible surface area of interface. The energy function called EMPIRE (EMpirical Protein-InteRaction Energy), when coupled with a refinement strategy, is found to provide a significantly improved success rate in near native selections when applied to RosettaDock and refined ZDOCK docking decoys. Our work underlines the importance of removing nonspecific interactions from specific ones in near native selections from docking decoys.
knowledge-based potential; energy score functions; reference state; binding affinity; docking decoys
Protein structure determination and predictive modeling have long been guided by the paradigm that the peptide backbone has a single, context-independent ideal geometry. Both quantum-mechanics calculations and empirical analyses have shown this is an incorrect simplification in that backbone covalent geometry actually varies systematically as a function of the Φ and Ψ backbone dihedral angles. Here, we use a nonredundant set of ultrahigh-resolution protein structures to define these conformation-dependent variations. The trends have a rational, structural basis that can be explained by avoidance of atomic clashes or optimization of favorable electrostatic interactions. To facilitate adoption of this new paradigm, we have created a conformation-dependent library of covalent bond lengths and bond angles and shown that it has improved accuracy over existing methods without any additional variables to optimize. Protein structures derived from both crystallographic refinement and predictive modeling stand to benefit from incorporation of the new paradigm.