All-atom models derived from moderate-resolution protein crystal structures contain a high frequency of close nonbonded contacts, independent of the major refinement program used for structure determination. All-atom refinement with PrimeX corrects many of these problematic interactions, producing models that are better suited for use in computational chemistry and related applications.
All-atom models are essential for many applications in molecular modeling and computational chemistry. Nonbonded atomic contacts much closer than the sum of the van der Waals radii of the two atoms (clashes) are commonly observed in such models derived from protein crystal structures. A set of 94 recently deposited protein structures in the resolution range 1.5–2.8 Å was analyzed for clashes by the addition of all H atoms to the models followed by optimization and energy minimization of the positions of just these H atoms. The results were compared with the same set of structures after automated all-atom refinement with PrimeX and with nonbonded contacts in protein crystal structures at a resolution equal to or better than 0.9 Å. The additional PrimeX refinement produced structures with reasonable summary geometric statistics and similar Rfree values to the original structures. The frequency of clashes at less than 0.8 times the sum of van der Waals radii was reduced over fourfold compared with that found in the original structures, to a level approaching that found in the ultrahigh-resolution structures. Moreover, severe clashes at less than or equal to 0.7 times the sum of atomic radii were reduced 15-fold. All-atom refinement with PrimeX produced improved crystal structure models with respect to nonbonded contacts and yielded changes in structural details that dramatically affected the interpretation of some protein–ligand interactions.
H atoms; van der Waals radii; restraints; nonbonded contacts; clashes; molecular geometry; model quality; force fields; refinement; riding H atoms; electrostatics; hydrogen bonds
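The clash criteria quoted above (a contact closer than 0.8 times the sum of the van der Waals radii, and a severe clash at 0.7 times or less) can be illustrated with a minimal sketch. The radii table uses approximate Bondi-style values chosen for illustration, since the abstract does not say which radii were used. The function names and the `bonded` exclusion set are hypothetical, and this is not the PrimeX procedure.

```python
import math
from itertools import combinations

# Approximate Bondi-style van der Waals radii in angstroms.
# Assumption: the abstract does not state which radii set was used.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def contact_ratio(elem_a, xyz_a, elem_b, xyz_b):
    """Interatomic distance divided by the sum of the two vdW radii."""
    return math.dist(xyz_a, xyz_b) / (VDW_RADII[elem_a] + VDW_RADII[elem_b])


def count_clashes(atoms, bonded=frozenset()):
    """Count clashes (ratio < 0.8) and severe clashes (ratio <= 0.7).

    atoms  : list of (element, (x, y, z)) tuples
    bonded : set of frozenset({i, j}) index pairs to skip (hypothetical
             caller-supplied exclusions, e.g. covalently linked atoms)
    """
    counts = {"clash": 0, "severe": 0}
    for (i, (ea, pa)), (j, (eb, pb)) in combinations(enumerate(atoms), 2):
        if frozenset((i, j)) in bonded:
            continue
        ratio = contact_ratio(ea, pa, eb, pb)
        if ratio < 0.8:
            counts["clash"] += 1
        if ratio <= 0.7:
            counts["severe"] += 1  # severe clashes are a subset of clashes
    return counts


if __name__ == "__main__":
    demo = [("C", (0.0, 0.0, 0.0)), ("O", (2.5, 0.0, 0.0)), ("N", (0.0, 2.0, 0.0))]
    print(count_clashes(demo))
```

The all-pairs loop is quadratic in the number of atoms; a real structure would need a spatial grid or neighbor list, which is omitted here for brevity.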
I-TASSER is an automated pipeline for protein tertiary structure prediction using multiple threading alignments and iterative structure assembly simulations. In the CASP9 experiments, two new algorithms, QUARK and FG-MD, were added to the I-TASSER pipeline to improve the structural modeling accuracy. QUARK is a de novo structure prediction algorithm used for structure modeling of proteins that lack detectable template structures. For distantly homologous targets, QUARK models are found to be useful as reference structures for selecting good threading alignments and guiding the I-TASSER structure assembly simulations. FG-MD is an atomic-level structural refinement program that uses structural fragments collected from PDB structures to guide molecular dynamics simulation and improve the local structure of the predicted model, including hydrogen-bonding networks, torsion angles and steric clashes. Despite considerable progress in both template-based and template-free structure modeling, significant improvements in protein target classification, domain parsing, model selection, and ab initio folding of beta-proteins are still needed to further improve the I-TASSER pipeline.
protein structure prediction; threading; contact prediction; ab initio folding; CASP
Several applications in biology—e.g., incorporation of protein flexibility in ligand docking algorithms, interpretation of fuzzy X-ray crystallographic data, and homology modeling—require computing the internal parameters of a flexible fragment (usually, a loop) of a protein in order to connect its termini to the rest of the protein without causing any steric clash inside the loop and with the rest of the protein. One must often sample many such conformations in order to explore and adequately represent the conformational range of the studied loop. While sampling must be fast, it is made difficult by the fact that two conflicting constraints—kinematic closure and clash avoidance—must be satisfied concurrently. This paper describes two efficient and complementary sampling algorithms to explore the space of closed clash-free conformations of a flexible protein loop. The “seed sampling” algorithm samples broadly from this space, while the “deformation sampling” algorithm uses seed conformations as starting points to explore the conformation space around them at a finer grain. Computational results are presented for various loops ranging from 5 to 25 residues. More specific results also show that the combination of the sampling algorithms with a functional site prediction software (FEATURE) makes it possible to compute and recognize calcium-binding loop conformations. The sampling algorithms are implemented in a toolkit, called LoopTK, which is available at https://simtk.org/home/looptk.
Protein kinematics; protein loop structure; conformation sampling; deformation sampling; inverse kinematics; calcium-binding proteins
Non-covalent interactions hold the key to understanding many chemical, biological, and technological problems. Describing these non-covalent interactions accurately, including their positions in real space, constitutes a first step in the process of decoupling the complex balance of forces that define non-covalent interactions. Because of the size of macromolecules, the most common approach has been to assign van der Waals interactions (vdW), steric clashes (SC), and hydrogen bonds (HBs) based on pairwise distances between atoms according to their van der Waals radii. We recently developed an alternative perspective, derived from the electronic density: the Non-Covalent Interactions (NCI) index [J. Am. Chem. Soc. 2010, 132, 6498]. This index has the dual advantages of being generally transferable to diverse chemical applications and being very fast to compute, since it can be calculated from promolecular densities. Thus, NCI analysis is applicable to large systems, including proteins and DNA, where analysis of non-covalent interactions is of great potential value. Here, we describe the NCI computational algorithms and their implementation for the analysis and visualization of weak interactions, using both self-consistent fully quantum-mechanical, as well as promolecular, densities. A wide range of options for tuning the range of interactions to be plotted is also presented. To demonstrate the capabilities of our approach, several examples are given from organic, inorganic, solid state, and macromolecular chemistry, including cases where NCI analysis gives insight into unconventional chemical bonding. The NCI code and its manual are available for download at http://www.chem.duke.edu/~yang/software.htm
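For orientation, the NCI index is usually expressed through the reduced density gradient of the electron density. The form below is the standard definition from the density-functional literature, not a quotation from this abstract. The promolecular density is written as a sum of free-atom densities, with the symbols \rho_A^{\mathrm{free}} and \mathbf{R}_A introduced here for illustration.

```latex
s(\mathbf{r}) = \frac{1}{2\,(3\pi^{2})^{1/3}}\,
                \frac{\lvert \nabla\rho(\mathbf{r}) \rvert}{\rho(\mathbf{r})^{4/3}},
\qquad
\rho^{\mathrm{pro}}(\mathbf{r}) = \sum_{A} \rho_{A}^{\mathrm{free}}(\mathbf{r} - \mathbf{R}_{A})
```

Regions where both s and \rho are small are the ones typically associated with non-covalent contacts. Evaluating s on \rho^{\mathrm{pro}} avoids a self-consistent calculation, which is what makes the analysis fast for large systems.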
Protein structure prediction approaches usually perform modeling simulations based on a reduced representation of protein structures. For biological applications, constructing full-atomic models from the reduced structure decoys is an important step. Most current full-atomic model reconstruction procedures have defects: they either fail to completely remove the steric clashes among backbone atoms or generate final atomic models whose topology is less similar to the native structure than that of the reduced models. In this work, we develop a new protocol, called REMO, to generate full-atomic protein models by optimizing the hydrogen-bonding network with basic fragments matched from a newly constructed backbone isomer library of solved protein structures. The algorithm is benchmarked on 230 non-homologous proteins with reduced structure decoys generated by I-TASSER simulations. The results show that REMO removes steric clashes effectively while retaining the good topology of the reduced model. The hydrogen-bonding network of the final models is dramatically improved during the procedure. The REMO algorithm was used in the recent CASP8 experiment, which demonstrated significant improvements of the I-TASSER models in both atomic-level structural refinement and hydrogen-bonding network construction.
Protein structure prediction; reduced modeling; protein structure refinement; hydrogen-bonding network; structure clustering; steric clash
For template-based modeling in the CASP8 Critical Assessment of Techniques for Protein Structure Prediction, this work develops and applies six new full-model metrics. They are designed to complement and add value to the traditional template-based assessment by GDT (Global Distance Test) and related scores (based on multiple superpositions of Cα atoms between target structure and predictions labeled “model 1”). The new metrics evaluate each predictor group on each target, using all atoms of their best model with above-average GDT. Two metrics evaluate how “protein-like” the predicted model is: the MolProbity score used for validating experimental structures, and a mainchain reality score using all-atom steric clashes, bond length and angle outliers, and backbone dihedrals. Four other new metrics evaluate match of model to target for mainchain and sidechain hydrogen bonds, sidechain end positioning, and sidechain rotamers. Group-average Z-score across the six full-model measures is averaged with group-average GDT Z-score to produce the overall ranking for full-model, high-accuracy performance.
Separate assessments are reported for specific aspects of predictor-group performance, such as robustness of approximately correct template or fold identification, and self-scoring ability at identifying the best of their models. Fold identification is distinct from but correlated with group-average GDT Z-score if target difficulty is taken into account, while self-scoring is done best by servers and is uncorrelated with GDT performance. Outstanding individual models on specific targets are identified and discussed. Predictor groups excelled at different aspects, highlighting the diversity of current methodologies. However, good full-model scores correlate robustly with high Cα accuracy.
homology modeling; protein structure prediction; all-atom contacts; full-model assessment
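The ranking rule described above (average the six full-model Z-scores per group, then average that with the group's GDT Z-score) can be sketched as follows. The group names and numbers are invented for illustration, the function names are hypothetical, and the assessors' actual handling of outliers or missing targets is not reproduced.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert raw scores to Z-scores; returns zeros if there is no spread."""
    mu = mean(values)
    sd = pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def overall_ranking(full_model_z, gdt_z):
    """Rank groups by the mean of (average full-model Z, GDT Z).

    full_model_z : {group: [six group-average Z-scores, one per metric]}
    gdt_z        : {group: group-average GDT Z-score}
    Returns a list of (group, combined score), best first.
    """
    combined = {
        group: (mean(metrics) + gdt_z[group]) / 2.0
        for group, metrics in full_model_z.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical data: three groups, six full-model metrics each.
    full = {"A": [1.0] * 6, "B": [0.0] * 6, "C": [-1.0] * 6}
    gdt = {"A": 0.0, "B": 2.0, "C": -1.0}
    for group, score in overall_ranking(full, gdt):
        print(group, round(score, 2))
```

Weighting the full-model average and the GDT Z-score equally follows the wording of the abstract; any other weighting would be a different scheme.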
Three-dimensional RNA models fitted into crystallographic density maps exhibit pervasive conformational ambiguities, geometric errors, and steric clashes. To address these problems, we present Enumerative Real-space Refinement ASsisted by Electron density under Rosetta (ERRASER), coupled to PHENIX (Python-based Hierarchical Environment for Integrated Xtallography) diffraction-based refinement. On 24 datasets, ERRASER automatically corrects the majority of MolProbity-assessed errors, improves average Rfree factor, resolves functionally important discrepancies in non-canonical structure, and refines low-resolution models to better match higher resolution models.
Although accurate details in RNA structure are of great importance for understanding RNA function, the backbone conformation is difficult to determine, and most existing RNA structures show serious steric clashes (≥ 0.4Å overlap) when hydrogen atoms are taken into account. We have developed a program called RNABC (RNA Backbone Correction) that performs local perturbations to search for alternative conformations that avoid those steric clashes or other local geometry problems. Its input is an all-atom coordinate file for an RNA crystal structure (usually from the MolProbity web service), with problem areas specified. RNABC rebuilds a suite (the unit from sugar to sugar) by anchoring the phosphorus and base positions, which are clearest in crystallographic electron density, and reconstructing the other atoms using forward kinematics. Geometric parameters are constrained within user-specified tolerance of canonical or original values, and torsion angles are constrained to ranges defined through empirical database analyses. Several optimizations reduce the time required to search the many possible conformations. The output results are clustered and presented to the user, who can choose whether to accept one of the alternative conformations.
Two test evaluations show the effectiveness of RNABC, first on the S-motifs from 42 RNA structures, and second on the worst problem suites (clusters of bad clashes, or serious sugar pucker outliers) in 25 unrelated RNA structures. Among the 101 S-motifs, 88 had diagnosed problems, and RNABC produced clash-free conformations with acceptable geometry for 71 of those (about 80%). For the 154 worst problem suites, RNABC proposed alternative conformations for 72. All but 8 of those were judged acceptable after examining electron density (where available) and local conformation. Thus, even for these worst cases, nearly half the time RNABC suggested corrections suitable to initiate further crystallographic refinement. The program is available from http://kinemage.biochem.duke.edu.
kinematic chain; RNA backbone conformation; RNA backbone adjustment; RNA crystallography; automated rebuilding; steric clash; S-motifs; all-atom contacts; structure validation
We present a new protein–protein docking approach to model heterodimeric structures based on the conformations of the monomeric units. The conventional modeling method relies on superimposing two monomeric structures onto the crystal structure of a homologous protein dimer. The resulting structure may exhibit severe backbone clashes at the dimeric interface, depending on the backbone dissimilarity between the target and template proteins. Our method overcomes the backbone clashing problem and requires no a priori knowledge of the dimeric structure of a homologous protein. Here we used human Cystic Fibrosis Transmembrane conductance Regulator (CFTR), a chloride channel whose dysfunction causes cystic fibrosis, for illustration. The two intracellular nucleotide-binding domains (NBDs) of CFTR control the opening and closing of the channel. Yet the structure of CFTR’s NBD1–NBD2 complex has not been experimentally determined. Thus, correct modeling of this heterodimeric structure is valuable for understanding CFTR functions and would have potential applications in drug design for cystic fibrosis treatment. Based on the crystal structure of human CFTR’s NBD1, we constructed a model of the NBD1–NBD2 complex. The constructed model is consistent with the dimeric mode observed in the crystal structures of other ABC transporters. To verify our structural model, an ATP substrate was docked into the nucleotide-binding site. The predicted binding mode is consistent with related crystallographic findings and CFTR functional studies. Finally, genistein, an agent that enhances CFTR activity by an unclear mechanism, was docked to the model. Our predictions agreed with genistein’s bell-shaped dose-response relationship. Potential mutagenesis experiments were proposed for understanding the potentiation mechanism of genistein and for providing useful information for drug design targeting CFTR.
The method used in this study can be applied to modeling studies of other dimeric protein structures.
Molecular modeling; Molecular docking; CFTR
Motivation: Increasing use of structural modeling for understanding structure–function relationships in proteins has led to the need to ensure that the protein models being used are of acceptable quality. Quality of a given protein structure can be assessed by comparing various intrinsic structural properties of the protein to those observed in high-resolution protein structures.
Results: In this study, we present tools to compare a given structure to high-resolution crystal structures. We assess packing by calculating the total void volume, the percentage of unsatisfied hydrogen bonds, the number of steric clashes and the scaling of the accessible surface area. We assess covalent geometry by determining bond lengths, angles, dihedrals and rotamers. The statistical parameters for the above measures, obtained from high-resolution crystal structures enable us to provide a quality-score that points to specific areas where a given protein structural model needs improvement.
Availability and Implementation: We provide these tools that appraise protein structures in the form of a web server Gaia (http://chiron.dokhlab.org). Gaia evaluates the packing and covalent geometry of a given protein structure and provides quantitative comparison of the given structure to high-resolution crystal structures.
Supplementary information: Supplementary data are available at Bioinformatics online.
Recently, it has been shown by Calladine (1982) and Dickerson (1983) that DNA distortions due to steric clashes between opposing purines and pyrimidines can be quantitated based upon four sum functions. The distortions involve helical twist, roll, torsion angle variations and propeller twist. It is the contention of the authors that these perturbations in structure act as information carriers for various external DNA interactions. This paper describes a system that incorporates these four rules and various other functions that permit the systematic interactive exploration for significant patterns as a consequence of these steric clashes.
The RosettaAntibody server (http://antibody.graylab.jhu.edu) predicts the structure of an antibody variable region given the amino-acid sequences of the respective light and heavy chains. In an initial stage, the server identifies and displays the most sequence homologous template structures for the light and heavy framework regions and each of the complementarity determining region (CDR) loops. Subsequently, the most homologous templates are assembled into a side-chain optimized crude model, and the server returns a picture and coordinate file. For users requesting a high-resolution model, the server executes the full RosettaAntibody protocol which additionally models the hyper-variable CDR H3 loop. The high-resolution protocol also relieves steric clashes by optimizing the CDR backbone torsion angles and by simultaneously perturbing the relative orientation of the light and heavy chains. RosettaAntibody generates 2000 independent structures, and the server returns pictures, coordinate files, and detailed scoring information for the 10 top-scoring models. The 10 models enable users to use rational judgment in choosing the best model or to use the set as an ensemble for further studies such as docking. The high-resolution models generated by RosettaAntibody have been used for the successful prediction of antibody–antigen complex structures.
The ability to manipulate protein binding affinities is important for the development of proteins as biosensors, industrial reagents, and therapeutics. We have developed a structure-based method to rationally predict single mutations at protein-protein interfaces that enhance binding affinities. The protocol is based on the premise that increasing buried hydrophobic surface area and/or reducing buried hydrophilic surface area will generally lead to enhanced affinity if large steric clashes are not introduced and buried polar groups are not left without a hydrogen bond partner. The procedure selects affinity enhancing point mutations at the protein-protein interface using three criteria: 1) the mutation must be from a polar amino acid to a non-polar amino acid or from a non-polar amino acid to a larger non-polar amino acid, 2) the free energy of binding as calculated with the Rosetta protein modeling program should be more favorable than the free energy of binding calculated for the wild type complex and 3) the mutation should not be predicted to significantly destabilize the monomers. The Rosetta energy function emphasizes short-range interactions: steric repulsion, Van der Waals forces, hydrogen bonding, and an implicit solvation model that penalizes placing atoms adjacent to polar groups. The performance of the computational protocol was experimentally tested on two separate protein complexes; Gαi1 from the heterotrimeric G-protein system bound to the RGS14 GoLoco motif, and the E2, UbcH7, bound to the E3, E6AP from the ubiquitin pathway. 12 single-site mutations that were predicted to be stabilizing were synthesized and characterized in the laboratory. 9 of the 12 mutations successfully increased binding affinity with 5 of these increasing binding by over 1.0 kcal/mol. To further assess our approach we searched the literature for point mutations that pass our criteria and have experimentally determined binding affinities. 
Of the 8 mutations identified, 5 were accurately predicted to increase binding affinity, further validating the method as a useful tool to increase protein-protein binding affinities.
Computational Protein Design; Protein-Protein Interactions; Protein Binding Hotspots; Rosetta Molecular Modeling Software; Hydrophobic Effect
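The three selection criteria listed above can be read as a simple filter over candidate point mutations. The sketch below assumes the energies have already been computed elsewhere (for example with Rosetta) and are passed in as plain numbers. The nonpolar residue set, the use of side-chain heavy-atom counts as a proxy for "larger", and the `monomer_tol` threshold are all illustrative assumptions, not values taken from the study.

```python
# Side-chain heavy-atom counts for residues treated as nonpolar here.
# Assumption: both the membership of this set and the size proxy are
# illustrative; the study's own classification may differ.
NONPOLAR_SIZE = {"G": 0, "A": 1, "V": 3, "P": 3, "L": 4, "I": 4, "M": 4, "F": 7, "W": 10}


def passes_criteria(wild_type, mutant, ddg_bind, ddg_monomer, monomer_tol=1.0):
    """Apply the three criteria to one candidate mutation.

    ddg_bind    : computed binding free energy of mutant minus wild type
                  (negative means more favorable binding)
    ddg_monomer : computed change in monomer stability (positive means
                  destabilizing); monomer_tol is a hypothetical cutoff
    """
    if mutant not in NONPOLAR_SIZE:
        return False  # criterion 1: the new residue must be nonpolar
    if wild_type in NONPOLAR_SIZE:
        # nonpolar -> nonpolar is allowed only if the new side chain is larger
        if NONPOLAR_SIZE[mutant] <= NONPOLAR_SIZE[wild_type]:
            return False
    if ddg_bind >= 0.0:
        return False  # criterion 2: binding must be predicted more favorable
    return ddg_monomer <= monomer_tol  # criterion 3: monomer not destabilized


if __name__ == "__main__":
    # Hypothetical candidates: (wild type, mutant, ddG_bind, ddG_monomer)
    candidates = [("S", "L", -0.8, 0.2), ("L", "A", -0.5, 0.0), ("A", "F", -1.2, 0.3)]
    for wt, mut, bind, mono in candidates:
        print(wt, mut, passes_criteria(wt, mut, bind, mono))
```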
KOSMOS is the first online morph server to be able to address the structural dynamics of DNA/RNA, proteins and even their complexes, such as ribosomes. The key functions of KOSMOS are the harmonic and anharmonic analyses of macromolecules. In the harmonic analysis, normal mode analysis (NMA) based on an elastic network model (ENM) is performed, yielding vibrational modes and B-factor calculations, which provide insight into the potential biological functions of macromolecules based on their structural features. Anharmonic analysis involving elastic network interpolation (ENI) is used to generate plausible transition pathways between two given conformations by optimizing a topology-oriented cost function that guarantees a smooth transition without steric clashes. The quality of the computed pathways is evaluated based on their various facets, including topology, energy cost and compatibility with the NMA results. There are also two unique features of KOSMOS that distinguish it from other morph servers: (i) the versatility in the coarse-graining methods and (ii) the various connection rules in the ENM. The models enable us to analyze macromolecular dynamics with the maximum degrees of freedom by combining a variety of ENMs from full-atom to coarse-grained, backbone and hybrid models with one connection rule, such as distance-cutoff, number-cutoff or chemical-cutoff. KOSMOS is available at http://bioengineering.skku.ac.kr/kosmos.
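As a rough illustration of a distance-cutoff connection rule, the sketch below builds the connectivity (Kirchhoff) matrix of a Gaussian-network-style elastic model from node coordinates. This is a simplified stand-in chosen for brevity, not the KOSMOS implementation: the 7.0 Å default cutoff, the uniform spring constant, and the function name are assumptions, and the eigen-decomposition that would yield the normal modes is left out.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Connectivity matrix for a distance-cutoff elastic network.

    coords : list of (x, y, z) node positions (e.g. one node per residue)
    cutoff : nodes closer than or equal to this distance are connected
             (assumed default; not taken from the abstract)
    Off-diagonal entries are -1 for connected pairs and 0 otherwise;
    each diagonal entry is the number of neighbors of that node.
    """
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


if __name__ == "__main__":
    # Four nodes on a line, 3.8 angstroms apart (a typical C-alpha spacing).
    chain = [(3.8 * k, 0.0, 0.0) for k in range(4)]
    for row in kirchhoff_matrix(chain):
        print(row)
```

Each row of the matrix sums to zero, which reflects the fact that a rigid translation of the whole network costs no energy. A number-cutoff or chemical-cutoff rule would change only the test that decides whether two nodes are connected.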
Protein structure determination and predictive modeling have long been guided by the paradigm that the peptide backbone has a single, context-independent ideal geometry. Both quantum-mechanics calculations and empirical analyses have shown this is an incorrect simplification in that backbone covalent geometry actually varies systematically as a function of the Φ and Ψ backbone dihedral angles. Here, we use a nonredundant set of ultrahigh-resolution protein structures to define these conformation-dependent variations. The trends have a rational, structural basis that can be explained by avoidance of atomic clashes or optimization of favorable electrostatic interactions. To facilitate adoption of this new paradigm, we have created a conformation-dependent library of covalent bond lengths and bond angles and shown that it has improved accuracy over existing methods without any additional variables to optimize. Protein structures derived from both crystallographic refinement and predictive modeling stand to benefit from incorporation of the new paradigm.
Near-native selections from docking decoys have proved challenging especially when unbound proteins are used in the molecular docking. One reason is that significant atomic clashes in docking decoys lead to poor predictions of binding affinities of near native decoys. Atomic clashes can be removed by structural refinement through energy minimization. Such an energy minimization, however, will lead to an unrealistic bias toward docked structures with large interfaces. Here, we extend an empirical energy function developed for protein design to protein–protein docking selection by introducing a simple reference state that removes the unrealistic dependence of binding affinity of docking decoys on the buried solvent accessible surface area of interface. The energy function called EMPIRE (EMpirical Protein-InteRaction Energy), when coupled with a refinement strategy, is found to provide a significantly improved success rate in near native selections when applied to RosettaDock and refined ZDOCK docking decoys. Our work underlines the importance of removing nonspecific interactions from specific ones in near native selections from docking decoys.
knowledge-based potential; energy score functions; reference state; binding affinity; docking decoys
Caspases, a family of apoptotic proteases, are increasingly recognized as being extensively phosphorylated, usually leading to inactivation. To date, no structural mechanism for phosphorylation-based caspase inactivation is available, although this information may be key to achieving caspase-specific inhibition. Caspase-6 has recently been implicated in neurodegenerative conditions including Huntington's and Alzheimer's diseases. A full understanding of caspase-6 regulation is crucial to caspase-6-specific inhibition. Caspase-6 is phosphorylated by ARK5 kinase at serine 257 leading to suppression of cell death via caspase-6 inhibition. Our structure of the fully inactive phosphomimetic S257D reveals that phosphorylation results in a steric clash with P201 in the L2′ loop. Removal of the proline side chain alleviates the clash resulting in nearly wild-type activity levels. This phosphomimetic-mediated steric clash causes misalignment of the substrate-binding groove, preventing substrate binding. Substrate-binding loop misalignment appears to be a widely used regulatory strategy among caspases and may present a new paradigm for caspase-specific control.
We describe an array of gaps in an antiparallel four-helix bundle structure, the cytoplasmic domains of bacterial chemoreceptors. For a given helix, the side chain interactions that define the helix’s position are analyzed in terms of residue interfaces, the most important of which are a-a, g-g, d-d, g-d, and a-d. It was found that the interdigitation of the side groups does not entirely fill the space along the long axis of the structure, which results in a rather regular array of gaps. A simulated piston motion of helix CD1 along the helical axis direction by 1.2Å shows that 85% of the side chain interactions still satisfy van der Waals criteria, while the remaining clashes could be avoided by small rotations of side chains. Therefore, two states could exist in the structure, related by a piston motion. Analysis of the crystal structure of a small four-helix bundle, the P1short domain of CheA in Thermotoga maritima, reveals that the two coexisting states related by a 1.3-1.7Å piston motion are defined by the same mechanism. This two-state model is a plausible candidate mechanism for the long distance signal transduction in bacterial chemoreceptors and is qualitatively consistent with literature chemoreceptor mutagenesis results. Such a mechanism could exist in many other structures with interdigitating α-helices.
Four-helix bundle; chemoreceptors; dynamics; signal transduction
KinDOCK is a new web server for the analysis of ATP-binding sites of protein kinases. This characterization is based on the docking of ligands already co-crystallized with other protein kinases. A structural library of protein kinase–ligand complexes has been extracted from the Protein Data Bank (PDB). This library can provide both potential ligands and their putative binding orientation for a given protein kinase. After protein–protein structural superposition, the ligands are transferred from the template complexes to the target protein kinase. The resulting complexes are evaluated using the program SCORE to compute a theoretical affinity. They can be dynamically visualized to allow a rapid mapping of important steric clashes and potential substitutions relevant for specificity and affinity. These characteristics allow a quick characterization of protein kinase active sites including conformation changes potentially required to accommodate particular ligands. Additionally, promising pharmacophores can be identified in the focussed library. These features will help to rationalize or optimize virtual screening (VS) on larger chemical compound libraries. The server and its documentation are freely available at .
The NMSim web server implements a three-step approach for multiscale modeling of protein conformational changes. First, the protein structure is coarse-grained using the FIRST software. Second, a rigid cluster normal-mode analysis provides low-frequency normal modes. Third, these modes are used to extend the recently introduced idea of constrained geometric simulations by biasing backbone motions of the protein, whereas side chain motions are biased toward favorable rotamer states (NMSim). The generated structures are iteratively corrected regarding steric clashes and stereochemical constraint violations. The approach allows performing three simulation types: unbiased exploration of conformational space; pathway generation by a targeted simulation; and radius of gyration-guided simulation. On a data set of proteins with experimentally observed conformational changes, the NMSim approach has been shown to be a computationally efficient alternative to molecular dynamics simulations for conformational sampling of proteins. The generated conformations and pathways of conformational transitions can serve as input to docking approaches or more sophisticated sampling techniques. The web server output is a trajectory of generated conformations, Jmol representations of the coarse-graining and a subset of the trajectory and data plots of structural analyses. The NMSim webserver, accessible at http://www.nmsim.de, is free and open to all users with no login requirement.
Recombinant antibodies are of profound clinical significance; yet, anti-carbohydrate antibodies are prone to undesirable cross-reactivity with structurally related glycans. Here we introduce a new technology called Computational Carbohydrate Grafting (CCG), which enables a virtual library of glycans to be assessed for protein binding specificity, and employ it to define the scope and structural origin of the binding specificity of antibody JAA-F11 for glycans containing the Thomsen-Friedenreich (TF) human tumor antigen. A virtual library of the entire human glycome (GLibrary-3D) was constructed, from which 1,182 TF-containing human glycans were identified and assessed for their ability to fit into the antibody combining site. The glycans were categorized into putative binders, or non-binders, on the basis of steric clashes with the antibody surface. The analysis employed a structure of the immune complex, generated by docking the TF-disaccharide (Galβ1-3GalNAcα) into a crystal structure of the JAA-F11 antigen binding fragment, which was shown to be consistent with saturation transfer difference (STD) NMR data. The specificities predicted by CCG were fully consistent with data from experimental glycan array screening, and confirmed that the antibody is selective for the TF-antigen and certain extended core-2 type mucins. Additionally, the CCG analysis identified a limited number of related putative binding motifs, and provided a structural basis for interpreting the specificity. CCG can be utilized to facilitate clinical applications through the determination of the three-dimensional interaction of glycans with proteins, thus augmenting drug and vaccine development techniques that seek to optimize the specificity and affinity of neutralizing proteins, which target glycans associated with diseases including cancer and HIV.
The title molecule, C22H17BrN2O4S, has a twisted U shape, the dihedral angle between the quinazolin-4-one and bromobenzene ring systems being 46.25 (8)°. In order to avoid steric clashes with adjacent substituents on the quinazolin-4-one ring, the N-bound tolyl group occupies an orthogonal position [dihedral angle = 89.59 (8)°]. In the crystal, molecules are connected into a three-dimensional architecture by C—H⋯O interactions, with the ketone O atom accepting two such bonds and a sulfonate O atom one.
We present a method for the computer-based iterative assembly of native-like tertiary structures of helical proteins from alpha-helical fragments. For any pair of helices, our method, called MATCHSTIX, first generates an ensemble of possible relative orientations of the helices with various ways to form hydrophobic contacts between them. Those conformations having steric clashes, or a large radius of gyration of hydrophobic residues, or with helices too far separated to be connected by the intervening linking region, are discarded. Then, we attempt to connect the two helical fragments by using a robotics-based loop-closure algorithm. When loop closure is feasible, the algorithm generates an ensemble of viable interconnecting loops. After energy minimization and clustering, we use a representative set of conformations for further assembly with the remaining helices, adding one helix at a time. To efficiently sample the conformational space, the order of assembly generally proceeds from the pair of helices connected by the shortest loop, followed by joining one of its adjacent helices, always proceeding with the shorter connecting loop. We tested MATCHSTIX on 28 helical proteins each containing up to 5 helices and found it to heavily sample native-like conformations. The average RMSD of the best conformations for the 17 helix-bundle proteins that have 2 or 3 helices is less than 2 Å; errors increase somewhat for proteins containing more helices. Native-like states are even more densely sampled when disulfide bonds are known and imposed as restraints. We conclude that, at least for helical proteins, if the secondary structures are known, this rapid rigid-body maximization of hydrophobic interactions can lead to small ensembles of highly native-like structures. It may be useful for protein structure prediction.
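One of the filters mentioned above, discarding helix-pair conformations whose hydrophobic residues have a large radius of gyration, can be sketched as follows. This is an illustrative sketch rather than the MATCHSTIX code: the function names, the equal weighting of all points, and the threshold value are assumptions.

```python
from math import sqrt

def radius_of_gyration(points):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    mean_sq = sum(
        (p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2 for p in points
    ) / n
    return sqrt(mean_sq)

def keep_compact(conformations, max_rg):
    """Keep conformations whose hydrophobic-residue positions are compact.

    Each conformation is a list of hydrophobic-residue coordinates; max_rg is a
    hypothetical threshold in angstroms, not a value taken from the paper.
    """
    return [c for c in conformations if radius_of_gyration(c) <= max_rg]

# Toy example: one compact and one extended arrangement.
compact = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
kept = keep_compact([compact, extended], max_rg=2.0)
```

Here the compact pair has a radius of gyration of 1 Å and is retained, while the extended pair (5 Å) is discarded.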
Segmentation and docking are useful methods for discovery of molecular components in cryo-EM (Electron Cryo-Microscopy) density maps of macromolecular complexes. In this paper, we describe the segmentation and docking methods implemented in Segger. For 12 targets included in the 2010 Cryo-EM Modeling Challenge, we segmented the regions corresponding to individual molecular components using Segger. We then used the segmented regions to guide rigid-body docking of individual components. An assessment of the accuracy of the component segmentation of the targets based on Segger and other methods was made by comparing the docking results of individual components to the segmented regions. The docking results were evaluated by comparison to published structures and by calculation of several scores including atom inclusion, density occupancy, and geometry clash.
cryo-EM; segmentation; map based modeling
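As an illustration of one of the scores named in the Segger abstract above, atom inclusion is commonly understood as the fraction of model atoms that lie in density at or above a chosen threshold. The sketch below follows that common reading and is not Segger's implementation: the function name, the nested-list grid indexed as density[i][j][k] with its origin at (0, 0, 0), and the nearest-voxel lookup (no interpolation) are assumptions.

```python
def atom_inclusion(atoms, density, voxel_size, threshold):
    """Fraction of atoms whose nearest voxel has density >= threshold.

    atoms: list of (x, y, z) coordinates in angstroms.
    density: nested list indexed [i][j][k] along x, y, z (assumed layout).
    Atoms falling outside the grid count as not included.
    """
    if not atoms:
        return 0.0
    nx, ny, nz = len(density), len(density[0]), len(density[0][0])
    inside = 0
    for x, y, z in atoms:
        # Nearest-voxel lookup; a real tool would typically interpolate.
        i, j, k = (round(c / voxel_size) for c in (x, y, z))
        if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
            if density[i][j][k] >= threshold:
                inside += 1
    return inside / len(atoms)

# Toy 2 x 2 x 2 map with hypothetical density values.
grid = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
grid[0][0][0] = 1.0
grid[1][0][0] = 0.2
atoms = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (5.0, 0.0, 0.0)]
score = atom_inclusion(atoms, grid, voxel_size=1.0, threshold=0.5)
```

In this toy case only the first atom sits in density above the threshold, so one of the three atoms is counted as included.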
Iron is required for virulence of most bacterial pathogens, many of which rely on siderophores, small-molecule chelators, to scavenge iron in mammalian hosts. As an immune response, the human protein Siderocalin binds both apo and ferric siderophores in order to intercept delivery of iron to the bacterium, impeding virulence. The introduction of steric clashes into the siderophore structure is an important mechanism of evading sequestration. However, in the absence of steric incompatibilities, electrostatic interactions determine the strength of siderophore binding by Siderocalin. By using a series of isosteric enterobactin analogs, the contribution of electrostatic interactions, including both charge-charge and cation-π, to the recognition of 2,3-catecholate siderophores has been deconvoluted. The analogs used in the study incorporate a systematic combination of 2,3-catecholamide (CAM) and N-hydroxypyridinonate (1,2-HOPO) binding units on a tris(2-aminoethyl)amine backbone, [tren(CAM)m(1,2-HOPO)n where m = 0, 1, 2, or 3 and n = 3 – m]. The shape complementarity of the synthetic analog series was determined through small molecule crystallography and the binding interactions were investigated through a fluorescence-based binding assay. These results were modeled and correlated through ab initio calculations of the electrostatic properties of the binding units. Although all the analogs are accommodated in the binding pocket of Siderocalin, the ferric complexes incorporating decreasing numbers of CAM units are bound with decreasing affinities (Kd > 600 nM for m = 0, and Kd = 43, 0.8 and 0.3 nM for m = 1, 2 and 3, respectively). These results elucidate the role of electrostatics in the mechanism of siderophore recognition by Siderocalin.
Siderocalin; siderophore; iron; virulence; innate immunity; cation-π