Related Articles
Motivation: Metal ions are essential for the folding of RNA molecules into stable tertiary structures and are often involved in the catalytic activity of ribozymes. However, the positions of metal ions in RNA 3D structures are difficult to determine experimentally. This motivated us to develop a computational predictor of metal ion sites for RNA structures.
Results: We developed a statistical potential for predicting positions of metal ions (magnesium, sodium and potassium), based on the analysis of binding sites in experimentally solved RNA structures. The MetalionRNA program is available as a web server that predicts metal ions for RNA structures submitted by the user.
Availability: The MetalionRNA web server is accessible at http://metalionrna.genesilico.pl/.
Contact: iamb@genesilico.pl
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr636
PMCID: PMC3259437
PMID: 22110243
The CCP4 template-restraint library defines restraints for biopolymers, their modifications and ligands that are used in macromolecular structure refinement. JLigand is a graphical editor for generating descriptions of new ligands and covalent linkages.
Biological macromolecules are polymers and therefore the restraints for macromolecular refinement can be subdivided into two sets: restraints that are applied to atoms that all belong to the same monomer and restraints that are associated with the covalent bonds between monomers. The CCP4 template-restraint library contains three types of data entries defining template restraints: descriptions of monomers and their modifications, both used for intramonomer restraints, and descriptions of links for intermonomer restraints. The library provides generic descriptions of modifications and links for protein, DNA and RNA chains, and for some post-translational modifications including glycosylation. Structure-specific template restraints can be defined in a user’s additional restraint library. Here, JLigand, a new CCP4 graphical interface to LibCheck and REFMAC that has been developed to manage the user’s library and generate new monomer entries is described, as well as new entries for links and associated modifications.
doi:10.1107/S090744491200251X
PMCID: PMC3322602
PMID: 22505263
macromolecular refinement; restraint library; molecular graphics
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered Rfree and a decreased gap between Rwork and Rfree.
doi:10.1107/S0907444911047834
PMCID: PMC3322597
PMID: 22505258
macromolecular crystallography; low resolution; refinement; automation
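The 'reference-model' idea in the phenix.refine abstract above, restraining a torsion angle in the working model toward the same angle in a higher-resolution model, can be illustrated with a small sketch. The dihedral formula is the standard atan2 form. The plain harmonic penalty, the `weight` parameter and all function names are assumptions made for illustration, since the abstract does not give the functional form that phenix.refine actually uses.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four 3D points (cis = 0, trans = 180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def wrapped_difference_deg(angle, reference):
    """Signed angular difference mapped into [-180, 180), so 179 and -179 differ by 2."""
    return (angle - reference + 180.0) % 360.0 - 180.0

def reference_torsion_penalty(model_angle, reference_angle, weight=1.0):
    """Hypothetical harmonic penalty on the deviation from the reference torsion."""
    delta = wrapped_difference_deg(model_angle, reference_angle)
    return weight * delta * delta
```

Wrapping the difference matters because torsion angles are periodic: without it, a model angle of 179° and a reference angle of -179° would be penalized as if they were 358° apart.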
The general principles behind the macromolecular crystal structure refinement program REFMAC5 are described.
This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low-resolution refinement tools such as secondary-structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, ‘jelly-body’ restraints and the use of novel long-range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high-resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.
doi:10.1107/S0907444911001314
PMCID: PMC3069751
PMID: 21460454
REFMAC5; refinement
There is a growing interest in structural studies of DNA by both experimental and computational approaches. Often, 3D-structural models of DNA are required, for instance, to serve as templates for homology modeling, as starting structures for macromolecular docking or as scaffolds for NMR structure calculations. The conformational adaptability of DNA when binding to a protein is often an important factor and at the same time a limitation in such studies. As a response to the demand for 3D-structural models reflecting the intrinsic plasticity of DNA we present the 3D-DART server (3DNA-Driven DNA Analysis and Rebuilding Tool). The server provides an easy interface to a powerful collection of tools for the generation of DNA-structural models in custom conformations. The computational engine behind the server makes use of the 3DNA software suite together with a collection of home-written Python scripts. The server is freely available at http://haddock.chem.uu.nl/dna without any login requirement.
doi:10.1093/nar/gkp287
PMCID: PMC2703913
PMID: 19417072
Background
Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster.
Methodology/Principal Findings
The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes.
Conclusions
The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform resource-intensive ab initio structure prediction.
doi:10.1371/journal.pone.0006254
PMCID: PMC2707601
PMID: 19606223
We present a suite of programs, named CING (Common Interface for NMR Structure Generation), that provides a residue-based, integrated validation of the structural NMR ensemble in conjunction with the experimental restraints and other input data. External validation programs and new internal validation routines compare the NMR-derived models with empirical data, measured chemical shifts, and distance and dihedral restraints, and the results are visualized in a dynamic Web 2.0 report. A red–orange–green score is used for residues and restraints to direct the user to those critiques that warrant further investigation. Overall green scores below ~20% accompanied by red scores over ~50% are strongly indicative of poorly modelled structures. The publicly accessible, secure iCing webserver (https://nmr.le.ac.uk) allows individual users to upload the NMR data and run a CING validation analysis.
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-012-9669-7) contains supplementary material, which is available to authorized users.
doi:10.1007/s10858-012-9669-7
PMCID: PMC3483101
PMID: 22986687
NMR; Structure validation; PDB; Errors; Quality; Protein structure
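The rule of thumb in the CING abstract above (green below ~20% together with red above ~50% indicates a poorly modelled structure) can be written as a small check. This is a sketch only: the function name, the list-of-strings input and the exact cutoffs are assumptions, since the abstract gives approximate thresholds and does not describe how CING stores its scores.

```python
def looks_poorly_modelled(residue_colours, green_max=0.20, red_min=0.50):
    """Apply the approximate red/orange/green rule of thumb to per-residue colours.

    residue_colours is a hypothetical list of "red", "orange" or "green" strings,
    one per residue. The cutoffs default to the ~20% and ~50% values quoted in
    the abstract.
    """
    total = len(residue_colours)
    if total == 0:
        raise ValueError("no residue scores supplied")
    green_fraction = residue_colours.count("green") / total
    red_fraction = residue_colours.count("red") / total
    return green_fraction < green_max and red_fraction > red_min
```

For example, an ensemble with 6 red, 3 orange and 1 green residue is flagged, whereas one split evenly between red and green is not, because both conditions must hold at once.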
For many macromolecular NMR ensembles from the Protein Data Bank (PDB) the experiment-based restraint lists are available, while other experimental data, mainly chemical shift values, are often available from the BioMagResBank. The accuracy and precision of the coordinates in these macromolecular NMR ensembles can be improved by recalculation using the available experimental data and present-day software. Such efforts, however, generally fail on half of all NMR ensembles due to the syntactic and semantic heterogeneity of the underlying data and the wide variety of formats used for their deposition. We have combined the remediated restraint information from our NMR Restraints Grid (NRG) database with available chemical shifts from the BioMagResBank and the Common Interface for NMR structure Generation (CING) structure validation reports into the weekly updated NRG-CING database (http://nmr.cmbi.ru.nl/NRG-CING). Eleven programs have been included in the NRG-CING production pipeline to arrive at validation reports that list for each entry the potential inconsistencies between the coordinates and the available experimental NMR data. The longitudinal validation of these data in a publicly available relational database yields a set of indicators that can be used to judge the quality of every macromolecular structure solved with NMR. The remediated NMR experimental data sets and validation reports are freely available online.
doi:10.1093/nar/gkr1134
PMCID: PMC3245154
PMID: 22139937
Berjanskii, Mark | Tang, Peter | Liang, Jack | Cruz, Joseph A. | Zhou, Jianjun | Zhou, You | Bassett, Edward | MacDonell, Cam | Lu, Paul | Lin, Guohui | Wishart, David S.
GeNMR (GEnerate NMR structures) is a web server for rapidly generating accurate 3D protein structures using sequence data, NOE-based distance restraints and/or NMR chemical shifts as input. GeNMR accepts distance restraints in XPLOR or CYANA format as well as chemical shift files in either SHIFTY or BMRB formats. The web server produces an ensemble of PDB coordinates for the protein within 15–25 min, depending on model complexity and completeness of experimental restraints. GeNMR uses a pipeline of several pre-existing programs and servers to calculate the actual protein structure. In particular, GeNMR combines genetic algorithms for structure optimization along with homology modeling, chemical shift threading, torsion angle and distance predictions from chemical shifts/NOEs as well as ROSETTA-based structure generation and simulated annealing with XPLOR-NIH to generate and/or refine protein coordinates. GeNMR greatly simplifies the task of protein structure determination as users do not have to install or become familiar with complex stand-alone programs or obscure format conversion utilities. Tests conducted on a sample of 90 proteins from the BioMagResBank indicate that GeNMR produces high-quality models for all protein queries, regardless of the type of NMR input data. GeNMR was developed to facilitate rapid, user-friendly determination of protein structures by NMR spectroscopy. GeNMR is accessible at http://www.genmr.ca.
doi:10.1093/nar/gkp280
PMCID: PMC2703936
PMID: 19406927
Background
Macromolecular structures are modeled by conformational optimization within experimental and knowledge-based restraints. Discrete restraint-based sampling generates high-quality structures within these restraints and facilitates further refinement in a continuous all-atom energy landscape. This approach has been used successfully for protein loop modeling, comparative modeling and electron density fitting in X-ray crystallography.
Results
Here we present a software toolkit (Rappertk) which generalizes discrete restraint-based sampling for use in structural biology. Modular design and multi-layered architecture enables Rappertk to sample conformations of any macromolecule at many levels of detail and within a variety of experimental restraints. Performance against a Cα-tracing benchmark shows that the efficiency has not suffered despite the overhead required by this flexibility. We demonstrate the toolkit's capabilities by building high-quality β-sheets and by introducing restraint-driven sampling. RNA sampling is demonstrated by rebuilding a protein-RNA interface. Ability to construct arbitrary ligands is used in sampling protein-ligand interfaces within electron density. Finally, secondary structure and shape information derived from EM are combined to generate multiple conformations of a protein consistent with the observed density.
Conclusion
Through its modular design and ease of use, Rappertk enables exploration of a wide variety of interesting avenues in structural biology. This toolkit, with illustrative examples, is freely available to academic users from .
doi:10.1186/1472-6807-7-13
PMCID: PMC1847436
PMID: 17376228
Background
Computational sequence analysis, that is, prediction of local sequence properties, homologs, spatial structure and function from the sequence of a protein, offers an efficient way to obtain needed information about proteins under study. Since reliable prediction is usually based on the consensus of many computer programs, meta-servers have been developed to fit such needs. Most meta-servers focus on one aspect of sequence analysis, while others incorporate more information, such as PredictProtein for local sequence feature predictions, SMART for domain architecture and sequence motif annotation, and GeneSilico for secondary and spatial structure prediction. However, as predictions of local sequence properties, three-dimensional structure and function are usually intertwined, it is beneficial to address them together.
Results
We developed a MEta-Server for protein Sequence Analysis (MESSA) to facilitate comprehensive protein sequence analysis and gather structural and functional predictions for a protein of interest. For an input sequence, the server exploits a number of select tools to predict local sequence properties, such as secondary structure, structurally disordered regions, coiled coils, signal peptides and transmembrane helices; detect homologous proteins and assign the query to a protein family; identify three-dimensional structure templates and generate structure models; and provide predictive statements about the protein's function, including functional annotations, Gene Ontology terms, enzyme classification and possible functionally associated proteins. We tested MESSA on the proteome of Candidatus Liberibacter asiaticus. Manual curation shows that three-dimensional structure models generated by MESSA covered around 75% of all the residues in this proteome and the function of 80% of all proteins could be predicted.
Availability
MESSA is free for non-commercial use at http://prodata.swmed.edu/MESSA/
doi:10.1186/1741-7007-10-82
PMCID: PMC3519821
PMID: 23031578
The META-PP server (http://cubic.bioc.columbia.edu/meta/) simplifies access to a battery of public protein structure and function prediction servers by providing a common and stable web-based interface. The goal is to make these powerful and increasingly essential methods more readily available to nonexpert users and the bioinformatics community at large. At present META-PP provides access to a selected set of high-quality servers in the areas of comparative modelling, threading/fold recognition, secondary structure prediction and more specialized fields like contact and function prediction.
PMCID: PMC168978
PMID: 12824314
Background
Fold recognition techniques take advantage of the limited number of overall structural organizations, and have become increasingly effective at identifying the fold of a given target sequence. However, in the absence of sufficient sequence identity, it remains difficult for fold recognition methods to always select the correct model. While a native-like model is often among a pool of highly ranked models, it is not necessarily the highest-ranked one, and the model rankings depend sensitively on the scoring function used. Structure elucidation methods can then be employed to decide among the models based on relatively rapid biochemical/biophysical experiments.
Results
This paper presents an integrated computational-experimental method to determine the fold of a target protein by probing it with a set of planned disulfide cross-links. We start with predicted structural models obtained by standard fold recognition techniques. In a first stage, we characterize the fold-level differences between the models in terms of topological (contact) patterns of secondary structure elements (SSEs), and select a small set of SSE pairs that differentiate the folds. In a second stage, we determine a set of residue-level cross-links to probe the selected SSE pairs. Each stage employs an information-theoretic planning algorithm to maximize information gain while minimizing experimental complexity, along with a Bayes error plan assessment framework to characterize the probability of making a correct decision once data for the plan are collected. By focusing on overall topological differences and planning cross-linking experiments to probe them, our fold determination approach is robust to noise and uncertainty in the models (e.g., threading misalignment) and in the actual structure (e.g., flexibility). We demonstrate the effectiveness of our approach in case studies for a number of CASP targets, showing that the optimized plans have low risk of error while testing only a small portion of the quadratic number of possible cross-link candidates. Simulation studies with these plans further show that they do a very good job of selecting the correct model, according to cross-links simulated from the actual crystal structures.
Conclusions
Fold determination can overcome scoring limitations in purely computational fold recognition methods, while requiring less experimental effort than traditional protein structure determination approaches.
doi:10.1186/1471-2105-12-S12-S5
PMCID: PMC3247086
PMID: 22168447
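The planning step described in the fold-determination abstract above, choosing cross-links that maximize information gain over a set of candidate models, can be illustrated as follows. This is a minimal sketch under stated assumptions, not the authors' algorithm: each model is assumed to predict a yes/no outcome for each candidate cross-link, experimental noise is reduced to a single hypothetical `error_rate`, and experimental cost is ignored.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information_gain(prior, predicts_link, error_rate=0.0):
    """Expected entropy reduction over models from testing one cross-link.

    prior[i] is the prior probability of model i; predicts_link[i] is True if
    model i predicts that the cross-link forms. error_rate is an assumed
    probability that the experiment reports the wrong outcome.
    """
    gain = entropy(prior)
    for observed in (True, False):
        likelihood = [(1.0 - error_rate) if pred == observed else error_rate
                      for pred in predicts_link]
        joint = [p * l for p, l in zip(prior, likelihood)]
        p_observed = sum(joint)
        if p_observed > 0:
            posterior = [j / p_observed for j in joint]
            gain -= p_observed * entropy(posterior)
    return gain

def best_cross_link(prior, candidates, error_rate=0.0):
    """Return the name of the candidate with the highest expected gain."""
    return max(candidates,
               key=lambda name: expected_information_gain(
                   prior, candidates[name], error_rate))
```

With four equally likely models, a cross-link that two models predict and two do not yields one full bit, while a cross-link that every model predicts yields nothing, which is why links that split the model pool evenly are the most useful to test.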
The RosettaBackrub server (http://kortemmelab.ucsf.edu/backrub) implements the Backrub method, derived from observations of alternative conformations in high-resolution protein crystal structures, for flexible backbone protein modeling. Backrub modeling is applied to three related applications using the Rosetta program for structure prediction and design: (I) modeling of structures of point mutations, (II) generating protein conformational ensembles and designing sequences consistent with these conformations and (III) predicting tolerated sequences at protein–protein interfaces. The three protocols have been validated on experimental data. Starting from a user-provided single input protein structure in PDB format, the server generates near-native conformational ensembles. The predicted conformations and sequences can be used for different applications, such as to guide mutagenesis experiments, for ensemble-docking approaches or to generate sequence libraries for protein design.
doi:10.1093/nar/gkq369
PMCID: PMC2896185
PMID: 20462859
Motivation: Programs that evaluate the quality of a protein structural model are important both for validating the structure determination procedure and for guiding the model-building process. Such programs are based on properties of native structures that are generally not expected for faulty models. One such property, which is rarely used for automatic structure quality assessment, is the tendency for conserved residues to be located at the structural core and for variable residues to be located at the surface.
Results: We present ConQuass, a novel quality assessment program based on the consistency between the model structure and the protein's conservation pattern. We show that it can identify problematic structural models, and that the scores it assigns to the server models in CASP8 correlate with the similarity of the models to the native structure. We also show that when the conservation information is reliable, the method's performance is comparable and complementary to that of the other single-structure quality assessment methods that participated in CASP8 and that do not use additional structural information from homologs.
Availability: A Perl implementation of the method, as well as the various Perl and R scripts used for the analysis, are available at http://bental.tau.ac.il/ConQuass/.
Contact: nirb@tauex.tau.ac.il
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq114
PMCID: PMC2865859
PMID: 20385730
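The property that the ConQuass abstract above builds on, conserved residues tending to be buried and variable residues exposed, can be quantified in a very simple way. The sketch below is not the ConQuass scoring function, which the abstract does not specify. It only computes a Pearson correlation between per-residue conservation and burial, where the inputs (conservation scores and relative solvent accessibilities in the range 0 to 1) and the function names are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def conservation_burial_consistency(conservation, relative_accessibility):
    """Correlate conservation with burial, taken here as 1 - relative accessibility.

    A clearly positive value means conserved residues sit in the core, as
    expected for a native-like model; values near zero or negative suggest
    a problem with the model or with the conservation data.
    """
    burial = [1.0 - a for a in relative_accessibility]
    return pearson(conservation, burial)
```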
Motivation: Chemical cross-linking of proteins or protein complexes and the mass spectrometry-based localization of the cross-linked amino acids in peptide sequences is a powerful method for generating distance restraints on the substrate's topology.
Results: Here, we introduce the algorithm Xwalk for predicting and validating these cross-links on existing protein structures. Xwalk calculates and displays non-linear distances between chemically cross-linked amino acids on protein surfaces, while mimicking the flexibility and non-linearity of cross-linker molecules. It returns a ‘solvent accessible surface distance’, which corresponds to the length of the shortest path between two amino acids, where the path leads through solvent occupied space without penetrating the protein surface.
Availability: Xwalk is freely available as a web server or stand-alone JAVA application at http://www.xwalk.org.
Contact: abdullah@imsb.biol.ethz.ch; aebersold@imsb.biol.ethz.ch
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr348
PMCID: PMC3137222
PMID: 21666267
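The 'solvent accessible surface distance' in the Xwalk abstract above is the length of the shortest path between two residues that stays in solvent and never passes through the protein. A coarse way to picture this is a breadth-first search on a voxel grid in which protein-occupied voxels are blocked. The grid representation, the 6-connected neighbourhood and all names below are assumptions for illustration, and the abstract does not describe how Xwalk itself computes the path.

```python
from collections import deque

def solvent_path_length(occupied, start, goal, shape, spacing=1.0):
    """Shortest 6-connected path length from start to goal through free voxels.

    occupied is a set of (x, y, z) voxel indices filled by the protein, shape
    gives the grid dimensions, and spacing is the voxel edge length. Returns
    None when no solvent path exists.
    """
    if start in occupied or goal in occupied:
        raise ValueError("start and goal must lie in solvent voxels")
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return distance[current] * spacing
        for dx, dy, dz in steps:
            nxt = (current[0] + dx, current[1] + dy, current[2] + dz)
            inside = all(0 <= nxt[i] < shape[i] for i in range(3))
            if inside and nxt not in occupied and nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return None
```

On a 5 x 5 x 1 grid with a wall of occupied voxels between the two end points, the path has to go around the wall, so the returned distance is three times the straight-line separation. That difference is what distinguishes a surface distance from a simple Euclidean one.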
To determine the structures of protein-protein interactions, protein docking is a valuable tool that complements experimental methods to characterize protein complexes. While protein docking can often produce a near-native solution within a set of global docking predictions, there are sometimes predictions that require refinement to elucidate correct contacts and conformation. Previously, we developed the ZRANK algorithm to rerank initial docking predictions from ZDOCK, a docking program developed by our lab. In this study, we have applied the ZRANK algorithm toward refinement of protein docking models, in conjunction with the protein docking program RosettaDock. This was performed by reranking global docking predictions from ZDOCK, performing local side chain and rigid-body refinement using RosettaDock, and selecting the refined model based on ZRANK score. For comparison, we examined using RosettaDock score instead of ZRANK score, and a larger perturbation size for the RosettaDock search, and determined that the larger RosettaDock perturbation size with ZRANK scoring was optimal. This method was validated on a protein-protein docking benchmark. For refining docking benchmark predictions from the newest ZDOCK version, this led to improved structures of top-ranked hits in 20 of 27 cases, and an increase from 23 to 27 cases with hits in the top 20 predictions. Finally, we optimized the ZRANK energy function using refined models, which provides a significant improvement over the original ZRANK energy function. Using this optimized function and the refinement protocol, the numbers of cases with hits ranked at number one increased from 12 to 19 and from 7 to 15 for two different ZDOCK versions. This shows the effective combination of independently developed docking protocols (ZDOCK/ZRANK, and RosettaDock), indicating that using diverse search and scoring functions can improve protein docking results.
doi:10.1002/prot.21920
PMCID: PMC2696687
PMID: 18214977
INFO-RNA is a new web server for designing RNA sequences that fold into a user given secondary structure. Furthermore, constraints on the sequence can be specified, e.g. one can restrict sequence positions to a fixed nucleotide or to a set of nucleotides. Moreover, the user can allow violations of the constraints at some positions, which can be advantageous in complicated cases.
The INFO-RNA web server allows biologists to design RNA sequences in an automatic manner. It is clearly and intuitively arranged and easy to use. The procedure is fast, as most applications are completed within seconds, and it performs better and faster than other existing tools. The INFO-RNA web server is freely available at http://www.bioinf.uni-freiburg.de/Software/INFO-RNA/
doi:10.1093/nar/gkm218
PMCID: PMC1933236
PMID: 17452349
Summary
A major challenge in structural biology is to determine the configuration of domains and proteins in multi-domain proteins and assemblies, respectively. To maximize the accuracy and precision of these models, all available data should be considered. Small-angle X-ray scattering (SAXS) efficiently provides low-resolution experimental data about the shapes of proteins and their assemblies. Thus, we integrated SAXS profiles into our software for modeling proteins and their assemblies by satisfaction of spatial restraints. Specifically, we model the quaternary structures of multidomain proteins with structurally defined rigid domains as well as quaternary structures of binary complexes of structurally defined rigid proteins. In addition to SAXS profiles and the component structures, we employ stereochemical restraints and an atomic distance-dependent statistical potential. The scoring function is optimized by a biased Monte Carlo protocol, including quasi-Newton and simulated annealing schemes. The final prediction corresponds to the best scoring solution in the largest cluster of many independently calculated solutions. To quantify how well the quaternary structures are determined based on their SAXS profiles, we used a benchmark of 12 simulated examples as well as an experimental SAXS profile of the homo-tetramer D-xylose isomerase. Optimization of the SAXS-dependent scoring function generally results in accurate models, if sufficiently precise approximations for the constituent rigid bodies are available; otherwise, the best scoring models can have significant errors. Thus, SAXS profiles can play a useful role in the structural characterization of proteins and assemblies, if they are combined with additional data and used judiciously. Our integration of a SAXS profile into modeling by satisfaction of spatial restraints will facilitate further integration of different kinds of data for structure determination of proteins and their assemblies.
doi:10.1016/j.jmb.2008.07.074
PMCID: PMC2745287
PMID: 18694757
small-angle X-ray scattering; quaternary structure; macromolecular assembly modeling; statistical potentials; protein structure prediction
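The selection rule stated in the SAXS abstract above, taking the best scoring solution from the largest cluster of independently calculated solutions, can be sketched in a few lines. The abstract does not say how solutions are clustered, so the greedy leader clustering, the `cutoff` parameter, the caller-supplied `distance` and `score` functions, and the convention that lower scores are better are all assumptions.

```python
def cluster_solutions(solutions, distance, cutoff):
    """Greedy leader clustering: join the first cluster whose leader is within cutoff."""
    clusters = []
    for solution in solutions:
        for cluster in clusters:
            if distance(solution, cluster[0]) <= cutoff:
                cluster.append(solution)
                break
        else:
            # No existing leader is close enough, so start a new cluster.
            clusters.append([solution])
    return clusters

def pick_final_model(solutions, distance, cutoff, score):
    """Return the lowest-scoring solution from the most populated cluster."""
    if not solutions:
        raise ValueError("no solutions supplied")
    clusters = cluster_solutions(solutions, distance, cutoff)
    largest = max(clusters, key=len)
    return min(largest, key=score)
```

A consequence of this rule is that an isolated solution with the best overall score is passed over in favour of the best member of the most populated cluster, which makes the final choice less sensitive to a single run that happens to score well.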
Summary: G protein-coupled receptors (GPCRs) comprise the largest family of integral membrane proteins. They are the most important class of drug targets. While there exist crystal structures for only a very few GPCR sequences, numerous experiments have been performed on GPCRs to identify the critical residues and motifs. The GPCRRD database is designed to systematically collect all experimental restraints (including residue orientation, contact and distance maps) available from the literature and primary GPCR resources using an automated text mining algorithm combined with manual validation, with the purpose of assisting GPCR 3D structure modeling and function annotation. The current dataset contains thousands of spatial restraints from mutagenesis, disulfide mapping distances, electron cryo-microscopy and Fourier-transform infrared spectroscopy experiments.
Availability: http://zhanglab.ccmb.med.umich.edu/GPCRRD/
Contact: zhng@umich.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq563
PMCID: PMC3003545
PMID: 20926423
Summary: RNATOPS-W is a web server to search sequences for RNA secondary structures including pseudoknots. The server accepts an annotated RNA multiple structural alignment as a structural profile and genomic or other sequences to search. It is built upon RNATOPS, a command-line C++ software package for the same purpose, in which filters to speed up search are manually selected. RNATOPS-W improves upon RNATOPS by adding the function of automatic selection of a hidden Markov model (HMM) filter and also a friendly user interface for selection of a substructure filter by the user. In addition, RNATOPS-W complements existing RNA secondary structure search web servers that either use built-in structure profiles or are not able to detect pseudoknots. RNATOPS-W inherits the efficiency of RNATOPS in detecting large, complex RNA structures.
Availability: The web server RNATOPS-W is available at the web site www.uga.edu/RNA-Informatics/?f=software&p=RNATOPS-w. The underlying search program RNATOPS can be downloaded at www.uga.edu/RNA-Informatics/?f=software&p=RNATOPS.
Contact: cai@cs.uga.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp095
PMCID: PMC2720711
PMID: 19269988
This paper describes an approach for making use of the components of the experimentally determined rotational diffusion tensor derived from NMR relaxation measurements in macromolecular structure determination. The parameters of the rotational diffusion tensor describe the shape and size of the macromolecule or macromolecular complex and are therefore complementary to traditional NMR restraints. The structural information contained in the rotational diffusion tensor is not dissimilar to that present in the small angle region of the solution X-ray scattering profiles. We demonstrate the utility of rotational diffusion tensor restraints for protein structure refinement using the N-terminal domain of enzyme I (EIN) as an example and validate the results by solution small angle X-ray scattering. We also show how rotational diffusion tensor restraints can be used for docking complexes using the dimeric HIV-1 protease and the EIN-HPr complexes as examples. In the former case, the rotational diffusion tensor restraints are sufficient in their own right to determine the position of one subunit relative to another. In the latter case, rotational diffusion tensor restraints complemented by highly ambiguous distance restraints derived from chemical shift perturbation mapping and a hydrophobic contact potential are sufficient to correctly dock EIN to HPr. In each case, the cluster containing the lowest energy structure corresponds to the correct solution.
doi:10.1021/ja902336c
PMCID: PMC2739456
PMID: 19537713
Ab initio prediction is the challenging attempt to predict protein structures based only on sequence information and without using templates. It is often divided into two distinct sub-problems: (a) the scoring function that can distinguish native, or native-like structures, from non-native ones; and (b) the method of searching the conformational space. Currently, there is no reliable scoring function that can always drive a search to the native fold, and there is no general search method that can guarantee a significant sampling of near-natives. Pathway models combine the scoring function and the search. In this short review, we explore some of the ways pathway models are used in folding, in published works since 2001, and present a new pathway model, HMMSTR-CM, that uses a fragment library and a set of nucleation/propagation-based rules. The new method was used for ab initio predictions as part of CASP5. This work was presented at the Winter School in Bioinformatics, Bologna, Italy, 10–14 February 2003.
doi:10.1002/cfg.305
PMCID: PMC2447365
PMID: 18629080
Methods and resources for obtaining chemically plausible starting models and restraint sets for refinement of ligand complexes are described and some of the potential pitfalls are discussed.
Model building and refinement of complexes between biomacromolecules and small molecules requires sensible starting coordinates as well as the specification of restraint sets for all but the most common non-macromolecular entities. Here, it is described why this is necessary, how it can be accomplished and what pitfalls need to be avoided in order to produce chemically plausible models of the low-molecular-weight entities. A number of programs, servers, databases and other resources that can be of assistance in the process are also discussed.
doi:10.1107/S0907444906022657
PMCID: PMC2483469
PMID: 17164531
refinement; model building; ligand complexes; restraint sets; macromolecular crystallography
Motivation: Decoy datasets, consisting of a solved protein structure and numerous alternative native-like structures, are in common use for the evaluation of scoring functions in protein structure prediction. Several pitfalls with the use of these datasets have been identified in the literature, as well as useful guidelines for generating more effective decoy datasets. We contribute to this ongoing discussion an empirical assessment of several decoy datasets commonly used in experimental studies.
Results: We find that artefacts and sampling issues in the large majority of these data make it trivial to discriminate the native structure. This underlines that evaluation based on the rank/z-score of the native is a weak test of scoring function performance. Moreover, sampling biases present in the way decoy sets are generated or used can strongly affect other types of evaluation measures such as the correlation between score and root mean squared deviation (RMSD) to the native. We demonstrate how, depending on type of bias and evaluation context, sampling biases may lead to either over- or under-estimation of the quality of scoring terms, functions or methods.
Availability: Links to the software and data used in this study are available at http://dbkgroup.org/handl/decoy_sets.
Contact: simon.lovell@manchester.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp150
PMCID: PMC2677743
PMID: 19297350
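The evaluation measures named in the decoy-set abstract above (rank and z-score of the native, and the correlation between score and RMSD to the native) can be illustrated with a minimal Python sketch. This is not the authors' software: the function names and the toy numbers are hypothetical, and the sketch assumes that a lower score means a better structure.

```python
import math
from statistics import mean, pstdev


def native_rank(native_score, decoy_scores):
    """1-based rank of the native among the decoys (assumes lower score = better)."""
    return 1 + sum(1 for s in decoy_scores if s < native_score)


def native_z_score(native_score, decoy_scores):
    """Z-score of the native relative to the decoy score distribution.

    Strongly negative values mean the native scores far below the decoys.
    """
    return (native_score - mean(decoy_scores)) / pstdev(decoy_scores)


def pearson(xs, ys):
    """Pearson correlation, e.g. between decoy scores and their RMSD to the native."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: four decoys with scores and RMSDs (in angstroms).
decoy_scores = [-80.0, -95.0, -70.0, -60.0]
decoy_rmsds = [3.0, 1.5, 4.5, 5.0]
native_score = -110.0

rank = native_rank(native_score, decoy_scores)
z = native_z_score(native_score, decoy_scores)
r = pearson(decoy_scores, decoy_rmsds)
print(rank, round(z, 2), round(r, 2))
```

With such measures in hand, the abstract's point is easier to see: a native that ranks first with a strongly negative z-score says little if artefacts make it trivially separable, and the score-RMSD correlation depends on which decoys happen to be sampled.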