Motivation: Metal ions are essential for the folding of RNA molecules into stable tertiary structures and are often involved in the catalytic activity of ribozymes. However, the positions of metal ions in RNA 3D structures are difficult to determine experimentally. This motivated us to develop a computational predictor of metal ion sites for RNA structures.
Results: We developed a statistical potential for predicting positions of metal ions (magnesium, sodium and potassium), based on the analysis of binding sites in experimentally solved RNA structures. The MetalionRNA program is available as a web server that predicts metal ions for RNA structures submitted by the user.
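Knowledge-based statistical potentials of this general kind are commonly obtained by inverse Boltzmann conversion of observed versus reference frequencies. The following is a minimal sketch under that assumption, not MetalionRNA's published parameterization. The (atom type, distance bin) keys, the 0.5 Å bin width, the pseudocount and the function names are all hypothetical, illustrative choices.

```python
import math
from collections import Counter

def statistical_potential(observed, reference, kT=1.0, pseudocount=1.0):
    """Inverse-Boltzmann potential E(key) = -kT * ln(P_obs / P_ref).

    observed, reference: Counter mapping a hypothetical key such as
    (atom_type, distance_bin) to a count. Pseudocounts avoid log(0).
    """
    keys = set(observed) | set(reference)
    n_obs = sum(observed.values()) + pseudocount * len(keys)
    n_ref = sum(reference.values()) + pseudocount * len(keys)
    energy = {}
    for k in keys:
        p_obs = (observed.get(k, 0) + pseudocount) / n_obs
        p_ref = (reference.get(k, 0) + pseudocount) / n_ref
        energy[k] = -kT * math.log(p_obs / p_ref)
    return energy

def score_site(contacts, energy, bin_width=0.5):
    """Sum the potential over (atom_type, distance) contacts of a candidate ion site."""
    total = 0.0
    for atom_type, dist in contacts:
        key = (atom_type, int(dist // bin_width))
        total += energy.get(key, 0.0)  # unseen keys contribute nothing
    return total
```

Under this sign convention, a more negative site score marks a more favourable candidate ion position.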
Availability: The MetalionRNA web server is accessible at http://metalionrna.genesilico.pl/.
Supplementary information: Supplementary data are available at Bioinformatics online.
The CCP4 template-restraint library defines restraints for biopolymers, their modifications and ligands that are used in macromolecular structure refinement. JLigand is a graphical editor for generating descriptions of new ligands and covalent linkages.
Biological macromolecules are polymers and therefore the restraints for macromolecular refinement can be subdivided into two sets: restraints applied to atoms that all belong to the same monomer and restraints associated with the covalent bonds between monomers. The CCP4 template-restraint library contains three types of data entries defining template restraints: descriptions of monomers and of their modifications, both used for intramonomer restraints, and descriptions of links for intermonomer restraints. The library provides generic descriptions of modifications and links for protein, DNA and RNA chains, and for some post-translational modifications including glycosylation. Structure-specific template restraints can be defined in a user’s additional restraint library. Here we describe JLigand, a new CCP4 graphical interface to LibCheck and REFMAC developed to manage the user’s library and generate new monomer entries, as well as new entries for links and associated modifications.
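Template restraints typically enter refinement as weighted least-squares terms. The sketch below covers only the intramonomer bond-length part. It assumes a hypothetical restraint list of (atom1, atom2, ideal distance, sigma) tuples that loosely mirrors the ideal value and standard deviation stored per bond in a monomer description. The actual CCP4 dictionary layout and REFMAC's weighting scheme are not reproduced.

```python
import math

def bond_restraint_target(coords, bond_restraints):
    """Sum of ((d - d_ideal) / sigma)^2 over restrained bonds.

    coords: {atom_name: (x, y, z)} for one monomer (hypothetical layout).
    bond_restraints: iterable of (atom1, atom2, d_ideal, sigma) tuples.
    """
    total = 0.0
    for a1, a2, d_ideal, sigma in bond_restraints:
        d = math.dist(coords[a1], coords[a2])  # current model distance
        total += ((d - d_ideal) / sigma) ** 2
    return total
```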
macromolecular refinement; restraint library; molecular graphics
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered Rfree and a decreased gap between Rwork and Rfree.
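Torsion restraints such as the reference-model term must respect angular periodicity: a model angle of 179° and a reference angle of -179° are only 2° apart. The sketch below shows a simple harmonic torsion restraint that handles this wrap-around. It assumes a plain sum of (Δφ/σ)² terms over torsions present in both structures. The functional form actually used in phenix.refine may differ, and the torsion labels are hypothetical.

```python
def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def torsion_restraint_energy(model, reference, sigma=10.0):
    """Harmonic torsion restraint: sum of (delta / sigma)^2 over matched angles.

    model, reference: dicts mapping a torsion label (e.g. a hypothetical
    ("A", 12, "chi1") tuple) to an angle in degrees. Only torsions present
    in both structures are restrained.
    """
    energy = 0.0
    for key in model.keys() & reference.keys():
        d = angle_diff(model[key], reference[key])
        energy += (d / sigma) ** 2
    return energy
```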
macromolecular crystallography; low resolution; refinement; automation
Once you have generated a 3D model of a protein, how do you know whether it bears any resemblance to the actual structure? To determine the usefulness of 3D models of proteins, they must be assessed in terms of their quality by methods that predict their similarity to the native structure. The ModFOLD4 server is the latest version of our leading independent server for the estimation of both the global and local (per-residue) quality of 3D protein models. The server produces both machine-readable and graphical output, providing users with intuitive visual reports on the quality of predicted protein tertiary structures. The ModFOLD4 server is freely available to all at: http://www.reading.ac.uk/bioinf/ModFOLD/.
The general principles behind the macromolecular crystal structure refinement program REFMAC5 are described.
This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low-resolution refinement tools such as secondary-structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, ‘jelly-body’ restraints and the use of novel long-range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high-resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.
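The abstract mentions ADP restraints based on the Kullback–Leibler divergence. The sketch below is an illustration only, not REFMAC5's implementation. It treats each atom's displacement as a zero-mean isotropic Gaussian whose per-axis variance is ⟨u²⟩ = B/(8π²), so the divergence depends only on the ratio of the two B values. How REFMAC5 handles anisotropic ADPs and weights such terms is not reproduced.

```python
import math

def kl_isotropic_adp(b_p, b_q, dim=3):
    """KL(p || q) for zero-mean isotropic Gaussians with B-factors b_p, b_q.

    With B = 8*pi^2*<u^2>, the per-axis variance is proportional to B,
    so the divergence depends only on r = b_p / b_q:
        KL = dim/2 * (r - 1 - ln r)
    """
    if b_p <= 0 or b_q <= 0:
        raise ValueError("B-factors must be positive")
    r = b_p / b_q
    return 0.5 * dim * (r - 1.0 - math.log(r))

def symmetric_kl_adp(b_1, b_2, dim=3):
    """Symmetrized divergence KL(p||q) + KL(q||p) = dim/2 * (r + 1/r - 2)."""
    return kl_isotropic_adp(b_1, b_2, dim) + kl_isotropic_adp(b_2, b_1, dim)
```

The plain KL divergence is asymmetric, which is why a symmetrized form is shown alongside it for comparing two atoms on equal footing.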
There is a growing interest in structural studies of DNA by both experimental and computational approaches. Often, 3D-structural models of DNA are required, for instance, to serve as templates for homology modeling, as starting structures for macromolecular docking or as scaffolds for NMR structure calculations. The conformational adaptability of DNA when binding to a protein is often an important factor and at the same time a limitation in such studies. In response to the demand for 3D-structural models reflecting the intrinsic plasticity of DNA, we present the 3D-DART server (3DNA-Driven DNA Analysis and Rebuilding Tool). The server provides an easy interface to a powerful collection of tools for the generation of DNA-structural models in custom conformations. The computational engine behind the server makes use of the 3DNA software suite together with a collection of in-house Python scripts. The server is freely available at http://haddock.chem.uu.nl/dna without any login requirement.
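Building a DNA model in a custom conformation amounts to accumulating base-pair step parameters along the helix. The sketch below keeps only two of the step parameters, rise and twist, and builds a straight helix from them. 3DNA's full six-parameter scheme (shift, slide, rise, tilt, roll, twist) is not reproduced. The function names and the 8.9 Å marker radius are hypothetical, illustrative choices.

```python
import math

def helix_axis_frames(twists_deg, rises):
    """Cumulative base-pair frames for a straight helix.

    Each step is described only by a twist (degrees) and a rise (Å).
    Returns a list of (z, angle_deg): each base pair's height along the
    helix axis and its rotation about that axis.
    """
    if len(twists_deg) != len(rises):
        raise ValueError("need one twist and one rise per step")
    frames = [(0.0, 0.0)]
    for twist, rise in zip(twists_deg, rises):
        z, angle = frames[-1]
        frames.append((z + rise, (angle + twist) % 360.0))
    return frames

def marker_atoms(frames, radius=8.9):
    """Place one marker atom per base pair at a fixed radius from the axis."""
    return [(radius * math.cos(math.radians(a)),
             radius * math.sin(math.radians(a)),
             z)
            for z, a in frames]
```

Varying the per-step twist or rise values, rather than using uniform ones, is what produces a non-canonical conformation in this simplified picture.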
The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks) stores information about recurrent RNA 3D motifs and their interactions, found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom), RNA motifs, i.e. ‘RNA bricks’, are presented in the molecular environment in which they were determined, including RNA, protein, metal ions, water molecules and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe real-space correlation coefficients with the electron density data (if available), backbone geometry and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms of the user-provided query structure and of stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. In addition, the search utility enables searching ‘RNA bricks’ according to sequence similarity and makes it possible to identify motifs with modified ribonucleotide residues at specific positions.
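A common way to compare backbone atom positions between two structures is optimal superposition followed by RMSD. The sketch below uses the Kabsch algorithm with NumPy. It assumes the atom correspondence (for example, matched backbone atoms in the same order) is already known. How RNA Bricks establishes that correspondence and scores matches is not shown, and `kabsch_rmsd` is a hypothetical helper name.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition.

    Rows of P and Q must correspond to the same atoms in the same order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    H = P.T @ Q
    U, S, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    diff = (R @ P.T).T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```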
Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster.
The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes.
The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform resource-intensive ab initio structure prediction.
The META-PP server (http://cubic.bioc.columbia.edu/meta/) simplifies access to a battery of public protein structure and function prediction servers by providing a common and stable web-based interface. The goal is to make these powerful and increasingly essential methods more readily available to nonexpert users and the bioinformatics community at large. At present META-PP provides access to a selected set of high-quality servers in the areas of comparative modelling, threading/fold recognition, secondary structure prediction and more specialized fields like contact and function prediction.
The CABS-fold web server provides tools for protein structure prediction from sequence only (de novo modeling) and also using alternative templates (consensus modeling). The web server is based on the CABS modeling procedures, ranked in previous Critical Assessment of techniques for protein Structure Prediction competitions as one of the leading approaches for de novo and template-based modeling. In addition to template data, fragmentary distance restraints can be incorporated into the modeling process. The web server output is a coarse-grained trajectory of generated conformations, its Jmol representation and predicted models in all-atom resolution (together with accompanying analysis). CABS-fold can be freely accessed at http://biocomp.chem.uw.edu.pl/CABSfold.
We present a suite of programs, named CING (Common Interface for NMR Structure Generation), that provides a residue-based, integrated validation of the structural NMR ensemble in conjunction with the experimental restraints and other input data. External validation programs and new internal validation routines compare the NMR-derived models with empirical data, measured chemical shifts, and distance and dihedral restraints, and the results are visualized in a dynamic Web 2.0 report. A red–orange–green score is used for residues and restraints to direct the user to those critiques that warrant further investigation. Overall green scores below ~20 % accompanied by red scores over ~50 % are strongly indicative of poorly modelled structures. The publicly accessible, secure iCing webserver (https://nmr.le.ac.uk) allows individual users to upload the NMR data and run a CING validation analysis.
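The overall-score rule quoted in this abstract can be expressed as a simple check. The sketch below assumes that per-residue colour labels are already available. The ~20 % and ~50 % cut-offs come from the text, while the function names and the label strings are hypothetical and not part of CING's interface.

```python
from collections import Counter

def rog_summary(residue_colours):
    """Return the fraction of residues labelled red, orange and green."""
    n = len(residue_colours)
    if n == 0:
        raise ValueError("no residues given")
    counts = Counter(residue_colours)
    return {c: counts.get(c, 0) / n for c in ("red", "orange", "green")}

def likely_poorly_modelled(residue_colours, green_max=0.20, red_min=0.50):
    """Flag ensembles with green below ~20 % together with red above ~50 %."""
    frac = rog_summary(residue_colours)
    return frac["green"] < green_max and frac["red"] > red_min
```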
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-012-9669-7) contains supplementary material, which is available to authorized users.
NMR; Structure validation; PDB; Errors; Quality; Protein structure
For many macromolecular NMR ensembles from the Protein Data Bank (PDB) the experiment-based restraint lists are available, while other experimental data, mainly chemical shift values, are often available from the BioMagResBank. The accuracy and precision of the coordinates in these macromolecular NMR ensembles can be improved by recalculation using the available experimental data and present-day software. Such efforts, however, generally fail on half of all NMR ensembles due to the syntactic and semantic heterogeneity of the underlying data and the wide variety of formats used for their deposition. We have combined the remediated restraint information from our NMR Restraints Grid (NRG) database with available chemical shifts from the BioMagResBank and the Common Interface for NMR structure Generation (CING) structure validation reports into the weekly updated NRG-CING database (http://nmr.cmbi.ru.nl/NRG-CING). Eleven programs have been included in the NRG-CING production pipeline to arrive at validation reports that list for each entry the potential inconsistencies between the coordinates and the available experimental NMR data. The longitudinal validation of these data in a publicly available relational database yields a set of indicators that can be used to judge the quality of every macromolecular structure solved with NMR. The remediated NMR experimental data sets and validation reports are freely available online.
Summary: RNATOPS-W is a web server to search sequences for RNA secondary structures including pseudoknots. The server accepts an annotated RNA multiple structural alignment as a structural profile and genomic or other sequences to search. It is built upon RNATOPS, a command-line C++ software package for the same purpose, in which filters to speed up the search are selected manually. RNATOPS-W improves upon RNATOPS by adding automatic selection of a hidden Markov model (HMM) filter and a friendly user interface through which the user can select a substructure filter. In addition, RNATOPS-W complements existing RNA secondary structure search web servers that either use built-in structure profiles or are not able to detect pseudoknots. RNATOPS-W inherits the efficiency of RNATOPS in detecting large, complex RNA structures.
Availability: The web server RNATOPS-W is available at the web site www.uga.edu/RNA-Informatics/?f=software&p=RNATOPS-w. The underlying search program RNATOPS can be downloaded at www.uga.edu/RNA-Informatics/?f=software&p=RNATOPS.
Supplementary information: Supplementary data are available at Bioinformatics online.
GeNMR (GEnerate NMR structures) is a web server for rapidly generating accurate 3D protein structures using sequence data, NOE-based distance restraints and/or NMR chemical shifts as input. GeNMR accepts distance restraints in XPLOR or CYANA format as well as chemical shift files in either SHIFTY or BMRB formats. The web server produces an ensemble of PDB coordinates for the protein within 15–25 min, depending on model complexity and completeness of experimental restraints. GeNMR uses a pipeline of several pre-existing programs and servers to calculate the actual protein structure. In particular, GeNMR combines genetic algorithms for structure optimization with homology modeling, chemical shift threading, torsion angle and distance predictions from chemical shifts/NOEs, as well as ROSETTA-based structure generation and simulated annealing with XPLOR-NIH to generate and/or refine protein coordinates. GeNMR greatly simplifies the task of protein structure determination, as users do not have to install or become familiar with complex stand-alone programs or obscure format conversion utilities. Tests conducted on a sample of 90 proteins from the BioMagResBank indicate that GeNMR produces high-quality models for all protein queries, regardless of the type of NMR input data. GeNMR was developed to facilitate rapid, user-friendly determination of protein structures by NMR spectroscopy. GeNMR is accessible at http://www.genmr.ca.
G-protein coupled receptors (GPCRs) are a superfamily of cell signaling membrane proteins that includes >750 members in the human genome alone. They are the largest family of drug targets. The vast diversity and relevance of GPCRs contrast with the paucity of structures available: only 21 unique GPCR structures have been experimentally determined as of the beginning of 2013. User-friendly modeling and small molecule docking tools are thus in great demand. While GPCR structure prediction servers and docking servers exist separately, with GOMoDo (GPCR Online Modeling and Docking) we provide a web server to seamlessly model GPCR structures and dock ligands to the models in a single consistent pipeline. GOMoDo can automatically perform template choice, homology modeling and either blind or information-driven docking by combining proven, state-of-the-art bioinformatic tools. The web server allows the user to guide the whole procedure. The GOMoDo server is freely accessible at http://molsim.sci.univr.it/gomodo.
Motivation: Ions are an essential component of the cell and are frequently found bound to various macromolecules, in particular to proteins. The binding of an ion to a protein greatly affects the protein’s biophysical characteristics and needs to be taken into account in any modeling approach. However, the positions of bound ions cannot easily be revealed experimentally, especially if the ions are loosely bound to the macromolecular surface.
Results: Here, we report the BION web server, which addresses the demand for tools that predict surface-bound ions, for which specific interactions are not crucial and which are therefore difficult to predict. BION is an easy-to-use web server that requires only a coordinate file as input, and the user is provided with various, but easy-to-navigate, options. The coordinate file with predicted bound ions is displayed in the output and is available for download.
Supplementary data are available at Bioinformatics online.
Consensus is a server developed to produce high-quality alignments for comparative modeling, and to identify the alignment regions reliable for copying from a given template. This is accomplished even when target–template sequence identity is as low as 5%. Combining the output from five different alignment methods, the server produces a consensus alignment, with a reliability measure indicated for each position and a prediction of the regions suitable for modeling. Models built using the server predictions are typically within 3 Å RMSD of the crystal structure. Users can upload a target protein sequence and specify a template (PDB code); if no template is given, the server will search for one. The method has been validated on a large set of homologous protein structure pairs. The Consensus server should prove useful for modelers for whom the structural reliability of the model is critical in their applications. It is currently available at http://structure.bu.edu/cgi-bin/consensus/consensus.cgi.
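Combining several alignments into a consensus with a per-position reliability can be illustrated with a simple vote. The sketch below assumes each input alignment is a mapping from target residue index to template residue index (or None for a gap). Reliability is taken here as the fraction of methods that agree with the most common choice. This is not the Consensus server's actual scoring scheme, and the 0.6 support threshold is a hypothetical value.

```python
from collections import Counter

def consensus_alignment(alignments, min_support=0.6):
    """Majority-vote consensus over several target->template alignments.

    alignments: list of dicts {target_index: template_index or None}.
    Returns {target_index: (template_index, reliability, reliable_flag)}.
    """
    n_methods = len(alignments)
    targets = set()
    for aln in alignments:
        targets.update(aln)
    result = {}
    for t in sorted(targets):
        # A missing entry counts as a vote for a gap (None).
        votes = Counter(aln.get(t) for aln in alignments)
        template_pos, n_agree = votes.most_common(1)[0]
        reliability = n_agree / n_methods
        reliable = reliability >= min_support and template_pos is not None
        result[t] = (template_pos, reliability, reliable)
    return result
```

Positions flagged as reliable would correspond to the regions this simplified scheme considers safe to copy from the template.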
RNA pseudoknots are an important structural feature of RNAs, but are often neglected in computer predictions for reasons of efficiency. Here, we present the pknotsRG Web Server for single-sequence RNA secondary structure prediction including pseudoknots. pknotsRG employs the newest Turner energy rules for finding the structure of minimal free energy. The algorithm has recently been improved in several ways. First, it has been reimplemented in the C programming language, resulting in a 60-fold increase in speed. Second, all suboptimal foldings up to a user-defined threshold can be enumerated. For large-scale analysis, a fast sliding window mode is available. Further improvements to the Web Server include new output visualization using the PseudoViewer Web Service, or RNAmovies for a movie-like animation of several suboptimal foldings.
The tool is available as source code, binary executable, online tool or as Web Service. The latter alternative allows for easy integration into bioinformatics pipelines. pknotsRG is available at the Bielefeld Bioinformatics Server (http://bibiserv.techfak.uni-bielefeld.de/pknotsrg).
RNAstructure is a software package for RNA secondary structure prediction and analysis. This contribution describes a new set of web servers that provide its functionality. The web server offers RNA secondary structure prediction, including free energy minimization, maximum expected accuracy structure prediction and pseudoknot prediction. Bimolecular secondary structure prediction is also provided. Additionally, the server can predict secondary structures conserved in either two homologs or more than two homologs. Folding free energy changes can be predicted for a given RNA structure using nearest neighbor rules. Secondary structures can be compared using circular plots or the scoring measures sensitivity and positive predictive value. Structure drawings can also be rendered as SVG, PostScript, JPEG or PDF. The web server is freely available for public use at: http://rna.urmc.rochester.edu/RNAstructureWeb.
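Sensitivity and positive predictive value (PPV) for comparing two secondary structures are computed from their base-pair sets. Sensitivity is the fraction of reference pairs that are predicted, and PPV is the fraction of predicted pairs found in the reference. The sketch below assumes exact pair matching and pseudoknot-free structures in dot-bracket notation. Some scoring tools also accept pairs shifted by one nucleotide, which is not modelled here, and the helper names are hypothetical.

```python
def pairs_from_dot_bracket(structure):
    """Return the set of 0-based (i, j) base pairs from a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs

def sensitivity_ppv(reference, predicted):
    """Sensitivity = TP / |reference pairs|; PPV = TP / |predicted pairs|."""
    ref = pairs_from_dot_bracket(reference)
    pred = pairs_from_dot_bracket(predicted)
    tp = len(ref & pred)
    sens = tp / len(ref) if ref else 0.0
    ppv = tp / len(pred) if pred else 0.0
    return sens, ppv
```

For example, a prediction that recovers three of four reference helix pairs and adds no false pairs scores a sensitivity of 0.75 and a PPV of 1.0.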
Macromolecular structures are modeled by conformational optimization within experimental and knowledge-based restraints. Discrete restraint-based sampling generates high-quality structures within these restraints and facilitates further refinement in a continuous all-atom energy landscape. This approach has been used successfully for protein loop modeling, comparative modeling and electron density fitting in X-ray crystallography.
Here we present a software toolkit (Rappertk) that generalizes discrete restraint-based sampling for use in structural biology. Its modular design and multi-layered architecture enable Rappertk to sample conformations of any macromolecule at many levels of detail and within a variety of experimental restraints. Performance against a Cα-tracing benchmark shows that efficiency has not suffered despite the overhead required by this flexibility. We demonstrate the toolkit's capabilities by building high-quality β-sheets and by introducing restraint-driven sampling. RNA sampling is demonstrated by rebuilding a protein-RNA interface. The ability to construct arbitrary ligands is used in sampling protein-ligand interfaces within electron density. Finally, secondary structure and shape information derived from EM are combined to generate multiple conformations of a protein consistent with the observed density.
Through its modular design and ease of use, Rappertk enables exploration of a wide variety of interesting avenues in structural biology. This toolkit, with illustrative examples, is freely available to academic users from .
Motivation: Decoy datasets, consisting of a solved protein structure and numerous alternative native-like structures, are in common use for the evaluation of scoring functions in protein structure prediction. Several pitfalls with the use of these datasets have been identified in the literature, as well as useful guidelines for generating more effective decoy datasets. We contribute to this ongoing discussion an empirical assessment of several decoy datasets commonly used in experimental studies.
Results: We find that artefacts and sampling issues in the large majority of these data make it trivial to discriminate the native structure. This underlines that evaluation based on the rank/z-score of the native is a weak test of scoring function performance. Moreover, sampling biases present in the way decoy sets are generated or used can strongly affect other types of evaluation measures such as the correlation between score and root mean squared deviation (RMSD) to the native. We demonstrate how, depending on the type of bias and evaluation context, sampling biases may lead to either over- or underestimation of the quality of scoring terms, functions or methods.
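The two evaluation measures discussed in this abstract are straightforward to compute. The sketch below assumes that lower scores are better and that each decoy has a precomputed RMSD to the native. Pearson correlation is used here for simplicity, although rank correlations are also common. The function names are hypothetical.

```python
import math
import statistics

def native_z_score(native_score, decoy_scores):
    """z-score of the native relative to the decoy score distribution."""
    mu = statistics.mean(decoy_scores)
    sd = statistics.stdev(decoy_scores)
    return (native_score - mu) / sd

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A strongly negative native z-score and a high score-RMSD correlation are separate quantities. Either can look good or bad independently of the other, depending on how the decoys were sampled.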
Availability: Links to the software and data used in this study are available at http://dbkgroup.org/handl/decoy_sets.
Supplementary information: Supplementary data are available at Bioinformatics online.
Computational sequence analysis, that is, prediction of local sequence properties, homologs, spatial structure and function from the sequence of a protein, offers an efficient way to obtain needed information about proteins under study. Since reliable prediction is usually based on the consensus of many computer programs, meta-servers have been developed to meet such needs. Most meta-servers focus on one aspect of sequence analysis, while others incorporate more information, such as PredictProtein for local sequence feature predictions, SMART for domain architecture and sequence motif annotation, and GeneSilico for secondary and spatial structure prediction. However, as predictions of local sequence properties, three-dimensional structure and function are usually intertwined, it is beneficial to address them together.
We developed a MEta-Server for protein Sequence Analysis (MESSA) to facilitate comprehensive protein sequence analysis and gather structural and functional predictions for a protein of interest. For an input sequence, the server exploits a number of select tools to predict local sequence properties, such as secondary structure, structurally disordered regions, coiled coils, signal peptides and transmembrane helices; detect homologous proteins and assign the query to a protein family; identify three-dimensional structure templates and generate structure models; and provide predictive statements about the protein's function, including functional annotations, Gene Ontology terms, enzyme classification and possible functionally associated proteins. We tested MESSA on the proteome of Candidatus Liberibacter asiaticus. Manual curation shows that three-dimensional structure models generated by MESSA covered around 75% of all the residues in this proteome and the function of 80% of all proteins could be predicted.
MESSA is free for non-commercial use at http://prodata.swmed.edu/MESSA/
Exploiting the experimental information from small-angle X-ray solution scattering (SAXS) in conjunction with structure prediction algorithms can be advantageous in the case of ribonucleic acids (RNA), where global restraints on the 3D fold are often lacking. Traditional usage of SAXS data often starts by attempting to reconstruct the molecular shape ab initio, which is subsequently used to assess the quality of models. Here, an alternative strategy is developed whereby the models from a very large decoy set are directly sorted according to their fit to the SAXS data. For rapid computation of SAXS patterns, the method makes use of a coarse-grained representation of RNA. It also explicitly treats the contribution to the scattering of water molecules and ions surrounding the RNA. The method, called Fast-SAXS-RNA, is first calibrated using a transfer RNA (tRNA-val) and then tested on the P4-P6 fragment of the group I intron (P4-P6). Fast-SAXS-RNA is then used as a filter for decoy models generated by the MC-Fold and MC-Sym pipeline, a suite of all-atom RNA 3D structure algorithms that encode and exploit RNA 3D architectural principles. The ability of Fast-SAXS-RNA to discriminate native folds is tested against three widely used RNA molecules in molecular modeling benchmarks: the tRNA, the P4-P6, and a synthetic hairpin suspected to assemble into a homodimer. For each molecule, a large pool of decoys is generated, scored, and ranked using Fast-SAXS-RNA. The method is able to identify low-RMSD models among top-ranking structures for both tRNA and P4-P6. For the hairpin, the approach correctly identifies the dimeric state as the solution structure over the monomeric state and alternative secondary structures.
The method offers a powerful strategy for recognizing native RNA conformations as well as multimeric assemblies and alternative secondary structures, thus enabling high-throughput RNA structure determination using SAXS data.
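Ranking decoys by their fit to a SAXS profile requires two pieces: a profile calculation and a goodness-of-fit measure. The sketch below makes several simplifying assumptions. It uses one bead per nucleotide with a q-independent form factor, computes the profile with the Debye formula, and scores the fit with χ² per data point after fitting the optimal scale factor analytically. Fast-SAXS-RNA's explicit treatment of hydration water and ions is not reproduced, and all names are hypothetical.

```python
import math

def debye_profile(beads, q_values, form_factor=1.0):
    """I(q) = sum_i sum_j f^2 sin(q r_ij) / (q r_ij) for identical beads."""
    n = len(beads)
    profile = []
    for q in q_values:
        total = n * form_factor ** 2  # i == j terms, sinc(0) = 1
        for i in range(n):
            for j in range(i + 1, n):
                x = q * math.dist(beads[i], beads[j])
                sinc = math.sin(x) / x if x > 1e-12 else 1.0
                total += 2.0 * form_factor ** 2 * sinc
        profile.append(total)
    return profile

def chi2_fit(i_model, i_exp, sigma):
    """Chi^2 per data point after fitting the scale factor c that minimises it."""
    num = sum(m * e / s ** 2 for m, e, s in zip(i_model, i_exp, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(i_model, sigma))
    c = num / den
    chi2 = sum(((c * m - e) / s) ** 2 for m, e, s in zip(i_model, i_exp, sigma))
    return chi2 / len(i_exp)

def rank_decoys(decoys, q_values, i_exp, sigma):
    """Return decoy names sorted from best (lowest chi^2) to worst."""
    scores = {name: chi2_fit(debye_profile(beads, q_values), i_exp, sigma)
              for name, beads in decoys.items()}
    return sorted(scores, key=scores.get)
```

The Debye sum is quadratic in the number of beads, which is one reason a coarse-grained representation matters when scoring very large decoy sets.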
The function of non-coding RNA genes largely depends on their secondary structure and the interaction with other molecules. Thus, an accurate prediction of secondary structure and RNA–RNA interaction is essential for the understanding of biological roles and pathways associated with a specific RNA gene. We present web servers to analyze multiple RNA sequences for common RNA structure and for RNA interaction sites. The web servers are based on the recent PET (Probabilistic Evolutionary and Thermodynamic) models PETfold and PETcofold, but add user-friendly features ranging from a graphical layer to interactive usage of the predictors. Additionally, the web servers provide direct access to annotated RNA alignments, such as the Rfam 10.0 database and multiple alignments of 16 vertebrate genomes with human. The web servers are freely available at: http://rth.dk/resources/petfold/
Fold recognition techniques take advantage of the limited number of overall structural organizations, and have become increasingly effective at identifying the fold of a given target sequence. However, in the absence of sufficient sequence identity, it remains difficult for fold recognition methods to always select the correct model. While a native-like model is often among a pool of highly ranked models, it is not necessarily the highest-ranked one, and the model rankings depend sensitively on the scoring function used. Structure elucidation methods can then be employed to decide among the models based on relatively rapid biochemical/biophysical experiments.
This paper presents an integrated computational-experimental method to determine the fold of a target protein by probing it with a set of planned disulfide cross-links. We start with predicted structural models obtained by standard fold recognition techniques. In a first stage, we characterize the fold-level differences between the models in terms of topological (contact) patterns of secondary structure elements (SSEs), and select a small set of SSE pairs that differentiate the folds. In a second stage, we determine a set of residue-level cross-links to probe the selected SSE pairs. Each stage employs an information-theoretic planning algorithm to maximize information gain while minimizing experimental complexity, along with a Bayes error plan assessment framework to characterize the probability of making a correct decision once data for the plan are collected. By focusing on overall topological differences and planning cross-linking experiments to probe them, our fold determination approach is robust to noise and uncertainty in the models (e.g., threading misalignment) and in the actual structure (e.g., flexibility). We demonstrate the effectiveness of our approach in case studies for a number of CASP targets, showing that the optimized plans have low risk of error while testing only a small portion of the quadratic number of possible cross-link candidates. Simulation studies with these plans further show that they do a very good job of selecting the correct model, according to cross-links simulated from the actual crystal structures.
Fold determination can overcome scoring limitations in purely computational fold recognition methods, while requiring less experimental effort than traditional protein structure determination approaches.
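The planning idea can be illustrated with a greedy, noise-free simplification. Each candidate cross-link is predicted either to form or not to form in each model. Links are then chosen to maximise the expected information gained about which model is correct. With deterministic outcomes, that gain equals the entropy of the joint outcome over the chosen links. The paper's algorithm also accounts for experimental noise, complexity and Bayes error, none of which is modelled here. The model and link names are hypothetical.

```python
import math
from collections import defaultdict

def partition_entropy(priors, predictions, chosen):
    """Entropy (bits) of the joint outcome of the chosen cross-links.

    priors: {model: probability}; predictions: {link: {model: bool}}.
    With noise-free outcomes this equals the expected information gain
    about which model is correct.
    """
    mass = defaultdict(float)
    for model, p in priors.items():
        signature = tuple(predictions[link][model] for link in chosen)
        mass[signature] += p
    return -sum(p * math.log2(p) for p in mass.values() if p > 0)

def greedy_plan(priors, predictions, n_links):
    """Greedily add the cross-link that most increases the information gain."""
    chosen = []
    for _ in range(n_links):
        candidates = [l for l in predictions if l not in chosen]
        if not candidates:
            break
        best = max(candidates,
                   key=lambda l: partition_entropy(priors, predictions, chosen + [l]))
        chosen.append(best)
    return chosen
```

With four equally likely models, two well-chosen links whose outcomes separate all four models yield 2 bits, the maximum possible. This reflects why only a small subset of the quadratic number of candidate cross-links needs to be tested.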