Motivation: Metal ions are essential for the folding of RNA molecules into stable tertiary structures and are often involved in the catalytic activity of ribozymes. However, the positions of metal ions in RNA 3D structures are difficult to determine experimentally. This motivated us to develop a computational predictor of metal ion sites for RNA structures.
Results: We developed a statistical potential for predicting positions of metal ions (magnesium, sodium and potassium), based on the analysis of binding sites in experimentally solved RNA structures. The MetalionRNA program is available as a web server that predicts metal ions for RNA structures submitted by the user.
Availability: The MetalionRNA web server is accessible at http://metalionrna.genesilico.pl/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.
COLORADO3D is a World Wide Web server for the visual presentation of three-dimensional (3D) protein structures. COLORADO3D indicates the presence of potential errors (detected by ANOLEA, PROSAII, PROVE or VERIFY3D), identifies buried residues and depicts sequence conservations. As input, the server takes a file of Protein Data Bank (PDB) coordinates and, optionally, a multiple sequence alignment. As output, the server returns a PDB-formatted file, replacing the B-factor column with values of the chosen parameter (structure quality, residue burial or conservation). Thus, the coordinates of the analyzed protein ‘colored’ by COLORADO3D can be conveniently displayed with structure viewers such as RASMOL in order to visualize the 3D clusters of regions with common features, which may not necessarily be adjacent to each other at the amino acid sequence level. In particular, COLORADO3D may serve as a tool to judge a structure's quality at various stages of the modeling and refinement (during both experimental structure determination and homology modeling). The GeneSilico group used COLORADO3D in the fifth Critical Assessment of Techniques for Protein Structure Prediction (CASP5) to successfully identify well-folded parts of preliminary homology models and to guide the refinement of misthreaded protein sequences. COLORADO3D is freely available for academic use at http://asia.genesilico.pl/colorado3d/.
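The B-factor substitution described above can be illustrated with a short sketch. It is not COLORADO3D's own code: the function name `set_bfactors` and the `scores` mapping are hypothetical, and the only thing taken from the PDB format itself is the fixed column layout (chain identifier in column 22, residue number in columns 23-26, temperature factor in columns 61-66). Any rescaling of the values that the server may apply is not modelled here.

```python
def set_bfactors(pdb_lines, scores, default=0.0):
    """Return PDB lines with the B-factor field replaced by per-residue scores.

    scores maps (chain_id, residue_number) -> float (hypothetical input layout).
    Residues without a score receive `default`. Insertion codes are ignored.
    """
    out = []
    for raw in pdb_lines:
        line = raw.rstrip("\r\n")
        if line.startswith(("ATOM", "HETATM")):
            # Pad short records so the B-factor field (columns 61-66) exists.
            line = line.ljust(66)
            key = (line[21], int(line[22:26]))
            value = scores.get(key, default)
            # Values must fit the 6-character field, i.e. stay below 1000.
            line = line[:60] + f"{value:6.2f}" + line[66:]
        out.append(line)
    return out
```

A viewer that colors by temperature factor will then display the chosen parameter instead of the original B-factors.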
Summary: Co-crystallization experiments of proteins with nucleic acids do not guarantee that both components are present in the crystal. We have previously developed DIBER to predict crystal content when protein and DNA are present in the crystallization mix. Here, we present RIBER, which should be used when protein and RNA are in the crystallization drop. The combined RIBER/DIBER suite builds on machine learning techniques to make reliable, quantitative predictions of crystal content for non-expert users and high-throughput crystallography.
Availability: The program source code, Linux binaries and a web server are available at http://diber.iimcb.gov.pl/. RIBER/DIBER requires diffraction data to at least 3.0 Å resolution in MTZ or CIF (web server only) format. The RIBER/DIBER code is subject to the GNU Public License.
Supplementary data are available at Bioinformatics online.
Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster.
The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes.
The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform resource-intensive ab initio structure prediction.
The explosion of the size of the universe of known protein sequences has stimulated two complementary approaches to structural mapping of these sequences: theoretical structure prediction and experimental determination by structural genomics (SG). In this work, we assess the accuracy of structure prediction by two automated template-based structure prediction metaservers (genesilico.pl and bioinfo.pl) by measuring the structural similarity of the predicted models to corresponding experimental models determined a posteriori. Of 199 targets chosen from SG programs, the metaservers predicted the structures of about a fourth of them “correctly.” (In this case, “correct” was defined as placing more than 70% of the alpha carbon atoms in the model within 2 Å of the experimentally determined positions.) Almost all of the targets that could be modeled to this accuracy were those with an available template in the Protein Data Bank (PDB) with more than 25% sequence identity. The majority of those SG targets with lower sequence identity to structures in the PDB were not predicted by the metaservers with this accuracy. We also compared metaserver results to CASP8 results, finding that the models obtained by participants in the CASP competition were significantly better than those produced by the metaservers.
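The correctness criterion quoted above (more than 70% of alpha carbons within 2 Å) can be written down in a few lines. This is a minimal sketch that assumes the model and the experimental structure have already been superimposed and that their alpha carbons are listed in matching order; the superposition protocol is not specified in the text, and the function names are hypothetical.

```python
import math


def fraction_within(model_ca, native_ca, cutoff=2.0):
    """Fraction of model CA atoms lying within `cutoff` angstroms of the native CA."""
    if not model_ca or len(model_ca) != len(native_ca):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    close = sum(1 for m, n in zip(model_ca, native_ca) if math.dist(m, n) <= cutoff)
    return close / len(model_ca)


def is_correct(model_ca, native_ca, cutoff=2.0, min_fraction=0.70):
    """Apply the 'more than 70% of CA atoms within 2 angstroms' criterion."""
    return fraction_within(model_ca, native_ca, cutoff) > min_fraction
```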
The accurate prediction of ligand binding residues from amino acid sequences is important for the automated functional annotation of novel proteins. In the previous two CASP experiments, the most successful methods in the function prediction category were those which used structural superpositions of 3D models and related templates with bound ligands in order to identify putative contacting residues. However, whilst most of this prediction process can be automated, visual inspection and manual adjustments of parameters, such as the distance thresholds used for each target, have often been required to prevent over prediction. Here we describe a novel method FunFOLD, which uses an automatic approach for cluster identification and residue selection. The software provided can easily be integrated into existing fold recognition servers, requiring only a 3D model and list of templates as inputs. A simple web interface is also provided allowing access to non-expert users. The method has been benchmarked against the top servers and manual prediction groups tested at both CASP8 and CASP9.
The FunFOLD method shows a significant improvement over the best available servers and is shown to be competitive with the top manual prediction groups that were tested at CASP8. The FunFOLD method is also competitive with both the top server and manual methods tested at CASP9. When tested using common subsets of targets, the predictions from FunFOLD are shown to achieve significantly higher mean Matthews Correlation Coefficient (MCC) and Binding-site Distance Test (BDT) scores than all server methods that were tested at CASP8. Testing on the CASP9 set showed no statistically significant separation in performance between FunFOLD and the other top server groups tested.
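For readers unfamiliar with the MCC score mentioned above, the following sketch computes it for one target from sets of predicted and observed binding residues. The function name and the set-based input layout are assumptions made for illustration; the BDT score is not covered here.

```python
import math


def mcc(predicted, observed, all_residues):
    """Matthews Correlation Coefficient for a binding-residue prediction.

    predicted, observed: iterables of residue identifiers flagged as binding.
    all_residues: every residue identifier in the target.
    """
    predicted, observed = set(predicted), set(observed)
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    fn = len(observed - predicted)
    tn = len(set(all_residues) - predicted - observed)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the score is taken as 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

The score ranges from -1 (total disagreement) through 0 (no better than random) to 1 (perfect prediction).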
The FunFOLD software is freely available as both a standalone package and a prediction server, providing competitive ligand binding site residue predictions for expert and non-expert users alike. The software provides a new fully automated approach for structure based function prediction using 3D models of proteins.
Protein tertiary structure prediction is a fundamental problem in computational biology and identifying the most native-like model from a set of predicted models is a key sub-problem. Consensus methods work well when the redundant models in the set are the most native-like, but fail when the most native-like model is unique. In contrast, structure-based methods score models independently and can be applied to model sets of any size and redundancy level. Additionally, structure-based methods have a variety of important applications including analogous fold recognition, refinement of sequence-structure alignments, and de novo prediction. The purpose of this work was to develop a structure-based model selection method based on predicted structural features that could be applied successfully to any set of models.
Here we introduce SELECTpro, a novel structure-based model selection method derived from an energy function comprising physical, statistical, and predicted structural terms. Novel and unique energy terms include predicted secondary structure, predicted solvent accessibility, predicted contact map, β-strand pairing, and side-chain hydrogen bonding.
SELECTpro participated in the new model quality assessment (QA) category in CASP7, submitting predictions for all 95 targets, and achieved top results. The average difference in GDT-TS between models ranked first by SELECTpro and the most native-like model was 5.07. This GDT-TS difference was less than 1% of the GDT-TS of the most native-like model for 18 targets, and less than 10% for 66 targets. SELECTpro also ranked the single most native-like model first for 15 targets, in the top five for 39 targets, and in the top ten for 53 targets, more often than any other method. Because the ranking metric is skewed by model redundancy and ignores poor models with a better ranking than the most native-like model, the BLUNDER metric is introduced to overcome these limitations. SELECTpro is also evaluated on a recent benchmark set of 16 small proteins with large decoy sets of 12500 to 20000 models for each protein, where it outperforms the benchmarked method (I-TASSER).
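The per-target quantities reported above (the GDT-TS difference between the top-ranked and the most native-like model, that difference as a fraction of the best GDT-TS, and the rank given to the most native-like model) can be sketched as follows. The sketch assumes that models are ranked by an energy where lower is better; the dictionary layout and function names are hypothetical, and the BLUNDER metric is not reproduced here.

```python
def selection_loss(energies, gdt_ts):
    """Return (absolute, relative) GDT-TS loss of the top-ranked model.

    energies: model name -> predicted energy (lower is assumed better).
    gdt_ts:   model name -> true GDT-TS against the native structure.
    """
    top = min(energies, key=energies.get)
    best = max(gdt_ts, key=gdt_ts.get)
    loss = gdt_ts[best] - gdt_ts[top]
    relative = loss / gdt_ts[best] if gdt_ts[best] > 0 else 0.0
    return loss, relative


def rank_of_best(energies, gdt_ts):
    """1-based rank assigned to the most native-like model."""
    best = max(gdt_ts, key=gdt_ts.get)
    ranked = sorted(energies, key=energies.get)
    return ranked.index(best) + 1
```

Averaging the absolute loss over all targets gives a figure comparable in kind to the 5.07 quoted above.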
SELECTpro is an effective model selection method that scores models independently and is appropriate for use on any model set. SELECTpro is available for download as a stand alone application at: . SELECTpro is also available as a public server at the same site.
Understanding the numerous functions that RNAs play in living cells depends critically on knowledge of their three-dimensional structure. Due to the difficulties in experimentally assessing structures of large RNAs, there is currently great demand for new high-resolution structure prediction methods. We present a novel method for the fully automated prediction of RNA 3D structures from a user-defined secondary structure. The concept is founded on the machine translation system. The translation engine operates on the RNA FRABASE database tailored to the dictionary relating the RNA secondary structure and tertiary structure elements. The translation algorithm is very fast. An initial 3D structure is composed within seconds on a single processor. The method assures the prediction of large RNA 3D structures of high quality. Our approach needs neither structural templates nor RNA sequence alignment, which are required for comparative methods. This enables the building of as yet unresolved native and artificial RNA structures. The method is implemented in a publicly available, user-friendly server, RNAComposer. It works in an interactive mode and a batch mode. The batch mode is designed for large-scale modelling and accepts atomic distance restraints. Presently, the server is set to build RNA structures of up to 500 residues.
Exploiting the experimental information from small-angle X-ray solution scattering (SAXS) in conjunction with structure prediction algorithms can be advantageous in the case of ribonucleic acids (RNA), where global restraints on the 3D fold are often lacking. Traditional usage of SAXS data often starts by attempting to reconstruct the molecular shape ab initio, which is subsequently used to assess the quality of models. Here, an alternative strategy is explored whereby the models from a very large decoy set are directly sorted according to their fit to the SAXS data. For rapid computation of SAXS patterns, the method developed here makes use of a coarse-grained representation of RNA. It also accounts for the explicit treatment of the contribution to the scattering of water molecules and ions surrounding the RNA. The method, called Fast-SAXS-RNA, is first calibrated using a transfer RNA (tRNA-val) and then tested on the P4-P6 fragment of group I intron (P4-P6). Fast-SAXS-RNA is then used as a filter for decoy models generated by the MC-Fold and MC-Sym pipeline, a suite of RNA 3D all-atom structure algorithms that encode and exploit RNA 3D architectural principles. The ability of Fast-SAXS-RNA to discriminate native folds is tested against three widely used RNA molecules in molecular modeling benchmarks: the tRNA, the P4-P6, and a synthetic hairpin suspected to assemble into a homodimer. For each molecule, a large pool of decoys is generated, scored, and ranked using Fast-SAXS-RNA. The method is able to identify low-RMSD models among top-ranking structures for both tRNA and P4-P6. For the hairpin, the approach correctly identifies the dimeric state as the solution structure over the monomeric state and alternative secondary structures.
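Sorting decoys by their fit to the SAXS data can be sketched as below. The abstract does not state which goodness-of-fit measure Fast-SAXS-RNA uses, so this example assumes a commonly used one: a chi-square discrepancy between experimental and computed intensities after fitting a single least-squares scale factor. The function names and the dictionary of precomputed decoy profiles are hypothetical, and the coarse-grained scattering calculation itself is not shown.

```python
def chi_square(i_exp, sigma, i_calc):
    """Mean squared, error-weighted residual after optimal scaling of i_calc.

    i_exp, sigma, i_calc: equal-length sequences sampled at the same q values.
    """
    if not (len(i_exp) == len(sigma) == len(i_calc)) or not i_exp:
        raise ValueError("profiles must be non-empty and of equal length")
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    scale = num / den  # least-squares scale factor (assumed fitting scheme)
    total = sum(((e - scale * c) / s) ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    return total / len(i_exp)


def rank_decoys(i_exp, sigma, decoy_profiles):
    """Return decoy names sorted from best to worst fit to the SAXS data."""
    return sorted(decoy_profiles,
                  key=lambda name: chi_square(i_exp, sigma, decoy_profiles[name]))
```

The top of the returned list would then be inspected for low-RMSD models, as described for tRNA and P4-P6.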
The method offers a powerful strategy for recognizing native RNA conformations as well as multimeric assemblies and alternative secondary structures, thus enabling high-throughput RNA structure determination using SAXS data.
We present a suite of programs, named CING for Common Interface for NMR Structure Generation, that provides for a residue-based, integrated validation of the structural NMR ensemble in conjunction with the experimental restraints and other input data. External validation programs and new internal validation routines compare the NMR-derived models with empirical data, measured chemical shifts, and distance and dihedral restraints, and the results are visualized in a dynamic Web 2.0 report. A red–orange–green score is used for residues and restraints to direct the user to those critiques that warrant further investigation. Overall green scores below ~20 % accompanied by red scores over ~50 % are strongly indicative of poorly modelled structures. The publicly accessible, secure iCing webserver (https://nmr.le.ac.uk) allows individual users to upload the NMR data and run a CING validation analysis.
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-012-9669-7) contains supplementary material, which is available to authorized users.
NMR; Structure validation; PDB; Errors; Quality; Protein structure
QA-RecombineIt provides a web interface to assess the quality of protein 3D structure models and to improve the accuracy of models by merging fragments of multiple input models. QA-RecombineIt has been developed for protein modelers who are working on difficult problems, have a set of different homology models and/or de novo models (from methods such as I-TASSER or ROSETTA) and would like to obtain one consensus model that incorporates the best parts into one structure that is internally coherent. An advanced mode is also available, in which one can modify the operation of the fragment recombination algorithm by manually identifying individual fragments or entire models to recombine. Our method produces up to 100 models that are expected to be on the average more accurate than the starting models. Therefore, our server may be useful for crystallographic protein structure determination, where protein models are used for Molecular Replacement to solve the phase problem. To address the latter possibility, a special feature was added to the QA-RecombineIt server. The QA-RecombineIt server can be freely accessed at http://iimcb.genesilico.pl/qarecombineit/.
Protein-RNA interactions play fundamental roles in many biological processes. Understanding the molecular mechanism of protein-RNA recognition and formation of protein-RNA complexes is a major challenge in structural biology. Unfortunately, the experimental determination of protein-RNA complexes is tedious and difficult, both by X-ray crystallography and NMR. For many interacting proteins and RNAs the individual structures are available, enabling prediction of complex structures by computational docking. However, methods for protein-RNA docking remain scarce, in particular in comparison to the numerous methods for protein-protein docking.
We developed two medium-resolution, knowledge-based potentials for scoring protein-RNA models obtained by docking: the quasi-chemical potential (QUASI-RNP) and the Decoys As the Reference State potential (DARS-RNP). Both potentials use a coarse-grained representation for both RNA and protein molecules and are capable of dealing with RNA structures with posttranscriptionally modified residues. We compared the discriminative power of DARS-RNP and QUASI-RNP for selecting rigid-body docking poses with the potentials previously developed by the Varani and Fernandez groups.
In both bound and unbound docking tests, DARS-RNP showed the highest ability to identify native-like structures. Python implementations of DARS-RNP and QUASI-RNP are freely available for download at http://iimcb.genesilico.pl/RNP/
RNA; protein; RNP; macromolecular docking; complex modeling; structural bioinformatics
The AIDA domain assembly server, available at http://ffas.burnham.org/AIDA/, is a tool that can identify domains in multi-domain proteins and then predict their 3D structures and relative spatial arrangements. The server is free and open to all users, and a user can optionally provide an e-mail address to receive a link to the results page. Domains are evolutionarily conserved and often functionally independent units in proteins. Most proteins, especially eukaryotic ones, consist of multiple domains, while at the same time most experimentally determined protein structures contain only one or two domains. As a result, structures of individual domains in multi-domain proteins can often be accurately predicted, but the mutual arrangement of different domains remains unknown. To address this issue we have developed the AIDA program, which combines the steps of identifying individual domains, predicting their structures separately and assembling them into multiple-domain complexes using an ab initio folding potential to describe domain–domain interactions. The AIDA server not only supports the assembly of a large number of continuous domains, but also allows the assembly of domains inserted into other domains. Users can also provide distance restraints to guide the AIDA energy minimization.
MODOMICS, a database devoted to the systems biology of RNA modification, has been subjected to substantial improvements. It provides comprehensive information on the chemical structure of modified nucleosides, pathways of their biosynthesis, sequences of RNAs containing these modifications and RNA-modifying enzymes. MODOMICS also provides cross-references to other databases and to literature. In addition to the previously available manually curated tRNA sequences from a few model organisms, we have now included additional tRNAs and rRNAs, and all RNAs with 3D structures in the Nucleic Acid Database, in which modified nucleosides are present. In total, 3460 modified bases in RNA sequences of different organisms have been annotated. New RNA-modifying enzymes have also been added. The current collection of enzymes includes mainly proteins for the model organisms Escherichia coli and Saccharomyces cerevisiae, and is currently being expanded to include proteins from other organisms, in particular Archaea and Homo sapiens. For enzymes with known structures, links are provided to the corresponding Protein Data Bank entries, while for many others homology models have been created. Many new options for database searching and querying have been included. MODOMICS can be accessed at http://genesilico.pl/modomics.
The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks) stores information about recurrent RNA 3D motifs and their interactions, found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom), RNA motifs, i.e. ‘RNA bricks’, are presented in the molecular environment in which they were determined, including RNA, protein, metal ions, water molecules and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe real-space correlation coefficients with the electron density data (if available), backbone geometry and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms of the user-provided query structure and of stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. In addition, the search utility enables searching ‘RNA bricks’ according to sequence similarity, and makes it possible to identify motifs with modified ribonucleotide residues at specific positions.
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered Rfree and a decreased gap between Rwork and Rfree.
macromolecular crystallography; low resolution; refinement; automation
Restrained molecular dynamics simulations are a robust, though perhaps underused, tool for the end-stage refinement of biomolecular structures. We demonstrate their utility—using modern simulation protocols, optimized force fields, and inclusion of explicit solvent and mobile counterions—by re-investigating the solution structures of two RNA hairpins that had previously been refined using conventional techniques. The structures, both domain 5 group II intron ribozymes from yeast ai5γ and Pylaiella littoralis, share a nearly identical primary sequence yet the published 3D structures appear quite different. Relatively long restrained MD simulations using the original NMR restraint data identified the presence of a small set of violated distance restraints in one structure and a possibly incorrect trapped bulge nucleotide conformation in the other structure. The removal of problematic distance restraints and the addition of a heating step yielded representative ensembles with very similar 3D structures and much lower pairwise RMSD values. Analysis of ion density during the restrained simulations helped to explain chemical shift perturbation data published previously. These results suggest that restrained MD simulations, with proper caution, can be used to “update” older structures or aid in the refinement of new structures that lack sufficient experimental data to produce a high quality result. Notable cautions include the need for sufficient sampling, awareness of potential force field bias (such as small angle deviations with the current AMBER force fields), and a proper balance between the various restraint weights.
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-012-9642-5) contains supplementary material, which is available to authorized users.
RNA structure; Molecular dynamics; Residual dipolar coupling restraints; Bulge structure; Force fields; Ion binding
For many macromolecular NMR ensembles from the Protein Data Bank (PDB) the experiment-based restraint lists are available, while other experimental data, mainly chemical shift values, are often available from the BioMagResBank. The accuracy and precision of the coordinates in these macromolecular NMR ensembles can be improved by recalculation using the available experimental data and present-day software. Such efforts, however, generally fail on half of all NMR ensembles due to the syntactic and semantic heterogeneity of the underlying data and the wide variety of formats used for their deposition. We have combined the remediated restraint information from our NMR Restraints Grid (NRG) database with available chemical shifts from the BioMagResBank and the Common Interface for NMR structure Generation (CING) structure validation reports into the weekly updated NRG-CING database (http://nmr.cmbi.ru.nl/NRG-CING). Eleven programs have been included in the NRG-CING production pipeline to arrive at validation reports that list for each entry the potential inconsistencies between the coordinates and the available experimental NMR data. The longitudinal validation of these data in a publicly available relational database yields a set of indicators that can be used to judge the quality of every macromolecular structure solved with NMR. The remediated NMR experimental data sets and validation reports are freely available online.
The CCP4 template-restraint library defines restraints for biopolymers, their modifications and ligands that are used in macromolecular structure refinement. JLigand is a graphical editor for generating descriptions of new ligands and covalent linkages.
Biological macromolecules are polymers and therefore the restraints for macromolecular refinement can be subdivided into two sets: restraints that are applied to atoms that all belong to the same monomer and restraints that are associated with the covalent bonds between monomers. The CCP4 template-restraint library contains three types of data entries defining template restraints: descriptions of monomers and their modifications, both used for intramonomer restraints, and descriptions of links for intermonomer restraints. The library provides generic descriptions of modifications and links for protein, DNA and RNA chains, and for some post-translational modifications including glycosylation. Structure-specific template restraints can be defined in a user’s additional restraint library. Here, JLigand is described: a new CCP4 graphical interface to LibCheck and REFMAC that has been developed to manage the user’s library and to generate new monomer entries, as well as new entries for links and associated modifications.
macromolecular refinement; restraint library; molecular graphics
Computational models of protein structure are usually inaccurate and exhibit significant deviations from the true structure. The utility of models depends on the degree of these deviations. A number of predictive methods have been developed to discriminate between the globally incorrect and approximately correct models. However, only a few methods predict correctness of different parts of computational models. Several Model Quality Assessment Programs (MQAPs) have been developed to detect local inaccuracies in unrefined crystallographic models, but it is not known if they are useful for computational models, which usually exhibit different and much more severe errors.
The ability to identify local errors in models was tested for eight MQAPs: VERIFY3D, PROSA, BALA, ANOLEA, PROVE, TUNE, REFINER, PROQRES on 8251 models from the CASP-5 and CASP-6 experiments, by calculating the Spearman's rank correlation coefficients between per-residue scores of these methods and local deviations between C-alpha atoms in the models vs. experimental structures. As a reference, we calculated the value of correlation between the local deviations and trivial features that can be calculated for each residue directly from the models, i.e. solvent accessibility, depth in the structure, and the number of local and non-local neighbours. We found that absolute correlations of scores returned by the MQAPs and local deviations were poor for all methods. In addition, scores of PROQRES and several other MQAPs strongly correlate with 'trivial' features. Therefore, we developed MetaMQAP, a meta-predictor based on a multivariate regression model, which uses scores of the above-mentioned methods, but in which trivial parameters are controlled. MetaMQAP predicts the absolute deviation (in Ångströms) of individual C-alpha atoms between the model and the unknown true structure as well as global deviations (expressed as root mean square deviation and GDT_TS scores). Local model accuracy predicted by MetaMQAP shows an impressive correlation coefficient of 0.7 with true deviations from native structures, a significant improvement over all constituent primary MQAP scores. The global MetaMQAP score is correlated with model GDT_TS on the level of 0.89.
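The benchmark statistic used above, Spearman's rank correlation between per-residue scores and local C-alpha deviations, can be sketched in plain Python. Tied values receive average ranks, and the coefficient is computed as the Pearson correlation of the ranks. The function names are hypothetical and the sketch is independent of how any particular MQAP produces its scores.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)
```

A score for which high values indicate good local quality should correlate negatively with the deviations, which is why the text compares absolute correlations.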
Finally, we compared our method with the MQAPs that scored best in the 7th edition of CASP, using CASP7 server models (not included in the MetaMQAP training set) as the test data. In our benchmark, MetaMQAP is outperformed only by PCONS6 and QA_556, both of which require comparison of multiple alternative models and score each model according to its similarity to the others. MetaMQAP is, however, the best among the methods capable of evaluating single models.
We implemented the MetaMQAP as a web server available for free use by all academic users at the URL
The general principles behind the macromolecular crystal structure refinement program REFMAC5 are described.
This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low-resolution refinement tools such as secondary-structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, ‘jelly-body’ restraints and the use of novel long-range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high-resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.
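To make the idea of a Kullback–Leibler-based restraint on ADPs more concrete, the following sketch computes the divergence between two isotropic Gaussian displacement distributions. It is a textbook calculation under stated assumptions (zero-mean, isotropic, three-dimensional Gaussians whose variances are proportional to the B values) and is not REFMAC5's actual restraint function; the function names are hypothetical.

```python
import math


def kl_isotropic(b1, b2, dim=3):
    """KL(P1 || P2) for zero-mean isotropic Gaussians in `dim` dimensions.

    Assumes the variances are proportional to b1 and b2, so only the
    ratio b1 / b2 matters.
    """
    ratio = b1 / b2
    return 0.5 * dim * (ratio - 1.0 - math.log(ratio))


def symmetric_kl_isotropic(b1, b2, dim=3):
    """Symmetrised divergence KL(P1 || P2) + KL(P2 || P1).

    The logarithms cancel, leaving (dim / 2) * (b1 - b2)**2 / (b1 * b2).
    """
    return kl_isotropic(b1, b2, dim) + kl_isotropic(b2, b1, dim)


# Illustrative B values (in square angstroms) for two nearby atoms.
penalty = symmetric_kl_isotropic(20.0, 30.0)
```

A measure of this kind depends on the ratio of the two B values rather than on their absolute difference, so a difference of 10 Å² is penalised more heavily between well-ordered atoms than between atoms that are both highly mobile.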
Text mining methods have added considerably to our capacity to extract biological knowledge from the literature. Recently, the field of systems biology has begun to model and simulate metabolic networks, requiring knowledge of the set of molecules involved. While genomics and proteomics technologies are able to supply the macromolecular parts list, the metabolites are less easily assembled. Most metabolites are known and reported through the scientific literature, rather than through large-scale experimental surveys. Thus, it is important to recover them from the literature. Here we present a novel tool to automatically identify metabolite names in the literature, and associate structures where possible, to define the reported yeast metabolome. With ten-fold cross-validation on a manually annotated corpus, our recognition tool generates an F-score of 78.49 (precision of 83.02) and demonstrates greater suitability in identifying metabolite names than other existing recognition tools for general chemical molecules. The metabolite recognition tool has been applied to the literature covering an important model organism, the yeast Saccharomyces cerevisiae, to define its reported metabolome. By coupling to ChemSpider, a major chemical database, we have identified structures for much of the reported metabolome and, where structure identification fails, have been able to suggest extensions to ChemSpider. Our manually annotated gold-standard data on 296 abstracts are available as supplementary materials. Metabolite names and, where appropriate, structures are also available as supplementary materials.
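The abstract reports an F-score and a precision but not the recall. Assuming the reported F-score is the balanced F1 measure (the harmonic mean of precision and recall) and that both figures are percentages, the recall can be recovered by inverting the F1 formula. The sketch below illustrates that arithmetic; the function names are hypothetical, and the implied recall of roughly 74.4 is a derived value that holds only under these assumptions.

```python
def f1_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def recall_from_f1(f1, precision):
    """Invert F1 = 2PR / (P + R) to obtain R = F1 * P / (2P - F1)."""
    denominator = 2.0 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall exists for these values")
    return f1 * precision / denominator


# Values as reported in the abstract, treated as percentages.
implied_recall = recall_from_f1(78.49, 83.02)
```

Feeding the implied recall back into `f1_score` together with the reported precision reproduces the reported F-score, which is a quick consistency check on the inversion.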
Electronic supplementary material
The online version of this article (doi:10.1007/s11306-010-0251-6) contains supplementary material, which is available to authorized users.
Text mining; Named entity recognition; Yeast metabolome
PROSESS (PROtein Structure Evaluation Suite and Server) is a web server designed to evaluate and validate protein structures generated by X-ray crystallography, NMR spectroscopy or computational modeling. While many structure evaluation packages have been developed over the past 20 years, PROSESS is unique in its comprehensiveness, its capacity to evaluate X-ray, NMR and predicted structures, and its ability to evaluate a variety of experimental NMR data. PROSESS integrates a variety of previously developed, well-known and thoroughly tested methods to evaluate both global and residue-specific: (i) covalent and geometric quality; (ii) non-bonded/packing quality; (iii) torsion angle quality; (iv) chemical shift quality and (v) NOE quality. In particular, PROSESS uses VADAR for coordinate, packing, H-bond, secondary structure and geometric analysis, GeNMR for calculating folding, threading and solvent energetics, ShiftX for calculating chemical shift correlations, RCI for correlating structure mobility to chemical shift and PREDITOR for calculating the agreement between torsion angles and chemical shifts. PROSESS also incorporates several other programs, including MolProbity to assess atomic clashes, Xplor-NIH to identify and quantify NOE restraint violations and NAMD to assess structure energetics. PROSESS produces detailed tables, explanations, structural images and graphs that summarize the results and compare them to values observed in high-quality or high-resolution protein structures. Using a simplified red–amber–green coloring scheme, PROSESS also alerts users to both general and residue-specific structural problems. PROSESS is intended to serve as a tool that can be used by structural biologists as well as database curators to assess and validate newly determined protein structures. PROSESS is freely available at http://www.prosess.ca.
Macromolecular structures are modeled by conformational optimization within experimental and knowledge-based restraints. Discrete restraint-based sampling generates high-quality structures within these restraints and facilitates further refinement in a continuous all-atom energy landscape. This approach has been used successfully for protein loop modeling, comparative modeling and electron density fitting in X-ray crystallography.
Here we present a software toolkit (Rappertk) which generalizes discrete restraint-based sampling for use in structural biology. Its modular design and multi-layered architecture enable Rappertk to sample conformations of any macromolecule at many levels of detail and within a variety of experimental restraints. Performance against a Cα-tracing benchmark shows that efficiency has not suffered despite the overhead required by this flexibility. We demonstrate the toolkit's capabilities by building high-quality β-sheets and by introducing restraint-driven sampling. RNA sampling is demonstrated by rebuilding a protein-RNA interface. The ability to construct arbitrary ligands is used in sampling protein-ligand interfaces within electron density. Finally, secondary structure and shape information derived from EM are combined to generate multiple conformations of a protein consistent with the observed density.
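The general idea of discrete restraint-based sampling can be illustrated with a toy Cα trace. The sketch below is not Rappertk code and does not use its API; every name in it is hypothetical. It assumes a fixed Cα–Cα step of about 3.8 Å, a discrete set of 26 step directions, and one spherical positional restraint per atom, and it grows the chain depth-first with backtracking. Clash checks and angular restraints, which a real sampler would need, are omitted.

```python
import itertools
import math
import random

BOND = 3.8  # assumed approximate C-alpha to C-alpha distance, in angstroms


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def step_vectors():
    """26 discrete step directions from a 3x3x3 grid, scaled to BOND."""
    steps = []
    for d in itertools.product((-1, 0, 1), repeat=3):
        norm = math.sqrt(sum(c * c for c in d))
        if norm > 0:
            steps.append(tuple(BOND * c / norm for c in d))
    return steps


def sample_trace(restraints, radius, rng):
    """Build a chain in which atom i lies within `radius` of restraints[i].

    Candidate steps are tried in random order; a branch with no
    admissible extension returns None so that the caller backtracks.
    """
    steps = step_vectors()

    def extend(chain):
        i = len(chain)
        if i == len(restraints):
            return chain
        candidates = steps[:]
        rng.shuffle(candidates)
        for step in candidates:
            pos = tuple(p + s for p, s in zip(chain[-1], step))
            if distance(pos, restraints[i]) <= radius:
                result = extend(chain + [pos])
                if result is not None:
                    return result
        return None  # dead end: no admissible position for atom i

    return extend([restraints[0]])


# Toy restraints: expected positions along a straight line.
targets = [(BOND * i, 0.0, 0.0) for i in range(8)]
trace = sample_trace(targets, radius=2.0, rng=random.Random(0))
```

Because the candidate order is randomised, different seeds yield different chains that all satisfy the same restraints, which is the sense in which sampling of this kind produces an ensemble rather than a single model.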
Through its modular design and ease of use, Rappertk enables exploration of a wide variety of interesting avenues in structural biology. This toolkit, with illustrative examples, is freely available to academic users from .