Motivation: Metal ions are essential for the folding of RNA molecules into stable tertiary structures and are often involved in the catalytic activity of ribozymes. However, the positions of metal ions in RNA 3D structures are difficult to determine experimentally. This motivated us to develop a computational predictor of metal ion sites for RNA structures.
Results: We developed a statistical potential for predicting positions of metal ions (magnesium, sodium and potassium), based on the analysis of binding sites in experimentally solved RNA structures. The MetalionRNA program is available as a web server that predicts metal ions for RNA structures submitted by the user.
Availability: The MetalionRNA web server is accessible at http://metalionrna.genesilico.pl/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Protein–RNA and protein–DNA interactions play fundamental roles in many biological processes. A detailed understanding of these interactions requires knowledge about protein–nucleic acid complex structures. Because the experimental determination of these complexes is time-consuming and perhaps futile in some instances, we have focused on computational docking methods starting from the separate structures. Docking methods are widely employed to study protein–protein interactions; however, only a few methods have been made available to model protein–nucleic acid complexes. Here, we describe NPDock (Nucleic acid–Protein Docking); a novel web server for predicting complexes of protein–nucleic acid structures which implements a computational workflow that includes docking, scoring of poses, clustering of the best-scored models and refinement of the most promising solutions. The NPDock server provides a user-friendly interface and 3D visualization of the results. The smallest set of input data consists of a protein structure and a DNA or RNA structure in PDB format. Advanced options are available to control specific details of the docking process and obtain intermediate results. The web server is available at http://genesilico.pl/NPDock.
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.
COLORADO3D is a World Wide Web server for the visual presentation of three-dimensional (3D) protein structures. COLORADO3D indicates the presence of potential errors (detected by ANOLEA, PROSAII, PROVE or VERIFY3D), identifies buried residues and depicts sequence conservations. As input, the server takes a file of Protein Data Bank (PDB) coordinates and, optionally, a multiple sequence alignment. As output, the server returns a PDB-formatted file, replacing the B-factor column with values of the chosen parameter (structure quality, residue burial or conservation). Thus, the coordinates of the analyzed protein ‘colored’ by COLORADO3D can be conveniently displayed with structure viewers such as RASMOL in order to visualize the 3D clusters of regions with common features, which may not necessarily be adjacent to each other at the amino acid sequence level. In particular, COLORADO3D may serve as a tool to judge a structure's quality at various stages of the modeling and refinement (during both experimental structure determination and homology modeling). The GeneSilico group used COLORADO3D in the fifth Critical Assessment of Techniques for Protein Structure Prediction (CASP5) to successfully identify well-folded parts of preliminary homology models and to guide the refinement of misthreaded protein sequences. COLORADO3D is freely available for academic use at http://asia.genesilico.pl/colorado3d/.
Summary: Co-crystallization experiments of proteins with nucleic acids do not guarantee that both components are present in the crystal. We have previously developed DIBER to predict crystal content when protein and DNA are present in the crystallization mix. Here, we present RIBER, which should be used when protein and RNA are in the crystallization drop. The combined RIBER/DIBER suite builds on machine learning techniques to make reliable, quantitative predictions of crystal content for non-expert users and high-throughput crystallography.
Availability: The program source code, Linux binaries and a web server are available at http://diber.iimcb.gov.pl/ RIBER/DIBER requires diffraction data to at least 3.0 Å resolution in MTZ or CIF (web server only) format. The RIBER/DIBER code is subject to the GNU Public License.
Supplementary data are available at Bioinformatics online.
The explosion of the size of the universe of known protein sequences has stimulated two complementary approaches to structural mapping of these sequences: theoretical structure prediction and experimental determination by structural genomics (SG). In this work, we assess the accuracy of structure prediction by two automated template-based structure prediction metaservers (genesilico.pl and bioinfo.pl) by measuring the structural similarity of the predicted models to corresponding experimental models determined a posteriori. Of 199 targets chosen from SG programs, the metaservers predicted the structures of about a fourth of them “correctly.” (In this case, “correct” was defined as placing more than 70% of the alpha carbon atoms in the model within 2 Å of the experimentally determined positions.) Almost all of the targets that could be modeled to this accuracy were those with an available template in the Protein Data Bank (PDB) with more than 25% sequence identity. The majority of those SG targets with lower sequence identity to structures in the PDB were not predicted by the metaservers with this accuracy. We also compared metaserver results to CASP8 results, finding that the models obtained by participants in the CASP competition were significantly better than those produced by the metaservers.
Structural characterization of protein–protein interactions at molecular level is essential to understand biological processes and identify new therapeutic opportunities. However, atomic resolution structural techniques cannot keep pace with current advances in interactomics. Low-resolution structural techniques, such as small-angle X-ray scattering (SAXS), can be applied at larger scale, but they miss atomic details. For efficient application to protein–protein complexes, low-resolution information can be combined with theoretical methods that provide energetic description and atomic details of the interactions. Here we present the pyDockSAXS web server (http://life.bsc.es/pid/pydocksaxs) that provides an automatic pipeline for modeling the structure of a protein–protein complex from SAXS data. The method uses FTDOCK to generate rigid-body docking models that are subsequently evaluated by a combination of pyDock energy-based scoring function and their capacity to describe SAXS data. The only required input files are structural models for the interacting partners and a SAXS curve. The server automatically provides a series of structural models for the complex, sorted by the pyDockSAXS scoring function. The user can also upload a previously computed set of docking poses, which opens the possibility to filter the docking solutions by potential interface residues or symmetry restraints. The server is freely available to all users without restriction.
The accurate prediction of ligand binding residues from amino acid sequences is important for the automated functional annotation of novel proteins. In the previous two CASP experiments, the most successful methods in the function prediction category were those which used structural superpositions of 3D models and related templates with bound ligands in order to identify putative contacting residues. However, whilst most of this prediction process can be automated, visual inspection and manual adjustments of parameters, such as the distance thresholds used for each target, have often been required to prevent over prediction. Here we describe a novel method FunFOLD, which uses an automatic approach for cluster identification and residue selection. The software provided can easily be integrated into existing fold recognition servers, requiring only a 3D model and list of templates as inputs. A simple web interface is also provided allowing access to non-expert users. The method has been benchmarked against the top servers and manual prediction groups tested at both CASP8 and CASP9.
The FunFOLD method shows a significant improvement over the best available servers and is shown to be competitive with the top manual prediction groups that were tested at CASP8. The FunFOLD method is also competitive with both the top server and manual methods tested at CASP9. When tested using common subsets of targets, the predictions from FunFOLD are shown to achieve a significantly higher mean Matthews Correlation Coefficient (MCC) scores and Binding-site Distance Test (BDT) scores than all server methods that were tested at CASP8. Testing on the CASP9 set showed no statistically significant separation in performance between FunFOLD and the other top server groups tested.
The FunFOLD software is freely available as both a standalone package and a prediction server, providing competitive ligand binding site residue predictions for expert and non-expert users alike. The software provides a new fully automated approach for structure based function prediction using 3D models of proteins.
Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data exceed structural data by a factor of more than 200 to 1. This gap can be partially filled by using computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. Therefore, we present a standalone protein structure prediction software package suitable for high-throughput structural genomic applications that performs all three classes of prediction methodologies: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster.
The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. The query protein sequences are first divided into domains either by domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. Furthermore, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes.
The standalone pipeline that we introduce here, unlike protein structure prediction Web servers, allows users to devote their own computing assets to process a potentially unlimited number of queries as well as perform resource-intensive ab initio structure prediction.
Protein-RNA interactions play fundamental roles in many biological processes. Understanding the molecular mechanism of protein-RNA recognition and formation of protein-RNA complexes is a major challenge in structural biology. Unfortunately, the experimental determination of protein-RNA complexes is tedious and difficult, both by X-ray crystallography and NMR. For many interacting proteins and RNAs the individual structures are available, enabling computational prediction of complex structures by computational docking. However, methods for protein-RNA docking remain scarce, in particular in comparison to the numerous methods for protein-protein docking.
We developed two medium-resolution, knowledge-based potentials for scoring protein-RNA models obtained by docking: the quasi-chemical potential (QUASI-RNP) and the Decoys As the Reference State potential (DARS-RNP). Both potentials use a coarse-grained representation for both RNA and protein molecules and are capable of dealing with RNA structures with posttranscriptionally modified residues. We compared the discriminative power of DARS-RNP and QUASI-RNP for selecting rigid-body docking poses with the potentials previously developed by the Varani and Fernandez groups.
In both bound and unbound docking tests, DARS-RNP showed the highest ability to identify native-like structures. Python implementations of DARS-RNP and QUASI-RNP are freely available for download at http://iimcb.genesilico.pl/RNP/
RNA; protein; RNP; macromolecular docking; complex modeling; structural bioinformatics
A computer program that builds crystal structure models of nucleic acid molecules is presented. It can be accessed at http://iimcb.genesilico.pl/brickworx.
Brickworx is a computer program that builds crystal structure models of nucleic acid molecules using recurrent motifs including double-stranded helices. In a first step, the program searches for electron-density peaks that may correspond to phosphate groups; it may also take into account phosphate-group positions provided by the user. Subsequently, comparing the three-dimensional patterns of the P atoms with a database of nucleic acid fragments, it finds the matching positions of the double-stranded helical motifs (A-RNA or B-DNA) in the unit cell. If the target structure is RNA, the helical fragments are further extended with recurrent RNA motifs from a fragment library that contains single-stranded segments. Finally, the matched motifs are merged and refined in real space to find the most likely conformations, including a fit of the sequence to the electron-density map. The Brickworx program is available for download and as a web server at http://iimcb.genesilico.pl/brickworx.
Brickworx; model building; nucleic acids
MODOMICS, a database devoted to the systems biology of RNA modification, has been subjected to substantial improvements. It provides comprehensive information on the chemical structure of modified nucleosides, pathways of their biosynthesis, sequences of RNAs containing these modifications and RNA-modifying enzymes. MODOMICS also provides cross-references to other databases and to literature. In addition to the previously available manually curated tRNA sequences from a few model organisms, we have now included additional tRNAs and rRNAs, and all RNAs with 3D structures in the Nucleic Acid Database, in which modified nucleosides are present. In total, 3460 modified bases in RNA sequences of different organisms have been annotated. New RNA-modifying enzymes have been also added. The current collection of enzymes includes mainly proteins for the model organisms Escherichia coli and Saccharomyces cerevisiae, and is currently being expanded to include proteins from other organisms, in particular Archaea and Homo sapiens. For enzymes with known structures, links are provided to the corresponding Protein Data Bank entries, while for many others homology models have been created. Many new options for database searching and querying have been included. MODOMICS can be accessed at http://genesilico.pl/modomics.
Computational simulation of protein-protein docking can expedite the process of molecular modeling and drug discovery. This paper reports on our new F2 Dock protocol which improves the state of the art in initial stage rigid body exhaustive docking search, scoring and ranking by introducing improvements in the shape-complementarity and electrostatics affinity functions, a new knowledge-based interface propensity term with FFT formulation, a set of novel knowledge-based filters and finally a solvation energy (GBSA) based reranking technique. Our algorithms are based on highly efficient data structures including the dynamic packing grids and octrees which significantly speed up the computations and also provide guaranteed bounds on approximation error.
The improved affinity functions show superior performance compared to their traditional counterparts in finding correct docking poses at higher ranks. We found that the new filters and the GBSA based reranking individually and in combination significantly improve the accuracy of docking predictions with only minor increase in computation time. We compared F2 Dock 2.0 with ZDock 3.0.2 and found improvements over it, specifically among 176 complexes in ZLab Benchmark 4.0, F2 Dock 2.0 finds a near-native solution as the top prediction for 22 complexes; where ZDock 3.0.2 does so for 13 complexes. F2 Dock 2.0 finds a near-native solution within the top 1000 predictions for 106 complexes as opposed to 104 complexes for ZDock 3.0.2. However, there are 17 and 15 complexes where F2 Dock 2.0 finds a solution but ZDock 3.0.2 does not and vice versa; which indicates that the two docking protocols can also complement each other.
The docking protocol has been implemented as a server with a graphical client (TexMol) which allows the user to manage multiple docking jobs, and visualize the docked poses and interfaces. Both the server and client are available for download. Server: http://www.cs.utexas.edu/~bajaj/cvc/software/f2dock.shtml. Client: http://www.cs.utexas.edu/~bajaj/cvc/software/f2dockclient.shtml.
We present a suite of programs, named CING for Common Interface for NMR Structure Generation that provides for a residue-based, integrated validation of the structural NMR ensemble in conjunction with the experimental restraints and other input data. External validation programs and new internal validation routines compare the NMR-derived models with empirical data, measured chemical shifts, distance- and dihedral restraints and the results are visualized in a dynamic Web 2.0 report. A red–orange–green score is used for residues and restraints to direct the user to those critiques that warrant further investigation. Overall green scores below ~20 % accompanied by red scores over ~50 % are strongly indicative of poorly modelled structures. The publically accessible, secure iCing webserver (https://nmr.le.ac.uk) allows individual users to upload the NMR data and run a CING validation analysis.
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-012-9669-7) contains supplementary material, which is available to authorized users.
NMR; Structure validation; PDB; Errors; Quality; Protein structure
The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks), stores information about recurrent RNA 3D motifs and their interactions, found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom) RNA motifs, i.e. ‘RNA bricks’ are presented in the molecular environment, in which they were determined, including RNA, protein, metal ions, water molecules and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe real-space correlation coefficients with the electron density data (if available), backbone geometry and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms of the user-provided query structure and of stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. Besides, the search utility enables searching ‘RNA bricks’ according to sequence similarity, and makes it possible to identify motifs with modified ribonucleotide residues at specific positions.
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered R
free and a decreased gap between R
work and R
macromolecular crystallography; low resolution; refinement; automation
Understanding the numerous functions that RNAs play in living cells depends critically on knowledge of their three-dimensional structure. Due to the difficulties in experimentally assessing structures of large RNAs, there is currently great demand for new high-resolution structure prediction methods. We present the novel method for the fully automated prediction of RNA 3D structures from a user-defined secondary structure. The concept is founded on the machine translation system. The translation engine operates on the RNA FRABASE database tailored to the dictionary relating the RNA secondary structure and tertiary structure elements. The translation algorithm is very fast. Initial 3D structure is composed in a range of seconds on a single processor. The method assures the prediction of large RNA 3D structures of high quality. Our approach needs neither structural templates nor RNA sequence alignment, required for comparative methods. This enables the building of unresolved yet native and artificial RNA structures. The method is implemented in a publicly available, user-friendly server RNAComposer. It works in an interactive mode and a batch mode. The batch mode is designed for large-scale modelling and accepts atomic distance restraints. Presently, the server is set to build RNA structures of up to 500 residues.
The CCP4 template-restraint library defines restraints for biopolymers, their modifications and ligands that are used in macromolecular structure refinement. JLigand is a graphical editor for generating descriptions of new ligands and covalent linkages.
Biological macromolecules are polymers and therefore the restraints for macromolecular refinement can be subdivided into two sets: restraints that are applied to atoms that all belong to the same monomer and restraints that are associated with the covalent bonds between monomers. The CCP4 template-restraint library contains three types of data entries defining template restraints: descriptions of monomers and their modifications, both used for intramonomer restraints, and descriptions of links for intermonomer restraints. The library provides generic descriptions of modifications and links for protein, DNA and RNA chains, and for some post-translational modifications including glycosylation. Structure-specific template restraints can be defined in a user’s additional restraint library. Here, JLigand, a new CCP4 graphical interface to LibCheck and REFMAC that has been developed to manage the user’s library and generate new monomer entries is described, as well as new entries for links and associated modifications.
macromolecular refinement; restraint library; molecular graphics
For many macromolecular NMR ensembles from the Protein Data Bank (PDB) the experiment-based restraint lists are available, while other experimental data, mainly chemical shift values, are often available from the BioMagResBank. The accuracy and precision of the coordinates in these macromolecular NMR ensembles can be improved by recalculation using the available experimental data and present-day software. Such efforts, however, generally fail on half of all NMR ensembles due to the syntactic and semantic heterogeneity of the underlying data and the wide variety of formats used for their deposition. We have combined the remediated restraint information from our NMR Restraints Grid (NRG) database with available chemical shifts from the BioMagResBank and the Common Interface for NMR structure Generation (CING) structure validation reports into the weekly updated NRG-CING database (http://nmr.cmbi.ru.nl/NRG-CING). Eleven programs have been included in the NRG-CING production pipeline to arrive at validation reports that list for each entry the potential inconsistencies between the coordinates and the available experimental NMR data. The longitudinal validation of these data in a publicly available relational database yields a set of indicators that can be used to judge the quality of every macromolecular structure solved with NMR. The remediated NMR experimental data sets and validation reports are freely available online.
The general principles behind the macromolecular crystal structure refinement program REFMAC5 are described.
This paper describes various components of the macromolecular crystallographic refinement program REFMAC5, which is distributed as part of the CCP4 suite. REFMAC5 utilizes different likelihood functions depending on the diffraction data employed (amplitudes or intensities), the presence of twinning and the availability of SAD/SIRAS experimental diffraction data. To ensure chemical and structural integrity of the refined model, REFMAC5 offers several classes of restraints and choices of model parameterization. Reliable models at resolutions at least as low as 4 Å can be achieved thanks to low-resolution refinement tools such as secondary-structure restraints, restraints to known homologous structures, automatic global and local NCS restraints, ‘jelly-body’ restraints and the use of novel long-range restraints on atomic displacement parameters (ADPs) based on the Kullback–Leibler divergence. REFMAC5 additionally offers TLS parameterization and, when high-resolution data are available, fast refinement of anisotropic ADPs. Refinement in the presence of twinning is performed in a fully automated fashion. REFMAC5 is a flexible and highly optimized refinement package that is ideally suited for refinement across the entire resolution spectrum encountered in macromolecular crystallography.
QA-RecombineIt provides a web interface to assess the quality of protein 3D structure models and to improve the accuracy of models by merging fragments of multiple input models. QA-RecombineIt has been developed for protein modelers who are working on difficult problems, have a set of different homology models and/or de novo models (from methods such as I-TASSER or ROSETTA) and would like to obtain one consensus model that incorporates the best parts into one structure that is internally coherent. An advanced mode is also available, in which one can modify the operation of the fragment recombination algorithm by manually identifying individual fragments or entire models to recombine. Our method produces up to 100 models that are expected to be on the average more accurate than the starting models. Therefore, our server may be useful for crystallographic protein structure determination, where protein models are used for Molecular Replacement to solve the phase problem. To address the latter possibility, a special feature was added to the QA-RecombineIt server. The QA-RecombineIt server can be freely accessed at http://iimcb.genesilico.pl/qarecombineit/.
Structure determination of homooligomeric proteins by NMR spectroscopy is difficult due to the lack of chemical shift perturbation data, which is very effective in restricting the binding interface in heterooligomeric systems, and the difficulty of obtaining a sufficient number of intermonomer distance restraints. Here we solved the high-resolution solution structure of the 15.4 kDa homodimer CylR2, the regulator of cytolysin production from Enterococcus faecalis, which deviates by 1.1 Å from the previously determined X-ray structure. We studied the influence of different experimental information such as long-range distances derived from paramagnetic relaxation enhancement, residual dipolar couplings, symmetry restraints and intermonomer Nuclear Overhauser Effect restraints on the accuracy of the derived structure. In addition, we show that it is useful to combine experimental information with methods of ab initio docking when the available experimental data are not sufficient to obtain convergence to the correct homodimeric structure. In particular, intermonomer distances may not be required when residual dipolar couplings are compared to values predicted on the basis of the charge distribution and the shape of ab initio docking solutions.
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-007-9204-4) contains supplementary material, which is available to authorized users.
NMR solution structure; Homodimer CylR2; Paramagnetic relaxation enhancement; PALES; Residual dipolar couplings; Rigid-body docking
Restrained molecular dynamics simulations are a robust, though perhaps underused, tool for the end-stage refinement of biomolecular structures. We demonstrate their utility—using modern simulation protocols, optimized force fields, and inclusion of explicit solvent and mobile counterions—by re-investigating the solution structures of two RNA hairpins that had previously been refined using conventional techniques. The structures, both domain 5 group II intron ribozymes from yeast ai5γ and Pylaiella littoralis, share a nearly identical primary sequence yet the published 3D structures appear quite different. Relatively long restrained MD simulations using the original NMR restraint data identified the presence of a small set of violated distance restraints in one structure and a possibly incorrect trapped bulge nucleotide conformation in the other structure. The removal of problematic distance restraints and the addition of a heating step yielded representative ensembles with very similar 3D structures and much lower pairwise RMSD values. Analysis of ion density during the restrained simulations helped to explain chemical shift perturbation data published previously. These results suggest that restrained MD simulations, with proper caution, can be used to “update” older structures or aid in the refinement of new structures that lack sufficient experimental data to produce a high quality result. Notable cautions include the need for sufficient sampling, awareness of potential force field bias (such as small angle deviations with the current AMBER force fields), and a proper balance between the various restraint weights.
Electronic supplementary material
The online version of this article (doi:10.1007/s10858-012-9642-5) contains supplementary material, which is available to authorized users.
RNA structure; Molecular dynamics; Residual dipolar coupling restraints; Bulge structure; Force fields; Ion binding
Protein-protein interactions are fundamental for the majority of cellular processes and their study is of enormous biotechnological and therapeutic interest. In recent years, a variety of computational approaches to the protein-protein docking problem have been reported, with encouraging results. Most of the currently available protein-protein docking algorithms are composed of two clearly defined parts: the sampling of the rotational and translational space of the interacting molecules, and the scoring and clustering of the resulting orientations. Although this kind of strategy has shown some of the most successful results in the CAPRI blind test http://www.ebi.ac.uk/msd-srv/capri, more efforts need to be applied. Thus, the sampling protocol should generate a pool of conformations that include a sufficient number of near-native ones, while the scoring function should discriminate between near-native and non-near-native proposed conformations. On the other hand, protocols to efficiently include full flexibility on the protein structures are increasingly needed.
In these work we present new computational tools for protein-protein docking. We describe here the RotBUS (Rotation-Based Uniform Sampling) method to generate uniformly distributed sets of rigid-body docking poses, with a new fast calculation of the optimal contacting distance between molecules. We have tested the method on a standard benchmark of unbound structures and we can find near-native solutions in 100% of the cases. After applying a new fast filtering scheme based on residue-based desolvation, in combination with FTDock plus pyDock scoring, near-native solutions are found with rank ≤ 50 in 39% of the cases. Knowledge-based experimental restraints can be easily included to reduce computational times during sampling and improve success rates, and the method can be extended in the future to include flexibility of the side-chains.
This new sampling algorithm has the advantage of its high speed achieved by fast computing of the intermolecular distance based on a coarse representation of the interacting surfaces. In addition, a fast desolvation scoring permits the screening of millions of conformations at low computational cost, without compromising accuracy. The protocol presented here can be used as a framework to include restraints, flexibility and ensemble docking approaches.
Macromolecular structures are modeled by conformational optimization within experimental and knowledge-based restraints. Discrete restraint-based sampling generates high-quality structures within these restraints and facilitates further refinement in a continuous all-atom energy landscape. This approach has been used successfully for protein loop modeling, comparative modeling and electron density fitting in X-ray crystallography.
Here we present a software toolkit (Rappertk) which generalizes discrete restraint-based sampling for use in structural biology. Modular design and multi-layered architecture enables Rappertk to sample conformations of any macromolecule at many levels of detail and within a variety of experimental restraints. Performance against a Cα-tracing benchmark shows that the efficiency has not suffered despite the overhead required by this flexibility. We demonstrate the toolkit's capabilities by building high-quality β-sheets and by introducing restraint-driven sampling. RNA sampling is demonstrated by rebuilding a protein-RNA interface. Ability to construct arbitrary ligands is used in sampling protein-ligand interfaces within electron density. Finally, secondary structure and shape information derived from EM are combined to generate multiple conformations of a protein consistent with the observed density.
Through its modular design and ease of use, Rappertk enables exploration of a wide variety of interesting avenues in structural biology. This toolkit, with illustrative examples, is freely available to academic users from .