Summary: Three-dimensional RNA structure prediction and folding are of significant interest in the biological research community. Here, we present iFoldRNA, a novel web-based methodology for RNA structure prediction with near atomic resolution accuracy and analysis of RNA folding thermodynamics. iFoldRNA rapidly explores RNA conformations using discrete molecular dynamics simulations of input RNA sequences. Starting from simplified linear-chain conformations, RNA molecules (<50 nt) fold to native-like structures within half an hour of simulation, facilitating rapid RNA structure prediction. All-atom reconstruction of energetically stable conformations generates iFoldRNA predicted RNA structures. The predicted RNA structures are within 2–5 Å root mean square deviations (RMSDs) from corresponding experimentally derived structures. RNA folding parameters including specific heat, contact maps, simulation trajectories, gyration radii, RMSDs from native state, and fraction of native-like contacts are accessible from iFoldRNA. We expect iFoldRNA will serve as a useful resource for RNA structure prediction and folding thermodynamic analyses.
Supplementary information: Supplementary data are available at Bioinformatics online.
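Two of the folding parameters listed above, the radius of gyration and the fraction of native-like contacts, can be illustrated with a minimal sketch. It assumes equal-mass beads given as (x, y, z) tuples in Å, and it defines a contact as two beads at least three positions apart in sequence and within 8 Å of each other. The cutoff, the sequence separation and all function names are illustrative assumptions, not iFoldRNA's own definitions.

```python
import math
from itertools import combinations


def radius_of_gyration(coords):
    """Root-mean-square distance of the beads from their centroid (equal masses assumed)."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    sq = sum((c[k] - center[k]) ** 2 for c in coords for k in range(3))
    return math.sqrt(sq / n)


def contacts(coords, cutoff=8.0, min_sep=3):
    """Index pairs (i, j) that are >= min_sep apart in sequence and within cutoff in space.

    cutoff and min_sep are hypothetical defaults chosen for illustration.
    """
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_sep and math.dist(coords[i], coords[j]) <= cutoff
    }


def fraction_native_contacts(coords, native, cutoff=8.0, min_sep=3):
    """Fraction of the native contacts that are also present in coords."""
    native_set = contacts(native, cutoff, min_sep)
    if not native_set:
        return 0.0
    return len(native_set & contacts(coords, cutoff, min_sep)) / len(native_set)
```

A compact reference conformation scores 1.0 against itself, while a fully extended chain with none of the reference contacts scores 0.0.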
Recent approaches for predicting the three-dimensional (3D) structure of proteins, such as de novo or fold recognition methods, mostly rely on simplified energy potential functions and a reduced representation of the polypeptide chain. These simplifications facilitate the exploration of the protein conformational space but do not fully capture the subtle relationship that exists between the amino acid sequence and its native structure. It has been proposed that physics-based energy functions together with techniques for sampling the conformational space, e.g., Monte Carlo or molecular dynamics (MD) simulations, are better suited to the task of modelling proteins at higher resolutions than those of models obtained with the former type of methods. In this study we monitor different protein structural properties along MD trajectories to discriminate correct from erroneous models. These models are based on the sequence-structure alignments provided by our fold recognition method, FROST. We define correct models as being built from alignments of sequences with structures similar to their native structures and erroneous models as being built from alignments of sequences with structures unrelated to their native structures.
For three test sequences whose native structures belong to the all-α, all-β and αβ classes we built a set of models intended to cover the whole spectrum: from a perfect model, i.e., the native structure, to a very poor model, i.e., a random alignment of the test sequence with a structure belonging to another structural class, including several intermediate models based on fold recognition alignments. We submitted these models to 11 ns of MD simulations at three different temperatures. We monitored along the corresponding trajectories the mean of the Root-Mean-Square deviations (RMSd) with respect to the initial conformation, the RMSd fluctuations, the number of conformation clusters, the evolution of secondary structures and the surface area of residues. None of these criteria alone is 100% efficient in discriminating correct from erroneous models. The mean RMSd, RMSd fluctuations, secondary structure and clustering of conformations show some false positives whereas the residue surface area criterion shows false negatives. However if we consider these criteria in combination it is straightforward to discriminate the two types of models.
The ability to discriminate correct from erroneous models allows us to improve the specificity and sensitivity of our fold recognition method for a number of ambiguous cases.
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having a similar fold is identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacterium Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy.
Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality.
The goal of computational protein design is to create new protein sequences of desirable structure and biological function. Most protein design methods are developed to search for sequences with the lowest free energy based on physics-based force fields following Anfinsen's thermodynamic hypothesis. A major obstacle of such approaches is the inaccuracy of the force-field design, which cannot accurately describe atomic interactions or correctly recognize protein folds. We propose a novel method which uses evolutionary information, in the form of sequence profiles from structure families, to guide the sequence design. Since sequence profiles are generally more accurate than physics-based potentials in protein fold recognition, a unique advantage of the method is that it targets the design procedure to a family of protein sequence profiles, which enhances the robustness of the designed sequences. The method was tested on 87 proteins and the designed sequences can be folded by I-TASSER to models with an average RMSD of 2.1 Å. As a case study of large-scale application, the method is extended to redesign all structurally resolved proteins in the human pathogenic bacterium Mycobacterium tuberculosis. Five sequences varying in fold and size were characterized by circular dichroism and NMR spectroscopy experiments and three were shown to have ordered tertiary structure.
Single-molecule fluorescence experiments reveal how DEAD-box proteins unfold structured RNAs to promote conformational transitions and refolding to the native functional state.
DEAD-box helicase proteins accelerate folding and rearrangements of highly structured RNAs and RNA–protein complexes (RNPs) in many essential cellular processes. Although DEAD-box proteins have been shown to use ATP to unwind short RNA helices, it is not known how they disrupt RNA tertiary structure. Here, we use single molecule fluorescence to show that the DEAD-box protein CYT-19 disrupts tertiary structure in a group I intron using a helix capture mechanism. CYT-19 binds to a helix within the structured RNA only after the helix spontaneously loses its tertiary contacts, and then CYT-19 uses ATP to unwind the helix, liberating the product strands. Ded1, a multifunctional yeast DEAD-box protein, gives analogous results with small but reproducible differences that may reflect its in vivo roles. The requirement for spontaneous dynamics likely targets DEAD-box proteins toward less stable RNA structures, which are likely to experience greater dynamic fluctuations, and provides a satisfying explanation for previous correlations between RNA stability and CYT-19 unfolding efficiency. Biologically, the ability to sense RNA stability probably biases DEAD-box proteins to act preferentially on less stable misfolded structures and thereby to promote native folding while minimizing spurious interactions with stable, natively folded RNAs. In addition, this straightforward mechanism for RNA remodeling does not require any specific structural environment of the helicase core and is likely to be relevant for DEAD-box proteins that promote RNA rearrangements of RNP complexes including the spliceosome and ribosome.
In addition to carrying genetic information from DNA to protein, RNAs function in many essential cellular processes. This often requires the RNA to form a specific three-dimensional structure, and some functions require cycling between multiple structures. However, RNAs have a strong propensity to become trapped in nonfunctional, misfolded structures. Due to the intrinsic stability of local structure for RNA, these misfolded species can be long-lived and therefore accumulate. ATP-dependent RNA chaperone proteins called DEAD-box proteins are known to promote native RNA folding by disrupting misfolded structures. Although these proteins can unwind short RNA helices, the mechanism by which they act upon higher order tertiary contacts is unknown. Our current work shows that DEAD-box proteins capture transiently exposed RNA helices, preventing any tertiary contacts from reforming and potentially destabilizing the global RNA architecture. Helix unwinding by the DEAD-box protein then allows the product RNA strands to form new contacts. This helix capture mechanism for manipulation of RNA tertiary structure does not require a specific binding motif or structural environment and may be general for DEAD-box helicase proteins that act on structured RNAs.
A key component in protein structure prediction is a scoring or discriminatory function that can distinguish near-native conformations from misfolded ones. Various types of scoring functions have been developed to accomplish this goal, but their performance is not adequate to solve the structure selection problem. In addition, there is poor correlation between the scores and the accuracy of the generated conformations.
We present a simple and nonparametric formula to estimate the accuracy of predicted conformations (or decoys). This scoring function, called the density score function, evaluates decoy conformations by performing an all-against-all Cα RMSD (Root Mean Square Deviation) calculation in a given decoy set. We tested the density score function on 83 decoy sets grouped by their generation methods (4state_reduced, fisa, fisa_casp3, lmds, lattice_ssfit, semfold and Rosetta). The density scores have correlations as high as 0.9 with the Cα RMSDs of the decoy conformations, measured relative to the experimental conformation for each decoy.
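The all-against-all calculation can be sketched as follows. This is a minimal illustration, not the published formula: it assumes the density score of a decoy is its mean Cα RMSD to every other decoy in the set, and that all decoys are already superimposed in a common frame, so a plain coordinate RMSD is used without optimal superposition. The function names are hypothetical.

```python
import math


def rmsd(a, b):
    """Coordinate RMSD between two equal-length lists of (x, y, z) points.

    Assumes the two conformations are already superimposed.
    """
    if len(a) != len(b) or not a:
        raise ValueError("conformations must be non-empty and of equal length")
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(total / len(a))


def density_scores(decoys):
    """Mean RMSD of each decoy to all other decoys in the set.

    Under the assumption stated above, a low score marks a decoy that sits
    in a densely populated region of the decoy ensemble.
    """
    n = len(decoys)
    if n < 2:
        raise ValueError("need at least two decoys")
    sums = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            d = rmsd(decoys[i], decoys[j])  # each pair is computed once
            sums[i] += d
            sums[j] += d
    return [s / (n - 1) for s in sums]
```

Ranking decoys by ascending score then needs no reference to the experimental structure, which is what makes such a function decoy-dependent.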
We previously developed a residue-specific all-atom probability discriminatory function (RAPDF), which compiles statistics from a database of experimentally determined conformations, to aid in structure selection. Here, we present a decoy-dependent discriminatory function called self-RAPDF, where we compiled the atom-atom contact probabilities from all the conformations in a decoy set instead of using an ensemble of native conformations, with a weighting scheme based on the density scores. The self-RAPDF has a higher correlation with Cα RMSD than RAPDF for 76/83 decoy sets, and selects better near-native conformations for 62/83 decoy sets. Self-RAPDF may be useful not only for selecting near-native conformations from decoy sets, but also for fold simulations and protein structure refinement.
Both the density score and the self-RAPDF functions are decoy-dependent scoring functions for improved protein structure selection. Their success indicates that information from the ensemble of decoy conformations can be used to derive statistical probabilities and facilitate the identification of near-native structures.
A network analysis is used to uncover hidden folding pathways in free-energy landscapes usually defined in terms of such arbitrary order parameters as root-mean-square deviation from the native structure, radius of gyration, etc. The analysis has been applied to molecular dynamics (MD) trajectories of the B-domain of staphylococcal protein A, generated with the coarse-grained united-residue (UNRES) force field in a broad range of temperatures (270K ≤ T ≤ 325K). Thousands of folding pathways have been identified at each temperature. Out of these many folding pathways, several most probable ones were selected for investigation of the conformational transitions during protein folding. Unlike other conformational space network (CSN) methods, a node in the CSN variant implemented in this work is defined according to the nativelikeness class of the structure, which defines the similarity of segments of the compared structures in terms of secondary-structure, contact-pattern, and local geometry, as well as the overall geometric similarity of the conformation under consideration to that of the reference (experimental) structure. Our previous findings, regarding the folding model and conformations found at the folding-transition temperature for protein A (Maisuradze et al., J. Am. Chem. Soc. 132, 9444, 2010), were confirmed by the conformational space network analysis. In the methodology and in the analysis of the results, the shortest path identified by using the shortest-path algorithm corresponds to the most probable folding pathway in the conformational space network.
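The link between the shortest path and the most probable pathway can be sketched with Dijkstra's algorithm. The sketch assumes that each edge of the network carries a transition probability p and is given the weight -ln(p), so that minimizing the summed weight maximizes the product of probabilities along the path. The abstract does not state the weighting actually used, and the node labels and function name below are hypothetical.

```python
import heapq
import math


def most_probable_path(transitions, start, goal):
    """Return (path, probability) of the most probable route from start to goal.

    transitions maps node -> {neighbor: probability}, with 0 < probability <= 1.
    Edge weights are -ln(probability), which are non-negative, so Dijkstra applies.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == goal:
            break
        for v, p in transitions.get(u, {}).items():
            if not 0.0 < p <= 1.0:
                raise ValueError("probabilities must lie in (0, 1]")
            nd = d - math.log(p)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None, 0.0
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[goal])
```

For a toy network in which an unfolded node reaches the native node through either of two intermediates, the function returns the route whose probabilities multiply to the larger value, even when its first step is the less likely one.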
A variety of coarse-grained (CG) models exists for simulation of proteins. An outstanding problem is the construction of a CG model with physically accurate conformational energetics rivaling all-atom force fields. In the present work, atomistic simulations of peptide folding and aggregation equilibria are force-matched using multiscale coarse-graining to develop and test a CG interaction potential of general utility for the simulation of proteins of arbitrary sequence. The reduced representation relies on multiple interaction sites to maintain the anisotropic packing and polarity of individual sidechains. CG energy landscapes computed from replica exchange simulations of the folding of Trpzip, Trp-cage and adenylate kinase resemble those of other reduced representations; non-native structures are observed with energies similar to those of the native state. The artifactual stabilization of misfolded states implies that non-native interactions play a deciding role in deviations from ideal funnel-like cooperative folding. The role of surface tension, backbone hydrogen bonding and the smooth pairwise CG landscape is discussed. Ab initio folding aside, the improved treatment of sidechain rotamers results in stability of the native state in constant temperature simulations of Trpzip, Trp-cage, and the open to closed conformational transition of adenylate kinase, illustrating the potential value of the CG force field for simulating protein complexes and transitions between well-defined structural states.
Biological function originates from the dynamical motions of proteins in response to cellular stimuli. Protein dynamics arise from physical interactions that are well-predicted by detailed atomistic simulations. In order to examine large protein complexes on long timescales of biological importance, however, coarse-grained simulation approaches are needed to complement experiment. Previous coarse-grained models have proved successful for investigations involving a given protein's native structure, including protein folding and structure prediction. We construct a model capable of simulating proteins regardless of their sequence or structure. The present coarse-grained model was, however, developed rigorously from the underlying atomistic forces as opposed to knowledge-based or ad hoc parameterizations. Examination of the model predictions on various accessible timescales reveals successes and limitations of the model. While functionally relevant conformational transitions can be studied, the coarse-grained representation has some difficulty with the ab initio folding of the peptide chain into its proper structure. Our observations highlight the complex molecular nature of a protein's underlying energy landscape, offering rigorous insight into the information missing in reduced representations of the peptide chain. With these caveats in mind, the physical interaction–based, coarse-grained model will find application in simulations of a wide variety of proteins and continue to guide future coarse-graining efforts.
Trp-cage is a designed 20-residue polypeptide that, in spite of its size, shares several features with larger globular proteins. Although the system has been intensively investigated experimentally and theoretically, its folding mechanism is not yet fully understood. Indeed, some experiments suggest a two-state behavior, while others point to the presence of intermediates. In this work we show that the results of a bias-exchange metadynamics simulation can be used for constructing a detailed thermodynamic and kinetic model of the system. The model, although constructed from a biased simulation, has a quality similar to those extracted from the analysis of long unbiased molecular dynamics trajectories. This is demonstrated by a careful benchmark of the approach on a smaller system, the solvated Ace-Ala3-Nme peptide. For the Trp-cage folding, the model predicts that the relaxation time of 3100 ns observed experimentally is due to the presence of a compact molten globule-like conformation. This state has an occupancy of only 3% at 300 K, but acts as a kinetic trap. In contrast, non-compact structures relax to the folded state on the sub-microsecond timescale. The model also predicts the presence of a state at an RMSD of 4.4 Å from the NMR structure in which the Trp strongly interacts with Pro12. This state can explain the abnormal temperature dependence of two chemical shifts. The structures of the two most stable misfolded intermediates are in agreement with NMR experiments on the unfolded protein. Our work shows that, using biased molecular dynamics trajectories, it is possible to construct a model describing in detail the Trp-cage folding kinetics and thermodynamics in agreement with experimental data.
Understanding the mechanism by which proteins find their folded state is a holy grail of computational biology. Accurate all-atom simulations have the potential to describe such a process in great detail, but, unfortunately, folding of most proteins takes place on a time scale that is still not accessible to routine computer simulations. We introduce here an approach that allows for constructing an accurate kinetic and thermodynamic model of folding (or other complex biological processes) using trajectories in which the process under investigation is forced to happen in a short simulation time by an appropriate external bias. An important strength of this approach is the possibility of identifying and characterizing misfolded conformations that, in some proteins, are related to important diseases. We use this method to study the folding of Trp-cage, predicting the structure of the folded state and the presence of several intermediates. We find that, surprisingly, fully unstructured “unfolded” states relax towards the folded conformation rather quickly. The slowest relaxation time of the system is instead related to the equilibration between the folded state and another compact structure that acts as a kinetic trap. Thus, the experimental folding time would be determined primarily by this process.
It has long been proposed that much of the information encoding how a protein folds is contained locally in the peptide chain. Here we present a large-scale simulation study designed to examine the extent to which conformations of peptide fragments in water predict native conformations in proteins. We perform replica exchange molecular dynamics (REMD) simulations of 872 8-mer, 12-mer, and 16-mer peptide fragments from 13 proteins using the AMBER 96 force field and the OBC implicit solvent model. To analyze the simulations, we compute various contact-based metrics, such as contact probability, and then apply Bayesian classifier methods to infer which metastable contacts are likely to be native vs. non-native. We find that a simple measure, the observed contact probability, is largely more predictive of a peptide's native structure in the protein than combinations of metrics or multi-body components. Our best classification model is a logistic regression model that can achieve up to 63% correct classifications for 8-mers, 71% for 12-mers, and 76% for 16-mers. We validate these results on fragments of a protein outside our training set. We conclude that local structure provides information to solve some but not all of the conformational search problem. These results help improve our understanding of folding mechanisms, and have implications for improving physics-based conformational sampling and structure prediction using all-atom molecular simulations.
Proteins must fold to unique native structures in order to perform their functions. To do this, proteins must solve a complicated conformational search problem, the details of which remain difficult to study experimentally. Predicting folding pathways and the mechanisms by which proteins fold is thus central to understanding how proteins work. One longstanding question is the extent to which proteins solve the search problem locally, by folding into sub-structures that are dictated primarily by local sequence. Here, we address this question by conducting a large-scale molecular dynamics simulation study of protein fragments in water. The simulation data was then used to optimize a statistical model that predicted native and non-native contacts. The performance of the resulting model suggests that local structuring provides some but not all of the information to solve the folding problem, and that molecular dynamics simulation of fragments can be useful for protein structure prediction and design.
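The classification step can be illustrated with a one-feature logistic regression that maps an observed contact probability to the probability that the contact is native. This is a minimal sketch trained by plain gradient descent on made-up data. The feature set, training procedure and data of the study are not given in the abstract, so every number and name below is an assumption for illustration only.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit P(native | x) = sigmoid(w * x + b) by batch gradient descent.

    xs: observed contact probabilities in [0, 1]
    ys: labels, 1 for a native contact and 0 for a non-native one
    lr and epochs are hypothetical settings chosen for this toy problem.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_native(w, b, x, threshold=0.5):
    """Classify a contact as native when the modelled probability reaches threshold."""
    return sigmoid(w * x + b) >= threshold


# Toy, non-separable training data (invented for illustration).
toy_x = [0.05, 0.10, 0.20, 0.30, 0.60, 0.70, 0.80, 0.95]
toy_y = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(toy_x, toy_y)
```

With these toy data the fitted weight is positive, so frequently observed contacts are classified as native and rarely observed ones as non-native.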
The folding of the B-domain of staphylococcal protein A has been studied by coarse-grained canonical and multiplexed replica-exchange molecular dynamics simulations with the UNRES force field in a broad range of temperatures (270K ≤ T ≤ 350K). In canonical simulations, the folding was found to occur either directly to the native state or through kinetic traps, mainly the topological mirror image of the native three-helix bundle. The latter folding scenario was observed more frequently at low temperatures. With increase of temperature, the frequency of the transitions between the folded and misfolded/unfolded states increased and the folded state became more diffuse with conformations exhibiting increased root-mean-square deviations from the experimental structure (from about 4 Å at T = 300K to 8.7 Å at T = 325K). An analysis of the equilibrium conformational ensemble determined from multiplexed replica exchange simulations at the folding-transition temperature (Tf = 325K) showed that the conformational ensemble at this temperature is a collection of conformations with residual secondary structures, which possess native or near-native clusters of nonpolar residues in place, and not a 50%-50% mixture of fully-folded and fully-unfolded conformations. These findings contradict the quasi-chemical picture of two- or multi-state protein folding, which assumes an equilibrium between the folded, unfolded, and intermediate states, with equilibrium shifting with temperature but with the native conformations remaining essentially unchanged. Our results also suggest that long-range hydrophobic contacts are the essential factor to keep the structure of a protein thermally stable.
protein folding; folding/unfolding transition; coarse-grained dynamics; conformational ensemble
We describe here the PRIMO (PRotein Intermediate Model) force field, a physics-based fully transferable additive coarse-grained potential energy function that is compatible with an all-atom force field for multi-scale simulations. The energy function consists of standard molecular dynamics energy terms plus a hydrogen-bonding potential term and is mainly parameterized based on the CHARMM22/CMAP force field in a bottom-up fashion. The solvent is treated implicitly via the generalized Born model. The bonded interactions are either harmonic or distance-based spline interpolated potentials. These potentials are defined on the basis of all-atom molecular dynamics (MD) simulations of dipeptides with the CHARMM22/CMAP force field. The non-bonded parameters are tuned by matching the conformational free energies of a diverse set of conformations to those from CHARMM all-atom results. PRIMO is designed to provide a correct description of the conformational distribution of the backbone (ϕ/ψ) and side chains (χ1) for all amino acids with a CMAP correction term. The CMAP potential in PRIMO is optimized based on the new CHARMM C36 CMAP. The resulting optimized force field has been applied in MD simulations of several proteins of 36–155 amino acids, in which the root-mean-square deviation of the average structure from the corresponding crystallographic structure varies between 1.80 and 4.03 Å. PRIMO is shown to fold several small peptides to their native-like structures from extended conformations. These results suggest the applicability of the PRIMO force field in the study of protein structures in aqueous solution, structure predictions as well as ab initio folding of small peptides.
Coarse-grain; force field; implicit solvent; molecular dynamics; replica exchange
Using a combined master equation and kinetic cluster approach, we investigate RNA pseudoknot folding and unfolding kinetics. The energetic parameters are computed from a recently developed Vfold model for RNA secondary structure and pseudoknot folding thermodynamics. The folding kinetics theory is based on the complete conformational ensemble, including all the native-like and non-native states. The predicted folding and unfolding pathways, activation barriers, Arrhenius plots, and rate-limiting steps lead to several findings. First, for the PK5 pseudoknot, a misfolded 5′ hairpin emerges as a stable kinetic trap in the folding process, and the detrapping from this misfolded state is the rate-limiting step for the overall folding process. The calculated rate constant and activation barrier agree well with the experimental data. Second, as an application of the model, we investigate the kinetic folding pathways for the hTR (human telomerase RNA) pseudoknot. The predicted folding and unfolding pathways not only support the proposed role of a conformational switch between hairpin and pseudoknot in hTR activity, but also reveal the molecular mechanism for the conformational switch. Furthermore, for an experimentally studied hTR mutation, whose hairpin intermediate is destabilized, the model predicts a long-lived transient hairpin structure, and the switch between the transient hairpin intermediate and the native pseudoknot may be responsible for the observed hTR activity. Such a finding would help resolve the apparent contradiction between the observed hTR activity and the absence of a stable hairpin.
Kinetics; RNA pseudoknot; Activation energy; Misfolded state; Telomerase
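The master-equation picture of a kinetic trap can be illustrated with a three-state toy model: an unfolded state U, a misfolded hairpin M, and the native pseudoknot N, with populations evolving as dp_i/dt = sum_j (k_ji p_j - k_ij p_i). The sketch below integrates this equation with an explicit Euler step. The rate constants are invented for illustration and are not Vfold-derived values, and the function name is hypothetical.

```python
def propagate(rates, p0, dt, steps):
    """Integrate the master equation with explicit Euler steps.

    rates[i][j] is the rate constant for the transition i -> j (diagonal ignored).
    p0 is the initial population vector; dt must be small compared with 1 / max rate.
    """
    n = len(p0)
    p = list(p0)
    for _ in range(steps):
        dp = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i != j:
                    flux = rates[i][j] * p[i]  # population leaving i for j per unit time
                    dp[i] -= flux
                    dp[j] += flux
        p = [pi + dt * di for pi, di in zip(p, dp)]
    return p


# States: 0 = U (unfolded), 1 = M (misfolded hairpin), 2 = N (native).
# Toy rates: U folds faster into the trap than into N, and escapes M slowly.
toy_rates = [
    [0.0, 5.0, 1.0],
    [0.1, 0.0, 0.0],
    [0.01, 0.0, 0.0],
]
early = propagate(toy_rates, [1.0, 0.0, 0.0], dt=0.01, steps=100)
late = propagate(toy_rates, [1.0, 0.0, 0.0], dt=0.01, steps=100000)
```

In this toy model the trap M dominates at early times, and the native state only takes over once the slow M to U detrapping step has had time to act, mirroring the rate-limiting role of detrapping described above.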
Knowledge of all residue-residue contacts within a protein allows determination of the protein fold. Accurate prediction of even a subset of long-range contacts (contacts between amino acids far apart in sequence) can be instrumental for determining tertiary structure. Here we present BCL::Contact, a novel contact prediction method that utilizes artificial neural networks (ANNs) and specializes in the prediction of medium to long-range contacts. BCL::Contact comes in two modes: sequence-based and structure-based. The sequence-based mode uses only sequence information and has individual ANNs specialized for helix-helix, helix-strand, strand-helix, strand-strand, and sheet-sheet contacts. The structure-based mode combines results from 32 fold recognition methods with sequence information into a consensus prediction. The two methods were presented in the 6th and 7th Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments. The present work focuses on elucidating the impact of fold recognition results on contact prediction via a direct comparison of both methods on a joint benchmark set of proteins. The sequence-based mode predicted contacts with 42% accuracy (7% false positive rate), while the structure-based mode achieved 45% accuracy (2% false positive rate). Predictions by both modes of BCL::Contact were supplied as input to the protein tertiary structure prediction program Rosetta for a benchmark of 17 proteins with no close sequence homologs in the Protein Data Bank (PDB). Rosetta created higher accuracy models, signified by an improvement of 1.3 Å on average root mean square deviation (RMSD), when driven by the predicted contacts. Further, filtering Rosetta models by agreement with the predicted contacts enriches for native-like fold topologies.
CASP; computational structural biology; contact prediction; structure prediction
The increasing importance of non-coding RNA in biology and medicine has led to a growing interest in the problem of RNA 3-D structure prediction. As is the case for proteins, RNA 3-D structure prediction methods require two key ingredients: an accurate energy function and a conformational sampling procedure. Both are only partly solved problems. Here, we focus on the problem of conformational sampling. The current state of the art solution is based on fragment assembly methods, which construct plausible conformations by stringing together short fragments obtained from experimental structures. However, the discrete nature of the fragments necessitates the use of carefully tuned, unphysical energy functions, and their non-probabilistic nature impairs unbiased sampling. We offer a solution to the sampling problem that removes these important limitations: a probabilistic model of RNA structure that allows efficient sampling of RNA conformations in continuous space, and with associated probabilities. We show that the model captures several key features of RNA structure, such as its rotameric nature and the distribution of the helix lengths. Furthermore, the model readily generates native-like 3-D conformations for 9 out of 10 test structures, solely using coarse-grained base-pairing information. In conclusion, the method provides a theoretical and practical solution for a major bottleneck on the way to routine prediction and simulation of RNA structure and dynamics in atomic detail.
The importance of RNA in biology and medicine has increased immensely over the last several years, due to the discovery of a wide range of important biological processes that are under the guidance of non-coding RNA. As is the case with proteins, the function of an RNA molecule is encoded in its three-dimensional (3-D) structure, which in turn is determined by the molecule's sequence. Therefore, interest in the computational prediction of the 3-D structure of RNA from sequence is great. One of the main bottlenecks in routine prediction and simulation of RNA structure and dynamics is sampling, the efficient generation of RNA-like conformations, ideally in a mathematically and physically sound way. Current methods require the use of unphysical energy functions to amend the shortcomings of the sampling procedure. We have developed a mathematical model that describes RNA's conformational space in atomic detail, without the shortcomings of other sampling methods. As an illustration of its potential, we describe a simple yet efficient method to sample conformations that are compatible with a given secondary structure. An implementation of the sampling method, called BARNACLE, is freely available.
The accurate prediction of an RNA's three-dimensional structure from its “primary structure” will have a tremendous influence on experimental design and interpretation, and ultimately on our understanding of the many functions of RNA. This paper presents a general coarse-grained (CG) potential for modeling RNA 3-D structures. Each nucleotide is represented by five pseudo atoms: two for the backbone (one for the phosphate and another for the sugar) and three for the base, to represent base-stacking interactions. The CG potential has been parameterized from statistical analysis of 688 experimental RNA structures. Molecular dynamics simulations of 15 RNA molecules with lengths of 12 to 27 nucleotides have been performed using the CG potential, with performance comparable to that of all-atom simulations. For ~75% of the systems tested, simulated annealing led to native-like structures at least once out of multiple repeated runs. Furthermore, with weak distance restraints based on the knowledge of three to five canonical Watson-Crick pairs, all 15 RNAs tested are successfully folded to within 6.5 Å of native structures using the CG potential and simulated annealing. The results reveal that with a limited secondary structure model, the current CG potential can reliably predict the 3-D structures of small RNA molecules. We also explored an all-atom force field to construct atomic structures from the CG simulations.
Coarse-Grained Model; RNA structure; 3-D structure prediction; Molecular Dynamics
Self-cleavage assays of RNA folding reveal that mRNA structures fold sequentially in vitro and in vivo, but exchange between adjacent structures is much faster in vivo than it is in vitro.
RNAs adopt defined structures to perform biological activities, and conformational transitions among alternative structures are critical to virtually all RNA-mediated processes ranging from metabolite-activation of bacterial riboswitches to pre-mRNA splicing and viral replication in eukaryotes. Mechanistic analysis of an RNA folding reaction in a biological context is challenging because many steps usually intervene between assembly of a functional RNA structure and execution of a biological function. We developed a system to probe mechanisms of secondary structure folding and exchange directly in vivo using self-cleavage to monitor competition between mutually exclusive structures that promote or inhibit ribozyme assembly. In previous work, upstream structures were more effective than downstream structures in blocking ribozyme assembly during transcription in vitro, consistent with a sequential folding mechanism. However, upstream and downstream structures blocked ribozyme assembly equally well in vivo, suggesting that intracellular folding outcomes reflect thermodynamic equilibration or that annealing of contiguous sequences is favored kinetically. We have extended these studies to learn when, if ever, thermodynamic stability becomes an impediment to rapid equilibration among alternative RNA structures in vivo. We find that a narrow thermodynamic threshold determines whether kinetics or thermodynamics govern RNA folding outcomes in vivo. mRNA secondary structures fold sequentially in vivo, but exchange between adjacent secondary structures is much faster in vivo than it is in vitro. Previous work showed that simple base-paired RNA helices dissociate at similar rates in vivo and in vitro so exchange between adjacent structures must occur through a different mechanism, one that likely involves facilitation of branch migration by proteins associated with nascent transcripts.
Properly folded RNAs are critical for virtually all RNA-mediated processes ranging from feedback regulation of gene expression to RNA maturation. The ability of RNAs to adopt specific structures in living cells is remarkable given their propensity to become trapped in a mixture of stable, misfolded structures in vitro. Using mRNA with an inserted ribozyme and self-cleavage to monitor competition between mutually exclusive structures, we previously showed that upstream structures dominated folding outcomes during RNA synthesis in vitro, suggesting that folding occurs sequentially. However, when studied in vivo upstream and downstream structures blocked ribozyme assembly equally well in yeast, providing evidence that intracellular folding outcomes reflect the relative stability of alternative structures. We find that very stable upstream structures can block assembly of downstream structures in vivo even when the downstream structures are more stable, and that a narrow threshold of stability determines whether folding and unfolding rates or thermodynamic stability govern folding outcomes. Thus, mRNAs fold sequentially in vitro and in vivo but exchange between adjacent structures is faster in vivo than in vitro. Simple RNA structures unfold at similar rates in vivo and in vitro, so exchange between adjacent structures in vivo probably occurs through a distinct, step-wise mechanism that could be facilitated by proteins associated with nascent RNAs.
Computer generated trajectories can, in principle, reveal the folding pathways of a protein at atomic resolution and possibly suggest general and simple rules for predicting the folded structure of a given sequence. While such reversible folding trajectories can only be determined ab initio using all-atom transferable force-fields for a few small proteins, they can be determined for a large number of proteins using coarse-grained and structure-based force-fields, in which a known folded structure is by construction the absolute energy and free-energy minimum. Here we use a model of the fast folding helical λ-repressor protein to generate trajectories in which native and non-native states are in equilibrium and transitions are accurately sampled. Yet, representation of the free-energy surface, which underlies the thermodynamic and dynamic properties of the protein model, from such a trajectory remains a challenge. Projections over one or a small number of arbitrarily chosen progress variables often hide the most important features of such surfaces. The results unequivocally show that an unprojected representation of the free-energy surface provides important and unbiased information and allows a simple and meaningful description of many-dimensional, heterogeneous trajectories, providing new insight into the possible mechanisms of fast-folding proteins.
The process of protein folding is a complex transition from a disordered to an ordered state. Here, we simulate a specific fast-folding protein at the point at which the native and denatured states are at equilibrium and show that obtaining an accurate description of the mechanisms of folding and unfolding is far from trivial. Using simple quantities which quantify the degree of native order is, in the case of this protein, clearly misleading. We show that an unbiased representation of the free-energy surface can be obtained; using such a representation we are able to redesign the landscape and thus modify, upon site-specific “mutations”, the folding and unfolding rates. This leads us to formulate a hypothesis to explain the very fast folding of many proteins.
A novel protocol for all-atom RNA tertiary structure prediction is presented that employs restrained molecular mechanics and simulated annealing. The restraints are from secondary structure, co-variation analysis, coaxial stacking predictions for helices in junctions, and, when available, cross-linking data. Results are demonstrated on the Alu domain of the mammalian signal recognition particle RNA, the Saccharomyces cerevisiae phenylalanine tRNA, the hammerhead ribozyme, the hepatitis C virus internal ribosomal entry site, and the P4-P6 domain of the Tetrahymena thermophila group I intron. The predicted structure is selected from a pool of decoy structures with a score that maximizes radius of gyration and base-base contacts, which was empirically found to select higher quality decoys. This simple ab initio approach is sufficient to make good predictions of the structure of RNAs compared to current crystal structures using both root mean square deviation and the accuracy of base-base contacts.
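The two quantities that enter the decoy-selection score, radius of gyration and contact count, can be sketched as follows. This is a minimal illustration only: it assumes one unit-mass point per residue, the distance cutoff and minimum sequence separation are hypothetical values, and the way the protocol combines the two terms into a single score is not stated here, so it is not reproduced.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid (unit masses assumed)."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in coords) / n)

def count_contacts(coords, cutoff=8.0, min_separation=2):
    """Count pairs at least `min_separation` apart in sequence and within `cutoff`.

    Both default values are illustrative assumptions, not values from the protocol.
    """
    n = len(coords)
    return sum(
        1
        for i in range(n)
        for j in range(i + min_separation, n)
        if math.dist(coords[i], coords[j]) <= cutoff
    )

if __name__ == "__main__":
    toy = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
    print(radius_of_gyration(toy), count_contacts(toy, cutoff=5.5))
```

Each decoy in a pool would be reduced to these two numbers before any ranking is applied.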
An RNA secondary structure is locally optimal if there is no lower energy structure that can be obtained by the addition or removal of a single base pair, where energy is defined according to the widely accepted Turner nearest neighbor model. Locally optimal structures form kinetic traps, since any evolution away from a locally optimal structure must involve energetically unfavorable folding steps. Here, we present a novel, efficient algorithm to compute the partition function over all locally optimal secondary structures of a given RNA sequence. Our software, RNAlocopt, runs in time and space. Additionally, RNAlocopt samples a user-specified number of structures from the Boltzmann subensemble of all locally optimal structures. We apply RNAlocopt to show that (1) the number of locally optimal structures is far smaller than the total number of structures; indeed, the number of locally optimal structures is approximately equal to the square root of the number of all structures, (2) the structural diversity of this subensemble may be either similar to or quite different from the structural diversity of the entire Boltzmann ensemble, depending on the type of input RNA, and (3) the (modified) maximum expected accuracy structure, computed by taking into account base pairing frequencies of locally optimal structures, is a more accurate prediction of the native structure than other current thermodynamics-based methods. The software RNAlocopt constitutes a technical breakthrough in our study of the folding landscape for RNA secondary structures. For the first time, locally optimal structures (kinetic traps in the Turner energy model) can be rapidly generated for long RNA sequences, which was previously impossible with methods that involved exhaustive enumeration. Use of locally optimal structures leads to state-of-the-art secondary structure prediction, as benchmarked against methods involving the computation of minimum free energy and of maximum expected accuracy.
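To make the definition of local optimality concrete, the following sketch checks it under a deliberately simplified energy model. It is not RNAlocopt and does not use the Turner nearest neighbor model: it assumes every base pair contributes -1 to the energy, so removing a pair never lowers the energy and a structure is locally optimal exactly when no further pair can be added. The function names and the minimum hairpin loop size of 3 are assumptions made for illustration.

```python
# Toy energy model (an assumption): each base pair contributes -1.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a hairpin

def parse_pairs(dotbracket):
    """Convert a dot-bracket string into a set of (i, j) base pairs with i < j."""
    stack, pairs = [], set()
    for i, ch in enumerate(dotbracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs

def can_add(seq, pairs, i, j):
    """True if pair (i, j) can be added without breaking any structural rule."""
    if j - i <= MIN_LOOP or (seq[i], seq[j]) not in CANONICAL:
        return False
    for k, l in pairs:
        if len({i, j, k, l}) < 4:
            return False  # one of the bases is already paired
        if k < i < l < j or i < k < j < l:
            return False  # the new pair would cross an existing one
    return True

def is_locally_optimal(seq, dotbracket):
    """Under the toy model, locally optimal means no single pair can be added."""
    pairs = parse_pairs(dotbracket)
    n = len(seq)
    return not any(
        can_add(seq, pairs, i, j) for i in range(n) for j in range(i + 1, n)
    )

if __name__ == "__main__":
    print(is_locally_optimal("GGGAAACCC", "(((...)))"))
    print(is_locally_optimal("GGGAAACCC", "((.....))"))
```

Under a full nearest neighbor model, both additions and removals have to be evaluated with loop-dependent energies, so this check only illustrates the idea of a structure with no energy-lowering single-pair move.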
Web server and source code available at http://bioinformatics.bc.edu/clotelab/RNAlocopt/.
The reliable prediction of protein tertiary structure from the amino acid sequence remains challenging even for small proteins. We have developed an all-atom free-energy protein forcefield (PFF01) that we could use to fold several small proteins from completely extended conformations. Because the computational cost of de-novo folding studies rises steeply with system size, this approach is unsuitable for structure prediction purposes. We therefore investigate here a low-cost free-energy relaxation protocol for protein structure prediction that combines heuristic methods for model generation with all-atom free-energy relaxation in PFF01.
We use PFF01 to rank and cluster the conformations for 32 proteins generated by ROSETTA. For the 22 high-quality and 10 low-quality decoy sets, we select near-native conformations with average Cα root mean square deviations of 3.03 Å and 6.04 Å, respectively. The protocol incorporates an inherent reliability indicator that succeeds for 78% of the decoy sets. In over 90% of these cases near-native conformations are selected from the decoy set. This success rate is rationalized by the quality of the decoys and the selectivity of the PFF01 forcefield, which ranks near-native conformations on average 3.06 standard deviations below the relaxed decoys (Z-score).
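The Z-score quoted above can be illustrated with a short calculation. This sketch assumes the conventional definition, the energy of a conformation minus the mean decoy energy, divided by the standard deviation of the decoy energies; whether the study used the population or the sample standard deviation is not stated, so the population form is an assumption here, and the energies are made-up numbers.

```python
from statistics import mean, pstdev

def z_score(candidate_energy, decoy_energies):
    """Number of standard deviations the candidate lies from the decoy mean.

    Negative values mean the candidate has lower energy than the typical decoy.
    """
    mu = mean(decoy_energies)
    sigma = pstdev(decoy_energies)  # population standard deviation (assumed)
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (candidate_energy - mu) / sigma

if __name__ == "__main__":
    decoys = [-10.0, -12.0, -8.0, -10.0]  # hypothetical energies
    print(round(z_score(-14.0, decoys), 3))
```

A value near -3 indicates that the candidate sits well below the bulk of the decoy energy distribution.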
All-atom free-energy relaxation with PFF01 emerges as a powerful low-cost approach toward generic de-novo protein structure prediction. The approach can be applied to large all-atom decoy sets of any origin and requires no preexisting structural information to identify the native conformation. The study provides evidence that a large class of proteins may be foldable by PFF01.
Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFold's potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.
Not only is the prediction of evolutionarily conserved RNA structures important for elucidating the potential functions of RNA sequences and the mechanisms by which these functions are exerted, but it also lies at the core of RNA gene prediction. To get an accurate prediction of the conserved RNA structure, we need a high-quality sequence alignment and an evolutionary tree relating several evolutionarily related sequences. These are two strong requirements that are typically difficult to fulfill unless the encoded RNA structure is already known. We present what is to our knowledge the first method that solves this chicken-and-egg problem by co-estimating all three quantities simultaneously. We show that our novel method, called SimulFold, can be successfully applied over a wide range of sequence similarities to detect conserved RNA structures, including those with pseudoknots. We also show its potential as an alignment and phylogeny prediction method. Our method overcomes several significant limitations of existing methods and has the potential to be used for a very diverse range of tasks.
Repeat-proteins are made up of near repetitions of 20– to 40–amino acid stretches. These polypeptides usually fold up into non-globular, elongated architectures that are stabilized by the interactions within each repeat and those between adjacent repeats, but that lack contacts between residues distant in sequence. The inherent symmetries both in primary sequence and three-dimensional structure are reflected in a folding landscape that may be analyzed as a quasi–one-dimensional problem. We present a general description of repeat-protein energy landscapes based on a formal Ising-like treatment of the elementary interaction energetics in and between foldons, whose collective ensemble are treated as spin variables. The overall folding properties of a complete “domain” (the stability and cooperativity of the repeating array) can be derived from this microscopic description. The one-dimensional nature of the model implies there are simple relations for the experimental observables: folding free-energy (ΔGwater) and the cooperativity of denaturation (m-value), which do not ordinarily apply for globular proteins. We show how the parameters for the “coarse-grained” description in terms of foldon spin variables can be extracted from more detailed folding simulations on perfectly funneled landscapes. To illustrate the ideas, we present a case-study of a family of tetratricopeptide (TPR) repeat proteins and quantitatively relate the results to the experimentally observed folding transitions. Based on the dramatic effect that single point mutations exert on the experimentally observed folding behavior, we speculate that natural repeat proteins are “poised” at particular ratios of inter- and intra-element interaction energetics that allow them to readily undergo structural transitions in physiologically relevant conditions, which may be intrinsically related to their biological functions.
Repeat-proteins are coded in repetitions of similar amino acid stretches. Unlike typical globular domains, repeat-protein domains fold into elongated superhelical shapes of stacked elements, stabilized only by interactions within each repeat or between adjacent repeats. This architecture allows folding to be treated as a quasi–one-dimensional problem. We introduce an analytical model that describes the folding energy landscape of repeat-proteins, based on a representation in terms of spin variables. This representation groups together conformations on the basis of the degree of order in local quasi-independent folding units, often called foldons. We derive simple relations between the experimentally observed stability and cooperativity of denaturation of the whole repeat-domain, which differ from those found in three-dimensionally connected globular proteins. Folding simulations on perfectly funneled landscapes reproduce these relations. We document that these relations are experimentally observed in a variety of repeat-protein systems. We show the parameters in the foldon spin description can be predicted on the basis, largely, of protein topology, reflecting the funneled energy landscape.
Due to the energetic frustration of RNA folding, tertiary structured RNA is typically characterized by a rugged folding free energy landscape where deep kinetic barriers separate numerous misfolded states from one or more native states. While most in vitro studies of RNA rely on (re)folding chemically and/or enzymatically synthesized RNA in its entirety, which frequently leads into kinetic traps, nature reduces the complexity of the RNA folding problem by segmental, co-transcriptional folding starting from the 5′ end. We here have developed a simplified, general, nondenaturing purification protocol for RNA to ask whether avoiding denaturation of a co-transcriptionally folded RNA can reduce commonly observed in vitro folding heterogeneity. Our protocol bypasses the need for large-scale auxiliary protein purification and expensive chromatographic equipment and involves rapid affinity capture with magnetic beads and removal of chemical heterogeneity by cleavage of the target RNA from the beads using the ligand-induced glmS ribozyme. For two disparate model systems, the Varkud satellite (VS) and hepatitis delta virus (HDV) ribozymes, we achieve >95% conformational purity within one hour of enzymatic transcription, without the need for any folding chaperones. We further demonstrate that in vitro refolding introduces severe conformational heterogeneity into the natively-purified VS ribozyme but not into the compact, double-nested pseudoknot fold of the HDV ribozyme. We conclude that conformational heterogeneity in complex RNAs can be avoided by co-transcriptional folding followed by nondenaturing purification, providing rapid access to chemically and conformationally pure RNA for biologically relevant biochemical and biophysical studies.
Peptides often have conformational preferences. We simulated 133 peptide 8-mer fragments from six different proteins, sampled by replica-exchange molecular dynamics using Amber7 with a GB/SA (generalized-Born/solvent-accessible electrostatic approximation to water) implicit solvent. We found that 85 of the peptides have no preferred structure, while 48 of them converge to a preferred structure. In 85% of the converged cases (41 peptides), the structures found by the simulations bear some resemblance to their native structures, based on a coarse-grained backbone description. In particular, all seven of the β hairpins in the native structures contain a fragment in the turn that is highly structured. In the eight cases where the bioinformatics-based I-sites library picks out native-like structures, the present simulations are largely in agreement. Such physics-based modeling may be useful for identifying early nuclei in folding kinetics and for assisting in protein-structure prediction methods that utilize the assembly of peptide fragments.
To carry out specific biochemical reactions, proteins must adopt precise three-dimensional conformations. During the folding of a protein, the protein picks out the right conformation out of billions of other conformations. It is not yet possible to do this computationally. Picking out the native conformation using physics-based atomically detailed models, sampled by molecular dynamics, is presently beyond the reach of computer methods. How can we speed up computational protein-structure prediction? One idea is that proteins start folding at specific parts of a chain that kink up early in the folding process. If we can identify these kinks, we should be able to speed up protein-structure prediction. Previous studies have identified likely kinks through bioinformatic analysis of existing protein structures. The goal of the authors here is to identify these putative folding initiation sites with a physical model instead. In this study, Ho and Dill show that, by chopping a protein chain into peptide pieces, then simulating the pieces in molecular dynamics, they can identify those peptide fragments that have conformational biases. These peptides identify the kinks in the protein chain.
Predicting 3-dimensional protein structures from amino-acid sequences is an important unsolved problem in computational structural biology. The problem becomes relatively easier if closely homologous proteins have been solved, as high-resolution models can be built by aligning target sequences to the solved homologous structures. However, for sequences without similar folds in the Protein Data Bank (PDB) library, the models have to be predicted from scratch. Progress in ab initio structure modeling is slow. The aim of this study was to extend the TASSER (threading/assembly/refinement) method for ab initio modeling and systematically examine its ability to fold small single-domain proteins.
We developed I-TASSER by iteratively implementing the TASSER method, and tested its folding ability on three benchmarks of small proteins. First, data on 16 small proteins (< 90 residues) were used to generate I-TASSER models, which had an average Cα-root mean square deviation (RMSD) of 3.8 Å, with 6 of them having a Cα-RMSD < 2.5 Å. The overall result was comparable with the all-atom ROSETTA simulation, but the central processing unit (CPU) time used by I-TASSER was much shorter (5 CPU hours vs. 150 CPU days for ROSETTA). Second, data on 20 small proteins (< 120 residues) were used. I-TASSER folded four of them with a Cα-RMSD < 2.5 Å. The average Cα-RMSD of the I-TASSER models was 3.9 Å, whereas it was 5.9 Å using TOUCHSTONE-II software. Finally, 20 non-homologous small proteins (< 120 residues) were taken from the PDB library. An average Cα-RMSD of 3.9 Å was obtained for the third benchmark, with seven cases having a Cα-RMSD < 2.5 Å.
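For readers unfamiliar with the metric, the sketch below shows the core of an RMSD calculation between a model and a reference structure given as matched lists of Cα coordinates. It is a simplified illustration: it removes only the translational offset by centering both structures, whereas reported Cα-RMSD values are normally computed after an optimal rotational superposition as well, so this function gives an upper bound on the superposed value. The function names are hypothetical.

```python
import math

def centered(coords):
    """Shift a list of (x, y, z) points so that their centroid is at the origin."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [tuple(p[k] - c[k] for k in range(3)) for p in coords]

def rmsd_translation_only(model, native):
    """RMSD after centroid alignment; no rotational fit is performed (simplification)."""
    if len(model) != len(native):
        raise ValueError("structures must have the same number of atoms")
    a, b = centered(model), centered(native)
    squared = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(squared / len(a))

if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]
    print(rmsd_translation_only(model, native))
```

A full implementation would add a rotational fit, for example with the Kabsch algorithm, before taking the root mean square of the residual distances.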
Our simulation results show that I-TASSER can consistently predict the correct folds and sometimes high-resolution models for small single-domain proteins. Compared with other ab initio modeling methods such as ROSETTA and TOUCHSTONE II, the average performance of I-TASSER is either much better or similar in less computational time. These data, together with the significant performance of the automated I-TASSER server (the Zhang-Server) in the 'free modeling' section of the recent Critical Assessment of Structure Prediction (CASP)7 experiment, demonstrate new progress in automated ab initio model generation. The I-TASSER server is freely available for academic users.