Summary: Three-dimensional RNA structure prediction and folding is of significant interest in the biological research community. Here, we present iFoldRNA, a novel web-based methodology for RNA structure prediction with near atomic resolution accuracy and analysis of RNA folding thermodynamics. iFoldRNA rapidly explores RNA conformations using discrete molecular dynamics simulations of input RNA sequences. Starting from simplified linear-chain conformations, RNA molecules (<50 nt) fold to native-like structures within half an hour of simulation, facilitating rapid RNA structure prediction. All-atom reconstruction of energetically stable conformations generates iFoldRNA predicted RNA structures. The predicted RNA structures are within 2–5 Å root mean square deviations (RMSDs) from corresponding experimentally derived structures. RNA folding parameters including specific heat, contact maps, simulation trajectories, gyration radii, RMSDs from native state, and fraction of native-like contacts are accessible from iFoldRNA. We expect iFoldRNA will serve as a useful resource for RNA structure prediction and folding thermodynamic analyses.
Supplementary information: Supplementary data are available at Bioinformatics online.
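Two of the folding parameters named above, the RMSD from a reference structure and the radius of gyration, can be illustrated with a minimal sketch. It is not iFoldRNA code: the function names are hypothetical, coordinates are plain (x, y, z) tuples, all atoms are weighted equally, and the RMSD assumes the two structures are already superimposed (no optimal alignment is performed).

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumption: the structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def radius_of_gyration(coords):
    """Radius of gyration with every atom given the same weight (an assumption)."""
    if not coords:
        raise ValueError("coordinate list must be non-empty")
    n = len(coords)
    # Geometric centre of the structure.
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    # Mean squared distance from the centre.
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)
```

In practice the RMSD would be computed after a least-squares superposition of the predicted and experimental structures, which this sketch leaves out.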
Discrete molecular dynamics (DMD) is a rapid sampling method used in protein folding and aggregation studies. Until now, DMD was used to perform simulations of simplified protein models in conjunction with structure-based force fields. Here, we develop an all-atom protein model and a transferable force field featuring packing, solvation, and environment-dependent hydrogen bond interactions. Using the replica exchange method, we perform folding simulations of six small proteins (20–60 residues) with distinct native structures. In all cases, native or near-native states are reached in simulations. For three small proteins, multiple folding transitions are observed and the computationally-characterized thermodynamics are in quantitative agreement with experiments. The predictive power of all-atom DMD highlights the importance of environment-dependent hydrogen bond interactions in modeling protein folding. The developed approach can be used for accurate and rapid sampling of conformational spaces of proteins and protein-protein complexes, and applied to protein engineering and design of protein-protein interactions.
ab initio protein folding; environment-dependent hydrogen bond; replica exchange; free energy landscape; conformational sampling
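The replica exchange method mentioned above periodically attempts to swap configurations between simulations running at different temperatures. A minimal sketch of the standard Metropolis acceptance rule for such a swap follows. It is an illustration, not the authors' implementation: the function names are hypothetical, and the energy units (kcal/mol) and the Boltzmann constant value are assumptions.

```python
import math
import random

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption.
K_B = 0.0019872041


def swap_probability(e_i, t_i, e_j, t_j):
    """Metropolis probability of exchanging replicas i and j.

    e_i, e_j are potential energies; t_i, t_j are temperatures in kelvin.
    p = min(1, exp[(beta_i - beta_j) * (e_i - e_j)])
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swap(e_i, t_i, e_j, t_j, rng=random):
    """Return True if the exchange is accepted."""
    return rng.random() < swap_probability(e_i, t_i, e_j, t_j)
```

A swap that moves the lower-energy configuration to the colder replica is always accepted; the reverse move is accepted only with a Boltzmann-weighted probability, which preserves detailed balance across the temperature ladder.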
Based on the experimentally determined atomic coordinates for RNA helices and the self-avoiding walks of the P (phosphate) and C4 (carbon) atoms in the diamond lattice for the polynucleotide loop conformations, we derive a set of conformational entropy parameters for RNA pseudoknots. Based on the entropy parameters, we develop a folding thermodynamics model that enables us to compute the sequence-specific RNA pseudoknot folding free energy landscape and thermodynamics. The model is validated through extensive experimental tests both for the native structures and for the folding thermodynamics. The model predicts strong sequence-dependent helix-loop competitions in the pseudoknot stability and the resultant conformational switches between different hairpin and pseudoknot structures. For instance, for the pseudoknot domain of human telomerase RNA, a native-like and a misfolded hairpin intermediate are found to coexist on the (equilibrium) folding pathways, and the interplay between the stabilities of these intermediates causes the conformational switch that may underlie a human telomerase disease.
The structural information encoding specific conformations of natural RNAs can be implemented within artificial RNA sequences to control both three-dimensional (3D) shape and self-assembling interfaces for nanotechnology and synthetic biology applications. We have identified three natural RNA motifs known to direct helical topology into approximately 90° bends: a five-way tRNA junction, a three-way junction and a two-helix bend. These three motifs, embedded within rationally designed RNAs (tectoRNA), were chosen for generating square-shaped tetrameric RNA nanoparticles (NPs). The ability of each motif to direct the formation of supramolecular assemblies was compared by both native gel assays and atomic force microscopy (AFM). While there are multiple structural solutions for building square-shaped RNA particles, differences in the thermodynamics and molecular dynamics of the 90°-motif can lead to different biophysical behaviors for the resulting supramolecular complexes. We demonstrate via structural assembly programming how the different 90°-motifs can preferentially direct the formation of either 2D or 3D assemblies.
E Unus pluribum, or “Of One, Many”, may be at the root of decoding the RNA sequence-structure-function relationship. RNAs embody the large majority of genes in higher eukaryotes and fold in a sequence-directed fashion into three-dimensional structures that perform functions conserved across all cellular life forms, ranging from regulating to executing gene expression. While it is the most important determinant of RNA structure, the nucleotide sequence is generally not sufficient to specify a unique set of secondary and tertiary interactions due to the highly frustrated nature of RNA folding. This frustration results in folding heterogeneity, a common phenomenon wherein a chemically homogeneous population of RNA molecules folds into multiple stable structures. Often, these alternative conformations constitute misfolds, lacking the biological activity of the natively folded RNA. Intriguingly, a number of RNAs have recently been described as capable of adopting multiple distinct conformations that all perform, or contribute to, the same function. Characteristically, these conformations interconvert slowly on the experimental timescale, suggesting that they should be regarded as distinct native states. We discuss how rugged folding free energy landscapes give rise to multiple native states in the Tetrahymena Group I intron ribozyme, hairpin ribozyme, sarcin-ricin loop, ribosome, and an in vitro selected aptamer. We further describe the varying degrees to which folding heterogeneity impacts function in these RNAs, and compare and contrast this impact with that of heterogeneities found in protein folding. Embracing that one sequence can give rise to multiple native folds, we hypothesize that this phenomenon imparts adaptive advantages on any functionally evolving RNA quasispecies.
Using a combined master equation and kinetic cluster approach, we investigate RNA pseudoknot folding and unfolding kinetics. The energetic parameters are computed from a recently developed Vfold model for RNA secondary structure and pseudoknot folding thermodynamics. The folding kinetics theory is based on the complete conformational ensemble, including all the native-like and non-native states. The predicted folding and unfolding pathways, activation barriers, Arrhenius plots, and rate-limiting steps lead to several findings. First, for the PK5 pseudoknot, a misfolded 5′ hairpin emerges as a stable kinetic trap in the folding process, and the detrapping from this misfolded state is the rate-limiting step for the overall folding process. The calculated rate constant and activation barrier agree well with the experimental data. Second, as an application of the model, we investigate the kinetic folding pathways for the hTR (human Telomerase RNA) pseudoknot. The predicted folding and unfolding pathways not only support the proposed role of the conformational switch between hairpin and pseudoknot in hTR activity, but also reveal the molecular mechanism for the conformational switch. Furthermore, for an experimentally studied hTR mutation, whose hairpin intermediate is destabilized, the model predicts a long-lived transient hairpin structure, and the switch between the transient hairpin intermediate and the native pseudoknot may be responsible for the observed hTR activity. Such a finding would help resolve the apparent contradiction between the observed hTR activity and the absence of a stable hairpin.
Kinetics; RNA pseudoknot; Activation energy; Misfolded state; Telomerase
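The master equation approach above tracks how state populations change through first-order transitions, dp_i/dt = Σ_j (k_ji p_j − k_ij p_i). The sketch below integrates such an equation with a simple explicit Euler step. It is a toy illustration only: the function name, the explicit Euler scheme, and any rate constants passed in are assumptions, and it does not reproduce the kinetic cluster method or the Vfold energy parameters.

```python
def integrate_master_equation(rates, p0, dt, steps):
    """Propagate state populations under a first-order master equation.

    rates[i][j] is the rate constant for the transition i -> j
    (diagonal entries are ignored). p0 is the initial population vector.
    Explicit Euler integration; dt must be small compared with 1/rate.
    """
    n = len(p0)
    p = list(p0)
    for _ in range(steps):
        change = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i != j:
                    flow = rates[i][j] * p[i] * dt  # population leaving i for j
                    change[i] -= flow
                    change[j] += flow
        p = [pi + ci for pi, ci in zip(p, change)]
    return p
```

For example, with three hypothetical states (unfolded, misfolded hairpin trap, native) and a slow escape rate out of the trap, the population accumulates in the trap first and drains to the native state only on the detrapping timescale, which is the qualitative behaviour of a rate-limiting kinetic trap.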
RNA function is dependent on its structure, yet three-dimensional folds for most biologically important RNAs are unknown. We develop a generic discrete molecular dynamics (DMD)-based modeling system that uses long-range constraints inferred from diverse biochemical or bioinformatic analyses to create statistically significant (p < 0.01) native-like folds for RNAs of known structure ranging from 45 to 158 nucleotides. We then predict the unknown structure of the hepatitis C virus IRES pseudoknot domain. The resulting RNA model rationalizes independent solvent accessibility and cryo-electron microscopy structure information. The pseudoknot positions the AUG start codon near the mRNA channel and is tRNA-like, suggesting the IRES employs molecular mimicry as a functional strategy.
RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard.
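The efficient solution to the forward folding problem is dynamic programming over subsequences. As a minimal sketch, the code below uses the simple Nussinov-style recursion, which maximizes the number of nested base pairs, rather than the thermodynamic energy models used by production folding programs. The function name, the allowed pair set (Watson-Crick plus G-U wobble), and the minimum hairpin loop length of 3 are assumptions for illustration.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style recursion).

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    Runs in O(n^3) time and O(n^2) memory.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

The inverse problem has no comparable recursion: a designed sequence must be checked by folding it, which is why heuristic searches such as genetic algorithms are used.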
In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development were to address the hitherto mostly ignored extension of the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods and several data sets totalling 769 real and predicted single structure targets, and on 292 two structure targets. It performed as well as or better at finding sequences which folded in silico into the target structure than all existing methods, without the heavy bias towards CG base pairs that was observed for all other top performing methods. On the two structure targets it also performed well, generating a perfect design for about 80% of the targets.
Our method illustrates that successful designs for the inverse RNA folding problem do not necessarily have to rely on heavy biases in base pair and unpaired base distributions. The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at
RNA; Inverse folding; Genetic algorithm; Riboswitch
RNA folds to a myriad of three-dimensional structures and performs an equally diverse set of functions. The ability of RNA to fold and function in vivo is all the more remarkable because, in vitro, RNA has been shown to have a strong propensity to adopt misfolded, non-functional conformations. A principal factor underlying the dominance of RNA misfolding is that local RNA structure can be quite stable even in the absence of enforcing global tertiary structure. This property allows non-native structure to persist, and it also allows native structure to form and stabilize non-native contacts or non-native topology. In recent years it has become clear that one of the central reasons for the apparent disconnect between the capabilities of RNA in vivo and its in vitro folding properties is the presence of RNA chaperones, which facilitate conformational transitions of RNA and therefore mitigate the deleterious effects of RNA misfolding. Over the past two decades, it has been demonstrated that several classes of non-specific RNA binding proteins possess profound RNA chaperone activity in vitro and when overexpressed in vivo, and at least some of these proteins appear to function as chaperones in vivo. More recently, it has been shown that certain DExD/H-box proteins function as general chaperones to facilitate folding of group I and group II introns. These proteins are RNA-dependent ATPases and have RNA helicase activity, and are proposed to function by using energy from ATP binding and hydrolysis to disrupt RNA structure and/or to displace proteins from RNA-protein complexes. This review outlines experimental studies that have led to our current understanding of the range of misfolded RNA structures, the physical origins of RNA misfolding, and the functions and mechanisms of putative RNA chaperone proteins.
RNA folding; DEAD-box Proteins; Cold-Shock Proteins; tRNA; group I intron; Review
Optimal exploitation of the expanding database of sequences requires rapid finding and folding of RNAs. Methods are reviewed that automate folding and discovery of RNAs with algorithms that couple thermodynamics with chemical mapping, NMR, and/or sequence comparison. New functional noncoding RNAs in genome sequences can be found by combining sequence comparison with the assumption that functional noncoding RNAs will have more favorable folding free energies than other RNAs. When a new RNA is discovered, experiments and sequence comparison can restrict folding space so that secondary structure can be rapidly determined with the help of predicted free energies. In turn, secondary structure restricts folding in three dimensions, which allows modeling of three-dimensional structure. An example from a domain of a retrotransposon is described. Discovery of new RNAs and their structures will provide insights into evolution, biology, and design of therapeutics. Applications to studies of evolution are also reviewed.
Combining sequence comparison and thermodynamic considerations with experimental approaches such as chemical mapping and NMR allows rapid modeling of RNA secondary structure.
The folding of the B-domain of staphylococcal protein A has been studied by coarse-grained canonical and multiplexed replica-exchange molecular dynamics simulations with the UNRES force field in a broad range of temperatures (270K ≤ T ≤ 350K). In canonical simulations, the folding was found to occur either directly to the native state or through kinetic traps, mainly the topological mirror image of the native three-helix bundle. The latter folding scenario was observed more frequently at low temperatures. With increase of temperature, the frequency of the transitions between the folded and misfolded/unfolded states increased and the folded state became more diffuse with conformations exhibiting increased root-mean-square deviations from the experimental structure (from about 4 Å at T = 300K to 8.7 Å at T = 325K). An analysis of the equilibrium conformational ensemble determined from multiplexed replica exchange simulations at the folding-transition temperature (Tf = 325K) showed that the conformational ensemble at this temperature is a collection of conformations with residual secondary structures, which possess native or near-native clusters of nonpolar residues in place, and not a 50%-50% mixture of fully-folded and fully-unfolded conformations. These findings contradict the quasi-chemical picture of two- or multi-state protein folding, which assumes an equilibrium between the folded, unfolded, and intermediate states, with equilibrium shifting with temperature but with the native conformations remaining essentially unchanged. Our results also suggest that long-range hydrophobic contacts are the essential factor to keep the structure of a protein thermally stable.
protein folding; folding/unfolding transition; coarse-grained dynamics; conformational ensemble
According to the “thermodynamic hypothesis”, the sequence of a biological macromolecule defines its folded, active structure as a global energy minimum on the folding landscape.1,2 But the enormous complexity of folding landscapes of large macromolecules raises a question: Is there indeed a unique global energy minimum corresponding to a unique native conformation, or are there deep local minima corresponding to alternative active conformations?3 Folding of many proteins is well described by two-state models, leading to highly simplified representations of protein folding landscapes with a single native conformation.4,5 Nevertheless, accumulating experimental evidence suggests a more complex topology of folding landscapes with multiple active conformations that can take seconds or longer to interconvert.6,7,8 Here we employ single molecule experiments to demonstrate that an RNA enzyme folds into multiple distinct native states that interconvert much slower than the time scale of catalysis. These data demonstrate that the severe ruggedness of RNA folding landscapes extends into conformational space occupied by native conformations.
Many non-coding RNAs fold into complex three-dimensional structures, yet the self-assembly of RNA structure is hampered by mispairing, weak tertiary interactions, electrostatic barriers, and the frequent requirement that the 5′ and 3′ ends of the transcript interact. This rugged free energy landscape for RNA folding means that some RNA molecules in a population rapidly form their native structure, while many others become kinetically trapped in misfolded conformations. Transient binding of RNA chaperone proteins destabilizes misfolded intermediates and lowers the transition states between conformations, producing a smoother landscape that increases the rate of folding and the probability that a molecule will find the native structure. DEAD-box proteins couple the chemical potential of ATP hydrolysis with repetitive cycles of RNA binding and release, expanding the range of conditions under which they can refold RNA structures.
RNA folding; kinetic partitioning; ribozyme; RNA chaperone; DEAD-box
RNAs must fold into unique three-dimensional structures to function in the cell, but how each polynucleotide finds its native structure is not understood. To investigate whether the stability of the tertiary structure determines the speed and accuracy of RNA folding, docking of a tetraloop with its receptor in a bacterial group I ribozyme was perturbed by site-directed mutagenesis. Disruption of the tetraloop or its receptor destabilizes tertiary interactions throughout the ribozyme by 2–3 kcal/mol, demonstrating that tertiary interactions form cooperatively in the transition from a native-like intermediate to the native state. Nondenaturing PAGE and RNase T1 digestion showed that base pairs form less homogeneously in the mutant RNAs during the transition from the unfolded state to the intermediate. Thus, tertiary interactions between helices bias the ensemble of secondary structures toward native-like conformations. Time-resolved hydroxyl radical footprinting showed that the wild type ribozyme folds completely within 5–20 ms. By contrast, only 40–60% of a tetraloop mutant ribozyme folds in 30–40 ms, with the remainder folding in 30–200 s via non-native intermediates. Therefore, destabilization of tetraloop-receptor docking introduces an alternate folding pathway in the otherwise smooth energy landscape of the wild type ribozyme. Our results show that stable tertiary structure increases the flux through folding pathways that lead directly and rapidly to the native structure.
ribozyme; group I intron; footprinting; tetraloop; cooperative folding
The diverse landscape of RNA conformational space includes many canyons and crevices that are distant from the lowest minimum free energy valley and remain unexplored by traditional RNA structure prediction methods. A complete description of the entire RNA folding landscape can facilitate identification of biologically important conformations. The Crumple algorithm rapidly enumerates all possible non-pseudoknotted structures for an RNA sequence without consideration of thermodynamics while filtering the output with experimental data. The Crumple algorithm provides an alternative approach to traditional free energy minimization programs for RNA secondary structure prediction. A complete computation of all non-pseudoknotted secondary structures can reveal structures that would not be predicted by methods that sample the RNA folding landscape based on thermodynamic predictions. The free energy minimization approach is often successful but is limited by not considering RNA tertiary and protein interactions and the possibility that kinetics rather than thermodynamics determines the functional RNA fold. Efficient parallel computing and filters based on experimental data make practical the complete enumeration of all non-pseudoknotted structures. Efficient parallel computing for Crumple is implemented in a ring graph approach. Filters for experimental data include constraints from chemical probing of solvent accessibility, enzymatic cleavage of paired or unpaired nucleotides, phylogenetic covariation, and the minimum number and lengths of helices determined from crystallography or cryo-electron microscopy. The minimum number and length of helices has a significant effect on reducing conformational space. Pairing constraints reduce conformational space more than single nucleotide constraints. 
Examples with Alfalfa Mosaic Virus RNA and Trypanosoma brucei guide RNA demonstrate the importance of evaluating all possible structures when pseudoknots, RNA-protein interactions, and metastable structures are important for biological function. Crumple software is freely available at http://adenosine.chem.ou.edu/software.html.
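Complete enumeration of non-pseudoknotted structures, without any thermodynamic scoring, can be sketched with a short recursion on the first nucleotide of each subsequence: it is either unpaired or paired with a later partner, which splits the rest into an enclosed and a trailing segment. This is not the Crumple implementation: the function name, the allowed pair set, the minimum hairpin loop of 3, and the dot-bracket output are assumptions, and the experimental-data filters and parallel ring graph scheme described above are omitted.

```python
from functools import lru_cache

# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def enumerate_structures(seq, min_loop=3):
    """Return every nested (non-pseudoknotted) structure of seq in dot-bracket form.

    The number of structures grows exponentially with length, so this is
    only practical for short sequences.
    """

    @lru_cache(maxsize=None)
    def build(i, j):
        # All structures for the half-open segment seq[i:j].
        if j <= i:
            return ("",)
        results = []
        # Case 1: nucleotide i is unpaired.
        for rest in build(i + 1, j):
            results.append("." + rest)
        # Case 2: nucleotide i pairs with k, enclosing seq[i+1:k].
        for k in range(i + min_loop + 1, j):
            if seq[i] + seq[k] in CANONICAL:
                for inner in build(i + 1, k):
                    for outer in build(k + 1, j):
                        results.append("(" + inner + ")" + outer)
        return tuple(results)

    return list(build(0, len(seq)))
```

Constraints from chemical probing or covariation would be applied as filters on these candidates, for instance by discarding any structure that pairs a nucleotide known to be single stranded.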
Difficult problems in structural bioinformatics are often studied in simple exact models to gain insights and to derive general principles. Protein folding, for example, has long been studied in the lattice model. Recently, researchers have also begun to apply the lattice model to the study of RNA folding.
We present a novel method for predicting RNA secondary structures with pseudoknots: first simulate the folding dynamics of the RNA sequence on the 3D triangular lattice, next extract and select a set of disjoint base pairs from the best lattice conformation found by the folding simulation. Experiments on sequences from PseudoBase show that our prediction method outperforms the HotKnot algorithm of Ren, Rastegari, Condon and Hoos, a leading method for RNA pseudoknot prediction. Our method for RNA secondary structure prediction can be adapted into an efficient reconstruction method that, given an RNA sequence and an associated secondary structure, finds a conformation of the sequence on the 3D triangular lattice that realizes the base pairs in the secondary structure. We implemented a suite of computer programs for the simulation and visualization of RNA folding on the 3D triangular lattice. These programs come with detailed documentation and are accessible from the companion website of this paper at http://www.cs.usu.edu/~mjiang/rna/DeltaIS/.
Folding simulation on the 3D triangular lattice is an effective method for RNA secondary structure prediction and lattice conformation reconstruction. The visualization software for the lattice conformations of RNA structures is a valuable tool for the study of RNA folding and is a great pedagogic device.
Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFold's potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.
Not only is the prediction of evolutionarily conserved RNA structures important for elucidating the potential functions of RNA sequences and the mechanisms by which these functions are exerted, but it also lies at the core of RNA gene prediction. To get an accurate prediction of the conserved RNA structure, we need a high-quality sequence alignment and an evolutionary tree relating several evolutionarily related sequences. These are two strong requirements that are typically difficult to fulfill unless the encoded RNA structure is already known. We present what is to our knowledge the first method that solves this chicken-and-egg problem by co-estimating all three quantities simultaneously. We show that our novel method, called SimulFold, can be successfully applied over a wide range of sequence similarities to detect conserved RNA structures, including those with pseudoknots. We also show its potential as an alignment and phylogeny prediction method. Our method overcomes several significant limitations of existing methods and has the potential to be used for a very diverse range of tasks.
One of the key issues in the theoretical prediction of RNA folding is the prediction of loop structure from the sequence. RNA loop free energies are dependent on the loop sequence content. However, most current models account only for the loop length-dependence. The previously developed “Vfold” model (a coarse-grained RNA folding model) provides an effective method to generate the complete ensemble of coarse-grained RNA loop and junction conformations. However, due to the lack of sequence-dependent scoring parameters, the method is unable to identify the native and near-native structures from the sequence. In this study, using a previously developed iterative method for extracting the knowledge-based potential parameters from the known structures, we derive a set of dinucleotide-based statistical potentials for RNA loops and junctions. A unique advantage of the approach is its ability to go beyond the (known) native structures by accounting for the full free energy landscape, including all the nonnative folds. The benchmark tests indicate that for given loop/junction sequences, the statistical potentials enable successful predictions for the coarse-grained 3D structures from the complete conformational ensemble generated by the Vfold model. The predicted coarse-grained structures can provide useful initial folds for further detailed structural refinement.
The discovery that RNA molecules can fold into complex structures and carry out diverse cellular roles has led to interest in developing tools for modeling RNA tertiary structure. While significant progress has been made in establishing that the RNA backbone is rotameric, few libraries of discrete conformations specifically for use in RNA modeling have been validated. Here, we present six libraries of discrete RNA conformations based on a simplified pseudo-torsional notation of the RNA backbone, comparable to phi and psi in the protein backbone. We evaluate the ability of each library to represent single nucleotide backbone conformations and we show how individual library fragments can be assembled into dinucleotides that are consistent with established RNA backbone descriptors spanning from sugar to sugar. We then use each library to build all-atom models of 20 test folds and we show how the composition of a fragment library can limit model quality. Despite the limitations inherent in using discretized libraries, we find that several hundred discrete fragments can rebuild RNA folds up to 174 nucleotides in length with atomic-level accuracy (<1.5Å RMSD). We anticipate the libraries presented here could easily be incorporated into RNA structural modeling, analysis, or refinement tools.
RNA structure; RNA backbone conformation; RNA fragment library; RNA modeling
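A pseudo-torsion is an ordinary dihedral angle measured over four non-bonded backbone atoms. The sketch below computes a signed dihedral from four points and applies it to a pair of pseudo-torsions. The function names are hypothetical, and the atom definitions used here (η from C4′(i−1), P(i), C4′(i), P(i+1); θ from P(i), C4′(i), P(i+1), C4′(i+1)) are the commonly used convention and an assumption of this example, since the abstract does not spell them out.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)  # central bond, the rotation axis
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def pseudo_torsions(c4_prev, p_i, c4_i, p_next, c4_next):
    """Return (eta, theta) for nucleotide i under the assumed atom definitions."""
    eta = dihedral(c4_prev, p_i, c4_i, p_next)
    theta = dihedral(p_i, c4_i, p_next, c4_next)
    return eta, theta
```

Binning the resulting (η, θ) pairs into a small number of discrete conformers is the kind of step a fragment library would build on.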
A new modeling technique for arriving at the three dimensional (3-D) structure of an RNA stem-loop has been developed based on a conformational search by a genetic algorithm and the following refinement by energy minimization. The genetic algorithm simultaneously optimizes a population of conformations in the predefined conformational space and generates 3-D models of RNA. The fitness function to be optimized by the algorithm has been defined to reflect the satisfaction of known conformational constraints. In addition to a term for distance constraints, the fitness function contains a term to constrain each local conformation near to a prepared template conformation. The technique has been applied to the two loops of tRNA, the anticodon loop and the T-loop, and has found good models with small root mean square deviations from the crystal structure. Slightly different models have also been found for the anticodon loop. The analysis of a collection of alternative models obtained has revealed statistical features of local variations at each base position.
Since experimental determination of protein folding pathways remains difficult, computational techniques are often used to simulate protein folding. Most current techniques to predict protein folding pathways are computationally intensive and are suitable only for small proteins.
By assuming that the native structure of a protein is known and representing each intermediate conformation as a collection of fully folded structures in which each of them contains a set of interacting secondary structure elements, we show that it is possible to significantly reduce the conformation space while still being able to predict the most energetically favorable folding pathway of large proteins with hundreds of residues at the mesoscopic level, including the pig muscle phosphoglycerate kinase with 416 residues. The model is detailed enough to distinguish between different folding pathways of structurally very similar proteins, including the streptococcal protein G and the peptostreptococcal protein L. The model is also able to recognize the differences between the folding pathways of protein G and its two structurally similar variants NuG1 and NuG2, which are even harder to distinguish. We show that this strategy can produce accurate predictions on many other proteins with experimentally determined intermediate folding states.
Our technique is efficient enough to predict folding pathways for both large and small proteins at the mesoscopic level. Such a strategy is often the only feasible choice for large proteins. A software program implementing this strategy (SSFold) is available at .
The biopolymer chain elasticity (BCE) approach and the new molecular modelling methodology presented previously are used to predict the three-dimensional backbones of DNA and RNA hairpin loops. The structures of eight remarkably stable DNA or RNA hairpin molecules closed by a mispair, recently determined in solution by NMR and deposited in the PDB, are shown to verify the predicted trajectories by an analysis automated for large numbers of PDB conformations. They encompass: one DNA tetraloop, -GTTA-; three DNA triloops, -AAA- or -GCA-; and four RNA tetraloops, -UUCG-. Folding generates no distortions and bond lengths and bond angles of main atoms of the sugar–phosphate backbone are well restored upon energy refinement. Three different methods (superpositions, distance of main chain atoms to the elastic line and RMSd) are used to show a very good agreement between the trajectories of sugar–phosphate backbones and between entire molecules of theoretical models and of PDB conformations. The geometry of end conditions imposed by the stem is sufficient to dictate the different characteristic DNA or RNA folding shapes. The reduced angular space, consisting of the new parameter, angle Ω, together with the χ angle offers a simple, coherent and quantitative description of hairpin loops.
Experimental structure determination cannot keep pace with the steadily growing number of RNA sequences and newly discovered functions. This underscores the need for an accurate model for RNA three-dimensional (3D) structure prediction. Although considerable progress has been made in mechanistic studies, accurate prediction of RNA tertiary folding from sequence remains an unsolved problem. The first and most important requirement for the prediction of RNA structure from physical principles is an accurate free energy model. A recently developed three-vector virtual bond-based RNA folding model (“Vfold”) has allowed us to compute the chain entropy and predict folding free energies and structures for RNA secondary structures and simple pseudoknots. Here we develop a free energy-based method to predict larger, more complex RNA tertiary folds. The approach is based on a multiscaling strategy: from the nucleotide sequence, we predict the two-dimensional (2D) structures (defined by the base pairs and tertiary contacts); based on the 2D structure, we construct a 3D scaffold; with the 3D scaffold as the initial state, we combine AMBER energy minimization and PDB-based fragment search to predict the all-atom structure. A key advantage of the approach is the statistical mechanical calculation of the conformational entropy of RNA structures, including those with cross-linked loops. Benchmark tests show that the model leads to significant improvements in RNA 3D structure prediction.
Energy landscape; RNA folding; Structural prediction; Tertiary structure
Computational methods for determining the secondary structure of RNA sequences from given alignments are currently based on thermodynamic folding, compensatory base pair substitutions, or both. However, there is currently no approach that combines both sources of information in a single optimization problem. Here, we present a model that formally integrates the energy-based and evolution-based approaches to predict the folding of multiple aligned RNA sequences. We have implemented an extended version of Pfold that identifies base pairs that have high probabilities of being conserved and of being energetically favorable. The consensus structure is predicted using a maximum expected accuracy scoring scheme to smooth out the effect of incorrectly predicted base pairs. Parameter tuning revealed that the probability of base pairing has a higher impact on the RNA structure prediction than the corresponding probability of being single stranded. Furthermore, we found that structurally conserved RNA motifs are mostly supported by folding energies. Other problems (e.g. RNA-folding kinetics) may also benefit from employing the principles of the model we introduce. Our implementation, PETfold, was tested on a set of 46 well-curated Rfam families, and its performance compared favorably to that of Pfold and RNAalifold.
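The idea of maximum expected accuracy scoring can be sketched with a generic Nussinov-style dynamic program. This is an illustrative sketch only, not PETfold's actual scoring function: the function name `mea_structure`, the weight `gamma` on paired positions, the `min_loop` hairpin constraint, and the input layout (0-based `(i, j)` keys with `i < j`) are all assumptions introduced for this example.

```python
def mea_structure(pair_prob, unpaired_prob, gamma=2.0, min_loop=3):
    """Pick the nested structure maximizing a weighted expected accuracy.

    pair_prob: dict mapping (i, j), i < j, to a base-pair probability.
    unpaired_prob: list giving each position's single-stranded probability.
    gamma: hypothetical weight favoring paired over unpaired positions.
    min_loop: minimum number of unpaired bases enclosed by a pair.
    Returns (score, dot-bracket string).
    """
    n = len(unpaired_prob)
    if n == 0:
        return 0.0, ""
    # score[i][j]: best score for positions i..j; choice[i][j]: partner of j.
    score = [[0.0] * n for _ in range(n)]
    choice = [[None] * n for _ in range(n)]
    for i in range(n):
        score[i][i] = unpaired_prob[i]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Option 1: position j stays unpaired.
            best = score[i][j - 1] + unpaired_prob[j]
            best_k = None
            # Option 2: position j pairs with some k, splitting the interval.
            for k in range(i, j - min_loop):
                p = pair_prob.get((k, j), 0.0)
                if p <= 0.0:
                    continue
                left = score[i][k - 1] if k > i else 0.0
                cand = left + score[k + 1][j - 1] + gamma * p
                if cand > best:
                    best, best_k = cand, k
            score[i][j] = best
            choice[i][j] = best_k
    # Trace back the recorded choices to build the dot-bracket string.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        k = choice[i][j]
        if k is None:
            stack.append((i, j - 1))
        else:
            structure[k], structure[j] = "(", ")"
            stack.append((i, k - 1))
            stack.append((k + 1, j - 1))
    return score[0][n - 1], "".join(structure)
```

Setting `gamma` above 1 mirrors, in a loose way, the observation that pairing probabilities matter more than single-stranded probabilities; the specific value is arbitrary here. In a combined approach, `pair_prob` could hold probabilities that merge evolutionary and thermodynamic evidence, but how those are merged is outside this sketch.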
Accurately modeling unpaired regions of RNA is important for predicting structure, dynamics, and thermodynamics of folded RNA. Comparisons between NMR data and molecular dynamics simulations provide a test of force fields used for modeling. Here, NMR spectroscopy, including NOESY, 1H–31P HETCOR, DQF-COSY, and TOCSY, was used to determine conformational preferences for single-stranded GACC RNA. The spectra are consistent with a conformational ensemble containing major and minor A-form-like structures. In a series of 50 ns molecular dynamics (MD) simulations with the AMBER99 force field in explicit solvent, initial A-form-like structures rapidly evolve to disordered conformations. A set of 50 ns simulations with revised χ torsions (AMBER99χ force field) gives two primary conformations, consistent with the NMR spectra. A single 1.9 μs MD simulation with the AMBER99χ force field showed that the major and minor conformations are retained for almost 68% of the time in the first 700 ns, with multiple transformations from A-form to non-A-form conformations. For the rest of the simulation, random-coil structures and a stable non-A-form conformation inconsistent with NMR spectra were seen. Evidently, the AMBER99χ force field improves structural predictions for single-stranded GACC RNA compared to the AMBER99 force field, but further force field improvements are needed.
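An occupancy figure such as the fraction of time spent in the major and minor conformations can be obtained by counting labeled trajectory frames within a time window. The sketch below assumes evenly spaced frames that have already been assigned a conformation label; the function name `occupancy`, the label strings, and the `(time_ns, label)` layout are hypothetical and do not describe the analysis used in the study.

```python
def occupancy(frames, labels, t_start=0.0, t_end=float("inf")):
    """Fraction of frames in [t_start, t_end) whose label is in `labels`.

    frames: iterable of (time_ns, label) tuples, assumed evenly spaced.
    labels: set of conformation labels counted as a hit.
    """
    in_window = [lab for t, lab in frames if t_start <= t < t_end]
    if not in_window:
        raise ValueError("no frames fall inside the requested time window")
    hits = sum(1 for lab in in_window if lab in labels)
    return hits / len(in_window)
```

With evenly spaced frames, the frame fraction equals the time fraction; unevenly spaced frames would need each frame weighted by its time step.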