RNA inverse folding is a computational technique for designing RNA sequences that fold into a user-specified secondary structure. Although pseudoknots are functionally important motifs in RNA structures, fewer studies have addressed the inverse folding of pseudoknotted RNAs than pseudoknot-free RNA design. In this paper, we present a new version of our multi-objective genetic algorithm (MOGA), MODENA, which we previously proposed for pseudoknot-free RNA inverse folding. In the new version of MODENA, (i) a new crossover operator is implemented and (ii) pseudoknot prediction methods, IPknot and HotKnots, are used to evaluate the designed RNA sequences, allowing us to perform the inverse folding of pseudoknotted RNAs. The new version of MODENA with the new crossover operator was benchmarked on a dataset composed of natural pseudoknotted RNA secondary structures, and we found that MODENA can successfully design more pseudoknotted RNAs than the other pseudoknot design algorithm. In addition, a sequence constraint function newly implemented in the new version of MODENA was tested by designing RNA sequences that fold into the pseudoknotted structure of a hepatitis delta virus ribozyme; as a result, we successfully designed eight RNA sequences. The new version of MODENA is downloadable from http://rna.eit.hirosaki-u.ac.jp/modena/.
inverse folding; pseudoknot; secondary structure; pseudobase; Rfam; sequence constraint
RNA exhibits a variety of structural configurations. Here we consider a structure to be tantamount to the noncrossing Watson-Crick and G-U-base pairings (secondary structure) and additional cross-serial base pairs. These interactions are called pseudoknots and are observed across the whole spectrum of RNA functionalities. In the context of studying natural RNA structures, searching for new ribozymes and designing artificial RNA, it is of interest to find RNA sequences folding into a specific structure and to analyze their induced neutral networks. Since the established inverse folding algorithms, RNAinverse, RNA-SSD as well as INFO-RNA are limited to RNA secondary structures, we present in this paper the inverse folding algorithm Inv which can deal with 3-noncrossing, canonical pseudoknot structures.
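The distinction drawn above between noncrossing (secondary structure) pairs and cross-serial pairs can be made concrete with a short sketch. It assumes the target is written in extended dot-bracket notation, with `()` for one set of nested pairs and `[]` for a second set; this notation and the function names `parse_pairs` and `crossing_pairs` are illustrative choices, not part of Inv.

```python
def parse_pairs(structure):
    """Return sorted 0-based base pairs (i, j) from extended dot-bracket notation."""
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for j, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(j)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {j}")
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def crossing_pairs(pairs):
    """Return every pair of base pairs (i, j), (k, l) with i < k < j < l."""
    crossings = []
    for a, (i, j) in enumerate(pairs):
        # pairs is sorted, so every later pair starts at k > i
        for k, l in pairs[a + 1:]:
            if k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


# A small H-type pseudoknot: two stems of two base pairs each
knot = parse_pairs("((..[[..))..]]")
print(len(crossing_pairs(knot)))                   # number of crossing arc pairs
print(crossing_pairs(parse_pairs("((..))")))       # a pseudoknot-free hairpin
```

A structure with an empty crossing list is an ordinary secondary structure; any non-empty list indicates a pseudoknot.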
In this paper we present the inverse folding algorithm Inv. We give a detailed analysis of Inv, including pseudocode. We show that Inv allows the design of, in particular, 3-noncrossing nonplanar RNA pseudoknot structures, a class which is difficult to construct via dynamic programming routines. Inv is freely available at http://www.combinatorics.cn/cbpc/inv.html.
The algorithm Inv extends inverse folding capabilities to RNA pseudoknot structures. In comparison with RNAinverse it uses new ideas, for instance by considering sets of competing structures. As a result, Inv is not only able to find novel sequences even for RNA secondary structures, it does so in the context of competing structures that potentially exhibit cross-serial interactions.
RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard.
In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development were to address the hitherto mostly ignored extension of the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods on several data sets totalling 769 real and predicted single structure targets, and on 292 two structure targets. It performed as well as or better than all existing methods at finding sequences which folded in silico into the target structure, without the heavy bias towards CG base pairs that was observed for all other top performing methods. On the two structure targets it also performed well, generating a perfect design for about 80% of the targets.
Our method illustrates that successful designs for the inverse RNA folding problem do not necessarily have to rely on heavy biases in base pair and unpaired base distributions. The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at
RNA; Inverse folding; Genetic algorithm; Riboswitch
Optimal exploitation of the expanding database of sequences requires rapid finding and folding of RNAs. Methods are reviewed that automate folding and discovery of RNAs with algorithms that couple thermodynamics with chemical mapping, NMR, and/or sequence comparison. New functional noncoding RNAs in genome sequences can be found by combining sequence comparison with the assumption that functional noncoding RNAs will have more favorable folding free energies than other RNAs. When a new RNA is discovered, experiments and sequence comparison can restrict folding space so that secondary structure can be rapidly determined with the help of predicted free energies. In turn, secondary structure restricts folding in three dimensions, which allows modeling of three-dimensional structure. An example from a domain of a retrotransposon is described. Discovery of new RNAs and their structures will provide insights into evolution, biology, and design of therapeutics. Applications to studies of evolution are also reviewed.
Combining sequence comparison and thermodynamic considerations with experimental approaches such as chemical mapping and NMR allows rapid modeling of RNA secondary structure.
A program for overlaying multiple flexible molecules has been developed. Candidate overlays are generated by a novel fingerprint algorithm, scored on three objective functions (union volume, hydrogen-bond match, and hydrophobic match), and ranked by constrained Pareto ranking. A diverse subset of the best ranked solutions is chosen using an overlay-dissimilarity metric. If necessary, the solutions can be optimised. A multi-objective genetic algorithm can be used to find additional overlays with a given mapping of chemical features but different ligand conformations. The fingerprint algorithm may also be used to produce constrained overlays, in which user-specified chemical groups are forced to be superimposed. The program has been tested on several sets of ligands, for each of which the true overlay is known from protein–ligand crystal structures. Both objective and subjective success criteria indicate that good results are obtained on the majority of these sets.
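As an illustration of ranking candidates on several objectives at once, the following is a minimal sketch of plain Pareto ranking by repeated removal of non-dominated points. It is not the constrained Pareto ranking used by the program, and it assumes every objective is to be minimised (an objective to be maximised, such as a match score, would be negated first). The function names and the example scores are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_rank(points):
    """Rank 1 = non-dominated points; rank 2 = non-dominated once rank 1 is removed; etc."""
    remaining = set(range(len(points)))
    ranks = [0] * len(points)
    rank = 1
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


# Hypothetical two-objective scores for five candidate solutions
scores = [(1, 5), (2, 2), (5, 1), (3, 3), (6, 6)]
print(pareto_rank(scores))
```

Candidates sharing rank 1 are mutually incomparable trade-offs, which is why a further diversity criterion is needed to choose among them.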
Electronic supplementary material
The online version of this article (doi:10.1007/s10822-012-9573-y) contains supplementary material, which is available to authorized users.
Alignment; Overlay; Pharmacophore
The Sfold web server provides user-friendly access to Sfold, a recently developed nucleic acid folding software package, via the World Wide Web (WWW). The software is based on a new statistical sampling paradigm for the prediction of RNA secondary structure. One of the main objectives of this software is to offer computational tools for the rational design of RNA-targeting nucleic acids, which include small interfering RNAs (siRNAs), antisense oligonucleotides and trans-cleaving ribozymes for gene knock-down studies. The methodology for siRNA design is based on a combination of RNA target accessibility prediction, siRNA duplex thermodynamic properties and empirical design rules. Our approach to target accessibility evaluation is an original extension of the underlying RNA folding algorithm to account for the likely existence of a population of structures for the target mRNA. In addition to the application modules Sirna, Soligo and Sribo for siRNAs, antisense oligos and ribozymes, respectively, the module Srna offers comprehensive features for statistical representation of sampled structures. Detailed output in both graphical and text formats is available for all modules. The Sfold server is available at http://sfold.wadsworth.org and http://www.bioinfo.rpi.edu/applications/sfold.
The development of algorithms for designing artificial RNA sequences that fold into specific secondary structures has many potential biomedical and synthetic biology applications. To date, this problem remains computationally difficult, and current strategies to address it resort to heuristics and stochastic search techniques. The most popular methods consist of two steps: First a random seed sequence is generated; next, this seed is progressively modified (i.e. mutated) to adopt the desired folding properties. Although computationally inexpensive, this approach raises several questions such as (i) the influence of the seed; and (ii) the efficiency of single-path directed searches that may be affected by energy barriers in the mutational landscape. In this article, we present RNA-ensign, a novel paradigm for RNA design. Instead of taking a progressive adaptive walk driven by local search criteria, we use an efficient global sampling algorithm to examine large regions of the mutational landscape under structural and thermodynamical constraints until a solution is found. When considering the influence of the seeds and the target secondary structures, our results show that, compared to single-path directed searches, our approach is more robust, succeeds more often and generates more thermodynamically stable sequences. An ensemble approach to RNA design is thus well worth pursuing as a complement to existing approaches. RNA-ensign is available at http://csb.cs.mcgill.ca/RNAensign.
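The two-step scheme described above (random seed, then progressive mutation) can be sketched as follows. This is only an illustration of the seed and mutation steps used by the popular methods, not of RNA-ensign's global sampling; it assumes a pseudoknot-free target in dot-bracket notation, and the helper names `pair_list`, `random_seed` and `mutate` are hypothetical. A real design loop would additionally call a folding program to score each candidate.

```python
import random

# Canonical Watson-Crick and G-U wobble pairs
CANONICAL_PAIRS = [("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")]


def pair_list(structure):
    """Return base pairs (i, j) of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def random_seed(structure, rng):
    """Draw a random sequence whose paired positions can form canonical pairs."""
    seq = [rng.choice("ACGU") for _ in structure]
    for i, j in pair_list(structure):
        seq[i], seq[j] = rng.choice(CANONICAL_PAIRS)
    return "".join(seq)


def mutate(seq, structure, rng):
    """Redraw one unpaired base, or both bases of one pair, keeping compatibility."""
    seq = list(seq)
    partner = {}
    for i, j in pair_list(structure):
        partner[i], partner[j] = j, i
    pos = rng.randrange(len(seq))
    if pos in partner:
        i, j = sorted((pos, partner[pos]))
        seq[i], seq[j] = rng.choice(CANONICAL_PAIRS)
    else:
        seq[pos] = rng.choice("ACGU")
    return "".join(seq)


rng = random.Random(0)
target = "((..((...))..))"
seed = random_seed(target, rng)
print(seed, mutate(seed, target, rng))
```

Because each run starts from a different seed and follows a single mutation path, the outcome depends on both, which is the sensitivity the abstract refers to.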
The protein structure prediction (PSP) problem is concerned with the prediction of the folded, native, tertiary structure of a protein given its sequence of amino acids. It is a challenging and computationally open problem, as proven by the numerous methodological attempts and the research effort applied to it in the last few years. The potential energy functions used in the literature to evaluate the conformation of a protein are based on the calculations of two different interaction energies: local (bond atoms) and non-local (non-bond atoms). In this paper, we show experimentally that those types of interactions are in conflict, and do so by using the potential energy function Chemistry at HARvard Macromolecular Mechanics. A multi-objective formulation of the PSP problem is introduced and its applicability studied. We use a multi-objective evolutionary algorithm as a search procedure for exploring the conformational space of the PSP problem.
multi-objective optimization; Pareto front; protein folding; protein structure prediction; multi-objective evolutionary algorithms
In ribonucleic acid (RNA) molecules whose function depends on their final, folded three-dimensional shape (such as those in ribosomes or spliceosome complexes), the secondary structure, defined by the set of internal basepair interactions, is more consistently conserved than the primary structure, defined by the sequence of nucleotides.
The research presented here investigates the possibility of applying a progressive, pairwise approach to the alignment of multiple RNA sequences by simultaneously predicting an energy-optimized consensus secondary structure. We take an existing algorithm for finding the secondary structure common to two RNA sequences, Dynalign, and alter it to align profiles of multiple sequences. We then explore the relative successes of different approaches to designing the tree that will guide progressive alignments of sequence profiles to create a multiple alignment and prediction of conserved structure.
We have found that applying a progressive, pairwise approach to the alignment of multiple ribonucleic acid sequences produces highly reliable predictions of conserved basepairs, and we have shown how these predictions can be used as constraints to improve the results of a single-sequence structure prediction algorithm. However, we have also discovered that the amount of detail included in a consensus structure prediction is highly dependent on the order in which sequences are added to the alignment (the guide tree), and that if a consensus structure does not have sufficient detail, it is less likely to provide useful constraints for the single-sequence method.
RNA molecules are important cellular components involved in many fundamental biological processes. Understanding the mechanisms behind their functions requires RNA tertiary structure knowledge. While modeling approaches for the study of RNA structures and dynamics lag behind efforts in protein folding, much progress has been achieved in the past two years. Here, we review recent advances in RNA folding algorithms, RNA tertiary motif discovery, applications of graph theory approaches to RNA structure and function, and in silico generation of RNA sequence pools for aptamer design. Advances within each area can be combined to impact many problems in RNA structure and function.
RNA folding; RNA tertiary motifs; RNA graphs; in vitro selection
RNA molecules fold into characteristic secondary structures for their diverse functional activities such as post-translational regulation of gene expression. Searching homologs of a pre-defined RNA structural motif, which may be a known functional element or a putative RNA structural motif, can provide useful information for deciphering RNA regulatory mechanisms. Since searching for the RNA structural homologs among the numerous RNA sequences is extremely time-consuming, this work develops a data preprocessing strategy to enhance the search efficiency and presents RNAMST, which is an efficient and flexible web server for rapidly identifying homologs of a pre-defined RNA structural motif among numerous RNA sequences. An intuitive user interface is provided on the web server to facilitate the predictive analysis. Compared with previously developed tools, RNAMST performs remarkably more efficiently and provides more effective and flexible functions. RNAMST is now available on the web at .
An RNA secondary structure is locally optimal if there is no lower energy structure that can be obtained by the addition or removal of a single base pair, where energy is defined according to the widely accepted Turner nearest neighbor model. Locally optimal structures form kinetic traps, since any evolution away from a locally optimal structure must involve energetically unfavorable folding steps. Here, we present a novel, efficient algorithm to compute the partition function over all locally optimal secondary structures of a given RNA sequence. Our software, RNAlocopt, runs in time and space. Additionally, RNAlocopt samples a user-specified number of structures from the Boltzmann subensemble of all locally optimal structures. We apply RNAlocopt to show that (1) the number of locally optimal structures is far fewer than the total number of structures; indeed, the number of locally optimal structures is approximately equal to the square root of the number of all structures, (2) the structural diversity of this subensemble may be either similar to or quite different from the structural diversity of the entire Boltzmann ensemble, a situation that depends on the type of input RNA, (3) the (modified) maximum expected accuracy structure, computed by taking into account base pairing frequencies of locally optimal structures, is a more accurate prediction of the native structure than other current thermodynamics-based methods. The software RNAlocopt constitutes a technical breakthrough in our study of the folding landscape for RNA secondary structures. For the first time, locally optimal structures (kinetic traps in the Turner energy model) can be rapidly generated for long RNA sequences, previously impossible with methods that involved exhaustive enumeration. Use of locally optimal structures leads to state-of-the-art secondary structure prediction, as benchmarked against methods involving the computation of minimum free energy and of maximum expected accuracy.
Web server and source code available at http://bioinformatics.bc.edu/clotelab/RNAlocopt/.
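The definition of local optimality given above can be illustrated with a deliberately simplified sketch. It does not use the Turner nearest neighbor model and is not the RNAlocopt algorithm: it assumes a toy energy of -1 per base pair, so that removing a pair always raises the energy and a structure is locally optimal exactly when no further canonical, noncrossing pair can be added. The minimum hairpin loop size of 3 and all function names are assumptions made for the example.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pairs_from_dot_bracket(structure):
    """Return base pairs (i, j) of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return pairs


def can_add(i, j, seq, pairs, min_loop=3):
    """True if pair (i, j) can be added without breaking the structure's rules."""
    if j - i <= min_loop or seq[i] + seq[j] not in CANONICAL:
        return False
    for k, l in pairs:
        if i in (k, l) or j in (k, l):
            return False                      # position already paired
        if (i < k < j) != (i < l < j):
            return False                      # would cross an existing pair
    return True


def is_locally_optimal(seq, structure):
    """Toy model (energy = -1 per pair): optimal iff no single pair can be added."""
    pairs = pairs_from_dot_bracket(structure)
    n = len(seq)
    return not any(can_add(i, j, seq, pairs) for i in range(n) for j in range(i + 1, n))


seq = "GGGAAACCC"
print(is_locally_optimal(seq, "(((...)))"))   # fully stacked hairpin
print(is_locally_optimal(seq, "((.....))"))   # inner G-C pair still missing
```

Under a realistic energy model, removing a pair can also lower the energy, so both directions would have to be checked.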
Learning how native RNA conformations can be stabilized relative to unfolded states is an important objective, both for understanding natural RNAs and for improving the design of artificial functional RNAs. Here we show that covalently attached double-stranded DNA constraints (ca. 14 base pairs in length) can significantly stabilize the native conformation of an RNA molecule. Using the P4-P6 domain of the Tetrahymena group I intron as the test system, we identified pairs of RNA sites where attaching a DNA duplex is predicted to be structurally compatible with only the folded state of the RNA. The DNA-constrained RNAs were synthesized and shown by nondenaturing polyacrylamide gel electrophoresis (native PAGE) to have substantial decreases in their Mg2+ midpoints ([Mg2+]1/2 values). These changes are equivalent to free energy stabilizations as large as ΔΔG° = −2.5 kcal/mol, which is ∼14% of the total tertiary folding energy. For comparison, the sole modification of P4-P6 previously reported to stabilize this RNA is a single-nucleotide deletion (ΔC209) that provides only 1.1 kcal/mol of stabilization. Our findings indicate that nature has not completely optimized P4-P6 RNA folding. Furthermore, the DNA constraints are designed not to interact directly and extensively with the RNA, but rather more indirectly to modulate the relative stabilities of folded and unfolded RNA states. The successful implementation of this strategy to further stabilize a natively folded RNA conformation suggests an important element of modularity in stabilization of RNA structure, with implications for how nature might use other molecules such as proteins to stabilize specific RNA conformations.
Summary: Three-dimensional RNA structure prediction and folding is of significant interest in the biological research community. Here, we present iFoldRNA, a novel web-based methodology for RNA structure prediction with near atomic resolution accuracy and analysis of RNA folding thermodynamics. iFoldRNA rapidly explores RNA conformations using discrete molecular dynamics simulations of input RNA sequences. Starting from simplified linear-chain conformations, RNA molecules (<50 nt) fold to native-like structures within half an hour of simulation, facilitating rapid RNA structure prediction. All-atom reconstruction of energetically stable conformations generates iFoldRNA predicted RNA structures. The predicted RNA structures are within 2–5 Å root mean square deviations (RMSDs) from corresponding experimentally derived structures. RNA folding parameters including specific heat, contact maps, simulation trajectories, gyration radii, RMSDs from native state, and fraction of native-like contacts are accessible from iFoldRNA. We expect iFoldRNA will serve as a useful resource for RNA structure prediction and folding thermodynamic analyses.
Supplementary information: Supplementary data are available at Bioinformatics online.
Three-dimensional RNA structure prediction and folding is of significant interest in the biological research community. Here, we present iFoldRNA, a novel web-based methodology for RNA structure prediction with near atomic resolution accuracy and analysis of RNA folding thermodynamics. iFoldRNA rapidly explores RNA conformations using discrete molecular dynamics simulations of input RNA sequences. Starting from simplified linear-chain conformations, RNA molecules (<50 nucleotides) fold to native-like structures within half an hour of simulation, facilitating rapid RNA structure prediction. All-atom reconstruction of energetically stable conformations generates iFoldRNA predicted RNA structures. The predicted RNA structures are within 2–5 Angstrom root mean square deviations from corresponding experimentally derived structures. RNA folding parameters including specific heat, contact maps, simulation trajectories, gyration radii, root mean square deviations from native state, fraction of native-like contacts are accessible from iFoldRNA. We expect iFoldRNA will serve as a useful resource for RNA structure prediction and folding thermodynamic analyses.
The accurate prediction of the secondary and tertiary structure of an RNA with different folding algorithms is dependent on several factors, including the energy functions. However, an RNA higher-order structure cannot be accurately predicted from its sequence based on a limited set of energy parameters. The inter- and intra-molecular forces between this RNA and other small molecules and macromolecules, in addition to other factors in the cell such as pH, ionic strength, and temperature, influence the complex dynamics associated with a single stranded RNA's transition to its secondary and tertiary structure. Since all of the factors that affect the formation of an RNA's three-dimensional structure cannot be determined experimentally, statistically derived potential energy has been used in the prediction of protein structure. In the current work, we evaluate the statistical free energy of various secondary structure motifs, including base-pair stacks, hairpin loops, and internal loops, using their statistical frequencies obtained from the comparative analysis of more than 50 000 RNA sequences stored in the RNA Comparative Analysis Database (rCAD) at the Comparative RNA Web (CRW) Site. Statistical energies were computed from the structural statistics for several datasets. While the statistical energies for base-pair stacks correlate with experimentally derived free energy values, suggesting a Boltzmann-like distribution, variation is observed between different molecules and their location on the phylogenetic tree of life. Our statistical energies for several structural elements were utilized in the Mfold RNA folding algorithm. The combined statistical energies for base-pair stacks, hairpins and internal loop flanks result in a significant improvement in the accuracy of secondary structure prediction; however, the hairpin flanks contribute the most.
statistical potentials; RNA folding; thermodynamic stability; comparative analysis
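One common way to turn motif frequencies into energies is the inverse-Boltzmann relation, E = -RT ln(f_observed / f_reference). The sketch below assumes that generic form; the abstract does not state the exact formula or reference state used, so the function name, the reference counts and the example numbers are all hypothetical.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal/(mol*K)


def statistical_energies(observed, reference, temperature=310.15):
    """Inverse-Boltzmann energies (kcal/mol) from observed and reference counts.

    Both arguments map a motif label to a positive count; the counts are
    normalised to frequencies before the log-ratio is taken.
    """
    obs_total = sum(observed.values())
    ref_total = sum(reference.values())
    energies = {}
    for motif, count in observed.items():
        f_obs = count / obs_total
        f_ref = reference[motif] / ref_total
        energies[motif] = -GAS_CONSTANT * temperature * math.log(f_obs / f_ref)
    return energies


# Hypothetical counts of two base-pair stacks versus a uniform reference
observed = {"GC/GC": 60, "AU/AU": 40}
reference = {"GC/GC": 50, "AU/AU": 50}
for motif, energy in statistical_energies(observed, reference).items():
    print(f"{motif}: {energy:+.3f} kcal/mol")
```

Motifs seen more often than expected receive negative (favorable) energies, and motifs seen less often receive positive ones.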
Predicting RNA secondary structure is often the first step to determining the structure of RNA. Prediction approaches have historically avoided searching for pseudoknots because of the extreme combinatorial and time complexity of the problem. Yet neglecting pseudoknots limits the utility of such approaches. Here, an algorithm utilizing structure mapping and thermodynamics is introduced for RNA pseudoknot prediction that finds the minimum free energy and identifies information about the flexibility of the RNA. The heuristic approach takes advantage of the 5′ to 3′ folding direction of many biological RNA molecules and is consistent with the hierarchical folding hypothesis and the contact order model. Mapping methods are used to build and analyze the folded structure for pseudoknots and to add important 3D structural considerations. The program can predict some well known pseudoknot structures correctly. The results of this study suggest that many functional RNA sequences are optimized for proper folding. They also suggest directions we can proceed in the future to achieve even better results.
The in vitro selection of nucleic acid libraries has driven the discovery of RNA and DNA receptors (aptamers) and catalysts with tailor-made functional properties. Functional nucleic acids emerging from selections have been observed to possess an unusually high degree of secondary structure. In this study, we experimentally examined the relationship between the degree of secondary structure in a nucleic acid library and its ability to yield aptamers that bind protein targets. We designed a patterned nucleic acid library (denoted R*Y*) to enhance the formation of stem-loop structures without imposing any specific sequence or secondary structural requirement. This patterned library was predicted computationally to contain a significantly higher average folding energy compared to a standard, unpatterned N60 library of the same length. We performed three different iterated selections for protein binding using patterned and unpatterned libraries competing in the same solution. In all three cases, the patterned R*Y* library was enriched relative to the unpatterned library over the course of the 9- to 10-round selection. Characterization of individual aptamer clones emerging from the three selections revealed that the highest affinity aptamer assayed arose from the patterned library for two protein targets, while in the third case, the highest affinity aptamers from the patterned and random libraries exhibited comparable affinity. We identified the binding motif requirements for the most active aptamers generated against two of the targets. The two binding motifs are 3.4- and 27-fold more likely to occur in the R*Y* library than in the N60 library. Collectively, our findings suggest that researchers performing selections for nucleic acid aptamers and catalysts should consider patterned libraries rather than commonly used Nm libraries to increase both the likelihood of isolating functional molecules and the potential activities of the resulting molecules.
Large, multi-domain RNA molecules are generally thought to fold following multiple pathways down rugged landscapes populated with intermediates and traps. A challenge to understanding RNA folding reactions is the complex relationship that exists between the structure of the RNA and its folding landscape. The identification of intermediate species that populate folding landscapes and characterization of elements of their structures are key components to solving the RNA folding problem. This review explores recent studies that characterize the dominant pathways by which RNA folds, structural and dynamic features of intermediates that populate the folding landscape and the energy barriers that separate the distinct steps of the folding process.
RNA has been recognized as a key player in cellular regulation in recent years. In many cases, non-coding RNAs exert their function by binding to other nucleic acids, as in the case of microRNAs and snoRNAs. The specificity of these interactions derives from the stability of inter-molecular base pairing. The accurate computational treatment of RNA-RNA binding therefore lies at the heart of target prediction algorithms.
The standard dynamic programming algorithms for computing secondary structures of linear single-stranded RNA molecules are extended to the co-folding of two interacting RNAs.
We present a program, RNAcofold, that computes the hybridization energy and base pairing pattern of a pair of interacting RNA molecules. In contrast to earlier approaches, complex internal structures in both RNAs are fully taken into account. RNAcofold supports the calculation of the minimum energy structure and of a complete set of suboptimal structures in an energy band above the ground state. Furthermore, it provides an extension of McCaskill's partition function algorithm to compute base pairing probabilities, realistic interaction energies, and equilibrium concentrations of duplex structures.
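To show how an interaction free energy relates to an equilibrium duplex concentration, here is a simplified sketch. It is not RNAcofold's implementation: it assumes a single reaction A + B ⇌ AB with a 1 M standard state, ignores homodimers, and takes the binding free energy as a given input. The function names and the example values are hypothetical.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal/(mol*K)


def association_constant(delta_g, temperature=310.15):
    """K = exp(-dG / RT) for a binding free energy dG in kcal/mol."""
    return math.exp(-delta_g / (GAS_CONSTANT * temperature))


def duplex_concentration(delta_g, total_a, total_b, temperature=310.15):
    """Equilibrium [AB] in mol/L for A + B <-> AB, from total strand concentrations.

    Solves K * (a - x) * (b - x) = x for the physically valid (smaller) root,
    written in a form that avoids subtractive cancellation for large K.
    """
    k = association_constant(delta_g, temperature)
    b = k * (total_a + total_b) + 1.0
    disc = b * b - 4.0 * k * k * total_a * total_b
    return 2.0 * k * total_a * total_b / (b + math.sqrt(disc))


# Hypothetical example: dG = -10 kcal/mol, 1 micromolar of each strand
x = duplex_concentration(-10.0, 1e-6, 1e-6)
print(f"fraction of strand A in duplex: {x / 1e-6:.2f}")
```

The same mass-action reasoning extends to several competing dimer species, at the cost of solving a coupled system instead of one quadratic.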
RNAcofold is distributed as part of the Vienna RNA Package, .
Stephan H. Bernhart – firstname.lastname@example.org
RNA molecules fold into characteristic secondary and tertiary structures that account for their diverse functional activities. Many of these RNA structures, or certain structural motifs within them, are thought to recur in multiple genes within a single organism or across the same gene in several organisms and provide a common regulatory mechanism. Search algorithms, such as RNAMotif, can be used to mine nucleotide sequence databases for these repeating motifs. RNAMotif allows users to capture essential features of known structures in detailed descriptors and can be used to identify, with high specificity, other similar motifs within the nucleotide database. However, when the descriptor constraints are relaxed to provide more flexibility, or when there is very little a priori information about hypothesized RNA structures, the number of motif ‘hits’ may become very large. Exhaustive methods to search for similar RNA structures over these large search spaces are likely to be computationally intractable. Here we describe a powerful new algorithm based on evolutionary computation to solve this problem. A series of experiments using ferritin IRE and SRP RNA stem–loop motifs was used to verify the method. We demonstrate that even when searching extremely large search spaces, of the order of 10^23 potential solutions, we could find the correct solution in a fraction of the time it would have taken for exhaustive comparisons.
RNA molecules fold into characteristic secondary and tertiary structures that account for their diverse functional activities. Many of these RNA structures are assembled from a collection of RNA structural motifs. These basic building blocks are used repeatedly, and in various combinations, to form different RNA types and define their unique structural and functional properties. Identification of recurring RNA structural motifs will therefore enhance our understanding of RNA structure and help associate elements of RNA structure with functional and regulatory elements. Our goal was to develop a computer program that can describe an RNA structural element of any complexity and then search any nucleotide sequence database, including the complete prokaryotic and eukaryotic genomes, for these structural elements. Here we describe in detail a new computational motif search algorithm, RNAMotif, and demonstrate its utility with some motif search examples. RNAMotif differs from other motif search tools in two important aspects: first, the structure definition language is more flexible and can specify any type of base–base interaction; second, RNAMotif provides a user controlled scoring section that can be used to add capabilities that patterns alone cannot provide.
Motivation: With recent advances in sequencing, structural and functional studies of RNA lag behind the discovery of sequences. Computational analysis of RNA is increasingly important to reveal structure–function relationships quickly and at low cost. The purpose of this study is to use multiple homologous sequences to infer a conserved RNA structure.
Results: A new algorithm, called Multilign, is presented to find the lowest free energy RNA secondary structure common to multiple sequences. Multilign is based on Dynalign, which is a program that simultaneously aligns and folds two sequences to find the lowest free energy conserved structure. For Multilign, Dynalign is used to progressively construct a conserved structure from multiple pairwise calculations, with one sequence used in all pairwise calculations. A base pair is predicted only if it is contained in the set of low free energy structures predicted by all Dynalign calculations. In this way, Multilign improves prediction accuracy by keeping the genuine base pairs and excluding competing false base pairs. Multilign has computational complexity that scales linearly in the number of sequences. Multilign was tested on extensive datasets of sequences with known structure and its prediction accuracy is among the best of available algorithms. Multilign can run on long sequences (> 1500 nt) and an arbitrarily large number of sequences.
Availability: The algorithm is implemented in ANSI C++ and can be downloaded as part of the RNAstructure package at: http://rna.urmc.rochester.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
RNAexinv is an interactive Java application that performs RNA sequence design, constrained to yield a specific RNA shape and physical attributes. It is an extended inverse RNA folding program built on the rationale that the generated sequences should not only fold into a desired structure but also exhibit favorable attributes such as thermodynamic stability and mutational robustness. To design sequences, RNAexinv therefore considers the mutational robustness and the minimum free energy in addition to the secondary structure. The generated sequences may not fully conform to the given RNA secondary structure, but they will strictly conform to the RNA shape of the given secondary structure while taking into consideration the recommended values of thermodynamic stability and mutational robustness that are provided.
The output consists of the designed sequences generated by the proposed method. Selecting a sequence displays the secondary structure drawings of the target and the predicted fold of the sequence, including some basic information about the desired and achieved thermodynamic stability and mutational robustness. RNAexinv can be used successfully without prior experience, simply by specifying an initial RNA secondary structure in dot-bracket notation and numerical values for the desired neutrality and minimum free energy. The package runs under the Linux operating system. Secondary structure predictions are performed using the Vienna RNA package.
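For readers unfamiliar with the dot-bracket notation mentioned above, the following minimal sketch shows how such a string encodes base pairs. It is not part of RNAexinv or the Vienna RNA package. It assumes a pseudoknot-free structure written only with `(`, `)` and `.`; the function name `parse_dot_bracket` and the example structure are hypothetical.

```python
from typing import List, Tuple


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return the 0-based (i, j) base pairs encoded by a dot-bracket string."""
    open_positions: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(pos)  # remember the unpaired opening base
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((open_positions.pop(), pos))  # pair with latest '('
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


# Hypothetical target: an outer stem, an internal loop and a hairpin.
target = "((..((...))..))"
print(parse_dot_bracket(target))
```

Each `(` pairs with the closest unmatched `)` that follows it, and `.` marks an unpaired nucleotide, so a stack is sufficient to recover the pairs.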
RNAexinv is a user-friendly tool for RNA sequence design. It is especially useful in cases where a functional stem-loop structure of a natural sequence should be strictly kept in the designed sequences, but a distant motif in the rest of the structure may gain or lose a nucleotide at the expense of another, as long as the global shape is preserved. This allows the insertion of physical observables as constraints. RNAexinv is available at http://www.cs.bgu.ac.il/~RNAexinv.
For the purposes of finding and aligning noncoding RNA gene- and cis-regulatory elements in multiple-genome datasets, it is useful to be able to derive multi-sequence stochastic grammars (and hence multiple alignment algorithms) systematically, starting from hypotheses about the various kinds of random mutation event and their rates.
Here, we consider a highly simplified evolutionary model for RNA, called "The TKF91 Structure Tree" (following Thorne, Kishino and Felsenstein's 1991 model of sequence evolution with indels), which we have implemented for pairwise alignment as proof of principle for such an approach. The model, its strengths and its weaknesses are discussed with reference to four examples of functional ncRNA sequences: a riboswitch (guanine), a zipcode (nanos), a splicing factor (U4) and a ribozyme (RNase P). As shown by our visualisations of posterior probability matrices, the selected examples illustrate three different signatures of natural selection that are highly characteristic of ncRNA: (i) co-ordinated basepair substitutions, (ii) co-ordinated basepair indels and (iii) whole-stem indels.
Although all three types of mutation "event" are built into our model, events of type (i) and (ii) are found to be better modeled than events of type (iii). Nevertheless, we hypothesise from the model's performance on pairwise alignments that it would form an adequate basis for a prototype multiple alignment and genefinding tool.