Modular architecture is a hallmark of RNA structures, implying structural, and possibly functional, similarity among existing RNAs. To systematically delineate the existence of smaller topologies within larger structures, we develop and apply an efficient RNA secondary structure comparison algorithm using a newly developed two-dimensional RNA graphical representation. Our survey of similarity among 14 pseudoknots and subtopologies within ribosomal RNAs (rRNAs) uncovers eight pairs of structurally related pseudoknots with non-random sequence matches and reveals modular units in rRNAs. Significantly, three structurally related pseudoknot pairs have functional similarities not previously known: one pair involves the 3′ end of brome mosaic virus genomic RNA (PKB134) and the alternative hammerhead ribozyme pseudoknot (PKB173), both of which are replicase templates for viral RNA replication; the second pair involves structural elements for translation initiation and ribosome recruitment found in the viral internal ribosome entry site (PKB223) and the V4 domain of 18S rRNA (PKB205); the third pair involves 18S rRNA (PKB205) and viral tRNA-like pseudoknot (PKB134), which probably recruits ribosomes via structural mimicry and base complementarity. Additionally, we quantify the modularity of 16S and 23S rRNAs by showing that RNA motifs can be constructed from at least 210 building blocks. Interestingly, we find that the 5S rRNA and two tree modules within 16S and 23S rRNAs have similar topologies and tertiary shapes. These modules can be applied to design novel RNA motifs via build-up-like procedures for constructing sequences and folds.
RNA inverse folding is a computational technology for designing RNA sequences which fold into a user-specified secondary structure. Although pseudoknots are functionally important motifs in RNA structures, fewer studies of the inverse folding of pseudoknotted RNAs have been reported than of pseudoknot-free RNA design. In this paper, we present a new version of our multi-objective genetic algorithm (MOGA), MODENA, which we previously proposed for pseudoknot-free RNA inverse folding. In the new version of MODENA, (i) a new crossover operator is implemented and (ii) pseudoknot prediction methods, IPknot and HotKnots, are used to evaluate the designed RNA sequences, allowing us to perform the inverse folding of pseudoknotted RNAs. The new version of MODENA with the new crossover operator was benchmarked with a dataset composed of natural pseudoknotted RNA secondary structures, and we found that MODENA can successfully design more pseudoknotted RNAs than the other pseudoknot design algorithm. In addition, a sequence constraint function newly implemented in the new version of MODENA was tested by designing RNA sequences which fold into the pseudoknotted structure of a hepatitis delta virus ribozyme; as a result, we successfully designed eight RNA sequences. The new version of MODENA is downloadable from http://rna.eit.hirosaki-u.ac.jp/modena/.
inverse folding; pseudoknot; secondary structure; pseudobase; Rfam; sequence constraint
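As a rough illustration of how a designed sequence might be scored against a target pseudoknotted structure, the sketch below parses extended dot-bracket notation (where a second bracket type marks the crossing helix) and counts the base pairs on which two structures disagree. This is not MODENA's objective function, which the abstract does not specify; the function names `parse_pairs` and `base_pair_distance` and the choice of base-pair distance as the score are assumptions made here for illustration.

```python
from typing import Set, Tuple

# Bracket families commonly used in extended dot-bracket notation.
OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def parse_pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) base pairs, i < j, encoded by the string."""
    stacks = {open_: [] for open_ in OPENERS}
    pairs = set()
    for pos, ch in enumerate(dot_bracket):
        if ch in OPENERS:
            stacks[ch].append(pos)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def base_pair_distance(target: str, predicted: str) -> int:
    """Number of base pairs present in exactly one of the two structures."""
    if len(target) != len(predicted):
        raise ValueError("structures must have the same length")
    return len(parse_pairs(target) ^ parse_pairs(predicted))
```

A distance of zero means the predicted fold reproduces every target pair, including those of the crossing helix; a design loop could minimize this value alongside other objectives.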
Translation of Hepatitis C viral proteins requires an internal ribosome entry site (IRES) located in the 5′ untranslated region of the viral mRNA. The core domain of the Hepatitis C virus (HCV) IRES contains a four-way helical junction that is integrated within a predicted pseudoknot. This domain is required for positioning the mRNA start codon correctly on the 40S ribosomal subunit during translation initiation. Here we present the crystal structure of this RNA, revealing a complex double-pseudoknot fold that establishes the alignment of two helical elements on either side of the four-helix junction. The conformation of this core domain constrains the open reading frame’s orientation for positioning on the 40S ribosomal subunit. This structure, representing the last major domain of HCV-like IRESs to be determined at near-atomic resolution, provides the basis for a comprehensive cryo-electron microscopy-guided model of the intact HCV IRES and its interaction with 40S ribosomal subunits.
In a previous study it was shown that RNase P from E. coli cleaves the tRNA-like structure of turnip yellow mosaic virus (TYMV) RNA in vitro (Guerrier-Takada et al. (1988) Cell, 53, 267-272). Cleavage takes place at the 3' side of the loop that crosses the deep groove of the pseudoknot structure present in the aminoacyl acceptor domain. In the present study fragments of TYMV RNA with mutations in the pseudoknot, generated by transcription in vitro, were tested for susceptibility to cleavage by RNase P. Changes in the specificity with respect to the site of cleavage and decreases in the rate of cleavage were observed with most of these substrates. The behaviour of various mutants in the reaction catalyzed by RNase P is in agreement with the present model of the TYMV RNA pseudoknot (Dumas et al. (1987), J. Biomol. Struct. Dyn. 263, 652-657). Base substitutions in the loop that crosses the shallow groove of the pseudoknot structure resulted, however, in an unexpected decrease in the rate of cleavage, probably due to conformational changes in the substrates. Studies on other tRNA-like structures revealed an important role in the reaction with RNase P for both the nucleotide at the 3' side of the loop that spans the deep groove and the nucleotide at position 4, which correspond to positions -1 and 73, respectively, in tRNA precursors.
The diverse landscape of RNA conformational space includes many canyons and crevices that are distant from the lowest minimum free energy valley and remain unexplored by traditional RNA structure prediction methods. A complete description of the entire RNA folding landscape can facilitate identification of biologically important conformations. The Crumple algorithm rapidly enumerates all possible non-pseudoknotted structures for an RNA sequence without consideration of thermodynamics while filtering the output with experimental data. The Crumple algorithm provides an alternative approach to traditional free energy minimization programs for RNA secondary structure prediction. A complete computation of all non-pseudoknotted secondary structures can reveal structures that would not be predicted by methods that sample the RNA folding landscape based on thermodynamic predictions. The free energy minimization approach is often successful but is limited by not considering RNA tertiary and protein interactions and the possibility that kinetics rather than thermodynamics determines the functional RNA fold. Efficient parallel computing and filters based on experimental data make practical the complete enumeration of all non-pseudoknotted structures. Efficient parallel computing for Crumple is implemented in a ring graph approach. Filters for experimental data include constraints from chemical probing of solvent accessibility, enzymatic cleavage of paired or unpaired nucleotides, phylogenetic covariation, and the minimum number and lengths of helices determined from crystallography or cryo-electron microscopy. The minimum number and length of helices has a significant effect on reducing conformational space. Pairing constraints reduce conformational space more than single nucleotide constraints. 
Examples with Alfalfa Mosaic Virus RNA and Trypanosoma brucei guide RNA demonstrate the importance of evaluating all possible structures when pseudoknots, RNA-protein interactions, and metastable structures are important for biological function. Crumple software is freely available at http://adenosine.chem.ou.edu/software.html.
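The idea of enumerating every non-pseudoknotted structure without thermodynamics, then filtering with experimental data, can be sketched in a few lines. The code below is a minimal serial recursion and not the Crumple implementation (in particular, it has no ring-graph parallelism); the function names, the `min_loop` parameter, and the `unpaired` filter, which stands in for a single-nucleotide constraint from chemical probing, are assumptions made here for illustration.

```python
from typing import FrozenSet, Iterator, List, Tuple

# Watson-Crick and G-U wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def enumerate_structures(
    seq: str,
    min_loop: int = 3,
    unpaired: FrozenSet[int] = frozenset(),
) -> Iterator[List[Tuple[int, int]]]:
    """Yield every nested (non-pseudoknotted) set of canonical base pairs.

    min_loop is the minimum number of unpaired nucleotides in a hairpin loop.
    Positions listed in `unpaired` are forbidden from pairing.
    """
    def rec(i: int, j: int) -> Iterator[List[Tuple[int, int]]]:
        if i > j:
            yield []
            return
        # Case 1: position i stays unpaired.
        yield from rec(i + 1, j)
        if i in unpaired:
            return
        # Case 2: position i pairs with some k; the pair splits the
        # interval into an enclosed part and a downstream part.
        for k in range(i + min_loop + 1, j + 1):
            if k in unpaired or (seq[i], seq[k]) not in CANONICAL:
                continue
            for inner in rec(i + 1, k - 1):
                for outer in rec(k + 1, j):
                    yield [(i, k)] + inner + outer

    yield from rec(0, len(seq) - 1)


def to_dot_bracket(n: int, pairs: List[Tuple[int, int]]) -> str:
    """Render a list of nested pairs as a dot-bracket string of length n."""
    chars = ["."] * n
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)
```

The number of structures grows exponentially with sequence length, which is why constraints applied during enumeration, rather than after it, matter in practice.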
Bovine viral diarrhea virus (BVDV) is the prototype representative of the pestivirus genus in the Flaviviridae family. It has been shown that the initiation of translation of BVDV RNA occurs by an internal ribosome entry mechanism mediated by the 5' untranslated region of the viral RNA. The 5' and 3' boundaries of the IRES of the cytopathic BVDV NADL have been mapped and it has been suggested that the IRES extends into the coding region of the BVDV polyprotein. A putative pseudoknot structure has been recognized in the BVDV 5'UTR in close proximity to the AUG start codon. A pseudoknot structure is characteristic for flavivirus IRESes and in the case of the closely related classical swine fever virus (CSFV) and the more distantly related Hepatitis C virus (HCV) pseudoknot function in translation has been demonstrated.
To characterize the BVDV IRESes in detail, we studied the BVDV translational initiation by transfection of dicistronic expression plasmids into mammalian cells. A region coding for the amino terminus of the BVDV SD-1 polyprotein contributes considerably to efficient initiation of translation. The translation efficiency mediated by the IRES of BVDV strains NADL and SD-1 approximates the poliovirus type I IRES directed translation in BHK cells. Compared to the poliovirus IRES, increased expression levels are mediated by the BVDV IRES of strain SD-1 in murine cell lines, while lower levels are observed in human cell lines. Site-directed mutagenesis revealed that an RNA pseudoknot upstream of the initiator AUG is an important structural element for IRES function. Mutants with impaired ability to base pair in stem I or II lost their translational activity. In mutants with repaired base pairing in either stem I or stem II, full translational activity was restored. Thus, the BVDV IRES translation is dependent on the pseudoknot integrity. These features of the pestivirus IRES are reminiscent of those of the classical swine fever virus, a pestivirus, and the hepatitis C viruses, another genus of the Flaviviridae.
The IRES of the non-cytopathic BVDV SD-1 strain displays features known from other pestivirus IRESes. The predicted pseudoknot in the 5'UTR of BVDV SD-1 virus represents an important structural element in BVDV translation.
Key elements of the conformational switch model describing regulation of alfalfa mosaic virus (AMV) replication (R. C. Olsthoorn, S. Mertens, F. T. Brederode, and J. F. Bol, EMBO J. 18:4856-4864, 1999) have been tested using biochemical assays and functional studies in nontransgenic protoplasts. Although comparative sequence analysis suggests that the 3′ untranslated regions of AMV and ilarvirus RNAs have the potential to fold into pseudoknots, we were unable to confirm that a proposed pseudoknot forms or has a functional role in regulating coat protein-RNA binding or viral RNA replication. Published work has suggested that the pseudoknot is part of a tRNA-like structure (TLS); however, we argue that the canonical sequence and functional features that define the TLS are absent. We suggest here that the absence of the TLS correlates directly with the distinctive requirement for coat protein to activate replication in these viruses. Experimental data are evidence that elevated magnesium concentrations proposed to stabilize the pseudoknot structure do not block coat protein binding. Additionally, covarying nucleotide changes proposed to reestablish pseudoknot pairings do not rescue replication. Furthermore, as described in the accompanying paper (L. M. Guogas, S. M. Laforest, and L. Gehrke, J. Virol. 79:5752-5761, 2005), coat protein is not, by definition, inhibitory to minus-strand RNA synthesis. Rather, the activation of viral RNA replication by coat protein is shown to be concentration dependent. We describe the 3′ organization model as an alternate model of AMV replication that offers an improved fit to the available data.
The 3' noncoding region of turnip yellow mosaic virus RNA includes an 82-nucleotide-long tRNA-like structure domain and a short upstream region that includes a potential pseudoknot overlapping the coat protein termination codon. Genomic RNAs with point mutations in the 3' noncoding region that result in poor replication in protoplasts and no systemic symptoms in planta were inoculated onto Chinese cabbage plants in an effort to obtain second-site suppressor mutations. Putative second-site suppressor mutations were identified by RNase protection and sequencing and were then introduced into genomic cDNA clones to permit their characterization. A C-57→U mutation in the tRNA-like structure was a strong suppressor of the C-55→A mutation which prevented both systemic infection and in vitro valylation of the viral RNA. Both of these phenotypes were rescued in the double mutant. An A-107→C mutation was a strong second-site suppressor of the U-96→G mutation, permitting the double mutant to establish systemic infection. The C-107 and G-96 mutations are located on opposite strands of one helix of a potential pseudoknot, and the results support a functional role for the pseudoknot structure. A mutation near the 5' end of the genome (G+92→A), at position -3 relative to the initiation codon of the essential open reading frame 206, was found to be a general potentiator of viral replication, probably as a result of enhanced expression of open reading frame 206. The A+92 mutation enhanced the replication of mutant TYMC-G96 in protoplasts but was not a sufficiently potent suppressor to permit systemic spread of the A+92/G-96 double mutant in plants.
Trans-translation releases stalled ribosomes from truncated mRNAs and tags defective proteins for proteolytic degradation using transfer-messenger RNA (tmRNA). This small stable RNA represents a hybrid of tRNA- and mRNA-like domains connected by a variable number of pseudoknots. Comparative sequence analysis of tmRNAs found in bacteria, plastids, and mitochondria provides considerable insights into their secondary structures. Progress toward understanding the molecular mechanism of template switching, which constitutes an essential step in trans-translation, is hampered by our limited knowledge about the three-dimensional folding of tmRNA.
To facilitate experimental testing of the molecular intricacies of trans-translation, which often require appropriately modified tmRNA derivatives, we developed a procedure for building three-dimensional models of tmRNA. Using comparative sequence analysis, phylogenetically-supported 2-D structures were obtained to serve as input for the program ERNA-3D. Motifs containing loops and turns were extracted from the known structures of other RNAs and used to improve the tmRNA models. Biologically feasible 3-D models for the entire tmRNA molecule could be obtained. The models were characterized by a functionally significant close proximity between the tRNA-like domain and the resume codon. Potential conformational changes which might lead to a more open structure of tmRNA upon binding to the ribosome are discussed. The method, described in detail for the tmRNAs of Escherichia coli, Bacillus anthracis, and Caulobacter crescentus, is applicable to every tmRNA.
Improved molecular models of biological significance were obtained. These models will guide the design of experiments and provide a better understanding of trans-translation. The comparative procedure described here for tmRNA is easily adapted for modeling the members of other RNA families.
tmRNA combines tRNA- and mRNA-like properties and ameliorates problems arising from stalled ribosomes. Research on the mechanism, structure and biology of tmRNA is served by the tmRNA website (http://www.indiana.edu/~tmrna), a collection of sequences, alignments, secondary structures and other information. Because many of these sequences are not in GenBank, a BLAST server has been added; another new feature is an abbreviated alignment for the tRNA-like domain only. Many tmRNA sequences from plastids have been added, five found in public sequence data and another 10 generated by direct sequencing; detection in early-branching members of the green plastid lineage brings coverage to all three primary plastid lineages. The new sequences include the shortest known tmRNA sequence. While bacterial tmRNAs usually have a lone pseudoknot upstream of the mRNA segment and a string of three or four pseudoknots downstream, plastid tmRNAs collectively show loss of pseudoknots at both positions. The pseudoknot-string region is also too short to contain the usual pseudoknot number in another new entry, the tmRNA sequence from a bacterial endosymbiont of insect cells, Tremblaya princeps. Pseudoknots may optimize tmRNA function in free-living bacteria, yet become dispensable when the endosymbiotic lifestyle relaxes selective pressure for fast growth.
The intergenic region internal ribosome entry site (IGR IRES) of the Dicistroviridae family adopts an overlapping triple pseudoknot structure to directly recruit the 80S ribosome in the absence of initiation factors. The pseudoknot I (PKI) domain of the IRES mimics a tRNA-like codon:anticodon interaction in the ribosomal P site to direct translation initiation from a non-AUG initiation codon in the A site. In this study, we have performed a comprehensive mutational analysis of this region to delineate the molecular parameters that drive IRES translation. We demonstrate that IRES-mediated translation can initiate at an alternate adjacent and overlapping start site, provided that basepairing interactions within PKI remain intact. Consistent with this, IGR IRES translation tolerates increases in the variable loop region that connects the anticodon- and codon-like elements within the PKI domain, as IRES activity remains relatively robust up to a 4-nucleotide insertion in this region. Finally, elements from an authentic tRNA anticodon stem-loop can functionally supplant corresponding regions within PKI. These results verify the importance of the codon:anticodon interaction of the PKI domain and further define the specific elements within the tRNA-like domain that contribute to optimal initiator Met-tRNAi-independent IRES translation.
Three tRNA-associated properties of a representative set of tymoviral RNAs have been quantitatively assessed using higher plant (wheat germ) proteins: aminoacylation, EF-1alpha*GTP binding, and 3'-adenylation of 3'-CC forms of the RNAs by CTP, ATP:tRNA nucleotidyltransferase. The RNAs fall into three classes differing in the extent of tRNA mimicry. Turnip yellow mosaic (TYMV) and kennedya yellow mosaic virus RNAs had activities in all three properties similar to those of a higher plant tRNAVal transcript, and thus are remarkable tRNA mimics. Although the isolated approximately 83 nt long tRNA-like structures showed high activity in these assays, in the case of TYMV, the 6318 nt long TYMV RNA was an even better substrate for valylation. Eggplant mosaic virus RNA, which has a differently constructed acceptor stem pseudoknot, differed from the above tymoviral RNAs in binding more weakly to EF-1alpha*GTP. Erysimum latent virus RNA, which lacks an identifiable anticodon domain, could not be valylated and had very low 3'-adenylation activity. tRNA mimicry within the tymovirus genus thus ranges from extremely highly developed to minimal. The implications for the role of the tRNA mimicry in viral biology are discussed.
The 104 nucleotides long 3' terminal region of TMV RNA was shown previously to contain two pseudoknotted structures (Rietveld et al. (1984), EMBO J. 3, 2613-2619). We here present evidence for the occurrence, within the 204 nucleotides long 3' noncoding region, of another highly structured domain located immediately adjacent to the tRNA-like structure of 95 nucleotides (Joshi et al. (1985) Nucleic Acids Res. 13, 347-354). A model for the three-dimensional folding of this region, containing three more pseudoknots, is proposed on the basis of chemical modification and enzymatic digestion. The existence of these three consecutive pseudoknots was supported by sequence comparisons with the RNA from the related tobamoviruses TMV-L, CcTMV and CGMMV. Coaxial stacking of the six double helical segments involved gives rise to the formation of a 25 basepair long quasi-continuous double helix. The results show that the three-dimensional folding of the 3' non-translated region of tobamoviral RNAs is largely maintained by the formation of five pseudoknots. The organisation of this region in the RNA of the tobamovirus CcTMV suggests that recombinational events among aminoacylatable plant viral RNAs have to be considered.
Transfer-messenger RNA (tmRNA) is a unique molecule that combines properties from both tRNA and mRNA, and facilitates a novel translation reaction termed trans-translation. According to phylogenetic sequence analysis among various bacteria and chemical probing analysis, the secondary structure of the 350-400 nt RNA is commonly characterized by a tRNA-like structure, and four pseudoknots with different sizes. A mutational analysis using a number of Escherichia coli tmRNA variants as well as a chemical probing analysis has recently demonstrated not only the presence of the smallest pseudoknot, PK1, upstream of the internal coding region, but also its direct implication in trans-translation. Here, NMR methods were used to investigate the structure of the 31 nt pseudoknot PK1 and its 11 mutants in which nucleotide substitutions are introduced into each of two stems or the linking loops. NMR results provide evidence that the PK1 RNA is folded into a pseudoknot structure in the presence of Mg(2+). Imino proton resonances were observed consistent with formation of two helical stem regions and these stems stacked on each other as often seen in pseudoknot structures, in spite of the existence of three intervening nucleotides, loop 3, between the stems. Structural instability of the pseudoknot structure, even in the presence of Mg(2+), was found in the PK1 mutants except in the loop 3 mutants which still maintained the pseudoknot folding. These results together with their biological activities indicate that trans-translation requires the pseudoknot structure stabilized by Mg(2+) and specific residues G61 and G62 in loop 3.
The importance of certain structural features of the 5′ untranslated region of classical swine fever virus (CSFV) RNA for the function of the internal ribosome entry site (IRES) was investigated by mutagenesis followed by in vitro transcription and translation. Deletions made from the 5′ end of the CSFV genome sequence showed that the IRES boundary was close to nucleotide 65: thus, the IRES includes the whole of domain II but no sequences upstream of this domain. Deletions which invaded domain II even to a small extent reduced activity to about 20% that of the full-length structure, and this 20% residual activity persisted with more extensive deletions until the whole of domain II had been removed and the deletions invaded the pseudoknot, whereupon IRES activity fell to zero. The importance of both stems of the pseudoknot was verified by making mutations in both sides of each stem; this severely reduced IRES activity, but the compensating mutations which restored base pairing caused almost full IRES function to be regained. The importance of the length of the loop linking the two stems of the pseudoknot was demonstrated by the finding that a reduction in length from the wild-type AUAAAAUU to AUU almost completely abrogated IRES activity. Random A→U substitutions in the wild-type sequence showed that IRES activity was fairly proportional to the number of A residues retained in this pseudoknot loop, with a preference for clustered neighboring A residues rather than dispersed As. Finally, it was found that the sequence of the highly conserved domain IIIa loop is, rather surprisingly, not important for the maintenance of full IRES activity, although amputation of the entire domain IIIa stem and loop was highly debilitating. These results are interpreted in the light of recent models, derived from cryo-electron microscopy, of the interaction of the closely related hepatitis C virus IRES with 40S ribosomal subunits.
Programmed −1 ribosomal frameshifting is widely used in the expression of RNA virus replicases and represents a potential target for antiviral intervention. There is interest in determining the extent to which frameshifting efficiency can be modulated before virus replication is compromised, and we have addressed this question using the alpharetrovirus Rous sarcoma virus (RSV) as a model system. In RSV, frameshifting is essential in the production of the Gag-Pol polyprotein from the overlapping gag and pol coding sequences. The frameshift signal is composed of two elements, a heptanucleotide slippery sequence and, just downstream, a stimulatory RNA structure that has been proposed to be an RNA pseudoknot. Point mutations were introduced into the frameshift signal of an infectious RSV clone, and virus replication was monitored following transfection and subsequent infection of susceptible cells. The introduced mutations were designed to generate a range of frameshifting efficiencies, yet with minimal impact on encoded amino acids. Our results reveal that point mutations leading to a 3-fold decrease in frameshifting efficiency noticeably reduce virus replication and that further reduction is severely inhibitory. In contrast, a 3-fold stimulation of frameshifting is well tolerated. These observations suggest that small-molecule inhibitors of frameshifting are likely to have potential as agents for antiviral intervention. During the course of this work, we were able to confirm, for the first time in vivo, that the RSV stimulatory RNA is indeed an RNA pseudoknot but that the pseudoknot per se is not absolutely required for virus viability.
Telomerase is the ribonucleoprotein reverse transcriptase involved in the maintenance of the telomeres, the termini of eukaryotic chromosomes. The RNA component of human telomerase (hTR) consists of 451 nucleotides with the 5′ half folding into a highly conserved catalytic core comprising the template region and an adjacent pseudoknot domain (nucleotides 1–208). While the secondary structure of hTR is established, there is little understanding of its three-dimensional (3D) architecture. Here, we have used fluorescence resonance energy transfer (FRET) between fluorescently labelled peptide nucleic acids, hybridized to defined single stranded regions of full length hTR, to evaluate long-range distances. Using molecular modeling, the distance constraints derived by FRET were subsequently used, together with the known secondary structure, to generate a 3D model of the catalytic core of hTR. An overlay of a large set of models generated has provided a low-resolution structure (6.5–8.0 Å) that can readily be refined as new structural information becomes available. A notable feature of the modeled structure is the positioning of the template adjacent to the pseudoknot, which brings a number of conserved nucleotides close in space.
Using a combined master equation and kinetic cluster approach, we investigate RNA pseudoknot folding and unfolding kinetics. The energetic parameters are computed from a recently developed Vfold model for RNA secondary structure and pseudoknot folding thermodynamics. The folding kinetics theory is based on the complete conformational ensemble, including all the native-like and non-native states. The predicted folding and unfolding pathways, activation barriers, Arrhenius plots, and rate-limiting steps lead to several findings. First, for the PK5 pseudoknot, a misfolded 5′ hairpin emerges as a stable kinetic trap in the folding process, and the detrapping from this misfolded state is the rate-limiting step for the overall folding process. The calculated rate constant and activation barrier agree well with the experimental data. Second, as an application of the model, we investigate the kinetic folding pathways for the hTR (human telomerase RNA) pseudoknot. The predicted folding and unfolding pathways not only support the proposed role of a conformational switch between hairpin and pseudoknot in hTR activity, but also reveal the molecular mechanism for the conformational switch. Furthermore, for an experimentally studied hTR mutation, whose hairpin intermediate is destabilized, the model predicts a long-lived transient hairpin structure, and the switch between the transient hairpin intermediate and the native pseudoknot may be responsible for the observed hTR activity. Such a finding would help resolve the apparent contradiction between the observed hTR activity and the absence of a stable hairpin.
Kinetics; RNA pseudoknot; Activation energy; Misfolded state; Telomerase
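How a misfolded hairpin can act as a kinetic trap under a master equation can be seen in a toy three-state model. The sketch below integrates dp/dt for an unfolded state U, a misfolded hairpin M, and the native pseudoknot N with an explicit Euler step. The state names and all rate constants are hypothetical values chosen here to produce a trap; they are not taken from the Vfold model or from the PK5 or hTR calculations, and the full theory operates on the complete conformational ensemble rather than three states.

```python
from typing import Dict, Tuple

# Hypothetical rate constants (arbitrary inverse-time units).
# U folds quickly into the misfolded hairpin M, slowly into native N;
# escape from M back to U is slow, so M traps population early on.
RATES: Dict[Tuple[str, str], float] = {
    ("U", "M"): 10.0,
    ("M", "U"): 0.1,
    ("U", "N"): 1.0,
    ("N", "U"): 0.001,
}

P0 = {"U": 1.0, "M": 0.0, "N": 0.0}  # start fully unfolded


def simulate(
    rates: Dict[Tuple[str, str], float],
    p0: Dict[str, float],
    t_end: float,
    dt: float = 0.01,
) -> Dict[str, float]:
    """Integrate the master equation with explicit Euler steps.

    For each transition src -> dst with rate k, probability k * p[src] * dt
    moves from src to dst in one step, so total probability is conserved.
    """
    p = dict(p0)
    for _ in range(int(round(t_end / dt))):
        change = {state: 0.0 for state in p}
        for (src, dst), k in rates.items():
            moved = k * p[src] * dt
            change[src] -= moved
            change[dst] += moved
        for state in p:
            p[state] += change[state]
    return p
```

With these rates, calling `simulate(RATES, P0, t_end=1.0)` leaves most of the population in M, while `simulate(RATES, P0, t_end=1500.0)` leaves most of it in N: the slow M to U step controls how quickly the native state is reached.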
Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFold's potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses.
Not only is the prediction of evolutionarily conserved RNA structures important for elucidating the potential functions of RNA sequences and the mechanisms by which these functions are exerted, but it also lies at the core of RNA gene prediction. To get an accurate prediction of the conserved RNA structure, we need a high-quality sequence alignment and an evolutionary tree relating several evolutionarily related sequences. These are two strong requirements that are typically difficult to fulfill unless the encoded RNA structure is already known. We present what is to our knowledge the first method that solves this chicken-and-egg problem by co-estimating all three quantities simultaneously. We show that our novel method, called SimulFold, can be successfully applied over a wide range of sequence similarities to detect conserved RNA structures, including those with pseudoknots. We also show its potential as an alignment and phylogeny prediction method. Our method overcomes several significant limitations of existing methods and has the potential to be used for a very diverse range of tasks.
RNA virus genomes contain cis-acting sequence and structural elements that participate in viral replication. We previously identified a bulged stem-loop secondary structure at the upstream end of the 3′ untranslated region (3′ UTR) of the genome of the coronavirus mouse hepatitis virus (MHV). This element, beginning immediately downstream of the nucleocapsid gene stop codon, was shown to be essential for virus replication. Other investigators discovered an adjacent downstream pseudoknot in the 3′ UTR of the closely related bovine coronavirus (BCoV). This pseudoknot was also shown to be essential for replication, and it has a conserved counterpart in every group 1 and group 2 coronavirus. In MHV and BCoV, the bulged stem-loop and pseudoknot are, in part, mutually exclusive, because of the overlap of the last segment of the stem-loop and stem 1 of the pseudoknot. This led us to hypothesize that they form a molecular switch, possibly regulating a transition occurring during viral RNA synthesis. We have now performed an extensive genetic analysis of the two components of this proposed switch. Our results define essential and nonessential components of these structures and establish the limits to which essential parts of each element can be destabilized prior to loss of function. Most notably, we have confirmed the interrelationship of the two putative switch elements. Additionally, we have identified a pseudoknot loop insertion mutation that appears to point to a genetic interaction between the pseudoknot and a distant region of the genome.
The analysis of sequence-structure relations of RNA is based on a specific notion and folding of RNA structure. The notion of coarse grained structure employed here is that of canonical RNA pseudoknot contact-structures with at most two mutually crossing bonds (3-noncrossing). These structures are folded by a novel, ab initio prediction algorithm, cross, capable of searching all 3-noncrossing RNA structures. The algorithm outputs the minimum free energy structure.
After giving some background on RNA pseudoknot structures and outlining the folding algorithm employed, we present in this paper various statistical results on the mapping from RNA sequences into 3-noncrossing RNA pseudoknot structures. We study properties such as the fraction of pseudoknot structures, the dominant pseudoknot shapes, neutral walks, neutral neighbors and local connectivity. We then put our results into the context of the molecular evolution of RNA.
Our results imply that, in analogy to RNA secondary structures, 3-noncrossing pseudoknot RNA represents a molecular phenotype that is well suited for molecular and in particular neutral evolution. We can conclude that extended, percolating neutral networks of pseudoknot RNA exist.
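The 3-noncrossing condition used above can be made concrete with a small check. The following is a minimal sketch, not the cross algorithm itself: it assumes a structure is given as a list of arcs (i, j) with i < j, and it finds the largest set of mutually crossing arcs by brute force, which is only practical for small examples. The function names are illustrative and do not come from the paper.

```python
from itertools import combinations


def crosses(a, b):
    """Two arcs cross when their endpoints interleave: i1 < i2 < j1 < j2."""
    (i1, j1), (i2, j2) = sorted((a, b))
    return i1 < i2 < j1 < j2


def max_mutual_crossing(arcs):
    """Size of the largest set of pairwise crossing arcs (brute force)."""
    arcs = sorted(arcs)
    best = 1 if arcs else 0
    for size in range(2, len(arcs) + 1):
        found = any(
            all(crosses(a, b) for a, b in combinations(subset, 2))
            for subset in combinations(arcs, size)
        )
        if not found:
            # No crossing set of this size, so no larger one exists either.
            break
        best = size
    return best


def is_k_noncrossing(arcs, k):
    """A structure is k-noncrossing if it has no k mutually crossing arcs."""
    return max_mutual_crossing(arcs) < k


# Hypothetical toy structures, chosen only to illustrate the definition.
nested = [(1, 10), (2, 9), (3, 8)]          # hairpin stem, no crossings
h_type = [(1, 10), (2, 9), (6, 15), (7, 14)]  # two stems that cross each other
triple = [(1, 7), (3, 9), (5, 11)]          # three mutually crossing arcs

for name, arcs in [("nested", nested), ("h_type", h_type), ("triple", triple)]:
    print(name, max_mutual_crossing(arcs), is_k_noncrossing(arcs, 3))
```

In this sketch a pseudoknot-free secondary structure is 2-noncrossing, while an H-type pseudoknot has two mutually crossing arcs and is therefore 3-noncrossing but not 2-noncrossing.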
Motivation: Thermodynamics-based dynamic programming RNA secondary structure algorithms have been of immense importance in molecular biology, where applications range from the detection of novel selenoproteins using expressed sequence tag (EST) data, to the determination of microRNA genes and their targets. Dynamic programming algorithms have been developed to compute the minimum free energy secondary structure and partition function of a given RNA sequence, the minimum free energy and partition function for the hybridization of two RNA molecules, etc. However, the applicability of dynamic programming methods depends on disallowing certain types of interactions (pseudoknots, zig-zags, etc.), as their inclusion renders structure prediction a nondeterministic polynomial time (NP)-complete problem. Nevertheless, such interactions have been observed in X-ray structures.
Results: A non-Boltzmannian Monte Carlo algorithm was designed by Wang and Landau to estimate the density of states for complex systems, such as the Ising model, that exhibit a phase transition. In this article, we apply the Wang-Landau (WL) method to compute the density of states for secondary structures of a given RNA sequence, and for hybridizations of two RNA sequences. Our method is shown to be much faster than existing software, such as RNAsubopt. From the density of states, we compute the partition function over all secondary structures and over all pseudoknot-free hybridizations. The advantage of the WL method is that by adding a function to evaluate the free energy of arbitrary pseudoknotted structures and of arbitrary hybridizations, we can estimate thermodynamic parameters for situations known to be NP-complete. This extension to pseudoknots will be made in the sequel to this article; in contrast, the current article describes the WL algorithm applied to pseudoknot-free secondary structures and hybridizations.
Availability: The WL RNA hybridization web server is under construction at http://bioinformatics.bc.edu/clotelab/.
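The step from a density of states to a partition function, mentioned in the Results above, can be sketched as follows. This is not the WL sampler itself. It assumes the density of states is already available as a mapping from energy (in kcal/mol) to the number of structures at that energy, and it evaluates Z = Σ g(E) exp(−E/RT). The numbers in the toy density are invented for illustration only.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def partition_function(density, temperature_c=37.0):
    """Sum g(E) * exp(-E / RT) over all energy levels E."""
    rt = R_KCAL * (temperature_c + 273.15)
    return sum(g * math.exp(-e / rt) for e, g in density.items())


def level_probabilities(density, temperature_c=37.0):
    """Boltzmann probability of finding the molecule at each energy level."""
    rt = R_KCAL * (temperature_c + 273.15)
    z = partition_function(density, temperature_c)
    return {e: g * math.exp(-e / rt) / z for e, g in density.items()}


# Invented toy density of states: energy (kcal/mol) -> number of structures.
toy_density = {0.0: 1, -1.0: 12, -2.5: 7, -4.0: 2}

z = partition_function(toy_density)
probs = level_probabilities(toy_density)
print(f"Z = {z:.2f}")
for energy in sorted(probs):
    print(f"E = {energy:5.1f} kcal/mol  p = {probs[energy]:.4f}")
```

In practice a WL run yields the density of states only up to a multiplicative constant, so it needs to be normalized (for example against a known total number of structures) before absolute values of Z are meaningful; the level probabilities do not depend on that constant.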
We study the sparsification of dynamic-programming-based folding algorithms for RNA structures. Sparsification is a method that significantly improves the computation of minimum free energy (mfe) RNA structures.
We provide a quantitative analysis of the sparsification of a particular decomposition rule, Λ∗. This rule splits an interval of RNA secondary and pseudoknot structures of fixed topological genus. Key for quantifying sparsifications is the size of the so-called candidate sets. Here we assume mfe-structures to be specifically distributed (see Assumption 1) within arbitrary and irreducible RNA secondary and pseudoknot structures of fixed topological genus. We then present a combinatorial framework which allows us, by means of the probabilities of irreducible sub-structures, to obtain the expected size of the Λ∗-candidate set with respect to a uniformly random input sequence. We compute these expectations for arc-based energy models via energy-filtered generating functions (GF) in the case of RNA secondary structures as well as RNA pseudoknot structures. Furthermore, for RNA secondary structures we also analyze a simplified loop-based energy model. Our combinatorial analysis is then compared to the expected number of Λ∗-candidates obtained from folding mfe-structures. In the case of the mfe-folding of RNA secondary structures with a simplified loop-based energy model, our results imply that sparsification provides a significant, constant improvement of 91% (theory), to be compared to a 96% reduction (experimental, simplified arc-based model). However, we do not observe a linear factor improvement. Finally, in the case of the “full” loop-energy model we can report a reduction of 98% (experiment).
Sparsification was initially credited with a linear factor improvement. This conclusion was based on the so-called polymer-zeta property, which stems from interpreting polymer chains as self-avoiding walks. Subsequent findings, however, reveal that the O(n) improvement is not correct. The combinatorial analysis presented here shows that, assuming a specific distribution (see Assumption 1) of mfe-structures within irreducible and arbitrary structures, the expected number of Λ∗-candidates is Θ(n²). However, the constant reduction is quite significant, being in the range of 96%. We furthermore show an analogous result for the sparsification of the Λ∗-decomposition rule for RNA pseudoknotted structures of genus one. Finally we observe that the effect of sparsification is sensitive to the employed energy model.
Sparsification; Generating function; Dynamic programming
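The idea of candidate sets can be illustrated on a much simpler model than the ones analyzed above. The sketch below assumes a Nussinov-style base-pair maximization (not the arc-based or loop-based energy models of the paper, and not the Λ∗ rule for pseudoknots). A start position k is stored as a candidate for end position j only when closing the pair (k, j) scores strictly better than every other way of decomposing the interval [k, j]; the split step then iterates over candidates only. All names and the min_loop parameter are illustrative assumptions.

```python
import random

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_plain(seq, min_loop=3):
    """Reference O(n^3) recursion: try every pairing partner k for j."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            best = N[i][j - 1]  # j unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = N[i][k - 1] if k > i else 0
                    best = max(best, left + N[k + 1][j - 1] + 1)
            N[i][j] = best
    return N[0][n - 1] if n else 0


def nussinov_sparse(seq, min_loop=3):
    """Same optimum, but splits are tried only at stored candidates."""
    n = len(seq)
    N = [[0] * n for _ in range(n)]
    cand = [[] for _ in range(n)]  # cand[j]: list of (k, score of [k, j] closed by pair k-j)
    tested = 0
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            best = N[i][j - 1]  # j unpaired
            for k, closed in cand[j]:  # every stored k is larger than i
                best = max(best, N[i][k - 1] + closed)
                tested += 1
            if j - i > min_loop and (seq[i], seq[j]) in PAIRS:
                closed = N[i + 1][j - 1] + 1
                if closed > best:  # pair (i, j) beats all decompositions
                    cand[j].append((i, closed))
                    best = closed
            N[i][j] = best
    return (N[0][n - 1] if n else 0), tested


random.seed(0)
seq = "".join(random.choice("ACGU") for _ in range(60))
pairs_plain = nussinov_plain(seq)
pairs_sparse, tested = nussinov_sparse(seq)
print(pairs_plain, pairs_sparse, tested)
```

Counting the `tested` splits for growing sequence lengths gives a rough empirical view of how many candidates survive, which is the quantity whose expectation the analysis above addresses for its own energy models.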
Based on the experimentally determined atomic coordinates for RNA helices and the self-avoiding walks of the P (phosphate) and C4 (carbon) atoms in the diamond lattice for the polynucleotide loop conformations, we derive a set of conformational entropy parameters for RNA pseudoknots. Based on the entropy parameters, we develop a folding thermodynamics model that enables us to compute the sequence-specific RNA pseudoknot folding free energy landscape and thermodynamics. The model is validated through extensive experimental tests both for the native structures and for the folding thermodynamics. The model predicts strong sequence-dependent helix-loop competitions in the pseudoknot stability and the resultant conformational switches between different hairpin and pseudoknot structures. For instance, for the pseudoknot domain of human telomerase RNA, a native-like and a misfolded hairpin intermediate are found to coexist on the (equilibrium) folding pathways, and the interplay between the stabilities of these intermediates causes the conformational switch that may underlie a human telomerase disease.
The human prion gene contains five copies of a 24 nt repeat that
is highly conserved among species. An analysis of folding free energies
of the human prion mRNA, in particular in the repeat region, suggested biased
codon selection and the presence of RNA patterns. In particular,
pseudoknots, similar to the one predicted by Wills in the human
prion mRNA, were identified in the repeat region of all prion
mRNAs available in GenBank, except those of birds and the red slider
turtle. An alignment of these mRNAs, which share low sequence homology, shows
several co-variations that maintain the pseudoknot pattern. The
presence of pseudoknots in yeast Sup35p and Rnq1 suggests acquisition
in the prokaryotic era. Computer generated three-dimensional structures
of the human prion pseudoknot highlight protein and RNA interaction
domains, which suggest a possible effect in prion protein translation.
The role of pseudoknots in prion diseases is discussed as individuals
with extra copies of the 24 nt repeat develop the familial form
of Creutzfeldt–Jakob disease.