In a previous study it was shown that RNase P from E. coli cleaves the tRNA-like structure of turnip yellow mosaic virus (TYMV) RNA in vitro (Guerrier-Takada et al. (1988) Cell, 53, 267-272). Cleavage takes place at the 3' side of the loop that crosses the deep groove of the pseudoknot structure present in the aminoacyl acceptor domain. In the present study fragments of TYMV RNA with mutations in the pseudoknot, generated by transcription in vitro, were tested for susceptibility to cleavage by RNase P. Changes in the specificity with respect to the site of cleavage and decreases in the rate of cleavage were observed with most of these substrates. The behaviour of various mutants in the reaction catalyzed by RNase P is in agreement with the present model of the TYMV RNA pseudoknot (Dumas et al. (1987), J. Biomol. Struct. Dyn. 263, 652-657). Base substitutions in the loop that crosses the shallow groove of the pseudoknot structure resulted, however, in an unexpected decrease in the rate of cleavage, probably due to conformational changes in the substrates. Studies on other tRNA-like structures revealed an important role in the reaction with RNase P for both the nucleotide at the 3' side of the loop that spans the deep groove and the nucleotide at position 4, which correspond to positions -1 and 73, respectively, in tRNA precursors.
The intergenic region internal ribosome entry site (IGR IRES) of the Dicistroviridae family adopts an overlapping triple pseudoknot structure to directly recruit the 80S ribosome in the absence of initiation factors. The pseudoknot I (PKI) domain of the IRES mimics a tRNA-like codon:anticodon interaction in the ribosomal P site to direct translation initiation from a non-AUG initiation codon in the A site. In this study, we have performed a comprehensive mutational analysis of this region to delineate the molecular parameters that drive IRES translation. We demonstrate that IRES-mediated translation can initiate at an alternate adjacent and overlapping start site, provided that basepairing interactions within PKI remain intact. Consistent with this, IGR IRES translation tolerates increases in the variable loop region that connects the anticodon- and codon-like elements within the PKI domain, as IRES activity remains relatively robust up to a 4-nucleotide insertion in this region. Finally, elements from an authentic tRNA anticodon stem-loop can functionally supplant corresponding regions within PKI. These results verify the importance of the codon:anticodon interaction of the PKI domain and further define the specific elements within the tRNA-like domain that contribute to optimal initiator Met-tRNAi-independent IRES translation.
Translation of Hepatitis C viral proteins requires an internal ribosome entry site (IRES) located in the 5′ untranslated region of the viral mRNA. The core domain of the Hepatitis C virus (HCV) IRES contains a four-way helical junction that is integrated within a predicted pseudoknot. This domain is required for positioning the mRNA start codon correctly on the 40S ribosomal subunit during translation initiation. Here we present the crystal structure of this RNA, revealing a complex double-pseudoknot fold that establishes the alignment of two helical elements on either side of the four-helix junction. The conformation of this core domain constrains the open reading frame’s orientation for positioning on the 40S ribosomal subunit. This structure, representing the last major domain of HCV-like IRESs to be determined at near-atomic resolution, provides the basis for a comprehensive cryo-electron microscopy-guided model of the intact HCV IRES and its interaction with 40S ribosomal subunits.
Based on the experimentally determined atomic coordinates for RNA helices and the self-avoiding walks of the P (phosphate) and C4 (carbon) atoms in the diamond lattice for the polynucleotide loop conformations, we derive a set of conformational entropy parameters for RNA pseudoknots. Based on the entropy parameters, we develop a folding thermodynamics model that enables us to compute the sequence-specific RNA pseudoknot folding free energy landscape and thermodynamics. The model is validated through extensive experimental tests both for the native structures and for the folding thermodynamics. The model predicts strong sequence-dependent helix-loop competitions in the pseudoknot stability and the resultant conformational switches between different hairpin and pseudoknot structures. For instance, for the pseudoknot domain of human telomerase RNA, native-like and misfolded hairpin intermediates are found to coexist on the (equilibrium) folding pathways, and the interplay between the stabilities of these intermediates causes the conformational switch that may underlie a human telomerase disease.
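The counting idea behind lattice-based loop entropies can be illustrated with a minimal sketch. It only enumerates unconstrained self-avoiding walks on the diamond lattice and reports ln(count) in units of k_B; the function names are hypothetical, and it does not reproduce the virtual-bond (P and C4) model or the helix constraints from which the entropy parameters in the abstract are derived.

```python
import math

# Bond vectors from sublattice A to sublattice B of the diamond lattice.
# Steps from B back to A use the negated vectors.
A_TO_B = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def count_saws(n_steps):
    """Count self-avoiding walks of n_steps bonds starting at the origin."""

    def extend(site, visited, step):
        if step == n_steps:
            return 1
        # The walk alternates sublattices, so the step parity picks the sign.
        sign = 1 if step % 2 == 0 else -1
        total = 0
        for dx, dy, dz in A_TO_B:
            nxt = (site[0] + sign * dx, site[1] + sign * dy, site[2] + sign * dz)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, step + 1)
                visited.remove(nxt)
        return total

    origin = (0, 0, 0)
    return extend(origin, {origin}, 0)


def chain_entropy_in_kb(n_steps):
    """Conformational entropy of a free chain, S / k_B = ln(number of walks)."""
    return math.log(count_saws(n_steps))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, count_saws(n), round(chain_entropy_in_kb(n), 3))
```

Exhaustive enumeration grows exponentially with chain length, so this is only practical for short loops.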
The 104-nucleotide-long 3' terminal region of TMV RNA was shown previously to contain two pseudoknotted structures (Rietveld et al. (1984), EMBO J. 3, 2613-2619). We here present evidence for the occurrence, within the 204-nucleotide-long 3' noncoding region, of another highly structured domain located immediately adjacent to the tRNA-like structure of 95 nucleotides (Joshi et al. (1985) Nucleic Acids Res. 13, 347-354). A model for the three-dimensional folding of this region, containing three more pseudoknots, is proposed on the basis of chemical modification and enzymatic digestion. The existence of these three consecutive pseudoknots was supported by sequence comparisons with the RNA from the related tobamoviruses TMV-L, CcTMV and CGMMV. Coaxial stacking of the six double helical segments involved gives rise to the formation of a 25 basepair long quasi-continuous double helix. The results show that the three-dimensional folding of the 3' non-translated region of tobamoviral RNAs is largely maintained by the formation of five pseudoknots. The organisation of this region in the RNA of the tobamovirus CcTMV suggests that recombinational events among aminoacylatable plant viral RNAs have to be considered.
Key elements of the conformational switch model describing regulation of alfalfa mosaic virus (AMV) replication (R. C. Olsthoorn, S. Mertens, F. T. Brederode, and J. F. Bol, EMBO J. 18:4856-4864, 1999) have been tested using biochemical assays and functional studies in nontransgenic protoplasts. Although comparative sequence analysis suggests that the 3′ untranslated regions of AMV and ilarvirus RNAs have the potential to fold into pseudoknots, we were unable to confirm that a proposed pseudoknot forms or has a functional role in regulating coat protein-RNA binding or viral RNA replication. Published work has suggested that the pseudoknot is part of a tRNA-like structure (TLS); however, we argue that the canonical sequence and functional features that define the TLS are absent. We suggest here that the absence of the TLS correlates directly with the distinctive requirement for coat protein to activate replication in these viruses. Experimental data show that elevated magnesium concentrations, proposed to stabilize the pseudoknot structure, do not block coat protein binding. Additionally, covarying nucleotide changes proposed to reestablish pseudoknot pairings do not rescue replication. Furthermore, as described in the accompanying paper (L. M. Guogas, S. M. Laforest, and L. Gehrke, J. Virol. 79:5752-5761, 2005), coat protein is not, by definition, inhibitory to minus-strand RNA synthesis. Rather, the activation of viral RNA replication by coat protein is shown to be concentration dependent. We describe the 3′ organization model as an alternate model of AMV replication that offers an improved fit to the available data.
The 3'-untranslated region (UTR) of tobacco mosaic virus (TMV), which terminates in a tRNA-like structure, functionally substitutes for a poly(A) tail in both plant and animal cells. The addition of the TMV 3'-UTR to chimeric mRNA constructs increases their expression up to 100-fold, increasing both translational efficiency and mRNA stability. The domain largely responsible for the regulation maps to a 72 base region immediately upstream of the tRNA-like structure; however, the 3'-terminal tRNA-like structure is required for full function. Its contribution is lost if separated from the upstream pseudoknot domain by as few as 5 bases or if 6 bases are removed from the 3'-terminus. Sequence addition to the 3'-terminus of the TMV 3'-UTR or the upstream pseudoknot domain inhibits function in both tobacco and Chinese hamster ovary cells.
RNA secondary structure prediction is a major task in bioinformatics, and various computational methods have been proposed so far. The pseudoknot is one of the typical substructures appearing in several RNAs, and plays an important role in some biological processes. Prediction of RNA secondary structure with pseudoknots is still challenging since the problem is NP-hard when arbitrary pseudoknots are taken into consideration.
We introduce a new method of predicting RNA secondary structure with pseudoknots based on integer programming. In our formulation, we aim at minimizing the value of the objective function that reflects free energy of a folding structure of an input RNA sequence. We focus on a practical class of pseudoknots by setting constraints appropriately. Experimental results for a set of real RNA sequences show that our proposed method outperforms several existing methods in sensitivity. Furthermore, for a set of sequences of small length, our approach achieved good performance in both sensitivity and specificity.
Our integer programming-based approach for RNA structure prediction is flexible and extensible.
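What distinguishes a pseudoknot from an ordinary nested structure can be shown in a few lines. The sketch below is not the integer programming method of the abstract; it assumes the common extended dot-bracket convention, in which a second bracket type marks the crossing helix, and reports base pairs (i, j) and (k, l) that cross, i.e. i < k < j < l. All function names are illustrative.

```python
OPENERS = "([{<"
CLOSERS = ")]}>"


def parse_dot_bracket(structure):
    """Return sorted base pairs (i, j), i < j, from an extended dot-bracket string."""
    stacks = {opener: [] for opener in OPENERS}
    pairs = []
    for pos, char in enumerate(structure):
        if char in OPENERS:
            stacks[char].append(pos)
        elif char in CLOSERS:
            opener = OPENERS[CLOSERS.index(char)]
            if not stacks[opener]:
                raise ValueError(f"unmatched {char!r} at position {pos}")
            pairs.append((stacks[opener].pop(), pos))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def crossing_pairs(pairs):
    """Return all (p, q) where base pairs p = (i, j) and q = (k, l) satisfy i < k < j < l."""
    return [(p, q) for p in pairs for q in pairs if p[0] < q[0] < p[1] < q[1]]


def has_pseudoknot(structure):
    """True if at least two base pairs in the structure cross."""
    return bool(crossing_pairs(parse_dot_bracket(structure)))


if __name__ == "__main__":
    print(has_pseudoknot("((...))"))         # nested hairpin
    print(has_pseudoknot("((..[[..))..]]"))  # two interleaved helices
```

A prediction method restricted to nested structures can never produce the second structure, which is why pseudoknots need dedicated algorithms.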
The approximately 200-nucleotide-long 3'-terminal noncoding region of tobacco mosaic virus (TMV) RNA contains a tRNA-like structure and, in its immediate upstream region, three consecutive pseudoknots, each of which is composed of two double-helical segments. To elucidate the biological functions of the pseudoknot region, we constructed several deletion mutant TMV-L (a tomato strain) RNAs by using an in vitro transcription system and tested their ability to multiply in both tobacco plants and protoplasts. When deletions were introduced just downstream of the termination codon of the coat protein gene in the 5'-to-3' direction progressively, five of six double-helical segments were dispensable for viral multiplication, indicating that the pseudoknot structures are not essential for multiplication. However, extension of the deletion into the central pseudoknot region resulted in reduction in viral multiplication, accompanied by loss of development of mosaic symptoms on systemic tobacco plants. Cessation of multiplication was observed when the sequence involved in formation of double-helical segment I just upstream of the tRNA-like structure was deleted irrespective of the start point and extent of deletion. Point mutations that destabilized double-helical segment I resulted in a loss or great reduction of viral multiplication, whereas the double mutants in which the double helix was restored by additional compensating base substitutions restored multiplication to nearly the wild-type level. Thus, double-helical segment I just upstream of the tRNA-like structure is a structural feature essential for viral multiplication.
Telomerase is the ribonucleoprotein reverse transcriptase involved in the maintenance of the telomeres, the termini of eukaryotic chromosomes. The RNA component of human telomerase (hTR) consists of 451 nucleotides with the 5′ half folding into a highly conserved catalytic core comprising the template region and an adjacent pseudoknot domain (nucleotides 1–208). While the secondary structure of hTR is established, there is little understanding of its three-dimensional (3D) architecture. Here, we have used fluorescence resonance energy transfer (FRET) between fluorescently labelled peptide nucleic acids, hybridized to defined single stranded regions of full length hTR, to evaluate long-range distances. Using molecular modeling, the distance constraints derived by FRET were subsequently used, together with the known secondary structure, to generate a 3D model of the catalytic core of hTR. An overlay of a large set of models generated has provided a low-resolution structure (6.5–8.0 Å) that can readily be refined as new structural information becomes available. A notable feature of the modeled structure is the positioning of the template adjacent to the pseudoknot, which brings a number of conserved nucleotides close in space.
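The conversion from a measured FRET efficiency to a distance estimate follows the standard Förster relation E = 1 / (1 + (r / R0)^6). The sketch below inverts it; the Förster radius of 50 Å used in the example is a placeholder, not a value from the study, and the function name is hypothetical.

```python
def fret_distance(efficiency, forster_radius):
    """Donor-acceptor distance from FRET efficiency, r = R0 * (1/E - 1)^(1/6).

    The result is in the same length unit as forster_radius.
    """
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return forster_radius * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    r0 = 50.0  # placeholder Förster radius in angstroms (assumption)
    for e in (0.2, 0.5, 0.8):
        print(e, round(fret_distance(e, r0), 1))
```

Because of the sixth-power dependence, the estimate is most sensitive for distances near R0 and becomes unreliable when the efficiency approaches 0 or 1.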
Bovine viral diarrhea virus (BVDV) is the prototype representative of the pestivirus genus in the Flaviviridae family. It has been shown that the initiation of translation of BVDV RNA occurs by an internal ribosome entry mechanism mediated by the 5' untranslated region of the viral RNA. The 5' and 3' boundaries of the IRES of the cytopathic BVDV NADL have been mapped and it has been suggested that the IRES extends into the coding region of the BVDV polyprotein. A putative pseudoknot structure has been recognized in the BVDV 5'UTR in close proximity to the AUG start codon. A pseudoknot structure is characteristic for flavivirus IRESes and in the case of the closely related classical swine fever virus (CSFV) and the more distantly related Hepatitis C virus (HCV) pseudoknot function in translation has been demonstrated.
To characterize the BVDV IRESes in detail, we studied the BVDV translational initiation by transfection of dicistronic expression plasmids into mammalian cells. A region coding for the amino terminus of the BVDV SD-1 polyprotein contributes considerably to efficient initiation of translation. The translation efficiency mediated by the IRES of BVDV strains NADL and SD-1 approximates the poliovirus type I IRES directed translation in BHK cells. Compared to the poliovirus IRES, increased expression levels are mediated by the BVDV IRES of strain SD-1 in murine cell lines, while lower levels are observed in human cell lines. Site directed mutagenesis revealed that an RNA pseudoknot upstream of the initiator AUG is an important structural element for IRES function. Mutants with impaired ability to base pair in stem 1 or stem 2 lost their translational activity. In mutants with repaired base pairing in either stem 1 or stem 2, full translational activity was restored. Thus, the BVDV IRES translation is dependent on the pseudoknot integrity. These features of the pestivirus IRES are reminiscent of those of the classical swine fever virus, a pestivirus, and the hepatitis C viruses, members of another genus of the Flaviviridae.
The IRES of the non-cytopathic BVDV SD-1 strain displays features known from other pestivirus IRESes. The predicted pseudoknot in the 5'UTR of BVDV SD-1 virus represents an important structural element in BVDV translation.
RNA virus genomes contain cis-acting sequence and structural elements that participate in viral replication. We previously identified a bulged stem-loop secondary structure at the upstream end of the 3′ untranslated region (3′ UTR) of the genome of the coronavirus mouse hepatitis virus (MHV). This element, beginning immediately downstream of the nucleocapsid gene stop codon, was shown to be essential for virus replication. Other investigators discovered an adjacent downstream pseudoknot in the 3′ UTR of the closely related bovine coronavirus (BCoV). This pseudoknot was also shown to be essential for replication, and it has a conserved counterpart in every group 1 and group 2 coronavirus. In MHV and BCoV, the bulged stem-loop and pseudoknot are, in part, mutually exclusive, because of the overlap of the last segment of the stem-loop and stem 1 of the pseudoknot. This led us to hypothesize that they form a molecular switch, possibly regulating a transition occurring during viral RNA synthesis. We have now performed an extensive genetic analysis of the two components of this proposed switch. Our results define essential and nonessential components of these structures and establish the limits to which essential parts of each element can be destabilized prior to loss of function. Most notably, we have confirmed the interrelationship of the two putative switch elements. Additionally, we have identified a pseudoknot loop insertion mutation that appears to point to a genetic interaction between the pseudoknot and a distant region of the genome.
Programmed −1 ribosomal frameshifting (PRF) and stop codon readthrough are two translational recoding mechanisms utilized by some RNA viruses to express their structural and enzymatic proteins at a defined ratio. Efficient recoding usually requires an RNA pseudoknot located several nucleotides downstream from the recoding site. To assess the strategic importance of the recoding pseudoknots, we have carried out a large scale genome-wide analysis in which we used an in-house developed program to detect all possible H-type pseudoknots within the genomic mRNAs of 81 animal viruses. Pseudoknots are detected downstream from ~85% of the recoding sites, including many previously unknown pseudoknots. ~78% of the recoding pseudoknots are the most stable pseudoknot within the viral genomes. However, they are not as strong as some designed pseudoknots that exhibit a roadblocking effect on the translating ribosome. Strong roadblocking pseudoknots are not detected within the viral genomes. These results indicate that the recoding pseudoknots have evolved to possess optimal stability for efficient recoding. We also found that the sequence at the gag-pol frameshift junction of HIV1 harbors potential elaborated pseudoknots encompassing the frameshift site. A novel mechanism is proposed for possible involvement of the elaborated pseudoknots in the HIV1 PRF event.
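The topology an H-type pseudoknot scan looks for can be sketched as follows. This is not the in-house program mentioned in the abstract: it is a naive illustration that assumes perfect Watson-Crick stems of fixed length, ignores G-U wobble pairs and free energies, and uses hypothetical names and default parameters throughout.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(segment):
    """Reverse complement of an RNA segment (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(segment))


def find_h_type(seq, stem_len=3, min_loop=1):
    """Return (a, b, c, d) start positions of candidate H-type pseudoknots.

    Layout along the sequence: stem 1 5' strand at a, stem 2 5' strand at b,
    stem 1 3' strand at c, stem 2 3' strand at d. Loops 1 and 3 must hold at
    least min_loop bases; loop 2 may be empty.
    """
    n = len(seq)
    # Every perfectly complementary, non-overlapping pair of segments.
    stems = [
        (i, j)
        for i in range(n - stem_len + 1)
        for j in range(i + stem_len, n - stem_len + 1)
        if seq[j:j + stem_len] == revcomp(seq[i:i + stem_len])
    ]
    hits = []
    for a, c in stems:
        for b, d in stems:
            if (a + stem_len + min_loop <= b
                    and b + stem_len <= c
                    and c + stem_len + min_loop <= d):
                hits.append((a, b, c, d))
    return hits


if __name__ == "__main__":
    # Toy sequence: GGG . A . CAC . CCC . A . GUG
    print(find_h_type("GGGACACCCCAGUG"))
```

A realistic scan would add an energy model to rank candidates by stability, which is what allows a comparison such as "the most stable pseudoknot within the viral genome".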
An algorithm and a program for the prediction of RNA secondary structure with pseudoknot formation are proposed. The algorithm simulates stepwise folding by generating random structures using the Monte Carlo method, followed by the selection of helices for the final structure on the basis of both their probabilities of occurrence in a random structure and free energy parameters. The program versions have been tested on ribosomal RNA structures and on RNAs with pseudoknots evidenced by experimental data. It is shown that the simulation of folding during RNA synthesis improves the results. The introduction of pseudoknot formation makes it possible to predict pseudoknotted structures and improves the prediction of long-range interactions. The computer program is rather fast and can predict structures for long RNAs on an ordinary personal computer without requiring large amounts of memory.
Difficult problems in structural bioinformatics are often studied in simple exact models to gain insights and to derive general principles. Protein folding, for example, has long been studied in the lattice model. Recently, researchers have also begun to apply the lattice model to the study of RNA folding.
We present a novel method for predicting RNA secondary structures with pseudoknots: first simulate the folding dynamics of the RNA sequence on the 3D triangular lattice, next extract and select a set of disjoint base pairs from the best lattice conformation found by the folding simulation. Experiments on sequences from PseudoBase show that our prediction method outperforms the HotKnot algorithm of Ren, Rastegari, Condon and Hoos, a leading method for RNA pseudoknot prediction. Our method for RNA secondary structure prediction can be adapted into an efficient reconstruction method that, given an RNA sequence and an associated secondary structure, finds a conformation of the sequence on the 3D triangular lattice that realizes the base pairs in the secondary structure. We implemented a suite of computer programs for the simulation and visualization of RNA folding on the 3D triangular lattice. These programs come with detailed documentation and are accessible from the companion website of this paper at http://www.cs.usu.edu/~mjiang/rna/DeltaIS/.
Folding simulation on the 3D triangular lattice is an effective method for RNA secondary structure prediction and lattice conformation reconstruction. The visualization software for the lattice conformations of RNA structures is a valuable tool for the study of RNA folding and is a great pedagogic device.
Hepatitis delta virus RNAs possess self-cleavage activities that produce 2′,3′-cyclic phosphate and 5′-hydroxyl termini (i.e. cis-acting delta ribozyme). Trans-acting delta ribozymes have been engineered by removing a junction from the cis version, thereby producing one molecule possessing the substrate sequence and the other the catalytic domain. According to the pseudoknot model, the secondary structure of the delta ribozyme includes a pseudoknot (i.e. P1.1 stem) formed by two base pairs from residues of the L3 loop and J1/4 junction. A collection of 48 P1.1 stem mutants was synthesized in order to provide an original characterization of both the importance and the structure of this pseudoknot in a trans-acting version of the ribozyme. Several structural differences were noted compared to the results reported for cis-acting ribozymes. For example, a combination of two stable Watson–Crick base pairs composing the essential P1.1 stem was demonstrated to be crucial for a significant level of activity, while the cis version required only one base pair. In addition, we present the first physical evidence that the composition of the P1.1 stem affects the substrate specificity for ribozyme cleavage. Depending on the residues forming the J1/4 junction, non-productive ribozyme–substrate complexes can be observed. This phenomenon is proposed to be important for further development of a gene-inactivation system based on delta ribozyme.
The human prion gene contains five copies of a 24 nt repeat that is highly conserved among species. An analysis of folding free energies of the human prion mRNA, in particular in the repeat region, suggested biased codon selection and the presence of RNA patterns. In particular, pseudoknots, similar to the one predicted by Wills in the human prion mRNA, were identified in the repeat region of all prion mRNAs available in GenBank, but not those of birds and the red slider turtle. An alignment of these mRNAs, which share low sequence homology, shows several co-variations that maintain the pseudoknot pattern. The presence of pseudoknots in yeast Sup35p and Rnq1 suggests acquisition in the prokaryotic era. Computer generated three-dimensional structures of the human prion pseudoknot highlight protein and RNA interaction domains, which suggest a possible effect in prion protein translation. The role of pseudoknots in prion diseases is discussed as individuals with extra copies of the 24 nt repeat develop the familial form of Creutzfeldt–Jakob disease.
Pseudoknotted structures play important structural and functional roles in RNA cellular functions at the level of transcription, splicing and translation. However, the problem of computational prediction for large pseudoknotted folds remains. Here we develop a domain-based method for predicting complex and large pseudoknotted structures from RNA sequences. The model is based on the observation that large RNAs can be separated into different structural domains. The basic idea is to first identify the domains and then predict the structures for each domain. Assembly of the domain structures gives the full structure. The use of the domain-based approach leads to a reduction of computational time by a factor of about N^2 for an N-nt sequence. As applications of the model, we predict structures for a variety of RNA systems, such as regions in human telomerase RNA (hTR), internal ribosome entry site (IRES) and HIV genome. The lengths of these sequences range from 200 to 400 nt. The results show good agreement with experiments.
hepatitis delta virus (HDV); human immunodeficiency virus (HIV); human telomerase RNA (hTR); internal ribosome entry site (IRES); large RNAs; Pseudoknots; structural predictions
The analysis of sequence-structure relations of RNA is based on a specific notion and folding of RNA structure. The notion of coarse grained structure employed here is that of canonical RNA pseudoknot contact-structures with at most two mutually crossing bonds (3-noncrossing). These structures are folded by a novel, ab initio prediction algorithm, cross, capable of searching all 3-noncrossing RNA structures. The algorithm outputs the minimum free energy structure.
After giving some background on RNA pseudoknot structures and providing an outline of the folding algorithm being employed, we present in this paper various statistical results on the mapping from RNA sequences into 3-noncrossing RNA pseudoknot structures. We study properties such as the fraction of pseudoknot structures, the dominant pseudoknot shapes, neutral walks, neutral neighbors and local connectivity. We then put our results into the context of molecular evolution of RNA.
Our results imply that, in analogy to RNA secondary structures, 3-noncrossing pseudoknot RNA represents a molecular phenotype that is well suited for molecular and in particular neutral evolution. We can conclude that extended, percolating neutral networks of pseudoknot RNA exist.
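The 3-noncrossing condition (at most two mutually crossing bonds) can be checked directly on a list of base pairs. The following is a minimal brute-force sketch with illustrative function names; it tests only the crossing condition and does not enforce the other properties of canonical structures, nor does it reproduce the folding algorithm described in the abstract.

```python
from itertools import combinations


def crosses(a, b):
    """True if base pairs a and b cross, i.e. i < k < j < l after ordering."""
    (i, j), (k, l) = sorted((a, b))
    return i < k < j < l


def is_k_noncrossing(pairs, k=3):
    """True if no k base pairs are mutually (pairwise) crossing."""
    return not any(
        all(crosses(a, b) for a, b in combinations(group, 2))
        for group in combinations(pairs, k)
    )


if __name__ == "__main__":
    h_type = [(0, 10), (1, 9), (5, 15), (6, 14)]  # two interleaved helices
    triple = [(0, 3), (1, 4), (2, 5)]             # three mutually crossing bonds
    print(is_k_noncrossing(h_type, 3))
    print(is_k_noncrossing(triple, 3))
```

Checking every group of k pairs is exponential in k but adequate for k = 3 and structures of modest size.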
Trans-translation releases stalled ribosomes from truncated mRNAs and tags defective proteins for proteolytic degradation using transfer-messenger RNA (tmRNA). This small stable RNA represents a hybrid of tRNA- and mRNA-like domains connected by a variable number of pseudoknots. Comparative sequence analysis of tmRNAs found in bacteria, plastids, and mitochondria provides considerable insights into their secondary structures. Progress toward understanding the molecular mechanism of template switching, which constitutes an essential step in trans-translation, is hampered by our limited knowledge about the three-dimensional folding of tmRNA.
To facilitate experimental testing of the molecular intricacies of trans-translation, which often require appropriately modified tmRNA derivatives, we developed a procedure for building three-dimensional models of tmRNA. Using comparative sequence analysis, phylogenetically-supported 2-D structures were obtained to serve as input for the program ERNA-3D. Motifs containing loops and turns were extracted from the known structures of other RNAs and used to improve the tmRNA models. Biologically feasible 3-D models for the entire tmRNA molecule could be obtained. The models were characterized by a functionally significant close proximity between the tRNA-like domain and the resume codon. Potential conformational changes which might lead to a more open structure of tmRNA upon binding to the ribosome are discussed. The method, described in detail for the tmRNAs of Escherichia coli, Bacillus anthracis, and Caulobacter crescentus, is applicable to every tmRNA.
Improved molecular models of biological significance were obtained. These models will guide the design of experiments and provide a better understanding of trans-translation. The comparative procedure described here for tmRNA is easily adapted for modeling the members of other RNA families.
The 3' noncoding region of turnip yellow mosaic virus RNA includes an 82-nucleotide-long tRNA-like structure domain and a short upstream region that includes a potential pseudoknot overlapping the coat protein termination codon. Genomic RNAs with point mutations in the 3' noncoding region that result in poor replication in protoplasts and no systemic symptoms in planta were inoculated onto Chinese cabbage plants in an effort to obtain second-site suppressor mutations. Putative second-site suppressor mutations were identified by RNase protection and sequencing and were then introduced into genomic cDNA clones to permit their characterization. A C-57→U mutation in the tRNA-like structure was a strong suppressor of the C-55→A mutation which prevented both systemic infection and in vitro valylation of the viral RNA. Both of these phenotypes were rescued in the double mutant. An A-107→C mutation was a strong second-site suppressor of the U-96→G mutation, permitting the double mutant to establish systemic infection. The C-107 and G-96 mutations are located on opposite strands of one helix of a potential pseudoknot, and the results support a functional role for the pseudoknot structure. A mutation near the 5' end of the genome (G+92→A), at position -3 relative to the initiation codon of the essential open reading frame 206, was found to be a general potentiator of viral replication, probably as a result of enhanced expression of open reading frame 206. The A+92 mutation enhanced the replication of mutant TYMC-G96 in protoplasts but was not a sufficiently potent suppressor to permit systemic spread of the A+92/G-96 double mutant in plants.
The genomic RNA of tobacco mosaic virus (TMV), like that of other positive-strand RNA viruses, acts as a template for both translation and replication. The highly structured 3′ untranslated region (UTR) of TMV RNAs plays an important role in both processes; it is not polyadenylated but ends with a tRNA-like structure (TLS) preceded by a conserved upstream pseudoknot domain (UPD). The TLS of tobamoviral RNAs can be specifically aminoacylated and, in this state, can interact with eukaryotic elongation factor 1A (eEF1A)/GTP with high affinity. Using a UV cross-linking assay, we detected another specific binding site for eEF1A/GTP, within the UPDs of TMV and crucifer-infecting tobamovirus (crTMV), that does not require aminoacylation. A mutational analysis revealed that UPD pseudoknot conformation and some conserved primary sequence elements are required for this interaction. Its possible role in the regulation of tobamovirus gene expression and replication is discussed.
We have investigated the secondary structure of peach latent mosaic viroid (PLMVd) in solution, and we present here the first description of the structure of a branched viroid in solution. Different PLMVd transcripts of plus polarity were produced by using the circularly permuted RNA method and the exploitation of RNA internal secondary structure to position the 5′ and 3′ termini and studied by nuclease mapping and binding shift assays using DNA and RNA oligonucleotides. We show that PLMVd folds into a complex, branched secondary structure. In general, this structure is similar to that reported previously, which was based on sequence comparison and computer modelling. The structural microheterogeneity is apparently limited to only some small domains. More importantly, this structure includes a novel pseudoknot that is conserved in all PLMVd isolates and seems to allow folding into a very compact form. This pseudoknot is also found in chrysanthemum chlorotic mottle viroid, suggesting that it is a unique feature of the viroid members of the PLMVd subgroup.
Predicting RNA secondary structure is often the first step to determining the structure of RNA. Prediction approaches have historically avoided searching for pseudoknots because of the extreme combinatorial and time complexity of the problem. Yet neglecting pseudoknots limits the utility of such approaches. Here, an algorithm utilizing structure mapping and thermodynamics is introduced for RNA pseudoknot prediction that finds the minimum free energy and identifies information about the flexibility of the RNA. The heuristic approach takes advantage of the 5′ to 3′ folding direction of many biological RNA molecules and is consistent with the hierarchical folding hypothesis and the contact order model. Mapping methods are used to build and analyze the folded structure for pseudoknots and to add important 3D structural considerations. The program can predict some well known pseudoknot structures correctly. The results of this study suggest that many functional RNA sequences are optimized for proper folding. They also suggest directions we can proceed in the future to achieve even better results.
The general problem of RNA secondary structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots. For restricted classes of pseudoknots, several polynomial time algorithms have been designed, of which the O(n^6) time and O(n^4) space algorithm by Rivas and Eddy is currently the best available program.
We introduce the class of canonical simple recursive pseudoknots and present an algorithm that requires O(n^4) time and O(n^2) space to predict the energetically optimal structure of an RNA sequence, possibly containing such pseudoknots. Evaluation against a large collection of known pseudoknotted structures shows the adequacy of the canonization approach and our algorithm.
RNA pseudoknots of medium size can now be predicted reliably as well as efficiently by the new algorithm.