PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of narLink to Publisher's site
 
Nucleic Acids Res. 2010 April; 38(7): e103.
Published online 2010 January 31. doi:  10.1093/nar/gkq021
PMCID: PMC2853144

DotKnot: pseudoknot prediction using the probability dot plot under a refined energy model

Abstract

RNA pseudoknots are functional structure elements with key roles in viral and cellular processes. Prediction of a pseudoknotted minimum free energy structure is an NP-complete problem. Practical algorithms for RNA structure prediction including restricted classes of pseudoknots suffer from high runtime and poor accuracy for longer sequences. A heuristic approach is to search for promising pseudoknot candidates in a sequence and verify those. Afterwards, the detected pseudoknots can be further analysed using bioinformatics or laboratory techniques. We present a novel pseudoknot detection method called DotKnot that extracts stem regions from the secondary structure probability dot plot and assembles pseudoknot candidates in a constructive fashion. We evaluate pseudoknot free energies using novel parameters, which have recently become available. We show that the conventional probability dot plot makes a wide class of pseudoknots including those with bulged stems manageable in an explicit fashion. The energy parameters now become the limiting factor in pseudoknot prediction. DotKnot is an efficient method for long sequences, which finds pseudoknots with higher accuracy compared to other known prediction algorithms. DotKnot is accessible as a web server at http://dotknot.csse.uwa.edu.au.

INTRODUCTION

RNA is a versatile nucleic acid, which is no longer seen as the passive intermediate between DNA and proteins. Numerous functional RNAs with an astonishing variety have been uncovered in the past decade. For example, non-coding RNAs participate in a wide range of cellular processes, are able to regulate gene expression and can act as catalyst (1,2). Macromolecule function is closely connected to its 3D folding and structure prediction from the base sequence is thus of great importance. RNA structure formation is understood to be hierarchical and, therefore, secondary structure prediction is the foundation for determining the tertiary folding (3,4).

There are two streams in computational RNA secondary structure prediction: comparative approaches and single sequence methods. In general, comparative approaches give accurate results for a set of well-conserved sequences (5). However, comparative methods rely on the quality of multiple alignments and are not always feasible for RNA structure prediction, due to a lack of reliable data sets (6).

When only a single sequence is given, the most popular approach for RNA structure prediction is free energy minimization. In the minimum free energy (MFE) model, continuous base pairs contribute enthalpic terms and loop regions are purely entropic. RNA comprises various secondary structure elements, i.e. stems, hairpin loops, bulge loops, internal loops and multiloops. Much experimental work has been done to determine their free energy parameters (7,8). The key concept that allows for dynamic programming is that all of these motifs are non-crossing and self-contained in terms of their free energy. The MFE secondary structure based on the additive free energy model can be predicted in An external file that holds a picture, illustration, etc.
Object name is gkq021i1.jpg time and An external file that holds a picture, illustration, etc.
Object name is gkq021i2.jpg space using dynamic programming (9,10). MFE prediction has been extended in several ways (11). Suboptimal structures with free energy close to the MFE can be calculated (12,13). Using the dynamic programming principle, the full equilibrium partition function for RNA secondary structure is computed in An external file that holds a picture, illustration, etc.
Object name is gkq021i3.jpg time and An external file that holds a picture, illustration, etc.
Object name is gkq021i4.jpg space (14). From the partition function, probabilities for base pairs and structure elements are derived. The main advantage of the MFE algorithm is its guarantee to find an optimal structure with regards to the underlying energy model. However, the inability to predict crossing structure elements, so-called pseudoknots, is a major drawback.

Pseudoknots are functional structure elements, which have been reported in most classes of RNA (15). A pseudoknot forms when unpaired bases in a loop pair with complementary bases in a single-stranded region outside the loop (Figure 1). Pseudoknots play key roles in viral genome replication and regulation of protein synthesis (16). Pseudoknots are also found in the cell where they participate in processes such as splicing, ribosomal frameshifting, telomerase activity and ribosome function (17–19). In many cases, they assist in the overall 3D folding (20,21) and should not be excluded from computational structure prediction.

Figure 1.
A simple H-type pseudoknot. (a) Unpaired bases within a hairpin loop bond with unpaired bases in a single-stranded region outside the loop. (b) The resulting pseudoknot has two stems An external file that holds a picture, illustration, etc.
Object name is gkq021i5.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i6.jpg and three loops An external file that holds a picture, illustration, etc.
Object name is gkq021i7.jpg, An external file that holds a picture, illustration, etc.
Object name is gkq021i8.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i9.jpg. (c) A pseudoknot has at least two crossing ...

RNA secondary structure prediction with arbitrary pseudoknots under a basic energy model is NP-complete (22,23). Restricted classes of pseudoknots can be included in the dynamic programming algorithm for prediction of the MFE structure, resulting in high computational complexity. In dynamic programming, there is always a trade-off between the generality of pseudoknots, which can be predicted, and runtime. Rivas and Eddy (24) cover a broad class of pseudoknots including kissing hairpins in their algorithm, which requires An external file that holds a picture, illustration, etc.
Object name is gkq021i10.jpg time and An external file that holds a picture, illustration, etc.
Object name is gkq021i11.jpg space. More restricted pseudoknots are included in other dynamic programming algorithms, which have runtime of An external file that holds a picture, illustration, etc.
Object name is gkq021i12.jpg using An external file that holds a picture, illustration, etc.
Object name is gkq021i13.jpg or An external file that holds a picture, illustration, etc.
Object name is gkq021i14.jpg space (22,23,25). All of these algorithms are only feasible for short RNA sequences. The most practical method is pknotsRG, which computes the MFE structure with canonical simple recursive pseudoknots in An external file that holds a picture, illustration, etc.
Object name is gkq021i15.jpg time and An external file that holds a picture, illustration, etc.
Object name is gkq021i16.jpg space (26). Dynamic programming does guarantee to find a structure with minimum free energy with respect to the underlying energy model. However, the energy model for pseudoknots used in dynamic programming is only a simple parameterization adopted from the affine multiloop energy model (24–26). It was first introduced by Rivas and Eddy (24) because of the lack of experimentally measured parameters for pseudoknot energies. Predictive accuracy of MFE folding is always limited by the underlying energy model, and hence pseudoknot prediction results are poor for longer sequences.

Due to the computational complexity of dynamic programming for pseudoknot prediction, heuristic approaches were developed as an alternative. Heuristic methods do not necessarily return the MFE structure; however, they can include a wide class of pseudoknots and more advanced energy models in reasonable runtime. RNA secondary structure prediction including pseudoknots has been approached using genetic algorithms (27,28), stochastic context-free grammars (29,30), kinetic folding simulations (31,32) and maximum weighted matching in a folding graph (33). Iterative stem adding procedures have also been developed (34–36). In several heuristic algorithms, the underlying energy model for secondary structures is simple base pair maximization, neglecting loop entropies (33,34). This may lead to unreliable results, especially for longer sequences. Another drawback is that most heuristic methods reported in the literature employ the same affine pseudoknot energy model as dynamic programming.

A different algorithmic framework is used in heuristic pseudoknot detection programs (37–40). Here, one attempts to find promising pseudoknot candidates in a sequence as a first step. These potential pseudoknots are subsequently analysed and verified. After pseudoknot detection, the remaining sequence can be folded using free energy minimization in An external file that holds a picture, illustration, etc.
Object name is gkq021i17.jpg time and An external file that holds a picture, illustration, etc.
Object name is gkq021i18.jpg space. Pseudoknot detection has two main advantages over dynamic programming. First, it is computationally much more efficient and, therefore, practical for scanning long RNA sequences for pseudoknots. Second, the underlying framework is less restrictive and allows for easy incorporation of sophisticated energy rules for pseudoknots or even comparative information. Pseudoknot detection delivers accurate pseudoknot prediction results for longer sequences in many cases (37,39).

Pseudoknot energy models have been studied in more detail in the past years and reliable energy parameters are in high demand. It is widely accepted that pseudoknot energy cannot be estimated with an additive model as used in MFE folding. There is strong interference between opposite loops and stems (An external file that holds a picture, illustration, etc.
Object name is gkq021i19.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i20.jpg, An external file that holds a picture, illustration, etc.
Object name is gkq021i21.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i22.jpg), which come in close contact in the 3D fold. In a simple hairpin type (H-type) pseudoknot, loops An external file that holds a picture, illustration, etc.
Object name is gkq021i23.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i24.jpg span across the deep narrow (major) groove and the shallow wide (minor) groove, respectively. The corresponding loop entropies are, therefore, not equivalent and depend on the structure and length of the opposite stem (41,42).

The lack of loop entropy parameters is a critical issue in pseudoknot prediction (43). For H-type pseudoknots with interhelix loop size An external file that holds a picture, illustration, etc.
Object name is gkq021i25.jpg nt, loop entropy values were derived using several fitted parameters (41). Gaussian chain approximation based on polymer physics for pseudoknot loop entropies was also proposed (44). Lattice-based models were developed to take into account volume exclusion effects (45–47). The most successful models are based on the atomic coordinates of the RNA backbone. Using polymer statistical mechanics, Cao and Chen (42) calculated loop entropy parameters for pseudoknots with interhelix loop size An external file that holds a picture, illustration, etc.
Object name is gkq021i26.jpg nt. Recently, their so-called virtual bond model has been extended to pseudoknots with interhelix loop size An external file that holds a picture, illustration, etc.
Object name is gkq021i27.jpg nt (48). Inclusion of such a pseudoknot energy model is not straightforward in dynamic programming in reasonable runtime due to dependencies between opposite loops and stems. Calculation of the full partition function under the virtual bond model takes An external file that holds a picture, illustration, etc.
Object name is gkq021i28.jpg time and An external file that holds a picture, illustration, etc.
Object name is gkq021i29.jpg space, making the approach only feasible for sequences shorter than say 150 nt (42). However, pseudoknot energy parameters derived from the virtual bond model are readily available in tabular form for many stem and loop length combinations (42,48). It is straightforward to incorporate such energy parameters in a pseudoknot detection approach, making prediction using much improved energy models feasible for long sequences.

We present DotKnot, a pseudoknot prediction method that incorporates stem–loop correlated pseudoknot energy parameters from the virtual bond model (42,48). The workflow is similar to the detection approach used in KnotSeeker (39), with two main improvements. First, a secondary structure partition function calculation in An external file that holds a picture, illustration, etc.
Object name is gkq021i30.jpg time and An external file that holds a picture, illustration, etc.
Object name is gkq021i31.jpg space is the basis for finding a set of promising structure elements with high confidence (14). This set includes heuristically derived bulge loops, internal loops and multiloops with low free energy. Second, pseudoknot candidates are assembled using the set of promising structure elements and, therefore, loop entropy parameters can readily be used for pseudoknot energy evaluations (42,48). There is no dynamic programming kernel for verification of pseudoknot candidates. This is a major step towards successful pseudoknot prediction as the vast majority of methods in the literature use simple approximations of pseudoknot energies. However, improving the folding model behind the algorithmic framework is clearly the key to accurate prediction (43).

DotKnot predicts the class of recursive H-type pseudoknots where one of the pseudoknot stems can be interrupted by bulge or internal loops. All of the three pseudoknot loops are allowed to form internal secondary structures. Using the set of structural elements derived from the probability dot plot as a construction kit, we could predict a broad class of pseudoknots, including kissing hairpins, in an efficient manner. However, our knowledge about energy parameters and folding mechanisms become the limiting factor for a pseudoknot search tool such as DotKnot. Given an RNA sequence, DotKnot returns only the (possibly empty) set of detected pseudoknots. These detected pseudoknots can subsequently be verified using laboratory techniques or comparative information. The remaining non-crossing sequence can then be folded in An external file that holds a picture, illustration, etc.
Object name is gkq021i32.jpg time and An external file that holds a picture, illustration, etc.
Object name is gkq021i33.jpg space using state-of-the-art RNA secondary structure prediction algorithms to obtain a global folding.

MATERIALS AND METHODS

The detailed workflow for DotKnot is shown in Figure 2. Given an RNA sequence, DotKnot finds sequence fragments with pseudoknot folding potential. These candidates are analysed in terms of their free energy values and credibility in the folded sequence. The output is a (possibly empty) set of detected pseudoknots, which can then be examined closely.

Figure 2.
Workflow used by DotKnot for detecting pseudoknots in a sequence using the probability dot plot. MWIS stands for maximum weight independent set calculation (56).

Stems in the probability dot plot

RNA structure forms through complementary base pairing, resulting in stabilizing stems and destabilizing loop regions. The basic building blocks for a pseudoknot are two crossing stems. We use the probability dot plot derived from the partition function as a guide for finding such building blocks. Given an RNA sequence, the partition function An external file that holds a picture, illustration, etc.
Object name is gkq021i34.jpg is defined as the weighted sum over the set of all possible secondary structures An external file that holds a picture, illustration, etc.
Object name is gkq021i35.jpg, i.e. An external file that holds a picture, illustration, etc.
Object name is gkq021i36.jpg where An external file that holds a picture, illustration, etc.
Object name is gkq021i37.jpg is the universal gas constant and An external file that holds a picture, illustration, etc.
Object name is gkq021i38.jpg the temperature. Once the partition function is known, probabilities for base pairs and structure elements can be calculated. The software RNAfold returns the probability dot plot representing both base pair probabilities and stack probabilities (11). The stack probability An external file that holds a picture, illustration, etc.
Object name is gkq021i39.jpg for a base pair An external file that holds a picture, illustration, etc.
Object name is gkq021i40.jpg is defined as the probability that pair An external file that holds a picture, illustration, etc.
Object name is gkq021i41.jpg and the subsequent pair An external file that holds a picture, illustration, etc.
Object name is gkq021i42.jpg are formed simultaneously (49).

It has been shown that choosing base pairs with high probability can improve secondary structure prediction (50). There are also several approaches that discuss the exclusion of base pairs with low probability for improving runtime. Here, the common technique is to use a cut-off value for base pair probabilities in order to determine only significant base pairs (51). However, for pseudoknot stems we cannot simply dismiss base pairs with low probability. The partition function calculation is based on the ensemble of secondary structure elements, which are non-crossing interactions. In general, the pseudoknot stems are visible in the probability dot plot as they are members of the folding space. One can expect that at least one of the pseudoknot stems will have low base pair and stack probabilities. By default, RNAfold only displays probabilities larger than An external file that holds a picture, illustration, etc.
Object name is gkq021i43.jpg. For pseudoknot detection in the dot plot, we use a cutoff probability of An external file that holds a picture, illustration, etc.
Object name is gkq021i44.jpg to make sure that all pseudoknot stems can be found for long sequences.

Given the probability dot plot, stems are assembled according to certain criteria. First, a stem must have at least three base pairs. Second, one can expect that the stack probabilities in a stable stem do not rise or drop sharply (49). Helix elongation is an energetically favourable process; however, wobble base pairs can be destabilizing (8,52,53). Furthermore, there may be other stable secondary structure elements competing for base pairs. Therefore, we demand that the absolute percentage increase or decrease of stack probabilities for subsequent base pairs An external file that holds a picture, illustration, etc.
Object name is gkq021i45.jpg, An external file that holds a picture, illustration, etc.
Object name is gkq021i46.jpg in a stem has to be smaller than a certain threshold An external file that holds a picture, illustration, etc.
Object name is gkq021i47.jpg.

One has to keep in mind that base pair probabilities are not independent. Therefore, stem probabilities can never be calculated by simply multiplying the base pair probabilities. However, the average probability of participating base pairs in a stem can be used as a confidence indicator (54). In our approach, we calculate the confidence An external file that holds a picture, illustration, etc.
Object name is gkq021i48.jpg for a stem as the average stack probabilities. For each stem, an absolute weight is introduced in addition. The confidence An external file that holds a picture, illustration, etc.
Object name is gkq021i49.jpg is based on the energy model for secondary structures, excluding pseudoknots. Therefore, pseudoknot stems tend to have low average probabilities especially for longer sequences, which does not necessarily correspond to their dominance in native RNA structures. We assign two additional weights for a stem based on a local energy evaluation. The simple stacking model employs the favourable base pair stacking parameters in a stem; however, it excludes the destabilizing energy contributed by the hairpin loop. The more sophisticated free energy model includes all entropy and enthalpy parameters derived by the Turner group (8). The tool RNAeval is used to evaluate the local free energies of the stem candidates according to the two energy models introduced above, taking into account dangling ends on both sides (11). DotKnot stores two stem weights An external file that holds a picture, illustration, etc.
Object name is gkq021i50.jpg (simple stacking model) and An external file that holds a picture, illustration, etc.
Object name is gkq021i51.jpg (free energy model) for a stem An external file that holds a picture, illustration, etc.
Object name is gkq021i52.jpg. Only stems An external file that holds a picture, illustration, etc.
Object name is gkq021i53.jpg satisfying the following conditions are kept: An external file that holds a picture, illustration, etc.
Object name is gkq021i54.jpg kcal/mol and An external file that holds a picture, illustration, etc.
Object name is gkq021i55.jpg kcal/mol. The resulting stems are stored in a stem dictionary An external file that holds a picture, illustration, etc.
Object name is gkq021i56.jpg. Each stem An external file that holds a picture, illustration, etc.
Object name is gkq021i57.jpg has a unique key An external file that holds a picture, illustration, etc.
Object name is gkq021i58.jpg, where An external file that holds a picture, illustration, etc.
Object name is gkq021i59.jpg is the start position and An external file that holds a picture, illustration, etc.
Object name is gkq021i60.jpg the end position of the stem in sequence An external file that holds a picture, illustration, etc.
Object name is gkq021i61.jpg. For each stem An external file that holds a picture, illustration, etc.
Object name is gkq021i62.jpg, we store its length and the following values in the stem dictionary: An external file that holds a picture, illustration, etc.
Object name is gkq021i63.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i64.jpg.

Finding bulges, internal loops and multiloops

In RNA structures, stems are often interrupted by bulge loops or internal loops. In the following, a stem interrupted by bulge loops or internal loops is denoted as An external file that holds a picture, illustration, etc.
Object name is gkq021i65.jpg (Figure 3). Naive construction of interrupted stems An external file that holds a picture, illustration, etc.
Object name is gkq021i67.jpg from the stem dictionary An external file that holds a picture, illustration, etc.
Object name is gkq021i68.jpg would be inefficient due to a large number of possible stem combinations. Therefore, we employ a heuristic strategy for finding interrupted stems with low free energy.

Figure 3.
(a) Example for a stem An external file that holds a picture, illustration, etc.
Object name is gkq021i66.jpg interrupted by a bulge loop and an internal loop. (b) The corresponding secondary structure is shown.

Given the stem dictionary An external file that holds a picture, illustration, etc.
Object name is gkq021i69.jpg, interrupted stems are constructed using maximum weight independent set (MWIS) calculations with the confidence indicators as stem weights (Supplementary Algorithm 1). Using confidence An external file that holds a picture, illustration, etc.
Object name is gkq021i70.jpg instead of local free energy gives better results in the MWIS calculation. First, it implicitly penalizes the formation of long bulge or internal loops. Second, the confidence is a relative measurement based on the whole folding ensemble. Only stems An external file that holds a picture, illustration, etc.
Object name is gkq021i71.jpg with confidence An external file that holds a picture, illustration, etc.
Object name is gkq021i72.jpg are considered as base pairs below this threshold are unlikely to participate in non-crossing secondary structure formation (55). This cutoff step also significantly reduces runtime for longer sequences. Each stem An external file that holds a picture, illustration, etc.
Object name is gkq021i73.jpg has two (left and right) endpoints An external file that holds a picture, illustration, etc.
Object name is gkq021i74.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i75.jpg and is represented as an interval on the line. First, the sorted endpoints list for all stems is constructed and scanned from left to right. If a right endpoint is discovered, a candidate list of stems nested in the interval An external file that holds a picture, illustration, etc.
Object name is gkq021i76.jpg is returned for the corresponding stem An external file that holds a picture, illustration, etc.
Object name is gkq021i77.jpg. A MWIS calculation on the candidate list returns the set of internal stems with maximum weight in linear time (56). The outer stem An external file that holds a picture, illustration, etc.
Object name is gkq021i78.jpg can, therefore, become an interrupted stem An external file that holds a picture, illustration, etc.
Object name is gkq021i79.jpg with bulges and internal loops or the exterior stem of a multiloop. The sum of confidences serves as the updated weight for outer stem An external file that holds a picture, illustration, etc.
Object name is gkq021i80.jpg. After the whole endpoints list has been scanned, stems interrupted by bulge loops or internal loops are stored in a new dictionary An external file that holds a picture, illustration, etc.
Object name is gkq021i81.jpg. Multiloops are stored in a separate dictionary An external file that holds a picture, illustration, etc.
Object name is gkq021i82.jpg. As a last step, the free energies of the interrupted stems and multiloops are evaluated with the tool RNAeval (11). Only structure elements with negative free energy are stored.

So far, DotKnot derived a set of regular stems, interrupted stems and multiloops in a heuristic fashion from the probability dot plot (Figure 2). There is no guarantee to find the structure element with local minimum free energy for a given sequence stretch. However, dictionaries An external file that holds a picture, illustration, etc.
Object name is gkq021i83.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i84.jpg will contain structures with free energy value close or equal to the local minimum free energy. These elements will be of great benefit for the elimination of false positive pseudoknots.

Construction of candidate pseudoknots

One major advantage of the detection approach is that the pseudoknot prediction target class is predefined and transparent. In contrast, dynamic programming algorithms construct pseudoknots through their recursion scheme, which in some cases leads to unspecific pseudoknot target classes (24,57). In this work, recursive H-type pseudoknots will be considered similar to the class of pseudoknots predicted by pknotsRG (26). A recursive H-type pseudoknot has two crossing stems An external file that holds a picture, illustration, etc.
Object name is gkq021i85.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i86.jpg, resulting in three loops An external file that holds a picture, illustration, etc.
Object name is gkq021i87.jpg, An external file that holds a picture, illustration, etc.
Object name is gkq021i88.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i89.jpg. All of the three loops are allowed to form internal secondary structures; however, loop–loop interactions are not allowed. In this work, we also include pseudoknots where one of the stems An external file that holds a picture, illustration, etc.
Object name is gkq021i90.jpg or An external file that holds a picture, illustration, etc.
Object name is gkq021i91.jpg is interrupted by bulges or internal loops. This leads to a more comprehensive prediction class as in pknotsRG, where only bulges of size 1 nt are considered (26). In general, the three dictionaries An external file that holds a picture, illustration, etc.
Object name is gkq021i92.jpg, An external file that holds a picture, illustration, etc.
Object name is gkq021i93.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i94.jpg allow for the construction of complicated pseudoknot folds. For example, it is straightforward to construct kissing hairpins from the stem dictionary. The pseudoknot energy parameters become the bottleneck in this approach, not necessarily the complexity of pseudoknots.

The three main steps for constructing recursive H-type pseudoknots are shown in Figure 4. First, so-called core H-type pseudoknots form through simple combination of two crossing stems. They become recursive H-type pseudoknots when additional secondary structure elements fold in each of the three loops. Note that recursive pseudoknots are not allowed in the loops as this may lead to sterically infeasible configurations. The overall recursive H-type candidate pseudoknot is assembled in a third step.

Figure 4.
Construction of a recursive H-type pseudoknot. On the first level, two stems form a core H-type pseudoknot. On the second level, recursive secondary structure elements may form in each of the three loops. On the third level, the recursive H-type pseudoknot ...

First level: assembling core H-type pseudoknots

On the first level, two crossing stems are combined to form a core H-type pseudoknot (Figure 4). Core H-type pseudoknots are the building blocks for more complex pseudoknots. The pseudoknot stems can either be regular stems taken from the dictionary An external file that holds a picture, illustration, etc.
Object name is gkq021i95.jpg or interrupted stems from the dictionary An external file that holds a picture, illustration, etc.
Object name is gkq021i96.jpg (Figure 5). Here, we only allow pseudoknots with at most one interrupted stem because for more complex and less rigid pseudoknots we would have to employ a highly assumptive energy model. Certain loop length restrictions are applied because more meaningful results can be expected from prediction of shorter and well-studied pseudoknots. Loop An external file that holds a picture, illustration, etc.
Object name is gkq021i103.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i104.jpg are both required to have at least 1 and 2 nt, respectively. Interhelix loop An external file that holds a picture, illustration, etc.
Object name is gkq021i105.jpg can have a size of 0 nt. All three loops are restricted to a maximum length, as there are no reliable energy parameters for very long pseudoknots. Loops An external file that holds a picture, illustration, etc.
Object name is gkq021i106.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i107.jpg can have a length of up to 100 nt, whereas interhelix loop An external file that holds a picture, illustration, etc.
Object name is gkq021i108.jpg is restricted to a maximum length of 50 nt. During construction of the pseudoknot candidates, we discovered that crossing stems may compete for a base pair. This leads to an overlap at loop An external file that holds a picture, illustration, etc.
Object name is gkq021i109.jpg. In such a case, one of the stems is truncated according to certain rules (Supplementary Figure S2 and Supplementary Algorithm 2).

Figure 5.
On the first level of pseudoknot construction two stems form a crossing structure. (a) Two regular stems An external file that holds a picture, illustration, etc.
Object name is gkq021i97.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i98.jpg are crossing. (b) Stem An external file that holds a picture, illustration, etc.
Object name is gkq021i99.jpg and stem An external file that holds a picture, illustration, etc.
Object name is gkq021i100.jpg are crossing. (c) Stem An external file that holds a picture, illustration, etc.
Object name is gkq021i101.jpg and stem An external file that holds a picture, illustration, etc.
Object name is gkq021i102.jpg are crossing.

There is little knowledge about the 3D folding of complex pseudoknots with interrupted stems, which may lead to bending or distortions of the RNA A-helix. Loops An external file that holds a picture, illustration, etc.
Object name is gkq021i110.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i111.jpg need to cross the major and minor groove of stems An external file that holds a picture, illustration, etc.
Object name is gkq021i112.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i113.jpg, respectively. To disallow sterically infeasible configurations for long interrupted stems, we make the following assumptions (41). For interrupted stems with more than 10 bp (including bulges and internal loops), the loops bridging the stems must have a minimum length: loop An external file that holds a picture, illustration, etc.
Object name is gkq021i114.jpg must be An external file that holds a picture, illustration, etc.
Object name is gkq021i115.jpg nt and loop An external file that holds a picture, illustration, etc.
Object name is gkq021i116.jpg must be An external file that holds a picture, illustration, etc.
Object name is gkq021i117.jpg nt.

After the first level of pseudoknot construction, free energy values are evaluated for each core H-type pseudoknot, which allows to filter unlikely pseudoknots. Only pseudoknots with low free energy (and therefore likely to form) will remain for the next step, which involves recursive secondary structure formation in the three pseudoknot loops. For all core H-type pseudoknots An external file that holds a picture, illustration, etc.
Object name is gkq021i118.jpg, free energy An external file that holds a picture, illustration, etc.
Object name is gkq021i119.jpg is calculated as

equation image

where An external file that holds a picture, illustration, etc.
Object name is gkq021i120.jpg is the purely entropic free energy for loops An external file that holds a picture, illustration, etc.
Object name is gkq021i121.jpg, An external file that holds a picture, illustration, etc.
Object name is gkq021i122.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i123.jpg. Three different pseudoknot energy models are employed for the loop entropy calculation, each of that comprise pseudoknots with certain characteristics. The length of loop An external file that holds a picture, illustration, etc.
Object name is gkq021i124.jpg determines which energy model is used for the respective pseudoknot candidate (Table 1). For details on pseudoknot loop entropy calculation using the virtual bond models CC06 and CC09 see (42) and (48), respectively. For pseudoknots with long interhelix loop An external file that holds a picture, illustration, etc.
Object name is gkq021i133.jpg, there is no physical loop entropy model available and we have to employ heuristics (24–26). This heuristic energy model (LongPK) is also used for pseudoknots with one interrupted stem regardless of the length of loop An external file that holds a picture, illustration, etc.
Object name is gkq021i134.jpg. Stems interrupted by long bulge or internal loops are likely to result in bending rather than rigid formations (48). Therefore, the loop entropy calculation becomes intricate. We calculate the loop entropy as An external file that holds a picture, illustration, etc.
Object name is gkq021i135.jpg, where An external file that holds a picture, illustration, etc.
Object name is gkq021i136.jpg is the number of unpaired nucleotides in the three pseudoknot loops with An external file that holds a picture, illustration, etc.
Object name is gkq021i137.jpg kcal/mol and An external file that holds a picture, illustration, etc.
Object name is gkq021i138.jpg kcal/mol.

Table 1.
Three different energy models used for pseudoknot energy evaluation

For pseudoknots with regular stems and loop length An external file that holds a picture, illustration, etc.
Object name is gkq021i139.jpg of 0 or 1 nt, one can generally assume that the two pseudoknot stems are coaxially stacked. This leads to a stabilizing effect for the two base pairs at the interhelix junction. Here, coaxial stacking is calculated using the Turner energy model, multiplied by an estimated weighting parameter An external file that holds a picture, illustration, etc.
Object name is gkq021i140.jpg and added to the free energy of a pseudoknot (7,8,24). For pseudoknots with interrupted stems and absent loop An external file that holds a picture, illustration, etc.
Object name is gkq021i141.jpg, we also add the appropriate coaxial stacking energy multiplied by an estimated weighting parameter An external file that holds a picture, illustration, etc.
Object name is gkq021i142.jpg. After energy evaluation, only pseudoknots with negative free energy are stored in the pseudoknot dictionary An external file that holds a picture, illustration, etc.
Object name is gkq021i143.jpg. Additionally, we demand that the free energy of a core H-type pseudoknot needs to be lower than the free energies An external file that holds a picture, illustration, etc.
Object name is gkq021i144.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i145.jpg of the pseudoknot stems An external file that holds a picture, illustration, etc.
Object name is gkq021i146.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i147.jpg.

Second level: recursive structure formation

Secondary structure elements often form in pseudoknot loops, resulting in a recursive pseudoknot. After finding stable core H-type pseudoknots, the three loops An external file that holds a picture, illustration, etc.
Object name is gkq021i148.jpg, An external file that holds a picture, illustration, etc.
Object name is gkq021i149.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i150.jpg are examined for likely secondary structure elements. It is a valid assumption that the three loops form recursive elements independently and can be treated separately (Figure 4). From an algorithmic point of view, it is efficient to find recursive structure elements using a MWIS calculation (Supplementary Algorithm 3). Given a core H-type pseudoknot, three candidate lists hold all possible secondary structure elements from dictionaries An external file that holds a picture, illustration, etc.
Object name is gkq021i151.jpg, An external file that holds a picture, illustration, etc.
Object name is gkq021i152.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i153.jpg contained in each of the loops An external file that holds a picture, illustration, etc.
Object name is gkq021i154.jpg, An external file that holds a picture, illustration, etc.
Object name is gkq021i155.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i156.jpg. A standard MWIS calculation with free energy weights for each of the three lists returns the set of secondary structure elements with best local free energy for each loop. The results are combined to form a recursive H-type pseudoknot.

For each recursive H-type pseudoknot, we need to evaluate the free energy with respect to the recursive structure elements. As described in Table 1, the set of pseudoknots is divided into the three different classes according to the length of loop An external file that holds a picture, illustration, etc.
Object name is gkq021i157.jpg. To account for recursive structure elements, the loop entropies need to be recalculated. Following the notation in Cao and Chen (48), we first calculate the effective loop lengths. The effective loop length An external file that holds a picture, illustration, etc.
Object name is gkq021i158.jpg for a pseudoknot loop An external file that holds a picture, illustration, etc.
Object name is gkq021i159.jpg (An external file that holds a picture, illustration, etc.
Object name is gkq021i160.jpg) with internal structure elements is the number of unpaired nucleotides outside those internal structure elements plus the number of internal structure elements. For all pseudoknots with recursive structure elements, the loop entropy is recalculated using the effective loop lengths. Internal secondary structure elements add free energy values as given by the Turner energy model (8). We keep only pseudoknots An external file that holds a picture, illustration, etc.
Object name is gkq021i161.jpg in the pseudoknot dictionary An external file that holds a picture, illustration, etc.
Object name is gkq021i162.jpg, which satisfy the following two conditions. First, a recursive H-type pseudoknot must have free energy An external file that holds a picture, illustration, etc.
Object name is gkq021i163.jpg kcal/mol. Second, the normalized pseudoknot free energy must fulfill An external file that holds a picture, illustration, etc.
Object name is gkq021i164.jpg where An external file that holds a picture, illustration, etc.
Object name is gkq021i165.jpg denotes the length of pseudoknot An external file that holds a picture, illustration, etc.
Object name is gkq021i166.jpg. Setting the threshold to An external file that holds a picture, illustration, etc.
Object name is gkq021i167.jpg is not too restrictive; however, it helps us to eliminate pseudoknots with high free energy (58). For example, for the 3′-UTR TMV sequence with length 214 nt, we have 445 candidate stems in dictionary An external file that holds a picture, illustration, etc.
Object name is gkq021i168.jpg and 2026 pseudoknot candidates before filtering. After the length-normalized filtering step, only 274 pseudoknot candidates remain.

Verification of pseudoknot candidates in the sequence

The set of recursive H-type pseudoknots stored in dictionary An external file that holds a picture, illustration, etc.
Object name is gkq021i169.jpg will certainly contain false positive pseudoknots. Therefore, a MWIS calculation with local free energy weights is employed using the structure elements from all four dictionaries: An external file that holds a picture, illustration, etc.
Object name is gkq021i170.jpg, An external file that holds a picture, illustration, etc.
Object name is gkq021i171.jpg, An external file that holds a picture, illustration, etc.
Object name is gkq021i172.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i173.jpg. Stems An external file that holds a picture, illustration, etc.
Object name is gkq021i174.jpg need to have confidence An external file that holds a picture, illustration, etc.
Object name is gkq021i175.jpg to participate (55). Furthermore, stems are allowed to contain nested structure elements, including pseudoknots. The MWIS procedure with nesting is described in detail in KnotSeeker (39). Note that we do not include the free energy gain of An external file that holds a picture, illustration, etc.
Object name is gkq021i176.jpg kcal/mol for the outer loop as in KnotSeeker, because favourable nested structures are already included through dictionaries An external file that holds a picture, illustration, etc.
Object name is gkq021i177.jpg and An external file that holds a picture, illustration, etc.
Object name is gkq021i178.jpg.

RESULTS

We evaluated our algorithm on a set of pseudoknotted and pseudoknot-free sequences of different RNA types (Table 2). Given a sequence, DotKnot is a method that predicts only pseudoknots. Therefore, predictive accuracy is measured for base pairs belonging to a pseudoknot. Two measurements are used for comparison of DotKnot and other selected algorithms from the literature. For each published pseudoknot in a sequence we report sensitivity An external file that holds a picture, illustration, etc.
Object name is gkq021i179.jpg and the positive predictive value An external file that holds a picture, illustration, etc.
Object name is gkq021i180.jpg. True positive (An external file that holds a picture, illustration, etc.
Object name is gkq021i181.jpg) corresponds to the number of correctly predicted base pairs in the predicted pseudoknot, False negative (An external file that holds a picture, illustration, etc.
Object name is gkq021i182.jpg) to the number of base pairs in the published pseudoknot that were not predicted and False positive (An external file that holds a picture, illustration, etc.
Object name is gkq021i183.jpg) to the number of incorrectly predicted base pairs in the predicted pseudoknot. A pseudoknot is said to be predicted by an algorithm if it is a crossing structure element and at least one of the two pseudoknot stems is partially predicted. Furthermore, the ratio An external file that holds a picture, illustration, etc.
Object name is gkq021i184.jpg (number of correctly predicted pseudoknots)/(number of predicted pseudoknots) is reported. We compare DotKnot to two dynamic programming methods, namely pknots (24) and pknotsRG (26), the pseudoknot detection tool KnotSeeker (39) and the heuristic approach HotKnots (35). HotKnots returns a number of sub-optimal scenarios; however, we only evaluate predictive accuracy for the best solution.

Table 3.
Summary of pseudoknot detection results on various RNA sequences
Table 2.
RNA types and sequences used for pseudoknot prediction

5S rRNA, tRNA and miRNA are pseudoknot-free types of RNA. For the 5S rRNA and miRNA sequences chosen in our test set, DotKnot does not introduce any spurious pseudoknots. For two of the tRNA sequences, DotKnot predicts false positive pseudoknots. KnotSeeker, pknotsRG and HotKnots also predict false positive pseudoknots for some of the tRNA sequences. Minimum free energy prediction is known to have low accuracy for tRNAs. This is due to modified bases as well as coaxially stacked helices, which determine the characteristic 3D cloverleaf fold of tRNAs (24,26). Coaxial energies are implemented by pknots, which might explain why it does not predict false positive pseudoknots for the tested tRNA sequences.

Several of the test sequences contain pseudoknots where one of the core pseudoknot stems is interrupted by bulges or internal loops: Escherichia coli tmRNA, Legionella pneumophila tmRNA, the SARS frameshifting pseudoknot, human telomerase and Tetrahymena telomerase. For all of these pseudoknots, DotKnot delivers the best results in terms of sensitivity and specificity. For example, the Tetrahymena telomerase RNA (TER) contains a pseudoknot with a conserved central GA bulge in one of its stems, which is vital for telomerase function (76). DotKnot perfectly predicts this bulged pseudoknot, while all other methods do not predict a pseudoknot structure. The biological relevance of pseudoknots with bulged residues should not be underestimated. Therefore, a prediction algorithm that can handle pseudoknots with interrupted stems such as DotKnot is highly desirable.

For complex pseudoknot foldings such as the hepatitis delta virus (HDV) double pseudoknot configuration, DotKnot gives the best prediction results out of all tested algorithms. The CrPV IRES has a long pseudoknot, which contains another nested pseudoknot. For this pseudoknot, DotKnot also gives the best prediction in terms of sensitivity and PPV.

DotKnot has excellent accuracy for simple H-type pseudoknots as those reported in many viral 3′-UTRs and frameshifting regions. We found that the energy models CC06 and CC09 used for predicting such pseudoknots give better results than the heuristic pseudoknot energy parameters employed by the other algorithms. For example, the NeRNV and TMV 3′-UTRs both have five pseudoknots where four are simple H-type pseudoknots with interhelix loop An external file that holds a picture, illustration, etc.
Object name is gkq021i185.jpg nt. For these pseudoknots, DotKnot gives the most accurate predictions, which we claim is due to the improved energy parameters by Cao and Chen (42,48).

In terms of computational performance, DotKnot is very efficient due to the sparseness of the probability dot plot, the resulting low number of pseudoknot candidates and the implementation using dictionaries in Python. DotKnot runs in the order of seconds for all of the test sequences except T2 and T4, which take several minutes. For T4 with 1340 nt, we have 6567 candidate stems in dictionary An external file that holds a picture, illustration, etc.
Object name is gkq021i186.jpg and 7534 pseudoknot candidates before filtering. After the length-normalized filtering step, only 100 pseudoknot candidates remain for verification. Overall, it takes DotKnot <5 min to predict the correct pseudoknot in this sequence on our reference machine (Intel QC 2.66 GHz, 4 GB RAM). This is significantly faster than HotKnots, which takes 29 min, and pknotsRG, which takes 31 min. KnotSeeker is even faster than DotKnot and takes <2 min for the T4 sequence, because it does not rely on a partition function calculation. However, DotKnot is a more powerful prediction algorithm than KnotSeeker due to the inclusion of pseudoknots with one interrupted stem.

CONCLUSION

We presented DotKnot, a program that detects recursive H-type pseudoknots given an RNA sequence. Pseudoknot detection is a promising and efficient approach for determining the folding of an RNA. Using pseudoknot detection tools such as DotKnot, KnotSeeker or HPknotter, one can find likely pseudoknots in a sequence with high accuracy (37,39). The structure of the detected pseudoknots can subsequently be investigated using laboratory or bioinformatics techniques. The remaining non-crossing sequence can be folded using secondary structure prediction algorithms in An external file that holds a picture, illustration, etc.
Object name is gkq021i187.jpg time and An external file that holds a picture, illustration, etc.
Object name is gkq021i188.jpg space. DotKnot and other pseudoknot detection approaches are very time efficient, even allowing scanning of long regions in viral genomes.

DotKnot assembles pseudoknot candidates from a set of structural building blocks. This set contains stems, bulge loops, internal loops and multiloops. In general, complex pseudoknots can be constructed. However, there is a trade-off between the generality of predictable pseudoknots and the biological relevance of the result. Therefore, we restrict DotKnot to the prediction of recursive H-type pseudoknots where one of the pseudoknot stems can contain bulges and internal loops. For these recursive H-type pseudoknots, we are confident that the pseudoknot energy parameters used give a good approximation. For more complex pseudoknot folds such as those with loop–loop interactions, we would have to employ a highly assumptive energy model, thus sacrificing predictive accuracy. In the future, DotKnot will be extended to the prediction of kissing hairpins and other biologically relevant classes of pseudoknots.

DotKnot uses stack probabilities from the probability dot plot as the basis for finding pseudoknot building blocks. At this stage, only a single sequence is accepted as an input. It is well-known that functional pseudoknots are highly conserved in nature, for example in virus families. The pseudoknot detection framework used in DotKnot allows for incorporation of comparative information using aligned probability dot plots. One can expect that reliable alignments will greatly improve confidence in the predicted pseudoknots. Especially for complex pseudoknot foldings such as those found in bacterial tmRNA or in the telomerase RNA component, inclusion of comparative information will be invaluable.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Funding for open access charge: The University of Western Australia.

Supplementary Material

[Supplementary Data]

ACKNOWLEDGEMENTS

J.S. would like to thank Elena Rivas, Eric Westhof and the participants of the RNA workshop 2009 in Benasque, Spain, for the inspiring meeting and discussions.

REFERENCES

1. Storz G. An expanding universe of noncoding RNAs. Science. 2002;296:1260–1263. [PubMed]
2. Mello CC, Conte D., Jr Revealing the world of RNA interference. Nature. 2004;431:338–342. [PubMed]
3. Tinoco I, Bustamante C. How RNA folds. J. Mol. Biol. 1999;293:271–281. [PubMed]
4. Shapiro BA, Yingling YG, Kasprzak W, Bindewald E. Bridging the gap in RNA structure prediction. Curr. Opin. Struct. Biol. 2007;17:157–165. [PubMed]
5. Gardner PP, Giegerich R. A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics. 2004;5 [PMC free article] [PubMed]
6. Wilm A, Higgins DG, Notredame C. R-Coffee: a method for multiple alignment of non-coding RNA. Nucleic Acids Res. 2008;36 [PMC free article] [PubMed]
7. Serra MJ, Turner DH. Predicting thermodynamic properties of RNA. Methods Enzymol. 1995;259:242–261. [PubMed]
8. Mathews DH, Sabina J, Zuker M, Turner DH. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 1999;288:911–940. [PubMed]
9. Zuker M, Stiegler P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 1981;9:133–148. [PMC free article] [PubMed]
10. Lyngsø RB, Zuker M, Pedersen CN. Fast evaluation of internal loops in RNA secondary structure prediction. Bioinformatics. 1999;15:440–445. [PubMed]
11. Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P. Fast Folding and Comparison of RNA Secondary Structures. Monatsh. Chem. 1994;125:167–188.
12. Zuker M. On finding all suboptimal foldings of an RNA molecule. Science. 1989;244:48–52. [PubMed]
13. Wuchty S, Fontana W, Hofacker IL, Schuster P. Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers. 1999;49:145–165. [PubMed]
14. McCaskill JS. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers. 1990;29:1105–1119. [PubMed]
15. Brierley I, Pennell S, Gilbert RJ. Viral RNA pseudoknots: versatile motifs in gene expression and replication. Nat. Rev. Microbiol. 2007;5:598–610. [PubMed]
16. Brierley I, Gilbert RJ, Pennell S. RNA pseudoknots and the regulation of protein synthesis. Biochem. Soc. Trans. 2008;36:684–689. [PubMed]
17. Giedroc DP, Theimer CA, Nixon PL. Structure, stability and function of RNA pseudoknots involved in stimulating ribosomal frameshifting. J. Mol. Biol. 2000;298:167–185. [PubMed]
18. Staple DW, Butcher SE. Pseudoknots: RNA structures with diverse functions. PLoS Biol. 2005;3:e213. [PMC free article] [PubMed]
19. Giedroc DP, Cornish PV. Frameshifting RNA pseudoknots: structure and mechanism. Virus Res. 2009;139:193–208. [PMC free article] [PubMed]
20. Fechter P, Rudinger-Thirion J, Florentz C, Giege R. Novel features in the tRNA-like world of plant viral RNAs. Cell. Mol. Life Sci. 2001;58:1547–1561. [PubMed]
21. Hammond JA, Rambo RP, Filbin ME, Kieft JS. Comparison and functional implications of the 3D architectures of viral tRNA-like structures. RNA. 2009;15:294–307. [PubMed]
22. Akutsu T. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots. Discrete Appl. Math. 2000;104:45–62.
23. Lyngsø RB, Pedersen CN. RNA pseudoknot prediction in energy-based models. J. Comput. Biol. 2000;7:409–427. [PubMed]
24. Rivas E, Eddy SR. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J. Mol. Biol. 1999;285:2053–2068. [PubMed]
25. Dirks RM, Pierce NA. A partition function algorithm for nucleic acid secondary structure including pseudoknots. J. Comput. Chem. 2003;24:1664–1677. [PubMed]
26. Reeder J, Giegerich R. Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics. BMC Bioinformatics. 2004;5:104. [PMC free article] [PubMed]
27. Gultyaev AP, van Batenburg FH, Pleij CW. The computer simulation of RNA folding pathways using a genetic algorithm. J. Mol. Biol. 1995;250:37–51. [PubMed]
28. van Batenburg FH, Gultyaev AP, Pleij CW. An APL-programmed genetic algorithm for the prediction of RNA secondary structure. J. Theor. Biol. 1995;174:269–280. [PubMed]
29. Brown M, Wilson C. RNA pseudoknot modeling using intersections of stochastic context free grammars with applications to database search. Pac. Symp. Biocomput. 1996:109–125. [PubMed]
30. Cai L, Malmberg RL, Wu Y. Stochastic modeling of RNA pseudoknotted structures: a grammatical approach. Bioinformatics. 2003;19:i66–i73. [PubMed]
31. Abrahams JP, van den Berg M, van Batenburg E, Pleij C. Prediction of RNA secondary structure, including pseudoknotting, by computer simulation. Nucleic Acids Res. 1990;18:3035–3044. [PMC free article] [PubMed]
32. Xayaphoummine A, Bucher T, Thalmann F, Isambert H. Prediction and statistics of pseudoknots in RNA structures using exactly clustered stochastic simulations. Proc. Natl. Acad. Sci. USA. 2003;100:15310–15315. [PubMed]
33. Tabaska JE, Cary RB, Gabow HN, Stormo GD. An RNA folding method capable of identifying pseudoknots and base triples. Bioinformatics. 1998;14:691–699. [PubMed]
34. Ruan J, Stormo GD, Zhang W. An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots. Bioinformatics. 2004;20:58–66. [PubMed]
35. Ren J, Rastegari B, Condon A, Hoos HH. HotKnots: heuristic prediction of RNA secondary structures including pseudoknots. RNA. 2005;11:1494–1504. [PubMed]
36. Chen X, He S, Bu D, Zhang F, Wang Z, Chen R, Gao W. FlexStem: improving predictions of RNA secondary structures with pseudoknots by reducing the search space. Bioinformatics. 2008;24:1994–2001. [PubMed]
37. Huang CH, Lu CL, Chiu HT. A heuristic approach for detecting RNA H-type pseudoknots. Bioinformatics. 2005;21:3501–3508. [PubMed]
38. Huang X, Ali H. High sensitivity RNA pseudoknot prediction. Nucleic Acids Res. 2007;35:656–663. [PMC free article] [PubMed]
39. Sperschneider J, Datta A. KnotSeeker: heuristic pseudoknot detection in long RNA sequences. RNA. 2008;14:630–640. [PubMed]
40. Theis C, Reeder J, Giegerich R. KnotInFrame: prediction of -1 ribosomal frameshift events. Nucleic Acids Res. 2008;36:6013–6020. [PMC free article] [PubMed]
41. Gultyaev AP, van Batenburg FH, Pleij CW. An approximation of loop free energy values of RNA H-pseudoknots. RNA. 1999;5:609–617. [PubMed]
42. Cao S, Chen SJ. Predicting RNA pseudoknot folding thermodynamics. Nucleic Acids Res. 2006;34:2634–2652. [PMC free article] [PubMed]
43. Chen SJ. RNA folding: conformational statistics, folding kinetics and ion electrostatics. Annu. Rev. Biophys. 2008;37:197–214. [PMC free article] [PubMed]
44. Aalberts DP, Hodas NO. Asymmetry in RNA pseudoknots: observation and theory. Nucleic Acids Res. 2005;33:2210–2214. [PMC free article] [PubMed]
45. Lucas A, Dill KA. Statistical mechanics of pseudoknot polymers. J. Chem. Phys. 2003;119:2414–2421.
46. Kopeikin Z, Chen SJ. Statistical thermodynamics for chain molecules with simple RNA tertiary contacts. J. Chem. Phys. 2005;122:094909. [PMC free article] [PubMed]
47. Kopeikin Z, Chen SJ. Folding thermodynamics of pseudoknotted chain conformations. J. Chem. Phys. 2006;124:154903. [PMC free article] [PubMed]
48. Cao S, Chen SJ. Predicting structures and stabilities for H-type pseudoknots with interhelix loops. RNA. 2009;15:696–706. [PubMed]
49. Bompfünewerer AF, Backofen R, Bernhart SH, Hertel J, Hofacker IL, Stadler PF, Will S. Variations on RNA folding and alignment: lessons from Benasque. J. Math. Biol. 2008;56:129–144. [PubMed]
50. Mathews DH. Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA. 2004;10:1178–1190. [PubMed]
51. Will S, Reiche K, Hofacker IL, Stadler PF, Backofen R. Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering. PLoS Comput. Biol. 2007;3:680–691. [PubMed]
52. Serra MJ, Lyttle MH, Axenson TJ, Schadt CA, Turner DH. RNA hairpin loop stability depends on closing base pair. Nucleic Acids Res. 1993;21:3845–3849. [PMC free article] [PubMed]
53. Giese MR, Betschart K, Dale T, Riley CK, Rowan C, Sprouse KJ, Serra MJ. Stability of RNA hairpins closed by wobble base pairs. Biochemistry. 1998;37:1094–1100. [PubMed]
54. Hamada M, Tsuda K, Kudo T, Kin T, Asai K. Mining frequent stem patterns from unaligned RNA sequences. Bioinformatics. 2006;22:2480–2487. [PubMed]
55. Hofacker IL, Stadler PF. Automatic detection of conserved base pairing patterns in RNA virus genomes. Comput. Chem. 1999;23:401–414. [PubMed]
56. Hsiao JY, Tang CY, Chang RS. An Efficient Algorithm for Finding a Maximum Weight 2-Independent Set on Interval-Graphs. Inf. Process. Lett. 1992;43:229–235.
57. Condon A, Davy B, Rastegari B, Zhao S, Tarrant F. Classifying RNA pseudoknotted structures. Theor. Comput. Sci. 2004;320:35–50.
58. Freyhult E, Gardner PP, Moulton V. A comparison of RNA folding measures. BMC Bioinformatics. 2005;6:241. [PMC free article] [PubMed]
59. Cannone JJ, Subramanian S, Schnare MN, Collett JR, D'S;ouza LM, Du YS, Feng B, Lin N, Madabusi LV, Muller KM, et al. The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics. 2002;3:15. [PMC free article] [PubMed]
60. Sprinzl M, Horn C, Brown M, Ioudovitch A, Steinberg S. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 1998;26:148–153. [PMC free article] [PubMed]
61. Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res. 2006;34:D140–D144. [PMC free article] [PubMed]
62. Ferre-D'A;mare AR, Zhou KH, Doudna JA. Crystal structure of a hepatitis delta virus ribozyme. Nature. 1998;395:567–574. [PubMed]
63. van Batenburg FH, Gultyaev AP, Pleij CW, Ng J, Oliehoek J. Pseudobase: a database with RNA pseudoknots. Nucleic Acids Res. 2000;28:201–204. [PMC free article] [PubMed]
64. Schüler M, Connell SR, Lescoute A, Giesebrecht J, Dabrowski M, Schroeer B, Mielke T, Penczek PA, Westhof E, Spahn CM. Structure of the ribosome-bound cricket paralysis virus IRES RNA. Nat. Struct. Mol. Biol. 2006;13:1092–1096. [PubMed]
65. Williams GD, Chang RY, Brian DA. A phylogenetically conserved hairpin-type 3′ untranslated region pseudoknot functions in coronavirus RNA replication. J. Virol. 1999;73:8349–8355. [PMC free article] [PubMed]
66. Koenig R, Barends S, Gultyaev AP, Lesemann DE, Vetten HJ, Loss S, Pleij CW. Nemesia ring necrosis virus: a new tymovirus with a genomic RNA having a histidylatable tobamovirus-like 3′-end. J. Gen. Virol. 2005;86:1827–1833. [PubMed]
67. van Belkum A, Abrahams JP, Pleij CW, Bosch L. Five pseudoknots are present at the 204 nucleotides long 3′ noncoding region of tobacco mosaic virus RNA. Nucleic Acids Res. 1985;13:7673–7686. [PMC free article] [PubMed]
68. Williams KP. The tmRNA website. Nucleic Acids Res. 2000;28:168. [PMC free article] [PubMed]
69. Tuerk C, MacDougal S, Gold L. RNA pseudoknots that inhibit human immunodeficiency virus type 1 reverse transcriptase. Proc. Natl. Acad. Sci. USA. 1992;89:6988–6992. [PubMed]
70. Solovyev AG, Savenkov EI, Agranovsky AA, Morozov SY. Comparisons of the genomic cis-elements and coding regions in RNA beta components of the hordeiviruses barley stripe mosaic virus, lychnis ringspot virus, and poa semilatent virus. Virology. 1996;219:9–18. [PubMed]
71. Matsuda D, Dreher TW. The tRNA-like structure of Turnip yellow mosaic virus RNA is a 3′-translational enhancer. Virology. 2004;321:36–46. [PubMed]
72. Su L, Chen L, Egli M, Berger JM, Rich A. Minor groove RNA triplex in the crystal structure of a ribosomal frameshifting viral pseudoknot. Nat. Struct. Biol. 1999;6:285–292. [PubMed]
73. Firth AE, Atkins JF. A conserved predicted pseudoknot in the NS2A-encoding sequence of West Nile and Japanese encephalitis flaviviruses suggests NS1′ may derive from ribosomal frameshifting. Virol. J. 2009;6:14. [PMC free article] [PubMed]
74. Pennell S, Manktelow E, Flatt A, Kelly G, Smerdon SJ, Brierley I. The stimulatory RNA of the Visna-Maedi retrovirus ribosomal frameshifting signal is an unusual pseudoknot with an interstem element. RNA. 2008;14:1366–1377. [PubMed]
75. Plant EP, Perez-Alvarado GC, Jacobs JL, Mukhopadhyay B, Hennig M, Dinman JD. A three-stemmed mRNA pseudoknot in the SARS coronavirus frameshift signal. PLoS Biol. 2005;3:e172. [PMC free article] [PubMed]
76. Theimer CA, Feigon J. Structure and function of telomerase RNA. Curr. Opin. Struct. Biol. 2006;16:307–318. [PubMed]
77. Nateri AS, Hughes PJ, Stanway G. Terminal RNA replication elements in human parechovirus 1. J. Virol. 2002;76:13116–13122. [PMC free article] [PubMed]
78. Du Z, Hoffman DW. An NMR and mutational study of the pseudoknot within the gene 32 mRNA of bacteriophage T2: insights into a family of structurally related RNA pseudoknots. Nucleic Acids Res. 1997;25:1130–1135. [PMC free article] [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press