Over the range of RNA families tested, which have a variety of functions and structures, TurboKnot averaged a sensitivity of 79.8% and a PPV of 72.9%. For comparison, TurboKnot was benchmarked against two other multisequence methods that can predict pseudoknots: ILM version 1.0 (Ruan et al., 2004) and Hxmatch version 1.2.1 (Witwer et al., 2004). Each was run with default parameters. Sequences were aligned for input into these algorithms using MUSCLE version 3.6 (Edgar, 2004), which performs well for RNA sequence alignment compared with other available tools (Wilm et al., 2006). In addition to these two methods, benchmarks were run using ProbKnot (Bellaousov and Mathews, 2010), a single-sequence method that can predict pseudoknots; TurboFold (Harmanci et al., 2011), the underlying multisequence method, which cannot predict pseudoknots; and MaxExpect (Lu et al., 2009), a single-sequence method that cannot predict pseudoknots. Other single-sequence methods capable of predicting pseudoknots were recently benchmarked (Bellaousov and Mathews, 2010), and ProbKnot compares favorably against these, so it is used here as a representative of this class of algorithms. One algorithm, DotKnot, has been published more recently; its performance on this test set is included in Supplementary Tables S3–S5 for comparison. Similarly, multisequence algorithms that cannot predict pseudoknots were recently benchmarked (Harmanci et al., 2011), and TurboFold compares favorably and is used here as their representative.
Sensitivities of tested methods for all base pairs
PPVs of tested methods for all base pairs
The average sensitivity and PPV of TurboKnot were better than those of any other method capable of predicting pseudoknots. Additionally, TurboKnot had a higher sensitivity than TurboFold, albeit at a trade-off in PPV. Because TurboKnot considers a larger structure space than TurboFold, it is expected to find more correct base pairs at a cost to the PPV.
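As an illustration of the two statistics compared above, the following minimal sketch scores one predicted structure against a known structure. It uses the standard definitions (sensitivity is the fraction of known pairs that are predicted; PPV is the fraction of predicted pairs that are in the known structure). The function name `score_pairs` and the exact-match rule are assumptions made for this example; the scoring rules actually used for the benchmarks are not described in this section and may differ.

```python
def score_pairs(predicted, known):
    """Return (sensitivity, ppv) for two collections of base pairs.

    Each base pair is a tuple (i, j) of nucleotide indices. The order of
    the two indices is ignored. Matching is exact, which is an assumption
    of this sketch rather than a rule taken from the paper.
    """
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in known}
    true_pos = len(pred & ref)
    # Sensitivity: correctly predicted pairs / pairs in the known structure.
    sensitivity = true_pos / len(ref) if ref else 0.0
    # PPV: correctly predicted pairs / all predicted pairs.
    ppv = true_pos / len(pred) if pred else 0.0
    return sensitivity, ppv


# Toy data (hypothetical, not taken from the test set).
known_structure = [(1, 20), (2, 19), (3, 18), (5, 12)]
prediction = [(20, 1), (2, 19), (4, 17)]
sens, ppv = score_pairs(prediction, known_structure)
```

In this toy case two of the four known pairs are recovered and two of the three predicted pairs are correct, which shows how predicting extra pairs can raise sensitivity while lowering PPV.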
For pseudoknots in particular, TurboKnot found 289 correct pseudoknotted base pairs out of a total of 5897 in the known structures, and it made 672 false positive predictions of pseudoknotted pairs. This is more true positives and fewer false positives than any other tested method. When considering structures with pseudoknots rather than individual pairs, TurboKnot found at least one true positive pseudoknot in 59 structures, out of 239 predictions and out of 426 tested structures that contain at least one pseudoknot (Supplementary Table S6). This is again more true positives and fewer false positives than any other tested method.
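To put the pseudoknotted-pair counts on the same scale as the overall statistics, they can be converted to rates. The sketch below applies the standard sensitivity and PPV formulas to the counts reported above; the resulting percentages are derived here by arithmetic and are not values quoted from the paper. The function name `rates_from_counts` is hypothetical.

```python
def rates_from_counts(true_pos, known_total, false_pos):
    """Convert raw counts to (sensitivity, ppv).

    true_pos:    correctly predicted pseudoknotted pairs
    known_total: pseudoknotted pairs in the known structures
    false_pos:   predicted pseudoknotted pairs not in the known structures
    """
    sensitivity = true_pos / known_total
    ppv = true_pos / (true_pos + false_pos)
    return sensitivity, ppv


# Counts reported for TurboKnot in the text above.
pk_sens, pk_ppv = rates_from_counts(true_pos=289, known_total=5897, false_pos=672)
print(f"sensitivity = {pk_sens:.1%}, PPV = {pk_ppv:.1%}")
# prints "sensitivity = 4.9%, PPV = 30.1%"
```

These rates are far lower than the all-pair averages, which is consistent with pseudoknotted pairs being the hardest part of the structure to predict.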
Evaluation of tested methods for pseudoknotted base pairs
The use of an MWM algorithm to assemble structures given the TurboFold pair probabilities was also considered as a control (Supplementary Tables S7 and S8). This approach identified the same set of correct base pairs as the approach based on mutual maximum probability pairs, plus a small number of extra incorrect pairs. The additional incorrect pairs were so few that they affected only the second decimal place of the PPV percentages. The exception was the tmRNA family, in which a small number of additional correct base pairs were identified. The minor difference between these two approaches was due to how well the TurboFold algorithm refines the pairing probabilities by excluding structures that are not conserved.
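The idea of assembling a structure from mutual maximum probability pairs can be sketched as follows. A pair i-j is kept when j is the most probable partner of i and i is the most probable partner of j. This is a simplified illustration only: the function names, the `min_prob` cutoff, the list-of-lists matrix layout and the toy probabilities are all assumptions of this example, and any iteration or post-processing done by the real programs is omitted.

```python
def symmetric_matrix(n, entries):
    """Build an n x n symmetric matrix from {(i, j): probability}."""
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), p in entries.items():
        matrix[i][j] = p
        matrix[j][i] = p
    return matrix


def mutual_max_pairs(prob, min_prob=0.0):
    """Return the set of pairs (i, j), i < j, that are mutual maxima.

    prob[i][j] is the pairing probability of nucleotides i and j.
    min_prob is a hypothetical cutoff; pairs must exceed it to be kept.
    """
    n = len(prob)
    # Most probable partner of each nucleotide (ties go to the lowest index).
    best = [
        max((j for j in range(n) if j != i), key=lambda j: prob[i][j], default=None)
        for i in range(n)
    ]
    pairs = set()
    for i, j in enumerate(best):
        if j is not None and i < j and best[j] == i and prob[i][j] > min_prob:
            pairs.add((i, j))
    return pairs


# Toy probabilities (hypothetical). Pairs (0, 2) and (1, 3) cross each other.
toy = symmetric_matrix(4, {(0, 2): 0.8, (1, 3): 0.7, (0, 3): 0.2})
selected = mutual_max_pairs(toy)
```

Because each pair is chosen independently of the others, nothing in this rule forbids crossing pairs, which is what allows pseudoknots to appear in the assembled structure. The weaker (0, 3) pair is dropped because neither nucleotide has the other as its most probable partner.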
As another control, an MWM algorithm was used to predict structures from pair probabilities calculated using a single sequence partition function. This added a number of spurious base pairs (Supplementary Table S9). This emphasized the importance of the ProbKnot approach when single sequences are used.
Time benchmarks were performed on all tested algorithms. Groups of five random sequences were used, with average lengths varying from 80.0 nt to 458.8 nt. The Hxmatch algorithm was the fastest. TurboKnot was significantly slower, but it could still complete the computations in a reasonable amount of time on a typical desktop computer.
Figure 1 shows a sample structure prediction for Mycobacterium tuberculosis RNase P using TurboKnot, ProbKnot or TurboFold (Brown, 1998). The TurboKnot and TurboFold structures are similar, but the TurboKnot algorithm is able to identify most of the pseudoknotted base pairs. ProbKnot has a lower sensitivity and PPV overall for non-pseudoknotted base pairs in this structure. It identifies the four correct pseudoknotted base pairs that TurboKnot missed, but it also finds eight pseudoknotted base pairs that are not in the accepted structure.
Fig. 1. Accepted and predicted structures for M. tuberculosis RNase P. (a) The accepted structure of M. tuberculosis RNase P (Brown, 1998). (b) The structure predicted by TurboKnot. (c) The structure predicted by ProbKnot. (d) The structure predicted by TurboFold.
TurboKnot overextends some helices compared with TurboFold when there are valid bases available to pair. The accuracy of the extra base pairs added by these extensions is low: of the base pairs on the ends of helices predicted by TurboKnot and not by TurboFold, 27.6% were correct across all the structures. This figure covers cases where a helix is common to the predictions of TurboKnot and TurboFold; single-nucleotide bulges are not considered to break a helix. Helices that are extended by 1 bp on an end are sometimes correct, so they increase the sensitivity of TurboKnot compared with TurboFold, but at the expense of PPV.