We illustrate here a typical use of the BRASERO website by comparing several programs that compute an edit distance or an alignment between pairs of RNA secondary structures, applied to a benchmark for the Signal Recognition Particle (SRP) RNA family. We compare seven tools: RNAdistance [19], RNAforester [10], MiGaL [8], TreeMatching [12], Gardenia [9], NestedAlign [20], and RNAStrAT [11]. These tools rely on different models of secondary structures, such as ordered trees, multilayer models, and arc-annotated sequences, but are all based on the edit distance and alignment approach pioneered in [19]. As these tools also differ in how they use primary sequence conservation, we also included BLAST [24] for comparison. For each tool, the default parameters were used.
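The structure models mentioned above can be made concrete with a small sketch. Assuming the secondary structure is given in dot-bracket notation without pseudoknots (an input format chosen here for illustration, not one prescribed by the compared tools), the following Python code derives the arc set of an arc-annotated sequence and a simple ordered tree. The function names and the nested-list tree encoding are hypothetical and do not correspond to the internal representation of any of the programs.

```python
from typing import List, Tuple, Union

# Hypothetical tree encoding: a base pair is a list of children,
# an unpaired nucleotide is the leaf string "U".
Tree = List[Union[str, "Tree"]]


def dot_bracket_to_arcs(structure: str) -> List[Tuple[int, int]]:
    """Return the base-pair arcs (i, j), 0-based with i < j, sorted by i."""
    open_positions: List[int] = []
    arcs: List[Tuple[int, int]] = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            arcs.append((open_positions.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(arcs)


def dot_bracket_to_tree(structure: str) -> Tree:
    """Return an ordered forest: nesting of base pairs gives the hierarchy,
    left-to-right sequence order gives the order of siblings."""
    dot_bracket_to_arcs(structure)  # reuse the validation above
    root: Tree = []
    stack: List[Tree] = [root]
    for symbol in structure:
        if symbol == "(":
            node: Tree = []
            stack[-1].append(node)  # new base-pair node under current parent
            stack.append(node)
        elif symbol == ")":
            stack.pop()  # close the current base pair
        else:
            stack[-1].append("U")  # unpaired nucleotide becomes a leaf
    return root
```

For the structure `((..)).`, the first function yields the two nested arcs and the second a tree with one outer pair, one inner pair holding two unpaired leaves, and a trailing unpaired leaf. Edit operations such as arc breaking act on the arc set, whereas tree edit operations act on the nodes of the tree.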
RNAforester is a local/global alignment algorithm for ordered trees. It uses a special tree encoding that allows nucleotide pairings to be broken under certain conditions. MiGaL uses a multilevel representation of the secondary structure composed of four layers, each encoded as a rooted ordered tree. The layers model different structural levels, from the multiloop network down to the sequence of nucleotides composing the RNA. The algorithm successively applies edit distance computations to each layer. TreeMatching is based on a quotiented tree representation of the secondary structure, a self-similar structure composed of two rooted ordered trees at two different scales (nucleotides and structural elements). The core of the method is to compare both scales simultaneously: it computes an edit distance between quotiented trees at the macroscopic scale, using edit costs defined as edit distances between subtrees at the microscopic scale. Gardenia and NestedAlign use a representation based on arc-annotated sequences, which allows for complex edit operations such as arc breaking or arc altering. Both support local and global alignment. Gardenia notably allows affine gap scores, while NestedAlign implements an original local alignment algorithm. RNAStrAT performs the comparison in two steps. First, it compares the stems of the two structures using an alignment algorithm with complex edit operations. Then it finds an optimal mapping between the different stems.
All tools were used with their default parameters (in particular, their default scoring scheme). We applied all tools to the SRP family benchmark available on BRASERO, with noise obtained from viral genomes (details are available on the website). Results are illustrated in the figure below. Note that the choice of the scoring scheme for a given tool may greatly impact the final results and should be evaluated independently before using BRASERO.
SRP benchmark with 8 pairwise edit distance/alignment methods. ROC curve and computation time. Tools ordered by increasing computation time: BLAST, RNAdistance, Gardenia, NestedAlign, RNAStrAT, MiGaL, RNAforester, and TreeMatching.
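As a rough illustration of how a ROC curve of this kind can be derived, the sketch below ranks benchmark sequences by their distance to a query and sweeps a threshold over the ranking. It assumes that smaller distances mean greater similarity and that each sequence is labelled either as a family member or as noise. The function names are hypothetical, and this is not necessarily the procedure implemented on the BRASERO website.

```python
from typing import List, Sequence, Tuple


def roc_points(distances: Sequence[float],
               is_member: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return ROC points (false positive rate, true positive rate).

    Assumption: a sequence is predicted to belong to the family when its
    distance is at most the current threshold.
    """
    if len(distances) != len(is_member):
        raise ValueError("distances and labels must have the same length")
    positives = sum(1 for member in is_member if member)
    negatives = len(is_member) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one family member and one noise item")

    ranked = sorted(zip(distances, is_member), key=lambda pair: pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for index, (distance, member) in enumerate(ranked):
        if member:
            true_pos += 1
        else:
            false_pos += 1
        # Emit one point per distinct distance so that ties share a threshold.
        last_of_tie = (index + 1 == len(ranked)
                       or ranked[index + 1][0] != distance)
        if last_of_tie:
            points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under a ROC curve given as ordered (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A tool that ranks every family member ahead of every noise sequence gives an area of 1, while a ranking no better than chance gives an area close to 0.5. For alignment scores, where larger values mean greater similarity, the scores would have to be negated before calling `roc_points`.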
We can observe in the figure a clear separation between the tools that compute a global alignment of arc-annotated sequences and the tools based on multilayer or hierarchical approaches, which rely on more local alignments. The latter seem to perform better, that is, to have better classification power for the SRP family. A full analysis of the results is beyond the scope of this note, but a possible explanation is that the SRP family exhibits much less sequence and structure conservation than other RNA families (such as tRNA), and that multilayer approaches are able to break down the task of aligning two structures into the alignment of corresponding substructures. This observation, together with its interpretation, can be used directly to restrict the set of tools and models to consider when analyzing SRP secondary structures, and, in a longer-term perspective, to orient further research specific to this family towards methods based on a multilayer approach.