Summary: RDP3 is a new version of the RDP program for characterizing recombination events in DNA-sequence alignments. Among other novelties, this version includes four new recombination analysis methods (3SEQ, VISRD, PHYLPRO and LDHAT), new tests for recombination hot-spots, a range of matrix methods for visualizing overall patterns of recombination within datasets, and recombination-aware ancestral sequence reconstruction. Alongside its highly automated analysis workflow, RDP3 has a detailed and highly interactive graphical user interface that supports focused, hands-on cross-checking of results with a wide variety of newly implemented phylogenetic tree construction and matrix-based recombination signal visualization methods. RDP3 can accommodate large datasets: it can analyze alignments ranging in size from 1000 sequences of 10 kb to 20 sequences of 2 Mb within 48 h on a desktop PC.
Availability: RDP3 is available for free from its web site http://darwin.uvigo.es/rdp/rdp.html
Supplementary information: The RDP3 program manual contains detailed descriptions of the various methods it implements and a step-by-step guide describing how best to use these.
RDP3 is a computer program for the statistical identification and characterization of historical recombination events. Given a set of aligned nucleotide sequences, RDP3 rapidly analyzes them with a range of powerful non-parametric recombination detection methods (including BOOTSCAN, MAXCHI, CHIMAERA, 3SEQ, GENECONV, SISCAN, PHYLPRO and VISRD; Boni et al., 2007; Gibbs et al., 2000; Lemey et al., 2009; Padidam et al., 1999; Posada and Crandall, 2001; Weiller, 1998). It provides a detailed breakdown of recombination breakpoint locations and of the identities of recombinant and parental sequences. For further downstream analyses, users can save edited sequence alignments with (i) recombinant sequences removed; (ii) recombinationally derived tracts of sequence removed; or (iii) recombinant sequences split into their constituent parts.
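The three alignment-editing options can be illustrated with a minimal Python sketch. It assumes the alignment is a dict mapping sequence names to equal-length aligned strings, and that each detected event is described by a hypothetical `Event` record (recombinant name plus 0-based, end-exclusive tract columns). These names are illustrative only and are not RDP3's file format or API.

```python
from dataclasses import dataclass


# Hypothetical event record (not an RDP3 data structure): the recombinant's
# name and the alignment columns [start, end) of its recombinant tract.
@dataclass
class Event:
    recombinant: str
    start: int
    end: int


def remove_recombinants(aln, events):
    """Option (i): drop every sequence flagged as a recombinant."""
    flagged = {e.recombinant for e in events}
    return {name: seq for name, seq in aln.items() if name not in flagged}


def mask_tracts(aln, events, gap="-"):
    """Option (ii): overwrite each recombinant tract with gap characters."""
    out = dict(aln)
    for e in events:
        seq = out[e.recombinant]
        out[e.recombinant] = seq[:e.start] + gap * (e.end - e.start) + seq[e.end:]
    return out


def split_recombinants(aln, events, gap="-"):
    """Option (iii): replace each recombinant by two full-width gapped rows,
    one holding the tract and one holding the rest of the sequence.
    Assumes (for simplicity) at most one event per recombinant."""
    out = remove_recombinants(aln, events)
    for e in events:
        seq = aln[e.recombinant]
        width = len(seq)
        out[f"{e.recombinant}_tract"] = gap * e.start + seq[e.start:e.end] + gap * (width - e.end)
        out[f"{e.recombinant}_rest"] = seq[:e.start] + gap * (e.end - e.start) + seq[e.end:]
    return out


# Usage: R is A-like in columns 0-3 and B-like in columns 4-7.
aln = {"A": "AAAAAAAA", "B": "CCCCCCCC", "R": "AAAACCCC"}
events = [Event("R", 4, 8)]
print(mask_tracts(aln, events)["R"])  # AAAA----
```

Keeping every edited row at full alignment width means the output remains a valid alignment for downstream tools.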
An important strength of RDP3 that makes it applicable to a variety of recombination analysis problems is that, unlike many other recombination detection programs such as SIMPLOT (Lole et al., 1999), DUAL BROTHERS (Minin et al., 2005), jpHMM (Schultz et al., 2006) or SCUEAL (Kosakovsky Pond et al., 2009), it does not screen predefined sets of potentially recombinant (query) sequences against predefined sets of non-recombinant (reference) sequences. Instead, RDP3 treats every sequence in an input alignment as a potential recombinant and systematically screens large numbers of sequence triplets and/or quartets to identify sets of three or four sequences that contain a recombinant and two sequences resembling its parents. This approach lets RDP3 detect the entire scope of recombination evident within a dataset (i.e. not just recombination between the reference strains or species), so it can be used to characterize complex recombinants, such as those derived from parental sequences that were themselves recombinant. The drawback of such a flexible, exploratory framework is that it is often difficult to assess the uncertainty associated with inferred recombination patterns. However, with its wide range of cross-checking tools, RDP3 is complementary to probabilistic recombination analysis approaches.
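The triplet-screening idea can be illustrated with a deliberately simple Python sketch. Every sequence is tried in turn as the candidate recombinant, and for each pair of other sequences the sketch asks whether the candidate is clearly closer to one of them in some alignment window and clearly closer to the other in another window. The fixed-window identity comparison and the `window` and `min_diff` parameters are illustrative assumptions. RDP3's actual methods (e.g. 3SEQ, MAXCHI, GENECONV) use their own test statistics and multiple-testing corrections.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of identical characters between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closer_parent(r, p1, p2, start, end, min_diff):
    """Return 1 or 2 if r is clearly closer to p1 or p2 in [start, end), else 0."""
    d = identity(r[start:end], p1[start:end]) - identity(r[start:end], p2[start:end])
    if d >= min_diff:
        return 1
    if d <= -min_diff:
        return 2
    return 0


def scan_triplets(aln, window=4, min_diff=0.5):
    """Yield (recombinant, parent1, parent2) whenever the clearly closest
    'parent' switches between non-overlapping windows. Every sequence is
    tried as the potential recombinant, so no query/reference split is used.
    The window size and min_diff threshold are arbitrary illustrative choices."""
    names = list(aln)
    width = len(next(iter(aln.values())))
    for r in names:
        others = [n for n in names if n != r]
        for p1, p2 in combinations(others, 2):
            calls = {
                closer_parent(aln[r], aln[p1], aln[p2], s, min(s + window, width), min_diff)
                for s in range(0, width, window)
            }
            if {1, 2} <= calls:  # closer to p1 somewhere and to p2 elsewhere
                yield r, p1, p2


aln = {"A": "AAAAAAAA", "B": "CCCCCCCC", "R": "AAAACCCC"}
print(list(scan_triplets(aln)))  # [('R', 'A', 'B')]
```

Because each of the n sequences is paired with every pair of the others, the number of tests grows roughly with the cube of n. This is one reason exploratory screens of this kind need corrections for multiple testing.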
Although the graphically intensive and highly interactive RDP3 interface remains superficially unchanged from that of its predecessor, RDP2 (Martin et al., 2005a, b), it provides simple point-and-click access to a multitude of powerful new features. These include three new non-parametric recombination detection methods (3SEQ, VISRD and PHYLPRO; Boni et al., 2007; Lemey et al., 2009; Weiller, 1998), a parametric recombination rate estimation method (LDHAT; McVean et al., 2004), two new tree construction methods (maximum likelihood with PHYML and Bayesian inference with MRBAYES; Guindon and Gascuel, 2003; Ronquist and Huelsenbeck, 2003), two recombination hot-spot tests (Heath et al., 2006), a test of recombination-induced protein misfolding (Lefeuvre et al., 2007; Voigt et al., 2002), recombination-aware methods for reconstructing ancestral sequences (Arenas and Posada, 2010) and a range of matrix methods for visualizing overall patterns of recombination within datasets (Jakobsen and Easteal, 1996; Lefeuvre et al., 2009; McVean et al., 2004).
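One of the cited matrix approaches, the compatibility matrix (Jakobsen and Easteal, 1996), is easy to sketch. Two biallelic, parsimony-informative sites are incompatible when all four combinations of their states are observed (the four-gamete test). Under an infinite-sites model, this can only arise through recombination, so clusters of incompatible site pairs hint at recombination. The minimal Python sketch below assumes gap-free sequences and simply skips sites that are not biallelic and informative. It is not RDP3's implementation, which may handle such sites differently.

```python
from itertools import combinations


def informative_sites(seqs):
    """Columns with exactly two states, each present in at least two sequences."""
    sites = []
    for col in range(len(seqs[0])):
        column = [s[col] for s in seqs]
        states = set(column)
        if len(states) == 2 and all(column.count(x) >= 2 for x in states):
            sites.append(col)
    return sites


def compatible(seqs, i, j):
    """Four-gamete test: sites i and j are compatible unless all four
    two-site state combinations occur among the sequences."""
    return len({(s[i], s[j]) for s in seqs}) < 4


def compatibility_matrix(seqs):
    """Return (sites, matrix), where matrix[a][b] is 1 if informative sites
    sites[a] and sites[b] are incompatible and 0 otherwise."""
    sites = informative_sites(seqs)
    n = len(sites)
    matrix = [[0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        if not compatible(seqs, sites[a], sites[b]):
            matrix[a][b] = matrix[b][a] = 1
    return sites, matrix


# All four gametes (AA, AC, CA, CC) occur, so the two sites are incompatible.
print(compatibility_matrix(["AA", "AC", "CA", "CC"]))  # ([0, 1], [[0, 1], [1, 0]])
```

Plotting such a matrix with sites in alignment order shows blocks of mutually compatible sites separated by incompatible regions. Breakpoints would be expected near those block boundaries.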
Besides the new methods, another important improvement over RDP2 is the way in which RDP3 automatically scans alignments for recombination signals and then infers the minimum number of recombination events needed to account for these signals. RDP3 implements a range of heuristic recombinant sequence identification methods based on PHYLPRO (Weiller, 1998), VISRD (Lemey et al., 2009) and subtree prune-and-regraft methods, which identify recombinant sequences as those that ‘jump’ between branches of phylogenetic trees constructed from different fragments of the same sequence alignment (Beiko and Hamilton, 2006; Heath et al., 2006). RDP3 also automatically checks whether detected recombination signals might be better accounted for by sequence misalignment than by recombination. Misalignments introduce homoplasy and are a common cause of false positive recombination signals. RDP3 detects them by separately realigning each recombinant sequence with each of its identified parents (using CLUSTALW; Chenna et al., 2003) and comparing these pairwise alignments with those of the corresponding sequence pairs in the full multiple sequence alignment. By identifying recombinant sequences more accurately and discounting recombination signals attributable to misalignment, RDP3 significantly outperforms RDP2 in overall quantitative assessments of recombination patterns, such as those carried out by the new breakpoint hot-spot and protein folding disruption tests.
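The misalignment check compares two pairwise alignments of the same sequence pair: the alignment implied by the full multiple alignment, and a fresh pairwise realignment. RDP3 uses CLUSTALW for the realignment. The Python sketch below substitutes a plain Needleman-Wunsch global alignment with arbitrary, assumed scores (match +1, mismatch -1, gap -2). It measures agreement as the fraction of realigned residue pairs that the multiple alignment also pairs. The function names and the agreement score are illustrative assumptions, not RDP3's procedure.

```python
def induced_pairs(row_a, row_b, gap="-"):
    """Residue index pairs (i, j) aligned in the MSA, i.e. columns where
    neither of the two MSA rows has a gap."""
    pairs, i, j = set(), 0, 0
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            pairs.add((i, j))
        if x != gap:
            i += 1
        if y != gap:
            j += 1
    return pairs


def nw_pairs(s, t, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment of ungapped s and t; returns the
    aligned residue index pairs from one optimal traceback."""
    n, m = len(s), len(t)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    pairs, i, j = set(), n, m
    while i > 0 and j > 0:  # trace back from the bottom-right cell
        diag = score[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
        if score[i][j] == diag:
            pairs.add((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return pairs


def alignment_agreement(row_a, row_b, gap="-"):
    """Fraction of realigned residue pairs that the MSA also aligns.
    Low values flag a pair whose MSA alignment may be unreliable."""
    msa = induced_pairs(row_a, row_b, gap)
    fresh = nw_pairs(row_a.replace(gap, ""), row_b.replace(gap, ""))
    if not fresh:
        return 1.0
    return len(msa & fresh) / len(fresh)


print(alignment_agreement("ACGTACGT", "ACGTACGT"))    # 1.0
print(alignment_agreement("ACGTACGT-", "-ACGTACGT"))  # 0.0 (MSA offset by one)
```

In the second example, the two rows hold identical sequences shifted by one column. Any "recombination" signal involving such a pair would therefore reflect misalignment rather than shared ancestry.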
In addition to streamlined tools for managing, testing and editing information on detected recombination events, RDP3 provides a range of new tools that let users cross-check how accurately the program has identified (i) groups of recombinants that supposedly share traces of the same recombination events; (ii) recombinant and parental sequences; and (iii) recombination breakpoint positions. These include heat plots indicating how closely the recombination patterns of two recombinants resemble one another in relation to their supposed parental sequences, color-coded phylogenetic trees for identifying recombinant and parental sequences, and MAXCHI (Maynard Smith, 1992) and LARD (Holmes et al., 1999) breakpoint matrices for manually identifying breakpoint positions.
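The statistic behind a MAXCHI breakpoint scan (Maynard Smith, 1992) can be sketched for a single sequence pair. The scan considers only columns that are variable somewhere in the alignment. At each candidate breakpoint between consecutive variable columns, a 2×2 table counts the sites where the two sequences agree or differ on each side, and the breakpoint with the largest chi-squared value is reported. This Python sketch omits the moving windows, multiple-comparison corrections and significance testing that practical implementations typically add. The helper names are hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared for the 2x2 table [[a, b], [c, d]] (no continuity
    correction); returns 0.0 when a row or column total is zero."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def maxchi_pair(aln, s1, s2):
    """Return (best_chi2, column): column is the first alignment column to
    the right of the best breakpoint between sequences s1 and s2."""
    width = len(aln[s1])
    # Only columns that are variable somewhere in the alignment are used.
    variable = [c for c in range(width) if len({seq[c] for seq in aln.values()}) > 1]
    diffs = [aln[s1][c] != aln[s2][c] for c in variable]  # True = pair differs
    total_diff = sum(diffs)
    best = (0.0, None)
    left_diff = 0
    for k in range(1, len(variable)):  # breakpoint after the k-th variable site
        left_diff += diffs[k - 1]
        left_same = k - left_diff
        right_diff = total_diff - left_diff
        right_same = (len(variable) - k) - right_diff
        chi = chi2_2x2(left_same, left_diff, right_same, right_diff)
        if chi > best[0]:
            best = (chi, variable[k])
    return best


aln = {"A": "AAAAAAAA", "B": "CCCCCCCC", "R": "AAAACCCC"}
print(maxchi_pair(aln, "R", "A"))  # (8.0, 4)
```

Repeating this for every candidate breakpoint pair gives the kind of breakpoint matrix described above. In such a matrix, high-scoring positions are candidates for manual inspection.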
All of the automated recombination detection methods in RDP3 have been rigorously optimized for speed; as a result, the program can analyze datasets containing up to 40 million nt within 48 h on a standard 2 GHz processor with 2 GB of RAM. Such large datasets might, for example, consist of 20 full bacterial genome sequences or 1000 full viral genome sequences. With default program settings, a dataset of 100 sequences, each 10 kb long, can be analyzed within 10 min.
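The dataset sizes above follow from simple arithmetic. The number of sequence triplets an all-against-all screen could consider grows as n(n − 1)(n − 2)/6. The short Python calculation below works through both quantities for the three example datasets, taking the 2 Mb and 10 kb sequence lengths from the Summary. The triplet counts are illustrative only: they ignore quartets and any shortcuts RDP3 may use to avoid examining every triplet.

```python
from math import comb

# (number of sequences, sequence length in nt) for the examples in the text;
# the 2 Mb bacterial genome length is taken from the Summary.
datasets = {
    "20 bacterial genomes": (20, 2_000_000),
    "1000 viral genomes": (1000, 10_000),
    "100 x 10 kb sequences": (100, 10_000),
}

for label, (n_seqs, length) in datasets.items():
    total_nt = n_seqs * length
    triplets = comb(n_seqs, 3)  # unordered sets of three sequences
    print(f"{label}: {total_nt:,} nt, {triplets:,} triplets")

# 20 bacterial genomes: 40,000,000 nt, 1,140 triplets
# 1000 viral genomes: 10,000,000 nt, 166,167,000 triplets
# 100 x 10 kb sequences: 1,000,000 nt, 161,700 triplets
```

The 1000-genome dataset contains a quarter as many nucleotides as the bacterial one but roughly 146,000 times as many triplets. Both quantities therefore constrain analysis time, not alignment length alone.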
Funding: Wellcome Trust (to D.P.M.); Postdoctoral fellowship from the Fund for Scientific Research (FWO) Flanders (to Ph.L.); South African Centre of High Performance Computing bursary (to M.L.); European Research Council (ERC-2007-Stg 203161-PHYGENOM to D.P.); Spanish Ministry of Science and Education (BFU2009-08611 to D.P.); GIS CRVOI (grant NPRAO/AIRD/CRVOI/08/03 to Pi.L.); Wellcome Trust (grant number GR079127MA).
Conflict of Interest: none declared.