|Home | About | Journals | Submit | Contact Us | Français|
Ecdysozoa is the recently recognized clade of molting animals that comprises the vast majority of extant animal species and the most important invertebrate model organisms—the fruit fly and the nematode worm. Evolutionary relationships within the ecdysozoans remain, however, unresolved, impairing the correct interpretation of comparative genomic studies. In particular, the affinities of the three Panarthropoda phyla (Arthropoda, Onychophora, and Tardigrada) and the position of Myriapoda within Arthropoda (Mandibulata vs. Myriochelata hypothesis) are among the most contentious issues in animal phylogenetics.
To elucidate these relationships, we have determined and analyzed complete or nearly complete mitochondrial genome sequences of two Tardigrada, Hypsibius dujardini and Thulinia sp. (the first genomes to date for this phylum); one Priapulida, Halicryptus spinulosus; and two Onychophora, Peripatoides sp. and Epiperipatus biolleyi; and a partial mitochondrial genome sequence of the Onychophora Euperipatoides kanagrensis. Tardigrada mitochondrial genomes resemble those of the arthropods in term of the gene order and strand asymmetry, whereas Onychophora genomes are characterized by numerous gene order rearrangements and strand asymmetry variations. In addition, Onychophora genomes are extremely enriched in A and T nucleotides, whereas Priapulida and Tardigrada are more balanced.
Phylogenetic analyses based on concatenated amino acid coding sequences support a monophyletic origin of the Ecdysozoa and the position of Priapulida as the sister group of a monophyletic Panarthropoda (Tardigrada plus Onychophora plus Arthropoda). The position of Tardigrada is more problematic, most likely because of long branch attraction (LBA). However, experiments designed to reduce LBA suggest that the most likely placement of Tardigrada is as a sister group of Onychophora. The same analyses also recover monophyly of traditionally recognized arthropod lineages such as Arachnida and of the highly debated clade Mandibulata.
In spite of an ongoing debate concerning their utility in phylogenetics (Curole and Kocher 1999, Delsuc et al. 2003, Cameron et al. 2004), mitogenomic studies have proven to be informative and insightful for phylogenetic studies (e.g., Boore et al. 1998, Lavrov and Lang 2005). This can be explained by conceptual advantages such as the conserved gene set, (almost) unambiguous orthology of genes, and presence of rare genomic changes, including gene rearrangement and changes in the genetic code, as well as some historical and methodological advantages, such as the availability of primers for amplifying specific genes from many lineages and the relative ease of generating new data (Fendt et al. 2009, Boore et al. 2005). On the other hand, phylogenies based on mitochondrial sequences are well known to be affected by a variety of reconstruction artifacts, which may be responsible for dilution of the true phylogenetic signal and generation of homoplasies.
One of the main problems of mitogenomics is thought to be the lineage-specific compositional heterogeneity, which also influences the amino acid content of the encoded proteins (Foster et al. 1997; Gibson et al. 2005). For example, some ecdysozoan lineages, such as some Arthropoda and Nematoda, have mitochondrial genomes enriched for A + T nucleotides, and in the absence of strong purifying selection, the corresponding proteins are enriched for amino acids encoded by A + T rich codons (Foster et al. 1997; Saccone et al. 2002). Another type of compositional heterogeneity, measured by GC and AT skews between the two strands of DNA (Perna and Kocher 1995), reflects a directional mutational bias driven by the asymmetric nature of replication of the mitochondrial genome and results in opposite compositional biases in genes with opposite transcriptional polarities (Saccone et al. 1999; Lavrov et al. 2000, Hassanin et al. 2005; Hassanin 2006, Jones et al. 2007). Gene inversions that cause a gene to change its orientation relative to the replication origin will result in a rapid compositional change as the sequence evolves from its ancestral nucleotide skews toward the ones driven by its new position (Helfenbein et al. 2001). It has been shown that heterogeneities in both A + T proportion and AT and GC skew can cause erroneous results in phylogenetic inference (Gibson et al. 2005, Jones et al. 2007, Masta, et al. 2009).
Compositional heterogeneity is only one of the factors affecting mitochondrial phylogenies. Accelerated substitution rates may also play a role in masking and eroding phylogenetic signal through unrecognized homoplasy and lead to increased susceptibility to systematic biases, such as long branch attraction (LBA; Felsenstein 1978, Brinkmann et al. 2005). Because of the variety of possibly confounding biases that could affect mitochondrial genomes concurrently, strong outgroup effects should be expected and have been observed (Cameron et al. 2004, Rota-Stabelli and Telford 2008), with different outgroups suggesting alternative equally well-supported rooting positions for ingroup taxa. One approach to deal with these problems is to improve the standard general time reversible (GTR) models of mitochondrial sequence evolution both at the nucleotide (Hassanin et al. 2005) and at the amino acid level (Abascal et al. 2007, Rota-Stabelli et al. 2009). More sophisticated evolutionary models such as the heterogenous CAT model, which accounts for among-site heterogeneity (Lartillot and Philippe 2004), and the derived CAT-BP model, which also accounts for lineage-specific compositional heterogeneities (Blanquart and Lartillot 2008), can lessen the effects of compositional biases. Another approach to reduce bias is to increase taxonomic sampling. Sampling more taxa, particularly close to weakly supported nodes, can break long branches, allowing for a better elucidation of homoplastic similarities resulting from multiple substitutions and thus reducing the likelihood of the LBA artifact. Finally, site-stripping approaches (Brinkmann and Philippe 1999; Pisani 2004; Sperling et al. 2009) can be used to eliminate rapidly evolving sites and limit LBA artifacts because the most rapidly evolving sites are expected to be the most heterogenous in composition.
Our knowledge of metazoan evolution has changed dramatically since the seminal work of Aguinaldo et al. (1997), which first formally proposed the Ecdysozoa, a group of molting organisms that includes Arthropoda (e.g., insects), Tardigrada (water bears), Onychophora (velvet worms), Nematoda (round worms), Nematomorpha (horsehair worms), Priapulida (penis worms), Kinorhyncha (mud dragons), and Loricifera. Although a monophyletic origin of the Ecdysozoa is now widely accepted (reviewed in Telford et al. 2008), the relationships among the eight extant ecdysozoan phyla, in particular the position of Tardigrada, are still vigorously debated. Although there is a strong support from morphological and developmental gene expression data for the monophyly of Panarthropoda, a group characterized by segmental, paired, locomotory appendages, comprising Arthropoda, Onychophora, and Tardigrada (Nielsen 2001, Telford et al. 2008), these data are ambiguous about the placement of the Tardigrada and Onychophora within Panarthropoda (Peterson and Eernisse 2001, Nielsen 2001, Mayer and Whitington 2009b). Furthermore, monophyly of Panarthropoda has found little molecular support. Although arthropod affinity of the Onychophora is strongly supported by expressed sequence tag (EST)–derived phylogenomic data sets (Dunn et al. 2008, Roeding et al. 2009; Hejnol et al. 2009), mitochondrial data from the complete mitochondrial genomes of one Onychophoran placed it as the sister group of Arthropoda plus Priapulida (Podsiadlowski et al. 2008). The position of the Tardigrada is equally unclear. Ribosomal RNA sequences support a group of Tardigrada plus Onychophora as a sister lineage to the Arthropoda (Mallatt and Giribet 2006), whereas EST data challenged the Panarthropoda hypothesis, grouping Tardigrada with Nematoda (Lartillot and Philippe 2008, Roeding et al. 2009, Hejnol et al. 2009). In these analyses, Tardigrada and Nematoda are characterized by long branches, suggesting that this clade could represent a phylogenetic artifact. This possibility is reinforced by the analyses of Dunn et al. (2008), also based on EST data, which recover a monophyletic Panarthropoda, suggesting that the placement of Tardigrada may be model dependent.
The monophyly of Crustacea plus Hexapoda (Tetraconata or Pancrustacea) within Arthropoda is now well accepted (Friedrich and Tautz 1995; Boore et al. 1998, Dohle 2001, Pisani 2009). However, the phylogenetic relationships among Pancrustacea, Myriapoda (millipedes, centipedes, symphylans), and Chelicerata (arachnids, ticks, and their allies) are hotly debated. Many morphological and paleontological studies group Myriapoda with Hexapoda and Crustacea in a clade called Mandibulata (Scholz et al. 1998, Harzsch et al. 2005, but see Mayer and Whitington 2009a). By contrast, different types of molecular data (mitochondrial, ribosomal, nuclear protein–coding genes, and EST data sets) have supported a sister group relationship between Myriapoda and Chelicerata, which were placed together in Myriochelata or Paradoxopoda (Friedrich and Tautz 1995; Pisani et al. 2004, Podsiadlowski et al. 2008, Mallatt and Giribet 2006, Dunn et al. 2008, Hejnol et al. 2009, Roeding et al. 2009). Conversely, two recent analyses based on mixed markers and 62 nuclear genes found, convincing support for Mandibulata (Bourlat et al. 2008, Regier et al. 2010). Support for either Mandibulata or Myriochelata may depend on the outgroup used (Rota-Stabelli and Telford 2008), exclusion of sites (Pisani 2004), and/or method of phylogenetic inference (Regier et al. 2008), suggesting that signal is weak and that some phylogenetic conclusions may be prone to systematic errors.
To further clarify relationships within Ecdysozoa, shed light on the evolution of their mitochondrial genomes, and fill the gap in taxonomic representation that currently exists, we have sequenced and analyzed the complete mitochondrial genomes of five species: two tardigrades, Hypsibius dujardini and Thulinia sp. (the first tardigrade mitochondrial genomes to be sequenced); one priapulid, Halicryptus spinulosus; and two onychophorans, Peripatoides sp. and Epiperipatus biolleyi. We also determined a partial mitochondrial genome sequence from a third onychophoran Euperipatoides kanagrensis. Here, we briefly describe compositional and genomic characteristics of the mitochondrial genomes of Ecdysozoa, particularly focusing on these newly studied species. We show that Tardigrada species have an “arthropod-like” mitochondrial genome and that the two priapulids share the same inverted fragment with an unexpected difference in GC skew. Finally, Onychophora mitochondrial genomes are highly divergent and contain several gene rearrangements.
We carried out phylogenetic analyses to understand the relationships within Panarthropoda. Results strongly support a sister relationship between the Onychophora and the Arthropoda, whereas the position of Tardigrada is more problematic. However, analyses performed using the CAT model (which is more robust to LBA) support the grouping of Onychophora and Tardigrada within a monophyletic Panarthropoda. This hypothesis is reinforced by sequential removal of rapidly evolving lineages and by the exploration of phylogenetic signal using partitions of sites with different evolutionary rates. We also revisit the relationships within Arthropoda, providing insight into the relationships among the major arthropod subphyla (Chelicerata, Myriapoda, and Crustacea and Hexapoda), as well as the relationships within Chelicerata.
The complete mitochondrial genomes of the onychophoran E. biolleyi and Peripatoides sp., the tardigrades Thulinia sp. and H. dujardini, and the priapulid Halicryptus spinulosus were amplified and sequenced as described in Lavrov et al. (2000). Partial sequences encompassing five coding genes were identified in E. kanagrensis using EST data. Open reading frames in the newly sequenced genomes were annotated based on comparisons with protein sequences from closely related species. In addition, the mitochondrial genome from Metaperipatus inae (Podsiadlowski et al., unpublished data; GenBank accession EF624055) was re-annotated based on the two other onychophoran mitochondrial genomes. tRNA genes were inferred using the tRNAscan-SE and ARWEN programs (Lowe and Eddy 1997; Laslett and Canbäck 2008) and checked manually. tRNA genes not found by the computer programs were identified based on expected anticodon sequences, conserved positions, potential secondary structures, and similarities with sequences from closely related species. If several potential tRNA gene sequences were found, we preferred the one with a more conserved gene order position. The E. biolleyi mitochondrial genome has been previously independently analyzed by Podsiadlowski et al. (2008). Comparison with the genome sequenced by us revealed a 97.9% identity at nucleotide level and 98.8% identity at amino acid level. However, our analysis resulted in a very different annotation of this genome, especially in respect to tRNA genes.
For each species of our data set, we concatenated the 13 mt protein-coding genes to calculate 1) the overall nucleotide G + C% using all three codon positions and 2) the frequency of amino acids encoded by GC-rich codons (G + A + R + P%). In addition, we calculated the amino acid composition (again as G + A + R + P%) expected from the nucleotide composition if the nucleotide frequencies at all three codon positions are the same (F1 × 4 codon model). We have generated 10,000 random nucleotides using the same nucleotide composition of the real data and translated them into amino acids using the appropriate genetic code. We plotted the corresponding values in figure 1 (white squares and dotted line).
We also calculated the GC skew (Perna and Kocher 1995) and the skew index (Rota-Stabelli and Telford 2008) on the concatenated alignment, and for each gene independently, using all three codon positions. To test if the strand asymmetry of genes was at equilibrium, we also calculated GC skew for the 1st + 2nd and 3rd codon position separately. GC skew values were plotted for species of interest using the inferred arthropod ancestral gene order (AAGO, which is the same as Limulus polyphemus and the ancestral ecdysozoan; Lavrov et al. 2000) as a reference to order genes on the abscissa of the plots in figures 2 and and3.3. The skew index was calculated as an absolute value, and not referenced to a focal species, as in Rota-Stabelli and Telford (2008). We summarize some of these statistics in supplementary table 1 (Supplementary Material online).
We downloaded nucleotide sequences of the 13 mitochondrial protein-coding genes for 240 metazoan species from the NCBI (http://www.ncbi.nlm.nih.gov) and the OGRe database (http://drake.physics.mcmaster.ca/ogre/compare.shtml). To this initial data set, we added complete sequences for the six species determined for this study, thus generating a data set of 245 taxa (246 minus the onychophoran E. biolleyi previously published by Podsiadlowski et al. 2008). We translated nucleotide sequences into amino acids, using appropriate genetic codes, and aligned the 13 protein sets individually with ClustalW (http://www.ebi.ac.uk/Tools/clustalw2/index.html). We back aligned nucleotide sequences to the amino acid alignment using TranslatorX (downloadable from http://translatorx.co.uk/) and assembled a concatenated alignment of the nucleotide sequences of the 13 genes.
To avoid artifacts due to inadequate outgroup and ingroup selection, we followed the decision making strategy of Rota-Stabelli and Telford (2008), generating a table (data not shown, but available upon request) containing statistics for each of the 245 ingroup and outgroup species sampled. We thus sampled a set of Ecdysozoa and outgroups aiming to minimize root-to-tip distances and compositional heterogeneity across the data set. The key estimates used to identify optimal outgroups included, for each possible outgroup 1) the maximum likelihood (ML) distance to the Ecdysozoa, 2) the G + C content, 3) the content of G + C-rich codon–encoded amino acids, and 4) the two indicators of GC strand asymmetry: GC skew and the skew index, an indicator of gene overall strand asymmetry. For each possible outgroup, an ML distance metric was calculated as the average ML distance to Priapulus caudatus, L. polyphemus, and Tribolium castaneum, and the skew index was calculated with reference to the average across all Arthropoda with similar skew profile.
From the 245 taxa alignment, we selected a balanced sample of 66 species (listed in supplementary table 1, Supplementary Material online), of which ten were outgroup taxa. The nucleotide and the amino acid alignments were processed independently with Gblocks at default settings, followed by realignment using Muscle (Edgar 2004; http://www.drive5.com/muscle) and a manual removal of poorly aligned regions, resulting in an amino acid alignment of 2,016 residues and a nucleotide alignment of 7,482 residues. Because Nematoda are rapidly evolving, their mitochondrial genomes are particularly A + T rich, and could generate LBA artifacts, we did not include them in our initial 66 species data set. However, Nematoda are a key taxon for resolution of the phylogenetic position of Tardigrada. Accordingly, we generated two additional data sets, partially based on our initial 66-taxa data set, with 11 nematodes, including the slowly evolving enoplean nematodes (Rota-Stabelli and Telford 2008). The two data sets that contained Nematoda had a total of 88 taxa and 2,016 amino acid residues and 59 taxa and 2,946 amino acid residues.
We analyzed the 66-taxa nucleotide and amino acid data sets using a variety of evolutionary models and phylogenetic tools. The nucleotide alignment was analyzed under both Bayesian and ML frameworks using MrBayes3.1.2 and RAxML7.0.3, respectively (Huelsenbeck and Ronquist 2001, Stamatakis 2006). In both cases, we excluded third codon positions and modeled 1st and 2nd codon partitions separately using two GTR models and a gamma distribution with five categories (Lanave et al. 1984). For the RAxML analyses, we used the fast ML method and performed a bootstrap analysis (100 replicates). For the nucleotide data, we also used the Neutral Transition Exclusion (NTE) model of Hassanin (2006) with codon positions recoded using the program Recoder (Masta et al. 2009, http://web.pdx.edu/~stul/Software.html) and modeling the 1st and 2nd codon positions with two distinct GTRs and the 3rd position with a two-state character model. Two independent runs of 1 million generations did not converge and supported different tree topologies. We therefore calculated the consensus tree by sampling trees only from the run associated with the higher mean log likelihood.
The amino acid data set was analyzed more extensively using homogenous and heterogenous models of sequence evolution under both Bayesian and ML frameworks. Cross-validation analyses to test the fit of different models to our data set were carried out with PhyloBayes 2.3 following the protocol described in the manual (Lartillot et al. 2009). We used the MtREV mitochondrial model (Adachi and Hasegawa 1996) as a reference to test the fit of other models: the CAT model (Lartillot and Philippe 2004), the mechanistic GTR model (Lanave et al. 1984; Yang et al. 1998), MtZoa (Rota-Stabelli et al. 2009), and MtArt (Abascal et al. 2007). Using the MtREV model as a reference, results of the cross-validation were as follows: mtREV versus: MtArt = −80.1 (±25.8); GTR = −85.4925 (±25.3); MtZoa = −91.46 (±21.2); CAT = −169.242 (±18.8). Bearing in mind that negative values correspond to a better fit, we chose CAT and MtZoa for further analyses.
Bootstrap ML analyses with 100 replicates were carried out with the fast ML method implemented in RAxML using custom implementations of the MtZoa models and a four-category gamma distribution. Bayesian analyses were carried out using both MrBayes and PhyloBayes. In both cases, we described the among-site rate variation with a gamma distribution using four categories. We ran two independent tree searches and stopped them after the likelihood of the sampled trees had stabilized and the two runs had satisfactorily converged (standard deviation [SD] of split frequencies lower than 0.02 in MrBayes and maxdiff less than 0.2, but in most of the cases less than 0.01, in PhyloBayes). Bayesian analyses under CAT, GTR, and MtZoa models were performed with PhyloBayes.
Bayesian analyses were also performed using the CAT-BP model implemented in NH-PhyloBayes (Blanquart and Lartillot 2008) to account for among-site-heterogeneity of the replacement process and among-branch heterogeneity of the stationary frequencies (Lartillot and Philippe 2004; Blanquart and Lartillot 2008). For the NH-PhyloBayes analyses, we set the number of categories to a value ranging from 120 and 140, as learned by using standard CAT analyses. We ran a minimum of two independent runs for each analysis in NH-PhyloBayes, but it was impossible to obtain a meaningful convergence even after millions of generations and multiple runs. This can be explained by the large number of taxa in the data sets and the many free parameters of the model. We therefore sampled trees from each run independently and compared the results of independent runs.
In order to explore the signal in our data set, and clarify the placement of the Tardigrada, we sequentially removed rapidly evolving species, which show dubious relationship with Tardigrada in the 66-taxa alignments. We sequentially removed the two Pycnogonida (64-taxa data set), the two Symphyla (62-taxa data set) and then the outgroup taxa plus the rapidly evolving Arachnida (46-taxa data set). We inferred phylogenies from these data sets using PhyloBayes and RAxML, modeling the evolutionary process with the CAT and MtZoa models, respectively.
We further explored the signal in sequences by removing classes of rapidly and slowly evolving sites using the slow–fast approach of Brinkmann and Philippe (1999) as modified in Sperling et al. (2009). It is well known (e.g., Brinkmann and Philippe 2007, Pisani 2004, Sperling et al. 2009) that rapidly evolving sites present a problem for phylogenetic inference as they often contain no genuine phylogenetic signal but contribute to various phylogenetic artifacts. Castoe et al. (2009) recently pointed out that sites that evolve too slowly in mitochondrial coded proteins can also be misleading for phylogenetic analyses because they are more likely to undergo adaptive convergent evolution in unrelated lineages. We thus decided to use the approach of Sperling et al. (2009) in which sites are partitioned into quartiles and only those from the two internal ones (i.e., those with the most homogenous rate of substitution) are used for phylogenetic analyses, with the very slow and very fast sites analyzed for comparison. The taxa were partitioned into seven monophyletic groups (Echinodermata, Lophotrochozoa, Aranea, Acari, Myriapoda, Hexapoda, and Crustacea), and PAUP4b10 (http://paup.csit.fsu.edu/) was used to calculate site-specific parsimony scores. Phylogenetic analyses using CAT and MtZoa were then performed on two data sets: one including the sites from the 2nd and 3rd quartiles of the distribution of parsimony scores and, as a matter of comparison, one including the sites from the 1st and 4th quartiles. The sites from the internal quartiles are expected to be homogenous among them, whereas sites from external quartiles are intrinsically heterogenous.
We explored the nucleotide compositional diversity of Ecdysozoa mitochondrial protein-coding genes (fig. 1). Compared with outgroups, all the ecdysozoans are characterized by mitochondrial coding sequences impoverished in G and C. However, the degree of heterogeneity among the main Ecdysozoa groups is remarkable. Sequences in Onychophora are extremely A + T rich to a degree that is comparable only with those of the well-known compositionally problematic ticks and nematodes (supplementary table 1, Supplementary Material online). Priapulida are characterized by a more balanced nucleotide composition, as, to a lesser extent, are tardigrades. Notably, the sequences of the four arthopods “subphyla” are also heterogenous, with hexapods and chelicerates being A + T rich and myriapods and crustaceans less so. Interestingly, the Chromadorea nematodes are extremely A + T rich, but the relatively slower evolving Enoplea (Rota-Stabelli and Telford 2008) have a less extreme nucleotide (and amino acid) composition. Such variability in nucleotide composition is known to result in erroneous phylogenetic reconstructions (see Mooers and Holmes 2000).
As expected, the overall nucleotide composition and the proportion of the “GARP” amino acids are highly correlated (R2 = 0.76). However, for some lineages such as Onychophora, Tardigrada, and Hexapoda, the amino acid composition, and its SD, is evidently less biased than the nucleotide one. Furthermore, comparison with the expected amino acid composition of randomized nucleotide sequences (white squares and dotted regression line) suggest that real amino acid sequences are less GARP biased then expected by chance alone. Also, the slope of the regression line of expected amino acid values is steeper than that of the real data. This observation suggests that some constrain is working at the amino acid level and that the inference of phylogeny based on amino acids may be less prone to compositionally driven systematic errors (but see Delsuc et al. 2003).
We compared the gene order (fig. 2) and strand asymmetry properties (fig. 3) of mitochondrial genomes sequenced for this study with those of other Ecdysozoa (Webster et al. 2006, Podsiadlowski et al. 2008) and the putative AAGO (identical to that of L. polyphemus) (Lavrov et al. 2000). Because protein-coding genes in AAGO (represented by L. polyphemus) have two possible transcriptional orientations, they experience different strand asymmetry pressures. In addition, 1st and 2nd positions in the conserved genes of complex IV (cox1, cox2, and cox3; Nardi et al. 2003) are slightly affected by strand bias, whereas the more rapidly evolving genes of complex I (the NADH subunits) are clearly positively or negatively skewed. The 3rd codon position (right part of fig. 3) that is less constrained than the 1st plus 2nd positions (left part of fig. 3) also accumulates nucleotide skews more quickly and is more likely to be at equilibrium comparing with them.
The mitochondrial gene order of the tardigrade Thulinia sp. differs from the AAGO only in the position of trnI, which is located between trnL1 and trnL2 (as also observed in H. dujardini) and has an opposite transcriptional polarity (fig. 2). The H. dujardini mitochondrial genome displays several additional rearrangements not present in Thulinia. These autapomorphies include the inversion of trnR, an interchange of the positions of the trnT-nad6-cob-trnS2 and the nad1-trnL2 regions and transpositions of nad2 and two clusters of tRNAs (trnW-trnC-trnY and trnK-trnD). None of the protein-coding genes change their transcriptional polarity in these rearrangements, and so their GC skew values resemble the AAGO strand profile (red and orange bars in fig. 3A).
The gene order in the onychophoran Peripatoides sp. is identical to the AAGO with the exception of an inversion of trnQ (fig. 2). Conversely, the two other representatives of Onychophora (Podsiadlowski et al. 2008, Podsiadlowski et al., unpublished data; GenBank accession EF624055) display multiple gene rearrangements, autapomorphic for each species. As a result, the three onychophoran mitochondrial genomes share very few gene boundaries (only atp6-atp8, nad1-trnL2, and cob-trnS2). The strand profile in Onychophora differs significantly from L. polyphemus, showing a reversal of strand asymmetry in some regions (fig. 3B). The three onychophorans share positive GC skew signatures for cox1, cox2, cox3, atp6, nad2, and nad3 (genes that have negative skew in L. polyphemus) suggesting a shared ancestral global reversal of the skew, possibly due to an inversion of the control region. Some genes such as nad5, nad4 and nad4L, and cob do not have a conserved strand profile across the three onychophorans, possibly because of a recent shift in the strand environment of these genes. The strong correlation of strand asymmetry at 1st + 2nd positions and at 3rd codon position for all genes in E. biolleyi illustrates that this genome is at equilibrium and supports the hypothesis of an ancestral inversion of the control region in the group. By contrast, the skew pattern in M. inae appears to be out of equilibrium and displays a more complex series of rearrangement events. Additional mitochondrial genomes from related taxa will be very valuable in unraveling this conundrum.
The mitochondrial gene arrangement of the priapulid H. spinulosus is exactly the same as that of the previously published P. caudatus (Webster et al. 2006). Both arrangements differ from AAGO by a single inversion of the trnS1-rns region (fig. 2). Because the GC skew is considered to result from the replication process, we expected the inverted genes in the priapulids to have an opposite skew to that of L. polyphemus. Indeed, the skew values at 1st + 2nd and 3rd positions for these genes in H. spinulosus are as expected (fig. 3C). However, in P. caudatus, the skews at 1st and 2nd positions are much reduced in magnitude, whereas at the 3rd position, the skew is opposite to that predicted. One explanation for this discrepancy is that there has been a recent inversion of the control region in P. caudatus and that skew values have not yet reached equilibrium in their new mutational pressure regime.
The lack of unambiguous shared derived (synapomorphic) gene rearrangements among Arthropoda, Tardigrada, Onychophora, and Priapulida means that no resolution can be achieved for their interrelationships using mitochondrial gene order data. This conclusion rejects some previous claims based on mitochondrial gene order data of close relationships between Arthropoda and Tardigrada (Ryu et al. 2007). Because the AAGO has also been inferred as the putative protostome ancestral gene order (Lavrov and Lang 2005), no resolution for the relationships between Ecdysozoa and other protostome groups (such as Lophotrochozoa) can be achieved based on this character set. However, the presence of synapomorphies for Tardigrada and Priapulida (see above) as well as additional rearrangements in Tardigrada and Onychophora suggest that mitochondrial gene order data will be informative for phylogenetic studies within these groups.
The 66-taxa data set analyzed (see below) does not contain any Nematoda, although they are of key importance for resolving the affinities and internal relationships of the Ecdysozoas, in particular of the Tardigrada, which have been linked to Nematoda in phylogenomic studies (Lartillot and Philippe 2008, Dunn et al. 2008). However, nematode mitochondrial genomes have high evolutionary rates, and previous attempts to use them in phylogenetic reconstruction led to dubious assemblages of Nematoda with other rapidly evolving lineages (Mwinyi et al. 2009, Podsiadlowski et al. 2008). We investigated the effect of Nematoda on phylogenetic inference by assembling preliminary data sets that included representatives from this group, in particular sampling slowly evolving enoplean nematodes. In these analyses, Nematoda were associated with rapidly evolving lophotrochozoan outgroups, implying a polyphyletic Ecdysozoa (supplementary fig. 1a and b, Supplementary Material online). These issues appear insurmountable with our current models of sequence evolution, and we have therefore excluded Nematoda from our main study data set.
Bayesian and ML analyses of 1st and 2nd codon positions performed under the GTR model (fig. 4) supported monophyly of Ecdysozoa, with Priapulida placed as the sister group of Onychophora plus Arthropoda, although with weak support. Arthropoda is paraphyletic in this tree as Tardigrada are grouped with the rapidly evolving sea spider (Pycnogonida) and Symphyla. Bootstrap support values (in bold in fig. 4) from the ML analyses are low, suggesting that either the phylogenetic signal in this data set is weak or competing non-phylogenetic signals are present. Furthermore, inspection of branch lengths shows that Tardigrada, Pycnogonida, and Symphyla are rapidly evolving lineages, suggesting that their grouping may be the result of the LBA. Mitochondrial genomes of Ecdysozoa are characterized by different patterns of strand asymmetry (fig. 3 and supplementary table 1, Supplementary Material online). The NTE recoding strategy has been shown to reduce strand bias artifacts (Hassanin et al. 2005, Jones et al. 2007), but analysis of the alignment under NTE recoding yields trees that are largely similar to those recovered using standard GTR models, grouping Tardigrada and Pycnogonida with the Symphyla (see posterior probabilities [PPs] underlined in fig. 4). We conclude that artifacts due to strand asymmetry are not driving the resolution of the relationships of Tardigrada as their mitochondrial genomes have a pattern of strand asymmetry typical for the majority of Arthropoda sampled (fig. 2A).
The amino acid content of ecdysozoan mitochondrially encoded protein genes is markedly more homogenous among different lineages than the nucleotide content (fig. 1), suggesting that structural constraints acting at the protein level may reduce the effects of mutational pressure acting at the nucleotide level. As the homogeneity of the stationary frequencies across the tree is an assumption of the majority of evolutionary models, the amino acid alignment appears to be a better substrate for inference of phylogeny (but see Delsuc et al. 2003). Consequently, we carried out more detailed analyses on a protein data set of 66 taxa and 2,307 amino acid residues. Initially, we performed a cross-validation analysis to test the fit of different models (MtREV, MtArt, MtZoa, GTR, CAT) to the data set. Results clearly show that the heterogenous CAT model best fits the 66-taxa data set. Interestingly, the second best model is MtZoa, which fits the data set better than the mechanistic GTR, probably as a result of the data set not containing enough replacement information to satisfactorily estimate all the parameters of the GTR matrix (Rota-Stabelli et al. 2009). We thus chose the CAT and MtZoa models for further analyses and used the other models for comparative purposes only.
The consensus tree from the Bayesian and ML analyses using the MtZoa model (fig. 5A) resembles the nucleotide tree of figure 4, with the exception that Tardigrada plus Pycnogonida was nested within paraphyletic Arachnida, and was not in a monophyletic clade with the Symphyla. Analyses performed using MtArt and MtREV resulted in topologies that were very similar to that derived using MtZoa (data not shown).
The grouping of Tardigrada, Pycnogonida, and Symphyla has no support from morphological data and challenges two commonly accepted notions, monophyly of Chelicerata (supported, e.g., by the presence of chelicerae) and monophyly of Arthropoda (which possesses articulated appendages). A possible LBA artifact is suggested by the extremely accelerated rate of evolution of the mitochondrial genomes of the sampled tardigrades, pycnogonids, and Symphyla. Analyses under the NTE model and the exclusion of strand-biased amino acids recovered the same topology as analysis of the original data set under GTR, suggesting that a Pycnogonida/Symphyla affinity of Tardigrada is not simply due to strand asymmetry–driven bias but rather to a more general LBA artifact.
To test the possible effect of systematic LBA errors, we sequentially removed taxa from the 66-taxa data set: the rapidly evolving Pycnogonida, the rapidly evolving Symphyla, and all the outgroups and some Chelicerata with accelerated rates of evolution and/or inversions of strand asymmetry (see supplementary table 1, Supplementary Material online). Under the homogenous MtZoa model, the position of Tardigrada was very unstable through this data reduction scheme. Using the full 66-taxa data set, Tardigrada were sister to the Pycnogonida (fig. 5A). Considering only the ecdysozoan taxa, this corresponds to a four-taxa tree ((paraphyletic Arthropoda + Tardigrada), (Onychophora, Priapulida)) or ((pA + T),(O,P)). When Pycnogonida were excluded, Tardigrada were sister to Symphyla (i.e., ((pA + T),(O,P)); fig. 5B). Exclusion of Symphyla yielded Tardigrada as sister to all remaining Ecdysozoa and Priapulida as most closely related to Arthropoda (i.e., (((A,P),O)T)); fig. 5C). Eventually, when the fastest evolving lineages were removed and only slowly evolving Priapulida retained, Tardigrada are weakly recovered as sisters to Onychophora (i.e., (P,(A,(O,T))); fig. 5D). Analyses, using MtREV, MtArt, and GTR, gave similar results (data not shown).
A clade of Tardigrada and Onychophora is also consistently recovered using the CAT model whether the rapidly evolving species are included or not (fig. 6). The CAT model has been shown to be quite effective at overcoming the effects of LBA (Bourlat et al. 2009, Lartillot et al. 2007, Lartillot and Philippe 2008) and is the model that best fits our data. Consequently, the CAT topology should be regarded as more likely than that obtained using homogenous models (MtZoa or GTR) and the full set of taxa (respectively, fig. 5A and fig. 4). We have also analyzed the data set using the CAT-BP model (Blanquart and Lartillot 2008), although problems with convergence (see Materials and Methods for more details) prevented us from drawing definite conclusions using this model. The use of the CAT-BP model in the analysis of the full data set resulted in a sister group relationship between Tardigrada and Pycnogonida, whereas the same analysis in the absence of the fast evolving Pycnogonida tepidly supported a sister relationship between Tardigrada and Onychophora in accordance with the CAT analyses (supplementary fig. 3, Supplementary Material online).
In conclusion, although the position of Tardigrada is highly unstable, we tentatively support its affiliation with Onychophora, as suggested by the CAT model and the analyses using reduced data sets and MtZoa. The outgroup roots in the Priapulida branch in the four-taxa tree in most analyses, suggesting the tree (outgroup,(P,(A,(O,T)))).
Not all sites in an alignment have the same rate of evolution. Rapidly evolving sites accumulate multiple mutations and tend to be saturated and contribute to LBA (see Brinkmann and Philippe 1999; Pisani 2004, Sperling et al. 2009). Recently, Castoe et al. (2009) have shown that very slowly evolving sites (under strong purifying selection) can also be phylogenetically misleading as they may experience parallel adaptive changes in unrelated lineages. Accordingly, following Sperling et al. (2009), we explored the distribution of signal in the alignment by separating the sites with moderate evolutionary rates (i.e., those most likely to represent the most reliable source of phylogenetic signal) from the slowly and rapidly evolving sites (those more likely to convey misleading signal). To identify sites with moderate rates (see Materials and Methods), we ranked individual sites according to their rate (inferred using the slow–fast method; Brinkmann and Philippe 1999). We then used sites in the 2nd and 3rd rate quartiles (moderately evolving) for phylogenetic reconstructions and sites in the 1st and 4th quartiles (slow and fast evolving sites) for comparison.
Unsurprisingly, the CAT tree built using the collection of rate heterogenous sites (i.e., the fast and slowly evolving; fig. 7A) supports a tardigrades affinity for the chelicerates. Because these sites are most likely to be misleading (Brinkmann and Philippe 1999; Pisani 2004; Sperling et al. 2009; Castoe et al. 2009), this result confirms that a Chelicerata affinity for the Tardigrada is unlikely to be correct. The same tree also supports a group of onychophorans plus spiders as well as paraphyletic Pancrustacea (insects plus crustaceans), all extremely dubious topologies. This suggests that the signal associated with the set of the heterogenous sites carries a high amount of non-phylogenetic signal. Furthermore, fast and slow evolving sites may be driven by different pressures (GC% for fast sites, positive selection for the slow sites), making the data set in conflict internally.
On the other hand, a CAT tree based on the sites with moderate evolutionary rates (fig. 7B) supports monophyly of commonly accepted Pancrustacea, Chelicerata, and Arachnida groups rendered poly- or paraphyletic in the analyses of heterogenous sites (fig. 7A). These results suggest that signal in moderately evolving sites is reliable. Notably, this partition supports a monophyletic origin of Panarthropoda, the tree (outgroup,(P,(A,(O,T)))). The sister relationship between Tardigrada and Onychophora is weakly supported (at PP of 58), as is the basal position of Priapulida in Ecdysozoa (PP 53). This weaker support could have resulted from the reduced dimensions of the alignment (and the exclusion of some sites conveying genuine phylogenetic signal) and the inability of the reduced data set to efficiently resolve alternatives in the parameter-rich CAT model.
When data transformations that are known to reduce LBA are implemented—the use of optimal outgroups (fig. 5D) using effective substitution models (CAT in fig. 6) and the exclusion of unreliable sites (fig. 7B)—support for a monophyletic Panarthropoda in which Onychophora and Tardigrada are sister groups emerges. The recent phylogenomic study of Hejnol et al. (2009) supported a closer relationship between Onychophora and Arthropoda but places Tardigrada as a sister group to Nematoda + Nematomorpha. Similar phylogenetic position of Tardigrada was found in the study by Roeding et al. (2009). In these studies, however, Tardigrada and Nematoda are fast evolving lineages and their clustering may be due to phylogenetic artifacts, such as LBA. In our data set, the grouping of Tardigrada and Onychophora is unlikely due to LBA: Mitochondrial sequences in Tardigrada are rapidly evolving, whereas those in Onychophora have a moderate rate of evolution.
There are, however, no commonly accepted synapomorphies of a Tardigrada plus Onychophora clade, although morphologists are divided over whether one of the two is the sister group of the Euarthropoda. A tentative character uniting the tardigrades and the onychophorans is their shared possession of non-articulated clawed appendages, as in the Cambrian lobopodian Aysheaia, but in contrast with arthropods that have articulated ones (Nielsen 2001). A lack of information from panarthropod stem group (and/or the difficulty to assess their phylogenetic position) prevents from possible polarization of this character. Tardigrada lacks an ostiate heart, which is shared by Onychophora and Arthropoda, the two latter also sharing segmental leg musculature (Edgecombe 2010). Conversely, evidences from cuticular and developmental structures suggests a sister relationship between Arthropoda and Tardigrada (Nielsen 2001; Mayer and Whitington 2009b). It is, however, clear that morphological comparisons are complicated by the extremely reduced and derived nature of Tardigrada and possibility of parallel evolution.
We were not able to elucidate the phylogenetic position of Nematoda; therefore, it is possible that it still forms a sister group to Tardigrada within the Panarthropoda. Clearly, our favorite topology (Panarthropoda) is inconsistent with that of published phylogenomic analyses (Tardigrada plus Nematoda with the exclusion of Onychophora) (Hejnol et al. 2009; Roeding et al. 2009). However, it is clear that the mutual relationship of Nematoda and Tardigrada remain an open question, exacerbated by the derived nature of nematodes mitochondrial sequences. Complete mitochondrial genome from the nematode-like Nematomorpha may help in shorten the extremely long stem Nematoda branch and polarize possible characters as apomorphies of the nematodes.
Ecdysozoa is strongly recovered in all our analyses, with the Priapulida as the sister group of remaining ecdysozoans, the Panarthropoda. Regardless of the position of Tardigrada, the majority of our trees support a sister relationship between Onychophora and Arthropoda. The only exception is the tree in figure 5C, which supports Priapulida as the sister group of the Arthropoda, a topology that can be interpreted as an artifact due to the mutual attraction of Tardigrada and the outgroup, which may have also pulled Onychophora (assuming they are the sister group of Tardigrada, see figs. 5D and 6) toward the base of the tree. This view is reinforced by the analysis of the same data set of figure 5C from which the Tardigrada were excluded, which recovers Onychophora as sister group of the Arthropoda (data not shown).
In a previous analysis, Podsiadlowski et al. (2008) could not recover the sister relationship between Onychophora and Arthropoda; this is easily explained by a limited taxon sampling in their analyses (e.g., only one onychophoran and priapulid, no tardigrades). The addition of new sequences from Peripatoides and Euperipatoides appears to increase the informative phylogenetic signal and thus resolution of the Onychophora sister group position. Within Onychophora, the Peripatopsidae (austral Onychophora) are monophyletic, with the Australian species (E. kanagrensis) more closely related to the New Zealand species (Peripatoides sp.) than to the Chilean species (M. inae), likely reflecting ancient Gondwanan distributions.
A monophyletic origin of Pancrustacea (the clade comprising Hexapoda and Crustacea) is strongly supported in all our analyses. Our reduced pancrustacean taxon sampling (only Malacostraca and Branchiopoda for the crustaceans and no Collembola) prevented us from testing monophyly of hexapods, which has been challenged by previous mitochondrial analyses (Carapelli et al. 2007).
As for the position of Myriapoda, most of our analyses (figs. 4, 5A, 5B, and and6)6) tend to support Myriochelata. However, as rapidly evolving lineages are removed (toward fig. 5), support for Myriochelata decays and in the data set with all putative rapidly evolving species excluded (characterized by greater homogeneity of the rate of evolution among lineages), Myriapoda are grouped with the Pancrustacea, supporting the Mandibulata hypothesis (PP 0.96 in fig. 5D). Decrease in support for Myriochelata is also observed using the CAT model (fig. 6). Furthermore, when only the sites with moderate rates of evolution are analyzed, the CAT model strongly supports Mandibulata (PP 98 in fig. 7B). These results suggest that signal supporting Myriochelata is found in rapidly evolving sites or is associated with data sets containing rapidly evolving species. In particular, the Symphyla, which tends to group with Chelicerata in most of the analyses (e.g., in figs. 4 and and5),5), render Myriapoda paraphyletic. Conversely, when sources of systematic error are reduced (excluding rapidly evolving sites and/or rapidly evolving lineages), this data set lends support to Mandibulata.
Finally, in some of our phylogenies (figs. 4 and and5A),5A), Chelicerata are paraphyletic due to the inclusion of Tardigrada as a sister group to Pycnogonida, an affinity we have above interpreted as LBA. When Pycnogonids are excluded from the analysis (fig. 5B–D), Tardigrada are placed in other parts of the tree leaving Chelicerata monophyletic. On the other hand, using the CAT model, Chelicerata is recovered as a monophyletic group with the Pycnogonida being the sister group to the remaining euchelicerates (fig. 6). Unexpectedly, but in accordance with a recent mitochondrial study of chelicerates (Masta et al. 2009), the horseshoe crab L. polyphemus (Merostomata) is grouped with the harvestman P. opilio (Opilionidae) and the camel spiders Nothopuga sp. and E. palpisetulosus (Solifugae) in most of our analyses, rendering the Arachnida polyphyletic. However, as in the case of the Myriochelata, support for this dubious grouping decays as rapidly evolving lineages are excluded from the alignment. When long branch arachnids are excluded, both CAT (fig. 6) and MtZoa (fig. 5C and D) recover Merostomata as basal Chelicerata, whereas Opiliones and Solifugae join Acari in a monophyletic Arachnida.
Given the ancient origin of Ecdysozoa and the high rate of mitochondrial DNA evolution in bilaterian animals, one can expect the phylogenetic signal in ecdysozoan mitochondrial genomes to be low. Furthermore, the recovery of this signal is impeded by lineage-specific rate heterogeneities (supplementary table 1, Supplementary Material online), nucleotide composition biases (fig. 1), and strand asymmetrical properties (fig. 3). It is clear that the amount of phylogenetic information available for resolution of some nodes is meager in mitochondrial sequences. However, heterogenous sequence composition did not play a key role in misleading phylogenetic reconstruction in our data set as analyses designed to reduce the effect of strand bias (NTE model) and amino acid composition (CAT-BP model) do not ameliorate the problems. One explanation for the difficulty observed in robustly placing some lineages of interest is LBA. In particular, our analyses suggest that LBA is most likely responsible for grouping Tardigrada with the rapidly evolving Arthropoda (figs. 4 and and5A5A).
Here, we have shown that experiments designed to reduce LBA recover a group of Tardigrada plus Onychophora as sister to the Arthropoda in agreement with morphological predictions of a common origin of paired walking appendages (and possibly segmentation) in the Panarthropoda. We note that the same experiments also recover monophyly of usually accepted groups, such as Arachnida and Mandibulata.
Thus, whereas the phylogenetic signal in our mitochondrial data sets is limited, preventing us from drawing firm conclusions, the congruence of analyses that are expected to provide more accurate results in the presence of LBA suggests the following hypotheses for Ecdysozoa: (outgroups, (Priapulida, (Arthropoda, (Tardigrada, Onychophora)))), and Arthropoda: (outgroups, (Chelicerata, (Myriapoda, (Hexapoda, Crustacea)))). The addition of the remaining ecdysozoan phyla, in particular of Nematomorpha, to the mitogenomic data set may elucidate these relationships with more confidence.
The authors thank Martin Jones for useful comments on the manuscript, Stuart Longhorn for NTE recoding the data set, and Samuel Blanquart for assistance with NH-PhyloBayes. O.R.-S. was supported by the Marie Curie RTN ‘ZOONET’ and by Science Foundation Ireland RFP grant (08/RFP/EOB1595). All computational work was completed using the infrastructure of the NUI Maynooth High Performance Computing Facility.