Analyses of the individual protein datasets using neighbor-joining [
16] show that most (62%) support Coelomata while 25% support Ecdysozoa and 13% support hypothesis III (
Supplemental Table 1, four-taxon analysis). Of the 25 proteins in which support for one of the three hypotheses is significant (≥ 95%), the results are 84% (21 proteins), 16% (4), and 0%, respectively; for those ten proteins with a highly significant (≥ 99%) topology, the results are 90% (9 proteins), 10% (1), and 0%, respectively. The four proteins showing significant support (>95%) for a hypothesis other than Coelomata were reanalyzed using maximum parsimony and maximum likelihood; bootstrap values were not significant using other methods. Such divided results are typical of single-gene analyses because of limited information (~400 amino acids), necessitating combined analysis. Coelomata was supported significantly (100% bootstrap confidence, posterior probability = 1.0) when the 100 four-taxon protein alignments were concatenated and analyzed using neighbor-joining, maximum parsimony, maximum likelihood, and Bayesian inference (Fig. ). Using the Shimodaira-Hasegawa (SH) test [
17], this maximum likelihood topology was significantly different from the alternative hypotheses of Ecdysozoa (
P < 0.001) and Hypothesis III (
P < 0.001). These results agree with earlier studies involving 10–36 nuclear proteins [
8,
9,
11].
To test the stability of Coelomata to taxon sampling, we included new sequences of the planarian
Dugesia japonica (Phylum Platyhelminthes) in 100 five-taxon protein alignments (
Supplemental Table 1, five-taxon analysis). The results, upon concatenation with this additional taxon, were unchanged (Fig. ): Coelomata continued to be significantly supported (≥ 98% bootstrap support, posterior probability = 1.0) and the alternative hypotheses were both rejected using the SH test (
P < 0.001), although the relationships among the basal phyla (Nematoda and Platyhelminthes) could not be resolved.
Another potential bias is the specific taxa included in an analysis. For example, the original support for Ecdysozoa was obtained only with a particular genus of nematode,
Trichinella, that had a short branch in the 18S rRNA tree [
3]. To test this, phylogenies were constructed using different species of nematodes. The Coelomata hypothesis was significantly supported (≥ 97% bootstrap support, posterior probabilities = 1.0; Ecdysozoa rejected by SH test,
P < 0.006; Hypothesis III rejected by SH test,
P < 0.027) using either a genus in a different order,
Brugia (18 proteins), or
Trichinella (six proteins) (Fig. ). To further address the possibility that these results could be biased by taxon sampling, we included representatives from all available phyla for each protein. The results indicate that an increase in the number of taxa does not decrease single-protein support for Coelomata; in fact, the trend is the reverse (Fig. ). Simulation studies have shown that incomplete taxon sampling does not increase topological errors, and that most error is caused by limited sequence data [
18].
In the initial study defining Ecdysozoa [
3], rate variation was considered to be the major bias affecting the phylogenetic position of nematodes. In the 18S rRNA gene, nematodes typically have long branches indicating an increased rate of sequence change. Other nuclear genes also show this pattern, but to a lesser degree [
8,
9]. Phylogenetic methods can accommodate moderate amounts of rate variation among lineages without producing an incorrect phylogeny [
19]. However, if the rate of change is sufficiently large, longer branches in a phylogeny will sometimes attract one another [
20]. If that happens, an ingroup species with a long branch may move to a more basal position in the tree. In analyses of the 18S rRNA gene, nematodes typically appear basal to arthropods + vertebrates. Because the use of a short branch nematode (
Trichinella) resulted in a tree whereby nematodes clustered with arthropods, the basal position of nematodes in typical 18S analyses has been interpreted as long-branch attraction [
3].
If nematodes cluster basally because of long-branch attraction, then the strongest support for Ecdysozoa should be obtained with the slowest evolving proteins. This was tested in an analysis of 36 nuclear proteins [
8], but the results were equivocal. Therefore, we tested this suggestion with our four-taxon data set of 100 proteins, ordered by rate of evolution. Rate orders were determined in two ways: (i) nematode branch length and (ii) vertebrate-arthropod pairwise distance. The 100 proteins were grouped into concatenations of 10 proteins and 20 proteins to increase statistical resolution. The results show support for Coelomata at all rate orders, but the support is significant with the slowest evolving proteins, regardless of rate measure or number of proteins combined (Fig. ). Concatenations of slow evolving proteins also show compositional homogeneity (pairwise disparity index test,
P < 0.05) [
21], suggesting the basal position of the nematode results from true phylogenetic signal and not compositional bias. Support for Coelomata was weakest with the fastest evolving proteins (which also showed compositional heterogeneity), indicating that Ecdysozoa, not Coelomata, may be the result of a rate bias, compositional bias, or other artifact.
To ensure that this result was not affected by mutational saturation in our data set, the mean number of variants per variable site was determined for each protein and averaged for ten groups of proteins ordered by evolutionary rate (Fig ). As predicted, variable sites in the faster evolving proteins showed a higher number of variants than those in slow-evolving proteins. We also examined the minimum number of nucleotide changes required for sites where only the nematode sequence varied (Fig ). Slow-evolving proteins showed a smaller number of nucleotide changes required to alter amino acid identity, while faster evolving proteins required more changes. Thus, the nematode sequences in the slow-evolving proteins do not appear to be mutationally saturated.
Finally, the affect of lineage-specific rate variation on support for Coelomata was tested with the use of relative rate tests. Presumably, the selective elimination of genes with long branches will increase statistical support for the correct topology. Individual proteins from the four-taxon data set were each subjected to two different relative rate tests [
22,
23]. Proteins determined to be rate-constant at the typically applied stringency level (5% significance), and at two greater stringency levels (10% and 40% significance) were concatenated and bootstrap support was determined using neighbor joining. The results of the two tests were similar. As stringency increased, 40–83% of proteins were rejected, and the relative nematode branch length (to the arthropod and vertebrate branches) dropped from 16% to 0%. However, in all cases, Coelomata remained highly significant (Fig. ). Thus, the suggestion that a basal position of nematodes is the result of long-branch attraction [
3] can be rejected.
The importance of knowing the branching order of these species is illustrated by the immediate and wide acceptance of the Ecdysozoa hypothesis and its use in tracing patterns of developmental evolution [
5-
7,
10]. However, in the initial analysis of 18S rRNA sequences [
3], Ecdysozoa was statistically significant only when a paralinear distance method was used; three other methods did not yield significant bootstrap support. In that study, Ecdysozoa also was not significant, using any method, when the flatworm sequence was included [
3]. Subsequent analyses of the 18S rRNA gene have been interpreted differently [
24,
25], but none has yielded statistically significant results supporting Ecdysozoa. Moreover, the molting cuticles of arthropods (chitin) and nematodes (collagen) are not homologous [
4]. The significance of other morphological characteristics bearing on the position of nematodes continues to be debated [
26].
Besides the 18S rRNA evidence, other genetic evidence for the grouping of nematodes and arthropods has come from qualitative interpretations of
Hox gene [
10] and β-thymosin [
12] evolution. In the case of
Hox genes, support comes from a single posterior gene sequence (Y75B8A.1) of the nematode
Caenorhabditis elegans argued to have greater amino acid similarity with a posterior
Hox genes of
Drosophila and
Priapulus[
10]. Unfortunately, the
Hox homeodomain is a short (60 amino acid) region with many sequence differences between these taxa. Definition of "sequence signatures" is qualitative and has not been tested statistically. In a subsequent study of nematode posterior
Hox genes, other researchers were unable to determine if the simple nematode
Hox cluster of six genes is an ancestral or a derived condition [
13].
In the case of β-thymosin, a sequence signature also has been argued to support a grouping of
Drosophila and
Caenorhabditis[
12]. However, it is a gene family known to have paralogs within animals, the position of introns differs between sequences from the two species, and only four other metazoan taxa were surveyed. In addition, knowing the presence or absence of a gene can be problematic without the complete genome sequence of an organism (in this case, genomes were known only in
Drosophila and
Caenorhabditis). Thus, although suggestive, it is too soon to judge the significance of this sequence signature. One difficulty with interpreting such qualitative evidence, including
Hox gene orthology, is that almost any pattern can be found in nature if one looks. In other words, sequence signatures have not yet been surveyed systematically and objectively. In contrast, sequence evidence from randomly selected genes, analyzed phylogenetically, provides a more unbiased database amenable to statistical analysis.