Resolving the relationships among animal phyla is a key biological problem that remains to be solved. Morphology is unable to determine the relationships among most phyla and although molecular data have unveiled a new evolutionary scenario, they have their own limitations. Nuclear ribosomal genes (18S and 28S rDNA) have been used effectively for many years. However, they are considered of limited use for resolving deep divergences such as the origin of the bilaterians, due to certain drawbacks such as the long-branch attraction (LBA) problem. Here, we attempt to overcome these pitfalls by combining several methods suggested in previous studies and routinely used in contemporary standard phylogenetic analyses but that have not yet been applied to any bilaterian phylogeny based on these genes. The methods used include maximum likelihood and Bayesian inference, the application of models with rate heterogeneity across sites, wide taxon sampling and compartmentalized analyses for each problematic clade. The results obtained show that the combination of the above-mentioned methodologies minimizes the LBA effect, and a new Lophotrochozoa phylogeny emerges. Also, the Acoela and Nemertodermatida are confirmed with maximum support as the first branching bilaterians. Ribosomal RNA genes are thus a reliable source for the study of deep divergences in the metazoan tree, provided that the data are treated carefully.
Resolving the relationships among animal phyla is a key problem in modern biology, since they are instrumental in understanding the evolution of many biological features including, among others, body plans, embryonic development and gene networks. Unfortunately, morphology is unable to clarify the precise relationships among most phyla. Twenty years ago, interest in this field was intensified by the introduction of the small subunit (SSU) RNA gene (18S rDNA or SSU) into metazoan phylogenies (Field et al. 1988; Lake 1990). However, molecular phylogenies also have their own downsides and raise new problems. The SSU's lack of resolving power on some regions of the metazoan tree (Philippe et al. 1994; Abouheif et al. 1998; Adoutte et al. 2000) and the long-branch attraction effect (LBA, Felsenstein 1978) are the major concerns for phylogeny resolution and credibility (Anderson & Swofford 2004).
Different sources of data have been added to the SSU sequences to overcome these drawbacks and new markers have been developed, the most recent being phylogenomics (Philippe & Telford 2006; Dunn et al. 2008). These approaches have generally supported the SSU division of bilaterians, but have not, however, solved all the questions raised. Moreover, the extra information obtained in these studies is usually counteracted by reduced sampling of animal phyla. The most recent and complete analysis (Dunn et al. 2008) shows better resolution within groups such as the Lophotrochozoa that so far have been poorly resolved. Such improvement, however, comes at the price of removing key taxa (i.e. gastrotrichs, gnathostomulids, rotifers, acoelomorphs, bryozoans or chaetognaths) before the final analyses are performed. Despite these shortcomings, a consensus tree of the Bilateria has been portrayed by various authors (Adoutte et al. 2000; Halanych & Passamaneck 2001; Giribet 2002; Balavoine & Adoutte 2003; Halanych 2004; Telford 2006). In this scheme, the Bilateria are divided into three main groups, Lophotrochozoa, Ecdysozoa and Deuterostomia, the first being the most problematic group, owing to its high number of phyla and poor internal resolution. Furthermore, there are some groups that continue to have unsolved affinities and that might hold the key to understanding essential transitions in the bilaterian tree, namely the Acoelomorpha, the Chaetognatha, the Gnathifera and the Gastrotricha.
The main aim of this study was to combine different approaches to unravel the status of the three big clades, mainly the internal relationships in the Lophotrochozoa and the position of groups of uncertain affinities. For this endeavour, we have attempted to maximize the metazoan phyla sampling, and at the same time, minimize the LBA effect, these being two factors that have formerly been suggested to cause uncertainties and lack of resolution. We applied a careful analysis involving two steps. First, we used several strategies that have been proposed to avoid LBA based on real and simulated data (Anderson & Swofford 2004; Bergsten 2005): using methods less sensitive to LBA, such as maximum likelihood (ML) or Bayesian inference (BI); employing model modifications such as rate heterogeneity across sites with a discrete γ-distribution parameter; using the shortest branched representatives available for each phylum; and searching for the widest taxon sampling. Second, we compartmentalized the analysis of the still problematic lineages (those with extremely long branches or with dubious status such as polyphyly) by removing all of them from the analysis and re-introducing them one at a time; this allowed evaluation of their respective positions and the support they receive without interaction with other problematic taxa. Surprisingly, despite the numerous SSU and large subunit (LSU) rDNA sequences present in the databases, an extensive analysis of SSUs and LSUs for all bilaterians, applying these now routine phylogenetic methods, has yet to be performed.
The SSU alignment used in Wallberg et al. (2004) was downloaded and new sequences were added from GenBank to complete the taxonomic sampling, namely the Nemertodermatida representatives, resulting in a total of 564 SSU representatives. The added SSU sequences were aligned using the Wallberg et al. alignment as a profile (see that paper for a thorough discussion on the alignment methodology). For LSU, 142 sequences from the former studies were aligned according to secondary structure with notation modified from Gillespie et al. (2005). Alignments were performed and checked on Bioedit (v. 7.5, Hall 1999). The length-heterogeneous regions for which nucleotide homology could not be guaranteed and the regions containing indels in the majority of sequences were removed prior to the analyses, using a very conservative criterion of retaining only unambiguously conserved blocks. The final alignments contained 1425 sites (out of 3365 nucleotides) for SSU and 2271 sites (out of 6847 nucleotides) for LSU. For accession numbers see additional table in the electronic supplementary material. A phylogeny was inferred from each of these two alignments and patristic distances to the outgroups (by ML with Treepuzzle) were calculated. These distances were used to select, for each gene, the shortest branched representatives of each phylum and these were reassembled in two alignments (SSU and LSU), both with 104 representatives, to be used in the subsequent analyses.
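The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `shortest_branched`, the taxon labels and the `per_phylum` parameter are hypothetical (the text does not state how many representatives were kept per phylum), and the distances are assumed to have been computed beforehand by an external program.

```python
from collections import defaultdict


def shortest_branched(distances, phylum_of, per_phylum=2):
    """Keep, for each phylum, the taxa closest to the outgroup.

    distances  -- dict: taxon name -> patristic distance to the outgroup
    phylum_of  -- dict: taxon name -> phylum name
    per_phylum -- how many representatives to keep (an assumption,
                  not a value given in the text)
    """
    by_phylum = defaultdict(list)
    for taxon, dist in distances.items():
        by_phylum[phylum_of[taxon]].append((dist, taxon))
    # Sorting (distance, name) pairs puts the shortest branches first.
    return {
        phylum: [taxon for _, taxon in sorted(members)[:per_phylum]]
        for phylum, members in by_phylum.items()
    }


# Toy example with made-up labels and distances.
toy_distances = {"acoel_A": 0.45, "acoel_B": 0.38, "acoel_C": 0.52,
                 "mollusc_A": 0.12, "mollusc_B": 0.20}
toy_phyla = {"acoel_A": "Acoela", "acoel_B": "Acoela", "acoel_C": "Acoela",
             "mollusc_A": "Mollusca", "mollusc_B": "Mollusca"}
selected = shortest_branched(toy_distances, toy_phyla, per_phylum=2)
```

Applying such a filter separately to the SSU and LSU distance tables would mirror the per-gene selection described in the text.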
The SSU and LSU datasets with 104 representatives (28 bilaterian phyla and the outgroup) resulting from the previous step were merged into a combined dataset (hereafter the All-set). Whenever possible, SSU and LSU sequences came from the same species (additional table in the electronic supplementary material). For those representatives lacking LSU sequences, the LSU positions were filled with Ns. In the Chaetognatha, the only LSU representative available was combined with the two chosen SSUs.
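A minimal sketch of this concatenation, assuming each alignment is held as a dictionary of equal-length strings keyed by taxon name (the function name `concatenate` and the toy sequences are hypothetical):

```python
def concatenate(ssu, lsu, lsu_length):
    """Join aligned SSU and LSU rows taxon by taxon.

    Taxa that have no LSU sequence receive a run of 'N' characters of
    length `lsu_length`, so every row of the combined matrix has the
    same number of columns.
    """
    combined = {}
    for taxon, ssu_seq in ssu.items():
        lsu_seq = lsu.get(taxon, "N" * lsu_length)
        if len(lsu_seq) != lsu_length:
            raise ValueError(f"LSU row for {taxon} has unexpected length")
        combined[taxon] = ssu_seq + lsu_seq
    return combined


# Toy example: taxon "b" has no LSU sequence.
toy_ssu = {"a": "ACGT", "b": "AC-T"}
toy_lsu = {"a": "GGC"}
matrix = concatenate(toy_ssu, toy_lsu, lsu_length=3)
```

Coding the missing gene as Ns keeps the taxon in the matrix while contributing no information for those columns.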
The phylogenetic analyses with the All-set were used to detect groups showing high rates of substitutions (considered fast evolving when the patristic distance to the outgroup was above 0.3, as calculated by ML with Treepuzzle) or presenting dubious status such as the Gastrotricha polyphyly. These groups (from now on dubbed problematic groups) were selected to produce five different subsamples in order to examine more accurately the position of such groups when the others were not present, hence avoiding interactions among them. These clades were removed from the All-set and five subsets were built adding only one of these groups at a time: ‘Acoelomorpha’ (Acoel-set), Gnathifera (Gnat-set, Gnathostomulida lacks LSU), Bryozoa (Bryo-set), Gastrotricha (Gast-set, Gastrotricha lacks LSU), and Chaetognatha (Chaet-set). With these compartmentalized analyses, we also wanted to test the effect of the problematic groups on the general topology.
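The compartmentalization can be illustrated with simple set operations. The sketch below assumes taxa are identified by name and that membership of each problematic group is known; the function names and toy labels are hypothetical, and only the 0.3 threshold is taken from the text.

```python
FAST_THRESHOLD = 0.3  # patristic distance to the outgroup, as given in the text


def fast_evolving(distances, threshold=FAST_THRESHOLD):
    """Return the taxa whose distance to the outgroup exceeds the threshold."""
    return {taxon for taxon, dist in distances.items() if dist > threshold}


def compartmentalize(all_taxa, problematic):
    """Build the reduced datasets from the full taxon list.

    all_taxa    -- iterable of every taxon in the full dataset
    problematic -- dict: subset name -> taxa of one problematic group
    Returns the taxon set with every problematic group removed, plus one
    subset per group in which only that group has been added back.
    """
    removed = set().union(*problematic.values())
    basic = set(all_taxa) - removed
    subsets = {name: basic | set(members)
               for name, members in problematic.items()}
    return basic, subsets


# Toy example with made-up taxon labels.
toy_taxa = {"t1", "t2", "ac1", "ch1"}
toy_groups = {"Acoel-set": {"ac1"}, "Chaet-set": {"ch1"}}
basic, subsets = compartmentalize(toy_taxa, toy_groups)
flagged = fast_evolving({"x": 0.31, "y": 0.30, "z": 0.10})
```

Each subset therefore differs from the others only in which single problematic group it contains, which is what allows their positions to be compared without mutual interference.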
The Basic-set simultaneously excluded the five problematic groups defined in the previous step. This dataset comprised 88 sequences from SSU and 87 from LSU (which lacks Micrognathozoa representatives) for 22 bilaterian phyla.
The Akaike information criterion was used in ModelTest (v. 3.6, Posada & Crandall 1998) to determine the evolutionary model best fitting each dataset. The specified model (GTR+Γ+I) was applied in all the algorithms where it was available. BI trees were inferred with a parallelized version of MrBayes software (v. 3.1, Ronquist & Huelsenbeck 2003). BI analyses were performed with and without partitioning the dataset into the two ribosomal genes (in the former case, unlinking the estimation of Statefreq, revmat, Pinvar and shape parameters for each partition) and with and without the covarion model. In all the cases, 3000000 generations were run in two independent analyses with a sample frequency of 1000, allowing the two runs to converge onto the stationary distribution. To obtain the consensus tree and the BI supports, 1000000 generations were removed to avoid including trees sampled before likelihood values had reached a plateau. RaxML (Stamatakis 2006), Treefinder (Jobb 2007) and PhyMl (Guindon & Gascuel 2003) were used to infer ML trees, with 1000 bootstrap replicates and the GTR+Γ+I model; in RaxML, a random topology was used as a starting tree and the support values were obtained with the Rapid Bootstrap algorithm. Neighbour joining trees were estimated using Mega with 1000 bootstrap replicates using the Kimura two-parameter model and pairwise-deletion option.
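The number of trees contributing to each consensus follows from the run settings above. The short calculation below is a sketch that assumes samples taken at or before the end of the burn-in are all discarded; programs differ in whether the starting state is counted, so the exact figure may be off by one per run.

```python
def retained_samples(generations, sample_every, burnin_generations):
    """Count the samples taken strictly after the burn-in period."""
    if burnin_generations > generations:
        raise ValueError("burn-in cannot exceed the run length")
    return (generations - burnin_generations) // sample_every


# Settings reported in the text: 3 000 000 generations, one sample every
# 1000 generations, the first 1 000 000 generations discarded, two runs.
per_run = retained_samples(3_000_000, 1000, 1_000_000)
both_runs = 2 * per_run
```

Under this convention each run contributes 2000 sampled trees, or 4000 across the two independent runs.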
Competing topologies were evaluated for different datasets. Alternative topologies were based on previous morphological or molecular studies (indicated in the footnote) or were variations based on our analyses (table 1). The alternative trees were constructed using Treeview (v. 1.6.6, Page 1996), and PAUP (Swofford 2000) was used to calculate the site likelihoods for all trees and prepare the input dataset for Consel. Consel (v. 0.1i, Shimodaira & Hasegawa 2001) was run to perform the approximately unbiased test (AU; RELL; 1000 replicates; Shimodaira 2002).
NJ, Treefinder and PhyMl gave similar results, in many cases showing clear differences when compared with BI and RaxML trees. As a general trend, the former three methods (NJ, Treefinder and PhyMl) produced some groupings that contradict widely accepted clades (see examples and a discussion in figure 3 in the electronic supplementary material), mostly showing a tendency to group long branches together. A distance-based method such as NJ was expected to be prone to artefactual groupings, but to our surprise, similar results were found for many PhyMl and Treefinder topologies. Both programs start the heuristic search with an NJ-inferred tree, which could trap the search in local optima close to the NJ topology, pointing to inefficiency in the heuristic search. Given the worries about artefacts affecting these methods, we relied on RaxML as well as BI results (from all datasets), together with comparison of topologies, to define which clades were robustly recovered in our analyses.
BI and RaxML phylogenies agreed for all the datasets analysed, recovering the same basic topology across all datasets. No differences were seen in the topologies recovered by BI when covarion was used or when BI estimates were unlinked for the SSU and LSU partitions of the matrix. Figure 1 shows the BI and RaxML topology obtained from the All-set; the problematic groups are boxed (see §2a(iii); further analysed in figure 4 in the electronic supplementary material). Figure 4 in the electronic supplementary material (a) to (e) shows the trees obtained when the subsets including only one of the problematic groups were analysed. Finally, another dataset excluding all the problematic taxa (Basic-set, figure 4f in the electronic supplementary material) was used to test the effect on the support values when all the problematic groups were excluded.
The overall topology of the tree was consistent between the All-set (figure 1), the subsets and Basic-set (figure 4 in the electronic supplementary material). However, most of the nodal supports increased in the subsets and even more so in the Basic-set (except for the Deuterostomia), as shown in table 1. The fact that the supports did not decrease when long branches were removed clearly indicates that high supports in the All-set are not a consequence of an LBA artefact misleading the method. The position of the problematic groups in the All-set tree is consistent with their position in the subset phylogenies (compare figure 1 with figure 4 in the electronic supplementary material), although again the subsets showed notably higher support for these groups (table 1).
For each dataset, the best tree was statistically compared against alternative trees (table 2). Concerning the subsets, all the alternative topologies tested were significantly worse than the original tree for all the sets with two exceptions: (i) the test based on the Gast-set (hypotheses 9–11) rejects the original polyphyletic Gastrotricha in favour of their monophyly, despite the fact that the former is found in BI and ML trees, and (ii) the hypothesis placing chaetognaths as a sister group to ecdysozoans (hypothesis 13) cannot be rejected. The All-set allowed the same alternative hypotheses tested in the previous datasets to be studied, as well as new ones. In general, the All-set allowed the rejection of fewer hypotheses than the previous analysis (table 2).
The analyses shown here represent, to our knowledge, the largest animal dataset of SSU and LSU sequences analysed to date using probabilistic methods. Overall, our results confirm that a combination of wide taxon sampling, the use of short-branched representatives and analyses that take into account the flaws of ribosomal genes still allows them to furnish new answers. ML and BI algorithms place long branches deep inside the ingroup, as clearly shown in the All-set (figure 1) and the subset analyses (figure 4 in the electronic supplementary material). If the LBA effect were active, the long branches would appear near the outgroup or close to one another. In our view, this suggests that LBA generally does not affect our results obtained with BI and ML.
Regarding the compartmentalized and the Basic-set analyses, the trees show topologies that are consistent with the All-set, although with higher support (table 1). Moreover, the topology comparison test (table 2) for subsets generally rejects the alternative hypotheses, while, in some cases, the same hypothesis cannot be rejected for the All-set. This could merely be an effect of the reduced taxon sampling or it could stem from the simultaneous presence of fast-evolving sequences in the All-set leading to homoplasy. Homoplasy, while not misleading the inference method, would reduce the proportion of sites supporting a node and would make the differences among alternative topologies non-significant. Taken together, these findings suggest that compartmentalized analyses are an adequate strategy to deal with simultaneous problematic groups in a phylogeny.
BI and ML results from all datasets, together with the comparison of topologies, were used to define which clades were robustly recovered in our analyses. These groups are summarized in the tree depicted in figure 2. For the first time, to our knowledge, in such a comprehensive SSU+LSU analysis, the monophyly of Protostomia is recovered, and remarkably Lophotrochozoa and Ecdysozoa also appear with high support on most of the subsets (figure 4 in the electronic supplementary material). The most noticeable improvement when compared with previous studies is the increase in resolution obtained within the Lophotrochozoa, mainly in the subset trees (with the exception of the Gnathifera dataset).
Although beyond the scope of the present paper, some of the relationships observed outside the lophotrochozoans are worthy of comment, such as the problematic behaviour of Ciona sequences that results in non-monophyletic Deuterostomia in some analyses, and especially the position of the Chaetognatha and the Acoelomorpha. To avoid an extremely long discussion, these commentaries have been placed in the electronic supplementary material.
The lophotrochozoans are of special interest because they include the greatest body plan diversity of the three main bilaterian superclades. In fact, in its original node-based definition, this group included the last common ancestor of annelids, molluscs and lophophorates and all their descendants (Halanych et al. 1995). In some later comprehensive analyses involving many phyla, Lophotrochozoa refers to an extended group including many other phyla (such as Gnathostomulida, Platyhelminthes, Rotifera, etc.), owing to the very basal Bryozoa position in the resulting phylogenies. However, since most of the phyla included do not fit the original definition of having either trochophora larvae (Trochozoa) or a lophophore (Lophophorata), and the composition of the Lophotrochozoa is in a state of flux due to the unsettled situation regarding the Bryozoa, some authors propose avoiding this name. Until a better name is agreed, we prefer to avoid introducing more confusion by using new names. Hence, we have used Lophotrochozoa sensu stricto to name the group resulting from applying the original node-based definition to our final tree (clades I and II), and we will refer to the extended assemblage (including Gnathostomulida, Gastrotricha and clade III) as Lophotrochozoa sensu lato.
Clade I receives high support in the analyses of all the datasets. It is an assemblage constituted by phyla with spiral cleavage (nemertines, annelids, molluscs, echiurans and sipunculans) and two lophophorate phyla with radial cleavage (brachiopods and phoronids). Although affinities among these groups have already been hinted at in previous studies (Zrzavy et al. 1998; Giribet et al. 2000; Peterson & Eernisse 2001; Winchell et al. 2002; Passamaneck & Halanych 2006), the internal phylogeny shown here was not recovered in any of them. The recent phylogenomic study based on 150 genes (Dunn et al. 2008) shows the same group with similar relationships and also points to the inclusion of brachiopods and phoronids. The most basal group in clade I is the Nemertea, which are known to bear a coelomic cavity (Turbeville et al. 1992) and the hox signatures of lophotrochozoans (de Rosa et al. 1999; Balavoine et al. 2002). Next to the nemerteans, we find a highly supported (Echiura+(Annelida+Sipunculida)) group. A close relationship between echiurans and annelids has been proposed on the basis of both morphology (Nielsen 1995; Hessling & Westheide 2002) and molecular data (McHugh 1997; Giribet et al. 2000; Peterson & Eernisse 2001; Mallatt & Winchell 2002; Bourlat et al. 2008). In turn, sipunculans also have developmental affinities to annelids (Clark 1969; Rice 1985), a relationship that is supported by recent mtDNA and multigenic studies (Boore & Staton 2002; Struck et al. 2007; Bourlat et al. 2008) but that has never been proposed in previous SSU and LSU studies (Mallatt & Winchell 2002; Passamaneck & Halanych 2006). The sister group to the annelid assemblage is a clade made up of Mollusca and Phoronida+Brachiopoda. The brachiopod–phoronid affinity has already been shown on the basis of SSU data (Cohen et al. 1998; Cohen 2000; Peterson & Eernisse 2001) and mitochondrial gene data (Stechmann & Schlegel 1999; Helfenbein & Boore 2004).
A former SSU+LSU analysis suggested the close relationship of brachiopods and phoronids to molluscs (Mallatt & Winchell 2002), a placement corroborated in our trees and suggested by paleontological evidence (Vinther & Nielsen 2005), while multigenic analyses have related brachiopods and phoronids to Nemertea (Bourlat et al. 2008; Dunn et al. 2008) or the Phoronida to a non-monophyletic mollusc group and brachiopods to nemertines (Bourlat et al. 2008).
In clade II, the Bryozoa cluster with Entoprocta+Cycliophora with maximum support and Platyhelminthes are the most basal phylum. Former SSU studies have already shown that Bryozoa are not closely related to lophophorates (Littlewood et al. 1998; Cohen 2000) and recent multigenic analyses linked them to Entoprocta (Hausdorf et al. 2007) or to nemertines and brachiopods (Bourlat et al. 2008). Cycliophora have also been related to entoprocts in morphological analyses (Funch & Kristensen 1995; Zrzavy et al. 1998; Sørensen et al. 2000) and in the most recent SSU+LSU study (Passamaneck & Halanych 2006), but SSU studies left their position open (Winnepenninckx et al. 1998; Giribet et al. 2000; Peterson & Eernisse 2001; Giribet et al. 2004). Therefore, independent molecular studies have suggested the relationship of bryozoans and cycliophorans with entoprocts, this study being the first to propose a strongly supported clade that groups them all together. Regarding Platyhelminthes (Catenulida+Rhabditophora), molecular phylogenies have shown them as basal lophotrochozoans (Ruiz-Trillo et al. 1999; Peterson & Eernisse 2001) or situated within the Platyzoa (Giribet et al. 2000; Passamaneck & Halanych 2006). In our tree, flatworms appear in an unprecedented position as a sister group to the Bryozoa+(Cycliophora+Entoprocta) clade, with high support in the Basic-set and Bryo-set (table 2). Although platyhelminths and cycliophorans share a negative trait, the acoelomate condition, this does not hold true for the pseudocoelomate bryozoans and entoprocts. No obvious morphological synapomorphies exist for clade II.
Gnathostomulida are the first branching phylum of the Lophotrochozoa s.l. in the BI and ML trees for All-set and Gnat-set (figure 4b in the electronic supplementary material). Unfortunately, this relationship never shows significant support, probably due to the absence of LSU sequences for this phylum. Previous molecular studies related them to Ecdysozoa (Littlewood et al. 1998; Zrzavy et al. 1998) or situated them within Platyzoa (close to rotifers, acanthocephalans and cycliophorans, Giribet et al. 2000), while morphology placed them close to rotifers and acanthocephalans forming the Gnathifera (Rieger & Tyler 1995; Ahlrichs 1997). Regarding the Gastrotricha, they appear to be polyphyletic, but their monophyly cannot be rejected by the comparison of topologies test. Gastrotrich SSU sequences have presented conflicting results in previous studies, showing them to be either a polyphyletic group within lophotrochozoans (Giribet et al. 2004; Manylov et al. 2004) or monophyletic in the most recent study (but with low support, Todaro et al. 2006). Their problematic nature, together with the lack of gastrotrich LSU in our dataset, may explain our failure to recover their monophyly. Despite their polyphyly, their clustering with the rest of the lophotrochozoans (excluding gnathostomulids) has maximum support (figure 4c and additional table in the electronic supplementary material) and any relationship to Ecdysozoa is rejected by the comparison of topologies (table 2).
Clade III includes Rotifera, Acanthocephala and Micrognathozoa. The relationship between Rotifera and Acanthocephala is solidly recovered in our trees despite the very long branches of acanthocephalans, a clade suggested by morphology (see examples in Schmidt-Rhaesa 2003) and SSU (Syndermata, Garey et al. 1996; Garey & Schmidt-Rhaesa 1998; Zrzavy et al. 1998). With regard to Micrognathozoa, morphology related them to gnathostomulids and rotifers (clade Gnathifera, based on homologous jaw elements, Kristensen & Funch 2000; Sørensen 2003), although recent molecular data are more ambiguous (Giribet et al. 2004). Our analyses clearly recover the clade (Micrognathozoa+(Rotifera+Acanthocephala)) with maximum support in the Gnat-set (figure 4b in the electronic supplementary material).
Overall, our phylogeny of the Protostomia and the comparison of topologies tests do not recover proposals such as Gnathifera (Gnathostomulida+Micrognathozoa (Rotifera+Acanthocephala); Ahlrichs 1997; Sørensen et al. 2000; Nielsen 2001), Cycloneuralia (sensu lato, Gastrotricha+Nematoida+Scalidophora; Schmidt-Rhaesa et al. 1998; Sørensen et al. 2000; Nielsen 2001; Peterson & Eernisse 2001; Zrzavy 2003), Neotrichozoa (Gastrotricha+Gnathostomulida; Zrzavy et al. 1998) and Platyzoa (Cavalier-Smith 1998; Giribet et al. 2000; Garey 2001; Passamaneck & Halanych 2006). However, if acoels are excluded from the Platyzoa definition (Giribet et al. 2000), the platyzoan representatives could be seen as a paraphyletic assemblage at the base of the Lophotrochozoa made up by Gnathostomulida (Gastrotricha+((Micrognathozoa (Rotifera+Acanthocephala))+Lophotrochozoa s.s.)). Hence, the characters that have been proposed as synapomorphies for the Platyzoa may be reconsidered as plesiomorphic states for the Lophotrochozoa s.l.
The results presented here (figure 2) have some interesting evolutionary implications. First, the paraphyletic branching of the acoels and nemertodermatids at the base of the bilaterians suggests that the last common ancestor of all bilaterians, however different from present-day acoels and nemertodermatids, was a small, benthic, acoelomate worm with an anterior concentration of nerve cells (primitive brain), a blind gut, mesoderm that forms the musculature and mesenchymal cells, and direct development. Second, the early branching of gnathostomulids within the lophotrochozoans agrees with their acoelomate nature and their presumed lack of a permanent anus, which may be plesiomorphies shared with acoelomorphs and diploblasts. Next to gnathostomulids branch the gastrotrichs, which are also acoelomate worm-like animals, but with a through-gut with anus. According to this scenario, gnathostomulids and gastrotrichs may be an intermediate state between an acoel-like ancestor and the more complex lophotrochozoans, as suggested by a recent study of mouth and anus evolution (Hejnol & Martindale 2008).
Finally, this phylogenetic scheme indicates that some morphological features that are classically considered good phylogenetic characters for the Metazoa, such as the presence and type of coelomic cavities or the type of cleavage (for example, the multiple appearance of radial cleavage in bilaterians, including the case of the radial Brachiozoa within a clade of spiralian animals), have appeared independently more than once.
To summarize, this study demonstrates that the combination of broad taxon sampling, short-branched sequences and the application of adequate methodologies to avoid LBA, together with careful compartmentalized analyses of problematic taxa, allows a phylogenetic hypothesis of the bilaterian animals to be inferred with better resolution than in previous similar studies. Furthermore, the vast taxonomic sampling available for ribosomal genes allowed us to test the position of some key clades that have been poorly examined for the new genetic markers. Altogether, these observations point to the fact that ribosomal RNA genes are still a reliable source for the study of deep divergences in the metazoan tree.
This research was supported by DGI-MEC grant BOS2002-02097 and CIRIT grant 2005SGR00578 to J.B. We are grateful to Iñaki Ruiz-Trillo for all the lively discussions and stimulating insights on the subject and to Sara Rojas for the animal drawings included in the final tree. We also wish to thank David Vicente (Barcelona Supercomputing Centre) and Alfred Gil (CESCA) for their kind help with installing programs and resolving problems during execution. M.R. and J.P. designed the study and performed the analyses. M.R., J.P. and J.B. prepared the manuscript. All three authors read and approved the final manuscript.
Sequences analysed in this study and their GenBank accession numbers
Results obtained with NJ, Treefinder and PhyML
Bayesian and RaxML topologies for the subsets
Discussion on the results for the non-lophotrochozoan clades