To settle the debate about the position of the root of placental mammals, we used the ENCODE consortium sequencing data, which cover 1% of the human genome and the orthologous genomic regions in other mammals [23]. We created two independent alignments: the first from CDSs, the second from CNCs. Both alignments were constructed so that every column contains at least one representative of each major mammalian group (Primates, Glires, Laurasiatheria, Xenarthra, Afrotheria, Marsupialia, and Monotremata). The final alignment of concatenated CDSs contained 204,786 base pairs (bp) from 218 genes, while the concatenated CNC alignment included 429,675 bp from all individual CNCs longer than 50 bp, more than twice the length of the CDS alignment. With this unprecedented amount of sequence information (at least 39 times more than in previous studies), we assessed mammalian phylogeny using Monodelphis (Metatheria) and Platypus (Monotremata) as out-groups.
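The column criterion described above (every column must contain at least one representative of each major group) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the procedure used to build the ENCODE alignments: the alignment is held as a dictionary of equal-length aligned strings, the `groups` mapping from taxon to major group is hypothetical, and `-` and `?` are assumed to mark missing data.

```python
from typing import Dict


def filter_columns(alignment: Dict[str, str], groups: Dict[str, str],
                   missing: str = "-?") -> Dict[str, str]:
    """Keep only columns in which every major group has at least one
    non-missing character.

    alignment: taxon name -> aligned sequence (all of equal length)
    groups:    taxon name -> major group label (hypothetical mapping)
    missing:   characters treated as gaps/missing data (assumption)
    """
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    required = {groups[t] for t in taxa}
    kept = []
    for i in range(length):
        # Groups represented by a real residue in column i.
        present = {groups[t] for t in taxa if alignment[t][i] not in missing}
        if required <= present:
            kept.append(i)
    return {t: "".join(alignment[t][i] for i in kept) for t in taxa}


# Toy, hypothetical data: the third column has no Glires residue.
toy = {"human": "A-GT", "chimp": "-CGT", "mouse": "AC-T"}
toy_groups = {"human": "Primates", "chimp": "Primates", "mouse": "Glires"}
print(filter_columns(toy, toy_groups))  # {'human': 'A-T', 'chimp': '-CT', 'mouse': 'ACT'}
```

A column is kept as long as any member of each group carries a residue, so a gap in one primate is tolerated when another primate fills it.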
The two independent datasets, CDS and CNC, were analyzed by maximum likelihood (ML) as implemented in PHYML, using the general time-reversible (GTR) + gamma (G) + proportion of invariable sites (I) model of sequence evolution selected with ModelTest [25]. Both datasets converged on the same highly supported topology. All major mammalian lineages known so far are recovered: Primates, Glires, Eurarchontoglires, Laurasiatheria, Boreoeutheria (B), Xenarthra (X), and Afrotheria (A), each with 100% bootstrap support in the trees from both phylogenetic markers. Moreover, our phylogenetic analyses provide a clear topological solution to the long-standing question of the position of the mammalian root: placental mammals are split into Afrotheria on one side and Exafroplacentalia on the other, with high statistical support (CDS amino acids [aa]: 95% bootstrap proportion [BP]; CDS nucleotides: 88% BP; CNC: 73% BP). The CNC-based phylogeny fully corroborates the CDS-based phylogeny, and the agreement between these two non-overlapping datasets provides additional support for settling the debate over the position of the placental root.
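Bootstrap proportions such as those quoted above are obtained by resampling alignment columns with replacement, inferring a tree from each replicate, and counting how often a clade recurs. The sketch below shows only the resampling and counting steps; the tree-inference step (done with PHYML in this study) is omitted, and representing each replicate tree by its set of clades is an assumption made for illustration.

```python
import random
from typing import Dict, FrozenSet, Iterable, List, Set


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One nonparametric bootstrap replicate: draw as many columns as the
    alignment has, uniformly with replacement, and rebuild every row."""
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {t: "".join(alignment[t][i] for i in cols) for t in taxa}


def bootstrap_proportion(replicate_clades: List[Set[FrozenSet[str]]],
                         clade: Iterable[str]) -> float:
    """Percentage of replicate trees whose clade set contains `clade`.
    Each replicate tree is summarized by its set of clades (taxon frozensets)."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)
```

In practice one would generate, say, several hundred replicates with a seeded `random.Random`, infer a tree for each, extract its clades, and pass the list to `bootstrap_proportion`.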
The two remaining alternative hypotheses for the position of the placental root, (1) Epitheria (X [A,B]) and (2) Atlantogenata ([A,X] B), were compared with the ML topology using the approximately unbiased, Kishino-Hasegawa, and Shimodaira-Hasegawa topological tests as implemented in the CONSEL package [27]. ML analyses under the GTR + G + I model, performed with the baseml program of the PAML package [28] on the CDS_DNA dataset, show that neither alternative topology can be significantly rejected. With the CDS_AA dataset, the Atlantogenata hypothesis can be rejected (p < 0.00001) but the Epitheria hypothesis cannot. Among the individual datasets, only the CNC dataset rejects the Epitheria hypothesis (p < 0.01), while the two other topologies are almost equally likely. The combined dataset of CNC and CDS_DNA 1 and 2 (the first and second codon positions of CDS_DNA) places Afrotheria at the base and allows us to reject the Epitheria hypothesis but not the Atlantogenata hypothesis.
Topological Tests of Alternative Hypotheses of Placental Root
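The logic shared by these topological tests can be illustrated with the simplest of them, the Kishino-Hasegawa test in its normal approximation: the per-site log-likelihood differences between two topologies are summed and divided by the standard error of that sum. This is a hedged sketch, not the CONSEL implementation, which resamples site log-likelihoods and also computes the approximately unbiased and Shimodaira-Hasegawa tests. The plain KH test is only valid for topologies chosen a priori; when one topology is the ML tree estimated from the same data, the Shimodaira-Hasegawa and approximately unbiased tests are the appropriate choices. The per-site log-likelihood vectors are assumed to be exported from the ML program.

```python
import math
from statistics import stdev
from typing import Sequence, Tuple


def kh_test(site_lnl_a: Sequence[float],
            site_lnl_b: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Kishino-Hasegawa test, normal approximation.

    Inputs are per-site log-likelihoods of the same alignment under two fixed
    topologies. Returns (delta, z, p) with delta = lnL_a - lnL_b.
    """
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("per-site vectors must have the same length")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    delta = sum(diffs)
    # The variance of a sum of n site differences is n times the per-site variance.
    se = math.sqrt(len(diffs)) * stdev(diffs)
    if se == 0:
        raise ValueError("site differences have zero variance")
    z = delta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return delta, z, p
```

Swapping the two inputs flips the sign of `delta` and `z` but leaves `p` unchanged, as expected for a two-sided test.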
Taken together, these results suggest that the root of the placental lineage lies between Afrotheria and the group formed by all other Placentalia. Only the combined use of the CNC and CDS datasets (taken from 1% of the mammalian genome) provides a reasonable amount of evidence for the position of the root. We conclude that CNCs are powerful phylogenetic markers that can complement CDS markers in phylogenetic reconstruction.
By concatenating the CNC and CDS_DNA 1 and 2 datasets, we obtain a DNA matrix (592,027 bp, comprising about one fourth CDS data and three fourths CNC data) that is approximately four times larger than the CDS_DNA 1 and 2 alignment alone. With the combined data, the placental root between Afrotheria and Exafroplacentalia receives 98% BP, compared with 95% and 73% BP for the CDS_DNA 1 and 2 and CNC data alone, respectively. Thus, including the CNC data has a substantial impact on determining the most likely topology of Mammalia.
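A supermatrix like the combined CNC + CDS_DNA 1 and 2 matrix can be assembled by joining the blocks taxon by taxon while recording where each block starts and ends, so that the blocks can later be analyzed as separate partitions. In this minimal sketch every block is assumed to contain exactly the same taxa; real datasets with missing taxa would need gap padding, which is not shown.

```python
from typing import Dict, List, Tuple


def concatenate(*alignments: Dict[str, str]) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Join alignments taxon by taxon into one supermatrix.

    Returns the supermatrix and, for each input block, its 0-based half-open
    column range (start, end), usable later to define partitions.
    """
    taxa = list(alignments[0])
    for aln in alignments[1:]:
        if set(aln) != set(taxa):
            raise ValueError("every alignment must contain the same taxa")
    pieces: Dict[str, List[str]] = {t: [] for t in taxa}
    bounds: List[Tuple[int, int]] = []
    start = 0
    for aln in alignments:
        length = len(aln[taxa[0]])
        for t in taxa:
            if len(aln[t]) != length:
                raise ValueError("sequences within a block must have equal length")
            pieces[t].append(aln[t])
        bounds.append((start, start + length))
        start += length
    return {t: "".join(p) for t, p in pieces.items()}, bounds
```

For example, joining a 3-column CDS block and a 2-column CNC block gives a 5-column matrix with bounds `[(0, 3), (3, 5)]`.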
To test whether partitioning our concatenated dataset might affect our results, we calculated the ML scores of the three alternative topologies with PAML, treating CNC and CDS_DNA 1 and 2 as separate partitions. The topology with Afrotheria at the base had the best score, and the CONSEL tests rejected the Epitheria hypothesis but did not significantly reject the Atlantogenata hypothesis.
Third codon positions of CDSs are known to saturate over evolutionary time, possibly already at the timescale of the placental radiation. To test whether this affects our results, we first applied a GTR + G model partitioned by codon position, in which all parameters take separate values for first, second, and third positions, as implemented in baseml. We also excluded third positions from the analysis altogether (CDS_DNA 1 and 2). Both analyses strengthened our results. In all analyses, the Afrotheria clade remained at the base of Placentalia (with 95% BP when third positions were excluded), and the Atlantogenata hypothesis was rejected with p < 0.01 in both cases (partitioned model or exclusion of third positions). In neither analysis did the topological tests reject the Epitheria hypothesis, although with the CDS_DNA 1 and 2 data the p-value is close to the 5% significance threshold.
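Excluding third codon positions, as in the CDS_DNA 1 and 2 dataset, amounts to keeping two characters out of every three in an in-frame coding alignment. The sketch assumes each coding block starts at a first codon position and has a length divisible by three; the function names are hypothetical.

```python
from typing import Dict, Iterable


def codon_positions(seq: str, keep: Iterable[int] = (1, 2)) -> str:
    """Return the characters of an in-frame coding sequence at the requested
    1-based codon positions; keep=(1, 2) removes all third positions."""
    if len(seq) % 3:
        raise ValueError("coding alignment length must be a multiple of 3")
    wanted = set(keep)
    return "".join(c for i, c in enumerate(seq) if i % 3 + 1 in wanted)


def drop_third_positions(alignment: Dict[str, str]) -> Dict[str, str]:
    """Apply codon_positions(keep=(1, 2)) to every row of a coding alignment."""
    return {t: codon_positions(s, (1, 2)) for t, s in alignment.items()}
```

Gaps are carried along unchanged, so a gapped codon contributes two gap characters to the reduced row.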
Differences in base composition among taxa are especially important to address in studies where taxon sampling is sparse rather than dense. In the concatenated CNC + CDS_DNA 1 and 2 dataset, the chi-square test of base-composition homogeneity rejected stationarity. We therefore assessed the impact of base composition by using baseml to estimate the likelihoods of the three competing topologies under a nonstationary model (nhomo = 3 option) with TN93 + G + I. The topology with Afrotheria at the base obtained the best ML score, suggesting that our results are not sensitive to nonstationary base composition.
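The homogeneity test mentioned above is usually computed from a taxa-by-bases contingency table of nucleotide counts. The sketch below returns the chi-square statistic and its degrees of freedom; the p-value would then come from the upper tail of the chi-square distribution (for example via `scipy.stats.chi2.sf`). It is an illustration, not necessarily the exact implementation used in the study, and like the standard test it treats sequences as independent samples, ignoring their shared ancestry.

```python
from typing import Dict, Tuple


def composition_chi2(alignment: Dict[str, str], alphabet: str = "ACGT") -> Tuple[float, int]:
    """Chi-square statistic for homogeneity of base composition across taxa.

    Builds a taxa-by-bases table of counts (gaps and ambiguity codes are
    ignored) and compares each count with its expectation under a single
    shared composition. Returns (statistic, degrees_of_freedom).
    """
    counts = {t: [s.upper().count(b) for b in alphabet] for t, s in alignment.items()}
    row_tot = {t: sum(c) for t, c in counts.items()}
    col_tot = [sum(c[j] for c in counts.values()) for j in range(len(alphabet))]
    grand = sum(col_tot)
    stat = 0.0
    for t, c in counts.items():
        for j, observed in enumerate(c):
            expected = row_tot[t] * col_tot[j] / grand
            if expected > 0:  # skip empty rows/columns
                stat += (observed - expected) ** 2 / expected
    # Degrees of freedom count only non-empty rows and columns.
    used_rows = sum(1 for n in row_tot.values() if n > 0)
    used_cols = sum(1 for n in col_tot if n > 0)
    df = (used_rows - 1) * (used_cols - 1)
    return stat, df
```

Identical compositions give a statistic of zero; two taxa made entirely of A and entirely of C give a statistic of 8 with one degree of freedom.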
Distal out-groups may influence the branching order of the basal in-group lineages. One way to explore the potential impact of out-group sampling on the rooting of Placentalia is to delete either Monodelphis or Platypus. Deleting Monodelphis favors the topology with Afrotheria at the base for all datasets, whereas deleting Platypus favors the Epitheria hypothesis (CDS-derived datasets) or the Atlantogenata hypothesis (CNC dataset). We further tested the potential impact of the long Tenrec branch on the rooting of Placentalia by deleting Tenrec; depending on the dataset, all three alternative topologies were recovered. There is, therefore, no clear evidence that a long-branch attraction artifact favors the topology with Afrotheria at the base.
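The out-group and long-branch experiments above each start by deleting one taxon from the alignment; columns left with only missing data are then typically removed. A minimal sketch, assuming `-` and `?` mark missing data and that taxon labels such as "Platypus" are hypothetical dictionary keys; the subsequent ML reanalysis of each reduced alignment is not shown.

```python
from typing import Dict, Iterable


def drop_taxa(alignment: Dict[str, str], remove: Iterable[str],
              missing: str = "-?") -> Dict[str, str]:
    """Delete the given taxa, then drop columns left with only missing data."""
    removed = set(remove)
    kept = {t: s for t, s in alignment.items() if t not in removed}
    if not kept:
        raise ValueError("no taxa left after deletion")
    length = len(next(iter(kept.values())))
    # Keep a column if at least one remaining taxon has a real character there.
    cols = [i for i in range(length)
            if any(s[i] not in missing for s in kept.values())]
    return {t: "".join(s[i] for i in cols) for t, s in kept.items()}
```

Running it once per deleted taxon (for example `["Monodelphis"]`, `["Platypus"]`, `["Tenrec"]`) yields one reduced dataset per experiment.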
To test whether our phylogenetic searches were sensitive to the starting tree, we repeated the analyses of CDS_AA, CDS_DNA, and CNC from different starting trees (Afrotheria at the base, Epitheria, and Atlantogenata). In all cases, the tree with Afrotheria at the base was retrieved. To further check whether the PHYML nearest-neighbor interchange (NNI) search could find the best topology, we ran PHYML with subtree pruning and regrafting (SPR) on the three datasets CDS_DNA, CDS_AA, and CNC. All three analyses recovered the tree with Afrotheria at the base, with support of 92%, 95%, and 65%, respectively.
Our analysis demonstrated that mammalian CNCs contain abundant phylogenetic signal. To compare this signal with that of CDSs, we used two approaches: (1) jackknife analysis and (2) likelihood mapping (Figure S1). The jackknife analysis measures the relative amount of phylogenetic signal in the CNC and CDS datasets (both DNA and AA) by systematically reducing the length of the initial alignment and recording jackknife support. We generated a CNC alignment of the same length as the CDS_DNA alignment (205 kilobases [kb]), and for both we generated four progressively reduced datasets of 100 kb, 50 kb, 20 kb, and 10 kb (for CDS_AA, lengths refer to the corresponding DNA alignment). For almost all nodes of the tree, even 5% of the initial alignment (10 kb, or 3,300 aa) is sufficient to recover the relationship with a jackknife proportion (JP) near 100%. Exafroplacentalia is among the less stable nodes: in both the CNC and CDS datasets, its JP declines drastically as the alignment is shortened. Only with an alignment of around 100 kb (of either coding or noncoding sequence) can the basal Exafroplacentalia group be reconstructed with JP support between 60% and 90%. This result helps explain why most previous studies of the placental root could not give a conclusive answer: they used alignments of at most 16.4 kb [2], approximately six times shorter than needed according to our estimates.
Jackknife Support for Exafroplacentalia Depending on the Alignment Length
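The length-reduction experiment can be sketched as repeated sampling of columns without replacement at each target length; the JP of a node is then the percentage of replicate trees containing it, counted in the same way as a bootstrap proportion. The function name and the use of Python's `random.Random.sample` are assumptions for illustration, and tree inference on each replicate is not shown.

```python
import random
from typing import Dict


def jackknife_subsample(alignment: Dict[str, str], n_sites: int,
                        rng: random.Random) -> Dict[str, str]:
    """Draw n_sites columns without replacement, keeping their original order."""
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    if not 0 < n_sites <= length:
        raise ValueError("n_sites must be between 1 and the alignment length")
    cols = sorted(rng.sample(range(length), n_sites))
    return {t: "".join(alignment[t][i] for i in cols) for t in taxa}


# Target lengths from the text, in bp.
SIZES = (100_000, 50_000, 20_000, 10_000)
```

Calling it with each entry of `SIZES` and a seeded `random.Random` yields the reduced datasets. As a check on the figures quoted above, 10 kb is about 4.9% of the 205-kb alignment and corresponds to about 3,333 codons.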
Likelihood mapping [29] was used as a second test of the quality of the phylogenetic signal contained in CNCs compared with CDSs. For the 18-species datasets used in this study, all 3,060 possible quartets were analyzed phylogenetically. The proportion of resolved quartets indicates the amount of information in the dataset. For this test, the CDS and CNC alignments were 205 kb and 430 kb long, respectively.
The three datasets performed similarly: the CNC dataset resolved 99.97% of the 3,060 quartets (one quartet, 0.03%, remained unresolved), the CDS_DNA dataset left six quartets (0.2%) unresolved, and the CDS_AA dataset left one quartet (0.03%) unresolved.
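The quartet count follows from choosing 4 of the 18 taxa, C(18, 4) = 3,060, and the percentages above follow directly from the numbers of unresolved quartets. The sketch below only reproduces this arithmetic; it does not perform the likelihood evaluation of the three possible topologies for each quartet on which likelihood mapping is based.

```python
from math import comb
from typing import Tuple


def quartet_resolution(n_taxa: int, unresolved: int) -> Tuple[int, float, float]:
    """Number of quartets for n_taxa and the percentages resolved/unresolved."""
    n_quartets = comb(n_taxa, 4)
    unresolved_pct = 100.0 * unresolved / n_quartets
    return n_quartets, 100.0 - unresolved_pct, unresolved_pct


for label, k in (("CNC", 1), ("CDS_DNA", 6), ("CDS_AA", 1)):
    n, res, unres = quartet_resolution(18, k)
    print(f"{label}: {n} quartets, {res:.2f}% resolved, {unres:.2f}% unresolved")
```

With one unresolved quartet this gives 99.97% resolved; with six it gives 0.20% unresolved, matching the rounded values in the text.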
Overall, the results suggest that CNCs are as powerful phylogenetic markers as CDSs and could therefore be used alongside CDSs to maximize the statistical support of phylogenetic trees.
A recent study of retroposed LINE elements in mammals revealed a number of insertions supporting all the major mammalian clades that our analyses also support [10]. Two insertions (L1MB5) shared by Boreoeutheria and Afrotheria but absent from Xenarthra were found to support the Epitheria hypothesis (X [A,B]). Rare genomic changes, such as insertions of retroposed elements, are thought to be exceptionally useful markers because their phylogenetic information is free of ambiguity: independent insertions of retroposed elements of the same type at orthologous positions are unlikely [30]. However, little is known about how often retroposons are lost through small-scale deletions. Because extant Xenarthra radiated quite recently (during the Tertiary) from a stem lineage that persisted for 35 million y [31], the probability that one or more retroposons were deleted during this 35-million-y period is not negligible. This explanation may reconcile the findings of Kriegs et al. [10] with the phylogenomic results presented here. Another possible explanation is that the divergences among Afrotheria, Xenarthra, and Boreoeutheria occurred within a relatively short period (an estimated 5–10 million y [2]), so incomplete lineage sorting [32] may also explain the observation of Kriegs et al.
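The incomplete-lineage-sorting argument can be made quantitative with the standard three-taxon coalescent result: if the internode between two successive splits lasts T coalescent units (t generations divided by 2N for a diploid population of constant effective size N), a neutral locus fails to coalesce within it with probability e^(-T), after which each of the three rooted topologies is equally likely, giving a discordance probability of (2/3)e^(-T). The sketch below computes this quantity; the generation and population-size inputs are hypothetical, since no such estimates are given here, and treating a retroposon insertion as an ordinary neutral locus is a simplifying assumption.

```python
import math


def discordance_probability(t_generations: float, effective_size: float) -> float:
    """P(gene tree != species tree) for three lineages under the neutral
    coalescent with constant diploid effective size N: (2/3) * exp(-t / (2N))."""
    if t_generations < 0 or effective_size <= 0:
        raise ValueError("need t >= 0 and N > 0")
    T = t_generations / (2.0 * effective_size)  # internode length in coalescent units
    return (2.0 / 3.0) * math.exp(-T)
```

The probability approaches 2/3 as the internode shrinks toward zero, which is why short intervals between the Afrotheria, Xenarthra, and Boreoeutheria splits make conflicting markers plausible.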
Although our study provides strong evidence for rooting Placentalia between Afrotheria and Exafroplacentalia, the phylogenetic signal supporting this hypothesis in 1% of the mammalian genome is not yet fully conclusive. The placental root may finally be resolved by adding xenarthran and afrotherian species with short branches to genomic datasets. Previous phylogenetic studies suggest that short branches are expected for some xenarthrans: Choloepus spp. (two-toed sloth), Cyclopes didactylus (silky anteater), and Cabassous spp. (naked-tailed armadillo); and for some afrotherians: Dugong dugon (dugong), Chrysochloris spp. (golden mole), and Talpa spp. (mole).