shows the trees inferred by our method on the simulated data and the two real data sets alongside their corresponding true simulated tree or the condensed Shriver and Kittles “gold standard” trees. shows the inferred tree produced by our model on the simulated data set. Based on the numbers of observed bipartitions explained by each model bipartition, the tree reconstruction correctly infers the key divergence events across the three populations when compared to . The method also picks up some additional splits below the division into three subgroups that represent substructure within the defined subgroups. The fractions of mutations assigned to each edge roughly correspond to the number of generations simulated on that edge, although the edge from the MRCA of all populations to the MRCA of populations one and two is assigned slightly fewer mutations, and the two edges below it somewhat more, than would be proportional to their divergence times.
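The comparison between per-edge mutation fractions and simulated divergence times can be sketched as follows. This is a minimal illustration rather than the code used for the analysis: the edge labels and all counts are hypothetical placeholders, and the only assumption is that expected mutation fractions are proportional to the number of generations on each edge.

```python
def edge_fractions(weight_per_edge):
    """Normalize per-edge weights (mutation counts or generations) to fractions."""
    total = sum(weight_per_edge.values())
    if total <= 0:
        raise ValueError("edge weights must sum to a positive value")
    return {edge: weight / total for edge, weight in weight_per_edge.items()}


def deviation_from_proportional(mutations_per_edge, generations_per_edge):
    """Observed minus expected mutation fraction for each edge.

    The expected fraction assumes mutations accumulate in proportion to the
    number of generations on the edge. Both inputs must use the same edge keys.
    """
    observed = edge_fractions(mutations_per_edge)
    expected = edge_fractions(generations_per_edge)
    return {edge: observed[edge] - expected[edge] for edge in observed}


# Hypothetical three-population tree; none of these numbers come from the study.
generations = {
    "root->MRCA(1,2)": 1000,
    "MRCA(1,2)->pop1": 1000,
    "MRCA(1,2)->pop2": 1000,
    "root->pop3": 2000,
}
mutations = {
    "root->MRCA(1,2)": 90,
    "MRCA(1,2)->pop1": 110,
    "MRCA(1,2)->pop2": 110,
    "root->pop3": 190,
}

deviations = deviation_from_proportional(mutations, generations)
for edge, delta in deviations.items():
    print(f"{edge}: {delta:+.3f}")
```

A negative value marks an edge that received fewer mutations than its share of generations would predict, and a positive value marks one that received more.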
shows the inferred tree from the HapMap data set. The tree reconstruction infers there to be an initial separation of the YRI (African) subpopulation from the others (CEU + JPT + CHB) followed by a subsequent separation of CEU (European) from JPT + CHB (East Asian). When collapsed to the same three populations (African, European, and East Asian), the gold standard tree () shows an identical structure. Furthermore, these results are consistent with many independent lines of evidence for the out-of-Africa hypothesis of human origins [26], [28], [29]. The edge weights indicate that a comparable number of generations elapsed between the divergence of African and non-African subgroups and the divergence of Asian from European subgroups, consistent with a single migration of both groups out of Africa long before the two separated from one another.
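One way to check that two trees share a topology after collapsing populations into continental groups is to compare their sets of non-trivial clades. The sketch below is illustrative only: the nested-tuple tree encoding, the `nontrivial_clades` helper, and the label map are assumptions introduced here, not part of the method described in this paper.

```python
def nontrivial_clades(tree, relabel=lambda name: name):
    """Return the non-trivial clades of a rooted tree given as nested tuples.

    Leaves are strings and internal nodes are tuples of children. `relabel`
    maps each leaf name to a (possibly coarser) group name, so populations
    that collapse to the same group merge into one leaf.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([relabel(node)])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = visit(tree)
    # Drop the root clade and any clade reduced to a single group by relabeling.
    return {clade for clade in found if 1 < len(clade) < len(all_leaves)}


# Hypothetical encodings of the topologies described in the text.
to_group = {"YRI": "African", "CEU": "European",
            "JPT": "East Asian", "CHB": "East Asian"}
inferred = ("YRI", ("CEU", ("JPT", "CHB")))
gold = ("African", ("European", "East Asian"))
alternative = (("African", "European"), "East Asian")

collapsed_inferred = nontrivial_clades(inferred, relabel=to_group.get)
gold_clades = nontrivial_clades(gold)

print(collapsed_inferred == gold_clades)               # same topology
print(nontrivial_clades(alternative) == gold_clades)   # different topology
```

Comparing clade sets ignores edge weights and the order of children, so it tests topology only.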
For the HGDP data set, the trees differ slightly from run to run, so we arbitrarily provide our first run, , as a representative. The tree infers the most ancient divergence to be that between Africans and the rest of the population groups, followed by a separation of Oceanian from other non-Africans, a separation of Asian + American from European + Middle Eastern (and a subset of Central South Asian), and then a more recent split of American from Asian. Finally, a small cluster of just two Middle Eastern individuals is inferred to have separated recently from the rest of the Middle Eastern, European, and subset of Central South Asian. The tree is nearly identical to that derived from Shriver and Kittles for the same population groups (). The only notable distinctions are that the gold standard tree has no equivalent to our purely Middle Eastern node; that the gold standard does not distinguish between the divergence times of Oceanian and other non-African populations from the African, while ours predicts a divergence of Oceanian and European/Asian well after the African/non-African split; and that the gold standard groups Central South Asian with East Asians while ours splits Central South Asian groups between European and East Asian subgroups (an interpretation supported by more recent analyses [30]). Our results are also consistent with the simpler picture provided by the HapMap data as well as with a general consensus in the field derived from many independent phylogenetic analyses [28], [31]. The relative edge weights provide a qualitatively similar picture to that of the HapMap data regarding relative divergence times of their common subpopulations, although the HGDP data suggest a proportionally longer gap between the divergence of African from non-African subgroups and further divergence between the non-African subgroups.
visualizes the corresponding cluster assignments, as described in Methods, in order to provide a secondary assessment of our method’s utility for the simpler subproblem of subpopulation inference. Note that STRUCTURE and our consensus tree method assign sequences to clusters while Spectrum assigns each sequence a distribution of ancestral haplotypes, accounting for the very different appearance of the Spectrum output.
The three methods produced essentially equivalent output for the simulated and HapMap data. For the simulated data (), all of the methods were able to separate the three population groups. For HapMap (), all three methods consistently identified YRI and CEU as distinct subpopulations but failed to separate CHB and JPT.
Results were more ambiguous for HGDP (). The consensus tree method reliably finds five of the seven populations, usually conflating Middle Eastern and European and failing to recognize Central South Asians, consistent with a similar outcome from He et al. [32]. STRUCTURE showed generally greater sensitivity but slightly worse consistency than our method, usually at least approximately finding six of the annotated seven population groups and having difficulty only in identifying Central South Asians as a distinct group. Spectrum showed a pattern similar to STRUCTURE but the individual ancestral profile seemed to be similar in several population subgroups. For example, the African subgroup seemed to have a similar ancestral profile to the European subgroup.
We further quantified the quality of the cluster inference from our method and STRUCTURE by converting the result to the most likely cluster assignment and computing VI scores and inconsistency scores. shows the VI and inconsistency scores of the three algorithms using inputs with different numbers of trees and SNPs. When examining the variation of information across different data sets, we can see increased accuracy for both STRUCTURE and the consensus tree method as we increase the number of trees or SNPs. When we compare the inconsistency scores, neither of the algorithms showed a clear trend with increasing numbers of trees or SNPs. When the number of trees or SNPs is large, however, our method typically becomes more consistent than STRUCTURE.
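For reference, the variation of information between two hard clusterings A and B can be computed as VI(A, B) = H(A|B) + H(B|A). The sketch below uses this standard definition with natural logarithms; any normalization applied in Methods is not reproduced here, and the function names and the row-wise membership format are assumptions made for illustration.

```python
from collections import Counter
from math import log


def hard_assignments(membership):
    """Convert per-individual membership probabilities to most likely clusters.

    `membership` is a list of rows, one per individual, each row holding the
    probabilities of belonging to each cluster (an assumed input format).
    """
    return [max(range(len(row)), key=row.__getitem__) for row in membership]


def variation_of_information(labels_a, labels_b):
    """Variation of information (in nats) between two hard clusterings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must label the same individuals")
    n = len(labels_a)
    if n == 0:
        return 0.0
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # Each joint cell contributes its share of H(A|B) + H(B|A).
        vi -= p_ab * (log(p_ab / p_a) + log(p_ab / p_b))
    return vi


# Toy example: cluster labels differ only by renaming, so VI is zero.
inferred = hard_assignments([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]])
annotated = ["pop2", "pop2", "pop1", "pop1"]
print(variation_of_information(inferred, annotated))
```

VI is zero when the two clusterings agree up to a relabeling of clusters and grows as they diverge, so lower scores indicate closer agreement with the annotated populations.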
We also measured the runtimes of the algorithms using 10, 100, 1,000, and 10,000 trees or the corresponding SNPs (). In all cases, our method consistently ran faster than both STRUCTURE and Spectrum, which both use similar Gibbs sampling approaches.
shows the consensus trees constructed using different sizes of data set subsampled from the simulated data. From the figure, we can see that the trees never infer substructure that cuts across the true groups, but that as the data set size increases, the method yields increasingly refined tree structures. This observation is what we would expect for the chosen MDL approach. The method identifies the separation of populations one and two with 100 trees but not with 10, and can discriminate substructure within the individual populations when provided 10,000 trees but not 1,000 or fewer. The number of mutations assigned to each edge increases with the number of observed trees, but the fraction of all mutations assigned to each edge remains nearly constant with increasing data set size.