We developed a model to study the historical relationship of Indian groups to those worldwide, based on the hypothesis that most groups can be approximated as a mixture of two ancestral populations followed by group-specific drift. To fit the model to the data, we computed the squared allele frequency difference between all pairs of groups, and chose parameters by minimizing the difference between observation and expectation (Note S4). The idea of fitting allele frequency differentiation to historical models was first explored by Cavalli-Sforza and Edwards35, and here we extend it to trees with mixture. This approach contrasts with the STRUCTURE algorithm, which fits data either without a tree36 or with a tree in which many groups split simultaneously from an ancestral population followed by mixture37. While STRUCTURE is accurate for estimating individual mixture proportions in recently mixed groups, it is not clear whether its estimates of ancient mixture are unbiased: it does not model hierarchical relationships among groups, which may lead to inaccurate modeling of allele frequencies in ancestral populations. By contrast, we use a more realistic tree model, and provide a test of fit.
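To make the fitting idea concrete, the following is a minimal sketch in Python and not the procedure of Note S4. It assumes a toy setting with three hypothetical groups `A`, `B` and `X`, where `X` is a mixture of `A` and `B` in proportions `alpha` and `1 - alpha` with no drift after mixture, and it ignores the sampling-noise corrections a real analysis would need. Under these assumptions the expected squared frequency differences are (1 - alpha)^2 F2(A,B) for the pair (A,X) and alpha^2 F2(A,B) for the pair (B,X), and `alpha` is chosen by a simple grid search that minimizes the squared gap between observed and expected values. All function names and frequencies are illustrative.

```python
from itertools import combinations


def f2(p, q):
    """Mean squared allele frequency difference between two groups."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def pairwise_f2(freqs):
    """F2 for every unordered pair of groups in a {name: [freq per SNP]} dict."""
    return {
        frozenset(pair): f2(freqs[pair[0]], freqs[pair[1]])
        for pair in combinations(sorted(freqs), 2)
    }


def expected_f2(alpha, f2_ab):
    """Toy expectation (assumption): X = alpha*A + (1 - alpha)*B, no later drift."""
    return {
        frozenset(("A", "B")): f2_ab,
        frozenset(("A", "X")): (1 - alpha) ** 2 * f2_ab,
        frozenset(("B", "X")): alpha ** 2 * f2_ab,
    }


def fit_alpha(observed, steps=1000):
    """Grid search for the alpha that minimizes squared (observed - expected)."""
    f2_ab = observed[frozenset(("A", "B"))]
    best_alpha, best_loss = None, float("inf")
    for i in range(steps + 1):
        alpha = i / steps
        expected = expected_f2(alpha, f2_ab)
        loss = sum((observed[k] - expected[k]) ** 2 for k in observed)
        if loss < best_loss:
            best_alpha, best_loss = alpha, loss
    return best_alpha, best_loss


# Made-up allele frequencies at four SNPs (illustrative only, not real data).
freqs = {
    "A": [0.1, 0.5, 0.9, 0.3],
    "B": [0.7, 0.2, 0.4, 0.8],
}
# Build X as a 25% / 75% mixture of A and B so the true alpha is known.
freqs["X"] = [0.25 * a + 0.75 * b for a, b in zip(freqs["A"], freqs["B"])]

observed = pairwise_f2(freqs)
alpha_hat, loss = fit_alpha(observed)
print(alpha_hat, loss)
```

A realistic fit would have more parameters (one drift length per branch as well as mixture proportions) and would weight each statistic by its standard error, but the structure is the same: tabulate pairwise statistics, write down their expectations under the model, and minimize the discrepancy.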
Applying our model-fitting procedure, we find that the tree (YRI,((CEU,ANI),(ASI,Onge))) provides an excellent fit to the data from Indian groups. In particular, when the Pathan, Vaish, Meghawal and Bhil are modeled as mixtures of ANI and ASI (Figure 4), the observed allele frequency differentiation statistics are all consistent with the theoretical expectation within three standard deviations (Note S4).
Figure 4. A model relating the history of Indian and non-Indian groups. Modeling the Pathan, Vaish, Meghawal and Bhil as mixtures of ANI and ASI, and relating them to non-Indians by the phylogenetic tree (YRI,((CEU,ANI),(ASI,Onge))), provides an excellent fit to …
Two features of the inferred history are of special interest. First, the ANI and CEU form a clade, and further analysis shows that the Adygei, a Caucasian group, are an outgroup (Note S4). Many Indian and European groups speak Indo-European languages, while the Adygei speak a Northwest Caucasian language. It is tempting to hypothesize that the population ancestral to ANI and CEU spoke “Proto-Indo-European”, which has been reconstructed as ancestral to both Sanskrit and European languages38, although we cannot be certain without a date for ANI-ASI mixture.
Second, our analysis shows that the Onge form a clade with the ASI (Note S4), which we verified by running the 4 Population Test on ((YRI,Papuan),(Dai,X)), and finding that the tree is consistent with the data when X=Onge (Z=1.7) but inconsistent for all Indian Cline groups (Z<-9) (Table S4). Previous mtDNA analyses suggested that the Onge do not share any maternal ancestry with groups outside India within the last ~48,000 years19,39. While they do share ancestry, through some rare haplogroups, with some Indian tribal populations within the last ~24,000 years39,40, this is consistent with our inferred Onge-ASI clade, as long as the gene flow predated the ASI-ANI mixture that later occurred on the mainland.
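The logic of a four-population test can be sketched as follows. This is a simplified stand-in written for illustration and not the exact statistic or weighting used in Note S4. For a proposed tree ((A,B),(C,D)) with no mixture, the allele frequency difference between A and B should be uncorrelated with the difference between C and D, so the average product of those differences is expected to be zero. The sketch assumes equal-sized blocks of contiguous SNPs and a delete-one-block jackknife for the standard error. The simulated frequencies and the names `f4`, `f4_zscore`, `pd_tree` and `pd_mixed` are all hypothetical.

```python
import math
import random


def f4(pa, pb, pc, pd):
    """Mean of (pA - pB) * (pC - pD) across SNPs; expected 0 under ((A,B),(C,D))."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)) / len(pa)


def f4_zscore(pa, pb, pc, pd, n_blocks=20):
    """Return (f4 estimate, Z score) using a delete-one-block jackknife."""
    products = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    n = len(products)
    total = sum(products)
    estimate = total / n

    # Recompute the statistic with each contiguous block of SNPs left out.
    leave_one_out = []
    for i in range(n_blocks):
        start, end = i * n // n_blocks, (i + 1) * n // n_blocks
        block_sum = sum(products[start:end])
        leave_one_out.append((total - block_sum) / (n - (end - start)))

    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    if variance == 0:
        raise ValueError("jackknife variance is zero; Z score is undefined")
    return estimate, estimate / math.sqrt(variance)


# Simulated toy frequencies (illustrative only, not real data).
rng = random.Random(0)
n_snps = 5000
pa = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
pb = [a + rng.uniform(-0.1, 0.1) for a in pa]    # B is a close relative of A
pc = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
pd_tree = [c + rng.uniform(-0.1, 0.1) for c in pc]        # D is a sister of C
pd_mixed = [0.5 * c + 0.5 * b for c, b in zip(pc, pb)]    # D has B-related ancestry

_, z_tree = f4_zscore(pa, pb, pc, pd_tree)
_, z_mixed = f4_zscore(pa, pb, pc, pd_mixed)
print(z_tree, z_mixed)
```

In this toy setting the tree-like case should give a Z score near zero, while the case in which D carries B-related ancestry should give a large positive Z score, mirroring how a large |Z| rejects the proposed tree. In real data the blocks would be contiguous genomic regions chosen to be long relative to linkage disequilibrium.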
We emphasize that “models” in population genetics should be treated with caution. While they provide an important framework for testing historical hypotheses, they are oversimplifications. For example, the true ancestral populations of India were probably not homogeneous, as we assume in our model, but instead were likely to have been formed by clusters of related groups that mixed at different times. However, modeling them as homogeneous fits the data and appears to capture meaningful features of history.