Austroasiatic is the eighth largest language family in the world in terms of the number of native speakers (104 million) (Lewis 2009). As its name implies, it is spoken in southern parts of Asia: in Vietnam and Cambodia as the main official languages, and in India, Bangladesh, Nepal, Burma, Laos, Thailand, and Malaysia as the first language of many minority groups that are isolated from each other by speakers of other languages. The two major extant branches of the Austroasiatic language tree are Munda, in eastern, northeastern, and central India, and Khasi-Aslian, which stretches from Meghalaya in the northeast of the subcontinent to the Nicobars, the Malay Peninsula, and the Mekong delta in southeast Asia. Since the birth of historical linguistics in the 1640s, attempts have been made to explain the wide and continuous geographic spread of some language families, such as Indo-European, Uralic, and Bantu, in contrast to the patchier or more constrained distribution of others, for example, Basque and the Khoi-San languages. Models proposed to explain the success of a few language families rather than many range from those stressing pure demic diffusion to those stressing pure cultural diffusion, driven by some economic or technological advance, as the key mechanism of language spread. One prehistoric event that has been considered a plausible driver of both demographic and cultural spread is the shift from a hunter-gatherer to an agricultural mode of subsistence, which is thought to have occurred independently in only a few places in the world (Ammerman and Cavalli-Sforza 1984). However, the attempt to explain the success of the ten most widely spoken language families of the world in terms of the Neolithic demic diffusion model (Diamond and Bellwood 2003), that is, by linking the spread of languages, genes, and economy, has been challenged in almost every case (Richards et al. 2000; Fuller 2003; Ehret et al. 2004). The hypothesis that the spread of the Austroasiatic language family can be traced back to the rice cultivators of southeast Asia (Higham 2003; Bellwood 2005) is contested, but the view that early Austroasiatic speakers were in some way associated with rice agriculture remains prevalent among linguists.
The Higham–Bellwood model (Higham 2003; Bellwood 2005) regards the Indian Munda- and Khasi-Aslian–speaking populations, some of which are hunter-gatherers today, as Neolithic immigrants to India: regardless of their current lifestyle, they share rice-cultivation cognates with Khasi-Aslian–speaking populations of southeast Asia, and a single origin of rice cultivation, in China, has traditionally been assumed. However, as argued by Fuller (2007), the genetic evidence for independent domestications of the Oryza sativa indica and japonica cultivars suggests a plausible alternative scenario in which the homeland of the Austroasiatic family lies in India. If indica rice was indeed domesticated first in India, then its spread to southeast Asia may have been coupled with the spread of Austroasiatic speakers (Fuller 2007). However, the phylogenetic evidence from genes associated with rice domestication is equivocal: phylogenies of some functionally important genes continue to support the single-origin model (e.g., Jin et al. 2008; Tan et al. 2008). The conflicting evidence from different genes may be reconciled by a model in which domestication was a lengthy process extending back to, and even beyond, the Last Glacial Maximum, as opposed to the earlier view of a rapid transition that placed crop domestication at the Pleistocene/Holocene boundary (Allaby et al. 2008). According to current archaeological evidence, however, the shift to a lifestyle in which rice was an essential staple food occurred less than 7 thousand years ago (KYA) in China and even more recently in India (Fuller et al. 2009; Purugganan and Fuller 2009). In light of the archaeobotanical, linguistic, and rice genomic evidence, the differentiation of Austroasiatic languages into their major subgroups could therefore have taken place in either south or southeast Asia, with the split, or at least the last contact between the subgroups, probably more recent than 7 KYA.
Genetic studies of human populations of south and southeast Asia have so far been inconclusive about the two opposing models of the geographic origin of Austroasiatic-speaking peoples and about the timing of the split between the two major branches of the language family. The mitochondrial DNA (mtDNA) data available so far clearly distinguish Indian Munda-speaking from southeast Asian Khasi-Aslian–speaking groups, as each shares its mtDNA haplogroups with regional neighbors who speak languages other than Austroasiatic. Consistent with this pattern, the Khasi-Aslian–speaking Nicobarese carry almost exclusively East Asian–specific mtDNA (Thangaraj et al. 2005). Notably, Khasi speakers of Meghalaya state, the only Khasi-Aslian group of mainland India, show an admixed package of both Indian and East Asian mtDNA haplogroups. Overall, the mtDNA haplogroup distributions clearly separate Indian and southeast Asian Austroasiatic speakers; because the two groups share no lineages, this evidence is not informative about any shared phase of the evolutionary history of Munda and Khasi-Aslian–speaking populations. In contrast, Y chromosome haplogroup O2a occurs at high frequency among both Indian and southeast Asian Austroasiatic speakers and thus suggests some degree of shared ancestry (Kivisild et al. 2003). Because all other branches of haplogroup O are largely restricted to East Asia, and given the shallow time depth of Y short tandem repeat (STR) variation within Indian haplogroup O2a, some studies have inferred a recent (<10 KYA) entry of O2a into India from southeast Asia (Sahoo et al. 2006; Sengupta et al. 2006). On the one hand, the frequency of haplogroup O lineages in India correlates with language boundaries and cannot be explained by isolation by distance alone. On the other hand, the high diversity of mtDNA haplogroups in Munda speakers, together with an independent assessment of Y-STR diversity within haplogroup O2a in India that dates its origin to ~65 KYA, has been used to argue for a model in which Austroasiatic speakers descend directly from the initial settlers of India and subsequently dispersed to southeast Asia, possibly before the Last Glacial Maximum (Basu et al. 2003; Kumar et al. 2007; Chakravarti 2009). Arguably, the more recent (<10 KYA) estimates of the age of O2a variation in India could have been deflated by limited regional sampling. Note, however, that the 65 KYA date for haplogroup O2a in India is much older than the estimated ages of its ancestral haplogroups K and NO (Rootsi et al. 2007; Karafet et al. 2008), an apparent inconsistency, since a subclade cannot predate the clade from which it descends. Moreover, southeast Asian populations have been underrepresented in all previous studies, and no high-resolution autosomal evidence has been considered in these debates. The genetic origins of Austroasiatic-speaking populations therefore remain controversial.
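The competing O2a dates discussed above are derived from Y-STR diversity. As a rough illustration of how such a date is computed, and why it depends on both sampling and the assumed mutation rate, the Python sketch below implements a generic average-squared-distance (ASD) age estimate under a single-step mutation model, taking the per-locus modal repeat count as the founder haplotype. It is not the procedure of any of the cited studies: the mutation rate (6.9 × 10^-4 per locus per generation), the 25-year generation time, the modal-founder choice, and the toy haplotypes are all assumptions for illustration.

```python
from collections import Counter
from statistics import mean

# Illustrative effective Y-STR mutation rate per locus per generation (assumption).
ASSUMED_MU = 6.9e-4
# Assumed generation length in years (assumption).
ASSUMED_GENERATION_YEARS = 25


def modal_haplotype(haplotypes):
    """Per-locus most common repeat count, used here as a proxy for the founder."""
    n_loci = len(haplotypes[0])
    return [Counter(h[j] for h in haplotypes).most_common(1)[0][0] for j in range(n_loci)]


def asd_to_founder(haplotypes, founder):
    """Average squared repeat-count distance to the founder, averaged over loci."""
    per_locus = [
        mean((h[j] - founder[j]) ** 2 for h in haplotypes)
        for j in range(len(founder))
    ]
    return mean(per_locus)


def str_age_years(haplotypes, mu=ASSUMED_MU, generation_years=ASSUMED_GENERATION_YEARS):
    """Age in years: ASD / mu gives generations under single-step mutation."""
    founder = modal_haplotype(haplotypes)
    generations = asd_to_founder(haplotypes, founder) / mu
    return generations * generation_years


# Toy repeat counts at three loci (hypothetical, not real data).
sample = [[10, 12, 14], [11, 12, 14], [10, 12, 14], [10, 13, 14]]
print(round(asd_to_founder(sample, modal_haplotype(sample)), 4))  # 0.1667
```

Because the estimate is ASD divided by the mutation rate, halving the assumed rate doubles the date. If STR diversity is geographically structured, sampling a single region may also miss divergent lineages and yield a smaller ASD; in the toy data, keeping only the two haplotypes identical to the founder gives an age of zero.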
Table 2. mtDNA and Y Chromosome Haplogroup Profiles in South (S) and Southeast (SE) Asia by Population.
In this paper, we investigated the extent of population structure and admixture among Indian and southeast Asian Austroasiatic (AA) speakers, as recorded in their autosomal genomes, and combined the results with data from uniparental loci and from regional selection signatures, such as that at the EDAR gene. We genotyped 45 diverse Indian samples on Illumina HumanHap 610K chips, covering the three major language groups of India relevant to our study (22 Austroasiatic [19 Munda and 3 Khasi-Aslian], 19 Dravidian [Behar et al. 2010], and 4 Tibeto-Burman speakers), together with 15 Burmese samples from Myanmar. These data were combined with the global data set of Li et al. (2008), generated with Illumina HumanHap 650K chips, which includes, among others, a set of Pakistani populations as a proxy for the Indo-European speakers of south Asia and a sample of ten individuals from Cambodia, a predominantly Khmer-speaking country (for a full list of populations and sample sizes, see supplementary table S1, Supplementary Material online).
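Combining the 610K data with the 650K data set of Li et al. (2008) typically means restricting joint analyses to SNPs typed on both chips. The merging procedure is not described in this passage, so the following is only a generic Python sketch: it assumes each panel is available as a hypothetical mapping from SNP identifier to its two alleles, keeps SNPs whose alleles agree directly or after a strand flip, and conservatively drops strand-ambiguous A/T and C/G SNPs. The input format, function names, and example SNPs are all illustrative.

```python
# Watson-Crick complements for single-nucleotide alleles.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(alleles):
    """Return the allele set as it would read on the opposite strand."""
    return frozenset(COMPLEMENT[a] for a in alleles)


def is_strand_ambiguous(alleles):
    """A/T and C/G SNPs read the same on both strands, so a flip cannot be detected."""
    return frozenset(alleles) == complement(alleles)


def shared_snps(panel_a, panel_b):
    """SNP ids present in both panels whose alleles agree directly or after a strand flip.

    panel_a, panel_b: dicts mapping a SNP id to its two alleles (hypothetical format).
    """
    keep = []
    for snp in sorted(panel_a.keys() & panel_b.keys()):
        a, b = frozenset(panel_a[snp]), frozenset(panel_b[snp])
        if is_strand_ambiguous(a) or is_strand_ambiguous(b):
            continue  # drop A/T and C/G SNPs rather than guess the strand
        if a == b or a == complement(b):
            keep.append(snp)
    return keep


# Hypothetical allele tables for two chip panels (not the actual chip manifests).
panel_610k = {"rs1": {"A", "G"}, "rs2": {"A", "T"}, "rs3": {"C", "T"}, "rs4": {"A", "C"}}
panel_650k = {"rs1": {"A", "G"}, "rs2": {"A", "T"}, "rs3": {"G", "A"}, "rs5": {"C", "T"}}
print(shared_snps(panel_610k, panel_650k))  # ['rs1', 'rs3']
```

Dropping A/T and C/G SNPs costs some markers but avoids silently mismatching alleles when the strand cannot be inferred from the alleles alone.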