The Indian subcontinent is presently inhabited by four major linguistic groups, viz. Austro-Asiatic, Dravidian, Indo-European and Tibeto-Burman, which might have entered at different points in time. Because the Austro-Asiatic family shows the greatest divergence in its nouns [1] and in some other linguistic features (for details, refer to the Discussion), it is considered to be the oldest of the four linguistic families [1]. It consists of three sub-families [3]: (1) Mundari, spoken by a number of tribes inhabiting the Chota-Nagpur plateau in Central and Eastern India; (2) Mon-Khmer, spoken by the Nicobarese and Shompen tribes of the Andaman and Nicobar Islands; and (3) Khasi-Khmuic (which linguists earlier considered part of Mon-Khmer), represented only by the Khasi subtribes of Northeast India (Fig. ). The Indian Mon-Khmer groups and, to a certain extent, the Khasi-Khmuic groups have physical features of East Asian populations [4], whereas the Mundari populations have features similar to those of populations of the Dravidian linguistic family. Further, whereas the Mundari sub-family is restricted to the Indian subcontinent, the languages of the other two Austro-Asiatic sub-families are spoken by a large number of populations in Southeast Asia (Fig. ). However, neither a possible genetic link among the three linguistic branches of Indian Austro-Asiatics nor one between the Indian and Southeast Asian Austro-Asiatics has been comprehensively explored so far, even though the Indian subcontinent is considered likely to have served as an important corridor for migrations to Southeast Asia.
Map showing the present-day distribution of Austro-Asiatic groups (modified from van Driem) and a schematic representation of the migration routes of the different Austro-Asiatic linguistic subgroups of India.
Two routes by which Austro-Asiatic groups possibly entered the Indian subcontinent have been suggested on the basis of linguistic, archaeological and classical genetic marker studies [4 and the references therein]: the first is migration from Africa to India via Central Asia, and the second is from Africa to Northeast Asia and then to the Indian subcontinent. Basu et al. [5] found a high frequency of haplogroup K-M9 among the Mundari populations and inferred that the Austro-Asiatic populations migrated from Africa to India via Central Asia. This inference is flawed, since the haplogroup is ubiquitous in Asia and has a substantial presence throughout East Asia. On the other hand, from the analysis of mtDNA 9-bp (9-base-pair) deletion/insertion (del/ins) polymorphisms, Thangaraj et al. [6] and Prasad et al. [7] reported only East Asian-specific mtDNA haplogroups in Nicobarese, while Roychoudury et al. [8] and Metspalu et al. [9] found only Indian-specific mtDNA haplogroups in Mundari populations. These inferences were, however, based on meager genetic evidence, and the studies included very few Austro-Asiatic populations (a maximum of three). Although Kumar et al. [10] analysed a large number of Austro-Asiatic populations and suggested distinct origins and migration histories for the Mundari, Khasi-Khmuic and Mon-Khmer populations, their analysis was based only on the mtDNA 9-bp del/ins polymorphism and its characterization.
We sampled almost all the Austro-Asiatic populations of India, covering the entire geographic and micro-linguistic heterogeneity inherent among them (Table and Fig. S1 [see Additional file 1]). This includes molecular genetic data on the Austro-Asiatic Khasi from Northeast India, a region considered an important corridor for human migrations to Southeast Asia. We present results based on the analysis of Y-chromosome SNP and STR data from Austro-Asiatic tribes, together with previously published data on 214 other relevant populations, and attempt to trace the origin and historic expansion of the Austro-Asiatic groups of India. Based on this evidence, we propose that haplogroup O-M95 originated in the Indian Austro-Asiatics, particularly among the Mundaris, whose ancestors carried this haplogroup further into Southeast Asia.
Geographical distribution, linguistic affiliations and sample sizes of the twenty-five studied populations