1.  Austro-Asiatic Tribes of Northeast India Provide Hitherto Missing Genetic Link between South and Southeast Asia 
PLoS ONE  2007;2(11):e1141.
Northeast India, the only region which currently forms a land bridge between the Indian subcontinent and Southeast Asia, has been proposed as an important corridor for the initial peopling of East Asia. Given that the Austro-Asiatic linguistic family is considered to be the oldest and is spoken by certain tribes in India, Northeast India and the whole of Southeast Asia, we expect that populations of this family from Northeast India should provide the signatures of a genetic link between Indian and Southeast Asian populations. In order to test this hypothesis, we analyzed mtDNA and Y-Chromosome SNP and STR data of the eight groups of the Austro-Asiatic Khasi from Northeast India and the neighboring Garo and compared them with those of other relevant Asian populations. The results suggest that the Austro-Asiatic Khasi tribes of Northeast India represent a genetic continuity between the populations of South and Southeast Asia, thereby supporting the view that Northeast India could have been a major corridor for the movement of populations from India to East/Southeast Asia.
PMCID: PMC2065843  PMID: 17989774
2.  Migration of Chadic speaking pastoralists within Africa based on population structure of Chad Basin and phylogeography of mitochondrial L3f haplogroup 
Chad Basin, lying within the bidirectional corridor of African Sahel, is one of the most populated places in Sub-Saharan Africa today. The origin of its settlement appears connected with Holocene climatic ameliorations (aquatic resources) that started ~10,000 years before present (YBP). Although both Nilo-Saharan and Niger-Congo language families are encountered here, the most diversified group is the Chadic branch belonging to the Afro-Asiatic language phylum. In this article, we investigate the proposed ancient migration of Chadic pastoralists from Eastern Africa based on linguistic data and test for genetic traces of this migration in extant Chadic speaking populations.
We performed whole mitochondrial genome sequencing of 16 L3f haplotypes, focused on clade L3f3 that occurs almost exclusively in Chadic speaking people living in the Chad Basin. These data supported the reconstruction of a L3f phylogenetic tree and calculation of times to the most recent common ancestor for all internal clades. A date ~8,000 YBP was estimated for the L3f3 sub-haplogroup, which is in good agreement with the supposed migration of Chadic speaking pastoralists and their linguistic differentiation from other Afro-Asiatic groups of East Africa. As a whole, the Afro-Asiatic language family presents low population structure, as 92.4% of mtDNA variation is found within populations and only 3.4% of variation can be attributed to diversity among language branches. The Chadic speaking populations form a relatively homogenous cluster, exhibiting lower diversification than the other Afro-Asiatic branches (Berber, Semitic and Cushitic).
The results of our study support an East African origin of mitochondrial L3f3 clade that is present almost exclusively within Chadic speaking people living in Chad Basin. Whole genome sequence-based dates show that the ancestral haplogroup L3f must have emerged soon after the Out-of-Africa migration (around 57,100 ± 9,400 YBP), but the "Chadic" L3f3 clade has much less internal variation, suggesting an expansion during the Holocene period about 8,000 ± 2,500 YBP. This time period in the Chad Basin is known to have been particularly favourable for the expansion of pastoralists coming from northeastern Africa, as suggested by archaeological, linguistic and climatic data.
PMCID: PMC2680838  PMID: 19309521
3.  Y-chromosome evidence suggests a common paternal heritage of Austro-Asiatic populations 
The Austro-Asiatic linguistic family, which is considered to be the oldest of all the families in India, has a substantial presence in Southeast Asia. However, the possibility of any genetic link among the linguistic sub-families of the Indian Austro-Asiatics on the one hand and between the Indian and the Southeast Asian Austro-Asiatics on the other has not been explored till now. Therefore, to trace the origin and historic expansion of Austro-Asiatic groups of India, we analysed Y-chromosome SNP and STR data of the 1222 individuals from 25 Indian populations, covering all the three branches of Austro-Asiatic tribes, viz. Mundari, Khasi-Khmuic and Mon-Khmer, along with the previously published data on 214 relevant populations from Asia and Oceania.
Our results suggest a strong paternal genetic link, not only among the subgroups of Indian Austro-Asiatic populations but also with those of Southeast Asia. However, a maternal link based on mtDNA is not evident. The results also indicate that the haplogroup O-M95 originated in the Indian Austro-Asiatic populations ~65,000 yrs BP (95% C.I. 25,442 – 132,230) and that their ancestors carried it further to Southeast Asia via the Northeast Indian corridor. Subsequently, in the process of expansion, the Mon-Khmer populations from Southeast Asia seem to have migrated and colonized the Andaman and Nicobar Islands at a much later point in time.
Our findings are consistent with the linguistic evidence, which suggests that the linguistic ancestors of the Austro-Asiatic populations have originated in India and then migrated to Southeast Asia.
PMCID: PMC1851701  PMID: 17389048
4.  Genetic Affinities of the Central Indian Tribal Populations 
PLoS ONE  2012;7(2):e32546.
The central Indian state of Madhya Pradesh is often called the ‘heart of India’ and has always been an important region functioning as a trinexus belt for three major language families (Indo-European, Dravidian and Austroasiatic). There are few detailed genetic studies of the populations inhabiting this region. Therefore, this study attempts an extensive characterization of the genetic ancestries of three tribal populations inhabiting this region, namely Bharia, Bhil and Sahariya, using haploid and diploid DNA markers.
Methodology/Principal Findings
Mitochondrial DNA analysis showed high diversity, including some of the older sublineages of M haplogroup and prominent R lineages in all the three tribes. Y-chromosomal biallelic markers revealed high frequency of Austroasiatic-specific M95-O2a haplogroup in Bharia and Sahariya, M82-H1a in Bhil and M17-R1a in Bhil and Sahariya. The results obtained by haploid as well as diploid genetic markers revealed strong genetic affinity of Bharia (a Dravidian speaking tribe) with the Austroasiatic (Munda) group. The gene flow from the Austroasiatic group is further confirmed by Y-STR haplotype sharing analysis, where we determined their founder haplotype from the North Munda speaking tribe, while the autosomal analysis was largely concordant with the haploid DNA results.
Bhil exhibited largely Indo-European specific ancestry, while Sahariya and Bharia showed an admixed genetic package of Indo-European and Austroasiatic populations. Hence, in a landscape like India, the linguistic label doesn't unequivocally follow the genetic footprints.
PMCID: PMC3290590  PMID: 22393414
5.  Genetic evidence supports linguistic affinity of Mlabri - a hunter-gatherer group in Thailand 
BMC Genetics  2010;11:18.
The Mlabri are a group of nomadic hunter-gatherers inhabiting the rural highlands of Thailand. Little is known about the origins of the Mlabri and linguistic evidence suggests that the present-day Mlabri language most likely arose from Tin, a Khmuic language in the Austro-Asiatic language family. This study aims to examine whether the genetic affinity of the Mlabri is consistent with this linguistic relationship, and to further explore the origins of this enigmatic population.
We conducted a genome-wide analysis of genetic variation using more than fifty thousand single nucleotide polymorphisms (SNPs) typed in thirteen population samples from Thailand, including the Mlabri, Htin and neighboring populations of the Northern Highlands, speaking Austro-Asiatic, Tai-Kadai and Hmong-Mien languages. The Mlabri population showed higher LD and lower haplotype diversity when compared with its neighboring populations. Both model-free and Bayesian model-based clustering analyses indicated a close genetic relationship between the Mlabri and the Htin, a group speaking a Tin language.
Our results strongly suggested that the Mlabri share more recent common ancestry with the Htin. We thus provided, to our knowledge, the first genetic evidence that supports the linguistic affinity of Mlabri, and this association between linguistic and genetic classifications could reflect the same past population processes.
PMCID: PMC2858090  PMID: 20302622
6.  Influence of language and ancestry on genetic structure of contiguous populations: A microsatellite based study on populations of Orissa 
BMC Genetics  2005;6:4.
We have examined genetic diversity at fifteen autosomal microsatellite loci in seven predominant populations of Orissa to decipher whether populations inhabiting the same geographic region can be differentiated on the basis of language or ancestry. The studied populations have diverse historical accounts of their origin and belong to two major ethnic groups and to different linguistic families. Caucasoid caste populations are speakers of Indo-European languages and comprise Brahmins, Khandayat, Karan and Gope, while the three Australoid tribal populations include two Austric-speaking groups, Juang and Saora, and a Dravidian-speaking population, Paroja. These divergent groups provide a varied substratum for understanding variation of genetic patterns in a geographical area resulting from differential admixture between migrant groups and aboriginals, and the influence of this admixture on population stratification.
The allele distribution pattern showed uniformity in the studied groups with approximately 81% genetic variability within populations. The coefficient of gene differentiation was found to be significantly higher in tribes (0.014) than caste groups (0.004). Genetic variance between the groups was 0.34% in both ethnic and linguistic clusters and statistically significant only in the ethnic apportionment. Although the populations were genetically close (FST = 0.010), the contemporary caste and tribal groups formed distinct clusters in both Principal-Component plot and Neighbor-Joining tree. In the phylogenetic tree, the Orissa Brahmins showed close affinity to populations of North India, while Khandayat and Gope clustered with the tribal groups, suggesting a possibility of their origin from indigenous people.
The extent of genetic differentiation in the contemporary caste and tribal groups of Orissa is highly significant and constitutes two distinct genetic clusters. Based on our observations, we suggest that since genetic distances and coefficient of gene differentiation were fairly small, the studied populations are indeed genetically similar and that the genetic structure of populations in a geographical region is primarily influenced by their ancestry and not by socio-cultural hierarchy or language. The scenario of genetic structure, however, might be different for other regions of the subcontinent where populations have more similar ethnic and linguistic backgrounds and there might be variations in the patterns of genomic and socio-cultural affinities in different geographical regions.
PMCID: PMC549189  PMID: 15694006
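The coefficient of gene differentiation quoted in the abstract above is commonly computed as Nei's G_ST = (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within subpopulations and H_T is the expected heterozygosity of the pooled population. The following is only a minimal single-locus sketch of that textbook formula, not the authors' pipeline: the function names are hypothetical, the allele frequencies are made up for illustration, subpopulations are weighted equally, and no sample-size correction is applied.

```python
from typing import Sequence


def expected_heterozygosity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def gst(subpop_freqs: Sequence[Sequence[float]]) -> float:
    """Nei's G_ST for one locus (equal subpopulation weights, no bias correction).

    subpop_freqs[i][a] is the frequency of allele a in subpopulation i.
    """
    k = len(subpop_freqs)
    n_alleles = len(subpop_freqs[0])
    # H_S: average within-subpopulation expected heterozygosity
    h_s = sum(expected_heterozygosity(f) for f in subpop_freqs) / k
    # H_T: expected heterozygosity computed from the mean allele frequencies
    mean_freqs = [sum(f[a] for f in subpop_freqs) / k for a in range(n_alleles)]
    h_t = expected_heterozygosity(mean_freqs)
    if h_t == 0.0:
        return 0.0  # locus is monomorphic everywhere; no differentiation to measure
    return (h_t - h_s) / h_t


# Illustrative (hypothetical) allele frequencies for two subpopulations
example = [[0.6, 0.4], [0.4, 0.6]]
print(round(gst(example), 4))
```

Values close to zero, such as the 0.004 to 0.014 range reported above, indicate that almost all of the diversity lies within rather than between groups. A multilocus estimate would typically average H_S and H_T across loci before taking the ratio.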
7.  The Phylogeography of Y-Chromosome Haplogroup H1a1a-M82 Reveals the Likely Indian Origin of the European Romani Populations 
PLoS ONE  2012;7(11):e48477.
Linguistic and genetic studies on Roma populations inhabiting Europe have unequivocally traced these populations to the Indian subcontinent. However, the exact parental population group and time of the out-of-India dispersal have remained disputed. In the absence of archaeological records and with only scanty historical documentation of the Roma, comparative linguistic studies were the first to identify their Indian origin. Recently, molecular studies on the basis of disease-causing mutations and haploid DNA markers (i.e. mtDNA and Y-chromosome) supported the linguistic view. The presence of Indian-specific Y-chromosome haplogroup H1a1a-M82 and mtDNA haplogroups M5a1, M18 and M35b among Roma has corroborated their South Asian origins and later admixture with Near Eastern and European populations. However, previous studies have left unanswered questions about the exact parental population groups in South Asia. Here we present a detailed phylogeographical study of Y-chromosomal haplogroup H1a1a-M82 in a data set of more than 10,000 global samples to discern a more precise ancestral source of European Romani populations. The phylogeographical patterns and diversity estimates indicate an early origin of this haplogroup in the Indian subcontinent and its further expansion to other regions. Tellingly, the short tandem repeat (STR) based network of H1a1a-M82 lineages displayed the closest connection of Romani haplotypes with the traditional scheduled caste and scheduled tribe population groups of northwestern India.
PMCID: PMC3509117  PMID: 23209554
8.  Population Differentiation of Southern Indian Male Lineages Correlates with Agricultural Expansions Predating the Caste System 
PLoS ONE  2012;7(11):e50269.
Previous studies that pooled Indian populations from a wide variety of geographical locations have reached contradictory conclusions about the processes of the establishment of the Varna caste system and its genetic impact on the origins and demographic histories of Indian populations. To further investigate these questions we took advantage of the fact that both Y chromosome and caste designation are paternally inherited, and genotyped 1,680 Y chromosomes representing 12 tribal and 19 non-tribal (caste) endogamous populations from the predominantly Dravidian-speaking Tamil Nadu state in the southernmost part of India. Tribes and castes were both characterized by an overwhelming proportion of putatively Indian autochthonous Y-chromosomal haplogroups (H-M69, F-M89, R1a1-M17, L1-M27, R2-M124, and C5-M356; 81% combined) with a shared genetic heritage dating back to the late Pleistocene (10–30 Kya), suggesting that more recent Holocene migrations from western Eurasia contributed <20% of the male lineages. We found strong evidence for genetic structure, associated primarily with the current mode of subsistence. Coalescence analysis suggested that the social stratification was established 4–6 Kya and there was little admixture during the last 3 Kya, implying a minimal genetic impact of the Varna (caste) system from the historically-documented Brahmin migrations into the area. In contrast, the overall Y-chromosomal patterns, the time depth of population diversifications and the period of differentiation were best explained by the emergence of agricultural technology in South Asia. These results highlight the utility of detailed local genetic studies within India, without prior assumptions about the importance of Varna rank status for population grouping, to obtain new insights into the relative influences of past demographic events for the population structure of the whole of modern India.
PMCID: PMC3508930  PMID: 23209694
9.  Y Chromosome Haplogroup Distribution in Indo-European Speaking Tribes of Gujarat, Western India 
PLoS ONE  2014;9(3):e90414.
The present study was carried out in the Indo-European speaking tribal population groups of Southern Gujarat, India to investigate and reconstruct their paternal population structure and population histories. The role of language, ethnicity and geography in determining the observed pattern of Y haplogroup clustering in the study populations was also examined. A set of 48 bi-allelic markers on the non-recombining region of Y chromosome (NRY) were analysed in 284 males representing nine Indo-European speaking tribal populations. The genetic structure of the populations revealed that none of these groups was overtly admixed or completely isolated. However, elevated haplogroup diversity and FST value point towards greater diversity and differentiation which suggests the possibility of early demographic expansion of the study groups. The phylogenetic analysis revealed 13 paternal lineages, of which six haplogroups: C5, H1a*, H2, J2, R1a1* and R2 accounted for a major portion of the Y chromosome diversity. The higher frequency of the six haplogroups and the pattern of clustering in the populations indicated overlapping of haplogroups with West and Central Asian populations. Other analyses undertaken on the population affiliations revealed that the Indo-European speaking populations along with the Dravidian speaking groups of southern India have an influence on the tribal groups of Gujarat. The vital role of geography in determining the distribution of Y lineages was also noticed. This implies that although language plays a vital role in determining the distribution of Y lineages, the present day linguistic affiliation of any population in India should be used with caution when reconstructing the demographic history of the country.
PMCID: PMC3948632  PMID: 24614885
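The haplogroup diversity mentioned in the abstract above is usually Nei's unbiased gene diversity, H = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of haplogroup i among n sampled chromosomes. The sketch below assumes that estimator; the function name and the sample labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter
from typing import Sequence


def haplogroup_diversity(labels: Sequence[str]) -> float:
    """Nei's unbiased gene diversity for one haplogroup label per sampled chromosome."""
    n = len(labels)
    if n < 2:
        raise ValueError("at least two samples are needed")
    counts = Counter(labels)
    # Sum of squared haplogroup frequencies (probability two draws match)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # n / (n - 1) corrects for the finite sample size
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample: four chromosomes split evenly between two haplogroups
sample = ["H1a", "H1a", "R2", "R2"]
print(round(haplogroup_diversity(sample), 4))
```

The estimator is 0 when every chromosome carries the same haplogroup and 1 when every sampled chromosome carries a different one.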
10.  Y-chromosomal variation in Sub-Saharan Africa: insights into the history of Niger-Congo groups 
Molecular biology and evolution  2010;28(3):1255-1269.
Technological and cultural innovations, as well as climate changes, are thought to have influenced the diffusion of major language phyla in sub-Saharan Africa. The most widespread and the richest in diversity is the Niger-Congo phylum, thought to have originated in West Africa ~10,000 years ago. The expansion of Bantu languages (a family within the Niger-Congo phylum) ~5,000 years ago represents a major event in the past demography of the continent. Many previous studies on Y chromosomal variation in Africa associated the Bantu expansion with haplogroup E1b1a (and sometimes its sub-lineage E1b1a7). However, the distribution of these two lineages extends far beyond the area occupied nowadays by Bantu speaking people, raising questions on the actual genetic structure behind this expansion. To address these issues, we directly genotyped 31 biallelic markers and 12 microsatellites on the Y chromosome in 1195 individuals of African ancestry focusing on areas that were previously poorly characterized (Botswana, Burkina Faso, D.R.C, and Zambia). With the inclusion of published data, we analyzed 2736 individuals from 26 groups representing all linguistic phyla and covering a large portion of Sub-Saharan Africa. Within the Niger-Congo phylum, we ascertain for the first time differences in haplogroup composition between Bantu and non-Bantu groups via two markers (U174 and U175) on the background of haplogroup E1b1a (and E1b1a7), which were directly genotyped in our samples and for which genotypes were inferred from published data using Linear Discriminant Analysis on STR haplotypes. No reduction in STR diversity levels was found across the Bantu groups, suggesting the absence of serial founder effects. In addition, the homogeneity of haplogroup composition and pattern of haplotype sharing between Western and Eastern Bantu groups suggest that their expansion throughout Sub-Saharan Africa reflects a rapid spread followed by backward and forward migrations. 
Overall, we found that linguistic affiliations played a notable role in shaping sub-Saharan African Y chromosomal diversity, although the impact of geography is clearly discernible.
PMCID: PMC3561512  PMID: 21109585
Human; Language; Geography; Migration; Y chromosome; Bantu
11.  Kinship Institutions and Sex Ratios in India 
Demography  2010;47(4):989-1012.
This article explores the relationship between kinship institutions and sex ratios in India at the turn of the twentieth century. Because kinship rules vary by caste, language, religion, and region, we construct sex ratios by these categories at the district level by using data from the 1901 Census of India for Punjab (North), Bengal (East), and Madras (South). We find that the male-to-female sex ratio varied positively with caste rank, fell as one moved from the North to the East and then to the South, was higher for Hindus than for Muslims, and was higher for northern Indo-Aryan speakers than for the southern Dravidian-speaking people. We argue that these systematic patterns in the data are consistent with variations in the institution of family, kinship, and inheritance.
PMCID: PMC3000033  PMID: 21308567
12.  Evolutionary Origin and Phylogeography of the Diploid Obligate Parthenogen Artemia parthenogenetica (Branchiopoda: Anostraca) 
PLoS ONE  2010;5(8):e11932.
Understanding the evolutionary origin and the phylogeographic patterns of asexual taxa can shed light on the origin and maintenance of sexual reproduction. We assessed the geographic origin, genetic diversity, and phylogeographic history of obligate parthenogen diploid Artemia parthenogenetica populations, a widespread halophilic crustacean.
Methodology/Principal Findings
We analysed a partial sequence of the Cytochrome c Oxidase Subunit I mitochondrial gene from an extensive set of localities (including Eurasia, Africa, and Australia), and examined their phylogeographic patterns and the phylogenetic relationships of diploid A. parthenogenetica and its closest sexual relatives. Populations displayed an extremely low level of mitochondrial genetic diversity, with one widespread haplotype shared by over 79% of individuals analysed. Phylogenetic and phylogeographic analyses indicated a multiple and recent evolutionary origin of diploid A. parthenogenetica, and strongly suggested that the geographic origin of parthenogenesis in Artemia was in Central Asia. Our results indicate that the maternal sexual ancestors of diploid A. parthenogenetica were an undescribed species from Kazakhstan and A. urmiana.
We found evidence for multiple origin of parthenogenesis in Central Asia. Our results indicated that, shortly after its origin, diploid A. parthenogenetica populations underwent a rapid range expansion from Central Asia towards the Mediterranean region, and probably to the rest of its current geographic distribution. This contrasts with the restricted geographic distribution, strong genetic structure, and regional endemism of sexual Artemia lineages and other passively dispersed sexual continental aquatic invertebrates. We hypothesize that diploid parthenogens might have reached their current distribution in historical times, with a range expansion possibly facilitated by an increased availability of suitable habitat provided by anthropogenic activities, such as the spread of solar saltworks, aided by their natural dispersal vectors (i.e., waterbirds).
PMCID: PMC2915914  PMID: 20694140
13.  Genetic affinities among the lower castes and tribal groups of India: inference from Y chromosome and mitochondrial DNA 
BMC Genetics  2006;7:42.
India is a country with enormous social and cultural diversity due to its positioning on the crossroads of many historic and pre-historic human migrations. The hierarchical caste system in the Hindu society dominates the social structure of the Indian populations. The origin of the caste system in India is a matter of debate with many linguists and anthropologists suggesting that it began with the arrival of Indo-European speakers from Central Asia about 3500 years ago. Previous genetic studies based on Indian populations failed to achieve a consensus in this regard. We analysed the Y-chromosome and mitochondrial DNA of three tribal populations of southern India, compared the results with available data from the Indian subcontinent and tried to reconstruct the evolutionary history of Indian caste and tribal populations.
No significant difference was observed in the mitochondrial DNA between Indian tribal and caste populations, except for the presence of a higher frequency of west Eurasian-specific haplogroups in the higher castes, mostly in the north western part of India. On the other hand, the study of the Indian Y lineages revealed distinct distribution patterns among caste and tribal populations. The paternal lineages of Indian lower castes showed significantly closer affinity to the tribal populations than to the upper castes. The frequencies of deep-rooted Y haplogroups such as M89, M52, and M95 were higher in the lower castes and tribes, compared to the upper castes.
The present study suggests that the vast majority (>98%) of the Indian maternal gene pool, consisting of Indo-European and Dravidian speakers, is genetically more or less uniform. Invasions after the late Pleistocene settlement might have been mostly male-mediated. However, Y-SNP data provide compelling genetic evidence for a tribal origin of the lower caste populations in the subcontinent. Lower caste groups might have originated with the hierarchical divisions that arose within the tribal groups with the spread of Neolithic agriculturalists, much earlier than the arrival of Aryan speakers. The Indo-Europeans established themselves as upper castes among this already developed caste-like class structure within the tribes.
PMCID: PMC1569435  PMID: 16893451
14.  Chloroplast DNA Phylogeography of Holy Basil (Ocimum tenuiflorum) in Indian Subcontinent 
The Scientific World Journal  2014;2014:847482.
Ocimum tenuiflorum L., holy basil “Tulsi”, is an important medicinal plant that has been grown and traditionally revered throughout the Indian subcontinent for thousands of years; however, the DNA sequence-based genetic diversity of this aromatic herb is not yet known. In this report, we present our studies on the phylogeography of this species using the trnL-trnF intergenic spacer of the plastid genome as the DNA barcode for isolates from the Indian subcontinent. Our pairwise distance analyses indicated that genetic heterogeneity of isolates remained quite low, with an overall mean nucleotide p-distance of 5 × 10⁻⁴. However, our sensitive phylogenetic analysis using a maximum likelihood framework was able to reveal subtle intraspecific molecular evolution of this species within the subcontinent. All isolates except those from North-Central India formed a distinct phylogenetic clade, notwithstanding low bootstrap support and collapse of the clade in Bayesian Inference. North-Central isolates occupied a more basal position compared to other isolates, which is suggestive of their evolutionarily primitive status. Indian isolates formed a monophyletic and well-supported clade within the O. tenuiflorum clade, which indicates a distinct haplotype. Given the vast geographical area of more than 3 million km² encompassing many exclusive biogeographical and ecological zones, the relatively low rate of evolution of this herb at this locus in India is particularly interesting.
PMCID: PMC3910118  PMID: 24523650
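The mean nucleotide p-distance reported in the abstract above is the average, over all pairs of aligned sequences, of the proportion of sites at which the two sequences differ. The sketch below illustrates that definition only; it assumes pre-aligned sequences of equal length and skips sites with a gap or N in either sequence (pairwise deletion), which may differ from the settings the authors used. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # pairwise deletion of gaps and ambiguous bases
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def mean_p_distance(seqs: Sequence[str]) -> float:
    """Average p-distance over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are needed")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (hypothetical): one isolate differs at a single site
toy = ["AAAA", "AAAT", "AAAA"]
print(round(mean_p_distance(toy), 4))
```

On this scale, a mean of 5 × 10⁻⁴ corresponds to roughly one differing site per 2,000 aligned positions per pair of isolates.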
15.  In the heartland of Eurasia: the multilocus genetic landscape of Central Asian populations 
Located in the Eurasian heartland, Central Asia has played a major role in both the early spread of modern humans out of Africa and the more recent settlements of differentiated populations across Eurasia. A detailed knowledge of the peopling of this vast region would therefore greatly improve our understanding of range expansions, colonizations and recurrent migrations, including the impact of the historical expansion of eastern nomadic groups that occurred in Central Asia. However, despite its presumed importance, little is known about the level and the distribution of genetic variation in this region. We genotyped 26 Indo-Iranian- and Turkic-speaking populations, belonging to six different ethnic groups, at 27 autosomal microsatellite loci. The analysis of genetic variation reveals that Central Asian diversity is mainly shaped by linguistic affiliation, with Turkic-speaking populations forming a cluster more closely related to East-Asian populations and Indo-Iranian speakers forming a cluster closer to Western Eurasians. The scattered position of Uzbeks across Turkic- and Indo-Iranian-speaking populations may reflect their origins from the union of different tribes. We propose that the complex genetic landscape of Central Asian populations results from the movements of eastern, Turkic-speaking groups during historical times, into a long-lasting group of settled populations, which may be represented nowadays by Tajiks and Turkmen. Contrary to what is generally thought, our results suggest that the recurrent expansions of eastern nomadic groups did not result in the complete replacement of local populations, but rather in partial admixture.
PMCID: PMC3025785  PMID: 20823912
admixture; Central Asia; ethnic groups; genetic diversity; microsatellites; population genetics
16.  Genetic diversity in India and the inference of Eurasian population expansion 
Genome Biology  2010;11(11):R113.
Genetic studies of populations from the Indian subcontinent are of great interest because of India's large population size, complex demographic history, and unique social structure. Despite recent large-scale efforts in discovering human genetic variation, India's vast reservoir of genetic diversity remains largely unexplored.
To analyze an unbiased sample of genetic diversity in India and to investigate human migration history in Eurasia, we resequenced one 100-kb ENCODE region in 92 samples collected from three castes and one tribal group from the state of Andhra Pradesh in south India. Analyses of the four Indian populations, along with eight HapMap populations (692 samples), showed that 30% of all SNPs in the south Indian populations are not seen in HapMap populations. Several Indian populations, such as the Yadava, Mala/Madiga, and Irula, have nucleotide diversity levels as high as those of HapMap African populations. Using unbiased allele-frequency spectra, we investigated the expansion of human populations into Eurasia. The divergence time estimates among the major population groups suggest that Eurasian populations in this study diverged from Africans during the same time frame (approximately 90 to 110 thousand years ago). The divergence among different Eurasian populations occurred more than 40,000 years after their divergence with Africans.
Our results show that Indian populations harbor large amounts of genetic variation that have not been surveyed adequately by public SNP discovery efforts. Our data also support a delayed expansion hypothesis in which an ancestral Eurasian founding population remained isolated long after the out-of-Africa diaspora, before expanding throughout Eurasia.
PMCID: PMC3156952  PMID: 21106085
17.  Comparative analysis of Panicum streak virus and Maize streak virus diversity, recombination patterns and phylogeography 
Virology Journal  2009;6:194.
Panicum streak virus (PanSV; Family Geminiviridae; Genus Mastrevirus) is a close relative of Maize streak virus (MSV), the most serious viral threat to maize production in Africa. PanSV and MSV have the same leafhopper vector species, largely overlapping natural host ranges and similar geographical distributions across Africa and its associated Indian Ocean Islands. Unlike MSV, however, PanSV has no known economic relevance.
Here we report on 16 new PanSV full genome sequences sampled throughout Africa and use these together with others in public databases to reveal that PanSV and MSV populations in general share very similar patterns of genetic exchange and geographically structured diversity. A potentially important difference between the species, however, is that the movement of MSV strains throughout Africa is apparently less constrained than that of PanSV strains. Interestingly, the MSV-A strain, which causes maize streak disease, is apparently the most mobile of all the PanSV and MSV strains investigated.
We therefore hypothesize that the generally increased mobility of MSV relative to other closely related species such as PanSV may have been an important evolutionary step in the eventual emergence of MSV-A as a serious agricultural pathogen.
The GenBank accession numbers for the sequences reported in this paper are GQ415386-GQ415401.
PMCID: PMC2777162  PMID: 19903330
18.  Phylogeography of the Crown-of-Thorns Starfish in the Indian Ocean 
PLoS ONE  2012;7(8):e43499.
Understanding the limits and population dynamics of closely related sibling species in the marine realm is particularly relevant in organisms that require management. The crown-of-thorns starfish Acanthaster planci, recently shown to be a species complex of at least four closely related species, is a coral predator infamous for its outbreaks that have devastated reefs throughout much of its Indo-Pacific distribution.
Methodology/Principal Findings
In this first Indian Ocean-wide genetic study of a marine organism we investigated the genetic structure and inferred the paleohistory of the two Indian Ocean sister-species of Acanthaster planci using mitochondrial DNA sequence analyses. We suggest that the first of two main diversification events led to the formation of a Southern and Northern Indian Ocean sister-species in the late Pliocene-early Pleistocene. The second led to the formation of two internal clades within each species around the onset of the last interglacial. The subsequent demographic history of the two lineages strongly differed, the Southern Indian Ocean sister-species showing a signature of recent population expansion and hardly any regional structure, whereas the Northern Indian Ocean sister-species apparently maintained a constant size with highly differentiated regional groupings that were asymmetrically connected by gene flow.
Past and present surface circulation patterns in conjunction with ocean primary productivity were identified as the processes most likely to have shaped the genetic structure between and within the two Indian Ocean lineages. This knowledge will help to understand the biological or ecological differences of the two sibling species and therefore aid in developing strategies to manage population outbreaks of this coral predator in the Indian Ocean.
PMCID: PMC3424128  PMID: 22927975
19.  Phylogeography of the ant Myrmica rubra and its inquiline social parasite 
Ecology and Evolution  2011;1(1):46-62.
Widely distributed Palearctic insects are ideal for studying phylogeographic patterns owing to their high potential to survive in many Pleistocene refugia and, after the glaciation, to recolonize vast, continuous areas. Nevertheless, such species have received little phylogeographic attention. Here, we investigated the Pleistocene refugia and subsequent postglacial colonization of the common, abundant, and widely distributed ant Myrmica rubra over most of its Palearctic area, using mitochondrial DNA (mtDNA). The western and eastern populations of M. rubra belonged predominantly to separate haplogroups, which formed a broad secondary contact zone in Central Europe. The distribution of genetic diversity and haplogroups implied that M. rubra survived the last glaciation in multiple refugia located over an extensive area from Iberia in the west to Siberia in the east, and colonized its present areas of distribution along several routes. The matrilineal genetic structure of M. rubra was probably formed during the last glaciation and subsequent postglacial expansion. Additionally, because M. rubra has two queen morphs, the obligately socially parasitic microgyne and its macrogyne host, we tested the suggested speciation of the parasite. Locally, the parasite and host usually belonged to the same haplogroup but differed in haplotype frequencies. This indicates that genetic differentiation between the morphs is a universal pattern and thus incipient, sympatric speciation of the parasite from its host is possible. If speciation is taking place, however, it is not yet visible as lineage sorting of the mtDNA between the morphs.
PMCID: PMC3287377  PMID: 22393482
Hymenoptera; inquilinism; Pleistocene glaciations; postglacial recolonization; social parasitism; speciation
20.  Genetic Structure of Tibeto-Burman Populations of Bangladesh: Evaluating the Gene Flow along the Sides of Bay-of-Bengal 
PLoS ONE  2013;8(10):e75064.
Human settlement and migrations along the sides of the Bay of Bengal have played a vital role in shaping the genetic landscape of Bangladesh, Eastern India and Southeast Asia. Bangladesh and Northeast India form the vital land bridge between South and Southeast Asia. To reconstruct the population history of this region and to see whether this diverse region acted geographically as a corridor or a barrier for human interaction between South Asia and Southeast Asia, we analyzed, for the first time, high-resolution uniparental (mtDNA and Y chromosome) and biparental autosomal genetic markers among aboriginal Bangladeshi tribes that currently speak Tibeto-Burman languages. All three studied populations (Chakma, Marma and Tripura from Bangladesh) showed strikingly high homogeneity among themselves and strong affinities to Northeast Indian Tibeto-Burman groups. However, they show substantially higher molecular diversity than Northeast Indian populations. Unlike in the Austroasiatic (Munda) speakers of India, we observed an equal role of males and females in shaping the Tibeto-Burman expansion in Southern Asia. Moreover, it is noteworthy that, in admixture proportions, the Tibeto-Burman (TB) populations of Bangladesh carry a substantially higher mainland Indian ancestry component than Northeast Indian Tibeto-Burmans. The largely similar expansion ages of two major paternal haplogroups (O2a and O3a3c) suggest that they arose before the differentiation of any language group and at approximately the same time. Contrary to the scenario proposed for the colonization of Northeast India, a male founder effect that occurred within the past 4,000 years, we suggest a significantly deeper colonization of this region. Overall, our extensive analysis revealed that the population history of South Asian Tibeto-Burman speakers is more complex than previously suggested.
PMCID: PMC3794028  PMID: 24130682
21.  Phylogeography and postglacial expansion of the endangered semi-aquatic mammal Galemys pyrenaicus 
Species with strict ecological requirements may provide new insights into the forces that shaped the geographic variation of genetic diversity. The Pyrenean desman, Galemys pyrenaicus, is a small semi-aquatic mammal that inhabits clean streams of the northern half of the Iberian Peninsula and is endangered in most of its geographic range, but its genetic structure is currently unknown. While the stringent ecological demands derived from its aquatic habitat might have caused a partition of the genetic diversity among river basins, Pleistocene glaciations would have generated a genetic pattern related to glacial refugia.
To study the relative importance of historical and ecological factors in the genetic structure of G. pyrenaicus, we used mitochondrial and intronic sequences of specimens covering most of the species range. We show, first, that the Pyrenean desman has very low levels of genetic diversity compared to other mammals. In addition, phylogenetic and dating analyses of the mitochondrial sequences reveal a strong phylogeographic structure of a Middle Pleistocene origin, suggesting that the main lineages arose during periods of glacial isolation. Furthermore, both the spatial distribution of nuclear and mitochondrial diversity and the results of species distribution modeling suggest the existence of a major glacial refugium in the northwestern part of the Iberian Peninsula. Finally, the main mitochondrial lineages show a striking parapatric distribution without any apparent exchange of mitochondrial haplotypes between the lineages that came into secondary contact (although with certain permeability to nuclear genes), indicating incomplete mixing after the post-glacial recolonization. On the other hand, when we analyzed the partition of the genetic diversity among river basins, the Pyrenean desman showed a lower than expected genetic differentiation among main rivers.
The analysis of mitochondrial and intronic markers in G. pyrenaicus showed the predominant effects of Pleistocene glaciations on the genetic structure of this species, while the distribution of the genetic diversity was not greatly influenced by the main river systems. These results and, particularly, the discovery of a marked phylogeographic structure, may have important implications for the conservation of the Pyrenean desman.
PMCID: PMC3682870  PMID: 23738626
Conservation genetics; Introns; Mammals; Mitochondrial genes; Nuclear genes; Pyrenean desman; Niche modeling; Iberian Peninsula; Endemism
22.  Bayesian phylogeography of the Arawak expansion in lowland South America 
Phylogenetic inference based on language is a vital tool for tracing the dynamics of human population expansions. The timescale of agriculture-based expansions around the world provides an informative amount of linguistic change ideal for reconstructing phylogeographies. Here we investigate the expansion of Arawak, one of the most widely dispersed language families in the Americas, scattered from the Antilles to Argentina. It has been suggested that Northwest Amazonia is the Arawak homeland based on the large number of diverse languages in the region. We generate language trees by coding cognates of basic vocabulary words for 60 Arawak languages and dialects to estimate the phylogenetic relationships among Arawak societies, while simultaneously implementing a relaxed random walk model to infer phylogeographic history. Estimates of the Arawak homeland exclude Northwest Amazonia and are bi-modal, with one potential homeland on the Atlantic seaboard and another more likely origin in Western Amazonia. Bayesian phylogeography better supports a Western Amazonian origin, and consequent dispersal to the Caribbean and across the lowlands. Importantly, the Arawak expansion carried with it not only language but also a number of cultural traits that contrast Arawak societies with other lowland cultures.
PMCID: PMC3136831  PMID: 21247954
Amazonian languages and cultures; population expansions; phylogenetic trees; ancient histories
23.  Evolutionary History of Helicobacter pylori Sequences Reflect Past Human Migrations in Southeast Asia 
PLoS ONE  2011;6(7):e22058.
The human population history in Southeast Asia was shaped by numerous migrations and population expansions. Their reconstruction based on archaeological, linguistic or human genetic data is often hampered by the limited number of informative polymorphisms in classical human genetic markers, such as the hypervariable regions of the mitochondrial DNA. Here, we analyse housekeeping gene sequences of the human stomach bacterium Helicobacter pylori from various countries in Southeast Asia and we provide evidence that H. pylori accompanied at least three ancient human migrations into this area: i) a migration from India introducing hpEurope bacteria into Thailand, Cambodia and Malaysia; ii) a migration of the ancestors of Austro-Asiatic speaking people into Vietnam and Cambodia carrying hspEAsia bacteria; and iii) a migration of the ancestors of the Thai people from Southern China into Thailand carrying H. pylori of population hpAsia2. Moreover, the H. pylori sequences reflect iv) the migrations of Chinese to Thailand and Malaysia within the last 200 years spreading hspEAsia strains, and v) migrations of Indians to Malaysia within the last 200 years distributing both hpAsia2 and hpEurope bacteria. The distribution of the bacterial populations seems to strongly influence the incidence of gastric cancer, as countries with predominantly hspEAsia isolates exhibit a high incidence of gastric cancer while the incidence is low in countries with a high proportion of hpAsia2 or hpEurope strains. In the future, the host range expansion of hpEurope strains among Asian populations, combined with human mobility, may have a significant impact on gastric cancer incidence in Asia.
PMCID: PMC3139604  PMID: 21818291
24.  Genomic view on the peopling of India 
India is known for its vast human diversity, consisting of more than four and a half thousand anthropologically well-defined populations. Each population differs in terms of language, culture, physical features and, most importantly, genetic architecture. The size of populations varies from a few hundred to millions. Based on the social structure, Indians are classified into various caste, tribe and religious groups. These social classifications are very rigid and have remained undisturbed by emerging urbanisation and cultural changes. The variable social customs, strictly endogamous marriage practices, long-term isolation and evolutionary forces have added immensely to the diversification of the Indian populations. These factors have also led to these populations acquiring a set of Indian-specific genetic variations responsible for various diseases in India. Interestingly, most of these variations are absent outside the Indian subcontinent. Thus, this review is focused on the peopling of India, the caste system, marriage practice and the resulting health and forensic implications.
PMCID: PMC3514343  PMID: 23020857
Admixture; caste; Indians; mtDNA; tribe; Y-chromosome
25.  Alpine Crossroads or Origin of Genetic Diversity? Comparative Phylogeography of Two Sympatric Microgastropod Species 
PLoS ONE  2012;7(5):e37089.
The Alpine Region, constituting the Alps and the Dinaric Alps, has played a major role in the formation of current patterns of biodiversity either as a contact zone of postglacial expanding lineages or as the origin of genetic diversity. In our study, we tested these hypotheses for two widespread, sympatric microgastropod taxa – Carychium minimum O.F. Müller, 1774 and Carychium tridentatum (Risso, 1826) (Gastropoda, Eupulmonata, Carychiidae) – by using COI sequence data and species potential distribution models analyzed in a statistical phylogeographical framework. Additionally, we examined disjunct transatlantic populations of those taxa from the Azores and North America. In general, both Carychium taxa demonstrate a genetic structure composed of several differentiated haplotype lineages most likely resulting from allopatric diversification in isolated refugial areas during the Pleistocene glacial periods. However, the genetic structure of Carychium minimum is more pronounced, which can be attributed to ecological constraints relating to habitat proximity to permanent bodies of water. For most of the Carychium lineages, the broader Alpine Region was identified as the likely origin of genetic diversity. Several lineages are endemic to the broader Alpine Region whereas a single lineage per species underwent a postglacial expansion to (re)colonize previously unsuitable habitats, e.g. in Northern Europe. The source populations of those expanding lineages can be traced back to the Eastern and Western Alps. Consequently, we identify the Alpine Region as a significant ‘hot-spot’ for the formation of genetic diversity within European Carychium lineages. Passive dispersal via anthropogenic means best explains the presence of transatlantic European Carychium populations on the Azores and in North America. We conclude that passive (anthropogenic) transport could mislead the interpretation of observed phylogeographical patterns in general.
PMCID: PMC3351404  PMID: 22606334

