The northern region of the Indian subcontinent is a vast landscape interlaced by diverse ecologies, for example, the Gangetic Plain and the Himalayas. A great number of ethnic groups are found there, displaying a multitude of languages and cultures. The Tharu is one of the largest and most linguistically diverse of such groups, scattered across the Tarai region of Nepal and bordering Indian states. Their origins are uncertain. Hypotheses have been advanced postulating shared ancestry with Austroasiatic, or Tibeto-Burman-speaking populations as well as aboriginal roots in the Tarai. Several Tharu groups speak a variety of Indo-Aryan languages, but have traditionally been described by ethnographers as representing East Asian phenotype. Their ancestry and intra-population diversity has previously been tested only for haploid (mitochondrial DNA and Y-chromosome) markers in a small portion of the population. This study presents the first systematic genetic survey of the Tharu from both Nepal and two Indian states of Uttarakhand and Uttar Pradesh, using genome-wide SNPs and haploid markers. We show that the Tharu have dual genetic ancestry as up to one-half of their gene pool is of East Asian origin. Within the South Asian proportion of the Tharu genetic ancestry, we see vestiges of their common origin in the north of the South Asian Subcontinent manifested by mitochondrial DNA haplogroup M43.
The bacteria Yersinia pestis is the etiological agent of plague and has caused human pandemics with millions of deaths in historic times. How and when it originated remains contentious. Here, we report the oldest direct evidence of Yersinia pestis identified by ancient DNA in human teeth from Asia and Europe dating from 2,800 to 5,000 years ago. By sequencing the genomes, we find that these ancient plague strains are basal to all known Yersinia pestis. We find the origins of the Yersinia pestis lineage to be at least two times older than previous estimates. We also identify a temporal sequence of genetic changes that lead to increased virulence and the emergence of the bubonic plague. Our results show that plague infection was endemic in the human populations of Eurasia at least 3,000 years before any historical recordings of pandemics.
•Yersinia pestis was common across Eurasia in the Bronze Age•The most recent common ancestor of all Y. pestis was 5,783 years ago•The ymt gene was acquired before 951 cal BC, giving rise to transmission via fleas•Bronze Age Y. pestis was not capable of causing bubonic plague
The plague-causing bacteria Yersinia pestis infected humans in Bronze Age Eurasia, three millennia earlier than any historical records of plague, but only acquired the genetic changes making it a highly virulent, flea-borne bubonic strain ∼3,000 years ago.
In order to explore the diversity and selective signatures of duplication and deletion human copy number variants (CNVs), we sequenced 236 individuals from 125 distinct human populations. We observed that duplications exhibit fundamentally different population genetic and selective signatures than deletions and are more likely to be stratified between human populations. Through reconstruction of the ancestral human genome, we identify megabases of DNA lost in different human lineages and pinpoint large duplications that introgressed from the extinct Denisova lineage now found at high frequency exclusively in Oceanic populations. We find that the proportion of CNV base pairs to single nucleotide variant base pairs is greater among non-Africans than it is among African populations, but we conclude that this difference is likely due to unique aspects of non-African population history as opposed to differences in CNV load.
The Slavic branch of the Balto-Slavic sub-family of Indo-European languages underwent rapid divergence as a result of the spatial expansion of its speakers from Central-East Europe, in early medieval times. This expansion–mainly to East Europe and the northern Balkans–resulted in the incorporation of genetic components from numerous autochthonous populations into the Slavic gene pools. Here, we characterize genetic variation in all extant ethnic groups speaking Balto-Slavic languages by analyzing mitochondrial DNA (n = 6,876), Y-chromosomes (n = 6,079) and genome-wide SNP profiles (n = 296), within the context of other European populations. We also reassess the phylogeny of Slavic languages within the Balto-Slavic branch of Indo-European. We find that genetic distances among Balto-Slavic populations, based on autosomal and Y-chromosomal loci, show a high correlation (0.9) both with each other and with geography, but a slightly lower correlation (0.7) with mitochondrial DNA and linguistic affiliation. The data suggest that genetic diversity of the present-day Slavs was predominantly shaped in situ, and we detect two different substrata: ‘central-east European’ for West and East Slavs, and ‘south-east European’ for South Slavs. A pattern of distribution of segments identical by descent between groups of East-West and South Slavs suggests shared ancestry or a modest gene flow between those two groups, which might derive from the historic spread of Slavic people.
The Turkic peoples represent a diverse collection of ethnic groups defined by the Turkic languages. These groups have dispersed across a vast area, including Siberia, Northwest China, Central Asia, East Europe, the Caucasus, Anatolia, the Middle East, and Afghanistan. The origin and early dispersal history of the Turkic peoples is disputed, with candidates for their ancient homeland ranging from the Transcaspian steppe to Manchuria in Northeast Asia. Previous genetic studies have not identified a clear-cut unifying genetic signal for the Turkic peoples, which lends support for language replacement rather than demic diffusion as the model for the Turkic language’s expansion. We addressed the genetic origin of 373 individuals from 22 Turkic-speaking populations, representing their current geographic range, by analyzing genome-wide high-density genotype data. In agreement with the elite dominance model of language expansion most of the Turkic peoples studied genetically resemble their geographic neighbors. However, western Turkic peoples sampled across West Eurasia shared an excess of long chromosomal tracts that are identical by descent (IBD) with populations from present-day South Siberia and Mongolia (SSM), an area where historians center a series of early Turkic and non-Turkic steppe polities. While SSM matching IBD tracts (> 1cM) are also observed in non-Turkic populations, Turkic peoples demonstrate a higher percentage of such tracts (p-values ≤ 0.01) compared to their non-Turkic neighbors. Finally, we used the ALDER method and inferred admixture dates (~9th–17th centuries) that overlap with the Turkic migrations of the 5th–16th centuries. Thus, our results indicate historical admixture among Turkic peoples, and the recent shared ancestry with modern populations in SSM supports one of the hypothesized homelands for their nomadic Turkic and related Mongolic ancestors.
Centuries of nomadic migrations have ultimately resulted in the distribution of Turkic languages over a large area ranging from Siberia, across Central Asia to Eastern Europe and the Middle East. Despite the profound cultural impact left by these nomadic peoples, little is known about their prehistoric origins. Moreover, because contemporary Turkic speakers tend to genetically resemble their geographic neighbors, it is not clear whether their nomadic ancestors left an identifiable genetic trace. In this study, we show that Turkic-speaking peoples sampled across the Middle East, Caucasus, East Europe, and Central Asia share varying proportions of Asian ancestry that originate in a single area, southern Siberia and Mongolia. Mongolic- and Turkic-speaking populations from this area bear an unusually high number of long chromosomal tracts that are identical by descent with Turkic peoples from across west Eurasia. Admixture induced linkage disequilibrium decay across chromosomes in these populations indicates that admixture occurred during the 9th–17th centuries, in agreement with the historically recorded Turkic nomadic migrations and later Mongol expansion. Thus, our findings reveal genetic traces of recent large-scale nomadic migrations and map their source to a previously hypothesized area of Mongolia and southern Siberia.
We sequenced the genomes of a ~7,000 year old farmer from Germany and eight
~8,000 year old hunter-gatherers from Luxembourg and Sweden. We analyzed these and other
ancient genomes1–4 with 2,345 contemporary humans to show that most
present Europeans derive from at least three highly differentiated populations: West
European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near
Easterners; Ancient North Eurasians (ANE) related to Upper Paleolithic Siberians3, who contributed to both Europeans and Near
Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but
also harbored WHG-related ancestry. We model these populations’ deep relationships
and show that EEF had ~44% ancestry from a “Basal Eurasian”
population that split prior to the diversification of other non-African lineages.
The origins of the First Americans remain contentious. Although Native Americans
seem to be genetically most closely related to east Asians1–3, there is no
consensus with regard to which specific Old World populations they are closest
to4–8. Here we sequence the draft genome of an approximately 24,000-year-old
individual (MA-1), from Mal’ta in south-central Siberia9, to an average depth of 13. To our knowledge this is the
oldest anatomically modern human genome reported to date. The MA-1 mitochondrial genome
belongs to haplogroup U, which has also been found at high frequency among Upper
Palaeolithic and Mesolithic European hunter-gatherers10–12, and the Y
chromosome of MA-1 is basal to modern-day western Eurasians and near the root of most
Native American lineages5. Similarly, we
find autosomal evidence that MA-1 is basal to modern-day western Eurasians and genetically
closely related to modern-day Native Americans, with no close affinity to east Asians.
This suggests that populations related to contemporary western Eurasians had a more
north-easterly distribution 24,000 years ago than commonly thought. Furthermore, we
estimate that 14 to 38% of Native American ancestry may originate through gene
flow from this ancient population. This is likely to have occurred after the divergence of
Native American ancestors from east Asian ancestors, but before the diversification of
Native American populations in the New World. Gene flow from the MA-1 lineage into Native
American ancestors could explain why several crania from the First Americans have been
reported as bearing morphological characteristics that do not resemble those of east
Asians2,13. Sequencing of another south-central Siberian, Afontova Gora-2 dating
to approximately 17,000 years ago14,
revealed similar autosomal genetic signatures as MA-1, suggesting that the region was
continuously occupied by humans throughout the Last Glacial Maximum. Our findings reveal
that western Eurasian genetic signatures in modern-day Native Americans derive not only
from post-Columbian admixture, as commonly thought, but also from a mixed ancestry of the
Following the dispersal out of Africa, where hominins evolved in warm environments for millions of years, our species has colonised different climate zones of the world, including high latitudes and cold environments. The extent to which human habitation in (sub-)Arctic regions has been enabled by cultural buffering, short-term acclimatization and genetic adaptations is not clearly understood. Present day indigenous populations of Siberia show a number of phenotypic features, such as increased basal metabolic rate, low serum lipid levels and increased blood pressure that have been attributed to adaptation to the extreme cold climate. In this study we introduce a dataset of 200 individuals from ten indigenous Siberian populations that were genotyped for 730,525 SNPs across the genome to identify genes and non-coding regions that have undergone unusually rapid allele frequency and long-range haplotype homozygosity change in the recent past. At least three distinct population clusters could be identified among the Siberians, each of which showed a number of unique signals of selection. A region on chromosome 11 (chr11:66–69 Mb) contained the largest amount of clustering of significant signals and also the strongest signals in all the different selection tests performed. We present a list of candidate cold adaption genes that showed significant signals of positive selection with our strongest signals associated with genes involved in energy regulation and metabolism (CPT1A, LRP5, THADA) and vascular smooth muscle contraction (PRKG1). By employing a new method that paints phased chromosome chunks by their ancestry we distinguish local Siberian-specific long-range haplotype signals from those introduced by admixture.
We present an Aboriginal Australian genomic sequence obtained from a 100-year-old lock of hair donated by an Aboriginal man from southern Western Australia in the early 20th century. We detect no evidence of European admixture and estimate contamination levels to be below 0.5%. We show that Aboriginal Australians are descendants of an early human dispersal into eastern Asia, possibly 62,000 to 75,000 years ago. This dispersal is separate from the one that gave rise to modern Asians 25,000 to 38,000 years ago. We also find evidence of gene flow between populations of the two dispersal waves prior to the divergence of Native Americans from modern Asian ancestors. Our findings support the hypothesis that present-day Aboriginal Australians descend from the earliest humans to occupy Australia, likely representing one of the oldest continuous populations outside Africa.
We report here the genome sequence of an ancient human. Obtained from ∼4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20×, we recover 79% of the diploid genome, an amount close to the practical limit of current sequencing technologies. We identify 353,151 high-confidence single-nucleotide polymorphisms (SNPs), of which 6.8% have not been reported previously. We estimate raw read contamination to be no higher than 0.8%. We use functional SNP assessment to assign possible phenotypic characteristics of the individual that belonged to a culture whose location has yielded only trace human remains. We compare the high-confidence SNPs to those of contemporary populations to find the populations most closely related to the individual. This provides evidence for a migration from Siberia into the New World some 5,500 years ago, independent of that giving rise to the modern Native Americans and Inuit.
Skin pigmentation is one of the most variable phenotypic traits in humans. A non-synonymous substitution (rs1426654) in the third exon of SLC24A5 accounts for lighter skin in Europeans but not in East Asians. A previous genome-wide association study carried out in a heterogeneous sample of UK immigrants of South Asian descent suggested that this gene also contributes significantly to skin pigmentation variation among South Asians. In the present study, we have quantitatively assessed skin pigmentation for a largely homogeneous cohort of 1228 individuals from the Southern region of the Indian subcontinent. Our data confirm significant association of rs1426654 SNP with skin pigmentation, explaining about 27% of total phenotypic variation in the cohort studied. Our extensive survey of the polymorphism in 1573 individuals from 54 ethnic populations across the Indian subcontinent reveals wide presence of the derived-A allele, although the frequencies vary substantially among populations. We also show that the geospatial pattern of this allele is complex, but most importantly, reflects strong influence of language, geography and demographic history of the populations. Sequencing 11.74 kb of SLC24A5 in 95 individuals worldwide reveals that the rs1426654-A alleles in South Asian and West Eurasian populations are monophyletic and occur on the background of a common haplotype that is characterized by low genetic diversity. We date the coalescence of the light skin associated allele at 22–28 KYA. Both our sequence and genome-wide genotype data confirm that this gene has been a target for positive selection among Europeans. However, the latter also shows additional evidence of selection in populations of the Middle East, Central Asia, Pakistan and North India but not in South India.
Human skin color is one of the most visible aspects of human diversity. The genetic basis of pigmentation in Europeans has been understood to some extent, but our knowledge about South Asians has been restricted to a handful of studies. It has been suggested that a single nucleotide difference in SLC24A5 accounts for 25–38% European-African pigmentation differences and correlates with lighter skin. This genetic variant has also been associated with skin color variation among South Asians living in the UK. Here, we report a study based on a homogenous cohort of South India. Our results confirm that SLC24A5 plays a key role in pigmentation diversity of South Asians. Country-wide screening of the variant reveals that the light skin associated allele is widespread in the Indian subcontinent and its complex patterning is shaped by a combination of processes involving selection and demographic history of the populations. By studying the variation of SLC24A5 sequences among a diverse set of individuals, we show that the light skin associated allele in South Asians is identical by descent to that found in Europeans. Our study also provides new insights into positive selection acting on the gene and the evolutionary history of light skin in humans.
Despite being located at the crossroads of Asia, genetics of the Afghanistan populations have been largely overlooked. It is currently inhabited by five major ethnic populations: Pashtun, Tajik, Hazara, Uzbek and Turkmen. Here we present autosomal from a subset of our samples, mitochondrial and Y- chromosome data from over 500 Afghan samples among these 5 ethnic groups. This Afghan data was supplemented with the same Y-chromosome analyses of samples from Iran, Kyrgyzstan, Mongolia and updated Pakistani samples (HGDP-CEPH). The data presented here was integrated into existing knowledge of pan-Eurasian genetic diversity. The pattern of genetic variation, revealed by structure-like and Principal Component analyses and Analysis of Molecular Variance indicates that the people of Afghanistan are made up of a mosaic of components representing various geographic regions of Eurasian ancestry. The absence of a major Central Asian-specific component indicates that the Hindu Kush, like the gene pool of Central Asian populations in general, is a confluence of gene flows rather than a source of distinctly autochthonous populations that have arisen in situ: a conclusion that is reinforced by the phylogeography of both haploid loci.
Sakha – an area connecting South and Northeast Siberia – is significant for understanding the history of peopling of Northeast Eurasia and the Americas. Previous studies have shown a genetic contiguity between Siberia and East Asia and the key role of South Siberia in the colonization of Siberia.
We report the results of a high-resolution phylogenetic analysis of 701 mtDNAs and 318 Y chromosomes from five native populations of Sakha (Yakuts, Evenks, Evens, Yukaghirs and Dolgans) and of the analysis of more than 500,000 autosomal SNPs of 758 individuals from 55 populations, including 40 previously unpublished samples from Siberia. Phylogenetically terminal clades of East Asian mtDNA haplogroups C and D and Y-chromosome haplogroups N1c, N1b and C3, constituting the core of the gene pool of the native populations from Sakha, connect Sakha and South Siberia. Analysis of autosomal SNP data confirms the genetic continuity between Sakha and South Siberia. Maternal lineages D5a2a2, C4a1c, C4a2, C5b1b and the Yakut-specific STR sub-clade of Y-chromosome haplogroup N1c can be linked to a migration of Yakut ancestors, while the paternal lineage C3c was most likely carried to Sakha by the expansion of the Tungusic people. MtDNA haplogroups Z1a1b and Z1a3, present in Yukaghirs, Evens and Dolgans, show traces of different and probably more ancient migration(s). Analysis of both haploid loci and autosomal SNP data revealed only minor genetic components shared between Sakha and the extreme Northeast Siberia. Although the major part of West Eurasian maternal and paternal lineages in Sakha could originate from recent admixture with East Europeans, mtDNA haplogroups H8, H20a and HV1a1a, as well as Y-chromosome haplogroup J, more probably reflect an ancient gene flow from West Eurasia through Central Asia and South Siberia.
Our high-resolution phylogenetic dissection of mtDNA and Y-chromosome haplogroups as well as analysis of autosomal SNP data suggests that Sakha was colonized by repeated expansions from South Siberia with minor gene flow from the Lower Amur/Southern Okhotsk region and/or Kamchatka. The minor West Eurasian component in Sakha attests to both recent and ongoing admixture with East Europeans and an ancient gene flow from West Eurasia.
mtDNA; Y chromosome; Autosomal SNPs; Sakha
Population genetic studies on European populations have highlighted Italy as one of genetically most diverse regions. This is possibly due to the country's complex demographic history and large variability in terrain throughout the territory. This is the reason why Italy is enriched for population isolates, Sardinia being the best-known example. As the population isolates have a great potential in disease-causing genetic variants identification, we aimed to genetically characterize a region from northeastern Italy, which is known for isolated communities. Total of 1310 samples, collected from six geographically isolated villages, were genotyped at >145 000 single-nucleotide polymorphism positions. Newly genotyped data were analyzed jointly with the available genome-wide data sets of individuals of European descent, including several population isolates. Despite the linguistic differences and geographical isolation the village populations still show the greatest genetic similarity to other Italian samples. The genetic isolation and small effective population size of the village populations is manifested by higher levels of genomic homozygosity and elevated linkage disequilibrium. These estimates become even more striking when the detected substructure is taken into account. The observed level of genetic isolation in Friuli-Venezia Giulia region is more extreme according to several measures of isolation compared with Sardinians, French Basques and northern Finns, thus proving the status of an isolate.
population genetics; isolated population; genetic distance
A Southwest Asian origin and dispersal to North Africa in the Early Upper Palaeolithic era has been inferred in previous studies for mtDNA haplogroups M1 and U6. Both haplogroups have been proposed to show similar geographic patterns and shared demographic histories.
We report here 24 M1 and 33 U6 new complete mtDNA sequences that allow us to refine the existing phylogeny of these haplogroups. The resulting phylogenetic information was used to genotype a further 131 M1 and 91 U6 samples to determine the geographic spread of their sub-clades. No southwest Asian specific clades for M1 or U6 were discovered. U6 and M1 frequencies in North Africa, the Middle East and Europe do not follow similar patterns, and their sub-clade divisions do not appear to be compatible with their shared history reaching back to the Early Upper Palaeolithic. The Bayesian Skyline Plots testify to non-overlapping phases of expansion, and the haplogroups’ phylogenies suggest that there are U6 sub-clades that expanded earlier than those in M1. Some M1 and U6 sub-clades could be linked with certain events. For example, U6a1 and M1b, with their coalescent ages of ~20,000–22,000 years ago and earliest inferred expansion in northwest Africa, could coincide with the flourishing of the Iberomaurusian industry, whilst U6b and M1b1 appeared at the time of the Capsian culture.
Our high-resolution phylogenetic dissection of both haplogroups and coalescent time assessments suggest that the extant main branching pattern of both haplogroups arose and diversified in the mid-later Upper Palaeolithic, with some sub-clades concomitantly with the expansion of the Iberomaurusian industry. Carriers of these maternal lineages have been later absorbed into and diversified further during the spread of Afro-Asiatic languages in North and East Africa.
mtDNA haplogroups M1 and U6; Afro-Asiatic languages; North Africa
The geographic origin and time of dispersal of Austroasiatic (AA) speakers, presently settled in south and southeast Asia, remains disputed. Two rival hypotheses, both assuming a demic component to the language dispersal, have been proposed. The first of these places the origin of Austroasiatic speakers in southeast Asia with a later dispersal to south Asia during the Neolithic, whereas the second hypothesis advocates pre-Neolithic origins and dispersal of this language family from south Asia. To test the two alternative models, this study combines the analysis of uniparentally inherited markers with 610,000 common single nucleotide polymorphism loci from the nuclear genome. Indian AA speakers have high frequencies of Y chromosome haplogroup O2a; our results show that this haplogroup has significantly higher diversity and coalescent time (17–28 thousand years ago) in southeast Asia, strongly supporting the first of the two hypotheses. Nevertheless, the results of principal component and “structure-like” analyses on autosomal loci also show that the population history of AA speakers in India is more complex, being characterized by two ancestral components—one represented in the pattern of Y chromosomal and EDAR results and the other by mitochondrial DNA diversity and genomic structure. We propose that AA speakers in India today are derived from dispersal from southeast Asia, followed by extensive sex-specific admixture with local Indian populations.
Austroasiatic; mtDNA; Y chromosome; autosomes; admixture
Human Y-chromosome haplogroup structure is largely circumscribed by continental boundaries. One notable exception to this general pattern is the young haplogroup R1a that exhibits post-Glacial coalescent times and relates the paternal ancestry of more than 10% of men in a wide geographic area extending from South Asia to Central East Europe and South Siberia. Its origin and dispersal patterns are poorly understood as no marker has yet been described that would distinguish European R1a chromosomes from Asian. Here we present frequency and haplotype diversity estimates for more than 2000 R1a chromosomes assessed for several newly discovered SNP markers that introduce the onset of informative R1a subdivisions by geography. Marker M434 has a low frequency and a late origin in West Asia bearing witness to recent gene flow over the Arabian Sea. Conversely, marker M458 has a significant frequency in Europe, exceeding 30% in its core area in Eastern Europe and comprising up to 70% of all M17 chromosomes present there. The diversity and frequency profiles of M458 suggest its origin during the early Holocene and a subsequent expansion likely related to a number of prehistoric cultural developments in the region. Its primary frequency and diversity distribution correlates well with some of the major Central and East European river basins where settled farming was established before its spread further eastward. Importantly, the virtual absence of M458 chromosomes outside Europe speaks against substantial patrilineal gene flow from East Europe to Asia, including to India, at least since the mid-Holocene.
Y chromosome; haplogroup R1a; human evolution; population genetics
We have analyzed 7137 samples from 125 different caste, tribal and religious groups of India and 99 samples from three populations of Nepal for the length variation in the COII/tRNALys region of mtDNA. Samples showing length variation were subjected to detailed phylogenetic analysis based on HVS-I and informative coding region sequence variation. The overall frequencies of the 9-bp deletion and insertion variants in South Asia were 1.8% and 0.5%, respectively. We have also defined a novel deep-rooting haplogroup M43 and identified the rare haplogroup H14 in Indian populations carrying the 9bp-deletion by complete mtDNA sequencing. Moreover, we redefined haplogroup M6 and dissected it into two well-defined subclades. The presence of haplogroups F1 and B5a in Uttar Pradesh suggests minor maternal contribution from Southeast Asia to Northern India. The occurrence of haplogroup F1 in the Nepalese sample implies that Nepal might have served as a bridge for the flow of eastern lineages to India. The presence of R6 in the Nepalese, on the other hand, suggests that the gene flow between India and Nepal has been reciprocal.
South Asia; 9bp indel; mtDNA; Haplogroup
Human genetic diversity observed in Indian subcontinent is second only to that of Africa. This implies an early settlement and demographic growth soon after the first 'Out-of-Africa' dispersal of anatomically modern humans in Late Pleistocene. In contrast to this perspective, linguistic diversity in India has been thought to derive from more recent population movements and episodes of contact. With the exception of Dravidian, which origin and relatedness to other language phyla is obscure, all the language families in India can be linked to language families spoken in different regions of Eurasia. Mitochondrial DNA and Y chromosome evidence has supported largely local evolution of the genetic lineages of the majority of Dravidian and Indo-European speaking populations, but there is no consensus yet on the question of whether the Munda (Austro-Asiatic) speaking populations originated in India or derive from a relatively recent migration from further East.
Here, we report the analysis of 35 novel complete mtDNA sequences from India which refine the structure of Indian-specific varieties of haplogroup R. Detailed analysis of haplogroup R7, coupled with a survey of ~12,000 mtDNAs from caste and tribal groups over the entire Indian subcontinent, reveals that one of its more recently derived branches (R7a1), is particularly frequent among Munda-speaking tribal groups. This branch is nested within diverse R7 lineages found among Dravidian and Indo-European speakers of India. We have inferred from this that a subset of Munda-speaking groups have acquired R7 relatively recently. Furthermore, we find that the distribution of R7a1 within the Munda-speakers is largely restricted to one of the sub-branches (Kherwari) of northern Munda languages. This evidence does not support the hypothesis that the Austro-Asiatic speakers are the primary source of the R7 variation. Statistical analyses suggest a significant correlation between genetic variation and geography, rather than between genes and languages.
Our high-resolution phylogeographic study, involving diverse linguistic groups in India, suggests that the high frequency of mtDNA haplogroup R7 among Munda speaking populations of India can be explained best by gene flow from linguistically different populations of Indian subcontinent. The conclusion is based on the observation that among Indo-Europeans, and particularly in Dravidians, the haplogroup is, despite its lower frequency, phylogenetically more divergent, while among the Munda speakers only one sub-clade of R7, i.e. R7a1, can be observed. It is noteworthy that though R7 is autochthonous to India, and arises from the root of hg R, its distribution and phylogeography in India is not uniform. This suggests the more ancient establishment of an autochthonous matrilineal genetic structure, and that isolation in the Pleistocene, lineage loss through drift, and endogamy of prehistoric and historic groups have greatly inhibited genetic homogenization and geographical uniformity.
We have analyzed 7,137 samples from 125 different caste, tribal and religious groups of India and 99 samples from three populations of Nepal for the length variation in the COII/tRNALys region of mtDNA. Samples showing length variation were subjected to detailed phylogenetic analysis based on HVS-I and informative coding region sequence variation. The overall frequencies of the 9-bp deletion and insertion variants in South Asia were 1.9 and 0.6%, respectively. We have also defined a novel deep-rooting haplogroup M43 and identified the rare haplogroup H14 in Indian populations carrying the 9-bp deletion by complete mtDNA sequencing. Moreover, we redefined haplogroup M6 and dissected it into two well-defined subclades. The presence of haplogroups F1 and B5a in Uttar Pradesh suggests minor maternal contribution from Southeast Asia to Northern India. The occurrence of haplogroup F1 in the Nepalese sample implies that Nepal might have served as a bridge for the flow of eastern lineages to India. The presence of R6 in the Nepalese, on the other hand, suggests that the gene flow between India and Nepal has been reciprocal.
South Asia; 9bp indel; mtDNA; Haplogroup
Native Americans derive from a small number of Asian founders who likely arrived to the Americas via Beringia. However, additional details about the intial colonization of the Americas remain unclear. To investigate the pioneering phase in the Americas we analyzed a total of 623 complete mtDNAs from the Americas and Asia, including 20 new complete mtDNAs from the Americas and seven from Asia. This sequence data was used to direct high-resolution genotyping from 20 American and 26 Asian populations. Here we describe more genetic diversity within the founder population than was previously reported. The newly resolved phylogenetic structure suggests that ancestors of Native Americans paused when they reached Beringia, during which time New World founder lineages differentiated from their Asian sister-clades. This pause in movement was followed by a swift migration southward that distributed the founder types all the way to South America. The data also suggest more recent bi-directional gene flow between Siberia and the North American Arctic.
The issue of errors in genetic data sets is of growing concern, particularly in population genetics where whole genome mtDNA sequence data is coming under increased scrutiny. Multiplexed PCR reactions, combined with SNP typing, are currently under-exploited in this context, but have the potential to genotype whole populations rapidly and accurately, significantly reducing the amount of errors appearing in published data sets. To show the sensitivity of this technique for screening mtDNA genomic sequence data, 20 historic samples of the enigmatic Andaman Islanders and 12 modern samples from three Indian tribal populations (Chenchu, Lambadi and Lodha) were genotyped for 20 coding region sites after provisional haplogroup assignment with control region sequences. The genotype data from the historic samples significantly revise the topologies for the Andaman M31 and M32 mtDNA lineages by rectifying conflicts in published data sets. The new Indian data extend the distribution of the M31a lineage to South Asia, challenging previous interpretations of mtDNA phylogeography. This genetic connection between the ancestors of the Andamanese and South Asian tribal groups ∼30 kya has important implications for the debate concerning migration routes and settlement patterns of humans leaving Africa during the late Pleistocene, and indicates the need for more detailed genotyping strategies. The methodology serves as a low-cost, high-throughput model for the production and authentication of data from modern or ancient DNA, and demonstrates the value of museum collections as important records of human genetic diversity.