The Y-chromosome haplogroup N-M231 (Hg N) is distributed widely in eastern and central Asia and Siberia, as well as in eastern and northern Europe. Previous studies suggested a counterclockwise prehistoric migration of Hg N from eastern Asia to eastern and northern Europe. However, the root of this Y-chromosome lineage and its detailed dispersal pattern across eastern Asia are still unclear. We analyzed haplogroup profiles and phylogeographic patterns of 1,570 Hg N individuals from 20,826 males in 359 populations across Eurasia. We first genotyped 6,371 males from 169 populations in China and Cambodia, generating data for 360 Hg N individuals, and then combined these with published data on 1,210 Hg N individuals from Japanese, Southeast Asian, Siberian, European and Central Asian populations. The results showed that the sub-haplogroups of Hg N have a distinct geographical distribution. The highest Y-STR diversity of the ancestral Hg N sub-haplogroups was observed in the southern part of mainland East Asia, and further phylogeographic analyses support an origin of Hg N in southern China. Combined with previous data, we propose that the early northward dispersal of Hg N started from southern China about 21 thousand years ago (kya), expanding into northern China 12–18 kya, and reaching further north to Siberia about 12–14 kya before a population expansion and westward migration into Central Asia and eastern/northern Europe around 8.0–10.0 kya. This northward migration of Hg N likely coincided with retreating ice sheets after the Last Glacial Maximum (22–18 kya) in mainland East Asia.
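The comparison of Y-STR diversity between regions described above can be illustrated with Nei's unbiased haplotype diversity, h = n/(n−1) · (1 − Σ p_i²), where p_i is the frequency of the i-th distinct haplotype among n chromosomes. The abstract does not say which diversity estimator was used, so this is only a minimal sketch under that assumption; the function name `haplotype_diversity` and the toy haplotypes are hypothetical and are not data from the study.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2)).

    `haplotypes` is a list of hashable haplotypes, e.g. tuples of Y-STR
    repeat counts (one entry per sampled chromosome).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    # Sum of squared frequencies of each distinct haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical toy sample: four chromosomes typed at three Y-STR loci.
toy_sample = [(14, 23, 10), (14, 23, 10), (14, 24, 10), (15, 23, 11)]
print(haplotype_diversity(toy_sample))
```

A value near 1 means almost every chromosome carries a different haplotype, while a value of 0 means all sampled chromosomes are identical. In this kind of argument, the region with the highest diversity in ancestral sub-haplogroups is taken as a candidate origin.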
North East Europe harbors a high diversity of cultures and languages, suggesting a complex genetic history. Archaeological, anthropological, and genetic research has revealed a series of influences from Western and Eastern Eurasia in the past. While genetic data from modern-day populations is commonly used to make inferences about their origins and past migrations, ancient DNA provides a powerful test of such hypotheses by giving a snapshot of the past genetic diversity. In order to better understand the dynamics that have shaped the gene pool of North East Europeans, we generated and analyzed 34 mitochondrial genotypes from the skeletal remains of three archaeological sites in northwest Russia. These sites were dated to the Mesolithic and the Early Metal Age (7,500 and 3,500 uncalibrated years Before Present). We applied a suite of population genetic analyses (principal component analysis, genetic distance mapping, haplotype sharing analyses) and compared past demographic models through coalescent simulations using Bayesian Serial SimCoal and Approximate Bayesian Computation. Comparisons of genetic data from ancient and modern-day populations revealed significant changes in the mitochondrial makeup of North East Europeans through time. Mesolithic foragers showed high frequencies and diversity of haplogroups U (U2e, U4, U5a), a pattern observed previously in European hunter-gatherers from Iberia to Scandinavia. In contrast, the presence of mitochondrial DNA haplogroups C, D, and Z in Early Metal Age individuals suggested discontinuity with Mesolithic hunter-gatherers and genetic influx from central/eastern Siberia. We identified remarkable genetic dissimilarities between prehistoric and modern-day North East Europeans/Saami, which suggests an important role of post-Mesolithic migrations from Western Europe and subsequent population replacement/extinctions. 
This work demonstrates how ancient DNA can improve our understanding of human population movements across Eurasia. It contributes to the description of the spatio-temporal distribution of mitochondrial diversity and will be of significance for future reconstructions of the history of Europeans.
The history of human populations can be retraced by studying the archaeological and anthropological record, but also by examining the current distribution of genetic markers, such as the maternally inherited mitochondrial DNA. Ancient DNA research allows the retrieval of DNA from ancient skeletal remains and contributes to the reconstruction of the human population history through the comparison of ancient and present-day genetic data. Here, we analysed the mitochondrial DNA of prehistoric remains from archaeological sites dated to 7,500 and 3,500 years Before Present. These sites are located in North East Europe, a region that displays a significant cultural and linguistic diversity today but for which no ancient human DNA was available before. We show that prehistoric hunter-gatherers of North East Europe were genetically similar to other European foragers. We also detected a prehistoric genetic input from Siberia, followed by migrations from Western Europe into North East Europe. Our research contributes to the understanding of the origins and past dynamics of human population in Europe.
Sakha – an area connecting South and Northeast Siberia – is significant for understanding the history of peopling of Northeast Eurasia and the Americas. Previous studies have shown a genetic contiguity between Siberia and East Asia and the key role of South Siberia in the colonization of Siberia.
We report the results of a high-resolution phylogenetic analysis of 701 mtDNAs and 318 Y chromosomes from five native populations of Sakha (Yakuts, Evenks, Evens, Yukaghirs and Dolgans) and of the analysis of more than 500,000 autosomal SNPs of 758 individuals from 55 populations, including 40 previously unpublished samples from Siberia. Phylogenetically terminal clades of East Asian mtDNA haplogroups C and D and Y-chromosome haplogroups N1c, N1b and C3, constituting the core of the gene pool of the native populations from Sakha, connect Sakha and South Siberia. Analysis of autosomal SNP data confirms the genetic continuity between Sakha and South Siberia. Maternal lineages D5a2a2, C4a1c, C4a2, C5b1b and the Yakut-specific STR sub-clade of Y-chromosome haplogroup N1c can be linked to a migration of Yakut ancestors, while the paternal lineage C3c was most likely carried to Sakha by the expansion of the Tungusic people. MtDNA haplogroups Z1a1b and Z1a3, present in Yukaghirs, Evens and Dolgans, show traces of different and probably more ancient migration(s). Analysis of both haploid loci and autosomal SNP data revealed only minor genetic components shared between Sakha and the extreme Northeast Siberia. Although the major part of West Eurasian maternal and paternal lineages in Sakha could originate from recent admixture with East Europeans, mtDNA haplogroups H8, H20a and HV1a1a, as well as Y-chromosome haplogroup J, more probably reflect an ancient gene flow from West Eurasia through Central Asia and South Siberia.
Our high-resolution phylogenetic dissection of mtDNA and Y-chromosome haplogroups as well as analysis of autosomal SNP data suggests that Sakha was colonized by repeated expansions from South Siberia with minor gene flow from the Lower Amur/Southern Okhotsk region and/or Kamchatka. The minor West Eurasian component in Sakha attests to both recent and ongoing admixture with East Europeans and an ancient gene flow from West Eurasia.
mtDNA; Y chromosome; Autosomal SNPs; Sakha
To better define the structure and origin of the Bulgarian paternal gene pool, we have examined the Y-chromosome variation in 808 Bulgarian males. The analysis was performed by high-resolution genotyping of biallelic markers and by analyzing the STR variation within the most informative haplogroups. We found that the Y-chromosome gene pool in modern Bulgarians is primarily represented by Western Eurasian haplogroups, with ∼40% belonging to haplogroups E-V13 and I-M423, and 20% to R-M17. Haplogroups common in the Middle East (J and G) and in South Western Asia (R-L23*) occur at frequencies of 19% and 5%, respectively. Haplogroups C, N and Q, distinctive for Altaic and Central Asian Turkic-speaking populations, occur at the negligible frequency of only 1.5%. Principal component analyses group Bulgarians with European populations, apart from Central Asian Turkic-speaking groups and South Western Asian populations. Within the country, the genetic variation is structured in Western, Central and Eastern Bulgaria, indicating that the Balkan Mountains have been permeable to human movements. The lineage analysis provided the following interesting results: (i) R-L23* has been present in Eastern Bulgaria since the postglacial period; (ii) haplogroup E-V13 has a Mesolithic age in Bulgaria, from where it expanded after the arrival of farming; (iii) haplogroup J-M241 probably reflects the Neolithic westward expansion of farmers from the earliest sites along the Black Sea. On the whole, in light of the most recent historical studies, which indicate a substantial proto-Bulgarian input to the contemporary Bulgarian people, our data suggest that a common paternal ancestry between the proto-Bulgarians and the Altaic and Central Asian Turkic-speaking populations either did not exist or was negligible.
Human origins and migration models proposing the Horn of Africa as a prehistoric exit route to Asia have stimulated molecular genetic studies in the region using uniparental loci. However, from a Y-chromosome perspective, Saudi Arabia, the largest country of the region, has not yet been surveyed. To address this gap, a sample of 157 Saudi males was analyzed at high resolution using 67 Y-chromosome binary markers. In addition, haplotypic diversity for its most prominent J1-M267 lineage was estimated using a set of 17 Y-specific STR loci.
Saudi Arabia differs from other Arabian Peninsula countries by a higher presence of J2-M172 lineages. It is significantly different from Yemen, mainly due to a comparative reduction of sub-Saharan Africa E1-M123 and Levantine J1-M267 male lineages. Around 14% of the Saudi Arabian Y-chromosome pool is typical of African biogeographic ancestry, 17% arrived in the area from the East across Iran, while the remaining 69% could be considered of direct or indirect Levantine ascription. Interestingly, basal E-M96* (n = 2) and J-M304* (n = 3) lineages have been detected, for the first time, in the Arabian Peninsula. Coalescence time for the most prominent J1-M267 haplogroup in Saudi Arabia (11.6 ± 1.9 ky) is similar to that obtained previously for Yemen (11.3 ± 2 ky) but significantly older than those estimated for Qatar (7.3 ± 1.8 ky) and UAE (6.8 ± 1.5 ky).
The Y-chromosome genetic structure of the Arabian Peninsula seems to be mainly modulated by geography. The data confirm that this area has mainly been a recipient of gene flow from the surrounding African and Asian areas, probably since the Last Glacial Maximum. Although rare deep-rooting lineages for Y-chromosome haplogroups E and J have been detected, more basal clades supportive of the southern exit route of modern humans to Eurasia were not found.
More than half of the northern Asian pool of human mitochondrial DNA (mtDNA) is fragmented into a number of subclades of haplogroups C and D, two of the most frequent haplogroups throughout northern, eastern and central Asia and America. While there has been considerable recent progress in studying mitochondrial variation in eastern Asia and America at the complete genome resolution, little comparable data is available for regions such as southern Siberia, the area where most northern Asian haplogroups, including C and D, likely diversified. This gap in our knowledge poses a serious barrier to progress in understanding the demographic prehistory of northern Eurasia in general. Here we describe the phylogeography of haplogroups C and D in the populations of northern and eastern Asia. We have analyzed 770 samples from haplogroups C and D (174 and 596, respectively) at high resolution, including 182 novel complete mtDNA sequences representing haplogroups C and D (83 and 99, respectively). The present-day variation of haplogroups C and D suggests that these mtDNA clades expanded before the Last Glacial Maximum (LGM), with their oldest lineages being present in eastern Asia. Unlike in eastern Asia, most of the northern Asian variants of haplogroups C and D began to expand after the LGM, thus pointing to post-glacial re-colonization of northern Asia. Our results show that both haplogroups were involved in migrations from eastern Asia and southern Siberia to eastern and northeastern Europe, likely during the middle Holocene.
Although human Y chromosomes belonging to haplogroup R1b are quite rare in Africa, being found mainly in Asia and Europe, a group of chromosomes within the paragroup R-P25* are found concentrated in the central-western part of the African continent, where they can be detected at frequencies as high as 95%. Phylogenetic evidence and coalescence time estimates suggest that R-P25* chromosomes (or their phylogenetic ancestor) may have been carried to Africa by an Asia-to-Africa back migration in prehistoric times. Here, we describe six new mutations that define the relationships among the African R-P25* Y chromosomes and between these African chromosomes and earlier reported R-P25 Eurasian sub-lineages. The incorporation of these new mutations into a phylogeny of the R1b haplogroup led to the identification of a new clade (R1b1a or R-V88) encompassing all the African R-P25* and about half of the few European/west Asian R-P25* chromosomes. A worldwide phylogeographic analysis of the R1b haplogroup provided strong support for the Asia-to-Africa back-migration hypothesis. The analysis of the distribution of the R-V88 haplogroup in >1800 males from 69 African populations revealed a striking genetic contiguity between the Chadic-speaking peoples from the central Sahel and several other Afroasiatic-speaking groups from North Africa. The R-V88 coalescence time was estimated at 9.2–5.6 kya, in the early mid Holocene. We suggest that R-V88 is a paternal genetic record of the proposed mid-Holocene migration of proto-Chadic Afroasiatic speakers through the Central Sahara into the Lake Chad Basin, and geomorphological evidence is consistent with this view.
Y chromosome haplogroups; human migrations; Holocene; Africa; Chadic-speaking populations
Numerous studies of human populations in Europe and Asia have revealed a concordance between their extant genetic structure and the prevailing regional pattern of geography and language. For native South Americans, however, such evidence has been lacking so far. Therefore, we examined the relationship between Y-chromosomal genotype on the one hand, and male geographic origin and linguistic affiliation on the other, in the largest study of South American natives to date in terms of sampled individuals and populations. A total of 1,011 individuals, representing 50 tribal populations from 81 settlements, were genotyped for up to 17 short tandem repeat (STR) markers and 16 single nucleotide polymorphisms (Y-SNPs), the latter resolving phylogenetic lineages Q and C. Virtually no structure became apparent for the extant Y-chromosomal genetic variation of South American males that could sensibly be related to their inter-tribal geographic and linguistic relationships. This continent-wide decoupling is consistent with a rapid peopling of the continent followed by long periods of isolation in small groups. Furthermore, for the first time, we identified a distinct geographical cluster of Y-SNP lineages C-M217 (C3*) in South America. Such haplotypes are virtually absent from North and Central America, but occur at high frequency in Asia. Together with the locally confined Y-STR autocorrelation observed in our study as a whole, the available data therefore suggest a late introduction of C3* into South America no more than 6,000 years ago, perhaps via coastal or trans-Pacific routes. Extensive simulations revealed that the observed lack of haplogroup C3* among extant North and Central American natives is only compatible with low levels of migration between the ancestor populations of C3* carriers and non-carriers. 
In summary, our data highlight the fact that a pronounced correlation between genetic and geographic/cultural structure can only be expected under very specific conditions, most of which are likely not to have been met by the ancestors of native South Americans.
In the largest population genetic study of South Americans to date, we analyzed the Y-chromosomal makeup of more than 1,000 male natives. We found that the male-specific genetic variation of Native Americans lacks any clear structure that could sensibly be related to their geographic and/or linguistic relationships. This finding is consistent with a rapid initial peopling of South America, followed by long periods of isolation in small tribal groups. The observed continent-wide decoupling of geography, spoken language, and genetics contrasts strikingly with previous reports of such correlation from many parts of Europe and Asia. Moreover, we identified a cluster of Native American founding lineages of Y chromosomes, called C-M217 (C3*), within a restricted area of Ecuador in North-Western South America. The same haplogroup occurs at high frequency in Central, East, and North East Asia, but is virtually absent from North (except Alaska) and Central America. Possible scenarios for the introduction of C-M217 (C3*) into Ecuador may thus include a coastal or trans-Pacific route, an idea also supported by occasional archeological evidence and the recent coalescence of the C3* haplotypes, estimated from our data to have occurred some 6,000 years ago.
The origins of the First Americans remain contentious. Although Native Americans seem to be genetically most closely related to east Asians [1–3], there is no consensus with regard to which specific Old World populations they are closest to [4–8]. Here we sequence the draft genome of an approximately 24,000-year-old individual (MA-1), from Mal’ta in south-central Siberia [9], to an average depth of 1×. To our knowledge this is the oldest anatomically modern human genome reported to date. The MA-1 mitochondrial genome belongs to haplogroup U, which has also been found at high frequency among Upper Palaeolithic and Mesolithic European hunter-gatherers [10–12], and the Y chromosome of MA-1 is basal to modern-day western Eurasians and near the root of most Native American lineages [5]. Similarly, we find autosomal evidence that MA-1 is basal to modern-day western Eurasians and genetically closely related to modern-day Native Americans, with no close affinity to east Asians. This suggests that populations related to contemporary western Eurasians had a more north-easterly distribution 24,000 years ago than commonly thought. Furthermore, we estimate that 14 to 38% of Native American ancestry may originate through gene flow from this ancient population. This is likely to have occurred after the divergence of Native American ancestors from east Asian ancestors, but before the diversification of Native American populations in the New World. Gene flow from the MA-1 lineage into Native American ancestors could explain why several crania from the First Americans have been reported as bearing morphological characteristics that do not resemble those of east Asians [2,13]. Sequencing of another south-central Siberian, Afontova Gora-2, dating to approximately 17,000 years ago [14], revealed similar autosomal genetic signatures to MA-1, suggesting that the region was continuously occupied by humans throughout the Last Glacial Maximum. Our findings reveal that western Eurasian genetic signatures in modern-day Native Americans derive not only from post-Columbian admixture, as commonly thought, but also from a mixed ancestry of the First Americans.
The chamois, distributed over most of the medium to high altitude mountain ranges of southern Eurasia, provides an excellent model for exploring the effects of historical and evolutionary events on diversification. Populations have been grouped into two species, Rupicapra pyrenaica from southwestern Europe and R. rupicapra from eastern Europe. The study of matrilineal mitochondrial DNA (mtDNA) and biparentally inherited microsatellites showed that the two species are paraphyletic and indicated alternate events of population contraction and dispersal-hybridization in the diversification of chamois. Here we investigate the pattern of variation of the Y-chromosome to obtain information on the patrilineal phylogenetic position of the genus Rupicapra and on the male-specific dispersal of chamois across Europe.
We analyzed the Y-chromosome of 87 males covering the distribution range of the Rupicapra genus. We sequenced a fragment of the SRY gene promoter and characterized the male specific microsatellites UMN2303 and SRYM18. The SRY promoter sequences of two samples of Barbary sheep (Ammotragus lervia) were also determined and compared with the sequences of Bovidae available in the GenBank. Phylogenetic analysis of the alignment showed the clustering of Rupicapra with Capra and the Ammotragus sequence obtained in this study, different from the previously reported sequence of Ammotragus which groups with Ovis. Within Rupicapra, the combined data define 10 Y-chromosome haplotypes forming two haplogroups, which concur with taxonomic classification, instead of the three clades formed for mtDNA and nuclear microsatellites. The variation shows a west-to-east geographical cline of ancestral to derived alleles.
The phylogeny of the SRY promoter shows an association between Rupicapra and Capra. The position of Ammotragus needs reinvestigation. The study of ancestral and derived characters in the Y-chromosome suggests that, contrary to the presumed Asian origin, the paternal lineage of chamois originated in the Mediterranean, most probably in the Iberian Peninsula, and dispersed eastwards through serial founder events during the glacial-interglacial cycles of the Quaternary. The diversity of Y-chromosomes in chamois is very low. The differences in patterns of variation among Y-chromosome, mtDNA and biparental microsatellites reflect the evolutionary characteristics of the different markers as well as the effects of sex-biased dispersal and species phylogeography.
Ethnic Belarusians make up more than 80% of the nine and a half million people inhabiting the Republic of Belarus. Belarusians together with Ukrainians and Russians represent the East Slavic linguistic group, the largest both in numbers and territory, inhabiting East Europe alongside Baltic-, Finno-Permic- and Turkic-speaking people. To date, only a limited number of low-resolution genetic studies have been performed on this population. Therefore, with the phylogeographic analysis of 565 Y-chromosomes and 267 mitochondrial DNAs from six well-covered geographic sub-regions of Belarus, we strove to complement the existing genetic profile of eastern Europeans. Our results reveal that around 80% of the paternal Belarusian gene pool is composed of R1a, I2a and N1c Y-chromosome haplogroups – a profile which is very similar to the two other eastern European populations – Ukrainians and Russians. The maternal Belarusian gene pool encompasses a full range of West Eurasian haplogroups and agrees well with the genetic structure of central-east European populations. Our data attest that latitudinal gradients characterize the variation of the uniparentally transmitted gene pools of modern Belarusians. In particular, the Y-chromosome reflects movements of people in central-east Europe, starting probably as early as the beginning of the Holocene. Furthermore, the matrilineal legacy of Belarusians retains two rare mitochondrial DNA haplogroups, N1a3 and N3, whose phylogeographies were explored in detail after de novo sequencing of 20 and 13 complete mitogenomes, respectively, from all over Eurasia. Our phylogeographic analyses reveal that two mitochondrial DNA lineages, N3 and N1a3, both of Middle Eastern origin, might mark distinct events of matrilineal gene flow to Europe: during the mid-Holocene period and around the Pleistocene-Holocene transition, respectively.
Human Y chromosomes belonging to the haplogroup R1b1-P25, although very common in Europe, are usually rare in Africa. However, recently published studies have reported high frequencies of this haplogroup in the central-western region of the African continent and proposed that this represents a 'back-to-Africa' migration during prehistoric times. To obtain a deeper insight into the history of these lineages, we characterised the paternal genetic background of a population in Equatorial Guinea, a Central-West African country located near the region in which the highest frequencies of the R1b1 haplogroup in Africa have been found to date. In our sample, the large majority (78.6%) of the sequences belong to subclades in haplogroup E, which are the most frequent in Bantu groups. However, the frequency of the R1b1 haplogroup in our sample (17.0%) was higher than that previously observed for the majority of the African continent. Of these R1b1 samples, nine are defined by the V88 marker, which was recently discovered in Africa. As high microsatellite variance was found inside this haplogroup in Central-West Africa and a decrease in this variance was observed towards Northeast Africa, our findings do not support the previously hypothesised movement of Chadic-speaking people from the North across the Sahara as the explanation for these R1b1 lineages in Central-West Africa. The present findings are also compatible with an origin of the V88-derived allele in Central-West Africa, and its presence in North Africa may be better explained as the result of a migration from the south during the mid-Holocene.
Central-West Africa; Equatorial Guinea; human male lineages; Y chromosome; haplogroup R-V88; back to Africa hypothesis
The genetic impact associated with the Neolithic spread in Europe has been widely debated over the last 20 years. Within this context, ancient DNA studies have provided a more reliable picture by directly analyzing the protagonist populations in different regions of Europe. However, the lack of available data from the original Near Eastern farmers has limited the conclusions that could be reached, preventing the formulation of continental models of Neolithic expansion. Here we address this issue by presenting mitochondrial DNA data of the original Near Eastern Neolithic communities with the aim of providing an adequate background for the interpretation of Neolithic genetic data from European samples. Sixty-three skeletons from the Pre-Pottery Neolithic B (PPNB) sites of Tell Halula, Tell Ramad and Dja'de El Mughara, dating between 8,700 and 6,600 cal. B.C., were analyzed, and 15 validated mitochondrial DNA profiles were recovered. In order to estimate the demographic contribution of the first farmers to both Central European and Western Mediterranean Neolithic cultures, haplotype and haplogroup diversities in the PPNB sample were compared using phylogeographic and population genetic analyses to available ancient DNA data from human remains belonging to the Linearbandkeramik-Alföldi Vonaldiszes Kerámia and Cardial/Epicardial cultures. We also searched for possible signatures of the original Neolithic expansion over the modern Near Eastern and South European genetic pools, and tried to infer possible routes of expansion by comparing the obtained results to a database of 60 modern populations from both regions. Comparisons performed among the three ancient datasets allowed us to identify K and N-derived mitochondrial DNA haplogroups as potential markers of the Neolithic expansion, whose genetic signature would have reached both the Iberian coasts and the Central European plain.
Moreover, the observed genetic affinities between the PPNB samples and the modern populations of Cyprus and Crete seem to suggest that the Neolithic was first introduced into Europe through pioneer seafaring colonization.
Since the original human expansions out of Africa 200,000 years ago, different prehistoric and historic migration events have taken place in Europe. Considering that the movement of people implies a consequent movement of their genes, it is possible to estimate the impact of these migrations through the genetic analysis of human populations. Agricultural and husbandry practices originated 10,000 years ago in a region of the Near East known as the Fertile Crescent. According to the archaeological record, this phenomenon, known as “Neolithic”, rapidly expanded from these territories into Europe. However, whether this diffusion was accompanied by human migrations is greatly debated. In the present work, mitochondrial DNA (a type of maternally inherited DNA located in the cell cytoplasm) from the first Near Eastern Neolithic populations was recovered and compared to available data from other Neolithic populations in Europe and also to modern populations from South Eastern Europe and the Near East. The obtained results show that substantial human migrations were involved in the Neolithic spread and suggest that the first Neolithic farmers entered Europe following a maritime route through Cyprus and the Aegean Islands.
South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language–speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1000 Genomes Project (1KGP). SSIP samples exhibited greater intra-population genetic diversity and possessed a higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness of SSIP to two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1KGP.
Indians of South Asia have long been a population of interest to a wide audience, due to their unique diversity. We have deep-sequenced 38 individuals of Indian descent residing in Singapore (SSIP) in an effort to illustrate their diversity from a whole-genome standpoint. Indeed, among Asians in our population panel, SSIP was the most diverse, followed by the Malays in Singapore (SSMP). Their diversity is further observed in the population's Y-chromosome and mitochondrial haplogroup profiles; individuals with European-dominant haplogroups had a greater proportion of European admixture. Among variants (single nucleotide polymorphisms and small insertions/deletions) discovered in SSIP, 21.69% were novel with respect to previous sequencing projects. In addition, some 14 loss-of-function variants (LOFs) were associated with cancer, Type II diabetes, and cholesterol levels. Finally, a D statistic test with ancient hominids indicated more archaic gene flow into East Asians than into South Asians.
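The D statistic mentioned above is usually computed as an ABBA-BABA test on a four-population tree (P1, P2; P3, O). The following is a minimal sketch of the frequency-based form, D = Σ(ABBA − BABA) / Σ(ABBA + BABA), assuming per-site derived-allele frequencies are already available. The mapping of P1 to a South Asian sample, P2 to an East Asian sample, P3 to an archaic hominin and O to an outgroup is an assumption made for illustration, as are the function name `d_statistic` and the toy inputs; the summary does not describe the exact configuration used.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Patterson's D for the tree (P1, P2; P3, O).

    Each argument is a sequence of derived-allele frequencies (0..1),
    one value per site, in the same site order. A positive D indicates
    excess allele sharing between P2 and P3; a negative D indicates
    excess sharing between P1 and P3.
    """
    numerator = 0.0
    denominator = 0.0
    for f1, f2, f3, fo in zip(p1, p2, p3, outgroup):
        abba = (1 - f1) * f2 * f3 * (1 - fo)  # P2 and P3 share the derived allele
        baba = f1 * (1 - f2) * f3 * (1 - fo)  # P1 and P3 share the derived allele
        numerator += abba - baba
        denominator += abba + baba
    if denominator == 0:
        raise ValueError("no informative sites")
    return numerator / denominator


# Hypothetical toy input: two ABBA sites and one BABA site.
toy_p1 = [0.0, 0.0, 1.0]
toy_p2 = [1.0, 1.0, 0.0]
toy_p3 = [1.0, 1.0, 1.0]
toy_out = [0.0, 0.0, 0.0]
print(d_statistic(toy_p1, toy_p2, toy_p3, toy_out))
```

In practice the significance of D is assessed with a block jackknife over the genome, which this sketch omits.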
Koreans are generally considered a Northeast Asian group, thought to be related to Altaic-language-speaking populations. However, recent findings have indicated that the peopling of Korea might have been more complex, involving dual origins from both southern and northern parts of East Asia. To understand the male lineage history of Korea, more data from informative genetic markers from Korea and its surrounding regions are necessary. In this study, 25 Y-chromosome single nucleotide polymorphism markers and 17 Y-chromosome short tandem repeat (Y-STR) loci were genotyped in 1,108 males from several populations in East Asia.
In general, we found East Asian populations to be characterized by male haplogroup homogeneity, showing major Y-chromosomal expansions of haplogroup O-M175 lineages. Interestingly, a high frequency (31.4%) of haplogroup O2b-SRY465 (and its sublineage) is characteristic of male Koreans, whereas the haplogroup distribution elsewhere in East Asian populations is patchy. The ages of the haplogroup O2b-SRY465 lineages (~9,900 years) and the pattern of variation within the lineages suggested an ancient origin in a nearby part of northeastern Asia, followed by an expansion in the vicinity of the Korean Peninsula. In addition, the coalescence time of haplogroup O2b1-47z (~4,400 years) and its Y-STR diversity suggest that this lineage probably originated in Korea. Further studies with sufficiently large sample sizes to cover the vast East Asian region and using genome-wide genotyping should provide further insights.
These findings are consistent with linguistic, archaeological and historical evidence, which suggest that the direct ancestors of Koreans were proto-Koreans who inhabited the northeastern region of China and the Korean Peninsula during the Neolithic (8,000-1,000 BC) and Bronze (1,500-400 BC) Ages.
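The Y-STR diversity referred to in the results above can be quantified in several ways. A minimal sketch follows, assuming Nei's unbiased haplotype diversity, h = n/(n-1) × (1 − Σ p_i²), as the measure; the abstract does not state which estimator was used, and the function name and toy haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a sample of haplotypes.

    Each haplotype is any hashable value, e.g. a tuple of Y-STR repeat counts.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy sample (made-up repeat counts at two loci, not data from the study).
sample = [(14, 13), (14, 13), (15, 13), (16, 12)]
h = haplotype_diversity(sample)
```

The value ranges from 0 (all haplotypes identical) to 1 (all haplotypes distinct), so higher values within a lineage are usually read as a sign of longer local presence.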
The first farmers from Central Europe reveal a genetic affinity to modern-day populations from the Near East and Anatolia, which suggests a significant demographic input from this area during the early Neolithic.
In Europe, the Neolithic transition (8,000–4,000 b.c.) from hunting and gathering to agricultural communities was one of the most important demographic events since the initial peopling of Europe by anatomically modern humans in the Upper Paleolithic (40,000 b.c.). However, the nature and speed of this transition are a matter of continuing scientific debate in archaeology, anthropology, and human population genetics. To date, inferences about the genetic makeup of past populations have mostly been drawn from studies of modern-day Eurasian populations, but increasingly ancient DNA studies offer a direct view of the genetic past. We genetically characterized a population of the earliest farming culture in Central Europe, the Linear Pottery Culture (LBK; 5,500–4,900 calibrated b.c.) and used comprehensive phylogeographic and population genetic analyses to locate its origins within the broader Eurasian region, and to trace potential dispersal routes into Europe. We cloned and sequenced the mitochondrial hypervariable segment I and designed two powerful SNP multiplex PCR systems to generate new mitochondrial and Y-chromosomal data from 21 individuals from a complete LBK graveyard at Derenburg Meerenstieg II in Germany. These results considerably extend the available genetic dataset for the LBK (n = 42) and permit the first detailed genetic analysis of the earliest Neolithic culture in Central Europe (5,500–4,900 calibrated b.c.). We characterized the Neolithic mitochondrial DNA sequence diversity and geographical affinities of the early farmers using a large database of extant Western Eurasian populations (n = 23,394) and a wide range of population genetic analyses including shared haplotype analyses, principal component analyses, multidimensional scaling, geographic mapping of genetic distances, and Bayesian Serial Simcoal analyses.
The results reveal that the LBK population shared an affinity with the modern-day Near East and Anatolia, supporting a major genetic input from this area during the advent of farming in Europe. However, the LBK population also showed unique genetic features including a clearly distinct distribution of mitochondrial haplogroup frequencies, confirming that major demographic events continued to take place in Europe after the early Neolithic.
The transition from a hunter–gatherer existence to a sedentary farming-based lifestyle has had key consequences for human groups around the world and has profoundly shaped human societies. Originating in the Near East around 11,000 y ago, an agricultural lifestyle subsequently spread across Europe during the New Stone Age (Neolithic). Whether it was mediated by incoming farmers or driven by the transmission of innovative ideas and techniques remains a subject of continuing debate in archaeology, anthropology, and human population genetics. Ancient DNA from the earliest farmers can provide a direct view of the genetic diversity of these populations in the earliest Neolithic. Here, we compare Neolithic haplogroups and their diversity to a large database of extant European and Eurasian populations. We identified Neolithic haplotypes that left clear traces in modern populations, and the data suggest a route for the migrating farmers that extends from the Near East and Anatolia into Central Europe. When compared to indigenous hunter–gatherer populations, the unique and characteristic genetic signature of the early farmers suggests a significant demographic input from the Near East during the onset of farming in Europe.
The picture of dog mtDNA diversity, as obtained from geographically wide samplings but from a small number of individuals per region or breed, has revealed weak geographic correlation and a high degree of haplotype sharing between very distant breeds. We aimed at a more detailed picture through extensive sampling (n = 143) of four Portuguese autochthonous breeds – Castro Laboreiro Dog, Serra da Estrela Mountain Dog, Portuguese Sheepdog and Azores Cattle Dog – and a comparative reanalysis of published worldwide data.
Fifteen haplotypes belonging to four major haplogroups were found in these breeds, of which five are newly reported. The Castro Laboreiro Dog presented a 95% frequency of a new A haplotype, while all other breeds contained a diverse pool of existing lineages. The Serra da Estrela Mountain Dog, the most heterogeneous of the four Portuguese breeds, shared haplotypes with the other mainland breeds, while Azores Cattle Dog shared no haplotypes with the other Portuguese breeds.
A review of mtDNA haplotypes in dogs across the world revealed that: (a) breeds tend to display haplotypes belonging to different haplogroups; (b) haplogroup A is present in all breeds, and even uncommon haplogroups are highly dispersed among breeds and continental areas; (c) haplotype sharing between breeds of the same region is lower than between breeds of different regions and (d) genetic distances between breeds do not correlate with geography.
MtDNA haplotype sharing occurred between Serra da Estrela Mountain dogs (with putative origin in the centre of Portugal) and two breeds in the north and south of the country: the Castro Laboreiro Dog (which behaves, at the mtDNA level, as a sub-sample of the Serra da Estrela Mountain Dog) and the southern Portuguese Sheepdog. In contrast, the Azores Cattle Dog did not share any haplotypes with the other Portuguese breeds, but did share haplotypes with dogs sampled in Northern Europe. This suggested that the Azores Cattle Dog descended maternally from Northern European dogs rather than Portuguese mainland dogs. A review of published mtDNA haplotypes identified thirteen non-Portuguese breeds with sufficient data for comparison. Comparisons between these thirteen breeds and the four Portuguese breeds demonstrated widespread haplotype sharing, with the greatest diversity among Asian dogs, in accordance with the central role of Asia in canine domestication.
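The haplotype-sharing comparison described above amounts to intersecting the sets of haplotypes observed in each pair of breeds. The following is a minimal sketch under that assumption; the function name, breed keys and haplotype labels are hypothetical placeholders, not the study's data.

```python
from itertools import combinations


def shared_haplotypes(breeds):
    """Return the haplotypes shared by every pair of breeds.

    `breeds` maps a breed name to an iterable of haplotype labels.
    The result maps each (breed_a, breed_b) pair, in sorted name order,
    to the set of labels found in both breeds.
    """
    sets = {name: set(haps) for name, haps in breeds.items()}
    return {
        (a, b): sets[a] & sets[b]
        for a, b in combinations(sorted(sets), 2)
    }


# Toy input with made-up labels, for illustration only.
toy = {
    "breed_X": ["A1", "A2"],
    "breed_Y": ["A2", "B1"],
    "breed_Z": ["C1"],
}
sharing = shared_haplotypes(toy)
```

A breed whose pairs all map to empty sets, like `breed_Z` in the toy input, corresponds to the pattern reported for the Azores Cattle Dog relative to the other Portuguese breeds.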
Archaeological studies have revealed a series of cultural changes around the Last Glacial Maximum in East Asia; whether these changes left any signatures in the gene pool of East Asians remains poorly understood. To achieve deeper insights into the demographic history of modern humans in East Asia around the Last Glacial Maximum, we extensively analyzed mitochondrial DNA haplogroup M9a'b, a specific haplogroup that was suggested to have some potential for tracing the migration around the Last Glacial Maximum in East Eurasia.
A total of 837 M9a'b mitochondrial DNAs (583 from the literature and 254 newly collected in this study), identified among over 28,000 subjects residing across East Eurasia, were studied here. Fifty-nine representative samples were further selected for complete mitochondrial DNA sequencing so we could better understand the phylogeny within M9a'b. Based on the updated phylogeny, an extensive phylogeographic analysis was carried out to reveal the differentiation of haplogroup M9a'b and to reconstruct the dispersal histories.
Our results indicated that southern China and/or Southeast Asia likely served as the source of some post-Last Glacial Maximum dispersal(s). The detailed dissection of haplogroup M9a'b revealed the existence of an inland dispersal in mainland East Asia during the post-glacial period. It was this dispersal that expanded not only to western China but also to northeast India and the south Himalaya region. A similar phylogeographic distribution pattern was also observed for haplogroup F1c, thus substantiating our proposition. This inland post-glacial dispersal was in agreement with the spread of the Mesolithic culture originating in South China and northern Vietnam.
The Asian origin of Native Americans is largely accepted. However, uncertainties persist regarding the source population(s) within Asia, the divergence and arrival time(s) of the founder groups, the number of expansion events, and migration routes into the New World. mtDNA data, presented over the past two decades, have been used to suggest a single-migration model for which the Beringian land mass plays an important role.
In our analysis of 568 mitochondrial genomes, the coalescent age estimates of shared roots between Native American and Siberian-Asian lineages, calculated using two different mutation rates, are A4 (27.5 ± 6.8 kya/22.7 ± 7.4 kya), C1 (21.4 ± 2.7 kya/16.4 ± 1.5 kya), C4 (21.0 ± 4.6 kya/20.0 ± 6.4 kya), and D4e1 (24.1 ± 9.0 kya/17.9 ± 10.0 kya). The coalescent age estimates of pan-American haplogroups calculated using the same two mutation rates (A2: 19.5 ± 1.3 kya/16.1 ± 1.5 kya, B2: 20.8 ± 2.0 kya/18.1 ± 2.4 kya, C1: 21.4 ± 2.7 kya/16.4 ± 1.5 kya and D1: 17.2 ± 2.0 kya/14.9 ± 2.2 kya) and estimates of population expansions within America (~21-16 kya) support the pre-Clovis occupation of the New World. The phylogeography of sublineages within American haplogroups A2, B2, D1 and the C1b, C1c and C1d subhaplogroups of C1 is complex and largely specific to geographical North, Central and South America. However, some sub-branches (B2b, C1b, C1c, C1d and D1f) already existed in American founder haplogroups before expansion into the Americas.
Our results suggest that Native American founders diverged from their Siberian-Asian progenitors sometime during the last glacial maximum (LGM) and expanded into America soon after the LGM peak (~20-16 kya). The phylogeography of haplogroup C1 suggests that this American founder haplogroup differentiated in Siberia-Asia. The situation is less clear for haplogroup B2; however, haplogroups A2 and D1 may have differentiated soon after the Native American founders' divergence. A moderate population bottleneck in American founder populations just before the expansion most plausibly resulted in few founder types in America. The similar estimates of the diversity indices and Bayesian skyline analysis in North America, Central America and South America suggest almost simultaneous (~2.0 ky from South to North America) colonization of these geographical regions with rapid population expansion differentiating into more or less regional branches across the pan-American haplogroups.
The genetic structure, affinities, and diversity of the 1 billion Indians hold important keys to numerous unanswered questions regarding the evolution of human populations and the forces shaping contemporary patterns of genetic variation. Although there have been several recent studies of South Indian caste groups, North Indian caste groups, and South Indian Muslims using Y-chromosomal markers, overall, the Indian population has still not been well studied compared to other geographical populations. In particular, no genetic study has been conducted on Shias and Sunnis from North India.
This study aims to investigate genetic variation and the gene pool in North Indians.
Subjects and methods
A total of 32 Y-chromosomal markers were genotyped in 560 North Indian males collected from three higher caste groups (Brahmins, Chaturvedis and Bhargavas) and two Muslim groups (Shia and Sunni).
Three distinct lineages were revealed based upon 13 haplogroups. The first was a Central Asian lineage harbouring haplogroups R1 and R2. The second lineage was of Middle-Eastern origin represented by haplogroups J2*, Shia-specific E1b1b1, and to some extent G* and L*. The third was the indigenous Indian Y-lineage represented by haplogroups H1*, F*, C* and O*. Haplogroup E1b1b1 was observed in Shias only.
The results revealed that a substantial part of today’s North Indian paternal gene pool was contributed by Central Asian lineages associated with Indo-European speakers, suggesting that extant Indian caste groups are primarily the descendants of Indo-European migrants. The presence of haplogroup E in Shias, first reported in this study, suggests a genetic distinction between the two Indian Muslim sects. The findings of the present study provide insights into prehistoric and early historic patterns of migration into India and the evolution of Indian populations in recent history.
Paternal lineages; Y-chromosomal markers; North Indians; migration
Linguistic and genetic studies on Roma populations living in Europe have unequivocally traced these populations to the Indian subcontinent. However, the exact parental population group and time of the out-of-India dispersal have remained disputed. In the absence of archaeological records and with only scanty historical documentation of the Roma, comparative linguistic studies were the first to identify their Indian origin. Recently, molecular studies on the basis of disease-causing mutations and haploid DNA markers (i.e. mtDNA and Y-chromosome) supported the linguistic view. The presence of Indian-specific Y-chromosome haplogroup H1a1a-M82 and mtDNA haplogroups M5a1, M18 and M35b among Roma has corroborated their South Asian origins and later admixture with Near Eastern and European populations. However, previous studies have left unanswered questions about the exact parental population groups in South Asia. Here we present a detailed phylogeographical study of Y-chromosomal haplogroup H1a1a-M82 in a data set of more than 10,000 global samples to discern a more precise ancestral source of European Romani populations. The phylogeographical patterns and diversity estimates indicate an early origin of this haplogroup in the Indian subcontinent and its further expansion to other regions. Tellingly, the short tandem repeat (STR) based network of H1a1a-M82 lineages displayed the closest connection of Romani haplotypes with the traditional scheduled caste and scheduled tribe population groups of northwestern India.
Human genetic diversity observed in the Indian subcontinent is second only to that of Africa. This implies an early settlement and demographic growth soon after the first 'Out-of-Africa' dispersal of anatomically modern humans in the Late Pleistocene. In contrast to this perspective, linguistic diversity in India has been thought to derive from more recent population movements and episodes of contact. With the exception of Dravidian, whose origin and relatedness to other language phyla are obscure, all the language families in India can be linked to language families spoken in different regions of Eurasia. Mitochondrial DNA and Y chromosome evidence has supported largely local evolution of the genetic lineages of the majority of Dravidian and Indo-European speaking populations, but there is no consensus yet on the question of whether the Munda (Austro-Asiatic) speaking populations originated in India or derive from a relatively recent migration from further East.
Here, we report the analysis of 35 novel complete mtDNA sequences from India which refine the structure of Indian-specific varieties of haplogroup R. Detailed analysis of haplogroup R7, coupled with a survey of ~12,000 mtDNAs from caste and tribal groups over the entire Indian subcontinent, reveals that one of its more recently derived branches (R7a1) is particularly frequent among Munda-speaking tribal groups. This branch is nested within diverse R7 lineages found among Dravidian and Indo-European speakers of India. We have inferred from this that a subset of Munda-speaking groups have acquired R7 relatively recently. Furthermore, we find that the distribution of R7a1 within the Munda-speakers is largely restricted to one of the sub-branches (Kherwari) of northern Munda languages. This evidence does not support the hypothesis that the Austro-Asiatic speakers are the primary source of the R7 variation. Statistical analyses suggest a significant correlation between genetic variation and geography, rather than between genes and languages.
Our high-resolution phylogeographic study, involving diverse linguistic groups in India, suggests that the high frequency of mtDNA haplogroup R7 among Munda-speaking populations of India can be explained best by gene flow from linguistically different populations of the Indian subcontinent. The conclusion is based on the observation that among Indo-Europeans, and particularly in Dravidians, the haplogroup is, despite its lower frequency, phylogenetically more divergent, while among the Munda speakers only one sub-clade of R7, i.e. R7a1, can be observed. It is noteworthy that though R7 is autochthonous to India, and arises from the root of hg R, its distribution and phylogeography in India are not uniform. This suggests the more ancient establishment of an autochthonous matrilineal genetic structure, and that isolation in the Pleistocene, lineage loss through drift, and endogamy of prehistoric and historic groups have greatly inhibited genetic homogenization and geographical uniformity.
Global mitochondrial DNA (mtDNA) data indicate that the dog originates from domestication of the wolf in Asia South of Yangtze River (ASY), with minor genetic contributions from dog–wolf hybridisation elsewhere. Archaeological data and autosomal single nucleotide polymorphism data have instead suggested that dogs originate from Europe and/or South West Asia but, because these datasets lack data from ASY, evidence pointing to ASY may have been overlooked. Analyses of additional markers for global datasets, including ASY, are therefore necessary to test if mtDNA phylogeography reflects the actual dog history and not merely stochastic events or selection. Here, we analyse 14 437 bp of Y-chromosome DNA sequence in 151 dogs sampled worldwide. We found 28 haplotypes distributed in five haplogroups. Two haplogroups were universally shared and included three haplotypes carried by 46% of all dogs, but two other haplogroups were primarily restricted to East Asia. The highest genetic diversity and virtually complete phylogenetic coverage were found within ASY. The 151 dogs were estimated to originate from 13–24 wolf founders, but there was no indication of post-domestication dog–wolf hybridisations. Thus, Y-chromosome and mtDNA data give strikingly similar pictures of dog phylogeography, most importantly that roughly 50% of the gene pools are shared universally but only ASY has nearly the full range of genetic diversity, such that the gene pools in all other regions may derive from ASY. This corroborates that ASY was the principal, and possibly sole, region of wolf domestication, that a large number of wolves were domesticated, and that subsequent dog–wolf hybridisation contributed modestly to the dog gene pool.
dog; canis familiaris; domestication; Y-chromosome DNA; genetic diversity; phylogeography
Central Asia has served as a corridor for human migrations, providing trading routes since ancient times. It has functioned as a conduit connecting Europe and the Middle East with South Asia and Far Eastern civilizations. Therefore, the study of populations in this region is essential for a comprehensive understanding of early human dispersal on the Eurasian continent. Although Y-chromosome distributions in Central Asia have been widely surveyed, present-day Afghanistan remains poorly characterized genetically. The present study addresses this lacuna by analyzing 190 Pathan males from Afghanistan using high-resolution Y-chromosome binary markers. In addition, haplotype diversity for its most common lineages (haplogroups R1a1a*-M198 and L3-M357) was estimated using a set of 15 Y-specific STR loci. The observed haplogroup distribution suggests some degree of genetic isolation of the northern population, likely due to the Hindu Kush mountain range separating it from the southern Afghans, who have had greater contact with neighboring Pathans from Pakistan and migrations from the Indian subcontinent. Our study demonstrates genetic similarities between Pathans from Afghanistan and Pakistan, both of which are characterized by the predominance of haplogroup R1a1a*-M198 (>50%) and the sharing of the same modal haplotype. Furthermore, the high frequencies of R1a1a-M198 and the presence of G2c-M377 chromosomes in Pathans might represent phylogenetic signals from Khazars, a common link between Pathans and Ashkenazi groups, whereas the absence of the E1b1b1a2-V13 lineage does not support their professed Greek ancestry.
Afghanistan; Pathans/Pashtuns; Y-SNP; phylogenetic analyses; haplogroup; haplotype
Huntington disease (HD) results from CAG expansion in the huntingtin (HTT) gene. Although HD occurs worldwide, there are large geographic differences in its prevalence. The prevalence in populations derived from Europe is 10–100 times greater than in East Asia. The European general population chromosomes can be grouped into three major haplogroups (group of similar haplotypes): A, B and C. The majority of HD chromosomes in Europe are found on haplogroup A. However, in the East-Asian populations of China and Japan, we find the majority of HD chromosomes are associated with haplogroup C. The highest risk HD haplotypes (A1 and A2), are absent from the general and HD populations of China and Japan, and therefore provide an explanation for why HD prevalence is low in East Asia. Interestingly, both East-Asian and European populations share a similar low level of HD on haplogroup C. Our data are consistent with the hypothesis that different HTT haplotypes have different mutation rates, and geographic differences in HTT haplotypes explain the difference in HD prevalence. Further, the bias for expansion on haplogroup C in the East-Asian population cannot be explained by a higher average CAG size, as haplogroup C has a lower average CAG size in the general East-Asian population compared with other haplogroups. This finding suggests that CAG-tract size is not the only factor important for CAG instability. Instead, the expansion bias may be because of genetic cis-elements within the haplotype that influence CAG instability in HTT, possibly through different mutational mechanisms for the different haplogroups.
Huntington disease; prevalence; CAG expansion; CAG instability; haplotypes; Cis-elements