Following the dispersal out of Africa, where hominins evolved in warm environments for millions of years, our species has colonised different climate zones of the world, including high latitudes and cold environments. The extent to which human habitation in (sub-)Arctic regions has been enabled by cultural buffering, short-term acclimatization and genetic adaptations is not clearly understood. Present-day indigenous populations of Siberia show a number of phenotypic features, such as increased basal metabolic rate, low serum lipid levels and increased blood pressure, that have been attributed to adaptation to the extreme cold climate. In this study we introduce a dataset of 200 individuals from ten indigenous Siberian populations that were genotyped for 730,525 SNPs across the genome to identify genes and non-coding regions that have undergone unusually rapid allele frequency and long-range haplotype homozygosity change in the recent past. At least three distinct population clusters could be identified among the Siberians, each of which showed a number of unique signals of selection. A region on chromosome 11 (chr11:66–69 Mb) contained the densest clustering of significant signals and also the strongest signals in all the selection tests performed. We present a list of candidate cold adaptation genes that showed significant signals of positive selection, with our strongest signals associated with genes involved in energy regulation and metabolism (CPT1A, LRP5, THADA) and vascular smooth muscle contraction (PRKG1). By employing a new method that paints phased chromosome chunks by their ancestry, we distinguish local Siberian-specific long-range haplotype signals from those introduced by admixture.
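As a rough illustration of what a long-range haplotype homozygosity statistic measures, the following Python sketch computes a simplified extended haplotype homozygosity (EHH) decay among haplotypes that carry a chosen core allele. It is not the selection test used in the study above: the function names, the string encoding of haplotypes and the toy data are all assumptions made for this example.

```python
from collections import Counter
from math import comb


def haplotype_homozygosity(haplotypes):
    """Probability that two distinct haplotypes drawn at random are identical."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


def ehh_decay(haplotypes, core_index, core_allele, max_span):
    """Homozygosity of windows growing rightwards from the core site.

    Only haplotypes carrying `core_allele` at `core_index` are compared.
    Entry k of the result covers the core site plus the next k sites.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    return [
        haplotype_homozygosity([h[core_index:core_index + span + 1] for h in carriers])
        for span in range(max_span + 1)
    ]


# Hypothetical phased haplotypes, one character per SNP (toy data only).
toy_haplotypes = ["0110", "0110", "0110", "0101", "1001"]
decay = ehh_decay(toy_haplotypes, core_index=0, core_allele="0", max_span=3)
print(decay)
```

A slow decay of this value with distance from the core site is the kind of pattern that long-range haplotype tests look for, because a recently selected allele tends to sit on an unusually long shared haplotype.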
We present an Aboriginal Australian genomic sequence obtained from a 100-year-old lock of hair donated by an Aboriginal man from southern Western Australia in the early 20th century. We detect no evidence of European admixture and estimate contamination levels to be below 0.5%. We show that Aboriginal Australians are descendants of an early human dispersal into eastern Asia, possibly 62,000 to 75,000 years ago. This dispersal is separate from the one that gave rise to modern Asians 25,000 to 38,000 years ago. We also find evidence of gene flow between populations of the two dispersal waves prior to the divergence of Native Americans from modern Asian ancestors. Our findings support the hypothesis that present-day Aboriginal Australians descend from the earliest humans to occupy Australia, likely representing one of the oldest continuous populations outside Africa.
We report here the genome sequence of an ancient human. Obtained from ∼4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20×, we recover 79% of the diploid genome, an amount close to the practical limit of current sequencing technologies. We identify 353,151 high-confidence single-nucleotide polymorphisms (SNPs), of which 6.8% have not been reported previously. We estimate raw read contamination to be no higher than 0.8%. We use functional SNP assessment to assign possible phenotypic characteristics of the individual that belonged to a culture whose location has yielded only trace human remains. We compare the high-confidence SNPs to those of contemporary populations to find the populations most closely related to the individual. This provides evidence for a migration from Siberia into the New World some 5,500 years ago, independent of that giving rise to the modern Native Americans and Inuit.
Skin pigmentation is one of the most variable phenotypic traits in humans. A non-synonymous substitution (rs1426654) in the third exon of SLC24A5 accounts for lighter skin in Europeans but not in East Asians. A previous genome-wide association study carried out in a heterogeneous sample of UK immigrants of South Asian descent suggested that this gene also contributes significantly to skin pigmentation variation among South Asians. In the present study, we have quantitatively assessed skin pigmentation for a largely homogeneous cohort of 1228 individuals from the Southern region of the Indian subcontinent. Our data confirm significant association of rs1426654 SNP with skin pigmentation, explaining about 27% of total phenotypic variation in the cohort studied. Our extensive survey of the polymorphism in 1573 individuals from 54 ethnic populations across the Indian subcontinent reveals wide presence of the derived-A allele, although the frequencies vary substantially among populations. We also show that the geospatial pattern of this allele is complex, but most importantly, reflects strong influence of language, geography and demographic history of the populations. Sequencing 11.74 kb of SLC24A5 in 95 individuals worldwide reveals that the rs1426654-A alleles in South Asian and West Eurasian populations are monophyletic and occur on the background of a common haplotype that is characterized by low genetic diversity. We date the coalescence of the light skin associated allele at 22–28 KYA. Both our sequence and genome-wide genotype data confirm that this gene has been a target for positive selection among Europeans. However, the latter also shows additional evidence of selection in populations of the Middle East, Central Asia, Pakistan and North India but not in South India.
Human skin color is one of the most visible aspects of human diversity. The genetic basis of pigmentation in Europeans has been understood to some extent, but our knowledge about South Asians has been restricted to a handful of studies. It has been suggested that a single nucleotide difference in SLC24A5 accounts for 25–38% of European-African pigmentation differences and correlates with lighter skin. This genetic variant has also been associated with skin color variation among South Asians living in the UK. Here, we report a study based on a homogeneous cohort from South India. Our results confirm that SLC24A5 plays a key role in pigmentation diversity of South Asians. Country-wide screening of the variant reveals that the light skin associated allele is widespread in the Indian subcontinent and its complex patterning is shaped by a combination of processes involving selection and demographic history of the populations. By studying the variation of SLC24A5 sequences among a diverse set of individuals, we show that the light skin associated allele in South Asians is identical by descent to that found in Europeans. Our study also provides new insights into positive selection acting on the gene and the evolutionary history of light skin in humans.
Despite Afghanistan's location at the crossroads of Asia, the genetics of its populations have been largely overlooked. The country is currently inhabited by five major ethnic populations: Pashtun, Tajik, Hazara, Uzbek and Turkmen. Here we present mitochondrial and Y-chromosome data from over 500 Afghan samples from these five ethnic groups, together with autosomal data from a subset of the samples. The Afghan data were supplemented with the same Y-chromosome analyses of samples from Iran, Kyrgyzstan, Mongolia and updated Pakistani samples (HGDP-CEPH). The data presented here were integrated into existing knowledge of pan-Eurasian genetic diversity. The pattern of genetic variation, revealed by structure-like and Principal Component analyses and Analysis of Molecular Variance, indicates that the people of Afghanistan are made up of a mosaic of components representing various geographic regions of Eurasian ancestry. The absence of a major Central Asian-specific component indicates that the Hindu Kush, like the gene pool of Central Asian populations in general, is a confluence of gene flows rather than a source of distinctly autochthonous populations that have arisen in situ, a conclusion that is reinforced by the phylogeography of both haploid loci.
Sakha – an area connecting South and Northeast Siberia – is significant for understanding the history of peopling of Northeast Eurasia and the Americas. Previous studies have shown a genetic contiguity between Siberia and East Asia and the key role of South Siberia in the colonization of Siberia.
We report the results of a high-resolution phylogenetic analysis of 701 mtDNAs and 318 Y chromosomes from five native populations of Sakha (Yakuts, Evenks, Evens, Yukaghirs and Dolgans) and of the analysis of more than 500,000 autosomal SNPs of 758 individuals from 55 populations, including 40 previously unpublished samples from Siberia. Phylogenetically terminal clades of East Asian mtDNA haplogroups C and D and Y-chromosome haplogroups N1c, N1b and C3, constituting the core of the gene pool of the native populations from Sakha, connect Sakha and South Siberia. Analysis of autosomal SNP data confirms the genetic continuity between Sakha and South Siberia. Maternal lineages D5a2a2, C4a1c, C4a2, C5b1b and the Yakut-specific STR sub-clade of Y-chromosome haplogroup N1c can be linked to a migration of Yakut ancestors, while the paternal lineage C3c was most likely carried to Sakha by the expansion of the Tungusic people. MtDNA haplogroups Z1a1b and Z1a3, present in Yukaghirs, Evens and Dolgans, show traces of different and probably more ancient migration(s). Analysis of both haploid loci and autosomal SNP data revealed only minor genetic components shared between Sakha and the extreme Northeast Siberia. Although the major part of West Eurasian maternal and paternal lineages in Sakha could originate from recent admixture with East Europeans, mtDNA haplogroups H8, H20a and HV1a1a, as well as Y-chromosome haplogroup J, more probably reflect an ancient gene flow from West Eurasia through Central Asia and South Siberia.
Our high-resolution phylogenetic dissection of mtDNA and Y-chromosome haplogroups as well as analysis of autosomal SNP data suggests that Sakha was colonized by repeated expansions from South Siberia with minor gene flow from the Lower Amur/Southern Okhotsk region and/or Kamchatka. The minor West Eurasian component in Sakha attests to both recent and ongoing admixture with East Europeans and an ancient gene flow from West Eurasia.
mtDNA; Y chromosome; Autosomal SNPs; Sakha
Population genetic studies on European populations have highlighted Italy as one of the genetically most diverse regions. This is possibly due to the country's complex demographic history and the large variability in terrain throughout the territory, which may also explain why Italy is enriched for population isolates, Sardinia being the best-known example. As population isolates have great potential for identifying disease-causing genetic variants, we aimed to genetically characterize a region of northeastern Italy that is known for its isolated communities. A total of 1310 samples, collected from six geographically isolated villages, were genotyped at >145 000 single-nucleotide polymorphism positions. Newly genotyped data were analyzed jointly with the available genome-wide data sets of individuals of European descent, including several population isolates. Despite the linguistic differences and geographical isolation, the village populations still show the greatest genetic similarity to other Italian samples. The genetic isolation and small effective population size of the village populations are manifested by higher levels of genomic homozygosity and elevated linkage disequilibrium. These estimates become even more striking when the detected substructure is taken into account. The observed level of genetic isolation in the Friuli-Venezia Giulia region is more extreme, according to several measures of isolation, than that of Sardinians, French Basques and northern Finns, thus confirming its status as an isolate.
population genetics; isolated population; genetic distance
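The elevated linkage disequilibrium mentioned in the isolate abstract above can be illustrated with the standard pairwise r² statistic. The Python sketch below is only a minimal example under stated assumptions: it takes phased haplotypes encoded as strings of "0" and "1" (the study itself worked from genotype data), and the function name and toy inputs are hypothetical.

```python
def ld_r_squared(haplotypes, i, j):
    """Pairwise r^2 between biallelic sites i and j from phased haplotypes.

    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), where D = pAB - pA * pB
    and the "1" allele is counted at each site.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b
    return d * d / denominator


# Toy data only: complete association versus no association between two sites.
linked = ["11", "11", "00", "00"]
unlinked = ["11", "10", "01", "00"]
r2_linked = ld_r_squared(linked, 0, 1)
r2_unlinked = ld_r_squared(unlinked, 0, 1)
print(r2_linked, r2_unlinked)
```

In a small, isolated population, values of this statistic are expected to stay high over longer physical distances than in a large outbred population.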
A Southwest Asian origin and dispersal to North Africa in the Early Upper Palaeolithic era have been inferred in previous studies for mtDNA haplogroups M1 and U6. Both haplogroups have been proposed to show similar geographic patterns and shared demographic histories.
We report here 24 M1 and 33 U6 new complete mtDNA sequences that allow us to refine the existing phylogeny of these haplogroups. The resulting phylogenetic information was used to genotype a further 131 M1 and 91 U6 samples to determine the geographic spread of their sub-clades. No southwest Asian specific clades for M1 or U6 were discovered. U6 and M1 frequencies in North Africa, the Middle East and Europe do not follow similar patterns, and their sub-clade divisions do not appear to be compatible with their shared history reaching back to the Early Upper Palaeolithic. The Bayesian Skyline Plots testify to non-overlapping phases of expansion, and the haplogroups’ phylogenies suggest that there are U6 sub-clades that expanded earlier than those in M1. Some M1 and U6 sub-clades could be linked with certain events. For example, U6a1 and M1b, with their coalescent ages of ~20,000–22,000 years ago and earliest inferred expansion in northwest Africa, could coincide with the flourishing of the Iberomaurusian industry, whilst U6b and M1b1 appeared at the time of the Capsian culture.
Our high-resolution phylogenetic dissection of both haplogroups and coalescent time assessments suggest that the extant main branching pattern of both haplogroups arose and diversified in the mid-later Upper Palaeolithic, with some sub-clades arising concomitantly with the expansion of the Iberomaurusian industry. Carriers of these maternal lineages were later absorbed into, and diversified further during, the spread of Afro-Asiatic languages in North and East Africa.
mtDNA haplogroups M1 and U6; Afro-Asiatic languages; North Africa
The geographic origin and time of dispersal of Austroasiatic (AA) speakers, presently settled in south and southeast Asia, remain disputed. Two rival hypotheses, both assuming a demic component to the language dispersal, have been proposed. The first of these places the origin of Austroasiatic speakers in southeast Asia with a later dispersal to south Asia during the Neolithic, whereas the second hypothesis advocates pre-Neolithic origins and dispersal of this language family from south Asia. To test the two alternative models, this study combines the analysis of uniparentally inherited markers with 610,000 common single nucleotide polymorphism loci from the nuclear genome. Indian AA speakers have high frequencies of Y chromosome haplogroup O2a; our results show that this haplogroup has significantly higher diversity and coalescent time (17–28 thousand years ago) in southeast Asia, strongly supporting the first of the two hypotheses. Nevertheless, the results of principal component and “structure-like” analyses on autosomal loci also show that the population history of AA speakers in India is more complex, being characterized by two ancestral components: one represented in the pattern of Y chromosomal and EDAR results and the other by mitochondrial DNA diversity and genomic structure. We propose that AA speakers in India today are derived from dispersal from southeast Asia, followed by extensive sex-specific admixture with local Indian populations.
Austroasiatic; mtDNA; Y chromosome; autosomes; admixture
Human Y-chromosome haplogroup structure is largely circumscribed by continental boundaries. One notable exception to this general pattern is the young haplogroup R1a that exhibits post-Glacial coalescent times and relates the paternal ancestry of more than 10% of men in a wide geographic area extending from South Asia to Central East Europe and South Siberia. Its origin and dispersal patterns are poorly understood as no marker has yet been described that would distinguish European R1a chromosomes from Asian. Here we present frequency and haplotype diversity estimates for more than 2000 R1a chromosomes assessed for several newly discovered SNP markers that introduce the onset of informative R1a subdivisions by geography. Marker M434 has a low frequency and a late origin in West Asia bearing witness to recent gene flow over the Arabian Sea. Conversely, marker M458 has a significant frequency in Europe, exceeding 30% in its core area in Eastern Europe and comprising up to 70% of all M17 chromosomes present there. The diversity and frequency profiles of M458 suggest its origin during the early Holocene and a subsequent expansion likely related to a number of prehistoric cultural developments in the region. Its primary frequency and diversity distribution correlates well with some of the major Central and East European river basins where settled farming was established before its spread further eastward. Importantly, the virtual absence of M458 chromosomes outside Europe speaks against substantial patrilineal gene flow from East Europe to Asia, including to India, at least since the mid-Holocene.
Y chromosome; haplogroup R1a; human evolution; population genetics
We have analyzed 7,137 samples from 125 different caste, tribal and religious groups of India and 99 samples from three populations of Nepal for the length variation in the COII/tRNALys region of mtDNA. Samples showing length variation were subjected to detailed phylogenetic analysis based on HVS-I and informative coding region sequence variation. The overall frequencies of the 9-bp deletion and insertion variants in South Asia were 1.9% and 0.6%, respectively. We have also defined a novel deep-rooting haplogroup M43 and identified the rare haplogroup H14 in Indian populations carrying the 9-bp deletion by complete mtDNA sequencing. Moreover, we redefined haplogroup M6 and dissected it into two well-defined subclades. The presence of haplogroups F1 and B5a in Uttar Pradesh suggests minor maternal contribution from Southeast Asia to Northern India. The occurrence of haplogroup F1 in the Nepalese sample implies that Nepal might have served as a bridge for the flow of eastern lineages to India. The presence of R6 in the Nepalese, on the other hand, suggests that the gene flow between India and Nepal has been reciprocal.
South Asia; 9bp indel; mtDNA; Haplogroup
Human genetic diversity observed in the Indian subcontinent is second only to that of Africa. This implies an early settlement and demographic growth soon after the first 'Out-of-Africa' dispersal of anatomically modern humans in the Late Pleistocene. In contrast to this perspective, linguistic diversity in India has been thought to derive from more recent population movements and episodes of contact. With the exception of Dravidian, whose origin and relatedness to other language phyla are obscure, all the language families in India can be linked to language families spoken in different regions of Eurasia. Mitochondrial DNA and Y chromosome evidence has supported largely local evolution of the genetic lineages of the majority of Dravidian and Indo-European speaking populations, but there is no consensus yet on the question of whether the Munda (Austro-Asiatic) speaking populations originated in India or derive from a relatively recent migration from further East.
Here, we report the analysis of 35 novel complete mtDNA sequences from India which refine the structure of Indian-specific varieties of haplogroup R. Detailed analysis of haplogroup R7, coupled with a survey of ~12,000 mtDNAs from caste and tribal groups over the entire Indian subcontinent, reveals that one of its more recently derived branches (R7a1) is particularly frequent among Munda-speaking tribal groups. This branch is nested within diverse R7 lineages found among Dravidian and Indo-European speakers of India. We have inferred from this that a subset of Munda-speaking groups has acquired R7 relatively recently. Furthermore, we find that the distribution of R7a1 within the Munda-speakers is largely restricted to one of the sub-branches (Kherwari) of northern Munda languages. This evidence does not support the hypothesis that the Austro-Asiatic speakers are the primary source of the R7 variation. Statistical analyses suggest a significant correlation between genetic variation and geography, rather than between genes and languages.
Our high-resolution phylogeographic study, involving diverse linguistic groups in India, suggests that the high frequency of mtDNA haplogroup R7 among Munda-speaking populations of India can be explained best by gene flow from linguistically different populations of the Indian subcontinent. The conclusion is based on the observation that among Indo-Europeans, and particularly in Dravidians, the haplogroup is, despite its lower frequency, phylogenetically more divergent, while among the Munda speakers only one sub-clade of R7, i.e. R7a1, can be observed. It is noteworthy that though R7 is autochthonous to India, and arises from the root of hg R, its distribution and phylogeography in India are not uniform. This suggests the more ancient establishment of an autochthonous matrilineal genetic structure, and that isolation in the Pleistocene, lineage loss through drift, and endogamy of prehistoric and historic groups have greatly inhibited genetic homogenization and geographical uniformity.
Native Americans derive from a small number of Asian founders who likely arrived in the Americas via Beringia. However, additional details about the initial colonization of the Americas remain unclear. To investigate the pioneering phase in the Americas we analyzed a total of 623 complete mtDNAs from the Americas and Asia, including 20 new complete mtDNAs from the Americas and seven from Asia. These sequence data were used to direct high-resolution genotyping from 20 American and 26 Asian populations. Here we describe more genetic diversity within the founder population than was previously reported. The newly resolved phylogenetic structure suggests that ancestors of Native Americans paused when they reached Beringia, during which time New World founder lineages differentiated from their Asian sister-clades. This pause in movement was followed by a swift migration southward that distributed the founder types all the way to South America. The data also suggest more recent bi-directional gene flow between Siberia and the North American Arctic.
The issue of errors in genetic data sets is of growing concern, particularly in population genetics where whole genome mtDNA sequence data is coming under increased scrutiny. Multiplexed PCR reactions, combined with SNP typing, are currently under-exploited in this context, but have the potential to genotype whole populations rapidly and accurately, significantly reducing the number of errors appearing in published data sets. To show the sensitivity of this technique for screening mtDNA genomic sequence data, 20 historic samples of the enigmatic Andaman Islanders and 12 modern samples from three Indian tribal populations (Chenchu, Lambadi and Lodha) were genotyped for 20 coding region sites after provisional haplogroup assignment with control region sequences. The genotype data from the historic samples significantly revise the topologies for the Andaman M31 and M32 mtDNA lineages by rectifying conflicts in published data sets. The new Indian data extend the distribution of the M31a lineage to South Asia, challenging previous interpretations of mtDNA phylogeography. This genetic connection between the ancestors of the Andamanese and South Asian tribal groups ∼30 kya has important implications for the debate concerning migration routes and settlement patterns of humans leaving Africa during the late Pleistocene, and indicates the need for more detailed genotyping strategies. The methodology serves as a low-cost, high-throughput model for the production and authentication of data from modern or ancient DNA, and demonstrates the value of museum collections as important records of human genetic diversity.
Recent advances in the understanding of the maternal and paternal heritage of south and southwest Asian populations have highlighted their role in the colonization of Eurasia by anatomically modern humans. Further understanding requires a deeper insight into the topology of the branches of the Indian mtDNA phylogenetic tree, which should be contextualized within the phylogeography of the neighboring regional mtDNA variation. Accordingly, we have analyzed mtDNA control and coding region variation in 796 Indian (including both tribal and caste populations from different parts of India) and 436 Iranian mtDNAs. The results were integrated and analyzed together with published data from South, Southeast Asia and West Eurasia.
Four new Indian-specific haplogroup M sub-clades were defined. These, in combination with two previously described haplogroups, encompass approximately one third of the haplogroup M mtDNAs in India. Their phylogeography and spread among different linguistic phyla and social strata was investigated in detail. Furthermore, the analysis of the Iranian mtDNA pool revealed patterns of limited reciprocal gene flow between Iran and the Indian sub-continent and allowed the identification of different assemblies of shared mtDNA sub-clades.
Since the initial peopling of South and West Asia by anatomically modern humans, when this region may well have provided the initial settlers who colonized much of the rest of Eurasia, the gene flow in and out of India of the maternally transmitted mtDNA has been surprisingly limited. Specifically, our analysis of the mtDNA haplogroups, which are shared between Indian and Iranian populations and exhibit coalescence ages corresponding to around the early Upper Paleolithic, indicates that they are present in India largely as Indian-specific sub-lineages. In contrast, other ancient Indian-specific variants of M and R are very rare outside the sub-continent.
Previous Y-chromosome studies have demonstrated that Ashkenazi Levites, members of a paternally inherited Jewish priestly caste, display a distinctive founder event within R1a, the most prevalent Y-chromosome haplogroup in Eastern Europe. Here we report the analysis of 16 whole R1 sequences and show that a set of 19 unique nucleotide substitutions defines the Ashkenazi R1a lineage. While our survey of one of these, M582, in 2,834 R1a samples reveals its absence in 922 Eastern Europeans, we show it is present in all sampled R1a Ashkenazi Levites, as well as in 33.8% of other R1a Ashkenazi Jewish males and 5.9% of 303 R1a Near Eastern males, where it shows considerably higher diversity. Moreover, the M582 lineage also occurs at low frequencies in non-Ashkenazi Jewish populations. In contrast to the previously suggested Eastern European origin for Ashkenazi Levites, the current data are indicative of a geographic source of the Levite founder lineage in the Near East and its likely presence among pre-Diaspora Hebrews.
Population genetics studies continue to debate whether Ashkenazi Levites originated in Europe or the Near East. Here, Rootsi et al. use whole Y-chromosome DNA sequences to unravel the phylogenetic origin of the Ashkenazi Levite and suggest an origin for the Levite founder lineage in the Near East.