To construct maternal phylogeny and prehistoric dispersals of modern human being in the Indian sub continent, a diverse subset of 641 complete mitochondrial DNA (mtDNA) genomes belonging to macrohaplogroup M was chosen from a total collection of 2,783 control-region sequences, sampled from 26 selected tribal populations of India. On the basis of complete mtDNA sequencing, we identified 12 new haplogroups - M53 to M64; redefined/ascertained and characterized haplogroups M2, M3, M4, M5, M6, M8′C′Z, M9, M10, M11, M12-G, D, M18, M30, M33, M35, M37, M38, M39, M40, M41, M43, M45 and M49, which were previously described by control and/or coding-region polymorphisms. Our results indicate that the mtDNA lineages reported in the present study (except East Asian lineages M8′C′Z, M9, M10, M11, M12-G, D ) are restricted to Indian region.The deep rooted lineages of macrohaplogroup ‘M’ suggest in-situ origin of these haplogroups in India. Most of these deep rooting lineages are represented by multiple ethnic/linguist groups of India. Hierarchical analysis of molecular variation (AMOVA) shows substantial subdivisions among the tribes of India (Fst = 0.16164). The current Indian mtDNA gene pool was shaped by the initial settlers and was galvanized by minor events of gene flow from the east and west to the restricted zones. Northeast Indian mtDNA pool harbors region specific lineages, other Indian lineages and East Asian lineages. We also suggest the establishment of an East Asian gene in North East India through admixture rather than replacement.
Analysis of human complete mitochondrial DNA sequences has largely contributed to resolve phylogenies and antiquity of different lineages belonging to the majorhaplogroups L, N and M (East-Asian lineages). In the absence of whole mtDNA sequence information of M lineages reported in India that exhibits highest diversity within the sub-continent, the present study was undertaken to provide a detailed analysis of this macrohaplogroup to precisely characterize and unravel the intricate phylogeny of the lineages and to establish the antiquity of M lineages in India.
The phylogenetic tree constructed from sequencing information of twenty-four whole mtDNA genome revealed novel substitutions in the previously defined M2a and M6 lineages. The most striking feature of this phylogenetic tree is the recognition of two new lineages, M30 and M31, distinguished by transitions at 12007 and 5319, respectively. M30 comprises of M18 and identifies a potential new sub-lineage possessing substitution at 16223 and 16300. It further branches into M30a sub-lineage, defined by 15431 and 195A substitution. The age of M30 lineage was estimated at 33,042 YBP, indicating a more recent expansion time than M2 (49,686 YBP). The M31 branch encompasses the M6 lineage along with the previously defined M3 and M4 lineages. Contradictory to earlier reports, the M5 lineage does not always include a 12477 substitution, and is more appropriately defined by a transversion at 10986A. The phylogenetic tree also identifies a potential new lineage in the M* branch with HVSI sequence as 16223,16325. Substitutions in M25 were in concordance with previous reports.
This study describes five new basal mutations and recognizes two new lineages, M30 and M31 that substantially contribute to the present understanding of macrohaplogroup M. These two newly erected lineages include the previously independent lineages M18 and M6 as sub-lineages within them, respectively, suggesting that most mt DNA genomes might arise as limited offshoots of M trunk. Furthermore, this study supports the non existence of lineages such as M3 and M4 that are solely defined on the basis of fast mutating control region motifs and hence, establishes the importance of coding region markers for an accurate understanding of the phylogeny. The deep roots of M phylogeny clearly establish the antiquity of Indian lineages, especially M2, as compared to Ethiopian M1 lineage and hence, support an Asian origin of M majorhaplogroup.
An early dispersal of biologically and behaviorally modern humans from their African origins to Australia, by at least 45 thousand years via southern Asia has been suggested by studies based on morphology, archaeology and genetics. However, mtDNA lineages sampled so far from south Asia, eastern Asia and Australasia show non-overlapping distributions of haplogroups within pan Eurasian M and N macrohaplogroups. Likewise, support from the archaeology is still ambiguous.
In our completely sequenced 966-mitochondrial genomes from 26 relic tribes of India, we have identified seven genomes, which share two synonymous polymorphisms with the M42 haplogroup, which is specific to Australian Aborigines.
Our results showing a shared mtDNA lineage between Indians and Australian Aborigines provides direct genetic evidence of an early colonization of Australia through south Asia, following the "southern route".
The phylogeny of the indigenous Indian-specific mitochondrial DNA (mtDNA) haplogroups have been determined and refined in previous reports. Similar to mtDNA superhaplogroups M and N, a profusion of reports are also available for superhaplogroup R. However, there is a dearth of information on South Asian subhaplogroups in particular, including R8. Therefore, we ought to access the genealogy and pre-historic expansion of haplogroup R8 which is considered one of the autochthonous lineages of South Asia.
Upon screening the mtDNA of 5,836 individuals belonging to 104 distinct ethnic populations of the Indian subcontinent, we found 54 individuals with the HVS-I motif that defines the R8 haplogroup. Complete mtDNA sequencing of these 54 individuals revealed two deep-rooted subclades: R8a and R8b. Furthermore, these subclades split into several fine subclades. An isofrequency contour map detected the highest frequency of R8 in the state of Orissa. Spearman's rank correlation analysis suggests significant correlation of R8 occurrence with geography.
The coalescent age of newly-characterized subclades of R8, R8a (15.4±7.2 Kya) and R8b (25.7±10.2 Kya) indicates that the initial maternal colonization of this haplogroup occurred during the middle and upper Paleolithic period, roughly around 40 to 45 Kya. These results signify that the southern part of Orissa currently inhabited by Munda speakers is likely the origin of these autochthonous maternal deep-rooted haplogroups. Our high-resolution study on the genesis of R8 haplogroup provides ample evidence of its deep-rooted ancestry among the Orissa (Austro-Asiatic) tribes.
The European genetic landscape has been shaped by several human migrations occurred since Paleolithic times. The accumulation of archaeological records and the concordance of different lines of genetic evidence during the last two decades have triggered an interesting debate concerning the role of ancient settlers from the Franco-Cantabrian region in the postglacial resettlement of Europe. Among the Franco-Cantabrian populations, Basques are regarded as one of the oldest and more intriguing human groups of Europe. Recent data on complete mitochondrial DNA genomes focused on macrohaplogroup R0 revealed that Basques harbor some autochthonous lineages, suggesting a genetic continuity since pre-Neolithic times. However, excluding haplogroup H, the most representative lineage of macrohaplogroup R0, the majority of maternal lineages of this area remains virtually unexplored, so that further refinement of the mtDNA phylogeny based on analyses at the highest level of resolution is crucial for a better understanding of the European prehistory. We thus explored the maternal ancestry of 548 autochthonous individuals from various Franco-Cantabrian populations and sequenced 76 mitogenomes of the most representative lineages. Interestingly, we identified three mtDNA haplogroups, U5b1f, J1c5c1 and V22, that proved to be representative of Franco-Cantabria, notably of the Basque population. The seclusion and diversity of these female genetic lineages support a local origin in the Franco-Cantabrian area during the Mesolithic of southwestern Europe, ∼10,000 years before present (YBP), with signals of expansions at ∼3,500 YBP. These findings provide robust evidence of a partial genetic continuity between contemporary autochthonous populations from the Franco-Cantabrian region, specifically the Basques, and Paleolithic/Mesolithic hunter-gatherer groups. Furthermore, our results raise the current proportion (≈15%) of the Franco-Cantabrian maternal gene pool with a putative pre-Neolithic origin to ≈35%, further supporting the notion of a predominant Paleolithic genetic substrate in extant European populations.
Human genetic diversity observed in Indian subcontinent is second only to that of Africa. This implies an early settlement and demographic growth soon after the first 'Out-of-Africa' dispersal of anatomically modern humans in Late Pleistocene. In contrast to this perspective, linguistic diversity in India has been thought to derive from more recent population movements and episodes of contact. With the exception of Dravidian, which origin and relatedness to other language phyla is obscure, all the language families in India can be linked to language families spoken in different regions of Eurasia. Mitochondrial DNA and Y chromosome evidence has supported largely local evolution of the genetic lineages of the majority of Dravidian and Indo-European speaking populations, but there is no consensus yet on the question of whether the Munda (Austro-Asiatic) speaking populations originated in India or derive from a relatively recent migration from further East.
Here, we report the analysis of 35 novel complete mtDNA sequences from India which refine the structure of Indian-specific varieties of haplogroup R. Detailed analysis of haplogroup R7, coupled with a survey of ~12,000 mtDNAs from caste and tribal groups over the entire Indian subcontinent, reveals that one of its more recently derived branches (R7a1), is particularly frequent among Munda-speaking tribal groups. This branch is nested within diverse R7 lineages found among Dravidian and Indo-European speakers of India. We have inferred from this that a subset of Munda-speaking groups have acquired R7 relatively recently. Furthermore, we find that the distribution of R7a1 within the Munda-speakers is largely restricted to one of the sub-branches (Kherwari) of northern Munda languages. This evidence does not support the hypothesis that the Austro-Asiatic speakers are the primary source of the R7 variation. Statistical analyses suggest a significant correlation between genetic variation and geography, rather than between genes and languages.
Our high-resolution phylogeographic study, involving diverse linguistic groups in India, suggests that the high frequency of mtDNA haplogroup R7 among Munda speaking populations of India can be explained best by gene flow from linguistically different populations of Indian subcontinent. The conclusion is based on the observation that among Indo-Europeans, and particularly in Dravidians, the haplogroup is, despite its lower frequency, phylogenetically more divergent, while among the Munda speakers only one sub-clade of R7, i.e. R7a1, can be observed. It is noteworthy that though R7 is autochthonous to India, and arises from the root of hg R, its distribution and phylogeography in India is not uniform. This suggests the more ancient establishment of an autochthonous matrilineal genetic structure, and that isolation in the Pleistocene, lineage loss through drift, and endogamy of prehistoric and historic groups have greatly inhibited genetic homogenization and geographical uniformity.
The out of Africa hypothesis has gained generalized consensus. However, many specific questions remain unsettled. To know whether the two M and N macrohaplogroups that colonized Eurasia were already present in Africa before the exit is puzzling. It has been proposed that the east African clade M1 supports a single origin of haplogroup M in Africa. To test the validity of that hypothesis, the phylogeographic analysis of 13 complete mitochondrial DNA (mtDNA) sequences and 261 partial sequences belonging to haplogroup M1 was carried out.
The coalescence age of the African haplogroup M1 is younger than those for other M Asiatic clades. In contradiction to the hypothesis of an eastern Africa origin for modern human expansions out of Africa, the most ancestral M1 lineages have been found in Northwest Africa and in the Near East, instead of in East Africa. The M1 geographic distribution and the relative ages of its different subclades clearly correlate with those of haplogroup U6, for which an Eurasian ancestor has been demonstrated.
This study provides evidence that M1, or its ancestor, had an Asiatic origin. The earliest M1 expansion into Africa occurred in northwestern instead of eastern areas; this early spread reached the Iberian Peninsula even affecting the Basques. The majority of the M1a lineages found outside and inside Africa had a more recent eastern Africa origin. Both western and eastern M1 lineages participated in the Neolithic colonization of the Sahara. The striking parallelism between subclade ages and geographic distribution of M1 and its North African U6 counterpart strongly reinforces this scenario. Finally, a relevant fraction of M1a lineages present today in the European Continent and nearby islands possibly had a Jewish instead of the commonly proposed Arab/Berber maternal ascendance.
Human T-Cell Lymphotropic Virus Type 1 (HTLV-1) is the etiological agent of adult T-cell leukemia (ATL) and HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP). It has been estimated that 10–20 million people are infected worldwide, but no successful treatment is available. Recently, the epidemiology of this virus was addressed in blood donors from Maputo, showing rates from 0.9 to 1.2%. However, the origin and impact of HTLV endemic in this population is unknown.
To assess the HTLV-1 molecular epidemiology in Mozambique and to investigate their relationship with HTLV-1 lineages circulating worldwide.
Blood donors and HIV patients were screened for HTLV antibodies by using enzyme immunoassay, followed by Western Blot. PCR and sequencing of HTLV-1 LTR region were applied and genetic HTLV-1 subtypes were assigned by the neighbor-joining method. The mean genetic distance of Mozambican HTLV-1 lineages among the genetic clusters were determined. Human mitochondrial (mt) DNA analysis was performed and individuals classified in mtDNA haplogroups.
LTR HTLV-1 analysis demonstrated that all isolates belong to the Transcontinental subgroup of the Cosmopolitan subtype. Mozambican HTLV-1 sequences had a high inter-strain genetic distance, reflecting in three major clusters. One cluster is associated with the South Africa sequences, one is related with Middle East and India strains and the third is a specific Mozambican cluster. Interestingly, 83.3% of HIV/HTLV-1 co-infection was observed in the Mozambican cluster. The human mtDNA haplotypes revealed that all belong to the African macrohaplogroup L with frequencies representatives of the country.
The Mozambican HTLV-1 genetic diversity detected in this study reveals that although the strains belong to the most prevalent and worldwide distributed Transcontinental subgroup of the Cosmopolitan subtype, there is a high HTLV diversity that could be correlated with at least 3 different HTLV-1 introductions in the country. The significant rate of HTLV-1a/HIV-1C co-infection, particularly in the Mozambican cluster, has important implications for the controls programs of both viruses.
Human T-cell lymphotropic virus type 1 (HTLV-1) is the causative agent of Adult T-Cell Leukemia/Lymphoma (ATL), the Tropical Spastic Paraparesis/HTLV-1-associated Myelopathy (TSP/HAM) and other inflammatory diseases, including dermatitis, uveitis, and myositis. It is estimated that 2–8% of the infected persons will develop a HTLV-1-associated disease during their lifetimes, frequently TSP/HAM. Thus far, there is not a specific treatment to this progressive and chronic disease. HTLV-1 has means of three transmission: (i) from mother to child during prolonged breastfeeding, (ii) between sexual partners and (iii) through blood transfusion. HTLV-1 has been characterized in 7 subtypes and the geographical distribution and the clinical impact of this infection is not well known, mainly in African population. HTLV-1 is endemic in sub-Saharan Africa. Mozambique is a country of southeastern Africa where TSP/HAM cases were reported. Recently, our group estimated the HTLV prevalence among Mozambican blood donors as 0.9%. In this work we performed a genetic analysis of HTLV-1 in blood donors and HIV/HTLV co-infected patients from Maputo, Mozambique. Our results showed the presence of three HTLV-1 clusters within the Cosmopolitan/Transcontinental subtype/subgroup. The differential rates of HIV-1/HTLV-1 co-infection in the three HTLV-1 clusters demonstrated the dynamic of the two viruses and the need for implementation of control measures focusing on both retroviruses.
We have analyzed 7,137 samples from 125 different caste, tribal and religious groups of India and 99 samples from three populations of Nepal for the length variation in the COII/tRNALys region of mtDNA. Samples showing length variation were subjected to detailed phylogenetic analysis based on HVS-I and informative coding region sequence variation. The overall frequencies of the 9-bp deletion and insertion variants in South Asia were 1.9 and 0.6%, respectively. We have also defined a novel deep-rooting haplogroup M43 and identified the rare haplogroup H14 in Indian populations carrying the 9-bp deletion by complete mtDNA sequencing. Moreover, we redefined haplogroup M6 and dissected it into two well-defined subclades. The presence of haplogroups F1 and B5a in Uttar Pradesh suggests minor maternal contribution from Southeast Asia to Northern India. The occurrence of haplogroup F1 in the Nepalese sample implies that Nepal might have served as a bridge for the flow of eastern lineages to India. The presence of R6 in the Nepalese, on the other hand, suggests that the gene flow between India and Nepal has been reciprocal.
South Asia; 9bp indel; mtDNA; Haplogroup
We have analyzed 7137 samples from 125 different caste, tribal and religious groups of India and 99 samples from three populations of Nepal for the length variation in the COII/tRNALys region of mtDNA. Samples showing length variation were subjected to detailed phylogenetic analysis based on HVS-I and informative coding region sequence variation. The overall frequencies of the 9-bp deletion and insertion variants in South Asia were 1.8% and 0.5%, respectively. We have also defined a novel deep-rooting haplogroup M43 and identified the rare haplogroup H14 in Indian populations carrying the 9bp-deletion by complete mtDNA sequencing. Moreover, we redefined haplogroup M6 and dissected it into two well-defined subclades. The presence of haplogroups F1 and B5a in Uttar Pradesh suggests minor maternal contribution from Southeast Asia to Northern India. The occurrence of haplogroup F1 in the Nepalese sample implies that Nepal might have served as a bridge for the flow of eastern lineages to India. The presence of R6 in the Nepalese, on the other hand, suggests that the gene flow between India and Nepal has been reciprocal.
South Asia; 9bp indel; mtDNA; Haplogroup
Recent advances in the understanding of the maternal and paternal heritage of south and southwest Asian populations have highlighted their role in the colonization of Eurasia by anatomically modern humans. Further understanding requires a deeper insight into the topology of the branches of the Indian mtDNA phylogenetic tree, which should be contextualized within the phylogeography of the neighboring regional mtDNA variation. Accordingly, we have analyzed mtDNA control and coding region variation in 796 Indian (including both tribal and caste populations from different parts of India) and 436 Iranian mtDNAs. The results were integrated and analyzed together with published data from South, Southeast Asia and West Eurasia.
Four new Indian-specific haplogroup M sub-clades were defined. These, in combination with two previously described haplogroups, encompass approximately one third of the haplogroup M mtDNAs in India. Their phylogeography and spread among different linguistic phyla and social strata was investigated in detail. Furthermore, the analysis of the Iranian mtDNA pool revealed patterns of limited reciprocal gene flow between Iran and the Indian sub-continent and allowed the identification of different assemblies of shared mtDNA sub-clades.
Since the initial peopling of South and West Asia by anatomically modern humans, when this region may well have provided the initial settlers who colonized much of the rest of Eurasia, the gene flow in and out of India of the maternally transmitted mtDNA has been surprisingly limited. Specifically, our analysis of the mtDNA haplogroups, which are shared between Indian and Iranian populations and exhibit coalescence ages corresponding to around the early Upper Paleolithic, indicates that they are present in India largely as Indian-specific sub-lineages. In contrast, other ancient Indian-specific variants of M and R are very rare outside the sub-continent.
To provide a screening tool to reduce time and sample consumption when attempting mtDNA haplogroup typing.
A single base primer extension assay was developed to enable typing, in a single reaction, of twelve mtDNA haplogroup specific polymorphisms. For validation purposes a total of 147 samples were tested including 73 samples successfully haplogroup typed using mtDNA control region (CR) sequence data, 21 samples inconclusively haplogroup typed by CR data, 20 samples previously haplogroup typed using restriction fragment length polymorphism (RFLP) analysis, and 31 samples of known ancestral origin without previous haplogroup typing. Additionally, two highly degraded human bones embalmed and buried in the early 1950s were analyzed using the single nucleotide polymorphisms (SNP) multiplex.
When the SNP multiplex was used to type the 96 previously CR sequenced specimens, an increase in haplogroup or macrohaplogroup assignment relative to conventional CR sequence analysis was observed. The single base extension assay was also successfully used to assign a haplogroup to decades-old, embalmed skeletal remains dating to World War II.
The SNP multiplex was successfully used to obtain haplogroup status of highly degraded human bones, and demonstrated the ability to eliminate possible contributors. The SNP multiplex provides a low-cost, high throughput method for typing of mtDNA haplogroups A, B, C, D, E, F, G, H, L1/L2, L3, M, and N that could be useful for screening purposes for human identification efforts and anthropological studies.
India is a country with enormous social and cultural diversity due to its positioning on the crossroads of many historic and pre-historic human migrations. The hierarchical caste system in the Hindu society dominates the social structure of the Indian populations. The origin of the caste system in India is a matter of debate with many linguists and anthropologists suggesting that it began with the arrival of Indo-European speakers from Central Asia about 3500 years ago. Previous genetic studies based on Indian populations failed to achieve a consensus in this regard. We analysed the Y-chromosome and mitochondrial DNA of three tribal populations of southern India, compared the results with available data from the Indian subcontinent and tried to reconstruct the evolutionary history of Indian caste and tribal populations.
No significant difference was observed in the mitochondrial DNA between Indian tribal and caste populations, except for the presence of a higher frequency of west Eurasian-specific haplogroups in the higher castes, mostly in the north western part of India. On the other hand, the study of the Indian Y lineages revealed distinct distribution patterns among caste and tribal populations. The paternal lineages of Indian lower castes showed significantly closer affinity to the tribal populations than to the upper castes. The frequencies of deep-rooted Y haplogroups such as M89, M52, and M95 were higher in the lower castes and tribes, compared to the upper castes.
The present study suggests that the vast majority (>98%) of the Indian maternal gene pool, consisting of Indio-European and Dravidian speakers, is genetically more or less uniform. Invasions after the late Pleistocene settlement might have been mostly male-mediated. However, Y-SNP data provides compelling genetic evidence for a tribal origin of the lower caste populations in the subcontinent. Lower caste groups might have originated with the hierarchical divisions that arose within the tribal groups with the spread of Neolithic agriculturalists, much earlier than the arrival of Aryan speakers. The Indo-Europeans established themselves as upper castes among this already developed caste-like class structure within the tribes.
The Asian origin of Native Americans is largely accepted. However uncertainties persist regarding the source population(s) within Asia, the divergence and arrival time(s) of the founder groups, the number of expansion events, and migration routes into the New World. mtDNA data, presented over the past two decades, have been used to suggest a single-migration model for which the Beringian land mass plays an important role.
In our analysis of 568 mitochondrial genomes, the coalescent age estimates of shared roots between Native American and Siberian-Asian lineages, calculated using two different mutation rates, are A4 (27.5 ± 6.8 kya/22.7 ± 7.4 kya), C1 (21.4 ± 2.7 kya/16.4 ± 1.5 kya), C4 (21.0 ± 4.6 kya/20.0 ± 6.4 kya), and D4e1 (24.1 ± 9.0 kya/17.9 ± 10.0 kya). The coalescent age estimates of pan-American haplogroups calculated using the same two mutation rates (A2:19.5 ± 1.3 kya/16.1 ± 1.5 kya, B2:20.8 ± 2.0 kya/18.1 ± 2.4 kya, C1:21.4 ± 2.7 kya/16.4 ± 1.5 kya and D1:17.2 ± 2.0 kya/14.9 ± 2.2 kya) and estimates of population expansions within America (~21-16 kya), support the pre-Clovis occupation of the New World. The phylogeography of sublineages within American haplogroups A2, B2, D1 and the C1b, C1c andC1d subhaplogroups of C1 are complex and largely specific to geographical North, Central and South America. However some sub-branches (B2b, C1b, C1c, C1d and D1f) already existed in American founder haplogroups before expansion into the America.
Our results suggest that Native American founders diverged from their Siberian-Asian progenitors sometime during the last glacial maximum (LGM) and expanded into America soon after the LGM peak (~20-16 kya). The phylogeography of haplogroup C1 suggest that this American founder haplogroup differentiated in Siberia-Asia. The situation is less clear for haplogroup B2, however haplogroups A2 and D1 may have differentiated soon after the Native American founders divergence. A moderate population bottle neck in American founder populations just before the expansion most plausibly resulted in few founder types in America. The similar estimates of the diversity indices and Bayesian skyline analysis in North America, Central America and South America suggest almost simultaneous (~ 2.0 ky from South to North America) colonization of these geographical regions with rapid population expansion differentiating into more or less regional branches across the pan-American haplogroups.
Genetic affinities between aboriginal Taiwanese and populations from Oceania and Southeast Asia have previously been explored through analyses of mitochondrial DNA (mtDNA), Y chromosomal DNA, and human leukocyte antigen loci. Recent genetic studies have supported the “slow boat” and “entangled bank” models according to which the Polynesian migration can be seen as an expansion from Melanesia without any major direct genetic thread leading back to its initiation from Taiwan. We assessed mtDNA variation in 640 individuals from nine tribes of the central mountain ranges and east coast regions of Taiwan. In contrast to the Han populations, the tribes showed a low frequency of haplogroups D4 and G, and an absence of haplogroups A, C, Z, M9, and M10. Also, more than 85% of the maternal lineages were nested within haplogroups B4, B5a, F1a, F3b, E, and M7. Although indicating a common origin of the populations of insular Southeast Asia and Oceania, most mtDNA lineages in Taiwanese aboriginal populations are grouped separately from those found in China and the Taiwan general (Han) population, suggesting a prevalence in the Taiwanese aboriginal gene pool of its initial late Pleistocene settlers. Interestingly, from complete mtDNA sequencing information, most B4a lineages were associated with three coding region substitutions, defining a new subclade, B4a1a, that endorses the origin of Polynesian migration from Taiwan. Coalescence times of B4a1a were 13.2 ± 3.8 thousand years (or 9.3 ± 2.5 thousand years in Papuans and Polynesians). Considering the lack of a common specific Y chromosomal element shared by the Taiwanese aboriginals and Polynesians, the mtDNA evidence provided here is also consistent with the suggestion that the proto-Oceanic societies would have been mainly matrilocal.
An extensive phylogenetic analysis of mtDNA from nine Taiwanese tribes reveals an unambiguous genetic link between aboriginal Taiwanese and Polynesian populations, to the exclusion of mainland Asians.
Ancient DNA methodology was applied to analyse sequences extracted from freshly unearthed remains (teeth) of 4 individuals deeply deposited in slightly alkaline soil of the Tell Ashara (ancient Terqa) and Tell Masaikh (ancient Kar-Assurnasirpal) Syrian archaeological sites, both in the middle Euphrates valley. Dated to the period between 2.5 Kyrs BC and 0.5 Kyrs AD the studied individuals carried mtDNA haplotypes corresponding to the M4b1, M49 and/or M61 haplogroups, which are believed to have arisen in the area of the Indian subcontinent during the Upper Paleolithic and are absent in people living today in Syria. However, they are present in people inhabiting today’s Tibet, Himalayas, India and Pakistan. We anticipate that the analysed remains from Mesopotamia belonged to people with genetic affinity to the Indian subcontinent since the distribution of identified ancient haplotypes indicates solid link with populations from the region of South Asia-Tibet (Trans-Himalaya). They may have been descendants of migrants from much earlier times, spreading the clades of the macrohaplogroup M throughout Eurasia and founding regional Mesopotamian groups like that of Terqa or just merchants moving along trade routes passing near or through the region. None of the successfully identified nuclear alleles turned out to be ΔF508 CFTR, LCT-13910T or Δ32 CCR5.
Current models propose that mitochondrial DNA macrohaplogroups M and N evolved from haplogroup L3 soon after modern humans left Africa. Increasingly, however, analysis of isolated populations is filling in the details of, and in some cases challenging, aspects of this general model.
Here, we present the first comprehensive study of three such isolated populations from Madagascar: the Mikea hunter-gatherers, the neighbouring Vezo fishermen, and the Merina central highlanders (n = 266). Complete mitochondrial DNA genome sequences reveal several unresolved lineages, and a new, deep branch of the out-of-Africa founder clade M has been identified. This new haplogroup, M23, has a limited global distribution, and is restricted to Madagascar and a limited range of African and Southwest Asian groups.
The geographic distribution, phylogenetic placement and molecular age of M23 suggest that the colonization of Madagascar was more complex than previously thought.
Human settlement and migrations along sides of Bay-of-Bengal have played a vital role in shaping the genetic landscape of Bangladesh, Eastern India and Southeast Asia. Bangladesh and Northeast India form the vital land bridge between the South and Southeast Asia. To reconstruct the population history of this region and to see whether this diverse region geographically acted as a corridor or barrier for human interaction between South Asia and Southeast Asia, we, for the first time analyzed high resolution uniparental (mtDNA and Y chromosome) and biparental autosomal genetic markers among aboriginal Bangladesh tribes currently speaking Tibeto-Burman language. All the three studied populations; Chakma, Marma and Tripura from Bangladesh showed strikingly high homogeneity among themselves and strong affinities to Northeast Indian Tibeto-Burman groups. However, they show substantially higher molecular diversity than Northeast Indian populations. Unlike Austroasiatic (Munda) speakers of India, we observed equal role of both males and females in shaping the Tibeto-Burman expansion in Southern Asia. Moreover, it is noteworthy that in admixture proportion, TB populations of Bangladesh carry substantially higher mainland Indian ancestry component than Northeast Indian Tibeto-Burmans. Largely similar expansion ages of two major paternal haplogroups (O2a and O3a3c), suggested that they arose before the differentiation of any language group and approximately at the same time. Contrary to the scenario proposed for colonization of Northeast India as male founder effect that occurred within the past 4,000 years, we suggest a significantly deep colonization of this region. Overall, our extensive analysis revealed that the population history of South Asian Tibeto-Burman speakers is more complex than it was suggested before.
Central Asia and the Indian subcontinent represent an area considered as a source and a reservoir for human genetic diversity, with many markers taking root here, most of which are the ancestral state of eastern and western haplogroups, while others are local. Between these two regions, Terai (Nepal) is a pivotal passageway allowing, in different times, multiple population interactions, although because of its highly malarial environment, it was scarcely inhabited until a few decades ago, when malaria was eradicated. One of the oldest and the largest indigenous people of Terai is represented by the malaria resistant Tharus, whose gene pool could still retain traces of ancient complex interactions. Until now, however, investigations on their genetic structure have been scarce mainly identifying East Asian signatures.
High-resolution analyses of mitochondrial-DNA (including 34 complete sequences) and Y-chromosome (67 SNPs and 12 STRs) variations carried out in 173 Tharus (two groups from Central and one from Eastern Terai), and 104 Indians (Hindus from Terai and New Delhi and tribals from Andhra Pradesh) allowed the identification of three principal components: East Asian, West Eurasian and Indian, the last including both local and inter-regional sub-components, at least for the Y chromosome.
Although remarkable quantitative and qualitative differences appear among the various population groups and also between sexes within the same group, many mitochondrial-DNA and Y-chromosome lineages are shared or derived from ancient Indian haplogroups, thus revealing a deep shared ancestry between Tharus and Indians. Interestingly, the local Y-chromosome Indian component observed in the Andhra-Pradesh tribals is present in all Tharu groups, whereas the inter-regional component strongly prevails in the two Hindu samples and other Nepalese populations.
The complete sequencing of mtDNAs from unresolved haplogroups also provided informative markers that greatly improved the mtDNA phylogeny and allowed the identification of ancient relationships between Tharus and Malaysia, the Andaman Islands and Japan as well as between India and North and East Africa. Overall, this study gives a paradigmatic example of the importance of genetic isolates in revealing variants not easily detectable in the general population.
Linguistic and genetic studies on Roma populations inhabited in Europe have unequivocally traced these populations to the Indian subcontinent. However, the exact parental population group and time of the out-of-India dispersal have remained disputed. In the absence of archaeological records and with only scanty historical documentation of the Roma, comparative linguistic studies were the first to identify their Indian origin. Recently, molecular studies on the basis of disease-causing mutations and haploid DNA markers (i.e. mtDNA and Y-chromosome) supported the linguistic view. The presence of Indian-specific Y-chromosome haplogroup H1a1a-M82 and mtDNA haplogroups M5a1, M18 and M35b among Roma has corroborated that their South Asian origins and later admixture with Near Eastern and European populations. However, previous studies have left unanswered questions about the exact parental population groups in South Asia. Here we present a detailed phylogeographical study of Y-chromosomal haplogroup H1a1a-M82 in a data set of more than 10,000 global samples to discern a more precise ancestral source of European Romani populations. The phylogeographical patterns and diversity estimates indicate an early origin of this haplogroup in the Indian subcontinent and its further expansion to other regions. Tellingly, the short tandem repeat (STR) based network of H1a1a-M82 lineages displayed the closest connection of Romani haplotypes with the traditional scheduled caste and scheduled tribe population groups of northwestern India.
The application of quasi-median networks provides an effective tool to check the quality of mtDNA data. Filtering of highly recurrent mutations prior to network analysis is required to simplify the data set and reduce the complexity of the network. The phylogenetic background determines those mutations that need to be filtered. While the traditional EMPOPspeedy filter was based on the worldwide mtDNA phylogeny, haplogroup-specific filters can more effectively highlight potential errors in data of the respective (sub)-continental region. In this study we demonstrate the performance of a new, west Eurasian filter EMPOPspeedyWE for the fine-tuned examination of data sets belonging to macrohaplogroup N that constitutes the main portion of mtDNA lineages in Europe. The effects on the resulting network of different database sizes, high-quality and flawed data, as well as the examination of a phylogenetically distant data set, are presented by examples. The analyses are based on a west Eurasian etalon data set that was carefully compiled from more than 3500 control region sequences for network purposes. Both, etalon data and the new filter file, are provided through the EMPOP database (www.empop.org).
mtDNA population data; West Eurasia; Network analysis; Filter analysis; Error detection; EMPOP
In agreement with historical documentation, several genetic studies have revealed ancestral links between the European Romani and India. The entire mitochondrial DNA (mtDNA) of 27 Spanish Romani was sequenced in order to shed further light on the origins of this population. The data were analyzed together with a large published dataset (mainly hypervariable region I [HVS-I] haplotypes) of Romani (N = 1,353) and non-Romani worldwide populations (N>150,000). Analysis of mitogenomes allowed the characterization of various Romani-specific clades. M5a1b1a1 is the most distinctive European Romani haplogroup; it is present in all Romani groups at variable frequencies (with only sporadic findings in non-Romani) and represents 18% of their mtDNA pool. Its phylogeographic features indicate that M5a1b1a1 originated 1.5 thousand years ago (kya; 95% CI: 1.3–1.8) in a proto-Romani population living in Northwest India. U3 represents the most characteristic Romani haplogroup of European/Near Eastern origin (12.4%); it appears at dissimilar frequencies across the continent (Iberia: ∼31%; Eastern/Central Europe: ∼13%). All U3 mitogenomes of our Iberian Romani sample fall within a new sub-clade, U3b1c, which can be dated to 0.5 kya (95% CI: 0.3–0.7); therefore, signaling a lower bound for the founder event that followed admixture in Europe/Near East. Other minor European/Near Eastern haplogroups (e.g. H24, H88a) were also assimilated into the Romani by introgression with neighboring populations during their diaspora into Europe; yet some show a differentiation from the phylogenetically closest non-Romani counterpart. The phylogeny of Romani mitogenomes shows clear signatures of low effective population sizes and founder effects. Overall, these results are in good agreement with historical documentation, suggesting that cultural identity and relative isolation have allowed the Romani to preserve a distinctive mtDNA heritage, with some features linking them unequivocally to their ancestral Indian homeland.
The "out of Africa" model postulating single "southern route" dispersal posits arrival of "Anatomically Modern Human" to Indian subcontinent around 66–70 thousand years before present (kyBP). However the contributions and legacy of these earliest settlers in contemporary Indian populations, owing to the complex past population dynamics and later migrations has been an issue of controversy. The high frequency of mitochondrial lineage "M2" consistent with its greater age and distribution suggests that it may represent the phylogenetic signature of earliest settlers. Accordingly, we attempted to re-evaluate the impact and contribution of earliest settlers in shaping the genetic diversity and structure of contemporary Indian populations; using our newly sequenced 72 and 4 published complete mitochondrial genomes of this lineage.
The M2 lineage, harbouring two deep rooting subclades M2a and M2b encompasses approximately one tenth of the mtDNA pool of studied tribes. The phylogeographic spread and diversity indices of M2 and its subclades among the tribes of different geographic regions and linguistic phyla were investigated in detail. Further the reconstructed demographic history of M2 lineage as a surrogate of earliest settlers' component revealed that the demographic events with pronounced regional variations had played pivotal role in shaping the complex net of populations phylogenetic relationship in Indian subcontinent.
Our results suggest that tribes of southern and eastern region along with Dravidian and Austro-Asiatic speakers of central India are the modern representatives of earliest settlers of subcontinent. The Last Glacial Maximum aridity and post LGM population growth mechanised some sort of homogeneity and redistribution of earliest settlers' component in India. The demic diffusion of agriculture and associated technologies around 3 kyBP, which might have marginalized hunter-gatherer, is coincidental with the decline of earliest settlers' population during this period.
The issue of errors in genetic data sets is of growing concern, particularly in population genetics where whole genome mtDNA sequence data is coming under increased scrutiny. Multiplexed PCR reactions, combined with SNP typing, are currently under-exploited in this context, but have the potential to genotype whole populations rapidly and accurately, significantly reducing the amount of errors appearing in published data sets. To show the sensitivity of this technique for screening mtDNA genomic sequence data, 20 historic samples of the enigmatic Andaman Islanders and 12 modern samples from three Indian tribal populations (Chenchu, Lambadi and Lodha) were genotyped for 20 coding region sites after provisional haplogroup assignment with control region sequences. The genotype data from the historic samples significantly revise the topologies for the Andaman M31 and M32 mtDNA lineages by rectifying conflicts in published data sets. The new Indian data extend the distribution of the M31a lineage to South Asia, challenging previous interpretations of mtDNA phylogeography. This genetic connection between the ancestors of the Andamanese and South Asian tribal groups ∼30 kya has important implications for the debate concerning migration routes and settlement patterns of humans leaving Africa during the late Pleistocene, and indicates the need for more detailed genotyping strategies. The methodology serves as a low-cost, high-throughput model for the production and authentication of data from modern or ancient DNA, and demonstrates the value of museum collections as important records of human genetic diversity.
Sakha – an area connecting South and Northeast Siberia – is significant for understanding the history of peopling of Northeast Eurasia and the Americas. Previous studies have shown a genetic contiguity between Siberia and East Asia and the key role of South Siberia in the colonization of Siberia.
We report the results of a high-resolution phylogenetic analysis of 701 mtDNAs and 318 Y chromosomes from five native populations of Sakha (Yakuts, Evenks, Evens, Yukaghirs and Dolgans) and of the analysis of more than 500,000 autosomal SNPs of 758 individuals from 55 populations, including 40 previously unpublished samples from Siberia. Phylogenetically terminal clades of East Asian mtDNA haplogroups C and D and Y-chromosome haplogroups N1c, N1b and C3, constituting the core of the gene pool of the native populations from Sakha, connect Sakha and South Siberia. Analysis of autosomal SNP data confirms the genetic continuity between Sakha and South Siberia. Maternal lineages D5a2a2, C4a1c, C4a2, C5b1b and the Yakut-specific STR sub-clade of Y-chromosome haplogroup N1c can be linked to a migration of Yakut ancestors, while the paternal lineage C3c was most likely carried to Sakha by the expansion of the Tungusic people. MtDNA haplogroups Z1a1b and Z1a3, present in Yukaghirs, Evens and Dolgans, show traces of different and probably more ancient migration(s). Analysis of both haploid loci and autosomal SNP data revealed only minor genetic components shared between Sakha and the extreme Northeast Siberia. Although the major part of West Eurasian maternal and paternal lineages in Sakha could originate from recent admixture with East Europeans, mtDNA haplogroups H8, H20a and HV1a1a, as well as Y-chromosome haplogroup J, more probably reflect an ancient gene flow from West Eurasia through Central Asia and South Siberia.
Our high-resolution phylogenetic dissection of mtDNA and Y-chromosome haplogroups as well as analysis of autosomal SNP data suggests that Sakha was colonized by repeated expansions from South Siberia with minor gene flow from the Lower Amur/Southern Okhotsk region and/or Kamchatka. The minor West Eurasian component in Sakha attests to both recent and ongoing admixture with East Europeans and an ancient gene flow from West Eurasia.
mtDNA; Y chromosome; Autosomal SNPs; Sakha