1.  Phylogeography of mtDNA haplogroup R7 in the Indian peninsula 
Human genetic diversity observed in the Indian subcontinent is second only to that of Africa. This implies an early settlement and demographic growth soon after the first 'Out-of-Africa' dispersal of anatomically modern humans in the Late Pleistocene. In contrast to this perspective, linguistic diversity in India has been thought to derive from more recent population movements and episodes of contact. With the exception of Dravidian, whose origin and relatedness to other language phyla are obscure, all the language families in India can be linked to language families spoken in different regions of Eurasia. Mitochondrial DNA and Y chromosome evidence has supported largely local evolution of the genetic lineages of the majority of Dravidian and Indo-European speaking populations, but there is no consensus yet on the question of whether the Munda (Austro-Asiatic) speaking populations originated in India or derive from a relatively recent migration from further East.
Here, we report the analysis of 35 novel complete mtDNA sequences from India which refine the structure of Indian-specific varieties of haplogroup R. Detailed analysis of haplogroup R7, coupled with a survey of ~12,000 mtDNAs from caste and tribal groups over the entire Indian subcontinent, reveals that one of its more recently derived branches (R7a1) is particularly frequent among Munda-speaking tribal groups. This branch is nested within diverse R7 lineages found among Dravidian and Indo-European speakers of India. We have inferred from this that a subset of Munda-speaking groups have acquired R7 relatively recently. Furthermore, we find that the distribution of R7a1 within the Munda speakers is largely restricted to one of the sub-branches (Kherwari) of northern Munda languages. This evidence does not support the hypothesis that the Austro-Asiatic speakers are the primary source of the R7 variation. Statistical analyses suggest a significant correlation between genetic variation and geography, rather than between genes and languages.
Our high-resolution phylogeographic study, involving diverse linguistic groups in India, suggests that the high frequency of mtDNA haplogroup R7 among Munda-speaking populations of India can be explained best by gene flow from linguistically different populations of the Indian subcontinent. The conclusion is based on the observation that among Indo-Europeans, and particularly among Dravidians, the haplogroup is, despite its lower frequency, phylogenetically more divergent, while among the Munda speakers only one sub-clade of R7, i.e. R7a1, can be observed. It is noteworthy that though R7 is autochthonous to India, and arises from the root of hg R, its distribution and phylogeography in India are not uniform. This suggests the more ancient establishment of an autochthonous matrilineal genetic structure, and that isolation in the Pleistocene, lineage loss through drift, and endogamy of prehistoric and historic groups have greatly inhibited genetic homogenization and geographical uniformity.
PMCID: PMC2529308  PMID: 18680585
2.  Genetic Structure of Tibeto-Burman Populations of Bangladesh: Evaluating the Gene Flow along the Sides of Bay-of-Bengal 
PLoS ONE  2013;8(10):e75064.
Human settlement and migrations along the sides of the Bay of Bengal have played a vital role in shaping the genetic landscape of Bangladesh, Eastern India and Southeast Asia. Bangladesh and Northeast India form the vital land bridge between South and Southeast Asia. To reconstruct the population history of this region and to see whether this diverse region acted geographically as a corridor or a barrier for human interaction between South Asia and Southeast Asia, we analyzed, for the first time, high-resolution uniparental (mtDNA and Y chromosome) and biparental autosomal genetic markers among aboriginal Bangladeshi tribes currently speaking Tibeto-Burman languages. All three studied populations from Bangladesh (Chakma, Marma and Tripura) showed strikingly high homogeneity among themselves and strong affinities to Northeast Indian Tibeto-Burman groups. However, they show substantially higher molecular diversity than Northeast Indian populations. Unlike in the Austroasiatic (Munda) speakers of India, we observed an equal role of both males and females in shaping the Tibeto-Burman expansion in Southern Asia. Moreover, it is noteworthy that, in admixture proportion, the Tibeto-Burman (TB) populations of Bangladesh carry a substantially higher mainland Indian ancestry component than Northeast Indian Tibeto-Burmans. The largely similar expansion ages of two major paternal haplogroups (O2a and O3a3c) suggested that they arose before the differentiation of any language group and at approximately the same time. Contrary to the scenario proposed for the colonization of Northeast India as a male founder effect that occurred within the past 4,000 years, we suggest a significantly deeper colonization of this region. Overall, our extensive analysis revealed that the population history of South Asian Tibeto-Burman speakers is more complex than was suggested before.
PMCID: PMC3794028  PMID: 24130682
3.  Presence of three different paternal lineages among North Indians: A study of 560 Y chromosomes 
Annals of human biology  2009;36(1):46-59.
The genetic structure, affinities, and diversity of the 1 billion Indians hold important keys to numerous unanswered questions regarding the evolution of human populations and the forces shaping contemporary patterns of genetic variation. Although there have been several recent studies of South Indian caste groups, North Indian caste groups, and South Indian Muslims using Y-chromosomal markers, overall, the Indian population has still not been well studied compared to other geographical populations. In particular, no genetic study has been conducted on Shias and Sunnis from North India.
This study aims to investigate genetic variation and the gene pool in North Indians.
Subjects and methods
A total of 32 Y-chromosomal markers were genotyped in 560 North Indian males collected from three higher caste groups (Brahmins, Chaturvedis and Bhargavas) and two Muslim groups (Shia and Sunni).
Three distinct lineages were revealed based upon 13 haplogroups. The first was a Central Asian lineage harbouring haplogroups R1 and R2. The second lineage was of Middle-Eastern origin represented by haplogroups J2*, Shia-specific E1b1b1, and to some extent G* and L*. The third was the indigenous Indian Y-lineage represented by haplogroups H1*, F*, C* and O*. Haplogroup E1b1b1 was observed in Shias only.
The results revealed that a substantial part of today’s North Indian paternal gene pool was contributed by Central Asian lineages who are Indo-European speakers, suggesting that extant Indian caste groups are primarily the descendants of Indo-European migrants. The presence of haplogroup E in Shias, first reported in this study, suggests a genetic distinction between the two Indo Muslim sects. The findings of the present study provide insights into prehistoric and early historic patterns of migration into India and the evolution of Indian populations in recent history.
PMCID: PMC2755252  PMID: 19058044
Paternal lineages; Y-chromosomal markers; North Indians; migration
4.  Ancestral European roots of Helicobacter pylori in India 
BMC Genomics  2007;8:184.
The human gastric pathogen Helicobacter pylori has co-evolved with its host, and the origins and expansion of multiple populations and subpopulations of H. pylori therefore mirror ancient human migrations. The ancestral origins of H. pylori in the vast Indian subcontinent are debatable. It is not clear how different waves of human migration in South Asia shaped the population structure of H. pylori. We tried to address these issues by mapping the genetic origins of present-day H. pylori in India and comparing their genomes with hundreds of isolates from different geographic regions.
We attempted to dissect genetic identity of strains by multilocus sequence typing (MLST) of the 7 housekeeping genes (atpA, efp, ureI, ppa, mutY, trpC, yphC) and phylogeographic analysis of haplotypes using MEGA and NETWORK software while incorporating DNA sequences and genotyping data of whole cag pathogenicity-islands (cagPAI). The distribution of cagPAI genes within these strains was analyzed by using PCR and the geographic type of cagA phosphorylation motif EPIYA was determined by gene sequencing. All the isolates analyzed revealed European ancestry and belonged to H. pylori sub-population, hpEurope. The cagPAI harbored by Indian strains revealed European features upon PCR based analysis and whole PAI sequencing.
These observations suggest that H. pylori strains in India share ancestral origins with their European counterparts. Further, the absence of other sub-populations such as hpAfrica and hpEastAsia, at least in our collection of isolates, suggests that the hpEurope strains enjoyed a special fitness advantage in Indian stomachs to out-compete any endogenous strains. These results also might support hypotheses related to gene flow in India through Indo-Aryans and the arrival of Neolithic practices and languages from the Fertile Crescent.
PMCID: PMC1925095  PMID: 17584914
5.  Genetic affinities among the lower castes and tribal groups of India: inference from Y chromosome and mitochondrial DNA 
BMC Genetics  2006;7:42.
India is a country with enormous social and cultural diversity due to its positioning on the crossroads of many historic and pre-historic human migrations. The hierarchical caste system in the Hindu society dominates the social structure of the Indian populations. The origin of the caste system in India is a matter of debate with many linguists and anthropologists suggesting that it began with the arrival of Indo-European speakers from Central Asia about 3500 years ago. Previous genetic studies based on Indian populations failed to achieve a consensus in this regard. We analysed the Y-chromosome and mitochondrial DNA of three tribal populations of southern India, compared the results with available data from the Indian subcontinent and tried to reconstruct the evolutionary history of Indian caste and tribal populations.
No significant difference was observed in the mitochondrial DNA between Indian tribal and caste populations, except for the presence of a higher frequency of west Eurasian-specific haplogroups in the higher castes, mostly in the north western part of India. On the other hand, the study of the Indian Y lineages revealed distinct distribution patterns among caste and tribal populations. The paternal lineages of Indian lower castes showed significantly closer affinity to the tribal populations than to the upper castes. The frequencies of deep-rooted Y haplogroups such as M89, M52, and M95 were higher in the lower castes and tribes, compared to the upper castes.
The present study suggests that the vast majority (>98%) of the Indian maternal gene pool, consisting of Indo-European and Dravidian speakers, is genetically more or less uniform. Invasions after the late Pleistocene settlement might have been mostly male-mediated. However, Y-SNP data provide compelling genetic evidence for a tribal origin of the lower caste populations in the subcontinent. Lower caste groups might have originated with the hierarchical divisions that arose within the tribal groups with the spread of Neolithic agriculturalists, much earlier than the arrival of Aryan speakers. The Indo-Europeans established themselves as upper castes among this already developed caste-like class structure within the tribes.
PMCID: PMC1569435  PMID: 16893451
6.  Genetic variation in South Indian castes: evidence from Y-chromosome, mitochondrial, and autosomal polymorphisms 
BMC Genetics  2008;9:86.
Major population movements, social structure, and caste endogamy have influenced the genetic structure of Indian populations. An understanding of these influences is increasingly important as gene mapping and case-control studies are initiated in South Indian populations.
We report new data on 155 individuals from four Tamil caste populations of South India and perform comparative analyses with caste populations from the neighboring state of Andhra Pradesh. Genetic differentiation among Tamil castes is low (RST = 0.96% for 45 autosomal short tandem repeat (STR) markers), reflecting a largely common origin. Nonetheless, caste- and continent-specific patterns are evident. For 32 lineage-defining Y-chromosome SNPs, Tamil castes show higher affinity to Europeans than to eastern Asians, and genetic distance estimates to the Europeans are ordered by caste rank. For 32 lineage-defining mitochondrial SNPs and hypervariable sequence (HVS) 1, Tamil castes have higher affinity to eastern Asians than to Europeans. For 45 autosomal STRs, upper and middle rank castes show higher affinity to Europeans than do lower rank castes from either Tamil Nadu or Andhra Pradesh. Local between-caste variation (Tamil Nadu RST = 0.96%, Andhra Pradesh RST = 0.77%) exceeds the estimate of variation between these geographically separated groups (RST = 0.12%). Low, but statistically significant, correlations between caste rank distance and genetic distance are demonstrated for Tamil castes using Y-chromosome, mtDNA, and autosomal data.
Genetic data from Y-chromosome, mtDNA, and autosomal STRs are in accord with historical accounts of northwest to southeast population movements in India. The influence of ancient and historical population movements and caste social structure can be detected and replicated in South Indian caste populations from two different geographic regions.
PMCID: PMC2621241  PMID: 19077280
7.  Ancestry-Shift Refinement Mapping of the C6orf97-ESR1 Breast Cancer Susceptibility Locus 
PLoS Genetics  2010;6(7):e1001029.
We used an approach that we term ancestry-shift refinement mapping to investigate an association, originally discovered in a GWAS of a Chinese population, between rs2046210[T] and breast cancer susceptibility. The locus is on 6q25.1 in proximity to the C6orf97 and estrogen receptor α (ESR1) genes. We identified a panel of SNPs that are correlated with rs2046210 in Chinese, but not necessarily so in other ancestral populations, and genotyped them in breast cancer case∶control samples of Asian, European, and African origin, a total of 10,176 cases and 13,286 controls. We found that rs2046210[T] does not confer substantial risk of breast cancer in Europeans and Africans (OR = 1.04, P = 0.099, and OR = 0.98, P = 0.77, respectively). Rather, in those ancestries, an association signal arises from a group of less common SNPs typified by rs9397435. The rs9397435[G] allele was found to confer risk of breast cancer in European (OR = 1.15, P = 1.2×10⁻³), African (OR = 1.35, P = 0.014), and Asian (OR = 1.23, P = 2.9×10⁻⁴) population samples. Combined over all ancestries, the OR was 1.19 (P = 3.9×10⁻⁷), was without significant heterogeneity between ancestries (P_het = 0.36) and the SNP fully accounted for the association signal in each ancestry. Haplotypes bearing rs9397435[G] are well tagged by rs2046210[T] only in Asians. The rs9397435[G] allele showed associations with both estrogen receptor positive and estrogen receptor negative breast cancer. Using early-draft data from the 1,000 Genomes project, we found that the risk allele of a novel SNP (rs77275268), which is closely correlated with rs9397435, disrupts a partially methylated CpG sequence within a known CTCF binding site. These studies demonstrate that shifting the analysis among ancestral populations can provide valuable resolution in association mapping.
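The pooling step described above can be illustrated with a short sketch. It assumes an inverse-variance fixed-effect model and back-calculates each standard error from the rounded OR and two-sided P-value quoted in the abstract (a Wald-test assumption); the abstract does not state which pooling method the authors used, so the function names and the method itself are illustrative only. Because the inputs are rounded, the results should land close to, but need not equal, the published combined OR and heterogeneity P-value.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# (OR, two-sided P) for rs9397435[G], copied from the abstract.
reported = {
    "European": (1.15, 1.2e-3),
    "African": (1.35, 0.014),
    "Asian": (1.23, 2.9e-4),
}

def log_or_and_se(odds_ratio, p_value):
    """Back-calculate log(OR) and its SE from a two-sided P (Wald assumption)."""
    z = NormalDist().inv_cdf(1 - p_value / 2)
    beta = log(odds_ratio)
    return beta, abs(beta) / z

def fixed_effect(estimates):
    """Inverse-variance pooled log(OR), its SE, and Cochran's Q."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    return pooled, sqrt(1 / sum(weights)), q

estimates = [log_or_and_se(o, p) for o, p in reported.values()]
pooled_beta, pooled_se, q = fixed_effect(estimates)
pooled_or = exp(pooled_beta)

# Three ancestries give Q two degrees of freedom, for which the
# chi-square survival function reduces to exp(-Q / 2).
p_het = exp(-q / 2)

print(f"pooled OR = {pooled_or:.2f}, Q = {q:.2f}, P_het = {p_het:.2f}")
```

The pooled P-value is deliberately not printed: it is sensitive to the rounding of the inputs and to the choice of model, so it should not be expected to reproduce the published figure.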
Author Summary
In genome-wide association studies of disease susceptibility, there is no particular expectation that a genotyped SNP showing an association is itself a pathogenic variant. Rather, it is more likely that a SNP giving a signal does so because it is in linkage disequilibrium (LD) with a pathogenic variant. When the analysis is shifted to a population of another ancestry, the tagging relationship between the genotyped SNP and the pathogenic variant may be disrupted, due to differing patterns of LD between populations. Thus, it is not straightforward to determine whether a susceptibility locus identified in one ancestral population is also associated with risk in another. Moreover, the differing patterns of LD between ancestral populations can be used to gain resolution in genetic mapping. We refer to this approach as ancestry-shift refinement mapping. Here, we apply it to a breast cancer risk variant near the estrogen receptor α gene that was initially described in a Chinese population. We show that the tagging relationship between the originally described SNP rs2046210 and the pathogenic variant(s) is not maintained in Europeans and Africans. We identify a SNP, rs9397435, that is associated with breast cancer risk in populations of Asian, European, and African ancestry.
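The tagging argument above can be made concrete with the standard r² measure of linkage disequilibrium. The sketch below computes r² between two biallelic SNPs from haplotype frequencies in two populations. All frequencies are hypothetical and chosen only to show a tag SNP that tracks a second variant well in one population and poorly in another; they are not taken from the study.

```python
def r_squared(hap_freqs):
    """r^2 between two biallelic SNPs from the four haplotype frequencies.

    hap_freqs maps (allele at SNP 1, allele at SNP 2) -> frequency,
    with alleles coded 0/1 and the four frequencies summing to 1.
    """
    p_a = hap_freqs[(1, 0)] + hap_freqs[(1, 1)]  # allele 1 frequency at SNP 1
    p_b = hap_freqs[(0, 1)] + hap_freqs[(1, 1)]  # allele 1 frequency at SNP 2
    d = hap_freqs[(1, 1)] - p_a * p_b            # LD coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical haplotype frequencies (illustrative only).
population_1 = {(1, 1): 0.30, (1, 0): 0.05, (0, 1): 0.02, (0, 0): 0.63}
population_2 = {(1, 1): 0.05, (1, 0): 0.30, (0, 1): 0.03, (0, 0): 0.62}

for name, freqs in (("population 1", population_1), ("population 2", population_2)):
    print(f"{name}: r^2 = {r_squared(freqs):.3f}")
```

In this toy setting the first SNP has the same allele frequency in both populations, yet it is a useful proxy for the second SNP only in population 1, which is the situation the summary describes for rs2046210 and the pathogenic variant(s).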
PMCID: PMC2908678  PMID: 20661439
8.  Genetic affinities between endogamous and inbreeding populations of Uttar Pradesh 
BMC Genetics  2007;8:12.
India has experienced several waves of migration since the Middle Paleolithic. It is believed that the initial demic movement into India was from Africa along the southern coastal route, approximately 60,000–85,000 years before present (ybp). It has also been reported that there were two other major colonizations, which included the eastward diffusion of Neolithic farmers (Elamo-Dravidians) from the Middle East sometime between 10,000 and 7,000 ybp and a southern dispersal of Indo-Europeans from Central Asia 3,000 ybp. Mongol entry during the thirteenth century A.D. as well as some possible minor incursions from South China 50,000 to 60,000 ybp may have also contributed to cultural, linguistic and genetic diversity in India. Therefore, the genetic affinity and relationship of Indians with other world populations and also within India are often contested. In the present study, we have attempted to offer a fresh and immaculate interpretation of the genetic relationships of different North Indian populations with other Indian and world populations.
We first genotyped 20 tetra-nucleotide STR markers among 1800 north Indian samples from nine endogamous populations belonging to three different socio-cultural strata. Genetic distances (Nei's DA and Reynolds' Fst) were calculated among the nine studied populations, Caucasians and East Asians. This analysis was based upon the allelic profile of the 20 STR markers to assess the genetic similarities and differences of the north Indian populations. North Indians showed a stronger genetic relationship with the Europeans (DA 0.0341 and Fst 0.0119) than with the Asians (DA 0.1694 and Fst 0.0718). The upper caste Brahmins and Muslims were closest to Caucasians while middle caste populations were closer to Asians. Finally, three phylogenetic assessments, based on two different phylogenetic methods (NJ and ML) and PC plot analysis, were carried out using the same panel of 20 STR markers and 20 geo-ethnic populations. The three phylogenetic assessments revealed that north Indians cluster with Caucasians.
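For readers unfamiliar with Nei's DA distance used above, a minimal sketch follows. It uses the standard definition, DA = 1 − (1/L) Σ over loci Σ over alleles √(x·y), where x and y are the frequencies of the same allele in the two populations and L is the number of loci. The locus names and allele frequencies are hypothetical and serve only to show the calculation; they are not the study's data.

```python
from math import sqrt

def nei_da(pop_x, pop_y):
    """Nei's D_A between two populations.

    Each population maps locus -> {allele: frequency}; both must list
    the same loci. Alleles absent from a population count as frequency 0.
    """
    total = 0.0
    for locus in pop_x:
        alleles = set(pop_x[locus]) | set(pop_y[locus])
        total += sum(
            sqrt(pop_x[locus].get(a, 0.0) * pop_y[locus].get(a, 0.0))
            for a in alleles
        )
    return 1 - total / len(pop_x)

# Hypothetical STR allele frequencies (repeat counts as allele names).
pop_a = {"STR1": {10: 0.50, 11: 0.30, 12: 0.20}, "STR2": {7: 0.60, 8: 0.40}}
pop_b = {"STR1": {10: 0.45, 11: 0.35, 12: 0.20}, "STR2": {7: 0.55, 8: 0.45}}
pop_c = {"STR1": {10: 0.10, 11: 0.30, 13: 0.60}, "STR2": {7: 0.20, 8: 0.80}}

print(f"D_A(a, b) = {nei_da(pop_a, pop_b):.4f}")
print(f"D_A(a, c) = {nei_da(pop_a, pop_c):.4f}")
```

A smaller DA means more similar allele frequency profiles, which is the sense in which the abstract reads the lower DA to Europeans as a closer genetic relationship.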
The genetic affinities of Indians, and of different caste groups, towards Caucasians or East Asians are distributed in a cline in which geographically northern Indians and both upper caste and Muslim populations are genetically closer to the Caucasians.
PMCID: PMC1855350  PMID: 17417972
9.  Reconstructing the Population Genetic History of the Caribbean 
PLoS Genetics  2013;9(11):e1003925.
The Caribbean basin is home to some of the most complex interactions in recent history among previously diverged human populations. Here, we investigate the population genetic history of this region by characterizing patterns of genome-wide variation among 330 individuals from three of the Greater Antilles (Cuba, Puerto Rico, Hispaniola), two mainland (Honduras, Colombia), and three Native South American (Yukpa, Bari, and Warao) populations. We combine these data with a unique database of genomic variation in over 3,000 individuals from diverse European, African, and Native American populations. We use local ancestry inference and tract length distributions to test different demographic scenarios for the pre- and post-colonial history of the region. We develop a novel ancestry-specific PCA (ASPCA) method to reconstruct the sub-continental origin of Native American, European, and African haplotypes from admixed genomes. We find that the most likely source of the indigenous ancestry in Caribbean islanders is a Native South American component shared among inland Amazonian tribes, Central America, and the Yucatan peninsula, suggesting extensive gene flow across the Caribbean in pre-Columbian times. We find evidence of two pulses of African migration. The first pulse—which today is reflected by shorter, older ancestry tracts—consists of a genetic component more similar to coastal West African regions involved in early stages of the trans-Atlantic slave trade. The second pulse—reflected by longer, younger tracts—is more similar to present-day West-Central African populations, supporting historical records of later transatlantic deportation. Surprisingly, we also identify a Latino-specific European component that has significantly diverged from its parental Iberian source populations, presumably as a result of small European founder population size. 
We demonstrate that the ancestral components in admixed genomes can be traced back to distinct sub-continental source populations with far greater resolution than previously thought, even when limited pre-Columbian Caribbean haplotypes have survived.
Author Summary
Latinos are often regarded as a single heterogeneous group, whose complex variation is not fully appreciated in several social, demographic, and biomedical contexts. By making use of genomic data, we characterize ancestral components of Caribbean populations on a sub-continental level and unveil fine-scale patterns of population structure distinguishing insular from mainland Caribbean populations as well as from other Hispanic/Latino groups. We provide genetic evidence for an inland South American origin of the Native American component in island populations and for extensive pre-Columbian gene flow across the Caribbean basin. The Caribbean-derived European component shows significant differentiation from parental Iberian populations, presumably as a result of founder effects during the colonization of the New World. Based on demographic models, we reconstruct the complex population history of the Caribbean since the onset of continental admixture. We find that insular populations are best modeled as mixtures absorbing two pulses of African migrants, coinciding with the early and maximum activity stages of the transatlantic slave trade. These two pulses appear to have originated in different regions within West Africa, imprinting two distinguishable signatures on present-day Afro-Caribbean genomes and shedding light on the genetic impact of the slave trade in the Caribbean.
PMCID: PMC3828151  PMID: 24244192
10.  Reconstructing the Indian Origin and Dispersal of the European Roma: A Maternal Genetic Perspective 
PLoS ONE  2011;6(1):e15988.
Previous genetic, anthropological and linguistic studies have shown that Roma (Gypsies) constitute a founder population dispersed throughout Europe whose origins might be traced to the Indian subcontinent. Linguistic and anthropological evidence points to Indo-Aryan ethnic groups from North-western India as the ancestral parental population of Roma. Recently, a strong genetic hint supporting this theory came from a study of a private mutation causing primary congenital glaucoma. In the present study, complete mitochondrial control sequences of Iberian Roma and previously published maternal lineages of other European Roma were analyzed in order to establish the genetic affinities among Roma groups, determine the degree of admixture with neighbouring populations, infer the migration routes followed since the first arrival in Europe, and survey the origin of Roma within the Indian subcontinent. Our results show that the maternal lineage composition in the Roma groups follows a pattern of different migration routes, with several founder effects, and low effective population sizes along their dispersal. Our data allowed the confirmation of a North/West migration route shared by Polish, Lithuanian and Iberian Roma. Additionally, eleven Roma founder lineages were identified and degrees of admixture with host populations were estimated. Finally, the comparison with an extensive database of Indian sequences allowed us to identify the Punjab state, in North-western India, as the putative ancestral homeland of the European Roma, in agreement with previous linguistic and anthropological studies.
PMCID: PMC3018485  PMID: 21264345
11.  Genetic Affinities of the Central Indian Tribal Populations 
PLoS ONE  2012;7(2):e32546.
The central Indian state of Madhya Pradesh is often called the ‘heart of India’ and has always been an important region, functioning as a trinexus belt for three major language families (Indo-European, Dravidian and Austroasiatic). There are few detailed genetic studies of the populations inhabiting this region. Therefore, this study attempts an extensive characterization of the genetic ancestries of three tribal populations inhabiting this region, namely Bharia, Bhil and Sahariya, using haploid and diploid DNA markers.
Methodology/Principal Findings
Mitochondrial DNA analysis showed high diversity, including some of the older sublineages of the M haplogroup and prominent R lineages, in all three tribes. Y-chromosomal biallelic markers revealed a high frequency of the Austroasiatic-specific M95-O2a haplogroup in Bharia and Sahariya, M82-H1a in Bhil and M17-R1a in Bhil and Sahariya. The results obtained with haploid as well as diploid genetic markers revealed a strong genetic affinity of Bharia (a Dravidian-speaking tribe) with the Austroasiatic (Munda) group. The gene flow from the Austroasiatic group is further confirmed by Y-STR haplotype sharing analysis, in which we determined that their founder haplotype derives from the North Munda speaking tribe, while the autosomal analysis was largely concordant with the haploid DNA results.
Bhil exhibited largely Indo-European-specific ancestry, while Sahariya and Bharia showed an admixed genetic package of Indo-European and Austroasiatic populations. Hence, in a landscape like India, a linguistic label doesn't unequivocally follow the genetic footprints.
PMCID: PMC3290590  PMID: 22393414
12.  Development of a Panel of Genome-Wide Ancestry Informative Markers to Study Admixture Throughout the Americas 
PLoS Genetics  2012;8(3):e1002554.
Most individuals throughout the Americas are admixed descendants of Native American, European, and African ancestors. Complex historical factors have resulted in varying proportions of ancestral contributions between individuals within and among ethnic groups. We developed a panel of 446 ancestry informative markers (AIMs) optimized to estimate ancestral proportions in individuals and populations throughout Latin America. We used genome-wide data from 953 individuals from diverse African, European, and Native American populations to select AIMs optimized for each of the three main continental populations that form the basis of modern Latin American populations. We selected markers on the basis of locus-specific branch length to be informative, well distributed throughout the genome, capable of being genotyped on widely available commercial platforms, and applicable throughout the Americas by minimizing within-continent heterogeneity. We then validated the panel in samples from four admixed populations by comparing ancestry estimates based on the AIMs panel to estimates based on genome-wide association study (GWAS) data. The panel provided balanced discriminatory power among the three ancestral populations and accurate estimates of individual ancestry proportions (R² > 0.9 for ancestral components with significant between-subject variance). Finally, we genotyped samples from 18 populations from Latin America using the AIMs panel and estimated variability in ancestry within and between these populations. This panel and its reference genotype information will be useful resources to explore population history of admixture in Latin America and to correct for the potential effects of population stratification in admixed samples in the region.
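The marker-selection criterion mentioned above, locus-specific branch length (LSBL), can be sketched for three populations A, B and C as LSBL_A = (d_AB + d_AC − d_BC) / 2, where d is a pairwise genetic distance at a single locus. The sketch below assumes a simple two-population F_ST as that distance and uses hypothetical allele frequencies; the abstract does not specify the distance estimator or thresholds the authors used, so both are assumptions made for illustration.

```python
def pairwise_fst(p1, p2):
    """Simple two-population F_ST for one biallelic locus (equal weights).

    This is an illustrative estimator, not necessarily the one used
    to build the published panel.
    """
    p_bar = (p1 + p2) / 2
    if p_bar in (0.0, 1.0):  # locus fixed for the same allele in both
        return 0.0
    return (p1 - p2) ** 2 / (4 * p_bar * (1 - p_bar))

def lsbl(freqs):
    """Locus-specific branch length for each of exactly three populations."""
    (a, pa), (b, pb), (c, pc) = freqs.items()
    d_ab = pairwise_fst(pa, pb)
    d_ac = pairwise_fst(pa, pc)
    d_bc = pairwise_fst(pb, pc)
    return {
        a: (d_ab + d_ac - d_bc) / 2,
        b: (d_ab + d_bc - d_ac) / 2,
        c: (d_ac + d_bc - d_ab) / 2,
    }

# Hypothetical allele frequencies for one candidate marker.
marker = {"African": 0.90, "European": 0.10, "Native American": 0.12}
branches = lsbl(marker)
for population, length in branches.items():
    print(f"{population}: LSBL = {length:.3f}")
```

A marker like this one, with a long branch for a single population, is informative for that population's ancestry; a balanced panel would need markers with long branches for each of the three groups.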
Author Summary
Individuals from Latin America are descendants of multiple ancestral populations, primarily Native American, European, and African ancestors. The relative proportions of these ancestries can be estimated using genetic markers, known as ancestry informative markers (AIMs), whose allele frequency varies between the ancestral groups. Once determined, these ancestral proportions can be correlated with normal phenotypes, can be associated with disease, can be used to control for confounding due to population stratification, or can inform on the history of admixture in a population. In this study, we identified a panel of AIMs relevant to Latin American populations, validated the panel by comparing estimates of ancestry using the panel to ancestry determined from genome-wide data, and tested the panel in a diverse set of populations from the Americas. The panel of AIMs produces ancestry estimates that are highly accurate and appropriately controlled for population stratification, and it was used to genotype 18 populations from throughout Latin America. We have made the panel of AIMs available to any researcher interested in estimating ancestral proportions for populations from the Americas.
PMCID: PMC3297575  PMID: 22412386
13.  Molecular insight into the genesis of ranked caste populations of western India based upon polymorphisms across non-recombinant and recombinant regions in genome 
Genome Biology  2005;6(8):P10.
To trace admixture and genesis of caste populations of western India, polymorphisms were examined across non-recombining 20 Y-SNPs, 20 Y-STRs, 18 mtDNA diagnostic sites, HVS-1 plus HVS-2 regions; and recombining 15 highly polymorphic autosomal STRs in four predominant caste populations: upper-ranking Desasth-brahmin and Chitpavan-brahmin; a middle-ranking Kshtriya Maratha; and a lower-rank peasant Dhangar.
Large-scale trade and cultural contacts between coastal populations of western India and Western-Eurasians paved the way for extensive immigration and the genesis of a wide spectrum of admixed gene pools. To trace admixture and genesis of caste populations of western India, we have examined polymorphisms across non-recombining 20 Y-SNPs, 20 Y-STRs, 18 mtDNA diagnostic sites, HVS-1 plus HVS-2 regions; and recombining 15 highly polymorphic autosomal STRs in four predominant caste populations: upper-ranking Desasth-brahmin and Chitpavan-brahmin; a middle-ranking Kshtriya Maratha; and a lower-rank peasant Dhangar.
The generated genomic data were compared with putative parental populations (Central Asians, West Asians and Europeans) using AMOVA, PC plot, and admixture estimates. Overall, disparate uniparental ancestries and a 1.1% GST value for biparental markers among the four studied caste populations linked well with their chequered demographic histories. The Marathi-speaking ancient Desasth-brahmin shows substantial admixture from Central Asian males, but a Paleolithic maternal component supports their Scytho-Dravidian origin. Chitpavan-brahmin demonstrates a younger maternal component and substantial paternal gene flow from West Asia, thus giving credence to their recent Irano-Scythian ancestry from the Mediterranean or Turkey, which correlated well with the European-looking features of this caste. This also explains their untraceable ethno-history before 1000 years, brahminization event and later amalgamation by Maratha. The widespread Paleolithic mtDNA haplogroups in Maratha and Dhangar highlight their shared Proto-Asian ancestries. Maratha males harboured the Anatolian-derived J2 lineage, corroborating the blending of farming communities. Dhangar heterogeneity is ascribable to predominantly South-Asian males and West-Eurasian females.
The genomic data-sets of this study provide ample genomic evidence of diverse origins of four ranked castes and synchronization of caste stratification with asymmetrical gene flows from Indo-European migration during Upper Paleolithic, Neolithic, and later dates. However, subsequent gene flows among these castes living in geographical proximity have diminished significant genetic differentiation as indicated by AMOVA and structure.
PMCID: PMC4071276
14.  Replication of genetic loci for ages at menarche and menopause in the multi-ethnic Population Architecture using Genomics and Epidemiology (PAGE) study 
Human Reproduction (Oxford, England)  2013;28(6):1695-1706.
Do genetic associations identified in genome-wide association studies (GWAS) of age at menarche (AM) and age at natural menopause (ANM) replicate in women of diverse race/ancestry from the Population Architecture using Genomics and Epidemiology (PAGE) Study?
We replicated GWAS reproductive trait single nucleotide polymorphisms (SNPs) in our European descent population and found that many SNPs were also associated with AM and ANM in populations of diverse ancestry.
Menarche and menopause mark the reproductive lifespan in women and are important risk factors for chronic diseases including obesity, cardiovascular disease and cancer. Both events are believed to be influenced by environmental and genetic factors, and vary in populations differing by genetic ancestry and geography. Most genetic variants associated with these traits have been identified in GWAS of European-descent populations.
A total of 42 251 women of diverse ancestry from PAGE were included in cross-sectional analyses of AM and ANM.
SNPs previously associated with ANM (n = 5 SNPs) and AM (n = 3 SNPs) in GWAS were genotyped in American Indians, African Americans, Asians, European Americans, Hispanics and Native Hawaiians. To test SNP associations with ANM or AM, we used linear regression models stratified by race/ethnicity and PAGE sub-study. Results were then combined in race-specific fixed effect meta-analyses for each outcome. For replication and generalization analyses, significance was defined at P < 0.01 for ANM analyses and P < 0.017 for AM analyses.
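The significance thresholds quoted here equal 0.05 divided by the number of SNPs tested per trait (5 for ANM, 3 for AM), which is what a Bonferroni correction would give; the abstract does not name the correction, so that reading is an assumption. The Python sketch below shows that arithmetic together with a generic inverse-variance fixed-effect meta-analysis of per-stratum regression estimates. The function names and the input numbers are hypothetical and are not taken from the PAGE analysis.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis.

    betas: per-stratum effect estimates (e.g. regression coefficients)
    ses:   their standard errors, in the same order
    Returns the pooled estimate, its standard error, and a two-sided
    p-value from a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, p


# Thresholds implied by 5 ANM SNPs and 3 AM SNPs at an overall alpha of 0.05.
anm_threshold = bonferroni_threshold(0.05, 5)
am_threshold = bonferroni_threshold(0.05, 3)

# Made-up per-sub-study estimates for one SNP, for illustration only.
pooled_beta, pooled_se, pooled_p = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
```

A fixed-effect model assumes one common true effect across the pooled strata, which is why the analyses described here are run separately within each race/ethnicity group rather than across them.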
We replicated findings for AM SNPs in the LIN28B locus and an intergenic region on 9q31 in European Americans. The LIN28B SNPs (rs314277 and rs314280) were also significantly associated with AM in Asians, but not in other race/ethnicity groups. Linkage disequilibrium (LD) patterns at this locus varied widely among the ancestral groups. With the exception of an intergenic SNP at 13q34, all ANM SNPs replicated in European Americans. Three were significantly associated with ANM in other race/ethnicity populations: rs2153157 (6p24.2/SYCP2L), rs365132 (5q35/UIMC1) and rs16991615 (20p12.3/MCM8). While rs1172822 (19q13/BRSK1) was not significant in the populations of non-European descent, effect sizes showed similar trends.
Lack of association for the GWAS SNPs in the non-European American groups may be due to differences in locus LD patterns between these groups and the European-descent populations included in the GWAS discovery studies; and in some cases, lower power may also contribute to non-significant findings.
The discovery of genetic variants associated with the reproductive traits provides an important opportunity to elucidate the biological mechanisms involved with normal variation and disorders of menarche and menopause. In this study we replicated most, but not all reported SNPs in European descent populations and examined the epidemiologic architecture of these early reported variants, describing their generalizability and effect size across differing ancestral populations. Such data will be increasingly important for prioritizing GWAS SNPs for follow-up in fine-mapping and resequencing studies, as well as in translational research.
The Population Architecture Using Genomics and Epidemiology (PAGE) program is funded by the National Human Genome Research Institute (NHGRI), supported by U01HG004803 (CALiCo), U01HG004798 (EAGLE), U01HG004802 (MEC), U01HG004790 (WHI) and U01HG004801 (Coordinating Center), and their respective NHGRI ARRA supplements. The authors report no conflicts of interest.
PMCID: PMC3657124  PMID: 23508249
menopause; menarche; genome-wide association study; race/ethnicity; single nucleotide polymorphism
15.  Ancestral Components of Admixed Genomes in a Mexican Cohort 
PLoS Genetics  2011;7(12):e1002410.
For most of the world, human genome structure at a population level is shaped by interplay between ancient geographic isolation and more recent demographic shifts, factors that are captured by the concepts of biogeographic ancestry and admixture, respectively. The ancestry of non-admixed individuals can often be traced to a specific population in a precise region, but current approaches for studying admixed individuals generally yield coarse information in which genome ancestry proportions are identified according to continent of origin. Here we introduce a new analytic strategy for this problem that allows fine-grained characterization of admixed individuals with respect to both geographic and genomic coordinates. Ancestry segments from different continents, identified with a probabilistic model, are used to construct and study “virtual genomes” of admixed individuals. We apply this approach to a cohort of 492 parent–offspring trios from Mexico City. The relative contributions from the three continental-level ancestral populations—Africa, Europe, and America—vary substantially between individuals, and the distribution of haplotype block length suggests an admixing time of 10–15 generations. The European and Indigenous American virtual genomes of each Mexican individual can be traced to precise regions within each continent, and they reveal a gradient of Amerindian ancestry between indigenous people of southwestern Mexico and Mayans of the Yucatan Peninsula. This contrasts sharply with the African roots of African Americans, which have been characterized by a uniform mixing of multiple West African populations. We also use the virtual European and Indigenous American genomes to search for the signatures of selection in the ancestral populations, and we identify previously known targets of selection in other populations, as well as new candidate loci. 
The ability to infer precise ancestral components of admixed genomes will facilitate studies of disease-related phenotypes and will allow new insight into the adaptive and demographic history of indigenous people.
Author Summary
Admixed individuals, such as African Americans and Latinos, arise from mating between individuals from different continents. Detailed knowledge about the ancestral origin of an admixed population not only provides insight regarding the history of the population itself, but also affords opportunities to study the evolutionary biology of the ancestral populations. Applying novel statistical methods, we analyzed the high-density genotype data of nearly 1,500 Mexican individuals from Mexico City, who are admixed among Indigenous Americans, Europeans, and Africans. The relative contributions from the three continental-level ancestral populations vary substantially between individuals. The European ancestors of these Mexican individuals genetically resemble Southern Europeans, such as the Spaniard and the Portuguese. The Indigenous American ancestry of the Mexicans in our study is largely attributed to the indigenous groups residing in the southwestern region of Mexico, although some individuals have inherited varying degrees of ancestry from the Mayans of the Yucatan Peninsula and other indigenous American populations. A search for signatures of selection, focusing on the parts of the genomes derived from an ancestral population (e.g. Indigenous American), identifies regions in which a genetic variant may have been favored by natural selection in that ancestral population.
PMCID: PMC3240599  PMID: 22194699
16.  Mitochondrial and Y-chromosome diversity of the Tharus (Nepal): a reservoir of genetic variation 
Central Asia and the Indian subcontinent represent an area considered as a source and a reservoir for human genetic diversity, with many markers taking root here, most of which are the ancestral state of eastern and western haplogroups, while others are local. Between these two regions, Terai (Nepal) is a pivotal passageway allowing, in different times, multiple population interactions, although because of its highly malarial environment, it was scarcely inhabited until a few decades ago, when malaria was eradicated. One of the oldest and the largest indigenous people of Terai is represented by the malaria resistant Tharus, whose gene pool could still retain traces of ancient complex interactions. Until now, however, investigations on their genetic structure have been scarce, mainly identifying East Asian signatures.
High-resolution analyses of mitochondrial-DNA (including 34 complete sequences) and Y-chromosome (67 SNPs and 12 STRs) variations carried out in 173 Tharus (two groups from Central and one from Eastern Terai), and 104 Indians (Hindus from Terai and New Delhi and tribals from Andhra Pradesh) allowed the identification of three principal components: East Asian, West Eurasian and Indian, the last including both local and inter-regional sub-components, at least for the Y chromosome.
Although remarkable quantitative and qualitative differences appear among the various population groups and also between sexes within the same group, many mitochondrial-DNA and Y-chromosome lineages are shared or derived from ancient Indian haplogroups, thus revealing a deep shared ancestry between Tharus and Indians. Interestingly, the local Y-chromosome Indian component observed in the Andhra-Pradesh tribals is present in all Tharu groups, whereas the inter-regional component strongly prevails in the two Hindu samples and other Nepalese populations.
The complete sequencing of mtDNAs from unresolved haplogroups also provided informative markers that greatly improved the mtDNA phylogeny and allowed the identification of ancient relationships between Tharus and Malaysia, the Andaman Islands and Japan as well as between India and North and East Africa. Overall, this study gives a paradigmatic example of the importance of genetic isolates in revealing variants not easily detectable in the general population.
PMCID: PMC2720951  PMID: 19573232
17.  Reconstructing Austronesian population history in Island Southeast Asia 
Nature Communications  2014;5:4689.
Austronesian languages are spread across half the globe, from Easter Island to Madagascar. Evidence from linguistics and archaeology indicates that the ‘Austronesian expansion,’ which began 4,000–5,000 years ago, likely had roots in Taiwan, but the ancestry of present-day Austronesian-speaking populations remains controversial. Here, we analyse genome-wide data from 56 populations using new methods for tracing ancestral gene flow, focusing primarily on Island Southeast Asia. We show that all sampled Austronesian groups harbour ancestry that is more closely related to aboriginal Taiwanese than to any present-day mainland population. Surprisingly, western Island Southeast Asian populations have also inherited ancestry from a source nested within the variation of present-day populations speaking Austro-Asiatic languages, which have historically been nearly exclusive to the mainland. Thus, either there was once a substantial Austro-Asiatic presence in Island Southeast Asia, or Austronesian speakers migrated to and through the mainland, admixing there before continuing to western Indonesia.
Populations speaking Austronesian languages are numerous and widespread, but their history remains controversial. Here, the authors analyse genetic data from Southeast Asia and show that all populations harbour ancestry most closely related to aboriginal Taiwanese, while some also contain a component closest to Austro-Asiatic speakers.
PMCID: PMC4143916  PMID: 25137359
18.  Reconstructing Roma History from Genome-Wide Data 
PLoS ONE  2013;8(3):e58633.
The Roma people, living throughout Europe and West Asia, are a diverse population linked by the Romani language and culture. Previous linguistic and genetic studies have suggested that the Roma migrated into Europe from South Asia about 1,000–1,500 years ago. Genetic inferences about Roma history have mostly focused on the Y chromosome and mitochondrial DNA. To explore what additional information can be learned from genome-wide data, we analyzed data from six Roma groups that we genotyped at hundreds of thousands of single nucleotide polymorphisms (SNPs). We estimate that the Roma harbor about 80% West Eurasian ancestry–derived from a combination of European and South Asian sources–and that the date of admixture of South Asian and European ancestry was about 850 years before present. We provide evidence for Eastern Europe being a major source of European ancestry, and North-west India being a major source of the South Asian ancestry in the Roma. By computing allele sharing as a measure of linkage disequilibrium, we estimate that the migration of Roma out of the Indian subcontinent was accompanied by a severe founder event, which appears to have been followed by a major demographic expansion after the arrival in Europe.
PMCID: PMC3596272  PMID: 23516520
19.  In situ origin of deep rooting lineages of mitochondrial Macrohaplogroup 'M' in India 
BMC Genomics  2006;7:151.
Macrohaplogroups 'M' and 'N' have evolved almost in parallel from a founder haplogroup L3. Macrohaplogroup N in India has already been defined in previous studies and recently the macrohaplogroup M among the Indian populations has been characterized. In this study, we attempted to reconstruct and re-evaluate the phylogeny of Macrohaplogroup M, which harbors more than 60% of the Indian mtDNA lineage, and to shed light on the origin of its deep rooting haplogroups.
Using 11 whole mtDNA and 2231 partial coding sequences of Indian M lineage selected from 8670 HVS1 sequences across India, we have reconstructed the tree including the Andamanese-specific lineage M31 and calculated the time depth of all the nodes. We defined one novel haplogroup M41, and revised the classification of haplogroups M3, M18, and M31.
Our results indicate that the Indian mtDNA pool consists of several deep rooting lineages of macrohaplogroup 'M', suggesting in-situ origin of these haplogroups in South Asia, most likely in India. These deep rooting lineages are not language specific and are spread over all the language groups in India. Moreover, our reanalysis of the Andamanese-specific lineage M31 suggests two clear-cut population-specific subclades (M31a1 and M31a2). Onge and Jarwa share the M31a1 branch while the M31a2 clade is present only in Great Andamanese individuals. Overall our study supported the one wave, rapid dispersal theory of modern humans along the Asian coast.
PMCID: PMC1534032  PMID: 16776823
20.  Low Levels of Genetic Divergence across Geographically and Linguistically Diverse Populations from India 
PLoS Genetics  2006;2(12):e215.
Ongoing modernization in India has elevated the prevalence of many complex genetic diseases associated with a western lifestyle and diet to near-epidemic proportions. However, although India comprises more than one sixth of the world's human population, it has largely been omitted from genomic surveys that provide the backdrop for association studies of genetic disease. Here, by genotyping India-born individuals sampled in the United States, we carry out an extensive study of Indian genetic variation. We analyze 1,200 genome-wide polymorphisms in 432 individuals from 15 Indian populations. We find that populations from India, and populations from South Asia more generally, constitute one of the major human subgroups with increased similarity of genetic ancestry. However, only a relatively small amount of genetic differentiation exists among the Indian populations. Although caution is warranted due to the fact that United States–sampled Indian populations do not represent a random sample from India, these results suggest that the frequencies of many genetic variants are distinctive in India compared to other parts of the world and that the effects of population heterogeneity on the production of false positives in association studies may be smaller in Indians (and particularly in Indian-Americans) than might be expected for such a geographically and linguistically diverse subset of the human population.
Genomic studies of human genetic variation are useful for investigating human evolutionary history, as well as for designing strategies for identifying disease-related genes. Despite its large population and its increasing complex genetic disease burden as a result of modernization, India has been excluded from most of the largest genomic surveys.
The authors performed an extensive investigation of Indian genetic diversity and population relationships, sampling 15 groups of India-born immigrants to the United States and genotyping each individual at 1,200 genetic markers genome-wide. Populations from India, and groups from South Asia more generally, form a genetic cluster, so that individuals placed within this cluster are more genetically similar to each other than to individuals outside the cluster. However, the amount of genetic differentiation among Indian populations is relatively small. The authors conclude that genetic variation in India is distinctive with respect to the rest of the world, but that the level of genetic divergence is smaller in Indians than might be expected for such a geographically and linguistically diverse group.
PMCID: PMC1713257  PMID: 17194221
21.  Ancient origin and evolution of the Indian wolf: evidence from mitochondrial DNA typing of wolves from Trans-Himalayan region and Pennisular India 
Genome Biology  2003;4(6):P6.
A study of mitochondrial DNA diversity across three different taxonomically informative domains (cytochrome-B gene, 16S rDNA and hypervariable d-loop control region) revealed that the Himalayan wolf and the Indian Gray wolf are genetically distinct from each other as well as from all other wolves of the world.
The two wolf types found in India are represented by two isolated populations and believed to be two sub-species of Canis lupus. One of these wolves, locally called Himalayan wolf (HW) or Tibetan wolf, is found only in the upper Trans-Himalayan region from Himachal Pradesh to Leh in Kashmir and is considered to be C. lupus chanco. The other relatively larger population is of Indian Gray wolf (GW) that is found in the peninsular India and considered to be C. lupus pallipes. Both these wolves are accorded endangered species status under the Indian Wildlife Protection Act. In 1998 for the first time in India, we initiated molecular characterization studies to understand their genetic structure and taxonomic status. Since then, we have analyzed the genetic variability in 18 of the total of 21 HW samples available in Zoological parks along with representative samples of GW, wild dogs and jackals. Our study of mitochondrial DNA diversity across three different taxonomically informative domains i.e., cytochrome-B gene, 16S rDNA and hypervariable d-loop control region revealed HW to be genetically distinct from the GW as well as from all other wolves of the world, including C. lupus chanco from China. Most importantly, d-loop haplotypic diversity revealed both HW and GW from India to be significantly diverse from other wolf populations globally and showed that these represent the most ancient lineages among them. Phylogenetic analysis revealed the Indian wolves as two independent lineages in a clade distinct and basal to the clade of all wolves from outside of India. Conservative estimate of evolutionary time-span suggests more than one million years of separation and independent evolution of HW and GW. We hypothesize that Indian wolves represent a post-jackal pre-wolf ancestral radiation that migrated to India about 1-2 mya and underwent independent evolution without contamination from other wolf-like canids.
The study thus suggests that the Indian subcontinent has been one major center of origin and diversification of the wolf and related canids. Further, the significant degree of genetic differentiation of HW from GW and of these two from other wolves suggests the interesting possibility that they are new wolf species/subspecies in evolution that may need to be described possibly as C. himalayaensis and C. indica (or as C. lupus himalyaensis and C. lupus indica), respectively. Thus for the first time, the study reveals new ancient wolf lineages in India and also highlights the need to revisit the origin, evolution and dispersion of wolf populations in Asia and elsewhere. Simultaneously, it increases the conservation importance of Indian wolves warranting urgent measures for their effective protection and management, especially of the small HW population that at present is not even recognized in the canid Red List.
PMCID: PMC4071266
22.  Influence of language and ancestry on genetic structure of contiguous populations: A microsatellite based study on populations of Orissa 
BMC Genetics  2005;6:4.
We have examined genetic diversity at fifteen autosomal microsatellite loci in seven predominant populations of Orissa to decipher whether populations inhabiting the same geographic region can be differentiated on the basis of language or ancestry. The studied populations have diverse historical accounts of their origin, belong to two major ethnic groups and different linguistic families. Caucasoid caste populations are speakers of Indo-European language and comprise Brahmins, Khandayat, Karan and Gope, while the three Australoid tribal populations include two Austric speakers: Juang and Saora and a Dravidian speaking population, Paroja. These divergent groups provide a varied substratum for understanding variation of genetic patterns in a geographical area resulting from differential admixture between migrant groups and aboriginals, and the influence of this admixture on population stratification.
The allele distribution pattern showed uniformity in the studied groups with approximately 81% genetic variability within populations. The coefficient of gene differentiation was found to be significantly higher in tribes (0.014) than caste groups (0.004). Genetic variance between the groups was 0.34% in both ethnic and linguistic clusters and statistically significant only in the ethnic apportionment. Although the populations were genetically close (FST = 0.010), the contemporary caste and tribal groups formed distinct clusters in both Principal-Component plot and Neighbor-Joining tree. In the phylogenetic tree, the Orissa Brahmins showed close affinity to populations of North India, while Khandayat and Gope clustered with the tribal groups, suggesting a possibility of their origin from indigenous people.
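The coefficient of gene differentiation reported here is commonly computed as Nei's G_ST = (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within populations and H_T is the expected heterozygosity of the pooled allele frequencies. The abstract does not state which estimator or sample-size correction was used, so the Python sketch below is only a minimal single-locus illustration that weights populations equally; the function names and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one locus."""
    return 1.0 - sum(p * p for p in freqs)


def gst(pop_freqs):
    """Nei's G_ST for one locus.

    pop_freqs: one allele-frequency list per population, with alleles
    in the same order in every list. Populations are weighted equally
    and no sample-size correction is applied (a simplification).
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # Mean within-population heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    # Heterozygosity of the average allele frequencies.
    mean_freqs = [sum(f[i] for f in pop_freqs) / n_pops for i in range(n_alleles)]
    h_t = expected_heterozygosity(mean_freqs)
    if h_t == 0.0:
        return 0.0  # monomorphic locus: no variation to apportion
    return (h_t - h_s) / h_t


# Made-up biallelic frequencies in two populations, for illustration only.
example_gst = gst([[0.6, 0.4], [0.4, 0.6]])
```

Values near zero, such as the 0.004 to 0.014 range reported for these groups, mean that almost all of the diversity is found within populations rather than between them. Multi-locus studies usually average H_S and H_T over loci before taking the ratio.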
The extent of genetic differentiation in the contemporary caste and tribal groups of Orissa is highly significant and constitutes two distinct genetic clusters. Based on our observations, we suggest that since genetic distances and coefficient of gene differentiation were fairly small, the studied populations are indeed genetically similar and that the genetic structure of populations in a geographical region is primarily influenced by their ancestry and not by socio-cultural hierarchy or language. The scenario of genetic structure, however, might be different for other regions of the subcontinent where populations have more similar ethnic and linguistic backgrounds and there might be variations in the patterns of genomic and socio-cultural affinities in different geographical regions.
PMCID: PMC549189  PMID: 15694006
23.  The Phylogeography of Y-Chromosome Haplogroup H1a1a-M82 Reveals the Likely Indian Origin of the European Romani Populations 
PLoS ONE  2012;7(11):e48477.
Linguistic and genetic studies on Roma populations inhabiting Europe have unequivocally traced these populations to the Indian subcontinent. However, the exact parental population group and time of the out-of-India dispersal have remained disputed. In the absence of archaeological records and with only scanty historical documentation of the Roma, comparative linguistic studies were the first to identify their Indian origin. Recently, molecular studies on the basis of disease-causing mutations and haploid DNA markers (i.e. mtDNA and Y-chromosome) supported the linguistic view. The presence of Indian-specific Y-chromosome haplogroup H1a1a-M82 and mtDNA haplogroups M5a1, M18 and M35b among Roma has corroborated their South Asian origins and later admixture with Near Eastern and European populations. However, previous studies have left unanswered questions about the exact parental population groups in South Asia. Here we present a detailed phylogeographical study of Y-chromosomal haplogroup H1a1a-M82 in a data set of more than 10,000 global samples to discern a more precise ancestral source of European Romani populations. The phylogeographical patterns and diversity estimates indicate an early origin of this haplogroup in the Indian subcontinent and its further expansion to other regions. Tellingly, the short tandem repeat (STR) based network of H1a1a-M82 lineages displayed the closest connection of Romani haplotypes with the traditional scheduled caste and scheduled tribe population groups of northwestern India.
PMCID: PMC3509117  PMID: 23209554
24.  Skin Color Variation in Orang Asli Tribes of Peninsular Malaysia 
PLoS ONE  2012;7(8):e42752.
Pigmentation is a readily scorable and quantitative human phenotype, making it an excellent model for studying multifactorial traits and diseases. Convergent human evolution from the ancestral state, darker skin, towards lighter skin colors involved divergent genetic mechanisms in people of European vs. East Asian ancestry. It is striking that the European mechanisms result in a 10–20-fold increase in skin cancer susceptibility while the East Asian mechanisms do not. Towards the mapping of genes that contribute to East Asian pigmentation there is need for one or more populations that are admixed for ancestral and East Asian ancestry, but with minimal European contribution. This requirement is fulfilled by the Senoi, one of three indigenous tribes of Peninsular Malaysia collectively known as the Orang Asli. The Senoi are thought to be an admixture of the Negrito, an ancestral dark-skinned population representing the second of three Orang Asli tribes, and regional Mongoloid populations of Indo-China such as the Proto-Malay, the third Orang Asli tribe. We have calculated skin reflectance-based melanin indices in 492 Orang Asli, which ranged from 28 (lightest) to 75 (darkest); both extremes were represented in the Senoi. Population averages were 56 for Negrito, 42 for Proto-Malay, and 46 for Senoi. The derived allele frequencies for SLC24A5 and SLC45A2 in the Senoi were 0.04 and 0.02, respectively, consistent with greater South Asian than European admixture. Females and individuals with the A111T mutation had significantly lighter skin (p = 0.001 and 0.0039, respectively). Individuals with these derived alleles were found across the spectrum of skin color, indicating an overriding effect of strong skin lightening alleles of East Asian origin. These results suggest that the Senoi are suitable for mapping East Asian skin color genes.
PMCID: PMC3418284  PMID: 22912732
25.  Reconstructing Native American Migrations from Whole-Genome and Whole-Exome Data 
PLoS Genetics  2013;9(12):e1004023.
There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is in MXL, in CLM, and in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the Southern America ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas thousand years ago (kya), supports that the MXL Ancestors split kya, with a subsequent split of the ancestors to CLM and PUR kya. The model also features effective populations of in Mexico, in Colombia, and in Puerto Rico. Modeling Identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earlier migration from Europe. 
Finally, we compare IBD and ancestry assignments to find evidence for relatedness among European founders to the three populations.
Author Summary
Populations of the Americas have a rich and heterogeneous genetic and cultural heritage that draws from a diversity of pre-Columbian Native American, European, and African populations. Characterizing this diversity facilitates the development of medical genetics research in diverse populations and the transfer of medical knowledge across populations. It also represents an opportunity to better understand the peopling of the Americas, from the crossing of Beringia to the post-Columbian era. Here, we take advantage of the sequencing of individuals of Colombian (CLM), Mexican (MXL), and Puerto Rican (PUR) origin by the 1000 Genomes project to improve our demographic models for the peopling of the Americas. The divergence among African, European, and Native American ancestors to these populations enables us to infer the continent of origin at each locus in the sampled genomes. The resulting patterns of ancestry suggest complex post-Columbian migration histories, starting later in CLM than in MXL and PUR. Whereas European ancestral segments show evidence of relatedness, a demographic model of synonymous variation suggests that the Native American Ancestors to MXL, PUR, and CLM panels split within a few hundred years over 12 thousand years ago. Together with early archeological sites in South America, these results support rapid divergence during the initial peopling of the Americas.
PMCID: PMC3873240  PMID: 24385924

