Rates of uterine leiomyomata (UL) are 2–3 times higher in African Americans than in European Americans. It is unclear whether inherited factors explain the ethnic disparity. To investigate the presence of risk alleles for UL that are highly differentiated in frequency between African Americans and European Americans, the authors conducted an admixture-based genome-wide scan of 2,453 UL cases confirmed by ultrasound or surgery in the Black Women's Health Study (1997–2009), a national prospective cohort study. Controls (n = 2,102) were women who did not report a UL diagnosis through 2009. Mean percentage of European ancestry was significantly lower among cases (20.00%) than among controls (21.63%; age-adjusted mean difference = −1.76%, 95% confidence interval: −2.40, −1.12; P < 0.0001), and the association was stronger in younger cases. Admixture analyses showed suggestive evidence of association at chromosomes 2, 4, and 10. The authors also genotyped a dense set of tag single nucleotide polymorphisms at different loci associated with UL in Japanese women but failed to replicate the associations. This suggests that genetic variation for UL differs in populations with and without African ancestry. The admixture findings further indicate that no single highly differentiated locus is responsible for the ethnic disparity in UL, raising the possibility that multiple variants jointly contribute to the higher incidence of UL in African Americans.
African Americans; African continental ancestry group; European continental ancestry group; female; genetics; leiomyoma; prospective studies; uterine neoplasms
Tens of millions of base pairs of euchromatic human genome sequence, including many protein-coding genes, have no known location in the human genome. We describe an approach for localizing the human genome's missing pieces by utilizing the patterns of genome sequence variation created by population admixture. We mapped the locations of 70 scaffolds spanning four million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified eight large novel inter-chromosomal segmental duplications. We find that most of these sequences are hidden in the genome's heterochromatin, particularly its pericentromeric regions. Many cryptic, pericentromeric genes are expressed in RNA and have been maintained intact for millions of years while their expression patterns diverged from those of paralogous genes elsewhere in the genome. We describe how knowledge of the locations of these sequences can inform disease association and genome biology studies.
To determine whether shared epitope (SE)–containing HLA–DRB1 alleles are associated with rheumatoid arthritis (RA) in African Americans and whether their presence is associated with higher degrees of global (genome-wide) genetic admixture from the European population.
In this multicenter cohort study, African Americans with early RA and matched control subjects were analyzed. In addition to measurement of serum anti–cyclic citrullinated peptide (anti-CCP) antibodies and HLA–DRB1 genotyping, a panel of >1,200 ancestry-informative markers was analyzed in patients with RA and control subjects, to estimate the proportion of European ancestry.
The frequency of SE-containing HLA–DRB1 alleles was 25.2% in African American patients with RA versus 13.6% in control subjects (P = 0.00005). Of 321 patients with RA, 42.1% had at least 1 SE-containing allele, compared with 25.3% of 166 control subjects (P = 0.0004). The mean estimated percent European ancestry was associated with SE-containing HLA–DRB1 alleles in African Americans, regardless of disease status (RA or control). As reported in RA patients of European ancestry, there was a significant association of the SE with the presence of the anti-CCP antibody: 86 (48.9%) of 176 patients with anti-CCP antibody–positive RA had at least 1 SE allele, compared with 36 (32.7%) of 110 patients with anti-CCP antibody–negative RA (P = 0.01, by chi-square test).
HLA–DRB1 alleles containing the SE are strongly associated with susceptibility to RA in African Americans. The absolute contribution is less than that reported in RA among populations of European ancestry, in which ~50–70% of patients have at least 1 SE allele. As in Europeans with RA, the SE association was strongest in the subset of African American patients with anti-CCP antibodies. The finding of a higher degree of European ancestry among African Americans with SE alleles suggests that a genetic risk factor for RA was introduced into the African American population through admixture, thus making these individuals more susceptible to subsequent environmental or unknown factors that trigger the disease.
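The chi-square comparison quoted in the results (86 of 176 anti-CCP-positive patients versus 36 of 110 anti-CCP-negative patients carrying an SE allele) can be checked with a short calculation. The sketch below is illustrative only: it assumes a Pearson chi-square test without continuity correction, which the abstract does not specify, and the function name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Counts from the text: SE carriers and non-carriers in each anti-CCP group.
stat, p_value = chi_square_2x2(86, 176 - 86, 36, 110 - 36)
print(f"chi-square = {stat:.2f}, P = {p_value:.3f}")
```

Under these assumptions the P value falls just below 0.01, in line with the rounded value reported.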
The recent explosion in available genetic data has led to significant advances in understanding the demographic histories of and relationships among human populations. It is still a challenge, however, to infer reliable parameter values for complicated models involving many populations. Here, we present MixMapper, an efficient, interactive method for constructing phylogenetic trees including admixture events using single nucleotide polymorphism (SNP) genotype data. MixMapper implements a novel two-phase approach to admixture inference using moment statistics, first building an unadmixed scaffold tree and then adding admixed populations by solving systems of equations that express allele frequency divergences in terms of mixture parameters. Importantly, all features of the model, including topology, sources of gene flow, branch lengths, and mixture proportions, are optimized automatically from the data and include estimates of statistical uncertainty. MixMapper also uses a new method to express branch lengths in easily interpretable drift units. We apply MixMapper to recently published data for Human Genome Diversity Cell Line Panel individuals genotyped on a SNP array designed especially for use in population genetics studies, obtaining confident results for 30 populations, 20 of them admixed. Notably, we confirm a signal of ancient admixture in European populations—including previously undetected admixture in Sardinians and Basques—involving a proportion of 20–40% ancient northern Eurasian ancestry.
admixture; human populations; genetic drift; moment statistics
We present a DNA library preparation method that has allowed us to reconstruct a high coverage (30X) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of “missing evolution” in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans.
The peopling of the Americas has been the subject of extensive genetic, archaeological and linguistic research; however, central questions remain unresolved [1–5]. One contentious issue is whether the settlement occurred via a single [6–8] or multiple streams of migration from Siberia [9–15]. The pattern of dispersals within the Americas is also poorly understood. To address these questions at higher resolution than was previously possible, we assembled data from 52 Native American and 17 Siberian groups genotyped at 364,470 single nucleotide polymorphisms. We show that Native Americans descend from at least three streams of Asian gene flow. Most descend entirely from a single ancestral population that we call “First American”. However, speakers of Eskimo-Aleut languages from the Arctic inherit almost half their ancestry from a second stream of Asian gene flow, and the Na-Dene-speaking Chipewyan from Canada inherit roughly one-tenth of their ancestry from a third stream. We show that the initial peopling followed a southward expansion facilitated by the coast, with sequential population splits and little gene flow after divergence, especially in South America. A major exception is in Chibchan-speakers on both sides of the Panama Isthmus, who have ancestry from both North and South America.
Mutations are the raw material of evolution, but have been difficult to study directly. We report the largest study of new mutations to date: 2,058 germline changes discovered by analyzing 85,289 Icelanders at 2,477 microsatellites. The paternal-to-maternal mutation rate ratio is 3.3, and the rate in fathers doubles from age 20 to 58 whereas there is no association with age in mothers. Longer microsatellite alleles are more mutagenic and tend to decrease in length, whereas the opposite is seen for shorter alleles. We use these empirical observations to build a model that we apply to individuals for whom we have both genome sequence and microsatellite data, allowing us to estimate key parameters of evolution without calibration to the fossil record. We infer that the sequence mutation rate is 1.4–2.3×10⁻⁸ per base pair per generation (90% credible interval), and that human-chimpanzee speciation occurred 3.7–6.6 million years ago.
Long-range migrations and the resulting admixtures between populations have been important forces shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture events that previous formal tests cannot. We further show that we can uncover phylogenetic relationships among populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the calculations. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese.
admixture; linkage disequilibrium
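The idea of dating admixture from the exponential decay of admixture-induced LD can be illustrated with a toy fit. The sketch below assumes the simplest single-pulse form, LD(d) = A·exp(−n·d), with d in Morgans and n the number of generations since admixture. It is not ALDER's fitting procedure: it ignores noise and any constant offset term, and the function name and synthetic curve are hypothetical.

```python
import math

def fit_admixture_date(distances_morgans, ld_values):
    """Fit ld = A * exp(-n * d) by least squares on log(ld); return (n, A)."""
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]  # requires strictly positive LD values
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The decay rate is minus the slope; the amplitude is exp(intercept).
    return -slope, math.exp(intercept)

# Synthetic, noise-free curve: 40 generations since admixture, amplitude 0.01.
true_generations, true_amplitude = 40.0, 0.01
distances = [0.005 * k for k in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
curve = [true_amplitude * math.exp(-true_generations * d) for d in distances]

generations, amplitude = fit_admixture_date(distances, curve)
print(f"estimated date: {generations:.1f} generations, amplitude: {amplitude:.4f}")
```

With real data the curve is noisy and can dip below zero at long distances, so a nonlinear fit with an offset term would be the more natural choice; the log-linear fit is used here only to keep the example dependency-free.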
The Roma people, living throughout Europe and West Asia, are a diverse population linked by the Romani language and culture. Previous linguistic and genetic studies have suggested that the Roma migrated into Europe from South Asia about 1,000–1,500 years ago. Genetic inferences about Roma history have mostly focused on the Y chromosome and mitochondrial DNA. To explore what additional information can be learned from genome-wide data, we analyzed data from six Roma groups that we genotyped at hundreds of thousands of single nucleotide polymorphisms (SNPs). We estimate that the Roma harbor about 80% West Eurasian ancestry, derived from a combination of European and South Asian sources, and that the date of admixture of South Asian and European ancestry was about 850 years before present. We provide evidence for Eastern Europe being a major source of European ancestry, and North-west India being a major source of the South Asian ancestry in the Roma. By computing allele sharing as a measure of linkage disequilibrium, we estimate that the migration of Roma out of the Indian subcontinent was accompanied by a severe founder event, which appears to have been followed by a major demographic expansion after the arrival in Europe.
Large data sets on human genetic variation have been collected recently, but their usefulness for learning about history and natural selection has been limited by biases in the ways polymorphisms were chosen. We report large subsets of SNPs from the International HapMap Project [1,2] that allow us to overcome these biases and to provide accurate measurement of a quantity of crucial importance for understanding genetic variation: the allele frequency spectrum. Our analysis shows that East Asian and northern European ancestors shared the same population bottleneck expanding out of Africa but that both also experienced more recent genetic drift, which was greater in East Asians.
Strong signatures of positive selection at newly arising genetic variants are well-documented in humans [1–8], but this form of selection may not be widespread in recent human evolution [9]. Because many human traits are highly polygenic and partly determined by common, ancient genetic variation, an alternative model for rapid genetic adaptation has been proposed: weak selection acting on many pre-existing (standing) genetic variants, or polygenic adaptation [10–12]. By studying height, a classic polygenic trait, we demonstrate the first human signature of widespread selection on standing variation. We show that frequencies of alleles associated with increased height, both at known loci and genome-wide, are systematically elevated in Northern Europeans compared with Southern Europeans (p<4.3×10⁻⁴). This pattern mirrors intra-European height differences and is not confounded by ancestry or other ascertainment biases. The systematic frequency differences are consistent with the presence of widespread weak selection (selection coefficients ~10⁻³–10⁻⁵ per allele) rather than genetic drift alone (p<10⁻¹⁵).
Human Genomics; Population Genetics; Europeans; Height; Selection
Hair relaxers are used by millions of black women, possibly exposing them to various chemicals through scalp lesions and burns. In the Black Women’s Health Study, the authors assessed hair relaxer use in relation to uterine leiomyomata incidence. In 1997, participants reported on hair relaxer use (age at first use, frequency, duration, number of burns, and type of formulation). From 1997 to 2009, 23,580 premenopausal women were followed for incident uterine leiomyomata. Multivariable Cox regression was used to estimate incidence rate ratios and 95% confidence intervals. During 199,991 person-years, 7,146 cases of uterine leiomyomata were reported as confirmed by ultrasound (n = 4,630) or surgery (n = 2,516). The incidence rate ratio comparing ever with never use of relaxers was 1.17 (95% confidence interval (CI): 1.06, 1.30). Positive trends were observed for frequency of use (P for trend < 0.001), duration of use (P for trend = 0.015), and number of burns (P for trend < 0.001). Among long-term users (≥10 years), the incidence rate ratios for frequency of use categories 3–4, 5–6, and ≥7 versus 1–2 times/year were 1.04 (95% CI: 0.92, 1.19), 1.12 (95% CI: 0.99, 1.27), and 1.15 (95% CI: 1.01, 1.31), respectively (P for trend = 0.002). Risk was unrelated to age at first use or type of formulation. These findings raise the hypothesis that hair relaxer use increases uterine leiomyomata risk.
African Americans; female; hair straighteners; leiomyoma; prospective studies
Genome-wide association studies (GWAS) have proven a powerful method to identify common genetic variants contributing to susceptibility to common diseases. Here we show that extremely low-coverage sequencing (0.1–0.5x) captures almost as much of the common (>5%) and low-frequency (1–5%) variation across the genome as SNP arrays. As an empirical demonstration, we show that genome-wide SNP genotypes can be inferred at a mean r² of 0.71 using off-target data (0.24x average coverage) in a whole-exome study of 909 samples. Using both simulated and real exome sequencing datasets we show that association statistics obtained using ultra low-coverage sequencing data attain similar P-values at known associated variants as genotyping arrays, without an excess of false positives. Within the context of reductions in sample preparation and sequencing costs, funds invested in ultra low-coverage sequencing can yield several times the effective sample size of SNP-array GWAS, and a commensurate increase in statistical power.
The major histocompatibility complex (MHC) on chromosome 6p21 is a key contributor to the genetic basis of systemic lupus erythematosus (SLE). Although SLE affects African Americans disproportionately compared to European Americans, there has been no comprehensive analysis of the MHC region in relationship to SLE in African Americans. We screened the MHC region for 1,536 single nucleotide polymorphisms (SNPs) and the deletion of the C4A gene in an SLE case-control study (380 cases, 765 age-matched controls) nested within the prospective Black Women’s Health Study. We also genotyped 1,509 ancestral informative markers throughout the genome to estimate European ancestry in order to control for population stratification due to population admixture. The SNP most strongly associated with SLE was rs9271366 (odds ratio, OR = 1.70, p = 5.6×10⁻⁵) near the HLA-DRB1 gene. Conditional haplotype analysis revealed three other SNPs, rs204890 (OR = 1.86, p = 1.2×10⁻⁴), rs2071349 (OR = 1.53, p = 1.0×10⁻³), and rs2844580 (OR = 1.43, p = 1.3×10⁻³), to be associated with SLE independent of rs9271366. In univariate analysis, the OR for the C4A deletion was 1.38 (p = 0.075), but after simultaneous adjustment for the other four SNPs the OR was 1.01 (p = 0.98). A genotype score combining the four newly identified SNPs showed an additive risk according to the number of high-risk alleles (OR = 1.67 per high-risk allele, p < 0.0001). Our strongest signal, rs9271366, was also associated with higher risk of SLE in a previous Chinese genome-wide association study (GWAS). In addition, two SNPs found in a GWAS of European ancestry women were confirmed in our study, indicating that African Americans share some genetic risk factors for SLE with European and Chinese subjects. In summary, we found four independent signals in the MHC region associated with risk of SLE in African American women.
systemic lupus erythematosus; African Americans; major histocompatibility complex; single nucleotide polymorphisms
Comparisons of DNA sequences between Neandertals and present-day humans have shown that Neandertals share more genetic variants with non-Africans than with Africans. This could be due to interbreeding between Neandertals and modern humans when the two groups met subsequent to the emergence of modern humans outside Africa. However, it could also be due to population structure that antedates the origin of Neandertal ancestors in Africa. We measure the extent of linkage disequilibrium (LD) in the genomes of present-day Europeans and find that the last gene flow from Neandertals (or their relatives) into Europeans likely occurred 37,000–86,000 years before the present (BP), and most likely 47,000–65,000 years ago. This supports the recent interbreeding hypothesis and suggests that interbreeding may have occurred when modern humans carrying Upper Paleolithic technologies encountered Neandertals as they expanded out of Africa.
One of the key discoveries from the analysis of the Neandertal genome is that Neandertals share more genetic variants with non-Africans than with Africans. This observation is consistent with two hypotheses: interbreeding between Neandertals and modern humans after modern humans emerged out of Africa or population structure in the ancestors of Neandertals and modern humans. These hypotheses make different predictions about the date of last gene exchange between the ancestors of Neandertals and modern non-Africans. We estimate this date by measuring the extent of linkage disequilibrium (LD) in the genomes of present-day Europeans and find that the last gene flow from Neandertals into Europeans likely occurred 37,000–86,000 years before the present (BP), and most likely 47,000–65,000 years ago. This supports the recent interbreeding hypothesis and suggests that interbreeding occurred when modern humans carrying Upper Paleolithic technologies encountered Neandertals as they expanded out of Africa.
One enduring question in evolutionary biology is the extent of archaic admixture in the genomes of present-day populations. In this paper, we present a test for ancient admixture that exploits the asymmetry in the frequencies of the two nonconcordant gene trees in a three-population tree. This test was first applied to detect interbreeding between Neandertals and modern humans. We derive the analytic expectation of a test statistic, called the D statistic, which is sensitive to asymmetry under alternative demographic scenarios. We show that the D statistic is insensitive to some demographic assumptions such as ancestral population sizes and requires only the assumption that the ancestral populations were randomly mating. An important aspect of D statistics is that they can be used to detect archaic admixture even when no archaic sample is available. We explore the effect of sequencing error on the false-positive rate of the test for admixture, and we show how to estimate the proportion of archaic ancestry in the genomes of present-day populations. We also investigate a model of subdivision in ancestral populations that can result in D statistics that indicate recent admixture.
admixture; gene genealogies; lineage sorting
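The asymmetry that the D statistic measures can be made concrete by counting the two nonconcordant site patterns, usually called ABBA and BABA. The sketch below is a minimal illustration, assuming one sampled allele per population and an outgroup that defines the ancestral state. The function name `d_statistic` and the toy sites are hypothetical, and a real analysis would also need a standard error, for example from a block jackknife.

```python
def d_statistic(sites):
    """Compute D = (ABBA - BABA) / (ABBA + BABA).

    sites: iterable of (p1, p2, p3, outgroup) alleles, one allele per population.
    The outgroup allele is treated as ancestral ("A"); the other allele is "B".
    """
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if len({p1, p2, p3, out}) != 2:
            continue  # keep biallelic sites only
        if p3 == out:
            continue  # informative sites need the derived allele in P3
        if p1 == out and p2 == p3:
            abba += 1  # pattern A B B A: P2 shares the derived allele with P3
        elif p2 == out and p1 == p3:
            baba += 1  # pattern B A B A: P1 shares the derived allele with P3
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba), abba, baba

# Toy alignment columns (hypothetical data): (P1, P2, P3, outgroup).
toy_sites = [
    ("A", "G", "G", "A"),  # ABBA
    ("A", "G", "G", "A"),  # ABBA
    ("C", "T", "C", "T"),  # BABA
    ("T", "T", "C", "T"),  # derived only in P3: uninformative
    ("G", "G", "G", "A"),  # derived in all three: uninformative
    ("A", "C", "C", "A"),  # ABBA
]
d, n_abba, n_baba = d_statistic(toy_sites)
print(f"ABBA = {n_abba}, BABA = {n_baba}, D = {d:.2f}")
```

With no gene flow and random mating in the ancestral population, the two patterns are expected to be equally frequent, so D should be close to zero. An excess of one pattern is the asymmetry the test looks for.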
The “thrifty genotype” hypothesis proposes that the high prevalence of type 2 diabetes (T2D) in Native Americans and admixed Latin Americans has a genetic basis and reflects an evolutionary adaptation to a past low calorie/high exercise lifestyle. However, identification of the gene variants underpinning this hypothesis remains elusive. Here we assessed the role of Native American ancestry, socioeconomic status (SES) and 21 candidate gene loci in susceptibility to T2D in a sample of 876 T2D cases and 399 controls from Antioquia (Colombia). Although mean Native American ancestry is significantly higher in T2D cases than in controls (32% vs. 29%), this difference is confounded by the correlation of ancestry with SES, which is a stronger predictor of disease status. Nominally significant association (P<0.05) was observed for markers in TCF7L2, RBMS1, CDKAL1, ZNF239, KCNQ1 and TCF1, and a significant bias (P<0.05) towards OR>1 was observed for markers selected from previous T2D genome-wide association studies, consistent with a role for Old World variants in susceptibility to T2D in Latin Americans. No association was found with the only known Native American-specific gene variant previously associated with T2D in a Mexican sample (rs9282541 in ABCA1). An admixture mapping scan with 1,536 ancestry informative markers (AIMs) did not identify genome regions with significant deviation of ancestry in Antioquia. Exclusion analysis indicates that this scan rules out ∼95% of the genome as harboring loci with ancestry risk ratios >1.22 (at P < 0.05).
The risk of type 2 diabetes is approximately 2-fold higher in African Americans than in European Americans even after adjusting for known environmental risk factors, including socioeconomic status (SES), suggesting that genetic factors may explain some of this population difference in disease risk. However, relatively few genetic studies have examined this hypothesis in a large sample of African Americans with and without diabetes. Therefore, we performed an admixture analysis using 2,189 ancestry-informative markers in 7,021 African Americans (2,373 with type 2 diabetes and 4,648 without) from the Atherosclerosis Risk in Communities Study, the Jackson Heart Study, and the Multiethnic Cohort to 1) determine the association of type 2 diabetes and its related quantitative traits with African ancestry controlling for measures of SES and 2) identify genetic loci for type 2 diabetes through a genome-wide admixture mapping scan. The median percentage of African ancestry of diabetic participants was slightly greater than that of non-diabetic participants (study-adjusted difference = 1.6%, P<0.001). The odds ratio for diabetes comparing participants in the highest vs. lowest tertile of African ancestry was 1.33 (95% confidence interval 1.13–1.55), after adjustment for age, sex, study, body mass index (BMI), and SES. Admixture scans identified two potential loci for diabetes at 12p13.31 (LOD = 4.0) and 13q14.3 (Z score = 4.5, P = 6.6×10⁻⁶). In conclusion, genetic ancestry has a significant association with type 2 diabetes above and beyond its association with non-genetic risk factors for type 2 diabetes in African Americans, but no single gene with a major effect is sufficient to explain a large portion of the observed population difference in risk of diabetes. There undoubtedly is a complex interplay among specific genetic loci and non-genetic factors, which may both be associated with overall admixture, leading to the observed ethnic differences in diabetes risk.
Confounding due to population stratification is a potential source of concern in population-based genetic association studies, particularly in recently admixed populations such as African Americans. Several methods have been developed to control for population stratification in the context of genome-wide association studies. Because these approaches require thousands of genotypes from genetic markers, they are not well suited to be used in genetic association analyses without genome-wide data. An alternative approach to control for population stratification is to estimate admixture proportions by using ancestral informative markers (AIMs). The authors evaluated whether a relatively small number of AIMs would be sufficient to estimate ancestral proportions in African Americans. They first estimated European admixture proportions in 1,757 subjects from the Black Women's Health Study (1995–2009) by genotyping an admixture panel of 1,373 AIMs; they then compared these results with those obtained using smaller sets of AIMs. The authors found that just 30 AIMs are needed to obtain very high correlation of estimates with the entire set (r = 0.89; P < 0.0001). A set of 200 AIMs gave an almost perfect correlation with the entire set (r = 0.98; P < 0.0001). These results show that a small number of AIMs are sufficiently precise to estimate European admixture in African Americans.
African Americans; confounding factors (epidemiology); genetic association studies; genetics, population; molecular epidemiology
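To make the idea of estimating an individual's admixture proportion from AIMs concrete, the sketch below uses a simple grid-search maximum-likelihood estimator. This is an assumption for illustration: the abstract does not say which estimator the authors used. All marker frequencies and genotypes here are simulated, and the function names are hypothetical.

```python
import math
import random

def log_likelihood(alpha, genotypes, freq_eur, freq_afr):
    """Log-likelihood of European ancestry proportion alpha (constant terms dropped)."""
    total = 0.0
    for g, pe, pa in zip(genotypes, freq_eur, freq_afr):
        # Expected allele frequency for a person with ancestry proportion alpha.
        p = alpha * pe + (1.0 - alpha) * pa
        total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total

def estimate_european_ancestry(genotypes, freq_eur, freq_afr, steps=200):
    """Grid-search maximum-likelihood estimate of alpha on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: log_likelihood(a, genotypes, freq_eur, freq_afr))

# Simulated data only: hypothetical marker frequencies and one hypothetical individual.
rng = random.Random(0)
n_markers = 2000
freq_eur = [rng.uniform(0.60, 0.95) for _ in range(n_markers)]
freq_afr = [rng.uniform(0.05, 0.40) for _ in range(n_markers)]
true_alpha = 0.20
genotypes = []
for pe, pa in zip(freq_eur, freq_afr):
    p = true_alpha * pe + (1.0 - true_alpha) * pa
    genotypes.append(sum(rng.random() < p for _ in range(2)))  # 0, 1, or 2 copies

estimate = estimate_european_ancestry(genotypes, freq_eur, freq_afr)
print(f"true alpha = {true_alpha:.2f}, estimated alpha = {estimate:.3f}")
```

The precision of such an estimate depends on the number of markers and on how strongly their frequencies differ between the ancestral populations, which is why highly differentiated AIMs allow small panels to perform well.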
Sex-biased demographic events can result in asymmetries in female and male effective population size that can lead to different patterns of genetic variation on chromosome X than are expected based on the patterns on the autosomes. Previous studies point to a period around the time of the dispersal of anatomically modern humans out of Africa when chromosome X experienced a significant reduction in effective population size relative to the autosomes. Here, we explore whether a sex-biased demographic history could explain these observations. We use coalescent simulations to show that a model of primarily male migration during the out-of-Africa dispersal can produce the striking patterns that are observed when comparing patterns of genetic variation on the autosomes and chromosome X. The model involves a history in which after the founder population of non-Africans lost much of its genetic diversity, subsequent mostly male gene flow from an African source brought new diversity into the population. We also explore two additional models, one of sex-biased generation time and one of a substructured population during the dispersal out of Africa with primarily female migration among demes. These latter models cannot account for the magnitude of the observed reduction in chromosome X effective population size, although it is plausible that they played a more minor role in producing the striking chromosome X/autosome patterns.
gender-biased demography; chromosome X; autosomes; effective population size; coalescent simulations; human
Glutathione plays a crucial role in free radical scavenging, oxidative injury, and cellular homeostasis. Previously, we identified a non-synonymous polymorphism (P462S) in the gene encoding the catalytic subunit of glutamate cysteine ligase (GCLC), the rate-limiting enzyme in glutathione biosynthesis. This polymorphism is present only in individuals of African descent. Presently, we report that this ethnic-specific polymorphism (462S) encodes an enzyme with significantly decreased in vitro activity when expressed by either a bacterial or mammalian cell expression system. In addition, overexpression of the 462P wild-type GCLC enzyme results in higher intracellular glutathione concentrations than overexpression of the 462S isoform. We also demonstrate that apoptotically stimulated mammalian cells overexpressing the 462S enzyme have increased caspase activation and increased DNA laddering compared to cells overexpressing the wild-type 462P enzyme. Finally, we genotyped several African and African-descent populations and demonstrate that the 462S polymorphism is in Hardy-Weinberg disequilibrium, with no individuals homozygous for the 462S polymorphism identified. These findings describe a glutathione production pathway polymorphism present in individuals of African descent with significantly decreased in vitro activity.
Genome-wide linkage and association studies have uncovered variants associated with sarcoidosis, a multi-organ granulomatous inflammatory disease. African ancestry may influence disease pathogenesis since African Americans are more commonly affected by sarcoidosis. Therefore, we conducted the first sarcoidosis genome-wide ancestry scan using a map of 1,384 highly ancestry informative single nucleotide polymorphisms genotyped on 1,357 sarcoidosis cases and 703 unaffected controls self-identified as African American. The most significant ancestry association was at marker rs11966463 on chromosome 6p22.3 (ancestry association risk ratio (aRR) = 1.90; p = 0.0002). When we restricted the analysis to biopsy-confirmed cases, the aRR for this marker increased to 2.01 (p = 0.00007). Among the eight other markers that demonstrated suggestive ancestry associations with sarcoidosis were rs1462906 on chromosome 8p12, which had the most significant association with European ancestry (aRR = 0.65; p = 0.002), and markers on chromosomes 5p13 (aRR = 1.46; p = 0.005) and 5q31 (aRR = 0.67; p = 0.005), which correspond to regions we previously identified through sib pair linkage analyses. Overall, the most significant ancestry association for Scadding stage IV cases was to marker rs7919137 on chromosome 10p11.22 (aRR = 0.27; p = 2×10⁻⁵), a region not associated with disease susceptibility. In summary, through admixture mapping of sarcoidosis we have confirmed previous genetic linkages and identified several novel putative candidate loci for sarcoidosis.
Previous genetic studies have suggested a history of sub-Saharan African gene flow into some West Eurasian populations after the initial dispersal out of Africa that occurred at least 45,000 years ago. However, there has been no accurate characterization of the proportion of mixture, or of its date. We analyze genome-wide polymorphism data from about 40 West Eurasian groups to show that almost all Southern Europeans have inherited 1%–3% African ancestry with an average mixture date of around 55 generations ago, consistent with North African gene flow at the end of the Roman Empire and subsequent Arab migrations. Levantine groups harbor 4%–15% African ancestry with an average mixture date of about 32 generations ago, consistent with close political, economic, and cultural links with Egypt in the late middle ages. We also detect 3%–5% sub-Saharan African ancestry in all eight of the diverse Jewish populations that we analyzed. For the Jewish admixture, we obtain an average estimated date of about 72 generations. This may reflect descent of these groups from a common ancestral population that already had some African ancestry prior to the Jewish Diasporas.
Southern Europeans and Middle Eastern populations are known to have inherited a small percentage of their genetic material from recent sub-Saharan African migrations, but there has been no estimate of the exact proportion of this gene flow, or of its date. Here, we apply genomic methods to show that the proportion of African ancestry in many Southern European groups is 1%–3%, in Middle Eastern groups is 4%–15%, and in Jewish groups is 3%–5%. To estimate the dates when the mixture occurred, we develop a novel method that estimates the size of chromosomal segments of distinct ancestry in individuals of mixed ancestry. We verify using computer simulations that the method produces useful estimates of population mixture dates up to 300 generations in the past. By applying the method to West Eurasians, we show that the dates in Southern Europeans are consistent with events during the Roman Empire and subsequent Arab migrations. The dates in the Jewish groups are older, consistent with events in classical or biblical times that may have occurred in the shared history of Jewish populations.
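The average mixture dates above are quoted in generations (about 55, 32, and 72), so relating them to historical events requires a generation time. The sketch below shows the conversion. The default of 29 years per generation and the 25 to 30 year range are assumptions for illustration, not values stated in the text, and the function name is hypothetical.

```python
def generations_to_years(generations, years_per_generation=29.0):
    """Convert an admixture date in generations to years before present."""
    return generations * years_per_generation

# Average mixture dates quoted in the text, in generations.
reported_dates = {"Southern European": 55, "Levantine": 32, "Jewish": 72}

for group, generations in reported_dates.items():
    # Show a range to make the dependence on the assumed generation time explicit.
    low = generations_to_years(generations, 25.0)
    high = generations_to_years(generations, 30.0)
    print(f"{group}: {generations} generations ~ {low:.0f}-{high:.0f} years ago")
```

Because the conversion is linear, any uncertainty in the assumed generation time carries over proportionally to the date in years.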