The Roma people, living throughout Europe and West Asia, are a diverse population linked by the Romani language and culture. Previous linguistic and genetic studies have suggested that the Roma migrated into Europe from South Asia about 1,000–1,500 years ago. Genetic inferences about Roma history have mostly focused on the Y chromosome and mitochondrial DNA. To explore what additional information can be learned from genome-wide data, we analyzed data from six Roma groups that we genotyped at hundreds of thousands of single nucleotide polymorphisms (SNPs). We estimate that the Roma harbor about 80% West Eurasian ancestry (derived from a combination of European and South Asian sources) and that the date of admixture of South Asian and European ancestry was about 850 years before present. We provide evidence that Eastern Europe was a major source of the European ancestry, and northwest India a major source of the South Asian ancestry, in the Roma. By computing allele sharing as a measure of linkage disequilibrium, we estimate that the migration of Roma out of the Indian subcontinent was accompanied by a severe founder event, which appears to have been followed by a major demographic expansion after the arrival in Europe.
Large data sets on human genetic variation have been collected recently, but their usefulness for learning about history and natural selection has been limited by biases in the ways polymorphisms were chosen. We report large subsets of SNPs from the International HapMap Project1,2 that allow us to overcome these biases and to provide accurate measurement of a quantity of crucial importance for understanding genetic variation: the allele frequency spectrum. Our analysis shows that East Asian and northern European ancestors shared the same population bottleneck expanding out of Africa but that both also experienced more recent genetic drift, which was greater in East Asians.
Strong signatures of positive selection at newly arising genetic variants are well-documented in humans1–8, but this form of selection may not be widespread in recent human evolution9. Because many human traits are highly polygenic and partly determined by common, ancient genetic variation, an alternative model for rapid genetic adaptation has been proposed: weak selection acting on many pre-existing (standing) genetic variants, or polygenic adaptation10–12. By studying height, a classic polygenic trait, we demonstrate the first human signature of widespread selection on standing variation. We show that frequencies of alleles associated with increased height, both at known loci and genome-wide, are systematically elevated in Northern Europeans compared with Southern Europeans (p<4.3×10−4). This pattern mirrors intra-European height differences and is not confounded by ancestry or other ascertainment biases. The systematic frequency differences are consistent with the presence of widespread weak selection (selection coefficients ~10−3 to 10−5 per allele) rather than genetic drift alone (p<10−15).
Human Genomics; Population Genetics; Europeans; Height; Selection
Hair relaxers are used by millions of black women, possibly exposing them to various chemicals through scalp lesions and burns. In the Black Women’s Health Study, the authors assessed hair relaxer use in relation to uterine leiomyomata incidence. In 1997, participants reported on hair relaxer use (age at first use, frequency, duration, number of burns, and type of formulation). From 1997 to 2009, 23,580 premenopausal women were followed for incident uterine leiomyomata. Multivariable Cox regression was used to estimate incidence rate ratios and 95% confidence intervals. During 199,991 person-years, 7,146 cases of uterine leiomyomata were reported as confirmed by ultrasound (n = 4,630) or surgery (n = 2,516). The incidence rate ratio comparing ever with never use of relaxers was 1.17 (95% confidence interval (CI): 1.06, 1.30). Positive trends were observed for frequency of use (Ptrend < 0.001), duration of use (Ptrend = 0.015), and number of burns (Ptrend < 0.001). Among long-term users (≥10 years), the incidence rate ratios for frequency of use categories 3–4, 5–6, and ≥7 versus 1–2 times/year were 1.04 (95% CI: 0.92, 1.19), 1.12 (95% CI: 0.99, 1.27), and 1.15 (95% CI: 1.01, 1.31), respectively (Ptrend = 0.002). Risk was unrelated to age at first use or type of formulation. These findings raise the hypothesis that hair relaxer use increases uterine leiomyomata risk.
African Americans; female; hair straighteners; leiomyoma; prospective studies
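As a quick check on the cohort figures in the leiomyomata abstract, the crude incidence rate can be computed from the reported case count and person-time. This is a minimal sketch: it ignores covariates entirely, whereas the incidence rate ratios reported above come from multivariable Cox regression, which this does not reproduce. The function name is illustrative only.

```python
def crude_rate_per_1000(cases: int, person_years: float) -> float:
    """Crude incidence rate: events divided by person-time, per 1,000 person-years."""
    return 1000.0 * cases / person_years


# Counts reported in the abstract
ultrasound_confirmed, surgery_confirmed = 4_630, 2_516
cases = ultrasound_confirmed + surgery_confirmed  # 7,146 confirmed cases
rate = crude_rate_per_1000(cases, 199_991)
print(f"{cases} cases, {rate:.1f} per 1,000 person-years")
```

The two confirmation routes sum to the 7,146 reported cases, giving roughly 36 cases per 1,000 person-years across the whole cohort.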
Genome-wide association studies (GWAS) have proven to be a powerful method for identifying common genetic variants that contribute to susceptibility to common diseases. Here we show that extremely low-coverage sequencing (0.1–0.5x) captures almost as much of the common (>5%) and low-frequency (1–5%) variation across the genome as SNP arrays. As an empirical demonstration, we show that genome-wide SNP genotypes can be inferred at a mean r² of 0.71 using off-target data (0.24x average coverage) in a whole-exome study of 909 samples. Using both simulated and real exome sequencing datasets, we show that association statistics obtained from ultra low-coverage sequencing data attain P-values at known associated variants similar to those from genotyping arrays, without an excess of false positives. Taking into account reductions in sample preparation and sequencing costs, funds invested in ultra low-coverage sequencing can yield several times the effective sample size of SNP-array GWAS, and a commensurate increase in statistical power.
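The "mean r²" accuracy figure above compares inferred genotypes with a reference genotype call at each SNP. A common definition, assumed here because the abstract does not spell it out, is the squared Pearson correlation between true genotypes (0/1/2) and imputed dosages, averaged over SNPs. The sketch below implements that definition; the function name and toy data are hypothetical.

```python
import numpy as np


def mean_r2(true_genotypes: np.ndarray, dosages: np.ndarray) -> float:
    """Average over SNPs of the squared Pearson correlation between
    array genotypes (0/1/2) and imputed dosages (expected allele counts).

    Both arrays have shape (n_samples, n_snps). SNPs that are monomorphic
    in either array are skipped because r^2 is undefined for them.
    """
    r2_values = []
    for j in range(true_genotypes.shape[1]):
        g, d = true_genotypes[:, j], dosages[:, j]
        if g.std() == 0 or d.std() == 0:
            continue
        r = np.corrcoef(g, d)[0, 1]
        r2_values.append(r * r)
    if not r2_values:
        raise ValueError("no polymorphic SNPs to evaluate")
    return float(np.mean(r2_values))


# Toy example: 4 samples x 2 SNPs, with slightly noisy dosages
truth = np.array([[0, 1], [1, 2], [2, 0], [1, 1]])
dosage = np.array([[0.1, 1.2], [0.9, 1.8], [1.9, 0.3], [1.2, 0.9]])
print(round(mean_r2(truth, dosage), 3))
```

Skipping monomorphic SNPs is a design choice; some pipelines instead bin r² by allele frequency before averaging.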
The major histocompatibility complex (MHC) on chromosome 6p21 is a key contributor to the genetic basis of systemic lupus erythematosus (SLE). Although SLE affects African Americans disproportionately compared to European Americans, there has been no comprehensive analysis of the MHC region in relationship to SLE in African Americans. We screened the MHC region for 1,536 single nucleotide polymorphisms (SNPs) and the deletion of the C4A gene in an SLE case-control study (380 cases, 765 age-matched controls) nested within the prospective Black Women’s Health Study. We also genotyped 1,509 ancestry informative markers throughout the genome to estimate European ancestry in order to control for population stratification due to population admixture. The SNP most strongly associated with SLE was rs9271366 (odds ratio, OR = 1.70, p = 5.6×10−5), near the HLA-DRB1 gene. Conditional haplotype analysis revealed three other SNPs, rs204890 (OR = 1.86, p = 1.2×10−4), rs2071349 (OR = 1.53, p = 1.0×10−3), and rs2844580 (OR = 1.43, p = 1.3×10−3), to be associated with SLE independently of rs9271366. In univariate analysis, the OR for the C4A deletion was 1.38 (p = 0.075), but after simultaneous adjustment for the other four SNPs the odds ratio was 1.01 (p = 0.98). A genotype score combining the four newly identified SNPs showed an additive risk according to the number of high-risk alleles (OR = 1.67 per high-risk allele, p < 0.0001). Our strongest signal, rs9271366, was also associated with higher risk of SLE in a previous Chinese genome-wide association study (GWAS). In addition, two SNPs identified in a GWAS of women of European ancestry were confirmed in our study, indicating that African Americans share some genetic risk factors for SLE with European and Chinese subjects. In summary, we found four independent signals in the MHC region associated with risk of SLE in African American women.
systemic lupus erythematosus; African Americans; major histocompatibility complex; single nucleotide polymorphisms
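The genotype score in the SLE abstract reports one odds ratio per high-risk allele. Under a log-additive (multiplicative) model, which is assumed here, a carrier of k high-risk alleles has an odds ratio of 1.67^k relative to someone with none. The sketch also assumes the score is an unweighted allele count; the function names and the example genotype are hypothetical.

```python
def genotype_score(risk_allele_counts):
    """Unweighted score: total number of high-risk alleles (0-2 per SNP)."""
    return sum(risk_allele_counts)


def or_vs_zero(score: int, or_per_allele: float = 1.67) -> float:
    """Odds ratio relative to carrying zero high-risk alleles (log-additive model)."""
    return or_per_allele ** score


# Hypothetical person carrying 1, 2, 0 and 1 risk alleles at the four SNPs
s = genotype_score([1, 2, 0, 1])
print(s, round(or_vs_zero(s), 2))
```

Extrapolating to scores that are rare in the data rests entirely on the log-additive assumption holding across the full range.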
Comparisons of DNA sequences between Neandertals and present-day humans have shown that Neandertals share more genetic variants with non-Africans than with Africans. This could be due to interbreeding between Neandertals and modern humans when the two groups met subsequent to the emergence of modern humans outside Africa. However, it could also be due to population structure that antedates the origin of Neandertal ancestors in Africa. We measure the extent of linkage disequilibrium (LD) in the genomes of present-day Europeans and find that the last gene flow from Neandertals (or their relatives) into Europeans likely occurred 37,000–86,000 years before the present (BP), and most likely 47,000–65,000 years ago. This supports the recent interbreeding hypothesis and suggests that interbreeding may have occurred when modern humans carrying Upper Paleolithic technologies encountered Neandertals as they expanded out of Africa.
One of the key discoveries from the analysis of the Neandertal genome is that Neandertals share more genetic variants with non-Africans than with Africans. This observation is consistent with two hypotheses: interbreeding between Neandertals and modern humans after modern humans emerged out of Africa, or population structure in the ancestors of Neandertals and modern humans. These hypotheses make different predictions about the date of last gene exchange between the ancestors of Neandertals and modern non-Africans. We estimate this date by measuring the extent of linkage disequilibrium (LD) in the genomes of present-day Europeans and find that the last gene flow from Neandertals into Europeans likely occurred 37,000–86,000 years before the present (BP), and most likely 47,000–65,000 years ago. This supports the recent interbreeding hypothesis and suggests that interbreeding occurred when modern humans carrying Upper Paleolithic technologies encountered Neandertals as they expanded out of Africa.
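The dating logic in the two Neandertal abstracts rests on admixture-induced LD decaying with genetic distance: after t generations, LD between alleles of introgressed origin falls off roughly as exp(−t·d), with d in Morgans. The sketch below fits that curve by linear regression on log(LD) and converts generations to years. It is a simplified illustration, not the authors' estimator: it ignores background LD (no constant term), uses noise-free synthetic data, and assumes a 29-year generation time.

```python
import numpy as np


def generations_from_ld_decay(dist_morgans, ld):
    """Fit ld ~ A * exp(-t * d) by least squares on log(ld); return t (generations)."""
    slope, _intercept = np.polyfit(dist_morgans, np.log(ld), 1)
    return -slope


def years_from_generations(t, years_per_generation=29.0):
    """Convert generations to years under an assumed generation time."""
    return t * years_per_generation


# Synthetic, noise-free decay curve generated with t = 1,900 generations
d = np.linspace(0.0002, 0.002, 20)  # 0.02-0.2 cM, expressed in Morgans
ld = 0.01 * np.exp(-1900 * d)
t_hat = generations_from_ld_decay(d, ld)
print(round(t_hat), round(years_from_generations(t_hat)))
```

With real data, an additive background term and noise would make a nonlinear fit (for example with weighted least squares) more appropriate than the log-linear shortcut.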
One enduring question in evolutionary biology is the extent of archaic admixture in the genomes of present-day populations. In this paper, we present a test for ancient admixture that exploits the asymmetry in the frequencies of the two nonconcordant gene trees in a three-population tree. This test was first applied to detect interbreeding between Neandertals and modern humans. We derive the analytic expectation of a test statistic, called the D statistic, which is sensitive to asymmetry under alternative demographic scenarios. We show that the D statistic is insensitive to some demographic assumptions such as ancestral population sizes and requires only the assumption that the ancestral populations were randomly mating. An important aspect of D statistics is that they can be used to detect archaic admixture even when no archaic sample is available. We explore the effect of sequencing error on the false-positive rate of the test for admixture, and we show how to estimate the proportion of archaic ancestry in the genomes of present-day populations. We also investigate a model of subdivision in ancestral populations that can result in D statistics that indicate recent admixture.
admixture; gene genealogies; lineage sorting
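To make the D statistic concrete: for a tree ((H1, H2), H3) with an outgroup defining the ancestral allele, the two nonconcordant site patterns are ABBA (H2 and H3 share the derived allele) and BABA (H1 and H3 share it). Without admixture their expected counts are equal. The sketch below uses the widely used counting form D = (nABBA − nBABA)/(nABBA + nBABA) for one haplotype per population; the paper's estimator may differ (for example by using allele frequencies), and the sign convention varies between authors.

```python
def d_statistic(h1: str, h2: str, h3: str, outgroup: str) -> float:
    """ABBA-BABA D statistic from four aligned haplotypes of biallelic sites.

    Alleles matching the outgroup are treated as ancestral ('A'), others as
    derived ('B'). Positive D means H2 shares more derived alleles with H3
    than H1 does.
    """
    if not (len(h1) == len(h2) == len(h3) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    abba = baba = 0
    for a1, a2, a3, o in zip(h1, h2, h3, outgroup):
        d1, d2, d3 = a1 != o, a2 != o, a3 != o
        if d3 and d2 and not d1:
            abba += 1  # pattern ABBA
        elif d3 and d1 and not d2:
            baba += 1  # pattern BABA
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


# Toy alignment: three ABBA sites, one BABA site, one uninformative site
print(d_statistic("AAAGA", "GGGAA", "GGGGA", "AAAAA"))
```

In practice the numerator and denominator are summed genome-wide and the standard error is estimated with a block jackknife to account for linkage between sites.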
The “thrifty genotype” hypothesis proposes that the high prevalence of type 2 diabetes (T2D) in Native Americans and admixed Latin Americans has a genetic basis and reflects an evolutionary adaptation to a past low calorie/high exercise lifestyle. However, identification of the gene variants underpinning this hypothesis remains elusive. Here we assessed the role of Native American ancestry, socioeconomic status (SES) and 21 candidate gene loci in susceptibility to T2D in a sample of 876 T2D cases and 399 controls from Antioquia (Colombia). Although mean Native American ancestry is significantly higher in T2D cases than in controls (32% vs. 29%), this difference is confounded by the correlation of ancestry with SES, which is a stronger predictor of disease status. Nominally significant association (P<0.05) was observed for markers in TCF7L2, RBMS1, CDKAL1, ZNF239, KCNQ1 and TCF1, and a significant bias (P<0.05) towards OR>1 was observed for markers selected from previous T2D genome-wide association studies, consistent with a role for Old World variants in susceptibility to T2D in Latin Americans. No association was found with the only known Native American-specific gene variant previously associated with T2D in a Mexican sample (rs9282541 in ABCA1). An admixture mapping scan with 1,536 ancestry informative markers (AIMs) did not identify genome regions with significant deviation of ancestry in Antioquia. Exclusion analysis indicates that this scan rules out ∼95% of the genome as harboring loci with ancestry risk ratios >1.22 (at P < 0.05).
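One simple way to test for a "bias towards OR>1" across previously reported markers is an exact one-sided sign test: if the markers had no effect, each observed OR would be equally likely to fall above or below 1. The abstract does not state which test was used, so this is an assumed illustration; it also treats markers as independent, and the 15-of-20 count below is hypothetical.

```python
from math import comb


def sign_test_p(k_above_one: int, n_markers: int) -> float:
    """One-sided exact binomial P(X >= k) with X ~ Binomial(n, 0.5)."""
    tail = sum(comb(n_markers, i) for i in range(k_above_one, n_markers + 1))
    return tail / 2 ** n_markers


# Hypothetical: 15 of 20 replicated GWAS markers show OR > 1
print(round(sign_test_p(15, 20), 4))
```

If risk alleles were pre-specified from earlier GWAS, a one-sided test matches the directional hypothesis; markers in linkage disequilibrium would need to be pruned first.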
The risk of type 2 diabetes is approximately 2-fold higher in African Americans than in European Americans even after adjusting for known environmental risk factors, including socioeconomic status (SES), suggesting that genetic factors may explain some of this population difference in disease risk. However, relatively few genetic studies have examined this hypothesis in a large sample of African Americans with and without diabetes. Therefore, we performed an admixture analysis using 2,189 ancestry-informative markers in 7,021 African Americans (2,373 with type 2 diabetes and 4,648 without) from the Atherosclerosis Risk in Communities Study, the Jackson Heart Study, and the Multiethnic Cohort to 1) determine the association of type 2 diabetes and its related quantitative traits with African ancestry controlling for measures of SES and 2) identify genetic loci for type 2 diabetes through a genome-wide admixture mapping scan. The median percentage of African ancestry of diabetic participants was slightly greater than that of non-diabetic participants (study-adjusted difference = 1.6%, P<0.001). The odds ratio for diabetes comparing participants in the highest vs. lowest tertile of African ancestry was 1.33 (95% confidence interval 1.13–1.55), after adjustment for age, sex, study, body mass index (BMI), and SES. Admixture scans identified two potential loci for diabetes at 12p13.31 (LOD = 4.0) and 13q14.3 (Z score = 4.5, P = 6.6×10−6). In conclusion, genetic ancestry has a significant association with type 2 diabetes above and beyond its association with non-genetic risk factors for type 2 diabetes in African Americans, but no single gene with a major effect is sufficient to explain a large portion of the observed population difference in risk of diabetes. There undoubtedly is a complex interplay among specific genetic loci and non-genetic factors, which may both be associated with overall admixture, leading to the observed ethnic differences in diabetes risk.
Confounding due to population stratification is a potential source of concern in population-based genetic association studies, particularly in recently admixed populations such as African Americans. Several methods have been developed to control for population stratification in the context of genome-wide association studies. Because these approaches require genotypes at thousands of genetic markers, they are not well suited to genetic association analyses that lack genome-wide data. An alternative approach to control for population stratification is to estimate admixture proportions by using ancestry informative markers (AIMs). The authors evaluated whether a relatively small number of AIMs would be sufficient to estimate ancestral proportions in African Americans. They first estimated European admixture proportions in 1,757 subjects from the Black Women's Health Study (1995–2009) by genotyping an admixture panel of 1,373 AIMs; they then compared these results with those obtained using smaller sets of AIMs. The authors found that just 30 AIMs are needed to obtain very high correlation of estimates with the entire set (r = 0.89; P < 0.0001). A set of 200 AIMs gave an almost perfect correlation with the entire set (r = 0.98; P < 0.0001). These results show that a small number of AIMs are sufficiently precise to estimate European admixture in African Americans.
African Americans; confounding factors (epidemiology); genetic association studies; genetics, population; molecular epidemiology
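The AIM-subset comparison above depends on estimating each person's European admixture proportion from their genotypes. The abstract does not name the estimation software, so the sketch below shows a generic maximum-likelihood approach under stated assumptions: known parental allele frequencies, independent markers, and Hardy-Weinberg proportions given the admixture proportion q. It finds q by grid search; the function name is hypothetical.

```python
import numpy as np


def estimate_european_proportion(genotypes, p_eur, p_afr, grid_size=1001):
    """Grid-search maximum-likelihood estimate of one person's European ancestry.

    genotypes: counts (0/1/2) of the reference allele at each AIM.
    p_eur, p_afr: reference-allele frequencies in the two parental populations.
    Assumes independent markers and binomial (HWE) genotypes given q.
    """
    g = np.asarray(genotypes, dtype=float)
    pe = np.asarray(p_eur, dtype=float)
    pa = np.asarray(p_afr, dtype=float)
    q = np.linspace(0.0, 1.0, grid_size)[:, None]  # candidate proportions
    p = np.clip(q * pe + (1.0 - q) * pa, 1e-9, 1 - 1e-9)  # expected allele freq
    # Binomial log-likelihood per candidate q (constant terms dropped)
    loglik = (g * np.log(p) + (2.0 - g) * np.log(1.0 - p)).sum(axis=1)
    return float(q[np.argmax(loglik), 0])


# Toy example with perfectly informative markers (fixed differences)
print(estimate_european_proportion([2, 0, 1, 1], [1.0] * 4, [0.0] * 4))
```

Running the estimator on the full panel and on a random subset of markers for each person, then correlating the two sets of estimates with `np.corrcoef`, mirrors the kind of comparison the abstract describes.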
Sex-biased demographic events can create asymmetries between female and male effective population sizes, producing patterns of genetic variation on chromosome X that differ from those expected from the patterns on the autosomes. Previous studies point to a period around the time of the dispersal of anatomically modern humans out of Africa when chromosome X experienced a significant reduction in effective population size relative to the autosomes. Here, we explore whether a sex-biased demographic history could explain these observations. We use coalescent simulations to show that a model of primarily male migration during the out-of-Africa dispersal can produce the striking patterns that are observed when comparing patterns of genetic variation on the autosomes and chromosome X. In this model, after the founder population of non-Africans lost much of its genetic diversity, subsequent mostly male gene flow from an African source brought new diversity into the population. We also explore two additional models, one of sex-biased generation time and one of a substructured population during the dispersal out of Africa with primarily female migration among demes. These latter models cannot account for the magnitude of the observed reduction in chromosome X effective population size, although it is plausible that they played a more minor role in producing the striking chromosome X/autosome patterns.
gender-biased demography; chromosome X; autosomes; effective population size; coalescent simulations; human
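A textbook way to see how sex bias shifts chromosome X relative to the autosomes uses the standard Wright-style effective sizes for Nf breeding females and Nm breeding males: Ne_A = 4·Nm·Nf/(Nm + Nf) and Ne_X = 9·Nm·Nf/(4·Nm + 2·Nf). With equal numbers the ratio is 3/4, and an excess of breeding males lowers it toward 9/16, the same direction as the reduction described above. This is a simpler mechanism (breeding sex ratio) than the migration models in the abstract and ignores variance in offspring number; it is an illustration, not the authors' simulation.

```python
def x_to_autosome_ne_ratio(n_females: float, n_males: float) -> float:
    """Ne_X / Ne_A for a randomly mating population with unequal sex counts.

    Uses Ne_A = 4*Nm*Nf/(Nm+Nf) and Ne_X = 9*Nm*Nf/(4*Nm+2*Nf).
    """
    ne_a = 4 * n_males * n_females / (n_males + n_females)
    ne_x = 9 * n_males * n_females / (4 * n_males + 2 * n_females)
    return ne_x / ne_a


for nf, nm in [(1000, 1000), (1000, 4000), (4000, 1000)]:
    print(nf, nm, round(x_to_autosome_ne_ratio(nf, nm), 3))
```

The ratio is bounded between 9/16 (all breeding sex bias toward males) and 9/8 (toward females).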
Glutathione plays a crucial role in free radical scavenging, oxidative injury, and cellular homeostasis. Previously, we identified a non-synonymous polymorphism (P462S) in the gene encoding the catalytic subunit of glutamate cysteine ligase (GCLC), the rate-limiting enzyme in glutathione biosynthesis. This polymorphism is present only in individuals of African descent. Here, we report that this ethnic-specific polymorphism (462S) encodes an enzyme with significantly decreased in vitro activity when expressed in either a bacterial or a mammalian cell expression system. In addition, overexpression of the 462P wild-type GCLC enzyme results in higher intracellular glutathione concentrations than overexpression of the 462S isoform. We also demonstrate that apoptotically stimulated mammalian cells overexpressing the 462S enzyme have increased caspase activation and increased DNA laddering compared to cells overexpressing the wild-type 462P enzyme. Finally, we genotyped several African and African-descent populations and demonstrate that the 462S polymorphism is in Hardy-Weinberg disequilibrium, with no 462S homozygotes identified. These findings describe a glutathione production pathway polymorphism present in individuals of African descent with significantly decreased in vitro activity.
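The Hardy-Weinberg finding above compares observed genotype counts with those expected from the allele frequency: with 462S frequency q among n people, about n·q² homozygotes are expected. The sketch below computes expected counts and the 1-degree-of-freedom chi-square statistic. The genotype counts are hypothetical, not the study's data.

```python
def hwe_chi_square(n_pp: int, n_ps: int, n_ss: int):
    """Expected genotype counts under Hardy-Weinberg equilibrium and the
    chi-square goodness-of-fit statistic (1 df for a biallelic locus)."""
    n = n_pp + n_ps + n_ss
    q = (n_ps + 2 * n_ss) / (2 * n)  # frequency of the 462S allele
    expected = (n * (1 - q) ** 2, 2 * n * q * (1 - q), n * q ** 2)
    observed = (n_pp, n_ps, n_ss)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return expected, chi2


# Hypothetical sample: 140 PP, 60 PS, 0 SS
expected, chi2 = hwe_chi_square(140, 60, 0)
print([round(e, 1) for e in expected], round(chi2, 2))
```

Here 4.5 SS homozygotes would be expected but none are observed, giving a statistic above the 5% critical value of 3.84. When expected counts are this small, an exact Hardy-Weinberg test is usually preferred over the chi-square approximation.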
Genome-wide linkage and association studies have uncovered variants associated with sarcoidosis, a multi-organ granulomatous inflammatory disease. African ancestry may influence disease pathogenesis since African Americans are more commonly affected by sarcoidosis. Therefore, we conducted the first sarcoidosis genome-wide ancestry scan using a map of 1,384 highly ancestry informative single nucleotide polymorphisms genotyped on 1,357 sarcoidosis cases and 703 unaffected controls self-identified as African American. The most significant ancestry association was at marker rs11966463 on chromosome 6p22.3 (ancestry association risk ratio (aRR)= 1.90; p=0.0002). When we restricted the analysis to biopsy-confirmed cases, the aRR for this marker increased to 2.01; p=0.00007. Among the eight other markers that showed suggestive ancestry associations with sarcoidosis were rs1462906 on chromosome 8p12, which had the most significant association with European ancestry (aRR=0.65; p=0.002), and markers on chromosomes 5p13 (aRR=1.46; p=0.005) and 5q31 (aRR=0.67; p=0.005), which correspond to regions we previously identified through sib pair linkage analyses. Overall, the most significant ancestry association for Scadding stage IV cases was with marker rs7919137 on chromosome 10p11.22 (aRR=0.27; p=2×10−5), a region not associated with disease susceptibility. In summary, through admixture mapping of sarcoidosis we have confirmed previous genetic linkages and identified several novel putative candidate loci for sarcoidosis.
Previous genetic studies have suggested a history of sub-Saharan African gene flow into some West Eurasian populations after the initial dispersal out of Africa that occurred at least 45,000 years ago. However, there has been no accurate characterization of the proportion of mixture, or of its date. We analyze genome-wide polymorphism data from about 40 West Eurasian groups to show that almost all Southern Europeans have inherited 1%–3% African ancestry with an average mixture date of around 55 generations ago, consistent with North African gene flow at the end of the Roman Empire and subsequent Arab migrations. Levantine groups harbor 4%–15% African ancestry with an average mixture date of about 32 generations ago, consistent with close political, economic, and cultural links with Egypt in the late Middle Ages. We also detect 3%–5% sub-Saharan African ancestry in all eight of the diverse Jewish populations that we analyzed. For the Jewish admixture, we obtain an average estimated date of about 72 generations ago. This may reflect descent of these groups from a common ancestral population that already had some African ancestry prior to the Jewish Diasporas.
Southern Europeans and Middle Eastern populations are known to have inherited a small percentage of their genetic material from recent sub-Saharan African migrations, but there has been no estimate of the exact proportion of this gene flow, or of its date. Here, we apply genomic methods to show that the proportion of African ancestry in many Southern European groups is 1%–3%, in Middle Eastern groups is 4%–15%, and in Jewish groups is 3%–5%. To estimate the dates when the mixture occurred, we develop a novel method that estimates the size of chromosomal segments of distinct ancestry in individuals of mixed ancestry. We verify using computer simulations that the method produces useful estimates of population mixture dates up to 300 generations in the past. By applying the method to West Eurasians, we show that the dates in Southern Europeans are consistent with events during the Roman Empire and subsequent Arab migrations. The dates in the Jewish groups are older, consistent with events in classical or biblical times that may have occurred in the shared history of Jewish populations.
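The dating idea in these two abstracts is that recombination breaks ancestry segments into shorter pieces each generation. Under a simple single-pulse model (assumed here; this is a textbook approximation, not the authors' new method), segments of an ancestry making up a fraction α of the genome are roughly exponential in length with rate t·(1 − α) per Morgan, so t ≈ 1/(mean length × (1 − α)). The sketch also converts generations to years under an assumed 29-year generation time: 55 generations then corresponds to roughly 1,600 years.

```python
def admixture_generations(mean_tract_morgans: float, minority_fraction: float) -> float:
    """Single-pulse approximation: ancestry tracts are ~exponential with
    rate t * (1 - alpha) per Morgan, so t = 1 / (mean_length * (1 - alpha))."""
    return 1.0 / (mean_tract_morgans * (1.0 - minority_fraction))


def generations_to_years(t: float, years_per_generation: float = 29.0) -> float:
    """Convert generations to years under an assumed generation time."""
    return t * years_per_generation


# Hypothetical: African-ancestry tracts averaging 1.85 cM at 2% ancestry
t = admixture_generations(0.0185, 0.02)
print(round(t, 1), round(generations_to_years(t)))
print(generations_to_years(55))  # the Southern European estimate, in years
```

Real tract calls are noisy, and short tracts are hard to detect at low ancestry proportions, which is why dedicated methods model the whole distribution rather than just the mean.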
Obesity is an important cause of morbidity and mortality worldwide. In the U.S., the prevalence of obesity is higher in African Americans than whites, even after adjustment for socioeconomic status. This leads to the hypothesis that differences in genetic background may contribute to racial/ethnic differences in obesity-related traits. We tested this hypothesis by conducting a genome-wide admixture mapping scan using 1,350 ancestry-informative SNPs in 3,531 self-identified blacks from the Atherosclerosis Risk in Communities (ARIC) study. We used these markers to estimate the overall proportions of European ancestry (PEA) for each individual and then scanned for the association between PEA and obesity-related traits (both continuous and dichotomous) at each locus. The median (interquartile range) PEA was 0.151 (0.115). PEA was inversely correlated with continuous body mass index (BMI), weight, and subscapular skinfold thickness, even after adjusting for socioeconomic factors. In contrast, PEA was positively correlated with BMI-adjusted waist circumference. Using admixture mapping on dichotomized traits, we identified a locus on 2p23.3 to be suggestively associated with BMI (locus-specific LOD = 4.11) and weight (locus-specific LOD = 4.07). After adjusting for global PEA, each additional copy of a European ancestral allele at the 2p23.3 peak was associated with a BMI decrease of ∼0.92 kg/m² (p = 2.9×10−5). Further mapping in this region on chromosome 2 may be able to uncover causative variants underlying obesity, which may offer insights into the control of energy homeostasis.
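The per-copy effect above ("each additional copy of a European ancestral allele ... adjusting for global PEA") can be read as the coefficient on local ancestry in a regression that also includes genome-wide ancestry. The sketch fits that model by ordinary least squares on synthetic, noise-free data built with a −0.92 kg/m² effect. It is an assumed simplification: the study's models may include further covariates, and the data here are not the ARIC data.

```python
import numpy as np


def local_ancestry_effect(bmi, local_eur_copies, global_pea):
    """OLS of BMI on local European-allele count (0/1/2), adjusting for
    global European ancestry; returns the coefficient on local ancestry."""
    X = np.column_stack([np.ones(len(bmi)), local_eur_copies, global_pea])
    coef, *_ = np.linalg.lstsq(X, bmi, rcond=None)
    return float(coef[1])


# Synthetic data: local ancestry drawn given each person's global PEA
rng = np.random.default_rng(0)
pea = rng.uniform(0.05, 0.4, size=500)
local = rng.binomial(2, pea)
bmi = 30.0 - 0.92 * local - 5.0 * pea
print(round(local_ancestry_effect(bmi, local, pea), 2))
```

Adjusting for global ancestry matters because local and global ancestry are correlated; without it, the local coefficient would absorb genome-wide ancestry effects and confounding by SES.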
Lipoprotein(a) (Lp(a)) is an important causal cardiovascular risk factor, with serum Lp(a) levels predicting atherosclerotic heart disease and genetic determinants of Lp(a) levels showing association with myocardial infarction. Lp(a) levels vary widely between populations, with African-derived populations having nearly 2-fold higher Lp(a) levels than European Americans. We investigated the genetic basis of this difference in 4,464 African Americans from the Jackson Heart Study (JHS) using a panel of up to 1,447 ancestry informative markers, allowing us to accurately estimate the African ancestry proportion of each individual at each position in the genome. In an unbiased genome-wide admixture scan for frequency-differentiated genetic determinants of Lp(a) level, we found a convincing peak (LOD = 13.6) at 6q25.3, which spans the LPA locus. Dense fine-mapping of the LPA locus identified a number of strongly associated, common biallelic SNPs, a subset of which can account for up to 7% of the variation in Lp(a) level, as well as >70% of the African-European population differences in Lp(a) level. We replicated the association of the most strongly associated SNP, rs9457951 (p = 6×10−22, 27% change in Lp(a) per allele, ∼5% of Lp(a) variance explained in JHS), in 1,726 African Americans from the Dallas Heart Study and found an even stronger association after adjustment for the kringle(IV) repeat copy number. Despite the strong association with Lp(a) levels, we find no association of any LPA SNP with incident coronary heart disease in 3,225 African Americans from the Atherosclerosis Risk in Communities Study.
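The "variance explained" figure links a SNP's per-allele effect to its allele frequency. For an additive SNP in Hardy-Weinberg equilibrium with effect β and allele frequency p, the fraction of trait variance explained is 2p(1 − p)β²/Var(Y). The sketch below assumes Lp(a) is analysed on the log scale, so a 27% change per allele corresponds to β = ln(1.27); the abstract does not state the scale, and the allele frequency and trait variance in the example are hypothetical.

```python
import math


def variance_explained(allele_freq: float, beta: float, trait_variance: float) -> float:
    """Fraction of trait variance from one additive SNP in HWE:
    2p(1-p) * beta^2 / Var(Y)."""
    return 2 * allele_freq * (1 - allele_freq) * beta ** 2 / trait_variance


# A 27% multiplicative change per allele, expressed on the log scale
beta = math.log(1.27)

# Hypothetical allele frequency and log-scale trait variance
print(round(variance_explained(0.3, beta, 1.0), 3))
```

With these hypothetical inputs the SNP explains about 2.4% of the variance; the same effect size explains more at allele frequencies closer to 0.5 or when the trait variance is smaller.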
This study compares three extant elephant species (forest, savanna, and Asian) to the extinct mammoth and mastodon. Surprisingly, forest and savanna elephants in Africa today are as distinct from each other as mammoths and Asian elephants.
To elucidate the history of living and extinct elephantids, we generated 39,763 bp of aligned nuclear DNA sequence across 375 loci for African savanna elephant, African forest elephant, Asian elephant, the extinct American mastodon, and the woolly mammoth. Our data establish that the Asian elephant is the closest living relative of the extinct mammoth in the nuclear genome, extending previous findings from mitochondrial DNA analyses. We also find that savanna and forest elephants, which some have argued are the same species, are as divergent in the nuclear genome as mammoths and Asian elephants (which are considered to be distinct genera), or more so, thus resolving a long-standing debate about the appropriate taxonomic classification of the African elephants. Finally, we document a much larger effective population size in forest elephants compared with the other elephantid taxa, likely reflecting species differences in ancient geographic structure and range and differences in life history traits such as variance in male reproductive success.
The living elephants are the last survivors of a once highly successful mammalian order, the Proboscidea, which includes extinct species such as the iconic woolly mammoth (Mammuthus primigenius) and the American mastodon (Mammut americanum). Despite numerous studies, the phylogenetic relationships of the modern elephants to the woolly mammoth, as well as the taxonomic status of the African elephants of the genus Loxodonta, remain controversial. This is largely because both the woolly mammoth and the American mastodon (the closest outgroup to elephants and mammoths available for genetic studies) are extinct, posing considerable technical hurdles for comparative genetic analysis. We have used a combination of modern DNA sequencing and targeted PCR amplification to obtain a large data set for comparing American mastodon, woolly mammoth, Asian elephant, African savanna elephant, and African forest elephant. We unequivocally establish that the Asian elephant is the sister species to the woolly mammoth. A surprising finding from our study is that the divergence of African savanna and forest elephants (which some have argued to be two populations of the same species) is about as ancient as the divergence of Asian elephants and mammoths. Given their ancient divergence, we conclude that African savanna and forest elephants should be classified as two distinct species.
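The divergence comparisons in the elephant abstracts start from pairwise differences between aligned sequences. The simplest measure is the p-distance (proportion of differing sites), often corrected for multiple substitutions with the Jukes-Cantor formula d = −(3/4)·ln(1 − 4p/3). This sketch is a generic illustration; the studies' actual divergence estimates may use more elaborate models (for example, handling ancient-DNA damage in mammoth and mastodon reads), which are not reproduced here.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, skipping gaps ('-') and ambiguous bases ('N')."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance, valid for p < 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTAT")
print(p, round(jukes_cantor(p), 4))
```

At the low divergences typical of closely related species the correction is small, but it grows quickly as p approaches saturation.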
There are considerable racial disparities in prostate cancer risk, with a 60% higher incidence rate among African American (AA) men compared with European American (EA) men, and a 2.4-fold higher mortality rate in AA men than in EA men. Recently, studies have implicated several African-ancestry associated prostate cancer susceptibility loci on chromosome 8q24. In the current study, we performed admixture mapping in AA men from two independent case-control studies of prostate cancer to confirm the 8q24 ancestry association and also identify other genomic regions that may harbor prostate cancer susceptibility genes. A total of 482 cases and 261 controls were genotyped for 1,509 ancestry informative markers across the genome. The mean estimated individual admixture proportions were 20% European and 80% African. The most significant observed increase in European ancestry occurred at rs2141360 on chromosome 7q31 in both the case-only (p=0.0000035) and case-control analyses. The most significant observed increase in African ancestry across the genome occurred at a locus on chromosome 5q35 identified by SNPs rs7729084 (case-only analysis: p=0.002) and rs12474977 (case-control analysis: p=0.004), which are separated by 646 kb and are adjacent to one another on the panel. On chromosome 8, rs4367565 was associated with the greatest excess African ancestry in both the case-only and case-control analyses (case-only and case-control p=0.02), confirming previously reported African-ancestry associations with chromosome 8q24. In conclusion, we confirmed ancestry associations on 8q24, and identified additional ancestry-associated regions potentially harboring prostate cancer susceptibility loci.
Prostate Cancer; Admixture Mapping; Ancestry; PODXL; DOCK4
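A case-only admixture scan asks whether cases carry more (or less) African ancestry at a locus than expected from their genome-wide ancestry. The sketch below is a deliberately simplified version: for each case it takes the local African-allele count (0/1/2, assumed already inferred) minus twice the global African fraction, and tests whether the mean difference is zero with a normal approximation. Dedicated admixture-mapping software uses likelihood-based models instead; the function name and the toy inputs are hypothetical.

```python
import math


def case_only_z(local_african_copies, global_african_fraction):
    """Mean excess of local African ancestry over the genome-wide expectation
    (2 * global fraction), as a z-score with a two-sided normal p-value."""
    diffs = [l - 2 * g for l, g in zip(local_african_copies, global_african_fraction)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two cases")
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    z = mean / math.sqrt(var / n)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Toy input: four cases, each with 80% genome-wide African ancestry
z, p = case_only_z([2, 2, 1, 2], [0.8, 0.8, 0.8, 0.8])
print(round(z, 3), round(p, 3))
```

A positive z indicates excess African ancestry at the locus; the case-control version compares this excess between cases and controls.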
African American women with breast cancer present more commonly with aggressive tumors that do not express the estrogen receptor (ER) and progesterone receptor (PR) compared with European American women. Whether this disparity is the result of inherited factors has not been established. We performed an admixture-based genome-wide scan to search for risk alleles for breast cancer that are highly differentiated in frequency between African American and European American women and may contribute to specific breast cancer phenotypes, such as ER negative (ER−) disease. African American women with invasive breast cancer (n=1,484) were pooled from 6 population-based studies and typed at ~1,500 ancestry informative markers (AIMs). We investigated global genetic ancestry and performed a whole genome admixture scan searching for breast cancer predisposing loci in association with disease phenotypes. We found a significant difference in ancestry between ER+PR+ and ER−PR− women, with higher European ancestry among ER+PR+ individuals, after controlling for possible confounders (OR for a 0 to 1 change in European ancestry proportion=2.84, 95% CI: 1.13–7.14, p=0.026). Women with localized tumors had higher European ancestry than women with non-localized tumors (OR=2.65, 95% CI: 1.11–6.35, p=0.029). No genome-wide statistically significant associations were observed between European or African ancestry at any specific locus and breast cancer, or in analyses stratified by ER/PR status, stage or grade. In summary, in African American women genetic ancestry is associated with ER/PR status and disease stage. However, we found little evidence that genetic ancestry at any one region contributes significantly to breast cancer risk or hormone receptor status.
African Americans; breast cancer; admixture mapping; hormone receptor status; genetic ancestry
Microsatellite length mutations are often modeled using the generalized stepwise mutation process, which is a type of random walk. If this model is sufficiently accurate, one can estimate the coalescence time between alleles of a locus after a mathematical transformation of the allele lengths. When large-scale microsatellite genotyping first became possible, there was substantial interest in using this approach to make inferences about time and demography, but that interest has waned because it has not been possible to empirically validate the clock by comparing it with data in which the mutation process is well understood. We analyzed data from 783 microsatellite loci in human populations and 292 loci in chimpanzee populations, and compared them with up to one gigabase of aligned sequence data, where the molecular clock based upon nucleotide substitutions is believed to be reliable. We empirically demonstrate a remarkable linearity (r2 > 0.95) between the microsatellite average square distance statistic and sequence divergence. We demonstrate that microsatellites are accurate molecular clocks for coalescent times of at least 2 million years (My). We apply this insight to confirm that the African populations San, Biaka Pygmy, and Mbuti Pygmy have the deepest coalescent times among populations in the Human Genome Diversity Project. Furthermore, we show that microsatellites support unbiased estimates of population differentiation (FST) that are less subject to ascertainment bias than single nucleotide polymorphism (SNP) FST. These results raise the prospect of using microsatellite data sets to determine parameters of population history. When genotyped along with SNPs, microsatellite data can also be used to correct for SNP ascertainment bias.
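The clock rests on a simple property of the generalized stepwise model. If two alleles share a common ancestor T generations ago, the number of mutations separating them is Poisson with mean 2μT, and each mutation changes the length by a step of mean zero and variance σ², so E[(x − y)²] = 2μσ²T and the average square distance (ASD) grows linearly with T. The sketch below computes ASD between two populations and inverts that relation; it assumes a symmetric step distribution and known μ and σ² (the study instead calibrated ASD against sequence divergence), and all function names are hypothetical.

```python
from itertools import product
from statistics import mean


def asd_locus(alleles_a, alleles_b):
    """Average square distance at one locus: mean of (x - y)^2 over all cross-population allele pairs.

    Alleles are given as repeat counts (allele length divided by the repeat-unit length).
    """
    return mean((x - y) ** 2 for x, y in product(alleles_a, alleles_b))


def asd(loci_a, loci_b):
    """Average the per-locus ASD over loci (one list of alleles per locus per population)."""
    return mean(asd_locus(a, b) for a, b in zip(loci_a, loci_b))


def coalescence_time(asd_value, mu, step_var):
    """Coalescence time in generations under a symmetric generalized stepwise model.

    With mutation rate `mu` per generation and step-size variance `step_var`,
    E[(x - y)^2] = 2 * mu * step_var * T, so T = ASD / (2 * mu * step_var).
    """
    return asd_value / (2 * mu * step_var)
```

For example, `asd([[10, 12], [20]], [[11], [22, 24]])` averages a per-locus ASD of 1 at the first locus and 10 at the second, giving 5.5. Averaging over many loci matters because the variance of the squared difference at any single locus is large.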
microsatellite evolution; molecular clocks; coalescent time; average square distance; FST; SNP ascertainment bias
Retinal vascular caliber provides information about the structure and health of the microvascular system and is associated with cardiovascular and cerebrovascular diseases. Compared with European Americans, African Americans tend to have wider retinal arteriolar and venular caliber, even after controlling for cardiovascular risk factors. This has suggested the hypothesis that differences in genetic background may contribute to racial/ethnic differences in retinal vascular caliber. Using 1,365 ancestry-informative SNPs, we estimated the percentage of African ancestry (PAA) and conducted genome-wide admixture mapping scans in 1,737 African Americans from the Atherosclerosis Risk in Communities (ARIC) study. Central retinal artery equivalent (CRAE) and central retinal vein equivalent (CRVE), representing summary measures of retinal arteriolar and venular caliber, respectively, were measured from retinal photographs. PAA was significantly correlated with CRVE (ρ = 0.071, P = 0.003), but not CRAE (ρ = 0.032, P = 0.182). Using admixture mapping, we did not detect significant admixture association with either CRAE (genome-wide score = −0.73) or CRVE (genome-wide score = −0.69). An a priori subgroup analysis among hypertensive individuals detected a genome-wide significant association of CRVE with greater African ancestry at chromosome 6p21.1 (genome-wide score = 2.31, locus-specific LOD = 5.47). Each additional copy of an African ancestral allele at the 6p21.1 peak was associated with an average increase in CRVE of 6.14 µm in the hypertensives, but had no significant effects in the non-hypertensives (P for heterogeneity <0.001). Further mapping in the 6p21.1 region may uncover novel genetic variants affecting retinal vascular caliber and provide further insight into the interaction between genetic effects on the microvascular system and hypertension.
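The per-copy estimate above corresponds to an additive model, in which CRVE changes by a fixed amount for each African-ancestry allele at the locus. A minimal sketch, assuming an unadjusted ordinary least-squares fit (the study's model may have included covariates that this omits) and fitting hypertensive and non-hypertensive groups separately to mirror the stratified comparison; the function name and the toy values are hypothetical, not data from the study.

```python
from statistics import mean


def per_copy_effect(copies, trait):
    """Ordinary least-squares slope of a trait on the number of African-ancestry alleles.

    copies: per-individual count of African-ancestry alleles at the locus (0, 1 or 2)
    trait:  per-individual trait value (e.g. CRVE in micrometres)
    Under an additive model, the slope is the average change per additional copy.
    No covariates are adjusted for in this sketch.
    """
    mx, my = mean(copies), mean(trait)
    sxy = sum((x - mx) * (y - my) for x, y in zip(copies, trait))
    sxx = sum((x - mx) ** 2 for x in copies)
    return sxy / sxx


# Hypothetical toy data for two subgroups (not from the study)
copies = [0, 1, 2, 0, 1, 2]
hyper = per_copy_effect(copies, [200.0, 206.0, 212.0, 202.0, 208.0, 214.0])
nonhyper = per_copy_effect(copies, [205.0, 205.0, 205.0, 207.0, 207.0, 207.0])
```

In this toy example the slope is 6 µm per copy in the first subgroup and 0 in the second; a formal heterogeneity test would then compare the two slopes using their standard errors.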
Retinal vessels provide a window to microvascular systems elsewhere in the body. The diameter of retinal vessels varies between racial/ethnic groups, being generally wider in African Americans than in European Americans. To determine whether genetic background may contribute to this observed difference, we scanned the entire genomes of 1,737 African Americans, searching for genomic regions where individuals with either wider retinal venular or narrower retinal arteriolar caliber differ from the average percentage of African ancestry. We find that the percentage of African ancestry is positively correlated with retinal venular caliber, particularly in hypertensive individuals. We detect substantive evidence of association between excess African ancestry and wider retinal venular caliber on chromosome 6p21.1 in the hypertensives, but not in the non-hypertensives. The 6p21.1 region contains genes that are known to be involved in the development and modulation of retinal vessels. Our results suggest that genetic factors may contribute to the observed difference in retinal vascular caliber between African Americans and European Americans. Further fine-mapping studies of this genomic region may identify variants affecting retinal vascular caliber.
Allele frequency differences across populations can provide valuable information both for studying population structure and for identifying loci that have been targets of natural selection. Here, we examine the relationship between recombination rate and population differentiation in humans by analyzing two uniformly ascertained, whole-genome data sets. We find that population differentiation, as assessed by inter-continental FST, is negatively correlated with recombination rate, with FST reduced by 10% in the tenth of the genome with the highest recombination rate compared with the tenth with the lowest recombination rate (P≪10−12). This pattern cannot be explained by the mutagenic properties of recombination and must instead reflect the impact of selection in the last 100,000 years since human continental populations split. The relationship between recombination rate and FST differs qualitatively between FST measured for African versus non-African populations and FST measured for European versus East Asian populations, suggesting varying levels or types of selection in different epochs of human history.
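The abstract does not name the FST estimator, so the sketch below assumes Hudson's estimator combined as a ratio of averages over SNPs, a common choice for comparing continental populations, and then compares the lowest and highest tenths of SNPs ranked by recombination rate. The function names, and the assumption of a fixed number of sampled chromosomes per population, are illustrative.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson FST combined as a ratio of averages across SNPs.

    p1, p2: sample allele frequencies per SNP in populations 1 and 2
    n1, n2: number of sampled chromosomes in each population (assumed equal for all SNPs)
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite-sample noise
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Expected between-population heterozygosity
        den += a * (1 - b) + b * (1 - a)
    return num / den


def fst_low_vs_high_recomb(p1, p2, n1, n2, recomb):
    """FST in the lowest and highest tenths of SNPs ranked by recombination rate.

    Requires at least 10 SNPs. Returns (fst_low, fst_high, relative_reduction),
    where relative_reduction is 1 - fst_high / fst_low.
    """
    order = sorted(range(len(recomb)), key=lambda i: recomb[i])
    k = len(order) // 10
    low, high = order[:k], order[-k:]
    fst_low = hudson_fst([p1[i] for i in low], [p2[i] for i in low], n1, n2)
    fst_high = hudson_fst([p1[i] for i in high], [p2[i] for i in high], n1, n2)
    return fst_low, fst_high, 1 - fst_high / fst_low
```

Summing numerators and denominators before dividing, rather than averaging per-SNP ratios, keeps rare variants with small denominators from dominating the estimate; a relative reduction of about 0.10 would correspond to the pattern reported above.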
A common assumption when analyzing patterns of human genetic variation is that most of the genome can be treated as “nearly neutral,” in the sense that the effects of natural selection on allele frequencies are very small compared with the influence of population demographic history. To test the validity of this assumption, we analyzed data from more than a million human polymorphisms and summarized allele frequency differences across populations. We find that, compared with the genome-wide average, allele frequency differences are 7% reduced on average in the tenth of the genome with the highest recombination rate and are 3% increased in the tenth with the lowest rate. Such a correlation cannot be explained by demography. Instead, the pattern reflects the fact that forces of natural selection have had a profound impact on patterns of variation throughout the genome in the last 100,000 years.
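The comparison with the genome-wide average can be sketched by ranking SNPs by recombination rate, splitting them into tenths, and expressing each tenth's mean allele frequency difference as a percentage deviation from the genome-wide mean. The per-SNP statistic is left generic (for instance a squared frequency difference between populations); averaging it within each tenth is an assumption of this sketch, and the function name is hypothetical.

```python
from statistics import mean


def deviation_by_recomb_tenth(stat, recomb):
    """Percent deviation of each recombination-rate tenth's mean statistic from the genome-wide mean.

    stat:   per-SNP measure of allele frequency difference between populations
    recomb: per-SNP local recombination rate
    Returns 10 values; index 0 is the lowest-recombination tenth, index 9 the highest.
    Requires at least 10 SNPs so that every tenth is non-empty.
    """
    order = sorted(range(len(stat)), key=lambda i: recomb[i])
    n = len(order)
    overall = mean(stat)
    deviations = []
    for d in range(10):
        # SNP indices falling in the d-th tenth of the recombination-rate ranking
        idx = order[d * n // 10:(d + 1) * n // 10]
        bin_mean = mean(stat[i] for i in idx)
        deviations.append(100.0 * (bin_mean / overall - 1.0))
    return deviations
```

Under this convention, the result described above would appear as roughly +3 at index 0 and roughly −7 at index 9.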