Motivation: Admixed populations offer a unique opportunity for mapping diseases that have large disease allele frequency differences between ancestral populations. However, association analysis in such populations is challenging because population stratification may lead to association with loci unlinked to the disease locus.
Methods and results: We show that local ancestry at a test single nucleotide polymorphism (SNP) may confound the association signal, and that ignoring it can lead to spurious association. We demonstrate theoretically that adjustment for local ancestry at the test SNP is sufficient to remove the spurious association regardless of the mechanism of population stratification, whether due to local or global ancestry differences among study subjects; however, global ancestry adjustment procedures may not be effective. We further develop two novel association tests that adjust for local ancestry. Our first test is based on a conditional likelihood framework which models the distribution of the test SNP given disease status and flanking marker genotypes. A key advantage of this test lies in its ability to incorporate different directions of association in the ancestral populations. Our second test, which is computationally simpler, is based on logistic regression, with adjustment for local ancestry proportion. We conducted extensive simulations and found that the Type I error rates of our tests are under control; however, the global adjustment procedures yielded inflated Type I error rates when stratification is due to local ancestry differences.
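The confounding described above can be illustrated with a small simulation. The sketch below is not the authors' test: it is a minimal, standard-library-only illustration in which disease risk depends only on local ancestry at the locus, the SNP has no effect, and the allele frequencies (0.8 and 0.2), effect size, sample size, and all function names are hypothetical choices. It fits a logistic regression by Newton-Raphson with and without a local ancestry covariate (coded here as the number of ancestry-A chromosomes at the locus).

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def fit_logistic(rows, y, iters=25):
    """Newton-Raphson logistic fit; an intercept column is added automatically."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += xi[i] * (yi - p)
                for j in range(k):
                    hess[i][j] += w * xi[i] * xi[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return beta

def simulate(n, seed=1):
    """Hypothetical admixed sample: risk depends on local ancestry, not genotype."""
    rng = random.Random(seed)
    freq = {"A": 0.8, "B": 0.2}  # assumed allele frequency in each ancestral population
    data = []
    for _ in range(n):
        anc = [rng.choice("AB") for _ in range(2)]      # ancestry of each chromosome
        g = sum(rng.random() < freq[a] for a in anc)    # genotype: 0, 1 or 2 copies
        a_count = anc.count("A")                        # local ancestry: 0, 1 or 2
        risk = 1.0 / (1.0 + math.exp(-(-1.0 + 1.0 * a_count)))
        d = 1 if rng.random() < risk else 0
        data.append((g, a_count, d))
    return data

data = simulate(4000)
y = [d for _, _, d in data]
beta_unadj = fit_logistic([[g] for g, _, _ in data], y)[1]
beta_adj = fit_logistic([[g, a] for g, a, _ in data], y)[1]
print("SNP log-odds, unadjusted:", round(beta_unadj, 3))
print("SNP log-odds, adjusted for local ancestry:", round(beta_adj, 3))
```

Under these assumptions the unadjusted SNP coefficient is clearly positive even though the SNP is null, while the adjusted coefficient should sit near zero. A real analysis would also need standard errors and inferred, rather than known, local ancestry.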
Contact: firstname.lastname@example.org; email@example.com.
Supplementary information: Supplementary data are available at Bioinformatics online.
Motivation: Adjustment for population structure is necessary to avoid bias in genetic association studies of susceptibility variants for complex diseases. Population structure may differ from one genomic region to another due to the variability of individual ancestry associated with migration, random genetic drift or natural selection. Current association methods for correcting population stratification usually involve adjustment of global ancestry between study subjects.
Results: We suggest interrogating local population structure for fine mapping to more accurately locate true causal genes by better adjusting for the confounding effect of local ancestry. By extensive simulations on genome-wide datasets, we show that adjusting for global ancestry may lead to false positives when local population structure is an important confounding factor. In contrast, adjusting for local ancestry can effectively prevent false positives due to local population structure and thus can improve fine mapping for disease gene localization. We applied the local and global adjustments to the analysis of datasets from three genome-wide association studies, including European Americans, African Americans and Nigerians. Both European Americans and African Americans demonstrate greater variability in local ancestry than Nigerians. Adjusting for local ancestry successfully eliminated the known spurious association between SNPs in the LCT gene and height that arises from population structure in European Americans.
Supplementary information: Supplementary data are available at Bioinformatics online.
Populations of the Americas were founded by early migrants from Asia, and some have experienced recent genetic admixture. To better characterize the native and non-native ancestry components in populations from the Americas, we analyzed 815,377 autosomal SNPs, mitochondrial hypervariable segments I and II, and 36 Y-chromosome STRs from 24 Mesoamerican Totonacs and 23 South American Bolivians.
Results and Conclusions
We analyzed common genomic regions from native Bolivian and Totonac populations to identify 324 highly predictive Native American ancestry informative markers (AIMs). As few as 40–50 of these AIMs perform nearly as well as large panels of random genome-wide SNPs for predicting and estimating Native American ancestry and admixture levels. These AIMs have greater New World vs. Old World specificity than previous AIM sets. We identify highly divergent New World SNPs that coincide with high-frequency haplotypes found at similar frequencies in all populations examined, including the HGDP Pima, Maya, Colombian, Karitiana, and Surui American populations. Some of these regions are potential candidates for positive selection. European admixture in the Bolivian sample is approximately 12%, though individual estimates range from 0 to 48%. We estimate that the admixture occurred ~360–384 years ago. Little evidence of European or African admixture was found in Totonac individuals. Bolivians with pre-Columbian mtDNA and Y-chromosome haplogroups had 5–30% autosomal European ancestry, demonstrating the limitations of Y-chromosome and mtDNA haplogroups and the need for autosomal ancestry informative markers for assessing ancestry in admixed populations.
Admixture; Ancestry Informative Markers (AIMs); Native Americans; Bolivian; Totonac; Positive selection
In genetic association studies, it is necessary to correct for population structure to avoid inference bias. During the past decade, prevailing corrections often only involved adjustments of global ancestry differences between sampled individuals. Nevertheless, population structure may vary across local genomic regions due to the variability of local ancestries associated with natural selection, migration, or random genetic drift. Adjusting for global ancestry alone may be inadequate when local population structure is an important confounding factor. In contrast, adjusting for local ancestry can more effectively prevent false-positives due to local population structure. To more accurately locate disease genes, we recommend adjusting for local ancestries by interrogating local structure. In practice, locus-specific ancestries are usually unknown and cannot be accurately inferred when ancestral population information is not available. For such scenarios, we propose employing local principal components (PC) to represent local ancestries and adjusting for local PCs when testing for genotype–phenotype association. With an acceptable computation burden, the proposed algorithm successfully eliminates the known spurious association between SNPs in the LCT gene and height due to the population structure in European Americans.
Genome-wide association studies; Local ancestries; Local principal components; Migration; Random genetic drift; Natural selection; Genomic inflation factor; Genomic control; Local ancestry principal components correction; Fine mapping
Genome-wide association studies in cohorts of European descent have identified novel genomic regions as associated with lipids, but their relevance in African Americans remains unclear.
Methods and Results
We genotyped 8 index SNPs and 488 tagging SNPs across 8 novel lipid loci in the Jackson Heart Study, a community-based cohort of 4605 African Americans. For each trait, we calculated residuals adjusted for age, sex, and global ancestry and performed multivariable linear regression to detect genotype-phenotype association with adjustment for local ancestry. To explore admixture effects, we conducted stratified analyses in individuals with a high probability of 2 African ancestral alleles or at least 1 European allele at each locus. We confirmed 2 index SNPs as associated with lipid traits in African Americans, with suggestive association for 3 more. However, the effect sizes for 4 of the 5 associated SNPs were larger in the European local ancestry subgroup compared to the African local ancestry subgroup, suggesting that the replication is driven by European ancestry segments. Through fine-mapping, we discovered 3 new SNPs with significant associations, two with consistent effect on triglyceride levels across ancestral groups: rs636523 near DOCK7/ANGPTL3 and rs780093 in GCKR. African LD patterns did not assist in narrowing association signals.
We confirm that 5 genetic regions associated with lipid traits in European-derived populations are relevant in African Americans. To further evaluate these loci, fine-mapping in larger African American cohorts and/or resequencing will be required.
lipids; genetics; epidemiology; risk factors
A 58kb region on chromosome 9p21.3 has consistently shown strong association with coronary artery disease (CAD) in multiple genome-wide association studies in populations of European and East Asian ancestry. In this study we sought to further characterize the role of genetic variants in 9p21.3 in African American individuals.
Methods and Results
Apparently healthy African American siblings (n=548) of patients with documented CAD <60 years of age were genotyped and followed for incident CAD for up to 17 years. Tests of association for 86 SNPs across the 9p21.3 region in a GEE logistic framework under an additive model adjusting for traditional risk factors, family, follow-up time, and population stratification were performed. A single SNP within the CDKN2B gene met stringent criteria for statistical significance, including permutation-based evaluations. This variant, rs3217989, was common (minor allele [G] frequency 0.242), conveyed protection against CAD (OR=0.19, 95% CI: 0.07 to 0.50, p=0.0008) and was replicated in a combined analysis of two additional case/control studies of prevalent CAD/MI in African Americans (n=990, p=0.024, OR= 0.779, 95% CI: 0.626-0.968).
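As a reading aid, the reported odds ratios and confidence intervals can be checked for rough consistency with the reported p-values. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale, which may not match the GEE and permutation procedures actually used; the function name `p_from_or_ci` is hypothetical. It back-calculates a standard error from the interval and derives a two-sided p-value.

```python
import math
from statistics import NormalDist

def p_from_or_ci(odds_ratio, ci_low, ci_high, level=0.95):
    """Approximate z and two-sided p from an OR and its CI (Wald assumption)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for 95%
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    return z, p

# Values as reported in the abstract.
z_inc, p_inc = p_from_or_ci(0.19, 0.07, 0.50)      # incident CAD in siblings
z_rep, p_rep = p_from_or_ci(0.779, 0.626, 0.968)   # combined replication studies
print("incident CAD: z = %.2f, p = %.4f" % (z_inc, p_inc))
print("replication:  z = %.2f, p = %.4f" % (z_rep, p_rep))
```

Both approximations land close to the reported p-values (0.0008 and 0.024); small differences are expected from rounding of the published interval bounds and from the Wald assumption.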
This is the first report of a CAD association signal in a population of African ancestry with a common variant within the CDKN2B gene, independent from previous findings in European and East Asian ancestry populations. The findings demonstrate a significant protective effect against incident CAD in African American siblings of persons with premature CAD, with replication in a combination of two additional African American cohorts.
African American; CDKN2B; Coronary Artery Disease; Genetics; 9p21
While genome-wide association studies (GWAS) have primarily examined populations of European ancestry, more recent studies often involve additional populations, including admixed populations such as African Americans and Latinos. In admixed populations, linkage disequilibrium (LD) exists both at a fine scale in ancestral populations and at a coarse scale (admixture-LD) due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered SNP association (LD mapping) or admixture association (mapping by admixture-LD), but not both. Here, we introduce a new statistical framework for combining SNP and admixture association in case-control studies, as well as methods for local ancestry-aware imputation. We illustrate the gain in statistical power achieved by these methods by analyzing data of 6,209 unrelated African Americans from the CARe project genotyped on the Affymetrix 6.0 chip, in conjunction with both simulated and real phenotypes, as well as by analyzing the FGFR2 locus using breast cancer GWAS data from 5,761 African-American women. We show that, at typed SNPs, our method yields an 8% increase in statistical power for finding disease risk loci compared to the power achieved by standard methods in case-control studies. At imputed SNPs, we observe an 11% increase in statistical power for mapping disease loci when our local ancestry-aware imputation framework and the new scoring statistic are jointly employed. Finally, we show that our method increases statistical power in regions harboring the causal SNP in the case when the causal SNP is untyped and cannot be imputed. Our methods and our publicly available software are broadly applicable to GWAS in admixed populations.
This paper presents improved methodologies for the analysis of genome-wide association studies in admixed populations, which are populations that came about by the mixing of two or more distant continental populations over a few hundred years (e.g., African Americans or Latinos). Studies of admixed populations offer the promise of capturing additional genetic diversity compared to studies over homogeneous populations such as Europeans. In admixed populations, correlation between genetic variants exists both at a fine scale in the ancestral populations and at a coarse scale due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered either one or the other type of correlation, but not both. In this work we develop novel statistical methods that account for both types of genetic correlation, and we show that the combined approach attains greater statistical power than that achieved by applying either approach separately. We provide analysis of simulated and real data from major studies performed in African-American men and women to show the improvement obtained by our methods over the standard methods for analyzing association studies in admixed populations.
The haplotypes of the X chromosome are accessible to direct count in males, whereas the diplotypes of the females may be inferred knowing the haplotype of their sons or fathers. Here, we investigated: 1) the possible large-scale haplotypic structure of the X chromosome in a Caucasian population sample, given the single-nucleotide polymorphism (SNP) maps and genotypes provided by Illumina and Affymetrix for Genetic Analysis Workshop 14, and 2) the performance of widely used programs in reconstructing haplotypes from population genotypic data, given their known distribution in a sample of unrelated individuals.
All possible unrelated mother-son pairs of Caucasian ancestry (N = 104) were selected from the 143 families of the Collaborative Study on the Genetics of Alcoholism pedigree files, and the diplotypes of the mothers were inferred from the X chromosomes of their sons. The marker set included 313 SNPs at an average density of 0.47 Mb. Linkage disequilibrium between pairs of markers was computed by the parameter D', whereas for measuring multilocus disequilibrium, we developed here an index called D*, and applied it to all possible sliding windows of 5 markers each. Results showed a complex pattern of haplotypic structure, with regions of low linkage disequilibrium separated by regions of high values of D*. The following programs were evaluated for their accuracy in inferring population haplotype frequencies: 1) ARLEQUIN 2.001; 2) PHASE 2.1.1; 3) SNPHAP 1.1; 4) HAPLOBLOCK 1.2; 5) HAPLOTYPER 1.0. Performances were evaluated by Pearson correlation (r) coefficient between the true and the inferred distribution of haplotype frequencies.
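The pairwise measure D' mentioned above can be computed directly from counted haplotypes, as is possible for male X chromosomes. The sketch below uses the standard definition (D divided by its maximum attainable magnitude given the allele frequencies) and reports the absolute value; the function name and the toy haplotype lists are hypothetical. It does not attempt the multilocus index D*, which the abstract does not define.

```python
def d_prime(haplotypes):
    """|D'| for two biallelic markers.

    haplotypes: list of (allele_at_marker_1, allele_at_marker_2) tuples,
    with alleles coded 0/1, one tuple per phase-known chromosome.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n            # frequency of allele 1 at marker 1
    p_b = sum(h[1] for h in haplotypes) / n            # frequency of allele 1 at marker 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    d = p_ab - p_a * p_b                               # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max

# Toy data: only three of the four possible haplotypes occur (complete LD).
complete_ld = [(1, 1)] * 40 + [(0, 0)] * 40 + [(1, 0)] * 20
# Toy data: all four haplotypes at equal frequency (no LD).
no_ld = [(1, 1)] * 25 + [(1, 0)] * 25 + [(0, 1)] * 25 + [(0, 0)] * 25

print("complete LD:", d_prime(complete_ld))
print("no LD:", d_prime(no_ld))
```

When one of the four haplotypes is absent, D' reaches 1 regardless of allele frequencies, which is why it is often paired with other measures when comparing marker pairs.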
The SNP haplotypic structure of the X chromosome is complex, with regions of high haplotype conservation interspersed among regions of higher haplotype diversity. All the tested programs were accurate (r = 1) in reconstructing the distribution of haplotype frequencies in regions with high D* values. However, only the program PHASE achieved a high correlation coefficient (r > 0.7) in conditions of low linkage disequilibrium.
Chronic kidney disease (CKD) is an increasing global public health concern, particularly among populations of African ancestry. We performed an interrogation of known renal loci, genome-wide association (GWA), and IBC candidate-gene SNP association analyses in African Americans from the CARe Renal Consortium. In up to 8,110 participants, we performed meta-analyses of GWA and IBC array data for estimated glomerular filtration rate (eGFR), CKD (eGFR <60 mL/min/1.73 m2), urinary albumin-to-creatinine ratio (UACR), and microalbuminuria (UACR >30 mg/g) and interrogated the 250 kb flanking region around 24 SNPs previously identified in European Ancestry renal GWAS analyses. Findings were replicated in up to 4,358 African Americans. To assess function, individually identified genes were knocked down in zebrafish embryos by morpholino antisense oligonucleotides. Expression of kidney-specific genes was assessed by in situ hybridization, and glomerular filtration was evaluated by dextran clearance. Overall, 23 of 24 previously identified SNPs had direction-consistent associations with eGFR in African Americans, 2 of which achieved nominal significance (UMOD, PIP5K1B). Interrogation of the flanking regions uncovered 24 new index SNPs in African Americans, 12 of which were replicated (UMOD, ANXA9, GCKR, TFDP2, DAB2, VEGFA, ATXN2, GATM, SLC22A2, TMEM60, SLC6A13, and BCAS3). In addition, we identified 3 suggestive loci at DOK6 (p-value = 5.3×10−7) and FNDC1 (p-value = 3.0×10−7) for UACR, and KCNQ1 with eGFR (p = 3.6×10−6). Morpholino knockdown of kcnq1 in the zebrafish resulted in abnormal kidney development and filtration capacity. We identified several SNPs in association with eGFR in African Ancestry individuals, as well as 3 suggestive loci for UACR and eGFR. Functional genetic studies support a role for kcnq1 in glomerular development in zebrafish.
Chronic kidney disease (CKD) is an increasing global public health problem and disproportionately affects populations of African ancestry. Many studies have shown that genetic variants are associated with the development of CKD; however, similar studies are lacking in African ancestry populations. The CARe consortium consists of more than 8,000 individuals of African ancestry; genome-wide association analysis for renal-related phenotypes was conducted. In cross-ethnicity analyses, we found that 23 of 24 previously identified SNPs in European ancestry populations have the same effect direction in our samples of African ancestry. We also identified 3 suggestive genetic variants associated with measurement of kidney function. We then tested these genes in zebrafish knockdown models and demonstrated that kcnq1 is involved in kidney development in zebrafish. These results highlight the similarity of genetic variants across ethnicities and show that cross-species modeling in zebrafish is feasible for genes associated with chronic human disease.
Population structure occurs when a sample is composed of individuals with different ancestries and can result in excess type I error in genome-wide association studies. Genome-wide principal-component analysis (PCA) has become a popular method for identifying and adjusting for subtle population structure in association studies. Using the Genetic Analysis Workshop 16 (GAW16) NARAC data, we explore two unresolved issues concerning the use of genome-wide PCA to account for population structure in genetic association studies: the choice of single-nucleotide polymorphism (SNP) subset and the choice of adjustment model. We computed PCs for subsets of genome-wide SNPs with varying levels of LD. The first two PCs were similar for all subsets and the first three PCs were associated with case status for all subsets. When the PCs associated with case status were included as covariates in an association model, the reduction in genomic inflation factor was similar for all SNP sets. Several models have been proposed to account for structure using PCs, but it is not yet clear whether the different methods will result in substantively different results for association studies with individuals of European descent. We compared genome-wide association p-values and results for two positive-control SNPs previously associated with rheumatoid arthritis using four PC adjustment methods as well as no adjustment and genomic control. We found that in this sample, adjusting for the continuous PCs or adjusting for discrete clusters identified using the PCs adequately accounts for the case-control population structure, but that a recently proposed randomization test performs poorly.
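The genomic inflation factor referred to above is conventionally computed as the median of the observed 1-df chi-square association statistics divided by the median of the null chi-square distribution (about 0.455). The sketch below follows that convention using only the standard library; the function name and the synthetic p-value lists are hypothetical and are not taken from the GAW16 data.

```python
from statistics import NormalDist, median

_NORM = NormalDist()
# Median of a chi-square distribution with 1 df, i.e. the squared 75th
# percentile of the standard normal (about 0.4549).
CHI2_1DF_MEDIAN = _NORM.inv_cdf(0.75) ** 2

def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided p-values in (0, 1]."""
    # Convert each p-value back to a 1-df chi-square statistic (z squared).
    chi2 = [_NORM.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Synthetic example: evenly spaced p-values mimic a well-calibrated null.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
# Squaring shifts p-values toward zero, mimicking inflation from structure.
inflated_p = [p ** 2 for p in null_p]

lambda_null = genomic_inflation(null_p)
lambda_inflated = genomic_inflation(inflated_p)
print("lambda (null):", round(lambda_null, 3))
print("lambda (inflated):", round(lambda_inflated, 3))
```

Values near 1 indicate little inflation. Genomic control, one of the comparison methods named above, divides every test statistic by this factor when it exceeds 1.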
There are many ways to perform adjustment for population structure. It remains unclear what the optimal approach is and whether the optimal approach varies by the type of samples and substructure present. The simplest and most straightforward approach is to adjust for the continuous principal components (PCs) that capture ancestry. Through simulation, we explored the issue of which ancestry informative PCs should be adjusted for in an association model to control for the confounding nature of population structure while maintaining maximum power. A thorough examination of selecting PCs for adjustment in a case-control study across the possible structure scenarios that could occur in a genome-wide association study has not been previously reported.
We found that when the SNP and phenotype frequencies do not vary over the sub-populations, all methods of selection provided similar power and appropriate Type I error for association. When the SNP is not structured and the phenotype has large structure, then selection methods that do not select PCs for inclusion as covariates generally provide the most power. When there is a structured SNP and a non-structured phenotype, selection methods that include PCs in the model have greater power. When both the SNP and the phenotype are structured, all methods of selection have similar power.
Standard practice is to include a fixed number of PCs in genome-wide association studies. Based on our findings, we conclude that if power is not a concern, then selecting the same set of top PCs for adjustment for all SNPs in logistic regression is a strategy that achieves appropriate Type I error. However, standard practice is not optimal in all scenarios and to optimize power for structured SNPs in the presence of unstructured phenotypes, PCs that are associated with the tested SNP should be included in the logistic model.
Principal components analysis (PCA) has been successfully used to correct for population stratification in genome-wide association studies of common variants. However, rare variants also have a role in common disease etiology. Whether PCA successfully controls population stratification for rare variants has not been addressed. Thus we evaluate the effect of population stratification analysis on false-positive rates for common and rare variants at the single-nucleotide polymorphism (SNP) and gene level. We use the simulation data from Genetic Analysis Workshop 17 and compare false-positive rates with and without PCA at the SNP and gene level. We found that SNPs’ minor allele frequency (MAF) influenced the ability of PCA to effectively control false discovery. Specifically, PCA reduced false-positive rates more effectively in common SNPs (MAF > 0.05) than in rare SNPs (MAF < 0.01). Furthermore, at the gene level, although false-positive rates were reduced, power to detect true associations was also reduced using PCA. Taken together, these results suggest that sequence-level data should be interpreted with caution, because extremely rare SNPs may exhibit sporadic association that is not controlled using PCA.
Advances in genotyping technologies have contributed to a better understanding of human population genetic structure and improved the analysis of association studies. To analyze patterns of human genetic variation in Brazil, we used SNP data from 1129 individuals: 138 from the urban population of Sao Paulo, Brazil, and 991 from 11 populations of the HapMap Project. Principal components analysis was performed on the SNPs common to these populations to identify the composition and the number of SNPs needed to capture their genetic variation. Both admixture and local ancestry inference were performed in individuals of the Brazilian sample. Individuals from the Brazilian sample fell between Europeans, Mexicans, and Africans. Brazilians are suggested to have the highest internal genetic variation of the sampled populations. Our results indicate, as expected, that the individuals in the Brazilian sample descend from Amerindian, African, and/or European ancestors, but intermarriage between individuals of different ethnic origin had an important role in generating the broad genetic variation observed in the present-day population. The data support the notion that the Brazilian population, due to its high degree of admixture, can provide a valuable resource for strategies aiming at using admixture as a tool for mapping complex traits in humans.
genetic structure; Brazilian; admixture mapping; admixture
We investigated the ability of several principal components analysis (PCA)-based strategies to detect and control for population stratification using data from a multi-center study of epithelial ovarian cancer among women of European-American ethnicity. These include a correction based on an ancestry informative markers (AIMs) panel designed to capture European ancestral variation and corrections utilizing un-thinned genome-wide SNP data; case-control samples were drawn from four geographically distinct North-American sites. The AIMs-only and genome-wide first principal components (PC1) both corresponded to the previously described North or Northwest-Southeast axis of European variation. We found that the genome-wide PCA captured this primary dimension of variation more precisely and identified additional axes of genome-wide variation of relevance to epithelial ovarian cancer. Associations evident between the genome-wide PCs and study site corroborate North American immigration history and suggest that undiscovered dimensions of variation lie within Northern Europe. The structure captured by the genome-wide PCA was also found within control individuals and did not reflect the case-control variation present in the data. The genome-wide PCA highlighted three regions of local LD, corresponding to the lactase (LCT) gene on chromosome 2, the human leukocyte antigen system (HLA) on chromosome 6 and to a common inversion polymorphism on chromosome 8. These features did not compromise the efficacy of PCs from this analysis for ancestry control. This study concludes that although AIMs panels are a cost-effective way of capturing population structure, genome-wide data should preferably be used when available.
Recent studies in populations of European ancestry have shown that 30% to 50% of heritability for human complex traits such as height and body mass index, and common diseases such as schizophrenia and rheumatoid arthritis, can be captured by common SNPs and that the genetic variation attributed to each chromosome is in proportion to its length. Using genome-wide estimation and partitioning approaches, we analysed 49 human quantitative traits, many of which are relevant to human diseases, in 7,170 unrelated Korean individuals genotyped on 326,262 SNPs. For 43 of the 49 traits, we estimated a nominally significant (P<0.05) proportion of variance explained by all SNPs on the Affymetrix 5.0 genotyping array. On average across 47 of the 49 traits for which the estimate of this proportion is non-zero, common SNPs explain approximately one-third (range of 7.8% to 76.8%) of narrow sense heritability.
The estimate of this proportion is highly correlated with the proportion of SNPs with association P<0.031 (r² = 0.92). Longer genomic segments tend to explain more phenotypic variation, with a correlation of 0.78 between the estimate of variance explained by individual chromosomes and their physical length, and 1% of the genome explains approximately 1% of the genetic variance. Despite the fact that there are a few SNPs with large effects for some traits, these results suggest that polygenicity is ubiquitous for most human complex traits and that a substantial proportion of the “missing heritability” is captured by common SNPs.
The “missing heritability” problem has been intensely debated for the last few years. Possible explanations include the existence of many genetic variants each with a small effect, rare variants with large effects, and heritability being over-estimated. Previous studies using whole-genome estimation have demonstrated that for human complex traits such as height, body mass index, and intelligence, a large portion of the heritability can be captured by all the common SNPs on the current genotyping arrays. These studies, however, were all concentrated only on a few traits. In this study, we analysed 49 quantitative traits in a sample of ∼7,000 unrelated Korean individuals. We found that, on average over all the traits, common SNPs on the Affymetrix 5.0 genotyping array explain approximately a third of the heritability, that genetic variants are widely distributed across the whole genome with longer chromosomes explaining more phenotypic variation, and that approximately any 1% of the genome explains 1% of the heritability. Despite examples where a few variants explain a substantial amount of variation, all these results are consistent with polygenicity being ubiquitous for most complex traits.
Pericardial fat is a localized fat depot associated with coronary artery calcium and myocardial infarction. We hypothesized that genetic loci would be associated with pericardial fat independent of other body fat depots. Pericardial fat was quantified in 5,487 individuals of European ancestry from the Framingham Heart Study (FHS) and the Multi-Ethnic Study of Atherosclerosis (MESA). Genotyping was performed using standard arrays and imputed to ∼2.5 million HapMap SNPs. Each study performed a genome-wide association analysis of pericardial fat adjusted for age, sex, weight, and height. A weighted z-score meta-analysis was conducted, and validation was obtained in an additional 3,602 multi-ethnic individuals from the MESA study. We identified a genome-wide significant signal in our primary meta-analysis at rs10198628 near TRIB2 (MAF 0.49, p = 2.7×10−8). This SNP was not associated with visceral fat (p = 0.17) or body mass index (p = 0.38), although we observed direction-consistent, nominal significance with visceral fat adjusted for BMI (p = 0.01) in the Framingham Heart Study. Our findings were robust among African ancestry (n = 1,442, p = 0.001), Hispanic (n = 1,399, p = 0.004), and Chinese (n = 761, p = 0.007) participants from the MESA study, with a combined p-value of 5.4×10−14. We observed TRIB2 gene expression in the pericardial fat of mice. rs10198628 near TRIB2 is associated with pericardial fat but not measures of generalized or visceral adiposity, reinforcing the concept that there are unique genetic underpinnings to ectopic fat distribution.
Pericardial fat is a localized fat depot associated with coronary artery calcium and myocardial infarction. To test whether genetic loci are associated with pericardial fat independent of other body fat depots, we measured pericardial fat in 5,487 individuals of European ancestry. After performing an unbiased screen using genome-wide association, we identified a genome-wide significant signal in our primary meta-analysis at rs10198628 near TRIB2 (MAF 0.49, p = 2.7×10−8). This SNP was not associated with visceral fat (p = 0.17) or body mass index (p = 0.38). Our findings were robust among multi-ethnic participants from the MESA study, with a combined p-value of 5.4×10−14. We observed TRIB2 gene expression in the pericardial fat of mice. rs10198628 near TRIB2 is associated with pericardial fat but not measures of generalized or visceral adiposity, reinforcing the concept that there are unique genetic underpinnings to ectopic fat distribution.
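The weighted z-score meta-analysis mentioned above combines per-study z-scores into a single statistic. The abstract does not state the weights; the sketch below assumes the common convention of weighting each study by the square root of its sample size, and the function name and example inputs are hypothetical rather than the study's data.

```python
import math
from statistics import NormalDist

def weighted_z_meta(z_scores, sample_sizes):
    """Combine signed per-study z-scores with sqrt(sample size) weights.

    Returns the combined z-score and its two-sided p-value.
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z = numerator / denominator
    p = 2.0 * NormalDist().cdf(-abs(z))
    return z, p

# Two equally sized hypothetical studies with the same effect direction.
z_combined, p_combined = weighted_z_meta([2.0, 2.0], [1000, 1000])
# Opposite effect directions of equal strength cancel out.
z_cancel, p_cancel = weighted_z_meta([2.0, -2.0], [500, 500])

print("concordant studies: z = %.3f, p = %.4f" % (z_combined, p_combined))
print("discordant studies: z = %.3f, p = %.4f" % (z_cancel, p_cancel))
```

Because the z-scores are signed, the effect allele must be aligned across studies before combining; this is also why direction-consistent results in the validation groups strengthen the combined signal.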
Peripheral arterial disease (PAD) is associated with significant morbidity and mortality, and has a higher prevalence in African Americans than in Caucasians. Ankle arm index (AAI) is the ratio of systolic blood pressure in the leg to that in the arm, and, when low, is a marker of PAD. We used an admixture mapping approach to search for genetic loci associated with low AAI. Using data from 1040 African-American participants in the observational, population-based Health, Aging, and Body Composition Study who were genotyped at 1322 single nucleotide polymorphisms (SNPs) that are informative for African versus European ancestry and span the entire genome, we estimated genetic ancestry in each chromosomal region and then tested the association between AAI and genetic ancestry at each locus. We found a region of chromosome 11, peaking between 80 and 82 Mb, that was associated with low AAI (p<0.001 for rs12289502 and rs9665943, both within this region). To test the reproducibility of this association, 753 African-American participants in the observational, population-based Cardiovascular Health Study were genotyped at rs9665943, and the association was again statistically significant (odds ratio (OR) for homozygous African genotype 1.59 (95% confidence interval (CI) 1.12–2.27)). Another candidate SNP (rs1042602) in the same genomic region was tested in both populations and was also significantly associated with low AAI in both (OR for homozygous African genotype 1.89 (95% CI 1.29–2.76)). This study identifies a novel region of chromosome 11 that may harbor a candidate gene associated with PAD in African Americans.
peripheral vascular disease; genetics; African-American
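The odds ratios and 95% confidence intervals quoted above are the usual way of summarizing a genotype-disease association. As a rough illustration only, the sketch below computes an odds ratio with a Wald (Woolf) interval from a 2x2 table. The function name `odds_ratio_ci` and the counts in the usage lines are hypothetical; the study's own estimates may well come from a regression model with covariates rather than from a raw table.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a: exposed cases,   b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    z=1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the function.
    estimate, lower, upper = odds_ratio_ci(20, 10, 10, 20)
    print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1, as both intervals in the abstract do, corresponds to significance at roughly the 5% level.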
The major histocompatibility complex (MHC) on chromosome 6p21 is a key contributor to the genetic basis of systemic lupus erythematosus (SLE). Although SLE affects African Americans disproportionately compared to European Americans, there has been no comprehensive analysis of the MHC region in relationship to SLE in African Americans. We screened the MHC region for 1,536 single nucleotide polymorphisms (SNPs) and the deletion of the C4A gene in an SLE case-control study (380 cases, 765 age-matched controls) nested within the prospective Black Women’s Health Study. We also genotyped 1,509 ancestry informative markers throughout the genome to estimate European ancestry in order to control for population stratification due to population admixture. The SNP most strongly associated with SLE was rs9271366 (odds ratio, OR = 1.70, p = 5.6×10−5) near the HLA-DRB1 gene. Conditional haplotype analysis revealed three other SNPs, rs204890 (OR = 1.86, p = 1.2×10−4), rs2071349 (OR = 1.53, p = 1.0×10−3), and rs2844580 (OR = 1.43, p = 1.3×10−3), to be associated with SLE independent of the rs9271366 SNP. In univariate analysis, the OR for the C4A deletion was 1.38, p = 0.075, but after simultaneous adjustment for the other four SNPs the odds ratio was 1.01, p = 0.98. A genotype score combining the four newly identified SNPs showed an additive risk according to the number of high-risk alleles (OR = 1.67 per high-risk allele, p < 0.0001). Our strongest signal, the rs9271366 SNP, was also associated with higher risk of SLE in a previous Chinese genome-wide association study (GWAS). In addition, two SNPs found in a GWAS of European ancestry women were confirmed in our study, indicating that African Americans share some genetic risk factors for SLE with European and Chinese subjects. In summary, we found four independent signals in the MHC region associated with risk of SLE in African American women.
systemic lupus erythematosus; African Americans; major histocompatibility complex; single nucleotide polymorphisms
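A per-allele odds ratio such as the 1.67 reported for the genotype score can be turned into an odds ratio for any number of high-risk alleles if the score is assumed to act additively on the log-odds scale. The sketch below makes that assumption explicit. The function name `score_odds_ratio` is hypothetical, and the range of 0 to 8 alleles assumes the score simply counts high-risk alleles across the four SNPs.

```python
import math

OR_PER_ALLELE = 1.67  # per high-risk allele, as reported in the abstract


def score_odds_ratio(n_risk_alleles, or_per_allele=OR_PER_ALLELE):
    """Odds ratio relative to zero high-risk alleles.

    Assumes a log-additive model: each extra allele adds log(or_per_allele)
    to the log-odds of disease.
    """
    if n_risk_alleles < 0:
        raise ValueError("allele count must be non-negative")
    return math.exp(n_risk_alleles * math.log(or_per_allele))


if __name__ == "__main__":
    # Four biallelic SNPs give at most eight high-risk alleles (assumption).
    for k in range(9):
        print(k, round(score_odds_ratio(k), 2))
```

Under this model the odds ratio compounds multiplicatively, so two high-risk alleles correspond to 1.67 squared, about 2.79, relative to none.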
In this study, we assessed association of genome-wide association studies (GWAS) “hits” by race with adjustment for potential population stratification (PS) in two large, diverse study populations: the Carolina Breast Cancer Study (CBCS; N total = 3693 individuals) and the University of Pennsylvania Study of Clinical Outcomes, Risk, and Ethnicity (SCORE; N total = 1135 individuals). In both study populations, 136 ancestry informative markers and GWAS “hits” (CBCS: FGFR2, 8q24; SCORE: JAZF1, MSMB, 8q24) were genotyped. Principal component analysis was used to assess ancestral differences by race. Multivariable unconditional logistic regression was used to assess differences in cancer risk with and without adjustment for the first ancestral principal component (PC1) and for an interaction effect between PC1 and the GWAS “hit” (SNP) of interest. PC1 explained 53.7% of the variance for CBCS and 49.5% of the variance for SCORE. European Americans and African Americans were similar in their ancestral structure between CBCS and SCORE, and cases and controls were well matched by ancestry. In the CBCS European Americans, 9/11 SNPs were significant after PC1 adjustment, but after adjustment for the PC1 by SNP interaction effect, only one SNP remained significant (rs1219648 in FGFR2); for CBCS African Americans, 6/11 SNPs were significant after PC1 adjustment, and after adjustment for the PC1 by SNP interaction effect, all six SNPs remained significant and one additional SNP became significant. In the SCORE European Americans, 0/9 SNPs were significant after PC1 adjustment and no changes were seen after additional adjustment for the PC1 by SNP interaction effect; for SCORE African Americans, 2/9 SNPs were significant after PC1 adjustment, and after adjustment for the PC1 by SNP interaction effect, only one SNP remained significant (rs16901979 at 8q24). We show that genetic associations by race are modified by interaction between individual SNPs and PS.
population stratification; ancestry; prostate cancer; breast cancer; GWAS “hits”
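The three nested models described above (SNP alone, SNP plus PC1, SNP plus PC1 plus the PC1 by SNP interaction) differ only in the columns of the design matrix. The sketch below shows one way to build those columns and fit an unconditional logistic regression by Newton-Raphson using only the standard library. All names (`design_row`, `fit_logistic`, `solve`) and the simulated data are hypothetical, and no standard errors or p-values are computed; a real analysis would normally use a statistics package.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_logistic(rows, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    rows: design-matrix rows that already include the intercept column.
    y:    0/1 outcomes. Returns the coefficient vector.
    """
    d = len(rows[0])
    beta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        hess = [[0.0] * d for _ in range(d)]
        for x, yi in zip(rows, y):
            eta = sum(bj * xj for bj, xj in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for i in range(d):
                grad[i] += (yi - mu) * x[i]
                for j in range(d):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
    return beta


def design_row(snp, pc1, adjust_pc1=True, interaction=False):
    """Columns: intercept, SNP allele count, optional PC1, optional SNP x PC1."""
    row = [1.0, float(snp)]
    if adjust_pc1:
        row.append(pc1)
    if interaction:
        row.append(snp * pc1)
    return row


if __name__ == "__main__":
    # Simulated data (assumption): disease risk depends on ancestry (PC1)
    # only, while the SNP allele frequency also tracks PC1.
    random.seed(0)
    snps, pcs, ys = [], [], []
    for _ in range(2000):
        pc1 = random.gauss(0.0, 1.0)
        freq = 1.0 / (1.0 + math.exp(-pc1))
        snps.append(sum(random.random() < freq for _ in range(2)))
        pcs.append(pc1)
        risk = 1.0 / (1.0 + math.exp(-1.5 * pc1))
        ys.append(1 if random.random() < risk else 0)
    for label, adj, inter in [("SNP only", False, False),
                              ("SNP + PC1", True, False),
                              ("SNP + PC1 + SNP x PC1", True, True)]:
        rows = [design_row(s, p, adj, inter) for s, p in zip(snps, pcs)]
        print(label, [round(b, 3) for b in fit_logistic(rows, ys)])
```

In the simulated example the SNP has no causal effect, so its coefficient should shrink toward zero once PC1 enters the model. The interaction column lets the SNP effect vary with ancestry, which is the kind of effect modification the abstract reports.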
A common allele at the TAGAP gene locus demonstrates a suggestive, but not conclusive, association with risk of rheumatoid arthritis (RA). To fine map the locus, we conducted comprehensive imputation of CEU HapMap single-nucleotide polymorphisms (SNPs) in a genome-wide association study (GWAS) of 5500 RA cases and 22 621 controls (all of European ancestry). After controlling for population stratification with principal components analysis, the strongest signal of association was to an imputed SNP, rs212389 (P=3.9 × 10−8, odds ratio=0.87). This SNP remained highly significant upon conditioning on the previous RA risk variant (rs394581, P=2.2 × 10−5) or on a SNP previously associated with celiac disease and type 1 diabetes (rs1738074, P=1.7 × 10−4). Our study has refined the TAGAP signal of association to a single haplotype in RA, and in doing so provides conclusive statistical evidence that the TAGAP locus is associated with RA risk. Our study also underscores the utility of comprehensive imputation in large GWAS data sets to fine map disease risk alleles.
TAGAP; genetics; rheumatoid arthritis
Because previous preclinical and clinical studies have implicated the endogenous opioid system in major depression and in the neurochemical action of antidepressants, the authors examined how DNA variation in the μ-opioid receptor gene may influence population variation in response to citalopram treatment.
A total of 1,953 individuals from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study were treated with citalopram and genotyped for 53 single nucleotide polymorphisms (SNPs) in a 100-kb region of the OPRM1 gene. The sample consisted of Non-Hispanic Caucasians, Hispanic Caucasians, and African Americans. Population stratification was corrected using 119 ancestry informative markers and principal components analysis. Markers were tested for association with phenotypes for general and specific citalopram response as well as remission.
Association between one SNP and specific citalopram response was observed. After Bonferroni correction, the strongest finding was the association between the rs540825 SNP and specific response. The rs540825 polymorphism is a nonsynonymous SNP in the final exon of the μ-opioid receptor-1X isoform of the OPRM1 gene, resulting in a histidine to glutamine change in the intra-cellular domain of the receptor. When Hispanic and Non-Hispanic Caucasians were analyzed separately, similar results in the population-corrected analyses were detected.
These results suggest that rates of response to antidepressants and consequent remission from major depressive disorder are influenced by variation in the μ-opioid receptor gene as a result of either an effect on placebo response or true pharmacologic response.
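The Bonferroni correction mentioned above divides the significance level by the number of tests or, equivalently, multiplies each p-value by that number. The sketch below applies it to the 53 genotyped SNPs. The significance level of 0.05 and the function names are assumptions, and the study may have counted tests differently (for example, across several phenotypes).

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # 53 SNPs as in the abstract; alpha = 0.05 is an assumption.
    print(bonferroni_threshold(0.05, 53))
```

With 53 tests and a family-wise level of 0.05, a single SNP would need a nominal p-value below roughly 0.00094 to remain significant.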
Genetic association studies can be used to identify factors that may contribute to disparities in disease evident across different racial and ethnic populations. However, such studies may not account for potential confounding if study populations are genetically heterogeneous. Racial and ethnic classifications have been used as proxies for genetic relatedness. We investigated genetic admixture and developed a questionnaire to explore variables used in constructing racial identity in two cohorts – 50 African Americans (AAs) and 40 Nigerians. Genetic ancestry was determined by genotyping 107 ancestry informative markers. Ancestry estimates calculated with maximum likelihood estimation were compared with population stratification detected with principal component analysis. Ancestry was approximately 95% west African, 4% European, and 1% Native American in the Nigerian cohort and 83% west African, 15% European, and 2% Native American in the AA cohort. Therefore, self-identification as AA agreed well with inferred west African ancestry. However, the cohorts differed significantly in mean percentage west African and European ancestries (P < 0.0001) and in the variance for individual ancestry (P ≤ 0.01). Among AAs, no set of questionnaire items effectively estimated degree of west African ancestry, and self-report of a high degree of African ancestry in a three-generation family tree did not accurately predict degree of African ancestry. Our findings suggest that self-reported race and ancestry can predict ancestral clusters, but do not reveal the extent of admixture. Genetic classifications of ancestry may provide a more objective and accurate method of defining homogeneous populations for the investigation of specific population-disease associations.
African American; race; ancestry; genetic admixture
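Maximum likelihood estimation of individual ancestry from ancestry informative markers can be illustrated with a deliberately simplified model. The sketch below assumes only two ancestral populations (the abstract uses three), known ancestral allele frequencies, independent markers, and a plain grid search over the ancestry proportion. The function names and the toy frequencies are hypothetical and are not taken from the study.

```python
import math


def log_likelihood(q, genotypes, freq_a, freq_b):
    """Log-likelihood of ancestry proportion q from population A.

    genotypes: allele counts (0, 1 or 2) at each marker.
    freq_a, freq_b: allele frequencies in ancestral populations A and B.
    The binomial coefficient is omitted because it does not depend on q.
    """
    ll = 0.0
    for g, pa, pb in zip(genotypes, freq_a, freq_b):
        f = q * pa + (1.0 - q) * pb          # expected allele frequency
        f = min(max(f, 1e-9), 1.0 - 1e-9)    # guard against log(0)
        ll += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return ll


def estimate_ancestry(genotypes, freq_a, freq_b, steps=1000):
    """Grid-search maximum likelihood estimate of q on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotypes, freq_a, freq_b))


if __name__ == "__main__":
    # Toy markers with a large frequency difference between populations.
    freq_a = [0.9] * 20
    freq_b = [0.1] * 20
    genotypes = [2] * 12 + [1] * 6 + [0] * 2
    print(estimate_ancestry(genotypes, freq_a, freq_b))
```

Markers with large frequency differences between ancestral populations make the likelihood sharply peaked, which is why a panel of around a hundred well-chosen markers can give usable individual estimates.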
It is well-known that population substructure may lead to confounding in case-control association studies. Here, we examined genetic structure in a large racially and ethnically diverse sample consisting of 5 ethnic groups of the Multiethnic Cohort study (African Americans, Japanese Americans, Latinos, European Americans and Native Hawaiians) using 2,509 SNPs distributed across the genome. Principal component analysis on 6,213 study participants, 18 Native Americans and 11 HapMap III populations revealed 4 important principal components (PCs): the first two separated Asians, Europeans and Africans, and the third and fourth corresponded to Native American and Native Hawaiian (Polynesian) ancestry, respectively. Individual ethnic composition derived from self-reported parental information matched well to genetic ancestry for Japanese and European Americans. STRUCTURE-estimated individual ancestral proportions for African Americans and Latinos are consistent with previous reports. We quantified the East Asian (mean 27%), European (mean 27%) and Polynesian (mean 46%) ancestral proportions for the first time, to our knowledge, for Native Hawaiians. Simulations based on realistic settings of case-control studies nested in the Multiethnic Cohort found that the effect of population stratification was modest and readily corrected by adjusting for race/ethnicity or by adjusting for top PCs derived from all SNPs or from ancestry informative markers; the power of these approaches was similar when averaged across causal variants simulated based on allele frequencies of the 2,509 genotyped markers. The bias may be large in case-only analysis of gene by gene interactions but it can be corrected by top PCs derived from all SNPs.
AIMs; African American; Native Hawaiian; Latino; admixture; principal component analysis
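Principal component analysis of a genotype matrix, as used above to separate continental ancestry groups, can be sketched in a few lines. The example below computes only the first principal component by power iteration on column-centered allele counts. It is a simplification under stated assumptions: published pipelines typically also scale each SNP by its allele frequency and extract several components, and the function name and toy genotypes here are hypothetical.

```python
import math


def first_pc_scores(genotypes, iters=200):
    """Scores of each individual on the first principal component.

    genotypes: list of rows (individuals), each a list of allele counts.
    Uses power iteration on X X^T, where X is the column-centered matrix.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Deterministic, non-constant start vector (a constant vector would be
    # annihilated by the centering).
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iters):
        u = [sum(x[i][j] * v[i] for i in range(n)) for j in range(m)]  # X^T v
        w = [sum(x[i][j] * u[j] for j in range(m)) for i in range(n)]  # X u
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            raise ValueError("genotype matrix has no variation")
        v = [t / norm for t in w]
    return v


if __name__ == "__main__":
    # Toy data: two groups with very different allele counts.
    group_1 = [[2, 2, 2, 0], [2, 2, 1, 0], [2, 1, 2, 0]]
    group_2 = [[0, 0, 0, 2], [0, 1, 0, 2], [0, 0, 1, 2]]
    for score in first_pc_scores(group_1 + group_2):
        print(round(score, 3))
```

In the toy data the two groups receive scores of opposite sign, mirroring how the leading components separate ancestry groups. The sign of a component is arbitrary, so only the separation is meaningful.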
An individual’s genotypes at a group of Single Nucleotide
Polymorphisms (SNPs) can be used to predict that individual’s
ethnicity, or ancestry. In medical studies, knowledge of a subject’s
ancestry can minimize possible confounding, and in forensic applications,
such knowledge can help direct investigations. Our goal is to select a small
subset of SNPs, from the millions already identified in the human genome,
that can predict ancestry with a minimal error rate.
The general form for this variable selection procedure is to estimate
the expected error rates for sets of SNPs using a training dataset and
consider those sets with the lowest error rates given their size. The
quality of the estimate for the error rate determines the quality of the
resulting SNPs. As the apparent error rate performs poorly
when either the number of SNPs or the number of populations is large, we
propose a new estimate, the Improved Bayesian Estimate.
We demonstrate that selection procedures based on this estimate
produce small sets of SNPs that can accurately predict ancestry. We also
provide a list of the 100 optimal SNPs for identifying ancestry.
R functions are available at http://bioinformatics.med.yale.edu/group/josh/index.html.
Ancestry; Ethnicity; SNPs; Error Rate; Allele Frequency; Genotype; AIM; Bootstrap; FOSSIL
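The apparent error rate discussed above is the misclassification rate of a classifier evaluated on the same data used to train it. The sketch below computes it for a simple naive Bayes ancestry classifier on a chosen subset of SNPs. It assumes independent SNPs, Hardy-Weinberg proportions, equal priors, and add-one smoothing of allele frequencies. It does not implement the Improved Bayesian Estimate, whose details are not given in the abstract, and all names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def allele_frequencies(samples, labels, snps):
    """Smoothed allele frequency per population at each selected SNP."""
    counts = defaultdict(lambda: [0.0] * len(snps))
    totals = defaultdict(int)
    for g, y in zip(samples, labels):
        totals[y] += 1
        for k, j in enumerate(snps):
            counts[y][k] += g[j]
    # Add-one smoothing keeps every frequency strictly between 0 and 1.
    return {y: [(counts[y][k] + 1.0) / (2.0 * totals[y] + 2.0)
                for k in range(len(snps))]
            for y in totals}


def classify(g, freqs, snps):
    """Population with the highest genotype log-likelihood (equal priors)."""
    best, best_ll = None, -math.inf
    for y, f in freqs.items():
        ll = sum(g[j] * math.log(f[k]) + (2 - g[j]) * math.log(1.0 - f[k])
                 for k, j in enumerate(snps))
        if ll > best_ll:
            best, best_ll = y, ll
    return best


def apparent_error_rate(samples, labels, snps):
    """Fraction of training samples misclassified by the fitted classifier."""
    freqs = allele_frequencies(samples, labels, snps)
    wrong = sum(classify(g, freqs, snps) != y for g, y in zip(samples, labels))
    return wrong / len(samples)


if __name__ == "__main__":
    samples = [[2, 1], [2, 0], [2, 2], [0, 1], [0, 2], [0, 0]]
    labels = ["A", "A", "A", "B", "B", "B"]
    print(apparent_error_rate(samples, labels, [0]))  # informative SNP
    print(apparent_error_rate(samples, labels, [1]))  # uninformative SNP
```

Because the same samples are used to fit and to evaluate the classifier, this estimate tends to be optimistic. That optimism grows as more SNPs or populations are considered, which is the weakness the abstract attributes to it.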
The single nucleotide polymorphism (SNP) rs11761231 on chromosome 7q has been reported as a sexually dimorphic marker for rheumatoid arthritis susceptibility in a British population. We sought to replicate this finding and better characterize susceptibility alleles in the region in a North American population.
DNA from two North American collections of RA patients and controls (1605 cases and 2640 controls) was genotyped for rs11761231 and 16 additional chromosome 7q tag SNPs using Sequenom iPlex assays. Association tests were performed for each collection and also separately contrasting male cases versus male controls and female cases versus female controls. Principal components analysis (EIGENSTRAT) was used to determine association with RA before and after adjusting for population stratification in the subset of the samples (772 cases and 1213 controls) with whole genome SNP data.
We failed to replicate association of the 7q region with rheumatoid arthritis. Initially, rs11761231 showed evidence for association with RA in the NARAC collection (p=0.0076) and rs11765576 showed association with RA in both the NARAC (p = 0.019) and RA replication (p = 0.0013) collections. These markers also exhibited sexual differentiation. However, in the whole genome subset, neither SNP showed significant association with RA after correction for population stratification.
While two SNPs on chromosome 7q appeared to be associated with RA in a North American cohort, the significance of this finding did not withstand correction for population substructure. Our results emphasize the need to carefully account for population structure to avoid false positive disease associations.