Genome-wide association studies in cohorts of European descent have identified novel genomic regions as associated with lipids, but their relevance in African Americans remains unclear.
We genotyped 8 index SNPs and 488 tagging SNPs across 8 novel lipid loci in the Jackson Heart Study, a community-based cohort of 4605 African Americans. For each trait, we calculated residuals adjusted for age, sex, and global ancestry and performed multivariable linear regression to detect genotype-phenotype association with adjustment for local ancestry. To explore admixture effects, we conducted stratified analyses in individuals with a high probability of 2 African ancestral alleles or at least 1 European allele at each locus. We confirmed 2 index SNPs as associated with lipid traits in African Americans, with suggestive association for 3 more. However, the effect sizes for 4 of the 5 associated SNPs were larger in the European local ancestry subgroup compared to the African local ancestry subgroup, suggesting that the replication is driven by European ancestry segments. Through fine-mapping, we discovered 3 new SNPs with significant associations, two with consistent effect on triglyceride levels across ancestral groups: rs636523 near DOCK7/ANGPTL3 and rs780093 in GCKR. African LD patterns did not assist in narrowing association signals.
We confirm that 5 genetic regions associated with lipid traits in European-derived populations are relevant in African Americans. To further evaluate these loci, fine-mapping in larger African American cohorts and/or resequencing will be required.
Two principal goals of genome-wide association (GWA) studies have been the identification of predictive genetic markers of human disease1 and the unbiased discovery of novel mechanisms of disease pathogenesis. To date, the majority of GWA studies have been conducted in populations of European ancestry. Although this population homogeneity has facilitated statistical analyses, it remains unclear whether future predictive algorithms or therapies based upon recently discovered variants will be relevant in other racial groups, such as African Americans.
African Americans represent an admixed population, carrying segments of DNA on their chromosomes that are derived from both European American (~20% of chromosomal regions) and West African (~80%) ancestral populations.2 These ancestral populations were subjected to differing demographic history, resulting in different allele frequencies and linkage disequilibrium patterns in their genomes.2 Thus, it is possible that specific genetic variants associated with disease in European populations may not be associated in African American individuals. In addition to a potential clinical utility for predicting and treating disease in worldwide populations, it has, in fact, been proposed that carrying out genetic association analysis in different ethnic populations with different patterns of linkage disequilibrium will be useful to narrow genetic loci and potentially identify causal variants.3
Using a GWA approach, we and others identified novel loci as associated with LDL-cholesterol (LDL), HDL-cholesterol (HDL), and/or triglycerides (TG) in populations of European descent.4, 5 We recently confirmed the association of 1 newly discovered locus (rs646776 in 1p13 near CELSR2/PSRC1/SORT1) with LDL in the Third National Health and Nutrition Examination Survey (NHANES III), a multiethnic cohort consisting of non-Hispanic black, Mexican American, and non-Hispanic white individuals.6 However, there was variable replication among the three ethnicities for the remaining loci, which may have been due to limited power to detect associations, differences in linkage disequilibrium patterns, or fundamental differences in the genetic architecture of the quantitative traits. Furthermore, statistical analysis of traits in admixed populations is more complex due to population stratification, which can lead to both false negative and false positive associations. We have recently implemented a method that utilizes genotypes from a dense panel of markers differentiated in frequency between European and African populations to allow accurate measurement of ancestry across the genome and correct for issues of stratification.2
In the current study, we directly addressed the following issues governing the relevance of specific genetic variants and, more broadly, genetic loci discovered in European-derived populations to African Americans. First, we sought to validate eight recently reported “index” polymorphisms for LDL, HDL, and TG in a large community-based cohort of African American individuals. This analysis was done with stringent control for local ancestry to account for population stratification. Furthermore, we densely tagged the eight genomic regions in linkage disequilibrium with the index polymorphisms to determine whether these loci were relevant to variation in plasma lipids in African Americans. Finally, we used local-ancestry based analysis to explore whether fine-mapping in an African American cohort would serve the additional goal of narrowing the genetic loci of interest and potentially identifying causal variants.2
The Jackson Heart Study (JHS) is a community-based, observational study developed as an extension of the Jackson, Mississippi cohort of the Atherosclerosis Risk in Communities (ARIC) study. The overall cohort consists of 5,301 individuals and was designed to identify risk factors for common cardiovascular diseases in African Americans. Of the 4,605 participants whose consent allowed their inclusion in the current study, 1,448 are part of a family study.
Fasting LDL, HDL, and TG were measured in each participant using previously described methods.2
Because the index SNPs were discovered via GWA in cohorts of European ancestry, we first identified the interval in which common variants were in LD with the index SNPs in the HapMap population of northern and western European ancestry (CEU) with an r2 threshold of ≥ 0.25. Given the power of the original studies, this interval is likely to contain the causal SNP(s) of interest. Using Tagger7, we selected tag SNPs along this specified region in the Yoruban West African HapMap population (YRI) with an r2 threshold of 0.80 and minor allele frequency (MAF) ≥ 1%. We then forced these SNPs into Tagger to tag the same region in CEU with an r2 threshold of 0.80 and MAF ≥ 1%.2 In total, 488 SNPs were genotyped (1q42 near GALNT2 = 13 SNPs, 19p13 near CILP2/PBX4 = 145 SNPs, 7q11 near TBL2/MLXIPL = 33 SNPs, 8q24 near TRIB1 = 53 SNPs, 1p13 near CELSR2/PSRC1/SORT1 = 57 SNPs, 1p31 near ANGPTL3 = 30 SNPs, 12q24 near KCTD10 = 80 SNPs, 2p23 near GCKR = 77 SNPs).
Genotyping was performed on the Sequenom platform, which utilizes matrix-assisted laser-desorption ionization time-of-flight mass spectroscopy as described previously.1 SNPs with a genotype call rate < 90% (n=90) and individuals with genotyping success rates < 75% (n=90) were excluded from analysis. Hardy-Weinberg equilibrium was not used to judge genotyping quality given the established deviation of SNP frequencies from equilibrium in admixed populations.8
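The two exclusion criteria above can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the data layout (a dictionary of per-sample call lists, with None marking a failed call) and the function names are assumptions made for the example. The sketch also assumes that both call rates are computed on the unfiltered data, since the text does not state the order in which the two filters were applied.

```python
def call_rate(calls):
    """Fraction of non-missing genotype calls (None marks a failed call)."""
    return sum(c is not None for c in calls) / len(calls)


def qc_filter(genotypes, snp_min=0.90, sample_min=0.75):
    """Apply SNP-level and sample-level call-rate thresholds.

    genotypes: dict mapping sample id -> list of calls, one per SNP,
               with SNPs in the same order for every sample (assumed layout).
    Returns (kept_sample_ids, kept_snp_indices).
    """
    samples = list(genotypes)
    n_snps = len(next(iter(genotypes.values())))
    # SNP call rate: fraction of samples with a successful call at that SNP.
    kept_snps = [
        j for j in range(n_snps)
        if call_rate([genotypes[s][j] for s in samples]) >= snp_min
    ]
    # Sample success rate: fraction of SNPs successfully called in that sample.
    kept_samples = [s for s in samples if call_rate(genotypes[s]) >= sample_min]
    return kept_samples, kept_snps


# Toy data: 4 hypothetical samples genotyped at 4 hypothetical SNPs.
toy = {
    "a": [0, 1, 2, 0],
    "b": [1, 1, None, 0],
    "c": [2, None, None, 1],
    "d": [0, 0, 1, 1],
}
kept_samples, kept_snps = qc_filter(toy)
```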
ANCESTRYMAP software9 was used for estimation of global and local ancestry in JHS as previously described.2 Briefly, the Hidden Markov Model underlying this program uses genotypes from a panel of densely spaced markers differentiated in frequency between African and European populations for ancestry inference. Two overlapping panels of ancestry informative markers (AIMs) were used for genotyping the 4,464 JHS individuals: 976 were genotyped on an older “Phase 2” Panel of 1,536 markers, and 3,488 were genotyped on an updated “Phase 3” Panel of 1,536 markers. These various Phases represent improvements in the admixture mapping panels in terms of information content, genome coverage, and SNP genotyping efficiency.
For subjects on lipid lowering medication (n=560), an underlying untreated LDL value was estimated using imputation.10 HDL and TG were log transformed. Residuals for each lipid trait were created using SAS 9.0 (Cary, NC) and were adjusted for age, gender, and global ancestry.
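The residual step can be sketched in Python as follows. The study used SAS; this standalone ordinary least squares routine is only a stand-in to show what "adjusted residuals" means here. The covariate coding, the use of the natural logarithm, and the toy values are all assumptions made for illustration.

```python
import math


def ols_residuals(y, covariates):
    """Residuals from ordinary least squares of y on covariates plus an intercept.

    y: list of trait values; covariates: one row of covariate values per individual.
    """
    X = [[1.0] + [float(v) for v in row] for row in covariates]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k+1) matrix.
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]


# Toy covariate rows: (age in years, sex coded 0/1, global European ancestry fraction).
rows = [(30, 0, 0.10), (45, 1, 0.20), (60, 0, 0.30),
        (50, 1, 0.15), (70, 0, 0.25), (35, 1, 0.05)]
tg = [95.0, 140.0, 120.0, 180.0, 110.0, 160.0]  # hypothetical TG values, mg/dL

# TG (like HDL) is log transformed before adjustment; natural log is assumed here.
log_tg = [math.log(v) for v in tg]
tg_residuals = ols_residuals(log_tg, rows)
```

The resulting residuals, rather than the raw trait values, would then serve as the phenotype in the association tests.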
All association analyses were performed using MERLIN to account for family structure among related individuals in JHS and for imputation of missing genotypes based on inheritance patterns.11 The multivariable adjusted residual was used as the phenotype to test phenotype-genotype association. We utilized two strategies to account for confounding by local ancestry. First, linear regression models were further adjusted for a continuous local ancestry variable derived from output of the ANCESTRYMAP program9, which takes as input AIM genotypes and uses a Hidden Markov Model to estimate local ancestry for individuals across the genome. In addition, we performed stratified genotype-phenotype association analyses in individuals with >95% probability of 2 African ancestral chromosomes (JHS_afr) and in individuals with >95% probability of at least 1 European ancestral allele (JHS_eur) at each locus of interest,2 with a different JHS_afr subpopulation (of ~1600–1800 individuals) and JHS_eur subpopulation (of ~600 individuals) at each locus (e.g., the SORT1 locus). Within each of these subgroups, we evaluated the association of SNP genotype with the three lipid traits, computing an effect size and p-value.
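The stratification rule can be sketched in Python as follows. The input is assumed to be, for each individual at a given locus, the posterior probability of carrying 2 African ancestral chromosomes. This representation and the function names are illustrative only and are not the ANCESTRYMAP output format.

```python
def assign_stratum(p_two_african, threshold=0.95):
    """Assign one individual to a local ancestry subgroup at one locus.

    p_two_african: posterior probability of 2 African ancestral chromosomes
                   at the locus (assumed input representation).
    Returns "JHS_afr", "JHS_eur", or None when neither probability exceeds
    the threshold.
    """
    if p_two_african > threshold:
        return "JHS_afr"
    # Probability of at least 1 European ancestral allele is the complement.
    if (1.0 - p_two_african) > threshold:
        return "JHS_eur"
    return None


def stratify(posteriors, threshold=0.95):
    """Split individuals into the two subgroups; ambiguous individuals are left out."""
    strata = {"JHS_afr": [], "JHS_eur": []}
    for sample_id, p in posteriors.items():
        label = assign_stratum(p, threshold)
        if label is not None:
            strata[label].append(sample_id)
    return strata


# Hypothetical posteriors at a single locus.
example = stratify({"s1": 0.99, "s2": 0.50, "s3": 0.02})
```

Because the posterior differs from locus to locus, the same individual can fall into different subgroups at different loci, which is why subgroup membership is recomputed at each locus.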
Because there were 13 comparisons being considered for GWAS SNP replication, we applied a Bonferroni cutoff of 0.05/13, or approximately 0.004, as our threshold of significance for these associations. SNPs were considered “suggestive” if p-values were in the range 0.05>p>0.004. For declaration of statistical significance in the fine-mapping analysis, the Benjamini-Hochberg procedure (p.adjust in R 2.8.1) was performed to provide conservative estimates of tail-based false discovery rates (FDR). An FDR threshold of 0.15 was chosen for LDL, HDL, and TG.
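For readers unfamiliar with the two multiple-testing corrections, a minimal Python sketch follows. It is not the code used in the study, which relied on R; the function names and the example p-values are assumptions made for illustration. The Benjamini-Hochberg adjustment multiplies each p-value by m/rank and then enforces monotonicity from the largest p-value downward.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value never exceeds the adjusted value of any larger p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


replication_cutoff = bonferroni_threshold(0.05, 13)

# Hypothetical fine-mapping p-values; SNPs with adjusted p < 0.15 would be declared significant.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_p)
significant = [i for i, q in enumerate(toy_adjusted) if q < 0.15]
```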
Power calculations for lipid traits were performed by simulation, using a linear regression model of lipid plasma level (in standardized units) against genotype with minor allele frequencies ranging from 0.05–0.50 and testing a range of effect sizes from 0.02–0.35 standardized units. This was performed for 2 purposes: replication of index SNPs in the entire JHS population, and discovery of new SNPs in the JHS_afr subpopulation. For replication, the number of individuals was set at 3,300 (corresponding to unrelated members in JHS) with a p-value of 0.05/13=0.004 as threshold. For SNP discovery in JHS_afr, which involved testing a large number of SNPs with low prior odds of association, the number of individuals was set at 1,700 and the critical p-value for declaring significance set at 0.05/300=0.00017. Five hundred simulations were performed at each combination of minor allele frequency and effect size (percentage of residual variance explained), and, for each minor allele frequency, the smallest effect size at which the significance threshold was exceeded in >80% of the runs was reported.
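A simulation of this kind can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: genotypes are drawn under Hardy-Weinberg proportions, the phenotype is an additive per-allele effect plus unit-variance normal noise, and the p-value uses a normal approximation to the regression t statistic, which is reasonable only for large samples. The sketch reads the minimum detectable effect as the smallest effect size on the grid at which estimated power exceeds 80%.

```python
import math
import random


def simulate_power(n, maf, beta, alpha, n_sims=500, seed=0):
    """Fraction of simulated datasets in which the genotype effect reaches p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Genotype = number of minor alleles (0, 1, or 2).
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        # Phenotype: additive per-allele effect plus unit-variance noise.
        y = [beta * gi + rng.gauss(0.0, 1.0) for gi in g]
        g_mean = sum(g) / n
        y_mean = sum(y) / n
        sxx = sum((gi - g_mean) ** 2 for gi in g)
        if sxx == 0:
            continue  # monomorphic sample: no test possible, counted as a miss
        sxy = sum((gi - g_mean) * (yi - y_mean) for gi, yi in zip(g, y))
        slope = sxy / sxx
        intercept = y_mean - slope * g_mean
        rss = sum((yi - intercept - slope * gi) ** 2 for gi, yi in zip(g, y))
        se = math.sqrt(rss / (n - 2) / sxx)
        # Two-sided p-value from the normal approximation (large n assumed).
        p = math.erfc(abs(slope / se) / math.sqrt(2.0))
        hits += p < alpha
    return hits / n_sims


def smallest_detectable_effect(n, maf, effects, alpha, n_sims=500, seed=0):
    """Smallest effect size on the grid with estimated power above 80%, or None."""
    for beta in sorted(effects):
        if simulate_power(n, maf, beta, alpha, n_sims, seed) > 0.80:
            return beta
    return None


# Small demonstration run; the study settings (n = 3,300 or 1,700, 500 runs) take longer.
demo_power = simulate_power(n=500, maf=0.25, beta=0.2, alpha=0.05 / 13, n_sims=100)
print(f"estimated power: {demo_power:.2f}")
```

Calling `smallest_detectable_effect` once per minor allele frequency, with the replication or discovery sample size and threshold, would produce one minimum detectable effect per frequency.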
Participant characteristics for individuals in JHS are displayed in Table 1. Because local ancestry estimates vary among loci, characteristics of local ancestry subgroups at the chr1p13 locus are shown as a representative sample. After excluding participants with TG ≥ 400mg/dL and/or low genotyping rate (<90%), genotype-phenotype data from 4,474 African American individuals were analyzed.
The genetic associations of blood lipids traits in JHS with eight index SNPs identified through GWA are shown in Table 2. Given that several index SNPs were associated with more than one lipid trait in the original studies, we evaluated a total of 13 SNP-lipid trait associations, defining significance as an unadjusted p-value of less than 0.004, with effect in the same direction as the original study. Associations with p-values > 0.004 and < 0.05 with consistent direction of change as the original study were considered suggestive.
At this threshold, 2 of the 8 index SNPs genotyped in the JHS were significant: 1 associated with TG (rs1260326) and 1 with LDL (rs646776). Three SNPs were suggestive (rs12130333, rs17145738, and rs10774708). rs1260326 in GCKR was strongly associated with plasma TG levels, decreasing TG by ~ 7.3% per copy of the major allele across the entire cohort (p=7.4E-05). Two other loci were more weakly associated with TG: rs12130333 near DOCK7/ANGPTL3, and rs17145738 near TBL2 on chromosome 7. The genotype at each of these SNPs resulted in a ~4% change in TG concentrations (p=0.04 for both). Of the 3 SNPs associated with LDL in GWA studies, only rs646776 was significant in JHS (p=1.2E-04), with each copy of the major allele increasing LDL ~3.3 mg/dL across the entire cohort. Finally, rs10774708 in KCTD10 was modestly associated with HDL, increasing HDL by approximately 1% per copy of the major allele (p=0.04).
In order to determine whether local ancestry at each index locus influences effect sizes, we performed stratified analyses comparing individuals with a >95% probability of African versus European ancestry in the eight index SNP regions. Representative demographic characteristics for the 1p13 subgroups are shown in Table 1. For 4 of the 5 index SNPs significantly/suggestively associated with lipid traits in African Americans (rs1260326, rs12130333, and rs17145738 with TG and rs646776 with LDL), European ancestry appeared to drive the associations (Table 2). For these SNPs the effect sizes on the European ancestral background exceeded those seen on the African ancestral background, ranging from 1.6 to 3 fold differences in magnitude of effect. Interestingly, rs4846914 near GALNT2 was not associated with HDL in the overall analysis despite evidence for association on the African chromosomal segment (p_all=0.09, p_afr=0.02). However, this finding may be spurious and warrants replication in additional samples.
To discover potentially more informative variants in African Americans, we undertook dense fine mapping for each of the eight genetic loci within JHS. Controlling the false-discovery rate at <15% gave 1 significant association for LDL, 0 significant associations for HDL, and 2 significant associations for TG as detailed in Table 3 (see Methods).
For two of the six genomic regions densely genotyped and tested for association with TG, we identified SNPs with stronger associations with TG in African Americans than their corresponding index GWAS SNPs. Depicted in Figures 1 and 2 are SNPs in these two regions, 2p23 containing GCKR and 1p31 containing DOCK7/ANGPTL3, that were significant at an FDR of 0.15. rs780093, an intronic SNP in GCKR, was most strongly associated with TG concentrations, with each copy of the major allele resulting in a ~8% reduction in TG (p=4.6E-06) and with similar effects on the African and European local ancestral backgrounds. rs780093 is in moderate/strong LD with rs1260326, the index SNP at this locus, in both YRI and CEU HapMap populations (r2=0.65 and 0.93, respectively). In fact, rs780093 has subsequently been found to be strongly associated with TG levels in a large meta-analysis of >19,000 individuals of European descent (p=3.1E-29).12 Although rs1260326 is a missense coding variant in the GCKR gene, our data suggest that rs780093 may either be the causal variant, or be more tightly linked to the causal variant at this locus by showing greater strength of association (p=4.6E-06 vs. 7.4E-05). Although we cannot rule out statistical fluctuation, it is interesting that this nominally stronger association is predominantly driven by a more significant association on the African local ancestry background (p=0.004 vs. p=0.12) and is supported by a consistent effect size of 8% per allele in both African and European ancestry subgroups. As described above, such a similarity in effect size is lacking for rs1260326.
Within the 1p31 region near DOCK7 and ANGPTL3, rs636523 had a markedly stronger association with TG than the index SNP rs12130333 (p=4.6E-04 vs. p=0.04). Each copy of the risk allele affected TG concentrations by ~4%. rs636523 is in moderate LD with the index SNP rs12130333 in the CEU HapMap population (r2=0.45) but in weak LD in YRI (r2=0.04). The subsequent meta-analysis described above has in fact found a stronger association of rs636523 with TG (p=2.91E-07) than the index SNP rs12130333 (p=0.0004), in keeping with our results. Moreover, rs636523 demonstrates comparable effect sizes on both ancestral backgrounds, rendering it a more plausible candidate as a causal variant. Although our work cannot shed light on the gene most likely regulated by this SNP, the ANGPTL3 gene is known to harbor rare variants that regulate TG metabolism13; therefore, it is likely the causal gene at this locus.
rs629301 at the 1p13 locus was nominally more strongly associated with LDL compared with the index SNP rs646776 (p=5.4E-05 vs. p=1.2E-04; Figure 3). rs629301 (in blue) is in strong LD with rs646776 (in red) in both the CEU and Yoruban (YRI) HapMap populations (r2=0.92 in YRI and r2=1.0 in CEU); hence, the effect sizes per copy of the major allele are similar to those seen with rs646776 (~3.5 mg/dL), with similar disparities in effect sizes between the African and European local ancestry subgroups. We cannot exclude statistical fluctuation as driving the subtle differences in p-values. rs629301 was also subsequently found to be strongly associated with LDL in the large meta-analysis of European ancestry individuals described above (p=2.147E-41).
Fine mapping analysis provided no further support for the importance of the other chromosomal regions in determining lipid profiles in African Americans.
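When fine-mapping was restricted to individuals with a high probability of 2 African ancestral alleles at each locus, and a false discovery rate of 0.15 was applied, we found no variants for LDL, HDL, or TG that met our significance threshold.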
In the Jackson Heart Study, we confirmed that five SNPs discovered in European-derived populations were associated with lipid traits in African Americans at a significant or suggestive level. Furthermore, we identified three SNPs with stronger associations in African Americans than the index discovery SNPs at their respective loci. Taken together, our findings suggest that the same genomic regions that predict lipid traits in European-derived populations are likely relevant in African Americans, but that the specific variants within each locus may differ among ethnic groups. Finally, we found that fine-mapping genomic regions on an African ancestral background has limited utility to narrow genomic regions of association in an admixed cohort of this size.
Three index GWA SNPs did not meet our significance threshold in JHS. This failure to replicate may have occurred for a variety of reasons. Our analysis has modestly reduced power (Supplemental Table) due to sample size relative to the original reports, which were meta-analyses of ~8000 individuals. If the effect sizes in Kathiresan et al4, which were typically less than 0.12 standard units for all SNPs, are applicable to JHS, we had 80% power to detect associations with three variants at p<0.004: rs1260326 (2p23, GCKR) with TG; rs646776 (1p13, CELSR2/PSRC1/SORT1); and rs17321515 (8q24, TRIB1). In addition to a smaller sample size, the reduced power arises in part from the fact that minor allele frequencies at 4 of the 8 loci were lower in African Americans as compared to the discovery cohorts of European ancestry, further decreasing power to confirm association (rs4846914: MAF=0.12 vs 0.40; rs17145738, 0.08 vs 0.13; rs17321515, 0.44 vs 0.49; and rs12130333, 0.10 vs 0.22). Interestingly, although we replicated associations at 2p23 and 1p13, we saw no association at 8q24, despite adequate power. Although the reason for this result is unclear, a possible explanation could be differing patterns in linkage disequilibrium in African Americans, which would weaken the observed association with a non-causal SNP. Additionally, unidentified gene-x-gene or gene-x-environment interactions may cause fundamental differences in effect sizes between European and African Americans.2
We confirmed that local ancestry data at each locus provide additional insight into genetic associations in African Americans. For 4 of the 5 associated index GWA SNPs, the effect sizes on the European ancestry background exceeded those on the African background. Although the effect size differences for any single SNP can only be considered suggestive, this finding supports a pattern of systematically weaker effect sizes on the African ancestral background. Because members of the JHS are from the same community, it is unlikely that unmeasured differential environmental exposures contributed to the apparent systematic difference in effect sizes. A more likely explanation is that the index SNPs are markers for the major causal SNP at each locus and that smaller effect sizes in the African local ancestry subgroup result from the weaker SNP-SNP correlations seen in the ancestral West African population.
Fine-mapping yielded two variants with consistent effects on plasma TG levels irrespective of local ancestry - rs780093 in GCKR and rs636523 near DOCK7/ANGPTL3 - making them credible candidates as the causal variants at their respective loci. Neither SNP is a nonsynonymous coding variant: rs780093 is intronic within the GCKR gene, and rs636523 is near DOCK7 and ANGPTL3. It currently remains unclear how these noncoding variants might influence phenotypic variation. Ultimately, experimental validation in a faithful disease model will be required to further elucidate their mechanistic action. While further replication of these SNPs in additional African American samples is needed, our findings suggest that these SNPs may have predictive value in African Americans at these two loci. In contrast, while rs629301 was nominally more strongly associated with plasma LDL cholesterol than its corresponding index SNP rs646776, there remained a disparity in the effect on the African background compared to the European background. A suitable genetic marker at this locus for LDL in African Americans thus remains to be identified.
Fine-mapping genomic regions is one proposed next step to further characterize loci discovered through GWA.14 It has been suggested that the shorter linkage disequilibrium (LD) patterns in historically larger populations such as West Africans may allow narrowing the interval in which causal variants may lie.3 However, fine-mapping in African Americans does not necessarily provide an independent pattern of LD because of the large amount of European chromosomal segments arising from admixture. Therefore, we performed a stratified association analysis based upon local ancestry, limiting ourselves to individuals with a high probability of 2 alleles of African ancestry at the locus (JHS_AFR2). In principle, analysis in this subpopulation will provide an entirely independent pattern of LD as there is no contribution from European ancestral segments. For each of the eight loci, this resulted in a subgroup of ~2500 individuals. Using this approach, we did not find any significant associations for LDL, HDL, or TG.
While theoretically attractive, our work highlights several challenges for tag-based fine-mapping in African Americans. First, power to detect novel associations, even in previously defined loci, is limited. Power calculations for SNP discovery within the African Ancestry subpopulations (Supplemental Table) show that we are substantially underpowered to detect the typical effect sizes for SNPs seen in European populations. Genome-wide association studies have shown that common variants impart modest effects on their associated traits; thus, identifying these variants often requires meta-analyses of large cohort consortia. Unfortunately, there are fewer available cohorts of African ancestry, limiting the number of individuals available for genetic analysis. Furthermore, analyses in admixed populations such as African Americans are complicated by the mixture of African and European segments across the genome. By excluding individuals with European ancestry at a given locus from analysis in order to narrow the association signal, sample size is further diminished (by approximately 45% in our study). In addition, it is plausible that the genetic loci discovered in European-derived populations in fact harbor multiple functional loci along a single haplotype and that in the absence of similar LD structure, the association signal may be much weaker. Finally, tag-based strategies typically utilize highly correlated common variants from the HapMap to capture genetic variation across a genomic region. The second version of HapMap contains only ~30% of the common SNPs that are present in the genome,14 so coverage is currently incomplete. The 1000 Genomes Project, an effort currently underway to resequence the entire genome of >1000 individuals, should significantly improve upon the currently available datasets of common variants. 
Next generation genomic resequencing technology holds great promise by providing the ability to resequence genomic regions, identifying both common and rare variants in a large number of individuals at a relatively modest financial cost.
Our study has several strengths and limitations. First, the Jackson Heart Study is a well-phenotyped, community based study and is one of the largest available cohorts of African American individuals. Next, we accounted for the genetic architecture of both African and European-derived populations when we selected our tag SNPs. In our analyses, we accounted for population stratification, which can confound genotype-phenotype associations, by adjusting for both global and local ancestry. In addition, we performed stratified analyses by local ancestry and examined whether genetic association signals could be narrowed on the African ancestry background. While many have postulated that this may be a useful approach, few have tested this hypothesis. A key limitation to our study is statistical power. As mentioned earlier, most common variants discovered to date impart small effects, mandating large sample sizes to detect genetic associations. Unfortunately, genetic research in African Americans is currently limited by the relatively few cohorts available as compared to groups of European descent. In addition, tag SNP selection is limited to known common variants.
In conclusion, we investigated eight genomic regions recently reported as associated with lipids in populations of European descent to determine their relevance in African Americans. We confirmed that five of the eight index SNPs were significantly or suggestively associated with plasma levels of LDL, HDL, and/or TG in African Americans. Moreover, at three of the five confirmed loci, we found SNPs with stronger associations with lipids in African Americans compared to the index SNPs discovered in European-derived cohorts, demonstrating that although associated regions may be relevant to many ethnic groups, specific markers of association can be unique to individual groups. Finally, we demonstrated the limitations of using a cross-ethnic fine-mapping approach to narrow association signals and suggest resequencing these loci in multi-ethnic groups as an alternate approach.
We gratefully acknowledge the contributions of the participants and staff of the Jackson Heart Study.
Funding: Research support for JHS studies was provided by R01-HL-084107 (JGW) from the National Heart, Lung, and Blood Institute and contracts N01-HC-95170, N01-HC-95171, and N01-HC-95172 from the National Heart, Lung, and Blood Institute and the National Center on Minority Health and Health Disparities. M.K. was supported by the Charles A. King Trust, Bank of America, Co-Trustee. S.K. is supported by a Doris Duke Charitable Foundation Clinical Scientist Development Award, a charitable gift from the Fannie E. Rippel Foundation, the Donovan Family Foundation, a career development award from the United States National Institutes of Health (NIH), and institutional support from the Department of Medicine and Cardiovascular Research Center at Massachusetts General Hospital.
Conflict of Interest Disclosures: None