Blood lipid levels, including low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C) and triglycerides (TG), are highly heritable traits and major risk factors for atherosclerotic cardiovascular disease (CVD). Using individual ancestry estimates at marker locations across the genome, we present a novel quantitative admixture mapping analysis of all three lipid traits in a large sample of African-Americans from the Family Blood Pressure Program. Regression analysis was performed with both total and marker-location-specific European ancestry as explanatory variables, along with demographic covariates. Robust permutation analysis was used to assess statistical significance. Overall European ancestry was significantly correlated with HDL-C (negatively) and TG (positively), but not with LDL-C. We found strong evidence for a novel locus underlying HDL-C on chromosome 8q, which correlated negatively with European ancestry (P = .0014); the same location also showed positive correlation of European ancestry with TG levels. A region on chromosome 14q also showed significant negative correlation between HDL-C levels and European ancestry. On chromosome 15q, a suggestive negative correlation of European ancestry with TG and positive correlation with HDL-C was observed. Results with LDL-C were less significant overall. We also found significant evidence for genome-wide ancestry effects underlying the joint distribution of HDL-C and TG, not fully explained by the locus on chromosome 8. Our results are consistent with a genetic contribution to, and may explain, the healthier HDL-C and TG profiles found in Blacks versus Whites. The identified regions provide locations for follow-up studies of genetic variants underlying lipid variation in African-Americans and possibly other populations.
Cardiovascular disease (CVD) is a leading cause of morbidity and mortality, making it a major public health concern worldwide (1). The etiology of the disease is multifactorial, with established genetic and environmental risk factors (2–5). Atherosclerosis, which is characterized by lipid accumulation, inflammatory response, cell death and fibrosis in arterial walls, accounts for 75% of all deaths from CVD (6). There is compelling evidence of involvement of blood lipid concentrations in atherosclerosis and CVD (7–9). Blood lipids are complex traits influenced by both genetic and environmental factors (2–5,10). Heritability estimates for lipid concentrations vary [~40–75% for high-density lipoprotein cholesterol (HDL-C), ~40–65% for low-density lipoprotein cholesterol (LDL-C) and ~35–50% for triglycerides (TG)] (11), yet are consistently high for all lipid traits.
The genetic dissection of these complex traits has long been a major challenge, and there have been many studies aimed at finding specific loci for these phenotypes. Various study designs have been employed, ranging from large-scale genome-wide association (12–15) to candidate gene (16–19) and linkage studies (6,20,21). Recently, Willer et al. (15), in a large meta-analysis combining genome-wide scans with ~9000 individuals of European descent in stage 1 and ~11 500 individuals of European descent in stage 2, found convincing evidence for 11 loci (genes) previously implicated in lipid metabolism as well as compelling evidence for several newly identified loci (two for HDL-C, three for TG, one for LDL-C and one region encompassing several genes associated with both LDL-C and TG).
At the same time, there are known ethnic trends in lipid levels (22,23). For example, after adjusting for socioeconomic factors, African-Americans generally have healthier lipid levels (lower TG and higher HDL-C) than people of European descent (16,24,25), whereas both have healthier lipid levels than South Asians (26,27). This underscores the importance of undertaking large-scale mapping studies of populations with distinct ethnic backgrounds.
New World admixed populations provide unique opportunities for locating trait loci by genetic admixture mapping (28–33). The African-American population of the United States is typically characterized by admixture of European and African ancestral genomes in different proportions with some spatial variation (34–36). Studies have correlated genome-wide African ancestry in African-Americans with healthier lipid levels (37). In this report, we present results of an admixture mapping analysis examining the correlation of lipid levels (LDL-C, HDL-C and TG) with estimated genome-wide ancestry proportions as well as ancestry estimates at each of 284 marker loci distributed around the genome in 1044 unrelated African-American subjects from the Family Blood Pressure Program (FBPP), in a search for locus-specific effects.
Sample demographics for the 1044 individuals are given in Table 1. There were 698 females and 346 males. The subjects from GENOA were on average older (average age 59.6) than the HyperGEN subjects (average age 49.4). Within each network, the average body mass index (BMI), HDL-C and LDL-C were generally higher in females than in males, whereas TG levels showed slightly reversed tendencies. The male and female subjects from GENOA had higher HDL-C and TG levels than their corresponding counterparts from HyperGEN. There was little variation in BMI between networks. Average European ancestry varied modestly between the two networks, as was previously noted (36).
LDL-C levels were modestly correlated with HDL-C and TG levels (correlation coefficient −0.13 with HDL-C and 0.19 with TG). HDL-C and TG, as observed in other studies, showed a stronger inverse correlation (−0.28). The correlation between the traits was virtually unaffected by the transformations. By analysis of variance, all three lipid traits were strongly associated with BMI and age (Table 2). HDL-C was strongly associated with sex. TG and LDL-C differed significantly between networks.
LDL-C was not found to be correlated with overall European individual ancestry (IA) (P = .80). On the other hand, HDL-C was strongly negatively correlated with European IA (t = −2.76, P = .0059) and TG was strongly positively correlated with European IA (t = 3.29, P = .0010).
A Q–Q plot of the rl values (standardized regression coefficients) for the 284 loci for LDL-C, TG and HDL-C showed a reasonably good fit to normality (Fig. 1) except for some outlier points, in particular at the low (negative) end of the distribution for HDL-C.
According to normal distribution theory, we expected 14 (~5% of 284) marker positions to have an absolute standardized regression coefficient |rl| value > 1.96, or seven in each tail. However, we found an excess of points in the tails in each of the three distributions. For LDL-C, there were seven rl values > 1.96 as expected but 10 in the left tail (i.e. values less than −1.96) (Supplementary Material, Table S1). For HDL-C, there were 11 rl values > 1.96 (i.e. in the right tail) and 10 in the left tail (less than −1.96) (Supplementary Material, Table S2). For TG, there were 12 rl values > 1.96 (i.e. in the right tail) and eight in the left tail (Supplementary Material, Table S3).
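The expected tail counts quoted above follow directly from the standard normal distribution. The short Python sketch below reproduces that arithmetic; it is an illustration only, not part of the original analysis, and the function name `expected_tail_counts` is hypothetical.

```python
from statistics import NormalDist


def expected_tail_counts(n_markers, threshold=1.96):
    """Expected number of markers beyond +/-threshold if scores are standard normal."""
    upper_tail = 1.0 - NormalDist().cdf(threshold)  # P(Z > threshold)
    per_tail = n_markers * upper_tail
    return per_tail, 2.0 * per_tail


per_tail, both_tails = expected_tail_counts(284)
print(f"expected per tail: {per_tail:.1f}, both tails: {both_tails:.1f}")
```

For 284 markers this gives roughly seven markers per tail and 14 in total, the baseline against which the observed counts for each lipid trait are compared.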
For LDL-C, the two most promising chromosomal regions were at 3q and 4q, both showing negative correlation with European ancestry. On 3q, five markers covering 64 cM had Z-scores less than −2.0, with a peak of −2.35 at 188.3 cM (marker D3S2427). On chromosome 4q, three markers had Z-scores less than −2.0, with a peak of −2.33 at 78 cM (marker D4S2367).
For HDL-C, two significant regions on chromosomes 8q and 14q showed negative correlation with European ancestry, whereas a region on chromosome 9q showed positive correlation with European ancestry. Chromosome 8q had four markers covering 27 cM with Z-scores less than −2.0, with a peak of −4.13 at 82.3 cM (marker D8S1136). On chromosome 14q, three markers had Z-scores less than −2, with a peak of −3.00 at 95.9 cM (marker GATA193A07). On chromosome 9q, there were three markers with Z-scores > 2, with a peak of 2.69 at 110.9 cM.
The two most promising locations for TG included one extended region positively correlated with European ancestry on chromosome 8q and another negatively correlated with European ancestry on chromosome 15q. On chromosome 15q, the peak Z-score of −2.75 occurred at 90.0 cM (marker D15S652), whereas two neighboring markers also had Z-scores less than −2. On chromosome 8q, an extended region of 87 cM stretching from 77.9 to 164.5 cM contained seven markers with Z-scores > 2. Two separate peaks occurred in this interval, one at 135.1 cM with a Z-score of 2.58 (marker D8S1179) and the other at 77.9–82.3 cM (markers D8S1113 and D8S1136) with a Z-score of 2.31. We note that this is precisely the same location harboring the most significant result for HDL-C (marker D8S1136).
To determine the statistical significance of our findings, we employed a permutation test (see Materials and Methods). For LDL-C and TG, none of the markers was significant at P < 0.05. For HDL-C, however, in 5000 permutations, the minimum of 284 rl scores crossed −4.13 only seven times, −3.01 seventy-three times and −2.99 ninety times. Hence, the result associated with the marker D8S1136 is highly significant on a genome-wide level (P = 0.0014), whereas those for D8S1113 and GATA193A07 are also formally significant (P < 0.02).
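The genome-wide P-values above are simple proportions of the 5000 permutations. A minimal Python sketch of that arithmetic follows, using only the counts reported in the text; the helper name `empirical_p_value` is hypothetical.

```python
def empirical_p_value(n_at_least_as_extreme, n_permutations):
    """Fraction of permutations whose minimum score reached the observed value."""
    return n_at_least_as_extreme / n_permutations


N_PERMUTATIONS = 5000
# Observed HDL-C thresholds and the number of permutations that crossed them.
exceedances = {"-4.13": 7, "-3.01": 73, "-2.99": 90}

p_values = {z: empirical_p_value(count, N_PERMUTATIONS)
            for z, count in exceedances.items()}
for z, p in p_values.items():
    print(f"min rl <= {z}: P = {p:.4f}")
```

The three proportions are 0.0014, 0.0146 and 0.0180, consistent with the reported P = 0.0014 and P < 0.02.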
We also looked at possible significant interactions of BMI, age, sex, network, usage of lipid-lowering medication, and hypertension status with local ancestry at the significant genomic marker locations for all three lipid traits and did not find any. The Z-scores for HDL-C and TG across chromosomes 8 and 14 are plotted in Figure 2, which also gives a visual impression of the inverse relationship between the two traits. We also examined the possibility of outlier points driving the Z-scores for the three significant loci (Supplementary Material, Fig. S1) but found no evidence of outliers.
Furthermore, as another check on the robustness of our conclusions, we examined results of multiple additional analyses based on different randomly selected sets of unrelated individuals from the African-American families in our study. The results for these subsets were essentially identical to those we obtained in the original analysis, indicating that our results were not dependent on the original selection of unrelated individuals from these families.
Our results also indicated an overall genome-wide ancestry deviation in opposite directions for HDL-C and TG. To determine whether multiple loci are influential in the joint distribution of these lipid traits, we looked at the correlation between the regression coefficients of the 284 markers for the two lipids. If a common set of genes is responsible, for example, for both high HDL-C and low TG values, then we would observe a large negative correlation between the ancestry regression coefficients.
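The quantity examined here is an ordinary Pearson correlation computed across markers rather than across individuals. The Python sketch below illustrates the calculation; the coefficient values are hypothetical toy numbers, not data from this study, and the function name `pearson` is an assumption for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical per-marker ancestry regression coefficients for two traits.
hdl_coefficients = [-4.1, -3.0, 0.4, 1.2, 2.7, -0.8]
tg_coefficients = [2.3, 1.1, -0.2, -0.9, -1.5, 0.6]

r = pearson(hdl_coefficients, tg_coefficients)
print(f"correlation across markers: {r:.2f}")
```

A strongly negative value across markers is what would be expected if shared loci raise one trait while lowering the other.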
For the 284 markers, the correlation between the regression coefficients for HDL-C and TG was −0.55. To determine statistical significance of this correlation, we performed 5000 permutations (as described in Materials and Methods), and for each permutation calculated the pairwise correlation between the ancestry regression coefficients for the two lipid traits. Out of 5000 permutations, the correlation was at least as negative as −0.55 only 70 times, which corresponds to a P-value of .014. After removing the two chromosome 8 markers with highly negative rl scores for HDL-C and highly positive rl scores for TG from the analysis, the correlation declined only to −0.53, which was still statistically significant (P = 0.028). After removal of the markers on chromosomes 8q and 15q, the correlation still trended in the same direction but was no longer significant (P = .063). To assess whether the regressions of HDL-C and TG on total European IA could be attributed solely to ancestry at the peak location on chromosome 8q, we performed a regression analysis of HDL-C (and TG) including both 8q locus-specific European ancestry and total European ancestry in the model. For both lipids, total European IA was no longer significant (and regression coefficients were close to 0), once the locus-specific European ancestry was included in the model. Hence, the overall ancestry effect we observed for both lipids could be explained simply by a locus on chromosome 8q, although the situation may be more complex, with other loci contributing.
Numerous genes have been directly implicated in the variation of lipid traits, including those encoding apolipoproteins (APOA1, APOA2, APOA4, APOE), ATP-binding cassette sub-family A member 1 (ABCA1), lecithin cholesterol acyltransferase (LCAT) and lipoprotein lipase (LPL), among others (16,38–43). All these genes have been extensively studied and well-characterized biochemically. With the exception of APOE, variants in the above-mentioned genes and a few others have been found mostly in Mendelian disorders involving lipid metabolism. Identifying more common and/or less penetrant variants affecting lipid levels has been a more challenging endeavor.
Among the three lipid traits we studied, LDL-C had the least significant results. Although no points for LDL-C were formally significant, we did note a modest increase in markers correlated with excess African ancestry on chromosomes 3q and 4q. The region on chromosome 3q was previously identified in a linkage study with 470 subjects in 10 pedigrees with a maximum LOD score of 4.1 (44), but not in other linkage and association studies. On chromosome 4q, the marker D4S2367, which had the second lowest rl value, is close to the SNP rs10518072. This SNP was associated with LDL-C levels in the Framingham Heart genome-wide association study (P = 1.4 × 10⁻⁴) (13) but was not reported in the Willer et al. (15) study.
In contrast, the results for HDL-C were more significant. Most of the markers in the extreme left tail (negative correlation with European ancestry) are on chromosomes 8q and 14q. The peak region on chromosome 8q (at 82 cM), our most significant finding overall, also showed evidence of positive correlation of European ancestry with TG. Deviations in the other tail of the Z-score distribution were more subtle and less significant. Three consecutive markers on chromosome 9q showed up in this tail, with a peak between 104 and 111 cM.
Of interest, the region on chromosome 9q22–q32 that we identified harbors the ABC1 (ATP-binding cassette; also known as ABCA1) gene at 9q31. ABC1 has long been associated with studies involving HDL-C and is regarded as a candidate gene for HDL-C deficiency and other related phenotypes (45–48). Mutations in ABC1 can cause Tangier disease, a genetic disorder of cholesterol transport characterized by very low levels of HDL-C (46,47,49–52). Hence, ABC1 would be a candidate for an HDL-C quantitative trait locus (QTL) in our analysis as well.
The three markers in our study that were clearly significant after permutation testing are D8S1113 and D8S1136 from the region 8q11.23–q21.11 and GATA193A07 from the region 14q24.1–q31. The region 8q11.23–q21.11 is the location of the CYP7A1 gene at 8q12.1. This gene encodes a member of the cytochrome P450 superfamily of enzymes, which catalyze many reactions involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. This region also harbors the candidate gene CYP7B1 (at 8q12.3; 65.6 Mb from pter) between the markers D8S1113 and D8S1136. Little other evidence in the literature implicates this region of 8q in HDL-C or TG levels, suggesting our results may be indicating a novel locus. In a study of lipid levels among 113 Hispanic families, the GATA193A07 marker on chromosome 14 showed modest evidence of linkage for TG (two-point LOD score 2.12) and for the ratio of TG to HDL-C (two-point LOD score 2.46) (52). Weak evidence of linkage to the locus was also obtained in a genome scan of QTLs influencing HDL-C levels (19,53).
Analysis of TG did not reveal any formally significant loci, although there was an overall excess of observations in the right tail of the Z-score distribution. This excess was due to a preponderance of loci on chromosome 8. Three of these loci overlapped with our most significant region for HDL-C at 8q11–q21, whereas four other markers mapped to a non-overlapping, more distal location between 8q23 and q24. We do note, however, that rs7007075, the SNP most strongly associated (P < 7.7 × 10⁻⁶) with TG levels in the Framingham Heart genome-wide association study, lies in this region. The SNP rs17321515 (8q24.13), 3′ downstream of the gene TRIB1, has been one of the most significant findings (P = 4 × 10⁻¹⁷) of genome-wide association studies (12). This study also reported five independently associated SNPs in this region, all of which lie between the two consecutive markers D8S592 at 8q24.11 and D8S1179 at 8q24.13, included in our study. The rl values for TG corresponding to the markers D8S1179 (2.59) and D8S592 (2.48) are the highest and the third highest that we found in our study (Supplementary Material, Table S3).
As noted by us and others, the various lipid fractions are modestly correlated within individuals. The largest correlation is the negative association between HDL-C and TG. Prior studies have suggested this correlation is, in part, genetically determined (44,54–62). Thus, it is appropriate in any genomic analysis to search for genes underlying the joint distribution of these traits.
In our admixture mapping analysis, we therefore searched for locations potentially underlying multiple traits. Examination of the joint distribution of regression coefficients for HDL-C and TG did indeed demonstrate a correlation (−0.55) that was statistically significant based on a permutation analysis. The excess correlation over expected was due partially but not exclusively to markers on chromosomes 8q and 15q, suggesting a possible multi-genic contribution to this correlation.
Admixture mapping analysis is complementary to other gene mapping approaches such as QTL linkage analysis and genome-wide association studies. The power of the method depends both on a measurable effect influencing the trait of interest and on a sizeable difference in allele frequency between ancestral populations, in this case Europeans and Africans. Some of our more significant findings overlapped with prior linkage and genome-wide association studies, and these indicate chromosome regions for more detailed study in follow-up. However, the degree of overlap among these approaches depends on a number of factors, including the underlying allele frequencies and degree of linkage disequilibrium between trait-associated alleles and marker SNPs.
The power of admixture mapping is also dependent upon the accuracy of the estimates of locus-specific ancestry. Ancestry informativeness of the marker set that we have used is on average 50% and never falls below 40% (33).
Of note, the direction of our significant findings for HDL-C and TG and lack of significant findings for LDL-C are consistent with a genetic contribution to the known ethnic difference for HDL-C and TG. Specifically, the regression coefficient we obtained for European IA on (untransformed) HDL-C (−8.18) can fully explain the observed Black–White differences in HDL-C. If we assume that African-Americans have 15% European ancestry, on average (36), then the expected difference between Whites and African-Americans would be 85% of −8.18 or −6.95. In our sample, the standard deviation of HDL-C was 16.80, so the projected difference of −6.95 units corresponds to ~0.4 standard deviations, a figure comparable to the ethnic difference observed in two large studies (24,25). Similarly, the regression coefficient of European IA on (untransformed) TG levels was 64.47. The standard deviation of TG in our sample was 76.36. Therefore, the predicted ethnic difference from our analysis would be 85% of 64.47 = 54.80, which corresponds to ~0.7 standard deviations, again comparable to the differences observed for US Blacks and Whites (24,25).
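The projection above is a two-step calculation: scale the regression coefficient by the ancestry gap between the two groups, then express the result in standard deviation units. The Python sketch below reproduces it with the figures given in the text. It assumes, as the text implicitly does, that the reference group has 100% European ancestry; the function name `projected_difference` is hypothetical.

```python
def projected_difference(coefficient, european_ancestry, trait_sd):
    """Projected trait difference for a shift from the given ancestry to 100% European.

    Returns the difference in raw trait units and in standard deviation units.
    """
    ancestry_gap = 1.0 - european_ancestry  # e.g. 1 - 0.15 = 0.85
    raw = coefficient * ancestry_gap
    return raw, raw / trait_sd


hdl_raw, hdl_sd_units = projected_difference(-8.18, 0.15, 16.80)
tg_raw, tg_sd_units = projected_difference(64.47, 0.15, 76.36)

print(f"HDL-C: {hdl_raw:.2f} units, {hdl_sd_units:.2f} SD")
print(f"TG:    {tg_raw:.2f} units, {tg_sd_units:.2f} SD")
```

The results, about −0.41 SD for HDL-C and 0.72 SD for TG, match the rounded values of ~0.4 and ~0.7 standard deviations quoted above.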
To our knowledge, this is the largest quantitative admixture mapping effort of blood lipid levels in terms of sample size and marker locus involvement. Overall, our findings are encouraging and provide regions for follow-up analyses of genes influencing HDL-C and TG, and possibly LDL-C in these and other African-American individuals.
The FBPP is a large multicenter genetic study of high blood pressure and related traits in multiple racial/ethnic groups, including European-Americans, African-Americans, Mexican Americans and Asians and Asian Americans (63). It includes four component networks: GenNet, GENOA, HyperGEN and SAPPHIRe. GenNet, GENOA and HyperGEN independently collected samples from European-American and African-American families.
All the individuals we included in this study were unrelated self-identified African-Americans from the field centers of GENOA and HyperGEN for whom fasting plasma lipid concentrations were available. To maximize the number of unrelated individuals in our sample, whenever possible we selected unrelated founder individuals; otherwise we chose at random one individual per family. Our final sample of 1044 individuals consisted of 349 individuals sampled by the GENOA network and 695 individuals sampled by the HyperGEN network. Out of these 1044 individuals, 32 did not have their LDL-C on record, so all our findings for LDL-C are based on 1012 individuals, whereas those for HDL-C and TG are based on 1044 individuals. There were 57 individuals (52 for LDL-C) who were noted to be taking lipid-lowering medication. The two networks included in this study measured fasting plasma lipids using standard enzymatic methods in laboratories participating in the CDC standardization program. All individuals who participated in the FBPP gave informed consent; the Institutional Review Board at each clinic site approved all protocols, and a Certificate of Confidentiality was obtained from the Federal Government for this study.
DNA was extracted from whole blood by standard methods by each of the four FBPP networks and was sent to the US National Heart, Lung and Blood Institute’s Mammalian Genotyping Service in Marshfield, Wisconsin, for genotyping. Screening set 8 (372 highly polymorphic microsatellite markers with an average inter-marker map distance of 10 cM) was used.
We used the computer program Structure (64,65) to estimate genome-wide as well as site-specific IAs in all African-American participants. The linkage model was used, with genetic distance between markers specified according to the Marshfield map. In each analysis, the Markov Chain Monte Carlo algorithm was run for 100 000 steps of burn-in followed by another 100 000 steps. We assumed a model with two ancestral populations. We included 1378 unrelated non-Hispanic White participants from the FBPP to represent the European ancestral population as well as 127 unrelated sub-Saharan African individuals from the Human Genome Diversity Project to represent the African ancestral population (66). The latter set of individuals had been genotyped at more than 300 short tandem repeat markers at the time of our analysis, and we included genotypes at the 284 markers that overlapped with the FBPP genotypes.
The distributions of all three lipid traits for the 1044 (1012 for LDL-C) unrelated individuals were non-normal due to positive skewness. The best normalization of the LDL-C data was obtained using the Box–Cox power transformation (67) with parameter (λ) 0.69, whereas the value of λ was −0.33 and −0.14, respectively, for HDL-C and TG.
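For reference, the Box–Cox power transformation in its standard form is (y^λ − 1)/λ for λ ≠ 0 and log(y) for λ = 0. The Python sketch below applies it with the λ values reported above. The lipid value used in the usage lines is hypothetical, and the estimation of λ itself is not shown.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox power transformation of a positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires a strictly positive value")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Lambda values reported in the text for each lipid trait.
LAMBDAS = {"LDL-C": 0.69, "HDL-C": -0.33, "TG": -0.14}

example_value = 120.0  # hypothetical lipid measurement, for illustration only
transformed = {trait: box_cox(example_value, lam) for trait, lam in LAMBDAS.items()}
for trait, value in transformed.items():
    print(f"{trait}: {value:.3f}")
```

The transformation is monotonically increasing for any λ, so the ordering of individuals is preserved; only the shape of the distribution changes.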
We first calculated regression coefficients for each of the transformed lipid traits on overall European IA, allowing for covariates. Then, for each individual at each locus, a European ancestry deviation xl was defined as the estimated ancestry at locus l minus the background ancestry estimated from the genome-wide markers for that individual, as previously described (68). The variable xl was then used as the primary independent variable in a linear regression model with the quantitative trait value of LDL-C or HDL-C or TG as the dependent variable. Log–log transformed BMI, age, sex and network were included as covariates in this analysis, if significant. The use of lipid-lowering medication was not found to be significant for any of the traits, possibly due to the small number of individuals (n = 57) on medication. Hypertension status was also included as a covariate, but was not significant for any of the traits. We also checked for possible two-factor interactions and none was significant. All non-significant covariates and interactions were dropped from the model in subsequent analyses. The standardized regression coefficient rl, defined as $r_l = \hat{\beta}_l / \mathrm{SE}(\hat{\beta}_l)$ (the estimated regression coefficient for xl divided by its standard error), was assumed to be asymptotically normally distributed and was used to assess statistical significance. To account for multiple testing (284 markers), we performed a permutation analysis in which we randomly reassigned the vector of genetic ancestry estimates for the 284 marker locations to individuals whose LDL-C, HDL-C, TG and covariate data remained intact. This procedure preserved the correlation structure of the markers and the correlation structure of the traits and covariates, but dissociated the relationship between the markers and phenotypes. For each permuted dataset, we performed the same regression analysis of the traits on excess ancestry at each marker location, as was done for the original data, and obtained the most extreme values (positive and negative) of rl (the Z-score statistics).
Five thousand permutations were performed. To derive P-values adjusted for multiple testing, we determined the percentage of times out of 5000 permutations that an observed value of rl was exceeded in the permuted data analysis.
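The regression and permutation procedure described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the code used for the analysis: covariates are omitted, so each marker is tested by simple linear regression; the toy data are simulated, with local ancestry drawn as 0, 0.5 or 1; and all function names are hypothetical.

```python
import math
import random


def z_score(x, y):
    """Slope of y on x divided by its standard error (simple regression, no covariates)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope / math.sqrt(rss / (n - 2) / sxx)


def marker_z_scores(deviations, trait):
    """One score per marker; deviations[i][m] is individual i's deviation at marker m."""
    n_markers = len(deviations[0])
    return [z_score([row[m] for row in deviations], trait) for m in range(n_markers)]


def permutation_p_values(deviations, trait, n_perm, seed=0):
    """Multiple-testing-adjusted P-values from the permutation extremes."""
    rng = random.Random(seed)
    observed = marker_z_scores(deviations, trait)
    order = list(range(len(trait)))
    min_z, max_z = [], []
    for _ in range(n_perm):
        rng.shuffle(order)
        # Each individual's whole ancestry vector moves together; traits stay in place.
        shuffled = [deviations[i] for i in order]
        scores = marker_z_scores(shuffled, trait)
        min_z.append(min(scores))
        max_z.append(max(scores))
    p_values = []
    for z_obs in observed:
        if z_obs < 0:
            hits = sum(m <= z_obs for m in min_z)
        else:
            hits = sum(m >= z_obs for m in max_z)
        p_values.append(hits / n_perm)
    return observed, p_values


# Simulated toy data: 60 individuals, 5 markers, a true effect at marker 0 only.
rng = random.Random(1)
deviations, trait = [], []
for _ in range(60):
    background = rng.uniform(0.05, 0.40)  # genome-wide European ancestry
    local = [(int(rng.random() < background) + int(rng.random() < background)) / 2
             for _ in range(5)]
    row = [a - background for a in local]  # locus ancestry minus background
    deviations.append(row)
    trait.append(50.0 - 40.0 * row[0] + rng.gauss(0.0, 5.0))

observed, p_values = permutation_p_values(deviations, trait, n_perm=200)
for m, (z, p) in enumerate(zip(observed, p_values)):
    print(f"marker {m}: score = {z:.2f}, adjusted P = {p:.3f}")
```

Shuffling entire ancestry vectors, rather than each marker independently, is what preserves the correlation between neighboring markers while breaking the link to the phenotype.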
This work was supported by grants awarded to the Family Blood Pressure Program, which is supported by a series of cooperative agreements from the National Heart, Lung and Blood Institute to GenNet, HyperGEN, GENOA and SAPPHIRe.
We thank Mark Kvale for his assistance with analysis. The following investigators are associated with the Family Blood Pressure Program: GenNet Network: Alan B. Weder (Network Director), Lillian Gleiberman (Network Coordinator), Anne E. Kwitek, Aravinda Chakravarti, Richard S. Cooper, Carolina Delgado, Howard J. Jacob and Nicholas J. Schork. GENOA Network: Eric Boerwinkle (Network Director), Tom Mosley, Alanna Morrison, Kathy Klos, Craig Hanis, Sharon Kardia and Stephen Turner. HyperGEN Network: Steven C. Hunt (Network Director), Janet Hood, Donna Arnett, John H. Eckfeldt, R. Curtis Ellison, Chi Gu, Gerardo Heiss, Paul Hopkins, Aldi T. Kraja, Jean-Marc Lalouel, Mark Leppert, Albert Oberman, Michael A. Province, D.C. Rao, Treva Rice and Robert Weiss. SAPPHIRe Network: David Curb (Network Director), David Cox, Timothy Donlon, Victor Dzau, John Grove, Kamal Masaki, Richard Myers, Richard Olshen, Richard Pratt, Tom Quertermous, Neil Risch and Beatriz Rodriguez. National Heart, Lung and Blood Institute: Dina Paltoo and Cashell E. Jaquish. Web Site: http://www.biostat.wustl.edu/fbpp/FBPP.shtml
Conflict of Interest statement. None declared.