This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Blood lipid levels including low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG) are highly heritable. Genome-wide association is a promising approach to map genetic loci related to these heritable phenotypes.
In 1087 Framingham Heart Study Offspring cohort participants (mean age 47 years, 52% women), we conducted genome-wide analyses (Affymetrix 100K GeneChip) for fasting blood lipid traits. Total cholesterol, HDL-C, and TG were measured by standard enzymatic methods and LDL-C was calculated using the Friedewald formula. The long-term averages of up to seven measurements of LDL-C, HDL-C, and TG over a ~30 year span were the primary phenotypes. We used generalized estimating equations (GEE), family-based association tests (FBAT) and variance components linkage to investigate the relationships between SNPs (on autosomes, with minor allele frequency ≥10%, genotypic call rate ≥80%, and Hardy-Weinberg equilibrium p ≥ 0.001) and multivariable-adjusted residuals. We pursued a three-stage replication strategy of the GEE association results with 287 SNPs (P < 0.001 in Stage I) tested in Stage II (n ~1450 individuals) and 40 SNPs (P < 0.001 in joint analysis of Stages I and II) tested in Stage III (n~6650 individuals).
Long-term averages of LDL-C, HDL-C, and TG were highly heritable (h² = 0.66, 0.69, 0.58, respectively; each P < 0.0001). Of 70,987 tests for each of the phenotypes, two SNPs had p < 10⁻⁵ in GEE results for LDL-C, four for HDL-C, and one for TG. For each multivariable-adjusted phenotype, the number of SNPs with association p < 10⁻⁴ ranged from 13 to 18 and with p < 10⁻³, from 94 to 149. Some results confirmed previously reported associations with candidate genes, including variation in the lipoprotein lipase gene (LPL) and HDL-C and TG (rs7007797; P = 0.0005 for HDL-C and 0.002 for TG). The full set of GEE, FBAT and linkage results is posted at the database of Genotypes and Phenotypes (dbGaP). After three stages of replication, there was no convincing statistical evidence for association (i.e., combined P < 10⁻⁵ across all three stages) between any of the tested SNPs and lipid phenotypes.
Using a 100K genome-wide scan, we have generated a set of putative associations for common sequence variants and lipid phenotypes. Validation of selected hypotheses in additional samples did not identify any new loci underlying variability in blood lipids. Lack of replication may be due to inadequate statistical power to detect modest quantitative trait locus effects (i.e., <1% of trait variance explained) or reduced genomic coverage of the 100K array. GWAS in FHS using a denser genome-wide genotyping platform and a better-powered replication strategy may identify novel loci underlying blood lipids.
Blood lipid levels are a major contributor to atherosclerotic cardiovascular disease. Current evidence suggests that blood lipids are complex genetic phenotypes, influenced by both environmental and genetic factors. Heritability estimates for blood lipids are high, including ~40–60% for high-density lipoprotein cholesterol (HDL-C), ~40–50% for low-density lipoprotein cholesterol (LDL-C), and ~35–48% for triglycerides (TG). These estimates indicate that DNA sequence variation plays an important role in explaining inter-individual variation in blood lipid levels. Indeed, sequence variants in individual genes have been consistently related to blood lipid phenotypes, including APOE/PCSK9 with LDL-C [3-5], CETP/LIPC/LPL with HDL-C [6-9], and APOA5/LPL with TG [10,11], among others. However, the extent to which common genetic variants across the genome account for total variation in blood lipid levels is unknown.
Recent advances in genomics enable a genome-wide association study (GWAS), an approach in which a substantial fraction of common genetic variation is tested for a role in determining phenotypic variation. These advances include a map of the correlation structure for approximately 4 million common genetic variants (minor allele frequency >5%) and whole-genome genotyping technologies capable of assaying 100,000–500,000 single nucleotide polymorphisms (SNPs) in an individual. Utilizing a fixed genotyping marker set such as the Affymetrix 100K GeneChip in an association study tests a substantial fraction of the genome in whites, ~30–45% in some estimates. GWAS has been successfully applied to identify novel genetic loci related to several medical phenotypes including age-related macular degeneration, inflammatory bowel disease, and electrocardiographic QT interval. Identifying novel genetic variants related to blood lipid phenotypes may provide new drug targets to alter blood lipid levels and may aid in the prediction of cardiovascular disease.
We hypothesized that common genetic variants explain a proportion of the inter-individual variability in LDL-C, HDL-C, and TG. Accordingly, we conducted genome-wide linkage and association studies for these three phenotypes in Framingham Heart Study (FHS) participants.
Of the 1345 FHS participants who are part of the family plate set (see Executive Summary), we focused our analyses on the 1087 participants from the Offspring cohort who had Affymetrix 100K genotypes. Lipid phenotypes were measured at various examinations as described in Table 1. Each study participant provided written informed consent for genetic analyses and the study was approved by Boston University's Institutional Review Board.
Blood lipids were measured from fasting venous blood collected at each of seven clinical examination time points extending from 1971 to 2001. Total cholesterol, HDL-C, and TG were measured by standard enzymatic methods. LDL-C was calculated using the Friedewald formula, with a missing value assigned for participants with a measured TG > 400 mg/dL. Clinical covariates utilized in phenotypic regression modeling included age at the time of blood lipid measurement, age², body mass index (weight in kg divided by the square of the height in m), alcohol intake (drinks per week), current cigarette smoking (yes, no), menopausal status (postmenopausal yes, no), and hormone replacement therapy (yes, no).
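As a minimal illustration of the LDL-C calculation described above, the sketch below applies the standard Friedewald formula (LDL-C = total cholesterol − HDL-C − TG/5, all in mg/dL) and returns a missing value when TG exceeds 400 mg/dL. The function name and the use of `None` for a missing value are illustrative assumptions, not part of the study's analysis code.

```python
from typing import Optional


def friedewald_ldl(total_chol: float, hdl: float, tg: float) -> Optional[float]:
    """Estimate LDL-C (mg/dL) from fasting total cholesterol, HDL-C and TG (mg/dL).

    Returns None (missing) when TG > 400 mg/dL, where the formula is not applied.
    """
    if tg > 400:
        return None
    # TG/5 approximates VLDL cholesterol when all values are in mg/dL.
    return total_chol - hdl - tg / 5.0


if __name__ == "__main__":
    print(friedewald_ldl(200.0, 50.0, 150.0))  # ordinary fasting profile
    print(friedewald_ldl(200.0, 50.0, 450.0))  # TG above the 400 mg/dL limit
```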
Commonly used lipid-lowering therapies affect total cholesterol and TG. To account for treatment effect, we imputed total cholesterol and TG values for those treated with lipid-lowering therapy. The imputation procedure was modeled after prior work on imputing blood pressure values for those on antihypertensive medication. For each treated individual, a correction factor was added to the observed (treated) lipid value (total cholesterol or TG). This correction factor consisted of the difference between an "expected" residual and the "calculated" residual. The "calculated" residual for each individual was generated in a sex-specific manner after adjustment for age, age², age³, and examination year (by decade). The "expected" residual was generated within each sex and 10-year age group as the average of the "calculated" residuals equal to or greater than the treated individual's "calculated" residual.
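The correction step can be sketched as follows. This is only an illustration of the arithmetic under stated assumptions: the residuals are taken as already computed, the stratum (same sex and 10-year age group) is assumed to include the treated individual's own residual, and the function and argument names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def impute_untreated_value(observed: float,
                           calculated_residual: float,
                           stratum_residuals: Sequence[float]) -> float:
    """Add a treatment correction factor to an observed (treated) lipid value.

    stratum_residuals: "calculated" residuals of everyone in the same sex and
    10-year age group (assumed here to include the treated individual).
    """
    # "Expected" residual: mean of the stratum residuals that are equal to
    # or greater than the treated individual's own residual.
    at_or_above = [r for r in stratum_residuals if r >= calculated_residual]
    expected_residual = mean(at_or_above)
    correction = expected_residual - calculated_residual
    return observed + correction


if __name__ == "__main__":
    stratum = [-20.0, 0.0, 10.0, 30.0, 50.0]
    print(impute_untreated_value(250.0, 10.0, stratum))
```

Because the "expected" residual averages only values at or above the individual's own residual, the correction is never negative, so the imputed value is at least as high as the observed treated value.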
Lipoprotein subclass profiles were measured by a commercially available proton NMR spectroscopic assay (LipoScience, Raleigh, NC) on plasma samples stored at -70°C as described previously. The particle concentrations of the following 9 lipoprotein species were determined: 3 VLDL subclasses [large, >60 nm (including chylomicrons); intermediate, 35–60 nm; small, 27–35 nm]; 3 LDL subclasses (IDL, 23–27 nm; large LDL, 21.3–23 nm; small LDL, 18.3–21.2 nm); and 3 HDL subclasses (large, 8.8–13 nm; intermediate, 8.2–8.8 nm; small, 7.3–8.2 nm). The small LDL subclass comprises the sum of subclasses formerly labeled "intermediate" (19.8–21.2 nm) and "small" (18.3–19.7 nm), since concentrations of both have very similar relations to lipid levels.
All analyses were based on the Affymetrix 100K GeneChip genotyping data generated in Framingham Heart Study participants as described previously. In order to minimize false positive associations due to genotyping artifact, we limited our analyses to SNPs with a genotyping call rate ≥80% and a Hardy-Weinberg equilibrium P ≥ 0.001. Given lower statistical power to detect associations with rarer SNPs, we limited our results to SNPs with a minor allele frequency ≥10%.
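The three SNP filters can be illustrated with a short sketch. The thresholds are those stated above; the Hardy-Weinberg test shown is a simple one-degree-of-freedom chi-square test, which is an assumption made for illustration (the text does not specify which test was used), and the function names are hypothetical.

```python
import math


def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa: int, n_ab: int, n_bb: int, n_attempted: int,
              min_call_rate: float = 0.80,
              min_hwe_p: float = 0.001,
              min_maf: float = 0.10) -> bool:
    """Apply call-rate, minor-allele-frequency and HWE filters to one SNP."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / n_attempted
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    if call_rate < min_call_rate or maf < min_maf:
        return False
    return hwe_chi2_p(n_aa, n_ab, n_bb) >= min_hwe_p


if __name__ == "__main__":
    print(passes_qc(360, 480, 160, n_attempted=1000))  # in HWE, common, well called
    print(passes_qc(500, 0, 500, n_attempted=1000))    # no heterozygotes at all
```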
TG levels were log-transformed to approximate a normal distribution. For each blood lipid phenotype, the long-term average of 4 to 7 serial measurements was used as the primary phenotype. Participants contributing fewer than 4 of 7 measures of a given phenotype were excluded from that analysis. MeanLDL-C, MeanHDL-C, and MeanTG were adjusted for covariates in sex-specific linear regression models. Two sets of phenotypic models were created: Model 1 (age, age²) and Model 2 (age, age², body mass index, alcohol intake, cigarette smoking, menopausal status, and hormone replacement therapy). For quantitative covariates (age, body mass index, and alcohol intake), the mean value across examinations was used as a covariate. For categorical covariates, the proportion of exams scored as 'yes' was used. The residual MeanLDL-C, MeanHDL-C, and MeanlogTG values from Model 1 and Model 2 served as the primary phenotypes.
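The construction of a long-term average phenotype can be sketched as below, assuming missing examinations are coded as `None`. For TG, the sketch averages log-transformed values, which is one reading of the "MeanlogTG" label; the function name and the data layout are illustrative assumptions, and covariate adjustment is not shown.

```python
import math
from typing import Optional, Sequence


def long_term_mean(measurements: Sequence[Optional[float]],
                   min_measures: int = 4,
                   log_transform: bool = False) -> Optional[float]:
    """Average serial measurements; return None if fewer than min_measures exist."""
    observed = [m for m in measurements if m is not None]
    if len(observed) < min_measures:
        return None  # participant excluded from the analysis of this phenotype
    if log_transform:
        observed = [math.log(m) for m in observed]
    return sum(observed) / len(observed)


if __name__ == "__main__":
    hdl_by_exam = [100.0, None, 110.0, 120.0, None, 130.0, None]
    print(long_term_mean(hdl_by_exam))                    # four of seven exams present
    print(long_term_mean([100.0, 110.0, 120.0] + [None] * 4))  # only three exams
```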
For genotype-phenotype association analyses, we assumed an additive model of inheritance. We conducted multivariable linear regression using GEE, family-based association testing using FBAT, and linkage using Merlin for computation of IBDs and SOLAR for variance component models as described in the Executive Summary.
Heritability estimates for the lipid phenotypes were obtained from extended families with at least two members by variance-components methods using the Sequential Oligogenic Linkage Analysis Routines (SOLAR) package. Using this approach, maximum-likelihood estimation was applied to a mixed-effects model that incorporated fixed covariate effects, additive genetic effects, and residual error. The additive genetic effects and residual errors were assumed to be normally distributed and to be mutually independent. The analyses were performed using residuals from the multivariable models (Model 1 and Model 2) mentioned above. For phenotypes with kurtosis > 1, heritability estimates were computed on ranked normalized deviates.
Replication genotyping was attempted in three independent sample sets: a) the FHS unrelated plate set; b) Genetics of Lipid Lowering Drugs and Diet Network (GOLDN); and c) Malmö Diet and Cancer Study – Cardiovascular Cohort (MDC-CC).
The second stage consisted of ~1450 biologically unrelated individuals from the FHS unrelated plate set. The third stage consisted of ~1450 participants from GOLDN and ~5200 participants from MDC-CC. GOLDN is a family-based sample recruited from two field centers of the National Heart, Lung, and Blood Institute's Family Heart Study (Minneapolis, MN and Salt Lake City, UT). The Family Heart Study is a multi-center, population-based cohort designed to study the genetic and environmental determinants of cardiovascular disease.
The MDC study is a community-based prospective epidemiologic cohort of 28,098 persons recruited for a baseline examination between 1991 and 1996. From this cohort, 6103 persons were randomly selected to participate in the MDC-CC, which sought to investigate risk factors for cardiovascular disease. Of the MDC-CC participants, 5466 had DNA and lipid phenotypes available. Individuals on lipid-lowering therapy and those with outlier values of LDL-C, HDL-C, or TG (top 0.5% of the distribution) were excluded, leaving 5212 individuals available for the SNP-lipid association analyses.
For follow-up into Stage II (the FHS unrelated plate set), we selected all SNPs in the GWAS with an association P < 0.001 for the MeanLDL-C, MeanHDL-C, or MeanTG phenotypes from the minimally-adjusted phenotypic model (Model 1, adjustment for age and age² only). We next conducted a joint analysis of Stage I (GWAS 100K data) and Stage II (FHS unrelated plate set). The joint analysis consisted of a weighted average of the beta estimates and standard errors from Stages I and II and used the inverse of the variance in each stage as weights.
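The inverse-variance weighting used in the joint analysis can be illustrated with a minimal sketch. Each stage contributes a beta estimate and its standard error, and the weight is 1/SE². The two-sided P value from a normal approximation and the function name are assumptions made for illustration; the text does not state how the joint P value was computed.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(
        stages: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Combine per-stage (beta, standard error) pairs by inverse-variance weighting.

    Returns the pooled beta, its standard error, and a two-sided P value
    based on a normal approximation (an assumption of this sketch).
    """
    weights = [1.0 / se ** 2 for _, se in stages]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, stages)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


if __name__ == "__main__":
    # Hypothetical per-stage estimates for one SNP: (beta, standard error).
    stage_1 = (2.0, 1.0)
    stage_2 = (4.0, 1.0)
    print(inverse_variance_meta([stage_1, stage_2]))
```

The same function extends to three stages by passing a third (beta, standard error) pair, since the weighting treats every stage identically.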
For follow-up into Stage III (GOLDN and MDC-CC), we selected for genotyping all SNPs with a P < 0.001 in the joint analysis of Stages I and II. For genotype-phenotype association analyses in MDC-CC and GOLDN, we assumed an additive model of inheritance. In MDC-CC, we conducted multivariable linear regression analyses to test the null hypothesis that LDL-C, HDL-C, or TG residuals (sex-specific residuals adjusted for age and age²) did not differ by increasing minor allele copy number. In GOLDN, to account for correlated observations due to family relationships, we used linear mixed-effects methods in SOLAR.
To summarize the statistical evidence for association for each SNP across all three stages, we repeated the inverse-variance weighted averaging of beta estimates and standard errors described above, this time across Stages I, II, and III.
Clinical characteristics of the FHS sample of 1345 subjects are presented in the Executive Summary. Table 1 displays the variables that were studied in our analyses of lipid phenotypes. Further information on these phenotypes can be found at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007. Since Original cohort members were non-fasting at examination, our analyses considered only the 1087 Offspring Study participants with fasting lipid measurements and Affymetrix 100K SNP genotypes. For this paper we focus on longitudinal mean levels of serially measured values (minimum of 4, maximum of 7) of LDL-C, HDL-C, and TG (labeled MeanLDL-C, MeanHDL-C, and MeanTG).
Heritability estimates for long-term average lipid phenotypes (MeanLDL-C, MeanHDL-C, and MeanTG) were greater than those from single time-point measurements (Table 1). For example, the heritabilities of MeanLDL-C, MeanHDL-C, and MeanTG were 0.66, 0.69, and 0.58, respectively, whereas heritabilities for LDL-C, HDL-C, and TG measured at FHS Examination 1 (a single time-point) were 0.59, 0.52, and 0.48, respectively. The highest heritability estimate for any available lipid phenotype was that for lipoprotein (a) at 0.90.
From the GEE analyses, the strongest associations for MeanLDL-C, MeanHDL-C, and MeanTG were for SNPs rs287474 (p = 6.3 × 10⁻⁹), rs524802 (p = 7.6 × 10⁻⁷), and rs7007075 (p = 7.7 × 10⁻⁶), respectively (Table 2a). From the FBAT analyses, the strongest associations for MeanLDL-C, MeanHDL-C, and MeanTG were for SNPs rs287474 (p = 1.4 × 10⁻⁸), rs10495594 (p = 5.1 × 10⁻⁵), and rs1449866 (p = 1.8 × 10⁻⁵), respectively (Table 2b). For each multivariable-adjusted phenotype, the number of SNPs with a GEE association p < 10⁻⁴ ranged from 13 to 18 and with p < 10⁻³, from 94 to 149. The number of SNPs with FBAT association p < 10⁻⁴ ranged from 2 to 5 and with p < 10⁻³, from 74 to 79.
Linkage LOD scores > 2.0 are presented in Table 2c. The best evidence for linkage was a peak LOD score of 3.3 on chromosome 7 for the MeanHDL-C phenotype.
Because the prior probability of any SNP relating to a phenotype is low and given the number of tests, the P value distribution in a GWAS should approach a null distribution. Any strong departure from this expectation might suggest artifacts in genotyping or analysis. For the 70,987 SNPs that passed quality-control filters, the distribution of association P values (generated by the GEE methodology) approached a null distribution but with a slight excess of low P values. For example, for MeanLDL-C, whereas one would expect 1% of SNPs to demonstrate a P < 0.01 by chance, we found that 1.34% of SNPs displayed a P < 0.01. Similar results were seen for MeanHDL-C and MeanTG (data not shown).
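To make the size of this excess concrete: with 70,987 SNPs, a uniform null distribution would give about 710 SNPs with P < 0.01, whereas 1.34% corresponds to about 951. The sketch below shows one simple way to compute the observed fraction below a threshold and its ratio to the null expectation from a list of P values; the function name and the toy P values are hypothetical.

```python
from typing import Sequence, Tuple


def low_p_excess(p_values: Sequence[float],
                 threshold: float = 0.01) -> Tuple[float, float]:
    """Return (observed fraction of P values below threshold, ratio to expected).

    Under a uniform null distribution the expected fraction equals the threshold.
    """
    n_below = sum(1 for p in p_values if p < threshold)
    observed_fraction = n_below / len(p_values)
    return observed_fraction, observed_fraction / threshold


if __name__ == "__main__":
    n_snps = 70_987
    expected_count = round(n_snps * 0.01)    # null expectation at P < 0.01
    observed_count = round(n_snps * 0.0134)  # fraction reported for MeanLDL-C
    print(expected_count, observed_count)

    toy_p_values = [0.001, 0.5, 0.2, 0.9]    # made-up values for illustration
    print(low_p_excess(toy_p_values))
```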
We evaluated the association results for each SNP across a set of four correlated phenotypes: ApoA-I, LDLNMRsm, MeanHDL-C, and MeanTG (Table 3). Several SNPs were associated with P < 0.01 for 3 of the 4 phenotypes.
Among the GEE association results, a SNP (rs7007797) in the lipoprotein lipase gene (LPL) was associated with MeanHDL-C (p = 0.0005) and MeanTG (p = 0.002) (Table 4). This SNP is a perfect proxy (r² = 1) for the previously studied rs328 (also known as S447X). The minor allele of rs328 has been consistently related to higher HDL-C and lower TG. The direction of effect for SNP rs7007797 in our dataset was consistent with previous observations. Due to a lack of SNPs in the Affymetrix 100K GeneChip correlated with previously reported variants (at an r² > 0.5 threshold) in the APOE, PCSK9, CETP, LIPC, and APOA5 genes, we were unable to confirm these other previously reported associations (Table 4).
Replication is critical to distinguish true positives from false ones in a GWAS. We pursued a three-stage replication strategy with 287 SNPs (P < 0.001 in Stage I) tested in Stage II (n ~1450 individuals) and 40 SNPs (P < 0.001 in joint analysis of Stages I and II) tested in Stage III (n ~6650 individuals). Results are displayed in Table 5. After three stages of replication, there was no convincing statistical evidence for association (i.e., joint analysis of Stages I, II, and III P < 10⁻⁵) between any of the tested SNPs and lipid phenotypes.
We examined associations of Affymetrix 100K SNPs and lipid traits in FHS and identified putative associations with lipid phenotypes. We studied the long-term average of up to 7 measurements each of LDL-C, HDL-C, and TG as the primary phenotypes, and for one phenotype, MeanLDL-C, we observed a nominal P value that surpassed the threshold for genome-wide significance. However, validation of selected hypotheses in additional samples did not identify any new loci underlying variability in blood lipids.
GWAS offers the potential to identify novel genetic variants/loci that are associated with blood lipid variation, unlimited by our current knowledge of lipoprotein biology. However, a central limitation of GWAS is that the true signals are mixed amidst a large number of false positive results. Validation in additional samples is required to distinguish the true positives from the false ones.
Replication of initial GWAS findings using a staged design has been suggested to minimize genotyping cost and maximize statistical power [23,24]. An important consideration in such a design is the proportion of markers taken forward to a second stage. We estimated the statistical power for our three-stage GWAS strategy. Assuming a modest number of markers (all SNPs with P < 0.001 for each phenotype, ~0.1% of markers) are taken forward to Stage II, a second stage sample size of 1450, that SNPs with P < 0.001 are taken forward from Stage II to Stage III, a Stage III sample size of 6650, and that the final alpha (after Stages I, II, and III) is set at a conservative 5 × 10⁻⁸, we estimated that we had 89% power to detect a quantitative trait locus explaining 2% of phenotypic variance, 48% power to detect a locus explaining 1% of the variance, and 13% power to detect a locus explaining 0.5% of the variance.
With our replication effort, we failed to identify any novel loci related to blood lipids. At least two potential explanations are possible. First, our study design had limited statistical power to detect common SNPs that explain ≤1% of trait variance. In the Diabetes Genetics Initiative genome-wide association study for blood lipid traits, we recently showed that for lipid traits, there are few common variants that explain >2% of the variance and most SNPs explain <1% of trait variance. To have adequate statistical power to detect these effects given an initial GWAS sample size of ~1000, many more markers (i.e., hundreds of SNPs) will need to be taken to the second and third stages. Second, the limited genomic coverage of the Affymetrix 100K array may have limited our ability to replicate previously reported loci and discover novel loci. For example, using the Affymetrix 500K array, we recently identified glucokinase regulatory protein (GCKR) as a novel locus associated with TG. Of any SNP on the 500K array, an intronic GCKR SNP (rs780094) explained the greatest proportion of blood TG variance in the Diabetes Genetics Initiative study. However, on the Affymetrix 100K array, there are no SNPs within the 60 kb spanning GCKR.
This study is distinguished by the availability of serial lipid phenotypes over a 30-year time span, the community-based nature of the collection, and the routine ascertainment of covariates in a standardized clinical examination. We acknowledge several limitations. These include the lack of validation for the imputation methodology used to address lipid lowering therapy, limited statistical power due to sample size, and confinement to a single ancestral group – whites of European ancestry.
Using a 100K genome-wide scan, we present association and linkage results for a rich set of lipid phenotypes in FHS. This resource may be useful for comparisons with other GWAS currently in progress. GWAS in FHS using a denser genome-wide genotyping platform and a better-powered replication strategy may identify novel loci underlying blood lipids.
FBAT = family-based association test; GEE = generalized estimating equations.
The authors declare that they have no competing interests.
SK, AM, SD, GP, JMO, and LAC participated in the design of the study and the interpretation of the data. AM, GP, and SD conducted the statistical analyses. SK drafted the manuscript. AS, CG, LG, and NPB generated replication genotype data and analyses. OM and MOM provided replication samples and conducted association analyses in the Malmö Diet and Cancer Study. DKA and JMO provided replication samples and led the generation of lipid phenotypes in the GOLDN study. SD, SK, RD, JMO, and LAC revised the manuscript critically for important intellectual content. All authors read and approved the above manuscript.
We thank the Framingham Heart Study participants for their long-term voluntary commitment to this study. The Framingham Heart Study is supported by a contract from the National Heart, Lung and Blood Institute (contract No. N01-HC-25195). We acknowledge Dr. Michael Christman, Dr. Alan Herbert and colleagues at Boston University who conducted the Affymetrix 100K genotyping and have made these data publicly available. A portion of the research was conducted using the Boston University Linux Cluster for Genetic Analysis (LinGA) funded by the NIH NCRR (National Center for Research Resources) Shared Instrumentation grant (1S10RR163736-01A1). Dr. Ordovas is supported by contracts 53-K06-5-10 and 58-1950-9-001 from the US Department of Agriculture Research Service. Dr. Kathiresan is funded by the Doris Duke Charitable Foundation Clinical Scientist Development Award, the Fannie E. Rippel Foundation, and NIH K23 HL083102.
This article has been published as part of BMC Medical Genetics Volume 8 Supplement 1, 2007: The Framingham Heart Study 100,000 single nucleotide polymorphisms resource. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2350/8?issue=S1.