Genome-wide association studies (GWAS) have replicably identified multiple loci associated with population-based plasma lipid concentrations1-5. Common genetic variants at these loci together explain <10% of the total variation of each lipid trait4,5. Rare variants of individually large effect may contribute additionally to the “missing heritability” of lipid traits6,7; however, it remains to be shown to what extent rare variants affect lipid phenotypes. Here, we demonstrate a significant accumulation of rare variants in GWAS-identified genes in patients with an extreme phenotype of abnormal plasma triglyceride (TG) metabolism. A GWAS of hypertriglyceridemia (HTG) patients revealed that common variants in the APOA5, GCKR, LPL and APOB genes were associated with the HTG phenotype at genome-wide significance. We subsequently resequenced the protein-coding regions of these genes and found a significant burden of 154 rare missense or nonsense variants in 438 HTG patients, in contrast to 53 variants in 327 controls (P=6.2×10⁻⁸); this corresponds to a carrier frequency of 28.1% of HTG patients and 15.3% of controls (P=2.6×10⁻⁵). Many rare variants were predicted in silico to have compromised function; additionally, some had previously demonstrated dysfunction in vitro. Rare variants in these 4 genes explained 1.1% of total variation in HTG diagnoses. Our study demonstrates a marked mutation skew that likely contributes to disease pathophysiology in patients with HTG.
Genome-wide association studies (GWAS) have replicably identified novel and known loci associated with population-based plasma lipid concentrations1-5. Despite the robustness of these associations, the proportion of variability explained by GWAS-identified loci is relatively modest, <10% in most studies4,5. While vastly expanded study sample sizes continue to reveal new associations, each newly associated variant has an incrementally smaller effect size and contributes only marginally to the cumulative variation of each lipid phenotype6. This suggests that GWAS of population-based subjects may be reaching their limits to explain genetic variation of complex traits. A question that has arisen is whether additional forms of genetic variation, such as rare variants of individually large effect, could contribute to the “missing heritability” of complex traits such as plasma lipid concentrations6,7. While the mechanistic basis for the association between lipid traits and most common variants discovered in GWAS is still largely unknown, it remains possible that rare variants in GWAS-identified genes may contribute significantly to lipid phenotypes.
Studying subjects at the extremes of a quantitative phenotype distribution has proven useful to identify functional rare variants8-12. Using missense-accumulation analysis in genes defined a priori as likely to contain rare variants, a burden of mutations can be quantified statistically in subjects with severe phenotypes, prior to functional assessment of each variant. Primary hypertriglyceridemia (HTG) is one such complex polygenic disease, broadly defined by fasting plasma TG concentrations >95th percentile13. Resequencing of TG-modulating candidate genes has implicated both common and rare variants in HTG disease pathophysiology9,14-16; however, the majority of phenotypic variation underlying severe HTG remains unattributed17. Our objectives were: 1) to perform an unbiased GWAS of patients with HTG to identify common variants associated with HTG; and 2) to resequence coding regions of candidate genes in loci reaching genome-wide significance to evaluate the burden of rare variants in HTG patients compared with controls. Here, we demonstrate that loci found to be associated with HTG by GWAS using common variants also harbour a significant excess of rare variants.
In total, 555 HTG patients and 1319 controls were included in two cohorts of the study: the GWAS cohort included 463 HTG patients and 1197 controls and the sequencing cohort included 438 HTG patients and 327 controls. HTG patients were unrelated subjects diagnosed with Fredrickson hyperlipoproteinemia (HLP) phenotypes 2B (MIM 144250), 3 (MIM 107741), 4 (MIM 144600) or 5 (MIM 144650), ascertained primarily from a single tertiary referral lipid clinic. The mean plasma TG concentration of HTG patients was 14.3 mmol/L. Controls had maximum recorded fasting plasma TG concentrations <2.3 mmol/L to exclude undiagnosed HTG. All study subjects were of self-declared European ancestry; subjects deviating from European ancestry as determined by multi-dimensional scaling using whole-genome SNP data were removed from sequencing analysis (Supplemental Fig. 1). As expected, clinical characteristics of HTG patients were less favorable than controls, with worse lipid profiles and an increased prevalence of type 2 diabetes (Table 1).
The HTG phenotype was tested for association with >2.1 million single nucleotide polymorphisms (SNPs) using an additive multivariate logistic regression model (Supplemental Fig. 2). This model appropriately adjusted for sex, body mass index, diabetes status and 10 principal components of ancestry (Supplemental Fig. 3). Four loci were significantly associated with HTG at genome-wide significance: APOA5, GCKR, LPL and APOB (Table 2). Most associations with HTG were mediated by the same genomic loci associated with fasting plasma TG concentration in population-based GWAS5: APOA5 and GCKR were associated at the same lead SNP, and LPL was associated with the same haplotype block. Conversely, the HTG-associated SNPs in APOB were ~123-kb upstream of the gene, perhaps consistent with the involvement of regulatory elements in the over-expression of TG-rich lipoproteins in HTG pathophysiology. Investigation of sub-threshold association signals did not provide any additional insight into novel HTG-associated genes.
Subsequently, we tested the hypothesis that common genetic variants in the remaining known TG-associated loci are similarly associated with HTG5. Only three loci were replicated at a Bonferroni-corrected significance threshold of P<0.005: MLXIPL, TRIB1, and ANGPTL3 (Table 2). Positive replication of these TG-associated loci, combined with trends towards significance at FADS (P=0.05) and NCAN (P=0.07), suggests that additional TG-modulating loci may also be involved in HTG pathophysiology; however, smaller effect sizes likely limit their detection.
We next hypothesized that HTG-associated genes would harbour rare variants related to HTG disease causation. The protein-coding sequences of APOA5, GCKR, LPL and exons 26 and 29 (67.8%) of APOB were resequenced in individual subjects as regions most likely to harbour protein-compromising mutations. Across the 4 genes, 80 distinct rare variants were identified with minor allele frequencies <1% in controls (Fig. 1, Supplemental Table 1). A significant accumulation of rare variants was identified in HTG patients (Table 3), including 154 total variants in 438 HTG diploid genomes compared to 53 total variants in 327 control diploid genomes (P=6.2×10⁻⁸), corresponding to a significantly increased carrier frequency of 28.1% in HTG patients versus 15.3% in controls (P=2.6×10⁻⁵). A more restricted analysis of rare variants found exclusively in either HTG patients or controls, deliberately removing all reported variants without demonstrated functional compromise, similarly revealed a significant burden of 47 variants in HTG patients compared to 9 variants in controls (P=2.4×10⁻⁵); this corresponds to a significantly increased carrier frequency of 10.3% in HTG patients compared to 2.8% in controls (P=4.4×10⁻⁵). HTG carriers’ fasting plasma TG concentrations ranged from 3.10 to 88.5 mmol/L, whereas control carriers’ fasting plasma TG concentrations ranged from 0.45 to 1.93 mmol/L. No discernible relationship was observed between attributes such as the gene, mutation type or mutation position and plasma TG concentration or HTG phenotype.
The strength of association between HTG and genomic loci did not predict the mutation accumulation observed in the resequenced genes. LPL harboured the largest relative proportion of rare variants, followed by GCKR, APOB and APOA5, with 30.9, 10.7, 9.3, and 4.5 rare variants per kilobase of coding sequence in HTG patients, whereas the same genes harboured 5.6, 2.7, 4.3 and 0.9 rare variants per kilobase of coding sequence in controls. The burden of rare variants found in HTG patients is highly suggestive of phenotype causation, supported by several truncation mutations, in silico predictions of deleterious effects, and bona fide characterized deleterious mutations (Supplemental Table 1). The majority of subjects carried only 1 rare variant; however, subjects with multiple rare variants were also significantly over-represented among HTG patients (6.6% HTG carriers versus 0.9% control carriers; P=3.7×10⁻⁵). Not all rare variants in HTG patients are necessarily sufficient to cause HTG; however, they likely contribute to the biochemical heterogeneity observed between patients. For instance, the APOB R3500W variant causes hypercholesterolemia18, but was found in an HTG patient with Fredrickson HLP phenotype 2B, defined by both plasma TG and total cholesterol in excess of the 95th percentile. In this individual case, APOB R3500W is more likely contributing to the elevated total cholesterol phenotype, but the mutation is part of the genetic background of this patient that led to his ascertainment through the lipid clinic. This patient exemplifies our working model that both common and rare genetic determinants in TG-associated genes together contribute to the phenotypic heterogeneity underlying HTG.
Finally, we assessed the contribution of genetic and clinical variables to the total variation in HTG diagnosis, in subjects common between GWAS and sequencing cohorts. A comprehensive logistic regression model including clinical variables and both common and rare genetic variants explained 41.6% of total variation in HTG diagnosis: clinical variables explained 19.7%, common genetic variants in 7 HTG-associated loci explained 20.8% and rare genetic variants in 4 HTG-associated loci explained 1.1%. These data suggest that rare variants found in 4 GWAS-identified genes incrementally contribute to the unexplained variation underlying HTG pathophysiology.
In summary, we performed a GWAS and resequencing of HTG-associated genes and found a significant accumulation of missense and nonsense mutations that contribute to the unexplained genetic component of HTG. Our results suggest that a complex genetic architecture of both common and rare variants in a spectrum of TG-associated genes is responsible for HTG. Future studies using high-throughput next generation sequencing are required to determine whether these associations extend to additional HTG-associated genes, including MLXIPL, TRIB1, and ANGPTL3, and TG-associated genes identified by epidemiological-scale GWAS of population-based samples. It also remains possible that rare variants in TG-modulating genes that have not yielded signals on GWAS, such as GPIHBP1 or LMF1, will further contribute to HTG phenotypes19,20. Functional analyses may more accurately define the extent of dysfunction of rare variants identified in HTG patients and their role in disease causation, while higher level analyses including gene-gene and gene-environment interactions will determine the combined impact of multiple genetic variants on plasma TG concentration in patients with HTG. Our study demonstrates that an accumulation of rare variants is present in GWAS-identified genes and that these contribute to the missing heritability of complex traits among individuals at the extreme of a lipid phenotype.
HTG patients were ascertained through tertiary referral lipid clinics. Controls were obtained from population-based studies including the Study of Health Assessment and Risk in Ethnic Groups21 and the Myocardial Infarction Genetics Consortium22 (Supplemental Methods). Biochemical analyses were conducted separately in each cohort, as previously described14,21,22.
All subjects were genotyped using the Affymetrix Genome-Wide Human SNP Array 6.0 (Affymetrix, Santa Clara, CA) according to protocols of the London Regional Genomics Centre (www.lrgc.ca) or the Broad Institute (http://www.broadinstitute.org/). Imputation was conducted using HapMap CEU phased haplotypes in MACH23. All genotypes were filtered for minor allele frequency >1%, Hardy-Weinberg P>0.0001, and 95% call rate or imputation quality r²>0.4. Identity-by-state calculations, multi-dimensional scaling, and association testing were conducted in PLINK24. Genome-wide significance was pre-specified as P<5×10⁻⁷; nominal significance for replication of known TG-associated SNPs was a Bonferroni-corrected threshold P<0.005. Covariates entered into all analyses included sex, body mass index, diabetes status and 10 principal components of ancestry as generated by Eigenstrat25,26.
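The genotype filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study, which relied on PLINK and MACH. The `SnpStats` container and the `passes_qc` function are hypothetical names introduced here, and the sketch assumes that the 95% call rate criterion applies to directly genotyped SNPs while the r²>0.4 criterion applies to imputed SNPs.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical container, not a PLINK/MACH type)."""
    snp_id: str
    maf: float               # minor allele frequency
    hwe_p: float             # Hardy-Weinberg equilibrium P value
    imputed: bool            # True if the genotype was imputed
    call_rate: float = 0.0   # fraction of called genotypes (genotyped SNPs)
    rsq: float = 0.0         # imputation quality r^2 (imputed SNPs)


def passes_qc(snp, min_maf=0.01, min_hwe_p=1e-4, min_call_rate=0.95, min_rsq=0.4):
    """Return True if the SNP meets the thresholds quoted in the text."""
    if snp.maf <= min_maf:        # requires minor allele frequency > 1%
        return False
    if snp.hwe_p <= min_hwe_p:    # requires Hardy-Weinberg P > 0.0001
        return False
    if snp.imputed:               # assumption: r^2 applies to imputed SNPs only
        return snp.rsq > min_rsq
    return snp.call_rate >= min_call_rate  # assumption: call rate for genotyped SNPs


# Illustrative values only; these are not data from the study.
example = [
    SnpStats("snp_a", maf=0.20, hwe_p=0.30, imputed=False, call_rate=0.99),
    SnpStats("snp_b", maf=0.005, hwe_p=0.50, imputed=False, call_rate=0.99),
    SnpStats("snp_c", maf=0.10, hwe_p=0.40, imputed=True, rsq=0.30),
]
kept = [s.snp_id for s in example if passes_qc(s)]
```

In this toy input, only the first SNP is retained: the second fails the allele frequency filter and the third fails the imputation quality filter.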
All genes were bidirectionally sequenced in individual samples using an ABI 3730 Automated DNA Sequencer and called using automated software (Applied Biosystems, Foster City, CA). Rare variants were manually curated, confirmed by repeat analysis, and annotated in silico for functional effects using Polyphen (http://genetics.bwh.harvard.edu/pph/). Only rare variants (minor allele frequency <1% in controls) causing missense or nonsense mutations were counted towards mutation accumulation. Rare variant accumulation was compared between HTG patients and controls using Fisher's exact test, defining nominal significance as a 2-sided P<0.05 (Supplemental Methods).
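As an illustration of the carrier-frequency comparison, the following Python sketch implements a two-sided Fisher's exact test on a 2×2 table using only the standard library. It is not the authors' analysis code, and the exact procedure is described in the Supplemental Methods. The carrier counts (123 of 438 HTG patients and 50 of 327 controls) are an assumption, back-calculated here by rounding the reported carrier frequencies of 28.1% and 15.3%; the function name `fisher_exact_two_sided` is likewise introduced for this example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the first cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Assumed carrier counts, reconstructed from the reported percentages.
htg_carriers, htg_total = 123, 438
ctl_carriers, ctl_total = 50, 327

p_value = fisher_exact_two_sided(
    htg_carriers, htg_total - htg_carriers,
    ctl_carriers, ctl_total - ctl_carriers,
)
print(f"two-sided P = {p_value:.2e}")
```

Because the counts are reconstructed from rounded percentages, the printed value should be of the same order as the reported P value but need not match it exactly.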
Total explained variation was calculated from the residuals of a multivariate logistic regression model including age, sex, body mass index, diabetes status, and HTG-associated common and rare variants as independent variables, using a SAS v9.2 (SAS Institute, Cary, NC) macro written for this purpose27 (Supplemental Methods).
We thank the London Regional Genomics Centre (David Carter, Gerry Barbe, and Kevin Kang) for their dedication to this project. We also thank the Myocardial Infarction Genetics Consortium (MIGen) study for the use of their genotype data as control data in our study. The MIGen study was funded by the U.S. National Institutes of Health and National Heart, Lung, and Blood Institute's STAMPEED genomics research program (R01 HL087676) and a grant from the National Center for Research Resources (U54 RR020278). This work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET; www.sharcnet.ca). CTJ is supported by a CIHR Banting and Best Canada Graduate Scholarship, a Heart and Stroke Foundation Program Grant, and a CIHR Vascular Research Fellowship. VS was supported by the Sigrid Juselius Foundation and by the Finnish Academy (grant #129494). SSA is supported by the Michael G. DeGroote Heart and Stroke Foundation of Ontario Chair and the Eli Lilly May Cohen Chair in Women's Health Research at McMaster University. RAH is supported by the Jacob J. Wolfe Distinguished Medical Research Chair, the Edith Schulich Vinet Canada Research Chair in Human Genetics (Tier I), the Martha G. Blackburn Chair in Cardiovascular Research, and operating grants from the Canadian Institutes for Health Research (MOP-13430, MOP-79523, CTP-79853), the Heart and Stroke Foundation of Ontario (NA-6059, T-6018, PRG-4854), the Pfizer Jean Davignon Distinguished Cardiovascular and Metabolic Research Award, and Genome Canada through the Ontario Genomics Institute.