Rare variant association studies (RVAS) target the class of genetic variation with frequencies less than 1%. Recently, investigators have used exome sequencing in RVAS to identify rare alleles responsible for Mendelian diseases but have experienced greater difficulty discovering such alleles for complex diseases. In this review, we describe what we have learned about lipoprotein metabolism and coronary heart disease (CHD) through the conduct of RVAS.
Rare protein-altering genetic variation can provide important insights that are not as easily attainable from common variant association studies. First, RVAS can facilitate gene discovery by identifying novel rare protein-altering variants in specific genes that are associated with disease. Second, rare variant associations can provide supportive evidence for putative drug targets for novel therapies. Finally, rare variants can uncover new pathways and reveal new biologic mechanisms.
The field of human genetics has already made tremendous progress in understanding lipoprotein metabolism and the etiology of CHD in the context of rare variants. As next generation sequencing becomes more cost-effective, RVAS with larger sample sizes will be conducted. This will lead to more novel rare variant discoveries, and the translation of genomic data into biological knowledge and clinical insights for cardiovascular disease.
The frequency of a variant can be classified into four categories: fixed, common, low frequency, and rare. Fixed variants, also called monomorphic, do not vary among individuals in a population, whereas common single nucleotide variants (SNVs) [frequency ≥ 5%] represent the other end of the allele frequency spectrum, where the minor allele is most frequent. Because of their high frequency, common SNVs are typically captured using standard genotyping arrays, and potential disease loci can be detected more easily in populations of moderate size. Furthermore, low-frequency variants (frequency between 0.5% and 5%) have been included on most genotyping arrays, and imputation of low-frequency variants is now possible in most human populations. Unlike low-frequency variants, rare variants (frequency < 0.5%) are seen in one or a few individuals and are optimally detected using high-throughput sequencing approaches.
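The four frequency categories above can be written as a small lookup. This is only a minimal sketch: it assumes the frequency is a minor allele frequency between 0 and 0.5, treats a frequency of exactly 0 as fixed (monomorphic), and assigns the 0.5% boundary to the low-frequency class. The function name `classify_variant` is hypothetical.

```python
def classify_variant(maf: float) -> str:
    """Classify a variant by minor allele frequency (MAF).

    Thresholds follow the text: common >= 5%, low frequency 0.5-5%,
    rare < 0.5%. Treating MAF == 0 as "fixed" and placing the 0.5%
    boundary in "low frequency" are assumptions of this sketch.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    if maf == 0.0:
        return "fixed"
    if maf < 0.005:
        return "rare"
    if maf < 0.05:
        return "low frequency"
    return "common"


if __name__ == "__main__":
    # Illustrative frequencies only; these are not values from any study.
    for maf in (0.0, 0.001, 0.01, 0.2):
        print(maf, classify_variant(maf))
```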
Protein-altering, or non-synonymous, variants are those SNVs predicted to change the amino acid sequence of a protein. A special class of rare protein-altering variants are those that result in a complete loss of function (LoF). Complete LoF mutations are SNVs predicted to create a premature stop codon (nonsense) or disrupt a splice site, and indels predicted to alter the reading frame of a protein (frameshift). These variants are likely to be under strong negative selection and to have a strong deleterious impact on protein function.
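As a rough illustration of these definitions, the sketch below flags a variant as protein-altering or complete LoF from a predicted consequence label. The label strings (`"nonsense"`, `"splice_site"`, `"frameshift"`, `"missense"`) are hypothetical placeholders and do not correspond to the vocabulary of any particular annotation tool.

```python
# Hypothetical consequence labels; a real annotation pipeline would
# use its own vocabulary and would need to be mapped onto these sets.
LOF_CONSEQUENCES = {"nonsense", "splice_site", "frameshift"}
PROTEIN_ALTERING = LOF_CONSEQUENCES | {"missense"}


def is_protein_altering(consequence: str) -> bool:
    """True if the variant is predicted to change the protein sequence."""
    return consequence in PROTEIN_ALTERING


def is_complete_lof(consequence: str) -> bool:
    """True for nonsense, splice-site, and frameshift variants."""
    return consequence in LOF_CONSEQUENCES
```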
Common variant association studies (CVAS), sometimes referred to as genome-wide association studies (GWAS), typically target the class of genetic variation with frequencies greater than 1%. These studies rely on genotyping arrays and have been successful in identifying common genetic variation associated with complex traits and diseases, including lipid metabolism and coronary heart disease (CHD) [5–9]. The first wave of CVAS discovered an association with CHD at the 9p21 locus [5, 6, 8, 9]. Subsequently, larger studies have identified more than 160 loci associated with blood lipids [9–12] and 58 loci associated with CHD.
Loci discovered from CVAS can span several megabases and involve dozens of genes (due to linkage disequilibrium). Fine-mapping and functional validation studies have helped refine association signals. However, such studies are time-consuming, and in a number of cases it remains difficult to localize the specific variants and/or genes underlying disease associations.
Rare variant association studies (RVAS), on the other hand, target the class of genetic variation that is generally missed by CVAS; that is, these studies can capture SNVs with frequencies less than 1%. These studies typically rely on sequencing technologies. Recently, several large-scale RVAS have been performed using next generation sequencing coupled with targeted capture of the protein-coding regions (“exome sequencing”). To date, investigators have used exome sequencing to identify rare alleles responsible for Mendelian diseases, but have experienced greater difficulty discovering such alleles for complex diseases.
RVAS has several advantages over CVAS. In Table 1, we highlight the advantages and disadvantages of RVAS compared to CVAS. Rare protein-altering genetic variation can provide impactful biological and clinical insights that common variants studied in CVAS cannot. First, RVAS can facilitate gene discovery by identifying novel rare protein-altering variants associated with disease. Second, rare variant associations can provide supportive evidence for putative drug targets for novel therapies. Finally, rare variants can uncover new biology involved in disease pathways. In the present review, we discuss each of these impacts that RVAS has had in informing lipoprotein metabolism and CHD.
By studying rare protein-altering variants segregating in families with putative Mendelian dyslipidemias, one can implicate causal variants and genes for disease. An example is a study by Musunuru et al. that examined the exome sequences of two siblings with very low levels of plasma low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C) and triglycerides. The two siblings had both copies of the angiopoietin-like 3 (ANGPTL3) gene knocked out by two different nonsense mutations, p.S17X and p.E129X (i.e., they were compound heterozygotes), and both mutations affected LDL-C and triglycerides in a manner that depended on the number of mutations carried. The ANGPTL3 protein is secreted by the liver, is involved in the regulation of extracellular lipases, and is associated with lipid levels in mice. More recently, a larger study of nine families and 32 unrelated individuals has confirmed the association of the p.S17X variant with Familial Combined Hypolipidemia (FCH). FCH causes a global reduction of plasma lipoproteins. Specifically, FCH is characterized by a reduction in apolipoprotein (Apo) B-containing [very low-density lipoprotein (VLDL) and low-density lipoprotein (LDL)] and apoA-1-containing [high-density lipoprotein (HDL)] lipoproteins. Homozygotes for the p.S17X variant had reduced plasma lipids and no circulating ANGPTL3. A significant carrier effect was observed for the p.S17X variant and ANGPTL3 levels, with carriers having a 42% reduction in ANGPTL3 levels compared to non-carriers.
Despite this early success, using exome sequencing to discover novel inherited causes of Mendelian disorders can be difficult in instances where inheritance patterns are not recessive. A recent study performed whole exome sequencing on 213 individuals from 41 families with extreme LDL-C or HDL-C levels. The study successfully identified putative pathogenic variants in nine (22%) families, all in known lipid-mediated genes. Although novel variants were not discovered in this study, the study highlighted potential limitations of detecting rare variants in familial settings. These limitations include complex inheritance patterns and the overrepresentation of shared alleles within families, which makes causal alleles difficult to distinguish.
Recently, large-scale efforts have applied exome sequencing to study rare variants (frequency < 1%) in population-based samples. By studying the extremes of quantitative traits such as LDL-C, Lange et al. showed that carrying rare variants in patatin-like phospholipase domain-containing protein 5 (PNPLA5) increased LDL-C by 43.1 mg/dl compared to non-carriers [21*]. This finding was unique in that it was only observed in African Americans. Hence, population-specific signals can be captured by incorporating samples from different ethnicities in the study design of RVAS.
Exome sequencing efforts have also been applied to individuals who present with myocardial infarction (MI) at a young age. Do et al. sequenced more than 9,700 exomes from samples with early-onset MI and MI-free controls. From this effort, rare variant association signals were discovered for early-onset MI in two genes: low-density lipoprotein receptor [LDLR] and apolipoprotein A-V [APOA5]. In this study, we demonstrated that by focusing on rare protein-altering variants, one can implicate the causal variant and gene, unlike CVAS, where it is often difficult to point to a credible set of variants. For example, in this study, APOA5 was established as a bona fide MI gene. Although common variants in the apolipoprotein A-1 [APOA1]-apolipoprotein C-III [APOC3]-apolipoprotein A-4 [APOA4]-apolipoprotein A-5 [APOA5] (abbreviated: APOA1-C3-A4-A5) locus have been associated with plasma triglycerides and MI risk [23–25], the region contains a complex linkage disequilibrium pattern and several established lipid genes including APOA1, APOC3, and APOA4. Therefore, it had previously been unclear whether APOA5 was an MI gene in humans. Furthermore, this study showed that SNVs predicted to severely disrupt gene function showed the largest magnitude of effect for both LDLR and APOA5. These include complete LoF variants as well as the set of variants that are predicted to be deleterious across multiple in silico functional prediction algorithms. This study provided proof of principle that exome sequencing can be used to discover rare mutations that contribute to the genetic risk of a complex disease.
Human genetics has the potential to identify new therapeutic targets and provide evidence for validation of targets with regard to their impact on plasma lipids and MI. A case in point is the discovery of the proprotein convertase subtilisin/kexin type 9 (PCSK9) gene as a regulator of plasma LDL-C. Abifadel et al. studied large pedigrees using classical linkage analysis and discovered that gain-of-function mutations in PCSK9 led to autosomal dominant familial hypercholesterolemia (FH). Studies of rare PCSK9 variants in population-based settings also support causality. Cohen et al. showed that individuals with nonsense mutations in PCSK9 (C679X and Y142X) had much lower total cholesterol and LDL-C compared to those without the mutation. Furthermore, when investigating probands in trios with LDL-C below the 10th percentile, Cohen et al. reported one family with the p.Y142X null mutation in which all individuals carrying the substitution had LDL-C levels below the 5th percentile. Based in part on the human genetics evidence, pharmaceutical companies have concluded that targeting PCSK9 can lead to an effective therapeutic. Recently, the U.S. Food and Drug Administration approved two PCSK9 inhibitors (Praluent and Repatha) for use in individuals who require additional lowering of LDL-C.
As another example, RVAS have been used to validate the main target effect of the drug ezetimibe. Ezetimibe has been shown to lower plasma levels of LDL-C. However, until recently, it was unclear whether this drug would also lower risk for CHD. Ezetimibe functions by inhibiting the protein product of Niemann-Pick C1 Like 1 (NPC1L1), a gene that encodes a protein essential in intestinal cholesterol absorption. By sequencing the exons of 7,364 CHD cases and 14,728 controls, a study showed that heterozygous carriers of complete LoF mutations in NPC1L1 had 12 mg/dL lower LDL-C compared to non-carriers. Carriers were also shown to have a 55% decreased risk for CHD [29**]. This finding supports the hypothesis that NPC1L1 inhibition may decrease CHD risk. Findings from this study were consistent with those from a very recent study, The Improved Reduction of Outcomes: Vytorin Efficacy International Trial (IMPROVE-IT), which showed that ezetimibe added to statin therapy reduced CHD risk more than statin therapy alone.
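A carrier-based comparison of this kind can be sketched as a 2×2 table of LoF carriers and non-carriers among cases and controls. The example below computes an odds ratio and a two-sided Fisher's exact p-value using only the Python standard library. It is a minimal illustration rather than the method used in the cited study, and the counts in the usage block are hypothetical, not data from any study. Reading 1 − odds ratio as a relative risk reduction is itself an approximation.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the table [[a, b], [c, d]].

    a = carrier cases, b = non-carrier cases,
    c = carrier controls, d = non-carrier controls.
    """
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    carriers_cases, cases = 5, 1000
    carriers_controls, controls = 20, 2000
    table = (carriers_cases, cases - carriers_cases,
             carriers_controls, controls - carriers_controls)
    print("odds ratio:", odds_ratio(*table))
    print("p-value:", fisher_exact_two_sided(*table))
```

Because LoF carriers are rare, expected cell counts are small, which is why an exact test is used in this sketch instead of a chi-square approximation.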
Finally, an antisense oligonucleotide inhibiting APOC3 has been developed based on evidence from human genetics showing that a null mutation in APOC3, p.R19X, leads to lower plasma triglycerides. However, it was unknown whether inhibiting APOC3 would lower CHD risk. A genetic study by Crosby et al. described rare loss-of-function and missense mutations in APOC3 [31**], where carriers of rare APOC3 mutations had considerably lower serum triglyceride (39%) and circulating APOC3 (43%) levels compared to non-carriers. Carriers of APOC3 variants also had significantly higher HDL and lower LDL cholesterol. These carriers also had as much as a 40% reduced risk of developing CHD [31**]. This work suggests that therapeutically targeting APOC3 will reduce risk of CHD.
In addition to prioritizing new targets for drug discovery, such gene discovery can often reveal previously unappreciated biologic mechanisms. With PCSK9, an entirely new mechanism for regulation of plasma LDL-C was identified [32, 33]. Similarly, the discovery of ATP-binding cassette sub-family A member 1 (ABCA1) as the causal gene for Tangier disease, a disorder (originally discovered on Tangier Island off the coast of Virginia) characterized by very low HDL-C, has led to a new field of research worldwide in cholesterol efflux.
Recent RVAS using exome sequencing technology have highlighted specific genetic mechanisms involved in triglyceride (TG)-rich lipoprotein metabolism and the etiology of CHD risk. Plasma TG concentration is a complex polygenic trait that integrates chylomicrons synthesized by the intestine in the postprandial state, very-low-density lipoproteins (VLDL) synthesized by the liver, and remnant lipoproteins of both intestinal and hepatic origin. Epidemiological studies have established associations between plasma TG and increased risk of CHD events; however, until recently, it was unknown whether this association reflects a causal process. Recent Mendelian randomization studies have suggested that plasma TG [36, 37] and remnant cholesterol are causal risk factors for CHD. However, it has been suggested that this causal relationship may be driven by only a subset of the genes regulating TG-rich lipoproteins.
The aforementioned RVAS on APOC3 and APOA5 have implicated these genes as specific mechanisms for altering TG-rich lipoproteins and impacting CHD risk. Furthermore, rare complete LoF mutations in the gene encoding lipoprotein lipase (LPL) cause chylomicronemia, severe hypertriglyceridemia, very low HDL-C, and premature CHD [38, 39]. In addition, a gain-of-function (GoF) variant in LPL is associated with lower TG, higher HDL-C, and reduced CHD risk [40, 41]. Findings for these three genes suggest that at least some of the genes in the LPL pathway can decrease CHD risk by altering TG, beyond lowering LDL-C. We note that not all genes in the LPL pathway show definitive evidence of altering CHD risk. For example, apolipoprotein C-II is a critical co-factor for the activation of LPL. Furthermore, lipase maturation factor 1 and glycosylphosphatidylinositol anchored high density lipoprotein binding protein 1 play key functional roles in regulating LPL activity. Complete LoF mutations have been observed in each of these three genes and have been shown to lead to deficiency in LPL activity and severe hypertriglyceridemia. It is unclear why some genes in the LPL pathway appear to influence CHD risk while others do not. One possibility is that APOC3, APOA5 and LPL capture specific atherogenic processes. Another is statistical: current RVAS for MI, even at ~10,000 cases and controls, are underpowered, and it therefore remains unclear whether complete LoF mutations in these genes alter CHD risk [16, 43**].
In Table 2, we summarize human genetic evidence for each of the genes in the LPL pathway that show definitive genetic evidence of influencing CHD risk. Common, low-frequency and/or rare variants of genes encoding LPL and its modulating proteins are associated with differences in plasma TG and/or HDL-C, and in some instances, with risk of CHD [31**, 40, 43**, 44]. These human genetic findings highlight the importance of the LPL pathway in triglyceride-rich lipoprotein metabolism and the etiology of CHD.
The field has made tremendous progress in the genomics of lipoprotein metabolism in the context of rare variants. Addressing current limitations of exome sequencing, such as increasing the sample size of RVAS to increase statistical power, will accelerate gene discovery efforts. Fortunately, large collaborations and the use of large health care systems with hundreds of thousands of individuals with available electronic medical records can potentially overcome sample size limitations. Furthermore, recruitment and follow-up of carriers of rare mutations can be used to understand biological function using a reverse genetics approach. Unlike traditional forward genetics, where a phenotype-genotype relationship is inferred from observing a phenotype first, reverse genetics uses known genotype information to further probe specific physiological processes. Such approaches have primarily been used in non-human model organisms but are now being used to generate genotype-first prospective human cohorts. Much like the progress seen for genetic mapping in CVAS, we expect that as sequencing becomes more cost-effective and scaling RVAS to very large sample sizes becomes feasible, rare variant discovery for human disease, and the subsequent translation of these discoveries into biologic and clinical insights, will be accomplished.
Financial support and sponsorship
GMP is supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under Award Number K01HL125751. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Conflicts of interest