The correlation of null alleles with human phenotypes can provide insight into gene function in humans. In individuals of African ancestry, we set out to identify null and damaging missense variants, and test these variants for association with a range of cardiovascular phenotypes.
We performed whole exome sequencing in 3,223 African American individuals from the Jackson Heart Study and found a total of 729,666 variant sites with minor allele frequency (MAF) < 5%, including 17,263 null variants and 49,929 missense variants predicted to be damaging by in silico algorithms. We tested null and damaging missense variants within each gene for association with 36 cardiovascular traits. We found three associations that met our pre-specified level of significance (α=1.1×10−7). Null and damaging missense variants in PCSK9 were associated with 36 mg/dl lower low-density lipoprotein cholesterol (LDL-C) (p-value=3×10−21). We identified three individuals in their 50s with complete PCSK9 deficiency (each compound heterozygous for PCSK9 p.Y142X and p.C679X), one of whom had a coronary artery calcification score in the 83rd percentile despite an LDL-C of 32 mg/dl. A damaging missense variant in HBQ1 (p.G52A) was associated with 2 pg/cell lower mean corpuscular hemoglobin (p-value=9×10−13), and rare damaging missense variants in VPS13A were associated with higher red blood cell distribution width (p-value=9.9 × 10−8).
A limited number of null/damaging alleles with a large effect on cardiovascular traits were detectable in ~3,000 African American individuals.
A compelling therapeutic target for lowering low-density lipoprotein cholesterol (LDL-C) emerged from human genetic studies: the proprotein convertase subtilisin/kexin type 9 gene (PCSK9)1. Null alleles (also termed loss-of-function [LoF] protein-coding sequence variants) in PCSK9 were identified in African Americans2 and shown to associate with lower plasma LDL-C levels2–4 as well as reduced risk for coronary heart disease (CHD; up to 88% reduction)5, 6. On the basis of this human genetic evidence and corroborating functional studies, several pharmaceutical companies have established drug development programs targeting PCSK97, and two inhibitors have been approved for reducing LDL-C in individuals with heterozygous familial hypercholesterolemia and in individuals with clinical atherosclerotic cardiovascular disease8, 9. The PCSK9 example has prompted the suggestion that low-frequency or rare mutations of large effect may be paradigmatic for therapeutic target discovery10.
To address whether additional such examples can be readily identified, we sequenced the exomes of 3,223 individuals from the Jackson Heart Study (JHS), a prospective cohort of African Americans living in Jackson, Mississippi, and catalogued null as well as damaging missense mutations across 18,465 genes. Subsequently, we performed an association study of these variants with a range of quantitative and qualitative cardiovascular traits.
The JHS is a community-based longitudinal cohort study in the Jackson, Mississippi metropolitan area designed to investigate the determinants of cardiovascular disease in African Americans11. Between September 2000 and March 2008, JHS recruited 5,301 African Americans aged 35–84 years11. The Institutional Review Board of the University of Mississippi Medical Center approved the study protocol and all participants provided written informed consent.
Exome sequencing was performed at three sequencing centers (the Broad Institute [n = 2,317], University of Washington [n = 481], and Baylor University [n = 475]) across 5 projects (the U.S. National Heart, Lung, and Blood Institute’s [NHLBI] Exome Sequencing Project [ESP], Myocardial Infarction Genetics Consortium Exome Sequencing Project [MIGen ExS], CHARGE-S, Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples [T2D-GENES], and Minority Health Genomics and Translational Research Bio-Repository Database [MH-GRID]) (Supplemental Table 1). The sequencing reads (i.e., FASTQ files) were aligned to the human genome reference (hg19) using BWA on a per-lane basis, and the resulting BAM files were obtained from the three sequencing centers. The Genome Analysis Toolkit (GATK) v3.1 HaplotypeCaller algorithm was used for joint variant discovery and genotyping across the exome targets and 50 bp of flanking intronic sequence (http://www.broadinstitute.org/gatk/guide/article?id=3893). Single-sample gVCFs were created using the GATK HaplotypeCaller with the options -emitRefConfidence GVCF, --variant_index_type LINEAR, and --variant_index_parameter 128000. Batches of ~200 gVCFs were then merged into a single gVCF using the GATK CombineGVCFs tool. Finally, GenotypeGVCFs was run on the combined gVCFs to create the raw SNP and indel VCFs. Because a majority of individuals were sequenced at the Broad Institute, we limited analysis to the sequence intervals captured by the Broad’s exome sequencing platform.
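The three-step joint-calling workflow above (per-sample gVCFs, merging in batches of ~200, then joint genotyping) can be sketched as a small Python script that only builds GATK 3.x command lines. The jar name, reference and interval file paths, and the use of `--interval_padding 50` to capture the 50 bp flanks are illustrative assumptions, not the study's actual pipeline.

```python
"""Sketch: build GATK 3.x commands for batched gVCF joint calling.

All paths are hypothetical placeholders; commands are only constructed,
never executed.
"""

GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]  # hypothetical jar location
REF = "hg19.fasta"                               # hypothetical reference path
INTERVALS = "broad_exome_targets.interval_list"  # hypothetical capture intervals


def haplotypecaller_cmd(bam: str, out_gvcf: str) -> list[str]:
    """Per-sample gVCF using the options reported in the text."""
    return GATK + [
        "-T", "HaplotypeCaller", "-R", REF, "-I", bam,
        "-L", INTERVALS, "--interval_padding", "50",  # assumption: flanks via padding
        "--emitRefConfidence", "GVCF",
        "--variant_index_type", "LINEAR",
        "--variant_index_parameter", "128000",
        "-o", out_gvcf,
    ]


def batches(items: list[str], size: int = 200) -> list[list[str]]:
    """Split the sample list into consecutive batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def combine_cmd(gvcfs: list[str], out_gvcf: str) -> list[str]:
    """Merge one batch of single-sample gVCFs into a multi-sample gVCF."""
    cmd = GATK + ["-T", "CombineGVCFs", "-R", REF]
    for g in gvcfs:
        cmd += ["--variant", g]
    return cmd + ["-o", out_gvcf]


def genotype_cmd(combined: list[str], out_vcf: str) -> list[str]:
    """Joint genotyping across all combined batch gVCFs."""
    cmd = GATK + ["-T", "GenotypeGVCFs", "-R", REF]
    for c in combined:
        cmd += ["--variant", c]
    return cmd + ["-o", out_vcf]


if __name__ == "__main__":
    gvcfs = [f"sample{i:04d}.g.vcf" for i in range(3223)]
    groups = batches(gvcfs)
    combined = [f"batch{k:02d}.g.vcf" for k in range(len(groups))]
    print(" ".join(haplotypecaller_cmd("sample0000.bam", gvcfs[0])))
    print(len(groups), "CombineGVCFs batches")
    print(" ".join(genotype_cmd(combined, "raw.vcf")))
```

With 3,223 samples and a batch size of 200, this produces 17 CombineGVCFs batches (16 of 200 plus one of 23), whose outputs feed a single GenotypeGVCFs call.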
GATK Variant Quality Score Recalibration (VQSR) was used with the recommended resources to filter variants. The SNP VQSR model was trained using HapMap 3.3 and 1000 Genomes Omni 2.5 SNP sites, and a 99.5% sensitivity threshold was applied to filter SNPs; the indel VQSR model was trained using the Mills 1000G gold standard and Axiom Exome Plus sites, and a 99.0% sensitivity threshold was applied to filter indel sites. Variants were filtered to VQSR PASS and quality by depth (QD) ≥ 2 (Supplemental Table 2). Individual genotypes were set to missing if depth < 5.
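The site- and genotype-level filters above reduce to two simple rules. The sketch below assumes each site has already been parsed into its FILTER string and QD value, and each genotype into a call plus its read depth (DP); VCF parsing is not shown, and treating an absent DP as failing the depth filter is an assumption.

```python
"""Sketch of the site-level (PASS, QD >= 2) and genotype-level (DP >= 5) filters."""
from typing import Optional

MIN_QD = 2.0  # sites need VQSR PASS and QD >= 2
MIN_DP = 5    # genotypes with depth < 5 are set to missing


def site_passes(filter_field: str, qd: Optional[float]) -> bool:
    """Keep a site only if VQSR marked it PASS and QD is at least 2."""
    return filter_field == "PASS" and qd is not None and qd >= MIN_QD


def mask_low_depth(calls: list[Optional[str]],
                   depths: list[Optional[int]]) -> list[Optional[str]]:
    """Replace calls with None (missing) where DP < 5 or DP is absent (assumption)."""
    return [
        call if dp is not None and dp >= MIN_DP else None
        for call, dp in zip(calls, depths)
    ]
```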
We performed quality control on the jointly called samples. Individuals were checked for total number of variants, observed number of singletons and doubletons, Ti/Tv ratio, Het/Hom ratio, missingness, contamination with VerifyBamID12, and non-reference concordance with available genotype data from the Illumina HumanExome BeadChip v1.0. Individuals who were outliers (beyond ±3 × the interquartile range) on at least one metric were excluded (Supplemental Table 1, Supplemental Figure 1). Population structure was assessed using the multi-dimensional scaling (MDS) algorithm in the PLINK software13, and ten principal components of ancestry were obtained (Supplemental Figure 2).
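A minimal sketch of the sample-level outlier rule, assuming that "outlier" means falling below Q1 − 3·IQR or above Q3 + 3·IQR for a given metric; the text does not specify the fence definition or the quartile convention, so both are assumptions here.

```python
"""Sketch: flag sample-level QC outliers with an IQR fence rule (k = 3)."""
from statistics import quantiles


def iqr_outliers(values: list[float], k: float = 3.0) -> set[int]:
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return {i for i, v in enumerate(values) if v < lo or v > hi}


def failing_samples(metrics: dict[str, list[float]]) -> set[int]:
    """A sample fails QC if it is an outlier on at least one metric.

    `metrics` maps a metric name (e.g. Ti/Tv, singletons) to one value per sample.
    """
    failed: set[int] = set()
    for values in metrics.values():
        failed |= iqr_outliers(values)
    return failed
```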
All variant sites were annotated with the Variant Effect Predictor (VEP; http://useast.ensembl.org/info/docs/tools/vep) and dbNSFP14 (https://sites.google.com/site/jpopgen/dbNSFP). Analysis was limited to variants predicted to be null (nonsense, splice, frameshift) plus missense variants predicted to be damaging by at least five of the following seven variant prediction tools15: LRT16, Mutation Taster17, PolyPhen218 (HumDiv), PolyPhen2 (HumVar), SIFT19, MutationAssessor20 and FATHMM21.
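The inclusion rule can be expressed as a short predicate. This sketch assumes that "null" corresponds to the VEP consequence terms stop_gained, frameshift_variant, splice_donor_variant and splice_acceptor_variant, and that each predictor's output has already been reduced to True (damaging) or False; the predictor key names are illustrative, and a missing prediction is treated as not damaging.

```python
"""Sketch: null variants, or missense variants damaging in >= 5 of 7 predictors."""

NULL_CONSEQUENCES = {
    "stop_gained",              # nonsense
    "frameshift_variant",
    "splice_donor_variant",     # canonical splice sites (assumed definition)
    "splice_acceptor_variant",
}
PREDICTORS = ("LRT", "MutationTaster", "Polyphen2_HDIV", "Polyphen2_HVAR",
              "SIFT", "MutationAssessor", "FATHMM")  # illustrative key names


def n_damaging(calls: dict[str, bool]) -> int:
    """Count how many of the seven predictors call the variant damaging."""
    return sum(1 for tool in PREDICTORS if calls.get(tool, False))


def is_null(consequences: set[str]) -> bool:
    return bool(consequences & NULL_CONSEQUENCES)


def qualifies(consequences: set[str], calls: dict[str, bool],
              min_damaging: int = 5) -> bool:
    """True for null variants or missense variants with >= min_damaging calls."""
    if is_null(consequences):
        return True
    return "missense_variant" in consequences and n_damaging(calls) >= min_damaging
```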
We analyzed 36 cardiovascular traits (Figure 1) available in the Jackson Heart Study Vanguard Center data package (https://www.jacksonheartstudy.org/jhsinfo/ForResearchers/VanguardCenters/tabid/171/Default.aspx). For participants taking antihypertensive medication, we added 10 mm Hg to observed systolic blood pressure (SBP) values and 5 mm Hg to observed diastolic blood pressure (DBP) values22. For individuals on lipid-lowering medication, we adjusted total cholesterol by dividing the observed value by 0.8, following a previously described correction23. No adjustment was made to high-density lipoprotein cholesterol (HDL-C) or triglycerides. Only fasting lipid measures were used, and LDL-C was calculated with the Friedewald equation for participants with triglycerides < 400 mg/dl, using the medication-adjusted total cholesterol for those on treatment.
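The medication adjustments and the Friedewald calculation can be sketched as follows (all lipids in mg/dl, blood pressure in mm Hg). Function and argument names are hypothetical; the formula used is the standard Friedewald equation, LDL-C = TC − HDL-C − TG/5.

```python
"""Sketch of the blood pressure and lipid adjustments described above."""
from typing import Optional


def adjust_bp(sbp: float, dbp: float, on_bp_meds: bool) -> tuple[float, float]:
    """Add 10 mm Hg to SBP and 5 mm Hg to DBP for treated participants."""
    return (sbp + 10, dbp + 5) if on_bp_meds else (sbp, dbp)


def adjust_tc(tc: float, on_lipid_meds: bool) -> float:
    """Divide total cholesterol by 0.8 for participants on lipid-lowering drugs."""
    return tc / 0.8 if on_lipid_meds else tc


def friedewald_ldl(tc: float, hdl: float, tg: float,
                   on_lipid_meds: bool) -> Optional[float]:
    """LDL-C = TC - HDL-C - TG/5 (mg/dl); returns None when TG >= 400.

    Uses the medication-adjusted TC; HDL-C and TG are left unadjusted.
    """
    if tg >= 400:
        return None
    return adjust_tc(tc, on_lipid_meds) - hdl - tg / 5
```

For example, a treated participant with TC 200, HDL-C 50 and TG 150 mg/dl gets an adjusted TC of 250 and an LDL-C of 250 − 50 − 30 = 170 mg/dl.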
Individuals with diabetes were excluded from analyses of fasting plasma glucose, fasting insulin, HOMA-IR, HOMA-B, and HbA1c. Individuals with QRS > 120 ms, atrial fibrillation, or coronary heart disease were excluded from analysis of QRS interval. Individuals with QRS ≥ 120 ms, ECG heart rate < 40 or > 120 beats per minute, or atrial fibrillation were excluded from analysis of QT interval. Individuals with end-stage renal disease (ESRD; eGFR < 15 ml/min/1.73 m² or reporting being on dialysis), hemoglobinopathy (homozygosity for rs334), or myelotoxic drug use were excluded from the blood cell trait analyses.
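The trait-specific exclusions can be collected into one predicate. This is a sketch with hypothetical participant keys and trait names; the blood cell trait set is an illustrative subset, and the thresholds follow the text as written (QRS > 120 ms for the QRS analysis, QRS ≥ 120 ms for the QT analysis).

```python
"""Sketch of trait-specific exclusions; participant keys are hypothetical."""

GLYCEMIC = {"fasting_glucose", "fasting_insulin", "HOMA_IR", "HOMA_B", "HbA1c"}
BLOOD_CELL = {"MCH", "RDW", "hemoglobin"}  # illustrative subset of blood cell traits


def include_for_trait(p: dict, trait: str) -> bool:
    """Return False if participant `p` must be excluded from analysis of `trait`."""
    if trait in GLYCEMIC and p["diabetes"]:
        return False
    if trait == "QRS" and (p["qrs_ms"] > 120 or p["afib"] or p["chd"]):
        return False
    if trait == "QT" and (p["qrs_ms"] >= 120 or p["afib"]
                          or not 40 <= p["heart_rate"] <= 120):
        return False
    if trait in BLOOD_CELL and (p["egfr"] < 15 or p["dialysis"]
                                or p["rs334_homozygous"] or p["myelotoxic_drug"]):
        return False
    return True
```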
To resolve non-normality, the following raw traits were natural log-transformed before analysis: triglycerides, leptin, hsCRP, endothelin, renin, aldosterone, and adiponectin.
We performed gene-based analyses of 36 cardiovascular phenotypes. We limited analysis to null mutations plus missense variants predicted to be damaging by at least 5 of 7 in silico prediction algorithms (LRT, Mutation Taster, PolyPhen2 (HumDiv), PolyPhen2 (HumVar), SIFT, MutationAssessor and FATHMM)15. We aggregated variants with minor allele frequency (MAF) < 5% within each gene using four sets of variants: (1) null mutations only, (2) null mutations plus missense variants predicted to be damaging by 7 of 7 in silico prediction algorithms, (3) null mutations plus missense variants predicted to be damaging by at least 6 of 7 in silico prediction algorithms, and (4) null mutations plus missense variants predicted to be damaging by at least 5 of 7 in silico prediction algorithms. All association analyses were performed using the EPACTS software (http://genome.sph.umich.edu/wiki/EPACTS). EPACTS (Efficient and Parallelizable Association Container Toolbox) is a software pipeline for performing statistical tests of association using sequence data. It implements the EMMAX24 (Efficient Mixed Model Association eXpedited) model, a mixed model association approach that accounts for pedigree, cryptic relatedness, and population structure by using a covariance matrix estimated from genome-wide data. To apply the EMMAX model, we used the epacts-group command with the emmaxCMC test option to perform collapsing burden gene-based tests. The single command with the q.emmax test option in EPACTS was used to obtain single-variant results for each variant entering the gene-based test. We used an additive genetic model. A kinship matrix of all individuals was created with EPACTS and used in the analyses. All analyses were adjusted for age, sex, and 4 principal components of ancestry; analyses of QT and QRS intervals were additionally adjusted for height and BMI.
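The four nested variant sets and a collapsing burden score can be sketched as below. This is not EPACTS code: it assumes genotypes are stored as per-individual minor-allele counts (0/1/2, None for missing), treats missing genotypes as non-carriers, and uses a CMC-style indicator (1 if the individual carries any qualifying minor allele). The mixed-model (EMMAX) test that EPACTS then applies to such a score is not reproduced.

```python
"""Sketch: nested variant masks and a collapsed per-individual burden indicator."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    is_null: bool
    n_damaging: int                  # damaging calls out of 7 predictors
    maf: float
    genotypes: list[Optional[int]]   # minor-allele count per individual


# Each mask keeps null variants plus missense with >= threshold damaging calls.
MASKS = {
    "null_only": None,
    "null_plus_7of7": 7,
    "null_plus_6of7": 6,
    "null_plus_5of7": 5,
}


def in_mask(v: Variant, threshold: Optional[int], max_maf: float = 0.05) -> bool:
    """Apply the MAF < 5% filter, then the null / damaging-missense rule."""
    if v.maf >= max_maf:
        return False
    return v.is_null or (threshold is not None and v.n_damaging >= threshold)


def collapse(variants: list[Variant],
             threshold: Optional[int]) -> tuple[list[int], int]:
    """Return (per-individual carrier indicator, total minor allele count)."""
    selected = [v for v in variants if in_mask(v, threshold)]
    n = len(variants[0].genotypes) if variants else 0
    indicator = [0] * n
    mac = 0
    for v in selected:
        for i, g in enumerate(v.genotypes):
            if g:  # None (missing) and 0 are both skipped
                indicator[i] = 1
                mac += g
    return indicator, mac
```

The returned minor allele count is what the exclusion rule in the next paragraph (≤ 10 minor alleles) would be checked against.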
We excluded results with ≤ 10 minor alleles contributing to the gene-based test to ensure robust association statistics. We set our significance threshold to 1.1 × 10−7 (0.05/[36 traits × ~12,500 genes remaining after the minor allele count exclusion]).
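As a worked check, the Bonferroni arithmetic behind this threshold, using the approximate count of ~12,500 genes stated above, is:

```latex
\alpha = \frac{0.05}{36 \times 12{,}500} = \frac{0.05}{450{,}000} \approx 1.1 \times 10^{-7}
```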
We used a Wilcoxon rank sum test in R (version 3.1) to compare PCSK9 null compound heterozygotes with heterozygous carriers. Coronary artery calcification (CAC) percentiles were calculated with the MESA CAC Score Reference Values web tool (http://www.mesa-nhlbi.org/Calcium/input.aspx)25.
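For readers without R, the sketch below implements a two-sided Wilcoxon rank sum (Mann-Whitney) test using the normal approximation with tie and continuity corrections. To our understanding this matches what R's `wilcox.test()` does by default when either group has 50 or more observations (as with 3 compound heterozygotes versus 77 heterozygotes); R's exact test for small samples without ties is not covered, and the sketch is not the authors' code.

```python
"""Sketch: two-sided Wilcoxon rank sum test, normal approximation."""
import math
from statistics import NormalDist


def midranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (W statistic, as reported by R, and the two-sided p-value)."""
    nx, ny = len(x), len(y)
    n = nx + ny
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    w = sum(ranks[:nx]) - nx * (nx + 1) / 2
    # Tie-corrected standard deviation of W under the null hypothesis.
    counts: dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    sigma = math.sqrt(nx * ny / 12 * ((n + 1) - tie_term))
    z = w - nx * ny / 2
    correction = 0.5 if z > 0 else -0.5 if z < 0 else 0.0  # continuity correction
    z = (z - correction) / sigma
    std = NormalDist()
    p = 2 * min(std.cdf(z), 1 - std.cdf(z))
    return w, p
```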
We performed power calculations using the Genetic Power calculator (http://pngu.mgh.harvard.edu/~purcell/gpc/) with the “QTL association for sib-ships and singletons” option.
After quality control, 3,223 individuals from the Jackson Heart Study were available for analysis (Table 1, Supplemental Table 1). We observed 17,263 null variants with MAF < 5% and 49,929 missense variants with MAF < 5% predicted to be damaging by at least 5 of 7 in silico prediction algorithms (Supplemental Table 2). Of the 18,465 genes sequenced, 14,058 had at least one null or damaging missense variant with MAF < 5%. On average, we observed 5 distinct null or damaging missense variants and 7 null or damaging missense alleles per gene. Each individual carried, on average, a total of 153 null or damaging missense variants with MAF < 5%.
We found three gene-based associations that met our pre-specified significance threshold of 1.1 × 10−7 (Table 2, Supplemental Table 3, Supplemental Table 4). The most significant association was between LDL-C and PCSK9. Participants who carried null or damaging missense mutations in PCSK9 had 36 mg/dl lower LDL-C than non-carriers (p-value=2.9 × 10−21). Of note, we identified three individuals with complete PCSK9 deficiency (each compound heterozygous for PCSK9 p.Y142X and p.C679X) (Table 3). These individuals had a lower median LDL-C (64.2 mg/dl) than individuals carrying only one null mutation (85.7 mg/dl; n=77) (p-value=0.044; Supplemental Figure 3). The three PCSK9 null compound heterozygotes did not differ from heterozygotes in any other cardiometabolic trait tested except QT interval (Supplemental Table 5). Compound heterozygotes had a shorter QT interval (mean=369 ms, range=362–380 ms) than individuals carrying only one null PCSK9 variant (mean=413 ms) (p-value=0.006, Wilcoxon rank sum test). Individuals carrying one null PCSK9 variant had QT intervals similar to those of non-carriers (mean=413 ms), suggesting a recessive effect. Two of the three individuals carrying both PCSK9 p.Y142X and p.C679X had a coronary artery calcification (CAC) score above the 80th percentile for their age and sex. A 52-year-old man had a CAC score of 24.9, in the 83rd percentile for age and sex, despite an LDL-C of 32 mg/dl (Table 3).
The second most significant gene association was between mean corpuscular hemoglobin (MCH) and hemoglobin subunit theta 1 (HBQ1). Individuals carrying a damaging missense variant (p.G52A)26 in HBQ1 had lower MCH than non-carriers (p-value=8.4 × 10−13). One additional association passed our significance threshold: rare damaging missense variants in Vacuolar Protein Sorting-Associated Protein 13A (VPS13A) were associated with increased red blood cell distribution width (p-value=7.1 × 10−8). Of the nine variants contributing to the association between VPS13A and red blood cell distribution width, six were singletons, one was a doubleton, one had four carriers (p.S2673L), and one had 22 minor allele carriers (p.K2672N) (Supplemental Table 4). VPS13A also showed evidence of association with other hematologic phenotypes, including lower hemoglobin levels (p-value=7.0 × 10−4; Supplemental Table 6).
Li et al27 recently reported ten gene-based associations aggregating null variants with a p-value < 4.4 × 10−6. Individuals of African ancestry contributed to seven of these associations, and we attempted to replicate these seven in our data (Supplemental Table 7). We replicated the association of total cholesterol with PCSK9 (beta = −39 mg/dl; p-value = 6.6 × 10−12) and of triglycerides with apolipoprotein C-III (APOC3; p-value = 1.0 × 10−5)2, 28–30. We also found suggestive evidence, consistent with the report by Li et al, for an association of fasting glucose with thioredoxin domain containing 5 (TXNDC5): carriers of null alleles in TXNDC5 had higher fasting glucose than non-carriers (p=0.07).
For 3,223 individuals and a significance level of 1.1 × 10−7, we had 99% statistical power to detect an effect of 1 standard deviation (SD) with a cumulative minor allele frequency of 1%, and 64% statistical power to detect a 1 SD effect with a cumulative minor allele frequency of 0.5%. Analysis of Mendelian lipid genes as a ‘positive control’ showed several genes in which a burden of null/damaging mutations alters the expected plasma lipid fraction in the appropriate direction (e.g., LDLR and higher LDL-C [P=4.7 × 10−5], CETP and higher HDL-C [P=0.0001]) (Supplemental Table 8). However, even this positive-control analysis is limited by the number of carriers: the majority of the Mendelian lipid genes had < 10 observed null alleles.
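The power figures can be approximated analytically. The sketch below is not the Genetic Power Calculator's implementation; it assumes unrelated individuals, an additive effect of `beta` trait SDs for a pooled (cumulative) minor allele frequency `p`, variance explained q² = 2p(1 − p)β², and a non-centrality parameter NCP = N·q²/(1 − q²) for a two-sided 1-df test.

```python
"""Sketch: approximate power of a 1-df quantitative-trait burden test."""
import math
from statistics import NormalDist


def qtl_power(n: int, p: float, beta: float = 1.0, alpha: float = 1.1e-7) -> float:
    q2 = 2 * p * (1 - p) * beta ** 2           # proportion of variance explained
    ncp = n * q2 / (1 - q2)                     # chi-square non-centrality
    std = NormalDist()
    z_crit = -std.inv_cdf(alpha / 2)            # two-sided critical z
    shift = math.sqrt(ncp)
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


if __name__ == "__main__":
    for maf in (0.01, 0.005):
        print(f"cumulative MAF {maf}: power {qtl_power(3223, maf):.3f}")
```

Under these assumptions the sketch gives roughly 99.7% and 65% power, close to the 99% and 64% reported above.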
We set out to discover null or damaging missense variants with a large effect on any of a range of cardiovascular traits. In a study of 3,223 African Americans, we found three associations that met our pre-specified significance threshold.
We report one new observation: the association of VPS13A with increased red blood cell distribution width (RDW). RDW measures the variation in red blood cell size, and higher values can indicate disorders such as anemia. Mutations in VPS13A have been reported to cause chorea-acanthocytosis, an autosomal recessive neurodegenerative disorder in which red blood cells appear spiky31. Ten VPS13A variants are reported in ClinVar with chorea-acanthocytosis listed as the condition. We did not find any of these ClinVar variants in our data, nor any individuals carrying biallelic rare damaging variants in VPS13A. Here, in a sample of individuals unselected for disease state, we report a milder phenotype resulting from heterozygous mutations in VPS13A. Similarly, Mendelian lipid genes with a large effect on plasma lipid levels have been shown to harbor common variants with smaller effects on phenotype32–34.
We found three individuals who are compound heterozygous for null mutations in PCSK9. Previously, only two individuals with PCSK9 deficiency had been reported35, 36. Both previously reported individuals were young (21 and 31 years old) and had very low circulating LDL-C (14–16 mg/dl). The three individuals identified here are older (50–52 years old) and have higher circulating LDL-C (32–72 mg/dl). One of the three had a CAC score in the 83rd percentile despite an LDL-C of 32 mg/dl; CAC values above the 75th percentile are considered abnormal.
Some limitations deserve mention. The association between VPS13A and RDW needs to be confirmed in an independent study. Furthermore, replication will require sequencing, because none of the variants driving the novel gene-based association are present on the widely used exome genotyping array. The small number of results passing our pre-specified significance level may reflect limited statistical power, given our sample size and the small number of observed null alleles per gene. We also note that we used a stringent significance threshold, reflecting the multiple testing burden inherent in our study design.
In conclusion, a limited number of null/damaging alleles with a large effect on cardiovascular traits were detectable from the exome sequences of ~3,000 African American individuals.
The correlation of null alleles with human phenotypes can provide insight into gene function in humans. Here, we performed whole exome sequencing in 3,223 African American individuals living in Jackson, Mississippi, to identify null and damaging missense variants and test these variants for association with 36 cardiovascular traits. We replicated the association of null and damaging missense variants in PCSK9 with LDL cholesterol and found three individuals in their 50s, each compound heterozygous for null PCSK9 variants. Of note, one of these three individuals had a coronary artery calcification score in the 83rd percentile despite an LDL-C of 32 mg/dl. We also found that individuals with rare damaging missense variants in VPS13A had higher red blood cell distribution width than non-carriers. Mutations in VPS13A have previously been reported to cause chorea-acanthocytosis, an autosomal recessive neurodegenerative disorder in which red blood cells appear spiky. Only a limited number of null/damaging alleles with a large effect on cardiovascular traits were detectable in ~3,000 African American individuals.
Sources of Funding: GMP is supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under Award Number K01HL125751. SK is supported by a Research Scholar award from the Massachusetts General Hospital (MGH), the Howard Goodman Fellowship from MGH, the Donovan Family Foundation, R01HL107816, and a grant from Fondation Leducq. The Jackson Heart Study is supported by contracts HHSN268201300046C, HHSN268201300047C, HHSN268201300048C, HHSN268201300049C, HHSN268201300050C from the National Heart, Lung, and Blood Institute and the National Institute on Minority Health and Health Disparities. The MH-GRID Network (Investigators: Rakale C. Quarells, Gary H. Gibbons, Donna K. Arnett, Robert L. Davis, Suzanne M. Leal, Deborah A. Nickerson, James Perkins, Charles N. Rotimi, Joel H. Saltz, Herman A. Taylor, and James G. Wilson) was supported, in part, by a grant from the National Institute on Minority Health and Health Disparities (grant #1RC4MD005964). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.