|Home | About | Journals | Submit | Contact Us | Français|
Genome wide association studies (GWAS) have identified SNPs in the 9p21 gene desert associated with coronary artery disease (CAD)1–4 and Type 2 diabetes (T2D)5–7. Despite evidence for a role of the associated interval in neighboring gene regulation8–10, the biological underpinnings of these genetic associations to CAD or T2D have not yet been explained. Here we identify 33 enhancers in 9p21; the interval is the second densest gene-desert for predicted enhancers and 6 times denser than the whole genome (p<6.55 10−33). The CAD risk alleles of SNPs rs10811656/rs10757278 are located in one of these enhancers and disrupt a binding site for STAT1. Lymphoblastoid cell lines (LCL) homozygous for the CAD risk haplotype exhibit no binding of STAT1, and in LCL homozygous for the CAD non-risk haplotype binding of STAT1 inhibits CDKN2BAS expression, which is reversed by siRNA knock-down of STAT1. Using a new, open-ended approach to detect long-distance interactions (3D-DSL), we find that in human vascular endothelium cells (HUVEC) the enhancer interval containing the CAD locus physically interacts with the CDKN2A/B locus, the MTAP gene and an interval downstream of INFA21. In HUVEC, IFNγ activation strongly affects the structure of the chromatin and the transcriptional regulation in the 9p21 locus, including STAT1 binding, long-range enhancer interactions and altered expression of neighboring genes. Our findings establish a link between CAD genetic susceptibility and the response to inflammatory signaling in a vascular cell type and thus demonstrate the utility of GWAS findings to direct studies to novel genomic loci and biological processes important for disease etiology.
GWAS have identified 8 SNPS in the 9p21 interval strongly associated with CAD1–4 and other vascular diseases11,12, all of which are highly correlated (r2 > 0.8) and located in a 53-kb linkage disequilibrium (LD) block. The haplotype diversity in this interval is limited in Caucasians with ~ 25% of individuals homozygous for the risk haplotype which confers ~ 2 fold greater risk of myocardial infarction than noncarriers2. Independent studies have identified 4 more SNPs associated with T2D in an adjacent but yet distinct 11-kb LD block5–7. The associated CAD and T2D SNPs (Supplemental Table 1) lie in a gene desert (Figure 1) flanked by CDKN2B (130kb upstream) and DMRTA1 (370kb downstream) suggesting that the functional variants underlying the association are likely in regulatory elements. The CAD interval overlaps with the 3′ end of a non-coding gene with unknown function, CDKN2BAS, which is co-expressed with the CDKN2A/B locus13.
In this study we used a multi-pronged approach involving cellular assays and population sequencing to identify and functionally characterize the variants underlying the 9p21 association with CAD (Supplemental Figure 1). We initially sought to identify regulatory elements in the 9p21 gene desert by examining transcription factor binding and chromatin modification profiles in human cells including HeLa, K562, and human ES cells14. Histone H3 trimethylated at lysine4 (H3K4me3) is associated with promoters of active genes, looking for this mark we determined that the CDKN2B and DMRTA1 promoters were the only ones in the interval (Figure 1). CTCF-binding sites mark insulators15; from ChIP-chip data of this factor in HeLa cells, we identified seven potential insulators in the 9p21 interval. One insulator is located close to the CDKN2B promoter and another one is located 130 kb upstream, in the CAD interval. Finally we searched for marks indicative of enhancers; enrichment of histone H3 monomethylation at lysine 4 (H3K4me1), binding of p300 and MED1, presence of DNase hypersensitivity sites (DHS) and depletion of H3K4me316. Looking at these marks, we predict the locations of 33 enhancers, of which 26 are significantly enriched in conserved sequences (p<0.01 – Supplemental Table 2). Six enhancers are proximal to the CAD interval; nine enhancers are located in the CAD interval (referred to as ECAD1-9), two in the T2D interval (referred to as ET2D1-2) and 16 distal to the T2D interval. The majority of the 33 predicted enhancers fall in the proximal part of the gene-desert, in or near the CAD and T2D interval. These findings were further supported by the analysis of publicly available genome-wide datasets generated to predict regulatory elements using a variety of cell types (Figure 1). Additionally, we validated the enhancer activity of the ECAD2, ECAD4, ECAD5, ECAD7, ECAD8, ECAD9 and 1 ET2D elements using luciferase reporter assays in HeLa cells (data not shown). Interestingly, we determined that the 9p21 interval is the second densest gene-desert for predicted enhancers (7.5 per 100kb) in the human genome and the one containing the most disease-associated variants (10 SNPs – Supplemental Table 3). These data indicate that the 9p21 gene desert has an important regulatory function.
To identify the complete set of DNA variants in the 9p21 gene desert we sequenced a 196-kb interval (from CDKN2B promoter to 65 kb distal of the T2D interval) in 50 individuals of European descent and used the variants to refine the pattern of LD in the interval (Figure 2a). We identified 765 variants (31 indels and 734 SNPs) and used r2 correlation coefficient to identify variants in LD with any of the 8 CAD or 4 T2D associated SNPs. Of the 765 variants, 41 spanning 44 kb are in perfect LD with a CAD associated SNP, increasing to 131 variants spanning 111 kb when lowering r2 threshold to 0.5 (Figure 2b). In contrast to the CAD interval, only 23 variants were in LD (r2≥0.5) with at least one of the 4 T2D associated SNPs, spanning 11-kb. No variants were in LD (r2≥0.5) with both CAD and T2D associated SNPs.
The functional variants underlying the association of 9p21 with CAD and T2D are highly likely a subset of the 131 and 23 candidate variants in LD (r2≥0.5) with the initial CAD or T2D associated SNPs, respectively. However, the strong LD structure in the 9p21 gene-desert presents a hurdle for the precise identification of the causative functional variants using only genetic evidence. Thus, we determined which of the candidate variants are located in functional elements. Five of the CAD candidate variants are located in exons and 118 in introns of CDKN2BAS, however, none of the exonic variants are located in a conserved element suggesting that they are unlikely to be functional. Thirty-three CAD candidate variants are located in ECAD1-9 sequences; ECAD9 is significantly enriched for variants (11 in 1700 bp; p<3.1×10−5). The remaining 22 variants are distributed in ECAD1 (3 variants), ECAD2 (2), ECAD3 (8), ECAD4 (1), ECAD5 (3), ECAD6 (1), ECAD7 (3), and ECAD8 (1). Six of the 23 T2D candidate variants are located in a T2D enhancer, 5 of which are in ET2D1 and one in ET2D2.
To identify candidate regulatory variants, we computationally determined which variants disrupt consensus transcription factor-binding sites (TFBS) in the predicted enhancers (Supplemental Tables 4 & 5). Although many of the enhancers in the CAD interval had variants disrupting TFBS, ECAD9 containing 8 such variants was the most notable. Additionally, the SNP (rs10757278) most consistently associated with increased risk of CAD17 is in ECAD9 and disrupts a TFBS involved in the inflammatory response (STAT1 – Supplemental Table 4). The rs10811656 and rs10757278 SNPs are located 4-bp apart in the predicted ECAD9 STAT1 binding site in the non-risk haplotype, which is disrupted in the risk haplotype (TTCCGGTAA > TTCTGGTAG). STAT1 has previously been shown to bind this locus in HeLa cells18, which are heterozygous for rs10811656/rs10757278 alleles, when treated with IFNγ. We observed the binding of STAT1 at the same locus in HUVEC heterozygous for the rs10811656/rs10757278 alleles upon treatment with IFNγ (Figure 3a). We examined the effect of IFNγ treatment in heterozygous HeLa and HUVEC cells on the expression of CDKN2B and CDKN2BAS. Interestingly, we noticed that IFNγ treatment leads to the repression and induction of CDKN2B and CDKN2BAS, respectively. This effect is greater in HUVEC where CDKN2BAS expression is induced 4-fold and CDKN2B transcription is repressed 2-fold (Figure 3b). These results are consistent with the epigenetic transcriptional repression of CDKN2B induced by ANRIL, the transcript encoded by CDKN2BAS19.
To validate the disruption of STAT1 occupancy at the rs10811656/rs10757278 risk alleles, we used LCL homozygous for the CAD risk haplotype. STAT1 is constitutively expressed at high levels in LCL20. We showed by chromatin immunoprecipitation (ChIP) that STAT1 binds at ECAD9 in 2 LCL homozygous for the CAD non-risk haplotype (2.7-fold enrichment), whereas it does not bind in 2 LCL homozygous for the risk haplotype (Figure 3c). To further establish a functional link between STAT1 occupancy in ECAD9 and expression of CDKN2BAS, we assessed the expression of CDKN2BAS in LCL upon STAT1 siRNA mediated knock-down; the expression of CDKN2BAS was significantly up regulated (7-fold) in LCL homozygous for the CAD non-risk haplotype, but the effect was quite small in LCL homozygous for the CAD risk haplotype (Figure 3d). These results are consistent with the fact that STAT1 fails to bind ECAD9 in the CAD risk haplotype and thus is unlikely to participate in CDKN2BAS regulation. Interestingly, these results suggest that the effects of STAT1 binding at ECAD9 may have cell type-specific differences on the expression of CDKN2BAS between LCL (repression) and HUVEC (activation) (Supplemental Discussion).
To determine which genes are interacting with and potentially regulated by the ECAD9 enhancers we examined all genes (20 upstream and 1 downstream) located within 2Mb for interactions. As long-range enhancer interactions are often tissue specific21 we chose to use HUVEC cells as a model system for vascular endothelium. We performed chromatin conformation capture (3C) using cross-linked HUVEC grown in standard conditions, followed by BamH1 digestion of the chromatin and diluted religation22. We analyzed the 3C using a novel approach derived from combining 3C with DNA selection and ligation 23 (method referred to as 3D-DSL) and examined the ligated acceptor and donor BamH1 sites via high throughput sequencing. The resolution of 3D-DSL is limited by the distribution of BamH1 sites, to compensate for this we selected 6 BamH1 sites flanking ECAD9 and other neighboring enhancers as acceptor sites, and 145 BamH1 donor sites distributed throughout the 2 Mb of sequence tested for interactions. We identified 9 donor sites interacting strongly with the enhancer acceptor sites (Figure 4a) of which 7 are sufficiently distant (>45 kb) from the acceptor sites to be recognized as specific (Supplemental Table 6). They occur in 4 areas: 1) CDKN2A/B locus, 2) MTAP gene, 3) downstream of INFA21 and 4) downstream of the T2D Associated Region (DOTAR), an interval of unknown function. Accounting for the resolution limitations due to the distribution of BamH1 sites, we did not assign ligated acceptor sites as belonging to ECAD9 but rather refer to them as “associated enhancers” and grouped the donor site interactions in CDKN2A and CDKN2B as “CDKN2A/B”. We confirmed the interactions between the associated enhancers and surrounding genes identified by the 3D-DSL approach using traditional methods. The interaction between the associated enhancers and IFNA21, which spans 946 kb, was assayed by fluorescence in situ hybridization (FISH). We examined whether or not treatment with IFNγ would affect this long-range interaction and observed that the enhancers and IFNA21 are in close proximity in 58% vs 37% of the chromosomes analyzed in HUVEC nuclei in the presence and absence of IFNγ, respectively (p<1.63 10−25 Supplemental Figure 2 and Supplementary Table 7). These data indicate that this interaction occurs basally, but is significantly remodeled upon treatment with IFNγ. We validated more closely interacting loci via site specific PCR on the religated 3C DNA and verified that the interaction between the associated enhancers and CDKN2A/B locus is strengthened in the presence of IFNγ whereas the interaction with MTAP is reduced (Figure 4b).
In conclusion, our study shows that IFNγ stimulation and STAT1 binding at ECAD9 play important roles in regulating the expression of CDKN2B and CDKN2BAS. Our findings are consistent with previous studies observing a correlation between CAD risk variants and CDKN2A/B expression in lymphocytes10, and reduced expression of CDKN2A/B in cardiac and other tissues in a mouse model with the orthologous CAD interval deleted 8. We also demonstrate that the associated enhancers physically interact with the intervals encoding CDKN2A/B, MTAP and IFNA21 in HUVEC and that these interactions are remodeled upon IFNγ treatment, thus suggesting that the STAT1 binding site in ECAD9 is a key regulatory sequence. STAT1 is a downstream effector of the pathway that mediates response to inflammation, which is associated with angiogenesis24 and atherosclerosis pathogenesis25 in endothelial tissue. Thus, in endothelial tissues, the biological effects of the rs10811656 and rs10757278 risk alleles disrupting the ECAD9 STAT1 binding site might be augmented during activation of the inflammatory response. It is likely that the unique enhancer landscape of 9p21 is governed by higher order chromatin structure and thus involved in different temporal- or tissue- specific expressions of additional genes than those identified in our study 26. Future studies assessing the effects of variants on the 9p21 chromatin landscape under different physiological conditions and cells types will undoubtedly provide additional insights into the association of this interval with CAD and T2D susceptibility.
The 2 HUVEC cell lines used in the study were derived from male Caucasian donors genotyped as heterozygotes for the 9p21 CAD associated SNPs. The 4 EBV transformed lymphoblastoid cell lines (LCL) were selected for their genotypes (2 homozygous CAD risk or 2 homozygous CAD non-risk) using the HapMap data27. Experiments were performed within 2–4 passages by treating cells with 100 ng/ml of IFNγ (R&D) for 24 hrs. Treated and untreated cells were subjected to RT-PCR (HUVEC), ChIP (HUVEC and LCL), 3C analysis (HUVEC) or FISH (HUVEC).
Probe design: We designed 12 acceptor probes in the interval chr9:22100523-22126469 (hg18; spanning from ECAD7 to the ET2D2 enhancer), on both strands immediately 3′ of the 6 BamHI sites in the region. We designed 290 donor probes on both strands immediately 5′ of the 145 BamHI sites in the interval chr9:21035934-22494089 (hg18; except where acceptor probes were designed). A universal sequence added to the probes is compatible with Illumina GA adapters for direct sequencing. The 12 acceptor probes and 290 donor probes (Supplemental Table 8) were pooled in equimolar amounts, separately. 3D-DSL sequencing: The DSL ligation products were prepared as described in Kwon et al.23. 3C was performed as per Lieberman-Aiden et al 28 and the products were sheared by sonication. The sheared DNA (200–600 bp) was purified and biotinylated. Donor and acceptor probes pools were annealed to the biotinylated 3C samples and the biotinated DNA was bound on to streptavidin magnetic beads. The 5′ phosphate of acceptor probes and 3′ OH− of donor probes were ligated using Taq DNA ligase. The ligated products were washed and eluted from the streptavidin magnetic beads, followed by PCR amplification and deep sequencing on the Illumina GA2 (Supplemental Information).
We thank X. Wang, K. Trevarthen and M. Nakano for experimental assistance. We thank the Scripps Genomic Medicine Clinical team for sample collection. This work was partially supported by NCRR grants 1U54RR025204 and 1UL1RR031980-01 and NIH grants HL065445, DK39949, DK018477, DK074868 and 1R21CA152613-01. MGR is an Investigator with HHMI.
Authors ContributionsOH, DN, BR, EJT, KAF and MGR designed the experiments. DN, XS, NGR, NH and XDF carried out the experiments. OH, DN, XS, BT, KAF and MGR contributed to analyzing the data. OH, DN, KAF and MGR wrote the manuscript.
Author Information The authors declare no competing interests