|Home | About | Journals | Submit | Contact Us | Français|
Tissue-specific transcriptional regulation is central to human disease1. To identify regulatory DNA active in human pancreatic islets, we profiled chromatin by FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements)2–4 coupled with high-throughput sequencing. We identified ~80,000 open chromatin sites. Comparison of islet FAIRE-seq to five non-islet cell lines revealed ~3,300 physically linked clusters of islet-selective open chromatin sites, which typically encompassed single genes exhibiting islet-specific expression. We mapped sequence variants to open chromatin sites and found that rs7903146, a TCF7L2 intronic variant strongly associated with type 2 diabetes (T2D)5, is located in islet-selective open chromatin. We show that rs7903146 heterozygotes exhibit allelic imbalance in islet FAIRE signal, and that the variant alters enhancer activity, indicating that genetic variation at this locus acts in cis with local chromatin and regulatory changes. These findings illuminate the tissue-specific organization of cis-regulatory elements, and show that FAIRE-seq can guide identification of regulatory variants important for disease.
Pancreatic islets are composed of endocrine cells that secrete insulin, glucagon, and other polypeptide hormones. Islet cells are essential for glucose homeostasis, and thus elucidating the transcriptional control of islet-cell function and growth has implications for understanding diabetes pathogenesis and treatment6,7. Gene regulatory elements often function by recruiting DNA-associated proteins to specific loci, a process that typically results in local nucleosome eviction8. Nucleosome loss, or formation of “open chromatin”, is an evolutionarily conserved indicator of regulatory activity and can be used as a molecular tag to isolate regions of the genome bound by regulatory factors9.
For three samples of purified human pancreatic islets, we used FAIRE2–4 to identify sites of open chromatin (Fig. 1a, Table 1, Supplementary Table 1 online). We technically validated FAIRE-seq by its high concordance to patterns determined by hybridization of the same FAIRE samples to tiling DNA microarrays (Fig. 1b, Supplementary Fig. 1a online). Furthermore, we found that despite differences in age, cause of death, genotype, islet isolation procedures, and level of exocrine cell contaminants, the majority of regions identified by FAIRE in any one sample were also detected in the others (Supplementary Fig. 1b online). Thus, FAIRE-seq is a robust method for characterizing chromatin in islets.
Several lines of evidence indicate that FAIRE reliably identifies active regulatory elements in islets. Consistent with FAIRE in fibroblasts2, the most enriched FAIRE regions were found near known Transcription Start Sites (TSS) (Fig. 2a). Overall, there was a positive relationship between FAIRE signal near TSS and transcript levels in human islets10 (Fig. 2a). Furthermore, promoters previously shown to bind RNA Polymerase II and the transcription factors HNF4A and HNF1A in islets11 were enriched by FAIRE more frequently than other promoter regions (Fig. 2b).
To extend this observation, we identified regulatory regions that were utilized selectively in islets relative to other cell types. We compared FAIRE-seq data from islets to data from five non-islet cell lines (HeLa-S3, HUVEC, GM12878, HepG2 and K562; Methods) and found that 45% of islet open chromatin sites were unique to islets among this group of cell types. We refer to these sites as islet-selective open chromatin. We identified 340 RefSeq genes with islet-selective open chromatin in the TSS or gene body (Supplementary Table 2 online). This relatively short list included 24 well characterized genes that are selectively expressed in islets (Supplementary Table 3 online), including genes involved in human diabetes (PDX1, ABCC8, SLC30A8, G6PC2, GAD2) and islet developmental regulators (NEUROD1, NKX6-1, PAX6, ISL1)6,7,12–14. Therefore, islet-selective open chromatin detected by FAIRE identifies loci integral to islet-cell biology and disease.
Many sites of open chromatin detected by FAIRE are located in intergenic regions, far (>2 kb) from a known TSS. For these distal sites, evidence also points strongly toward a regulatory function. First, distal intergenic open chromatin sites were enriched in evolutionary conserved sequences, predicted transcription factor binding sites and regulatory modules, regardless of whether the open chromatin was islet-selective or ubiquitous (shared by all six cell types) (Fig. 2c, Supplementary Fig. 1d and Supplementary Table 4 online, Methods). Second, ubiquitous intergenic open chromatin often coincided with binding sites of CTCF15–17 (observed 16%, expected 0.39%, P<0.001) (Fig. 2c), a transcriptional regulator and insulator protein that binds to a large number of genomic sites, many of which are shared in different cell types16. Open chromatin at CTCF sites was centered at the location of the CTCF binding15, suggesting that FAIRE signal is indicative of interactions between regulatory factors and DNA (Fig. 2d). Third, intergenic islet-selective (but not ubiquitous) open chromatin preferentially harbors DNA-binding motifs of pancreatic islet transcription factors, including RFX, TCF1/HNF1, HNF3B, FOXD, and MAF (P<0.01, Supplementary Table 4 online). Notably, whereas ~36% of ubiquitous open chromatin was located within 2 kb upstream of a TSS or in the first exon, only 1% of islet-selective open chromatin was located in these regions (Fig. 2e, Supplementary Fig. 1c online). Collectively, these findings indicate that distal FAIRE sites harbor regulatory elements, and consistent with recent studies of histone modification patterns in enhancer regions18 suggest that most cell-type specific open chromatin is located far from known TSS.
We next sought to link these distal islet-selective elements to specific genes. We examined whether islet-selective sites exhibit a higher-order organization that could point to the existence of functional domains. We found that open chromatin sites were not evenly distributed throughout the genome, but instead were located in physically linked clusters (Fig. 3a). Clustering was also observed with islet-selective open chromatin (Fig. 3b). We identified 3,348 domains containing at least three islet-selective open chromatin sites separated by less than 20 kb, which we call islet-selective COREs (Clusters of Open Regulatory Elements) (Fig. 3c, Supplementary Table 5 online, Methods). Islet-selective COREs had a median size of 25 kb, with the largest containing 148 FAIRE sites spanning 602 kb (Fig. 3d). Consistent with CTCF binding to insulator elements separating chromatin domains19, the frequency of CTCF binding sites was two-fold higher outside than within COREs (P=1.3×10−48). This suggested that islet-selective COREs were functional chromatin domains and provided an avenue to assigning open chromatin sites to genes.
Islet-selective COREs were located within 10 kb of at least one RefSeq gene in 69% of cases (randomized COREs=54%; P=1.5×10−35, Fig. 3e). Of these, 94% were associated with only one gene, and most were contained within 2 kb of gene boundaries (Supplementary Fig. 2 online) suggesting single-gene regulatory function (expected=84%; P=6.2×10−23, Fig. 3e, Methods). Compared to six other primary tissues, genes overlapping islet COREs had higher expression in islets and brain (one-way ANOVA, both tissues P<1×10−5, Fig. 3f), consistent with the neuroendocrine nature of islet-cells20. Islet-selective COREs were also enriched in genes linked to islet-specific functions, including transcription factors, ion channels, and secretory apparatus components (Table 2, Supplementary Table 6 online). Thus, islet-selective COREs are typically linked to single genes that are expressed in an islet-selective manner.
A subset of islet-selective COREs spanned remarkably broad distances at loci encoding critical regulators of pancreas development and function (Fig. 3g, Supplementary Table 7 and Supplementary Figs. 2 and 3 online). For example, an islet-selective CORE spanned a 46-kb domain containing PDX1, a master regulator of pancreas development and β-cell function7 (Fig. 3g). At this locus, FAIRE sites coincided with previously characterized evolutionarily conserved enhancers named “Area I–IV”21,22 and with uncharacterized putative enhancer sites (Fig. 3h). Other islet-selective COREs included a 94-kb domain 3′ of NKX6-1, an essential regulator of β-cell differentiation23, one located in a cluster of brain-enriched snoRNA and miRNAs24, and another in conserved sequences >400 kb from any annotated gene (Supplementary Fig. 3 and Supplementary Table 7 online for additional examples). Such domains contrasted with loci devoid of open chromatin and known to be inactive in islets (Supplementary Fig. 3q–s). This dataset thus provides a rich resource to dissect cis regulation in pancreatic islets.
Recent genome-wide association studies for T2D susceptibility have implicated sequence variants at multiple loci, many of which may impair islet-cell function13,14,25–27. Many T2D susceptibility loci do not contain strongly associated variants in protein-coding regions, suggesting that the underlying functional variants regulate gene activity. Furthermore, at each locus, most associated SNPs are not expected to directly affect disease risk and are instead in linkage disequilibrium with one or more functional variant(s). We sought to use our open chromatin map to guide identification of functional regulatory SNPs. We identified known SNPs mapping to islet FAIRE sites and focused on 20 loci harboring variants associated with T2D or fasting glycemia (FG)5,13,14,25,28 (Fig. 4a, and Supplementary Table 8 online). Of 350 SNPs in strong linkage disequilibrium with a reported SNP associated with T2D or FG (Methods), 38 SNPs at 10 loci overlapped islet FAIRE regions (Fig. 4a, and Supplementary Table 8 online). Notably, rs7903146 in TCF7L2, which shows consistent T2D association in samples across diverse ethnic groups29, is located in an islet-selective open chromatin site (Fig. 4b, and Supplementary Fig. 4a online).
The presence of rs7903146 in a FAIRE-enriched site allowed us to test directly whether sequence variation at this locus correlates with chromatin state in islet cells. We tested 31 human islet samples and identified nine individuals heterozygous at rs7903146. Using two independent assays, FAIRE-isolated DNA from heterozygous individuals exhibited a T:C allelic ratio that was significantly greater than observed from input genomic DNA or from genomic DNA from unrelated heterozygote individuals (Real-time PCR: input: 49.3±1.0% T allele, FAIRE: 57.3±2.8% T allele, P=2.1×10−5, Fig. 4c; Quantitative sequencing: input: 57.5 ± 2.7% T allele, FAIRE: 66.2 ± 4.6% T allele, P=0.004, Fig. 4d and Supplementary Fig. 4b online). Thus, in human islet cells, the chromatin state at rs7903146 is more open in chromosomes carrying the T allele, which is associated with increased T2D risk5.
Next, we created allele-specific luciferase reporter constructs and measured enhancer activity in two islet β-cell lines, MIN6 and 832/13. Allelic differences in enhancer activity were observed in both cell lines. The T allele showed significantly greater enhancer activity compared to the C allele in both orientations (Forward: MIN6 P=1.6×10−7, 832/13 P=0.005; Reverse: MIN6 P=3.1×10−7, 832/13 P=3.1×10−4, Fig. 4e,f, Supplementary Fig. 4c,d online). However, allele-specific differences were not observed in the human embryonic kidney cell line 293T (Supplementary Fig 4e online). These data suggest that sequence variation at TCF7L2 affects T2D susceptibility by altering cis regulation and local chromatin structure in islet cells. The results are consistent with a previous report of association between the T allele and increased TCF7L2 transcripts in islets29, although the allele-specific changes described here can potentially impact different genomic regulatory functions, including transcriptional rates, promoter usage, or splicing.
To our knowledge, this study represents the first high-resolution atlas of regulatory elements in pancreatic islets. The unbiased maps generated by FAIRE-seq reveal new insights regarding the organization of tissue-specific cis-regulatory elements. Many earlier studies have shown that the genome is functionally organized in chromosomal territories1,30–32. Our observations extend previous findings by uncovering the existence of a large number of cell-selective regulatory domains associated with single genes, and provide a foundation for mechanistic understanding of transcriptional regulation of genes important for pancreatic islet cells. Identification of regulatory sites in a disease-relevant primary tissue also serves to dramatically reduce the genomic space when searching for functional non-coding sequence variants that influence T2D susceptibility. More generally, the current study provides a framework to move forward from the identification of large sets of disease-associated variants toward understanding the subset of functional variants that underlie disease risk.
Human islet FAIRE-seq raw data, alignments in the form of mean-centered base counts, and enriched sites at the three thresholds for all three samples can be obtained from Gene Expression Omnibus (GSE17616).
All experiments were performed according to protocols approved by the Institutional ethical committees of the Hospital Clinic de Barcelona, Geneva University Hospitals, Istituto Scientifico Ospedale San Raffaele, and Hospital Universitari de Bellvitge. All samples were isolated from multiorgan donors without a history of glucose intolerance after informed consent from family members. Information on samples used for FAIRE-Seq are provided in Supplementary Table 1 online. Pancreatic islets were isolated according to established procedures33. After isolation, islets were cultured in CMRL 1066 containing 10% FCS and shipped at room temperature in the same medium. Samples 1 and 2 were processed upon arrival, while sample 3 and subsequent samples used in locus-specific assays were recultured in RPMI 1640 containing 10% FCS for three days before performing FAIRE. Islets were rinsed with PBS three times, and crosslinked for 10 min in 1% formaldehyde at room temperature with constant shaking. After adding glycine (final concentration 125 mM), islets were rinsed with PBS containing protease inhibitor cocktail (Roche) at 4°C, snap frozen, and stored at −80°C. Islet purity was assessed by dithizone staining34 immediately prior to fixation. The accuracy of dithizone staining was verified by immunofluorescence analysis using DAPI, anti-insulin, and anti-glucagon antibodies35.
For all but 2 samples (see below), FAIRE was performed as described2 with modifications. Frozen pellets with ~3000 crosslinked islets were thawed on ice in 1 mL lysis buffer (2% Triton X-100, 1% SDS, 100 mM NaCl, 10 mM Tris-Cl at pH 8.0, 1 mM EDTA) and disrupted with five 1-minute cycles using 0.5 mm glass beads (BioSpec). Samples were sonicated for 10–20 rounds of 30 pulses (1 second on/0.5 second off) using a Branson Sonifier 450D at 15% amplitude. After 10 rounds the efficiency of sonication was assessed, and further rounds were performed when needed to ensure that the majority of chromatin fragments were in the 200–1000 bp range. Debris was cleared by centrifugation at 15,000 g for 5 minutes at 4°C. Nucleosome-depleted DNA was extracted with phenol-chloroform followed by ethanol precipitation and RNase A (100 µg/mL) treatment2.
For samples 1 and 2 we employed a modified procedure that yielded less consistent chromatin fragmentation. Cells were incubated with 50 mM HEPES (pH 8.0), 140 mM NaCl, 1 mM EDTA, 0.1% SDS, 0.1% sodium deoxycholate, then centrifuged at 9,000 g 10 min. The pellet was resuspended in 50 mM HEPES (pH 8.0), 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% SDS, 0.1% sodium deoxycholate, and then processed as described above.
Libraries were generated from gel-purified ~200 bp DNA fragments. After adaptor ligation and PCR-based amplification, samples were sequenced on the Illumina Genome Analyzer II platform using standard procedures. Sequence reads were aligned to the human reference genome (hg18) using Mapping and Assembly with Qualities (MAQ) with default mapping parameters 36. Post-alignment processing removed all reads that had an overall MAQ mapping quality <30 and artificially extended each read to a final length of 200 bp. We counted filtered reads mapping to each base in the genome to obtain a read density for each base. To facilitate display, read densities were centered on the mean read density of each chromosome.
Sites of FAIRE-seq enrichment were assessed with F-Seq37, which uses a kernel density estimate to calculate genomic regions where the continuous probability is greater than a user-defined standard deviation threshold over the mean across a local background. We used a feature length of 1,000 and three standard deviation (s.d.) thresholds resulting in three sets of enriched regions for each sample. The most liberal threshold was set for each sample using an empirical estimation of the upper bounds on the number of nucleosome-depleted regions genome-wide (roughly 200,000). For sample 3, the thresholds used were s.d.=6 (liberal), s.d.=8 (moderate) and s.d.=10 (stringent). For samples 1 and 2, which were sequenced to a lower depth, the thresholds used were s.d.=4 (liberal), s.d.=5 (moderate) and s.d.=8 (stringent).
We estimated the mappable proportion of the reference genome in two ways, using 120 million randomly generated reads (2.7×109 mappable bases, ~85% of hg18) and using PeakSeq (2.8×109 mappable bases, ~89% of hg18)38. We independently calculated genome coverage using 125 million reads obtained from islet FAIRE and found it to be 97.9% and 97.7% concordant, respectively.
Sequence reads obtained from five major ENCODE cell lines (GM12878, HeLa-S3, HUVEC, K562, and HepG2) were aligned to the reference genome (hg18) using MAQ36 and filtered as described above. Sites of enrichment were determined using F-Seq37, using the same parameters as islet samples 1 and 2.
Islet FAIRE preparations from samples 1 and 2 were fluorescently labeled and hybridized to a tiling DNA microarray covering 1% of the genome selected for the ENCODE pilot project1. Sites of enrichment were called using ChIPOTle39. For Receiver Operating Characteristic (ROC) curve analysis, sites of enrichment were called using a p-value threshold of 1×10−12.
We recorded the percentage of bases underlying FAIRE-seq sites that overlapped 28-species conserved elements40, predicted regulatory modules (PreMod)41 and transcription factor binding sites (TRANSFAC42 and MotifMap43). For each set of peaks we permuted positions across the mappable genome 1,000 times and re-calculated the overlap. P values were calculated from permutations that had a higher degree of overlap than the observed set of peaks. We used Clover to test for over-represented transcription factor binding motifs in sequences underlying intergenic FAIRE-seq enrichment44. Sequences were separated by chromosome and analyzed for motifs from JASPAR45 and TRANSFAC42, as well as the CTCF motif15. Significance was calculated by comparing to the mappable intergenic portion of the tested chromosome, and motifs reaching a p-value threshold of .01 were reported.
We used RMA-normalized signals from a previously reported experiment using HG-U133A and HG-U133B GeneChips with five non-diabetic islet samples10, and obtained an average value for each probe. The five samples were selected by hierarchically clustering expression data from 7 non-diabetic individuals. We excluded two samples (Sydp2 and SydPI) that had poor concordance with the others and showed low expression of known islet genes. We counted the number of FAIRE-Seq reads mapping to each base in a 1 kb window surrounding each RefSeq TSS, grouped RefSeq genes by their average islet expression level, and, for each group, calculated the average mean-centered FAIRE read density at every base in the window.
An islet FAIRE-seq site was considered islet-selective if the site did not overlap a site from any of the five additional tested cell types. Note that such sites are not expected to be necessarily unique to islets. An islet site was considered ubiquitous if the site overlapped a FAIRE-seq site in all five additional cell types. Moderate stringency FAIRE-seq site thresholds were used for all datasets.
For each RefSeq transcript we assessed the region 2 kb upstream through 2 kb downstream and calculated the percentage of bases that overlapped a moderate islet FAIRE enrichment site. We calculated the same value in the combined data from five non-islet cell lines, and selected genes that were more or equally enriched in islets compared to combined non-islet data.
We identified 3,348 regions with at least three islet-selective sites (as defined above) located <20 kb from each other using Galaxy46. These criteria were based on the typical size of islet-selective clusters (Fig. 3b). For comparisons we created a set of randomized COREs of identical size and mappability. To assess CTCF binding enrichment, we obtained CTCF binding sites from multiple cell types17, calculated the frequency within COREs (0.007 sites/kb), in randomized COREs (0.013 sites/kb), and in the mappable genome (0.013 sites/kb), and tested for significance using a two-sided χ2 test.
RefSeq, ncRNA, or Non-RefSeq Unigene transcripts were assigned to a CORE when the transcriptional start or end site was within 10 kb of either end of the CORE. When comparing CORE overlaps with one or more genes we required a minimum overlap of one bp. We used DAVID47 to test enrichment of biological processes in CORE genes, and used all RefSeq genes as a background. We employed all biological processes from GO and PANTHER48, removing GO terms associated with >1,000 genes from the analysis.
For gene expression comparisons, we obtained gcRMA-normalized signals from Human U133A and GNF1H microarrays from seven tissues (pancreatic islet, liver, heart, kidney, lung, skeletal muscle and whole brain)49. We required that probes reported a signal of >100 in at least one tissue. We used one-way ANOVA to assess expression differences between tissues for CORE genes and for an identical number of randomly selected genes that did not overlap a CORE.
We identified 20 loci where SNPs have been reported to show genome-wide association (P<5×10−8) with T2D or fasting glycemia14,25,26,50–54. For each locus we identified variants in HapMap CEU release 22 in strong linkage disequilibrium (r2>0.8) with the reported reference SNP; 350 variants satisfied these criteria and were termed ‘T2D-associated SNPs’. We then defined the region of association for each locus by manually identifying recombination hotspots from HapMap release 22 data flanking the associated SNPs. We identified SNPs in dbSNP v129 with an average heterozygosity >1% in these regions. SNPs that overlapped FAIRE sites in sample 3 were recorded.
We genotyped 31 human islet genomic DNA (gDNA) samples using TaqMan SNP Genotyping Assays (Applied Biosystems). Nine samples were heterozygous for rs7903146. For these samples we used TaqMan SNP Genotyping Assays to determine the allelic ratio of DNA fragments containing rs7903146 in FAIRE and input DNA. All reactions were performed in triplicate in a volume of 25 µl using 5 ng DNA quantified with a Nanodrop 1000 (Thermo Scientific). A standard curve was generated by mixing gDNA from samples with known genotype to generate seven allelic ratios - 10:90, 20:80, 40:60, 50:50, 60:40, 80:20, and 90:10. The relative abundance of T and C alleles in each experimental sample was estimated from the standard curve, and compared to input DNA from the same samples and gDNA from unrelated heterozygous individuals. Allelic ratio was also assessed in seven samples heterozygous for rs7903146 by quantitative Sanger sequencing in FAIRE and input DNA (oligonucleotides shown in Supplementary Table 9 online). ImageJ was used to quantify the area under the curve of peaks in the chromatogram. Data are expressed as mean ± s.d. and were assessed with two-sided unpaired t-tests for gDNA vs. FAIRE, or paired t-tests for input vs. FAIRE.
To confirm islet-selective FAIRE enrichments, we employed real-time PCR with SYBR Green detection as described55,56. We performed triplicate measurements from 5 ng FAIRE DNA, and used a serial dilution of input DNA as the standard curve. We expressed FAIRE enrichment values relative to the enrichment values in the same sample at a local negative control region.
A 240 bp fragment surrounding rs7903146 was PCR-amplified from DNA of individuals homozygous for either the T or C allele of rs7903146 (oligonucleotides shown in Supplementary Table S9 online). PCR fragments were cloned in both orientations in the multiple cloning site of the minimal promoter-containing firefly luciferase reporter vector pGL4.23 (Promega). Four independent clones for each allele for each orientation were verified by sequencing and transfected in duplicate into MIN657 and 832/13 (Chris Newgard, Duke University) β-cell lines, and into HEK293T cells. Cells were co-transfected with a phRL-TK Renilla-luciferase vector to control for transfection efficiency. Transfections were performed with lipofectamine 2000 (for MIN6 and HEK 293T; Invitrogen) or FUGENE-6 (for 832/13; Roche Diagnostics). Cells were assayed 48 hours after transfection using the Dual Luciferase Assay (Promega). Firefly luciferase activity was normalized to Renilla luciferase activity and then divided by values for a pGL4.23 minimal promoter empty vector control. A two-sided t-test was used to compare luciferase activity between alleles. Experiments in MIN6 and 832/13 cells were carried out on a second independent day and yielded comparable allele-specific results.
We thank Xavi Garcia for experimental support in rs7903146 functional studies, Ignasi Moran for insights and help in the analysis of COREs, Dr Pierre Maechler (University of Geneva Medical Center) for generously providing insulin release data, Drs. Lorenzo Piemonti and Rita Nano (San Raffaele Scientific Institute), and Dr. Montserrat Nacher (IDIBELL) for human islets. This work was supported by the European Union VI Framework Programme project Eurodia to JF, Ministerio de Ciencia e Innovación (SAF2008-03116) to JF, Juvenile Diabetes Research Foundation (26-2008-633 to JF, 31-2008-416 to TB, 6-2005-1178 and 31-2008-416 to AS), the US NHGRI ENCODE project (U54 HG004563 subcontract to JDL), and R01 DK072193 to KLM. KLM is a Pew Scholar in the Biomedical Sciences.
JF and JDL conceived the study. KJG, TN, LP, JMS, KLM, JDL, and JF designed the experiments, interpreted results, and wrote the manuscript. TN conducted FAIRE experiments, developed and performed allelic imbalance assays. PGG optimized the FAIRE protocol and performed microarray studies. KJG, JMS, and PGG performed sequence analysis and KJG, LP, TN and JMS performed data analysis. LP conducted the analysis of CORES. MPF and TMP conducted reporter assays. PM conducted high-throughput sequencing. AS, DB, TB, and EM provided purified human islet samples.
Competing Financial Interests
The authors declare no competing financial interests.