The human body is composed of diverse cell types with distinct functions. While it is known that lineage specification depends on cell-specific gene expression, which in turn is driven by promoters, enhancers, insulators and other cis-regulatory DNA sequences for each gene [1–3], the relative roles of these regulatory elements in this process are not clear. We have previously developed a chromatin immunoprecipitation-based microarray method (ChIP-chip) to locate promoters, enhancers, and insulators in the human genome [4–6]. Here, we used the same approach to identify these elements in multiple cell types and investigated their roles in cell type-specific gene expression. We observed that the chromatin state at promoters and CTCF binding at insulators are largely invariant across diverse cell types. By contrast, enhancers are marked with highly cell type-specific histone modification patterns, correlate strongly with cell type-specific gene expression programs on a global scale, and are functionally active in a cell type-specific manner. Our results define over 55,000 potential transcriptional enhancers in the human genome, significantly expanding the current catalog of human enhancers and highlighting the role of these elements in cell type-specific gene expression.
We performed ChIP-chip analysis as previously described [5] to determine binding of CTCF (insulator-binding protein) and the coactivator p300, and patterns of histone modifications in five human cell lines: cervical carcinoma HeLa, immortalized lymphoblast GM06990 (GM), leukemia K562, embryonic stem cells (ES), and BMP4-induced ES cells (dES). We first investigated the 1% of the human genome selected by the ENCODE Consortium [7], using DNA microarrays consisting of 385,000 50-mer oligos that tile 30 million basepairs (bp) at 36 bp resolution. We examined mono- and tri-methylation of histone H3 lysine 4 (H3K4me1, H3K4me3) and acetylation of histone H3 lysine 27 (H3K27ac) at well-annotated promoters, reasoning that the state of these histone modifications would vary in a cell type-specific manner. To our surprise, the chromatin signatures at promoters are remarkably similar across all cell types (Figure 1A). Quantitative comparison of ChIP-chip enrichment (see Supplementary Information) revealed highly correlated histone modification patterns at promoters across all cell types, with an average Pearson correlation coefficient of 0.71 (Figure S1A). This observation also holds for the larger set of Gencode promoters (Figure S2).
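The average Pearson correlation quoted above can be illustrated with a minimal sketch. It assumes each cell type is summarized as a vector of enrichment values over the same ordered set of promoters; the exact quantification used for Figure S1A is described in the Supplementary Information and may differ. The function names and the toy values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(profiles):
    """Average the Pearson coefficient over all pairs of cell types.

    `profiles` maps a cell-type name to its enrichment vector; every vector
    must list the same promoters in the same order.
    """
    pairs = list(combinations(sorted(profiles), 2))
    if not pairs:
        raise ValueError("need at least two cell types")
    total = sum(pearson(profiles[a], profiles[b]) for a, b in pairs)
    return total / len(pairs)


# Toy enrichment values, invented for illustration only (not data from the study).
toy_profiles = {
    "HeLa": [2.1, 0.3, 1.8, 0.1],
    "K562": [1.9, 0.5, 1.6, 0.2],
    "GM": [2.4, 0.2, 1.5, 0.4],
}
print(round(mean_pairwise_correlation(toy_profiles), 2))
```

With five cell types there are ten pairs to average; a value near 1 indicates that the same promoters carry the same marks regardless of cell type.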
Next, we identified putative insulators in the ENCODE regions for these cell types based on CTCF binding, as mammalian insulators are generally understood to require CTCF to block promoter-enhancer interactions [3]. We observed nearly identical CTCF binding sites (Table S1, Figure S1E) and highly correlated CTCF enrichment patterns across all five cell types (Figure S1B), providing experimental support for the mostly cell type-invariant function of CTCF as suggested by DNase hypersensitivity mapping results [8].
We then investigated transcriptional enhancers in the ENCODE regions, performing ChIP-chip in HeLa, K562, and GM cells to locate binding sites for the transcriptional coactivator protein p300 (Tables S2–S4), as p300 is known to localize at enhancers [9]. We observed highly cell type-specific histone modification patterns at distal p300 binding sites (Figure S1F), in sharp contrast to the similarities in histone modifications across cell types at promoters. We then employed our chromatin signature-based prediction method [5] to identify additional enhancers in the ENCODE regions in these cell types (Figure 1B, Tables S5–S9). In addition to the characteristic H3K4me1 enrichment, predicted enhancers are frequently marked by acetylation of H3K27, DNaseI hypersensitivity, and/or binding of transcription factors and coactivators, and many contain evolutionarily conserved sequences (Figures S3–S4, see Supplementary Information). Unlike promoters and insulators, but similar to p300 binding sites, the histone modification patterns at predicted enhancers are largely cell type-specific (Figures 1B, S1D), in agreement with observations that H3K4me1 is distributed in a cell type-specific manner [10].
These results suggest that enhancers are the most variable class of transcriptional regulatory element between cell types and are likely of primary importance in driving cell type-specific patterns of gene expression. Knowledge of enhancers is therefore critical for understanding mechanisms that control cell type-specific gene expression, yet our incomplete knowledge of enhancers in the human genome has confined previous studies of gene regulatory networks mainly to promoters. To identify enhancers on a genome-wide scale and facilitate global analysis of gene regulatory mechanisms, we performed ChIP-chip throughout the entire human genome as described [6], mapping enrichment patterns of H3K4me1 and H3K4me3 in HeLa cells. Using previously described chromatin signatures for enhancers [5], we predicted 36589 enhancers in the HeLa genome (Figure 2A, Table S10, see Supplementary Information). This method correctly located several previously characterized enhancers, including the β-globin HS2 enhancer [11] and distal enhancers for the PAX6 [12] and PLAT (t-PA) [13] genes (Figure 2B). Most predicted enhancers are distal to promoters (Figure 2C), exhibit strong evolutionary conservation (see Supplementary Information), and are marked by histone acetylation (H3K27ac), binding of coactivator proteins (p300, MED1), or DNaseI hypersensitivity (DHS) (Figures 2A, 2D; see Supplementary Information). We verified the functional potential of predicted HeLa enhancers using luciferase reporter assays as described [5] (see Methods). Of nine predicted enhancers that we evaluated, seven (78%) were active in reporter assays (Figure 2E, Table S11), with median activity significantly different from that of random genomic regions (p = 3.25 × 10^−4). These results support the suitability of using chromatin signatures to identify genomic regions with enhancer function.
We evaluated the predicted enhancers for conserved motif-like sequences using several hundred shuffled TRANSFAC motifs across 10 mammals in a phylogenetic framework that tolerates motif movement, partial motif loss, and sequencing or alignment discrepancies (see Methods). Predicted enhancers showed conservation for 4.3% of instances (at Branch-Length-Score > 50%, see Methods), substantially greater than for the remaining intergenic regions (2.9%, p < 1 × 10^−100) and even promoter regions (3.9%, p = 1 × 10^−57). Additionally, testing a list of 123 unique TRANSFAC motifs as described [14] (see Supplementary Information), we found that 67 (54%) are over-conserved and 39 (32%) are enriched in predicted enhancers (Table S12). We also performed de novo motif discovery in enhancer regions using multiple alignments of 10 mammalian genomes (see Methods), revealing 41 enhancer motifs, of which 19 match known transcription factor motifs while 22 are novel (Table S13). These motifs show conservation rates between 7% and 22% in enhancers (median 9.3%), compared to only 1.1% for control shuffled motifs of identical composition. Furthermore, over 90% of these motifs appear to be unique to enhancers, as only 4 motifs are enriched in promoter regions and 12 are in fact depleted in promoters (Table S13), indicating that predicted enhancers contain unique regulatory sequences that may be specific to enhancer function.
To investigate the association of predicted enhancers with HeLa-specific gene expression, we used Shannon entropy [15] to rank genes by the specificity of their expression levels in HeLa as compared to three other cell lines (K562, GM06990, IMR90) (Figure S5, see Supplementary Information), then plotted the distribution of enhancers around genes within insulator-delineated domains (as defined by CTCF binding sites in Figure S6, see Supplementary Information). Predicted enhancers are strikingly enriched near HeLa-specific expressed genes (Figure 3A), particularly within 200 kb of promoters. We observed a 1.83-fold enrichment (p = 4.71 × 10^−279) of predicted enhancers around HeLa-specific expressed genes relative to random (see Supplementary Information) and significant depletion of enhancers around non-specific expressed genes (p = 5.43 × 10^−15) and HeLa-specific repressed genes (p = 4.63 × 10^−2).
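To make the entropy-based ranking concrete, the sketch below shows one common formulation of expression specificity built on Shannon entropy. It is an assumption that this matches the function used here, which is given in the Supplementary Information; the function names and expression values are hypothetical.

```python
from math import log2


def relative_expression(levels):
    """Convert non-negative expression levels per cell type to fractions summing to 1."""
    total = sum(levels.values())
    if total <= 0:
        raise ValueError("expression levels must sum to a positive value")
    return {cell: value / total for cell, value in levels.items()}


def shannon_entropy(levels):
    """Entropy (bits) of a gene's expression distribution across cell types.

    Low entropy: expression concentrated in few cell types.
    High entropy: expression spread evenly across cell types.
    """
    fractions = relative_expression(levels)
    return -sum(p * log2(p) for p in fractions.values() if p > 0)


def categorical_specificity(levels, cell_type):
    """Entropy minus log2 of the fraction in `cell_type` (assumed formulation).

    Lower scores indicate expression that is more specific to `cell_type`.
    """
    p = relative_expression(levels)[cell_type]
    if p == 0:
        return float("inf")  # gene not expressed in this cell type at all
    return shannon_entropy(levels) - log2(p)


# Toy expression values, invented for illustration only.
genes = {
    "geneA": {"HeLa": 900.0, "K562": 30.0, "GM06990": 40.0, "IMR90": 30.0},
    "geneB": {"HeLa": 250.0, "K562": 250.0, "GM06990": 250.0, "IMR90": 250.0},
}
ranked = sorted(genes, key=lambda g: categorical_specificity(genes[g], "HeLa"))
print(ranked)  # most HeLa-specific gene first
```

A gene expressed evenly in all four lines has an entropy of 2 bits and a specificity score of 4, whereas a gene expressed only in HeLa scores 0, so sorting in ascending order places HeLa-specific genes first.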
To more directly investigate the relationship between chromatin modification patterns at enhancers and cell type-specific gene expression, we expanded our global analysis to another cell type. We performed genome-wide ChIP-chip for H3K4me1 and H3K4me3 in K562 cells and identified 24566 putative enhancers in this cell type using our chromatin signature-based enhancer prediction method (Table S14, see Supplementary Information). Consistent with results in the ENCODE regions, the vast majority of enhancers predicted in K562 and HeLa cells are unique to either cell type (Figure 3B) even though most expressed genes are common between the cell types (Figure 3C). Chromatin modification profiles at predicted enhancers throughout the genome are highly cell type-specific (Figure 3D), with a Pearson correlation coefficient of −0.32. Furthermore, these differences seem to have regulatory implications, as domains with HeLa-specific expressed genes are enriched in HeLa enhancers but depleted in K562 enhancers, and vice versa (Figure 3E, see Supplementary Information). These observations hold across all five cell types in the ENCODE regions (see Supplementary Information). To assess the cell type-specificity of enhancer activity, we cloned enhancers predicted specifically in K562 cells (and not in HeLa cells) and subjected them to reporter assays in HeLa cells as described above. Of nine K562-specific enhancers tested, only two (22%) were active in HeLa cells (Figure S7), and the median activity of the K562-specific enhancers was not significantly different from random (p = 0.11), suggesting that the enhancer chromatin signature is a reliable marker of cell type-specific enhancer function.
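The distinction between shared and cell type-specific enhancer predictions can be sketched as a coordinate-overlap test. This is a simplified illustration that treats two predictions as shared if their intervals overlap by at least one base; the overlap criterion actually used for Figure 3B is given in the Supplementary Information and may differ. The function names and coordinates are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end) with a half-open interval [start, end)
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_shared_unique(query: List[Region], other: List[Region]):
    """Partition `query` regions into those overlapping any `other` region and the rest.

    This is a quadratic scan, adequate for a sketch; genome-scale sets would
    normally be sorted and swept, or indexed with an interval tree.
    """
    shared, unique = [], []
    for region in query:
        if any(overlaps(region, candidate) for candidate in other):
            shared.append(region)
        else:
            unique.append(region)
    return shared, unique


# Toy coordinates, invented for illustration only.
hela_enhancers = [("chr1", 100, 600), ("chr1", 5000, 5500), ("chr2", 100, 600)]
k562_enhancers = [("chr1", 400, 900), ("chr3", 100, 600)]

shared, hela_only = split_shared_unique(hela_enhancers, k562_enhancers)
print(len(shared), len(hela_only))
```

Running the same function with the arguments swapped gives the K562-specific set, which corresponds to the pool from which the K562-only enhancers tested in HeLa cells would be drawn.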
Though most enhancers are cell type-specific, the presence of predicted enhancers shared by HeLa and K562 (Figures 3B, 3D) suggests that some enhancers may be active in multiple cell types or conditions. We compared the HeLa enhancer predictions with the results of several genome-wide studies of binding sites for sequence-specific transcription factors in different cell types, namely estrogen receptor (ER) [16], p53 [17], and p63 [18] in MCF7, HCT116, and ME180 cells, respectively. Interestingly, significant percentages of binding sites for each transcription factor (from 21.4% to 32.6%) overlap with predicted enhancers in HeLa cells (Figure 4A, Table S15), in sharp contrast to a significant depletion of the repressor NRSF/REST [19] at predicted enhancers and minimal overlap with CTCF-binding sites (see Supplementary Information).
To examine the potential role of enhancers in regulating inducible gene expression, we treated HeLa cells with the cytokine interferon-gamma (IFNγ) and identified binding sites for the transcription factor STAT1 throughout the genome using ChIP-chip. STAT1 generally binds its target DNA sequences only after IFNγ induction [20], with a small fraction of binding possible prior to induction [21]. In IFNγ-treated HeLa cells, we identified 1969 STAT1 binding sites (Table S16), with 85.8% of STAT1 binding sites occurring distal to Known Gene 5′-ends. Comparison of these distal STAT1 binding sites with recent ChIP-seq analysis of STAT1 binding in uninduced HeLa cells [21] shows that only 6.5% of IFNγ-induced STAT1 binding sites are occupied by STAT1 prior to induction. We observed that 429 distal STAT1 binding sites overlapped enhancers predicted in HeLa cells prior to induction (Figure 4A, Table S15). The H3K4me1 enhancer chromatin signature is present prior to induction at these STAT1 binding sites, which we designated as STAT1 group I, while no evidence of this signature is visible at the remaining 1260 distal STAT1 binding sites, designated STAT1 group II (Figure 4B). Intriguingly, we observed significant relative induction of expression of genes in the domains of STAT1 group I binding sites after just 30 minutes of IFNγ induction, while induction levels remained relatively unchanged for genes in the domains of the distal STAT1 group II binding sites during this time (Figure 4C). These findings suggest that an enhancer chromatin signature confers increased regulatory responsiveness to a STAT1 binding site, in agreement with our previous discovery of functional enhancers in HeLa cells that were marked by the enhancer chromatin signature but were not active until they were bound by STAT1 [5].
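The site counts reported above can be cross-checked with a few lines of arithmetic. The sketch uses only numbers stated in the text; the variable names are illustrative, and the group I share of distal sites is a derived figure rather than one reported in the text.

```python
# Counts taken from the text above.
total_stat1_sites = 1969  # STAT1 binding sites in IFN-gamma-treated HeLa cells
group_i_sites = 429       # distal sites overlapping a predicted HeLa enhancer
group_ii_sites = 1260     # remaining distal sites without the enhancer signature

# Distal sites are the union of the two groups.
distal_sites = group_i_sites + group_ii_sites

# Share of all STAT1 sites that are distal, as a percentage.
distal_percent = round(100 * distal_sites / total_stat1_sites, 1)

# Share of distal sites carrying the enhancer signature before induction.
group_i_percent = round(100 * group_i_sites / distal_sites, 1)

print(distal_sites, distal_percent, group_i_percent)
```

The two groups sum to 1689 distal sites, which is consistent with the stated 85.8% of 1969 sites, and roughly a quarter of distal STAT1 sites fall in group I.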
Our findings offer the first genome-wide evaluation of the relationship between chromatin modifications at transcriptional enhancers and global programs of cell type-specific gene expression. We determined over 55,000 potential enhancers in the human genome and showed that the chromatin modifications at the enhancers correlate with cell type-specific gene expression and functional enhancer activity. Perhaps the most intriguing observation is the large number of enhancers identified from the investigation of just two cell lines. Since enhancers are mostly cell type-specific, our data suggest the existence of a vast number of enhancers in the human genome, on the order of 10^5–10^6, that are used to drive specific gene expression programs in the 200 cell types of the human body. Future experiments with diverse cell types and experimental conditions will be necessary to comprehensively identify these regulatory elements and understand their roles in the specific gene expression program of each cell type.
HeLa, K562, and IMR90 cells were obtained from ATCC. GM06990 cells were acquired from Coriell. All were cultured under recommended conditions. Passage 32 H1 cells were cultured as described [22] with or without 200 ng/ml BMP4 (R&D Systems) for 6 days. Chromatin preparation, ChIP, DNA purification, and LM-PCR were performed as described using commercially available and custom antibodies, and ChIP samples were hybridized to tiling microarrays and to custom condensed enhancer microarrays (NimbleGen Systems, Inc.) as described [5,6]. DNase-chip was performed and the data analyzed as described [23]. Cloning and reporter assays were performed as described [5]. Data were normalized as described [5] and ChIP-chip targets for CTCF, p300, MED1, and STAT1 were selected with the Mpeak program. We used MA2C [24] to normalize and call peaks on NimbleGen HD2 arrays. Enhancers were predicted and K-means clustering, intersection analysis, and evolutionary conservation analysis were performed as described [5]. Motif analysis was performed as described [25]. Gene expression was analyzed using HGU133 Plus 2.0 microarrays (Affymetrix) as described [5]. Specificity of expression was determined using a function of Shannon entropy [15]. We used the MAS5 algorithm from the Bioconductor R package to generate gene expression Present/Absent calls. Detailed methods may be found in the Supplementary Information. Supplementary data for the microarray experiments have been formatted for viewing in the UCSC genome browser via http://bioinformatics-renlab.ucsd.edu/enhancer
Coordinates are listed in hg17 for 729 non-redundant CTCF binding sites identified in HeLa, GM, K562, ES, and dES cells (see Supplementary Information).
Coordinates are listed in hg17 for 36589 enhancers predicted in HeLa cells based on chromatin signatures (see Supplementary Information).
Coordinates (hg17) and primers used to amplify regions containing predicted enhancers in HeLa (H1-H9) and K562 (K1-K9) cells for cloning and reporter assays, as well as random regions selected as controls (R1–R10).
Enrichment of motifs in enhancers was analyzed as described [25,28]. Over-conservation and Enrichment are calculated as the excess conservation and overabundance, respectively, of a motif in enhancers or promoters relative to that expected for a random motif of identical composition. All significance values are expressed as Z-scores, corresponding to the number of standard deviations away from the mean of a normal distribution.
Known Match score represents the shared information content between novel and known motifs [28]. Over-conservation is calculated as the excess conservation of a motif in enhancers or promoters relative to that expected for a random motif of identical composition. Enrichment is calculated as the over-abundance of a motif in enhancers or promoters relative to that expected for a random motif of identical composition. Enhancer-Specific motifs are those lacking significant promoter enrichment. All significance values are expressed as Z-scores, corresponding to the number of standard deviations away from the mean of a normal distribution.
Coordinates are listed in hg17 for 24566 enhancers predicted in K562 cells based on chromatin signatures (see Supplementary Information).
Coordinates are listed in hg17 for each HeLa predicted enhancer with notation of overlap with experimentally determined transcription factor binding sites (see Supplementary Information).
Coordinates are listed in hg17 for 1969 STAT1 binding sites as determined by ChIP-chip.
Coordinates are listed in hg17 for p300 binding sites identified in HeLa cells (see Supplementary Information).
Coordinates are listed in hg17 for p300 binding sites identified in GM cells (see Supplementary Information).
Coordinates are listed in hg17 for p300 binding sites identified in K562 cells (see Supplementary Information).
Coordinates are listed in hg17 for enhancers predicted in HeLa cells based on chromatin signatures (see Supplementary Information).
Coordinates are listed in hg17 for enhancers predicted in GM cells based on chromatin signatures (see Supplementary Information).
Coordinates are listed in hg17 for enhancers predicted in K562 cells based on chromatin signatures (see Supplementary Information).
Coordinates are listed in hg17 for enhancers predicted in ES cells based on chromatin signatures (see Supplementary Information).
Coordinates are listed in hg17 for enhancers predicted in dES cells based on chromatin signatures (see Supplementary Information).
We thank members of the Ren lab for comments. This work was supported by funding from the American Cancer Society (RDH), LICR (BR), NHGRI (BR), NCI (BR), and CIRM (BR).
Author contributions: RDH, NDH, GCH and BR designed the experiments; RDH, NDH, LFH, ZY, LKL, RKS, CWC, HL, and XZ conducted the ChIP-chip experiments; GCH and KAC analyzed the ChIP-chip data; GCH predicted enhancers; RDH and LKL conducted the reporter assays; JEA, RS and JAT provided hES cells and expression data; PK, AS and MK analyzed the transcription factor motifs; GEC performed and analyzed the DNaseI-chip experiments; NDH, GCH, RDH and BR wrote the manuscript.
Microarray data have been submitted to the GEO repository under accession numbers GSE14083, GSE8098, GSE7872, and GSE7118.
Reprints and permissions information is available at npg.nature.com/reprintsandpermissions