|Home | About | Journals | Submit | Contact Us | Français|
The genomic loci bound by the glucocorticoid receptor (GR), a hormone-activated transcription factor, show little overlap between cell types. To study the role of chromatin and sequence in specifying where GR binds, we used Bayesian modeling within the universe of accessible chromatin. Taken together, our results uncovered that although GR preferentially binds accessible chromatin, its binding is biased against accessible chromatin located at promoter regions. This bias can only be explained partially by the presence of fewer GR recognition sequences, arguing for the existence of additional mechanisms that interfere with GR binding at promoters. Therefore, we tested the role of H3K9ac, the chromatin feature with the strongest negative association with GR binding, but found that this correlation does not reflect a causative link. Finally, we find a higher percentage of promoter–proximal GR binding for genes regulated by GR across cell types than for cell type-specific target genes. Given that GR almost exclusively binds accessible chromatin, we propose that cell type-specific regulation by GR preferentially occurs via distal enhancers, whose chromatin accessibility is typically cell type-specific, whereas ubiquitous target gene regulation is more likely to result from binding to promoter regions, which are often accessible regardless of cell type examined.
Transcription factors (TFs) play a pivotal role in regulating the expression of genes by binding to genomic response elements. The recognition sequences for eukaryotic TFs encoded within these response elements are generally short and degenerate (1). Consequently, only a fraction of all potential binding sites of a TF present in the genome is occupied. Moreover, for several TFs, the genomic binding pattern shows little overlap between cell types (2), further emphasizing that the presence of a recognition sequence alone provides insufficient information to specify where in the genome TFs bind. Additional information is provided by the combinatorial architecture of response elements that typically harbor recognition sequences for multiple TFs. These TFs can, for example, act cooperatively, when the binding of one TF is stabilized by the presence of another, thereby influencing its genome-wide binding pattern (3,4). In addition to sequence, the local chromatin environment plays an important role in defining where in the genome TFs bind. For example, many TFs preferentially bind to regions of accessible chromatin as assayed by DNase I sensitivity assays (5). Accordingly, TF-bound regions are enriched for histone modifications that are associated with active promoters (e.g. H3K4me3 and H3K9ac) and enhancer elements (H3K4me1 and H3K27ac) (6). Consistent with an important role for the chromatin environment in specifying where TFs bind, genomic TF binding site prediction benefits from incorporating both sequence and chromatin accessibility data (7,8). Similarly, histone modification data is highly informative when predicting genomic TF binding (8).
The dynamic nature of the chromatin environment provides a mechanism to facilitate cell type-specific TF binding. For example, regions of accessible chromatin show little overlap between cell types, especially regions that do not locate to the transcriptional start site (TSS) of genes (7,9). Similarly, binding of the glucocorticoid receptor (GR), a hormone-activated TF that predominantly binds to regions of open chromatin, shows little overlap between cell types (10,11). The cell type-specific binding of GR likely also explains the limited overlap in genes regulated by GR when comparing different cell types ((12), this study) and thus could facilitate the diverse range of biological responses to glucocorticoid signaling ranging from gluconeogenesis in the liver, to the suppression of inflammation in chronic inflammatory diseases like asthma.
Here, we set out to study the role of sequence and the chromatin environment in specifying the genomic loci occupied by GR. Specifically, we explored available data from the ENCODE (13) and the Epigenomics roadmap (14) consortia regarding several histone modifications, the histone variant H2A.Z and DNase I sensitivity data in three cell types for which we have genome-wide GR occupancy data. Similar to previous studies (10), we find that GR predominantly (94%) binds to accessible (‘open’) chromatin decorated with histone modifications found at transcriptionally active regions. Since GR almost exclusively binds to open chromatin, we decided to focus only on the universe of DNase I Hypersensitive Sites (DHSs) to study the role of chromatin environment and sequence in specifying which of the open sites (GR binds ~15% of all DHSs) are bound by GR. We used a hierarchical Bayesian model to quantitatively describe GR binding in several cell lines based on sequence and chromatin features. The model is interpreted as an exploratory, hypothesis-generating step towards further investigation of how genomic sequence combines with cell type-specific chromatin state to produce a diversity of cellular responses to hormone. This approach uncovered that not all open chromatin is equal and that GR preferentially binds DHSs decorated with chromatin features found at enhancers, whereas binding at DHSs decorated with chromatin features found at promoters was disfavored. Further, we found that the depletion of promoter–proximal binding could be explained in part by changes in sequence composition, with fewer GR recognition sequences mapping to promoter–proximal DHSs. Together, our studies uncovered a preferential bias against promoter proximal GR binding, and we provide evidence that this may play a role in facilitating the cell type-specific transcriptional consequences of GR signaling.
Immortalized wild type mouse embryonic fibroblasts (MEFs) and GCN5/PCAF double knockout (dko) MEFs (15) were a kind gift of Kai Ge. Cells were grown in DMEM supplemented with 10% FBS. A549 cells were grown in DMEM supplemented with 5% FBS, whereas Nalm6 cells were grown in RPMI 1640 supplemented with 10% FBS.
ChIP assays were done essentially as described using either the N499 GR-antibody ((16), 2 μl/ChIP) an H3K9ac antibody (C5B11, Cell Signaling, 6 μl/ChIP) or as control IgG (cat#: C15410206, Diagenode, 6 μl/ChIP). For each ChIP assay, ~5 million cells were treated with 0.1% ethanol vehicle or 1 μM dexamethasone for 1.5 h. For ChIP-seq experiments, the material of multiple ChIP experiments was pooled to obtain enough material (~10 ng) for library preparation. Primers used for qPCR analysis of ChIP experiments are listed in Table Table11.
Total protein isolated from equal amounts of cells were separated with SDS-PAGE gels, transferred to membranes and incubated with either anti-GR (N499, dilution 1:5000, (16)), anti H3K9ac antibody (C5B11, Cell Signaling, dilution 1:1000) or anti-actin (Sc-1616R, Santa Cruz Biotechnology, dilution 1:1000) antibodies followed by incubation with secondary antibodies conjugated with horseradish peroxidase. Proteins were visualized using an ECL detection system (Amersham Biosciences).
DNase I assays were done essentially as described (17) by growing MEFs or dko MEFs in six-well plates (500.000 cells/well), washing cells with PBS and scraping them into 1 ml DNase I buffer (20 mM HEPES pH 7.4, 0.5 mM CaCl2, 5 % glycerol, 3 mM MgCl2, 0.2 mM spermine, 0.2 mM spermidine) plus 0.2% NP40 alternative. Nuclei from three wells were pooled, isolated by centrifugation and resuspended in 300 μl DNase I buffer containing 2 μg/ml RNase A. 55 μl aliquots were DNase I treated (or mock treated to normalize for the amount of chromatin in input) by the addition of 2.05 μl DNase I (Qiagen, 2.7 u/μl) and then incubated at 37°C for 10 min. The reaction was stopped by addition 1 ml stop buffer (50 mM Tris pH 8.0; 200 mM NaCl; 12.5 mM EDTA; 1% SDS, 200 μg/ml proteinase K) and samples were incubated at 65°C for 4 h to remove proteins. Finally, DNA was purified using a PCR purification kit (Qiagen) and regions of interest were analyzed by qPCR (primers listed in Table Table11).
GR ChIP-seq data for GR (U2OS, IMR90, K562) were previously generated by us and deposited at EBI ArrayExpress under accession number E-MTAB-2955 (18). ChIP-seq data for A549, Nalm-6 and MEFs were deposited at EBI ArrayExpress with this study under accession number E-MTAB-5113.
GR ChIP-seq data for A549 and Nalm-6 was processed as described previously (16). For wild type and dko MEFs, the paired-end sequence reads derived from GR ChIP-seq experiments were mapped with Bowtie 2 (19) onto the Mus musculus assembly mm9. Number of allowed mismatches was set to zero; length of seed substrings to align was set to 20; default setting were used for all other parameters. Peak calling was done using MACS2 (20) using the input DNA from each respective cell line as control. Genome size was set to –mm; the keep-dup parameter to 1 and default was used for all remaining parameters. Peaks were called using a FDR cutoff of 0.01.
The differential GR binding analysis between wild type and dko MEFS was done using the R package DiffBind (21, http://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf.). Differentially bound peaks where called using the Mann–Whitney-U test with a significance cut off of 0.05. For qPCR validation experiments, loci representing the three classes of GR binding sites: common, dko-specific and wt-specific where picked based on peak height (for common peaks we chose ones with high signal) and peak ratio (for dko- and wt-specific loci, we chose peaks with relatively large differences in peak height between dko and wt MEFs).
ChIP-seq mapped sequencing data (BAM files) and annotated peaks from the ENCODE project (13) (A549, K562) were downloaded from the UCSC genome browser website. ChIP-seq data for IMR90 was from the Roadmap Epigenomics Mapping project (14) and downloaded from the NCBI GEO website under the series number GSE16256. H3K9ac ChIP data from primary MEFs was accessed via GEO database series numbers GSM863802 and GSM863804.
The following files were downloaded from the ENCODE project's Data Coordination Center FTP site and used for comparative analysis:
Pol2: wgEncodeOpenChromChipA549Pol2PkRep1.narrowPeak ETS1: wgEncodeHaibTfbsA549Ets1V0422111Etoh02PkIntersect.broadPeak
FOXA1: wgEncodeHaibTfbsA549Foxa2V0416102Etoh02PkIntersect.broadPeak FOSL2: wgEncodeHaibTfbsA549Fosl2V0422111Etoh02PkIntersect.broadPeak
DNase I-seq mapped sequencing data (BAM files) from the ENCODE project (13) (A549, K562) were downloaded from the UCSC genome browser website. DNase I data for IMR90 was from the Roadmap Epigenomics Mapping project (14) and downloaded from the NCBI GEO website under the series number GSE16256. The DHSs as annotated by the ENCODE project (13) were used to compare GR binding patterns at distinct genomic regions (promoters, exons, introns, distal) and are available from the UCSC Genome Browser as ‘narrowPeak’ files. DHS regions mapping to repeats (RepeatMasker (22) score >1000, track downloaded from repeatmasker.org, February 2012) were removed and resulted in 105 121 remaining DHSs for A549, 127 803 for IMR90 and 97 304 for K562 cells.
Motif scores were calculated using the GR motif MA0113.1 from the JASPAR database (23) and scanning both strands in a 160 bp range centered on the DHS peak. The maximum log odds score was assigned to each DHS location. Scanning was performed using the PWMscoreStartingAt function of the Biostrings R/Bioconductor package (24), with a zero-order background model for DNA with 42% GC content.
The effect of promoter proximity on GR binding was tested, controlling for lack of sequence motif, using an analysis of deviance (the change in −2 × log likelihood for a nested statistical model). A binary variable was defined for each DHS, indicating if the DHS overlapped a GR peak. The GR peak presence variable was then modeled using a logistic regression. Two models were fit: (0) motif score as the sole independent variable to explain GR peak presence, and (1) motif score plus a binary indicator of promoter proximity of the DHS as independent variables to explain GR peak presence. The deviance explained by introducing the indicator variable of promoter proximity (model 1) was expressed as a percent of the deviance explained by motif score alone (model 0) (Figure (Figure4B).4B). The significance test for all cell lines rejected the null hypothesis that promoter proximity did not help to explain depletion of GR binding beyond the motif score (P < 2 × 10−16).
The overlap of GR peaks with DHSs (Figure (Figure1B)1B) and the overlap of DHSs with GR peaks (Figure (Figure2A)2A) was calculated using the GR peak regions as defined by the MACS2 software (20) for the IMR90 ChIP-seq data, and the DNase I regions reads from the following file for the IMR90 cell line as defined by the ENCODE project: DNase I: wgEncodeOpenChromDnaseImr90Pk.narrowPeak.
The overlap of DHSs and GR-bound DHSs with promoters, exons, introns and distal regions (Figure (Figure3A)3A) was calculated using the genomic annotations defined by the Bioconductor package ‘TxDb.Hsapiens.UCSC.hg19.knownGene’ (24). Promoter regions were defined as ±2.5 kb from the TSS of a known gene. Because the genomic features can overlap, the following hierarchy was defined: (i) promoters, (2) exons, (3) intronic, (4) distal. In case a GR peak overlapped multiple features, it was assigned exclusively to one category according to this hierarchy.
The overlap of TF peaks with promoters, exons, introns and distal regions (Figure (Figure3B)3B) was calculated similarly, but using the ENCODE defined peaks from files downloaded from the Data Coordination Center FTP site.
Overlap MEF common, wt-specific, dko-specific mapping to promoters (Figure (Figure5E):5E): The overlap of MEF common, wt-specific, dko-specific GR peaks is based on the same definition of promoter regions described above. A GR peak was considered to map to promoters when at least one promoter region overlapped the middle of the peak region.
To compare the number of ChIP-seq reads within GR peak regions to random regions of the genome (Figure (Figure1C),1C), we took the 1000 GR peak regions with the lowest p-value as calculated by MACS (20) and counted the reads within them for each of the histone modifications, DNase I and Input, then took the median across all 1000 regions. We then defined a background set of non-peak regions by shifting each of the 1000 GR peak regions 5kb away in a random direction (upstream or down), counted the reads there and again calculated the median across all 1000 background regions. The ratio of these two medians (GR / background) was then calculated, converted to log2 scale and plotted.
Average read count profiles for histone marks ±2 kb from the center of DHS (Supplementary Figure S1) were calculated using the R/Bioconductor GenomicRanges package (25), taking the average of read counts at each position relative to the DHS center over the set of DHSs which overlapped GR peaks, or the set of DHSs which did not overlap a GR peak.
To compare GR-bound promoter DHSs with unbound promoter DHSs in IMR90 cells (Figure (Figure3c),3c), reads from ENCODE histone marks ChIP-seq experiments in IMR90 cells were counted using Bioconductor core packages in 1600 bp windows centered on promoter–proximal DHSs (DHSs within 200 bp of a transcriptional start site defined by RefSeq gene annotation). Shown are boxplots of log10 counts of reads across all DHSs, split by whether or not the DHS overlapped a GR peak as identified by MACS software (20).
H3K9ac abundance for different classes of GR ChIP-seq peaks in MEFs (common, dko-specific and wt-specific) was computed from datasets GSM863802 and GSM863804 retrieved from the GEO database. Both datasets were pooled for the analysis. To avoid biases from different widths of GR peaks, we set the width for each peak to 500 bp for each class of GR ChIP-seq peak (middle of peak region -249 bp, +250 bp). A single-end read from H3K9ac ChIP-seq was considered to fall into the 500 bp GR peak region when at least 1bp overlapped.
Microarray datasets for U2OS, A549 and Nalm6 cells comparing dexamethasone treatment (1 μM, 3h) with vehicle control were downloaded from E-GEOD-38971 (EBI ArrayExpress) and processed as described previously (16). To call genes regulated upon treatment with dexamethasone, we used an adjusted p-value cut-off of 0.05 and 1.5 for the fold change (either up- or down-regulated). The overlap of the regulated gene sets between cell lines is visualized as Venn diagram (Figure (Figure6A).6A). The percentage of genes with GR binding in the promoter was computed for cell type-specific genes and for genes regulated in all three cell lines examined (common target genes). Promoters were defined as regions ±2.5 kb around all transcript start sites annotated in the Ensembl database (grch37) of Homo sapiens genes, accessed via Bioconductor package BioMart (26). In case a gene had several transcripts, all of them were considered. A gene was marked as having a GR peak in the promoter, if at least one of its promoter regions overlapped a GR peak position.
A hierarchical Bayesian model (27) was constructed to correlate binding of GR (as measured by ChIP-Seq read counts) with chromatin features and motif score in DHSs across experiments and across cell types. The log of read counts for various chromatin features plus a pseudocount of 1 and the motif score over the annotated DHSs of a cell type were arranged as columns of a matrix X (Figure (Figure2B).2B). The chromatin feature matrix X was identical for experiments of the same cell type, except the Input feature, which was paired with the GR ChIP experiment: each of the IMR90 ChIP-Seq experiments was paired with its own Input experiment, and the A549 ENCODE GR ChIP experiments were paired with the A549 ENCODE Input experiment. This matrix X was then centered and scaled to have columns with zero mean and unit standard deviation. For an experiment k and a genomic range i centered on a DHS, the count of GR ChIP-Seq reads was modeled as following a Poisson distribution with a varying mean:
The ‘*’ indicates iteration over all j explanatory features. The coefficient β0k provides a sample-specific intercept, or base level of GR ChIP-seq read counts. The coefficient βjk is interpreted as the multiplicative effect of chromatin feature j on the GR ChIP-Seq read counts for experiment k. A positive coefficient implies an inductive contribution to GR binding and a negative coefficient implies an inhibitory contribution to GR binding, while fixing the contribution of all other chromatin features.
We specify that βjk for experiments k of the same cell type are related, with ct(k) denoting the cell type of experiment k. These coefficients are given a shared prior mean νj,ct(k), and experiment-specific variance σ2β:
The log of the mean for a given experiment and genomic range, log(μik), is therefore distributed as a normal random variable (the sum of a number of normal random variables). Each experiment k has its own set of coefficients, allowing for variability in the mean across samples, similar to the over-dispersion allowed by the Gamma-Poisson (Negative Binomial) distribution.
The parameters of the distributions for the main model coefficients βjk are themselves distributed as follows:
The term λj gives the overall contribution of chromatin feature j on GR binding – across all cell types included in the hierarchical model. The zero-mean normal prior for λj in practice produces a shrinkage of terms towards zero (no contribution to GR binding) unless the data provides evidence to the contrary.
The posterior of the model parameters conditioning on the observed data was sampled using the Stan C++ MCMC package and the rstan R package (28). The model was run for four chains for 4000 iterations (the first 2000 iterations discarded as burn-in) using the ‘no U-turn’ setting. R-hat values near 1 were used as a convergence diagnostic. Example code for the Bayesian model is provided as a separate supplementary file (bayes_model_Loveteal_2016.txt).
An estimate of the percent variance of log-scale GR ChIP-seq read counts explained by the model was calculated by squaring the Pearson correlation coefficient between the log observed signal and the log of the predicted GR read count for each DHS, using the coefficients estimated by the hierarchical Bayesian model. The estimated coefficients used for prediction were the posterior means from the Bayesian model.
Sequences matching the GR recognition sequence are ubiquitously found in the genome, yet only a cell type-specific minority are actually bound. To understand the mechanisms that specify which of these potential binding sites are in fact occupied by GR, we studied the role of the chromatin landscape in which these binding sites are embedded. Specifically, we used the wealth of information regarding chromatin features available from the ENCODE and the Epigenomics roadmap in three cell types (A549, K562 and IMR90) for which we have genome-wide GR ChIP-seq data ((18) and this study). Notably, the data regarding the chromatin landscape was generated in the absence of GR ligand and thus represents the chromatin landscape GR encounters upon ligand binding. Previous studies have shown that chromatin accessibility pre-determines GR binding patterns, with the majority of binding occurring at preexisting loci of accessible chromatin (10,29). Accordingly, we found that the majority of GR binding in IMR90 cells (94%) overlaps with DHSs as annotated by the ENCODE consortium (13) (Figure (Figure1A1A and B). Next, we compared the levels of chromatin features at GR-bound regions with those found at random genomic regions. As expected, this analysis showed that GR-bound regions have increased levels of DNase I sensitivity for each of the cell lines examined (Figure (Figure1C).1C). Furthermore, the DHS signal displays a distinctive pair of peaks flanking the GR-bound regions for all enriched histone modifications (H3K27ac, H3K4me1/2/3, H3K9ac) with the exception of H3K4me3, which showed a single peak overlapping the GR ChIP-seq peak (Supplementary Figure S1). The pair of peaks is indicative of GR binding in between nucleosomes, whereas the single peak for the H3K4me3 modification suggests that GR might bind on top of the nucleosome when this modification is present (Supplementary Figure S1). No striking enrichment at GR-bound regions was observed for histone modifications associated with either transcriptionally inactive regions (H3K9me3 and H3K27me3) or with transcriptional elongation (H3K36me3, H3K79me2). Interestingly, we found that H4K20me1, a histone mark linked to transcription-linked histone turn-over (30) and Polycomb repression (31), showed a marked enrichment at GR-bound regions in IMR90 and K562 cells, whereas no enrichment was observed in A549 cells (Figure (Figure1C),1C), indicative of a possible role of H4K20me1 in generating cell type-specific GR binding patterns.
Together, our findings corroborate previous studies showing that GR predominantly binds to regions of accessible chromatin decorated with chromatin features found at transcriptionally active regions. However, GR only binds a minority (15% in IMR90, Figure Figure2A)2A) of all DHSs found in a cell raising the question: what distinguishes DHSs bound by GR from those that are not bound? Notably, the occurrence of a GR consensus motif does not explain which sites GR binds with great accuracy because many bound sites do not contain a consensus motif and conversely, because many unbound sites contain GR consensus motifs ((18,32) and Figure Figure4A).4A). Therefore, we constructed a hierarchical Bayesian model which incorporates both sequence and chromatin features to quantitatively describe the binding of GR (as measured by ChIP-seq read counts) based on the level of cell type-specific chromatin features and the score of the GR motif in DHSs across experiments and across cell types (Figure (Figure2B).2B). As output, the model generates estimates for coefficients (β) for each feature and each cell type, which can be compared between features to determine their relative contribution to explaining GR binding levels. Furthermore, these coefficients can be compared between cell lines to find potential differences between cell types. Positive estimated coefficients indicate a positive association with GR binding whereas negative estimated coefficients indicate a negative association with GR binding. To assess how accurately our model describes GR binding, we calculated the percentage of variance of GR binding explained by using the fitted hierarchical model coefficients to predict log counts of ChIP-seq reads. The percent variance explained was relatively high for IMR90 and K562 (48–57%), while for A549, the percent variance explained was 22%. This difference may have be accounted for by generally lower read counts in the A549 ChIP-seq experiment, leading to more technical, sampling variability at low counts values.
Next, we analyzed the coefficients for individual chromatin features and found that they are mostly consistent across cell types (Figure (Figure2C).2C). In general, chromatin features that show the highest positive correlations with GR binding are marks of distal regulatory elements (H3K27ac, H3K4me1) and the GR motif score. Further, within the universe of DHSs, GR preferentially binds to regions with high DNase I and input read counts. Within the model, the features are scaled to facilitate direct comparisons. Therefore, the similar positive values for the DNase I coefficient and the motif score coefficient indicate that they have a similar positive association with GR binding, whereas the highest positive coefficient is observed for H3K27ac levels indicating that this feature is more important in quantitatively explaining the level of GR binding than any other feature including motif score (Figure (Figure2C2C).
Four consistently negatively associated chromatin features are H3K36me3, H3K79me2, H2A.Z and H3K9ac. H3K36me3 is typically found along the gene body of transcribed genes and has been linked to suppression of cryptic transcription initiation (33). The negative association with GR binding may thus reflect a link between H3K36me3 levels and prevention of TF binding in transcribed regions. H3K79me2 is also enriched in the gene body of transcribed genes and linked to various biological processes including telomeric silencing, DNA repair and cell cycle checkpoints (34), if and how this could be linked to GR binding however is unclear. The consistent negative association of H2A.Z with GR binding might at first appear incompatible with the findings reported by John et al. (29) that H2A.Z is highly enriched at GR binding sites. However, this study compared H2A.Z levels at GR binding sites with nearby regions that are not DHSs. Similarly, our genome-wide analysis (Figure (Figure1C)1C) shows a strong positive correlation between H2A.Z levels and GR binding. In contrast, the hierarchical Bayesian model presented here focuses only on the universe of DHSs and suggests that of all accessible sites, GR preferentially binds DHSs with low H2A.Z levels. Similar to H2A.Z, H3K9ac is enriched at the transcription start site of genes (35) and we find that within the universe of DHSs, GR binding is associated with low levels of H3K9ac (Figure (Figure2C).2C). Interestingly, the coefficients of two chromatin features that show a strong association with GR binding, H3K4me2 and H3K4me3, have a different sign depending on the cell line examined. H3K4me2 is positively associated with binding in A549 and IMR90 cell lines, whereas its negatively associated in K562 cells. In contrast, H3K4me3 levels correlate negatively with GR binding in A549 and IMR90 cells whereas they are a positive predictor of GR binding in K562 cells.
Together, our hierarchical Bayesian modeling of GR binding within the universe of accessible chromatin uncovered that GR preferentially binds regions with a recognition motif and chromatin marks associated with (active) enhancers. Further, some of the chromatin features showed a cell type-specific link to GR binding indicative of context-specific effects of the chromatin environment on GR binding. Unexpectedly, chromatin marks associated with promoter regions (H3K9ac, H2A.Z and to some degree H3K4me3) showed a negative association with GR binding, suggesting that mechanisms exist that interfere with promoter–proximal GR binding.
The negative link between GR binding and chromatin marks found at promoter regions suggests that GR preferentially binds to open regions that do not map to the promoters of genes. To test this hypothesis, we first determined for each cell line, the fraction of genome-wide DHSs that map to promoters (±2.5 kb around TSS of genes), to exons, to introns and to the rest of the genome (distal). This analysis showed that the percentage of all DHSs mapping to promoter regions ranged from 24% for IMR90 cells to 27% for A549 and K562 cells (Figure (Figure3A).3A). In comparison, the percentage of GR-bound DHSs mapping to promoters is strikingly lower, especially for A549 and IMR90 cells (A549: 13%; IMR90 12%; K562 17%). Similarly, the percentage of GR-bound DHSs mapping to exons is lower than the percentage observed for all DHSs whereas GR binding is relatively enriched at intronic and distal DHSs. Together, our analysis shows that GR preferentially binds to intronic and distal DHSs rather than DHSs mapping to promoter and exonic regions.
To test if the relative depletion of promoter–proximal DHS binding is GR-specific, we examined the distribution of other TFs across different genomic locations for ChIP-seq peaks mapping to DHSs. Similar to what we observed for GR, we observed promoter–proximal depletion of binding for FOSL2 and TCF12 TFs (Figure (Figure3B).3B). In contrast, ETS1, a member of the ETS family of transcription factors and RNA polymerase II preferentially bind promoter–proximal DHS regions with little binding to either exons, introns or distal DHSs (Figure (Figure3B;3B; similar patterns observed for ATF3, BHLHE40, c-MYC, E2F6, ELF1, PBX3, USF1, YY1c and ZBTB33, data not shown). A third group of TFs showed no marked bias for DHSs at specific genomic regions as their binding distribution followed the distribution observed for all DHSs (Figure (Figure3B,3B, examples shown for FOXA2 and CEBPB; similar pattern seen for JUND and TEAD4, data not shown). Together, our analysis indicates that, depending on the TF examined, binding can either be preferentially promoter–proximal, relatively unbiased toward a specific genomic location or, as observed for GR, depleted at promoter regions when compared to the distribution of all DHSs found in a cell.
Notably, GR does bind promoter–proximal in a number of cases (337). To determine if the chromatin features found at GR-bound promoter DHS-regions differ from those present at unbound promoter DHSs, we compared their levels. This analysis showed that the level of certain histone modifications differs between GR-bound and unbound promoter DHSs. First, GR-bound promoter DHSs have higher average levels of H3K4me1, whereas H3K4me3 levels were lower than those observed for unbound promoter DHSs (Figure (Figure3C).3C). Moreover, H2BK12 and H2BK20 acetylation levels, which are markers of active enhancers (36), are higher at GR bound promoter DHSs.
Together, we found that GR preferentially binds non-promoter DHSs and that the subset of GR-bound promoters shows chromatin characteristics typical for enhancer regions.
A simple explanation for the depletion of promoter–proximal GR binding is that many promoters lack a proper sequence motif to support GR binding. To investigate the role of sequence composition, we calculated the distribution of GR motif scores for DHSs grouped by GR peak presence and promoter proximity. As expected, for each of the cell lines, GR-bound DHSs have a slightly higher average motif score than unbound DHSs (Figure (Figure4A).4A). Further, the average motif score for distal DHSs is higher than for their promoter–proximal counterparts arguing that sequence composition contributes to the promoter–proximal depletion of GR binding observed. However, overall the promoter proximal and distal groups of DHSs have largely overlapping motif score distributions and a more formal approach was needed to determine the contribution of motif score in explaining the promoter proximal depletion observed (Figure (Figure4A).4A). Therefore, the contribution of motif score and promoter proximity to GR binding was quantified and statistically tested using an analysis of deviance. When controlling for motif score, promoter proximity could still explain a significant portion of the depletion of GR peaks (Figure (Figure4B)4B) arguing for the existence of additional mechanisms responsible for the observed promoter proximal depletion of GR binding. Together, our analysis indicates that the relatively low frequency of GR consensus motif matches explains part, but not all, of the depletion of GR binding at promoter regions.
One unexpected finding from our Bayesian modeling was the striking negative association between H3K9ac levels and GR binding. This finding was unexpected because acetylation of H3K9 is linked to transcriptionally active genomic regions and several studies have shown a positive correlation between H3K9ac levels and TF binding (37,38). H3K9ac is preferentially found at the TSS of genes and accordingly, H3K9ac levels can be used to accurately classify enhancers and promoters (39). The majority of H3K9 acetylation is carried out by GCN5 and PCAF, two enzymes that acetylate histones and other proteins including transcription factors (15,40). Of note, consistent with a role of these enzymes in modulating GR binding, overexpression of GCN5 results in a reduced activation of endogenous GR target genes and reduced DNA binding of GR (40). To explore the possible causative and mechanistic connection between H3K9ac levels and GR binding, we examined how the absence of GCN5 and PCAF influences the genome-wide binding pattern of GR (Figure (Figure5A).5A). For these assays, we used mouse embryonic fibroblast (MEF) GCN5/PCAF double knock-out cells (dko MEFs) which show an almost complete loss of global H3K9ac levels whereas acetylation and methylation levels of other lysines in the tails of histones, including H3K27ac, are largely unaffected ((15), Supplementary Figure S2A). Accordingly, we found that chromatin-associated H3K9ac levels, as assayed by ChIP, were markedly reduced in dko MEFs when compared to wild type (Figure (Figure5B).5B). Next, we assayed genome-wide GR binding in wild type and in dko MEFS by ChIP-seq. These experiments showed that the majority of GR peaks in wild type were also bound in dko MEFs (Figure (Figure5C5C and D). In addition, we find that some peaks are wild type-specific (Figure (Figure5C).5C). Most strikingly however and consistent with a role of these enzymes in suppressing GR binding, we find a 3.7 fold increase in the number of genomic GR binding sites in the dko MEFs (Figure (Figure5D).5D). Subsequent analysis of GR binding at the three classes of bound loci (common; dko MEF-specific; wild type MEF-specific) by qPCR verified what we observed in the ChIP-seq experiments for each of the loci tested (Supplementary Figure S2C), arguing that the observed increase in GR binding in dko MEFs is not simply a consequence of differences in the quality of the ChIP-seq experiments. Further, analysis of two genes each near wild type and dko-specific GR peaks showed differences in GR-dependent regulation consistent with the observed cell type-specific GR binding (Supplementary Figure S3). The increase in binding cannot be explained by changes in GR protein levels which are comparable for wild type and dko MEFs (Supplementary Figure S2B). Previous studies have shown that deletion of GCN5 results in a redistribution of DHSs in yeast (40). Therefore, we tested if changes in chromatin accessibility could explain the changes in GR binding upon GCN5/PCAF deletion by comparing DNase I sensitivity for each of the three classes of binding sites. The analysis of control ‘closed’ and ‘open’ regions showed that the assay can be used to distinguish regions with different chromatin accessibility and that these control regions yielded comparable results for wild type and dko MEFs (Supplementary Figure S2E). Further, we found that changes in GR binding levels correlated with changes in DNase I sensitivity prior to GR binding for some of the loci examined (Supplementary Figure S2E). For example, the dko-specific binding sites showed higher levels of accessibility in dko MEFs than their wild type counterpart for each of the loci examined, arguing that chromatin accessibility might contribute to the changes in GR binding we observed upon GCN5/PCAF deletion.
Next, we set out to determine if the increased genome-wide GR binding in the absence of GCN5 and PCAF might explain the promoter–proximal depletion of GR binding we observe in wild type cells. Therefore, we calculated the percentage of binding sites mapping to promoter regions for each of the three classes of GR binding sites: dko-specific, wild type-specific and common. If GCN5 and PCAF indeed play a key role in preventing promoter–proximal GR binding, we would predict that a higher percentage of dko-specific peaks map to promoter regions than for common or wild type-specific peaks. What we observed however is that a smaller percentage of dko-specific peaks maps to promoter regions than for common and wild type-specific peaks (Figure (Figure5E).5E). Further, the gained dko sites do not have higher H3K9ac levels prior to GR activation than those observed for common and wild type-specific peaks (Supplementary Figure S2D, Figure Figure5B)5B) indicating that gained binding sites in the dko MEFS are not preferentially regions with high H3K9ac levels in wild type MEFs.
Together, our findings indicate that the negative association between H3K9ac levels and GR binding does not reflect a causative link explaining the depletion of promoter–proximal GR binding observed. Nonetheless, PCAF and GCN5 appear to modulate genome-wide GR binding, possibly by inducing locus-specific changes in chromatin accessibility or by acetylating GR, which in turn can influence GR's ability to interact with DNA (41).
The observed bias against GR binding at promoter–proximal DHSs matches findings by others (10,42) and prompts the question what the underlying biological significance might be. One possibility could be that the depletion is linked to the cell type-specific functions of the glucocorticoid receptor, which regulates vastly different sets of genes in different cell types (Figure (Figure6A,6A, (12)). Given the strong link between GR binding and chromatin accessibility, we hypothesized that cell type-specific regulation might preferentially occur via distal enhancers, whose chromatin accessibility is typically cell type-specific, whereas the chromatin of promoter regions is often accessible regardless of tissue examined (7,9). To test this hypothesis, we first compared the transcriptional responses to GR activation between three cell types (A549, U2OS and Nalm6) for which we also generated genome-wide GR binding data. Consistent with other studies (12), we find that GR target genes show little overlap between cell types (30 out of a total of 2974 regulated genes in three cell lines combined are regulated in all three cell types). If cell type-specific gene regulation is preferentially driven by distal binding, the expectation would be that cell type-specific target genes have fewer promoter–proximal GR binding events than common target genes. Accordingly, we find that the percentage of genes with a promoter–proximal GR peak is smaller for cell type-specific target genes than for common target genes for each of the three cell lines examined (Figure (Figure6B)6B) and the same trend is seen when cell-type-specific target genes are compared to genes that are regulated in two out of three cell lines, indicating that depletion of promoter proximal binding might play a role in safe-guarding GR's ability to regulate target genes in a cell type-specific manner.
Prior to the advent of methods to map TF occupancy genome-wide, studies primarily focused on promoter–proximal regions to identify candidate TF binding sites responsible for the regulation of genes. Subsequent genome-wide analysis of TF binding have shown that the majority of binding for certain TFs, including GR, actually occurs at promoter-distal sites suggesting that certain TFs might preferentially regulate transcription from remote locations. Notably, the majority of open chromatin maps to promoter-distal locations and findings presented here and by others (10) show that most GR binding occurs at chromatin that is accessible prior to hormone treatment. Thus, one explanation for the genomic distribution of GR binding is that it simply follows the genomic distribution of DHSs. Interestingly however, our approach, which focused on the universe of DHSs, uncovered that GR binding is biased against promoter proximal regions (Figure (Figure3A).3A). In agreement with our findings, another study also found that GR binding is biased against promoter proximal DHSs with 39% of all DHSs mapping to promoter regions whereas the percentage of GR binding sites mapping to promoter regions was only 7% (10). Our analysis of other TFs showed different biases towards binding at DHSs located at distinct genomic regions (Figure (Figure2B).2B). For some TFs (for example ETS1, c-MYC and E2F6) the majority of binding (>75%) occurs at promoter–proximal DHSs whereas other TFs follow the distribution of DHSs or, like GR, show a bias against promoter–proximal binding. This prompts the question what the biological significance of the promoter–proximal depletion of TF binding might be. We propose that gene regulation by distal regulatory elements might play a role in allowing GR to regulate different target genes in different tissues. Our hypothesis is motivated by the fact that promoter–proximal binding sites are typically accessible across cell types, whereas promoter-distal sites are predominantly accessible in a cell type-specific manner (7,9). Accordingly, ChIP-seq peaks in promoters frequently show a larger percentage overlap between cell types than peaks promoter-distal sites (2,43). We propose that GR might bind and regulate transcription from promoter–proximal regions regardless of cell type, whereas binding and regulation from promoter-distal regions is more likely to direct cell type-specific transcriptional programs (Figure (Figure7).7). Consistent with our hypothesis, a larger percentage of genes regulated by GR across cell types harbor a promoter–proximal GR peak than genes regulated in a cell type-specific manner (Figure (Figure6B).6B). In contrast, the preferential promoter–proximal binding of other TFs involved in fundamental processes like DNA repair and cell-cycle progression (E2F6, YY1 and c-MYC) might facilitate the regulation of largely overlapping sets of genes across cell types. One way to further test our hypothesis would be to use genome-editing techniques to introduce GR response elements in the promoter–proximal regions of genes and to test if these genes are more prone to be regulated by GR regardless of cell type than genes whose expression is regulated by GR via distal regulatory elements.
Many bioinformatic approaches have integrated sequence and chromatin features to predict the genome-wide binding patterns of TFs (2,8,44). What is unique about the method presented here is that we modeled GR binding in the universe of accessible chromatin to uncover what distinguishes GR-bound from GR-unbound open chromatin. This approach helped us uncover that not all open chromatin is equal and that GR preferentially binds DHS regions with chromatin marks found at active enhancers rather than DHSs with promoter marks. Further, we found a negative association of H2A.Z with GR binding which at first appears incompatible with a previous study (29) showing that H2A.Z is highly enriched at GR binding sites. However, this study compared H2A.Z levels at GR binding sites with two nearby regions which where not DHSs. Similarly, when we compared GR bound sites (which are predominantly DHSs) with other genomic regions (which are mostly not DHSs) we found a strong enrichment of H2A.Z levels at GR bound loci. This probably reflects the strong enrichment at DHSs of H2A.Z and of other chromatin marks associated with active chromatin (45) when compared to the rest of the genome, which is mostly not DHSs. Consequently, chromatin accessibility and histone modification state might provide redundant information when TF binding is modeled genome-wide. Accordingly, a previous study has shown that addition of information regarding post-translational histone modification state did not provide additional predictive power when DNase I data was already included in the model to predict genome-wide TF binding (8). In contrast, our approach showed that certain chromatin features are specifically enriched or depleted at GR-bound accessible regions and the same approach can be applied to any TF to study the role of chromatin features in specifying which DHSs are preferentially bound by a TF of interest. The hierarchical Bayesian model allows comparisons to be made directly across experiments and across cell types. This showed, for example, that H3K27ac levels show a stronger association with the levels of GR binding than motif score. The parameters are mostly consistent across cell types with the exception of H3K4me2, H3K4me3 and to some degree H4K20me1 although the ß coefficients for this modification are quite small (Figure (Figure2C).2C). H3K4me3 marks actively transcribed promoters and shows a negative association with GR binding in IMR90 and A549 cells. In contrast, for K562 cells, H3K4me3 positively correlates with GR binding suggesting that some of the mechanisms that specify where GR binds might be cell type-specific. Notably, depletion of promoter proximal GR binding in K562 cells is less pronounced than in A549 and IMR90 cells (Figure (Figure3A).3A). Furthermore, GR-bound regions in K562 cells harbor fewer GR consensus motifs ((18), Figure Figure4A).4A). Together, this argues for the existence of K562-specific mechanisms of genomic GR binding, perhaps involving indirect (tethered) promoter–proximal binding mediated by interactions with other TFs. Our current model only included the GR consensus motif and therefore does not pick-up potential contributions of cell type-specific sequence motifs bound by TFs that tether GR to the genome or act cooperatively. Similarly, sequences that prevent GR binding to nearby response elements (46) were not included in the model. Therefore, future iterations of our model, to include additional sequence motifs, might better explain the observed GR binding variance.
The results of our modeling are useful to generate hypotheses, which guide the design of subsequent follow-up experiments. As detailed above, this approach uncovered a depletion of promoter proximal GR binding and yielded indications that this might play a role in facilitating cell type-specific cellular responses to glucocorticoid signaling. In addition, the modeling uncovered several striking correlations between chromatin features and GR binding. However, whether observed correlations between histone modifications and TF binding reflect a causal relationship is often unclear. Arguing for the existence of additional mechanisms that interfere with promoter–proximal GR binding, we found that motif score distribution only partially explains the depletion. Here, we followed up on the negative correlation between H3K9ac and GR binding and tested the effect of deletion of the two enzymes responsible for H3K9 acetylation, GCN5 and PCAF. Although we found that GR binds to many more genomic loci when PCAF and GCN5 are missing, the gained GR-bound genomic loci did not preferentially map to regions with high H3K9ac levels or promoter regions arguing that the correlation between GR binding and H3K9ac levels does not reflect a causative link. A likely alternative explanation for the observed anti-correlation between H3K9ac levels and GR binding is that high H3K9ac levels are found at promoter regions which have few GR binding sites which is the actual cause for the low levels of GR binding observed. This stresses that observed correlations between binding and histone marks do not necessarily imply causation. Nonetheless, the approach to disrupt the enzymes responsible for depositing specific histone marks provides a way to test for causality, which can also be applied to analyze other observed correlations between TF binding and histone modifications.
Taken together, our results uncovered that although GR preferentially binds accessible chromatin, its binding is biased against accessible chromatin located at promoter regions. This depletion can be explained in part by the sequence composition at promoters which harbor fewer binding sites and we hypothesize that depletion of promoter–proximal GR binding might play a role in safe-guarding cell type-specific transcriptional consequences of glucocorticoid signaling (Figure (Figure7).7). Given that many TFs preferentially bind accessible chromatin, modeling their binding in the universe of open chromatin, as we did here for GR, might provide additional insights into the mechanisms that direct TFs to specific genomic loci.
ChIP-seq data for A549, Nalm-6 and MEFs were deposited at EBI ArrayExpress with this study under accession number E-MTAB-5113.
We thank Edda Einfeldt for excellent technical support and Kai Ge for sharing immortalized wild type and dko MEFs.
Supplementary Data are available at NAR Online.
Deutsche Forschungsgemeinschaft [ME4154/1-1 to M.J.]; National Institutes of Health [R01 CA20535 to KRY]. Funding for open access charge: Core Max Planck Institute.
Conflict of interest statement. None declared.