Transcription in eukaryotes is regulated by transcription factors that associate with the genome in a cell-type and condition-specific manner. Chromatin organization forms part of the basis for this cell-type specificity by allowing or denying transcription factor (TF) access to DNA. The basic units of chromatin structure are the nucleosomes, which are known to restrict the in vivo access of certain classes of transcription factors 1
. Intensive work has been done to reveal the correlation between nucleosome position, histone modification and gene expression 2–4
. Genome-wide nucleosome occupancy maps have been generated in S. cerevisiae
, Drosophila 8
and C. elegans
, but high quality human nucleosome occupancy data is more difficult to acquire due to the large size of the human genome 10,11
. While the nucleosome positioning patterns are well established at transcription start sites (TSS), they are less well known at enhancers. Functional enhancers are cis-regulatory DNA elements that are orientation and position independent and can act at variable distances from the TSS of the genes they regulate 12,13
. Monomethylated H3K4 (H3K4me) has been shown to be associated with transcription factor binding at enhancers, trimethylated H3K4 (H3K4me3) with the TSS, and dimethylated H3K4 (H3K4me2) with both the TSS and enhancers 14,15
. To characterize the pattern of nucleosome positioning at enhancers, we used nucleosome-resolution ChIP-seq of H3K4me, H3K4me2 and H3k4me3 in the prostate cancer cell line LNCaP in response to a time-course of stimulation by the AR agonist 5α-dihydrotestoterone (DHT).
By comparing regions of enriched histone modification (Supplementary Table 1
) to AR and FoxA1 binding sites (4h DHT) and the promoter regions of RefSeq genes we found H3K4me2 ChIP-seq to be the most efficient in identifying both promoters and putative enhancers (Supplementary Fig. 1
). To differentiate intergenic TF binding sites from promoters, we removed H3K4me2 regions with strong H3K4me3 ChIP-seq signals from subsequent analyses.
We next examined whether H3K4me2 shows positional trends relative to the known AR and FoxA1 binding sites. H3K4me2 signal (based on average tag count) is highest near AR binding sites in the absence of DHT (). Upon DHT stimulation and concomitant AR binding, H3K4me2 signal decreases at the binding sites and increases in the flanking regions (). The same analysis relative to the binding sites of FoxA1 shows a bimodal tag count distribution both before () and after () DHT treatment, consistent with the role of FoxA1 as a pioneer factor to facilitate AR binding16
Signal distribution and nucleosome position analysis in the AR and FoxA1 binding regions identified by ChIP-chip experiment and the TSS
To investigate whether nucleosome positioning could explain the pattern observed in the previous analysis, we determined the likely positions of H3K4me2 marked nucleosomes using the NPS algorithm (Supplementary Table 1
. The distance from the AR motif in the binding site to the nearest detected nucleosome increases to ~200bp upon DHT stimulation (, Supplementary Fig. 2a
), indicating that nucleosomes tend to be less occupied (destabilized) at the binding site itself and more occupied (stabilized) at adjacent nucleosomes. Interestingly, the locations of the most positioned nucleosomes are concordant between DHT and vehicle (Supplementary Fig. 2b
). This suggests that prior to AR activation, AR binding loci are already marked with two well-positioned H3K4me2-containing nucleosomes, 250–450 bp apart, flanking the precise binding sites, along with a well-positioned nucleosome occluding the binding site itself. Upon AR activation, H3K4me2 modified nucleosomes are destabilized at the AR binding sites and are better positioned at the two flanking loci. While the chromatin structure relative to TSS is characterized by a nucleosome free region immediately upstream and a series of well positioned nucleosomes downstream (), we found that in general only two well-positioned nucleosomes are evident at TF bound enhancers ().
To validate our observations and rule out the possibility of ChIP-seq artifacts or loss of the H3K4me2 mark rather than the nucleosome, we conducted H3K4me2 ChIP-qPCR and Input DNA-qPCR on five AR binding sites that exhibited these H3K4me2 patterns near the genes TMPRSS2, STK39, KLK3, TMC6 and TRIM35. In all cases, the Input DNA-qPCR results show that overall nucleosome density decreases over the transcription factor binding sites and increases in the flanking regions to the same extent as the H3K4me2 marked nucleosomes (). We examined two of these cases, TMPRSS2 and STK39, in greater detail using primers that tile regions of interest at a finer resolution (), and obtained similar results.
qPCR validation of the nucleosomes stabilized-destabilized around AR binding sites
In order to determine whether the pattern of nucleosome positioning we observed at AR binding sites could be applied more broadly to other TFs, we used the change in H3K4me2 marked nucleosomes at AR binding sites following acute androgen stimulation to develop a general model of nucleosome positions at enhancers. We identified ~65,000 well-positioned nucleosome pairs (4h DHT) separated by the characteristic distance of 250–450bp in which promoter proximal pairs were removed on the basis of having H3K4me3 > H3K4me2 (). From the analysis of the AR binding sites we developed a quantitative model based on the changes in the H3K4me2 signal in the flanking nucleosomes and in the region between them (). Running the model results in a “nucleosome stabilization – destabilization” (NSD) score S
(as defined in and having a distribution as in Supplementary Fig. 3
) for each pair of appropriately spaced nucleosomes. Indeed, when we ranked all the nucleosome pairs by NSD score and grouped them into bins of 500, we found that the top scoring bins show the highest enrichment in AR binding sites ().
Motif analysis in the paired nucleosome regions
To further test the functional relevance of the regions identified by the model, we examined the evolutionary conservation across the 5,000 highest-scoring paired nucleosomes. We see three PhastCons conservation peaks, one major peak at the nucleosome depleted regions between the paired nucleosomes, and one flanking each of these nucleosomes (), this suggests evolutionary pressure not only on the TF binding sites between the paired nucleosomes but also on the regions immediately outside the paired nucleosomes.
To investigate the nature of nucleosome depletion in the regions between the paired nucleosomes, we studied the DNA sequence features in these regions. We observe that, consistent with previous models 18,19
, simple A/T content and AA/TT/TA/AT dinucleotides are depleted in nucleosome-enriched regions and enriched in nucleosome-depleted regions, while G:C dinucleotides show the opposite trend (). In addition, the stabilization of nucleosomes flanking the TF binding sites supports a model in which binding of non-nucleosomal proteins such as transcription factors forms boundaries that direct the in vivo
positioning of nearby nucleosomes 20
. A recent study also suggests that H3.3/H2A.Z-containing nucleosomes are intrinsically labile, which facilitate TFs access at regulatory sites in vivo
. We performed H2A.Z ChIP-qPCR at 5 representative AR binding sites (Supplementary Fig. 4
). These results show that H2A.Z is indeed enriched at the central nucleosome compared with the two flanking ones for all the five sites tested suggesting that it may be more labile. Significantly, the mean positions of the paired nucleosomes at AR binding loci appear to be the same in both bound (DHT) and unbound (Vehicle) conditions (, Supplementary Fig. 2b
) suggesting the existence of a common enhancer architecture that enables access of transcription factors to DNA.
To determine whether the model could be used to impute the identity of TFs that bind between nucleosome pairs, we searched for motifs in the 1000 top NSD scoring nucleosome paired regions. The top motifs identified are from the forkhead and steroid receptor families (Supplementary Table 2
and , Supplementary Fig. 5a, 5b
), previously shown to be responsible for the androgen response in prostate cancer 16
. Interestingly, this approach predicted AR binding at several sites that were not previously identified by ChIP-chip, and all of a representative sample of these were validated by ChIP-qPCR (, Supplementary Fig. 6a
ChIP-qPCR and gene expression analysis of NSD scoring sites
If the model is valid more generally, it should identify key transcription factors regulating the response of a cell population to a stimulus using the H3K4me2/3 nucleosome-resolution ChIP-seq data alone. As the LNCaP response to 4 hours of androgen exposure was dominated by AR and FoxA1, we generated nucleosome-resolution H3K4me2 ChIP-seq data at 16 hours after DHT treatment, when we predicted secondary transcriptional responses would dominate. Applying our prediction model we found NKX3.1 and Oct1 motifs to be very significantly associated with high NSD scoring regions (Supplementary Table 2
, p-value < 1e-7, Supplementary Fig. 5c, Fig. 5d
), while the AR motif is not (Supplementary Table 2
, ). NKX3.1 is a homeobox gene involved in normal prostate development that marks the prostate luminal epithelial stem cell and is a putative tumor suppressor gene in the prostate 22,23
. Oct1 has been shown previously to collaborate with AR at a subset of AR binding loci in LNCaP cells 24
. While Oct1 is constitutively expressed in LNCaP cells, NKX3.1 was induced 4-fold by androgens both at the 4h and 16h time point 25
To test whether these factors were truly differentially bound, we selected sites with high NSD scores (16h vs. 4h) and central sequences closely resembling the TRANSFAC-defined NKX3.1 or Oct1 motifs (). ChIP-qPCR was performed to compare binding under vehicle conditions to that after DHT stimulation (4h and 16h). DHT-dependent NKX3.1 recruitment was validated on 18 out of 22 selected regions, to a degree comparable with that of two known 26
NKX3.1 binding sites (, Supplementary Fig. 6b
). While previously identified Oct1 binding sites have been located in close proximity to AR binding sites 24
, the model predicted a set of putative DHT responsive Oct1 binding sites that are independent of AR binding. ChIP-qPCR of these sites shows a strong response to DHT stimulation, where all nine sites have greater enrichment of Oct1 binding at 16 hours compared with 4 hours (, Supplementary Fig. 6c
We investigated whether the genomic regions identified with this model might be of regulatory importance. We defined “4 hour
” sites and “16 hour
” sites as the 5,000 regions with highest NSD scores after 4h DHT treatment versus vehicle, and 16h versus 4h DHT treatment, respectively. We then examined gene expression microarray assays at the vehicle, 4h, and 16h time points, and compared imputed differential binding with differential gene expression. At both 4h and 16h, the differentially expressed genes are more highly associated with imputed binding sites than non-differentially expressed genes (). Further analysis shows that the likelihood of a gene being up-regulated increases with the number and score of paired nucleosome sites in the vicinity of the TSS (). In contrast, when we examined the relationship between number and score of paired nucleosome sites for down-regulated genes, we found no correlation (Supplementary Fig. 7
). These results suggest that the high NSD scoring sites are functional enhancers that play a functional role in gene regulation.
By performing ChIP-seq for H3K4me2 and H3K4me3, we profiled at high resolution the changes in nucleosome occupancy that occur at enhancers in a human prostate cancer cell line in response to androgen stimulation. Analysis of nucleosome occupancy near AR and FoxA1 binding sites revealed a striking pattern of nucleosome stabilization in a pair of nucleosomes flanking the binding site and elimination of a nucleosome at the site itself. We found several intrinsic characteristics of the middle nucleosome that distinguish it from the flanking ones, its sequence is more evolutionarily conserved and has a higher A/T content, while its histone octamer is more likely to contain the H2A.Z histone variant. This suggests that it may be intrinsically less stable than the flanking nucleosomes. Thus the apparent differences in nucleosome stability may be the result of the combination of DNA sequence, histone octamer composition and transcription factor binding. We developed a novel quantitative model and scoring function, the NSD score, that correctly identified not only the sites of AR binding but also allowed the prediction of the binding of other factors including NKX3.1 and Oct1 that mediate secondary transcriptional responses. Thus this model defines the characteristic pattern of nucleosome occupancy changes associated with enhancers and can be used to infer dynamic TF binding events that occur when a cell population transitions between states.