We analyzed the global landscape of epigenetic relationships between histone modifications and transcription initiation by investigating genome-wide ChIP-Seq data and DeepCAGE data. The results presented here show differences in the architecture of the broad and peak promoters that regulate gene expression. In particular, we found that broad promoters were strongly associated with histones immediately downstream of the TSS, and that these histones were frequently modified, presumably to regulate gene expression levels.
In previous studies, aligned patterns of nucleosome positions around TSSs have been identified in yeasts and humans [22]. However, we confirmed this alignment only for regions around TSSs derived from broad promoters, not for those around TSSs derived from peak promoters. Broad promoters have an aligned pattern of nucleosome positions around TSSs and have large nucleosome-free regions immediately upstream of TSSs. Studies in yeasts have validated the model of "open promoters," which have large, nucleosome-free regions immediately upstream of the TSS and are often associated with TATA-less promoters and poly(dA:dT)-rich tracts, the sequences of which are unbendable and unstable for histone binding [42]. The broad promoter characteristics that we found in humans are consistent with this model, because in humans the sequence patterns in CpG islands located upstream of TSSs, in contrast to the yeast poly(dA:dT)-rich tracts, have been shown to be unstable [31].
Our data indicate that the nucleosomes that are immediately downstream of TSSs and associated with broad promoters are positioned in specific regions. We suggest that broad promoters have these aligned patterns of nucleosome positions around TSSs because nucleosome position has a stronger influence on the determination of TSSs by transcription factors at broad promoters than at peak promoters.
As an example of transcription factors that target broad promoters, we investigated the Sp1 binding sites around TSSs. Sp1 recognizes its DNA binding site via its zinc finger domain, whereas TBP recognizes the TATA box via its DNA-binding domain. Sp1 binding sites were enriched in the regions upstream of TSSs corresponding to the nucleosome-free regions. We observed similar tendencies for the binding sites of two other transcription factors, PU.1 and MAZ. Although biological experiments are necessary to investigate the molecular mechanism behind this observation, we speculate that the nucleosome-free regions serve as "landing sites" for transcription factors, including Sp1, which have less precise binding motifs (which are overrepresented among broad promoters) than the TATA box [43].
In addition to histone H3, we also analyzed the positions of the histone H2A variant H2A.Z, which is enriched around TSSs [46], and we obtained similar results. In contrast, peak promoters did not have aligned patterns of nucleosome positions. One might suspect that this observation is due to high expression of genes associated with broad promoters and low expression of those associated with peak promoters. However, even after we limited the analysis to broad and peak promoters that are both associated with highly expressed genes, we still observed a preference of H3 for broad promoters (region +100 to +130 bp with respect to TSSs) compared to peak promoters (P < 1.0 × 10⁻⁹, chi-squared test, data not shown). Although TSSs for TATA promoters are often fixed to single positions, our results suggest that such strictly controlled positions of TSSs are not regulated by nucleosome position. However, there is some evidence that the nucleosomes around TATA promoters have regulatory roles in gene expression. In yeasts, the TATA promoter is one type of "covered promoter," and expression of the genes associated with such promoters is more likely to be inhibited by the presence of nucleosomes than expression of the genes associated with "open promoters," which are located in nucleosome-free regions [42]; in covered promoters, nucleosomes often cover transcription factor binding sites to repress the expression of downstream genes. It is also possible that, in humans, peak promoters associated with the TATA box belong to one type of "covered promoter" where the expression of downstream genes is repressed by the presence of nucleosomes. Therefore, we speculate that transcription factor binding is controlled by nucleosome position in the case of peak promoters.
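The occupancy comparison in this paragraph rests on a chi-squared test of promoter type against nucleosome presence. As a minimal sketch, assuming a 2×2 table of promoters with and without an H3 signal in the +100 to +130 bp window, such a test could be computed as follows. The function name `chi_squared_2x2` and the counts are hypothetical and are not taken from the study; the sketch uses the Pearson statistic without continuity correction, which may differ from the exact procedure used in the analysis.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    table = [[a, b], [c, d]]; rows are promoter types, columns are
    'H3 present' / 'H3 absent' in the window of interest.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts, for illustration only.
counts = [[620, 380],   # broad promoters: H3 present, H3 absent
          [410, 590]]   # peak promoters:  H3 present, H3 absent
stat, p = chi_squared_2x2(counts)
print(f"chi2 = {stat:.2f}, P = {p:.3g}")
```

Restricting both rows to promoters of highly expressed genes before counting corresponds to the control described above.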
In our analysis of epigenetic control by histone modification, we uncovered a difference between broad and peak promoters. H3K4me1, -2, and -3 and H3K9ac, which are associated with gene activation, were more highly enriched around TSSs associated with broad promoters than around those associated with peak promoters. Thus, broad promoters appeared to be under stronger epigenetic control than peak promoters. We found a trend that further supported this hypothesis: genes associated with broad promoters that had modified histones showed higher expression levels than genes associated with broad promoters without modified histones. In contrast, peak promoters appeared to be under weaker epigenetic control, because far fewer of them harbored modified histones. Furthermore, there were no significant differences in expression levels between genes associated with peak promoters that harbored modified histones and those that did not.
It has been shown that promoters with many CpG islands are more likely to harbor modified histones than promoters with fewer CpG islands [40]. However, even after we limited our analysis to promoters having CpG islands, the number of broad promoters harboring H3K4me3 was still significantly higher than that of peak promoters. Even more remarkable differences were observed after we limited our analysis to promoters without CpG islands. Although these results may depend on the dataset of CpG islands we used, enrichment of H3K4me3 in the downstream region (+100 to +130 bp) of broad promoters was still observed in the analysis using a different dataset of CpG islands [47] (P value of < 1.0 × 10⁻²⁰ for CpG-related genes, P value of < 1.0 × 10⁻³⁰ for CpG-unrelated genes).
Genes associated with broad promoters tend to be expressed ubiquitously, whereas those associated with peak promoters are likely to be expressed in specific tissues and may show low expression levels in most tissue types [8]. Therefore, if high levels of gene expression are directly associated with histone modifications around TSSs, then we may observe spurious correlations between promoter type and histone modification. In fact, H3K4me3 is known to upregulate the expression of downstream genes. We therefore compared the distribution patterns of nucleosomes containing H3K4me3 around broad and peak promoters in cases where the downstream genes showed similar expression levels (Additional file 4, Figure S4). We found that the broad promoters also harbored more nucleosomes containing H3K4me3 in cases where the downstream genes showed similar expression levels (data not shown); the difference in the distributions of H3K4me3 around the broad and peak promoters was statistically significant (all positions from +100 to +130 showed significant differences; P < 1.0 × 10⁻³, chi-squared test), suggesting that promoter type was indeed associated with differences in epigenetic regulation by histone modifications.
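The expression-matched comparison described in this paragraph can be illustrated by stratifying promoters into expression bins and comparing the two promoter types within each bin. The following is a minimal sketch under stated assumptions: the record fields (`type`, `expression`, `h3k4me3`), the bin edges, and the function name `marked_fraction_by_bin` are hypothetical, and the records are invented for illustration rather than drawn from the data.

```python
from collections import defaultdict

def marked_fraction_by_bin(promoters, bin_edges):
    """Group promoters by expression bin and promoter type, then return the
    fraction carrying the mark in each (bin, type) group."""
    counts = defaultdict(lambda: [0, 0])  # (bin, type) -> [marked, total]
    for p in promoters:
        # Bin index = number of bin edges that the expression value reaches.
        bin_index = sum(p["expression"] >= edge for edge in bin_edges)
        group = counts[(bin_index, p["type"])]
        group[0] += int(p["h3k4me3"])
        group[1] += 1
    return {key: marked / total for key, (marked, total) in counts.items()}

# Hypothetical records, for illustration only.
records = [
    {"type": "broad", "expression": 12.0, "h3k4me3": True},
    {"type": "broad", "expression": 15.0, "h3k4me3": True},
    {"type": "peak",  "expression": 11.0, "h3k4me3": False},
    {"type": "peak",  "expression": 14.0, "h3k4me3": True},
    {"type": "broad", "expression": 2.0,  "h3k4me3": False},
    {"type": "peak",  "expression": 1.0,  "h3k4me3": False},
]
fractions = marked_fraction_by_bin(records, bin_edges=[5.0, 10.0])
for key in sorted(fractions):
    print(key, fractions[key])
```

Comparing broad and peak promoters only within the same bin removes expression level as an explanation for any remaining difference in the marked fraction, which is the logic of the control described above.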
Peak promoters containing the TATA box are regulated at their transcription initiation step, generally by the assembly of a pre-initiation complex with three additional components: the TATA-associated factors, the so-called mediator complexes, and positive and negative cofactors. We presume that peak promoters containing no TATA box are regulated in a similar way. This transcription system is widely used in various species, and our results suggest that it is unlikely to use epigenetic controls. Thus, broad and peak promoters have distinct systems to regulate gene expression.
Throughout this work, we employed the widely accepted definition of peak promoters, i.e., those that initiate transcription within a range of 4 bp. Changing this threshold to 10 bp had little effect on the distribution patterns of nucleosomes around broad and peak promoters, as shown by Pearson's correlation coefficients between the histone distribution patterns (-5000 to +5000 bp with respect to the TSS) obtained with the > 4 bp threshold and those obtained with the > 10 bp threshold. For H3 distribution patterns, the correlation coefficients were 0.99 and 0.94 for broad and peak promoters, respectively. For H3K4me3 distribution patterns, the correlation coefficients were 0.99 for both broad and peak promoters. These results suggest that the relationship between the imprecision of TSSs and the patterns of histone distribution is robust to the choice of threshold.
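The threshold check in this paragraph involves two steps: labeling each promoter as peak or broad from the spread of its TSS positions, and correlating the resulting histone distribution profiles. A minimal sketch follows. It assumes that the spread is measured as the span between the outermost TSS positions, which is one possible reading of the definition; the names `classify_promoter` and `pearson`, the TSS positions, and the binned profiles are all hypothetical.

```python
import math

def classify_promoter(tss_positions, threshold_bp=4):
    """Label a promoter 'peak' if its TSS positions span at most
    threshold_bp bases, and 'broad' otherwise (assumed reading of the rule)."""
    span = max(tss_positions) - min(tss_positions) + 1
    return "peak" if span <= threshold_bp else "broad"

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical TSS positions (bp) for three promoters.
promoters = {"p1": [100, 101, 102], "p2": [200, 207], "p3": [300, 330]}
for thr in (4, 10):
    labels = {name: classify_promoter(pos, thr) for name, pos in promoters.items()}
    print(thr, labels)

# Hypothetical binned histone profiles obtained under the two thresholds.
profile_4bp = [0.1, 0.4, 0.9, 0.5, 0.2]
profile_10bp = [0.1, 0.5, 0.8, 0.5, 0.2]
print(round(pearson(profile_4bp, profile_10bp), 3))
```

In this toy example only `p2` changes class when the threshold moves from 4 bp to 10 bp; a correlation close to 1 between the two profiles would indicate that such reassignments barely alter the aggregate pattern.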
TATA boxes are used in a wide range of organisms, including prokaryotes, and are thought to be part of an ancient transcriptional system. In contrast, broad promoters are thought to be newly evolved [8] and have incorporated histone modification systems. Our results showed that peak promoters, which are frequently associated with such ancient TATA boxes, have not incorporated histone modification systems.