More than half of human genes initiate transcription from regions of the genome with an elevated content of CpG dinucleotides and G+C base pairs, referred to as 'CpG islands' [1]. In contrast to the rest of the genome, where CpG dinucleotides are heavily methylated and so rapidly lost through deamination, CpG sites within promoter CpG islands are normally free from DNA methylation and do not have an elevated mutation rate [3]. Genes with promoters containing CpG islands (henceforth CpG promoter genes) include housekeeping genes expressed in all cell types [8] but also a substantial number of master developmental regulators such as the HOX genes. In contrast, non-CpG promoter genes tend to have more restricted expression patterns and to be expressed later in development during tissue differentiation.
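To make "elevated content of CpG dinucleotides and G+C base pairs" concrete, the following is a minimal sketch of how a sequence window might be scored. It assumes the widely used length, G+C fraction and observed/expected CpG thresholds (200 bp, 0.5 and 0.6); these defaults and the function names are illustrative assumptions, not necessarily the definition used to classify promoters in this study.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were positioned independently.
    expected = c * g / n
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(
    seq: str,
    min_len: int = 200,        # assumed threshold, not taken from this paper
    min_gc: float = 0.5,       # assumed threshold, not taken from this paper
    min_obs_exp: float = 0.6,  # assumed threshold, not taken from this paper
) -> bool:
    """Apply the assumed length, G+C and observed/expected CpG thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp
```

In practice such a test would be applied in sliding windows around annotated transcription start sites; the single-window version above only illustrates the two quantities involved.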
Several lines of evidence suggest that the process of transcription initiation differs in CpG and non-CpG promoters. Systematic identification of the 5' ends of mammalian transcripts revealed that transcription tends to initiate from a broad region in CpG promoters but in a sharp peak in non-CpG promoters [13]. CpG promoters also more frequently initiate transcription in both the sense and antisense direction, and produce unstable non-coding RNAs even in the absence of full-length mRNA production [13]. Further, RNA polymerase II may be constitutively recruited to CpG promoters [14], with polymerase release being an important point of regulation [14]. CpG promoters are less likely to contain a TATA-box [13], and contain fewer specifically located transcription factor binding sites [20].
In addition to transcription, chromatin organization has also been reported to differ between CpG and non-CpG promoters. CpG and GC-rich DNA is preferentially bound by CXXC domain proteins that can recruit chromatin-modifying activities, including Cfp1 [21], a subunit of an H3K4me3 methyltransferase complex [22], and KDM2A, an H3K36me2 demethylase [23]. Consistent with this, unmethylated CpG promoters have higher levels of H3K4me3, a histone modification associated with transcription initiation [24]. However, CpG promoters also have higher levels of other modifications associated with transcription activation, such as the histone H3 lysine 4 methylations H3K4me1 and H3K4me2, and the histone variant H2A.Z [26]. Moreover, it has been reported that GC-rich sequences can recruit the polycomb repressive complex 2 [28]. CpG promoters have also been reported to contain a more pronounced nucleosome-depleted region upstream of the start site, despite the fact that nucleosomes have a high intrinsic affinity for G+C and CpG rich DNA [29]. This distinction between nucleosome-depleted CpG promoters and nucleosome-occupied non-CpG promoters is reminiscent of the distinction between two major classes of promoter in budding yeast [30]. Finally, in efforts to use chromatin modifications to predict the locations of core promoters or gene expression levels, different modifications have sometimes been reported as most useful for genes with and without CpG islands [32]. For example, in the models developed by Karlic et al., H4K20me1 and H3K27ac were most frequently employed to predict the expression levels of genes with CpG island promoters, whereas H3K4me3 and H3K79me1 were the modifications most frequently used in models to predict the expression levels of non-CpG island genes.
Chromatin-modifying enzymes can be recruited by elongating polymerase complexes, by sequence-specific DNA-binding proteins, and by non-coding RNAs [34]. We therefore hypothesized that, beyond the distinctions described above, promoter type could have a general influence on the chromatin organization of a gene, including at positions distal to the start site. We show here that this is indeed the case, and that genes with CpG island promoters show characteristic transcription-coupled changes in chromatin organization not seen in other genes. In particular, CpG promoter genes show a distinct set of transcription-linked epigenetic transitions within the 5' end of their gene bodies. They also have a different chromatin organization within the promoter region, including a histone modification specifically detected at the initiation site. Our analyses highlight complex differences in the chromatin of human genes with and without CpG islands in their promoters, and are consistent with a model in which there are at least two characteristic ways in which the chromatin of a human gene changes from repression to activation, depending upon the type of promoter in which transcription initiates.