The chromatin landscape is highly complex. Understanding the functional role of a modification requires a detailed analysis of its distribution, its relationship to other epigenetic factors (chromatin context) and its correlation with functional properties of modified genes. We have used ChIP-seq data to explore the relationship between the pattern of H3K27me3 enrichment and gene expression. Three H3K27me3 enrichment profiles were identified: enrichment in the promoter, enrichment at the TSS and broad marking across the full length of the gene. All three classes were present in four different cell types, but their proportions differed. These results provide new insights into transcriptional control mediated by PRC2, and call attention to the way in which we process and use ChIP-seq data.
In this study, we employed several analytical and visualization tools to assess the distribution of H3K27me3. In particular, we used an averaged scaled enrichment (ASE) plot, which shows the average signal over the body of a gene; compared enrichment profiles between distinct cell types; stratified genes based on their expression level; and assessed the interaction of H3K27me3 with other histone modifications and RNApol-II. We observed dramatic differences between the average H3K27me3 profile in ES and G1ME cells, and by focusing on the differences we were able to identify three predominant enrichment profiles. Comparable results were obtained when we used the k-means algorithm to cluster genes based on their H3K27me3 enrichment profile. Similar clustering methods were used previously to assess the distribution of many histone modifications, including H3K27me3, in CD4+ T cells (45). Hon et al. confirmed a strong association between H3K27me3 and transcriptional repression, but they did not identify genes that carry H3K27me3 specifically in the promoter region. We have seen that the proportion of marked genes in each class can vary dramatically between cell types, which may explain why the promoter class was not identified in CD4+ T cells. Indeed, our results suggest that clustering tends to miss profiles that contain small numbers of genes. Additionally, the k-means algorithm requires the number of clusters to be selected prior to running the analysis and it uses a randomly chosen ‘seed’ gene to form each cluster; altering either of these variables can produce different results. Each visualization tool possesses distinct advantages, and multiple tools should be used to interpret ChIP-seq data. Although the ASE plot includes some structural landmarks, including the TSS, transcriptional end site and a loosely defined promoter, it still misses internal structures, such as introns and exons. Because of this limitation, the ASE plot did not provide sufficient resolution to identify a modification specifically enriched on exons, as has previously been identified for H3K36me3 (9). One possible extension of the ASE plot would be to scale the first exon and intron to the same length.
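The two tools discussed above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the bin count and the toy numbers are all assumptions made for illustration, flanking regions are ignored, and the k-means seeds are passed explicitly (rather than drawn at random) so that the effect of seed choice is reproducible.

```python
from statistics import mean


def scale_profile(signal, n_bins):
    """Rescale one gene-body signal to n_bins values by averaging
    the positions that fall into each bin (hypothetical helper)."""
    n = len(signal)
    if n < n_bins:
        raise ValueError("signal is shorter than the number of bins")
    return [
        mean(signal[b * n // n_bins:(b + 1) * n // n_bins])
        for b in range(n_bins)
    ]


def averaged_scaled_enrichment(gene_signals, n_bins):
    """Average the scaled profiles of many genes, bin by bin."""
    scaled = [scale_profile(s, n_bins) for s in gene_signals]
    return [mean(column) for column in zip(*scaled)]


def kmeans_from_seeds(profiles, seed_indices, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on equal-length profiles.
    The initial centroids are the profiles at seed_indices."""
    centroids = [list(profiles[i]) for i in seed_indices]
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for p in profiles:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [mean(column) for column in zip(*members)]
    return labels


# Toy data only: two genes of different length scaled to two bins.
ase = averaged_scaled_enrichment([[1, 1, 3, 3], [0, 0, 0, 0, 2, 2, 2, 2]], 2)

# Toy data only: six already-scaled profiles forming three loose groups.
toy_profiles = [[0, 0], [1, 1], [5, 5], [6, 6], [10, 10], [11, 11]]
labels_a = kmeans_from_seeds(toy_profiles, seed_indices=(0, 1))
labels_b = kmeans_from_seeds(toy_profiles, seed_indices=(0, 5))
```

On this toy input the two seed choices converge to different partitions, with the middle group split differently in each. A middle group like this, caught between two larger clusters, is the situation in which a small class of genes could be absorbed into its neighbours.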
By combining gene expression data, RNApol-II, H3K36me3 and H3K4me3 ChIP-seq data with our classification scheme, we have been able to show that each of the promoter, TSS and broad classes of H3K27me3 is associated with a distinct transcriptional outcome. The promoter genes are highly expressed despite possessing significant enrichment of H3K27me3 in the promoter. In both ES and G1ME cells, promoter genes have a depletion of the repressive mark H3K27me3 across the body of the gene and a significant level of RNApol-II binding. These genes also show significant enrichment of H3K4me3 and H3K36me3 and higher than average mRNA expression. In contrast, the broad genes are strongly repressed. They have enrichment for H3K27me3 across the entire gene that can extend into the flanking regions and have little to no RNApol-II, H3K36me3 or H3K4me3. Finally, genes in the TSS class lack significant RNApol-II or H3K36me3 binding and have lower mRNA expression levels than the average gene, although not as low as the broad class. Many of the TSS genes are likely to be bivalent, having a peak in both H3K27me3 and H3K4me3 around the TSS, although we lack the sequential ChIP data needed to confirm co-occupancy. These findings support the generally accepted view that bivalent genes are ‘poised’, meaning that they are not yet committed to either activation or repression.
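The kind of comparison described above, in which expression is summarized within each H3K27me3 class, can be sketched as follows. This is an illustrative example only: the function name is hypothetical, and the gene names and expression values are invented toy numbers, not data from this study.

```python
from collections import defaultdict
from statistics import median


def median_expression_by_class(gene_class, expression):
    """Group genes by their H3K27me3 class and return the median
    expression value of each class (hypothetical helper)."""
    grouped = defaultdict(list)
    for gene, cls in gene_class.items():
        if gene in expression:  # skip genes without an expression value
            grouped[cls].append(expression[gene])
    return {cls: median(values) for cls, values in grouped.items()}


# Toy inputs only: class labels and arbitrary expression units.
toy_class = {
    "geneA": "promoter", "geneB": "promoter",
    "geneC": "TSS", "geneD": "TSS",
    "geneE": "broad", "geneF": "broad",
}
toy_expression = {
    "geneA": 9.0, "geneB": 8.0,
    "geneC": 5.0, "geneD": 4.0,
    "geneE": 2.0, "geneF": 1.0,
}

summary = median_expression_by_class(toy_class, toy_expression)
```

The median is used here because expression values are typically skewed. The same grouping could be applied to RNApol-II, H3K36me3 or H3K4me3 signal in place of mRNA levels.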
While our analysis identified many genes that carry H3K27me3 in their promoter, the precise role of the modification in this context remains unclear. The vast majority of promoter genes possess CpG islands upstream of their TSS. Several groups have noticed a strong association between PRC2 binding sites and GC-rich sequence elements, and it is thought that these sequences play a key role in recruiting PRC2 (5). It was surprising to find that, in contrast to the promoter class, the frequency of CpG islands in the broad class was only 30%, which is comparable to the genome-wide frequency. This suggests that alternative mechanisms may be employed to recruit PRC2 to promoter and broad genes. Although promoter genes are highly expressed, it remains possible that H3K27me3 has a repressive function at these sites and acts to moderate expression levels. Another possibility is that deposition of H3K27me3 in the promoter occurs at regulatory elements. H3K27me3 has previously been found in the promoter of repressed Polycomb target genes that express small RNAs, which act as recruitment signals for PRC2 (46); however, in this context H3K27me3 also occurs downstream of the TSS, where it acts to block RNApol-II extension. If H3K27me3 does block RNApol-II extension, then marking in the promoter may provide a way to guide alternative promoter use or prevent inappropriate transcription in the opposite direction. It is not possible to address these questions using standard microarray expression analysis alone. A detailed analysis of gene expression by RNA sequencing, including promoter usage, splicing patterns and transcriptional orientation, will be required to resolve this issue.
In this study, we focused on understanding how differences in the distribution of H3K27me3 affect gene expression; however, there are many other functional properties of genes that could also be considered, including promoter usage, alternative splicing, antisense transcription and replication timing. Many new insights into the biology of PRC2 will come as we continue to map more chromatin modifications and uncover new mechanisms that influence transcription. It will be important to integrate these new data to gain a better understanding of the mechanisms that govern the distribution of PRC2 and regulate its activity. Our study demonstrates that it is important to consider the precise pattern of H3K27me3 enrichment on genes. Many Polycomb group proteins are involved in cancer, and it will be interesting to see whether H3K27me3 enrichment profiles are also altered in disease.