Chromatin regulation has been studied in a variety of systems, but most extensively in unicellular yeasts and mammalian cells.
C. elegans has many features that make it well suited as an alternative system for studies of chromatin regulation. Of particular note are its well-annotated genome, the ease of RNAi, and the rich resource of chromatin mutants for loss-of-function studies
4-7. Importantly,
C. elegans has a complement of chromatin factors very similar to that of humans, in contrast to yeast
8 and allows investigations of chromatin function in a multicellular organism
9,10. Because modifications to histone tails are correlated with and can regulate chromatin structure
1-3, we decided to map their positions genome-wide, to provide a framework for chromatin studies in
C. elegans.
To generate an initial map of the distributions of histone methylations across the
C. elegans genome, we used chromatin immunoprecipitation (ChIP) followed by microarray hybridization to determine the genome-wide association of trimethylation of lysine 4, lysine 9 and lysine 36 of histone H3 (H3K4me3, H3K9me3, and H3K36me3). We prepared chromatin extracts from triplicate, highly synchronized populations of wild-type third-larval-stage worms and carried out chromatin immunoprecipitations using commercial antibodies (see Methods). Immunoprecipitated DNA was amplified and hybridized to 2.1-million-feature full-genome tiling microarrays (Nimblegen). Pairwise comparisons of ChIPs performed with the same antibody showed strong correlation between replicate data (
Supplementary Table 1), and the three replicates showed similar enrichment patterns across different genomic regions (). To correct for differences in nucleosome occupancy, we subtracted the mean H3 ChIP signal from those of H3K4me3, H3K9me3, and H3K36me3 (see Methods). To investigate relationships between transcription and different histone modifications, we generated four sets of genes: (1) top10, genes in the top 10% of expression level in our samples, as determined by gene expression profiling; (2) bottom10, genes in the bottom 10% of expression level; (3) ubiq, genes annotated or expected to be actively transcribed in all nuclei; and (4) serp, serpentine receptor genes, most of which are thought to encode chemosensory receptors transcribed in only a few neurons and thus to be transcriptionally inactive in most nuclei
11.
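The H3 correction and the expression-based gene sets described above can be illustrated with a minimal sketch. It assumes that ChIP signals are already available as per-probe log2 values and that expression is a simple gene-to-level mapping; the function names `subtract_h3` and `expression_deciles` and all numbers are hypothetical and are not taken from the Methods.

```python
def subtract_h3(mod_log2, h3_log2):
    """Probe-wise correction: modification log2 signal minus H3 log2 signal."""
    return [m - h for m, h in zip(mod_log2, h3_log2)]


def expression_deciles(expression):
    """Return (top10, bottom10) gene-name lists from a {gene: level} dict."""
    ranked = sorted(expression, key=expression.get)  # lowest expression first
    n = max(1, len(ranked) // 10)                    # number of genes in one decile
    return ranked[-n:], ranked[:n]


# Toy values, invented for illustration only.
h3k36me3 = [1.5, 0.75, 2.0]
h3 = [0.5, 0.25, 0.5]
corrected = subtract_h3(h3k36me3, h3)

expr = {f"gene{i}": float(i) for i in range(20)}
top10, bottom10 = expression_deciles(expr)
```

Because the signals are on a log2 scale, the subtraction corresponds to taking the ratio of modification signal to H3 signal on the linear scale.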
To gain initial insight into gene regions enriched for different modifications, we plotted mean log2 ChIP signals across all genes. We aligned genes at the first and last nucleotides of the annotated transcripts and extended these regions by 1 kb of genomic DNA upstream and 1 kb downstream (). We call the first base of the annotated transcript the TSS (transcript start site). As in other organisms
12-16, we observed a peak of H3K4me3 enrichment near the TSS that correlates with transcriptional activity ( and
Supplementary Fig. 1). Highly transcribed genes (ubiq and top10) show strong 5′ enrichment of H3K4me3, but inactive genes (serp and bottom10) show no enrichment.
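As a rough illustration of how such an aligned profile can be computed, the sketch below averages probe signals in fixed-width bins relative to each gene's TSS, taking strand into account. The data layout, the 100 bp bin width, and the function name `tss_profile` are assumptions made for this example; the actual analysis pipeline is described in the Methods.

```python
from collections import defaultdict


def tss_profile(probes, genes, flank=1000, bin_size=100):
    """Mean signal per bin relative to the TSS, pooled over all genes.

    probes: list of (chrom, position, log2_signal)
    genes:  list of (chrom, tss, strand), strand is '+' or '-'
    Returns {bin_start_offset: mean_signal}; negative offsets are upstream.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g_chrom, tss, strand in genes:
        for p_chrom, pos, signal in probes:
            if p_chrom != g_chrom:
                continue
            # Strand-aware distance from the TSS.
            offset = pos - tss if strand == "+" else tss - pos
            if -flank <= offset < flank:
                b = (offset // bin_size) * bin_size  # floor to the bin start
                sums[b] += signal
                counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Toy data, invented for illustration only.
genes = [("I", 5000, "+"), ("I", 9000, "-")]
probes = [("I", 5050, 2.0), ("I", 4950, 0.0), ("I", 8950, 4.0), ("I", 9050, 1.0)]
profile = tss_profile(probes, genes)
```

The nested loop is quadratic and only suitable for small examples; a genome-scale version would index probes by position first.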
In
C. elegans, many genes are trans-spliced at their 5′ ends to a 21 bp leader sequence
17. In these cases, the transcription start sites are not known because the 5′ end of the primary transcript is spliced off and degraded. In addition, some groups of genes are transcribed in operons, with trans-splicing separating transcripts from different genes. Spliced leader SL1 is found on genes adjacent to promoters, and SL2 generally occurs on downstream operon genes not adjacent to promoters. To investigate the relationship between H3K4me3 and the transcription start site, we separated genes into SL1 genes and those not annotated to contain SL1 or SL2. We find a peak of H3K4me3 200 bp downstream of the presumed TSS (the first annotated base) for non-SL1 annotated genes (). In contrast, the peak of H3K4me3 for SL1 genes occurs 50 bp upstream of the first annotated base. If the H3K4me3 peak lies a similar distance downstream of the true TSS in both classes, this peak position suggests that the transcription start site for SL1 genes is on average 250 bp upstream of the trans-splice site. Peaks of H3K4me3 should prove a useful guide for identifying promoters of SL1 and non-SL1 genes.
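The 250 bp estimate follows from simple arithmetic on the two peak positions, under the assumption that the H3K4me3 peak sits at the same distance from the true TSS in both gene classes. A minimal sketch (the function name is hypothetical):

```python
def inferred_tss_offset(peak_offset_sl1, peak_offset_non_sl1):
    """Estimated TSS position relative to the first annotated base of SL1 genes.

    Offsets are in bp; negative values are upstream. Assumes the H3K4me3 peak
    lies the same distance downstream of the true TSS in both gene classes.
    """
    return peak_offset_sl1 - peak_offset_non_sl1


# Peak is 50 bp upstream for SL1 genes and 200 bp downstream for non-SL1 genes.
offset = inferred_tss_offset(peak_offset_sl1=-50, peak_offset_non_sl1=200)
print(offset)  # -250, i.e. 250 bp upstream of the trans-splice site
```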
We next looked at the genome-wide distribution of H3K9me3. This modification is generally associated with repressed chromatin
1-3. In mammalian cells, H3K9me3 is enriched in repressed constitutive heterochromatin, DNA transposons, and other repetitive elements
18,19. Studies on small gene sets also detected H3K9me3 in the bodies of actively transcribed genes
20,21, but this does not appear to be a general property based on genome-wide studies
22,23. In
C. elegans chromatin, we find that H3K9me3 is highly enriched across inactive genes, covering promoters, transcribed regions, and 3′ regions (blue line in and
Supplementary Fig. 1). In contrast, active genes show very low H3K9me3 signals (red line in and
Supplementary Fig. 1). Regions with clustered inactive genes often displayed continuous H3K9me3 enrichment across and between genes ().
In yeast and mammalian chromatin, there is a well-documented association of H3K36me3 with transcribed regions
1-3. The Set2 histone methyltransferase that catalyzes this modification is associated with elongating RNA polymerase II, and the modification is made co-transcriptionally
24-26. There is evidence that one function of H3K36me3 in the gene body is to prevent aberrant transcription initiation
27,28. We find that
C. elegans genes also show high levels of H3K36me3 in gene bodies. The level of H3K36me3 is low at the 5′ end, increases to a plateau, and then decreases at the 3′ end (,
Supplementary Fig. 1).
We observed that H3K36me3 signals often showed discrete peaks and troughs in the gene bodies, with peaks correlating with exonic regions (). To explore whether this was a genome-wide phenomenon, we plotted H3K36me3 signals across aligned intron/exon and exon/intron boundaries. This showed a striking enrichment of H3K36me3 in exon regions compared to introns (,
Supplementary Fig. 2a;
Supplementary Fig. 3). In contrast, neither H3K4me3 nor H3K9me3 showed exon enrichments (,
Supplementary Fig. 2b,c;
Supplementary Fig. 3). H3K36me3 exon enrichment is not due to GC bias, as exon signals are higher than those of introns across the whole range of %GC (). We observed uniformly high H3K36me3 signals across exons of different lengths and lower signals across introns (;
Supplementary Fig. 3).
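The GC control described above amounts to comparing exon and intron signals within bins of similar GC content. The sketch below shows one way to do this; the 10% bin width, the input layout, and the function name `mean_signal_by_gc` are assumptions for illustration, and the values are invented.

```python
from collections import defaultdict
from statistics import mean


def mean_signal_by_gc(regions, bin_width=10):
    """Mean signal of exons and introns within each %GC bin.

    regions: list of (kind, gc_percent, signal), kind is 'exon' or 'intron'
    Returns {gc_bin_start: {'exon': mean, 'intron': mean}} for bins that
    contain both region types.
    """
    binned = defaultdict(lambda: defaultdict(list))
    for kind, gc, signal in regions:
        gc_bin = int(gc // bin_width) * bin_width
        binned[gc_bin][kind].append(signal)
    return {
        b: {k: mean(v) for k, v in kinds.items()}
        for b, kinds in sorted(binned.items())
        if "exon" in kinds and "intron" in kinds
    }


# Toy data, invented for illustration only.
regions = [
    ("exon", 32.0, 1.0), ("exon", 38.0, 2.0), ("intron", 35.0, 0.5),
    ("exon", 45.0, 2.0), ("intron", 41.0, 0.0), ("intron", 48.0, 1.0),
]
by_gc = mean_signal_by_gc(regions)
exon_higher_everywhere = all(v["exon"] > v["intron"] for v in by_gc.values())
```

If exon enrichment were driven by GC content alone, exon and intron means would converge within each bin; a consistent difference across bins argues against that explanation.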
We next asked whether H3K36me3 exon marking is dependent on transcription or is instead a constitutive feature of exons. We found that the highly expressed ubiq and top10 genes show a higher level of exon marking relative to all genes, whereas bottom10 and serp genes show low or no marking, respectively (,
Supplementary Fig. 2a). We conclude that exon marking is transcription associated.
Because chromatin marking of exonic sequence with H3K36me3 depends on transcription and transcribed exons are spliced into mature transcripts, we wondered whether marking was related to the process of splicing. If so, then chromatin at exons that are constitutively included in transcripts would be expected to have a higher level of H3K36me3 than chromatin at alternatively spliced exons. To address this possibility, we assembled a set of exon trios in which an alternative exon is flanked by two constitutive exons and compared H3K36me3 levels in the three exons; the alternative and constitutive exons have similar GC contents (). We also compared these trios to a control set of length-matched trios in which all three exons are constitutively included. We find that alternative exons have significantly reduced H3K36me3 exon signals relative to their constitutive neighbours and to the matched control exons (). In contrast, there is no difference between the sets of trios in levels of H3K4me3 or H3K9me3 (). The reduction in H3K36me3 signal in alternative exons indicates that exon marking is related to splicing.
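One way to test whether the middle exon of a trio carries less signal than its flanking exons is a paired comparison. The sketch below uses a sign-flip permutation test on the difference between the middle exon and the mean of its two neighbours. This test is chosen here purely for illustration; the text does not state which statistical test was used, and the function names and trio values are hypothetical.

```python
import random
from statistics import mean


def flank_mean(trios):
    """Mean signal of the two flanking exons for each (up, middle, down) trio."""
    return [(up + down) / 2 for up, _, down in trios]


def paired_sign_flip_test(middle, flanks, n_perm=2000, seed=0):
    """One-sided paired permutation test that middle < flanks.

    Returns (mean_difference, p_value). Under the null hypothesis the sign of
    each paired difference is arbitrary, so signs are flipped at random.
    """
    diffs = [m - f for m, f in zip(middle, flanks)]
    observed = mean(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        permuted = mean([d if rng.random() < 0.5 else -d for d in diffs])
        if permuted <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy (upstream, alternative, downstream) log2 signals, invented for illustration.
trios = [
    (2.0, 1.0, 2.0), (3.0, 1.5, 2.0), (2.5, 1.0, 2.5), (1.5, 0.5, 2.0),
    (2.0, 1.5, 2.5), (3.0, 2.0, 3.0), (2.5, 1.5, 2.0), (2.0, 0.5, 1.5),
]
alt = [middle for _, middle, _ in trios]
observed, p_value = paired_sign_flip_test(alt, flank_mean(trios))
```

Running the same comparison on the all-constitutive control trios would show whether any reduction is specific to alternative exons rather than to the middle position of a trio.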
Although profiles of H3K36me3 have been extensively mapped in other organisms, exon marking has not been observed before. To ask whether this phenomenon is specific to
C. elegans or alternatively might be widespread, we analysed genome-wide data for mapping of H3K36me3 in mouse and human chromatin
22,23. These mapping data were generated by massively parallel sequencing rather than microarrays, providing a control for platform-specific effects. Similar to the
C. elegans data, we find a strong enrichment of H3K36me3 in both mouse and human exons relative to introns (). In contrast, we find essentially level signals for H3K4me3 (). As a further control we examined H3K27me1, found across active gene bodies like H3K36me3
22, and found similar signals in exons and introns (). As in
C. elegans, H3K36me3 exon enrichment is not due to GC bias (). The H3K36me3 signal in long exons increases to a plateau, similar to the pattern in
C. elegans exons (
Supplementary Fig. 3). Across shorter exons more typical of human genes, H3K36me3 signal increases from 5′ to 3′ ends, resulting in an apparent peak near the 5′ splice site of the next intron (). The lower H3K36me3 signals in introns increase near both the 5′ and 3′ splice sites ().
The above analysis demonstrated that H3K36me3 exon marking is conserved in human and mouse. To explore whether marking in mammalian chromatin is likely to be related to splicing as it is in
C. elegans, we used the mouse data
23 to ask whether alternative exons show reduced H3K36me3 signals relative to constitutive exons. Indeed, we found that mouse alternative exons have significantly lower H3K36me3 signals but no difference in levels of H3K4me3 (). The GC contents of the alternative exons are also similar to those of the constitutive exons (). We conclude that H3K36me3 marking of expressed exons is conserved.
What could be the function of H3K36me3 exon marking? Because constitutively expressed exons have higher marking than alternatively included ones, marking appears to be related to cis-splicing. There is increasing evidence that a significant amount of splicing occurs co-transcriptionally rather than post-transcriptionally, making interactions between chromatin and the splicing machinery plausible
29,30. Indeed, although to our knowledge marking of exons in chromatin has not been observed previously, there are recent reports of chromatin factors having roles in splicing. For example, the H3K4me3-binding protein CHD1 is associated with the spliceosome and required for high splicing efficiency
31. In addition, splicing factors have been reported to associate with both chromatin and the RNA polymerase II complex
29,30. An attractive possibility is that marked exons in chromatin provide a mechanism to facilitate efficient splicing. For example, marked exons might aid recruitment of splicing factors to chromatin.
A second possibility is that the splicing machinery could directly or indirectly regulate H3K36 methyltransferases on the travelling RNA polymerase complex. If so, the composition of the travelling RNA polymerase complex might differ in exonic and intronic regions. For example, engagement in splicing reactions might reduce binding of splicing factors to Pol II. If these factors compete with or inhibit the H3K36me3 methyltransferase, this could result in regional differences in H3K36me3 on chromatin.
It is also known that the rate of RNA polymerase progression can vary over the gene and that changes in processivity can affect inclusion of alternative exons
32,33. It would be interesting to investigate whether H3K36me3 affects processivity, which in turn could affect splicing. H3K36me3 is known to prevent spurious transcription initiation
27,28, so it could have a general inhibitory influence on Pol II complex activity.