|Home | About | Journals | Submit | Contact Us | Français|
While much attention has been focused on chromatin at promoters and exons, human genes are mostly composed of intronic sequences. Analyzing published surveys of nucleosomes and 41 chromatin marks in humans, we identified histone modifications specifically associated with 5′ intronic sequences, distinguishable from promoter marks and bulk nucleosomes. These intronic marks were spatially reciprocal to H3K36me3, typically transitioning near internal exons. Several marks transitioned near bona fide exons, but not near nucleosomes at exon-like sequences. Thus, we interrogated splicing for a role in histone marking. Despite dramatic changes in regulated alternative splicing, histone marks were stable. Notably, these findings are consistent with a role for exon definition in influencing histone marks. In summary, we demonstrate that the location of many intragenic marks in humans can be distilled into a simple organizing principle: association with 5′ intronic or 3′ exonic regions.
Nucleosomes, the basic units of eukaryotic chromatin structure, are marked by numerous post-translational modifications and compositional variants that influence DNA and RNA metabolism1-3. For example, transcription by RNA polymerase II is associated with particular modifications and variants at promoters, including histone H3 lysine 4 methylations, numerous acetylations, H2A.Z, and H3.3 (refs. 4,5). Of these, H3K4me3 acts as a binding determinant for the TAF1 subunit of TFIID, mechanistically linking this modification to transcription initiation6.
By analogy to chromatin marks at promoters, marks positioned within genes physically coincide with various co-transcriptional processes and may influence or be influenced by RNA processing7,8. For example, depletion of human BRE1A/RNF20, which is required to catalyze H2B lysine 120 ubiquitylation (H2Bub), results in decreased chromatin association of the transcription elongation and splicing factor SKIP/SNW1 (ref. 9). Similarly, H3K36me3 is catalyzed by a methyltransferase associated with elongating, hyperphosphorylated RNA polymerase II10 and depletion of the human enzyme, SETD2/HYPB/KMT3A, leads to a defect in mRNA export and changes in alternative splicing11,12.
Most provocatively, bioinformatic analyses of data from diverse animals indicate that certain histone modifications, particularly H3K36me3, are not only elevated within genes, but specifically at exons13-19. It has been suggested, though, that H3K36me3 levels simply reflect higher nucleosome occupancy at exons in Metazoa17,19. In any case, how the bulk of chromatin within genes compares to the one or few nucleosomes at each exon has not been fully elucidated.
To explore more broadly the relationships between chromatin and gene architecture, we began by analyzing published human epigenomic data. We detected regions located after active transcription start sites and extending through 5′ introns of variable length and the first few internal exons, enriched in H2Bub, H3K23ac, and eight different methylations including H3K79me2/3. Regions in the 3′ portions of active genes displayed H3K36me3 peaking at internal exons with elevated levels extending into each downstream intron. Intriguingly, regulated alternative splicing failed to change these histone modification profiles.
To discover new epigenomic features related to gene expression, we analyzed published sequencing surveys of human mononucleosomes prepared by native micrococcal nuclease digestion (MNase-seq), mononucleosome ChIP (ChIP-seq) for 20 histone methylations, 18 histone acetylations, and the histone variants H2A.Z and H3.3, as well as cross-linked sonicated ChIP-seq of H2B lysine 120 ubiquitylation20-24. We used principal component analysis (PCA) to reduce the complexity of this large data set, producing a smaller number of dimensions, or principal components, that retain much of the variance in the original data set (see Methods). Thus, we aimed to identify subsets of the 41 chromatin marks that occurred at similar regions of the genome.
To identify relationships specific to genes, we applied PCA to data from 10 kb upstream to 10 kb downstream of transcription start sites (TSSs) (Fig. 1 and Supplementary Fig. 1). The first principal component corresponded to the 17 marks previously characterized as a “backbone” at active promoters23 (H2A.Z, H2BK5ac, H2BK12ac, H2BK20ac, H2BK120ac, H3K4ac, H3K4me1, H3K4me2, H3K4me3, H3K9ac, H3K9me1, H3K18ac, H3K27ac, H3K36ac, H4K5ac, H4K8ac, and H4K91ac). We also identified four more acetylations at active promoters (H2AK9ac, H3K23ac, H4K12ac, and H4K16ac). The second principal component included nucleosomes and marks that were generally depleted near active promoters, but had elevated occupancy within genes. The third principal component included high levels of H2BK5me1, H2Bub, H3K4me1, H3K4me2, H3K9me1, H3K23ac, H3K79me1, H3K79me2, H3K79me3, and H4K20me1.
Intriguingly, we noticed that several marks, particularly those enriched in the third principal component, were enriched in the 5′ regions of actives genes. This was reminiscent of their spatial distribution in yeast10, but in humans these marks extended downstream into 5′ introns of varying length, decreasing in the vicinity of exons (Fig. 2a). To assess if this were a genome-wide phenomenon, we aligned thousands of genes by TSSs and sorted them by distances from the TSSs to the first 3′ splice sites, plotting each gene as a row with sequencing read density as a heatmap. Thus, this perspective simultaneously displayed three different aspects of gene architecture: TSS positions, first introns, and first internal exons. In this view, principal component three, and to a lesser extent component two, displayed a 5′ bias that decreased near internal exons, with enrichment more apparent at genes with higher expression (Supplementary Fig. 1).
Analyzing individual chromatin marks in this way revealed specific relationships with features of gene architecture (Fig. 2b and Supplementary Fig. 2). As expected, promoter marks high in principal component one, such as H3K4me3, surrounded TSSs. Nucleosomes, high in component two, appeared slightly higher near TSSs and internal exons than within neighboring introns13,16-19. H3K79me2, representative of the 10 marks high in component three, showed enrichment that extended downstream of TSSs into first introns until reaching internal exons. In contrast, H3K36me3 exhibited a spatially reciprocal relationship with H3K79me2, clearly peaking at internal exons. As a control, H3K27me1 was present within active genes20, but exhibited little if any change with respect to exons (Supplementary Fig. 2).
Interpreting the distribution of histone modifications from MNase- and ChIP-seq data is complicated by potential technical artifacts25-27 and nucleosome occupancy biases13,16-19,27. Therefore, we independently assessed intronic and exonic marks by quantitative PCR (ChIP-qPCR) at the STIM1 gene, as it has internal exons separated by large 5′ introns (Supplementary Fig. 3). Validating the ChIP-seq profiles, we observed enrichment for H3K79me2 in the first three long introns and H3K36me3 near exons. H3 displayed some heterogeneity along STIM1, which may reflect differential nucleosome occupancy or positioning, especially at exons13,16-19. However, H3K36me3 near the exons of STIM1 remained enriched after normalization to H3, although reduced in dynamic range.
Across the genome, normalizing ChIP-seq data by MNase-seq data largely removed exon-centered enrichment for most marks (e.g. H3K79me2), but H3K36me3 remained enriched after normalization, peaking near exons and extending on average a few kilobases downstream (Supplementary Fig. 4). Also, performing PCA and heatmap analyses after normalizing each ChIP-seq dataset by MNase-seq (Supplementary Fig. 4) supported similar conclusions as the analyses without normalization above (Figs. 1 and and22).
We considered the possibility that nucleosome occupancy at internal exons could function as the signal delineating intronic and exonic regions. Bona fide internal exons are annotations of spliced RNAs, minimally including 3′ and 5′ splice sites with additional sequence constraints to encode proteins and include splicing enhancer and silencer sites. Spies et al. identified intronic exon-like sequence composition regions (ECRs) that are not spliced, but which have high nucleosome occupancy like that at bona fide exons18.
We compared ECRs to bona fide exons to determine if the general sequence composition sufficient to direct high nucleosome occupancy18 were also sufficient to pattern histone modifications. To control for biases in number and position of ECRs relative to all exons, we compared each intronic ECR (12,687 total) to its nearest bona fide exon. Similar to genome-wide averages of H3K36me3 at exons (Supplementary Fig. 4), H3K36me3 peaked at ECR-matched exons (Supplementary Fig. 5). However, H3K36me3 did not peak at ECR positions. On average, marks in the third principal component were higher upstream of exons than downstream and a similar pattern was found near ECRs, although the marks variably displayed decreased spatial response to ECRs compared to exons (e.g. H3K4me1, H3K4me2, H3K9me1, and H3K23ac). As a control for comparison, H3K27me1 showed little difference up- and downstream of exons or ECRs (Supplementary Fig. 5).
Finding that nucleosome occupancy could not fully account for the spatial dynamics of histone modifications that change near exons, we considered a model in which RNA splicing events are mechanistically linked to co-transcriptional histone modification, thus coupling exon positions to chromatin states. Splicing is a complex, multistep process that involves recognition of splice sites and progression of spliceosome assembly, ultimately resulting in intron excision28. Both splice site recognition and spliceosome assembly can be regulated to produce alternative RNA splicing outcomes29. To test if RNA splicing itself could direct histone modification profiles, we examined two cases of alternative splicing in which the frequency of exon inclusion was experimentally manipulable without altering the DNA/RNA sequence.
In our first example, YPEL5 exon 2 inclusion increases when cells are treated with caffeine30. To assess both YPEL5 expression and alternative splicing, we measured total mRNA by qPCR and the ratio of mRNA isoforms by RT-PCR in various concentrations of caffeine. After 7 hours exposure, mRNA levels were similar between 6 mM and 18 mM caffeine (Supplementary Fig. 6), while exon 2 inclusion increased from 1% inclusion to 66% inclusion (Fig. 3a and Supplementary Fig. 6). We verified that our ChIP-qPCR at YPEL5 spatially recapitulated the profiles of H3K79me2, H3K36me3, and nucleosomes in the ChIP-seq and MNase-seq data (Supplementary Fig. 6). Notably, H3K79me2 and H3K36me3 profiles were similar at the low and high caffeine concentrations (Fig. 3b). Therefore, the dramatic change in splicing of YPEL5 RNA was not sufficient to direct changes in these histone modifications.
As with many alternative splicing events, little is known about the mechanism of YPEL5 exon 2 inclusion. Moreover, we were constrained to short treatment periods in the experiments, as high levels of caffeine are toxic. Hence, we sought a second example of regulated alternative splicing, in which changes in alternative splicing could be maintained over many cell divisions. In lymphocytes, the RNA-binding protein hnRNPLL binds to CD45 pre-mRNA to promote cell-type specific exon-skipping of multiple CD45 exons31-33. Although B cells typically express long isoforms of CD45, the B cell lymphoma lines BJAB and BL41 express different levels of hnRNPLL, and consequently, maintain different CD45 isoforms31.
We obtained a set of matched BJAB and BL41 cell lines in which hnRNPLL levels are stably decreased by short hairpin RNA interference or increased by hnRNPLL over-expression, respectively31. BJAB and BL41 cells incidentally expressed 100-fold different levels of CD45 mRNA, so we compared matched lines to control for variable CD45 expression (Fig. 4a). RT-PCR analysis to assay variable inclusion of exons 4-6 confirmed that depletion of hnRNPLL in BJAB cells increased inclusion, whereas overexpression of hnRNPLL in BL41 cells decreased inclusion (Fig. 4b). H3K36me3, H3K79me2, and nucleosome levels at CD45 measured by ChIP-qPCR were generally similar to published ChIP-seq and MNase-seq data in T cells (Supplementary Fig. 7). If RNA splicing were sufficient to change histone modifications, increased exon inclusion at exons 4-6 should decrease H3K79me2 and/or increase H3K36me3 levels. However, we observed no significant differences in histone modifications between matched pairs of BJAB or BL41 cells expressing different CD45 isoforms (Fig. 4c,d).
We used PCA to help identify groups of histone marks (Fig. 1) that displayed distinct profiles within active genes (Supplementary Fig. 1). Among these, we identified a group of 10 marks biased for 5′ introns and roughly reciprocal with H3K36me3 near exons (Fig. 2 and Supplementary Fig. 2). Sequence-related nucleosome occupancy alone was not sufficient to explain transitions between intronic and exonic histone modification regions (Supplementary Fig. 5). The genome-wide phenomenon of intronic and exonic marks suggests that a general mechanism associated with splicing influences histone marking, as speculated previously15. Thus, we tested if levels of RNA splicing influenced the profiles of these histone modifications. Neither inducible changes in exon inclusion in YPEL5 after several hours (Fig. 3), nor changes in inclusion of multiple exons in CD45 over numerous cell divisions (Fig. 4) resulted in significant differences in H3K79me2 or H3K36me3. This contrasts starkly with transcriptional induction, in which H3K79me2 and H3K36me3 levels can increase within one hour34.
Our results rule out a simple correlation between the extent of RNA splicing and histone modification levels. Intriguingly, regulated skipping of CD45 exon 4 is achieved by stalling spliceosome assembly at an early step following exon definition35; therefore, spliceosome assembly may be sufficient to pattern histone modifications at CD45 exon 4, regardless of splicing outcomes. As it is now recognized that promoter marks are relatively stable36,37 and may reflect transcription initiation38 rather than productive transcription, we suggest by analogy that intronic and exonic marks are relatively stable and may reflect an aspect of the splicing process, such as exon definition, rather than productive splicing per se.
If a feature related to splicing were required to influence histone modifications, we would expect a decrease in 5′ intron marks and an increase in H3K36me3 specifically at bona fide exons, but not at ECRs, which have similar sequence composition, but are not spliced. Indeed, ECRs do not direct increased levels of H3K36me3 as at exons or decreased levels of certain 5′ intronic marks (i.e. H3K4me1, H3K4me2, H3K9me1, H3K23ac) (Supplementary Fig. 5). In contrast, some 5′ intronic marks (e.g. H3 lysine 79 methylations) have nearly equivalent decreases near exons and ECRs, suggesting that the local sequence composition of exons (and ECRs) may be sufficient to influence these histone modification profiles. Thus, cis-acting sequences, functioning as DNA or RNA, could signal to change histone modifications. For example, SR protein binding sites at exons could affect chromatin marking through “reverse” coupling8. Additionally, elevated nucleosome occupancy at exons13,16-19 may play a role, independently or in conjunction with cis-acting sequences. Extensive mutagenesis will be essential to distinguish between these possibilities and identify necessary sequences.
How distinct histone modification patterns are determined remains largely unknown. In agreement with Spies et al.18, we find that histone modification patterns do not merely reflect nucleosome enrichment at exons, indicating the need for additional steps of regulation beyond nucleosome occupancy such as chromatin modifying enzyme action. Spatial similarities among the intronic marks could reflect cross-regulation between histone modifications39. For example, the extensive overlap between H2Bub and H3K79me2/3 (Supplementary Fig. 2) is consistent with direct stimulation by H2Bub of H3 lysine 79 methyltransferase activity by DOT1L and a DOT1L-containing protein complex40,41. Also, H2Bub can directly stimulate human SET1-mediated methylation of H3 lysine 4 in vitro, so H2Bub within 5′ introns may help to increase H3K4me1 and H3K4me2 (refs. 42,43).
We propose that promoter, intronic, and exonic intragenic histone modification regions constitute three distinct chromatin zones at active human genes4,23, analogous to those found in yeast10,44: i) the “backbone” or promoter zone contains H3K4me3, ii) a 5′ zone includes H3K4me2 (and 9 other marks in humans, including H3K79me2), and iii) a 3′ zone contains H3K36me3 (Fig. 2a). However, most yeast genes are short and intronless, in contrast to human genes, which frequently contain large first introns. Also, we found occasionally that these different zones may interdigitate, e.g. there was switching back and forth between H3K79me2 and H3K36me3 within STIM1 (Supplementary Fig. 3). This implies that these zones are not strictly 5′ and 3′ with respect to genic positioning, but rather that introns and exons act to organize intragenic histone modifications.
Intronic and exonic histone modifications can, in principle, affect many aspects of gene expression7. For example, H3K36me3-modified nucleosomes could directly influence splicing12. Furthermore, H3K36me3 levels on average are slightly lower at alternatively spliced exons compared to nearby constitutive exons14,15,17. An open question for future studies remains: how are these marks established and maintained at particular introns and exons within the same transcribed unit? A working knowledge of such localization mechanisms will be paramount to understanding temporal and spatial regulation of chromatin marks in humans.
Analyses were performed and plots were generated with custom R scripts (http://www.r-project.org). Human gene models were obtained from the UCSC genome browser RefSeq (hg18) annotations (http://genome.ucsc.edu/). We offset sequencing read end-positions by 73 bases (half the width of the canonical nucleosome) to approximate the position of nucleosomal centers. MNase- and ChIP-seq reads were aggregated in non-overlapping 1-kb windows from 10 kb upstream to 10 kb downstream of TSSs and data across these genomic segments were subjected to PCA. PCA was performed by singular value decomposition with the data centered and scaled using the R function prcomp(). Figure 1 was rendered in PyMOL (http://www.pymol.org/). For all plots only one gene model was shown for genes with multiple models. For Figure 2a and Supplementary Figures 6 and 7, a Gaussian kernel density estimator was used with both the kernel width and bin size set to 146 bp (size of one canonical nucleosome). The kernel density scale was adjusted so that uniform density would be equal to 1 at all positions. We did not precisely map the positions of individual nucleosomes, because our analysis was generally on the scale of tens to hundreds of nucleosomes. However, we verified that the plots were robust to changes in nucleosome phasing and average width in the kernel density estimator.
SW620 human colon cancer cells (ATCC CCL-227) were grown in DMEM supplemented with 10% FBS (v/v), Pen-Strep, and Normocin (Amaxa). They were grown to near confluence for the experiments in Supplementary Figure 3 and were seeded at a density of 1 × 105 cells per cm2 24 hours prior to treatment with media supplemented with the indicated concentration of caffeine for 7 hours in Figure 3. BJAB and BL41 B cell lines31 were grown in RPMI-1640 supplemented with 10% FBS (v/v), glutamine, Pen-Strep, and Normocin (and 5 μg ml-1 puromycin for the BJAB lines) to a density of about 1 × 106 cells ml-1. The earliest passages available for the B cell experiments were used, because the effect on CD45 alternative splicing decreased over continual passaging.
ChIP was performed as described34 with the following modifications. Nuclei from 1 × 107 crosslinked cells were digested with 100 units of MNase (Worthington) in 500 μl of Buffer N supplemented with 3 mM CaCl2 for 10 min at 37°C. We previously determined the concentration of MNase to be optimal for digestion to mostly mononucleosomes by agarose gel electrophoresis of the cross-link reversed input chromatin. Chromatin extract equivalent to 2 × 106 cells in 500 μl of RIPA was incubated with 4 μg of one of the following antibodies for 2 hours at 4°C: H3K79me2, H3K36me3, or H3 (Abcam ab3594, ab9050, or ab1791). Instead of Protein A/G agarose beads, 50 μl of Protein G Dynabeads (Invitrogen) were added to each tube and incubated overnight at 4°C. After the chromatin elution step, DNA was purified with QIAquick PCR columns (QIAGEN) and quantified by qPCR as described below.
To analyze ChIP-qPCR data, we used MNase-digested input material as a quantitative standard, thereby measuring the efficiency of immunoprecipitation, and internally controlling for digestion biases. In addition, we normalized H3K79me2 and H3K36me3 ChIP to an H3 ChIP to control for differences in nucleosome occupancy (Supplementary Fig. 3). It should be noted that ChIP-qPCR is subject to particular caveats, most notably the dependence on fixed primer positions. As such, MNase digestion of DNA around strongly positioned nucleosomes can negatively affect the amplification efficiency of partially overlapping amplicons. Similarly, histone modifications or variants may differentially influence the extent of MNase digestion and, consequently, bias amplification efficiency. We attempted to minimize these effects by tiling genes with primer sets designed to amplify small segments of DNA (62 bp median amplicon size, Supplementary Table 1).
PCR primers (Supplementary Table 1) were designed with BatchPrimer3 v1.0 (http://wheat.pw.usda.gov/demos/BatchPrimer3/) from human repeat-masked sequence obtained from the UCSC genome browser (hg18). Serial dilutions of DNA ranging from 50% to 0.4% of the input chromatin were used to obtain calibration curves, measure primer efficiencies, and ensure that quantification was in a linear dynamic range. Primer sets yielding multiple amplification products or calibration curves with R-squared values of < 0.96 were excluded. For each qPCR sample, 0.5 μl of 50 μl eluted ChIP DNA was amplified in 25 μl volume reactions containing 250 μM dNTPs, 1 × (NH4)2SO4 buffer (Fermentas), 0.5 μM primer, 1.5 mM MgCl2, 1.25 units Dynazyme II (Finnzymes), and Sybr Green I fluorescent dye (Sigma). Fluorescence was measured on a BioRad CFX-96 machine using standard cycling conditions (3 min at 95°C, 40 cycles of 15 s at 95°C, 30 s at 55°C, and 15 s at 72°C). Each ChIP value was calculated as the median of 2-4 technical qPCR replicates. For each biological replicate consisting of qPCR replicates, median ChIP values for H3K79me2 and H3K36me3 were then divided by median H3 ChIP values.
We thank the following for kindly providing reagents and data: Shalini Oberdoerffer and Anjana Rao (Harvard Medical School) for B cell lines, Noah Spies and Chris Burge (MIT) for ECR locations, and Efrat Shema and Moshe Oren (Weizmann Institute of Science) for H2Bub ChIP-seq data. We also thank Hiten Madhani and Joan Steitz for critical reading of the manuscript and the Guthrie, Yamamoto, and Panning groups for helpful discussions. J.T.H. and A.M.P. were supported by individual ARCS Foundation Scholarships. Research support was provided by NIH grants GM21119 to C.G. and CA020535 to K.R.Y. C.G. is an American Cancer Society Research Professor of Molecular Genetics and K.R.Y. is a consultant with Merck & Co., Inc.
Author Contributions: J.T.H. and A.M.P. designed and performed the analyses and experiments. J.T.H., A.M.P., C.G., and K.R.Y. wrote the manuscript.