Previous studies have shown that underlying DNA sequences are important determinants of nucleosome occupancy [14]. For example, the in vitro binding of nucleosomes to naked genomic DNA from different species is dictated in large part by the DNA sequence composition [15]. By collecting nucleosome-bound DNA sequences and center-aligning them, common underlying features of nucleosome-favoring sequences could be found and modeled based on thermodynamics for future predictions of nucleosome formation [14]. In another approach, a support vector machine was employed to build nucleosome prediction models based on different human cell lines [16].
Although promoter sequences have been extensively explored with respect to nucleosome patterns, the mechanism by which CGI sequences affect nucleosome assembly has never been studied. One may postulate that the unique sequence features of CGIs (for example, their aberrantly high CpG density) prevent nucleosome assembly, considering the active chromatin structure of CGIs in vivo.
As expected, the in vivo nucleosome occupancy within the CGI is remarkably low compared to that in the flanking regions (Figure ). Open chromatin can be identified by DNase I hypersensitivity experiments. I used the whole-genome data of DNase I hypersensitivity sites [18] to assess their enrichment in CGIs (see Materials and methods). The fraction of the human genome that harbors these sites was compared with the fraction of CGIs that overlap these sites, producing an odds ratio of 14. In other words, the odds of finding open chromatin are about 14-fold higher in CGIs than in other genomic regions.
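The odds ratio described above can be sketched as follows. This is a minimal illustration that assumes the two quantities being compared are simple fractions (the fraction of CGIs overlapping hypersensitive sites and the fraction of the genome harboring them); the function names and the example fractions are hypothetical and are not taken from the study.

```python
def odds(fraction: float) -> float:
    """Convert a fraction (0 < fraction < 1) into odds."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return fraction / (1.0 - fraction)


def odds_ratio(frac_cgi: float, frac_genome: float) -> float:
    """Odds of open chromatin in CGIs relative to the genome-wide odds."""
    return odds(frac_cgi) / odds(frac_genome)


# Hypothetical fractions, for illustration only (not values from the study).
example = odds_ratio(frac_cgi=0.5, frac_genome=0.2)
print(f"odds ratio = {example:.2f}")
```

An odds ratio compares odds rather than probabilities, so it approximates a fold difference in probability only when both fractions are small.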
Figure 1 Nucleosome organization of promoter CGIs. (a-c) Nucleosome patterns upstream, inside and downstream of the CGI (from left to right) based on (a) in vivo nucleosome occupancy for human T cells measured as normalized read count (NRC; see Materials and ...
To assess whether the nucleosome depletion of CGIs is derived from sequence preferences, I utilized the two independent nucleosome prediction datasets mentioned above [15]. The portions of the prediction data for CGIs were collected to show that strong nucleosome-favoring features were encoded in the DNA sequences of CGIs (Figure ; Additional file 1). This finding is confirmed by the high DNA bendability of CGI sequences, which is required for sharp DNA bending around histone complexes [19] (Figure ). The measurement of DNA bending was based on structural parameters that characterize the bending propensity of trinucleotides, as deduced from DNase I digestion data [20].
One factor that can explain this pattern is homopolymeric dA:dT tracts. As important elements in eukaryotic promoters, these tracts are known to act as an intrinsic nucleosome destabilizer [21]. Thus, they can be used as a strong indicator of a nucleosome-free state in sequence-based nucleosome prediction models [23]. The sequences of CGIs typically lack these elements. A high CG density cannot be maintained in AT-rich sequences. This phenomenon might explain, in part, the nucleosome-favoring signals encoded in CGI sequences.
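The two sequence features discussed here, CpG density and homopolymeric dA:dT tracts, can be computed with a short sketch such as the one below. The minimum tract length of 5 bp and the function names are assumptions made for illustration; they are not parameters reported in this study or in the cited prediction models.

```python
import re


def cpg_density(seq: str) -> float:
    """Fraction of dinucleotide positions in seq that are CpG."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    cpg_count = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cpg_count / (len(seq) - 1)


def poly_at_tracts(seq: str, min_len: int = 5) -> list:
    """Return (start, end) spans of poly(dA) or poly(dT) runs of at least min_len bp.

    min_len = 5 is a hypothetical threshold chosen for this example.
    """
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


# A CpG-rich toy sequence contains no tracts; an AT-rich one contains two.
print(cpg_density("CGCGCG"), poly_at_tracts("CGCGCG"))
print(poly_at_tracts("GGAAAAAACCTTTTTGG"))
```

A run of A on one strand corresponds to a run of T on the other, so scanning a single strand for both bases covers the dA:dT tracts on either strand.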
Reflecting this reciprocal tendency of in vivo and predicted nucleosome occupancy, promoters with a CGI tended to maintain an NFR in vivo (Figure ) despite strong sequence tendencies toward nucleosome deposition (Figure ). Conversely, CGI-lacking promoters exhibited high nucleosome occupancy at the +1 nucleosome location (Figure ), which seemed to be programmed by nucleosome sequence preferences (Figure ).
The conflicting results obtained from the sequence features and in vivo measurements were also demonstrated in the context of DNA methylation. CGIs are typically unmethylated [25], notwithstanding the many target CpGs in them. It is likely that trans-acting regulators are actively recruited to promoter CGIs to maintain this region in a nucleosome- and methylation-free state, overcoming the sequence preferences for high methylation and nucleosome packaging. Accordingly, CGIs showed increased nucleosome occupancy when methylated (orange curve in Figure ).
A model of cis-programmed nucleosome positioning has been established for yeast promoters [15]. In the human genome, however, DNA sequences completely fail to predict the presence of promoter NFRs, which is the most distinguishing property of nucleosome positions in vivo. This seems to be due to the unexpected feature of CGIs, which is a conflict between the actions of cis-elements in the context of chromatin organization.
CGIs often extend into downstream transcript regions. This provides an explanation for the observation that the exon at the 5' end of the transcript, flanked by the transcription start site, shows a remarkably higher CpG density than the downstream exons (Additional file 2). Given the distinctive chromatin state of CGIs, this might influence exonic nucleosome occupancy and CpG methylation depending on exon location.
An investigation of the DNA methylation and nucleosome occupancy of exons reveals several novel findings (Figure ). First, nucleosome occupancy and CpG methylation are enriched in exons relative to introns. Second, non-coding exons (NCEs) show markedly lower enrichment than coding exons, including initial coding exons (ICEs), internal exons, and last coding exons (LCEs). Third, a significant difference is detected between the 5' end ICEs and internal ICEs. Fourth, even though flanking each other within the LCE or ICE, the UTR and the coding region show differential levels of nucleosomes and methylation.
Figure 2 Exonic DNA methylation and nucleosome occupancy. (a) Nucleosome occupancy (upper panel) and CpG methylation (lower panel) plotted as the average of all transcripts across non-coding exons (NCEs), coding exons, and flanking introns according to their relative ...
The exonic enrichment of nucleosomes has been reported in recent studies [12]. A similar finding has also been reported for H3K36me3 [10]. Indeed, H3K36me3 showed a pattern similar to that observed for nucleosomes (Additional file 3). The exon enrichment of DNA methylation has also been recently reported [27]. A novel observation here is that these marks are differentially distributed among exons with different positions and functions, in a manner that is consistent with a role in RNA splicing.
For example, the 5'-end ICEs do not display high enrichment because, as starting exons that carry only a splice donor, they do not require mechanisms for exon inclusion. On the other hand, the functional importance of coding exons might restrict the loss of the marks that ensure exon inclusion into mature transcripts. The maintenance of these marks in coding exons might be assisted by DNA sequence conservation, as indicated by the observation that coding sequences in the ICEs and LCEs show higher enrichment than their flanking UTRs. Compared to 5' UTRs, 3' UTRs are located farther from splice acceptors, decreasing the need for these epigenetic mechanisms.
This is the first study to suggest a role for intragenic DNA methylation in RNA splicing. Using the same nucleosome dataset employed herein [5], a previous study reported an association between high nucleosome occupancy and exons with weak splice sites [13]. Based on the same data for exon strength, I discovered that CpG methylation was also enriched in weak exons (Additional file 4).
Overlapping CGIs on the 5'-end exons seemed to be coupled with a lower level of DNA methylation and nucleosome occupancy (Additional file 2). However, internal NCEs were not affected by CGIs (Additional file 2) but still demonstrated a low level of nucleosome occupancy and CpG methylation similar to introns (Figure ). Therefore, it is not likely that the differential enrichment between internal NCEs and internal ICEs results from the CGI effects.
As the methylation data used here were generated based on the affinity of methylation-binding proteins, it is possible that high CpG density on exons results in the exon enrichment of DNA methylation. To resolve this confounding effect, I normalized the methylation levels by dividing them by CpG density. It seems that CpG density does not affect the DNA methylation patterns (Additional file 5). Another approach to measuring DNA methylation is based on bisulfite treatment, which provides methylation measures on single CpG sites. One such dataset for H1 human embryonic stem cells and IMR90 lung fibroblasts [28] was used and found to reproduce a similar pattern of exon enrichment (Additional file 6).
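The CpG-density normalization mentioned above can be sketched as a per-region division. This is only an illustration under the assumption that both quantities are available per region as parallel lists; the function name and the handling of regions without CpGs are hypothetical choices, not details given in the text.

```python
def normalize_by_cpg_density(methylation, cpg_density):
    """Divide each methylation level by the CpG density of the same region.

    Regions with zero CpG density have no defined normalized value and are
    returned as None (an assumption made for this sketch).
    """
    if len(methylation) != len(cpg_density):
        raise ValueError("inputs must have the same length")
    return [
        level / density if density > 0 else None
        for level, density in zip(methylation, cpg_density)
    ]


# Hypothetical values: two regions with different raw signals but the same
# methylation per unit of CpG density, and one region without CpGs.
print(normalize_by_cpg_density([2.0, 3.0, 1.0], [0.5, 0.0, 0.25]))
```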
To further test the role of CpG methylation in RNA splicing, I employed RNA-seq data, which can provide the relative expression of each internal exon compared to the other exons present in the transcript. This measure indicates the inclusiveness of the RNA splicing process for a given exon and is thus termed exon inclusiveness. The exons with the lowest 10% of exon inclusiveness (less than about -1) were considered spliced out, and the others spliced in. To evaluate sequencing bias, the exons with the top 10% of exon inclusiveness (greater than about 1) were identified as highly expressed (see Materials and methods). The distribution of exon inclusiveness is presented in Figure .
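The classification by exon inclusiveness can be sketched as follows. The exact definition of the measure is given in Materials and methods and is not reproduced here; this sketch assumes, purely for illustration, a log2 ratio of an exon's read density to the mean density of the other exons in the same transcript, with a hypothetical pseudocount. The rank-based 10% cutoffs and all function names are likewise assumptions.

```python
import math


def exon_inclusiveness(densities, pseudocount=1.0):
    """Assumed measure: log2 of each exon's density over the mean of the others."""
    n = len(densities)
    if n < 2:
        raise ValueError("need at least two exons per transcript")
    total = sum(densities)
    scores = []
    for d in densities:
        others_mean = (total - d) / (n - 1)
        scores.append(math.log2((d + pseudocount) / (others_mean + pseudocount)))
    return scores


def classify_exons(scores, tail_fraction=0.10):
    """Split exons into skipped (lowest tail) and included (the rest);
    highly expressed exons (top tail) are a subset of the included ones."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two exons")
    k = max(1, int(n * tail_fraction))
    order = sorted(range(n), key=lambda i: scores[i])
    return {
        "skipped": sorted(order[:k]),
        "included": sorted(order[k:]),
        "highly_expressed": sorted(order[-k:]),
    }


# Hypothetical inclusiveness scores for ten internal exons.
toy_scores = [-2.0, -0.5, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 2.0]
print(classify_exons(toy_scores))
```

Keeping the highly expressed exons as a subset of the included ones mirrors the text, where they serve only as a control for sequencing bias.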
The comparison of nucleosome occupancy and CpG methylation among the above-defined skipped exons, included exons, and highly expressed exons (Figure ) reveals that the included exons indeed contain a higher level of epigenetic marks compared to the skipped exons. Moreover, the pattern was not caused by sequencing bias, given the minor differences between the included and highly expressed exons. This result is consistent with the finding that H3K36me3 is enriched on constitutive exons [10] and supports the hypothesis that these marks can facilitate exon inclusion.
In an effort to find why the three marks are associated with splicing regulation, I discovered that CpG methylation, nucleosome deposition, and H3K36me3 differentially marked the internal exons of genes possessing different expression levels (Figure ): H3K36me3 marked highly expressed genes as shown in a previous study [10], nucleosomes appeared among lowly expressed genes, and DNA methylation was linked with an intermediate level of gene expression. The elongation efficiency of pol II clarified this pattern (Figure ). Genes with a CGI in their promoter tended to be regulated by H3K36me3 rather than nucleosomes or CpG methylation, probably for efficient transcription elongation (see gray lines in Figure ).
Figure 3 Normalized nucleosome occupancy, CpG methylation, and H3K36me3 density. (a,b) Normalized nucleosome occupancy, CpG methylation, and H3K36me3 density for internal exons versus (a) the quantiles of gene expression level and (b) pol II elongation efficiency. ...
Tilgner et al. have shown that when normalized by nucleosome levels, the relative density of H3K36me3 does not show exon-specific enrichment. My hypothesis is as follows. The relative density of H3K36me3 differs between highly and lowly expressed genes, whereas the density of nucleosomes differs between exons and introns. Therefore, the absolute level of H3K36me3, which is the product of the nucleosome level and the relative modification density, should differ between the exons and introns of highly expressed genes (Additional file 7).
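The hypothesis above can be written compactly. The symbols are introduced here only for illustration: $N(r)$ is the nucleosome level of region $r$ (exon or intron), $\rho(g)$ is the relative H3K36me3 density per nucleosome for gene $g$, and $H_{\mathrm{abs}}(r, g)$ is the absolute H3K36me3 level.

```latex
\[
  H_{\mathrm{abs}}(r, g) = N(r)\,\rho(g),
  \qquad
  H_{\mathrm{abs}}(\mathrm{exon}, g) - H_{\mathrm{abs}}(\mathrm{intron}, g)
    = \bigl[\, N(\mathrm{exon}) - N(\mathrm{intron}) \,\bigr]\,\rho(g)
\]
```

Under this assumed form, the exon-intron difference in absolute H3K36me3 scales with $\rho(g)$, so it would be most pronounced in highly expressed genes, while the nucleosome-normalized density $H_{\mathrm{abs}}/N = \rho(g)$ shows no exon-specific enrichment.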
This finding suggests a new model for the influence of epigenetic mechanisms on RNA splicing. Nucleosomes seem to act as roadblocks to pol II passage and expose weak splice acceptors for a long duration to ensure exon inclusion. CpG methylation might play a similar role but with a lower efficiency in pol II inhibition. H3K36me3 appears to accelerate RNA splicing, likely by recruiting the spliceosome, for example, via the CHD1 protein [29]. Although the detailed mechanisms remain to be elucidated, these three marks could function cooperatively to ensure the inclusion of the protein-coding exons of many different transcripts with varying transcriptional activity by differentially controlling pol II elongation efficiency.
In the present study, I focused on the general mechanistic effect of chromatin organization on proper splicing. However, tissue-specific or condition-specific alternative splicing may not be regulated in this way. More elaborate mechanisms involving cis-acting RNA sequences and trans-acting RNA-binding proteins should accompany this process. Changes in chromatin organization of an exon may result in an alternative inclusion or exclusion of the exon. With epigenomic datasets coupled with RNA profiles for multiple tissues or conditions, we will be able to demonstrate the chromatin regulation of alternative splicing.