|Home | About | Journals | Submit | Contact Us | Français|
5-hydroxymethylcytosine (5-hmC), a derivative of 5-methylcytosine (5-mC), is abundant in the brain for unknown reasons. Our goal was to characterize the genomic distribution of 5-hmC and 5-mC in human and mouse tissues. We assayed 5-hmC using glucosylation coupled with restriction enzyme digestion, and interrogation on microarrays. We detected 5-hmC enrichment in genes with synapse-related functions in both human and mouse brain. We also identified substantial tissue-specific differential distributions of these DNA modifications at the exon-intron boundary, in both human and mouse. This boundary change was mainly due to 5-hmC in the brain, but due to 5-mC in non-neural contexts. This pattern was replicated in multiple independent datasets and with single molecule sequencing. Moreover, in human frontal cortex, constitutive exons contained higher levels of 5-hmC, relative to alternatively-spliced exons. Our study suggests a novel role for 5-hmC in RNA splicing and synaptic function in the brain.
The recent rediscovery of 5-hmC in mammals 1–3 demonstrates that covalent DNA modifications are more dynamic than previously believed. 5-hmC is generated by the oxidation of 5-mC, a reaction mediated by the Ten-Eleven Translocation (TET 1, 2 or 3) family of enzymes3. Traditional methods of assaying DNA modifications have either been unable to differentiate between 5-mC and 5-hmC (e.g. bisulfite mapping4) or have been specific for 5-mC (e.g. antibodies against 5-mC). Relative to other tissues, 5-hmC is particularly enriched in the brain, as observed in mice and humans 1,5. 5-mC at gene promoters suppresses transcription by recruiting transcriptional repressors 6. In the mouse cortex and cerebellum, 5-hmC is enriched within genes and appears to increase with increasing transcription levels 7,8. There is also accumulating evidence for 5-hmC as an intermediate in functional and literal DNA demethylation 9–12. Synchronous neuronal activity promotes active DNA demethylation of plasticity-related genes in the mouse brain via TET-mediated formation of 5-hmC 9; however, it is not known if such demethylation completely accounts for the enrichment of 5-hmC in the brain.
We mapped both 5-mC and 5-hmC in a variety of neuronal and non-neuronal tissues from mice and humans to investigate their respective roles. We labeled 5-hmC using the phage enzyme β-glucosyltransferase (BGT), followed by differential digestion of DNA with restriction enzymes either sensitive or insensitive to these DNA modifications (Fig. 1a). The resulting DNA fragments were amplified and interrogated on genome-wide tiling microarrays (Supplementary Table 1). This assay has the advantage of mapping modifications with up to single-CpG resolution, which was crucial for our ability to connect differences in DNA modifications to exon-intron boundaries.
BGT transfers a glucose molecule specifically to the hydroxymethyl group of 5-hmC, thus rendering it resistant to digestion by the methylation insensitive MspI enzyme at the ChmCGG target site 13,14 (Supplementary Fig. 1a); 5-hmC is thus estimated by differential resistance to MspI-digestion with and without glucosylation of genomic DNA (gDNA). HpaII (targets the same site, CCGG) cannot cut CmCGG or ChmCGG, and conceptually its difference with MspI digestion is a measure of both 5-mC and 5-hmC. Subtraction of the 5-hmC estimate from the HpaII-based estimate therefore measures 5-mC. We validated the assay combining glucosylation and restriction enzyme digestion to measure 5-hmC with biochemical, molecular biological and in silico methods (Supplementary Fig. 1,2 and Supplementary Table 2,3). We verified that the addition of a glucose moiety to 5-hmC confers resistance to MspI digestion of modified oligonucleotides (Supplementary Fig. 1, 2) and that gDNA modifications can be measured on tiling microarrays (Fig. 1b).
For microarray normalization, we chose an algorithm that corrected for affinity bias due to probe sequence, a known issue for tiling arrays 15. We compared two sequence-based normalization algorithms using modification estimates from quantitative polymerase chain reaction (qPCR) (Supplementary Note 2, Supplementary Fig. 2a, Supplementary Table 2,3), and selected an algorithm described by Potter et al. 16 (Supplementary Note 1, equation 1). We also found that single probe estimates for 5-hmC or 5-mC provided less bias while maintaining precision for these data, as compared to averaging probe intensities in a local window (Supplementary Table 2). Despite the increased variance in single probe estimates, biological variability across samples significantly exceeded variability in technical replicates (Supplementary Fig. 2b; e.g. probe-wise increase due to biological variability, mean = 0.52, 95 % CI = [0.51,0.53], p < 10−16, one-sample t-test).
Average probe intensities had the relative magnitude expected from the three treatments: MspI-digested non-glucosylated gDNA had the lowest intensity (mean± SD = −1.45 ± 1.00; 134,521 target probes), followed by MspI-digested glucosylated gDNA (−1.35 ± 1.00); HpaII-digested non-glucosylated gDNA had the highest intensity (−0.81 ± 1.03). Negative values reflect the digestion (underrepresentation) of target sites relative to the baseline of undigested sequences.
Using thin layer chromatography (TLC) and intensities from microarray probes, we verified that 5-hmC levels were the highest in mouse brain gDNA, compared to liver, kidney, pancreas and heart (Supplementary Table 4,5). This finding is consistent with previous reports 1,5. To investigate the origin of increased levels of 5-hmC in the brain, we identified genes and intergenic regions with significantly different 5-hmC in the mouse brain compared to other tissues. Of 134,521 probes that overlapped the non-repetitive genome (six chromosomes), 73,461 overlapped exactly one gene (defined by Mouse Genome Informatics (MGI) symbol; 2–238 probes per gene, median = 16.8). Out of 4,357 genes tested, 730 had different 5-hmC levels in the brain relative to the other tissues (repeated-measures analysis of variance (ANOVA), P < 10−2, false discovery rate (FDR) Q < 0.05; Supplementary Table 6). All but one gene had higher 5-hmC, and lower 5-mC, levels in the brain. Even within the brain, these genes contained above average 5-hmC levels (P < 10−4, bootstrapping (no replacement), median 5-hmC of randomly-sampled genes (R = 10,000)). Separately, we identified 83 differential intergenic probes that also had significantly higher 5-hmC levels in the brain (probe-wise linear regression, total of 60,721 intergenic probes, Q < 0.05; Supplementary Table 7).
We further explored possible associations of genic 5-hmC and steady-state mRNA levels in the mouse dataset. We related 5-hmC and 5-mC intensity levels from our microarrays to tissue-specific mRNA levels from a previously-published dataset 17 (Supplementary Note 1). In three out of five investigated tissues, 5-hmC showed a significant increase in genes with higher transcription (Fig. 1c, Supplementary Fig. 3). This trend was also observed in the brain, but did not reach statistical significance (P = 0.17, linear regression; Fig. 1c, Supplementary Fig. 3); the same was true for the 730 genes identified to have higher 5-hmC levels (P = 0.06, bootstrapping randomly-sampled genes (R=10,000)). In contrast, genic levels of 5-mC significantly decreased with increasing mRNA levels, in all five tissues (Fig. 1c and Supplementary Fig. 3).
To determine if the 730 genes obtained above had unifying cellular functions, we performed a functional overrepresentation analysis (ORA) (DAVID 18, background set of 4,357 genes tested). We observed that these genes were statistically overrepresented in 8 Gene Ontology (GO) terms (total 156 candidate terms), which in turn were associated with synapse function (Table 1). The top overrepresented terms were “cell adhesion” (Q = 1.3 × 10−6), “plasma membrane” (Q = 8.7 × 10−5), and “synapse” (Q = 3.5 × 10−4). These genes also had functional annotation clusters pertaining to ion channel activity, Rho GTPase signaling, and neuronal development (Supplementary Table 8). Interestingly, some of these 5-hmC rich genes showed transcript enrichment or specificity for non-neuronal brain cells. We identified the cell-type specificity of the 730 brain-enriched genes using a list from a previously published transcriptomic dataset of individual brain cell populations 19 (Supplementary Note 1). Using this list, 25 of the brain-enriched genes (3 %) have enriched transcription in astrocytes, 21 (3 %) in oligodendrocytes, and 57 (8 %) in neurons (Supplementary Table 9). By this definition, a few genes were also specific to the cell type (astrocytes: Rfx4, Gli3; oligodendrocytes: Elovl7, Cpm; neurons: Nts, Syt1, Nrg3, Trhde, Kcnc2, Clstn2).
We tested to see if the increase in brain 5-hmC observed in the 730 identified genes generalized to the entire functional class of synapse-related genes. We compared average 5-hmC in all genes mapped to the synapse-relevant GO terms to that in genes outside these categories (Supplementary Note 1) 20. Genes within each tested category (“synapse” and “synapse part”, “cell adhesion”, “plasma membrane”) had significantly higher 5-hmC levels (e.g. 5-hmC for “synapse”, n = 3,258 probes) compared to genes in other categories (n = 70,203; P = 3.2 × 10−20, two-tailed Wilcoxon-Mann-Whitney (WMW) test; Fig. 2a, Supplementary Fig. 4). This effect persisted after controlling for GC content of the probe sequence, a parameter that could artificially influence probe intensity.
We extended the comparison of 5-hmC enrichment in synapse-related genes to human brain by assaying gDNA from 28 human post-mortem brains (frontal cortex Brodmann area10 (BA10); Stanley Medical Research Institute and Harvard brain tissue resource center; demographic details in Supplementary Table 10). Human gDNA samples were processed the same way as the mouse gDNA samples (Supplementary Note 1), and interrogated on Affymetrix 2.0R human whole genome tiling microarrays E and F (Supplementary Table 1 for probe counts). Similar to the mouse brain, the human frontal cortex had higher 5-hmC levels within genes that mapped to the GO terms “synapse”, “synapse part”, “cell adhesion” and “plasma membrane” (“synapse” and “synapse part”: P = 1.0 × 10−4, one-tailed WMW test; Fig. 2b, Supplementary Fig. 5).
Several studies performed before the re-discovery of 5-hmC have noted a change in the density of modified cytosines at the exon-intron boundary of genes, with the density being higher on the exonic side 21,22. These studies, however, used bisulfite sequencing and therefore did not differentiate between 5-mC and 5-hmC. As our restriction enzyme-based assay provided single nucleotide resolution mapping of DNA modifications, we compared densities of each modification on either side of the exon-intron boundary (linear mixed-effects model, Online Methods). Distances to the boundary were measured from the second ‘C’ of the target sequence (‘CCGG’). DNA modification differences at exon-intron boundaries are reported for two regions: one immediate to the boundary (cumulative for the first 5 nucleotides, d = 5 bp) and one that captures the general peri-boundary trend (cumulative for the first 20 nucleotides, d = 20 bp). Relevant parameters (e.g. probe count) and statistics for all datasets are in Supplementary Table 11. Overall probe intensities for various DNA modifications in all datasets are in Supplementary Table 12.
Levels of all DNA modifications changed at the exon-intron boundary in human frontal cortex samples (n = 28 brain samples; Affymetrix tiling microarrays E and F, covering six chromosomes). Consistent with previous findings, we found higher densities of both modifications in exons relative to introns (d = 5, P = 2.8 × 10−7; and d = 20, P = 2.8 × 10−40; P-values from linear mixed-effects model unless otherwise stated; mean differences are not reported in the text as they are in arbitrary units of microarray intensity, and are not by themselves biologically meaningful). Most of this cross-boundary change in modification density was attributable to 5-hmC (d = 5 bp, P = 2.3 × 10−6; d = 20 bp, P = 2.8 × 10−20; Fig. 3a, b). In contrast, 5-mC showed no changes closer to the boundary (d = 5 bp, P = 0.57) and relatively smaller exonic increases than 5-hmC for longer periboundary distances (d = 20 bp, P = 9.1 × 10−6; Fig. 3a, b). The change in 5-hmC, relative to 5-mC, was most evident in the first 5–10 bp from the boundary. For longer distances (up to 50 bp tested), both modifications showed robust increases in exons relative to introns (Fig. 3b, Supplementary Fig. 6a).
We also mapped DNA modifications in brain samples from patients diagnosed with schizophrenia and bipolar disorder 23, collectively termed “major psychosis” (n = 54 brain samples (frontal cortex BA10); Affymetrix tiling microarray E). Consistent with the findings in the control brain set reported above, gDNA from brains of psychosis patients also showed a predominant change in 5-hmC at the exon-intron boundary, at both distances (Fig. 3b inset, Supplementary Table 11 and Supplementary Fig. 6b).
To ascertain if the cross-boundary change was unique to brain tissue in humans, we analyzed human liver samples (n = 13) in parallel with age- and sex-matched frontal cortex samples (Supplementary Table 10). Here again, brain gDNA samples showed a larger change in 5-hmC than in 5-mC. In contrast, liver gDNA samples showed a predominant change in 5-mC, both at the boundary (d = 5, P = 4.8 × 10−6) and longer distances (d = 20, P = 4.8 × 10−12) (Fig. 3c, d, Supplementary Fig. 6c, d, Supplementary Table 11).
We validated the findings in the human frontal cortex using three independent methods. First, to estimate the impact of GC differences in exons and introns, we compared probes representing introns to those representing exons, after matching for GC/AT content and distance from boundary (100 iterations to sample greater number of exonic probes). The overall trends of 5-hmC and 5-mC for matched probes were similar to those for unmatched probes (Supplementary Fig. 6e). 5-hmC cross-boundary differences were significant (p < 0.05, geometric mean of 100 iterations) for distances starting at d = 5 up to d = 35, while 5-mC showed a more modest gradual increase.
Second, we assayed 5-hmC by single molecule sequencing (SMS) (Helicos Biosciences Corp., Cambridge MA), which does not require PCR and microarray hybridization, thus eliminating artifacts due to DNA sequence composition 24. Genomic DNA from one frontal cortex sample was MspI-digested with or without prior glucosylation, followed by polyA end-labeling, and sequencing (Supplementary Note, Supplementary Table 13). Direct sequencing confirmed a sharp intronic decrease in 5-hmC at the exon-intron boundary (one-tailed WMW test, median exonic increase in 5-hmC =+ 4.6% for d = 20; P = 9.2 × 10−3; Fig. 3d, Supplementary Fig. 7). We were unable to accurately measure exon-intron differences for d = 5 due to a small number of reads 4 bp into the intron. However, the exonic increase in 5-hmC is evident for cumulative distances larger than 10 bp (Fig. 3d).
The third approach compared DNA enriched in 5-mC to a fraction depleted in 5-mC, separated using MethylMiner Methylated DNA Enrichment kit (Invitrogen; Supplementary Note 1). Each fraction was then subjected to the treatment as in Fig. 1a (n = 6 human frontal cortex samples; Supplementary Fig. 8a). We expected that if 5-mC levels in the brain actually changed across the exon-intron boundary, then we would detect a greater cross-boundary change in the 5-mC rich fraction relative to the depleted fraction. We did not observe such a change at the exon-intron boundary; in fact, intronic levels of 5-mC were slightly higher (P = 0.05, d = 20).
The above three control experiments collectively show that in the human frontal cortex, the change in 5-hmC at the exon-intron boundary exceeds that of 5-mC.
To determine if the change in 5-hmC at exon-intron boundaries is unique to the human frontal cortex, we examined DNA in multiple mouse organs, including the mouse brain. Mouse brain samples (8 weeks old male C57/BL6) were split into frontal cortex and the remainder (including cerebellum and brain stem) (n = 15 samples per group, Affymetrix mouse tiling microarray A). The frontal cortex was analyzed separately to match the brain region investigated in human samples. Consistent with patterns in human brain, at d = 20 from the exon-intron boundary, the change in levels of DNA modifications were mainly due to 5-hmC in both frontal cortex (P = 7.5 × 10−6, linear mixed-effects model) and in the rest of the brain (P = 3.0 × 10−8) (Fig. 4a, b; Supplementary Table 11). In these contexts, the change in 5-hmC was not evident at the immediate boundary (e.g. P = 0.35 at d = 5 in frontal cortex), but rather was gradual over a region adjacent to the boundary. In contrast, 5-mC exhibited a more dramatic immediate cross-boundary change (P = 9 × 10−4 at d = 5) (Fig. 4a, b).
For non-neural organs, we used the mouse dataset described previously in this work (Supplementary Table 5; 36 samples from liver, pancreas, kidney, and heart, microarrays A and G). In agreement with the markings seen in the human liver, mouse organs of non-neural origin primarily had changes in cross-boundary densities of 5-mC (d = 5, P = 0.03), with a non-significant change in 5-hmC (d = 5, P = 0.25) (Fig. 4c, Supplementary Table 11). Although both types of DNA modifications show relatively higher exonic densities at the larger peri-boundary regions (d = 20), the magnitude of change in 5-mC (P = 3.4 × 10−19) exceeded that of 5-hmC (P = 5.7 × 10−3) (Fig. 4c).
We next assayed a murine neuronal cell line which had undetectable (< 1%) global levels of 5-hmC measured by TLC (mHypoA-2/24; n = 18 samples, Supplementary Table 12). The microarray analysis, which is more sensitive to region-specific differences than TLC, detected a marginally significant change in 5-hmC at d = 20 (P = 0.045, linear mixed-effects model) and a highly significant difference in 5-mC (P = 1.6 × 10−17) (Supplementary Fig. 8b and Supplementary Table 11).
We also examined boundary differences in a B-lymphocyte dataset where cells were treated with suberoylanilide hydroxamic acid (SAHA). Histone deacetylase (HDAC) inhibitors like SAHA may promote DNA demethylation 25, and have also been shown to induce changes in pre-mRNA splicing 26. We examined whether SAHA induces changes in DNA modifications at the exon-intron junction. Vehicle-treated B lymphocytes (no SAHA) showed no 5-hmC exon-intron differences at d = 20 (P = 0.27); however, SAHA-treated cells showed an increase in the cross-boundary differences in a manner roughly corresponding to increasing doses of SAHA (P = 0.01, 8.8 × 10−4, and 3.6 × 10−4 for 0.01 μM, 0.02 μM and 0.1 μM of SAHA, respectively) (Supplementary Fig. 9, Supplementary Table 11).
Changes in DNA modifications at exon-intron boundaries may impact exon recognition and exon inclusion levels. To investigate this possibility, we measured 5-hmC as a function of exon inclusion levels, using RNAseq data for human frontal cortex and liver 27 (Online methods). Exon inclusion levels were measured as the proportion of transcripts that include a given exon; exons were classified as being alternatively-spliced (exon included in ≤ 80 % of gene transcripts) or constitutive (exon included in 100 % of gene transcripts) (Online Methods). We observed that, at d = 20, the median 5-hmC density was lower in alternatively-spliced exons, relative to constitutive exons (P = 0.04, two-tailed WMW test; see Supplementary Table 14 for probe counts and exon sampling), but it was not different for 5-mC (d = 20, P = 0.61); number of probes at d = 5 were too small for a meaningful comparison. The 5-hmC effect was even stronger at the level of the whole exon (P = 8.4 × 10−5); here a marginal change was observed in 5-mC levels (P = 0.025, Fig. 5a). The increase in 5-hmC was still borderline significant after probes in alternative and constitutive exons were matched for GC content (1000 iterations, 197 probes per group; geometric mean of P = 0.07 for 5-hmC and P = 0.08 for 5-mC). In the liver, neither 5-hmC nor 5-mC was different among the two types of exons (P = 0.13 for 5-hmC, P = 0.38 for 5-mC, Supplementary Table 14) (Fig. 5b).
Separately, we found that average exonic 5-hmC in intron-less – or single exon – genes was lower than that in genes with multiple exons. For this analysis we used tiling array data for control human brains over six chromosomes (arrays E and F, 28 samples, sample median used for each probe; RefSeq gene definitions were used; within each category, duplicate bases were collapsed. We found that exonic 5-hmC in single-exon genes was statistically different from that in multi-exon genes (median, intronless = 0.06, multi-exon = 0.073; P = 0.026, two-tailed WMW; n: intronless = 1,256 probes, multi-exon = 22,500 probes). This finding is consistent with an exonic change in 5-hmC due to gene splicing.
Collectively, the results suggest that the density of 5-hmC proximal to splice sites and within exons could impact splicing and affect exon inclusion levels in the mammalian brain.
In this study, we adapted a 5-hmC detection strategy that uses glucosylation-induced resistance to restriction enzymes for tiling microarray-based mapping of 5-hmC. The glucosylation-based detection of 5-hmC has been successfully used in other studies 7,8,28. Our microarray-based quantification of DNA modifications (verified by TLC) showed significantly higher levels of 5-hmC in brain as compared to other tissues, which is consistent with previous reports 1,5. We identified a large number of genes with higher densities of 5-hmC in the brain compared to those in heart, liver, kidney, and pancreas. More generally, we discovered a trend where 5-hmC in the gene body increases with increased transcription of the corresponding gene, while 5-mC decreases. This is consistent with the observation that 5-hmC prevents the binding of transcriptional repressor proteins and is found within actively-transcribed genes 7,29–31. Interestingly, the association between genic 5-hmC and mRNA levels was weakest in the brain (P = 0.17), suggesting that in addition to the regulation of gene activity 5-hmC may have other functions in the cell.
Functional annotation analysis revealed a statistical overrepresentation of terms pertaining to synaptic plasticity in genes enriched for 5-hmC in the brain. In particular, annotation clusters were composed of protein groups involved in distinct aspects of synaptic remodelling: ion channels, members of the Rho GTPase signalling pathway, and axon guidance molecules. Notably, most genes in these clusters encode proteins that are functionally located at the plasma membrane, rather than being cytosolic. The observation that 5-hmC is overrepresented in the genes controlling synaptic plasticity may shed a new light on the epigenetics of learning and memory. Adult animals with conditional double knockouts of genes encoding DNA methyltransferase 1 and 3b (Dnmt1 and Dnmt3b) in the cerebral cortex showed learning and memory defects in hippocampal-dependent learning tasks, suggesting that changes in DNA methylation occur in post-mitotic neurons 32. Fear conditioning, an experimental paradigm for emotional learning, results in the demethylation and transcriptional activation of reelin (RELN) 33. Recent mappings of 5-hmC in the brain have remarked a generation of 5-hmC and demethylation in response to neuronal activity, but the degree to which the two correspond is not completely clear 9,34. In the absence of DNA replication in post-mitotic neurons, it is possible that loci undergoing gene reactivation via DNA demethylation accumulate 5-hmC over time. Experiments that map 5-hmC over the course of multiple gene reactivations will explore whether, in the context of synaptic activity, this base marks a stable epigenetic state, or if it is an intermediate in cycles including complete demethylation (i.e. conversion to unmodified cytosines).
In addition to transcriptional regulation, it is possible that 5-hmC and 5-mC may impact the process of pre-mRNA splicing. Our finding of boundary changes in all modified cytosines replicates earlier observations that used bisulfite modification and sequencing 22,35,36. The separation of the two DNA modifications in our study showed that 5-hmC, rather than 5-mC, accounts for most of the density difference at the immediate exon-intron boundary in human frontal cortex samples. This finding was validated by three control experiments: comparison of GC/AT matched probes, Helicos single molecule sequencing, and methyl-binding domain (MBD)-enriched 5-mC mapping. While the mouse brain samples show a predominant change in 5-hmC at longer peri-boundary distances, the main change within the first 10 bp of the boundary was in 5-mC.
The non-neural tissues investigated included human liver and four mouse organs (liver, pancreas, heart and kidney). While these tissues had measurable levels of genomic 5-hmC density by TLC, modification changes at the exon-intron boundary were mainly due to 5-mC. In contrast to the pattern observed in the brain, peri-boundary changes in 5-hmC were relatively minor in these tissues. These findings suggest that any splicing-related functions DNA modification may have in non-neural organs are mediated mainly by 5-mC rather than 5-hmC. Separately, we found that B-lymphocytes show increased 5-hmC differences at the exon-intron boundary upon treatment with SAHA. SAHA is a chemotherapeutic agent that acts as a HDAC inhibitor and may promote DNA demethylation 25. As the dose used in this study is comparable to those in plasma of patients treated for cancer with SAHA 37, 5-hmC changes at the exon-intron boundary may also be expected to take place in vivo. Hence, it is also possible that such changes in 5-hmC in response to HDAC inhibition could also contribute to alternative splicing 26.
Previous studies have reported that exons are enriched in the histone modifications H3K36me3, H3K4me3, and H3K27me2, relative to flanking intronic regions 38–40. These modifications can recruit splicing regulators to impact alternative splicing of nascent transcripts 41. While the resolution of histone maps is limited by the size of nucleosomal DNA (147 bp), our DNA modification studies have identified changes in modification levels within 20 bp (and in some cases, within 5 bp) surrounding the exon-intron boundary. This precision argues for a specific effect at the exon-intron boundary as well as a possible difference between exons and introns as a whole, as described in earlier studies 34. Our finding that 5-hmC densities are lower in alternatively-spliced exons, relative to constitutive exons, complements the observation that H3K36me3 is less enriched in alternatively-spliced exons 39,42,43. DNA methylation has been shown to modulate exon inclusion levels by influencing rate of transcript elongation in B-lymphocytes. Lack of DNA methylation can promote exon inclusion by causing ‘pauses’ in RNA Polymerase II (RNA PolII)-mediated elongation, and may also affect RNA PolII elongation-dependent changes in alternative splicing by affecting the binding of the transcriptional repressor CTCF 44. There may also be other mechanisms by which DNA methylation could influence exon inclusion levels, for example, through the recruitment of splicing factors via methyl binding proteins.
Our findings suggest that tissue-specific distributions of 5-hmC or 5-mC at the exon-intron boundary, and within genes, may simultaneously influence both transcription and splicing. The direction of the influence remains unclear, as transcription and splicing may also affect epigenetic DNA modifications, and mechanisms for cross-talk likely exist between epigenetic regulation, splicing, and transcription 45,46.
Our experimental strategy exploited the ability of the β-glucosyltransferase (BGT) enzyme of T4 phage to transfer a monomeric glucose residue from uridine diphospho-α-D-glucose (UDP-Glc) to the exocyclic hydroxyl group of 5-hmC in double-stranded DNA. This specific addition of a glucose moiety to 5-hmC blocks access of methylation insensitive endonucleases to their target sites thus making the DNA fragment resistant to digestion 1,2. Here we used isoschizomer restriction enzymes, HpaII and MspI, where HpaII only cleaves DNA when the CpG dinucleotide is unmethylated at CCGG, while MspI cleaves the CCGG sites independently of any natural CpG modification, including 5-mC and 5-hmC 3,4.
Microarray experiments were performed as illustrated in Fig 1. Each genomic DNA sample was split into three parts, one of which was subjected to BGT glucosylation and two of which were not glucosylated (details in Supplementary Note). One unglucosylated part was treated with MspI, serving as a baseline for digestion. The second unglucosylated part was treated with HpaII, which spares any modified cytosines at the recognition site. The BGT-glucosylated part was treated with MspI, and 5-hmC was measured as the decrease in digestion of this part, relative to the unglucosylated part. The difference between HpaII and MspI restriction efficacy on unglucosylated DNA reflected the sum of all modifications, and the difference between all modifications and 5-hmC was the measure of 5-mC.
This study utilized datasets from various mouse organs, human brain and liver, and two cell lines. Male C57BL/6J mouse brain tissues (frontal cortex and remaining brain) and other organs (liver, pancreas, kidney, and heart) were obtained from 8-week old mice (Supplementary Table 5). All human brain samples were obtained from the Stanley Medical Research Institute brain collection (courtesy of Drs. Michael B. Knable, E. Fuller Torrey, Maree J. Webster, and Robert H. Yolken) and from the Harvard Brain Tissue Resource Center (courtesy of Dr. F. Benes). Human liver samples were obtained from commercial databanks (Curline (USA) and Cambridge Biosc (UK)). Liver samples were age- and sex-matched to brain samples. Demographic details are provided in Supplementary Table 10. Genomic DNA isolation was performed by standard proteinase K and phenol chloroform extraction.
From each tissue, two 1 μg samples of genomic DNA were sheared to a 200 bp fragment length using a Covaris S2 sonifier (Covaris, USA). Sheared DNA was end-filled in the presence of T4 DNA polymerase (5 U) at 11°C for 20 min and purified using PN buffer from the QIAquick Nucleotide Removal Kit (Qiagen). The purified 1 μg of blunt-ended DNA was ligated with A1–25 adapters (10 μM) (Supplementary Table 15) in the presence of T4 DNA ligase (10 U) at 22°C for 3 h, inactivated at 65°C for 10 min, and purified with the QIAquick Nucleotide Removal Kit. One third (~300ng) of the purified DNA was treated with 200 μM uridine-5′-diphospho-α-D-glucose (UDP-Glc, Sigma), 80 ng BGT 5 in 100 mM Tris-HCl (pH 8.0) and 25 mM MgCl2 for 3 h at 37°C, and purified again. Thereafter, 200 ng of glucosylated DNA and 200 ng of untreated DNA were subjected to MspI digestion in a final volume of 20 μl for 12 h at 37°C, followed by enzyme inactivation at 80°C for 10 min. The same protocol was performed on 200 ng of untreated DNA using HpaII. Following the observation that some 5-hmC may be resistant to MspI digestion 6, we used 10 units of enzyme (50 fold surplus of the enzyme over the DNA substrate digested for 12 h). Restriction enzyme digested genomic DNA from each scenario was PCR amplified using adapter-specific primers (Supplementary Table 15). Amplified PCR products were fragmented, biotin-labeled, and hybridized on Affymetrix whole-genome tiling arrays (Supplementary Table 1), and scanned as described in the Affymetrix Chromatin Immunoprecipitation Assay protocol (http://media.affymetrix.com/support/downloads/manuals/chromatin_immun_ChIP.pdf)
We distinguished between three types of probes on the whole-genome tiling microarray: “target probes”, which contain the restriction site CCGG, “flanking probes”, which do not contain the recognition sequence but could lie up to 200 bp upstream or downstream of a target probe (sheared DNA fragments were of average 200 bp), and “non-target probes” that neither contain a target sequence nor are within 200 bp (on either side) of the target sequence. Non-target probes are unaffected by enzymatic cleavage, and therefore, can be used as a baseline for array normalization. After comparing different array preprocessing algorithms (Supplementary Note), we chose to use a sequence-based algorithm after Potter et al.7.
The model (Supplementary Note, equation 1) was applied to non-target probes matched proportionally to target probes by GC content. Normalized intensities were obtained by subtracting the fitted value from raw probe intensities, resulting in a normal distribution of probe-level intensities. Each chip was independently normalized. All downstream analyses were carried out at the single-probe level (i.e. without windowing or peak-calling) and solely on target probes (henceforth referred to simply as ‘probes’). 5-hmC was measured as the log-difference between glucosylated DNA and native DNA following treatment with MspI (Fig. 1a). The sum of all the DNA modifications was measured as the log-difference between HpaII-treated unglucosylated DNA and MspI-treated unglucosylated DNA. 5-mC was estimated by the difference between all DNA modifications and 5-hmC.
Following normalization, probes overlapping repeats were excluded (RepeatMasker, simple repeats and segmental duplications; genomic annotations from UCSC genome browser, build mm8 for mouse, hg18 for human). Information on arrays and probe counts are listed in Supplementary Table 1. Unless otherwise specified, analysis was done using R software and BioConductor 8.
For the exon-intron boundary analysis, probes were required to have a ‘CCGG’ in both the Affymetrix probe sequence and the chromosomal sequence downloaded from UCSC; probes that did not meet this criterion were excluded.
Probes were filtered for those overlapping genes on either strand (RefSeq genes (refGene) downloaded from UCSC, mm8; range between txStart and txEnd columns). Probes overlapping multiple genes (gene defined by MGI gene symbol) were excluded, as were genes containing exactly one probe, resulting in 73,461 probes over 4,357 genes (range = 2 to 238 probes per gene, mean = 17). A repeated-measures ANOVA was conducted on probes in each gene, with tissue as a between-groups factor (‘Brain’ = 11 samples, ‘Other’ = 36 samples (9 each of liver, heart, kidney, pancreas)), and the biological sample as a within-groups error term. Probes within a gene were not averaged across samples, nor collapsed within a gene. Gene-wise nominal p-values were adjusted using the Benjamini-Hochberg method of FDR correction (BH; α = 0. 01). 730 genes had Q-values < 5 %; all but one of these had higher 5-hmC in the brain, and are therefore called ‘brain-enriched’.
Boundary regions were defined as regions at a certain distance on both sides of an exon start or end (RefSeq genes, internal exons only). Where multiple exons had the same extent ([chromosome, start, end] location), only one such exon was retained. For each probe, the distance from the modifiable C (of CCGG) was computed to the nearest exon (or intron) boundary; this value is the boundary distance of the probe (a value of zero was exonic in these calculations). Statistics were computed on probes that lay 5 or 20 bp on either side of the junction. The result at a distance of 20 bp reflected the general trend observed at the boundary, while that at 5 bp reflected immediate cross-boundary change. A linear mixed-effects model 9 was used to test probe intensity differences between the exonic and intronic side of the junction. Junction side was used as the fixed-effects term, and array type and sample were treated as random-effects terms (Supplementary Note); α = 0. 025 was used to indicate a statistically significant result.
RNAseq from liver and cortex 10 were analyzed to identify cassette exons and their respective inclusion levels. Thresholds used to identify cassette exons are described in Supplementary Note. The inclusion level of an exon was defined as the proportion of gene transcripts in which the exon was present:% Exon Inclusion = 100*(sum(CiA)+ sum(ACj))/((sum(CiA)+ sum(ACj) + 2*(sum(CiC2)+ sum(C1Cj))), where Ci is any possible splicing donor upstream the alternative exon (including C1), and Cj any possible splicing acceptor downstream the alternative exon (including C2). Exons with multiple acceptor or donor splice sites or not supported by RNA-Seq were excluded. We defined an exon as ‘alternatively-spliced’ if the inclusion level was less than 80%, and as ‘constitutive’ if inclusion was 100 %. DNA methylation was assessed in those exons that overlapped ‘target probes’ (probes containing CCGG) (probe counts in Supplementary Table 14). Methylation levels of exons were computed at the whole-exon level and at d = 20 bp away from the boundary. Sample-averaged median probe intensities were compared between probes overlapping constitutive and alternative exons (two-tailed WMW test, α = 0. 05).
We thank Nicole Miller and Christopher Austin (The National Center for Advancing Translational Sciences, NIH) for providing cell lines for the SAHA experiment. We also thank Jean-Sebastien Doucet (Centre for Addiction and Mental Health) for assistance with mouse brain sample preparation. This research has been supported by the Canadian Institutes of Health Research (grants 199170 and 186007) and the US National Institutes of Health (grants MH074127, MH088413, DP3DK085698, HG005758 and HG004535) to A.P. and Canadian Institutes of Health Research grants to B.J.B., by the Research Council of Lithuania and the French Ministry of foreign and European affairs under the Lithuania-France bilateral collaboration program Gilibert (grant TAP-13/2011–2012) to S.M. and S.K. and the European Social Fund under the Global Grant measure (grant VP1-3.1- MM-07-K-01-105) to S.K. A.P. is Tapscott Chair in Schizophrenia Studies at the University of Toronto and a senior fellow of the Ontario Mental Health Foundation. M.I. is a recipient of a Human Frontier Science Program Organization Long Term Fellowship.
Author contributionsS.K., A.P., T.K. and S.P. conceived the study and designed the experiments. S.M., E.K., Z.L., and S.K. developed and verified the methods for 5-hmC detection. T.K., M.P., M.T., P.J., C.P., V.L., A.N., D.B., and A.W. conducted the microarray experiments. S.P., K.K., P.K., S.C.W., and R.K. developed bioinformatics methods and analyzed microarray data. P.K., T.K., S.P. and A.P. contributed to the design and data analysis of the Helicos sequencing experiment. M.X. and R.T. generated SAHA treated cell lines. M.I. and B.B. generated the lists of alternative and constitutive exons and perform 5-hmC comparisons. T.K., S.P., V.L., S.K., A.P. wrote the manuscript. All authors reviewed the manuscript.