|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: AC MEP LB. Performed the experiments: AC VP JK EI. Analyzed the data: TW AC NGS SDP LB. Wrote the paper: AC TW LB.
Cell specific states of the chromatin are programmed during mammalian development. Dynamic DNA methylation across the developing embryo guides a program of repression, switching off genes in most cell types. Thus, the majority of the tissue specific differentially methylated sites (TS-DMS) must be un-methylated CpGs.
Comparison of expanded Methyl Sensitive Cut Counting data (eMSCC) among four tissues (liver, testes, brain and kidney) from three C57BL/6J mice, identified 138,052 differentially methylated sites of which 23,270 contain CpGs un-methylated in only one tissue (TS-DMS). Most of these CpGs were located in intergenic regions, outside of promoters, CpG islands or their shores, and up to 20% of them overlapped reported active enhancers. Indeed, tissue-specific enhancers were up to 30 fold enriched in TS-DMS. Testis showed the highest number of TS-DMS, but paradoxically their associated genes do not appear to be specific to the germ cell functions, but rather are involved in organism development. In the other tissues the differentially methylated genes are associated with tissue-specific physiological or anatomical functions. The identified sets of TS-DMS quantify epigenetic distances between tissues, generated during development. We applied this concept to measure the extent of reprogramming in the liver of mice exposed to in utero or early postnatal nutritional stress. Different protocols of food restriction reprogrammed the liver methylome in different but reproducible ways.
Thus, each identified set of differentially methylated sites constituted an epigenetic signature that traced the developmental programing or the early nutritional reprogramming of each exposed mouse. We propose that our approach has the potential to outline a number of disease-associated epigenetic states. The composition of differentially methylated CpGs may vary with each situation, behaving as a composite variable, which can be used as a pre-symptomatic marker for disease.
Animals develop from repeated division of a single cell (a fertilized egg) as a result of precise spatio-temporal regulation of gene expression. Combinatorial use of cis and trans DNA regulatory elements permit commitment of cells to different lineages. Once a differentiation path is chosen the trajectory of gene expression is maintained through many cell generations. The acquisition of this “cellular memory” is key for the assembly of the different organized tissues, and is maintained largely by epigenetic marks such as 5-methyl cytosine methylation –. During development the genome is faithfully replicated millions of times but the epi-genome varies with each cell type. Moreover, in contrast to the genome, the epigenome is dynamic and sensitive to the environment. Exposure to stress during critical stages of development could cause subtle changes in the epigenome resulting in a small but time-cumulative effect on cellular physiology , , –. Thus, small deviations in tissue-specific methylation patterns could contribute to the developmental origins of many adult diseases , .
To understand the role of DNA methylation in the cell-type-specification of the chromatin or in the onset of pathophysiological mechanism, it is necessary to contrast the distribution of methyl-marks across various tissues and different conditions. The ultimate goal will be to identify tissue-specific regulatory loci controlled by methylation. Most of the research done to address this problem has focused on the function of promoters and their CpG islands (CGIs), shaping the perception that the CGIs are the hotspots for epigenetic regulation of tissue specific transcriptional activity  –. However, technological advances in methylome analysis have increasingly shown Tissue Specific Differentially Methylated Regions (TS-DMR) as hypo-methylated loci existing outside promoters and CGIs, –. We recently showed that most of the hypo-methylated loci existing in the liver of an adult mouse are located in introns or intergenic regions and do not meet commonly accepted definitions of CGIs . Indeed these CpG-poor un-methylated loci showed the highest concentrations in DNA regulatory sequences, many of which were liver specific . We hypothesized that the epigenetic regulation of transcriptional networks that specify tissues occurs primarily in CpG located outside promoters.
Comparing the distribution of methylation sites in mouse gDNA derived from liver, kidney, brain and testis we found 138,052 independent differentially methylated sites (DMS). One sixth of them consisted of CpGs whose methylation levels were particularly high or low in only one tissue, hence named tissue specific differentially methylated sites (TS-DMS). Most of these are un-methylated CpGs associated with tissue-specific expressed genes. Functional profiles obtained from these genes suggest that relevant aspects of tissue physiology would be epigenetically demarcated during development. The distribution of methylation among a minimal set of 23,270 CpG showed commonalities between biological replicates and differences between tissues, suggesting that these TS-DMS constitute unique epigenetic footprints created during development. We detected further reprogramming of methylation patterns drove by in utero and/or immediate postnatal food restriction. We suggest that our approach can outline a number of disease-associated epigenetic states. The composition of DMS would be different for each situation representing highly specific DNA methylation biomarker panels. Discovery of novel epigenetic makers is an area of increasing interest in biomedical research, however most of the newly discovered candidates have been found by focusing on a small number of well-defined loci, usually promoters and CGIs . As these loci can be identified either as methylated or un-methylated, they offer a limited sensitivity or specificity for diagnosis, , . On the other hand, a composite variable, such as the composition of CpGs in a set of differentially methylated sites, can represent a larger number of different situations covering wider applications.
Four tissues: brain (B), kidney (K), liver (L) and testis (T) were obtained from each of three adult male C57Bl/6J mice. Fetal livers were also derived from the same mouse strain. In addition, 16 liver samples were used from three-week-old male ICR mice (Harlan, Indianapolis, IN, USA) that were subjected to a study of nutritional influences on diabetes and obesity risk , . This study was carried out in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. Animal protocols “Proposal # 2009-0017 LN, DNA methylation in obesity and diabetes”, were approved by The National Institute of Environmental Health Sciences Animal Care and Use Committee.
CpG tag libraries for eMSCC were prepared as described , including the addition of equimolar amounts of un-methylated lambda phage DNA (Promega, Cat.# D1521). All libraries were sequenced by Expression Analysis (Durham, NC, USA) using an Illumina Genome Analyzer IIx. Reads (30 to 50 million per library) were aligned to the mouse reference genome (mm9) using MOM .
This procedure has four main steps: 1) For each of 12 samples (3 per tissue), we normalized the reads at each CpG by a factor that is proportional to the total number of reads in the sample; 2) We aligned all the reads in each sample to Takai-Jones CGIs (TJ-CGIs) and generated the distribution of the average normalized methyl-sensitive-counts (digestion frequencies) of TJ-CGIs. We chose the value at the valley of the distribution as the cutoff to filter out CpGs presumed heavily methylated in all four tissues; 3) For each CpG that passed the filter in step 2, we performed all six pairwise comparisons of the four tissues, i.e. B-K, B-L, B-T, K-L, K-T, L-T; where B, K, T and L represent brain, kidney, testis and liver respectively and B-K, B-L, etc., represent the pair wise comparisons. For each CpG, the average and the standard deviation of the reads were calculated using all samples. The standardized residual for each sample was then calculated by dividing the difference between the observed numbers of reads for that sample and the corresponding average by the standard deviation. We plotted the quantiles of the residuals against the quantiles of standard normal distribution to see if it is reasonable to assume that the data are approximately normally distributed; 4) We applied the mdFDR method to identify DMSs based on the p values of all six possible t-tests. The mdFDR procedure allowed us to control the overall false discovery rate for all pairwise comparisons as well as the directional errors when declaring more or less methylated states . In this manner, we identified the preliminary DMS. We next removed the pairs of comparison with average normalized read differences between two tissues that were less than the cutoff value generated from step 2, to ensure that two tissues had a different methylation state at that CpG. Last, for each DMS we averaged reads of each tissue and identified the sites with methylation levels significantly high or low in only one tissue. These sites were called TS-DMS.
Previously, an experiment was reported in which pregnant mice were subjected to undernutrition (U) during the last third of their pregnancy and this feeding regimen was continued during lactation generating the UU pups, alternative some mothers were changed to a control diet (C) during lactation and generated the UC pups. CC and CU protocols were also implemented to generate the respective pups. All pups were euthanized and their livers analyzed . We used 16 liver samples derived from the same experiment: CC (n=4), CU (n=5), UC (n=2) and UU (n=5) to profile changes in methylation using the approach described in previous section.
CpGs were classified according to their genomic location: 1) TSS region (−3 Kb to +2 Kb of the TSS); 2) gene body region; 3) 3′ end region encompassing 3 Kb of DNA sequences downstream of each transcriptional end; and 4) intergenic region. Coordinates for the beginning and end of these regions were taken from the mm9 building downloaded from the UCSC Genome Bioinformatics browser (http://hgdownload.cse.ucsc.edu/goldenPath/mm9/database/knownGene.txt.gz). The equally unmethylated CpGs (497,547) were defined as the CpGs with the average normalized df above the cutoff value of unmethylation in all four tissues and excluding any overlapping with DMSs.
CpGs were mapped to three regions relative to CGIs and their 2-kb flanking regions (shores), including inside CGI, in CGI shores, and outside of CGI and its 2 Kb shores. Five CGI definitions were used for this analysis –.
We used the Gene Expression Barcode 2.0 browser to obtain a list of genes expressed in brain, kidney, liver, and testis (http://rafalab.jhsph.edu/barcode/index.php?page=tissuegene). The original gene expression information in this database was extracted from publicly available gene expression results of 9,652 samples using the Mouse Genome 430 2.0 microarray . In this set, a gene is considered expressed in a specific tissue if it is expressed in more than 95% of the samples of this tissue. We further extracted tissue-specific expressed genes by collecting, those that are expressed in only one of the four tissues. We considered a gene to be tissue differentially methylated when it has at least one TS-DMS within the region encompassing ±3 Kb from its edges. To standardize the different lists, all of them were converted to DAVID identifiers using the Gene ID conversion tool in NIAID's DAVID Bioinformatics Resources 6.7 ,  website (http://david.abcc.ncifcrf.gov/). Only genes with DAVID identifiers were considered in this study. The odds ratio was calculated as the ratio of the odds of tissue-specific un-methylation occurring in tissue-specific expressed genes versus the odds of it occurring in non-tissue-specific expressed genes. Four binary sets of data were created to calculate the odds ratio (OR) in each tissue according to the formula: OR=(a/c)/(b/d). Where “a” is the number of genes tissue differentially methylated that were uniquely expressed in the same tissue; “b” are genes not associated with TS-DMS but uniquely expressed in the analyzed tissue; “c” is the number of genes tissue-differentially-methylated but expressed in more than one tissue; “d” is the number of genes not associated to TS-DMS and expressed in more than one tissue. Both ORs and p-values were calculated by using the logistic regression function in R package with one degree of freedom.
We downloaded a list of 32,266 mouse tissue-specific enhancers in cortex, liver, kidney, and testis identified by H3K4me1 ChIP-Seq study (http://chromosome.sdsc.edu/mouse/download.html) . We aligned our identified TS-DMSs and a total of 7 million surveyed CpGs in the genome to the enhancers based on their genomic locations. Enhancer was defined as being covered by our eMSCC method if it has at least one TS-DMS. To quantify the fold enrichment of TS-DMS in enhancers with respect to the genome, we calculated the ratio between [number of TS-DMS within enhancers/total number of TS-DMS] and [total number of surveyed CpGs within enhancers/total number of surveyed CpGs in the genome]. We estimated the significance of the enrichments by simulation, which allowed us to assign p values based on the distribution of the fold enrichments generated from simulation of 100,000 iterations. For each iteration we randomly generated a set of genomic regions with the same number of enhancers and the same size (bp) of each enhancer. In addition, we examined the overlap of TS-DMS with 15,435 active enhancers, which were defined by both H3K4me1 and H3K27ac marks.
We used the tool provided in NIAID's DAVID Bioinformatics Resources 6.7 ,  to analyze enrichments in five main categories: biological process, cellular component, molecular function, KEGG pathway, and tissue expression. We used the Bonferroni-corrected P value of 0.05 as the threshold to identify significantly enriched categories, which were sorted by -fold enrichment over the background frequency in the human genome. We focused on genes associated with CpGs uniquely unmethylated in one of the four tissues, using gene names derived from the Gene ID conversion tool within DAVID. A gene was defined as being associated with a CpG if the CpG was within 3 Kb upstream to 3 Kb downstream of the gene.
The functional annotation tables usually display a rank-order with broader general term at the top, and most specific terms toward the bottom. Although the upper terms have greater statistical support (largely enriched, o lower p-values), they are usually less to almost no informative, but generally are the reported terms when the main output of the enrichment tool is based on functional enrichments tables. DAG graph facilitates the systematic set up of thresholds to select relevant terms based on the structure of the results. We used Gene Ontology enRIchment anaLysis and visuaLizAtion tool (GORILLA)  which allows to identify the most informative terms that are significant enriched.
Brain, kidney, liver and testis were dissected from three C57BL/6J mice and genomic DNA was isolated from every specimen. Each sample was treated with four methylation-sensitive restriction enzymes to compare the extent of digestion at approximately 7 million CpGs. Briefly, twelve CpG-tag libraries were prepared according to our previously published method , and deep sequenced to generate 37.5 to 51.4 million aligned reads per sample, Table S1 in Supporting Information S1, Data Set S1. These reads contain ~27-bp sequences located adjacent to CCGG, ACGT, GCGC, CCGC and GGCG digested sites. Digestion frequencies (df) were calculated by counting the number of reads aligned to each site in the genome. A site with low df may indicate either a need for deeper sequencing or that the site is highly methylated. Three observations support the idea that most poorly digested sites were associated with high levels of methylation, rather than with poor coverage or random sampling of CpG-tags during sequencing: 1) Of the approximately 7 million CpGs tested across this study, 5,283,360 were represented in the libraries suggesting an effective coverage; 2) To a large extent, sites that were resistant to the enzymatic digestion, were arranged in tandem within the body of genes, repetitive sequences and intergenic regions. All these places are known to harbor most of the genomic methylation marks; 3) Digestion frequencies for poorly identified sites were consistently low in all the tissues and in all the replicates, which is not expected as a result of random sampling. Figure 1 illustrates these observations. Each digestion profile represents the averaged frequencies of three replicates. Highly digested sites segregate from those resistant to the enzymes. The majority of sites located in the 5′ regions of genes and in CGIs scored high digestion frequencies, whereas sites in intergenic regions, repetitive DNA or gene bodies, scored low. Figure 1 shows tissue specific differentially methylated regions (TS-DMR) spanning the entire intragenic CGIs in the last exon of several Zinc finger proteins, Table 1. All these differentially methylated CGIs distinguish somatic tissues from testis. Indeed, this epigenetic dichotomy seems to reflect a transcriptional dichotomy, Figure 2. For instance, the CGI located in the 5′ region of the Repin 1 gene is un-methylated and coincides with DNAse hypersensitivity in all the somatic tissues, whereas, the intragenic CGI located in the 3′ region of the gene is hyper-methylated, and insensitive to DNase I.
Whilst these results uphold the effectiveness of Methyl-Sensitive Cut Counting (MSCC) to profile methylation , , , we harnessed the power of this method to perform a multiple testing procedure to detect differentially methylated sites (DMS), . By implementing MSCC in a multiple pairwise comparison design, we controlled site-specific biases, Figure S1 in Supporting Information S1. In addition, the variability introduced by random errors is weighed in the denominator of the equation used to calculate the t-statistic; this allows us to ascribe the differences in the digestion frequencies to differences in methylation levels.
Although our experimental design allows for comparisons across the dynamic range, we contrasted only those sites that are highly supported by sequencing data. To conduct this selection we first normalized the libraries to reach similar sequencing depth Figure S2E in Supporting Information S1. The average df was computed for every CGI defined by the Takai and Jones algorithm . For each library, the 21,246 computed scores were organized in a frequency histogram, Figure S2E in Supporting Information S1. The figure shows a bimodal distribution that is consistent with the bimodal pattern of methylation found in CpG islands . As such the frequency histogram has been used to directly determine the methylation status of individual CGIs , but here we used them to derive an unrefined estimate of the relationship between digestion frequency and rate of methylation at each surveyed CpG. The sites with a df of 10 or less were considered heavily methylated and sites not reaching the cut-off in any of the libraries were excluded. For the identification of DMS multiple pairwise Welch t tests were performed. The null hypothesis was stated as no difference between the averages of the triplicates. Based on the differences detected by the Welch t-test further directional decisions were made to sort the means according to their values. This allowed the identification of sites whose rate of digestion was unusually high or low in only one of the tissues. These sites were called TS-DMS. The test was designed to control for errors associated with wrong rejections of the null hypothesis and also for assigning wrong inequality patterns. A mixed directional false discovery rate (mdFDR) of 10% was chosen in this analysis . Mouse gDNA samples were spiked with lambda gDNA (λDNA) to achieve an equimolar ratio between both genomes. In this way, for each one of the 1,202 completely un-methylated sites present in the lambda genome, 12 replicates were obtained, Data Set S2. Figure S2 A and C in Supporting Information S1 suggest that df computed for these replicates are approximately normally distributed with very slight departures in the tails of the distribution. Comparison of digestion frequencies among λ-gDNA replicates reveals that 10% of the 1,202 λ-sites were found to have significant differences in their digestion frequencies. Although no differences were expected between the 12 un-methylated λDNA replicates, our results show that these empirical values for false discovery rate were very close to the theoretical value of 10% selected for this study.
Applied to CpGs from the mouse genome the comparative differences reveals 150,428 sites where the methylation varies to the same extent among the analyzed four tissues. Furthermore, retaining only the sites whose digestion frequencies differed by a minimum of 10 resulted in a final count of 138,052 DMS (Data Set S3). Only 24,803 of these DMS were TS-DMS, i.e. sites that were hypo or hyper-methylated in only one tissue, Figure 3. Notably, 23,270 (94%) of these TS-DMS are un-methylated CpGs in one of the four tissues, of which 57% were in the testis.
An overwhelming number of genome-wide methylation studies focused on CGIs and promoter regions have proposed them as hot spots for epigenetic control of gene expression through CpG methylation . To investigate if our results are consistent with this concept, we compared the distribution of sites having a comparable methylation level among the four tissues with the distribution of TS-DMS. Almost all the sites identified as DMS or TS-DMS were found in introns or intergenic regions which are remote from promoters, Figure 4A. Recently, it has been proposed that intergenic and intragenic CGIs function as alternative promoters which can be repressed in a tissue specific manner through methylation of their CpGs , , . However we found that only 5 to 25% of TS-DMS were parts of CpG islands. The fraction of TS-DMS in CGIs increased proportionally to the number of island in each set, Figure 4-B.
Most recently, CGI shores, not CGI, have been proposed to be the targets of tissue-specific epigenetic regulation , . We found that the fraction of TS-DMS mapping to CGI shores (2000 base pairs from the edge of a CGI) follows the same trend and proportions than those mapping to CGIs, Figure 4B. The different CGI sets used in this study represent efforts to improve predictions of functionality on GC-rich genomic sequences, mainly promoter activity and un-methylated status. Notably the Gardiner- Garden and Frommer's set (GGF-CGI), which is the oldest, biggest and considered the less rigorous in term of functional prediction, is the one that collects the biggest number of TS-DMS. Approximately 60% of the tissue specific un-methylated CpGs are equally distributed between GGF-CGI and their shores. Conversely, the GGF-CGI-derived more stringent sets, such as those produced applying the Takai and Jones criteria  or with an epigenome prediction pipeline , are the ones containing the smallest fractions of TS-DMS in their CGIs or shores, Figure 4b. Thus the main characteristic in a set correlating with the proportion of TS-DMS is their number of predicted CpG rich loci, suggesting that the effort done to improve the prediction of epigenetically regulatory sequences in a genome failed to capture the DNA features that guide tissue-specific epigenetic mechanisms.
The Gene Expression Barcode browser contains a list of genes whose levels of expression, in 89 murine tissues, were represented as binary calls (expressed or not-expressed) . These lists were made from 9,652 publicly available gene expression results obtained with the Mouse Genome 430 Array 2.0. Genes detected above threshold in more than 95% of samples of a specific tissue are considered expressed in that tissue. We considered a gene to be expressed in a tissue-specific manner when it is expressed in only one of the four organs included in our study. In addition we created a list including all the genes with at least one TS-DMS in the region extending from -3 Kb to +3 Kb of the edges of the gene. We found that genes associated with TS-DMS are more likely to be expressed in a tissue specific manner. The strength of this association varies across the different tissues, Table 2.
A map of cis-regulatory elements for the mouse genome has recently been published . The map includes 53,834 putative promoters and 234,764 potential enhancers, many of which are tissue specific. These enhancers were defined according to the presence of ChIP-Seq peaks for histone H3 lysine 4 mono-methylation marks (H3K4me1) outside promoters. We obtained the coordinates for 24,128 tissue-specific enhancers that are divided between liver, kidney and testis (http://chromosome.sdsc.edu/mouse/download.html). We compared the relative number of TS-DMS and surveyed CpG sites inside and outside of these enhancer-DNA sequences. We found that the percentage of TS-DMS within enhancers is up to 30 times higher than outside enhancers Table 3 (p-value<10−5, simulation of 100,000 iterations), with enrichment varying across tissues. Next we examined the overlap of TS-DMS with active enhancers, as defined by acetylation marks (H3K27ac),  (Table 4). A comparison of the results (Table 3 with Table 4) shows that most of the TS-DMS derived from somatic tissues are localized in active enhancers, while the opposite is true for the case of the TS-DMS derived from the testis. For example, of the 792 TS-DMS marking enhancers in liver, 561 (70%) are in active enhancers.
We created two lists representing different levels of association between the TS-DMS and their corresponding genes, Data Set S4. If a TS-DMS is outside the region occupied by the gene ±3,000 bp it was assigned to the nearest TSS (weak association). If a TS-DMS is inside the region occupied by the gene ±3,000 bp it was directly assigned to that gene (strong association). Gene annotation enrichment analyses using these gene lists were mainly based on the Gene Ontology data base (GO) . To obtain the functional profiles we used DAVID Bioinformatics Resources ,  and Gene Ontology enRIchment anaLysis and visuaLizAtion tool (GORILLA) . Genes associated with TS-DMS were significantly enriched in tissue-specific expression annotation in each of the four tissues, (Table 5). Table 6 tabulates the functional enrichment results for biological processes in the context of a directed acyclic graph (DAG) structure created using the GOrilla tool, e.g. Figure 5. The table includes all the terms at terminal nodes of the DAG describing the most specific and informative biological processes that showed significant enrichment. Terms described in these sets matched distinct attributes of the physiology and biochemistry of each particular tissue, Figures 5 and Figures S3and S4 in Supporting Information S1, also Table 6. Interestingly, genes physically linked to TS-DMS are mostly associated with typical activities of the respective adult organ, whilst many genes with TS-DMS distal to their promoters are associated with embryonic development, Table 6 and Figure S4 in Supporting Information S1. Finally, we scored and ranked the genes according to the number of TS-DMS associated with them, and used the GOrilla tool to discover GO terms that are significantly enriched at the top of a ranked gene list. In the case of liver, we found very high enrichments for genes involved in lipid metabolism and homeostasis, particularly for processes including synthesis, transport and cellular response to cholesterol, Table 7.
We performed an unsupervised hierarchical clustering analysis to visualize the TS-DMS structure between and within tissue samples as a hierarchical dendrogram where the branch lengths represent the degree of epigenetic similarity between samples. Samples from the same tissues were grouped together regardless of whether DMS or TS-DMS dataset was used (Figure 6 A and B). These results suggest that the DMS identified among 6,955,111-surveyed CpGs have the potential to function as tissue-specific methylation fingerprints.
We next tested if the liver specific DMS can be used as a unique epigenetic signature of the adult mouse liver. We combined the DMS found in the four-tissue dataset with similar data obtained from livers of 16 outbred mice and also the liver from a E11.5 C57BL/6J embryo. Despite the differences in the genetic background among these samples, the livers from all 19 mice co-clustered separately from brain, kidney, and testis, Figure 6C. A sample derived from embryonic liver (E11.5) was most similar to the adult liver samples than those from other organs, but still clustered separately from adult-stage liver, Figure 6D.
Given the critical role of the liver in controlling whole-body metabolism, and the high frequency of TS-DMS in genes that control key aspects of the hepatic lipid homeostasis Figure S3 in Supporting Information S1, we analyzed the impact of prenatal and postnatal nutrient availability on liver methylation patterns. Differentially methylated sites (DMS) were isolated from mice whose mothers had been subjected to four different feeding protocols : CC, prenatal and postnatal control diet; CU, prenatal control diet and postnatal food restriction; UU, prenatal and postnatal food restriction; UC, prenatal food restriction and postnatal control diet (for details see ). Comparison of CC, CU and UU liver samples led to the identification of 5,574 DMS (mdFDR 15%), Data Set S5. Practically all these DMS were hypo-methylated in mice that were exposed to a control diet during the whole experiment (CC) but hyper-methylated in those mice exposed to any of the diets including food restriction (UC, CU or UU). Out of these 5,574 DMS induced for food restriction, 533 were previously identified as differentially methylated sites in the comparisons among the four tissues (vide supra). Remarkably genes associated with these 533 DMS were highly enriched (between 4 to 15 times) in annotations related to SMAD protein signal transduction, steroid and fatty acid metabolic process.
We examined to what extent these DMS can function as epigenetic signatures of each nutritional exposure. Unsupervised clustering of MSCC data obtained from 4 CC, 5 CU, and 5 UU samples regrouped experiments according to the treatment, Figure 7B. When data from a different treatment was included (UC) the samples clustered outside the previous defined groups, Figure 7B. We inferred that these DMS constitute epialleles that traced the nutritional history of mice during the in utero and/or immediate postnatal stages.
Based on our expanded methyl-sensitive cut counting (eMSCC) method , we screened for differential methylation in gDNA samples derived from liver, kidney, brain and testis. We detected 138,052 differentially methylated sites, of which 24,803 were uniquely methylated or un-methylated in one but not in the other tissues (Figure 3). Strikingly, 23,270 (93%) of these TS-DMS were hypomethylated, suggesting that un-methylation rather that methylation is the epigenetic state highlighting developmentally active loci.
In agreement with previous results obtained by Restriction Landmark Genomic Scanning (RLGS) or Methylated DNA Immunoprecipitation (MeDIP), we found that most TS-DMS represent differences between the testis and somatic tissues , , –. Interestingly and more often in testis, many TS-DMS co-localize in CpG islands or CpG clusters to form testis-specific un-methylated regions within gene bodies or gene deserts. The transcriptional landscape around these loci seems inconsistent with the notion that they may represent alternative promoters (Figure 1) , , . A particularly interesting result is that tissue-specific differential methylation at intragenic CpG clusters correlates with tissue-specific differential expression of zinc finger protein genes (Zfp), We found 39 Zfp genes with a differentially methyled region in the 3′ terminal exon that is invariably methylated in somatic tissues and hypomethylated in testis, (Table 1). Remarkably, most of them follow a similar inverse relationship between the level of methylation at the 3′ terminal exon and gene transcriptional activity, Figure 2. It is intriguing why their TSSs remain largely un-methylated in all tissues but their intragenic CGIs seem to be preferential sites for de novo methylation in somatic tissues during development. The lack of differential methylation at the promoter CGIs suggests a limited role for these loci in the epigenetic transcriptional regulation, however aberrant methylation marks in the promoters of a Zfp correlates with cervical cancer . Probably, the 3′ exons start the developmental program as un-methylated regions, acquiring methyl-marks as the embryo develops into adult. For instance, spatio-temporal and cell-specific methylation of these exonic-CpGs could avoid the binding of a gene repressor. Most of Zfp genes described in Table 1 belong to the C2H2 Zf family, which are enriched in KAP 1 repressor target sites. Interestingly ChiP-chip experiment detected most of the KAP1 binding sites towards the 3′ transcribed regions of these Zf-genes, . Alternatively, the exonic-DNA methylation could turn the chromatin in to a Pol II elongation-permissive state. For example it has been recently shown that H3K79, H4K20, H2BK5 and H3K36m3 mark nucleosomes wrapping exonic-DNA. The tri-methylation of H3K36 is the most prominent mark and correlates positively with increased gene expression levels and stimulated transcriptional elongation –. In agreement with the positions of TS-DMS we found in the Zfp genes, H3K36me3 is primarily found in downstream exons, . The co-occurrence of H3K36me3 and DNA hypermethylation has been recently shown in the bodies of zinc finger genes along chromosome 9, . When all genes covered by this study were considered, these authors found that the co-occurrence between H3K36me3 and hypermethylation was more frequently observed at last exons or at highly expressed genes. Overall the data suggest that differential methylation of Zfp genes at 3′ exons constitutes a developmental epigenetic signature. Some of these same genes have been already identified as playing key roles in cell fate specification, . It will be interesting to follow the dynamics of gene expression and methylation at these loci during embryonic development.
We observed intriguing differences in the methylation pattern of DNA derived from testis. Firstly, the largest numbers of TS-DMS were observed in testes. However, testis showed the weakest association of TS-DMS with active enhancers or tissue-specific gene expression (Tables 2, ,4,4, and and5).5). Secondly, testis TS-DMS were localized at loci encoding genes involved in the embryonic development of multiple organs and anatomical structures, Figure S4 in Supporting Information S1 and Table 6. Whilst we recognize that testis includes both gametes and somatic cells, our data suggest that part of the epigenetic information involved in development would be pre-coded in the sperm genome.
The presence of differentially methylated CGIs is rarer in somatic tissues, by contrast most TS-DMS are located far from promoters, in loci with scarce CpGs and in many overlapping regions reported as tissue-specific enhancers. Indeed we found that local DNA hypomethylation highlighted active enhancers (H3K4me1 and H3K27me3) in a tissue-specific manner. Indeed, it has recently been shown that transcription factors (TF) can actively induce open chromatin spots and local de-methylation of low CpG density sequences, which ultimately constitute a footprint for DNA regulatory elements , . Despite the scarcity of methylable sites in these elements, the methylation of critical CpGs was sufficient to prevent the TF-DNA interaction, .
Not only did we find active enhancers highly enriched in TS-DMS, but we also found that genes spotted with TS-DMS are expressed in a tissue-specific manner, (Tables 2 and and5)5) with their functional profiles matching specific functions of the corresponding tissues. Considering all our data we suggest that most of the sequences having TS-DMS must be cis-acting elements for tissue-specific trans-acting factors.
To assess the extent to which the TS-DMS could demarcate tissue-specific physiological aspects, we performed annotation enrichment analysis in the set of genes associated with these epigenetic marks. We found that the position of the TS-DMS in relation to genes defines two types of functional profiles. When TS-DMS are found in intergenic regions and distant from gene boundaries, there is a notable enrichment in processes related to embryonic development Figure S4 in Supporting Information S1.
In the instance where TS-DMS are within or very near to the borders of the gene, most of the biological processes describe physiological and organ-specific functions, (Table 6). A remarkable observation is that genes involved in specific liver functions such as, hepatic cholesterol homeostasis, response to nutrients, response to peptide hormones, plasma clearance of lipoproteins and metabolic control of xenobiotic, are distinguished in the liver genome by the presence of TS-DMS, (Figure 5 and Figure S3 in Supporting Information S1. Indeed, by ranking genes according to the number of TS-DMS and querying for enriched GO terms at the top of a ranked list, we found a selective increase in functional specificity towards lipid homeostasis, especially hepatic cholesterol management, Table 7.
We speculate that the high incidence of TS-DMS demarcating key genetic aspects of hepatic physiology could account for many of the findings that report associations between epigenetic alterations and metabolic disorders. Multiple studies have demonstrated that the nutritional state during sensitive developmental periods, including the intrauterine life and lactation periods, can “program” developmental trajectories to improve the ability of the fetus to survive in similar postnatal environments. Unfortunately, these adaptations may also modulate risk for metabolic disease in adult life. An attractive hypothesis is that these effects of nutritional exposures are mediated by altered epigenetic regulation of transcription. Whether methylation is the primary triggering event, or mediated by histone modification or effects of noncoding RNAs, remains unclear. However, our data suggest that differential methylation may mark key developmental loci, which are susceptible to nutritional or other environmental insults.
We identified thousands of sites whose differentially methylated status generated epialleles in the offspring of mothers subjected to different feeding protocols (Figure 7, Data Set S5). We used correlation distances to look for comparable variations in the magnitude of digestion frequencies in the sites detected as differentially methylated. Unsupervised cluster analysis shows that this set of epi-alleles characterize the nutritional history of each mouse.
In summary, in the present work we extended our previous observation that outside of CGIs and promoters there are a large number of regions with hypomethylated CpG, at low density. These regions not only outnumber the CGI but also are the loci with the greater enrichment in regulatory sequences, , . Here we show that CpGs that are part of TS-DMS are also typically found far from promoters and outside of CGIs. TS-DMS are enriched in sequences recognized as tissue-specific enhancers and associated with tissue-specific expressed genes. Genes associated with these sites fulfill roles in the development of the corresponding tissue but also in their specific physiological functions. The identification of loci with aberrant DNA methylation marks has been largely extended during the last 5 years. Our results agree with the idea that epi-mutation rather than mutations better explain the developmental origin of diseases . Most of the work on disease-related DNA methylation has focused on hypermethylation of CGI in cancer. Thus, the relevance of aberrant methylation marks outside CGI in human disease has not been well studied. Here we show that during development methylation of CpGs outside promoter or CGI seemed to be far more dynamic than commonly appreciated. We propose that the catalog of TS-DMS presented in this work, particularly those linked to the hepatic function could constitute a comprehensive epi-panel for detection and diagnosis of metabolic diseases.
This file contains Table S1 and Figure S1–Figure S4. Table S1, which is a summary of the results of the short-reads-alignments to the NCBI37/mm9 assembly mouse reference genome; Figure S1, showing the influence of random sampling and systematic bias are canceling out during the pairwise site-by-site comparisons; Figure S2, Validating of the assumption of normality for the distribution of digestion frequencies (methyl sensitive cut counts) in CpGs from both lambda and mouse replicates; Figure S3, detailing the biological regulation branch showed in Figure 5 and Figure S4, showing the functional enrichment analysis results for genes proximal to TS-DMS located at intergenic regions. A description of the column headers in Data Sets S1 to S5 is provided in tables at the end of the document.
Methyl Sensitive Cut Counting Results for CpG sites surveyed in the mouse genome
Methyl Sensitive Cut Counting Results for CpG sites surveyed in the lambda phage genome
138,052 differentially methylated sites (DMS), their genomic coordinates and their closest
List of genes used for Gene Ontology enrichment analysis
List of 5,574 diet-reprogrammed differentially methylated CpGs
We thank Horacio Colaneri for providing expertise in graphic design and help with the production of different data sets published in this article. To Dagoberto Grenet and Yitang Yan for providing experimental support.