|Home | About | Journals | Submit | Contact Us | Français|
DNA cytosine methylation is a central epigenetic modification that plays essential roles in cellular processes including genome regulation, development and disease. Here we present the first genome-wide, single-base resolution maps of methylated cytosines in a mammalian genome, from both human embryonic stem cells and fetal fibroblasts, along with comparative analysis of mRNA and small RNA components of the transcriptome, several histone modifications, and sites of DNA-protein interaction for several key regulatory factors. Widespread differences were identified in the composition and patterning of cytosine methylation between the two genomes. Nearly one-quarter of all methylation identified in embryonic stem cells was in a non-CG context, suggesting that they may utilize different methylation mechanisms to affect gene regulation. Methylation in non-CG contexts showed enrichment in gene bodies and depletion in protein binding sites and enhancers. Non-CG methylation disappeared upon induced differentiation of the embryonic stem cells, and was restored in induced pluripotent stem cells. We identified hundreds of differentially methylated regions proximal to genes involved in pluripotency and differentiation, and widespread reduced methylation levels in fibroblasts associated with lower transcriptional activity. These reference epigenomes provide a foundation for future studies exploring this key epigenetic modification in human disease and development.
Thirty-four years have passed since Riggs, Holliday and Pugh proposed that cytosine DNA methylation in eukaryotes could act as a stably inherited modification affecting gene regulation and cellular differentiation1,2. Since then, intense research effort has expanded our understanding of diverse aspects of DNA methylation in higher eukaryotic organisms. These include elucidation of molecular pathways required for establishing and maintaining DNA methylation, cell-type specific variation in methylation patterns, and the involvement of methylation in multifarious cellular processes such as gene regulation, DNA-protein interactions, cellular differentiation, suppression of transposable elements, embryogenesis, X-inactivation, genomic imprinting and tumorigenesis3–9. DNA methylation, together with covalent modification of histones, is thought to alter chromatin density and accessibility of the DNA to cellular machinery, thereby modulating the transcriptional potential of the underlying DNA sequence10.
Genome-wide studies of mammalian DNA methylation have previously been conducted, however they have been limited by low resolution11, sequence-specific bias, or complexity reduction approaches that analyze only a very small fraction of the genome12–14. In order to improve our understanding of the genome-wide patterns of DNA methylation we have generated single-base resolution DNA methylation maps throughout the majority of the human genome in both embryonic stem cells and fibroblasts. Furthermore, we have profiled several important histone modifications, protein-DNA interaction sites of regulatory factors, and the mRNA and small RNA components of the transcriptome to better understand how changes in DNA methylation patterns and histone modifications may affect readout of the proximal genetic information.
Single-base DNA methylomes of the flowering plant Arabidopsis thaliana were previously achieved using MethylC-Seq15 or BS-Seq16. In this method, genomic DNA is treated with sodium bisulfite (BS) to convert cytosine, but not methylcytosine, to uracil, and subsequent high-throughput sequencing. We performed MethylC-Seq for two human cell lines, H1 human embryonic stem cells17 and IMR90 fetal lung fibroblasts18, generating 1.16 and 1.18 billion reads, respectively, that aligned uniquely to the human reference sequence (NCBI build 36/HG18). The total sequence yield was 87.5 and 91.0 gigabases (Gb), with an average read depth of 14.2× and 14.8× per strand for H1 and IMR90, respectively (Supplementary Fig. 1a). In each cell type, over 86% of both strands of the 3.08 Gb human reference sequence are covered by at least one sequence read (Supplementary Fig. 1b), accounting for 94% of the cytosines in the genome.
We detected approximately 62 million and 45 million methylcytosines in H1 and IMR90 cells, respectively (1% FDR, see Supplementary Materials, Fig. 1a), comprising 5.83% and 4.25% of the cytosines with sequence coverage. Full browsing of the entire dataset at single base resolution can be performed at http://neomorph.salk.edu/human_methylome using the AnnoJ browser (www.annoj.org). Of the methylcytosines detected in the IMR90 genome, 99.98% were in the CG context, and the total number of mCG sites was very similar in both cell types. In the H1 stem cells we detected abundant DNA methylation in non-CG contexts (mCHG and mCHH, where H = A, C or T), comprising almost 25% of all cytosines at which DNA methylation is identified, and accounting for most of the difference in total methylcytosine number between the cell types (Fig. 1a). The prevailing assumption is that mammalian DNA methylation is located almost exclusively in the CG context, however a handful of studies have previously detected non-CG methylation in human cells, and in particular in embryonic stem cells19,20. Bisulfite-PCR, cloning and sequencing of selected loci displaying H1 non-CG methylation in several human cell lines revealed that a second embryonic stem cell line, H917, displayed mCHG and mCHH at conserved positions, confirming that non-CG methylation is likely a general feature of human ES cells (Fig. 2, Supplementary Table 2). In addition, like IMR90 cells, BMP4-induced H1 cells lost non-CG methylation at several loci examined while methylation in the CG context was maintained, indicating that the pervasive non-CG methylation is lost upon differentiation. Furthermore, analysis of these loci in IMR90 induced pluripotent stem (iPS) cells revealed restored non-CG methylation (Fig. 2). Overall this demonstrates that the CHG and CHH methylation identified in H1 and absent in IMR90 are not simply due to genetic differences between the two cell types, but rather the presence of non-CG methylation is characteristic of an embryonic stem cell state. For each cell type, two biological replicates were performed with cells of different passage number (see Supplementary Materials), and comparison of the methylcytosines identified independently in each replicate revealed a high concordance of cytosine methylation status between replicates (Supplementary Fig. 2). For each cell type, the final DNA methylation map presented in this study represents the composite of the two biological replicates. The OCT4 gene exemplifies both cell-specific differential methylation and the presence of non-CG methylation (Fig. 1b), and in addition displayed a ~50-fold reduction in OCT4 transcript in IMR90 (data not shown). The absence of mCHG and mCHH methylation in IMR90 coincided with significantly lower transcript abundance of the de novo DNA methyltransferases (DNMTs) DNMT3A and DNMT3B and the associated DNMT3L in IMR90 cells (Supplementary Fig. 3), which is supported by a previous study of DNA methylation in ES cells and somatic cells19 and by the determined target sequence specificity of these DNMTs21,22.
Multiple reads covering each methylcytosine can be used as a readout of the fraction of the sequences within the sample that are methylated at that site16, here referred to as the methylation level of a specific cytosine. Similar to the Arabidopsis genome15, in the H1 genome we observed that 77% of mCG sites were 80–100% methylated, whereas 85% of mCHG and mCHH sites were 10–40% methylated (Fig. 1c), indicating that at sites of non-CG methylation only a fraction of the surveyed genomes in the sample were methylated. Notably, 56% of mCG sites in IMR90 were highly methylated (80–100%, Fig. 1c), indicating that although the total number of mCG sites in H1 and IMR90 is similar, in general the IMR90 mCG sites were typically less frequently methylated. In support of this, considering all CG site sequencing events, 82.7% and 67.7% were methylated in H1 and IMR90, respectively. A global-scale view of DNA methylation levels revealed that the density of DNA methylation showed large variations throughout each chromosome (Fig. 1d). Sub-telomeric regions of the chromosomes frequently showed higher DNA methylation density (Fig. 1d, Supplementary Fig. 4), which was previously reported as important for control of telomere length and recombination23,24. The smoothed profile of DNA methylation density in 100 kb windows indicated that on the chromosomal level the density profile of mCG in H1 and IMR90 was similar. The density profiles of mCHG and mCHH revealed that non-CG methylation was present throughout the entire chromosome. These two non-CG methylation marks showed a moderate correlation and did not always occur together (Pearson correlation 0.5 in 1 kb windows; Supplementary Fig. 2d). Notably, changes in density of the non-CG methylation were distinct from that of mCG in a number of regions.
To characterize the abundant non-CG methylation in the H1 genome, we compared the average density of methylation relative to the underlying density of all potential sites of methylation in each context (henceforth referred to as the relative methylation density), throughout various genomic features (Fig. 3a, Supplementary Fig. 5). We observed a correlation in the density of mCG and the distance from the transcriptional start site (TSS), with mCG density increasing in the 5’ UTR to a similar level in exons, introns and the 3’ UTR as to 2 kb upstream of the TSS (Fig. 3a). We generally observed lower relative densities of methylation at CG islands and TSS, however a subset of these regions did not display this depletion (Supplementary Fig. 6)13,14,25. mCHG and mCHH methylation densities also decreased significantly toward the TSS and returned to the same level as 2 kb upstream at the end of the 5’ UTR, however within exons, introns and 3’ UTRs the non-CG methylation densities were twice as high. Intriguingly, the mCHH density was approximately 15–20% higher in exons than within introns and the 3’ UTR. To identify links between gene activity and non-CG methylation level within the gene body we performed strand-specific RNA-Seq15 and observed a positive correlation between gene expression and mCHG (R = 0.60) or mCHH (R = 0.58) density (Fig. 3b), with highly expressed genes containing 3-fold higher non-CG methylation density than non-expressed genes (Supplementary Fig. 7a). However, no correlation was observed between CG methylation density and gene expression in the H1 cells (Fig. 3b).
We identified 447 and 226 genes that were proximal to genomic regions highly enriched for mCHG and mCHH, respectively, with 180 genes in common. An example of non-CG methylation enrichment in such a gene, Splicing Factor 1, is shown in Fig. 3c. Analysis of gene ontology terms for each set revealed significant enrichment for genes involved in RNA processing, RNA splicing, and RNA metabolic processes (P 2 × 10−11, Supplementary Fig. 7b). Unexpectedly, we found a significant enrichment of non-CG methylation on the anti-sense strand of gene bodies, for both mCHG and mCHH enriched sets of genes (P < 0.1 and P < 0.001, respectively, Fig. 3d). The anti-sense strand serves as the template for RNA polymerization, and further investigation will be required to determine if there are functional repercussions of this non-CG methylation strand bias. We also observed that genes in H1 had significantly more RNA originating from introns than in IMR90, relative to the total number of sequenced reads in each sample, and this discrepancy in intronic read abundance was significantly enhanced in the mCHG and mCHH enriched genes (P < 0.001, Fig. 3e). The higher abundance of intronic reads was associated with higher non-CG methylation within gene bodies, rather than differential non-CG methylation of exons versus introns.
In the Arabidopsis genome, the methylation state of a cytosine in the CG and CHG contexts is highly correlated with the methylation of the cytosine on the opposite strand within the symmetrical site15,16. While we observed that 99% of mCG sites from the human cell lines were methylated on both strands, surprisingly mCHG was highly asymmetrical, with 98% of mCHG sites being methylated on only one strand. This raises an interesting question as to how these sites of DNA methylation are consistently methylated in a considerable fraction of the genomes without two hemi-methylated CHG sites as templates for faithful propagation of the methylation state (Fig. 1c). It is not yet known whether continual, but indiscriminate, de novo methyltransferase activity preferentially methylates particular CHG sites after replication, or if a persistent targeting signal is present that drives CHG methylation.
We analyzed the genome sequence proximal to sites of non-CG methylation to determine whether enrichment of particular local sequences were evident, as previously reported in the Arabidopsis DNA methylomes15,16. Whereas no local sequence enrichment was observed for mCG sites, a preference for the TA dinucleotide upstream of non-CG methylation was observed (Fig. 3f and Supplementary Fig. 8). Furthermore, the base following a non-CG methylcytosine was most commonly an A, with a T also observed relatively frequently, a sequence preference observed in previous in vitro studies of the mammalian DNMT3 methyltransferases21,22.
To determine whether there was any preference for the distance between adjacent sites of DNA methylation in the human genome, we analyzed the relative distance between methylcytosines in each context within 50 nucleotides in introns. We focused on introns because these are genomic regions enriched in non-CG methylation, but unlike exons, are not constrained by protein coding selective pressures (Fig. 3g,h). Analyses for random genomic sequences and exons are presented in Supplementary Fig. 9, together with mCG spacing patterns. For methylcytosines in all contexts, a periodicity of 8–10 bases was evident (Fig. 3g, h and Supplementary Fig. 9), but interestingly a strong tendency was observed for two pairs of 8-base separated mCHG sites spaced with 13 bases between them. An 8–10 base periodicity was also evident for mCHH sites, corresponding to a single turn of the DNA helix, as previously observed in the Arabidopsis genome16, indicating that the molecular mechanisms governing de novo methylation at CHH sites may be common between the plant and animal kingdoms. A structural study of the mammalian de novo methyltransferase DNMT3A and its partner protein DNMT3L found that 2 copies of each form a heterotetramer that contains two active sites separated by the length of 8–10 nucleotides in a DNA helix26,27. The consistent 8–10 nucleotide spacing we observed in the human genome suggests that DNMT3A may be responsible for catalysing the methylation at non-CG sites. Notably, the mCHG and mCHH relative spacing patterns were distinct, suggesting that this sub-categorization of the non-CG methylation is appropriate, and that distinct pathways may be responsible for the deposition of mCHG and mCHH in the human genome.
Numerous past studies have documented that DNA methylation can alter the ability of some DNA binding proteins to interact with their target sequences28–32. In order to further investigate this relationship we used ChIP-Seq33 to identify sites of protein-DNA interaction in H1 cells for a set of proteins important for gene expression in the pluripotent state, namely NANOG, SOX2, KLF4, and OCT4, as well as proteins involved in the transcription initiation complex and in enhancers (TAF1 and p300, respectively) (Supplementary Tables 3–8). In general we observed a decrease in the profile of relative methylation density toward the site of interaction, particularly in the non-CG context, independently from proximity to the TSS (Fig. 4a and Supplementary Fig. 10). The IMR90 genome showed higher average density of methylation at H1 SOX2 and p300 interaction sites, but had similar CG methylation densities for the H1 NANOG and OCT4 interaction sites, even though the genes encoding these proteins are transcribed at a very low level in IMR90 relative to H1 (47 – 50 fold less mRNA), and are not considered to be functional in fibroblasts. This suggests that these genomic regions are generally maintained in a less methylated state in multiple cell types regardless of the occupancy of these specific DNA binding proteins.
We next analyzed the patterns of DNA methylation in sets of enhancers either unique to each cell type or shared. ChIP-Seq was utilized to detect the location of enhancers throughout the H1 and IMR90 genomes, defined as regions of simultaneous enrichment of the histone modifications H3K4me1 and H3K27ac34 (Fig. 4b). We examined the average relative DNA methylation density at enhancer sites, as well as the flanking genomic regions, and found a depletion of CG methylation at IMR90-specific enhancers, yet enrichment in mCG density in H1 at the same genomic locations (Fig. 4b). In contrast, at H1-specific enhancers there was no change in mCG density in either the H1 or IMR90 genome, but non-CG methylation density decreased approximately 3-fold at the enhancer sites, relative to the density 5 kb up- and downstream. This is in agreement with the depletion of non-CG methylation in the H1 genome at predicted sites of p300 interaction (Fig. 4a), a strong indicator of enhancer activity34. The set of enhancer sites present in both H1 and IMR90 cells showed both of these cell-specific patterns: lower mCG density in IMR90 and lower non-CG methylation density in H1. The specific depletion of DNA methylation at active enhancers in each cell type (also recently reported on a limited basis35) indicates maintenance of these elements in an unmethylated state, potentially preventing interference in the process of protein-DNA interaction at these sites. Notably, H1 cells had depleted non-CG methylation but not mCG, in contrast to the mCG depletion at IMR90 enhancers. These data could indicate cell-type specific utilization of different categories of DNA methylation, possibly coupled with novel stem-cell specific factors that are able to recognize non-CG methylation, akin to the specific binding of the H3K9 histone methyltransferase KRYPTONITE to non-CG methylation sites in Arabidopsis 36.
The paradigm of DNA methylation controlling aspects of cellular differentiation necessitates that patterns of methylation vary in different cell types. Numerous studies have previously documented differences in DNA methylation between cell types and disease states7,8,10,37. With comprehensive maps of DNA methylation throughout the genomes of the two distinct cell types, we next characterized changes in DNA methylation evident between the H1 and IMR90 DNA methylomes, and explored how these changes may relate to the distinctiveness of these cells.
Pairwise comparison of methylation at the same genomic coordinates between H1 and IMR90 is required to reveal cell-specific methylation patterns potentially masked by average profiles. The Pearson correlation coefficient of the mCG methylation state between H1 and IMR90 was calculated for 20 equally sized windows flanking or within various genomic features (Fig. 5a and Supplementary Fig 11), providing a measure of methylation state conservation at these genomic features between the two cell types, and distinct from the average methylation density profiles presented above (Fig. 4). At the sites of protein-DNA interaction surveyed in Fig. 4a, we observed a decrease in the correlation of methylation compared to the flanking 1.5 kb of the genome (Fig. 5a), except for KLF4 (data not shown). This decrease was most pronounced at the predicted site of protein-DNA interaction, indicating that even though the mCG depletion was a general feature of the surveyed protein binding sites (Fig. 4a), when a pairwise comparison of the methylation status at each cytosine associated with the protein binding site between H1 and IMR90 was performed a significant decrease in the conservation of methylation was observed (Fig. 5a).
Surprisingly, we found that a large proportion of the IMR90 genome displayed lower levels of CG methylation than H1 (Fig. 1c). Contiguous regions with an average methylation level less than 70% were identified (mean length = 153 kb), which we termed partially methylated domains (PMDs) (Fig. 5b, Supplementary Fig. 12 and Supplementary Table 9). The PMDs comprised a large proportion of every autosome (average = 38.4%), and 80% of the IMR90 × chromosome (Supplementary Fig. 12), consistent with the lower levels of DNA methylation reported in the inactive × chromosome38. As IMR90 cells are derived from a female (XX), it is anticipated that simultaneous sequencing of BS-converted genomic DNA from both the inactive and the active × chromosomes will manifest as PMDs throughout the majority of the × chromosome. However, the widespread prevalence of PMDs on the autosomes was unexpected. We analyzed the ratio of methylated to unmethylated CG sites within individual MethylC-Seq reads. The IMR90 reads located within PMDs were more frequently partially- or un-methylated compared to all IMR90 reads aligned to the autosomes (Supplementary Fig. 12b). The decrease in PMD methylation manifested similarly in IMR90 autosomes and chromosome ×, however currently we cannot determine whether common pathways are responsible for altering methylation patterns in all chromosomes.
Upon inspection of 5,644 genes with a TSS located in or within 10 kb of a PMD, we found a strong enrichment for these genes to be less expressed in IMR90 (P = 2 × 10−47, Fisher’s Exact Test, FET). Specifically, of all of the genes that were more highly expressed in H1 (H1 ≥ 3 × IMR90 transcript abundance), 42% were located within PMDs, compared to only 13% of all more highly expressed genes in IMR90 cells being located in PMDs (Fig. 5b,c, Supplementary Tables 10,11). Many of the partially methylated and down-regulated genes in IMR90 displayed lower proximal H3K4me3 and H3K36me3 modifications, and higher proximal H3K27me3 levels (Fig. 5b, Supplementary Fig. 13, and Hawkins et al., submitted). While in IMR90 cells we observed a positive correlation between the mean gene body mCG methylation level and gene expression, no such relationship was discernible in H1 cells (Fig. 5d). Consequently, the positive correlation between gene expression and gene body methylation recently reported12 could be re-interpreted as a depletion of methylation in repressed genes in differentiated cells.
A sliding window approach was used to identify differentially methylated regions (DMRs) enriched for cytosines where IMR90 was more highly methylated than H1 (5% FDR, FET, Supplementary Fig. 14). We identified 491 DMRs, and in a window spanning 20 kb up- and down-stream of each DMR we surveyed mCG density, mRNAs, smRNAs, H3K4me3, H3K36me3, H3K27me3, genes, and repetitive elements (Fig. 6, Supplementary Table 12, and Hawkins et al., submitted). The DMRs were associated with 139 and 113 genes more highly expressed in H1 and IMR90, respectively. More than half of these genes were associated with DMRs located within 2 kb upstream of the TSS or the 5’ UTR, which include factors previously defined as playing a role in embryonic stem cell function39 (Supplementary Fig. 15 and Supplementary Tables 13,14).
Complete linkage hierarchical clustering of these data revealed two broad categories of transcriptional activity, histone modifications and DNA methylation proximal to the DMRs (Fig. 6). Group 1 DMRs are associated with high proximal H3K4me3, H3K36me3, and transcriptional activity relative to IMR90, and are unmarked by H3K27me3 in both cell types. Although we did not observe widespread association of small RNA molecules with enrichment of DNA methylation, we found that a subset of Group 1 DMRs co-localize with dense clusters of small RNAs that map to annotated Human Endogenous Retroviruses (HERVs)40. Notably, the HERVs were less densely methylated in H1 and frequently associated with high downstream transcriptional activity, in contrast to the more methylated state in IMR90 that was not associated with abundant small RNAs and showed little proximal transcription (Fig. 6 and Supplementary Fig. 16). Accurate targeting of DNA methylation by small RNAs is a well-established process in plants41. While our data did not provide evidence for the existence of an analogous process in the human cells, further experiments may be required to investigate this relationship in greater detail, such as DNA methylation profiling following silencing of components of the RNAi machinery.
Group 2 DMRs were associated with gene-rich sequences that were more highly expressed in IMR90 cells and generally displaed a depletion of LINEs in the flanking sequence, with concomitant H3K27me3 modification and less DNA methylation, as observed in many IMR90 PMDs. Furthermore, Group 2 regions in H1 frequently displayed both H3K4me3 and H3K27me3 modifications, characteristic of the bivalent state that is thought to instil a suppressed but poised transcriptional status42,43. Many of these regions showed markedly less H3K27me3 in IMR90 in addition to more DNA methylation, suggesting that prior repression may have been relieved, and defining a set of genes potentially regulated by DNA methylation and involved in the developmental transition from a pluripotent to differentiated state.
We found extensive differences between the DNA methylomes of two human cell types, revealing the highly dynamic nature of this epigenetic modification. The genomic context of the DNA methylation is resolved, here revealing abundant methylation in the non-CG context, which is typically overlooked in alternative methodologies. Profiling of enhancers and different patterning of CG and non-CG methylation in gene bodies and their different correlation with gene expression suggest possible alternative roles for DNA methylation in these two contexts. The exclusivity of non-CG methylation in stem cells, likely maintained by continual de novo methyltransferase activity and not observed in differentiated cells, suggests that it may play a key role in the origin and maintenance of this pluripotent state. Essential future studies will need to explore the prevalence of non-CG methylation in diverse cell types, including variation throughout differentiation and its potential reestablishment in induced pluripotent states.
Human H1, H9, BMP4-induced H1, and IMR90 cells were cultured as described previously34,44,45. smRNA-seq libraries were generated from 10 – 50 nt small RNAs using the Small RNA Sample Prep v1.5 kit (Illumina, San Diego, CA), as per manufacturer’s instructions. Strand-specific mRNA-seq libraries were produced using a modification of a protocol described previously15. Unique 5’ and 3’ RNA oligonucleotides were sequentially ligated to the ends of fragments of RNA isolated by depletion of rRNA from total RNA samples. MethylC-seq libraries were generated by ligation of methylated sequencing adapters to fragmented genomic DNA followed by gel purification, sodium bisulfite conversion and 4 cycles of PCR amplification. ChIP-Seq libraries were prepared following Illumina protocols with minor modifications (See Supplementary Materials). Sequencing was performed using the Illumina Genome Analyzer II as per the manufacturer’s instructions.
MethylC-seq sequencing data was processed using the Illumina analysis pipeline and FastQ format reads were aligned to the human reference genome (hg18) using the Bowtie alignment algorithm46. The base calls per reference position on each strand were used to identify methylated cytosines at 1% FDR. mRNA-seq reads were aligned to the human reference and splice junctions of UCSC Known Genes using the ELAND algorithm (Illumina, San Diego, CA). smRNA-seq reads that contained a subset of the 3’ adapter sequence were selected and this adapter sequence removed, retaining trimmed reads that were from 16 to 37 nt in length. These processed reads were aligned to the human reference genome (NCBI build 36/HG18) using the Bowtie alignment algorithm, and any read that aligned with no mismatches and to no more than 1000 locations in the reference genome was retained. Base calling, and mapping of Chip-Seq reads was performed using the Illumina pipeline.
We thank A. Elwell and A. Hernandez for assistance with sequence library preparation and Illumina sequencing. R.L. is supported by a Human Frontier Science Program Long-term Fellowship. R.D.H. is supported by an American Cancer Society Postdoctoral Fellowship. This work was supported by grants from the following: Mary K. Chapman Foundation, The National Institutes of Health (U01 ES017166 and U01 1U01ES017166-01), the California Institute for Regenerative Medicine (RS1-00292-1), the Australian Research Council Centre of Excellence Program (CE0561495, DP0771156) and Morgridge Institute for Research, Madison, Wisconsin. We thank the NIH Roadmap Reference Epigenome Consortium (http://nihroadmap.nih.gov/epigenomics/referenceepigenomeconsortium.asp) and C. Gunter (Hudson-Alpha Institute) for assistance.
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
Author Contributions Experiments were designed by J.R.E., B.R., R.L., J.A.T. and R.D.H. Cells were grown by J.A.-B. and Q.-M.N. MethylC-Seq, RNA-Seq and smRNA-Seq experiments were conducted by R.L. and J.R.N. ChIP-Seq experiments were conducted by R.D.H., L.L. and Z.Y. ChIP-seq data analysis was performed by G.H., R.D.H. and L.E. BS-PCR validation was performed by R.H.D. Sequencing data processing was performed by R.L., J.T.-F., L.E., V.R. and G.H. Bioinformatic and statistical analyses were conducted by M.P., R.L., G.H., J.T.-F., R.H.D., R.S. and A.H.M. AnnoJ development was performed by J.T.F and A.H.M. The manuscript was prepared by R.L., M.P., R.H.D., A.H.M. and J.R.E.
Author Information Sequence data is available under the GEO accessions GSM429321-23, GSM432685-92, GSM438361-64, GSE17917 and GSE16256, and the SRA accessions SRX006782-89, SRX006239-41, SRX007165.1-68.1 and SRP000941. Analysed data sets can be obtained from http://neomorph.salk.edu/human_methylome. Reprints and permissions information is available at www.nature.com/reprints.