CpG islands (CGIs) are prominent in the mammalian genome owing to their GC-rich base composition and high density of CpG dinucleotides1,2. Most human gene promoters are embedded within CGIs that lack DNA methylation and coincide with sites of histone H3 lysine 4 trimethylation (H3K4me3), irrespective of transcriptional activity3,4. In spite of these intriguing correlations, the functional significance of non-methylated CGI sequences with respect to chromatin structure and transcription is unknown. By performing a search for proteins that are common to all CGIs, here we show high enrichment for Cfp1, which selectively binds to non-methylated CpGs in vitro5,6. Chromatin immunoprecipitation of a mono-allelically methylated CGI confirmed that Cfp1 specifically associates with non-methylated CpG sites in vivo. High throughput sequencing of Cfp1-bound chromatin identified a notable concordance with non-methylated CGIs and sites of H3K4me3 in the mouse brain. Levels of H3K4me3 at CGIs were markedly reduced in Cfp1-depleted cells, consistent with the finding that Cfp1 associates with the H3K4 methyltransferase Setd1 (refs 7, 8). To test whether non-methylated CpG-dense sequences are sufficient to establish domains of H3K4me3, we analysed artificial CpG clusters that were integrated into the mouse genome. Despite the absence of promoters, the insertions recruited Cfp1 and created new peaks of H3K4me3. The data indicate that a primary function of non-methylated CGIs is to genetically influence the local chromatin modification state by interaction with Cfp1 and perhaps other CpG-binding proteins.
To characterize the chromatin modifications typical of CGIs, we used the methyl-CpG-sensitive restriction endonuclease HinPI (cleavage site GCGC) to release small chromatin fragments from purified brain nuclei, as described previously9. As sites for this enzyme in bulk chromatin are rare and generally uncleavable owing to DNA methylation, the released fraction predominantly contains non-methylated CGIs. Confirming this, further digestion of the deproteinized DNA with HpaII (cleavage site CCGG) specifically collapsed the nucleosomal ladder generated by HinPI, but had little effect on DNA released from bulk chromatin with MseI (cleavage site TTAA; Supplementary Fig. 1)9. Western blotting confirmed that non-methylated CGI chromatin is enriched for histone modifications associated with actively transcribed genes (acetylated histone H3, H3K4me3 and H3K4me2) compared with bulk chromatin (Fig. 1a). In contrast, CGI chromatin was depleted for marks not found at active promoters: H3K36me3, H3K9me3, H3K27me3 and H4K20me3 (Fig. 1a). Agreement between these results and genome-wide studies of chromatin modifications3,4,10,11 indicated that this fraction could be used to identify proteins that preferentially localize to non-methylated CGIs. We first tested CXXC finger protein 1 (Cfp1), which binds to non-methylated CpG dinucleotides in vitro by a CXXC zinc finger domain6,12. The data showed that Cfp1 is enriched within the CGI fraction of the genome (Fig. 1a). Similarly, Kdm2a, an H3K36 demethylase that also contains a CXXC domain13, was enriched in the CGI fraction.
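The selectivity of this fractionation can be illustrated with a minimal Python sketch. It assumes a simplified model in which a methyl-sensitive enzyme cuts a site only when every CpG cytosine inside that site is unmethylated. The toy sequence, the methylated position and the function names are hypothetical; only the recognition sites (GCGC, CCGG, TTAA) come from the text.

```python
def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # step by 1 so overlapping sites are found
    return positions


def cleavable_sites(seq, site, methylated_c):
    """Return the sites that a methyl-sensitive enzyme could cut.

    `methylated_c` is a set of 0-based positions of methylated cytosines on the
    top strand (assumption: methylation is symmetric, so one strand suffices).
    A site is treated as cleavable only if none of its CpG cytosines is methylated.
    """
    # Offsets of cytosines inside the recognition site that are followed by G.
    cpg_offsets = [i for i in range(len(site) - 1) if site[i:i + 2] == "CG"]
    result = []
    for start in find_sites(seq, site):
        if not any(start + off in methylated_c for off in cpg_offsets):
            result.append(start)
    return result


# Hypothetical toy sequence: two GCGC sites, two TTAA sites.
toy_seq = "TTAAGCGCATGCGCTTAA"
toy_methylated = {11}  # the CpG cytosine of the second GCGC site is methylated

hinpi_cuts = cleavable_sites(toy_seq, "GCGC", toy_methylated)
msei_cuts = cleavable_sites(toy_seq, "TTAA", toy_methylated)
```

In this toy case only the unmethylated GCGC site remains cleavable, whereas both TTAA sites are cut because the site contains no CpG. The same reasoning explains why a GCGC-specific, methyl-sensitive digest releases mainly non-methylated, CpG-dense chromatin.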
Focusing on Cfp1, we tested its in vivo binding specificity by chromatin immunoprecipitation (ChIP) at an endogenous CGI that is present in both methylated and non-methylated states. The Xist CGI is mono-allelically methylated in female cells, but fully methylated in males, which only have one X chromosome14. ChIP analysis of mouse brain tissue identified a peak of Cfp1 binding over the Xist CGI in females, but no peak was present in males, suggesting that Cfp1 exclusively binds to the non-methylated allele (Fig. 1b). To test this more stringently, we used bisulphite sequencing across the Xist locus to determine the methylation status of the immunoprecipitated chromatin recovered from females. As expected, input DNA comprised equal numbers of methylated and non-methylated DNA clones. DNA immunoprecipitated by the Cfp1 antibody was almost exclusively non-methylated (96%), however, whereas DNA immunoprecipitated with an antibody against the methyl-CpG-binding protein MeCP2 (refs 15–17) was predominantly methylated (88%; Fig. 1c). We conclude that Cfp1 selectively binds to non-methylated CpGs in vivo.
To test whether Cfp1 is concentrated at non-methylated CpGs within CGIs, we analysed the genome-wide distribution of Cfp1 using high-throughput DNA sequencing of immunoprecipitated DNA (ChIP-Seq). Prominent peaks of Cfp1 binding co-localized with non-methylated CGIs (Fig. 2a), 81% of which were Cfp1-associated. Cfp1 has been identified as part of the Setd1 H3K4 methyltransferase complex8 and ChIP-Seq with H3K4me3 antibodies showed that 93% of Cfp1-bound CGIs also possess this histone modification (Fig. 2b and Supplementary Table 1). Consistent with the possibility that Cfp1 binding is responsible for recruiting the Setd1 complex to these sites, Cfp1-negative non-methylated CGIs (19% of the total) also lack H3K4me3 (Fig. 2b). Despite being rich in non-methylated CpGs, these CGIs are somehow refractory to Cfp1 binding. One potential explanation came from alignment with the published18 distribution of the polycomb-associated mark H3K27me3 (ref. 19) in mouse brain. More than half (58%) of Cfp1-negative and H3K4me3-negative CGIs contained the H3K27 modification (Fig. 2a, b and Supplementary Fig. 2). In these cases H3K27me3 and polycomb binding may render a CpG island refractory to Cfp1 binding and to H3K4 methylation.
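The percentages above come from intersecting peak calls with CGI coordinates. The following Python sketch shows the general form of such an overlap calculation. It is not the in-house pipeline used for the paper (whose parameters are given in Supplementary Table 3); the coordinates, the one-base overlap criterion and the function names are all hypothetical.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(features, peaks):
    """Return (fraction, hits): the share of `features` overlapped by any peak."""
    if not features:
        return 0.0, []
    hits = [f for f in features if any(overlaps(f, p) for p in peaks)]
    return len(hits) / len(features), hits


# Hypothetical toy coordinates, for illustration only.
cgis = [
    ("chr1", 100, 600),
    ("chr1", 5000, 5800),
    ("chr2", 200, 900),
    ("chr2", 4000, 4500),
    ("chrX", 1000, 1700),
]
cfp1_peaks = [("chr1", 50, 300), ("chr1", 5500, 6000),
              ("chr2", 850, 1200), ("chrX", 900, 1100)]
h3k4me3_peaks = [("chr1", 0, 700), ("chr1", 5100, 5900), ("chr2", 100, 400)]

# Share of CGIs bound by Cfp1, then share of those that also carry H3K4me3.
frac_bound, bound_cgis = fraction_overlapping(cgis, cfp1_peaks)
frac_marked, _ = fraction_overlapping(bound_cgis, h3k4me3_peaks)
```

The nested call mirrors the way the figures are reported in the text: the second fraction is computed only over Cfp1-bound CGIs, not over all CGIs. A genome-scale analysis would replace the quadratic scan with sorted intervals or an interval tree.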
To assess the importance of Cfp1 for the recruitment of H3K4me3, we used stably expressed short hairpin RNAs (shRNAs) directed against Cfp1 to reduce its level in NIH3T3 cells. Single shRNAs reduced Cfp1 (Supplementary Fig. 3), but a combination of three gave a greater effect (Fig. 3a). Depleted cells showed altered morphology (Fig. 3b) and retarded growth (Fig. 3c). ChIP analysis revealed a loss of Cfp1 binding compared with vector-only transfected cells accompanied by a precipitous drop in levels of H3K4me3 across CGIs at the brain-derived neurotrophic factor (Bdnf), β-actin (Actb), c-Myc and Dlx5/6 genes (Fig. 3d). The same results were obtained with clones expressing each of two independent shRNA sequences, ruling out off-target effects of shRNA expression (Supplementary Fig. 3). As a further control, H3K27me3 profiles at the same loci were unaffected by depletion of Cfp1 (Fig. 3d and Supplementary Fig. 3b). The loss of H3K4me3 at six randomly selected CGI promoters in Cfp1-depleted cells argues that this modification is dependent on the presence of Cfp1.
Although Cfp1 binds non-methylated CpGs and seems to be required for H3K4 methylation at CGIs, it is possible that this reflects indirect recruitment of Setd1 by RNA polymerase II, which is present at active CGI promoters. Alignment of ChIP-Seq profiles for Cfp1, H3K4me3 and the unphosphorylated form of RNA polymerase II indeed showed co-localization of all three signals at 86% of all Cfp1-bound CGIs (Supplementary Table 1 and Supplementary Fig. 4). In a small proportion (7%) of cases, however, RNA polymerase II was undetectable, despite the presence of robust peaks of H3K4me3 and Cfp1 (Supplementary Fig. 4). This raised the possibility that RNA polymerase II may not be required and that Cfp1 binding is sufficient to direct H3K4 trimethylation. To test this hypothesis, we used embryonic stem (ES) cell lines in which artificial promoterless CpG-rich DNA sequences had been introduced into the genome at sites that normally lack H3K4me3. The DNA insert in ES line TβC44 (ref. 20) comprises a 720-base-pair (bp) enhanced green fluorescent protein (eGFP) coding sequence containing 60 CpGs21 adjacent to a 600-bp puromycin-resistance gene with 93 CpGs (Fig. 4a). The inserted sequence has the typical CpG density of a CGI, but lacks a promoter. Bisulphite analysis showed that the integrated sequence is non-methylated (Fig. 4a). In the targeted cells, prominent domains of Cfp1 and H3K4me3 coincided with the inserted CpG-rich DNA (Fig. 4b). Interestingly, the peaks of H3K4me3 and Cfp1 tracked CpG density as expected if H3K4me3 is determined by this DNA dinucleotide sequence (Fig. 4b, broken line). No peak of RNA polymerase was detected. An independent ES cell line carrying an eGFP insertion on the X chromosome22 (Fig. 4c) also created a peak of H3K4me3 and Cfp1 (Fig. 4d). In this case, bisulphite sequencing showed that approximately a quarter of the integrated sequences were hypomethylated and the remainder were densely methylated (Fig. 4e, input panel). ChIP-bisulphite analysis demonstrated that Cfp1 and H3K4me3 antibodies significantly enriched the hypomethylated sequences (Fig. 4e). We conclude that clusters of non-methylated CpG are sufficient to recruit Cfp1 and create a peak of H3K4me3 modification, even in the absence of a promoter.
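The CpG density of the TβC44 insert follows directly from the counts given above (60 CpGs in 720 bp of eGFP, 93 CpGs in 600 bp of the puromycin-resistance gene). The Python sketch below works through that arithmetic and adds a sliding-window helper of the kind that could produce a CpG-density trace. The function names, window size and step are illustrative assumptions, not the settings used for Fig. 4b.

```python
def cpg_per_100bp(n_cpg, length_bp):
    """CpG dinucleotides per 100 bp of sequence."""
    return 100.0 * n_cpg / length_bp


def count_cpg(seq):
    """Count CpG dinucleotides on the given strand."""
    return seq.upper().count("CG")


def windowed_cpg_density(seq, window=100, step=50):
    """Return (window_start, CpGs per 100 bp) for each full window along `seq`."""
    return [
        (start, cpg_per_100bp(count_cpg(seq[start:start + window]), window))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Counts and lengths taken from the text.
egfp_density = cpg_per_100bp(60, 720)            # about 8.3 CpGs per 100 bp
puro_density = cpg_per_100bp(93, 600)            # 15.5 CpGs per 100 bp
insert_density = cpg_per_100bp(60 + 93, 720 + 600)  # about 11.6 CpGs per 100 bp
```

On these numbers the puromycin-resistance gene is nearly twice as CpG-dense as the eGFP sequence, which is the kind of local variation a density trace along the insert would capture.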
The density of non-methylated CpG is ~50-fold higher in CGIs than in bulk genomic DNA, as CpG in the latter is deficient (20% of expected23) and mostly methylated (~70%). It is unclear whether this high CpG density arises as a passive consequence of events at promoters and has no functional significance, or whether it has been selected over evolutionary time because it facilitates transcription (or other DNA-related processes). Our results favour selection, as they indicate that CpG density per se can directly influence histone modification status by the recruitment of the Cfp1 protein and its associated Setd1 histone H3K4 methyltransferase complex. The ability of an exogenous promoter-less CpG-rich insertion to create de novo an H3K4me3 focus provides strong support for this notion. An attractive biological rationale for this phenomenon may be simplification of the large mammalian genome by the creation of ‘beacons’ of H3K4me3 that highlight CGI promoters within the genomic landscape1.
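The figures quoted at the start of this paragraph can be combined in a short worked calculation. The Python sketch below uses only the three approximate values from the text (20% of expected, ~70% methylated, ~50-fold). The variable and function names are hypothetical, and the final value is merely what those numbers imply when taken at face value, not an independent measurement.

```python
CPG_DEFICIT = 0.20         # bulk CpG frequency as a fraction of expected (from the text)
FRACTION_METHYLATED = 0.70  # approximate fraction of bulk CpGs that are methylated (from the text)
FOLD_ENRICHMENT = 50        # approximate CGI-to-bulk ratio quoted in the text


def bulk_unmethylated_fraction(deficit, methylated):
    """Non-methylated CpG in bulk DNA, as a fraction of the expected CpG frequency."""
    return deficit * (1.0 - methylated)


bulk_unmethylated = bulk_unmethylated_fraction(CPG_DEFICIT, FRACTION_METHYLATED)

# Density of non-methylated CpG at CGIs implied by the ~50-fold figure,
# in the same units (multiples of the bulk expected CpG frequency).
implied_cgi_density = FOLD_ENRICHMENT * bulk_unmethylated
```

Bulk DNA therefore carries non-methylated CpG at only about 6% of the frequency expected from its base composition, and the ~50-fold figure corresponds to roughly three times that bulk expectation at CGIs.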
Whether CpG clustering is sufficient to create stable non-methylated CGIs is uncertain. There is evidence that H3K4me3 is incompatible with de novo methylation as components of the DNA methyltransferase complex (Dnmt3L) are repelled by this modification24. In theory, therefore, Cfp1-bound CGIs should be intrinsically stable in the non-methylated state. Previous studies suggest, however, that transcription also has a role. Maintenance of non-methylated CGIs through the waves of de novo methylation in the early embryo depends on promoter function, as point mutations that prevent transcription factor binding without significantly reducing CpG density destroy the immunity of a CpG island to DNA methylation25,26. It follows that H3K4 methylation due to CpG clustering may not be sufficient to reliably perpetuate the non-methylated state. Indeed, more than half of cells carrying the promoter-less eGFP insertion at the Mecp2 locus had acquired dense methylation in ES cells despite the presence of a CpG cluster.
Our data suggest that chromatin modification need not arise secondarily as a result of, for example, transcriptional status, but can be determined genetically due to the sequence characteristics of the underlying DNA. In particular CpG, by virtue of its widely varying local densities and alternative modification states, has the properties of a signalling module that locally influences genome function. As shown here, DNA methylation-free CpG clusters can recruit Cfp1 and probably other CXXC domain proteins. Densely methylated CGIs, on the other hand, attract methyl-CpG-binding proteins, which in turn recruit enzymes that can reinforce repressive histone modifications17,27,28. Future studies of proteins that read and interpret CpG signals promise to shed further light on both genetic and epigenetic determinants of chromosome function.
Nuclei were prepared from brains of 4-week-old mice as previously described31. Nuclear preparations were digested with a two-fold excess of HinPI or MseI in a buffer containing 50 mM Tris-HCl, pH 8, 100 mM NaCl, 5 mM MgCl2, 0.1 mM EGTA and 1 mM β-mercaptoethanol. The released chromatin was retained in the supernatant after centrifugation at 3,800g for 5 min and the proteins were precipitated using trichloroacetic acid before western blot analysis.
Antibodies used are listed in Supplementary Table 2.
ChIP on brain tissue was performed as described17 using the antibodies listed in Supplementary Table 2. Most ChIP-qPCR profiles were replicated using independent Cfp1 antibodies. Illumina linkers were ligated in-house and sequencing was carried out on Illumina 2G Solexa sequencers, with two replicate lanes per biological sample. ChIP-Seq data were analysed using custom bioinformatic tools generated in-house (see Supplementary Table 3 for the parameters used). ChIP on formaldehyde-crosslinked NIH3T3 cells was performed as previously described32. Bisulphite sequencing was performed as described29. Real-time PCR was carried out using Quantace Sensimix Plus on a Bio-Rad iCycler according to the manufacturer’s instructions (primer sequences are available on request).
NIH3T3 cells were transfected using lipofectamine reagent (Invitrogen) with three independent pSuper vectors containing short hairpin constructs directed against Cfp1 (Oligoengine) or with vector alone. Target sequences were as follows: target 986, 5′-GAAGGUGAAGCACGUGAAG-3′; target 1250, 5′-CAGCCAACCGAAUCUAUGA-3′; and target 1920, 5′-CUUCACCAAACGAUCCAAC-3′. Stable clones were selected for puromycin resistance. A combination of the three shRNAs reduced Cfp1 more robustly and was therefore used for the data in Fig. 3. Individual shRNAs gave comparable results by western blot and ChIP (see Supplementary Fig. 3). RNA was extracted using Tri reagent (Sigma) and complementary DNA was prepared using reverse transcriptase (Promega). Expression levels were determined using real-time PCR analysis (primer sequences available on request).
ES cell line TβC44 was generated by homologous recombination as described20. A Mecp2-eGFP knock-in targeting vector was constructed by sequential cloning of 5′ (5.3 kb) and 3′ (1.9 kb) regions of Mecp2 homology into peGFP-N1 (Clontech). A PGK-Neo cassette flanked by loxP sites was added to enable selection of transfected cells. Gene targeting was carried out in the ES cell line E14 TG2a to generate an insertion into the Mecp2 gene transcription unit at the junction between the open reading frame and the 3′ untranslated region. This construct was initially designed to create a MeCP2-eGFP fusion protein after transcription and translation. Cells were grown on gelatinized dishes in the presence of recombinant human LIF in Glasgow MEM (Invitrogen) supplemented with 10% FBS (Globepharm), 1× MEM non-essential amino acids, sodium pyruvate (1 mM) and β-mercaptoethanol (50 μM; all Invitrogen). ES cells (5 × 10⁷ cells) were transfected with linearized targeting vector (250 μg DNA in 0.8 ml HEPES buffered saline) by electroporation (800 V, 3 μF, Bio-Rad Gene Pulser) and plated at 5 × 10⁶ cells per dish. Correctly targeted clones were first identified by PCR specific for homologous recombination. The integrity of the targeted locus was confirmed by Southern blot analyses. A single positive clone was transiently transfected with pCAGGS-CRE33 for the Cre-mediated deletion of the selectable marker and a recombinant clone was then used for this study.
We are grateful to D. Skalnik for the gift of a Cfp1 antibody, I. Chambers for the TβC44 ES cell line, R. Klose for discussions, E. Sheridan for testing the DNA sequencing protocol, and K. Auger and J. Parkhill for coordinating the DNA sequencing. We also thank R. Ekiert and J. Connelly for comments on the manuscript. This work was funded by a Cancer Research UK studentship to J.P.T. and by grants from the Wellcome Trust, the Medical Research Council and the European Union ‘Epigenome’ Network of Excellence.
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
Author Information: High-throughput sequencing data have been deposited in the Gene Expression Omnibus (GEO) under the accession number GSE18578. Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests.