The CXXC domain binding of CpG was originally discovered in MBD1, a protein isolated for its homology to the methylated CpG DNA-binding protein, MeCP2, but also having three zinc fingers of the CXXC type (
Cross et al., 1997). Subsequently, it was determined that while two of the CXXC domains within MBD1 bind methylated DNA, many CXXC domains are specific for unmethylated CpG sequences (
Birke et al., 2002;
Lee et al., 2001), including a third CXXC domain within MBD1 (
Jorgensen et al., 2004) and the CXXC domain of the DNA methyltransferase DNMT1 (
Pradhan et al., 2008). Thus, multiple chromatin modifiers could potentially be recruited by CpG-rich sequences. Two recent studies have tested this idea and found a critical role for CXXC domains in recruiting histone-modifying activities to chromatin (
Blackledge et al., 2010;
Thomson et al., 2010).
In one test of the role of CXXC domains in targeting CpG islands, Bird and colleagues studied the recruitment of CXXC1 (Cfp1), a component of mammalian Set1/COMPASS, the major H3K4 trimethylase in mammals (
Lee and Skalnik, 2005;
Miller et al., 2001;
Thomson et al., 2010;
Wu et al., 2008). They performed genome-wide profiling of CXXC1 in mouse brain and found CXXC1 localized at 80% of CpG islands, 90% of which were enriched for H3K4me3. Half of the CXXC1-negative CpG islands were previously reported to be sites of Polycomb and H3K27me3 occupancy. When CXXC1 occupancy was compared between the two copies of
Xist, a gene transcribed from only one of the two X chromosomes in females, CXXC1 was found to associate exclusively with the transcribed, unmethylated copy. Together, these findings suggest that CpG islands could help recruit mammalian Set1/COMPASS through CXXC1’s affinity for unmethylated CpG sequences.
To experimentally test the role of CpG islands in recruiting CXXC1, Bird and colleagues created two ES cell lines carrying a promoterless construct with eGFP and puromycin sequences that contain CpG dinucleotide densities similar to those found in CpG islands. In one cell line, the artificial CpG island was targeted to the 3′ end of Nanog, while in the other, the construct was targeted to the 3′ end of Mecp2, an X-linked gene. When located adjacent to Nanog, the cassette remained free of CpG methylation, despite lacking detectable Pol II. H3K4me3 and CXXC1 occupancy tracked CpG density within the cassette and around the insertion site. Interestingly, the cassette inserted next to the Mecp2 gene showed two-thirds CpG methylation but was still bound by CXXC1. Bisulfite sequencing of the CXXC1 and H3K4me3 ChIP DNA demonstrated that only a third of the immunoprecipitated cassette was CpG-methylated, strongly suggesting that unmethylated CpG sequences are recruitment sites for mammalian Set1/COMPASS. However, point mutations within the CXXC domain of CXXC1 would be required to demonstrate the direct role of this protein in binding CpG and the ensuing regulation of H3K4 methylation by mammalian Set1/COMPASS.
While Set1/COMPASS is the major H3K4 trimethylase in mammalian cells (
Wu et al., 2008), other studies have shown that the loss of CXXC1 leads to an increase in global H3K4 trimethylation levels in ES cells (
Tate et al., 2009). Based on this observation, Skalnik and colleagues have suggested that CXXC1 functions by restricting Set1/COMPASS methyltransferase activity. In contrast, Bird and colleagues find that H3K4me3 levels are reduced at the CpG islands in the absence of CXXC1. It will be interesting to learn where in the genome H3K4me3 is increasing upon loss of CXXC1.
An important finding by Bird and colleagues is that H3K4 methylation catalyzed by mammalian Set1/COMPASS is independent of Pol II and transcription, whereas findings in yeast have shown that transcription is required for proper H3K4 trimethylation (
Krogan et al., 2003a;
Ng et al., 2003b;
Shilatifard, 2006). One possible explanation for this apparent difference is that the interaction of Pol II with CpG islands might be transient and not as easily detectable as the product of the process, histone H3K4 trimethylation. Therefore, sensitive methods for detecting nascent transcription, such as global run-on sequencing (GRO-seq;
Core et al., 2008), could reveal transcription within these CpG islands, thus explaining the association of CXXC1 and mammalian COMPASS at these CpG islands.
Interestingly, the yeast homolog of CXXC1, Cps40, shares a PHD finger with CXXC1 but lacks a CXXC domain. However, the
Drosophila homolog of CXXC1, CG17446, contains both the PHD and CXXC domains. CpG islands have not been studied in
Drosophila, although their existence has been predicted (
Takai and Jones, 2002). It is notable that
Drosophila Trithorax, unlike its mammalian counterpart MLL, lacks a CXXC domain, further demonstrating that although these H3K4 methyltransferase complexes are largely conserved in composition and function, some aspects of their recruitment can differ, perhaps reflecting differences in the size or complexity of the genome in which they are found.
Another class of histone-modifying enzymes bearing a CXXC domain is KDM2A/B. KDM2A and KDM2B are histone demethylases that preferentially use H3K36me2 as a substrate (
Tsukada et al., 2006). H3K36me2 is a modification that has previously been linked to gene silencing (
Bender et al., 2006), and can recruit histone deacetylases (
Li et al., 2009;
Youdell et al., 2008), suggesting that removal of H3K36me2 could facilitate the formation of open chromatin. Klose and colleagues tested the role of the CXXC domain in targeting KDM2A. First, they tested the DNA-binding specificity of KDM2A’s CXXC domain and found that it preferentially binds to unmethylated CpG sequences (
Blackledge et al., 2010). Genome-wide profiling demonstrated that KDM2A was highly enriched at annotated CpG islands. Major peaks of KDM2A binding not corresponding to CpG islands were shown to have a high CpG content with little DNA methylation as assessed by bisulfite sequencing. Since most CpGs outside of CpG islands are methylated, Klose and colleagues likely found novel CpG islands that had previously gone unnoticed because of the statistical criteria used in the annotation process. Sites of strong KDM2A binding were also found to be depleted of H3K36me2, suggesting that KDM2A localization to CpG islands results in active demethylation of H3K36me2. Indeed, knockdown of KDM2A results in increased H3K36me2 at some CpG islands; however, very little alteration in transcription was reported when KDM2A levels were reduced by RNAi, suggesting that H3K36 dimethylation at promoter CpG islands does not have a major transcriptional regulatory role. KDM2A and the highly related KDM2B have been shown to be highly concentrated in nucleoli, where they can repress ribosomal RNA transcription (
Frescas et al., 2007;
Tanaka et al., 2010). Interestingly, rDNA accounts for 20% of the unmethylated CpG sequences in the mouse genome (
Bird et al., 1985), suggesting that KDM2A and KDM2B are targeted to rDNA, which is transcribed by RNA Pol I, in part through recognition of unmethylated CpG.
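The point about CpG islands being missed by annotation criteria can be made concrete with a small sketch. CpG island annotation typically relies on sequence statistics such as GC content and the ratio of observed to expected CpG dinucleotides, where the expected count is (number of C × number of G) / sequence length. The Python below is only an illustration, not the procedure used in the studies discussed here: the function names are hypothetical, and the default thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG above 0.6) are commonly used values assumed for the example. Stricter settings, such as those proposed by Takai and Jones (2002), can be passed as arguments.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is c * g / n,
    # so observed/expected = cpg * n / (c * g).
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply illustrative length, GC-content, and obs/exp thresholds (assumed values)."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    cpg_rich = "CG" * 100   # toy CpG-dense sequence, 200 bp
    at_rich = "AT" * 100    # toy sequence with no CpG
    print(cpg_stats(cpg_rich), looks_like_cpg_island(cpg_rich))
    print(cpg_stats(at_rich), looks_like_cpg_island(at_rich))
```

Because the outcome depends on where the thresholds are set, a CpG-dense, unmethylated region can fall just short of one set of criteria while passing another, which is consistent with the suggestion that some KDM2A peaks mark unannotated CpG islands.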
The studies by the Bird and Klose groups agree that recruitment to unmethylated CpG islands via the CXXC domain is uncorrelated with transcriptional activity, suggesting that unmethylated CpG content is sufficient for recruitment. Since MBD1 and the DNA methyltransferase DNMT1 also have CXXC fingers that recognize unmethylated CpG, it will be important to ask how various CXXC-associated activities compete or coexist with each other on the same site and how this is regulated during development. It would not be surprising if future studies find that the interactions of CXXC finger-containing proteins with their target sites are context-dependent and require other cellular signals for proper function. The studies by the Bird and Klose groups should stimulate investigations into the role of CpG islands in transcription and the function of histone-modifying activities in this process.