Cytosine DNA methylation is a stable epigenetic mark that is critical for diverse biological processes including gene and transposon silencing, imprinting, and X chromosome inactivation. Recent findings in plants and animals have greatly increased our understanding of the pathways utilized to accurately target, maintain, and modify patterns of DNA methylation and have revealed unanticipated mechanistic similarities between these organisms. Key roles for small RNAs, proteins with methylated DNA binding domains and DNA glycosylases in these processes have emerged. Drawing on insights from both plants and animals should deepen our understanding of the regulation and biological significance of DNA methylation.
The genetic information within a cell is encoded by DNA, which is packaged into chromatin. Epigenetic modifications of DNA and histones, the core components of chromatin, constitute an additional layer of information that influences the expression of the underlying genes. DNA methylation, the addition of a methyl group to a cytosine base, is one such epigenetic modification, and it is evolutionarily ancient and associated with gene silencing in eukaryotes. Attesting to its importance, DNA methylation defects in mammals are embryonic lethal, and in plants they can lead to pleiotropic morphological defects.
In mammals, DNA methylation occurs almost exclusively in the symmetric CG context and is estimated to occur at ~70-80% of CG dinucleotides throughout the genome1. However, a small amount of non-CG methylation is observed in embryonic stem (ES) cells2–4. The remaining unmethylated CG dinucleotides are mostly found near gene promoters in dense clusters, termed CpG islands5,6. In plants, DNA methylation commonly occurs at cytosine bases within all sequence contexts: the symmetric CG and CHG contexts (where H=A, T, or C) and the asymmetric CHH context7. Genome wide, DNA methylation levels of approximately 24%, 6.7% and 1.7% are observed for CG, CHG, and CHH contexts, respectively8. Unlike in mammals, DNA methylation in plants predominantly occurs on transposons and other repetitive DNA elements9.
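The three sequence contexts defined above (CG, CHG, and CHH, where H = A, T, or C) can be made concrete with a short sketch. The function names `cytosine_context` and `count_contexts` are hypothetical and introduced only for illustration; the code scans a single 5'-to-3' strand and ignores the complementary strand, so it is a simplified picture rather than the procedure used in the cited methylation-mapping studies.

```python
from collections import Counter

def cytosine_context(seq, i):
    """Classify the cytosine at index i of a 5'->3' sequence.

    Returns "CG", "CHG", or "CHH" (H = A, T, or C), or None when seq[i]
    is not a cytosine, when too few downstream bases are available, or
    when a downstream base is ambiguous (for example N).
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    downstream = seq[i + 1:i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2:
        return None
    first, second = downstream[0], downstream[1]
    if first in ("A", "T", "C") and second == "G":
        return "CHG"
    if first in ("A", "T", "C") and second in ("A", "T", "C"):
        return "CHH"
    return None

def count_contexts(seq):
    """Tally cytosine contexts along one strand (illustrative helper)."""
    counts = Counter()
    for i in range(len(seq)):
        context = cytosine_context(seq, i)
        if context is not None:
            counts[context] += 1
    return counts

if __name__ == "__main__":
    print(count_contexts("ACGTCAGCTTC"))
```

In this toy sequence, the cytosine at index 1 is in a CG context, the one at index 4 is CHG, the one at index 7 is CHH, and the final cytosine cannot be classified because no downstream bases remain.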
In mammals, DNA methylation patterns are established by the DNA methyltransferase (Dnmt) 3 family of de novo methyltransferases and maintained by the maintenance methyltransferase, Dnmt110–12 (FIG. 1). In plants, de novo methylation is catalyzed by DOMAINS REARRANGED METHYLTRANSFERASE 2 (DRM2), a homolog of the Dnmt3 methyltransferases and maintained by three different pathways: CG methylation by DNA METHYLTRANSFERASE 1 (MET1), the plant homolog of Dnmt1, CHG methylation by CHROMOMETHYLASE (CMT3), a plant specific DNA methyltransferase, and asymmetric CHH methylation through persistent de novo methylation by DRM213 (FIG. 1). However, the pathways controlling the establishment and maintenance of DNA methylation, as well as those involved in the removal of DNA methylation, are less well characterized.
In this review, we focus on recent studies in plants and animals that have greatly expanded our understanding of such pathways. We begin with the establishment of DNA methylation, with a separate section focusing on the dynamics of DNA methylation in reproductive cells and the roles of small RNAs at this stage of development. We then discuss mechanisms governing the maintenance and removal of DNA methylation. In each section, recent advances from both plants and animals will be presented and both similarities and differences will be highlighted. In particular, small RNAs, methyl-binding domain proteins and DNA glycosylases are common components of the pathways that define dynamic DNA methylation patterns in the two taxonomic groups.
Throughout plant development, small RNAs target homologous genomic DNA sequences for cytosine methylation in all sequence contexts through a phenomenon initially observed by Wassenegger et al.14 termed RNA-directed DNA methylation (RdDM)7,15. In addition to the canonical RNA interference (RNAi) machinery (that is, members of the Dicer and Argonaute families) and DRM2, RdDM also requires two plant specific RNA polymerases, Pol IV and Pol V, with largely non-redundant functions16,17, two putative chromatin remodeling factors, and several other recently identified proteins15. Through the characterization of these components, an increasingly detailed mechanistic understanding of RdDM is emerging (FIG. 2).
Biogenesis of the 24nt small interfering RNAs (siRNAs) required to target DNA methylation depends on Pol IV, RNA-DEPENDENT RNA POLYMERASE 2 (RDR2) and DICER-LIKE 3 (DCL3). Other RdDM components including DRM2, ARGONAUTE 4 (AGO4), and Pol V are needed for siRNA accumulation for a subset of loci; however, these proteins do not appear to be involved in the initial production of siRNAs and are proposed to reinforce siRNA biogenesis by an unknown mechanism7,18. Additional subunits or interacting partners of Pol IV and Pol V have recently been identified19–23. While some subunits are shared with Pol II, others are unique to Pol IV, Pol V, or both20. Although no polymerase activity has been demonstrated for Pol IV, mutations in the largest subunit, NUCLEAR RNA POLYMERASE D1 (NRPD1), including mutations within the conserved metal binding motif, greatly reduce the abundance of siRNAs18,24–30, suggesting Pol IV may be an active polymerase. Pol IV is hypothesized to initiate siRNA biogenesis by producing long single-stranded RNA transcripts. These transcripts are then thought to be acted upon by RDR2, generating double-stranded RNAs that are processed into 24nt siRNAs by DCL3 and loaded into AGO47,15. AGO4 interacts with the Pol V subunit, NUCLEAR RNA POLYMERASE E1 (NRPE1)31,32, and this interaction is required for RdDM31, leading to the hypothesis that this complex functions as a downstream effector of DNA methylation. In vivo, AGO4 co-localizes either with Cajal bodies or with NRPE1, NRPE2 and DRM2 at a separate discrete nuclear body termed the AGO4/NRPE1 (previously termed NRPD1b) (AB) body32,33. The AB body is located adjacent to 45S ribosomal DNA and may be a site of active RdDM33.
A recent study further clarified the role of Pol V in RdDM by identifying low abundance intergenic noncoding (IGN) transcripts from several loci whose accumulation depends on Pol V34. NRPE1 is present at these transcribed DNA regions and is associated with the RNA transcripts, suggesting Pol V is an active polymerase34. These Pol V-dependent transcripts are required for DNA methylation and silencing of surrounding loci, but their accumulation does not depend on NRPD1, DCL3 or RDR234, suggesting Pol V acts in RdDM via a pathway that is independent of siRNAs. These IGN transcripts are proposed to function as scaffolds for the recruitment of the silencing machinery, possibly facilitated by base pairing interactions between AGO4-bound siRNAs and nascent Pol V transcripts34. A requirement of transcription for silencing is also observed in fission yeast where transcription of heterochromatic DNA by Pol II is required for siRNA mediated heterochromatin formation35.
Current models of RdDM posit that both Pol V-dependent transcripts and siRNAs are required to silence a particular locus. Several studies support the hypothesis that AGO4 and/or SUPPRESSOR OF TY INSERTION 5-LIKE (SPT5-like)/KOW DOMAIN-CONTAINING TRANSCRIPTION FACTOR 1 (KTF1) may bridge the siRNA and IGN transcript generating pathways. SPT5-like, a protein with homology to the yeast transcription elongation factor Spt5, was recently identified as a downstream effector of RdDM21,23,36. SPT5-like and NRPE1 are both able to interact with AGO4 through a conserved Glycine/Tryptophan (GW/WG), or Ago hook, motif present in their carboxy-terminal regions23,31,32,36. In vivo, both SPT5-like and AGO4 interact with Pol V-dependent transcripts36,37, prompting speculation that SPT5-like serves as an adaptor protein that binds both AGO4 and nascent Pol V transcripts, aiding in the recruitment of AGO4 to Pol V transcribed loci. This interaction may also be required to recruit the silencing machinery, including DRM2, to establish DNA methylation.
Another factor thought to act as a downstream RdDM effector, INVOLVED IN DE NOVO 2 (IDN2), was recently identified38. IDN2 has homology with SUPPRESSOR OF GENE SILENCING 3 (SGS3), a protein involved in posttranscriptional gene silencing, and like SGS3 it contains an XS domain able to recognize double-stranded RNAs with 5’ overhangs38. A possible RNA substrate for IDN2 is the duplex between AGO4-bound siRNAs and Pol V non-coding transcripts39, which could also be a signal that aids in recruitment of DRM2 to establish DNA methylation.
In addition to Pol V-dependent transcripts, Pol II-dependent noncoding transcripts required for transcriptional gene silencing at some loci have recently been identified in a weak nuclear RNA polymerase B2 (nrpb2) mutant and these transcripts are also proposed to act as scaffolds for the recruitment of RdDM factors including AGO4, and possibly Pol IV and Pol V40. Further supporting a role for Pol II in RdDM, two genetic screens for RdDM factors identified a conserved protein, RNA-DIRECTED DNA METHYLATION 4 (RDM4)/DEFECTIVE IN MERISTEM SILENCING 4 (DMS4), with similarity to the yeast protein, Interacts with Pol II (IWR1)41,42. While the precise relationship between Pol II, Pol V and Pol IV remains elusive, these studies suggest they may be more intimately connected than previously thought.
Although the mechanisms through which Pol IV and Pol V are targeted to specific loci are poorly understood, several recent findings are beginning to shed light on these aspects of RdDM. DEFECTIVE IN RNA-DIRECTED DNA METHYLATION 1 (DRD1), a putative chromatin remodeling factor43, and DEFECTIVE IN MERISTEM SILENCING 3 (DMS3), an RdDM component with similarity to Structural Maintenance of Chromosome (SMC) proteins38,44, are needed for NRPE1 chromatin association and for accumulation of IGN transcripts34,37, although how these components are targeted is unknown. In addition, the NRPB2 subunit of Pol II aids in the association of both NRPE1 and NRPD1 with chromatin, suggesting Pol II-dependent transcripts, or the act of transcription, may recruit Pol IV and Pol V to specific loci40. Finally, a putative chromatin remodeling factor, CLASSY 1 (CLSY1), may be involved at an early stage of siRNA production, possibly at the level of Pol IV or RDR2 activity45.
Unlike in plants, DNA methylation in mammals covers most of the genome, with the main exception being CpG islands. This DNA methylation pattern is largely established during early embryogenesis, around the time of implantation46,47, through the activity of the Dnmt3a and Dnmt3b DNA methyltransferases5,48,49. However, during post-implantation development, further epigenetic reprogramming occurs in primordial germ cells (PGCs). Following a wave of demethylation, which is required to erase DNA methylation imprints established in the previous generation, DNA methylation patterns are reestablished at imprinted loci and transposable elements (TEs) during gametogenesis by Dnmt3a and a non-catalytic paralog, Dnmt3L5,12,48,49. Recent studies suggest DNA methylation may be targeted to TEs and imprinted genes during germ cell development through different mechanisms, with the former involving PIWI-interacting RNAs (piRNAs) (discussed later) and the latter involving interactions between Dnmt3L and unmethylated histone 3 lysine 4 (H3K4) tails.
Biochemical purification of Dnmt3L led to the discovery that Dnmt3L interacts with unmethylated H3K4 tails through its cysteine-rich ATRX-Dnmt3-Dnmt3L (ADD) domain50,51. As Dnmt3L also interacts with Dnmt3a12,50, a model in which Dnmt3L binds unmethylated H3K4 tails and recruits the Dnmt3a2 isoform to specific loci, including imprinted loci, was proposed50,52 (FIG. 3A). Indeed, an inverse relationship between H3K4 methylation and allele-specific DNA methylation has been reported at several imprinted loci53–56. Further supporting this model, an oocyte-specific H3K4 demethylase57, lysine demethylase 1B (KDM1B), was recently shown to be required for the establishment of DNA methylation at several differentially methylated regions (DMRs) associated with imprinted genes during oogenesis57, and defects in DNA methylation at such loci resulted in a loss of imprinting in developing embryos57. This model is also consistent with studies showing H3K4 methylation appears anticorrelated with DNA methylation in multiple mammalian cell types58–62 and with findings that H3K4 dimethylation and trimethylation are anticorrelated with DNA methylation in plants63.
In addition to Dnmt3a and Dnmt3L, there is evidence that transcription across DMRs is also required for imprinting64. Chotalia et al.64 demonstrate that such transcription occurs during oocyte growth (prior to or around the time when de novo methylation occurs) and, at least at the imprinted Gnas locus, is required for the establishment of DNA methylation64. These findings led to the proposal that the act of transcription, or the transcripts themselves, may alter the chromatin structure of imprinted loci and/or recruit the histone modifying enzymes and DNA methyltransferases required to establish DNA methylation imprints64.
Mechanistic insight into how de novo methyltransferases function once targeted to a particular locus was provided by several biophysical studies focused on the interaction between Dnmt3L and Dnmt3a. Co-crystallization of the carboxy-terminal regions of Dnmt3a and Dnmt3L revealed a tetrameric complex that positions two Dnmt3a proteins such that their active sites are adjacent to one another52. Two Dnmt3L proteins are located on either side of the Dnmt3a dimer and residues of Dnmt3L may stabilize the active site loop in Dnmt3a52, which could account for the observed stimulatory effect of Dnmt3L on Dnmt3a/b activity65,66. Superimposition of the Dnmt3a carboxy-terminal structure with that of the M.HhaI methyltransferase complexed with DNA67 provided a model in which the two Dnmt3a active sites are separated by approximately one helical DNA turn, suggesting that each tetrameric complex could simultaneously methylate two cytosine residues at a defined spacing of 8–10 bps52. This tetrameric complex was subsequently shown to oligomerize on DNA substrates, forming a filamentous nucleoprotein complex68 (FIG. 3A).
Consistent with the determined structural parameters, a periodicity for DNA methylation on opposite strands of a DNA duplex as well as along the same strand of DNA was observed in vitro through bisulfite sequencing analyses52,68. In vivo, the spacing of CG dinucleotides at many DMRs is also consistent with an ~8–10 bp periodicity52, as are findings that CG dinucleotides at an 8 bp spacing are overrepresented across the human genome69,70 and to a lesser extent across the mouse genome69. Since Dnmt3a appears to be a non-processive DNA methyltransferase71, formation of an oligomer could help explain the observed periodic pattern of DNA methylation. Although whether oligomerization occurs in vivo remains unknown, it is tempting to hypothesize a model in which interactions between Dnmt3L and unmethylated H3K4 tails, or possibly between Dnmt3a and other histone modifications or histone methyltransferases, might target and set the register for oligomerization of Dnmt3a/Dnmt3L tetramers resulting in an ~8–10 bp periodicity.
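One simple way to look for the kind of CG spacing described above is to tabulate the distances between CG dinucleotides in a sequence and check whether spacings near 8–10 bp are overrepresented. The sketch below is only an illustration: the function name `cg_spacing_histogram` and the `max_dist` cutoff are assumptions made here, and the cited analyses may have used different statistics and background corrections.

```python
from collections import Counter

def cg_spacing_histogram(seq, max_dist=20):
    """Count pairwise distances (in bp) between CG dinucleotides.

    Distances are measured between the cytosine positions of two CG
    dinucleotides on the same strand; pairs farther apart than
    max_dist are ignored.
    """
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]
    histogram = Counter()
    for index, start in enumerate(positions):
        for end in positions[index + 1:]:
            distance = end - start
            if distance > max_dist:
                break  # positions are sorted, so later pairs are farther
            histogram[distance] += 1
    return histogram

if __name__ == "__main__":
    toy = "CGAAAAAACGAAAAAACG"  # CG at positions 0, 8, and 16
    print(cg_spacing_histogram(toy, max_dist=10))
```

For the toy sequence, two CG pairs are 8 bp apart and the third pair (16 bp) falls outside the cutoff. On real genomic sequence, any peak would need to be compared against an appropriate background expectation before drawing conclusions.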
In Arabidopsis, nucleotide resolution DNA methylation mapping revealed an element of periodicity for DNA methylation. For CHH methylation (mostly controlled by DRM2) a period of ~10 bps was observed genome wide8, suggesting that the periodicity observed for Dnmt3a may be a common feature of de novo methyltransferases and that it may also occur on a genome wide scale in mammals. For CHG methylation (mostly controlled by CMT3), a period of approximately the size of a nucleosome, 167 nucleotides, was found8, consistent with the chromodomain in CMT3 interacting with methylated H3 tails72.
In addition to interactions with unmethylated H3K4 tails, other mechanisms for targeting DNA methylation to specific loci throughout the genome also shape the overall methylation landscape during mammalian development. These include interactions between Dnmt3a/b and the histone methyltransferases G9a, Enhancer of zeste homolog 2 (EZH2), Suppressor of variegation 3–9 homolog 1 (SUV39H1), and SET domain bifurcated 1 (SETDB1)5. More recently, Zhao et al.73 have demonstrated that symmetric methylation of histone 4 arginine 3 (H4R3me2s) by protein arginine methyltransferase 5 (PRMT5) can recruit Dnmt3a to the human β-globin locus, which is required for the DNA methylation and silencing of this gene. In vitro characterization of the interaction between the H4R3me2s modification and Dnmt3a demonstrated that the ADD domain of Dnmt3a is sufficient to mediate this interaction73. While histone arginine methylation has been implicated in gene silencing74, this finding marks the first direct link between arginine methylation and DNA methylation.
Transposons and other repetitive DNA elements are highly abundant in both plant and mammal genomes. Due to the high risk TEs pose to genome integrity, their expression must be tightly regulated. Such control is particularly important in cells that transmit genetic information to the subsequent generation. In both plants and mammals, such elements are targeted by the de novo methylation machinery and are maintained in a methylated and silenced state. Recent evidence suggests that in mammals, like in plants, small RNAs play an important role in targeting transposons for methylation.
In plants, DNA methylation patterns appear to be maintained in a multigenerational manner, leading to the view that DNA methylation in plants is quite static. However, several complementary studies demonstrate that DNA methylation patterns in plants are dynamic during development by showing that genome-wide losses of DNA methylation occur during both male and female gametogenesis. This observed hypomethylation is similar to the global demethylation observed in PGCs and on the paternal genome during mammalian development48,49. These recent studies suggest that, in Arabidopsis, these changes may reinforce transposon silencing in the sperm and egg cells, respectively75–77 (FIG. 4).
During male gametogenesis, tricellular pollen grains that contain a vegetative cell nucleus and two sperm cells are produced78 (FIG. 4B). Analysis of transposon expression in different plant tissues revealed that transposons, which are methylated and silenced in most tissues, are expressed and mobile in pollen75. Within the pollen grain, transposon reactivation appears to be restricted to the vegetative nucleus. This is a key distinction, as the sperm cells, but not the vegetative nucleus, provide genetic information to subsequent generations78 and thus their genome integrity must be protected. Consistent with decreased DNA methylation and transposon activation, several RdDM components are down regulated in pollen75,79 and DECREASED DNA METHYLATION 1 (DDM1), a chromatin remodeling factor required for maintenance of CG methylation80, appears to be excluded from the vegetative nucleus75. Sequencing of siRNA populations from pollen and isolated sperm cells showed an increase in 21nt siRNAs in sperm cells75. Since these siRNAs correspond to transposons that do not appear to be expressed in the sperm cells, it was postulated that siRNAs generated in the vegetative nucleus might travel to the sperm cells and reinforce silencing by an unknown mechanism75 (FIG. 4C).
The two sperm cells fertilize the egg cell and the central cell of the multicellular female gametophyte in a double fertilization event that generates the embryo and the endosperm, respectively78 (FIG. 4B,C). While previous studies have documented decreased DNA methylation at discrete imprinted loci in endosperm78, two recent studies show endosperm DNA methylation is reduced genome wide, likely originating from demethylation in the central cell of the female gametophyte76,77. These findings are in line with observations that chromatin appears less condensed in endosperm nuclei81. Despite this global decrease in DNA methylation, Hsieh et al.76 found increased CHH methylation in both the endosperm and embryo tissues relative to adult shoot tissue and suggest this hypermethylation could result from enhanced RdDM. Consistent with these findings, profiling of Pol IV-dependent siRNA levels in different plant tissues shows maternal-derived siRNAs accumulate to high levels in the endosperm82. Analogous with the model of reinforced silencing in the male gametophyte75, these findings led to the suggestion that siRNAs potentially generated in the central cell may reinforce silencing in the egg cell and possibly in the developing embryo76 (FIG. 4B,C).
Since potentially deleterious transposition events occurring in the sperm or egg cells would be inherited in subsequent generations, demethylation during gametogenesis may function to reveal TEs within the genome with the potential to be expressed and arm siRNA-based pathways in order to ensure these elements are efficiently silenced. Such a mechanism would be inherently adaptable, as newly integrated transposons would also be expressed, leading to siRNA production and the establishment of silencing. Interestingly, Teixeira et al.83 recently demonstrated that siRNA producing loci, unlike other regions of the Arabidopsis genome, can be re-methylated in all sequence contexts when methylation is lost in previous generations, suggesting a dynamic role for RdDM in correcting DNA methylation defects. However, remethylation to approximately wild-type levels was only observed after multiple generations, as is also the case when newly inserted transgenes become silenced. It is tempting to speculate that decreased methylation and transposon reactivation during gametogenesis might be required to generate siRNA signals and allow the observed reestablishment of silencing.
A small RNA pathway is also required to silence some transposons in mammals during male gametogenesis (FIG. 5). While RdDM in plants utilizes 24nt siRNAs, transposon control in mammals utilizes 25-30nt small RNAs, termed piRNAs, initially identified in Drosophila84. In Drosophila, piRNAs bound by the P-element induced wimpy testis (PIWI) clade of argonautes guide cleavage of transposon transcripts, which results in posttranscriptional gene silencing85. This clade of argonautes is highly conserved in animals86 and initial genetic analysis in mammals and flies suggested that roles for piRNAs in germ cell development and transposon silencing were also conserved85. Yet, early studies of mammalian piRNA populations revealed that, unlike in Drosophila, mammalian piRNAs were not enriched for repetitive regions of the genome87,88 leaving it unclear whether mammalian piRNAs function to silence transposons. However, subsequent analyses of piRNA populations isolated at earlier stages of mouse development revealed an enrichment in repetitive DNA sequences89-91. These piRNA populations possess the characteristic sequence properties of both primary89-91 and secondary89,91 piRNAs, suggesting that they are generated by a mechanism similar to the ping-pong amplification model initially proposed in Drosophila92,93 and that they are indeed involved in the posttranscriptional silencing of transposons (FIG. 5A).
In mammals, decreases in DNA methylation and increases in expression were observed at several TEs in two PIWI clade mutants, mili and miwi290,94, suggesting that piRNAs silence transposons at both the transcriptional and posttranscriptional levels. However, these initial methylation studies were carried out at a developmental stage many mitoses after the establishment of DNA methylation, which occurs following a wave of genome wide demethylation in PGCs (FIG. 5B). Thus it could not be determined whether methylation defects were occurring at the level of maintenance or de novo methylation. Two recent studies provide compelling evidence that piRNAs are indeed involved in de novo methylation by demonstrating that DNA methylation defects in mili mutants occur at the stage in development when de novo methylation in male germ cells is observed89 and that piRNA populations from this time period are highly enriched in transposon sequences89,91. Aravin et al.91 further demonstrate that piRNAs are present in Dnmt3L mutants, suggesting the piRNA pathway acts upstream of de novo methylation.
By analogy with models for siRNA and piRNA mediated transposon control in Arabidopsis and Drosophila, respectively, the demethylation in PGCs may reveal TEs with the potential to be expressed when hypomethylated leading to the production of piRNAs and the targeting of DNA methylation to homologous sequences throughout the genome (FIG. 5). It has been hypothesized (FIG. 5A) that PIWI-piRNA complexes could interact with nascent transposon transcripts and directly recruit the de novo methyltransferases. However, preliminary studies failed to show an interaction between PIWI argonautes and Dnmt3 proteins91. Alternatively, this recruitment could be indirect, first involving the recruitment of chromatin modifiers which catalyze modifications that subsequently recruit the DNA methyltransferases95.
Recently, it was found that PIWI family members in mouse96–99, Drosophila, and Xenopus contain symmetrical dimethylarginine (sDMA) modifications97. Methylated arginines can be recognized by tudor domains and purification of Mili, Miwi, or Miwi2 containing complexes demonstrated interactions with various Tudor domain-containing (Tdrd) proteins96,98–101: Tdrd1 was found to interact with Mili96,98–101, while Tdrd1, Tdrd2, and Tdrd9 interacted with Miwi298 (FIG. 5A). Like Mili, Tdrd1 is required for DNA methylation and transposon silencing in mouse germ cells98,99. In tdrd1 mutants, the profile of Mili bound piRNAs is altered, containing a higher proportion of non-transposon sequences98,99, as is the profile of Miwi2 bound piRNAs, containing a lower proportion of antisense piRNAs98, which may explain the observed transposon reactivation.
Once established, global DNA methylation patterns must be stably maintained in order to ensure that transposons remain in a silenced state and to preserve cell type identity.
In mammals, DNA methylation is maintained by Dnmt1 (FIG. 6A). This methyltransferase is associated with replication foci and functions to restore hemimethylated DNA generated during DNA replication to the fully methylated state10. Early studies illustrated that Dnmt1 is recruited to replication foci via an interaction with the proliferating cell nuclear antigen (PCNA) component of the replication machinery102. However, disruption of this interaction only resulted in a minor reduction in DNA methylation103–105. Recently, it was shown that Dnmt1 also interacts with another chromatin associated protein, Ubiquitin-like PHD and RING finger domain 1 (UHRF1) (FIG. 1), and that UHRF1 is required for the association of Dnmt1 with chromatin106,107. Studies showing that mutations in UHRF1 cause severe decreases in DNA methylation106,107 and that the SRA domain of UHRF1 specifically binds to hemimethylated CG dinucleotides106,108–111 have led to a model in which UHRF1 recruits Dnmt1 to hemimethylated DNA106,107. In addition, UHRF1 also interacts with Dnmt3a and Dnmt3b112, which may suggest a role for UHRF1 in de novo methylation. Maintenance of DNA methylation also requires the chromatin remodeling factor Lymphoid-specific Helicase (LSH1)113,114, although the mechanism through which LSH1 functions in DNA methylation remains unknown.
In plants, genetic analyses have demonstrated that homologs of the above mentioned mammalian proteins (FIG. 1), the MET1 DNA methyltransferase80, the VARIANT IN METHYLATION/ORTHRUS (VIM/ORTH) family of SRA domain proteins115,116, and the DDM1 chromatin remodeling factor80,117 are required to maintain CG methylation, suggesting plants and mammals maintain CG methylation in a similar manner. However, further work is needed to determine mechanistically whether these proteins are indeed functioning in a similar way as observed in mammals. One known difference between plants and mammals is that mutations in DDM1, but not LSH1, cause a decrease in histone 3 lysine 9 (H3K9) methylation118,119, a modification that is highly correlated with DNA methylation and silencing in plants120 and mammals5.
In Arabidopsis ~1/3 of genes have CG methylation in their coding region, which is maintained by MET18,9,121,122. Unlike methylation at transposons, CG methylation within gene bodies does not appear to cause silencing as these genes tend to be moderately expressed in many tissues9,122. Nonetheless, the expression of some body methylated genes is upregulated in met1 mutants122, and highly or lowly expressed genes tend to lack body methylation, suggesting an interplay between transcription and body methylation. The presence of body CG methylation at some genes has also been reported in other invertebrate organisms, suggesting it may be a common feature of eukaryotic genomes6. Initial studies in Arabidopsis postulated that body methylation might suppress the production of antisense transcripts from cryptic promoters122,123. However, increases in antisense transcripts in met1 mutants were found to be rare and uncorrelated with body methylated genes9. Thus, the function of body methylation remains poorly understood.
CHG methylation is thought to be maintained via a reinforcing loop involving histone and DNA methylation124 (FIG. 6B). Genome wide profiling of H3K9 and DNA methylation showed that these marks are highly correlated120. Furthermore, loss of either CMT3, the DNA methyltransferase largely responsible for maintaining CHG methylation125,126, or SU(VAR)3–9 HOMOLOG 4 (SUVH4)/ KRYPTONITE (KYP), the histone methyltransferase largely responsible for H3K9 dimethylation127–129, results in a dramatic decrease in DNA methylation127,128. Two other H3K9 histone methyltransferases, SUVH5 and SUVH6, also contribute to global levels of CHG methylation130,131. The observed interdependence of DNA and histone modifications could arise from the multidomain structure of CMT3 and KYP (FIG. 1). In addition to its histone methyltransferase domain, KYP possesses an SRA domain that specifically binds CHG methylation124, suggesting CHG methylation recruits KYP. In turn, CMT3 possesses a chromodomain that binds methylated histone H3 tails72, suggesting histone methylation by KYP may recruit CMT3. Such cross talk between DNA and histone methylation is also observed in mammals and, in many cases, the connection between these modifications appears to involve protein-protein interactions between the histone and DNA methyltransferases themselves5. Whether direct protein interactions between CMT3 and KYP occur and aid in maintaining CHG methylation in plants is unknown.
Asymmetric DNA methylation is maintained by constant de novo methylation by DRM2 and RdDM. However, at some loci CHH methylation is controlled by CMT3 and DRM2132. Like maintenance of CG and CHG methylation, RdDM also requires proteins with SRA domains. SUVH9 and SUVH2 possess SRA domains that preferentially bind CHH and CG methylation, respectively, and these proteins are thought to act late in the RdDM pathway (FIG. 2), possibly functioning to recruit or retain DRM2 at loci targeted for methylation133.
Although in most cases DNA methylation is a stable epigenetic mark, reduced levels of methylation are observed during development in both plants and mammals. This net loss of methylation can either occur passively, via replication in the absence of functional maintenance methylation pathways, or actively by removing methylated cytosines.
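The passive route described above can be illustrated with a deliberately simplified calculation. Assuming semi-conservative replication with no maintenance and no de novo methylation at all, every newly synthesized strand is unmethylated, so the fraction of DNA strands carrying the mark at a given site halves with each round of replication. The function name `passive_dilution` is hypothetical, and real cells with partially active maintenance pathways would lose methylation more slowly.

```python
def passive_dilution(initial_fraction, rounds):
    """Per-strand methylated fraction after 0..rounds replications.

    Simplifying assumptions (not taken from the source text): complete
    absence of maintenance and de novo methylation, and no active
    demethylation, so each replication halves the methylated fraction.
    """
    if not 0.0 <= initial_fraction <= 1.0:
        raise ValueError("initial_fraction must be between 0 and 1")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return [initial_fraction * 0.5 ** n for n in range(rounds + 1)]

if __name__ == "__main__":
    for n, fraction in enumerate(passive_dilution(1.0, 4)):
        print(f"after {n} replication(s): {fraction:.4f}")
```

Under these assumptions a fully methylated site becomes hemimethylated after one replication, and only one strand in eight retains the mark after three.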
In plants, active demethylation is achieved by DNA glycosylase activity, likely in combination with the base excision repair (BER) pathway134,135 (FIG. 7). DEMETER (DME)136 and REPRESSOR OF SILENCING 1 (ROS1)137 are the founding members of a family of DNA glycosylases in Arabidopsis that also includes DEMETER-LIKE 2 and 3 (DML2 and DML3)138,139. The Arabidopsis glycosylases recognize and remove methylated cytosines from double-stranded DNA oligonucleotides irrespective of sequence context in vitro138–142 and in vivo, mutations in these genes cause increased DNA methylation in all sequence contexts at specific genomic loci121,137–140,143,144. In general, DNA glycosylases involved in BER recognize and remove mutagenic substrates, including oxidized and alkylated bases, as well as thymine/guanine (T:G) mismatches often generated by deamination of methylated cytosines145. The DME/ROS1 glycosylases have homology to the helix-hairpin-helix-Gly-Pro-Asp (HhH-GPD) class of DNA glycosylases and they are bifunctional enzymes able to break both the N-glycosidic bond, removing the base, and the DNA backbone145,146 (FIG. 7). In mammals, the resulting single nucleotide gap is then acted upon by DNA polymerase β and ligase IIIα activities, respectively, in order to repair the DNA through the short patch BER pathway. Clear homologs of these enzymes have not been identified in plants, raising the possibility that plants utilize enzymes traditionally involved in the long patch BER pathway145.
Despite similar substrate specificity, the DME/ROS1 glycosylases have distinct biological roles: DME functions during gametogenesis to establish imprinting78, whereas the other family members function in vegetative tissues, possibly to counteract robust DNA methylation by the RdDM pathway138,143,144,147. In mammals, imprinting is established by the addition of methylation in an allele specific manner and is observed in both the placenta and the developing embryo. In plants, by contrast, imprinting is restricted to the endosperm, the plant equivalent of the placenta78, and is established by allele specific removal of DNA methylation by DME in the central cell prior to fertilization, such that only the maternal allele is expressed in the resulting endosperm78 (FIG. 4C).
Until recently, DME was only known to activate the maternal alleles of three genes: MEDEA (MEA), FLOWERING WAGENINGEN (FWA), and FERTILIZATION INDEPENDENT SEED 2 (FIS2), while in mammals ~80 imprinted genes have been identified (http://www.har.mrc.ac.uk/research/genomic_imprinting). However, findings that the genome wide decrease in CG methylation observed in the endosperm is largely dependent on DME suggest that this glycosylase acts as a global regulator of DNA methylation76,77. By comparing DNA methylation levels in embryo and endosperm tissues, Gehring et al.77 were able to identify DMRs and confirm parent of origin expression of five genes, doubling the number of known imprinted genes in Arabidopsis. Approximately 40 other candidates for imprinted genes were identified77, and while imprinting at these loci remains to be experimentally verified, these findings suggest the number of imprinted loci may be more similar in plants and mammals than previously thought.
Recent characterization of Pol IV-dependent siRNA populations, which are generated from dispersed loci corresponding to >1% of the Arabidopsis genome18,25, suggests they may also be maternally imprinted82. Following reciprocal crosses between two Arabidopsis ecotypes, siRNAs from the resultant silique tissue, which contains the developing embryos, were sequenced, and nearly all the Pol IV-dependent siRNAs that could be uniquely distinguished between the two ecotypes were maternal in origin82. What causes these loci to be maternally imprinted, whether this imprinting requires DME, and what function this massive extent of imprinting serves remain unknown. One hypothesis presented is that such maternal imprinting would allow recognition of self from non-self and have a suppressive effect on hybrids82. For example, maternal siRNAs could fail to target and silence a TE present in another Arabidopsis ecotype or they could target and silence a functional gene. Indeed, in Drosophila, piRNAs corresponding to TEs and other repeat sequences, much like the Pol IV-dependent class of siRNAs in Arabidopsis, are maternally inherited, and if female flies lacking piRNAs to a particular TE are crossed to male flies harboring that element, the offspring are largely inviable148.
Unlike DME, ROS1, DML2, and DML3 are expressed in vegetative tissues137–139. Comparative analysis of methylation patterns in ros1, dml2, and dml3 single mutants demonstrated that these glycosylases function redundantly, although some locus specificity was observed138. In a ros1 dml2 dml3 triple mutant, 179 loci with increased methylation relative to wild-type controls were identified despite the fact that no global increase in methylation was observed138. These loci are enriched for transposons, repetitive DNA elements, and siRNA generating loci; ~80% are also near or overlap annotated genes and the increase in DNA methylation at genes is primarily located at their 5’ and 3’ ends138,144. Together, these studies suggest ROS1, DML2, and DML3 are acting both at normally silenced loci (i.e. transposons) and at the boundaries between euchromatin and heterochromatin (i.e. genes residing in or near heterochromatic environments). At such boundaries, these glycosylases may function to protect genes that are targeted for methylation through RdDM from silencing by removing DNA methylation. At normally silenced loci, they may be required to maintain a silenced, but readily adaptable state143,144 and perhaps this is important to allow efficient reactivation of transposons during gametogenesis.
The mechanism(s) through which the DME/ROS1 glycosylases are targeted to specific loci to carry out DNA demethylation are unknown. These Arabidopsis glycosylases are quite different from most other glycosylases: they are much larger and contain two conserved domains of unknown function145 (FIG. 1). Whether these domains are required to target demethylation remains unknown. For ROS1, it has been proposed that REPRESSOR OF SILENCING 3 (ROS3), a protein that binds small single-stranded RNAs (21–26nt) in a sequence specific manner and acts in the same demethylation pathway as ROS1, may be involved in targeting ROS1 to certain loci135,149. Recent findings that DME participates in genome wide demethylation suggest its targeting may be less specific.
In mammals, genome wide decreases in DNA methylation are observed in PGCs and on the paternal genome of the zygote48,49. While mechanisms for passive demethylation appear to play a role in achieving the observed hypomethylated states, the timing of methylation loss suggests that active mechanisms may also be required150–154. Notably, DNA methylation imprints in the zygote and pre-implantation embryo, but not the PGCs, are resistant to demethylation and several proteins including Stella155, zinc finger protein 57 (Zfp57)156, and methyl-CpG binding 3 (MBD3)157 are proposed to protect specific imprinted loci from demethylation158,159.
Proteins orthologous to the DME/ROS1 family of glycosylases have not been identified in mammals and other enzymes capable of directly removing methylated cytosines have remained largely controversial146,160. However, early work in mammals showing that activation-induced cytosine deaminase (AID) and apolipoprotein B RNA-editing catalytic component 1 (APOBEC1) are expressed in cells thought to undergo active DNA demethylation and catalyze 5-methylcytosine deamination, which results in T:G mismatches, led to a model for demethylation involving the coupling of 5-methylcytosine deaminase and thymine DNA glycosylase activities161. Such a model is supported by recent findings in the vertebrate Zebrafish162. Rai et al.162 show that METHYL CPG BINDING DOMAIN 4 (MBD4), a HhH-GPD thymine glycosylase related to the Arabidopsis DME/ROS1 family of glycosylases with active mammalian homologs163,164, is involved in demethylation during Zebrafish development. In addition, they demonstrate that three proteins belonging to the AID/APOBEC family (FIG. 1), AID and APOBEC2a/2b, are involved in DNA demethylation162.
By overexpressing the AID and APOBEC2a/b deaminases in the absence or presence of overexpressed human MBD4 in Zebrafish embryos, Rai et al.162 find that loss of DNA methylation, as well as deamination of methylated cytosines, appears to be limited by the abundance of the MBD4 glycosylase, suggesting that mechanisms are in place to ensure deamination does not occur unless the resultant T:G mismatch can be efficiently removed. This is an important finding since previous models for DNA demethylation involving 5-methylcytosine deamination have been discounted due to the large mutagenic potential of an uncoupled deamination step. Growth arrest and DNA-damage-inducible protein 45 alpha (Gadd45α) may aid in coupling these processes. Gadd45α interacts with MBD4, AID, and APOBEC in vitro and stimulates demethylation of plasmid DNA transfected into Zebrafish embryos as well as the association of MBD4 and AID with methylated DNA162. In addition, MBD4 possesses a methyl-binding domain (FIG. 1), which may aid in recruitment of the demethylation machinery to methylated DNA. Together, these findings suggest a model (FIG. 7) in which tight coupling of 5-methylcytosine deamination by AID and APOBEC to T:G mismatch repair via MBD4 results in DNA demethylation162. Thus, in the case of Zebrafish (and possibly mammals), there appears to be an additional deamination step in the demethylation pathway as compared to the pathway in plants. However, the downstream events leading to a net loss of cytosine methylation may be similar (FIG. 7).
In mammals, recent data presented by Kim et al.165 suggest that MBD4 may be able to directly remove methylated cytosines at the CYP27B1 promoter upon hormone induced MBD4 phosphorylation. While previous in vitro analysis of MBD4 glycosylase activity revealed a strong preference for thymine/guanine mismatches over methylated cytosines166, Kim et al.165 find that upon phosphorylation, the activity of MBD4 on methylated cytosines is stimulated. In vivo, the observed decrease in methylation at the CYP27B1 promoter can occur in the absence of DNA replication, suggesting an active mechanism, and is dependent on the presence of a catalytically active MBD4 protein with the serine residues targeted for phosphorylation165. Whether such a mechanism for the direct removal of methylated cytosines could account for DNA demethylation on a larger scale remains unknown.
A role for the 5-hydroxymethylcytosine modification in mammalian DNA demethylation has also been proposed. 5-hydroxymethylcytosine is present in mouse Purkinje neurons, brain tissue, and ES cells167,168 and can be generated from methylated cytosines through hydroxylation of the methyl group. Findings that ten-eleven translocation 1 (TET1) is able to catalyze the conversion of methylated cytosines into 5-hydroxymethylcytosines in vitro and that targeted depletion of TET1 by RNAi in mouse ES cells results in decreased levels of 5-hydroxymethylcytosine167 have led to the hypothesis that TET1 and possibly other TET family members generate 5-hydroxymethylcytosines. Since proteins known to interact with methylated cytosines, namely methyl CpG binding protein 2 (MeCP2) and Dnmt1169,170, have reduced affinity for 5-hydroxymethylcytosine in vitro, potential roles for this modification in the regulation of chromatin structure and in passive DNA demethylation have been proposed167,168. The hypothesis that 5-hydroxymethylcytosine could be an intermediate in an active DNA demethylation pathway involving DNA repair has also been suggested167 given that a 5-hydroxymethylcytosine specific DNA glycosylase activity has been reported in mammalian extracts171.
In addition to active DNA demethylation by DME in the central cell of the female gametophyte, passive losses of methylation likely contribute to the overall decrease in methylation observed in the endosperm. Using a reporter driven by the MET1 promoter, Jullien et al.172 showed that MET1 expression levels are reduced during female gametogenesis. They further demonstrate that MULTICOPY SUPPRESSOR OF IRA1 (MSI1) and RETINOBLASTOMA RELATED 1 (RBR1) are important for the observed repression of MET1172 and conclude that MET1 is transcriptionally repressed during gametogenesis by MSI1, likely through the retinoblastoma pathway and RBR1. Notably, MSI1 and RBR1 are also required for maternal expression of the imprinted FIS2 and FWA genes172, suggesting that passive DNA demethylation resulting from decreased MET1 levels and active demethylation by DME are working together to allow activation of imprinted genes. In mammals, Dnmt1 expression also appears to be regulated through the retinoblastoma pathway, which utilizes Rb and RbAp48, homologs of the plant RBR1 and MSI1 proteins173–176. While a direct role in imprinting has not been established in mammals, several observations suggest this role may also be conserved172.
The idea that passive and active demethylation pathways are working together is appealing as it fits well with several other observations: First, DME is more active on hemimethylated DNA, which would be enriched following replication in the absence of MET1, than fully methylated DNA in vitro140,141. Second, enrichment in hemimethylated DNA should decrease the chance of detrimental double strand breaks (DSBs) predicted to arise from the removal of methylated cytosines in symmetric contexts by DME. Indeed, DME is inefficient at removing methylated cytosines across from abasic sites, which should also reduce the production of DSBs140. Finally, in addition to decreasing the workload for DME and the risk of DSBs, down regulation of MET1 might also function to ensure that hemimethylated CG sites generated by DME activity on one strand of the DNA are not efficiently restored to the fully methylated state via active targeting of DNA methyltransferases through interactions with SRA domain containing proteins.
In mammals, in addition to the reported active demethylation of the paternal genome of the zygote152–154, passive demethylation is proposed to occur during pre-implantation development of the embryo48,177. This passive decrease in methylation is likely due to exclusion of the oocyte specific form of Dnmt1, Dnmt1o, from nuclei until just prior to blastocyst formation178,179. This is reminiscent of the observed localization pattern of DDM1 in pollen, where DDM1 is observed in the sperm cells, but not in the vegetative nucleus75. Thus, plants and mammals appear to employ similar mechanisms for passive DNA demethylation, including transcriptional repression of DNA methyltransferases and exclusion of the methylation machinery from the nucleus.
Plants and animals employ similar mechanistic strategies for controlling DNA methylation. Both utilize small RNA based pathways to target DNA methylation to transposons, both require methyl-DNA binding proteins to maintain DNA methylation patterns, and both display intimate connections between histone and DNA methylation marks. Furthermore, a growing body of evidence suggests that active demethylation may occur in animals through the use of DNA glycosylases and the BER pathway, as has been documented in Arabidopsis.
Several pathways unique to plants or mammals have also been elucidated and likely contribute to the observed differences in global methylation patterns between plants and mammals. For example, in mammals, where methylation is not restricted to repeat elements, the DNA methylation machinery is recruited to specific genomic loci through interactions with chromatin marks as well as with the chromatin modifying enzymes themselves. In addition, structural studies of the mammalian de novo methyltransferases suggest a mechanism in which a Dnmt3a/Dnmt3L tetramer may oligomerize on DNA, potentially leading to the nearly global methylation status of the mammalian genome. In plants, where DNA methylation occurs in all sequence contexts, a plant specific methyltransferase, CMT3, is required to maintain CHG methylation, and maintenance of CHH methylation is achieved through constant de novo methylation by DRM2.
Despite the significant advances in our understanding of DNA methylation pathways, several key questions remain, especially surrounding the issue of targeting. How DNA methyltransferases are targeted by siRNAs and piRNAs in plants and mammals, respectively, remains elusive. In terms of DNA demethylation, whether DME is specifically targeted to many sites throughout the genome during gametogenesis or whether it non-selectively removes methylation remains unclear. Similarly, whether demethylation by the other DME/ROS1 family members is specifically directed to certain loci or whether the observed methylation pattern simply reflects a balance between the RdDM and demethylation pathways requires further investigation. For mammals, determining whether DNA demethylation is accomplished in a manner akin to that observed for Zebrafish will be key, and surely the question of mammalian demethylation targeting will follow.