A multitude of functions have evolved around cytosine within DNA, endowing the base with physiological significance beyond simple information storage. This versatility arises from enzymes that chemically modify cytosine to expand the potential of the genome. Some modifications alter coding sequences, such as deamination of cytosine by AID/APOBEC enzymes to generate immunologic or virologic diversity. Other modifications are critical to epigenetic control, altering gene expression or cellular identity. Of these, cytosine methylation is well understood, in contrast to recently discovered modifications, such as oxidation by TET enzymes to 5-hydroxymethylcytosine. Further complexity results from cytosine demethylation, an enigmatic process that impacts cellular pluripotency. Recent insights help us to propose an integrated DNA demethylation model, accounting for contributions from cytosine oxidation, deamination and base excision repair. Taken together, this rich medley of alterations renders cytosine a genomic “wild card”, whose context-dependent functions make the base far more than a static letter in the code of life.
In poker, the rules of the game can occasionally change. Adding a “wild card” to the mix introduces a new degree of variety and presents opportunities for a skilled player to steal the pot. Given that evolution is governed by the same principles of risk and reward that are common to a poker game, it is perhaps not surprising that a genomic “wild card” has an integral role in biology.
In the conventional view, the genome is a long polymer of A, C, G and T, which together define and differentiate organisms. However, it is increasingly clear that diversity within an organism is often governed by dynamic changes that take place within this scaffold (1). Here, we make the case that cytosine is the key residue that has taken on the role of genomic “wild card” in DNA. In particular, enzymes that chemically modify cytosine introduce a physiologically important layer of complexity to the genome, beyond that seen in the primary sequence.
Remarkably, modifications of every single position in the nucleobase of purines or pyrimidines in RNA have been described (2). Cytosine, for example, can be deaminated or methylated in many different non-coding RNAs to regulate various aspects of protein translation (3, 4). The mechanisms and physiologic significance of RNA cytosine modification have been discussed elsewhere and their scope continues to expand (5, 6, 7).
It is striking that relative to RNA, modifications of nucleobases within genomic DNA have been comparatively underappreciated. In this review, we examine the curious chemistry of cytosine and the DNA modifying enzymes that change its identity (Figure 1). We begin by examining the non-canonical ways in which genomic DNA fosters adaptability and variety. To understand how cytosine is the key to generating this genomic flexibility, we describe nature’s toolbox of enzymes for modifying the nucleobase and its analogs. Numerous modifications beyond cytosine methylation are now coming to the fore, including cytosine deamination, oxidation and demethylation. We examine the common thread that runs through these modifications: by influencing the identity of cytosine, a new degree of variety can be produced.
We typically think of the genome as a stable, unchanging blueprint for life. However, as life demands variety and adaptability, many other “accessory” functions must also be hard-wired into the genome. For example, modification of DNA can help organisms distinguish self DNA from foreign DNA (8). In bacterial species, DNA methyltransferases have co-evolved with a partner restriction enzyme that shares the same sequence preference. Since only host DNA is methylated, this system allows for degradation of foreign DNA by the corresponding restriction enzyme. A second adaptive role for DNA is to mediate the expression or silencing of genes (9). While DNA modifications share this role with histone modification enzymes, all are needed in order to properly modulate transcriptional networks. Importantly, DNA modifying enzymes also allow for the reverse process to occur, “resetting” the genome for proper gametogenesis or reactivation of gene expression (10). Finally, the adaptive immune system demonstrates the importance of genomic malleability. The immunoglobulin (Ig) locus is a dramatic example of how the genome is pre-programmed to foster variety, through recombination and mutation that ultimately confer an adaptive advantage (11, 12).
We will describe the manner in which cytosine modifications modulate genomic potential, allowing DNA to serve as a stable, but malleable, reservoir of information. In order to examine the relevant biological pathways, we must first introduce the enzymes in nature’s toolbox for altering cytosine within DNA (Figure 2).
In duplex DNA, the C5 and C6 positions of cytosine lie in the major groove, unencumbered by Watson-Crick interactions. The electrophilic character of the C6 position makes it a key target of modifying enzymes. For example, DNA methyltransferases (DNMTs) transiently modify C6 by attack of an active site cysteine. Methylation results from the concerted addition of a methyl group derived from S-adenosylmethionine (SAM) to the C5 position (13, 14). The covalent intermediate breaks down, liberating the enzyme and generating genomic 5-methylcytosine (mC) (Figure 2B). Interestingly, in the absence of SAM, DNMTs can catalyze non-classical reactions, such as deamination at C4 (15, 16) or the addition of aldehydes to C5 (17), raising intriguing questions about the relevance of these non-classical functions in vivo.
The epigenetic impact of C5 methylation will be discussed later in this review, but it is important to note here that previously underappreciated oxidative modifications of mC are also possible. Physiologically, oxidation of mC is carried out by the TET family enzymes (Figure 2B), which belong to the Fe(II)/α-ketoglutarate-dependent oxygenase family that includes histone demethylases and the DNA damage repair enzyme AlkB (18, 19). Rao and colleagues initially discovered the TET family based on homology to a trypanosome enzyme known to catalyze oxidation of the exocyclic methyl group of thymine. Initially, TETs were shown to oxidize mC to 5-hydroxymethylcytosine (hmC) (18). However, more recent studies have revealed that TETs can catalyze iterative oxidation of mC. The products of iterative oxidation, 5-formylcytosine (fC) and 5-carboxylcytosine (caC), are stably detectable intermediates in genomic DNA from embryonic stem (ES) cells (20, 21). In total, the TET enzymes have provided a stable of new chemical handles whose impacts on transcriptional regulation and demethylation we will examine later in this review.
The C4 position of cytosine is relatively protected while engaged in Watson-Crick pairing, but in the context of single-stranded DNA, it becomes an important site for deamination by AID/APOBEC family enzymes (Figure 2B) (22). The mechanism of deamination involves activation of a zinc-bound water for nucleophilic attack at C4 and generation of a tetrahedral intermediate. An active site glutamate promotes deamination of C4 and the conversion of cytosine analogs into uridine analogs (23). In addition to deamination of unmodified cytosine, some studies have suggested that mC deamination can generate thymine (22, 24). However, the evidence surrounding this possibility is conflicting (25), and the full spectrum of AID/APOBEC activity against various cytosine analogs has not yet been clarified. These questions and their impact on diversity will be explored.
The distinction between genomic malleability and instability is subtle. Deamination of cytosine and 5-methylcytosine may cause transition mutations; deamination is therefore a very relevant threat to genome stability. In response, sophisticated DNA repair machinery has evolved to ensure the integrity of DNA (26), namely base excision repair enzymes (BER) and mismatch repair (MMR) enzymes. Interestingly, many of these “repair” enzymes are exploited to support cytosine’s role in generating diversity.
Several BER enzymes are worthy of particular attention, with uracil DNA glycosylase (UDG) standing out with a robust ability to excise uracil from DNA. Given the need to exclude uracil, UDG conspires with deoxyuridine triphosphatase to ensure the presence of thymine over uracil in DNA (27, 28). The only naturally occurring lesion that is efficiently targeted by UDG is uracil, though unnatural lesions such as 5-fluorouracil are also processed (29). Stringent selectivity against thymine occurs by enzymatic discrimination against bulky C5 substituents, while specific hydrogen bonding to a key active site asparagine residue selects uracil over cytosine (30, 31, 32). As we will note, in addition to its principal role in promoting DNA fidelity, UDG is exploited to generate diversity when uracil is purposefully introduced into the genome.
A second key DNA repair enzyme is thymine DNA glycosylase (TDG), which targets T:G mispairs that arise from deamination of mC in CpG motifs. Spontaneous deamination of mC produces thymine which, unlike uracil, is naturally occurring in DNA and therefore more challenging to recognize as a lesion (28). Furthermore, mC is an order of magnitude more prone to spontaneous deamination than cytosine (33, 34). These factors likely contribute to the increased mutation frequency at methylated CpG sequences in cancerous cells (35). A challenge lies in editing T:G mispairs: to repair this mutation without error, repair machinery must first recognize the mispair, and then specifically excise thymine and not guanine. TDG and the enzyme MBD4 are both capable of this activity. Mice deficient in MBD4 do exhibit increased C to T mutations and tumorigenesis (36, 37), although the embryonic lethality of the TDG knockout, and not MBD4, suggests additional important roles for TDG (38, 39).
Several features distinguish TDG from UDG. First, the enzyme actively recognizes the opposite strand G and a neighboring G, biasing activity towards T:G mismatches within CpG motifs (40). Second, the stability of the pyrimidine N-glycosidic bond, not simply the presence or absence of C5 substituents, impacts substrate preferences. In fact, TDG can cleave not only uracil-related nucleobases, but also modified cytosine residues whose N-glycosidic bond is destabilized, such as 5-fluorocytosine (41). Lastly, UDG knockout mice are viable and fertile, while TDG knockout mice are embryonic lethal, making TDG the only known DNA glycosylase with such a phenotype (38, 39, 42).
An additional BER enzyme that may contribute to diversity is single-stranded monofunctional DNA glycosylase (SMUG). This misnomer belies the fact that SMUG preferentially acts on double-stranded DNA and that it targets several uracil-related lesions (43). A water molecule adjacent to the C5 position provides a mechanism for selectively processing uracil. Intriguingly, a C5-hydroxymethyl substituent can replace this active site water (44), making 5-hydroxymethyluracil (hmU) a good substrate, with potential relevance to epigenetic reprogramming (45).
The numerous DNA cytosine-modifying enzymes each play important physiologic roles in generating genomic variety. On its face, cytosine deamination is antagonistic to the primary function of DNA as a stable reservoir of information. However, when the process is highly targeted and controlled, purposeful deamination is used to yield beneficial mutations.
The foremost example of deamination as a means to diversity is demonstrated by the adaptive immune system (11, 23). The mature antibody pool is a collection of heterogeneous antigen-binding molecules produced through multiple diversity-generating mechanisms. Programmed recombination of gene segments (VDJ recombination) provides the initial repertoire of B-cells, each encoding a different surface-bound IgM molecule. However, this diversity is insufficient to yield the high-affinity interactions needed for robust immune responses. In a key transformation that occurs after exposure to antigen, B-cells in the germinal center are matured by two genome-altering processes: somatic hypermutation (SHM) and class switch recombination (CSR). In SHM, antibodies evolve from low-affinity to high-affinity by the introduction of mutations into their antigen-recognition loops at a rate 10^6 times that of spontaneous mutation. In CSR, the effector domain of the heavy chain is switched from IgM to yield the alternate isotypes IgA, IgE, or IgG.
The DNA modifying enzyme activation-induced deaminase (AID) mutates key cytosines in the Ig locus to initiate the molecular events that lead to SHM or CSR (Figure 3A) (11, 23). AID expression is largely B-cell specific and restricted to germinal centers, the site of SHM and CSR (46). In SHM, AID introduces uracil into Ig locus DNA (47). The uracil lesions are then subjected to repair pathways involving UDG, mismatch repair enzymes, and low-fidelity, rather than high-fidelity, DNA polymerases, like DNA pol η. The DNA “repair” pathway is therefore co-opted to promote error-prone repair, resulting in hypermutation of antibody molecules. In CSR, AID targets cytosine residues that are on opposite strands in the switch regions immediately upstream of the various heavy chain loci encoding IgM, IgG, IgE or IgA. Clustered deamination on both DNA strands leads to double-stranded DNA breaks, which are resolved by recombination to result in isotype switching.
Given the fine line between genomic malleability and instability, an important factor in deamination by AID is appropriate targeting (48, 49). Hyperactive AID is associated with common oncogenic translocations as well as leukemic progression and drug resistance in chronic myeloid leukemia (50). AID is known to act throughout the genome, but preferentially acts at the Ig locus, with a balance between deamination and repair determining function (51). The mechanism by which the Ig locus is preferentially targeted remains enigmatic and is an important area of study, though some light has been shed on targeting at the local sequence level. Within the Ig locus, AID selectively targets hotspot sequences that are enriched in the antigen recognition loops and switch regions, thus promoting functional mutations over detrimental ones (52, 53).
Though AID-catalyzed SHM and CSR are exemplars of purposeful cytosine deamination, they are not the only examples. AID is closely related to APOBEC enzymes, best known for their roles in restricting retroviruses such as HIV (54). One family member, APOBEC3G (A3G), acts as a kind of Trojan horse against HIV: it can be integrated into budding HIV virions and, upon infection of a new cell, works to damage the HIV genome. A3G deaminates the (−)-strand viral cDNA generated by reverse transcription, introducing a high frequency of uracil that impairs viral integration and disrupts essential viral proteins (Figure 3B). As a counterattack measure, lentiviral pathogens express Vif, a small accessory protein that targets A3G for ubiquitination and degradation (55). Intriguingly, even in the presence of Vif, A3G is occasionally packaged at low levels into HIV. This observation raises the possibility that low levels of A3G mutagenesis may in fact confer a survival advantage to HIV by yielding viral variants that can escape immune pressure or antiviral challenges (56). Indeed, sublethal mutagenesis and robust acquisition of resistance to antivirals have been demonstrated when HIV was cultured in the presence of cellular A3G (57, 58, 59). Thus, just as our immune system exploits cytosine deamination to generate variety via AID, viral pathogens, though primarily antagonized by A3G, also are able to control the deaminase to access beneficial genomic variety.
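To make the strand logic of A3G editing concrete, the following is a minimal Python sketch, not a model of the enzyme itself. It assumes that a uracil left in the (−)-strand cDNA templates an adenine when the (+) strand is synthesized, so that C-to-U events on the (−) strand appear as G-to-A changes on the (+) strand. The function names, the toy sequence and the chosen positions are hypothetical and serve only as illustration.

```python
# Toy illustration: C-to-U deamination on the (-) strand reads out as
# G-to-A mutation on the (+) strand. All names here are hypothetical.

# Uracil pairs like thymine, so it templates an adenine.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement(seq):
    """Return the reverse complement, reading U as a T-like base."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def deaminate(strand, positions):
    """Convert C to U at the given 0-based positions; other bases are unchanged."""
    bases = list(strand)
    for i in positions:
        if bases[i] == "C":
            bases[i] = "U"
    return "".join(bases)


def plus_strand_after_editing(plus_strand, minus_positions):
    """Deaminate the (-) strand, then copy it back into a new (+) strand.

    Returns the new (+) strand and a list of (position, old, new) changes.
    """
    minus = reverse_complement(plus_strand)
    edited_minus = deaminate(minus, minus_positions)
    new_plus = reverse_complement(edited_minus)
    changes = [
        (i, old, new)
        for i, (old, new) in enumerate(zip(plus_strand, new_plus))
        if old != new
    ]
    return new_plus, changes


new_plus, changes = plus_strand_after_editing("ATGGCA", [2, 3])
print(new_plus, changes)
```

In this toy case the (−) strand is TGCCAT, and deaminating its two cytosines converts the GG dinucleotide of the (+) strand to AA, the kind of change that can disrupt a coding sequence.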
While cytosine DNA deamination allows for “rewriting” the genome, cytosine methylation is known to modulate gene expression and cellular identity (Figure 3C). While this modification has been well studied, in the context of considering the role of cytosine in modulating genomic potential, certain aspects of this topic are worthy of reconsideration.
Cytosine methylation upstream of transcriptional start sites is a stable chemical modification associated with transcriptional repression in eukaryotic organisms (60). Cytosine methylation occurs predominantly in the context of CpG motifs. CpG motifs are disproportionately underrepresented in the human genome, occurring four times less frequently than would be predicted by a random distribution. Further, the motifs are highly enriched in specific regions designated as CpG islands (61). The non-random distribution of potential CpG methylation sites bolsters the notion that cytosine serves an important diversity-generating function.
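The degree of CpG depletion can be expressed as an observed/expected ratio. The sketch below is one common way to compute it, assuming that the expected CpG count under independent base usage is (number of C × number of G) / sequence length. The function name and the test sequences are illustrative and do not come from the cited studies.

```python
def cpg_observed_expected(seq):
    """Return the observed/expected CpG ratio for a DNA sequence.

    Assumption: the expected CpG count is (#C * #G) / length, i.e. C and G
    are placed independently. Returns 0.0 when the ratio is undefined.
    """
    seq = seq.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if length == 0 or c_count == 0 or g_count == 0:
        return 0.0
    observed = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    expected = c_count * g_count / length
    return observed / expected


print(cpg_observed_expected("CCGG"))  # one CpG observed, one expected
```

Under this normalization, a ratio near 0.25 corresponds to the fourfold depletion described above, whereas CpG islands would show ratios much closer to 1.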
CpG methylation alters transcriptional repression through multiple pathways, rooted in biophysical and biochemical changes that take place in the overall DNA structure (62). DNA methylation increases the melting temperature of duplex DNA, potentially decreasing promoter accessibility to RNA polymerase (63). Further, the C5 methyl group projects into the major groove of duplex DNA, providing a biochemical handle that can be interrogated by DNA binding proteins. The impact of methylation can be direct, abrogating binding of numerous transcription factors as one means to decreasing gene expression (60). Alternatively, transcriptional repression can be indirectly affected, via methyl-DNA binding proteins that subsequently recruit histone modifying enzymes (64). Functionally, cytosine methylation can restrain the inappropriate expression of genes; thus the identity and location of the modified cytosine shapes cellular function. In embryogenesis, methylation silences the transcription of lineage-specific genes (9). Pluripotency genes are similarly methylated upon differentiation to ensure the adoption of a lineage-specific cell fate (10). Methylation also impacts imprinting, the parental-specific regulation of gene expression of autosomal transgenes and endogenous genes (65). In contrast to embryogenesis, dysregulation of methylation may result in inappropriate silencing of tumor suppressor genes (66), a process that appears widespread in cancer (67). As a whole, the chemical modification of cytosine, as governed by DNMTs, plays an essential role in dictating the phenotypic outcome of the genome in a given cell.
An additional layer of complexity was revealed by the discovery that mC may be oxidized to hmC. This modification was first identified in bacteriophage genomes as a strategy to evade bacterial restriction endonucleases (68). The epigenetic landscape changed significantly when Rao and colleagues discovered the TET family of mC oxidase enzymes in mammals (18). Further studies have demonstrated that hmC is found throughout the body, albeit at a low frequency. In tissues where hmC is most enriched, the base comprises no more than 1% of all cytosines (69, 70). Much of the focus on hmC has surrounded its presence in embryonic tissues and stem cells. Indeed, several groups have described the presence of hmC in the paternal pronucleus of the fertilized egg (71, 72), and chromatin immunoprecipitation studies have shown an association between hmC and bivalent H3K4-H3K27 histone methylation, an epigenetic hallmark of key embryonic genes (73, 74). Though it is known that hmC levels in ES cells decrease during differentiation (75, 76, 77, 78), the modulation of hmC in adult tissues remains poorly understood. Within the genome, much like mC, hmC localizes upstream of transcription start sites, but it may also be found in intragenic bodies (74, 75).
Given that the discovery of eukaryotic hmC was so recent, work is ongoing to describe its functional significance. Initial reports implicated hmC as a “poised” intermediate on the path to cytosine demethylation, a topic we tackle in the next section (18, 79). However, the current data also strongly suggest that hmC, as a stable modification of cytosine, has its own epigenetic regulatory role with respect to modulating the genome (Figure 3C). From a biophysical perspective, hmC has been shown to partially alleviate the energetic barrier for melting mC-containing duplex DNA; Tm values are similar to those of free cytosine (63, 80). However, hmC appears enriched in the promoter region of a gene, a pattern that often correlates with transcriptional repression (74). Some DNA binding proteins like MeCP2 distinguish between mC and hmC, while others, like the maintenance methyltransferase factor Uhrf1, will bind both hmC and mC (81). This implies that the information encoded by hmC may dictate chromatin structure via mechanisms distinct from mC. This notion is strengthened by the observation that TET oxidases associate with Sin3A repressor complexes and histone deacetylases (82). At this time, early reports indicate that hmC may be a stable DNA modification that, like its precursor mC, causes transcriptional repression. Currently, it is unclear what impact intragenic hmC exerts; the base may disrupt methyl-binding domain interactions that remodel euchromatin to heterochromatin (83) or may activate transcription at alternative promoters (84). Clarifying these proposed epigenetic roles of hmC, in addition to its putative role in demethylation, is an important challenge ahead.
Cytosine methylation is critical for gene imprinting and cell lineage specification, as discussed above. The reverse of this process – the removal of the methyl group – allows cells to newly express previously repressed genes or to recover their totipotent potential. Until recently, this process of cytosine demethylation was thought to be a passive process in which replication without the action of maintenance DNMTs dilutes mC from DNA. However, mounting evidence suggests that replication-independent, “active” (enzymatic) demethylation occurs globally in totipotent cells (85, 86) and also in a locus-specific fashion within somatic cells (87, 88, 89, 90, 91). Active cytosine demethylation, therefore, has now been recognized as a crucial molecular process and is yet another example of the role of cytosine in modulating genomic potential.
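The arithmetic of passive dilution can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: every strand at a site starts methylated, each round of replication pairs every existing strand with one new strand, and a new strand is methylated only if its template is methylated and maintenance methylation succeeds. The parameter `maintenance_efficiency` is a hypothetical knob introduced here, not a measured quantity.

```python
def methylated_strand_fraction(rounds, maintenance_efficiency=0.0):
    """Fraction of strands carrying mC at a site after `rounds` replications.

    Assumptions (illustrative only): all strands start methylated, parental
    strands keep their mark, and each new strand copied from a methylated
    template is methylated with probability `maintenance_efficiency`.
    """
    fraction = 1.0
    for _ in range(rounds):
        # Strand number doubles; marked strands grow by the maintained share.
        fraction = fraction * (1.0 + maintenance_efficiency) / 2.0
    return fraction


for n in range(4):
    print(n, methylated_strand_fraction(n))
```

With no maintenance activity the mark halves at every division, so passive loss is tied to replication. This is why demethylation observed without replication, or in cells that express maintenance DNMTs, points to an enzymatic process.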
Cytosine demethylation is relevant even at the earliest stages of mammalian development. Upon penetrating the zona pellucida, the paternal pronucleus is rapidly demethylated (85). Remarkably, the maternal pronucleus sits in the same cytoplasm and is exclusively demethylated via passive demethylation; the mechanism for such asymmetric demethylation remains unclear. Beyond the zygote and blastula stages, a subset of cells are induced to travel to the gonadal ridge and become primordial germ cells (PGCs). Although PGC genomes are widely methylated at the time they are designated, they are globally demethylated by the time they arrive at the gonadal ridge several days later (92). Given that maintenance DNMTs are expressed in PGCs, such global demethylation is assumed to require active demethylation.
Several examples of locus-specific active demethylation suggest that this process is likewise important in the normal functioning of somatic cells. Fast methylation and demethylation cycling at the estrogen receptor promoter provide a notable example of locus-specific active demethylation (88, 89). Other studies in CD8+ T-cells illustrated that expression of IL-2 can be induced via replication-independent demethylation, suggesting a role for active demethylation in sustained immune responses (90). Finally, even neural plasticity is impacted by active demethylation as evidenced by changes at the promoter for brain-derived neurotrophic factor (91).
Although active demethylation is increasingly accepted as an important physiological process, its molecular basis remains controversial. Several DNA glycosylases have been described in Arabidopsis that can excise mC specifically; however, mammals appear to lack this activity (93). In the past several years, a wealth of new evidence has implicated several of the key cytosine modifying enzymes we have reviewed, particularly the AID/APOBEC deaminases, TET oxidases, and DNA glycosylases (94, 95, 96). Two major types of models have emerged: a deamination-initiated pathway (97, 98) and several variants of an oxidation-initiated pathway (17, 20, 21, 45, 95) (Figure 4).
In the deamination-initiated pathway, mC is first deaminated by an AID/APOBEC family member to yield thymine. The BER pathway subsequently recognizes the T:G mismatch and reverts the lesion to an unmodified cytosine. In support of the role AID/APOBEC enzymes may play in demethylation, AID-deficient PGCs were found to be more methylated than wild-type PGCs in a mouse model (99). In zebrafish embryos, coexpression of multiple AID/APOBEC members along with MBD4 caused global demethylation of the genome (100). AID was also shown to contribute to demethylation at key pluripotency loci such as the Nanog and Oct4 promoters in a heterokaryon system used to generate stem cells (101). Recent evidence that a TDG knockout is embryonic lethal supports the deamination-initiated pathway (38, 39), although not to the exclusion of the oxidation-initiated pathway, as we note below.
Several factors suggest that the deamination-initiated pathway is insufficient to fully explain demethylation, although this mechanism may indeed be an important accessory pathway towards that end. Deletion of AID is not embryonic lethal, as would be expected if this were the sole pathway for active demethylation (99). It is also hard to reconcile a prominent, genome-wide activity for AID with its known properties at the molecular level. While AID has indeed been shown to act outside of the Ig locus, this occurs several orders of magnitude less frequently than within the Ig locus (51). Furthermore, AID/APOBEC enzymes preferentially act on single-stranded DNA in particular sequence contexts (22, 52, 53), but most methylated, silenced loci are likely to be double-stranded in CpG contexts. In addition, although deaminases have been suggested to deaminate mC (24), such activity on mC is diminished relative to activity on cytosine (22). Therefore, the deamination-initiated pathway, although likely relevant in some instances, may not represent the major mechanism for demethylation.
The discovery of genomic hmC raised the possibility of oxidation-first pathways to demethylation (18). Despite the ongoing controversy, several observations bolster support for an oxidation-initiated mechanism. The striking prevalence of hmC in promoters suggests that TET oxidation of mC is likely to be an important step in demethylation (74, 75). TET knockdown in ES cells may decrease expression at loci involved in pluripotency, including Nanog (73, 74, 79, 82), and promoters undergoing active demethylation have also demonstrated a physiological association with TET (45). Finally, TET has also been shown to have a preference for binding at CpG nucleotides, where methylation is most relevant in humans (73, 82).
The route from hmC to cytosine is still under debate, but several potential pathways are worthy of consideration. These pathways can be characterized as deamination-coupled, BER-coupled or direct-reversion mechanisms. As yet, an enzyme capable of direct removal of the hydroxymethyl group from the 5-position of the base (dehydroxymethylation) has not been discovered; however, this is a mechanistically feasible reaction. Alternatively, hmC could be deaminated by AID/APOBEC enzymes to yield hmU, subsequently removed by an enzyme such as SMUG or TDG (41, 45). In this system, suggested to be active in neurons, overexpression of AID decreased endogenous hmC levels and both TET and AID contributed to demethylation at several neuron-specific promoters, although overall levels of demethylation were low (45). However, this proposed model relies on assumptions about the ability of AID/APOBEC enzymes to efficiently deaminate hmC. This activity has not yet been established, nor has sequencing revealed the presence of hmU as a detectable demethylation intermediate, although efficient removal of hmU from the genome may explain the latter point.
A more recent model for efficient demethylation integrates several observations into a more appealing mechanism involving iterative oxidation directly coupled to BER. In several recent reports, the higher oxidation products of hmC, 5-formylcytosine (fC) and 5-carboxylcytosine (caC), were detected in the genome of ES cells (20, 21, 102). Furthermore, it was shown that fC and caC directly result from iterative oxidation of mC by TETs (20, 21). Based on the precedent of a related enzyme in pyrimidine salvage, Zhang and colleagues have proposed that an undiscovered decarboxylase could catalyze the regeneration of cytosine from caC (20). While the search for such an activity could be justified, support for a much more appealing model comes from He et al., who revisit the dependence of demethylation on BER (103). These authors looked for DNA glycosylase activity against the higher oxidation products of mC. They found that the BER enzyme TDG recognizes and excises the highly oxidized caC nucleobase (21). Notably, no such activity was detected with MBD4. In line with their proposal, knockdown of TDG leads to an accumulation of caC in the genome of ES cells, while conversely TDG overexpression decreases caC content. An independent report from Maiti and Drohat has also subsequently confirmed that TDG excises fC and caC, while leaving hmC untouched (104). This proposed mechanism is consistent with the observation that TDG deficiency is embryonic lethal and leads to perturbed methylation patterns in embryogenesis (38, 39). While it has been assumed previously that a role for TDG in demethylation implicates a deamination-mediated pathway, this need not be the case; TDG can directly excise cytosine bases with weakened N-glycosidic bonds, as would likely be the case for fC and caC.
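The iterative oxidation/BER-coupled route can be summarized as a small state model. The Python sketch below is a deliberately simplified illustration: it treats TET oxidation as a fixed sequence of steps, treats TDG excision as all-or-nothing, and collapses the remainder of BER into a single step from an abasic site back to cytosine. The names `OXIDATION_STEP`, `demethylation_path`, `tdg_substrates` and `max_oxidations` are hypothetical. The set of TDG substrates is left as a parameter because the relative contributions of fC and caC are not settled.

```python
# Toy state model of oxidation-initiated, BER-coupled demethylation.
# It encodes only the qualitative steps described in the text, not kinetics.

OXIDATION_STEP = {"mC": "hmC", "hmC": "fC", "fC": "caC"}  # TET-catalyzed steps


def demethylation_path(start="mC", tdg_substrates=frozenset({"fC", "caC"}),
                       max_oxidations=3):
    """Trace one base through TET oxidation until TDG can excise it.

    Returns the list of states visited. If a TDG substrate is reached, the
    path ends with an abasic site followed by unmodified cytosine (C).
    """
    path = [start]
    base = start
    for _ in range(max_oxidations):
        if base in tdg_substrates or base not in OXIDATION_STEP:
            break
        base = OXIDATION_STEP[base]
        path.append(base)
    if base in tdg_substrates:
        path += ["abasic site", "C"]
    return path


print(demethylation_path())                         # excision at fC
print(demethylation_path(tdg_substrates={"caC"}))   # excision only at caC
print(demethylation_path(max_oxidations=1))         # stalls at hmC
```

Restricting `max_oxidations` to a single step leaves the base at hmC, which TDG does not excise in this model, mirroring the observation that hmC can persist as a stable mark.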
Although the field itself is rapidly evolving, we propose that these apparently disparate studies invoking deamination, oxidation and BER can be integrated into a more coherent model (Figure 4) (105). A gathering body of evidence supports important roles for the various TET isoforms in physiological niches where DNA demethylation is thought to be relevant. Though much remains to be resolved, disrupting expression leads to perturbed demethylation of paternal pronuclei and embryonic demise in the case of TET3 (106), dysregulation of hematopoiesis in the case of TET2 (107, 108) and diminished embryonic growth of viable offspring in the case of TET1 (109). These genetic findings couple with the biochemical studies to make a case for the TET enzymes as major regulators of DNA demethylation. We therefore suggest that an iterative oxidation-initiated/BER-coupled pathway could be a major route to demethylation, but that deaminase enzymes could serve an important accessory role to accelerate demethylation in certain physiological settings. This could occur because deamination would generate a uracil-related base, rather than a cytosine-related base, and the relevant BER enzymes are more efficient in excision of the products of deamination. This paradigm could explain the apparent contribution of deamination in heterokaryon systems (101), neurons (45), or settings where AID/APOBEC enzymes are overexpressed (45, 100). Together, a model invoking both major and accessory pathways accounts for the observations that TET, AID/APOBEC enzymes and BER enzymes all appear to contribute to demethylation, but that a predominant pathway is required in the setting of embryogenesis, where demethylation is critical to proper development and differentiation.
While the current evidence suggests that an iterative oxidation/TDG-coupled pathway plays a major role in cytosine demethylation, the model is far from resolved and several major gaps remain in our understanding (105). For instance, hmC accumulates to higher levels than fC and caC; what controls the extent of oxidative modification by TET? Next, although Xu and colleagues (21) propose a model where caC is the intermediate just prior to BER, Maiti and Drohat observe that fC is a better substrate for TDG than caC (104). What is the final oxidation intermediate prior to BER? Further, if BER is involved in lesion recognition, the process of reversion to cytosine would generate abasic sites and DNA nicks. Given the high load of lesions that would result from DNA cytosine methylation in CpG islands, how is genomic instability averted? There are also fundamental questions that remain regarding the proposed deamination-mediated, accessory pathway. For example, the biochemical plausibility of cytosine analogs as substrates for deamination by AID/APOBEC enzymes remains largely unassessed. Addressing these open questions will be essential to the ongoing debate over the mechanism of demethylation.
Adaptability is essential to life, but it is counterbalanced by the need for genomic stability. We have made the case that cytosine modification provides mechanisms for adaptation, thus increasing the potential of the genome. Deamination of cytosine contributes to genetic variability by promoting purposeful mutations, as evidenced in the maturation of immune responses. Cytosine methylation or oxidation refines the genome by tailoring a gene program to a given cell lineage or altering gene expression in the face of environmental changes. Finally, multiple DNA-modifying pathways appear to collaborate to carry out cytosine demethylation, helping to establish a totipotent state in cells otherwise marked by methylation.
Although a unique role for cytosine is increasingly evident, there are pressing questions that need to be explored. It is not immediately clear why cytosine is the base endowed with a special role in diversity generation. It is tempting to speculate that the pyrimidine base’s reactivity, coupled with thymine’s previously designated role in segregating DNA from RNA, allowed cytosine to fill this other niche. What is abundantly clear from the recent discovery of hmC, fC and caC is that the scope of cytosine modification is greater than previously appreciated. High-sensitivity mass spectrometry has been key to the identification of novel DNA modifications, justifying an aggressive search for other such modifications (69, 102, 110). Given the advances in metabolomics, the use of labeled metabolites may provide additional mechanisms for detecting and tracking new DNA modifications.
Secondly, there are now several precedents to suggest we need to reevaluate the scope of reactions catalyzed by known DNA cytosine-modifying enzymes. TET enzymes, thought to catalyze hmC generation alone, now have been shown to produce fC and caC (20, 95); TDG, thought to act only on uracil analogs, can also excise oxidized cytosine analogs (21, 41, 104); and DNMT enzymes, thought to only catalyze methylation, can also add aldehydes (17). Resolving the complete catalytic repertoire of known DNA-modifying enzymes is an important next step.
Thirdly, we should reinvigorate the search for novel enzymes that modify DNA, such as the proposed decarboxylase for caC (20, 95). Several appealing leads have already been suggested by bioinformatic analysis focused on discovering proteins with DNA-binding domains linked to known nucleotide modifying domains (111). New insights may also come from classical biochemical approaches for discovering proteins that interact specifically with DNA containing modified nucleobases.
Finally, and perhaps most critically, we are in need of novel chemical biology tools to detect site-specific modifications. Despite the wealth of information gained from methods such as bisulfite sequencing, we now know that these data need to be reinterpreted in the context of newly discovered modifications (112, 113). Several new methods have been developed to detect hmC in the genome, such as differential modification by glucosyltransferases, specific recognition of hmC and its adducts, and analysis of distinct electrical properties of modified DNA using nanopores (70, 74, 80, 114). Similar approaches are needed to fully catalog the products of deamination, iterative oxidation, and other modifications in the genome. Further, to assess the biological impact of these bases, we need methods to site-specifically control the identity of cytosine within the genome. We have tools to alter proteins within the complex milieu of the cell, but lack similar methods to explore the nature of the dynamic genome at the DNA level (1). With novel approaches at hand, we anticipate that fundamental insights into evolution and adaptation will come from exploring the “wild card” function of cytosine in the genome.
We are grateful to L. C. Wang, D. J. Krosky and C. F. Meyers for thoughtful commentary on the manuscript. R. M. Kohli is supported by the Rita Allen Foundation, the W. W. Smith Charitable Trust, and by an NIH/NIAID career development award (K08-AI089242).