DNA methylation is an epigenetic event involved in a wide array of processes that may underlie genetic phenomena and diseases. DNA methyltransferase is a key enzyme for cytosine methylation in DNA, and can be divided into two functional families (Dnmt1 and Dnmt3) in mammals. Each mammalian DNA methyltransferase is encoded by its own single gene and consists of catalytic and regulatory regions (except Dnmt2). Via interactions between functional domains in the regulatory or catalytic regions and other adaptors or cofactors, DNA methyltransferases can be localized at selective areas (specific DNA/nucleotide sequences) and linked to specific chromosome status (euchromatin/heterochromatin, various histone modification status). With assistance from UHRF1 (for Dnmt1), Dnmt3L (for Dnmt3a/Dnmt3b), or other factors, mammalian DNA methyltransferases can be recruited and then bind specifically to hemimethylated and unmethylated double-stranded DNA sequences, to maintain existing DNA methylation patterns and to set up de novo patterns, respectively. The enzymatic steps catalyzed by DNA methyltransferases include transfer of a methyl group from the cofactor Ado-Met to the C5 position of the flipped-out cytosine in the targeted DNA duplex. Because different DNA methyltransferases diverge in both structure and function, and follow distinct reprogrammed or distorted routes in the development of diseases, the design of new drugs that target specific mammalian DNA methyltransferases or their adaptors, and so control key steps in either maintenance or de novo DNA methylation, will contribute to individualized treatment of diseases related to DNA methyltransferases.
C5-methylated cytosine in DNA was discovered about half a century ago. Extensive research in the last decade has led scientists to realize that self-controlled DNA methylation can be a new heritable route affecting genetic processes, including X chromosome inactivation and uniparental disomy. A number of experimental and clinical studies demonstrated that disturbance (even subtle deviation) of DNA methylation may induce various disorders, including different types of cancer, genomic imprinting-associated diseases (e.g. Beckwith-Wiedemann syndrome, transient neonatal diabetes, etc.), repeat-instability diseases (e.g. Huntington’s disease, spinocerebellar ataxia, etc.), and diseases caused by defects in the DNA methylation machinery [e.g. systemic lupus erythematosus, and immunodeficiency, centromeric instability, and facial anomalies (ICF) syndrome]. In addition, dynamic and programmed DNA methylation is critical during development, and recent studies strongly suggest that reprogramming of DNA methylation is an important mechanism of “fetal origin” diseases.
DNA methyltransferase (DNA MTase) is a key enzyme in the process of DNA methylation concerning reactions of methyl group transferred to the C5 position in the cytosine base. In mammals, DNA MTases are divided into three families encoded by the respective genes, including Dnmt1 (DNA nucleotide methyltransferase 1), Dnmt2, and Dnmt3. Each family possesses several transcriptional variants . All mammalian DNA MTases can be incorporated in synergistically determining and regulating DNA methylation patterns involved in various cellular events, including cell proliferation, differentiation, and transformation . Such enzymatic methyl transfer reaction is suggested almost identical among different mammalian DNA MTases. Nevertheless, selectivity and specificity of substrates (i.e. DNA sequence), and spatial and temporal activation or regulation may be unique for each of mammalian DNA MTases. As a result, their functional and pathological roles may be remarkably different .
In this review, we briefly summarized molecular and biochemical properties of mammalian DNA MTases. Focusing on enzymatic and structural basis, we assessed each step of C5 position methylation of cytosine in DNA sequence, including substrate sequence targeting, base flipping, cofactor binding, methyl group transferring, and reaction processivity, as well as miscellaneous adaptors in regulation of those processes. Drugs targeting certain DNA methylation processes and possible targets for new drug designs also will be briefly discussed.
More than three decades ago, the first mammalian DNA MTase, Dnmt1, was isolated from Friend murine erythroleukemia (MEL) cells and purified [9, 10] with biochemical and molecular assays. The studies at that time indicated that mammalian DNA MTases, as putative DNA methylation machineries, were considerably conserved throughout the eukaryotes. Later, attempts to identify novel DNA MTases by screening expressed sequence tag (EST) databases resulted in the discovery of three further candidates besides Dnmt1: Dnmt2, Dnmt3a, and Dnmt3b [11, 12]. Another protein, Dnmt3L (Dnmt3-like), which is similar in sequence to Dnmt3a and Dnmt3b, was found to have no DNA methylation activity in vitro. Since then, a flood of scientific efforts at the biochemical, molecular, and structural levels has sought to elucidate the properties of the four active mammalian DNA MTases, Dnmt1, Dnmt3a, Dnmt3b, and Dnmt2, which are encoded by four independent genes. For simplicity, these mammalian DNA MTases can be categorized into three families: the Dnmt1, Dnmt2, and Dnmt3 families (Table 1). In general, they are composed of a regulatory N-terminal region and a catalytic C-terminal region, although Dnmt2 has no regulatory domain and its catalytic target is not the cytosine nucleotide, and the catalytic domain of Dnmt3L lacks enzymatic capability for DNA methylation. Building on the established gene and protein models of DNA MTases, an antisense oligodeoxynucleotide (e.g. MG98) and a peptide derivative (e.g. RG108) were successfully designed to inhibit the expression and function of Dnmt1 [14, 15].
Following sections will discuss some research data at molecular, biochemical, and structural levels regarding mammalian DNA MTases, and provide information for drug development targeting at the enzymes that play important roles in DNA methylation mediated diseases.
Dnmt1 has been generally believed to be constitutively expressed in proliferating cells and somatic tissues, which is closely related to replication foci in S phase of cell cycle. A large body of findings has showed that Dnmt1 mainly functions in the maintenance of DNA methylation, which is crucial for genomic integrity by preserving DNA methylation patterns throughout development and during whole life periods .
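The maintenance role described above can be illustrated with a toy model. This is only a minimal sketch under simplifying assumptions: each CpG site is reduced to a pair of booleans (parental strand, daughter strand), replication leaves the new strand unmethylated, and a single idealized maintenance step restores every hemimethylated site. The names `replicate`, `classify`, and `maintain` are hypothetical helpers introduced for illustration and do not come from the studies reviewed here.

```python
from typing import List, Tuple

# One CpG dyad: (parental strand methylated, daughter strand methylated).
Dyad = Tuple[bool, bool]


def replicate(parent_pattern: List[bool]) -> List[Dyad]:
    """Semi-conservative replication: the newly made strand starts unmethylated."""
    return [(methylated, False) for methylated in parent_pattern]


def classify(dyad: Dyad) -> str:
    """Name the methylation state of a single CpG dyad."""
    parental, daughter = dyad
    if parental and daughter:
        return "fully methylated"
    if parental or daughter:
        return "hemimethylated"
    return "unmethylated"


def maintain(dyads: List[Dyad]) -> List[Dyad]:
    """Idealized maintenance step (assumption: perfect fidelity).

    Hemimethylated sites become fully methylated; unmethylated sites are
    left alone, since maintenance does not create new methylation marks.
    """
    return [
        (True, True) if classify(dyad) == "hemimethylated" else dyad
        for dyad in dyads
    ]


if __name__ == "__main__":
    pattern = [True, False, True, True]
    after_s_phase = replicate(pattern)
    restored = maintain(after_s_phase)
    print([classify(d) for d in after_s_phase])
    print([classify(d) for d in restored])
```

In this sketch the pattern on the daughter strand after `maintain` equals the parental pattern, which is the sense in which maintenance methylation preserves patterns across cell divisions. Real maintenance is not perfectly faithful, so the model should be read as a description of the logic only.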
The gene encoding Dnmt1 is localized at 19p13.2 in human and at the proximal region (5.0 cM) of chromosome 9 in mouse [10, 16]. The human DNMT1 gene spans more than 60 kb (kilobases) in the genome, comprising at least 40 exons and 39 introns, and its canonical single transcript is about 5.2 kb long [16, 17]. The mouse Dnmt1 gene, in contrast, occupies about 45 kb in the genome (containing 40 exons and 39 introns) and produces a 5.3 kb transcript. The Dnmt1 protein is predominantly expressed in somatic tissues and proliferating cells, and contains 1616 amino acid residues with a molecular mass of about 190 kDa.
Several isoforms have been detected and they could be the consequence of intrinsic RNA splicing in vivo . During the gametogenesis in mouse, research evidence showed that an alternative splicing event related to the sex-specific 5’ exons produced at least two Dnmt1 mRNA variants—Dnmt1o and Dnmt1p.
Dnmt1o (for oocyte specific), an oocyte-specific short protein isoform encoded by the Dnmt1o variant, was shown to result from a shift of the translational initiation codon to exon 4 in RNA editing; it has a molecular weight of about 170 kDa and lacks 118 amino acid residues at the N-terminus. It is expressed in oocytes and pre-implantation embryos and is fully functional in vivo, participating in the maintenance of methylation patterns at imprinted loci.
The other, larger Dnmt1 mRNA variant in mouse, Dnmt1p (for pachytene spermatocyte specific), was primarily detected in pachytene spermatocytes. Dnmt1p mRNA was initially suggested not to be translated, owing to inhibitory effects of multiple upstream out-of-frame initiation codons. Later, Leonhardt et al. showed that an alternative Dnmt1 transcript isoform identical to Dnmt1p mRNA was present in skeletal muscle and differentiated myotubes. Dnmt1p mRNA could be translated in the testis and skeletal muscle, and the protein isoform Dnmt1p could be active in the regulation of DNA methylation patterns during myogenesis and gametogenesis; however, the molecular enzymology of Dnmt1p, and whether Dnmt1p plays critical roles in development, still need to be elucidated.
In human and other primates, owing to in-frame insertion of an Alu element, there is an additional fragment extending 48nt (nucleotides) between exons 4 and 5. Consequently, a longer splice variant—DNMT1b derived from the primary DNMT1 gene is produced, and distributed throughout somatic tissues and all human cell lines tested. However, DNMT1b abundance is less than DNMT1 . Furthermore, the evidence showed that DNMT1b, the product of DNMT1b variant, with additional 16 amino acid residues incorporated in the very N-terminus of Dnmt1, possesses some similar enzymatic features to that of DNMT1 in vitro . Shen and colleagues indicated that during RNA editing in mouse, an acceptor site at the 3’ end of intron 4 in Dnmt1 was also functional. As a result, another protein isoform Dnmt1b in somatic tissues of mouse lacks 2 amino acid residues compared to that of Dnmt1. Dnmt1b was a translational product of the mRNA variant Dnmt1b . Nevertheless, whether methyltransferase activity of Dnmt1b (also DNMT1b) is intact in vivo remains unclear, and how and to which extent Dnmt1b and DNMT1b are involved in epigenetic modulation processes also should be further investigated.
Overall, Dnmt1 is composed of two parts that could be separated by V8 protease treatment: a fairly long diversified N-terminal extensions containing about 1,100 amino acid residues (the largest in all mammalian DNA MTases), and a relatively conserved C-terminal domain which occupies the remaining residues (Fig. 1). These two regions are joined by a linker of glycine-lysine (GK) repeats .
The C-terminal catalytic region in Dnmt1 contains ten characteristic sequence motifs (i.e. conserved motifs I–X), and the spacing sequences between each conserved motifs is referred as variable regions. Several lines of evidence further indicate that about six of the conserved motifs, that is motifs I, IV, VI, VIII, IX, and X, might be highly conserved in mammalian DNA MTases. In fact, the C-terminal part of Dnmt1 seems to be more closely related to that in prokaryotic DNA MTases (especially for bacterial restriction methyltransferases serving in the restriction-modification system) than to other mammalian DNA MTases (i.e. Dnmt2, Dnmt3a, Dnmt3b, and Dnmt3L) [10, 13]. This indicates that three families of mammalian DNA MTases may be generated in relatively independent ways during evolution. To date, no structural information for any part (even domain) of the Dnmt1 is acquired. However, due to the characteristic of evolutionary conservation of the conserved motifs between Dnmt1 and prokaryotic DNA MTases, the X-ray crystallography data on prokaryotic DNA MTases (especially the M.HhaI referred as a member of C-MTases) provided important hints concerning the structural features of the C-terminal catalytic region of Dnmt1 . Grossly, it is suggested that the C-terminal catalytic region of Dnmt1 could be oriented and folded into two domains. The large and small domain were separated by a large cleft .
The distribution of motifs between the two domains is extremely asymmetric. The large domain encompasses most of the conserved motifs, including motifs I–VIII and the most C-terminal part of motif X, which could participate in Ado-Met [S-adenosyl-L-methionine (SAM)] cofactor binding, substrate (cytosine) targeting, and essential catalytic events. The “core” structure in the large domain is composed of the highly conserved motifs I, IV, VI, and VIII. Most of the invariant amino acid residues in the “core” structure, such as the PC dipeptide (proline-cysteine) in motif IV that constitutes the catalytic loop, are indicated to face the cleft and to be clustered around the active site in the C-terminal region of the Dnmt1 molecule.
The large domain could have a characteristic structural appearance which is composed of six-stranded parallel β sheet accompanying the seventh anti-parallel strand between the fifth and sixth strands, throughout which almost every β strand could be flanked by 1–2 α helices on either side to form an α/β/α sandwich-like structure .
Furthermore, there was evidence that the first strand lies in the middle of the β sheet and separates the large domain into two subdomains, the (β1–β3) and (β4–β7) subdomains. The former (involving conserved motifs I–III and X) could create the Ado-Met binding site, and the latter (involving conserved motifs IV–VIII) could provide the extrahelical cytosine targeting site. Both binding sites are part of hydrophobic binding pockets embedded in equivalent areas within the two subdomains. Thus, the functional and structural similarities between the two subdomains of the large domain suggest that the C-terminal catalytic region may have arisen by gene duplication during evolution, although further investigation is needed.
The small domain comprises an extremely long variable region between conserved motifs VIII and IX, conserved motif IX, and part of the N-terminal region of conserved motif X. The long variable sequence emanates as a stalk from the large domain, traverses almost the full length of the C-terminal region, and links motif IX and the small initial segment of motif X, which fold into a bulky mass in the small domain. A wide array of evidence showed that the small domain is partially responsible for the DNA sequence targeting process. Consequently, the secondary and more detailed structural features of Dnmt1 could be very different from those of other DNA MTases, and the corresponding structural features of the small domain may not be extrapolated directly from those of prokaryotic DNA MTases. As a result, more refined structural information on the domains may have to wait for direct, high-resolution conformational data on this macromolecule from X-ray crystallography and NMR spectroscopy.
In addition to the C-terminal catalytic region, Dnmt1 also possesses a long N-terminal region (about 1,100 amino acid residues) that is accreted many domains over the course of evolution and is the basis for differing from other mammalian DNA MTases (i.e. Dnmt2 and Dnmt3 families) at the biochemical level . Using molecular and biochemical assays, as well as detailed analysis of the biological functions of Dnmt1, it is now well accepted that there exist multiple functional domains within the N-terminal regulatory extensions of Dnmt1 (Fig. 1).
In 1992, Leonhardt et al. first showed by mutagenesis analysis that the replication-foci targeting (RFT) domain, also known as the targeting sequence (TS), was located in the N-terminal region (amino acid residues 256–629) of Dnmt1. This domain was considered to target Dnmt1 toward DNA in S phase of the cell cycle. Another sequence, the PBD [proliferating cell nuclear antigen (PCNA) binding domain], spanning amino acid residues 158–171 and located proximal to the TS domain, was also demonstrated to be associated with replication during S phase. Between the PBD and the TS domain, a short sequence termed the nuclear localization sequence (NLS) domain was suggested to mediate import of Dnmt1 into the nucleus. At the C-terminal end of the N-terminal region, adjacent to the GK linker, there is a region homologous to the polybromo-1 protein that was indicated to form two tandemly arranged bromo/Brahma-adjacent homology (BAH) domains. The BAH1 and BAH2 domains were proposed to act as protein-protein interaction modules [31, 32]. In the centre of the N-terminal region, there is a cysteine-rich region that is present in almost all regulatory regions of mammalian DNA MTases (except Dnmt2, which lacks the N-terminal regulatory domain). This cysteine-rich region, referred to as a zinc-binding domain, contains eight conserved cysteine residues and seems to function as a CXXC-type zinc finger in Dnmt1.
Subsequent research indicated that there were additional functional binding domains: the Dnmt1-associated protein (DMAP1) binding domain is located near the very N-terminus, and the binding domain for the retinoblastoma tumor suppressor gene product (Rb) also resides in the proximal N-terminal region, where it could overlap with the DMAP1 binding domain [35, 36]. Both the DMAP1 and Rb binding domains are related to the transcriptional repression roles of Dnmt1. However, the specific protein-protein interaction modules and the definitive interfaces for physical interactions remain to be resolved in more depth.
In addition, the regulatory region of Dnmt1 also could directly interact with methyl-CpG binding proteins, including methyl-CpG binding protein-2 (MeCP2), methyl-CpG binding domain protein (MBD2), and MBD3 [37, 38]. Moreover, lines of evidence showed that the regulatory region could recruit several chromatin associated elements such as HDAC1 (histone deacetylase-1), HDAC2, H3-K9 methyltransferase, Suv39h1 (suppressor of variegation 3–9 homologue 1), and HP1β (heterochromatin protein 1β) [34, 39, 40]. However, data concerning direct interactions between the regulatory region and methyl-CpG binding proteins as well as chromatin associated elements is limited. Therefore, exact sequences of those binding domains and detailed direct interactions between the functional domains and their targeting proteins should be further investigated.
In 1998, Okano et al. reported the cloning and initial characterization of two enzymatically functional Dnmt3 members, Dnmt3a and Dnmt3b . Later, researchers identified a novel DNA methylation regulator, Dnmt3-like (Dnmt3L), which lacks active methyltransferase activity with its sequence similar to Dnmt3a and Dnmt3b . The Dnmt3 family members, especially Dnmt3L, are relatively specifically expressed in embryo and germ cells. Unlike Dnmt3L that is generally considered to be a regulatory factor in DNA machinery, both Dnmt3a and Dnmt3b are widely accepted as de novo DNA MTases for setting DNA methylation patterns .
In human, the coding genes DNMT3A and DNMT3B are localized at 2p23 and 20q11.2; in mouse, the genes encoding Dnmt3a and Dnmt3b are believed to be localized at chromosome 12A2–A3 and chromosome 2H1. The human genes DNMT3A and DNMT3B are composed of 26 exons/25 introns and 24 exons/23 introns, spanning nearly 110 kb and 47 kb in the genome, respectively; in mouse, the Dnmt3a and Dnmt3b genes are composed of 23 exons/22 introns and 24 exons/23 introns, extending approximately 79 kb and 38 kb in the genome, respectively [42, 43]. In human, the main mRNAs of DNMT3A and DNMT3B are approximately 4.3 kb and 4.4 kb, and their protein products, DNMT3A and DNMT3B, are composed of 912 and 853 amino acid residues, respectively; in mouse, the dominant mRNAs of Dnmt3a and Dnmt3b are about 4.2 kb and 4.3 kb, encoding the proteins Dnmt3a and Dnmt3b with 908 and 859 amino acid residues, respectively (Table 1) [11, 42]. Moreover, preliminary data showed 98% and 94% identity between the human DNMT3 proteins (i.e. DNMT3A and DNMT3B) and the corresponding murine homologues.
Multiple alternative transcripts have been detected involved in both Dnmt3a (also DNMT3A) and Dnmt3b (also DNMT3B) in human and mouse .
As for Dnmt3a, by northern blot analysis, researchers reported three distinct tissue-specific splicing variants with 4.0kb-, 4.4kb-, 9.5kb-length in human, and 4.0kb-, 4.2kb-, 9.5kb-length in mouse. Recently, another variant as a candidate similar to the 9.5kb-length transcript was isolated, which was believed to be important for DNA methylation machinery during development rather than DNA methylation patterning in somatic cells generated by 9.5kb-length transcript. The newly determined splicing variant, named β transcript, was suggested to be the result of replacement of the 1α exon in 9.5 kb-length transcript by the 1β exon related to alternative splicing in process of Dnmt3a (also DNMT3A) RNA editing .
However, whether the splicing variants have their own corresponding translational isoforms (proteins) is still an unresolved issue. In 2002, Chen et al. identified a novel short isoform, Dnmt3a2 (about 100 kDa), which lacks the 219-residue variable region at the N-terminus of Dnmt3a (the long isoform, approximately 130 kDa). Dnmt3a2 could be the translational product of the 4.0 kb or 4.2 kb transcript variants in mouse. Later evidence showed that Dnmt3a2 is produced from a GC-rich promoter within intron 6, from which transcription initiates at sequences residing in the specific exon 7. Unlike Dnmt3a, which is expressed ubiquitously in both the embryo and most adult tissues and is concentrated on heterochromatin, Dnmt3a2 is thought to be the major isoform in the embryo and to be restricted in adult tissues (perhaps only to the testis, ovary, thymus, and spleen). In addition, DNMT3A2, the human homologue of Dnmt3a2, has also been detected as a short isoform lacking 223 amino acid residues at the N-terminus of DNMT3A; nonetheless, its relationship to the splicing variants remains to be determined.
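As a quick consistency check on the residue counts quoted in this section, the expected length of each short isoform can be computed from the full-length protein and the size of the missing N-terminal segment. The sketch below is illustrative only: it assumes the short isoforms differ from the long ones solely by the stated N-terminal deletion, and the dictionary and function names are hypothetical.

```python
# Residue counts as quoted in the text (full-length proteins and the
# N-terminal segment missing from the short isoform).
FULL_LENGTH_AA = {
    "Dnmt3a (mouse)": 908,
    "DNMT3A (human)": 912,
}
MISSING_N_TERMINAL_AA = {
    "Dnmt3a (mouse)": 219,
    "DNMT3A (human)": 223,
}


def short_isoform_length(protein: str) -> int:
    """Expected length of the short isoform, assuming only the
    N-terminal segment is absent and the rest is unchanged."""
    return FULL_LENGTH_AA[protein] - MISSING_N_TERMINAL_AA[protein]


if __name__ == "__main__":
    for protein in FULL_LENGTH_AA:
        print(protein, "->", short_isoform_length(protein), "aa")
```

Under these assumptions both calculations give 689 residues (908 − 219 and 912 − 223), so the figures quoted for the mouse and human short isoforms are mutually consistent.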
In contrast to Dnmt3a, the situation concerning splicing polymorphism and the corresponding isoforms is less controversial in Dnmt3b. At beginning, EST database screening showed that DNMT3B (also Dnmt3b) could generate two different transcripts by alternative splicing . Several years later, researchers identified tissue-specific splicing variants of DNMT3B (also Dnmt3b), including DNMT3B1, DNMT3B2, DNMT3B3, DNMT3B4, DNMT3B5, and DNMT3B6 in human; as well as Dnmt3b1, Dnmt3b2, Dnmt3b3, Dnmt3b4, Dnmt3b5, Dnmt3b6, Dnmt3b7, and Dnmt3b8 [45, 46].
Through in-frame alternative splicing during RNA editing, the transcript variants generate their corresponding protein isoforms, although the isoform products of the variants Dnmt3b7 and Dnmt3b8 have not been fully determined [44, 46, 47]. Several lines of evidence showed that some peptide sequences located in functional parts of Dnmt3b (DNMT3B) (for instance, conserved motifs VII and VIII, and the target recognition domain between conserved motifs VIII and IX) could be disrupted in isoforms of Dnmt3b (also DNMT3B). Thus, unlike in Dnmt1 and Dnmt3a, alternative splicing could affect enzymatic integrity in Dnmt3b (also in DNMT3B). Consequently, isoforms of DNMT3B (Dnmt3b) can be divided into two categories: those in which the catalytic activity of the DNA MTase is not altered, and those that may be catalytically inactive, at least in vitro. In human, DNMT3B6 (854aa, full-length), DNMT3B1 (853aa, nearly full-length), and DNMT3B2 (833aa, its transcript disrupted in exons 10 and 11 by in-frame alternative splicing) fall into the former class, whereas DNMT3B3 (770aa), DNMT3B4 (745aa), and DNMT3B5 (812aa), which are affected in the C-terminal region by in-frame alternative splicing involving 3’ exons (e.g. exon 22 and/or exon 23), fall into the latter class. Isoforms in the latter class may also have a disrupted target recognition domain as a result of alternative splicing involving exons 10 and 11. In mouse, Dnmt3b1 (859aa, full-length), Dnmt3b2 (839aa, exon 11 is disrupted by in-frame alternative splicing), Dnmt3b5 (860aa, exon 2 can be alternatively spliced), and Dnmt3b6 (similar to that in Dnmt3b3) can be included in the former class, whereas Dnmt3b3 (776aa, 3 exons covering sequences of the C-terminal region and the target recognition domain are spliced) and Dnmt3b4 (796aa, 2 exons in the 3’ region encoding the C-terminal catalytic domain are disrupted by in-frame alternative splicing) are in the latter class.
Moreover, the isoform products of Dnmt3b7 and Dnmt3b8 resemble those of DNMT3B4 and DNMT3B5, and are categorized into the latter class [44, 46, 47].
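The two-class grouping of DNMT3B/Dnmt3b isoforms described above is easier to scan as a small lookup table. The sketch below simply encodes the names, lengths, and classes given in the text; lengths that the text does not state are left as `None` rather than guessed, and the `Isoform` class and `by_class` helper are hypothetical names used only for illustration. "Predicted active" here means "assigned to the class whose catalytic activity is not altered", which the text qualifies as an in vitro observation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Isoform:
    name: str
    species: str               # "human" or "mouse"
    length_aa: Optional[int]   # None where the text gives no length
    predicted_active: bool     # True = catalytic activity not altered


ISOFORMS = [
    # Human, catalytic activity not altered
    Isoform("DNMT3B6", "human", 854, True),
    Isoform("DNMT3B1", "human", 853, True),
    Isoform("DNMT3B2", "human", 833, True),
    # Human, possibly inactive in vitro
    Isoform("DNMT3B3", "human", 770, False),
    Isoform("DNMT3B4", "human", 745, False),
    Isoform("DNMT3B5", "human", 812, False),
    # Mouse, catalytic activity not altered
    Isoform("Dnmt3b1", "mouse", 859, True),
    Isoform("Dnmt3b2", "mouse", 839, True),
    Isoform("Dnmt3b5", "mouse", 860, True),
    Isoform("Dnmt3b6", "mouse", None, True),
    # Mouse, possibly inactive in vitro
    Isoform("Dnmt3b3", "mouse", 776, False),
    Isoform("Dnmt3b4", "mouse", 796, False),
    Isoform("Dnmt3b7", "mouse", None, False),
    Isoform("Dnmt3b8", "mouse", None, False),
]


def by_class(species: str, predicted_active: bool) -> List[str]:
    """Sorted isoform names for one species and one activity class."""
    return sorted(
        iso.name
        for iso in ISOFORMS
        if iso.species == species and iso.predicted_active == predicted_active
    )


if __name__ == "__main__":
    for species in ("human", "mouse"):
        print(species, "activity not altered:", by_class(species, True))
        print(species, "possibly inactive:", by_class(species, False))
```

Note that mouse Dnmt3b5 (860aa) sits in the unaltered class while human DNMT3B5 (812aa) sits in the possibly inactive class, so the shared number in the names should not be read as implying equivalent isoforms.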
Although increasing transcript variants and the corresponding protein isoforms of Dnmt3a and Dnmt3b were discovered in dominating specific-tissue expression patterns in embryo and postnatal life, corresponding functions of mRNA and proteins of the variants are unclear [46, 47]. Specific genetic manipulation (i.e. isoform-specific gene knockout, mutagenesis, and etc.) towards the specific splicing variant and protein isoforms may be a clue to unwind such a “Gordian knot”.
Similar to that of Dnmt1, the architecture of Dnmt3a and Dnmt3b also can be viewed via N-terminal regulatory region and C-terminal catalytic region. The research evidence showed that Dnmt3a and Dnmt3b were closely related proteins in structures, especially in their C-terminal regions with about 84% identity (Fig. 1) .
Unlike Dnmt1, the crystal structure of the C-terminal domain of Dnmt3a2 had been resolved by Cheng and his colleagues in 2007 . Molecular modeling data indicated that the C-terminal region of Dnmt3a superimposed well with the representative prokaryotic DNA MTases—M.HhaI [26, 49]. The C-terminal catalytic part of Dnmt3a and Dnmt3b could also be packed into two-domain structure pattern (i.e. large domain and small domain separated by a large cleft). Following structure-based sequence alignment, the researchers found seven β strands and six α helices forming the typical α/β/α sandwich-like super-secondary structure in the large domain, and three β strands and two α helices in the small domain .
In the large domain, loops in the conserved motif IV flanked by N-terminal β strand and C-terminal α helix is the active site for enzymatic activities of Dnmt3a and Dnmt3b . There also appears substrate binding pocket in the large domain. Crystallography data even showed that residues located in the conserved motif I, II, and III were essential for Ado-Met substrate binding activity of the catalytic domain. They are resided in those loops flanked by N-terminal β strand and C-terminal α helix. Meanwhile, Phe residue within α helix in motif V and Arg residue between α helices in motif X are also important for Ado-Met binding .
Moreover, size-exclusive chromatography and co-crystal structure evidence also indicated that Dnmt3a/Dnmt3b could be formed as a homodimer, and Dnmt3a/Dnmt3b homodimer could link to two Dnmt3L to form a tetramer [with the form of Dnmt3L-(Dnmt3a/Dnmt3b)2-Dnmt3L]  (Fig. 4). Structure-based sequence alignment indicated that the variable sequence linker between conserved motif VIII and IX in the small domain mediated the interface between the two Dnmt3a/Dnmt3b proteins, acting as DNA sequence recognition domain . The interface between Dnmt3a/Dnmt3b and Dnmt3L is mediated by three α helices in the conserved motif III, V, and VII, respectively, in the large domain .
Compared to that of Dnmt1, the N-terminal regulatory part of Dnmt3a and Dnmt3b is only about half in length (Fig. 1). Unlike the C-terminal region, similarity of N-terminal region between Dnmt3a and Dnmt3b is limited. Two conserved domains exist in Dnmt3a and Dnmt3b. However, the most N-terminal region in the N-terminal regulatory domain of Dnmt3a/Dnmt3b is variable.
A cysteine-rich zinc-binding domain shares homology with a region in ATRX (alpha-thalassemia and mental retardation on X chromosome), which belongs to the Swi2/Snf2 family (mating type switching-2/sucrose non-fermenting factor-2) of ATP-dependent chromatin remodeling factors. It is located in the C-terminal area near the GK linker of the N-terminal regulatory region of Dnmt3a/Dnmt3b and functions as a transcriptional repressor domain. The cysteine-rich domain (also named the ATRX-homology domain) in Dnmt3a and Dnmt3b possesses a C2-C2 zinc finger, and contains a plant homology domain (PHD)-like sequence that is involved in protein-protein interactions. Further studies have demonstrated that, through the cysteine-rich domain, Dnmt3a/Dnmt3b can interact directly with several essential transcriptional and epigenetic regulators, including RP58 (repression protein of 58 kDa), Myc, HDAC1, Suv39h1, and HP1 [40, 52–55].
Another conserved sequence, located N-terminal to the cysteine-rich domain, is the PWWP domain (rich in proline and tryptophan residues, with a highly conserved proline-tryptophan-tryptophan-proline motif), which spans 100–150 amino acid residues in both Dnmt3a and Dnmt3b. A large body of evidence demonstrated that the PWWP domain is a widely distributed modular protein domain found in more than 60 eukaryotic proteins, especially chromatin-associated proteins. Preliminary clinical findings showed that the ICF syndrome could also be associated with a mutation in the N-terminal region of DNMT3B, namely a missense defect in the PWWP domain. It is generally accepted that the PWWP domain could be an essential functional module for targeting Dnmt3a/Dnmt3b to pericentric heterochromatin and metaphase chromosomes [59, 60]. According to the structure resolved for the Dnmt3b PWWP domain, the PWWP domain in Dnmt3a/Dnmt3b can plausibly be divided into two subdomains. The N-terminal half folds into a barrel-like structure of five β strands, which is believed to be fairly conserved throughout almost all PWWP domains; the C-terminal half packs into a five-α-helix bundle, which varies between different PWWP domains and is therefore inferred to participate in protein-protein interactions [56, 61]. Recently, Li et al. proposed that the PWWP domain may mediate direct interactions between Dnmt3a and a polycomb group protein, chromobox 4 (Cbx4), which functions as a sumoylation E3 ligase to promote SUMO (small ubiquitin-related modifier) modification of Dnmt3a. Furthermore, the two subdomains pack into an integrated structural entity, in which the overall module exhibits an apparently positive potential and so offers a basic surface for DNA binding.
As mentioned above, the most N-terminal variable region in the regulatory part does not show much homology between Dnmt3a and Dnmt3b: it is about 280 aa long in Dnmt3a and approximately 220 aa long in Dnmt3b. It was noted that the short isoform of Dnmt3a, Dnmt3a2, lacks most of the variable region (219 aa) and shows a diffuse localization pattern in nuclei. These data hint that the variable region could be involved in guiding the subnuclear localization of Dnmt3a and Dnmt3b. Several later studies further proposed that the variable region may hold some unspecified domain for protein-protein interactions. Dnmt3a binds small ubiquitin-like modifier peptide-1 (SUMO-1), ubiquitin conjugating enzyme-9 (Ubc9, the sumoylation E2 conjugating enzyme), and protein inhibitor of activated STAT (PIAS). Dnmt3b is also proposed to interact with SUMO-1 and Ubc9 [63, 64]. This suggests that the variable region could regulate Dnmt3a/Dnmt3b functions through protein sumoylation.
The resolved structures of C-terminal domain of Dnmt3a and PWWP domain of Dnmt3b have been taken into explaining several key events of enzymatic processes of DNA MTase [50, 56]. Nevertheless, it should be noted that the truncated form of the protein can not be represented as a native structure of DNA MTase; and thus, to elucidate the crystal structure and even the in situ structure of the full-length DNA MTases should be seriously put on the agenda so as to unmask the sobering reality of DNA cytosine methylation in mammalian kingdom.
In human, the encoding gene DNMT3L is located at 21q22.3, spans nearly 16 kb in the genome, and contains 12 exons/11 introns. Its dominant mRNA is about 1.7 kb, encoding the protein DNMT3L of 387 amino acid residues. In mouse, the Dnmt3L gene is located at the distal locus of chromosome 10 (41.6 cM), extends approximately 190 kb in the genome, and contains 15 exons/14 introns. Its main transcript is nearly 1.7 kb, encoding the protein Dnmt3L of 421 amino acid residues. The Dnmt3L genes in human and mouse share about 60% identity (Table 1).
In contrast to Dnmt1 and Dnmt3a/Dnmt3b, the variants of Dnmt3L reported to date are not complicated. As a result of in-frame alternative splicing during RNA processing, there is only one additional transcript variant in both human and mouse: a 1.7 kb variant in human and a 1.8 kb variant in mouse. In human, the DNMT3L2 mRNA variant results from replacement of exon 12 of the DNMT3L mRNA, and its translated isoform DNMT3L2 lacks 1 amino acid residue at the C-terminus of DNMT3L. In mouse, the Dnmt3L2 mRNA variant is a transcript disrupted in exon 3, whereas in the dominant Dnmt3L transcript exons 1–2 are spliced out. Although the Dnmt3L2 isoform translated from the Dnmt3L2 mRNA variant also has 421 amino acid residues, its N-terminal part differs from that of Dnmt3L [41, 65, 66]. However, the tissue-specific expression, enzymatic properties, and definitive functions of these transcript variants and protein isoforms in development and disease pathogenesis remain largely unknown.
The overall structure of Dnmt3L is closely related to that of Dnmt3a/Dnmt3b. Its expression is believed to be relatively restricted to gametogenesis, where it acts as a regulator in the DNA methylation machinery involved in genomic imprinting and the establishment of DNA methylation patterns. However, both the N-terminal regulatory region and the C-terminal catalytic part of Dnmt3L appear truncated compared with those of Dnmt3a/Dnmt3b (Fig. 1).
The C-terminal region of Dnmt3L is much shorter than that of Dnmt3a/Dnmt3b (~200 aa in Dnmt3L vs. ~300 aa in Dnmt3a/Dnmt3b). Furthermore, structure-based sequence alignment indicated that several critical functional motifs commonly seen in other DNA MTases (e.g. Dnmt1 and Dnmt3a/Dnmt3b) carry non-conservative substitutions. For instance, the PC dipeptide in conserved motif IV, which functions in the active-site loop, is replaced by a PP dipeptide; several residues for Ado-Met cofactor binding are mutated or absent; and the variable region (DNA recognition domain) responsible for DNA targeting has almost disappeared [49, 68]. This may be one of the reasons why no DNA methyltransferase activity has been detected for Dnmt3L to date.
Crystal structures of human DNMT3L showed that its C-terminal region adopts the classical methyltransferase-like fold that is canonical in prokaryotic DNA MTases (e.g. M.HhaI) and conserved in Dnmt3a/Dnmt3b [26, 68]. The large domain is composed of seven β strands and four α helices packed into a so-called α/β/α super-secondary fold, and the small domain contains only two α helices, separated from the large domain by a small cleft [49, 68].
Although the C-terminal region of Dnmt3L has no DNA methyltransferase activity, Chedin et al. demonstrated that Dnmt3L could functionally interact with Dnmt3a/Dnmt3b via its C-terminal domain and thereby take part in de novo DNA methylation [69–71]. Co-crystallization studies indicated that the intermolecular interactions at the interface between the C-terminal regions of Dnmt3L and Dnmt3a/Dnmt3b could stabilize the conformation of the catalytic loop and enhance Ado-Met cofactor binding by Dnmt3a/Dnmt3b. In addition, structure-based sequence alignment showed that the interface between Dnmt3L and Dnmt3a/Dnmt3b might involve two α helices in the large domain of each catalytic region, located in conserved motifs V and VII.
The N-terminal regulatory region of Dnmt3L appears truncated at the N-terminus compared with that of Dnmt3a/Dnmt3b (~200 aa in Dnmt3L vs. ~600 aa in Dnmt3a/Dnmt3b) (Fig. 1). The cysteine-rich domain is retained; however, the PWWP domain found in the regulatory region of Dnmt3a/Dnmt3b is absent from Dnmt3L.
The PHD sequence of the cysteine-rich domain in Dnmt3L has been demonstrated to interact directly with HDAC1 [72, 73]. In addition, crystallography studies showed that the cysteine-rich domain, together with the zinc ions in its C2-C2 zinc finger domain, could bind the N-terminal tail of histone H3. Ooi et al. also indicated that the methylation level of lysine 4 of histone H3 was inversely related to the binding capacity between Dnmt3L and histone H3. Furthermore, it was noted that the N-terminal region possesses an active nuclear localization sequence occupying amino acid residues 156–159 of Dnmt3L, suggesting that Dnmt3L may convert the histone H3 methylation signal into commands for de novo DNA methylation. Taken together, the data indicate that Dnmt3L is a functional modulator that assists Dnmt3a/Dnmt3b in targeting specific nucleosomal loci for de novo DNA methylation, and that it acts as a molecular linker coupling DNA methylation to histone modifications.
Like Dnmt3a/Dnmt3b, Dnmt2 was identified by sequence database analysis in 1998. Strikingly, Dnmt2 is highly conserved from yeast to mammals and is the most widely distributed DNA MTase homologue. Dnmt2 contains only the C-terminal catalytic region, like a prokaryotic DNA MTase, with ten fairly well conserved motifs. There was evidence that Dnmt2 is functionally located in the cytoplasm. However, its DNA methyltransferase activity and definitive roles in vivo and in vitro remain an enigma [74, 75]. Recently, emerging information indicated that Dnmt2 is a DNA MTase-like RNA methyltransferase that specifically methylates cytosine 38 in the anticodon loop of aspartic acid transfer RNA (tRNAAsp) [76, 77]. Moreover, the glycolytic enzyme enolase was shown to be an interacting partner of Dnmt2 that inhibits tRNAAsp methylation in Entamoeba histolytica. This encourages the hypothesis that Dnmt2 may link metabolic status to DNA methylation programming/reprogramming during evolution.
Although the functions of Dnmt2 in mammals are still a mystery, its roles in other metazoan model organisms (e.g. Drosophila, zebrafish, and Caenorhabditis) have been extensively studied [79–81]. Recently, Dnmt2 was also found to be a nuclear protein associated with the nuclear matrix in the fly, where it is involved in mitotic divisions, retrotransposon silencing, and telomere integrity in somatic cells [82, 83].
The gene encoding human DNMT2 (391 amino acid residues) has been designated TRDMT1 (tRNA aspartic acid methyltransferase 1). It is located at 10p15.1 (initially reported at 10p12–14) and spans about 58 kb, with 11 exons/10 introns. In mouse, the locus of the Dnmt2 gene (encoding a protein of 415 amino acid residues), Trdmt1, is believed to lie in the chromosome 2 A1 region, extending approximately 34 kb with 11 exons/10 introns (Table 1) [11, 12].
To date, no distinct transcript variant or protein isoform of Dnmt2 has been found in mouse. In human, it is proposed that the TRDMT1 gene generates two additional short transcript variants, TRDMT1b and TRDMT1c, which lack exon 4 and exons 4 and 5, respectively, as a result of in-frame alternative splicing. The corresponding protein isoforms are 367 and 345 amino acid residues in length. It was believed that Dnmt2 is ubiquitously expressed at a low level in prenatal and adult tissues and is not a prerequisite for mammalian development [13, 76, 84]. However, in contrast to Dnmt1 and Dnmt3, it is not clear whether the transcript variants and protein isoforms of Dnmt2 differ in expression and localization, and their enzymatic activities and functions have not been elucidated.
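The isoform lengths quoted above can be cross-checked with simple arithmetic. The sketch below assumes, purely for illustration, that each isoform differs from full-length DNMT2 only by clean removal of the residues encoded by the skipped exons, with no change at the new junction. The names `FULL_LENGTH`, `ISOFORM_LENGTHS`, `residues_lost`, and `implied_exon_sizes` are hypothetical, and the exon sizes it derives are implied by the quoted protein lengths, not taken from a gene annotation.

```python
# Protein lengths (amino acid residues) quoted in the text for human DNMT2.
FULL_LENGTH = 391
ISOFORM_LENGTHS = {
    "TRDMT1b": 367,  # lacks exon 4
    "TRDMT1c": 345,  # lacks exons 4 and 5
}


def residues_lost(isoform: str) -> int:
    """Residues missing from an isoform relative to the full-length protein."""
    return FULL_LENGTH - ISOFORM_LENGTHS[isoform]


def implied_exon_sizes() -> dict:
    """Coding sizes of exons 4 and 5 implied by the quoted lengths.

    Assumption: exon skipping removes exactly the codons of the skipped
    exons and nothing else (no junction codon change).
    """
    exon4_codons = residues_lost("TRDMT1b")
    exon5_codons = residues_lost("TRDMT1c") - exon4_codons
    return {
        "exon 4": {"codons": exon4_codons, "nucleotides": 3 * exon4_codons},
        "exon 5": {"codons": exon5_codons, "nucleotides": 3 * exon5_codons},
    }


if __name__ == "__main__":
    for exon, size in implied_exon_sizes().items():
        print(exon, size)
```

Under this assumption the quoted lengths imply coding contributions of 24 codons (72 nt) for exon 4 and 22 codons (66 nt) for exon 5. A whole number of codons per skipped exon is what in-frame splicing requires, so the quoted figures are at least internally consistent.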
As described above, Dnmt2 is the most conserved member of the eukaryotic DNA MTases and is even present in two species of the bacterial genus Geobacter. In addition, Dnmt2 is the only mammalian DNA MTase that lacks the entire N-terminal regulatory region. As in Dnmt1 and Dnmt3, motifs I–X are conserved in Dnmt2 (Fig. 1). Sequence alignment studies showed that the variable sequence located between conserved motifs VIII and IX is also well conserved in Dnmt2 and acts as a DNA/RNA recognition domain. The central CFTXXYXXY motif (CFT motif) in the DNA recognition domain of Dnmt2 is absent from other mammalian DNA MTases, although its roles need further investigation. The crystal structure of human DNMT2 superimposes extremely well on that of the prokaryotic DNA MTase M.HhaI. Overall, Dnmt2 is composed of a large domain and a small domain, separated by a cleft formed by the CFT motif, with the corresponding functional motifs and domains well organized and correctly oriented.
The large domain contains conserved motifs I–VIII and most of motif X, in which the α/β/α sandwich-like structure (an eight-stranded β sheet with three α helices on one side and four α helices on the opposite side) is well conserved. Unlike M.HhaI, Dnmt2 was recently shown to contain an additional β strand. The small domain of Dnmt2 also has three additional α helices that are absent from M.HhaI. The small domain of Dnmt2 folds as a propeller-like short four-stranded β sheet surrounded by five α helices (two on one side and three on the opposite side). GRASP representation of the electrostatic surface of Dnmt2 showed that the acidic pocket for Ado-Met binding is adjacent to the basic pocket for the target DNA. Limited data also indicated that the CFT motif, forming a cleft at the interface between the large and small domains, could allow Dnmt2 to contact DNA in a sequence-independent manner. Structural analysis indicated that some key residues (e.g. tyrosine) in the CFT motif could form hydrogen bonds with the unpaired base in order to increase the binding affinity for target DNA and to retain the target base in the “flipped out” state [48, 86].
In contrast to other mammalian DNA MTases, Dnmt2 has recently been shown to be involved in cytosine methylation of tRNA [76, 77]. In the light of these findings, Dnmt2 may have a dual function in DNA and RNA cytosine methylation. However, the detailed dynamic expression, distribution, definitive catalytic activity, and mechanisms of Dnmt2 are still a mystery. Although this protein is widely distributed and well conserved throughout the eukaryotes, the biological roles of Dnmt2 in mammals are largely unknown. Thus, further exploring the paradox of Dnmt2 may enrich our knowledge of mammalian DNA MTases and the DNA/RNA methylation network.
It has been generally believed that all active mammalian DNA MTases, including Dnmt1, Dnmt3a/Dnmt3b, and Dnmt2, use the Ado-Met cofactor as the methyl group donor. Various lines of evidence showed that DNA MTases can be induced to dock in the nucleus and then be actively or passively targeted to relatively specific DNA sequences (possibly in association with one or more “chaperones”). After rotating the target cytosine completely out of the double-stranded DNA helix (referred to as base flipping), DNA MTases provide a platform for interaction between the substrate cytosine and the cofactor Ado-Met, which sit in the catalytic active-site loop and the hydrophobic binding pocket, respectively. Their enzymatic efficacy can be modulated allosterically by a self-regulating domain or by other regulatory partners. As the C5 carbon atom of cytosine forms a bond with the methyl group from Ado-Met through nucleophilic attack, a covalent intermediate is transiently formed while the enzyme dearomatizes and then rearomatizes the cytosine ring. Subsequently, DNA MTases with differing processivity can lay down epigenetic marks that regulate the DNA methylation patterns involved in development and disease pathogenesis. Below, we discuss these events in the molecular enzymatic processes of mammalian DNA MTases in more detail.
Information derived from the classical prokaryotic cytosine methyltransferases (e.g. M.HhaI) showed that Ado-Met is buried in the hydrophobic pocket and stabilized via hydrogen bonds and hydrophobic interactions with amino acid residues in conserved motifs I–III and X. Following base flipping of the cytosine, formation of the covalent intermediate activates the cytosine C5 carbon for nucleophilic attack on the methyl group at the sulfonium centre of Ado-Met. The proton at the C5 position of cytosine is then eliminated, followed by release of the cysteine moiety from the covalent intermediate to generate the C5-methylated cytosine. This sequence of covalent catalysis and acid-base catalysis, previously demonstrated in prokaryotes, could well apply to mammalian DNA MTases. This view is supported, at least in part, by comparative sequence alignment between prokaryotic and mammalian DNA MTases, and by biochemical and structural analysis of certain mammalian DNA MTases [49, 88].
In general, a base buried in double-stranded DNA is stabilized by Watson-Crick hydrogen bonds with its paired base, regardless of the conformation (A-, B-, Z-, or other type) and state (free or bound) of the DNA. As a result, the target base (cytosine) cannot gain access to the hydrophobic binding pocket or expose its reactive N3, C5, and C6 positions for interactions within DNA MTases. Preliminary thermodynamic studies on prokaryotic DNA MTases (not restricted to DNA cytosine methyltransferases) indicate that the binding affinity between enzyme and substrate is inversely related to the stability of the target base pair [89, 90]. This indicates that disrupting both the Watson-Crick hydrogen bonds of the target base and the π stacking interactions between adjacent base pairs makes substrate binding by DNA MTases more favorable.
X-ray crystallography of the complex of M.HhaI with its target sequence showed that the target base (cytosine) and the corresponding deoxyribose moiety are flipped out of their canonical position together and placed in the concave active-site pocket outside the double-stranded DNA [91, 92]. Thus, “nucleotide flipping” is the more accurate term. In addition, there was evidence that interactions between phosphodiester groups and DNA MTases could alter the distances or angles of the flanking sugar-phosphate ester bonds on either side of the target cytosine and simultaneously induce conformational changes in the DNA MTase [93, 94]. Researchers also found that the DNA MTase could evert the sugar-phosphate backbone even when the target site in the DNA substrate was abasic (apurinic/apyrimidinic). “Base flipping” is also used by various other DNA-interacting enzymes (e.g. DNA glycosylases) [95, 96]. These findings suggest that the target base (cytosine) alone may not be what induces the conformational changes of DNA during methylation. Moreover, mutagenesis studies indicated that a glutamine residue in the variable region (i.e. the DNA recognition domain spanning conserved motifs VIII and IX) inserts into the target DNA to fill the space vacated by the cytosine and to contact the guanine [97, 98]. Later studies further indicated that valine and glutamate residues in conserved motif VI and an arginine residue in conserved motif VIII, immediately adjacent to the target cytosine, were essential both for stabilizing the flipped-out target base and for maintaining the enzymatic activity of DNA MTases [99, 100]. In addition, interactions between the target base and the DNA MTase could displace adjacent DNA backbone atoms. For instance, a threonine residue in the DNA recognition domain of DNA MTases could contact the 5′ phosphate of the target nucleotide.
In the light of previous kinetic results, it has been hypothesized that the change in free energy (binding energy) induced by interactions between the DNA MTase and DNA might be the energy source for disruption of the base pair and for the conformational changes of the sugar-phosphate bonds. Furthermore, researchers discovered that the induced-fit rearrangement of the active-site loop in DNA MTases, which should be the basis of the enzymatic reactions in DNA methylation, is directly coupled to the “base flipping” process. Therefore, specific cytosine methylation in the DNA duplex could essentially be determined by the substrate selectivity of DNA MTases, especially by selective interactions between the catalytic loop and the target cytosine.
As described above, the C-terminal catalytic region of all mammalian DNA MTases is homologous to prokaryotic DNA MTases (especially M.HhaI). Consequently, flipping the cytosine out of the native DNA double helix should also be a key determinant of DNA methylation in mammals. This is supported by various enzyme kinetic experiments and numerical molecular simulations of mammalian DNA MTases, which indicate that “base flipping” may be the initial fast-equilibrium kinetic step modulating the catalytic rate of DNA cytosine methylation. Recently, Jia et al. reported that the co-crystal structure of the C-terminal catalytic domains of Dnmt3a and Dnmt3L fitted well to the DNA fragment co-crystallized with M.HhaI, in which the target cytosine was flipped out. These data further support the occurrence of “base flipping” in the enzymatic processes of mammalian DNA MTases. Interestingly, emerging findings indicated that UHRF1 [ubiquitin-like, containing PHD (plant homeodomain) and RING (really interesting new gene) finger domains 1] could recruit Dnmt1 to the replication fork and act as a “base flipping” executor [104, 105]. However, its target base is the C5-methylated cytosine in hemimethylated double-stranded DNA [106–108]. Thus, cytosine in the mammalian DNA double helix might not be the specific target of “base flipping” per se, as it is for prokaryotic DNA MTases. Protein-DNA interactions could plausibly account for the reorganization of intermolecular contacts involved in eversion of the target cytosine by mammalian DNA MTases. To date, direct evidence that the target base (cytosine) undergoes “base flipping” during DNA methylation by mammalian DNA MTases is still lacking.
Unlike prokaryotic DNA MTases, mammalian DNA MTases possess a functional N-terminal regulatory region (except for Dnmt2) and also interact with various regulatory elements, so that they function synergistically as an enzymatic entity. Whether, and to what extent, the self-regulatory domain or other modulators are involved in “base flipping” is unclear and needs further study.
Ado-Met is the methyl group donor in most methyl transfer reactions, including DNA methylation. In prokaryotes, Ado-Met was shown to be buried in the hydrophobic binding pocket formed by conserved motifs I–III and X. The high binding affinity between DNA MTases and Ado-Met was supported by chromatographic experiments showing that Ado-Met co-purified with M.HhaI. Further evidence indicated that almost all hydrogen bond donors and acceptors in Ado-Met interact with corresponding atoms of certain conserved amino acid residues in DNA MTases. In addition, some hydrophobic interactions exist between Ado-Met and DNA MTases.
In addition, the binding mode of Ado-Met can change with the DNA binding state of the DNA MTase. O'Gara et al. provided evidence that the primed orientation of Ado-Met in the hydrophobic binding pocket of the DNA–M.HhaI–Ado-Met ternary complex differed from its orientation in the DNA-free M.HhaI–Ado-Met binary complex. S-adenosyl-L-homocysteine, the product formed from Ado-Met in DNA methylation, can be buried in a different conformational orientation, in which it interacts with the active-site loop of DNA MTases and inhibits their catalytic potency. These data suggest that cofactor binding in DNA MTases adopts different modes in a sequential order regulated by DNA targeting and by the enzymatic reaction itself. In prokaryotes, more than one Ado-Met binding site was detected in DNA MTases, and Ado-Met binding was suggested to modulate enzymatic activity allosterically by influencing the active site. Thus, we postulate that DNA, Ado-Met, and the DNA MTase in the enzymatic ternary complex regulate one another reciprocally.
Structure-based sequence comparison and the resolved crystal structure of the Dnmt3a-Dnmt3L complex indicated that Ado-Met in mammals is also embedded in a hydrophobic binding pocket, formed by amino acid residues in conserved motifs I–III, V, and X of mammalian DNA MTases. Whether Ado-Met adopts a similar orientation in the native binding site should be further investigated.
Methylation at the C5 position of the target cytosine depends on the nucleophilicity of cytosine, which is the basis for transferring the methyl group from the sulfonium centre of Ado-Met to the C5 position. However, the reactive electrons in the cytosine ring are delocalized, so cytosine alone cannot mount an effective nucleophilic attack from the C5 position on the methyl group of Ado-Met. In this context, disrupting the aromaticity of the cytosine ring could be a prerequisite for DNA cytosine methylation. DNA MTases facilitate this process by lowering the energy barrier through formation of a transient covalent intermediate followed by a series of acid-base catalytic steps. This depends critically on sophisticated interactions among the target base, the cofactor, and the enzyme.
The mechanism of methyl group transfer from Ado-Met to the cytosine C5 position has been extensively investigated in prokaryotic DNA MTases. The essential step is the formation of a covalent intermediate resulting from nucleophilic attack by the key cysteine in the active-site loop of the DNA MTase on the C6 position of the target cytosine [113, 114]. This reaction might be facilitated by transient protonation of the N3 position with the assistance of the glutamate residue in conserved motif VI of the DNA MTase. Consequently, the aromaticity of cytosine is broken and the nucleophilicity of the C5 position is strengthened. The activated C5 carbon then attacks the methyl group at the sulfonium centre of Ado-Met, followed by deprotonation of the N3 position of cytosine. Finally, the covalent enzyme-cytosine complex is resolved by eliminating the proton at the C5 position of cytosine, which releases the thiol group of the cysteine moiety (Fig. 2). However, no amino acid residue has been shown to act as the base responsible for deprotonation of the C5 position of cytosine. Theoretical calculations showed that a hydroxide anion, delivered through an essential water channel in DNA MTases, rather than a basic residue in the active site, may actually carry out the deprotonation step immediately prior to release of the cytosine moiety. Overall, the nature of the proton abstraction in the β-elimination reaction is still elusive and needs further dissection. Cytosine analogues currently in frequent clinical use against cancers [e.g. 5-azacytidine, 2′-deoxy-5-azacytidine (decitabine), arabinosyl-5-azacytidine (fazarabine), and dihydro-5-azacytidine (DHAC)] are considered to be incorporated into the genome, where they mimic cytidine in binding to DNA MTases.
Owing to the nitrogen atom at position 5 of the base in these cytosine analogues, the β-elimination reaction cannot occur between drug and enzyme. Consequently, the DNA MTases remain covalently trapped and may be degraded, and the DNA methylation process is inhibited.
Furthermore, generation of the transient covalent intermediate described above might be very complicated. So far, the mechanism of cytosine methylation described above has been derived almost entirely from structural information on the DNA–enzyme–cofactor ternary complex in the postchemistry state and from mutagenesis combined with enzyme kinetic studies [25, 103]. Recently, a prechemical crystal structure of the DNA–enzyme–cofactor ternary complex has been described. Following a refined comparison between the prechemical and postchemical ternary structures, Youngblood et al. suggested that the target cytosine is tilted up towards the cysteine residue in the active-site loop prior to formation of the covalent intermediate. This compressed arrangement was critically important for facilitating nucleophilic attack on the C6 carbon of the cytosine ring by the thiol group of the cysteine moiety. In addition, functional and simulation data showed that, besides the glutamate residue, two arginine residues in conserved motif VIII of DNA MTases could also play critical roles in protonation of the N3 position of the target cytosine [119–121]. Quantum mechanical/molecular mechanics analysis further indicated that the two arginine residues in conserved motif VIII may be essential for stabilizing the delocalized electrons in the cytosine moiety of the transient covalent intermediate through electrostatic interactions. It was also indicated that a free water molecule, buried between the two arginine residues in conserved motif VIII and stabilized by hydrogen bonds, was directly detected in the prechemical crystal structure of the ternary complex and was firmly coupled to the primed orientation of Ado-Met.
Therefore, it can be inferred that the free water molecule, coupled with the two arginine residues, might help the active-site loop of DNA MTases reach and maintain the closed conformation, and that the compressed cytosine–catalytic loop–Ado-Met arrangement is adopted transiently in order to promote the methyl transfer reaction effectively.
Unfortunately, the cytosine base in DNA can also be deaminated in a mutagenic reaction catalyzed by DNA MTases in prokaryotes. Solvent such as water can penetrate the active site of the DNA MTase and reach the relatively localized π electrons of the activated cytosine, which result from the covalent linkage between the C6 position of cytosine and the thiol group of the cysteine moiety. The amino group at the C4 position of cytosine can then be replaced by a carbonyl oxygen atom, which converts cytosine to uracil (Fig. 2). The transient covalent intermediate is a prerequisite for the mutagenic cytosine deamination catalyzed by DNA MTases in prokaryotes. In bacteria, mutagenic cytosine deamination is triggered under nutrient limitation (e.g. limiting Ado-Met). This is considered an evolved physiological advantage for evading lethal digestion by restriction endonucleases [123, 124]. So far, mutation by cytosine deamination has not been considered beneficial for mammalian cells. In mammals, the steric block created by binding of Ado-Met or S-adenosyl-L-homocysteine to DNA MTases could prevent the mutagenic cytosine deamination triggered by water access. Furthermore, mammalian DNA MTases cannot effectively attack the target base without Ado-Met bound [125, 126]. These studies indicate that cytosine mutation induced by DNA MTases is less likely in eukaryotes than in prokaryotes. Thus, in eukaryotes, or at least in mammals, epigenetic events may not interfere with the underlying genetic state, so that this vital biological stability is well maintained.
Overall, according to structure-based sequence alignment data, the DNA methylation reaction mechanism of mammalian DNA MTases is similar to that of prokaryotes. Interestingly, evidence showed that covalent catalysis might not be the crucial step in mammalian DNA cytosine methylation, at least in the case of Dnmt3a. These findings suggested that conformational changes of the catalytic loop in mammalian DNA MTases can be regulated by unknown modulators, and that the enzyme could then directly catalyze the methyl transfer reaction with the assistance of the glutamate residue in conserved motif VI. Later, the geometric arrangement of cytosine, cysteine, and S-adenosyl-L-homocysteine in the crystal structure of Dnmt3a showed that the covalent intermediate might still be important for DNA methylation. However, the active site of mammalian DNA MTases could also change its conformation postchemically. Consequently, the actual arrangement of the cytosine–catalytic loop–Ado-Met ternary complex might not be simply extrapolated from the postchemical crystal structure described above. In addition, mammalian DNA MTases possess the N-terminal regulatory domain and bind various functional partners, and the active-site loop could be effectively modulated by interactions at the interface between Dnmt3a and Dnmt3L. Thus, the DNA methylation reaction in mammals is likely to be more complicated. The roles of conserved residues in the active-site loop of mammalian DNA MTases might have changed during evolution. Whether additional functional residues or small molecules (e.g. water, free ions) contribute to this enzymatic process is a good topic for studies of the detailed mechanisms of DNA methylation in mammals.
All mammalian DNA MTases may preferentially select CG sites in DNA. Accordingly, some small-molecule compounds that bind CG-rich sequences, for instance procaine and procainamide, have been used to interfere with the binding of DNA MTases. Interestingly, procainamide specifically inhibits Dnmt1 but not Dnmt3a/3b; the underlying reasons remain to be elucidated [129, 130].
Subsequent studies of binding and kinetic preferences showed that the substrate selectivity of mammalian DNA MTases may depend on DNA sequence (CG sites and their flanking sequence), DNA structure (single vs. double strand), DNA methylation pattern (unmethylated vs. hemimethylated or fully methylated), and DNA length [131–133]. Furthermore, a large body of evidence demonstrated that the degree of target sequence selectivity and the magnitude of preference for different methylation states of DNA diverge remarkably among the mammalian DNA MTases [13, 48]. As described above, Dnmt2 is an RNA MTase whose DNA methylation activity is not well established and whose DNA sequence selectivity is obscure. Certain tRNA sequences could be the effective targets of Dnmt2.
It has been widely accepted that Dnmt1 methylates symmetric CG sites in hemimethylated DNA with high specificity. This property is essential for propagating existing DNA methylation patterns after DNA replication, as Dnmt1 serves as the maintenance DNA MTase. Recently, it has been shown that UHRF1 could physically bind regions in the regulatory domain of Dnmt1 through its PHD or SRA (SET and RING associated) domain [104, 105, 134]. Moreover, as mentioned above, detailed crystallographic studies showed that UHRF1 could specifically flip the C5-methylcytosine out of the hemimethylated DNA double helix via its SRA domain [106–108], suggesting that UHRF1 could be a new target for interfering with DNA MTases. Small molecules such as 5-methylcytosine-like compounds may inhibit the functions of the SRA domain and thereby block DNA methylation. However, this approach is still far from clinical use.
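The distinction between unmethylated, hemimethylated, and fully methylated CG sites described above can be illustrated with a short sketch. It is a toy model and not an analysis method from the literature cited here. The function names and the coordinate convention are assumptions made for illustration: methylated cytosines on both strands are recorded as 0-based positions on the top strand, so a methylated bottom-strand cytosine is recorded at the position of the G it pairs with.

```python
from typing import List, Set, Tuple


def cpg_positions(top: str) -> List[int]:
    """Return the 0-based index of the C in every CG dinucleotide (top strand)."""
    seq = top.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def classify_cpg_sites(
    top: str,
    top_methylated: Set[int],
    bottom_methylated: Set[int],
) -> List[Tuple[int, str]]:
    """Label each CG site by its methylation state.

    top_methylated: top-strand positions of methylated top-strand cytosines.
    bottom_methylated: top-strand positions of the G whose paired
        bottom-strand cytosine is methylated (assumed convention).
    """
    labels = []
    for i in cpg_positions(top):
        top_m = i in top_methylated
        bottom_m = (i + 1) in bottom_methylated
        if top_m and bottom_m:
            state = "fully methylated"
        elif top_m or bottom_m:
            # One strand only: the preferred substrate described for Dnmt1.
            state = "hemimethylated"
        else:
            state = "unmethylated"
        labels.append((i, state))
    return labels


if __name__ == "__main__":
    # Hypothetical sequence and methylation calls, for illustration only.
    sites = classify_cpg_sites("ACGTTCGACGG", {1, 5}, {2})
    for position, state in sites:
        print(position, state)
```

In this example the CG at position 1 is methylated on both strands, the CG at position 5 on the top strand only, and the CG at position 8 on neither. After replication a previously fully methylated site would be expected to appear as hemimethylated, which is the state that the maintenance activity described above recognizes.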
The SRA domain of UHRF1, of about 210 residues, is a single globular structure containing two twisted β sheets (one of five mixed β strands and one of three anti-parallel β strands), with four scattered α helices and one 3₁₀ helix surrounding the shell-like β sandwich. X-ray crystallography revealed that this domain has a crescent moon-like conformation. The concave surface of the SRA domain interacts with the target DNA sequence and is responsible for recognizing the hemimethylated CG site and promoting eversion of the C5-methylcytosine. The overall structure of the SRA domain-DNA duplex complex has been likened to a hand clamping the double-stranded DNA: the Arg-associated loop (between the β5 strand and the α2 helix) and the Val-associated loop (between the α1 helix and the β2 strand) correspond to a finger and a thumb projecting into the major and minor grooves of DNA, respectively, and the positively charged inner patch corresponds to a palm holding a deep pocket that binds the C5-methylcytosine (Fig. 3). Evidence indicated that several amino acid residues in both the finger and the thumb interact extensively with nucleotides in the target DNA sequence. The Arg residue in the finger is important for binding of the SRA domain to the target DNA sequence; its guanidinium side chain could form hydrogen bonds with the O6 and N7 positions of the intrahelical orphaned guanine ring, thereby mimicking the Watson-Crick hydrogen bonds. It was also shown that the Arg residue might stack with adjacent bases in the methylated strand to compensate for eversion of the C5-methylcytosine. Another residue, Asn, located near the Arg residue in the finger and required for the specificity of binding to hemimethylated CG sites, could cooperate with the Arg residue to recognize the adjacent unmethylated cytosine on the opposite strand and interact effectively with the phosphodiester in the complementary strand.
In the absence of steric clashes from a methyl group at the C5 position, the main-chain carbonyl group of the Asn residue in the finger can form essential hydrogen bonds with the cytosine ring in the unmethylated strand (Fig. 3). Moreover, the Val residue in the thumb may occupy the vacancy left by the extruded C5-methyl cytosine, approaching from the minor groove side closely enough for a van der Waals interaction between its methyl group and the aliphatic portion of the Arg residue in the finger, which lies in the opposite major groove [106–108]. Accordingly, the finger and the thumb of the SRA domain are termed the hemimethylated CG recognition loop and the base-flipping promotion loop, respectively. Furthermore, researchers found that, following a dramatic conformational change of the SRA domain induced by binding to the target DNA sequence, the target base binding pocket in the palm measured approximately 6 × 10 × 6 Å, which is too small to accommodate purines. In addition, crystallographic data showed that the C5-methylated cytosine is stabilized by π stacking with the aromatic rings of two Tyr residues in the palm, located on either side of the pyrimidine ring. In the palm, the main-chain amide groups of the Gly and Ala residues, the side-chain carbonyl groups of the Asp residue, and the main chain of the Thr residue are thought to form hydrogen bonds with the O2 and N4 positions of the target cytosine, which discriminates cytosine from thymine [106–108]. To explain the specificity for C5-methylated cytosine, studies showed that deep in the pocket there is a hemispherical space, about 2 Å in radius, that closely fits the methyl group.
Additionally, crystallographic data demonstrated that the C5 methyl group of the target cytosine is tightly packed in a cavity stabilized by various hydrogen bonds and van der Waals interactions with the Tyr, Thr, and Gly residues in the palm. In particular, van der Waals interactions form between the C5 methyl group of the target cytosine and the Cα and Cβ atoms of the Ser residue in the palm, and the hydroxyl group of that Ser residue interacts with the phosphodiester in the methylated strand (Fig. 3).
The crucial Gly residue at the entrance to the deep binding pocket in the palm is a key determinant of the high affinity of the SRA domain for the hemimethylated DNA sequence. Detailed analysis of the X-ray crystallography data indicated that the contact area of the SRA domain with the single everted C5-methylated cytosine in a hemimethylated CG site was comparable to that of the methyl-DNA-binding domain (MBD) with the two C5-methylated cytosines in a fully methylated CG site, and that the overall contact area at the SRA-DNA interface was much larger than that at the MBD-DNA interface [106, 108]. Consequently, interactions with the flipped-out target base could be an essential and economical means for UHRF1 to bind the target DNA sequence. Based on these findings, we propose that “base flipping” of the methylated cytosine, executed by UHRF1 exclusively at hemimethylated CG sites, may be the determinant for maintenance methylation by Dnmt1 recruited through UHRF1. The initiation of the methylation reaction by Dnmt1 might follow the release of UHRF1 from the hemimethylated DNA sequence. Whether UHRF1 determines or enhances the binding specificity and affinity therefore remains to be established. However, the conformation of the unbound SRA domain differs considerably from that of the bound form, suggesting that the initial selection or primary binding of DNA by UHRF1 may be independent of nucleotide sequence. Given this, what is the “a priori” basis for the specific binding of UHRF1 to the hemimethylated DNA sequence? This basic issue has not yet been resolved.
In addition, in vitro enzyme kinetic studies indicated that Dnmt1 not only binds to hemimethylated double-stranded DNA, but may also interact directly with CG-rich single-stranded oligonucleotides in which one CG site is methylated. Furthermore, binding studies showed that Dnmt1 can bind to both unmethylated and fully methylated double-stranded DNA, and can weakly methylate cytosine in the unmethylated DNA duplex. After binding to this type of single-stranded oligonucleotide, or to unmethylated double-stranded DNA, the classical maintenance methylation activity of Dnmt1 can be inhibited. Nonetheless, fully methylated double-stranded DNA can “buffer” the inhibitory effect of unmethylated double-stranded DNA on Dnmt1 that arises from its binding to the unmethylated DNA duplex. Recent studies further demonstrated that non-coding RNA can also bind to the regulatory domain of Dnmt1 and take part in the control of DNA methylation. An explanation for these phenomena has been put forward on the basis of those findings: nucleotide sequences (single-stranded or double-stranded, unmethylated or fully methylated DNA duplex, DNA or RNA), adjacent to or distant from (on the same or another DNA strand) the DNA region bound to the catalytic domain of Dnmt1, might allosterically regulate the methylation reaction through interactions with the regulatory domain of Dnmt1. However, this “allosteric regulatory model” of the enzymatic reactions of Dnmt1 is inferred only from in vitro kinetic experiments; 3D structural information on full-length Dnmt1 in complex with its interacting nucleotide sequences is still lacking. These issues remain to be resolved.
In general, Dnmt1 selectively targets hemimethylated double-stranded DNA to fulfil its intrinsic role in the molecular memory of DNA methylation status after cell proliferation. However, as described above, Dnmt1 can also catalyze DNA methylation in an unmethylated DNA duplex. This activity may be strengthened under allosteric activation by the presence of fully methylated DNA. In addition, further experimental data showed that overexpression of Dnmt1 can hypermethylate the genome globally, and that Dnmt1 may act in a de novo fashion through interaction with Dnmt3a/Dnmt3b [140, 141]. These results further indicate that the selective binding of Dnmt1 to the hemimethylated DNA duplex for maintenance of DNA methylation may be relative. The de novo-like DNA methylation reaction by Dnmt1 could be induced by certain conditions during development or disease. The bimodal character of Dnmt1 in vivo, combining canonical maintenance with an inducible de novo-like property, is essential for the stability of epigenetic inheritance and for the efficiency with which the epigenetic network switches genes on and off.
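The maintenance role described above can be illustrated with a small sketch. It is an idealized model, not a description of any published algorithm: each CG site is represented as a pair of booleans (one per strand), replication yields daughters whose new strand is unmethylated, and an idealized maintenance step methylates only hemimethylated sites. The names `classify`, `replicate`, and `maintain` are hypothetical, and the de novo-like activity of Dnmt1 discussed in this paragraph is deliberately ignored.

```python
from typing import List, Tuple

# One CG site: (top strand methylated, bottom strand methylated).
Dyad = Tuple[bool, bool]


def classify(dyad: Dyad) -> str:
    """Name the methylation state of a single CG site."""
    top, bottom = dyad
    if top and bottom:
        return "fully methylated"
    if top or bottom:
        return "hemimethylated"
    return "unmethylated"


def replicate(dyads: List[Dyad]) -> Tuple[List[Dyad], List[Dyad]]:
    """Semi-conservative replication: each daughter keeps one parental
    strand, and the newly synthesized strand carries no methylation."""
    daughter_a = [(top, False) for top, _ in dyads]
    daughter_b = [(False, bottom) for _, bottom in dyads]
    return daughter_a, daughter_b


def maintain(dyads: List[Dyad]) -> List[Dyad]:
    """Idealized maintenance step (assumption): methylate the new strand
    only at hemimethylated sites and leave unmethylated sites untouched."""
    return [(True, True) if classify(d) == "hemimethylated" else d
            for d in dyads]


parent = [(True, True), (False, False), (True, True)]
daughter_a, daughter_b = replicate(parent)
print([classify(d) for d in daughter_a])
print(maintain(daughter_a) == parent)
```

In this toy model the parental pattern is restored exactly in both daughters, which is the sense in which strict hemimethylation specificity would act as a molecular memory.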
As discussed above, Dnmt3a/Dnmt3b are the key enzymes for de novo DNA methylation; without the assistance of Dnmt3L, however, the DNA methylation patterns of imprinted loci in germ cells could be affected [142, 143]. Cocrystal structures showed that hydrophobic interactions at the interface of Dnmt3a and Dnmt3L may be mediated by Phe residues in both Dnmt3a and Dnmt3L. Moreover, an Asp residue and an Arg residue in Dnmt3L may form polar interactions with an Arg residue and a Glu residue in Dnmt3a, respectively, in the neighborhood of the active-site loop of Dnmt3a. The polar interaction between Dnmt3a and Dnmt3L is further stabilized by an additional intermolecular hydrogen bond and an intramolecular polar interaction. Thus, formation of the Dnmt3a-Dnmt3L interface can shift the active-site loop of Dnmt3a from its native open conformation to the closed state, which increases the catalytic activity of Dnmt3a. Meanwhile, Dnmt3a can also contact itself to generate a Dnmt3a-Dnmt3a interface formed by polar interactions between the Arg and Asp residues of one Dnmt3a and the corresponding Asp and Arg residues of the other. Consequently, two Dnmt3a and two Dnmt3L molecules may form a tetrameric complex with three interfaces: one central Dnmt3a-Dnmt3a homodimer interface and two lateral Dnmt3a-Dnmt3L heterodimer interfaces. The overall Dnmt3L-(Dnmt3a)2-Dnmt3L complex is an elongated (approximately 160 × 60 × 50 Å), butterfly-like structure (Fig. 4). The length of the Dnmt3L-(Dnmt3a)2-Dnmt3L tetramer (about 16 nm) is clearly greater than the diameter of the nucleosome (about 11 nm). This is the structural basis for direct physical interactions between Dnmt3L in the tetramer and the H3 histone tail of the nucleosome. Consequently, Dnmt3a can be targeted to the DNA duplex with the assistance of Dnmt3L.
Unlike Dnmt1, Dnmt3a/Dnmt3b do not discriminate between hemimethylated and unmethylated DNA duplexes, which is why they are designated de novo DNA MTases. Interestingly, in vitro experiments showed that the positions of cytosine methylation catalyzed by Dnmt3a in a long DNA duplex display a periodicity of approximately 10 base pairs (about one DNA helical turn). Analysis of the structural data showed that the two successive active-site loops in the Dnmt3L-(Dnmt3a)2-Dnmt3L tetramer, which face roughly opposite directions, are about 40 Å apart, approximately the length of one helical turn (Fig. 4). This may explain the periodic pattern of DNA methylation catalyzed by Dnmt3a in its target genes.
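As a rough consistency check on the spacing argument above, the following sketch converts between base pairs and distance along the helix axis. The rise of about 3.4 Å per base pair is a textbook value for B-form DNA and is an assumption here, not a figure taken from the cited structures; the function names are hypothetical.

```python
# Textbook B-form DNA geometry (assumed, not taken from the cited structures).
RISE_PER_BP_ANGSTROM = 3.4


def axial_distance(n_bp: float) -> float:
    """Distance in Å along the helix axis spanned by n_bp base pairs."""
    return n_bp * RISE_PER_BP_ANGSTROM


def base_pairs_spanned(distance_angstrom: float) -> float:
    """Number of base pairs covered by a given axial distance in Å."""
    return distance_angstrom / RISE_PER_BP_ANGSTROM


# Values quoted in the text: ~10 bp periodicity, ~40 Å between active sites.
print(f"10 bp span about {axial_distance(10):.0f} Å")
print(f"40 Å cover about {base_pairs_spanned(40.0):.1f} bp")
```

Under this assumption, 10 bp corresponds to about 34 Å and 40 Å to roughly 12 bp, so the observed periodicity and the active-site separation agree only approximately, in keeping with the approximate figures quoted in the text.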
Surprisingly, besides the classical symmetric CG site, Dnmt3a/Dnmt3b can methylate non-CG sites, especially the CA site, although the enzymatic activity is clearly lower than at CG sites [145, 146]. In light of this, the Dnmt3L-(Dnmt3a)2-Dnmt3L complex and histone modification status (methylation or acetylation) could affect the subsequent regulation of various genes via direct interactions between Dnmt3L and histone tails, especially during development. One possibility is that the DNA region where de novo DNA methylation or histone modification occurs lacks symmetric CG sites. Dnmt3a/Dnmt3b could then be forced to use non-CG sites as methylation targets and, together with other epigenetic machinery, synergistically control expression of the gene. Another speculation is that non-CG DNA methylation provides rapid-onset repression of gene expression in early embryonic periods. After the emergence and maturation of other epigenetic mechanisms, non-CG methylation catalyzed by Dnmt3a/Dnmt3b may no longer be required. This issue clearly needs further investigation. Dynamic analysis of the functional development of epigenetic mechanisms, and their detailed comparison during embryogenesis, may provide clues.
Whether mammalian DNA MTases bind only to the CG site in the nucleotide sequence is another interesting question. As with typical protein-DNA interactions, mammalian DNA MTases (i.e. Dnmt1 and Dnmt3a/Dnmt3b) are unlikely to bind exclusively to CG dinucleotides. Evidence showed that changes in the nucleotides adjacent to the central CG site, spanning about 6–12 nucleotides and termed the “flanking sequence”, may modify the catalytic activity of mammalian DNA MTases. It has been suggested that Dnmt3a and Dnmt3b prefer flanking sequences with a purine base on the 5′ side and a pyrimidine base on the 3′ side of the central CG site. However, the structural basis and the fine modulation by the flanking sequence need additional study.
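The flanking preference described above can be expressed as a simple sequence scan. The sketch below is a deliberate simplification: it inspects only the single base immediately 5′ and 3′ of each CG on one strand, whereas the flanking sequence in the text spans about 6–12 nucleotides. The function name and the example sequence are hypothetical.

```python
from typing import List, Tuple

PURINES = set("AG")
PYRIMIDINES = set("CT")


def cg_sites_with_flanks(seq: str) -> List[Tuple[int, str, str, bool]]:
    """For each CG that has a base on both sides, return
    (index of C, 5' neighbour, 3' neighbour, matches_preference).

    matches_preference is True when the 5' neighbour is a purine and the
    3' neighbour is a pyrimidine (simplified one-base reading of the text).
    """
    seq = seq.upper()
    sites = []
    # Start at 1 and stop at len-3 so both neighbours always exist.
    for i in range(1, len(seq) - 2):
        if seq[i:i + 2] == "CG":
            five_prime, three_prime = seq[i - 1], seq[i + 2]
            preferred = five_prime in PURINES and three_prime in PYRIMIDINES
            sites.append((i, five_prime, three_prime, preferred))
    return sites


# Hypothetical example sequence, written 5' to 3'.
for site in cg_sites_with_flanks("TACGTTCCGAAGCGC"):
    print(site)
```

A fuller treatment would score the whole flanking window and both strands; the one-base version is kept here only to make the purine/pyrimidine rule concrete.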
DNA MTases act on a long substrate and can therefore methylate more than one target cytosine without being released from the DNA. In vitro kinetic findings showed that the turnover rate of mammalian DNA MTases is markedly faster than their dissociation rate. This behavior of an enzyme is termed processivity. Dnmt1 is highly processive in methylating a long hemimethylated DNA duplex, especially during DNA replication, when it is bound to PCNA. Recent findings demonstrated that the methylation process is clearly slower than DNA replication in heterochromatic regions, where CG sites are present at high density. This indicates that in heterochromatin, DNA methylation can be impeded and may lag behind DNA replication. To overcome this, Dnmt1 is first released from the replication fork and then rebinds to the newly replicated DNA duplex to catalyze cytosine methylation at unmethylated CG sites. Further studies showed that active lowering of enzyme efficacy might even be applied by Dnmt3a/Dnmt3b. These results suggest that the processivity of Dnmt1 is relative, and that it may be spatially or temporally regulated.
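The relation between turnover and dissociation can be made concrete with a minimal competing-rates model. This model is an assumption introduced for illustration and is not taken from the cited kinetic studies: after each methylation event the bound enzyme either catalyzes again (rate `k_cat`) or dissociates (rate `k_off`), so the number of events per binding follows a geometric distribution. The rate values in the example are hypothetical.

```python
def continuation_probability(k_cat: float, k_off: float) -> float:
    """Probability that a bound enzyme performs another catalytic step
    rather than dissociating, for two competing first-order processes."""
    if k_cat < 0 or k_off <= 0:
        raise ValueError("k_cat must be >= 0 and k_off must be > 0")
    return k_cat / (k_cat + k_off)


def expected_methylations(k_cat: float, k_off: float) -> float:
    """Mean number of methylation events per binding event.

    With continuation probability p, the count is geometric with mean
    p / (1 - p), which simplifies to k_cat / k_off.
    """
    p = continuation_probability(k_cat, k_off)
    return p / (1.0 - p)


# Hypothetical rates in arbitrary units: turnover 9x faster than dissociation.
print(continuation_probability(9.0, 1.0))
print(expected_methylations(9.0, 1.0))
```

In this picture an enzyme whose turnover is much faster than its dissociation is processive, while one with comparable rates behaves distributively; regulation of either rate would shift the balance.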
Dnmt3a and Dnmt3b are distributive and processive enzymes, respectively, although their catalytic domains share nearly 85% amino acid sequence identity. Comparative sequence alignment showed more basic amino acid residues in the catalytic domain of Dnmt3b than in that of Dnmt3a. Thus, the greater positive charge of the Dnmt3b catalytic domain may account for its higher efficiency in grasping negatively charged nucleotides. As a result, Dnmt3a and Dnmt3b can methylate cytosines at certain defined CG sites and can also act across extensive domains in some genes. Through these mechanisms, de novo DNA methylation can be established effectively, and regulated precisely and separately.
In mammalian cells, maintenance and de novo methylation are carried out by Dnmt1 and Dnmt3a/Dnmt3b, respectively, although there may be some functional overlap. Although their primary protein structures are fairly similar, each mammalian DNA MTase is the product of a distinct gene located on a different chromosome. The catalytic domains of mammalian DNA MTases are almost identical. The organized enzymatic reaction of methyl group transfer, which involves flipping the target nucleotide out of the DNA duplex, binding of the Ado-Met cofactor, and multiple steps of covalent and acid-base catalysis that transfer the methyl group from Ado-Met to the C5 position of cytosine, appears to be conserved from bacteria to mammals. A variety of evidence indicates that UHRF1 and Dnmt3L may serve as regulatory adaptors that recruit and target Dnmt1 and Dnmt3a/Dnmt3b to selective and specific regions and methylation states in the DNA sequence; however, the exact interaction between them is unclear. In particular, the primary driving force of the key step, base flipping, which is involved in sequence recognition by either DNA MTases or adaptors (i.e. UHRF1), has not yet been elucidated. In addition, several regions in the regulatory domain may also be related to other epigenetic events, such as histone acetylation, histone methylation, and RNA interference. To date, various inhibitors of mammalian DNA MTases have emerged, applied mainly in cancer therapy. Only cytosine analogues and certain small-molecule compounds have been approved for clinical treatment. The anti-cancer DNA MTase inhibitors merely target or interfere with a general event in the enzymatic process, with no specificity. As a consequence, only overall DNA methylation at the genome level can be adjusted. Furthermore, different DNA MTases can play different roles in the pathogenesis of cancer or other diseases.
Therefore, drugs that target individual DNA MTases with greater specificity need to be designed and characterized.
Partially supported by NIH HL090920, National Natural Science Foundation (30973211) and NSCF key grant, Jiangsu Grant (BK2009122, 08KJB32001), and Suzhou Grant (No: 90134602, SZS0602).