Catalytic modification of bases in nucleic acids is universally observed across the three primary superkingdoms of life and is the basis for a wide range of biological functions.
1 Certain modifications of bases in rRNAs and tRNAs, such as methylation, thiouridylation and pseudouridylation, are traceable to the last universal common ancestor of all life and are essential for survival.
2,3 Other RNA base modifications are more limited in their distribution. For example, wybutosine is found only in archaeal and eukaryotic tRNAs, whereas certain forms of methylation and thiouridylation show even narrower phyletic distributions.
2,3 In contrast, DNA base modifications are apparently less diverse and more sporadic in their distribution; enzymes catalyzing them are not essential in most lineages of life.
4 This difference is potentially attributable to the need to maintain double-helical pairing in DNA and to protect the genetic material from the potentially mutagenic effects of base modifications. 5

The most common DNA modification in prokaryotes, methylation of cytosine or adenine, is primarily catalyzed by methylases encoded by mobile restriction-modification (RM) systems.
4,6 These methylases have a predominantly defensive role in immunizing the host DNA against the activity of the restriction endonucleases, which cleave invading DNA, such as those of bacteriophages.
7,8 In certain prokaryotes, DNA methylation additionally supplies an epigenetic mark for DNA repair.
9 Eukaryotes too possess several distinct DNA cytosine methylases related to the bacterial RM methylases. These have been shown to have a role in chromatin organization, regulatory gene silencing, repression of selfish DNA elements, and possibly other epigenetic processes in several animals, fungi and plants.
10–13 DNA modifications other than methylation are primarily known from caudate bacteriophages and include a spectacular array of modified bases such as 5-hydroxymethylpyrimidines and their mono- or di-glycosylated derivatives, α-putrescinylated or α-glutamylated thymines, sugar-substituted 5-hydroxypentyluracil, and N6-carbamoylmethyl adenines (called Momylation after the mom enzyme of phage Mu).
8,14 These atypical modifications are used by phages in countering the host DNA restriction response. Other DNA base modifications have become apparent in eukaryotes. The simplest of these is deamination of cytosine, which appears to be involved mainly in the diversification of immunity molecules in vertebrates.
15–17 Another well-studied eukaryotic modification is the formation of β-D-glucosyl-hydroxymethyluracil or base J from thymine in euglenozoans, including the parasites Trypanosoma and Leishmania. 18

While enzymes catalyzing several of the major RNA modifications have been biochemically well-characterized and crystallized, fewer DNA-modifying enzymes have been studied in detail. Of the latter, the well-studied ones are DNA methylases, namely the 5-methylcytosine-generating methylases of both bacteria and eukaryotes and the bacterial N6-methyladenine-generating enzymes.
19–22 Additionally, the classic T-even phage modification system, comprising the 5-hydroxymethylcytosine (hmC) synthase and the glucosyltransferases that further modify this base, has been characterized. These studies have revealed that phage 5-hydroxymethylcytosine and 5-hydroxymethyluracil synthases are derived from the classical thymidylate synthases, which are often encoded by several DNA viruses, including these T-even phages.
23 Thus, phage 5-hydroxymethylpyrimidines are not derived by direct DNA modification but by incorporation of pre-modified bases during viral DNA synthesis. On the other hand, the phage DNA base glycosyltransferases that modify the 5-hydroxymethylpyrimidines act directly on hmC in DNA; they belong to either the glycogen synthase/glycogen phosphorylase fold (e.g., alpha-glucosyltransferase and beta-glucosyltransferase) or the Fringe-like glucosyltransferases (e.g., beta-glucosyl-HMC-alpha-glucosyltransferase).
24,25 Likewise, the phage Mu enzyme, Mom, directly modifies adenines in DNA by adding a carbamoylmethyl or a related adduct, and was recently shown to belong to the GCN5-like acetyltransferase fold.
26 Other recent studies have shown that the first step in the synthesis of base J in trypanosomes, i.e., oxidation of the methyl group on thymine to generate 5-hydroxymethyluracil, occurs in situ in DNA. This reaction is catalyzed by JBP1 and JBP2, enzymes of the 2-oxoglutarate- and iron(II)-dependent dioxygenase (2OGFeDO) superfamily, and represents the first example of in situ oxidative modification of methylpyrimidines, in contrast to the T-even phage hmC generation pathways, in which premodified bases are incorporated into DNA. 27,28

Other enzymes of the 2OGFeDO superfamily catalyze a variety of oxidative reactions such as:
(1) Oxidation of carbons in an aromatic ring to generate phenolic groups; e.g., hydroxylases in flavonoid synthesis. 29
(2) Oxidation of aliphatic and alicyclic carbons; e.g., amino acid modifications in proteins, namely hydroxylysine and hydroxyproline, catalyzed respectively by lysyl and prolyl hydroxylases. 30
(3) Ring opening/closing via C-N and C-S bond formation; e.g., isopenicillin synthase. 31
(4) Oxidation of C-C bond in side-chains linked to an aromatic ring; e.g., thymine-7-hydroxylase, which oxidizes thymine to carboxyuracil in the thymidine/uridine salvage pathway in fungi and bacteria. 32 Trypanosome JBP1 and JBP2 also catalyze this class of reaction, albeit on DNA rather than the free base. 27,28
(5) Demethylation of N-CH3 side chains linked to heterocyclic aromatic rings. This is typified by the AlkB family, which functions in DNA repair by reversing methyl adducts on bases (e.g., N6-methyladenine) produced by DNA alkylating agents via complete oxidation of the methyl group to formaldehyde.
33,34 Although both AlkB and JBP1/2 operate on methyl groups on bases in DNA, their catalytic domains are only distantly related. This suggested that the superfamily might contain as yet undetected enzymes that catalyze the oxidative modification of DNA. Most enzymes of this superfamily that act on low-molecular-weight substrates are standalone proteins with compact dioxygenase domains. However, those that act on biopolymers like nucleic acids and proteins are frequently fused to other nucleic-acid- or protein-interacting domains (e.g., the Swi2/Snf2 ATPase module in JBP2, 35 and the MYND finger in Egl-9-like prolyl hydroxylases 34). Alternatively, they contain peculiar conserved inserts within the catalytic domain that help in binding their biopolymer targets (e.g., AlkB).
36 We accordingly hoped to utilize these features as contextual information in a computational protocol to identify potentially novel members of the 2OGFeDO superfamily that catalyze in situ oxidative modifications of nucleic acids.
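The contextual reasoning described above (fusion to nucleic-acid- or protein-interacting domains, or an unusually long catalytic domain suggestive of an insert) can be illustrated with a minimal Python sketch. It is not the protocol used in this work: the domain labels, the length threshold, the `Protein` record and the toy entries are all hypothetical and chosen only to make the logic concrete.

```python
from dataclasses import dataclass
from typing import List

# All labels and numbers below are illustrative assumptions, not values from the text.
CATALYTIC = "2OGFeDO"                                   # hypothetical label for the dioxygenase domain
PARTNER_DOMAINS = {"Swi2/Snf2_ATPase", "MYND_finger"}   # hypothetical partner-domain labels
TYPICAL_CORE_LENGTH = 200                               # assumed length (residues) of a compact catalytic domain
INSERT_MARGIN = 50                                      # assumed extra length that hints at an insert


@dataclass
class Protein:
    name: str
    domains: List[str]      # domain labels detected in the protein
    catalytic_length: int   # residues spanned by the catalytic domain (0 if absent)


def context_flags(protein: Protein) -> List[str]:
    """Return the contextual hints that a 2OGFeDO protein may act on a biopolymer."""
    flags: List[str] = []
    if CATALYTIC not in protein.domains:
        return flags  # not a member of the superfamily, nothing to flag
    if any(d in PARTNER_DOMAINS for d in protein.domains):
        flags.append("fusion")   # fused to an interacting domain
    if protein.catalytic_length > TYPICAL_CORE_LENGTH + INSERT_MARGIN:
        flags.append("insert")   # catalytic domain longer than a compact core
    return flags


# Toy records, invented for this example.
toy_proteins = [
    Protein("toy_fused", ["2OGFeDO", "Swi2/Snf2_ATPase"], 210),
    Protein("toy_insert", ["2OGFeDO"], 300),
    Protein("toy_compact", ["2OGFeDO"], 190),
    Protein("toy_unrelated", ["MYND_finger"], 0),
]

candidates = {p.name: context_flags(p) for p in toy_proteins if context_flags(p)}
print(candidates)
```

In this sketch a protein is retained as a candidate only if it carries the catalytic domain and shows at least one contextual hint; a real analysis would derive the domain annotations from sequence profile searches rather than from fixed labels.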