The Broad Swath of Classic Seven Beta Strand Methyltransferases
Seven beta strand enzymes (also referred to as “Class I” methyltransferases) appear to make up the majority of methyltransferases across organisms [14]. This group includes the mammalian de novo and maintenance DNA methyltransferases [3], the Dot1 histone lysine methyltransferase [19], and the HEN1 microRNA methyltransferase [8], all enzymes known to play roles in epigenesis. Remarkably, sequence similarity is shared among methyltransferases as diverse as the Saccharomyces cerevisiae enzyme active on small molecules (Tmt1), the Mycoplasma arthritidis enzyme active on DNA (HhaI), the Arabidopsis thaliana enzyme active on lipids (UbiE), the human enzyme active on protein (PCMT1), and even the Bos taurus enzyme active on inorganic arsenite (AS3MT). Despite their vastly different methylation substrates, primary sequence similarity was found in small regions of these proteins before any structural information was available [20].
In Figure 1, we give the histone lysine methyltransferase Dot1 as an example of this class of enzyme [19]. Enzymes in this class share a common seven-stranded twisted beta sheet with a C-terminal beta hairpin, sandwiched between alpha helices [15]. Four signature motifs are present (I, Post I, II, and III) [13]. Residues of Motif I and Motif Post I contact AdoMet. The conserved aspartate residues in these motifs are key to stabilizing charged AdoMet species and form hydrogen bonds with two different locations on the cofactor, positioning the methyl group for transfer. The last residues of β4 and β5, which make up portions of Motifs II and III, respectively, form part of the catalytic domain and can bind the methyl-accepting substrate [13]. A few enzymes in this class deviate from this structural core, most notably the protein arginine methyltransferase PRMT1, which lacks β6 and β7 [15], and the plant DRM enzymes, whose motifs are circularly permuted [18].
Figure 1 Seven structural strategies for methyltransferases. The methyltransferase domain of a representative member of each topologically distinct family, created in PyMOL, is shown on the left. Beta strands are indicated by yellow arrows, alpha helices and coils …
Some proteins in this superfamily contain conserved sequences between Motifs II and III that are specific to the methyl-accepting substrate. For instance, the “DPPY” motif is seen in several N-methyltransferases active on sp2-hybridized nitrogen atoms in adenosine or glutamine residues [15], while the “EE” motif is present in protein arginine methyltransferases [23]. Insertions and deletions in the core structure have also been found to reflect substrate identity [16]. The yeast histone H3K79 methyltransferase Dot1, originally discovered for its role in telomeric silencing [24], has several basic residues in its N-terminal domain that bind nucleosomes [19]; the same stretch is seen in the C-terminal domain of human Dot1 [26]. To date, Dot1 is the only non-SET histone lysine methyltransferase (see below) and, interestingly, the only histone lysine methyltransferase that methylates within the globular domain of histones [27].
Initially, in silico searches for novel methyltransferases were performed with BLAST, using known methyltransferase sequences as probes against protein databases. The discovery of the Hmt1/Rmt1 protein arginine methyltransferase is an example of the success of this approach [28].
The shift from whole-sequence comparisons to motif-based searches has led to the generation of a comprehensive list of putative seven beta strand (Class I) methyltransferases [21]. Katz et al. used MEME [30] to build position-based amino acid frequency matrices, or profiles, of Motifs I and Post I [31] from multiple alignments of known methyltransferases and used these profiles in a comprehensive MAST [32] search of the genome [31]. As a result, the search draws on information from multiple methyltransferases rather than on pairwise amino acid similarity alone (as captured by the 20×20 “BLOSUM 62” substitution matrix used in BLAST searching). Methyltransferase domain identification was further refined by Ansari et al., who aligned sequences using additional secondary structure information [33]. In their database search, the authors used hidden Markov model (HMM) profiles that account not only for log-odds amino acid frequencies but also for the frequencies of insertions and deletions, which model gaps in the alignment. HMM profiles can be created from large superfamily reference sets (such as all Class I methyltransferases) to identify a general list of proteins, or alternatively can be generated from a specific subclass of proteins to restrict the search. Ansari et al. specifically identified O-, N-, and C-methyltransferases in a non-redundant database using HMM profiles of the methyltransferase domain of polyketide synthase (PKS) and nonribosomal peptide synthetase (NRPS) [33]. However, such global profiles spanning the entire methyltransferase domain assign penalties for mismatches in the regions between motifs, which may leave true, previously unidentified methyltransferases undetected.
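To make the idea of a position-based profile concrete, here is a minimal Python sketch of a log-odds position-specific scoring matrix (PSSM) built from aligned, ungapped motif instances and slid along a query sequence. It is not MEME's or MAST's implementation: it assumes a uniform background of 1/20 per residue and a pseudocount of 1, and the Motif I-like training strings and the query sequence are hypothetical toy data. The helper names `build_pssm` and `best_window` are illustrative only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(sites, pseudocount=1.0):
    """Build a log-odds PSSM from ungapped, equal-length motif instances.

    Assumes a uniform background frequency of 1/20 for every residue."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("motif instances must be ungapped and equal in length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for i in range(width):
        column = [s[i] for s in sites]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        # log2(smoothed observed frequency / background frequency) per residue
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm, sequence):
    """Slide the PSSM along a protein sequence; return (best score, start index)."""
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # unknown residues (e.g. 'X') receive the lowest score in that column
        score = sum(col.get(aa, min(col.values())) for col, aa in zip(pssm, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# Hypothetical Motif I-like training instances (toy data, not a real alignment)
sites = ["VLDVGCGTG", "VLDLGCGTG", "ILDIGSGSG", "VLDVGAGTG"]
pssm = build_pssm(sites)
score, start = best_window(pssm, "MKKAAVLDVGCGTGSSLL")
print(start, round(score, 2))
```

Each column contributes log2(f/p), where f is the smoothed frequency of a residue at that motif position in the reference set and p is its background frequency, so residues that recur across many known enzymes add positive score. This is the sense in which a profile search uses information from multiple methyltransferases rather than a single general-purpose substitution matrix.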
More recently, a new approach combining motif-based searches with HMM profile-profile local alignments was used to overcome some of these computational hurdles [13]. Independent matrices describing all of the motifs, including II and III, were developed through better motif identification, either from solved methyltransferase structures or from HMM profile alignments of primary sequence and predicted secondary structure [13] generated with the program HHpred [34]. Additionally, a novel program, Multiple Motif Scanning (MMS) [101], was used to rank the proteins in the yeast database by their sequence similarity to the methyltransferase motif profiles. Here, the position-based matrices were entered into MMS, which includes a parameter for the conserved number of amino acids between motifs and outputs the highest-scoring combinations of multiple best-fit motifs [13]. The success of this program depends on the input matrices; matrices derived from different methyltransferase reference sets produced slightly different rankings and sets of putative methyltransferases. MMS is advantageous for proteins that contain multiple ungapped motifs, such as methyltransferases, because, unlike HMM profiles, it does not allow insertions within a motif. For example, when we used HHpred to search the yeast proteome with HMM profiles constructed from the same motif sequences employed in Ref. 13, we did not detect a number of putative yeast methyltransferases (YMR209C, YLR063W, YKL155C, and YNL092W) that were identified using MMS (see ). On the other hand, as shown in , HHpred did find YKL162C and YLR137W, which were overlooked by MMS and had not been reported previously [13]. Together, these results highlight the importance of combining these methodologies to create a comprehensive list of putative methyltransferases.
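The property attributed to MMS above, scoring several ungapped motifs together while constraining the number of residues between them, can be illustrated with a small dynamic program. This is a hypothetical sketch, not the MMS algorithm: the toy motif definitions (each position lists the residues accepted there), the +1/-1 scoring, and the (min, max) spacing windows are assumptions chosen for readability.

```python
from typing import List, Optional, Tuple

# Each motif position lists the residues accepted there (hypothetical toy notation)
Motif = List[str]


def window_score(motif: Motif, window: str) -> int:
    """+1 for each accepted residue, -1 otherwise; no insertions inside a motif."""
    return sum(1 if aa in allowed else -1 for allowed, aa in zip(motif, window))


def best_motif_chain(sequence: str, motifs: List[Motif],
                     gaps: List[Tuple[int, int]]) -> Optional[Tuple[int, List[int]]]:
    """Best ordered placement of all motifs in one sequence.

    gaps[k] = (min, max) residues allowed between the end of motif k and the
    start of motif k + 1. Returns (total score, start positions) or None."""
    n = len(sequence)
    best: List[List[Optional[int]]] = []  # best[k][s]: best score of motifs 0..k with motif k at s
    back: List[List[Optional[int]]] = []  # back[k][s]: start of motif k - 1 on that best path
    for k, motif in enumerate(motifs):
        scores: List[Optional[int]] = [None] * n
        pointers: List[Optional[int]] = [None] * n
        for s in range(n - len(motif) + 1):
            here = window_score(motif, sequence[s:s + len(motif)])
            if k == 0:
                scores[s] = here
                continue
            lo, hi = gaps[k - 1]
            prev_width = len(motifs[k - 1])
            # placements of the previous motif whose spacer length lies in [lo, hi]
            candidates = [(best[k - 1][p], p) for p in range(n)
                          if best[k - 1][p] is not None
                          and lo <= s - (p + prev_width) <= hi]
            if candidates:
                prev_score, p = max(candidates)
                scores[s] = prev_score + here
                pointers[s] = p
        best.append(scores)
        back.append(pointers)

    finals = [(score, s) for s, score in enumerate(best[-1]) if score is not None]
    if not finals:
        return None
    total, s = max(finals)
    starts = [s]
    for k in range(len(motifs) - 1, 0, -1):  # trace back through earlier motifs
        s = back[k][s]
        starts.append(s)
    return total, starts[::-1]


# Hypothetical toy motifs and spacing window (not MMS's real parameters)
motif_a = ["VIL", "DE", "L", "G", "G", "G"]
motif_b = ["VIL", "VIL", "D"]
print(best_motif_chain("MSVELGGGAAAAKLIDPP", [motif_a, motif_b], [(3, 10)]))
```

Because insertions are never allowed inside a motif, a sequence scores well only if each motif occurs intact; all flexibility is confined to the spacers between motifs, which mirrors the contrast with HMM profiles drawn in the text.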
To determine the potential biological function of the candidate methyltransferases, it would be valuable to identify their methyl-accepting substrates. Bioinformatic approaches for identifying substrates have so far had a mixed record of success. A widespread approach has been to compare protein sequences across species to reveal homologs. In fact, phylome databases such as PhylomeDB contain compilations of phylogenetic trees that can be used to assess the orthologous and paralogous relationships of a given protein [35]. More recently, sequence similarity networks have been used as a high-throughput method for substrate prediction [36]. Sequence similarities within or outside of the methyltransferase motifs that reflect substrate recognition allow enzymes acting on similar substrates to cluster together. HMM profile-profile comparisons with a stringent E-value cutoff (E < 10^-20) separated yeast methyltransferases into protein arginine methyltransferase, protein glutamine methyltransferase, wybutosine-forming transferase, 2′-O-ribose methyltransferase, cytosine 5-methyltransferase, and small molecule/lipid methyltransferase clusters [13]. Although not all yeast methyltransferases clustered in this protocol, useful information can be gained when a putative methyltransferase is grouped with enzymes of known catalytic activity.
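Clustering with a stringent E-value cutoff can be sketched as building a graph in which proteins are nodes and an edge is drawn whenever a pairwise comparison falls below the cutoff, then reading off the connected components. The sketch below assumes single-linkage connected components (implemented with a union-find structure) and takes the E-values as given, for example from profile-profile comparisons; the protein labels and E-values in the example are hypothetical.

```python
def cluster_by_evalue(pairs, cutoff=1e-20):
    """Single-linkage clustering of a sequence similarity network.

    pairs: iterable of (protein_a, protein_b, e_value). Two proteins are joined
    by an edge when e_value < cutoff; clusters are the connected components."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b, e_value in pairs:
        root_a, root_b = find(a), find(b)  # also registers both proteins as nodes
        if e_value < cutoff and root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), []).append(node)
    # largest clusters first; members sorted for a stable, readable output
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Hypothetical proteins and E-values, for illustration only
pairs = [
    ("A", "B", 1e-45),
    ("B", "C", 3e-22),
    ("C", "D", 1e-5),
    ("D", "E", 2e-30),
    ("A", "E", 0.1),
    ("F", "A", 1.0),
]
print(cluster_by_evalue(pairs))
```

With single linkage, one strong comparison is enough to join two groups, so a stringent cutoff such as 10^-20 helps keep enzymes of different substrate specificity apart, at the cost of leaving weakly related proteins as singletons.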
A Different SET of Protein Lysine Methyltransferases
Sequence similarity between the plant Rubisco protein lysine methyltransferase and three Drosophila proteins involved in epigenetics, Suppressor of position-effect variegation 3-9 (Su(var)3-9), Enhancer of zeste (E(z)), and Trithorax (Trx), led to the discovery of the SET family, named for the initials of these three proteins […]. This family includes a number of histone lysine methyltransferases involved in transcriptional control through modification of chromatin structure [38]. These proteins contain the SET domain, which consists of eight curved beta strands arranged into three sheets and forms a characteristic pseudoknot structure. An example of this domain is shown for the MLL1 histone lysine methyltransferase in Figure 1 [39]. The SET proteins share sequence similarity in the N-terminal (N-SET) and C-terminal (C-SET) domains, which contain the residues responsible for catalysis, cofactor binding, and substrate interaction. The first two motifs reside in N-SET, while the last two lie in C-SET and form the knot-like structure () [26]. AdoMet binds to Motif I, N-terminal residues of Motif III, and a tyrosine in Motif IV [26]. Interestingly, the “GxG” sequence of Motif I interacts with AdoMet, as does the “GxGxG” sequence of the seven beta strand Motif I, despite the lack of any overall structural similarity between SET and seven beta strand methyltransferases [42]. The catalytic site is located on the opposite side of the enzyme and includes a key catalytic tyrosine residue in Motif II. Interactions with the lysine substrate occur in a hydrophobic pocket formed by the remaining portions of Motifs III and IV [26]. It has been hypothesized that variability within this domain defines the substrate; in fact, residues in C-SET are integral in determining whether an enzyme carries out mono-, di-, or trimethylation. Point mutations in the “Y/F switch” have successfully converted SET7/9 into a di-/trimethyltransferase [43], SET8 into a dimethyltransferase [44], and Dim5 into a monomethyltransferase [45].
Amino acid sequences between the N-SET and C-SET domains are highly variable within the SET superfamily and have been dubbed the I-SET region. I-SET residues can interact with the substrate [46]. In fact, several non-histone methyltransferases have an “SRA” motif in the I-SET domain [48]. However, I-SET is not always indicative of the bound ligand: two pairs of enzymes, SUV39H1 and SETDB1, and SET7/9 and MLL, have non-homologous I-SET regions despite sharing identical substrates [46]. Many SET proteins also have Pre-SET and Post-SET domains containing several conserved cysteine residues that coordinate zinc ions in triangular clusters. The function of these domains is not clear, although Post-SET seems to shape the channel for the lysine substrate. In enzymes that lack the cysteine-rich Post-SET domain, such as SET7/9 and the Rubisco methyltransferase, additional alpha helices are oriented to create this channel [26]. Several SET methyltransferases also have additional domains, such as PWWP, PHD, and SANT, that appear to recruit the chromatin substrate [49].
Yeast candidate SET-domain methyltransferases, including YHL039W and YBR030W (now Rkm3), were identified by Porras-Yakushi et al. through reiterative PSI- and PHI-BLAST searches [48]. However, owing to the inherent nature of BLAST searching, these searches did not produce a single list of SET methyltransferases; instead, two self-contained “subfamilies” of proteins were found [48]. When we now search with HHpred using SET protein sequences compiled from the SMART 6 database [50], we can produce a single list of all of these methyltransferase proteins (). Additionally, MMS searches using only MEME-derived matrices of Motifs I-IV built from the same reference dataset also identified all of these proteins, confirming their identification through the SET domain (data not shown).
The “subfamilies” described by Porras-Yakushi et al. [48] ultimately distinguished between what have been described as Class I-VI histone and Class VII non-histone methyltransferases, based on their substrate specificity (). Initially, four classes of SET proteins were discovered through BLASTP searches and ClustalW clustering analysis of the Arabidopsis genome using the Drosophila genes E(z) (Class I, H3K27), Ash1 (Class II, H3K36), Trx (Class III, H3K4), and Su(var)3-9 (Class V, H3K9) [51]. Springer et al. expanded this analysis to include other genomes and an updated SET protein list [52]. These authors identified Class IV proteins, which contain a PHD finger but lack Pre-SET and Post-SET domains, as well as several proteins with an extended I-SET region, dubbed the disrupted S-ET proteins. The S-ET proteins were later divided into two classes: Class VI histone methyltransferases and Class VII non-histone methyltransferases, whose substrates include Rubisco, cytochrome c, and ribosomal proteins [52]. This classification of SET proteins by family or methylation substrate may need to be expanded as new methylation sites and SET proteins are discovered. In fact, methylation of H1K26 and H4K20 has recently been discovered in mammalian cells [46].
We have now created individual HMM profiles of each SET class from the reference set of proteins in Ref. 53 and performed an HHpred search against the complete yeast protein database (). Every one of the twelve yeast SET methyltransferases fell into its appropriate class: the H3K36-methylating Set2 in Class II, the H3K4-methylating Set1 in Class III, Set3 and Set4 (which contain PHD domains) in Class IV, the interrupted-domain proteins Set5 and Set6 in Class VI, and, lastly, the ribosomal protein methyltransferases Rkm1-4, YHL039W, and the cytochrome c methyltransferase Ctm1 in Class VII (). Interestingly, the substrates of the yeast Set3, Set4, Set5, and Set6 proteins are not known. These proteins are likely to be histone lysine methyltransferases, but it will be important to confirm this tentative identification experimentally.
SPOUTing Additional RNA Methyltransferases
The SPOUT methyltransferase family was first described on the basis of primary sequence and predicted secondary structure similarities between the bacterial SpoU and TrmD methyltransferases [55]. This new methyltransferase topology became apparent with the solved structures of RrmA and RlmB [56], which revealed a characteristic knot distinct from that of SET methyltransferases. To date, SPOUT methyltransferases have been found to methylate only RNA; members of this family may therefore methylate RNA species involved in epigenesis. The core structure consists of a beta sheet of five parallel beta strands in 5-3-4-1-2 order between two layers of helices. An example of this structure is given for TrmH in Figure 1 [58]. A partial Rossmann-like fold similar to that of the seven beta strand (Class I) methyltransferases is formed by the first two N-terminal strands; this region can vary through additional alpha/beta units. Unlike in the seven beta strand (Class I) enzymes, AdoMet binds to the C-terminal alpha/beta “trefoil” knot that characterizes the SPOUT superfamily [12].
Primary sequence similarity is not very strong among members of this superfamily, which is largely defined by its tertiary structure [12]. Nonetheless, common motifs have been described. Motif 1 is not widely conserved among all subclasses of SPOUT methyltransferases but contains amino acids integral to tRNA binding, release of AdoHcy, and catalysis [59]. The latter residues of β3 bind AdoMet (here termed Motif Post 1; ). Although the topology of SPOUT methyltransferases is distinct from that of the seven beta strand (Class I) and SET enzymes, Motif 2 of the SPOUT domain shares several features with both of these classes: the glycine-rich coil following β4 binds both the tRNA substrate and AdoMet, and its glutamyl residue is catalytic, much like the asparagine/aspartate in β4 of the seven beta strand (Class I) enzymes and the asparagine in SET Motif III [12]. Motif 3, originally described as the coil preceding β5 [59], can be expanded to include an extended helix with a catalytic tyrosine and is involved in AdoMet binding and catalysis [12] (). The active site is created upon dimerization, and additional catalytic residues of SPOUT methyltransferases are family-specific and depend on whether the enzyme dimerizes in an antiparallel or perpendicular mode. Like SET proteins, several SPOUT methyltransferases have additional domains flanking the SPOUT domain, including, not surprisingly, THUMP, OB-fold, L30e, and PUA domains, which are associated with nucleic acid binding or modification [12].
Tkaczuk et al. have also used similar computational techniques to identify new SPOUT methyltransferases [12]. Crystal structures of known SPOUT methyltransferases were collected and used to search the PDB with DALI for proteins with similar structures [12]. PSI-BLAST searches seeded with different members of COG families were performed on a non-redundant database to discover previously unidentified putative SPOUT methyltransferases, which were corroborated by secondary structure predictions [12]. HMM profiles of aligned sequences were created and searched with HHpred to identify as many protein families as possible with even remote similarity to the SPOUT domain; these proteins were further validated by reciprocal searches and fold-recognition methods [12]. These methods identified the known yeast methyltransferases Trm10, Mrm1, and Trm3 as well as the putative methyltransferases Emg1, YGR283C, and YMR310C. The crystal structure of Emg1 later confirmed its predicted SPOUT fold [60]. We have also used these methods to predict YOR021C as an additional putative yeast SPOUT methyltransferase ().
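One simple way to implement a reciprocal-search check like the validation step mentioned above is to search each candidate back against the database and keep it only if its top hit is a known member of the query family. The function below, its E-value cutoff of 10^-3, and the example identifiers are hypothetical; Tkaczuk et al. may have applied different or additional criteria.

```python
def reciprocal_check(candidates, reverse_hits, family_members, cutoff=1e-3):
    """Keep a candidate only if its back-search returns, as the top hit
    (lowest E-value), a known family member with E-value below cutoff.

    reverse_hits maps each candidate to a list of (hit_id, e_value) pairs."""
    confirmed = []
    for cand in candidates:
        hits = reverse_hits.get(cand, [])
        if not hits:
            continue  # nothing found on the way back, so the candidate cannot be confirmed
        top_hit, top_evalue = min(hits, key=lambda hit: hit[1])
        if top_hit in family_members and top_evalue < cutoff:
            confirmed.append(cand)
    return confirmed


# Hypothetical identifiers and E-values, for illustration only
family_members = {"spoutA", "spoutB"}
candidates = ["cand1", "cand2", "cand3", "cand4"]
reverse_hits = {
    "cand1": [("spoutB", 1e-12), ("rossmannX", 1e-4)],
    "cand2": [("rossmannX", 1e-15), ("spoutA", 1e-8)],
    "cand3": [("spoutA", 0.5)],
}
print(reciprocal_check(candidates, reverse_hits, family_members))
```

Here only `cand1` survives: `cand2` finds a non-family protein first on the way back, `cand3` hits the family only weakly, and `cand4` has no reverse hits at all.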
The pairwise PSI-BLAST searches performed by Tkaczuk et al. revealed a core “supercluster” of five COG families along with four satellite clusters, all of which are 2′-O-methyltransferases [12]. Therefore, proteins such as Escherichia coli YibK, LasT, and YfiF were predicted to be 2′-O-ribose methyltransferases [12]. Interestingly, we find that yeast Tan1, currently annotated as a putative tRNA acetyltransferase, has high similarity by HHpred to one of these satellite clusters (COG1818; E-value = 1.6 × 10^-20, P-value = 2.8 × 10^-24), suggesting that it may be a 2′-O-ribose methyltransferase as well (). Additionally, enzymes responsible for m1G and m3U methylation form independent clusters that are distinct from the other COG groups [12]. Such analyses may thus reveal the substrate specificity of a putative methyltransferase.
Hitting Three Methyltransferase Superfamilies with One Stone
Although most methyltransferases belong to the seven beta strand, SET domain, and SPOUT families, a number of these enzymes have other types of structural folds. Interestingly, the crystal structure of a single enzyme, cobalamin-dependent methionine synthase (MetH), has given insight into three additional distinct classes of AdoMet-binding methyltransferases [62]. This enzyme uses the methyl group of N5-methyltetrahydrofolate to produce methionine from homocysteine through a methylcob(III)alamin intermediate. These classes include the MetH reactivation domain, the homocysteine methyltransferases, and the radical SAM methyltransferases.
AdoMet binds to the reactivation domain; its methyl group is then transferred to the oxidized B12 cofactor on a separate domain [62]. The unique arrangement of this AdoMet-binding domain can best be described as a twisted central beta strand surrounded by several shorter antiparallel beta strands, forming two perpendicular sheets (Figure 1) [63]. AdoMet binds to the helices and coils in the middle of this C-shaped structure, specifically the acidic residue of α2, RLAEAF in α6, the RPAPG coil following α7, and a C-terminal aromatic residue [63]. Interestingly, we find that the AdoMet-binding domain of MetH does not show homology to any yeast protein by sequence analysis with HHpred (). It is presently unclear whether this domain architecture is used in any other methyltransferase reactions, although we did not find any homologs with fold-recognition programs based on automated modeling (MODELLER [64]) or threading (PHYRE [65]).
The second methyltransferase domain illuminated by methionine synthase is the homocysteine-binding domain. Our HHpred searches using this N-terminal domain of MetH as a probe against the yeast protein database detect the yeast homocysteine methyltransferase family proteins Mht1 and Sam4 with very high sequence similarity (). Additional searches of the yeast proteome with the homocysteine COG group confirm this observation (). Mht1 and Sam4 catalyze the same homocysteine-to-methionine reaction as MetH but use AdoMet or S-methylmethionine as methyl donors [66]. The sequence similarity of these enzymes suggests that their overall structures are similar as well. The homocysteine-binding domain of MetH is composed of a beta barrel of eight parallel strands (Figure 1) [67]. A zinc ion is also bound to this structure; in MetH it functions to draw the cobalamin closer to the catalytic domain as well as to activate the thiol for nucleophilic attack. The metal is coordinated with tetrahedral geometry by three cysteines following β6 (GXNC) and β8 (GGCC), with the fourth ligand being either the substrate homocysteine or a nitrogen- or oxygen-containing side chain from β7 (N in the case of MetH) [67]. Interestingly, the latter half of this domain is homologous to YMR321C. It is unclear whether YMR321C is a methyltransferase; the AdoMet-binding domain of these proteins remains to be determined.
Finally, the cobalamin-binding domain of MetH is often present in proteins that also include the “radical SAM” domain [69]. Radical SAM enzymes generally form methionine and the 5′-deoxyadenosyl radical from AdoMet, and crystal structures have revealed a TIM barrel domain (Figure 1) [70]. These proteins are distinguished by their CxxxCxxC motif, which binds an iron-sulfur cluster necessary for radical generation. Although many members of this family catalyze non-methyltransferase reactions (typically involving the 5′-deoxyadenosyl radical formed by one-electron transfer to AdoMet), at least several members are known to participate in methylation reactions, even though the mechanisms of these transfers are still unclear [71]. These include the florfenicol/chloramphenicol resistance protein (Cfr), the fortimicin methyltransferase (fmrO), and the fosfomycin methyltransferase (Fom3) [72]. Radical SAM methyltransferases are difficult to distinguish from other radical SAM enzymes by sequence analysis. This was highlighted by our HHpred searches of the yeast proteome using multiple alignments of radical SAM methyltransferases found in the RefSeq database [73] (). This search identifies apparent non-methyltransferases, including the Bio2 biotin synthase; the C-terminal portion of Elp3, a histone acetyltransferase thought also to be involved in histone demethylation initiated by the 5′-deoxyadenosyl radical; Lip5, involved in lipoic acid biosynthesis; and Tyw1, in the wybutosine pathway (). Further work will be needed to determine whether there are additional methyltransferases in the radical SAM family in other organisms.
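The CxxxCxxC signature of radical SAM enzymes is easy to express as a regular expression, which is handy for a first-pass scan of a proteome. This minimal sketch assumes only the simple notation used in the text (uppercase one-letter residue codes and x for any residue) and uses a made-up query sequence. As the paragraph above stresses, a match suggests a radical SAM protein, not necessarily a methyltransferase.

```python
import re


def motif_to_regex(motif):
    """Translate a simple motif string into a compiled regular expression.

    Supports only uppercase one-letter residue codes and 'x' for any residue
    (no PROSITE brackets or repeat counts)."""
    body = "".join("." if ch == "x" else re.escape(ch) for ch in motif)
    # a zero-width lookahead lets overlapping occurrences all be reported
    return re.compile("(?=(" + body + "))")


def find_motif(motif, sequence):
    """Return (0-based start, matched segment) for every occurrence of motif."""
    return [(m.start(), m.group(1)) for m in motif_to_regex(motif).finditer(sequence)]


# Hypothetical query sequence, for illustration only
print(find_motif("CxxxCxxC", "MACPKYCSACLL"))
```

The same helper accepts other motifs written in this notation in the text, such as “GxGxG” or “RHPxYxG”.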
Two Additional Distinct Classes of Methyltransferases – Structures to be Determined
Although their three-dimensional structures are currently unknown, membrane-bound methyltransferases share no sequence homology with structurally solved methyltransferases. Biochemical studies of the isoprenylcysteine carboxyl methyltransferase Ste14 have led to a topology model describing its structure as six membrane spans, two of which form a helical hairpin [75]. The conserved region A contains the motif RHPxYxG, which is followed by a hydrophobic stretch ending in two conserved adjacent glutamates in region B. This C-terminal domain, in which five of six point mutations lead to a loss of function, is conserved not only in isoprenylcysteine carboxyl methyltransferases but also in phospholipid methyltransferases. Interestingly, BLAST [75] and HHpred searches with Ste14 yield the yeast phospholipid methyltransferases Opi3 and Cho2 as well as several fatty acid/steroid reductases and the C-terminal residues of the ergosterol biosynthetic enzymes Erg4p and Erg24p. However, when we searched the database using multiple alignments of the proteins in the PEMT family in the Pfam database [76], this list included only Opi3, Ste14, and Cho2 ().
Evidence has been presented for a final class of methyltransferases, represented by enzymes that modify the N6 position of adenosine in mRNA [77]. The yeast Ime4 protein appears to be in this group. Weak sequence similarity is found with the “DPPY” motif located between Motifs II and III of some Class I seven beta strand N-methyltransferases. Our HMM analysis using the Ime4 protein sequence as a probe against the yeast database indicates that this family includes Kar4 and YGR001C (). Further work will be needed to confirm the methyltransferase activity of these proteins.