Human ASXL proteins, orthologs of Drosophila Additional sex combs, have been implicated, in conjunction with TET2, as a major target for mutations and translocations that lead to a wide range of myeloid leukemias and related myelodysplastic conditions (ASXL1 and ASXL2) and to Bohring-Opitz syndrome, a developmental disorder (ASXL1). Using sensitive sequence and structure comparison methods, we show that most animal ASXL proteins contain a novel N-terminal domain that is also found in several other eukaryotic chromatin proteins, diverse restriction endonucleases and DNA glycosylases, the RNA polymerase δ subunit of Gram-positive bacteria and certain bacterial proteins that combine features of the RNA polymerase α subunit and sigma factors. This domain adopts the winged helix-turn-helix (wHTH) fold and is predicted to bind DNA. Based on its domain architectural contexts, we present evidence that this domain might play an important role in both eukaryotes and bacteria in recruiting diverse effector activities to DNA, including the deubiquitinase subunit of Polycomb repressive complexes, depending on the state of epigenetic modifications such as 5-methylcytosine and its oxidized derivatives. In other eukaryotic chromatin proteins, the wHTH domain is fused to a region with three conserved motifs, which are also found in diverse eukaryotic chromatin proteins, such as the animal BAZ/WAL proteins, plant HB1 and MBD9, yeast Itc1p and Ioc3, RSF1, CECR2 and NURF1. Based on the crystal structure of Ioc3, we establish that these motifs, in conjunction with the DDT motif, constitute a structural determinant that is central to nucleosomal repositioning by the ISWI clade of SWI2/SNF2 ATPases.
We also show that the central domain of the ASXL proteins (ASXH domain) is conserved outside of animals, in fungi and plants, where it is combined with other domains. This suggests that it might be an ancient module mediating interactions, via its conserved LXXLL motif, between chromatin-linked protein complexes and transcription factors or UCH37-like deubiquitinases. We present evidence that the C-terminal PHD finger of the ASXL proteins has peculiar structural modifications that might allow it to recognize internal modified lysines other than those at the N terminus of histone H3, potentially making it a mediator of previously unexpected interactions in chromatin.
Covalent modifications of histones and other chromatin proteins are central to the dynamics and organization of eukaryotic chromatin and influence practically all nuclear processes, including replication, recombination, repair, transcription, gene silencing and RNA maturation.1–4 Of the large number of covalent histone modifications, acetylation and methylation were already extensively developed in the last eukaryotic common ancestor and are essential for eukaryotic life.5 Central to the biological significance of these modifications is their role as epigenetic marks that encode a level of information over and beyond the basic genetic information in the genome.2,3,6,7 Alongside these protein modifications, several eukaryotes display a range of DNA modifications that also play an important epigenetic role.8,9 The most prevalent and best-studied of these is DNA cytosine C5 methylation (5-methylcytosine, 5mC). Unlike protein methylation or acetylation, this modification is not universal, showing a patchy distribution across the eukaryotic tree.8,9 Nevertheless, in several eukaryotes, such as vertebrates and plants, DNA methylation has evolved functions that are essential for the survival of the organism.10–14 This appears to stem from the pervasive use of cytosine methylation as an epigenetic mark, especially in preserving germline integrity and in specifying differentiated cell lineages.14–16 More recently, it has become clear that in several eukaryotes 5mC in DNA is the target of further oxidative modifications by the Tet/JBP family of nucleic acid dioxygenases; these modifications act both as epigenetic marks in their own right (e.g., 5-hydroxymethylcytosine) and as intermediates (5-formylcytosine and 5-carboxycytosine) in a pathway leading to the demethylation of 5mC.17–20 Several recent studies have suggested that both the establishment and the subsequent consequences of cytosine methylation are intimately intertwined with the phylogenetically more widespread histone modifications and with the deployment of nucleosomes containing alternative histone paralogs (e.g., H3K9 trimethylation and macroH2A-containing nucleosomes).21–23
Historically, the genetics and molecular biology of the regulation of the homeotic genes in Drosophila played a central role in the identification of chromatin complexes involved in histone modification and nucleosomal reorganization.1 This led to the identification of the polycomb group and trithorax group genes, whose products constitute key complexes involved in catalyzing the addition and removal of multiple histone modifications. Studies on cognates of these genes in vertebrates uncovered a similarly conserved role in development, with germline mutations in these genes resulting in homeotic transformations.2–4,23 In addition, somatic mutations and chromosomal translocations involving these genes in humans have emerged as major players in numerous leukemias, pointing to an important role for epigenetic modifications in various hematopoietic developmental decisions.24–26 Interestingly, somatic mutations and translocations in genes encoding enzymes that catalyze epigenetic DNA modifications, such as Tet2 and DNMT3A, and in metabolic enzymes such as isocitrate dehydrogenase 1/2 (IDH1/2), which are associated with the generation of 2-oxoglutarate, the metabolite used by the Tet/JBP dioxygenases, also play a major role in the initiation and progression of these leukemias.24,27–29 This reinforces the importance of the functional interaction between these two forms of epigenetic information. In particular, the involvement of mutant and fused Tet/JBP dioxygenases suggests that the genetics of these leukemogenic mutations might shed light on the link between the further oxidative processing of 5mC in DNA and the complexes involved in histone modifications.
In this regard, recent studies have shown that ASXL1, one of the three human orthologs of the Drosophila Additional sex combs (Asx) gene,30–33 an enhancer of both the polycomb and trithorax group genes,34 is a major target of mutations leading to myeloproliferative and myelodysplastic conditions, including acute myeloid leukemia, polycythemia vera, essential thrombocythemia and primary myelofibrosis.24,35–40 ASXL1 mutations, along with those in another polycomb group gene, EZH2, are also indicative of poor overall survival in patients with myelodysplastic syndromes, independent of other known risk factors. Furthermore, mutations in the Tet/JBP dioxygenase Tet2 in acute myeloid leukemia have been observed to be closely associated with mutations in ASXL1 but mutually exclusive with IDH mutations.38 This suggests that ASXL1 might have an agonistic role in relation to Tet2. A translocation fusing another Asx paralog, ASXL2, with the histone acetyltransferase gene MOZ has also been implicated in myelodysplastic syndrome.41 Recent studies have also shown that Drosophila Asx and human ASXL1, together with their respective deubiquitinase (DUB) subunits Calypso and BAP1, constitute a histone H2A-specific DUB complex that associates with other polycomb group proteins to regulate the levels of H2AK120 monoubiquitination by polycomb repressive complex 1 (PRC1).30 Consistent with such a role, germline mutations in ASXL1 result in Bohring-Opitz syndrome, whose characteristic developmental defects are reminiscent of those arising from polycomb group dysfunction in vertebrates and Drosophila.42 In line with the Drosophila genetic studies that implicate Asx in both trithorax- and polycomb-related roles, studies in mammals have shown that ASXL1 and ASXL2 can function as coactivators or corepressors for different nuclear receptor transcription factors.43–45 Together, these observations suggest that ASXL proteins serve as a potentially important nexus between regulatory processes involving different protein and DNA epigenetic marks in different aspects of chromatin dynamics.
Thus far, ASXL proteins are known only from animals.46 To better understand their function and to investigate the occurrence of homologous proteins in other organisms, we analyzed these proteins using sensitive sequence and structure analysis methods. As a result, we identified a novel winged helix-turn-helix (wHTH) domain, related to the RNA polymerase δ subunit of Gram-positive bacteria, in most animal ASXL proteins (the Drosophila Asx protein being an exception). We also showed that related wHTH domains are widely distributed in eukaryotes and bacteria, including in association with diverse restriction enzymes. Further analysis of the domain architectures of the ASXL proteins suggested that this domain, in conjunction with other associated domains, might be an ancient functional element that links epigenetic marks with different effector functions in both bacteria and eukaryotes.
Analysis of the vertebrate ASXL proteins using the SEG program reveals that they contain three globular regions. Sequence searches showed that these three regions are conserved through most of the Metazoa (from cnidarians to vertebrates), with the exception of insects and sponges, which appear to have lost the N-terminal globular region (Fig. 1). Of these, the N-terminal globular region showed no apparent relationship to previously characterized domains. The central globular region (termed the ASXH domain), predicted to form an α-helical domain, contains a conserved LXXLL motif, which has been detected previously in diverse transcription factors (TFs), coactivators and corepressors and is implicated in mediating interactions between them.46,47 Indeed, this region has been shown to mediate interactions between the vertebrate ASXL1/2 proteins and nuclear receptor TFs.43,44 The C-terminal globular region corresponds to a derived version of the PHD finger (see below), a peptide-binding module commonly present in chromatin proteins.48,49 To better understand the affinities of the N-terminal globular region of the ASXL proteins, we initiated sequence profile searches using the PSI-BLAST program. For example, a search with the human ASXL N-terminal region (residues 10–100) recovered, prior to convergence, several other proteins in addition to orthologous and paralogous ASXL proteins from animals: the NIAM/TBRG1 protein from fishes, cnidarians and sponges (e = 10⁻⁸, iteration 2), the HB1 homeodomain protein from plants (e = 10⁻⁴, iteration 4), the restriction endonuclease HpyAIII (HgrA) from Helicobacter pylori (e = 10⁻³, iteration 6) and uncharacterized proteins from plants, red algae, chlorophyte algae and bacteria.
Additional transitive searches with the above-detected regions homologous to ASXL1, using the PSI-BLAST and JACKHMMER programs, uncovered numerous additional proteins from bacteria, including the RNA polymerase δ subunit (RpoE) of Gram-positive bacteria and several restriction enzymes. This relationship was further confirmed by profile-profile comparisons with the HHpred program, which recovered the profile derived from the structure of Bacillus subtilis RpoE (PDB: 2krc)50 as the best hit (e = 10⁻¹², p = 95%) when searched with a profile derived from the N-terminal globular regions of animal ASXL proteins. The alignment with RpoE corresponded precisely to its N-terminal globular domain, which adopts a winged helix-turn-helix (wHTH) fold.50 We accordingly named the conserved domain homologous to the N-terminal region of the ASXL proteins the HARE-HTH (for HB1, ASXL, restriction endonuclease HTH domain), after the proteins in which it was detected.
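The three globular regions described above were delimited with the SEG program, which masks low-complexity stretches. As a rough illustration only, the sketch below flags every residue covered by a 12-residue window whose Shannon entropy falls below 2.2 bits and reports the unmasked runs as candidate globular regions. The window length and cutoff are assumptions modeled on commonly used SEG settings, the 30-residue minimum region length is arbitrary, and SEG's own segment-extension and optimization steps are omitted, so this is a simplified stand-in rather than SEG itself.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_mask(seq: str, window: int = 12, cutoff: float = 2.2) -> list:
    """Mark every residue covered by a window whose entropy is below `cutoff`.

    Simplified SEG-like masking (assumption): no extension or optimization step.
    """
    mask = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < cutoff:
            for i in range(start, start + window):
                mask[i] = True
    return mask


def globular_regions(seq: str, window: int = 12, cutoff: float = 2.2,
                     min_len: int = 30) -> list:
    """Return half-open (start, end) spans of unmasked residues >= min_len long."""
    mask = low_complexity_mask(seq, window, cutoff)
    regions, start = [], None
    for i, masked in enumerate(mask + [True]):  # sentinel closes a trailing run
        if not masked and start is None:
            start = i
        elif masked and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions


# Hypothetical toy input: two diverse stretches (no Q) flanking a poly-Q run.
diverse = "ACDEFGHIKLMNPRSTVWY" * 2
toy = diverse + "Q" * 30 + diverse
print(globular_regions(toy))  # [(0, 33), (73, 106)]
```

The masked span extends a few residues into each diverse stretch because windows that straddle the boundary are still dominated by glutamine; this edge behavior is one reason real tools refine segment boundaries.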
Examination of the alignment and its superposition on the structure of RpoE showed that the HARE-HTH domain conforms to the classical three-stranded wHTH template: the three core helices form the HTH unit, with the “wing” sheet formed by the C-terminal hairpin augmented by the extended region between helices 1 and 2.50,51 In structural terms, the HARE-HTH domain is distinguished from previously characterized wHTH domains by a conserved element, consisting of a single helical turn, between helix 3 and the preceding conserved turn (Fig. 2). This element alters the packing of helix 3 against the two N-terminal helices and also distorts its orientation compared with most other wHTH domains. The HARE-HTH domain is further distinguished from other wHTH domains by its distinctive conservation pattern, namely an alcoholic residue (S/T) at the N terminus of helix 1 and an exposed S/T/C in helix 3 (Fig. 2). These two positions are spatially proximal and might define an interaction surface unique to the HARE-HTH domains. Beyond these features, however, the HARE-HTH domains show considerable sequence diversity, which might reflect some diversity in their binding specificities.
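The distinctive HARE-HTH conservation pattern described above (S/T at the N terminus of helix 1 and an exposed S/T/C in helix 3) is the kind of signal that can be read off individual alignment columns. The sketch below computes, for a given column, the fraction of non-gap residues that fall in an allowed residue class and lists the columns passing a threshold. The alignment, column indices and 0.75 threshold are hypothetical toy values, not taken from the actual HARE-HTH alignment.

```python
def class_fraction(alignment, column, allowed):
    """Fraction of non-gap residues in `column` that belong to the `allowed` class."""
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    if not residues:
        return 0.0
    return sum(res in allowed for res in residues) / len(residues)


def conserved_columns(alignment, allowed, threshold=0.75):
    """Indices of columns where the `allowed` residue class reaches `threshold`."""
    width = len(alignment[0])
    return [col for col in range(width)
            if class_fraction(alignment, col, allowed) >= threshold]


# Hypothetical toy alignment (not real HARE-HTH sequences).
toy_alignment = [
    "MSKLLETA",
    "MTRLIECA",
    "LSKVLQSA",
    "MT-LIEAA",
]
print(class_fraction(toy_alignment, 1, set("ST")))   # 1.0
print(class_fraction(toy_alignment, 6, set("STC")))  # 0.75
print(conserved_columns(toy_alignment, set("ST")))   # [1]
```

Gaps are excluded from the denominator here, which is a design choice; counting gaps as non-matching would penalize columns in variable-length regions.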
To better understand the functions of the HARE-HTH domain, we systematically analyzed the domain architectural contexts in which it occurs and integrated this information with data from studies on RNA polymerase structure in bacteria (Fig. 1). In animals, in addition to the ASXL proteins described above, the HARE-HTH is found in the fish, cnidarian and sponge NIAM/TBRG1 proteins (Fig. 1), where it is combined with the SJA/Fyr domain that is frequently encountered in diverse eukaryotic chromatin proteins.5,52,53 The SJA/Fyr domain shares a common fold with the phosphopeptide-binding Polo domain and is predicted to bind epigenetic phosphorylation marks in histones.5 Interestingly, the HARE-HTH has been lost in the tetrapod and insect orthologs of NIAM/TBRG1. In plants and chlorophyte algae (e.g., Arabidopsis HB1 and Chlamydomonas HDZ1), the HARE-HTH is fused to a DNA-binding homeodomain or to a WAC-tyrosine kinase domain, both of which are found in several eukaryotic chromatin proteins.54–56 These proteins also display a DDT motif and additional, previously unrecognized, conserved C-terminal motifs found in several other chromatin proteins (see below). Chlorophytes also contain a second type of HARE-HTH protein, in which the domain is fused to a C-terminal α-helical domain similar to that found at the C terminus of the SMYD-type SET domain lysine methyltransferases, which potentially mediates interactions with targets (Fig. 1). In red algae (e.g., Cyanidioschyzon merolae), the HARE-HTH is linked to two N-terminal PHD fingers, similar to those seen in certain SWI2/SNF2 ATPases of the ISWI clade. Furthermore, the HARE-HTH is also found at the N terminus of the Micromonas ortholog of the DNA-demethylating glycosylase Demeter (gi:255072753), and in two copies flanking a pair of methylated histone-binding Agenet/chromo-like domains in Ostreococcus (gi:308803789).
Thus, in eukaryotes, the HARE-HTH occurs as a rule in multidomain proteins, with architectures strongly indicative of chromatin-related roles (Fig. 1). Among the bacterial versions, the HARE-HTH of Gram-positive RpoE is followed by a highly acidic C-terminal low-complexity tail.57 Certain proteobacteria also contain a version comparable to RpoE that instead has an acidic low-complexity tail at the N terminus. Most remarkable are the proteins found sporadically in actinobacteria, firmicutes and proteobacteria that combine a C-terminal HARE-HTH with (1) an N-terminal module containing two or more repeats of the specialized helix-hairpin-helix (HhH) domain found in the C-terminal domain (CTD) of the bacterial RNA polymerase α subunit,58 and (2) two additional HTH modules that are specifically related to those found in regions 3 and 4 of the sigma factors59 (Fig. 1; Sup. Material). Thus, these proteins combine parts of the architecture of the RNA polymerase α and σ subunits with the HARE-HTH in a single polypeptide (Fig. 1). In diverse bacterial lineages, at least six distinct restriction endonuclease (REase) domains, namely the HpyAIII-like, NmeDI/NmeDIP, Mrr-like, HindIII-like, HNH/EndoVII and URI (UvrC repair protein-intron homing endonuclease) domains, are combined with one or more N-terminal HARE-HTH domains (Fig. 1 and Sup. Material). The majority of these REases are encoded in operons with linked genes for modification methylases (MTases of the adenine N6/cytosine N4-targeting family), suggesting that they might function in conjunction with these MTases (Fig. 1). Interestingly, the HARE-HTH is also fused to the N terminus of a circularly permuted adenine N6 MTase found in certain restriction-modification systems (Fig. 1). In a similar vein, in certain bacteria, the HARE-HTH is fused to the N terminus of a DNA glycosylase domain of the UDG superfamily, which is prototyped by human Tdg and Escherichia coli Mug and Ung (Fig. 1).
Likewise, in certain bacteria, the HARE-HTH is fused to the N terminus of a polynucleotide phosphatase/kinase enzyme with Nudix-fold and P-loop kinase domains.
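The operonic linkage of HARE-HTH REases with modification methylases noted above is the kind of observation that gene-neighborhood analysis produces. The sketch below, under simplifying assumptions, counts which protein families occur within k genes of each copy of a query family across several genomes. It assumes genes are already ordered along the chromosome and labeled with family names (in the actual analysis such labels come from clustering, e.g., with BLASTCLUST), ignores strand and intergenic distance, and uses hypothetical family labels and toy genomes.

```python
from collections import Counter


def neighbor_families(genome, index, k):
    """Family labels of genes within k positions of genome[index], excluding itself."""
    lo = max(0, index - k)
    window = genome[lo:index + k + 1]
    return [fam for pos, (_, fam) in enumerate(window, start=lo) if pos != index]


def neighborhood_counts(genomes, query_family, k=1):
    """Count, per family, how many query-family genes have it as a neighbor."""
    counts = Counter()
    for genome in genomes:
        for i, (_, fam) in enumerate(genome):
            if fam == query_family:
                # set() so a family is counted once per query gene
                counts.update(set(neighbor_families(genome, i, k)))
    return counts


# Hypothetical toy genomes: ordered (gene_id, family) pairs.
genomes = [
    [("a1", "ABC"), ("a2", "MTase"), ("a3", "HARE-REase"), ("a4", "tRNA-syn")],
    [("b1", "HARE-REase"), ("b2", "MTase"), ("b3", "ABC")],
    [("c1", "Ribo"), ("c2", "Ribo"), ("c3", "Ribo"), ("c4", "HARE-REase")],
]
print(neighborhood_counts(genomes, "HARE-REase").most_common(1))  # [('MTase', 2)]
```

Families that recur next to the query across phylogenetically distant genomes are the candidates for conserved neighborhoods; a real analysis would also weigh how often a family appears anywhere in the genome.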
The domain architectures of the bacterial representatives, in particular the stereotypic location of the HARE-HTH at the N termini of multiple phylogenetically distant or structurally unrelated REase and DNA-modifying domains, suggest that it might function as a DNA-binding domain. Given that the target sites of REases and DNA glycosylases are characterized by modified nucleotides or mismatches, it is conceivable that these HARE-HTHs recognize sequences with specific DNA modifications or altered structures. As the HpyAIII REase recognizes GATC sites,60 the binding of its HARE-HTH domain might be affected by modifications of adenine or cytosine residues. The bacterial proteins that combine the RNA polymerase α CTD module and the σ factor region 3 and 4 HTH domains with the HARE-HTH are striking, because the structure of the RNA polymerase holoenzyme bound at the transcription start site (TSS) shows that these modules indeed occupy successive sites on the DNA just upstream of the TSS.61,62 Thus, these proteins are predicted to function as mimics of the α and σ subunits, with the C-terminal HARE-HTH potentially occupying yet another site upstream of the TSS. Accordingly, these proteins could function as novel inhibitors of TSS binding by bacterial RNA polymerase, acting either as negative regulators or as suppressors of improper transcription initiation.
This is comparable to the role of the Gram-positive RNA polymerase δ subunit, which has been shown to bind the RNA polymerase catalytic complex, reduce its affinity for nucleic acids and increase transcription specificity by promoting recycling.57 Specifically, the δ subunit inhibits the downstream propagation of the transcription bubble at the −10 region, with its acidic C-terminal tail mimicking RNA and interacting with the RNA polymerase catalytic complex.63 While the HARE-HTH of the δ subunit has not yet been demonstrated to bind DNA or RNA, it is possible that it recognizes a distorted DNA structure associated with the transcription bubble.57,63 In eukaryotes, the phyletic pattern of the HARE-HTH, like that of the TAM(MBD) domain, is strongly correlated with the presence of extensive gene body methylation (e.g., it is present in several plants and animals but absent in fungi).9,21,22 This, together with its genetic interaction with Tet2 in human myeloid lineage disorders and its presence in an algal Demeter-like DNA demethylase, suggests that, like some of the bacterial versions, the eukaryotic versions might also play a role in discriminating sequences with particular DNA modifications, such as 5mC or its further oxidized derivatives generated by the Tet enzymes. A parallel is provided by the JBP1-C domain, another distinctive HTH domain, which recognizes base J, the hydroxylated thymine derivative generated by the JBP enzymes in kinetoplastids.17,64 In this context, it is notable that the pathological MOZ-ASXL2 fusion deletes part of the N-terminal helix of the ASXL2 HARE-HTH domain.
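To make the earlier point about HpyAIII concrete, namely that a HARE-HTH attached to a GATC-recognizing REase might sense base modifications within its sites, the toy sketch below locates GATC sites and partitions them by whether the top-strand adenine is in a supplied set of modified positions. Because GATC is its own reverse complement, scanning one strand finds every site. The set of modified positions is a hypothetical input, and the sketch models only site bookkeeping, not binding.

```python
import re


def gatc_sites(dna):
    """0-based start positions of GATC sites (palindromic, so one strand suffices)."""
    return [m.start() for m in re.finditer("GATC", dna.upper())]


def partition_sites(dna, modified_adenines):
    """Split GATC sites by whether the top-strand A (offset +1) is in the modified set."""
    result = {"modified": [], "unmodified": []}
    for start in gatc_sites(dna):
        key = "modified" if start + 1 in modified_adenines else "unmodified"
        result[key].append(start)
    return result


toy_dna = "AAGATCTTGATCAA"  # hypothetical sequence with two GATC sites
print(partition_sites(toy_dna, modified_adenines={3}))
# {'modified': [2], 'unmodified': [8]}
```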
The ASXH domain. Thus far, the ASXH domain, i.e., the central globular domain of the ASXL proteins, has only been reported in animals.46 Using sequence profile searches with PSI-BLAST and hidden Markov model searches with the JACKHMMER program, we identified orthologous domains in fungi and plants, suggesting that this domain goes back to the common ancestor of these lineages and animals (Fig. 1 and Sup. Material). The fungal versions are combined with a fungus-specific C-terminal α-helical domain, whereas the plant versions are combined with an N-terminal DNA-binding GATA Zn-finger domain (Fig. 1). An additional ASXH domain is also found in the animal NFRKB/Ino80g protein, a subunit of the INO80-Uch37 chromatin-associated DUB complex. In all these cases, the ASXH domain strongly conserves the LXXLL motif (Sup. Material), suggesting that this motif is central to the interactions mediated by the domain. Given the role of the LXXLL motif in mediating interactions between TFs, coactivators and corepressors,47 the ASXH domain likely plays a conserved role in mediating contacts between transcription factors and chromatin-associated complexes. Interestingly, in the plant versions, the GATA Zn-finger takes the place of the HARE-HTH, suggesting that in both plants and animals, recognition of specific features on DNA might be linked to interactions of the ASXH domain with other proteins. The regions shown to be required for the interaction of Drosophila Asx and human ASXL1 with Calypso and BAP1, respectively, encompass the ASXH domain.30 Given that the ASXH domain is the only common denominator shared by the Drosophila and human proteins in this region, it likely also mediates the interaction of ASXL with these DUBs. Phylogenetic analysis suggests that the Calypso/BAP1 clade of DUBs branched off in animals from within a more widely distributed clade of DUBs typified by the UCHL5/UCH37 proteins (Sup. Material).
This suggests that if the interaction between the ASXL proteins and Calypso/BAP1-like proteins is indeed mediated by the ASXH domain, then it might be an animal-specific feature of the domain. Another potential animal-specific binding partner of the ASXL proteins is prototyped by the Drosophila Tantalus protein,34 whose orthologs we identified for the first time in other metazoans (Sup. Material). The ASXH domain found in fungi and plants might mediate interactions comparable to those of the animal ASXL proteins, such as those with TFs and DUBs of the UCH37-like clade.43,44
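Because the LXXLL motif is the feature conserved across all ASXH domains, a simple pattern scan is a natural first pass over candidate sequences. The sketch below reports every match of the strict L-x-x-L-L pattern, using a regular-expression lookahead so that overlapping matches are not missed. The example sequence is hypothetical, and treating the motif as a strict pattern is an assumption; variants with other hydrophobic residues would need a looser pattern or a profile.

```python
import re

# Strict L-x-x-L-L pattern; the zero-width lookahead reports overlapping matches.
LXXLL = re.compile(r"(?=(L..LL))")


def find_lxxll(seq):
    """Return (0-based start, matched pentapeptide) for every LXXLL occurrence."""
    return [(m.start(), m.group(1)) for m in LXXLL.finditer(seq.upper())]


print(find_lxxll("MKLSELLKLAQLLR"))  # [(2, 'LSELL'), (8, 'LAQLL')]
```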
The unusual PHD finger of the ASXL proteins. The PHD finger is a pan-eukaryotic domain that principally binds the N-terminal tail of histone H3 by recognizing either methylated or unmodified H3K4.5,49,65–68 Certain versions of the latter have evolved further features to recognize methylated H3K9, while others might bind acetylated histones via unconventional interfaces.69,70 The structure of the PHD finger is characterized by a binuclear version of the treble-clef fold, whose typical versions have several special features that specifically allow them to recognize the N-terminal tails of histones. One key feature is the presence of a “bilobed brace” immediately downstream of the C-terminal helix of the core treble-clef structure, which allows only an N-terminal peptide to be accommodated via hydrogen bonding with the ascending arm of the treble-clef β-hairpin.71 The ability to bind methylated lysines is primarily conferred by the conservation of certain aromatic and hydrophobic residues at specific positions in the core treble clef.5,48,64–67 A multiple alignment of the PHD finger of the ASXL proteins with a panel of 150 distinct PHD fingers from diverse eukaryotes shows that it contains a methionine three positions upstream of the third cysteine and an aromatic residue two positions upstream of the only conserved histidine (Fig. 2 and Sup. Material). Based on the known structures, the residues at these positions suggest that the ASXL PHD fingers bind methylated lysines. However, in contrast to practically all other PHD fingers, the ASXL PHD finger lacks the bilobed brace. This suggests that it is not constrained to bind only the N-terminal peptide of histone H3 (Fig. 2 and Sup. Material), a conclusion also supported by the absence of certain other conserved features that recognize conserved residues in the N-terminal peptide of H3.
Hence, we suggest that the PHD finger of ASXL might be distinctive in being able to recognize internal methylated lysines rather than just those in the N-terminal tail of histone H3. This capability might link other internal histone methylations (e.g., H3K27 methylation catalyzed by Polycomb repressive complex 2) to H2AK120 monoubiquitination.5 Unlike in the ASXL proteins, in red algae the HARE-HTH is fused to a conventional PHD finger (Fig. 2), suggesting that in these organisms the action of the HARE-HTH might be coupled with recognition of the modification status of the N terminus of histone H3.
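The two diagnostic positions described above (a methionine three residues upstream of the third cysteine and an aromatic residue two residues upstream of the conserved histidine) can be checked on a single sequence once the zinc ligands are located. The sketch below assumes the canonical C4HC3 ligand order and, as a naive heuristic, treats the first four cysteines as ligands 1 to 4 and the first histidine after the fourth cysteine as the histidine ligand; in practice, ligand positions should come from an alignment. The test sequence is a synthetic toy, not a real PHD finger.

```python
AROMATIC = set("FWY")


def asxl_like_phd_features(seq):
    """Check the two ASXL-type PHD positions on a single sequence.

    Naive ligand assignment (assumption): the first four Cys are ligands 1-4 and
    the first His after Cys4 is the His ligand of a C4HC3 PHD finger.
    """
    cys = [i for i, aa in enumerate(seq) if aa == "C"]
    if len(cys) < 4:
        raise ValueError("expected at least four cysteines")
    c3, c4 = cys[2], cys[3]
    his = seq.find("H", c4 + 1)
    if his < 0:
        raise ValueError("no histidine after the fourth cysteine")
    return {
        "met_3_upstream_of_cys3": c3 >= 3 and seq[c3 - 3] == "M",
        "aromatic_2_upstream_of_his": his >= 2 and seq[his - 2] in AROMATIC,
    }


toy_phd = "CAACAAAAAMAACAACAAAAWAHAACAAC"  # synthetic, not a real PHD finger
print(asxl_like_phd_features(toy_phd))
# {'met_3_upstream_of_cys3': True, 'aromatic_2_upstream_of_his': True}
```

The absence of the bilobed brace cannot be detected this way; it is a structural feature inferred from the alignment against known structures.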
The WHIM motifs. The plant HB1 and chlorophyte HDZ1 proteins and certain other chlorophyte proteins (e.g., Volvox; gi:302828732), which combine either a homeodomain or a WAC-tyrosine kinase domain, together with a DDT motif, with the HARE-HTH, show a distinctive C-terminal region with three conserved motifs separated by low-complexity regions (Fig. 1). Further iterative searches with these motifs using the PSI-BLAST and JACKHMMER programs revealed that they are also present in the Arabidopsis flowering regulator MBD9, where they are combined with a TAM(MBD) domain that binds methylated CpG dinucleotides in DNA.72 These searches also recovered the three motifs in yeast Itc1p, a subunit of the nucleosome repositioning complex Isw2.73 An improved profile including all the above proteins recovered the three motifs in the BAZ/WAL proteins (including the Williams-Beuren syndrome transcription factor), BPTF, the RSF1 subunit of the chromatin assembly factor RSF, Nurf-1, Enhancer of bithorax (Nurf301), Toutatis, CECR2, Dikar and several uncharacterized proteins from fungi, chlorophytes, stramenopiles, ciliates and kinetoplastids (Figs. 1 and 3). In all these proteins, the three motifs were always found in the same collinear order and were preceded by a DDT domain, though in certain cases additional domains, such as the PHD finger, are inserted between the first and second motifs. The second and third motifs partly overlap the regions of conservation named “BAZ1” and “BAZ2” that were defined in the BAZ proteins of animals, i.e., BAZ1A, BAZ1B, BAZ2A and BAZ2B.74,75 However, the similarity between these regions in the BAZ proteins and the other proteins detected in the above searches has not been previously reported. Hence, we named these three motifs the WSTF, HB1, Itc1p, MBD9 (WHIM) motifs 1–3.
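The observation that the DDT domain and WHIM motifs 1–3 always occur in the same collinear order, even when other domains such as a PHD finger are inserted between them, amounts to an ordered-subsequence test on a domain architecture. The sketch below encodes architectures as hypothetical lists of domain names in N-to-C order; the domain labels are illustrative strings, not output from any annotation tool.

```python
WHIM_ORDER = ("DDT", "WHIM1", "WHIM2", "WHIM3")


def is_collinear(architecture, expected=WHIM_ORDER):
    """True if `expected` appears in `architecture` in order, with gaps allowed.

    Each `in` test consumes the shared iterator, so every domain must be
    found after the previous one (an ordered-subsequence check).
    """
    remaining = iter(architecture)
    return all(domain in remaining for domain in expected)


# Hypothetical N-to-C architectures; the first has a PHD inserted between WHIM1 and WHIM2.
print(is_collinear(["WAC", "DDT", "WHIM1", "PHD", "WHIM2", "WHIM3", "Bromo"]))  # True
print(is_collinear(["DDT", "WHIM2", "WHIM1", "WHIM3"]))                          # False
```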
In the Drosophila protein ACF1, the ortholog of BAZ1A, the region including the DDT domain and the three WHIM motifs has been shown to be required for its interaction with SNF2.76,77 Likewise, deletion mapping has shown that in the BAZ2B protein, the conserved region BAZ1, which overlaps with WHIM motif 2, is required for binding SNF2H, the human ortholog of ISWI.75 Consistent with this, the animal BAZ proteins, BPTF, RSF1, Nurf-1, CECR2, Dikar and yeast Itc1p are all partners of ISWI-like SWI2/SNF2 ATPases and considerably increase their nucleosome mobilization capability in diverse nucleosome reorganizing complexes, such as ACF/WCRF, CHRAC, RSF, NoRC, WICH and their cognates in other eukaryotes.73,76–85 Therefore, one likely role of the WHIM motifs, in conjunction with the DDT domain, is to mediate interactions with ISWI-like ATPases.
To better understand the action of the WHIM motifs, we analyzed their sequence conservation features and the domain architectures of the proteins in which they are found. First, a subset of these proteins, such as the animal BAZ1A/B-like proteins and fungal Itc1p, possess an N-terminal WAC tyrosine kinase domain that has been shown to phosphorylate Y142 of H2A.X.56 Second, most of these proteins have multiple domains for the recognition of histone H3 N-terminal peptides (PHD finger), acetylated histone peptides (bromodomains), monoubiquitinated peptides (the “little finger” type Ub-binding Zn-ribbon), phosphorylated peptides (SJA/FYR) and methylated peptides (AGENET, BMB/PWWP and AUX-RF, a novel chromo-like domain)1,5,86,87 (Fig. 1). Third, others, such as HB1 and MBD9 in plants, BPTF, BAZ2A/B and CECR2 in animals and previously uncharacterized proteins in chlorophytes and stramenopiles, contain DNA-binding domains such as the HARE-HTH, histone H1, CENB-HTH, TAM(MBD), homeo, HMG, BRIGHT, CXXC and AT-hooks (Fig. 1). Of these, the TAM(MBD) domain in the plant MBD9 proteins is predicted to specifically bind methylated CpG dinucleotides, whereas that in the animal BAZ2 proteins is unlikely to have specific methylated CpG recognition capabilities.9 The CXXC domain also recognizes the CpG sequence, though most versions prefer unmethylated targets.9 These observations are of interest in light of our above suggestion that the HARE-HTH might discriminate modified DNA.
Thus, a common theme in the WHIM motif proteins appears to be the coupling of ISWI interaction capability with diverse domains involved in discriminating or catalyzing epigenetic modifications of histones, in recognizing specific DNA features, such as inter-nucleosomal linker regions and distorted DNA (e.g., histone H1, HMG, BRIGHT domains and AT-hooks), or in discriminating modified DNA marks (CXXC, TAM/MBD and HARE-HTH).9,88 One group of WHIM motif proteins from certain chlorophyte, rhodophyte and stramenopile algae combines the WHIM motifs with an RFD module, which is found at the N terminus of the DNMT1 methyltransferase.9 The RFD module consists of a circularly permuted version of the Sm domain fused to an HTH domain9 and has been shown to be a key player in heterochromatinization by recruiting repressive proteins such as HDAC2.89 This suggests that in these WHIM motif proteins, it might help recruit nucleosome reorganizing activities to heterochromatin. Another interesting architecture, seen in oomycetes, combines the WHIM motifs with a Werner syndrome-type DNA repair nuclease with 3′–5′ exonuclease and HRDC domains,90 suggesting that in these organisms ISWI-catalyzed chromatin reorganization might be coupled with DNA repair (Fig. 1).
Profile-profile searches using the HHpred program allowed us to establish a statistically significant relationship (e < 10⁻⁷, p = 90%) between the DDT and WHIM motifs and the central region of the yeast Isw1 partner Ioc3 and its paralog Esc8.91 As a result, we were able to construct the following mapping between these motifs and the recently solved crystal structures of the Ioc3 protein (PDB: 2y9z):91 the DDT motif corresponds to region 176–266, the WHIM1 motif to region 367–417, the WHIM2 motif to region 418–448 and the WHIM3 motif to region 455–496 (Fig. 3). The DDT and WHIM motifs 1–3 are dominated by largely α-helical structures. This mapping allowed us to show that the DDT, WHIM1 and WHIM2 motifs pack tightly against each other to form a binding pocket for the trihelical tip of the SLIDE domain of Isw1 (region 994–1,039). Based on this mapping, the highly conserved basic residue in WHIM1 is identified as a key feature involved in packing with the DDT motif, and the acidic residue from the GxD signature of WHIM2 emerges as a major determinant of the interaction between ISWI and its WHIM motif partners (Fig. 3). WHIM3, on the other hand, overlaps with the “helical linker-DNA binding domain” defined in the structure of Ioc3, which, along with the N-terminal portion of WHIM2, contacts the inter-nucleosomal linker DNA in the major groove91 (Fig. 3).
Thus, based on the Ioc3 structure, we infer that the WHIM motifs and the DDT domain function as a unit that acts as a “protein ruler”, setting the spacing between two nucleosomes in conjunction with an ISWI ATPase.91 In light of this, the other domains in these WHIM motif proteins, which recognize either diverse epigenetic protein modifications or particular DNA conformations and modifications, are likely to provide the specific context in which nucleosome spacing is reorganized in different regions of chromatin or in different phases of the cell cycle.92 It would also be of particular interest to investigate whether the associated DNA-binding domains in the WHIM motif proteins influence the sequence dependence of nucleosome positioning. Furthermore, WHIM motif proteins with domains such as the HARE-HTH and the TAM (MBD) could influence nucleosome positioning in gene bodies according to their DNA methylation state (or further modifications such as hmC and its derivatives).
Iterative sequence profile searches were performed using the PSI-BLAST95 and JACKHMMER96 programs run against the non-redundant (NR) protein database of the National Center for Biotechnology Information (NCBI). Similarity-based clustering, for both classification and culling of nearly identical sequences, was performed using the BLASTCLUST program (ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html). The HHpred program was used for profile-profile comparisons.97 Structure similarity searches were performed using the DaliLite program.98 Multiple sequence alignments were built with the Kalign99 and PCMA100 programs, followed by manual adjustments based on profile-profile and structural alignments. Secondary structures were predicted using the JPred program.101 For previously known domains, the Pfam database102 was used as a guide, though the profiles were augmented with newly detected divergent members that the original Pfam models had missed. Clustering with BLASTCLUST, followed by multiple sequence alignment and further sequence profile searches, was used to identify other domains that were not present in the Pfam database. Contextual information from prokaryotic gene neighborhoods was retrieved by a custom Perl script that extracts the genes upstream and downstream of the query gene and uses BLASTCLUST to cluster the encoded proteins in order to identify conserved gene neighborhoods. Phylogenetic analysis was conducted using the approximately maximum-likelihood method implemented in the FastTree 2.1 program under default parameters.103 Structural visualization and manipulations were performed using the PyMOL (www.pymol.org) program. The in-house TASS package, a collection of Perl scripts, was used to automate aspects of the large-scale analysis of sequences, structures and genome context (Anantharaman V, Balaji S and Aravind L, unpublished).
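The gene-neighborhood step described above (extract the genes flanking each query gene, cluster their products, and look for clusters that recur across loci) can be illustrated with a short sketch. This is not the authors' Perl script: it is a hypothetical Python approximation in which BLASTCLUST is replaced by single-linkage clustering over a precomputed list of pairwise hits (assumed to already pass identity and coverage cut-offs), contigs are plain ordered lists of gene IDs, and strand orientation is ignored. All function names, the `window` and `min_fraction` parameters, and the toy data are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def single_linkage(ids: Iterable[str], hits: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Single-linkage clustering: any pair in `hits` (assumed to already pass
    identity/coverage cut-offs) ends up in the same cluster."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in hits:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two clusters
    labels: Dict[str, int] = {}
    # Relabel cluster roots as 0, 1, 2, ... in order of first appearance.
    return {i: labels.setdefault(find(i), len(labels)) for i in parent}


def neighborhood(contig: Sequence[str], query: str, window: int) -> List[str]:
    """Up to `window` genes upstream and downstream of `query` on an ordered
    contig (strand orientation is ignored in this simplification)."""
    i = list(contig).index(query)
    return list(contig[max(0, i - window):i]) + list(contig[i + 1:i + 1 + window])


def conserved_neighbors(contigs: Sequence[Sequence[str]], queries: Sequence[str],
                        cluster_of: Dict[str, int], window: int = 2,
                        min_fraction: float = 0.5) -> Dict[int, int]:
    """Count, for each protein cluster, how many query neighborhoods contain it,
    keeping clusters seen in at least `min_fraction` of the neighborhoods."""
    counts: Counter = Counter()
    for contig, query in zip(contigs, queries):
        # A set, so each cluster is counted at most once per neighborhood.
        counts.update({cluster_of[g] for g in neighborhood(contig, query, window)})
    n = len(queries)
    return {c: k for c, k in counts.items() if k / n >= min_fraction}


# Toy example: three contigs, each carrying a homolog (x1, x2, x3) of the query.
contigs = [["a1", "x1", "b1", "c1"], ["b2", "x2", "a2", "d2"], ["e3", "x3", "f3"]]
queries = ["x1", "x2", "x3"]
hits = [("a1", "a2"), ("b1", "b2")]  # hypothetical pairwise matches above threshold
cluster_of = single_linkage([g for c in contigs for g in c], hits)
conserved = conserved_neighbors(contigs, queries, cluster_of, window=1)
# conserved -> {0: 2, 2: 2}: the a- and b-clusters flank the query in 2 of 3 loci
```

Single linkage is used here because it is simple and transitive (a chain of pairwise matches joins one cluster); a real pipeline would also need strand-aware neighborhoods and the score thresholds of the actual clustering tool.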
The identification of the HARE-HTH adds yet another link to the growing connection between components of mobile prokaryotic selfish elements, in particular restriction-modification (R-M) systems or bacteriophages and eukaryotic chromatin proteins.9 These extensive connections include:
The selective pressures acting on prokaryotic selfish elements as a consequence of constant genetic conflicts with their hosts appear to have fostered the diversification of methylated and non-methylated DNA-binding domains, DNA-reorganizing ATPases, MTases and REases among them. The above observations suggest that eukaryotes have exploited this diversity by acquiring such domains through lateral transfer and incorporating them into eukaryotic chromatin-based regulatory systems. These acquisitions appear to have happened at different points during eukaryotic evolution:9 the SWI2/SNF2 ATPases were acquired prior to the last eukaryotic common ancestor and had already proliferated considerably by then. In contrast, the main cytosine methylation systems and the DNA hydroxylases of the Tet/JBP family appear to have emerged from one or more independent acquisitions much later in eukaryotic evolution.17 Similarly, the HARE-HTH appears to be a later acquisition, probably dating to the time when cytosine methylation was already well established.
In eukaryotes, the HARE-HTH shows an interesting pattern of conservation. Although it occurs in certain protein architectures that emerged at the base of either the animal or the green plant lineage, it appears to have been sporadically lost from particular orthologs, as seen in the ASXL and NIAM/TBRG1 proteins from certain animal lineages (Fig. 1). Likewise, it appears to have been inserted into the middle of certain proteins in particular lineages, as in the chlorophyte WSTF-like proteins with the WAC-tyrosine kinase (Fig. 1). None of the architectures containing the HARE-HTH are conserved throughout the eukaryotic lineages in which they are present. However, related proteins containing the other domains with which the HARE-HTH is associated, e.g., the ASXH domain or the DDT-WHIM motif proteins, can have a much broader distribution. This suggests that in different lineages the HARE-HTH has been incorporated into the architectures of more ancient chromatin proteins, possibly enabling them to acquire the capacity to discriminate particular DNA modifications. Thus, its evolutionary mobility appears to have been a notable factor in linking transcription regulation, histone modification and nucleosome spacing to features of DNA during the diversification of the eukaryotic crown group.
The co-involvement of ASXL1, ASXL2 and Tet2 in myeloid dysfunction suggests that the HARE-HTH might be an important feature linking DNA modifications, such as 5mC and its oxidized derivatives hmC, fC and caC, with the action of the Polycomb repressive complex (PRC). Furthermore, the unusual features identified in the PHD finger of the ASXL proteins suggest that recognition of modified lysines other than those at the N terminus of histone H3 might be an important feature in the action of the PRC. Our identification of the WHIM motifs and their mapping (along with the DDT motif) onto the structure of the Ioc3 protein91 point to a common structural denominator that couples the SLIDE domain of ISWI ATPases to their primary functional partners, which, in conjunction with the ATPase C-terminal regions, function as a “protein ruler” for nucleosome spacing. Thus, the presence of alternative paralogous WHIM motif proteins, along with their architectural diversity within and between organisms, might be an important factor in establishing alternative nucleosome positioning states in diverse biological contexts.
Work by the authors is supported by the intramural funds of the National Library of Medicine.
No potential conflicts of interest were disclosed.