Identification and characterization of the HARE-HTH domain.
Analysis of the vertebrate ASXL proteins using the SEG program reveals that they contain three globular regions. Sequence searches showed that these three regions are conserved through most of metazoa (from cnidarians to vertebrates), with the exception of insects and sponges, which appear to have lost the N-terminal globular region. Of these, the N-terminal globular region showed no apparent relationship to previously characterized domains. The central globular region (termed the ASXH domain), predicted to constitute an α-helical domain, contains a conserved LXXLL motif, which has been detected previously in diverse transcription factors (TFs), coactivators and corepressors and is implicated in mediating interactions between them.46,47
Indeed, this region has been shown to mediate interactions between the vertebrate ASXL1/2 proteins and nuclear receptor TFs.43,44
The C-terminal globular region corresponds to a derived version of the PHD finger (see below), a peptide-binding module commonly present in chromatin proteins.48,49
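SEG-type analysis delimits globular regions by masking compositionally biased (low-complexity) stretches and keeping what remains. The sketch below is a simplified stand-in, assuming a plain Shannon-entropy sliding window rather than SEG's actual two-stage trigger/extension algorithm; the window size, entropy cutoff, minimum segment length and toy sequence are illustrative assumptions, not the parameters used for the ASXL analysis.

```python
import math
from collections import Counter

# Illustrative parameters (assumed, not the values used in the study):
WINDOW = 12        # residues per sliding window
CUTOFF = 2.2       # entropy in bits below which a window counts as low-complexity
MIN_GLOBULAR = 30  # shortest unmasked stretch reported as a candidate globular region


def window_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_mask(seq: str, window: int = WINDOW, cutoff: float = CUTOFF) -> list[bool]:
    """Mark every residue covered by at least one window whose entropy is below cutoff."""
    mask = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < cutoff:
            for i in range(start, start + window):
                mask[i] = True
    return mask


def globular_segments(seq: str, min_len: int = MIN_GLOBULAR) -> list[tuple[int, int]]:
    """Half-open (start, end) intervals of unmasked stretches at least min_len long."""
    mask = low_complexity_mask(seq)
    segments, start = [], None
    for i, masked in enumerate(mask + [True]):  # trailing sentinel closes the last run
        if not masked and start is None:
            start = i
        elif masked and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


# Toy sequence (hypothetical): two diverse blocks separated by a poly-Q tract.
block = "ACDEFGHIKLMNPQRSTVWY" * 2
toy = block + "Q" * 30 + block
print(globular_segments(toy))  # [(0, 35), (75, 110)]
```

The masked interval extends a few residues beyond the poly-Q tract because windows straddling the boundary are already compositionally biased; real SEG output shows the same kind of edge effect, which is why domain boundaries are usually refined by alignment.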
To better understand the affinities of the N-terminal globular region of the ASXL proteins, we initiated sequence profile searches using the PSI-BLAST program. For example, a search with the human ASXL N-terminal region (residues 10–100) recovered, in addition to orthologous and paralogous ASXL proteins from animals, several other proteins: the NIAM/TBRG1 protein from fishes, cnidarians and sponges (e = 10⁻⁸, iteration 2), the HB1 homeodomain protein from plants (e = 10⁻⁴, iteration 4) and the restriction endonuclease HpyAIII (HgrA) from Helicobacter pylori (e = 10⁻³, iteration 6), as well as uncharacterized proteins from plants, red algae, chlorophyte algae and bacteria, before convergence. Additional transitive searches with the above-detected regions homologous to ASXL1, using the PSI-BLAST and JACKHMMER programs, uncovered numerous additional proteins from bacteria, including the RNA polymerase δ subunit (RpoE) of Gram-positive bacteria and several restriction enzymes. This relationship was further confirmed by profile-profile comparisons with the HHpred program: when searched with a profile derived from the animal ASXL N-terminal globular regions, it recovered the profile derived from the structure of Bacillus subtilis RpoE (PDB: 2krc)50 as the best hit (e = 10⁻¹², p = 95%). The alignment with RpoE corresponded precisely to its N-terminal globular domain, which adopts a winged helix-turn-helix (wHTH) fold.50
We accordingly named the conserved domain homologous to the N-terminal region of the ASXL proteins the HARE-HTH, for HB1, ASXL, restriction endonuclease HTH domain, based on the proteins in which it was detected.
Figure 1 Domain architectures, gene neighborhoods and contextual network graph of the HARE-HTH domain and WHIM motifs. Standard abbreviations are used for domain names. LF ZnR refers to the little finger-type zinc ribbon that binds Ub. X and Y, respectively, refer …
Examination of the alignment and its superposition on the structure of RpoE showed that the HARE-HTH domain conforms to the classical three-stranded wHTH template: the three core helices form the HTH unit, and the “wing” sheet is formed by the C-terminal hairpin augmented by the extended region between helices 1 and 2.50,51
In structural terms, the HARE-HTH domain is distinguished from other previously characterized wHTH domains by a conserved element, consisting of a single helical turn, between helix 3 and the preceding conserved turn. This element alters the packing of helix 3 against the two N-terminal helices and also distorts its orientation relative to that seen in most other wHTH domains. The HARE-HTH domain is also distinguished from other wHTH domains by a distinctive conservation pattern, namely an alcoholic residue (S/T) at the N terminus of helix 1 and an exposed S/T/C in helix 3. These two positions are spatially proximal and might define an interaction surface unique to the HARE-HTH domains. Beyond these features, however, the HARE-HTH domains show considerable sequence diversity, which might also reflect some diversity in their binding specificities.
Figure 2 Multiple sequence alignment (A) and structure (B) of the HARE-HTH domain and multiple sequence alignment of the PHD finger in ASX proteins (C). For the multiple alignments, proteins are denoted by their gene names, species names and GenBank index (GI) …
Inferring functions of the HARE-HTH domain based on domain architectures and other contextual information.
To better understand the functions of the HARE-HTH domain, we systematically analyzed the domain architectural contexts in which it occurs and integrated this information with data from structural studies of bacterial RNA polymerase. In animals, in addition to the ASXL proteins described above, the HARE-HTH is found in the fish, cnidarian and sponge NIAM/TBRG1 proteins, where it is combined with the SJA/Fyr domain that is frequently encountered in diverse eukaryotic chromatin proteins.5,52,53
The SJA/Fyr shares a common fold with the phosphopeptide-binding Polo domain and is predicted to bind epigenetic phosphorylation marks in histones.5
Interestingly, the HARE-HTH has been lost in the tetrapod and insect orthologs of NIAM/TBRG1. In plants and chlorophyte algae (e.g., Arabidopsis HB1 and Chlamydomonas HDZ1), the HARE-HTH is fused to a DNA-binding homeodomain or to a WAC-tyrosine kinase domain, which are found in several eukaryotic chromatin proteins.54–56
These proteins also display a DDT motif and additional, previously unrecognized conserved C-terminal motifs found in several other chromatin proteins (see below). Chlorophytes also contain a second type of HARE-HTH protein, in which the domain is fused to a C-terminal α-helical domain similar to that found at the C terminus of the SMYD-type SET domain lysine methyltransferases, which potentially mediates interactions with targets. In red algae (e.g., Cyanidioschyzon merolae) the HARE-HTH is linked to two N-terminal PHD fingers, similar to those seen in certain SWI2/SNF2 ATPases of the ISWI clade. Furthermore, the HARE-HTH is also found at the N terminus of the Micromonas ortholog of the DNA-demethylating glycosylase Demeter (gi:255072753), and in two copies flanking a pair of methylated histone-binding Agenet/chromo-like domains in Ostreococcus (gi:308803789). Thus, in eukaryotes the HARE-HTH occurs, as a rule, in multidomain proteins with architectures strongly indicative of chromatin-related roles. Among the bacterial versions, the HARE-HTH of Gram-positive RpoE is accompanied by a highly acidic C-terminal low-complexity tail.57
Certain proteobacteria also contain a version comparable to RpoE that instead has an acidic low-complexity tail at the N terminus. Most remarkable are the proteins found sporadically in actinobacteria, firmicutes and proteobacteria that combine a C-terminal HARE-HTH with (1) an N-terminal module containing two or more repeats of the specialized helix-hairpin-helix (HhH) domain found in the C-terminal module of the bacterial RNA polymerase α-subunit (CTD)58 and (2) two additional HTH modules specifically related to those found in regions 3 and 4 of the sigma factors59 (Sup. Material). Thus, these proteins combine parts of the architecture of the RNA polymerase α and σ subunits with the HARE-HTH in a single polypeptide. In diverse bacterial lineages, at least six distinct restriction endonuclease (REase) domains, namely the HpyAIII-like, NmeDI/NmeDIP, Mrr-like, HindIII-like, HNH/EndoVII and URI (UvrC repair protein-intron homing endonuclease) domains, are combined with one or more N-terminal HARE-HTH domains (Sup. Material). The majority of these REases are encoded in operons with linked genes for modification methylases (MTases of the adenine N6/cytosine N4-targeting family), suggesting that they might function in conjunction with them. Interestingly, the HARE-HTH is also fused to the N terminus of a circularly permuted adenine N6 MTase found in certain restriction-modification systems. In a similar vein, in certain bacteria the HARE-HTH is fused to the N terminus of a DNA glycosylase domain of the UDG superfamily, prototyped by human Tdg and Escherichia coli Mug and Ung. Likewise, in certain bacteria it is fused to the N terminus of a polynucleotide phosphatase/kinase enzyme with Nudix-fold and P-loop kinase domains.
The domain architectures of the bacterial representatives, in particular the stereotypic location of the HARE-HTH at the N termini of multiple phylogenetically distant or structurally unrelated REase and DNA-modifying domains, suggest that the HARE-HTH might function as a DNA-binding domain. Given that REase and DNA glycosylase target sites are characterized by modified nucleotides or mismatches, it is conceivable that these HARE-HTHs recognize sequences with specific DNA modifications or altered structures. As the HpyAIII REase recognizes GATC sites,60
the binding of its HARE-HTH domain might likewise be affected by modifications of adenine or cytosine residues. The bacterial proteins that combine the RNA polymerase α CTD module and the σ factor region 3 and 4 HTH domains with the HARE-HTH are striking, because the structure of the RNA polymerase holoenzyme in complex with promoter DNA shows that these modules occupy successive sites on the DNA just upstream of the transcription start site (TSS).61,62
Thus, these proteins are predicted to function as mimics of the α and σ subunits, with the C-terminal HARE-HTH potentially occupying yet another site upstream of the TSS. Accordingly, these proteins could act as novel inhibitors of RNA polymerase binding at the TSS, functioning either as negative regulators or as suppressors of improper transcription initiation. This is comparable to the role of the Gram-positive RNA polymerase δ subunit, which has been shown to bind the RNA polymerase catalytic complex, reduce its affinity for nucleic acids and increase transcription specificity by promoting recycling.57
Specifically, the δ subunit inhibits the downstream propagation of the transcription bubble at the −10 region, with its acidic C-terminal tail mimicking RNA and interacting with the RNA polymerase catalytic complex.63
While the HARE-HTH of the δ subunit has not yet been demonstrated to bind DNA or RNA, it is possible that it recognizes a distorted DNA structure associated with the transcription bubble.57,63
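The HpyAIII recognition site discussed above, GATC, is a palindrome, so a single top-strand scan locates every site. The sketch below finds GATC sites and labels each one according to whether its adenine appears in a set of modified positions; that set is hypothetical input (standing in for, e.g., experimental modification calls), and treating the adenine as the base of interest is an assumption for illustration only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_sites(dna: str, motif: str = "GATC") -> list[int]:
    """0-based top-strand start positions of motif (overlapping hits allowed)."""
    dna = dna.upper()
    k = len(motif)
    return [i for i in range(len(dna) - k + 1) if dna[i:i + k] == motif]


def classify_sites(dna: str, modified_adenines: set[int], motif: str = "GATC") -> dict[int, str]:
    """Label each site by whether its adenine is in modified_adenines.

    modified_adenines is hypothetical input: 0-based top-strand positions that
    some upstream assay has called as modified.
    """
    a_offset = motif.index("A")
    return {
        i: "modified" if i + a_offset in modified_adenines else "unmodified"
        for i in find_sites(dna, motif)
    }


# GATC is its own reverse complement, so one top-strand scan finds every site.
assert reverse_complement("GATC") == "GATC"
print(classify_sites("TTGATCAAGGATCC", {3}))  # {2: 'modified', 9: 'unmodified'}
```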
In eukaryotes, the phyletic pattern of the HARE-HTH, like the TAM(MBD) domain, is strongly correlated with the presence of extensive gene body methylation (e.g., presence in several plants and animals but absence in fungi).9,21,22
This, in conjunction with its genetic interaction with Tet2 in human myeloid lineage disorders and its presence in an algal Demeter-like DNA demethylase, suggests that, like some of the bacterial versions, the eukaryotic versions might also play a role in discriminating sequences with particular DNA modifications, such as 5mC or its further oxidized derivatives generated by the Tet enzymes. A parallel is also provided by the JBP1-C domain, another distinctive HTH domain, which recognizes base J, the hydroxylated thymine derivative generated by the JBP enzymes in kinetoplastids.17,64
In this context, it is notable that the pathological MOZ-ASXL2 fusion results in deletion of part of the N-terminal helix of the ASXL2 HARE-HTH domain.
Other conserved domains associated with the HARE-HTH in eukaryotes.
The ASXH domain.
Thus far the ASXH domain, i.e., the central globular domain of the ASXL proteins, has only been reported in animals.46
Using sequence profile searches with PSI-BLAST and hidden Markov model searches with the JACKHMMER program, we identified orthologous domains in fungi and plants, suggesting that this domain dates back to their common ancestor (Sup. Material). The fungal versions are combined with a fungus-specific C-terminal α-helical domain, whereas the plant versions are combined with an N-terminal DNA-binding GATA Zn-finger domain. An additional ASXH domain is also found in the animal NFRKB/Ino80g protein, a subunit of the INO80-Uch37 chromatin-associated DUB complex. In all these cases, the ASXH domain strongly conserves the LXXLL motif (Sup. Material), suggesting that it is central to the interactions mediated by this domain. Given the role of the LXXLL motif in mediating interactions between TFs, coactivators and corepressors,47
it is likely that the ASXH domain plays a conserved role in mediating contact between transcription factors and chromatin-associated complexes. Interestingly, in the plant versions, the GATA Zn-finger takes the place of the HARE-HTH, suggesting that in both plants and animals, recognition of specific features on DNA might be linked to interactions of the ASXH domain with other proteins. The regions shown to be required for the interaction of the Drosophila Asx and human ASXL1, respectively, with Calypso and BAP1 encompass the ASXH domain.30
Given that the ASXH domain is the only common denominator shared by the Drosophila and human proteins in this region, it is likely that it also mediates the interaction of ASXL with these DUBs. Phylogenetic analysis suggests that the Calypso/BAP1 clade of DUBs branched off in animals from within a more widely distributed clade of DUBs typified by the UCHL5/UCH37 proteins (Sup. Material). This suggests that if the interaction between the ASXL proteins and Calypso/BAP1-like proteins is indeed mediated by the ASXH domain, it might be an animal-specific feature of the domain. Another potential animal-specific binding partner of the ASXL proteins is prototyped by the Drosophila Tantalus protein,34
whose orthologs we identified for the first time in other metazoans (Sup. Material). The ASXH domain found in fungi and plants might mediate interactions comparable to those of the animal ASXL proteins, such as those with TFs and DUBs of the UCH37-like clade.43,44
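Locating the LXXLL motif that the ASXH domain conserves is a simple pattern match. The sketch below assumes protein sequences in one-letter code (alignment gaps "-" are stripped), reads X as any residue, and uses a crude presence test rather than checking that the motif occupies the same aligned column in every ortholog; the toy sequences are hypothetical.

```python
import re

# L, then any two residues (X), then LL; the lookahead reports overlapping hits too.
LXXLL = re.compile(r"(?=(L[A-Z]{2}LL))")


def lxxll_sites(seq: str) -> list[int]:
    """0-based start positions of LXXLL motifs in the ungapped sequence."""
    ungapped = seq.replace("-", "").upper()
    return [m.start() for m in LXXLL.finditer(ungapped)]


def conserved_in_all(seqs: dict[str, str]) -> bool:
    """Crude presence check: does every sequence carry at least one LXXLL motif?"""
    return all(lxxll_sites(s) for s in seqs.values())


print(lxxll_sites("MKLSELLQRLAALLK"))  # [2, 9]
```

A column-aware version would map each hit back to alignment coordinates before comparing sequences; the presence test is kept here only because it needs no alignment.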
The unusual PHD finger of the ASXL proteins.
The PHD finger is a pan-eukaryotic domain that principally binds the N-terminal tail of histone H3 by recognizing either a methylated or unmodified H3K4.5,49,65–68
Certain versions of the latter have evolved further features to recognize methylated H3K9, while yet others might bind acetylated histones via unconventional interfaces.69,70
The structure of the PHD finger is characterized by a binuclear version of the treble-clef fold, whose typical versions have several special features that specifically allow them to recognize the N-terminal tails of histones. One of these key features is the presence of a “bilobed brace” immediately downstream of the C-terminal helix of the core treble-clef structure, which allows only an N-terminal peptide to be accommodated via hydrogen bonding with the ascending arm of the treble-clef β-hairpin.71
The ability to bind methylated lysines is primarily conferred by the conservation of certain aromatic and hydrophobic residues at certain positions in the core treble clef.5,48,64–67
A multiple alignment of the PHD finger of the ASXL proteins with a panel of 150 distinct PHD fingers from diverse eukaryotes shows that it contains a methionine three positions upstream of the 3rd cysteine and an aromatic residue two positions upstream of the only conserved histidine (Sup. Material). Based on the known structures, the residues at these positions suggest that the ASXL PHD fingers bind methylated lysines. However, in contrast to practically all other PHD fingers, the ASXL PHD finger lacks the bilobed brace. This suggests that it lacks the constraints that limit binding to the N-terminal histone H3 peptide (Sup. Material). This is also supported by the lack of certain other conserved features that recognize conserved residues in the N-terminal peptide of H3. Hence, we suggest that the PHD finger of ASXL might be distinctive in being able to recognize internal methylated lysines rather than just those in the N-terminal tail of histone H3. It is possible that this function might have a role in linking other internal histone methylations (e.g., H3K27 methylation catalyzed by the Polycomb repressive complex 2) to H2AK120 monoubiquitination.5
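The two positional features used above to infer methyl-lysine binding can be checked mechanically once a PHD finger has been excised. The sketch below assumes an ungapped excerpt that starts at the first zinc-ligating cysteine and contains a single histidine, so that counting cysteines from the start gives the 3rd cysteine of the alignment; both are simplifications of a real alignment, and the toy sequence is a hypothetical C4HC3-like string, not an ASXL sequence. The check says nothing about the bilobed brace.

```python
from typing import Optional

AROMATIC = set("FWY")  # aromatic side chains counted here (His deliberately excluded)


def nth_index(seq: str, residue: str, n: int) -> Optional[int]:
    """0-based index of the n-th (1-based) occurrence of residue, or None."""
    seen = 0
    for i, aa in enumerate(seq):
        if aa == residue:
            seen += 1
            if seen == n:
                return i
    return None


def phd_methyl_lysine_features(phd: str) -> dict[str, bool]:
    """Check the two positional features noted for the ASXL PHD finger.

    Assumes phd is an ungapped excerpt starting at the first zinc-ligating Cys
    and containing a single His; both are simplifications of an alignment.
    """
    phd = phd.upper()
    cys3 = nth_index(phd, "C", 3)
    his = phd.find("H")  # -1 if absent
    return {
        "met_3_before_cys3": cys3 is not None and cys3 >= 3 and phd[cys3 - 3] == "M",
        "aromatic_2_before_his": his >= 2 and phd[his - 2] in AROMATIC,
    }


toy_phd = "CKICGRMEDCSRCGKFRHPECLKKCQEC"  # hypothetical C4HC3-like toy, not ASXL
print(phd_methyl_lysine_features(toy_phd))
# {'met_3_before_cys3': True, 'aromatic_2_before_his': True}
```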
Unlike the ASXL proteins, in red algae the HARE-HTH is fused to a conventional PHD finger, suggesting that in these organisms the action of the HARE-HTH might be coupled with recognition of the modification status of the N terminus of histone H3.
The WHIM motifs.
The plant HB1 and chlorophyte HDZ1 proteins and certain chlorophyte proteins (e.g., Volvox; gi: 302828732), which, respectively, combine either a homeo- or a WAC-tyrosine kinase domain and a DDT motif with the HARE-HTH, show a distinctive C-terminal region with three conserved motifs separated by low-complexity regions. Further iterative searches with these motifs using the PSI-BLAST and JACKHMMER programs revealed that they are also present in the Arabidopsis flowering regulator MBD9, where they are combined with a TAM(MBD) domain that binds methylated CpG dinucleotides in DNA.72
These searches also recovered these three motifs in the yeast Itc1p, a subunit of the nucleosome repositioning complex Isw2.73
An improved profile including all of the above proteins recovered the three motifs in the BAZ/WAL proteins (including the Williams-Beuren syndrome transcription factor), BPTF, the RSF1 subunit of the chromatin assembly factor RSF, Nurf-1, Enhancer of bithorax (Nurf301), Toutatis, CECR2, Dikar and several uncharacterized proteins from fungi, chlorophytes, stramenopiles, ciliates and kinetoplastids. In all these proteins, the three motifs always occur in the same collinear order and are preceded by a DDT domain, although in certain cases additional domains, such as the PHD finger, can be inserted between the first and the second motifs. The second and third motifs partly overlap the regions of conservation named “BAZ1” and “BAZ2” that were defined in the BAZ proteins of animals, i.e., BAZ1A, BAZ1B, BAZ2A and BAZ2B.74,75
However, the similarity between these regions in the BAZ proteins and the other proteins detected in the above searches has not been previously reported. Hence, we named these three motifs the WSTF, HB1, Itc1p, MBD9 (WHIM) motifs 1–3. In the Drosophila protein ACF1, the ortholog of BAZ1A, the region including the DDT domain and the three WHIM motifs has been shown to be required for its interaction with SNF2.76,77
Likewise, deletion mapping has shown that in the BAZ2B protein, the conserved region BAZ1, which overlaps with WHIM motif 2, is required for binding SNF2H, the human ortholog of ISWI.75
Consistent with this, the animal BAZ proteins, BPTF, RSF1, Nurf-1, CECR2, Dikar and the yeast Itc1p are all partners of ISWI-like SWI2/SNF2 ATPases and considerably increase their nucleosome mobilization capability in diverse nucleosome reorganizing complexes, such as ACF/WCRF, CHRAC, RSF, NoRC, WICH and their cognates in other eukaryotes.73,76–85
Therefore, one of the likely roles of the WHIM motifs, in conjunction with the DDT domain, would be to mediate interactions with ISWI-like ATPases.
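The architectural rule stated above (DDT, WHIM1, WHIM2 and WHIM3 always collinear, with other domains such as a PHD finger allowed in between) amounts to an ordered-subsequence test on a protein's domain list. The sketch below assumes each protein is represented as a hypothetical list of domain labels written N- to C-terminal; the labels and example architectures are illustrative, not annotation output.

```python
WHIM_ORDER = ("DDT", "WHIM1", "WHIM2", "WHIM3")


def has_collinear_whim_unit(architecture: list[str]) -> bool:
    """True if DDT, WHIM1, WHIM2, WHIM3 appear in this order, other domains allowed between."""
    remaining = iter(architecture)
    # any() consumes the shared iterator up to each match, which enforces the order
    return all(any(domain == wanted for domain in remaining) for wanted in WHIM_ORDER)


# Hypothetical architectures, written N- to C-terminal
baz_like = ["WAC", "DDT", "WHIM1", "PHD", "WHIM2", "WHIM3", "Bromo"]
shuffled = ["DDT", "WHIM2", "WHIM1", "WHIM3"]
print(has_collinear_whim_unit(baz_like), has_collinear_whim_unit(shuffled))  # True False
```

Sharing one iterator across the `any()` calls is what makes this an order check rather than a mere membership check: once WHIM2 has been consumed while searching for WHIM1, it cannot be matched again.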
Figure 3 Multiple sequence alignment of the WHIM motifs (A) and their relative structural organization in ISWI1a complexed with DNA (B). For multiple alignments, proteins are denoted by their gene names, species names and GenBank index (GI) numbers. Conserved …
To better understand the action of the WHIM motifs, we analyzed their sequence conservation features and the domain architectures of the proteins in which they are found. First, a subset of these proteins, such as the animal BAZ1A/B-like proteins and fungal Itc1p, possess an N-terminal WAC tyrosine kinase domain that has been shown to phosphorylate Y142 of H2A.X.56
Second, most of these proteins have multiple domains for the recognition of histone H3 N-terminal peptides (PHD finger), acetylated histone peptides (bromodomains), monoubiquitinated peptides (the “little finger” type Ub-binding Zn-ribbon), phosphorylated peptides (SJA/FYR) and methylated peptides (AGENET, BMB/PWWP and AUX-RF, a novel Chromo-like domain).1,5,86,87
Additionally, others, such as HB1 and MBD9 in plants, BPTF, BAZ2A/B and CECR2 in animals, and previously uncharacterized proteins in chlorophytes and stramenopiles, contain DNA-binding domains such as the HARE-HTH, histone H1, CENB-HTH, TAM(MBD), homeo, HMG, BRIGHT, CXXC and AT-hooks. Of these, the TAM(MBD) domain in the plant MBD9 proteins is predicted to specifically bind methylated CpG dinucleotides, whereas that in the animal BAZ2 proteins is unlikely to have specific methylated CpG recognition capabilities.9
The CXXC domain also recognizes the CpG sequence, though most versions prefer unmethylated targets.9
These observations are of interest in light of our suggestion above that the HARE-HTH might discriminate modified DNA. Thus, a common theme in the WHIM motif proteins appears to be the coupling of ISWI interaction capability with diverse domains that discriminate or catalyze epigenetic modifications of histones, recognize specific DNA features such as inter-nucleosomal linker regions and distorted DNA (e.g., histone H1, HMG, BRIGHT domains and AT-hooks), or discriminate modified DNA marks (CXXC, TAM/MBD and HARE-HTH).9,88
One group of WHIM motif proteins from certain chlorophyte, rhodophyte and stramenopile algae combines the WHIM motifs with an RFD module, which is found at the N terminus of the DNMT1 methyltransferase.9
The RFD module consists of a circularly permuted version of the Sm domain fused to an HTH domain9
and has been demonstrated to be a key player in heterochromatinization by recruiting repressive proteins such as HDAC2.89
This suggests that in these WHIM motif proteins, the RFD module might have a role in recruiting nucleosome reorganizing activities to heterochromatin. Another interesting architecture seen in oomycetes combines the WHIM motifs with a Werner syndrome-type DNA repair nuclease with 3′–5′ exonuclease and HRDC domains,90
suggesting that in these organisms ISWI-catalyzed chromatin reorganization might be coupled with DNA repair.
Profile-profile searches using the HHpred program allowed us to establish a statistically significant relationship (e < 10⁻⁷, p = 90%) between the DDT and WHIM motifs and the central region of the yeast Isw1 partner Ioc3 and its paralog Esc8.91
As a result, we were able to map these motifs onto the recently solved crystal structure of the Ioc3 protein (PDB: 2y9z).91
The DDT motif corresponds to region 176–266, the WHIM1 motif to region 367–417, the WHIM2 motif to region 418–448 and the WHIM3 motif to region 455–496. The DDT and the WHIM motifs 1–3 are dominated by largely α-helical structures. This mapping shows that the DDT, WHIM1 and WHIM2 motifs pack tightly with each other to form a binding pocket for the trihelical tip of the SLIDE domain in ISW1 (region 994–1,039). Based on this mapping, the highly conserved basic residue in WHIM1 is identified as a key feature involved in packing with the DDT motif, and the acidic residue from the GxD signature of WHIM2 emerges as a major determinant of the interaction between ISWI and its WHIM motif partners. WHIM3, on the other hand, overlaps with the “helical linker-DNA binding domain” defined in the structure of Ioc3, which, along with the N-terminal portion of WHIM2, contacts the inter-nucleosomal linker DNA in the major groove.91
Thus, based on the Ioc3 structure, we infer that the WHIM motifs and the DDT domain function as a unit that acts as a “protein ruler” to set the spacing between two nucleosomes in conjunction with an ISWI ATPase.91
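The motif-to-structure mapping above is essentially a table of residue intervals on Ioc3, and a small lookup makes it easy to ask which motif a structural contact falls in. The sketch below uses only the ranges given in the text and assumes they are inclusive at both ends; residues 449–454, between WHIM2 and WHIM3, are left unassigned because the text does not assign them.

```python
from typing import Optional

# Ioc3 residue ranges for each motif as given in the text; treated as inclusive.
IOC3_MOTIFS = {
    "DDT": (176, 266),
    "WHIM1": (367, 417),
    "WHIM2": (418, 448),
    "WHIM3": (455, 496),
}


def motif_at(residue: int) -> Optional[str]:
    """Name of the motif containing an Ioc3 residue, or None if it lies outside all ranges."""
    for name, (lo, hi) in IOC3_MOTIFS.items():
        if lo <= residue <= hi:
            return name
    return None


def motifs_overlapping(start: int, end: int) -> list[str]:
    """Motifs whose ranges overlap the inclusive residue span [start, end]."""
    return [name for name, (lo, hi) in IOC3_MOTIFS.items() if lo <= end and start <= hi]


print(motif_at(400), motif_at(450))   # WHIM1 None
print(motifs_overlapping(440, 460))   # ['WHIM2', 'WHIM3']
```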
In light of this, the other domains in these WHIM motif proteins that recognize either diverse epigenetic protein modifications or particular DNA conformations and modifications are likely to provide the specific context in which nucleosome spacing is reorganized in different regions of chromatin or in different phases of the cell cycle.92
It would also be of particular interest to investigate whether the associated DNA-binding domains in the WHIM motif proteins influence the sequence dependence of nucleosome positioning. Furthermore, WHIM motif proteins with domains such as the HARE-HTH and the TAM(MBD) might influence nucleosome positioning in gene bodies in relation to their DNA methylation state (or further modifications such as hmC and its derivatives).