Epigenetic regulation of gene transcription relies on an array of recurring structural domains that have evolved to recognize post-translational modifications on histones. The roles of bromodomains, PHD fingers, and the Royal family domains in the recognition of histone modifications to direct transcription have been well characterized. However, only through recent structural studies has it been realized that these basic folds are capable of interacting with increasingly complex histone modification landscapes, illuminating how nature has concocted a way to accomplish more with less. Here we review recent biochemical and structural studies of several conserved folds that recognize modified as well as unmodified histone sequences and discuss their implications for gene expression.
Epigenetic regulation of gene expression involves a multitude of protein complexes, conserved structural modules and molecular interactions mediated by DNA and histone modifications that work together to comprise a balanced and heritable system (Berger, 2007; Kouzarides, 2007). The addition, removal and interpretation of these covalent chemical modifications to chromatin allow for an additional level of complex control of gene transcription beyond the genetic code. Early processes such as cell differentiation and embryonic development, as well as aging and environmental effects on mature organisms are all controlled by epigenetic processes (Jaenisch and Bird, 2003). Dysregulation of these mechanisms has been shown to lead to cancer and other diseases. Manipulating the occurrence of these modifications has therefore inspired new clinical therapies. Towards this goal, efforts have understandably focused on examining functional mechanisms of the proteins that are involved in chromatin remodeling and epigenetic control of gene transcription at a molecular and structural level.
Many of these proteins contain one or more structurally conserved domains that are, for the most part, exclusive to chromatin remodeling and may recognize DNA, RNA or covalent histone modifications. Few of these domains occur or behave alone; many are found in multiple copies or in tandem with other chromatin-associated domains in a single protein. The recognition of histone modifications by these domains and their resulting effects on gene transcription previously gave rise to the “histone code hypothesis”, which postulates that different combinations of modifications, acting in either a combinatorial or a sequential manner, can elicit different transcriptional outcomes by recruiting proteins that recognize these modifications (Strahl and Allis, 2000; Turner, 2002).
The last few years have been marked by a flurry of structural and functional studies that have helped define how these histone modifications are recognized and translated into functional outputs. Contrary to the simple models proposed in the early days of the histone code hypothesis, this output is hardly discrete; the complexity in the way histone modifications are interpreted allows for a far more nuanced functional response, which has made the field more interesting, but also more challenging to study. The reader is directed to a number of recent reviews of the different structures involved in histone modification recognition (Adams-Cioaba and Min, 2009; Ruthenburg et al., 2007b; Sanchez and Zhou, 2009; Seet et al., 2006; Taverna et al., 2007; Yap and Zhou, 2006). Here we focus on the variations in the binding modes of these established protein domains, such as tandem-repeat recognition, combinatorial recognition by adjacent modules and histone modification “switches”.
Until very recently, the bromodomain was the only known structural fold that recognizes acetylated lysine (Dhalluin et al., 1999; Sanchez and Zhou, 2009). The bromodomain fold consists of a four-helix bundle (αZ, αA, αB, αC) with two interconnecting loops (ZA and BC loops) that form an aromatic pocket, which both stabilizes the structure and coordinates binding of the acetylated lysine (Figure 1A). Two conserved tyrosine residues (one in the ZA loop, the other at the C-terminus of αB) that contribute to the hydrophobic pocket are found in the majority of bromodomains (Sanchez and Zhou, 2009), although they are not necessarily determinants of acetyl-lysine recognition (Charlop-Powers et al., 2010). The side-chain amide nitrogen of a highly conserved asparagine residue at the beginning of the BC loop (immediately following the second conserved Tyr) forms a hydrogen bond with the acetyl-lysine carbonyl oxygen. Together, these characteristics are critical for acetyl-lysine recognition (Zhang et al., 2010). Acetyl-lysine binding to the bromodomain in isolation occurs with moderate affinity, with dissociation constants typically in the tens to hundreds of micromolar range (Zhang et al., 2010), although binding can be as weak as 1–2 mM (VanDemark et al., 2007).
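To put these dissociation constants in perspective, the following minimal sketch assumes simple 1:1 binding with peptide in large excess over protein, so that the fraction of bromodomain occupied is [L]/(KD + [L]). The function name and the 100 µM peptide concentration are illustrative assumptions, not values from the cited studies.

```python
def fraction_bound(ligand_um: float, kd_um: float) -> float:
    """Fraction of protein occupied under a simple 1:1 binding model.

    Assumes the free ligand concentration is approximately the total
    ligand concentration (ligand in large excess over protein).
    """
    if ligand_um < 0 or kd_um <= 0:
        raise ValueError("ligand must be non-negative and KD must be positive")
    return ligand_um / (kd_um + ligand_um)


# Hypothetical peptide concentration, chosen only for illustration.
PEPTIDE_UM = 100.0

# KD values spanning the range quoted in the text (tens of uM up to ~2 mM).
for kd_um in (10.0, 100.0, 1000.0, 2000.0):
    occupancy = fraction_bound(PEPTIDE_UM, kd_um)
    print(f"KD = {kd_um:7.1f} uM -> fraction bound = {occupancy:.2f}")
```

Under these assumptions, occupancy falls from mostly bound at the tight end of the range to only a few percent at millimolar KD, which is one way to see why isolated bromodomains are described as moderate-to-weak binders.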
A large number of bromodomain structures solved in the absence or presence of acetylated histone peptides (Dhalluin et al., 1999; Hudson et al., 2000; Jacobson et al., 2000; Nakamura et al., 2007; Owen et al., 2000; Shen et al., 2007; Singh et al., 2007; Sun et al., 2007; VanDemark et al., 2007; Vollmuth et al., 2009) show that the general structure of the bromodomain is well conserved and changes little upon peptide binding, with the exception of conformational adjustments of the ZA and BC loops. Specificity of these bromodomains is dictated by the sequences within these loops, which interact with both the acetylated lysine and the residues at positions −1, +1, +2 and +3 relative to the acetylated lysine (Zeng et al., 2008b; Zhang et al., 2010) (Figure 1A). The acetylated lysine ligand inserts into the pocket in a similar way in the different structures, and an HIV Tat or p53 peptide containing an acetylated lysine residue is coordinated in a similar fashion (Mujtaba et al., 2002; Mujtaba et al., 2004).
Bromodomains, like many other histone-recognition modules, often occur in multiple copies. The structure of the tandem bromodomains of TAF1 (TAFII250) shows two bromodomains packed together to form a U-shape (Jacobson et al., 2000) (Figure 1B). The individual domains fold independently, and their acetyl-lysine binding pockets are approximately 25 Å apart, equivalent to a span of 7–8 residues on a peptide. Consistent with the theory that each bromodomain might coordinate one acetylated lysine on the same peptide, studies with differently acetylated H4 peptides revealed that the TAF1 dual bromodomains bound with significantly higher affinity to peptides that were di- or tetra-acetylated at K5/K12, K8/K16 or K5/K8/K12/K16 than to a mono-acetylated H4 peptide. Because the structure of the complex remains elusive, this model has yet to be confirmed.
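The correspondence between a 25 Å separation and a span of 7–8 residues can be checked with a back-of-the-envelope estimate. The sketch below assumes a fully extended peptide with a rise of roughly 3.3 to 3.5 Å per residue, a textbook approximation rather than a value taken from the TAF1 study; the function name is illustrative.

```python
def residues_spanned(distance_angstrom: float, rise_per_residue: float = 3.4) -> float:
    """Estimate how many residues of an extended peptide cover a distance.

    rise_per_residue is the assumed axial rise per residue (in angstroms)
    for an extended chain; it is an approximation, not a measured value.
    """
    if distance_angstrom < 0 or rise_per_residue <= 0:
        raise ValueError("distance must be non-negative and rise must be positive")
    return distance_angstrom / rise_per_residue


POCKET_SEPARATION = 25.0  # angstroms, as quoted for the TAF1 tandem bromodomains

# Evaluate at both ends of the assumed range of rise per residue.
for rise in (3.3, 3.5):
    span = residues_spanned(POCKET_SEPARATION, rise)
    print(f"rise = {rise} A/residue -> about {span:.1f} residues")
```

Both ends of the assumed range give a value between 7 and 8 residues, in line with the spacing of the K5/K12 and K8/K16 acetylation pairs on H4.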
A structural study of the tandem bromodomains of Rsc4 in yeast demonstrated that only the second bromodomain interacted with an acetylated H3K14 peptide, and that this interaction was disrupted by phosphorylation at H3S10 (VanDemark et al., 2007) (Figure 1B). The two bromodomains fold as one autonomous unit with extensive contacts between them, and the structure is more compact than that of TAF1, with the acetyl-lysine binding sites 20 Å apart. Furthermore, upon acetylating a fused H3 peptide-Rsc4 dual bromodomain protein with Gcn5, the authors found that the N-terminus of Rsc4 was acetylated at a Gcn5-target consensus sequence, that this acetylated site was bound by the first bromodomain, and that this binding precluded binding of the fused histone peptide to the second bromodomain. This suggests that Gcn5 provides an auto-regulatory mechanism to control Rsc4 activity by depositing both activating and inhibiting acetyl marks.
The Polybromo (PB1) protein contains six bromodomains in tandem at its N-terminus and is involved in chromatin remodeling. It was hypothesized that the presence of multiple domains might enable it to recognize specific nucleosomal acetylation patterns to target PB1 (and its parent PBAF complex) to chromatin (Thompson, 2009). Structures available for all but the fourth bromodomain confirm that secondary and tertiary structure is generally well conserved amongst the bromodomains. All but the sixth bromodomain contain an additional two small helices within the ZA loop. However, sequence clustering analysis indicates that these domains may be classified separately from one another, suggesting different ligands and affinities for each (Sanchez and Zhou, 2009). Indeed, each bromodomain appears to have preferences for different acetylated lysines on H2A, H2B, H3 or H4 (Charlop-Powers et al., 2010; Thompson, 2009), although the fifth and sixth bromodomains may serve as non-specific binding modules that stabilize PB1 binding to a specific acetylated histone sequence via the other four bromodomains. Interestingly, in the only PB1 bromodomain complex structure available, the H3K14ac peptide does not interact with the conserved Tyr residues that contribute to the hydrophobic pocket, but instead interacts with Leu and Val residues from the first additional helix within loop ZA and the N-terminus of αC, respectively (Charlop-Powers et al., 2010). Since these residues are not strongly conserved among the other bromodomains within PB1 either, it is likely that PB1 employs multiple modes of recognition for its ligands.
Brdt, a testis-specific bromodomain and extra terminal domain (BET) protein, recognizes and compacts hyperacetylated chromatin. Brdt contains two tandem bromodomains, the overall structures of which are similar; their ligand binding pockets are comparably large. Interestingly, however, the first bromodomain recognizes two acetylated lysines within its pocket (H4K5ac/K8ac), while the second more traditionally recognizes a single acetylated lysine (H3K18ac) (Moriniere et al., 2009) (Figure 1C). To our knowledge this is one of only two structurally known occurrences of a single domain coordinating two modifications within a single binding pocket (see also the RAG2 PHD finger). Although this suggests that one protein could coordinate the tails of adjacent histones via a single domain, only the former interaction is important for chromatin compaction by Brdt. Despite the general structural similarity of the two bromodomains, sequence differences in both the ZA and BC loops dictate that only the first bromodomain can interact with the H4K5ac/K8ac peptide; sequence conservation suggests that this mode of coordination is likely shared by other BET proteins.
Bromodomains are often found adjacent to other histone recognition domains, the most common being the PHD (plant homeodomain) finger. The relative arrangement of the tandem domains varies with the length and composition of the linker sequence, as well as with the residues of the domains that may form an interacting surface. The PHD finger-bromodomain fragment of KAP1, a transcriptional co-repressor for KRAB zinc finger proteins, lacks several conserved residues within its bromodomain, which is believed to preclude direct binding to acetylated lysine. The tandem domains function cooperatively as a single unit, with helix αZ forming a hydrophobic core between the two folds (Zeng et al., 2008a). The intimate association between the domains enables the PHD finger, which is an intramolecular E3 sumoylation ligase, to sumoylate the bromodomain, a modification required for KAP1 recruitment of SETDB1, a histone H3 lysine 9 specific methyltransferase, to chromatin (Ivanov et al., 2007). Conversely, the PHD finger-linker-bromodomain fragment of BPTF, a subunit of a nucleosome-remodeling factor complex, comprises two folded domains separated by a helical linker that form no contacts with each other. Although this fragment was hypothesized to be able to interact with two different histone modifications simultaneously, crystal structures reveal that only the PHD finger interacts with H3K4me3 (Li et al., 2006).
The recently reported structures of the tandem PHD finger (PHD12) of human DPF3b, a component of the BAF chromatin remodeling complex (Lange et al., 2008), demonstrate that in addition to their well-known role in recognizing methylated lysine, PHD fingers can also recognize acetylated lysine, and that these two interactions can act antagonistically (Zeng et al., 2010). The tandem PHD fingers in DPF3b fold as one functionally cooperative unit, and interact with an unmodified H3 peptide with a KD of ~2 µM; acetylation at K14 enhances H3 binding to a KD of 0.5 µM, whereas methylation at H3K4 almost abolishes the interaction. The H3 peptide lies across a surface shared by the two domains, with residues R2–K4 of H3 forming a β-strand that contributes to the β-sheet of the second PHD finger (PHD2), a sharp kink at the middle of the peptide due to interactions between the first PHD finger (PHD1) and K9, and K14ac interacting with a hydrophobic pocket formed by PHD1 (Figure 1D). Additionally, the H3K14ac acyl chain interacts with Arg289 and Phe264 of PHD1, and the acetyl amide group forms a hydrogen bond with the side chain of Asp263 of PHD1. The complex structure of DPF3b PHD12 bound to an N-terminally acetylated H4 peptide confirms that PHD1 interacts with the acetyl group through the same residues. The recognition of acetyl-lysine by DPF3b PHD1 is structurally distinct from the pattern employed by the bromodomains, and is completely different from the PHD finger’s mechanism for methyl-lysine recognition, which uses a surface on the opposite side of the fold.
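The effect of K14 acetylation on DPF3b binding can be expressed as a fold change and as an approximate difference in binding free energy, using the standard relation ΔΔG = RT ln(KD,modified / KD,unmodified). The sketch below is illustrative only: the temperature of 298.15 K is an assumption (the measurement temperature is not stated here), and the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_change(kd_reference_um: float, kd_modified_um: float) -> float:
    """Fold enhancement in affinity (values > 1 mean tighter binding)."""
    return kd_reference_um / kd_modified_um


def delta_delta_g(kd_reference_um: float, kd_modified_um: float,
                  temperature_k: float = 298.15) -> float:
    """Change in binding free energy (kcal/mol) relative to the reference.

    Negative values mean the modified peptide binds more tightly.
    The temperature default is an assumption, not a reported value.
    """
    return R_KCAL * temperature_k * math.log(kd_modified_um / kd_reference_um)


KD_UNMODIFIED_UM = 2.0  # unmodified H3 peptide, as quoted in the text
KD_K14AC_UM = 0.5       # H3K14ac peptide, as quoted in the text

print(f"fold enhancement: {fold_change(KD_UNMODIFIED_UM, KD_K14AC_UM):.1f}")
print(f"ddG: {delta_delta_g(KD_UNMODIFIED_UM, KD_K14AC_UM):.2f} kcal/mol")
```

Under these assumptions the fourfold gain in affinity corresponds to a stabilization of slightly less than 1 kcal/mol, a modest energetic contribution compared with the near-complete loss of binding upon H3K4 methylation.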
In addition to its role in transcriptional regulation, histone phosphorylation is characterized as a mark of mitosis, meiosis and DNA repair. Phosphorylation occurs both on the tails and core region of histones, and is predominantly on serine residues. Often, phosphorylation of a residue adjacent to another that is modified (typically lysine) results in the disruption or marked weakening of recognition of the latter modification by its reader domain; the combined action of these modifications has been referred to as a binary switch (Fischle et al., 2003a). Aside from a complex involving two modifications on one histone peptide (Flanagan et al., 2005), there are no other known structures of proteins in complex with histone phosphothreonine.
14-3-3 proteins are known recognition modules for phosphoserine in hundreds of different targets, and were demonstrated to interact with phospho-H3 (Macdonald et al., 2005), despite its departure from the two known consensus sequences recognized by 14-3-3. Structural studies of 14-3-3ζ in complex with different H3S10ph peptides confirmed that the binding differs from that in 14-3-3 complexes with non-histone targets (Macdonald et al., 2005) (Figure 2A). With the replacement of a hydrophobic residue with a threonine and the presence of two glycines following the phosphoserine in the consensus sequence, 14-3-3ζ forms extensive ionic interactions with the peptide, which does not adopt a strictly extended configuration. Addition of acetyl groups at K9 and K14 does not influence the interaction, distinguishing this recognition as direct and not as part of a binary switch.
The BRCT domain is named for its two-repeat occurrence at the C-terminus of BRCA1 and is often mutated or deleted in cancers associated with the gene. BRCT domains are found predominantly in proteins involved in cell cycle regulation and DNA damage repair and consist of three α-helices packed around a small, parallel four-stranded β-sheet. The two BRCT domains of BRCA1 bind one phosphoserine via the first domain, although the two domains pack together head-to-tail and an interface between them recognizes a phenylalanine residue C-terminal to the phosphoserine. The tandem BRCT domains of MDC1 (mediator of DNA damage checkpoint protein 1) also coordinate a single phosphoserine at H2AXS139 (Lee et al., 2005; Stucki et al., 2005) in a similar fashion, although two arginine residues from the first BRCT and a glutamine from the second BRCT appear to interact with the residues that follow the phosphoserine, including the free carboxylate group of the terminal tyrosine residue of H2AX (Figure 2B).
The PHD finger is a small, cysteine-rich zinc-binding module that has few secondary structure elements, and until recently it was best known as a protein-protein or protein-phospholipid interaction domain. A small two-stranded β-sheet is flanked by two short α-helices, with the zinc ions coordinated by conserved cysteine and histidine residues in a ‘cross-brace’ motif (Bottomley et al., 2005; Capili et al., 2001; Elkin et al., 2005; Kwan et al., 2003; Pascual et al., 2000). As with the structurally related RING fingers, some PHD fingers can act as E3 ubiquitin ligases, while others are involved in transcriptional regulation.
PHD fingers are found in a number of proteins involved in chromatin remodeling. By virtue of their frequent occurrence adjacent to other known chromatin interaction domains (bromodomains, PWWP domains) it had been speculated that PHD fingers might have the capacity to recognize histone modifications. In 2006, several studies revealed that they could specifically interact with methylated H3K4 to influence both gene activation and repression (Li et al., 2006; Palacios et al., 2006; Pena et al., 2006; Shi et al., 2006; Taverna et al., 2006; Wysocka et al., 2006). Subsequent studies on additional PHD fingers (Champagne et al., 2008; Hung et al., 2009; Palacios et al., 2008; Pena et al., 2008; Wang et al., 2009a; Wen et al., 2010) established that the methylation state recognized was context and sequence dependent. Of the dozens of structures of PHD fingers free and in complex with H3 peptides that have been solved, most peptides bear the H3K4me3 mark. The majority of the complex structures have highly similar topology, and the histone peptide harboring the modification forms a β-strand that integrates into the existing antiparallel β-sheet formed by the PHD finger (Figure 3A). Two notable exceptions are the PHD fingers of Pygo (in complex with co-factor BCL9 HD1) and BPTF (particularly in the absence of its adjacent helical linker and bromodomain) (Fiedler et al., 2008; Li et al., 2006), in which the N-terminal loop preceding the domain folds into the region that would be occupied by the C-terminus of the peptide strand, forcing the histone peptide to bend away from the structure and preventing strand formation. Typically, recognition of the methyl-lysine is mediated by two to four aromatic residues, and the complex is further stabilized by numerous contacts between the strands that form the intermolecular sheet and by contacts of the N-terminal NH3+ group and residues A1 and T3 with the binding pocket. In several cases, binding of H3R2 is accomplished by an acidic cluster of residues that forms hydrogen bonds to its guanidinium group.
The R2 side-chain is accommodated by a groove adjacent to that binding the methyl-lysine, separated by a conserved Trp in the PHD finger. The Pygo-HD1 complex has a slight preference for H3K4me2 due to a hydrogen bond formed between an aspartate residue within Pygo and the amino side-chain of H3K4me2, and the guanidinium group of R2 faces the solvent (Fiedler et al., 2008). In the RAG2 PHD structure, the acidic residues are replaced with a tyrosine, and the side-chain of R2 also faces the solvent, but the Tyr substitution allows it to interact with symmetrically dimethylated R2; asymmetric dimethylation has no effect (Ramon-Maiques et al., 2007). Both BPTF and RAG2 can bind H3K4me2 with reduced affinity, and the interaction is mediated by additional contacts with bridging water molecules (Li et al., 2006; Ramon-Maiques et al., 2007).
Predictably, the presence of methylation at H3R2 has varying effects on the H3K4me-PHD finger interaction; depending on the nature of the interaction between the histone arginine and the PHD finger, asymmetric or symmetric dimethylation can have an enhancing effect (RAG2), a weakening effect (ING2/Yng1) or no effect at all (Pygo).
The AIRE (autoimmune regulator) protein, which is mutated in autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED), and BHC80, a component of the LSD1 lysine demethylase complex, are distinguished from other PHD finger proteins in that they contain PHD fingers that recognize unmodified H3K4 (H3K4me0); this interaction is abolished by methylation at this residue as well as by a number of other modifications to the H3 tail (Chakravarty et al., 2009; Chignola et al., 2009; Lan et al., 2007). As in the PHD finger complex structures with methylated histone peptides, the unmodified peptides form a β-strand anti-parallel to the β-sheet within the PHD finger (Figure 3B). The binding pockets that accommodate the peptide, however, are strikingly different from those observed for the methylated peptides. Interactions between the first eight residues of H3 and the PHD finger are extensive, including electrostatic interactions between R2, K4 and R8 of the peptide and a number of acidic residues on the PHD finger, several hydrogen bonds and non-polar contacts with the N-terminus of H3, and a hydrophilic binding pocket for K4 that is too narrow to accommodate a methylated lysine.
The Royal family of domains (Maurer-Stroh et al., 2003) is a structurally related group of protein folds (Tudor, PWWP, MBT and chromodomains) believed to have descended from a common ancestor with a conserved ability to bind methylated substrates. A slightly curved, barrel-like three-stranded β-sheet with a short 3₁₀ helix is common to all folds, with adjacent additional strands or helices distinguishing the different folds. Recent studies have established not only that all these domains are able to bind methylated lysines in both histone and non-histone proteins, but also, contrary to expectation, that the tandem occurrence of these domains does not necessarily result in additional histone modification recognition; instead, additional ligands binding to these other domains may confer functional specificity.
The MBT (malignant brain tumor) repeat is ~100 amino acids long, with a four-stranded β-barrel “core” followed by an extended “arm” of helices and a short strand (Sathyamurthy et al., 2003; Wang et al., 2003). MBT-containing proteins appear to preferentially interact with monomethyl-lysine histone peptides (Kim et al., 2006), with some binding equally well to dimethyl-lysine. Several studies have since suggested that all MBT domains display little sequence specificity for their ligands and interact with the methyl group via a “cavity insertion recognition mode” (Taverna et al., 2007), in which the methyl-lysine side chain inserts deeply into the hydrophobic pocket and is held by cation-π and van der Waals interactions, while the ligand makes few contacts with the protein beyond the methyl group.
The MBT domain can occur up to four times in succession, with one domain packing against another. In the two-MBT-repeat structures of SCM (Sex comb on midleg, a Polycomb protein involved in Hox repression) and SCM-like 2 (SCML2), the arm packs against the β-barrel core of the opposite domain and there are extensive contacts between the two cores (Grimm et al., 2007; Sathyamurthy et al., 2003). Additional studies showed that the second, but not the first, MBT repeat could bind monomethylated histone peptides in a sequence-independent manner. The complex structures of SCM or SCML2 bound to a monomethyl-lysine show that the binding pocket is composed of three aromatics and an aspartate residue that interacts with the methyl-ammonium group of the monomethyl-lysine; these residues are not found in the corresponding region of the first MBT repeat (Grimm et al., 2007; Santiveri et al., 2008) (Figure 4A, top). One Phe residue of the pocket closes the site in the free form structure, and undergoes a conformational change upon ligand binding. The affinity for the monomethyl-lysine, either alone or within histone peptides, is weak, with KD ranging from 0.5 to 1 mM, and is significantly weaker for dimethylated lysine.
Studies of the three-MBT-repeat protein L3MBTL1 demonstrated that binding to mono- and dimethyl-lysine by the second MBT repeat is driven by both hydrophobic and electrostatic interactions, the latter arising from acidic residues adjacent to the aromatic pocket that interact with the basic residues of the histone peptide, leading to twentyfold stronger affinities than were reported for SCML2 (Li et al., 2007; Min et al., 2007) (Figure 4A, middle). The three MBT repeats are arranged in a propeller-like ring with the cores facing the center, and the arms pointing outward and packed against an adjacent core. Interestingly, in a seleno-Met structure, MES was found in the pocket of all three MBT domains (Wang et al., 2003), and in native structures the pocket of the first MBT domain is occupied by a Pro residue from an adjacent protomer in the crystal (Li et al., 2007; Wang et al., 2003). As observed for the second MBT pocket of SCM and SCML2, the third MBT pocket appears closed in the native structure, but it is still unclear whether this domain can specifically interact with a ligand.
The architecture of the four-repeat MBT structures of L3MBTL2, its Drosophila homolog Sfmbt and the related protein MBTD1 (Eryilmaz et al., 2009; Grimm et al., 2009; Guo et al., 2009) is similar to that of the three-repeat propeller, with the first, less compact MBT domain protruding away from the other repeats and lying adjacent to the fourth repeat. Only the fourth MBT domain contains the requisite hydrophobic and acidic residues, and it was shown to be able to interact with an H4K20me1 peptide (Grimm et al., 2009; Guo et al., 2009) (Figure 4A, bottom). Several methylated histone peptides were assessed for binding to MBTD1, and preference seemed to be nearly equal for mono- and dimethylated peptides (Eryilmaz et al., 2009). As has been observed previously, the MBT pocket cannot accommodate trimethylated lysines because of its size and the inability of the trimethyl-ammonium group to form a hydrogen bond with the aspartate, and it cannot support binding of unmodified lysines because of inadequate interactions with the hydrophobic residues.
The Tudor domain is a four- or five-stranded β-barrel fold with one or two helices packed against the β-sheet (Selenko et al., 2001). Tudor domains are frequently found in RNA-binding proteins (Shimojo et al., 2008), but have also been associated with methylated lysine and arginine recognition. To date, no structure of a single Tudor domain in complex with a histone peptide has been reported; however, several structures of tandem Tudor domains (TTDs) (Charier et al., 2004; Zhao et al., 2007) in complex with H3K4me, H3K9me or H4K20me peptides have been solved. In all these complexes, the methylated histone peptide interacts with only one of the domains, leaving the other Tudor domain free to possibly interact with other, as yet unknown ligands. Interestingly, the structures of TTDs vary greatly, both due to the internal packing of the Tudor fold itself as well as the interconnecting “linker” between the two domains.
The TTD of JMJD2A, a histone demethylase specific for H3K9me2/3 and H3K36me2/3, was solved in complex with H3K4me3 and H4K20me3 peptides (Huang et al., 2006a; Lee et al., 2008) revealing distinctive binding modes for each. Each ‘hybrid’ Tudor domain contains four strands, the second and third of which are long and are shared between the two domains, and a fourth which sequentially belongs to the opposite domain (Figure 4B, top). As a result, the linker between the two swapped domains is essentially a two-stranded sheet. In both structures, the methyl group is coordinated by three aromatic residues in a pocket housed by the second hybrid Tudor domain with one residue from the third helix of the first hybrid Tudor domain. However, electrostatic and hydrogen bond interactions are accomplished by different aspartate and asparagine residues in JMJD2A for binding to the two different peptides (Figure 4B, top). It is unclear what roles the two modifications play in the function of JMJD2A, although they may help to recruit the enzyme to chromatin in order to demethylate H3K9 or H3K36.
The structure of the TTD of 53BP1 was also solved in complex with methylated peptides of both p53 (Roy et al., 2010) and H4 (Botuyan et al., 2006) (Figure 4B, middle). The structure of the protein is essentially identical in the two complexes; the methyl groups of both p53K382me2 and H4K20me2 lie in a similar plane in the same position relative to 53BP1, suggesting that p53 and H4K20me2 could compete with each other in DNA repair and raising questions as to which would be responsible for 53BP1 recruitment to the double-strand break (DSB). 53BP1 TTD was also identified as a binding module for H3K79me2, and this interaction was reportedly necessary for targeting to DSBs (Huyen et al., 2004). However, subsequent studies were unable to detect binding to the trimethyl mark, and detected only very weak binding to the dimethyl mark (Botuyan et al., 2006; Kim et al., 2006), suggesting that H4K20me2 may be the dominant mark in DSB recognition. Since 53BP1 oligomerization is required for efficient DSB recognition, it is possible that a heterodimer of complexes is the active unit found at the DNA. The five-residue binding pocket is unable to accommodate a trimethyl-lysine, explaining the prevalence of, and preference for, the monomethyl and dimethyl marks in previous studies.
UHRF1 is a putative E3 ubiquitin ligase important for the G1/S transition and for maintaining DNA methylation by recruiting and tethering DNMT1 to DNA (Bostick et al., 2007; Sharif et al., 2007). It contains an N-terminal ubiquitin-like domain, tandem Tudor domains, a PHD finger that, together with the adjacent SRA domain, recognizes H3K9me (Karagianni et al., 2008) and is important for pericentromeric heterochromatin reorganization (Papait et al., 2008), an SRA domain that preferentially binds hemi-methylated DNA (Arita et al., 2008; Avvakumov et al., 2008; Hashimoto et al., 2008; Qian et al., 2008; Unoki et al., 2004) and a C-terminal RING domain likely responsible for its E3 ligase activity specific for core histones (Citterio et al., 2004). The TTD of UHRF1 resembles that of Fragile X mental retardation protein, with two independent folds connected by an unstructured linker. In the structure, the first Tudor domain interacts with the methylated lysine of an H3K9me3 peptide, but the peptide itself resides in a groove shared by the two domains (Figure 4B, bottom). This domain and its bound histone peptide superimpose well on those of the 53BP1 TTD complex structure with H4K20me2. However, only two of the residues of the 53BP1 cage appear conserved, although a different tyrosine may compensate for the lack of the other cage residues. The implications of recognition of the same histone mark by three different domains of UHRF1 are not yet known, although it might suggest a mechanism for this multi-domain protein to remain at the chromatin while other factors, such as G9a, HDAC1 or DNMT1, are recruited.
PWWP domains, so named for a conserved Pro-Trp-Trp-Pro motif within them, consist of a five-stranded β-sheet packed against a helical bundle, and were initially characterized as non-specific DNA-binding domains (Lukasik et al., 2006; Qiu et al., 2002; Slater et al., 2003). However, their chromatin-targeting ability (Ge et al., 2004) and their similarity to Tudor and chromodomains suggested that they might also have the ability to bind methylated ligands (Nameki et al., 2005). A comparison of all the PWWP domain structures solved thus far demonstrates that the β-barrel and the C-terminal two helices are well conserved, with some variability in the integrity of the remaining helices that comprise the bundle. Pdp1, a Set9-binding protein in fission yeast, contains a PWWP domain that associates with H4K20me1, but not H4K20me2 or H4K20me3, employing at least two of the conserved hydrophobic residues equivalent to those that constitute the binding pocket of the Tudor domain (Wang et al., 2009b). Similarly, the bromodomain and PHD-finger containing protein Peregrin (Brpf1) also contains a PWWP domain that is able to bind H3K36me3 and is important for chromatin association during mitosis (Laue et al., 2008), employing a three-residue aromatic cage to accommodate the trimethyl-lysine and a groove formed by a loop-helix and loop-strand to bind the residues preceding K36me3 (Vezzoli et al., 2010) (Figure 4C).
The structure of a domain-swapped PWWP dimer of hepatoma-derived growth factor has also been solved, containing two superimposable domains and flexible, unstructured linkers in between. It is not known if this is the physiologically relevant form of the domain, but its affinity for heparin increases 100-fold when in the dimeric form over its monomeric state (Sue et al., 2007). Given that other tandem repeat domains have been shown to be able to bind one or more modifications, it is possible that tandem PWWP domains would also be able to do so.
Amongst the first histone recognition domains to be characterized structurally is the chromodomain, a small, methyl-lysine binding domain consisting of a three-stranded β-sheet packed against a helix (Ball et al., 1997). There are generally two known structural classes of chromodomains – the HP1 and chromobox chromodomains and the tandem chromodomains of the CHD proteins.
The earliest work focused on the interaction of methylated H3K9 or H3K27 peptides with the HP1 or chromobox (CBX) chromodomains. The chromodomains of these two subclasses of proteins are very similar in sequence and structure. The different isoforms of each subclass can be differentiated partly based on their relative affinities for different methylation states on H3K9 or H3K27 (Bernstein et al., 2006). Complex structures of these protein-peptide interactions show that the histone peptide adopts a β-strand conformation that is part of the chromodomain’s β-sheet (Fischle et al., 2003b; Jacobs and Khorasanizadeh, 2002; Min et al., 2003; Nielsen et al., 2002) (Figure 5A). The methyl-lysine binding site is characterized by three highly conserved aromatic residues, which are essential for histone modification recognition. Trimethylated H3K9 or H3K27 is preferred over the dimethylated form, as it provides additional polar and van der Waals interactions with the chromodomain (Fischle et al., 2003b; Hughes et al., 2007; Jacobs and Khorasanizadeh, 2002; Min et al., 2003; Nielsen et al., 2002). Structurally, the tri-, di- and monomethyl-lysine/chromodomain interactions appear almost identical, with the entire methyl-lysine chain lying in the same plane relative to the binding pocket. Recognition of H3K27me by the CBX proteins (such as Drosophila Pc) differs from the recognition of H3K9me by HP1 in that, in the former case, there are more contacts between the protein and the residues preceding the methylated lysine, and a larger surface area is buried upon methyl-lysine binding (Fischle et al., 2003b). HP1 chromodomains seem to exhibit a strong preference for H3K9me over H3K27me (Bernstein et al., 2006), while several CBX chromodomains exhibit little preference for one mark over the other, although functionally CBX proteins have been associated with H3K27me2 and H3K27me3.
CHD (chromo-ATPase/helicase-DNA-binding) proteins contain two tandem chromodomains N-terminal to DEXDc and HELICc helicase domains, and are involved in a number of different roles related to chromatin remodeling as specified by their variable C-terminal domains. Structurally, CHD1 has been the best studied of the 9 mammalian family members; it is similar to CHD2 and to all unicellular CHD proteins (Flanagan et al., 2007). Structures of the tandem chromodomains of CHD1 with several H3K4me3 peptides show coordinated binding of the single mark by both domains (Flanagan et al., 2005). Compared with the HP1 and Pc chromodomains, the two CHD chromodomains each contain a large sequence insertion. One insert serves to form part of the binding interface for the H3K4me3 peptide; the other forms a loop that occupies the site equivalent to where the HP1 and Pc chromodomains would be expected to bind H3K9me or H3K27me (Figure 5B). The additional presence of H3K9ac, H3S10ph and H3K14ac does not hinder binding significantly, while H3R2me2a or H3T3ph weakens the interaction (KD ~5 µM weakens to 24 µM or 140 µM, respectively) (Flanagan et al., 2005). In the former complex structure, the R2me2a sidechain is on the periphery of the first chromodomain, with the only major effect of the methylation being the disruption of a single hydrogen bond between H3R2 and the CHD1 backbone. In the latter complex structure, H3T3ph inserts into an interface between the two chromodomains; because this modification prevents CHD1 binding to H3K4me3 in vivo during mitosis, and dephosphorylation during anaphase allows CHD1 binding to chromatin during telophase, these two modifications are believed to act as a binary switch.
Intriguingly, the yeast form of CHD1 cannot bind H3K4me (Flanagan et al., 2005; Sims et al., 2005) owing to the lack of a critical aromatic residue and altered position of the chromodomain insert that interacts with the H3K4me3 peptide in the human form (Flanagan et al., 2007; Okuda et al., 2007).
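To put the dissociation constants quoted above for human CHD1 on a common scale, the short sketch below converts them into fold changes and approximate binding free-energy penalties using the standard relation ΔΔG = RT ln(KD,modified / KD,reference). It is an illustration only: the temperature of 298.15 K is an assumption (the measurement temperature is not given here), the function and variable names are hypothetical, and the reference KD is reported only approximately, so the outputs should be read as rough estimates.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)
ASSUMED_TEMP_K = 298.15         # assumption: temperature is not stated in the text


def fold_change(kd_reference_um, kd_modified_um):
    """Ratio of dissociation constants; values above 1 mean weaker binding."""
    return kd_modified_um / kd_reference_um


def ddg_kcal_per_mol(kd_reference_um, kd_modified_um, temp_k=ASSUMED_TEMP_K):
    """Binding free-energy penalty, ddG = RT * ln(KD_modified / KD_reference)."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd_modified_um / kd_reference_um)


# KD values (in micromolar) as quoted in the text for CHD1 tandem chromodomains.
KD_H3K4ME3 = 5.0            # approximate reference value (~5 uM)
KD_WITH_R2ME2A = 24.0       # H3K4me3 peptide additionally carrying H3R2me2a
KD_WITH_T3PH = 140.0        # H3K4me3 peptide additionally carrying H3T3ph

results = {
    name: (fold_change(KD_H3K4ME3, kd), ddg_kcal_per_mol(KD_H3K4ME3, kd))
    for name, kd in [("H3R2me2a", KD_WITH_R2ME2A), ("H3T3ph", KD_WITH_T3PH)]
}

if __name__ == "__main__":
    for name, (fold, ddg) in results.items():
        print(f"{name}: {fold:.1f}-fold weaker, ddG ~ {ddg:.2f} kcal/mol")
```

On this scale, the penalty from H3T3ph is roughly twice that from H3R2me2a, which fits the structural picture in which the phosphate inserts into the inter-domain interface while the arginine methylation mainly disrupts a single hydrogen bond.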
The chromobarrel domain is similar to the HP1/CBX chromodomain but lacks the C-terminal α-helix and has two additional strands contributing to the β-sheet. Most intriguingly, the structure better resembles the complex structures of HP1/CBX chromodomains with their methylated histone peptides, with the most N-terminal β-strand of the chromobarrel equivalent to the strand formed by the histone peptide in the chromodomain complex (Nielsen et al., 2005). Chromobarrel domains of homologs MRG15 and Eaf3 (regulators of global histone acetylation) have been shown to interact with H3K36me2 and H3K36me3 via the three aromatic cage residues conserved from the chromodomain with moderate to weak affinity (Sun et al., 2008; Xu et al., 2008; Zhang et al., 2006), despite the presence of the additional β-strand that would otherwise occupy the position of the peptide. The structure of the chromobarrel domain from Eaf3 C-terminally fused to a H3K36me2 peptide showed that these aromatic residues indeed coordinate the methyl-lysine, but that the peptide resides outside of the pocket, such that the lysine side-chain is rotated approximately 100° away from the position in the chromodomain complex structures (Xu et al., 2008) (Figure 5C). An additional Trp residue adjacent to the sequentially third aromatic residue of the chromodomain cage is also important for methyl-lysine binding (Sun et al., 2008; Xu et al., 2008), a residue that is a conserved aromatic residue in CBX proteins, but an aspartate or histidine in the HP1 proteins. In fact, the side-chain of this third chromodomain aromatic cage residue is spatially positioned in between that of the equivalent chromobarrel residue and the aforementioned fourth aromatic residue, while the sidechain positions of the other two cage residues are remarkably well conserved.
Conversely, the chromobarrel domain found in the H4K16 acetyltransferase MOF lacks most of these aromatic residues, and is believed to contribute instead to RNA binding to the intact protein (Akhtar et al., 2000; Nielsen et al., 2005).
The related chromoshadow domain, which resembles the HP1/CBX chromodomain but with an additional helix preceding the C-terminal helix, and which is found C-terminal to the chromodomain in HP1 proteins, lacks two of the three aromatic residues that coordinate the methyl-lysine in the chromodomain, and does not associate with histones. It acts as a dimerization domain for HP1 and as a dimer is responsible for protein-protein interactions (Brasher et al., 2000; Cowieson et al., 2000), and can interact with its substrate via a dimeric interface on the opposite side to where the methyl-lysine pocket would be in the chromodomain structure (Huang et al., 2006b; Thiru et al., 2004).
WDR5 is a subunit of the MLL/SET1/COMPASS H3K4 methyltransferase complexes and contains seven WD40 repeats to form a β-propeller fold. Initially, it was reported that WDR5 could specifically interact with dimethylated H3K4 nucleosomes (Wysocka et al., 2005), although structures determined later demonstrated that there was no specificity for this mark, and that in fact WDR5 could bind to multiple methylated forms of H3K4 (Couture et al., 2006; Han et al., 2006; Ruthenburg et al., 2007a; Schuetz et al., 2006). In a complex structure with H3K4me peptide, H3R2 is bound to the central cleft formed by the WD40 repeats, and the H3K4me mark is actually solvent exposed (Couture et al., 2006; Han et al., 2006) (Figure 6A). Instead, dimethylated H3K4 forms extra hydrogen bonds with WDR5 (Couture et al., 2006; Schuetz et al., 2006). It was then proposed that WDR5 functions as a “histone modification intermediate” (Ruthenburg et al., 2007a), which helps present the lysine for methylation, but primarily recognizes H3R2 within its central cavity. The crystal structure of the MLL1 SET domain bound to H3K4me3 (Southall et al., 2009) argues against this model, as the methylated peptide is buried deep in a pocket formed by the first strand of the SET-I subdomain, the first helix of the SET-C subdomain and the required AdoHcy cofactor, and would not be accessible to WDR5 (Trievel and Shilatifard, 2009). A WDR5-interacting motif within the SET-N domain, which bears a conserved arginine in the center and homology to the H3 N-terminus, was shown to bind to the same pocket employed for H3R2 binding while adopting a similar peptide backbone conformation, and association with this motif is crucial for MLL1-WDR5 interaction as well as MLL core complex formation (Patel et al., 2008a; Patel et al., 2008b; Song and Kingston, 2008).
When WDR5 and MLL are associated, MLL-directed H3K4me is better able to compete for binding to WDR5, which would then result in dissociation from MLL (Song and Kingston, 2008). Further complicating the story is the discovery that WDR5 can bind to a symmetrically methylated H3R2 peptide even more favorably than without the modification (see below).
The C-terminal domain of EED, which is part of the Polycomb repressive complex that includes methyltransferase EZH2, forms a similar structure to that of WDR5, with an additional short strand and deformed helix on the opposite side to the N- and C-termini of the fold. Complex structures with H1, H3 or H4 peptides containing methylated lysines show that the methylated sidechain faces inward into the center of the circular fold, in contrast to the WDR5 structures (Margueron et al., 2009) (Figure 6B).
The seven-bladed β-propeller fold is also found in the histone chaperone/chromatin remodeling complex subunit RbAp46 (p55), but it interacts with an unmodified H4 tail via a unique α-helix and loop on the N/C-terminal side of the fold, instead of the central cavity (Murzina et al., 2008; Song et al., 2008).
Like phosphorylation, arginine methylation appears to play roles either in facilitating protein-protein interactions by recruitment by a specific domain, or in augmenting or weakening an existing interaction with an adjacent modification. A binary switch has been proposed for asymmetric H3R2me2 and H3K4me3, with the arginine methylase PRMT6 physically preventing a PHD finger within the Set1 complex from trimethylating H3K4 (Kirmizis et al., 2007). SMN Tudor domains have been shown to be recruited to both asymmetrically and symmetrically dimethylated arginines on their target proteins (Cheng et al., 2007; Cote and Richard, 2005), and this coordination appears to involve an aromatic cage similar to that employed by other Tudor domains in their interaction with methylated lysines (Bedford, 2007; Chen et al., 2009; Friberg et al., 2009; Sprangers et al., 2003). Other structures have been solved of different domains in complex with histone peptides containing both methylated arginines and methylated lysines. A study of the interaction between the two chromodomains of CHD1 and H3K4me3 revealed that H3R2 was important for mediating this interaction, and asymmetric methylation at this position decreased affinity for the peptide fourfold (Flanagan et al., 2005) (Figure 7A). A similar decrease in affinity is observed upon additional symmetric R2 methylation in the interaction between the ING2 PHD finger and H3K4me3 (Ramon-Maiques et al., 2007). Conversely, the recombinase RAG2 PHD finger enjoys moderately enhanced binding with the inclusion of both H3R2me2s and K4me2/3 (Ramon-Maiques et al., 2007) (Figure 7B). A tyrosine within the first β-strand interacts with the methylated guanidinium group, and this is expected to account for the slight increase in affinity observed by intrinsic fluorescence and fluorescence anisotropy. Asymmetrically dimethylated H3R2 also contacts the tyrosine residue, but the interaction appears less stable.
The interaction between the Pygo PHD finger, BCL9 HD1 domain and an H3 peptide containing K4me2 is unaffected by the addition of H3R2me2a; the modification points toward the solvent, away from the rest of the structure (Fiedler et al., 2008).
In vitro studies have established that the WD40 repeat protein WDR5 primarily recognizes H3R2 with little consideration of the methylation state of K4. PRMT7-mediated symmetric dimethylation of H3R2 results in a significant 50-fold decrease in KD; a complex structure of WDR5 and a symmetrically dimethylated H3R2 peptide illustrates additional contacts made by the methylated guanidinium group, which is inserted deeply into the central binding cleft (M. Walsh, personal communication). It is believed that activity of PRDM14, an H3K4me1 methylase, is coordinated with that of PRMT7 at transcription start sites in specific genes. Asymmetrically dimethylated H3R2 cannot bind to WDR5, and this modification displaces MLL/WDR5 from chromatin (M. Walsh, personal communication).
Monomethylated H3R2 (H3R2me1) does not exert the same antagonistic effect as asymmetric H3R2me2 on H3K4me3, and is instead associated with active transcription (Kirmizis et al., 2009); the structural studies of the RAG2 PHD finger demonstrated that monomethylated H3R2 had no effect on K4me3 binding, equivalent to unmethylated R2 (Ramon-Maiques et al., 2007).
The ability of a single conserved structural fold to recognize different modifications, and its evolution to adapt to different histone sequence scenarios, is a testament to the ability of nature to accomplish multiple functions employing a surprisingly restricted fold space. Minor structural changes to a basic fold, such as an additional secondary structure element, can alter its functionality, which supports the interesting suggestion that cross-talk between these domains has evolutionarily selected for these specific functions (Jin et al., 2009). Furthermore, a modular domain in a given protein that can recognize multivalent modifications enables it to modulate protein or complex activity in a temporally or spatially contextual manner, balancing affinities relative to competing effects, and allowing for more dynamic regulation (Ruthenburg et al., 2007b).
The RAG2 PHD finger is believed to be the first domain identified to interact directly with two histone modifications (with Brdt being the second). It may be that this confers a distinct advantage to RAG2 over other H3K4me binding proteins, which would be otherwise repelled by R2 methylation (Ramon-Maiques et al., 2007). Many other domains appear to recognize a residue adjacent to a modified residue with a separate pocket or set of residues, and it may be that acetylation, methylation or phosphorylation of these unmodified residues could enhance binding. Steric constraints due to the larger size of modifying enzymes or complexes likely prevent two nearby histone residues from being modified at the same time, and thus the appearance of such modifications would be sequential. This might allow for a two-stage, stepwise function toward a final activity.
The presence of tandem domains that recognize histone modifications adds another layer of complexity to the interpretation of the epigenetic code. Tandem domains might coordinate mutually exclusive modifications that are indicative of a balance of control of transcription, such as PRMT6-directed H3R2 methylation and H3K4 methylation by MLL (Guccione et al., 2007). Alternatively, it has been speculated that tandem domains could recognize two modifications simultaneously. Thus far, however, there are no available complex structures of tandem domains recognizing more than one modification, and it is curious that in many tandem repeats only one domain appears to bind a modified histone peptide, despite its requirement for the other tandem domain to be intact. It could be that the various methods employed to analyze these interactions in vitro are failing to capture what may be very transient interactions between chromatin and these domains in vivo. One of the more common approaches for identifying the histone modifications that a given domain might interact with is to screen whatever restricted library of modified histone peptides a research lab has at its disposal, which relies on a bit of luck that one of these peptides will bind in an in vitro assay. More functional evidence of the coincidence of histone marks or the appearance or activity of a modifying enzyme with the activity of the protein of interest can provide clues as to which modifications could be targets. Still, many of these interactions are relatively weak and thus difficult to detect, and the isolation of these domains may prevent additional reinforcing or stabilizing interactions that could be formed with chromatin via other domains of the protein, or other components of the complex within which the protein may function.
The recognition of multiple histone modifications on a single tail, as well as modifications on multiple histone tails, may produce avidity effects on chromatin in vivo (Ruthenburg et al., 2007b) that may account for the very low affinities observed in vitro. In the absence of supporting experimental evidence, caution should be exercised to avoid overinterpreting these data.
Tandem domain recognition of histone modifications alludes to the possibility of recognition by one protein of two modifications from different histones, even from different nucleosomes. Such modifications may have been observed to appear at the same “time” (as much as can be detected with current protocols), leading one to speculate that functional crosstalk occurs between modifications (Suganuma and Workman, 2008). One example of trans-histone effects is the “master switch” involving H2B ubiquitination at K120, which is correlated with and is a prerequisite for methylation at H3K4 and H3K79, while deubiquitination enables methylation at H3K36. There are also several examples of protein complexes that contain multiple histone modifying enzymes, such as the Polycomb Repressive Complex 1 (PRC1), which contains both H2AK119 ubiquitination and H3K27 methylation activities (Schwartz and Pirrotta, 2007), and the MLL-MOF complex, which controls both H3K4 methylation and H4K16 acetylation (Dou et al., 2005).
Finally, as we look beyond histone modifications in gene transcription, an increasing number of studies have pointed to roles for other ligands that behave as enhancers or competitors to these modifications. Many proteins that recognize histone modifications are prone to covalent modifications, including methylation, acetylation, phosphorylation and sumoylation (Kang et al., 2010). Such covalent modifications may influence the ability of the protein to recognize histone tails, as well as serve to recruit additional factors that have further histone modifying or recognition functions. Long non-coding RNA also appears to play a major role in gene transcription, and there is now evidence showing that it is crucial for the recruitment and proper function of chromatin remodeling complexes at chromatin (Khalil et al., 2009; Rinn et al., 2007), and that non-coding RNA interactions are mediated in part by the same modules that recognize histone modifications (Yap et al., 2010). Such a diverse set of protein modular domains is composed of only a small set of conserved structural folds, which have evolved to exhibit versatile functionalities in modulating molecular interactions during a wide variety of cellular processes, particularly gene transcription.
The authors wish to thank M. Walsh for helpful discussion and M. Walsh, H. Song, Q. Zhang and L. Zeng for providing their unpublished structural data for this review.
DECLARATION OF INTEREST
This work was supported by a Terry Fox Foundation postdoctoral fellowship to K.L.Y. from the National Cancer Institute of Canada and grants to M.-M.Z. from the Empire State Stem Cell Trust Fund (NYSTEM) and the National Institutes of Health (R01CA087658-10, R01GM073207-04, R01HG004508-02, RC1DA028776-01). The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.