Conceived and designed the experiments: HW JM. Performed the experiments: HW HZ CX RL WT MFA LD WQ. Analyzed the data: HW HZ CX RL WT MFA LD WQ YW JM. Wrote the paper: JM.
The PWWP domain was first identified as a structural motif of 100–130 amino acids in the WHSC1 protein and predicted to be a protein-protein interaction domain. It belongs to the Tudor domain ‘Royal Family’, which consists of Tudor, chromodomain, MBT and PWWP domains. While Tudor, chromodomain and MBT domains have long been known to bind methylated histones, PWWP was shown to exhibit histone binding ability only recently.
The PWWP domain has been shown to be a DNA binding domain, but sequence analysis and previous structural studies show that the PWWP domain exhibits significant similarity to other ‘Royal Family’ members, implying that the PWWP domain has the potential to bind histones. In order to further explore the function of the PWWP domain, we used the protein family approach to determine the crystal structures of the PWWP domains from seven different human proteins. Our fluorescence polarization binding studies show that PWWP domains have weak histone binding ability, which is also confirmed by our NMR titration experiments. Furthermore, we determined the crystal structures of the BRPF1 PWWP domain in complex with H3K36me3, and HDGF2 PWWP domain in complex with H3K79me3 and H4K20me3.
PWWP proteins constitute a new family of methyl lysine histone binders. The PWWP domain consists of three motifs: a canonical β-barrel core, an insertion motif between the second and third β-strands and a C-terminal α-helix bundle. Both the canonical β-barrel core and the insertion motif are directly involved in histone binding. The PWWP domain has been previously shown to be a DNA binding domain. Therefore, the PWWP domain exhibits dual functions: binding both DNA and methyllysine histones.
This article can also be viewed as an enhanced version in which the text of the article is integrated with interactive 3D representations and animated transitions. Please note that a web plugin is required to access this enhanced functionality. Instructions for the installation and use of the web plugin are available in Text S1.
The PWWP domain was first identified as a structural motif of 100–130 amino acids in the WHSC1 (Wolf-Hirschhorn syndrome candidate 1) protein and named after the conserved motif Pro-Trp-Trp-Pro in WHSC1. It was predicted to be a protein-protein interaction domain. Indeed, the PWWP domain of the DNA methyltransferase DNMT3A directly binds SALL3, which functions as an inhibitory factor for DNMT3A. SALL3 expression reduces DNMT3A-mediated CpG island methylation in cell culture and in vitro. A mutation in the PWWP domain of DNMT3B diminishes its interaction with the SUMO E3 ligase PIAS1.
The PWWP domain was later shown, in 2002, to bind DNA by Cheng's laboratory. The PWWP domains of the DNA methyltransferases DNMT3A and DNMT3B are essential for targeting DNA methylation to heterochromatin regions through their chromatin binding ability. HDGF (hepatoma-derived growth factor) and the HRPs (HDGF-related proteins) consist of a highly conserved PWWP domain at their N-terminus and a variable region at the C-terminus. PWWP domains in this subfamily of PWWP-containing proteins also exhibit DNA binding ability, and some of these HDGF proteins are implicated in development. HDGF exerts its transcription repressive effect through binding to a conserved DNA element in the promoter region of target genes, although its PWWP domain was also reported to function as a nonspecific DNA-binding domain. Another member of this subfamily, PSIP1 (PC4 and SFRS1 interacting protein 1), is a transcriptional coactivator involved in lentiviral integration. It was shown that the PWWP domain in PSIP1 displays affinity for DNA and chromatin and that its chromatin binding ability is crucial for HIV-1 integration. Recently, PSIP1 was found to promote association of the MLL complex with transcriptionally active chromatin through its PWWP domain. The eukaryotic mismatch repair protein MSH6 also harbors a PWWP domain at its N-terminal region, which binds double-stranded DNA non-specifically.
Alongside the Tudor, chromodomain and MBT domains, the PWWP domain belongs to the Tudor domain ‘Royal Family’. The core of the Tudor, MBT and PWWP domains is composed of five β-strands. The canonical chromodomain contains three β-strands that correspond to the middle three β-strands of the Tudor, MBT and PWWP domains, and a C-terminal α-helix. The common function of the ‘Royal Family’ members is their ability to recognize lysine/arginine methylated histones or proteins through an aromatic cage. Although the sequence and structure alignments show that PWWP domains exhibit structural similarity to other ‘Royal Family’ members and most PWWP domains also contain an aromatic cage, it was only recently shown that PWWP is able to bind lysine-methylated histones.
In order to systematically study the structure and function of this domain, we purified some representative human PWWP domains and tested their binding ability to different histone peptides. The results show that the PWWP domain is a weak methyllysine histone binder. Furthermore, we determined the crystal structures of the PWWP domains from seven different human proteins and three PWWP domain complex structures with histone peptides, i.e., BRPF1-H3K36me3, HDGF2-H3K79me3 and HDGF2-H4K20me3. Therefore, the PWWP domain can bind not only DNA but also histones.
The PWWP domain was first identified in WHSC1 and named after the central core motif Pro-Trp-Trp-Pro of the PWWP domain in WHSC1. The PWWP domain comprises 100–130 amino acids and is often present in chromatin-associated proteins. In the human genome, there are at least 22 PWWP domain-containing proteins, and three of them contain 2 PWWP domains (WHSC1, WHSC1L1, NSD1). The PWWP domains can be categorized into 6 classes based on sequence homology (Figure 1). The major difference between these PWWP domains lies in the insertion motif, which varies in length among the different PWWP domains.
In order to explore the functional roles of the PWWP domains, we determined the crystal structures of the PWWP domains from 7 different human proteins, namely, BRPF1 (bromodomain and PHD finger-containing protein 1), BRPF2, BRPF3, MUM1 (melanoma associated antigen (mutated) 1), DNMT3A, DNMT3B and HDGF2 (hepatoma-derived growth factor 2). Our structures revealed that the overall fold of the PWWP domain consists of three motifs: a canonical β-barrel core, an insertion motif between the second and third β-strands and a C-terminal α-helix bundle (Figure 2). The canonical β-barrel core harbors an aromatic cage constructed by three aromatic residues, which is a signature feature of the Tudor domain ‘Royal family’ (Figure 1). MBD5 (methyl-CpG binding domain protein 5) and the N-terminal PWWP domain of NSD1 (nuclear receptor binding SET domain protein 1) are two exceptions, which have just one aromatic residue at the conserved positions. A characteristic α-helical motif consisting of 1–5 α-helices is located in the C-terminal part of the PWWP domains. A structure comparison of these PWWP domains shows that the insertion motif between the second and third β-strands varies in length and secondary structure among the different classes of PWWP domains. This variable insertion motif is plausibly caused by intron/exon sliding at the genomic level, as the coding regions for the second and third β-strands are often split by an intron.
The PWWP domain is structurally similar to other members in the Tudor domain ‘Royal family’, and many members in this superfamily have been shown to bind methylated histones. Furthermore, the vast majority of PWWP domains have the aromatic residues in the conserved positions that form a putative methyllysine binding aromatic cage (Figure 1). Therefore, it was compelling to speculate that the PWWP domains may also exhibit methylated histone binding ability, which was confirmed in recent studies. The S. pombe protein Pdp1 harbors a PWWP domain at its N-terminus, which was shown to bind mono-methylated histone H4K20. Because the C-terminal fragment of Pdp1 is able to bind the S. pombe H4K20 methyltransferase SET9, SET9 is recruited to the H4K20me1 chromatin region through the PWWP domain of Pdp1, which increases the concentration of SET9 on chromatin and allows it to carry out the trimethylation of histone H4K20. The PWWP domains of BRPF1 and DNMT3A were reported to bind H3K36me3. BRPF1 was shown to be present on actively transcribed genes, and its enrichment corresponds to that of H3K36me3. DNMT3A was recruited to chromatin regions bearing the H3K36me3 mark through the interaction of its PWWP domain with H3K36me3.
To better understand the histone binding ability and preference of these human PWWP domains, we used fluorescence polarization and NMR titration techniques to measure binding affinities of some representative PWWP domains to various histone peptides bearing different lysine methylation states. By fluorescence polarization assay, we found that the PWWP domains in BRPF1, BRPF2, HDGF2, MUM1 and the N-terminal PWWP domains of WHSC1 and WHSC1L1 show weak binding affinity to histones with H3K36, H3K79 or H4K20 methylation (Table 1). In order to confirm this weak histone binding, we used NMR titration to measure the binding affinity of BRPF1 to different histone peptides. Our NMR titration results show that BRPF1 does not exhibit detectable binding to the H3K4me3 and H3K9me3 peptides, but binds H3K36me3 with a Kd of ~3 mM (Figure 3), which is consistent with the results reported by Bycroft's group. BRPF1 also shows weaker binding to H3K36me2 and H3K79me3 peptides (Figure 3A). BRPF2 displays a binding preference similar to BRPF1. HDGF2 binds H3K36me3, H3K79me3 and H4K20me3 weakly (Table 1). Consistent with the high throughput binding assay by Mann's group, WHSC1 and WHSC1L1 bind H3K36me3. It was reported that DNMT3A also binds H3K36me3. Taken together, similar to other members in the ‘Royal family’, the PWWP domain also exhibits methyllysine histone binding ability.
To shed light on the molecular mechanism of methylated histone binding by PWWP domains, we determined the crystal structures of the PWWP domain of human BRPF1 in complex with H3K36me3 and that of human HDGF2 in complex with H3K79me3 and H4K20me3.
In the BRPF1-H3K36me3 complex structure, the peptide resides in a groove formed by the insertion motif, the fourth β-strand and its preceding loop from the BRPF1 PWWP domain (Figure 4A–C and Figure S1A). The trimethylated lysine K36 is accommodated in an aromatic cage formed by three aromatic residues (Y1096, Y1099 and F1147). In addition, histone residues H3T32, H3G33 and H3K36me3 make several hydrogen bonds with residues from the fourth β-strand and its preceding loop of BRPF1 PWWP (Figure 4B). Interestingly, H3Y41 from the histone H3 peptide forms one side of the aromatic cage, but mutating H3Y41 to alanine does not significantly affect the binding of the H3K36me3 peptide to BRPF1 (data not shown). We infer that H3Y41 is not involved in H3K36me3 recognition. Therefore, both the canonical β-barrel core and the insertion motif are directly involved in histone binding.
We were also able to co-crystallize HDGF2 with both H3K79me3 and H4K20me3 peptides, which show very similar binding modes. In the HDGF2-H3K79me3 complex structure (Figure 4D–F and Figure S1B–C), the trimethylated lysine K79 is accommodated in an aromatic cage formed by three aromatic residues (Y18, W21 and F44). This trimethylated K79 is the major contributor to histone binding by the PWWP domain, although histone H3 residues Q76 and D77 also make two hydrogen bonds with V33 from the insertion motif and T50 from the fourth β-strand of the HDGF2 PWWP domain (Figure 4E). This limited set of contacts may also explain why HDGF2 shows very weak binding affinity to H3K79me3.
DNMT3A had been shown to bind histone H3K36me3, but we were not able to obtain its cocrystals with H3K36me3. Nevertheless, we found a bis-tris buffer molecule in both the DNMT3A and DNMT3B structures (Figure 5 and Figure S1D–E). The propensity of the aromatic cage to bind buffer molecules had been identified before in another ‘Royal family’ member, L3MBTL1. The bis-tris molecule resides in the conserved aromatic cage of the PWWP domains (Figure 5). Superposition of the DNMT3A and DNMT3B structures with the BRPF1 and HDGF2 complex structures shows that the bis-tris molecule is bound in the position occupied by the trimethylammonium group of the methyllysine (Figure 6). The bis-tris molecule is bound to DNMT3A and DNMT3B in slightly different conformations. In the DNMT3A structure, the bis-tris molecule forms two hydrogen bonds with the D333 residue from DNMT3A, and three hydrogen bonds with residues G298, L300 and S304 through a conserved water molecule. In the DNMT3B structure, the bis-tris molecule forms one hydrogen bond with the D266 residue from DNMT3B, and three more hydrogen bonds with the conserved residues G231, I233 and S237 via the conserved water molecule. DNMT3A and DNMT3B are DNA methyltransferases, which are essential for de novo methylation and mammalian development. Aberrant DNA methylation is implicated in various diseases, including cancer. Current drug discovery efforts mainly target the catalytic domain of DNA methyltransferases. The bis-tris molecule in complex with the PWWP domain of DNMT3A or DNMT3B provides a clue for designing small molecules targeting their histone binding domain.
In all these complex structures, the methyllysine binding aromatic residues are from the loop between the first and second β-strands, the N-terminus of the second β-strand and the C-terminus of the third β-strand (Figure 1). The histone residues C-terminal to the modified lysine do not make significant contributions to the binding (Figure 4), reminiscent of the HP1 and Polycomb chromodomains, which mainly bind H3K9me3 and H3K27me3 peptides through residues N-terminal to the respective target lysines. In these complex structures, the insertion motif is directly involved in histone binding, forming one side of the histone binding groove (Figure 6). Furthermore, this insertion motif has different lengths and structures among these PWWP domains (Figures 1 and 6), which may imply that the insertion motif plays a role in determining the ligand specificity.
Comparison of the structural architecture of the PWWP domain to those of chromodomain, MBT and Tudor domains shows that PWWP, MBT and Tudor all have a 5-β-strand canonical core, while the chromodomain consists of three β-strands and one α-helix (Figure 7). Overall, the fold of the PWWP domain has the highest structural similarity to that of a single MBT repeat, i.e., the β-strand core is followed by α-helices, which pack against the β-barrel core (Figure 7A and 7B). Other domains similar to PWWP are found in Eaf3 and MRG15. Eaf3 and MRG15 bind H3K36me3 through a chromo barrel domain. This chromo barrel domain is structurally similar to the PWWP domain (Figure 7C), but it lacks the PWWP motif, and it harbors a small helical turn between the third and fourth β-strands that is absent from the PWWP domain (Figure 7C). The canonical Tudor domain consists of five β-strands, which overlay closely with the β-barrel core of PWWP (Figure 7D). A typical chromodomain consists of three β-strands and one α-helix, and the three β-strands can be superimposed with the middle three β-strands of PWWP, MBT and Tudor domains (Figure 7E and 7F).
The histone methyllysine binding mode exhibited by PWWP is similar to that adopted by other methyllysine binding proteins. A common feature of these methyllysine binding proteins is that they use an aromatic cage to recognize the methylated lysine. Nevertheless, the histone peptides are bound to their corresponding binders in different orientations (Figure 7G), indicating that the royal family members do not share a common binding cleft, but a similar aromatic cage at an almost identical position. Interestingly, histone peptides bind to a single chromodomain as a β-strand in a position corresponding to the first β-strand of the 5-strand canonical cores of PWWP, Tudor and MBT domains (Figure 7D).
Mutations in PWWP domain-containing proteins have been implicated in different human diseases. The gene WHSC1 is located in the Wolf–Hirschhorn syndrome critical region on 4p16.3 and is disrupted by chromosomal translocation in lymphoid multiple myeloma disease. It was recently shown that BRPF2 is associated with schizophrenia and bipolar affective disorder. HDGF was reported to be involved in tumorigenesis, and the PWWP domain in PSIP1 is critical for chromatin binding and HIV type 1 infectivity. Mutations in MSH6 cause inherited somatic defects in MMR and result in increased development of hereditary non-polyposis colorectal cancer. DNMT3A and DNMT3B are de novo DNA methyltransferases, and loss-of-function mutations in human DNMT3B cause a developmental defect characterized by hypomethylation of pericentromeric repeats and are implicated in ICF (immunodeficiency, centromeric instability, facial anomalies) syndrome. So far, the identified point mutations (Figure 1, residues highlighted in yellow) that are implicated in diseases or important for function are all located either in the aromatic cage or on the fourth β-strand, regions involved in histone and DNA binding.
DNA fragments encoding the PWWP domains of human BRPF1 (residues 1085–1213), BRPF2 (residues 925–1049), BRPF3 (residues 1056–1195), HDGF2 (residues 1–93), DNMT3A (residues 278–427), DNMT3B (residues 206–355), MUM1 (residues 406–539), WHSC1 (residues 208–368) and WHSC1L1 (residues 247–402) were amplified by PCR, sub-cloned into the pET28-MHL vector (Genbank accession number: EF456735) and transformed into Escherichia coli BL21 (DE3)-V2R-pRARE2. The cells were grown in Terrific Broth, and the protein was over-expressed by addition of 1 mM isopropyl-1-thio-D-galactopyranoside (IPTG) followed by overnight incubation at 15°C. Harvested cells were resuspended in 50 mM HEPES, pH 7.4, supplemented with 500 mM NaCl, 2 mM β-mercaptoethanol, 5% glycerol and 0.1% CHAPS. The cells were lysed by passing through a microfluidizer (Microfluidics Corporation) at 20,000 psi. After clarification of the crude extract by high-speed centrifugation, the lysate was loaded onto a 5 ml HiTrap chelating column (GE Healthcare) charged with Ni2+. The column was washed with 10 column volumes of 20 mM HEPES buffer, pH 7.4, containing 500 mM NaCl, 50 mM imidazole and 5% glycerol, and the protein was eluted with 20 mM HEPES buffer, pH 7.4, containing 500 mM NaCl, 250 mM imidazole and 5% glycerol. The protein was dialyzed against buffer containing 20 mM HEPES, pH 7.4, 500 mM NaCl and 5% glycerol. TEV protease was added to combined fractions containing target proteins to remove the His-tag. All the proteins except DNMT3A were further purified to homogeneity by ion-exchange chromatography on a Source 30S column (10×10) (GE Healthcare), equilibrated with 20 mM PIPES buffer, pH 6.5, and eluted with a linear gradient of NaCl up to 500 mM (20 CV). For DNMT3A, a Source 30Q column was used for ion-exchange chromatography. The 15N-labeled proteins for NMR titration were purified with the same protocol as the native ones, except that the bacteria were grown in M9 minimal medium containing 1 g/L (15NH4)2SO4 as the sole nitrogen source. The labeled proteins were concentrated to 0.15–0.3 mM for NMR titration.
Purified PWWP domain proteins were crystallized using the hanging drop vapor diffusion method at 20°C by mixing 1 µl of the protein solution (10 mg/mL) with 1 µl of the reservoir solution. BRPF1 (apo) and its complex with H3K36me3 peptide were crystallized in 3.5 M sodium formate, 0.1 M Tris-HCl, pH 8.5; BRPF2 in 30% PEG2K-MME, 0.20 M potassium bromide; BRPF3 in 30% PEG 4,000, 0.2 M ammonium sulfate, 0.1 M sodium cacodylate, pH 6.5; HDGF2 in 2.0 M ammonium sulfate, 0.2 M potassium/sodium tartrate, 0.1 M sodium citrate, pH 5.6; HDGF2-H3K79me3 complex in 2.0 M ammonium sulfate, 5% isopropanol; MUM1 in 25% PEG 3,350, 0.1 M ammonium sulfate, 0.1 M HEPES, pH 7.5; DNMT3A in 28% PEG 3,350, 0.1 M ammonium sulfate, 0.1 M Bis-Tris, pH 6.0; DNMT3B in 30% PEG2K-MME, 0.20 M potassium bromide, 0.1 M Bis-Tris, pH 6.5. The peptides used for co-crystallization were: SAPATGGVKme3KPHRYR (H3K36me3); EIAQDFKme3TDLRY (H3K79me3); AKRHRKme3VLRDN (H4K20me3).
Fluorescence polarization assays were performed in 384-well plates, using the Synergy 2 microplate reader from BioTek as described previously. All the peptides were synthesized and purified by Tufts University Core Services (Boston, MA, U.S.A.), with the N-terminus labeled with fluorescein. Binding assays were performed in a 10 µl volume at a constant labeled peptide concentration (40 nM), by titrating the PWWP domains (at concentrations ranging from low to high micromolar) into 20 mM PIPES buffer (pH 6.5), containing 50 mM NaCl and 0.01% Tween-20. The data points were fitted to a ligand binding function using SigmaPlot software to determine the Kd values.
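The fitting step can be illustrated with a minimal Python sketch. It is not the SigmaPlot procedure used here: it assumes a simple one-site hyperbolic binding function, which is reasonable only because the labeled peptide (40 nM) is far below the expected Kd, so that free protein is approximately equal to total protein. The function names, the grid-search strategy and the synthetic data points are all hypothetical and serve only to show how a Kd could be extracted from a titration.

```python
def one_site(protein_um, kd_um, fp_free, fp_span):
    """One-site binding isotherm: polarization rises hyperbolically with protein."""
    return fp_free + fp_span * protein_um / (kd_um + protein_um)


def fit_kd(protein_um, signal, kd_min=1.0, kd_max=1e5, steps=2000):
    """Scan Kd on a log-spaced grid. For each trial Kd the model is linear in
    the baseline and span, so those two are solved by ordinary least squares.
    The trial Kd with the smallest sum of squared residuals is returned."""
    n = len(protein_um)
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        x = [p / (kd + p) for p in protein_um]  # fraction of peptide bound
        sx, sy = sum(x), sum(signal)
        sxx = sum(v * v for v in x)
        sxy = sum(v * y for v, y in zip(x, signal))
        det = n * sxx - sx * sx
        if det == 0:
            continue
        span = (n * sxy - sx * sy) / det
        base = (sy - span * sx) / n
        sse = sum((base + span * v - y) ** 2 for v, y in zip(x, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, base, span)
    _, kd, base, span = best
    return kd, base, span


# Synthetic, noise-free titration (hypothetical values, in micromolar).
concentrations = [10, 30, 100, 300, 1000, 3000, 10000]
true_kd, true_base, true_span = 500.0, 50.0, 150.0
readings = [one_site(p, true_kd, true_base, true_span) for p in concentrations]

fitted_kd, fitted_base, fitted_span = fit_kd(concentrations, readings)
print(f"fitted Kd = {fitted_kd:.0f} uM")
```

With real data the readings would carry noise, and for weak binders the titration often does not reach saturation, so the fitted span and Kd become strongly correlated and should be reported with that caveat.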
To map the binding site of the BRPF1 and HDGF2 PWWP domains for various methylated histone peptides and estimate the corresponding Kds, 15N-1H HSQC spectra were collected with 15N-labeled samples of PWWP domains, free and with additions of increasing amounts of unlabeled H3K4me3 (1–11 aa), H3K9me3 (1–15 aa), H3K36me3 (30–41 aa), H3K79me3 (73–84 aa), p53K370me2 (365–375 aa), p53K372me2 (364–376 aa) and p53K382me2 (376–388 aa) peptides. Weighted average chemical shift variations (Δppm) were calculated according to the formula $\Delta\mathrm{ppm} = \left([\delta_{HN}]^2 + [\delta_N]^2\right)^{1/2}$, where $\delta_{HN}$ and $\delta_N$ are the changes in HN and N chemical shifts, respectively, as described previously. From the Δppm, the Kds were estimated with the amide peaks of two selected amino acids, as shown in Figure 4. The shifted BRPF1 resonances are assigned according to a recent publication.
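The chemical shift calculation and its relation to Kd can be sketched in Python as follows. The first function implements the Δppm formula exactly as written above. The second is an assumption on our part: it uses the standard 1:1 binding expression under fast exchange with ligand depletion, which the text does not state explicitly as the model used for the Kd estimate. All function names and the example numbers are hypothetical, apart from the ~3 mM Kd and the 0.15–0.3 mM protein concentration range quoted in the article.

```python
import math


def delta_ppm(d_hn, d_n):
    """Combined amide shift change: square root of the summed squares of the
    HN and N chemical shift changes (both in ppm)."""
    return math.sqrt(d_hn ** 2 + d_n ** 2)


def bound_fraction(protein_mm, peptide_mm, kd_mm):
    """Fraction of protein in complex for 1:1 binding, solving the quadratic
    that accounts for depletion of free peptide (assumed model)."""
    s = protein_mm + peptide_mm + kd_mm
    return (s - math.sqrt(s * s - 4.0 * protein_mm * peptide_mm)) / (2.0 * protein_mm)


def predicted_shift(protein_mm, peptide_mm, kd_mm, delta_max):
    """Observed shift under fast exchange: bound fraction times the shift at saturation."""
    return delta_max * bound_fraction(protein_mm, peptide_mm, kd_mm)


# Example: 0.2 mM labeled protein, Kd of 3 mM, increasing peptide (mM).
for peptide in (0.5, 1.0, 3.0, 6.0, 12.0):
    frac = bound_fraction(0.2, peptide, 3.0)
    print(f"peptide {peptide:5.1f} mM -> fraction bound {frac:.2f}")
```

Under this model a Kd in the millimolar range means that several millimolar peptide is needed to occupy even half of the protein, which is why the titration curves for such weak binders rarely reach a plateau.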
All diffraction data were collected at 100 K and reduced with the HKL suite of programs. To obtain phase information for BRPF1, 436 0.5° oscillation images were collected on an FR-E copper rotating anode source (Rigaku) from a selenomethionyl derivative crystal of space group I222 (a=43.3 Å, b=72.0 Å, c=114.0 Å). The structure was solved with the single wavelength anomalous diffraction (SAD) method using the programs SHELXD and SHELXE. An initial model was built automatically with the program ARP/wARP. The model was further refined against a dataset that was derived from 406 0.5° oscillation images collected at beamline 19ID of the Advanced Photon Source at a wavelength of 0.977 Å. COOT, REFMAC and MOLPROBITY were used for interactive model building, refinement and validation, respectively. The crystal structures of DNMT3A, DNMT3B, BRPF2, BRPF3, MUM1 and the complex structures of BRPF1-H3K36me3, HDGF2-H3K79me3 and HDGF2-H4K20me3 were solved by molecular replacement using MOLREP, and refined using a protocol similar to that of apo-BRPF1. Crystal diffraction data and refinement statistics are displayed in Tables 2 and 3.
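As a quick consistency check on the data collection parameters quoted above, the short Python sketch below works out the total rotation range covered by each set of oscillation images and the unit-cell volume implied by the reported cell edges. The function names are hypothetical, and the volume formula assumes all cell angles are 90°, which holds for the orthorhombic space group I222.

```python
def total_rotation_deg(n_images, step_deg):
    """Total crystal rotation covered by a run of oscillation images."""
    return n_images * step_deg


def orthorhombic_cell_volume(a, b, c):
    """Unit-cell volume in cubic angstroms; valid only when all angles are 90 degrees."""
    return a * b * c


sad_sweep = total_rotation_deg(436, 0.5)         # home-source SAD dataset
refinement_sweep = total_rotation_deg(406, 0.5)  # synchrotron dataset
cell_volume = orthorhombic_cell_volume(43.3, 72.0, 114.0)

print(f"SAD sweep: {sad_sweep} degrees")
print(f"Refinement sweep: {refinement_sweep} degrees")
print(f"Cell volume: {cell_volume:.1f} cubic angstroms")
```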
Electron density maps for the ligands identified in our complex structures and reported in this paper. (A) The omit density map for the H3K36me3 peptide in the BRPF1-H3K36me3 complex at 3σ contour. (B) The omit density map for the H3K79me3 peptide in HDGF2+H3K79me3 at 3σ contour. (C) The omit density map for the H4K20me3 peptide in HDGF2+H4K20me3 at 2σ contour. (D) The 2Fo-Fc density map for the bis-tris molecule in the DNMT3A-bis-tris structure. (E) The 2Fo-Fc density map for the bis-tris molecule in the DNMT3B-bis-tris structure.
Standalone iSee datapack - contains the enhanced version of this article for use offline. This file can be opened using free software available for download at http://www.molsoft.com/icm_browser.html.
Instructions for installation and use of the required web plugin (to access the online enhanced version of this article).
We would like to thank Farrell MacKenzie, Sally Ni and Aiping Dong for advice and technical assistance.
Competing Interests: The authors have declared that no competing interests exist.
Funding: This research was supported by the Structural Genomics Consortium, a registered charity (number 1097737) that receives funds from the Canadian Institutes for Health Research, the Canadian Foundation for Innovation, Genome Canada through the Ontario Genomics Institute, GlaxoSmithKline, Karolinska Institute, the Knut and Alice Wallenberg Foundation, the Ontario Innovation Trust, the Ontario Ministry for Research and Innovation, Merck & Co., Inc., the Novartis Research Foundation, the Swedish Agency for Innovation Systems, the Swedish Foundation for Strategic Research and the Wellcome Trust. Results shown in this report are derived from work performed at Structural Biology Center and Northeastern Collaborative Access Team beamlines at the Advanced Photon Source, Argonne National Laboratory, with support from the National Center for Research Resources (RR-15301). Argonne National Laboratory is operated by UChicago Argonne, LLC, for the United States Department of Energy, Office of Biological and Environmental Research, under contract DE-AC02-06CH11357. Research was also conducted at the Cornell High Energy Synchrotron Source (CHESS), which is supported by the National Science Foundation (DMR-0936384) and the National Institutes of Health, and also at the National Synchrotron Light Source at Brookhaven National Laboratory, funded by the United States Department of Energy's Office of Science. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.