Many chromatin-associated proteins contain two sequence motifs of unknown function that are rich in phenylalanine/tyrosine residues. These so-called FYRN and FYRC motifs are also found in transforming growth factor beta regulator 1 (TBRG1)/nuclear interactor of ARF and MDM2 (NIAM), a growth inhibitory protein that also plays a role in maintaining chromosomal stability. We have solved the structure of a fragment of TBRG1 that encompasses both of these motifs. The FYRN and FYRC regions each form part of a single folded module (the FYR domain), which adopts a novel α + β fold. Proteins such as the histone H3K4 methyltransferases trithorax and mixed lineage leukemia (MLL), in which the FYRN and FYRC regions are separated by hundreds of amino acids, are expected to contain FYR domains with a large insertion between two of the strands of the β-sheet.
In recent years, it has become clear that chromatin structure plays a central role in regulating processes such as transcription, DNA repair, and DNA replication. Consequently, proteins that modify or remodel chromatin are currently the subject of intense interest. Many of these proteins are modular and contain domains that recur in numerous nuclear proteins.1 The FYRN and FYRC sequence motifs are two poorly characterized phenylalanine/tyrosine-rich regions of around 50 and 100 amino acids, respectively, that are found in a variety of chromatin-associated proteins [Fig. 1(A)].1–3 They are particularly common in histone H3K4 methyltransferases, most notably in a family of proteins that includes human mixed lineage leukemia (MLL) and the Drosophila melanogaster protein trithorax. Both of these enzymes play a key role in the epigenetic regulation of gene expression during development,9 and the gene coding for MLL is frequently rearranged in infant and secondary therapy-related acute leukemias.10 FYRN and FYRC motifs are also found in transforming growth factor beta regulator 1 (TBRG1), a growth inhibitory protein induced in cells undergoing arrest in response to DNA damage and transforming growth factor (TGF)-β1.11 As TBRG1 has been shown to bind to both the tumor suppressor p14ARF and MDM2, a key regulator of p53, it is also known as nuclear interactor of ARF and MDM2 (NIAM).12 In most proteins, the FYRN and FYRC regions are closely juxtaposed. In MLL and its homologues, however, they are far apart, and it is not clear whether each motif corresponds to an independent protein module or whether both motifs are components of a single domain. To be fully active, MLL must be proteolytically processed by taspase1, which cleaves the protein between the FYRN and FYRC regions.13,14 The N-terminal and C-terminal fragments remain associated after proteolysis, apparently as a result of an interaction between the FYRN and FYRC regions. How proteolytic processing regulates the activity of MLL is not known.
Intriguingly, the FYRN and FYRC motifs of a second family of histone H3K4 methyltransferases, represented by MLL2 and MLL4 in humans and TRR in Drosophila melanogaster, are closely juxtaposed.
Although FYRN and FYRC motifs are found in proteins that play important roles in the regulation of cell growth and differentiation, little is known about their structure and function. In this study, we report the crystal structure of a fragment of TBRG1 that contains these motifs and show that each forms part of a single protein module, which adopts a novel α + β fold. The structure provides insights into how these motifs mediate the heterodimerization of the products of the proteolytic processing of MLL and allows the identification of potential ligand binding sites.
Attempts were made to express and purify the FYRN/FYRC regions of several of the proteins in which they are closely spaced. A fragment of TBRG1 (residues 179–324) proved the most amenable to study, as it could be easily expressed in E. coli, was simple to purify, and readily crystallized. Chemical denaturation of this protein by GdmCl shows a cooperative unfolding profile typical of a two-state transition when monitored both by intrinsic tryptophan fluorescence and by circular dichroism [Fig. 2(A)]. Thermal denaturation of the protein monitored by differential scanning calorimetry (DSC) also revealed a single unfolding transition with a midpoint of 41.8°C [Fig. 2(B)]. These data indicate that in TBRG1 the FYRN and FYRC regions are not separate, independently folded domains but are components of a distinct protein module. Crystals of the protein were grown by hanging-drop vapor diffusion, and its structure was determined to 1.6 Å resolution using multiple wavelength anomalous dispersion phasing and a selenomethionine-substituted derivative (Table I). The FYRN and FYRC motifs both form part of a single FYR domain that adopts an α + β fold consisting of a six-stranded antiparallel β-sheet followed by four consecutive α-helices [Fig. 1(B)]. The FYRN region corresponds to β-strands 1–4 and their connecting loops, whereas the FYRC motif maps to β-strand 5, β-strand 6, and helices α1 to α4. The two central β-strands are twisted and much longer than those at the periphery of the β-sheet (strands 1 and 6). Strands 2 and 3 are connected through the longest loop in the protein, which is stabilized by a network of hydrogen bonds. A short turn connects β-strand 6 to helix α1, and a longer loop places the short helix α2 in an L-shaped conformation with respect to α1. This motif is followed by two α-helices displaying a similar L-shaped conformation (α3-loop-α4). A search of the Protein Data Bank using DALI15 revealed no significant similarity to other structures.
A structure-based sequence alignment of selected FYRN/FYRC motifs is shown in Figure 3(A). Most of the conserved tyrosine and phenylalanine residues, after which these motifs are named, are involved in interactions that stabilize the fold. The side chains of Y216, Y234, F247, and F294, for example, all form part of the main hydrophobic core of the protein, which comprises residues in the helices, the sheet, and some of the loop regions [Fig. 3(B)]. The other residues that make up this core are also conserved or are replaced by other hydrophobic amino acids. Y317 and F322 are in an extended loop region at the C-terminus of the protein and are involved in hydrophobic interactions that help to position the C-terminus of the domain close to the N-terminus, a feature often seen in protein modules [Fig. 3(B)]. The side chains of most of the conserved tyrosines form hydrogen bonds that stabilize the fold. The hydroxyl of Y234 in strand 4 hydrogen bonds to the carbonyl of A223 and the amide of M225, which are in the turn between strands 3 and 4. Other residues are involved in long-range hydrogen bonds that link the two motifs. The hydroxyl of Y222, which is at the end of β-strand 3, for example, hydrogen bonds to an imidazole nitrogen of H268 in α1 [Fig. 3(D)]. Although H268 is replaced by a tryptophan in most other FYRC motif-containing proteins, it is likely that similar interactions are formed with the indole nitrogen of this amino acid. The hydroxyl of Y216 forms a hydrogen bond with the carbonyl group of F294, which is at the C-terminus of helix α2. Interestingly, the carbonyl groups of the two preceding residues in this helix form hydrogen bonds with the side chain of a highly conserved arginine (R220) in β-strand 3, suggesting that the correct positioning of this helix is critical to the stability of the fold [Fig. 3(D)]. Analysis of the conservation of solvent-exposed residues using the ConSurf server16 reveals two potentially functionally important regions [Fig. 3(C,D)].
One of these consists of a pocket on the protein surface lined by Y212 and F247, which also contains a highly conserved serine (S263). The second group of conserved solvent-exposed residues includes Y222 and R220. Two tryptophan residues, corresponding to A223 and H268 in TBRG1, that are conserved in FYRN/FYRC motifs from chromatin-associated proteins are also predicted to be located in this region of the protein, which would result in a different shape complementarity of this surface patch and hence potentially different molecular recognition features of the FYR domain.
Based on the degree to which structurally important residues are conserved, in particular those that participate in long-range interactions that link the FYRN/FYRC regions, it is likely that in other proteins these motifs adopt a similar fold to that seen in TBRG1. For proteins such as MLL, in which the FYRN/FYRC motifs are separated by hundreds of amino acids, the structure is expected to contain a large insertion between strands 4 and 5 of the sheet. This would explain why, when MLL is cleaved between the FYRN/FYRC motifs, the products of the reaction remain associated: during this process the protein is essentially cut within an exposed loop of a folded domain. Why the FYRN/FYRC motifs of MLL and related proteins, but not those of the MLL2/MLL4 family of histone methyltransferases, have a protease-cleavable insertion between them is not clear. Proteolytic processing is required for the correct nuclear sublocalization of MLL, although the mechanism by which this is achieved has not been determined. It seems unlikely that the cleavage of this loop would directly affect the structure or function of the fold. It has been suggested that in the uncleaved form of MLL the FYRN and FYRC regions are held in a conformation that prevents them from associating and that proteolysis allows them to come together to form a functional unit.13 This hypothesis is consistent with the structural data presented here, as both the FYRN and FYRC regions are integral to the fold and the potential binding sites identified from sequence conservation are composed of residues from both sides of the cleavage site. It is, however, possible that the protease targets a regulatory module located within the insertion between the FYRN and FYRC regions and that the role of these regions in proteolytic processing is only to ensure that the resulting fragments do not dissociate.
It will only be possible to distinguish between these possibilities once the function of these motifs in MLL and related proteins has been established.
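The domain organization described above can be summarized as a small data structure. The following Python sketch is illustrative only: the element names and the helper `motif_boundary` are hypothetical labels introduced here, and the element-to-motif assignment simply restates the description in the text (FYRN: β-strands 1–4; FYRC: β-strands 5 and 6 and helices α1 to α4). It locates the point at which the motif label changes, which is where the large insertion in MLL-like proteins is expected to lie.

```python
# Secondary-structure elements of the TBRG1 FYR domain in sequence order,
# each paired with the motif it belongs to (as described in the text).
FYR_TOPOLOGY = [
    ("beta1", "FYRN"), ("beta2", "FYRN"), ("beta3", "FYRN"), ("beta4", "FYRN"),
    ("beta5", "FYRC"), ("beta6", "FYRC"),
    ("alpha1", "FYRC"), ("alpha2", "FYRC"), ("alpha3", "FYRC"), ("alpha4", "FYRC"),
]


def motif_boundary(topology):
    """Return the two consecutive elements between which the motif changes.

    Hypothetical helper: returns None if all elements share one motif.
    """
    for (elem_a, motif_a), (elem_b, motif_b) in zip(topology, topology[1:]):
        if motif_a != motif_b:
            return elem_a, elem_b
    return None


def fragments_after_cleavage(topology):
    """Group element names by motif, mimicking cleavage within the insertion."""
    fragments = {}
    for element, motif in topology:
        fragments.setdefault(motif, []).append(element)
    return fragments


if __name__ == "__main__":
    print(motif_boundary(FYR_TOPOLOGY))
    print(fragments_after_cleavage(FYR_TOPOLOGY))
```

In this representation, cleavage within the insertion leaves both fragments contributing strands to the same β-sheet, which is consistent with the observation that the products remain associated.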
FYRN and FYRC motifs are found in association with modules that create or recognize histone modifications in proteins from a wide range of eukaryotes, and it is likely that in these proteins they have a conserved role related to some aspect of chromatin biology. Based on the structural data reported here, a binding rather than a catalytic function seems more likely. The nature of a potential ligand, however, remains unclear, as the FYR domain does not resemble any of the protein modules known to recognize potential binding partners such as modified histones. TBRG1-like proteins, in contrast, are restricted to metazoans and appear to have evolved in parallel with the p53/MDM2 pathway.17,18 As discussed above, the sequences of TBRG1-like FYRN and FYRC motifs are distinct from those in other proteins. Although TBRG1 is located in the nucleus, it has not so far been shown to be associated with chromatin. While it is possible that in TBRG1 these motifs have a similar role to those in other proteins and help to link the antiproliferative pathways in which it is involved with chromatin, it is also possible that they may have acquired an alternative function. Even though FYRN and FYRC motifs are found in an array of biologically and medically important proteins, there is still, clearly, much to be learnt about their function, and the data reported here will provide a structural basis for these efforts.
The DNA coding for residues 179–324 of human TBRG1 was amplified by PCR from a human cDNA library (Clontech) and cloned into a modified pRSETA (Invitrogen) expression vector containing a hexahistidine-tagged lipoyl domain and a TEV cleavage site. Protein was expressed in E. coli C41(DE3) and purified by Ni-affinity chromatography followed by TEV protease cleavage. This was followed by a second Ni-affinity chromatography step to remove the cleaved fusion partner, and then by gel filtration.
CD measurements were performed using a Jasco J-815 spectropolarimeter. FYR domain (20 mM) was incubated with different concentrations of guanidinium hydrochloride (GdmCl) for 4 h in 20 mM Tris pH 7.5, 150 mM NaCl, and 1 mM DTT, and the amplitude of the negative band at 222 nm was followed as a function of denaturant concentration. Fluorescence measurements were recorded on a Cary Eclipse fluorescence spectrophotometer equipped with a Hamilton Microlab titrator controlled by laboratory software. The excitation and emission wavelengths were 296 and 330 nm, respectively. Slit widths for both excitation and emission were kept at 7 nm, with a photomultiplier voltage of 600 V. Samples (1.4 mL) containing 5 μM FYR domain in 20 mM Tris pH 7.5, 150 mM NaCl, and 1 mM DTT were thermostatted at 25 ± 0.1°C with constant stirring. Refolding was carried out by injecting 30 μL into an equal volume of 6 M GdmCl solution containing an equimolar concentration of FYR domain. The time between injections was 6 min.
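Denaturation curves of this kind are commonly interpreted with a two-state model in which the unfolding free energy depends linearly on denaturant concentration. The text does not state which model was used for analysis, so the Python sketch below is an assumption offered for illustration. The function names, the midpoint `d50_M`, and the `m_value` are hypothetical placeholders and are not values reported here. Only the temperature (25°C) is taken from the text.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def fraction_unfolded(denaturant_M, d50_M, m_value, temp_K=298.15):
    """Fraction of unfolded protein for a two-state N <-> U transition.

    Assumes the linear extrapolation model:
        dG_unfold([D]) = m * (D50 - [D])
    with m in kJ mol^-1 M^-1 and D50 the denaturation midpoint in M.
    """
    delta_g = m_value * (d50_M - denaturant_M)  # kJ/mol, positive favors native
    return 1.0 / (1.0 + math.exp(delta_g / (R * temp_K)))


def observed_signal(denaturant_M, d50_M, m_value, native, unfolded, temp_K=298.15):
    """Population-weighted signal (e.g. CD at 222 nm or fluorescence at 330 nm).

    Baselines are treated as flat here, which is a simplification.
    """
    f_u = fraction_unfolded(denaturant_M, d50_M, m_value, temp_K)
    return native + (unfolded - native) * f_u


if __name__ == "__main__":
    # Placeholder parameters chosen only to show the sigmoidal shape.
    for conc in (0.0, 1.0, 2.0, 3.0, 4.0):
        print(conc, round(fraction_unfolded(conc, d50_M=2.0, m_value=10.0), 3))
```

When CD and fluorescence data yield the same midpoint and m-value under such a model, this is usually taken as support for a single cooperative transition.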
Calorimetric measurements were performed using a VP-DSC microcalorimeter (MicroCal) equipped with an AutoSampler. Scans were obtained at a protein concentration of 180 μM in buffer containing 50 mM potassium phosphate pH 7.2, 150 mM NaCl, and 1 mM DTT. A scan rate of 50°C/h was used over a temperature range from 10 to 110°C. The reversibility of the transition was checked by cooling and reheating the same sample. Results from the DSC measurements were analyzed with the MicroCal Origin™ program.
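For orientation, the shape of a single DSC transition can be sketched with a simple two-state van't Hoff model. This is not the analysis performed in the MicroCal Origin program. It is a minimal Python illustration that assumes no heat capacity difference between the native and unfolded states. The midpoint of 41.8°C is taken from the text, whereas the unfolding enthalpy `delta_h` of 300 kJ/mol is a hypothetical placeholder.

```python
import math

R = 8.314462618e-3      # gas constant in kJ mol^-1 K^-1
TM_K = 41.8 + 273.15    # transition midpoint from the text, in kelvin


def fraction_unfolded(temp_K, delta_h, tm_K=TM_K):
    """Two-state fraction unfolded from the van't Hoff relation (delta Cp = 0)."""
    k_unfold = math.exp(-(delta_h / R) * (1.0 / temp_K - 1.0 / tm_K))
    return k_unfold / (1.0 + k_unfold)


def excess_heat_capacity(temp_K, delta_h, tm_K=TM_K):
    """Excess molar heat capacity in kJ mol^-1 K^-1.

    Cp_ex = delta_h * d(f_U)/dT = delta_h^2 / (R T^2) * f_U * (1 - f_U)
    """
    f_u = fraction_unfolded(temp_K, delta_h, tm_K)
    return (delta_h ** 2 / (R * temp_K ** 2)) * f_u * (1.0 - f_u)


if __name__ == "__main__":
    delta_h = 300.0  # kJ/mol, placeholder value, not a measured quantity
    # Scan window from the text: 10 to 110 degrees C, sampled every 0.1 degrees.
    temps_c = [10.0 + 0.1 * i for i in range(1001)]
    peak_c = max(temps_c, key=lambda t: excess_heat_capacity(t + 273.15, delta_h))
    print("simulated peak position (degrees C):", round(peak_c, 1))
    print("scan duration at 50 degrees C per hour (h):", (110.0 - 10.0) / 50.0)
```

In this model, the heat capacity maximum lies very close to the midpoint temperature, and a single symmetric peak is what would be expected for one cooperatively unfolding module.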
The FYR domain was crystallized at 3.4 mg/mL by sitting-drop vapor diffusion in 1.0 M K/Na tartrate and 100 mM HEPES, pH 7.5 at 17°C. Crystals used for phasing were grown using selenomethionine-substituted FYR domain at the same concentration and conditions as for the native protein. The crystals were flash-frozen in liquid nitrogen after the addition of glycerol to 20%. The FYR domain crystals belonged to the space group P3121 with one molecule in the asymmetric unit. Cell constants and crystallographic data are summarized in Table I. Datasets were collected at beamline I02 of the Diamond Light Source (native and selenomethionine). All data were indexed and integrated with MOSFLM and further processed using the CCP4 package.19 Initially, five selenomethionine sites were found with SHELXD20 using the peak dataset (Table I). The coordinates of these sites were imported into SHARP21 and phases were calculated using the inflection and remote datasets. The model was built manually with the program MAIN and refined using CNS22 and PHENIX.23 The coordinates and structure factors have been deposited in the Protein Data Bank (http://www.pdb.org) with the accession code 2wzo.
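Readers who download the deposited coordinates may wish to check interactions such as the hydrogen bonds described for the conserved tyrosines. The Python sketch below is a minimal illustration under stated assumptions. It reads the fixed-column ATOM/HETATM records of the legacy PDB file format and measures the distance between two atoms. The function names are hypothetical, and the 3.5 Å donor-acceptor cut-off is a commonly used heuristic rather than a criterion taken from the text.

```python
import math


def parse_atom_record(line):
    """Parse one ATOM/HETATM line of a legacy PDB file using fixed columns.

    Returns None for any other record type.
    """
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    return {
        "name": line[12:16].strip(),      # atom name, columns 13-16
        "res_name": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],                # chain identifier, column 22
        "res_seq": int(line[22:26]),      # residue number, columns 23-26
        "xyz": (
            float(line[30:38]),           # x, columns 31-38
            float(line[38:46]),           # y, columns 39-46
            float(line[46:54]),           # z, columns 47-54
        ),
    }


def distance(atom_a, atom_b):
    """Euclidean distance in angstroms between two parsed atoms."""
    return math.dist(atom_a["xyz"], atom_b["xyz"])


def within_hbond_distance(atom_a, atom_b, cutoff=3.5):
    """Heuristic distance check only (assumed cut-off; geometry is ignored)."""
    return distance(atom_a, atom_b) <= cutoff


def find_atom(atoms, res_seq, name):
    """Return the first atom matching a residue number and atom name, or None."""
    for atom in atoms:
        if atom["res_seq"] == res_seq and atom["name"] == name:
            return atom
    return None
```

With a local copy of the coordinate file, the atoms can be collected with `[a for a in map(parse_atom_record, open(path)) if a]`, where `path` is the user's own file location, and pairs such as the Y222 hydroxyl oxygen and a side-chain nitrogen of H268 can then be passed to `distance`.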
The authors thank H. Tidow and A. Andreeva for careful reading of this manuscript.