Maf transcription factors constitute a family of the basic region-leucine zipper (bZip) factors and recognize unusually long DNA motifs (13 or 14 bp), termed the Maf recognition element (MARE). The MARE harbors extended GC sequences on each side of its core motif, which is similar to TRE or CRE (7 or 8 bp) recognized by the AP1 and CREB/ATF families, respectively. To ascertain the structural basis governing the acquirement of such unique DNA recognition, we determined the crystal structure of the MafG-DNA complex. Each MafG monomer consists of three helices in which the carboxyl-terminal long helix organizes one DNA-contacting element and one coiled-coil dimer formation element. To our surprise, two well-conserved residues, Arg57 and Asn61 in the basic region, play critical roles in Maf-specific DNA recognition. These two residues show unique side-chain orientations and interact directly with the extended GC bases. Maf-specific residues in the amino-terminal and basic regions appear to indirectly stabilize MARE recognition through DNA backbone phosphate interactions. This study revealed an alternative DNA recognition mechanism of the bZip factors that bestows specific target gene profiles upon Maf homodimers or Maf-containing heterodimers.
Since the discovery of the avian v-Maf oncogene (31), a number of v-Maf-related cellular proteins have been identified. The Maf family of transcription factors has been found to harbor a unique and highly conserved basic region-leucine zipper (bZip) structure (27). The dimeric Maf factors recognize a palindromic sequence referred to as the Maf recognition element (MARE), of which there are two types: a 13-bp T-MARE (TGCTGAG/CTCAGCA) and a 14-bp C-MARE (TGCTGAGC/CGTCAGCA) (16, 17). A 7-bp TPA (12-o-tetradecanoylphorbol 13-acetate)-responsive element (TRE) (TGAG/CTCA) and an 8-bp cyclic AMP-responsive element (CRE) (TGAGC/CGTCA) constitute the core sequence of the T-MARE and the C-MARE, respectively. It has been shown that the TRE is recognized by the transcription factor AP1 (Jun-Fos heterodimer), while the CRE is engaged by CRE binding protein (CREB/ATF). These cores of the MARE are flanked on each side by extended GC elements (underlined), presenting the MARE as a 5′-TGC-(TRE or CRE)-GCA-3′ structure. This long recognition sequence is unique to the Maf protein family, and it is this DNA binding specificity that contributes to the important functions of the Maf proteins (28). Indeed, target genes activated by Maf homodimers or Maf-containing heterodimers are significantly different from those activated by AP1 or CREB proteins.
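The 5′-TGC-(TRE or CRE)-GCA-3′ architecture described above can be illustrated with a short sketch. The helper names below are hypothetical and are not part of this study; the cores are written with a central C (TRE) or central CG (CRE), which is one of the two forms permitted by the G/C notation used above.

```python
# Minimal sketch: build MARE sequences from their cores and test palindromy.
# Function names are illustrative assumptions, not an established API.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def build_mare(core: str) -> str:
    """Flank a TRE or CRE core with the extended GC elements."""
    return "TGC" + core + "GCA"

TRE = "TGACTCA"    # 7-bp core (central C form)
CRE = "TGACGTCA"   # 8-bp core (central CG form)

t_mare = build_mare(TRE)   # 13-bp T-MARE
c_mare = build_mare(CRE)   # 14-bp C-MARE

# Positions (0-based) at which a sequence differs from its reverse complement.
def asymmetric_positions(seq: str) -> list:
    rc = reverse_complement(seq)
    return [i for i, (a, b) in enumerate(zip(seq, rc)) if a != b]

print(t_mare, asymmetric_positions(t_mare))
print(c_mare, asymmetric_positions(c_mare))
```

The even-length C-MARE is a perfect palindrome, whereas the odd-length T-MARE can differ from its reverse complement only at the central base pair, which is why the center is written as G/C.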
The Maf proteins have been subdivided into two groups: the large Maf proteins and the small Maf proteins. Large Maf proteins include MafA, MafB, c-Maf, and Nrl, which possess a transactivation domain and activate transcription by forming a homodimer. On the contrary, small Maf proteins lack a canonical transactivation domain and include MafG, MafK, MafF, and MafT (42). While small Maf proteins form homodimers that repress transcription (26, 29), they also form heterodimers with either the cap'n'collar (CNC) protein family (NF-E2 p45, Nrf1, Nrf2, and Nrf3) or the Bach protein family (Bach1 and Bach2) (12). The heterodimers activate or repress transcription depending on the functional domains of the CNC and Bach proteins.
The critical contributions of Maf proteins in vivo have been elucidated by gene disruption experiments in mice (27). Target genes of Maf proteins have also been identified (14, 32, 37); for instance, c-Maf activates crystallin genes during lens development, and MafA activates insulin gene expression in pancreatic β-cells. An important observation is that Maf homodimers have a strict requirement for the extended GC sequence flanking each side of the TRE or CRE core of the MARE. In contrast, the NF-E2 binding site (TGCTGAG/CTCAT) and the antioxidant response element (ARE) (TGAG/CNNNGC) are composed of half a TRE and half a MARE. The NF-E2 binding site is an important cis-regulatory element for the expression of erythroid genes (30), while the ARE is important for the expression of antioxidant and detoxification enzyme genes (39). The critical contribution of the flanking GC sequence to the activities of these cis-acting elements revealed that small Maf proteins heterodimerize with NF-E2 p45 and Nrf2 to interact with the NF-E2 binding site and the ARE, respectively (11, 13). Small Maf proteins bind to the MARE half, and CNC and Bach proteins are positioned on the TRE half, indicating that CNC and Bach proteins possess similar binding specificities for AP1-type recognition. Thus, the importance of the flanking sequence of the MARE has been recognized, but the structural basis for the specific recognition of the MARE by Maf proteins has yet to be determined.
In this regard, at least two structural features of Maf proteins are known to be involved in their recognition of the MARE. One is the deviation of critical amino acid residues in the basic region from those highly conserved in other bZip proteins (4). In particular, a tyrosine in the basic region of Maf is a unique substitution of the critical alanine residue found in all the other bZip proteins that is essential for DNA recognition (5, 7, 18, 19). The other is the extended homology region (EHR) on the N-terminal side of the basic region (41). The EHR is highly conserved among Maf proteins and plays an important role in stabilizing the DNA binding of MafG (17). However, the role of the EHR for target DNA sequence specificity has not been elucidated yet because of the lack of Maf-DNA complex structure.
To explore the structural basis by which Maf proteins recognize the unique DNA sequence of the MARE, we performed X-ray crystallography of a MafG homodimer in complex with its cognate DNA. The complex structure including MafG polypeptides (covering the EHR, basic region, and leucine zipper region) and the MARE oligonucleotide duplex enabled us to estimate its unique DNA binding mode directly. We also compared the crystal structure of the MafG homodimer with the crystal structures of Fos-Jun and Skn1, both of which do not demand the GC flanking region for DNA binding, and found that a single α-helix of one monomer grips the half-site of the DNA in the major groove in all three cases. To our surprise, Maf-specific structural elements, including Tyr64 and Thr58 residues in the basic region and EHR en bloc, were not involved in the base recognition of the extended region. However, in sharp contrast to the case of Fos-Jun, invariant Asn61 in the basic region of MafG has a distinct side-chain orientation enabling the recognition of the extended MARE sequence, and another invariant residue, Arg57, contacts with the flanking guanine base. Thus, the Maf-specific elements appear to stabilize the unique side-chain orientations of conserved Asn61 and Arg57 toward the extended GC bases through indirect interactions involving DNA backbone phosphates.
The cDNA fragments encoding mouse MafG(1-123), MafG(21-123), and MafG(45-123) were amplified and digested to generate NdeI-BamHI cDNA fragments that were subcloned into the pET15b (Novagen) vector. Likewise, the expression vectors for the truncated mutants MafG(21-110) and MafG(21-117) were prepared. The resultant plasmids containing particular MafG fragments were transformed into BL21-CodonPlus(DE3)-RIL (Stratagene) cells for overexpression. Crude bacterial lysates were purified by Ni-nitrilotriacetic acid Superflow affinity chromatography (Qiagen). The partially purified proteins were cleaved with thrombin (Calbiochem) and further purified sequentially using Q-Sepharose Fast Flow (GE Healthcare) ion-exchange chromatography, Ni-nitrilotriacetic acid Superflow affinity chromatography, S2 (Bio-Rad) ion-exchange chromatography, HiLoad 16/60 Superdex 75-pg (GE Healthcare) size exclusion chromatography, and Benzamidine Sepharose 6B affinity chromatography. Approximately 25 mg of purified protein was obtained from 1 liter of culture. To express the selenomethionine (Se-Met)-substituted MafG protein, Escherichia coli methionine auxotrophic strain B834(DE3) (Novagen) was used.
The MafG-DNA complex was crystallized by the hanging-drop vapor diffusion method. The complex (0.3 mM in 20 mM HEPES [pH 7.0], 0.15 M sodium chloride, and 5 mM dithiothreitol) was mixed with an equal volume of buffer containing 0.1 M sodium acetate (pH 5.0), 40 mM magnesium chloride hexahydrate, 8% 2-methyl-2,4-pentanediol, and 4 mM Tris(2-carboxyethyl)phosphine at 25°C. The crystals were soaked in cryoprotecting mother liquor, picked up with a nylon CryoLoop, and immediately flash-frozen in a cold nitrogen stream before data collection. Diffraction data were collected with a microfocus X-ray source at beamline BL-17A of the Photon Factory (KEK, Tsukuba, Japan).
The structure of the MafG-DNA complex was solved using the multiwavelength anomalous diffraction method with a Se-Met MafG (Table 1). Diffraction data were processed with HKL2000 (34). Se-Met sites were located using SHELXD (40). Phases from these sites were calculated and refined by solvent flattening, as implemented in SHELXE (40). The ensuing electron density was of good quality, enabling the main chain to be clearly traced and most of the side chains of the proteins to be well fitted. Refinement against the 2.8-Å data set was completed by iterative cycles of model building with Xfit (25) and crystallographic refinement using torsion angle molecular dynamics with CNS, version 1.2 (2). The electron density was of high quality throughout the majority of the map. Figures were drawn by use of Molscript (20) and PyMol (3).
Electrophoretic mobility shift assays (EMSAs) were carried out as reported previously (22, 44). Briefly, purified proteins were mixed with probes and incubated in gel shift buffer [20 mM HEPES (pH 7.9), 20 mM KCl, 5 mM dithiothreitol, 4 mM MgCl2, 1 mM EDTA, 100 μg/ml bovine serum albumin, and 400 μg/ml poly(dI-dC)] at 37°C for 30 min. The resulting mixture was subjected to polyacrylamide gel electrophoresis and visualized by autoradiography. The homodimer binding activity was examined using 2 to 250 nM of the MafG(1-123), MafG(21-123), and MafG(45-123) proteins. The consensus T-MARE probe was generated by annealing the two oligonucleotides, 5′-TCGAGCTCGGAATTGCTGACTCAGCATTACTC-3′ and 5′-CGAGAGTAATGCTGAGTCAGCAATTCCGAGCT-3′ (underlined bases comprise 13-bp MARE or MARE-related sequences [BglHS4: GAGTGACTCAGCC], which are flanked by arbitrary sequences).
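As a sanity check on the probe design, the annealing of the two T-MARE oligonucleotides can be sketched as follows. This is only an illustration under the assumption that the sequences are exactly as printed above; the function name is hypothetical.

```python
# Minimal sketch: check how the two EMSA probe strands pair after annealing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

SENSE = "TCGAGCTCGGAATTGCTGACTCAGCATTACTC"
ANTISENSE = "CGAGAGTAATGCTGAGTCAGCAATTCCGAGCT"

def five_prime_overhang(top: str, bottom: str):
    """Smallest n such that top[n:] is fully complementary to bottom[n:].

    n is then the length of the unpaired 5' end on each strand.
    Returns None if no such register exists.
    """
    for n in range(min(len(top), len(bottom))):
        if reverse_complement(top[n:]) == bottom[n:]:
            return n
    return None

overhang = five_prime_overhang(SENSE, ANTISENSE)
paired_bp = len(SENSE) - overhang if overhang is not None else 0
print(overhang, paired_bp, "TGCTGACTCAGCA" in SENSE)
```

With the sequences as printed, the strands pair in a register that leaves a short 5′ overhang on each end, and the sense strand carries the consensus 13-bp T-MARE.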
The missing-phosphate assay was performed according to a protocol described previously (43). MafG(1-123) and p45ΔN were used at 0.3 μM and 0.1 to 3.0 μM, respectively. The parental double-stranded DNA probe was generated by annealing the sense strand 5′-ACGGAATGAGTGACTCAGCCTTACTCCATACGTTG-3′ (BglHS4-S) and the antisense strand 5′-TCAACGTATGGAGTAAGGCTGAGTCACTCATTCCG-3′ (BglHS4-AS). To generate the “nicked MARE 1” probe, BglHS4-S and the 5′ portion (5′-TCAACGTATGGAGTAAG-3′) of the complementary strand were annealed, and each 5′ end was labeled with [γ-32P]ATP. The 3′ portion (5′-GCTGAGTCACTCATTCCG-3′) of the complementary strand was then annealed. The “nicked MARE 2” probe was generated by annealing BglHS4-AS and the 5′ portion (5′-ACGGAATGAGTGACTCAGC-3′) of the complementary strand. Each 5′ end was labeled with [γ-32P]ATP before annealing the 3′ portion (5′-CTTACTCCATACGTTG-3′) of the complementary strand.
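The construction of the nicked probes can be checked with a short sketch: each nicked strand is supplied as a 5′ portion and a 3′ portion that should together reproduce the full-length strand. The dictionary layout and function name below are assumptions made for illustration; the sequences are those given above.

```python
# Minimal sketch: confirm that the 5' and 3' portions of each nicked strand
# reassemble into the full-length BglHS4 strand, and locate the nick.
BGLHS4_S = "ACGGAATGAGTGACTCAGCCTTACTCCATACGTTG"
BGLHS4_AS = "TCAACGTATGGAGTAAGGCTGAGTCACTCATTCCG"

NICKED_PROBES = {
    "nicked MARE 1": {
        "intact_strand": BGLHS4_S,
        "nicked_strand": BGLHS4_AS,
        "five_prime": "TCAACGTATGGAGTAAG",
        "three_prime": "GCTGAGTCACTCATTCCG",
    },
    "nicked MARE 2": {
        "intact_strand": BGLHS4_AS,
        "nicked_strand": BGLHS4_S,
        "five_prime": "ACGGAATGAGTGACTCAGC",
        "three_prime": "CTTACTCCATACGTTG",
    },
}

def nick_index(five_prime: str, three_prime: str, full_strand: str) -> int:
    """Return the 0-based index of the first base after the nick."""
    if five_prime + three_prime != full_strand:
        raise ValueError("portions do not reassemble into the full strand")
    return len(five_prime)

nicks = {
    name: nick_index(p["five_prime"], p["three_prime"], p["nicked_strand"])
    for name, p in NICKED_PROBES.items()
}
print(nicks)
```

In both probes the nick falls at the edge of the 13-bp BglHS4 MARE-related sequence (GAGTGACTCAGCC), so each probe interrogates one flanking phosphate.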
The atomic coordinates (accession number 3A5T) from this study have been deposited in the Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers University, New Brunswick, NJ (http://www.rcsb.org).
Of the various bZip transcription factors, only the Maf family members recognize the exceptionally long cis-acting DNA sequence of the MARE. A TRE or CRE core flanked on each side by GC sequences constitutes the MARE (Fig. 1A). No other member of the bZip family requires the GC flanking sequence adjoining the core (Fig. 1A). Previous reports (4, 17) suggested that at least two structural elements are responsible for the unique DNA recognition property of the Maf proteins. One is a tyrosine residue in the basic region (shown in pink in Fig. 1B and D), and the other is the EHR domain localized on the N-terminal side of the basic region (Fig. 1C and D), both of which are highly conserved in Maf proteins. Members of the CNC family contain a unique CNC domain in the position corresponding to the EHR, but the CNC domain is apparently diverged from the EHR domain.
Since full-length MafG was prone to degradation during purification, it was expressed in bacteria and purified as the three MafG truncations, MafG(1-123), MafG(21-123), and MafG(45-123), as shown in Fig. 1C. EMSAs with a MARE probe revealed that the deletion of the EHR [i.e., MafG(45-123)] reduced the DNA binding ability significantly. The deletion of 20 amino acids from the N terminus [MafG(21-123)] slightly weakened the DNA binding ability. Although the N terminus of MafG was considered to make an auxiliary contribution to DNA binding, MafG(21-123) retained substantial DNA binding ability. Therefore, MafG(21-123) containing the EHR and bZip domains was used for cocrystallization. A 15-bp MARE sequence termed MARE25 (GTGCTGACTCATCAG) (Fig. 2A) was chosen as a target DNA (16). The MARE25 sequence contains the NF-E2 binding site, which is critical for erythroid-specific gene regulation, and was previously shown to bind the Maf homodimer tightly (44). As shown in Fig. 2A, the central nucleotide of MARE25 was set as 0, and the other nucleotides were numbered accordingly. The rationale behind this choice was that since MARE25 contains one nucleotide deviation from the GC-(TRE core)-GC consensus sequence, we could assess the singular DNA recognition of each MafG monomer.
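The numbering scheme and the single deviation from the consensus can be made concrete with a small sketch. It assumes the strand of MARE25 as printed above and the central-C form of the 13-bp T-MARE consensus; the function names are hypothetical.

```python
# Minimal sketch: number MARE25 around its central nucleotide (position 0)
# and list where its 13-bp core departs from the T-MARE consensus.
MARE25 = "GTGCTGACTCATCAG"           # 15 bp, as printed
T_MARE_CONSENSUS = "TGCTGACTCAGCA"   # 13 bp, central-C form (assumption)

def number_positions(seq: str) -> dict:
    """Map positions -n..+n to bases, with the central base at 0."""
    center = len(seq) // 2
    return {i - center: base for i, base in enumerate(seq)}

def deviations(positions: dict, consensus: str) -> list:
    """Return (position, consensus base, observed base) for each mismatch."""
    half = len(consensus) // 2
    return [
        (pos, cons, positions[pos])
        for pos, cons in zip(range(-half, half + 1), consensus)
        if positions[pos] != cons
    ]

positions = number_positions(MARE25)
mismatches = deviations(positions, T_MARE_CONSENSUS)
print(positions[0], positions[-5], mismatches)
```

On the printed strand the only mismatch is at position +4, where the consensus G is replaced by T. Read on the complementary strand, this turns one GC flank into a GA flank, which is the asymmetry that distinguishes the two half-sites.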
The orthorhombic crystals diffracted anisotropically up to 2.6 Å using synchrotron X-ray radiation. The asymmetric unit included two MafG fragments and one double-stranded 15-bp DNA. The structure was solved by Se-Met multiple-wavelength anomalous diffraction and was refined at a resolution of 2.8 Å (Table 1). Residues 113 to 123 lacked electron density, suggesting that this region is disordered. To improve the quality of the crystal, we also tried crystallization with C-terminally truncated MafG(21-117) and MafG(21-110), but those attempts were unsuccessful.
Since MARE25 is asymmetric, each MafG monomer in the dimeric MafG-DNA complex recognized a slightly different DNA sequence but displayed a similar overall structure (Fig. 2A and B). The asymmetry of the DNA sequence used in the crystallization allowed a distinction to be made between the two MafG monomers. Subunit A was in contact with the GA flanking region, and subunit B was in contact with the GC flanking region. The structural differences between subunits A and B of MafG appear to be due simply to the different crystal packing. Our analysis revealed the presence of three α-helices, H1 (residues 26 to 30), H2 (residues 35 to 39), and H3 (residues 46 to 99), in MafG. The Maf-specific structural element EHR consists of H1, H2, and the N-terminal region of H3. This observation agrees well with the previously reported solution structure of monomeric MafG(1-76) (21). H3 (residues 46 to 99) corresponds to the bZip structure. H1 and H2 within the EHR interact with the basic region (residues 46 to 62 in H3) through both hydrophobic and hydrophilic interactions. Like GCN4 and Fos-Jun, the MafG homodimer embraces the major groove of the DNA via its two long H3 helices akin to a pair of chopsticks (Fig. 2A and B). The DNA in the complex is in a straight B form and has an average helical rise and twist of 3.2 Å and 34.6°, respectively.
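The reported average helical parameters can be converted into more familiar quantities with a few lines of arithmetic. This is a back-of-the-envelope sketch; the variable names are illustrative, and the end-to-end length simply assumes 14 base-pair steps in a straight 15-bp duplex.

```python
# Minimal sketch: derive base pairs per turn, helical pitch, and duplex
# length from the average rise and twist reported for the bound DNA.
RISE_ANGSTROM = 3.2   # average helical rise per base-pair step
TWIST_DEGREES = 34.6  # average helical twist per base-pair step
N_BASE_PAIRS = 15     # length of the MARE25 duplex

bp_per_turn = 360.0 / TWIST_DEGREES
pitch_angstrom = bp_per_turn * RISE_ANGSTROM
duplex_length_angstrom = (N_BASE_PAIRS - 1) * RISE_ANGSTROM

print(f"{bp_per_turn:.1f} bp/turn, pitch {pitch_angstrom:.1f} A, "
      f"length {duplex_length_angstrom:.1f} A")
```

A value near 10.4 bp per turn is close to the roughly 10.5 bp per turn usually quoted for B-form DNA, consistent with the description of the DNA as straight B form.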
We compared three structures: the MafG-DNA complex, the AP1-type bZip transcription factor (heterodimeric Fos-Jun) in complex with the TRE, and the CNC-type bZip transcription factor (monomeric Skn1) in complex with the TRE half-site. This comparison revealed that the basic region of MafG shows practically the same structure as the basic region of the Fos-Jun dimer (Fig. 2C). While Skn1 has an additional H0 helix on the N-terminal side, its H1 and H2 are essentially similar to H1 and H2 of the MafG EHR (Fig. 2D). Thus, in spite of the marked difference in the recognition sequence, MafG displays a main-chain structure similar to those of the factors possessing AP1-type and CNC-type binding specificities.
Closer examination of the leucine zipper region of MafG shows that the interhelical contacts are mediated by leucine residues in the repeated heptad motif, as was described for other bZip proteins. The homodimer is stabilized by a large hydrophobic interface involving Leu79, Leu86, and Val90 (Fig. 3A). Figure 3B shows a helical-wheel representation of the MafG homodimer and indicates interhelical electrostatic interactions. The MafG coiled coil reveals a few unusual residues (Lys76, Lys83, and Asn97) at the “a” positions in the heptad repeat (Fig. 1D). The side chain of each Lys76 hydrogen bonds with Gln75 at the “g” position of the other monomer via a water molecule (Fig. 3A and B). Lys83 at the “a” position shows intersubunit electrostatic interactions with Gln82 at the “g” position, while the Asn97 residues at the “a” position also link the subunits by hydrogen bonding (Fig. 3B). The two pairs of electrostatic interactions (Gln82-Lys83 and Gln75-Lys76) appear to be conserved or replaced by similar interactions (Gln-Arg, Glu-Lys, or Glu-Arg) in all Maf protein family members (Fig. 1D). The intersubunit Asn97 interaction is specific to small Maf proteins, since the corresponding residue is replaced with Ile or Val for large Maf proteins. This may, at least in part, account for the fact that small Maf proteins cannot form heterodimers with large Maf proteins (15). Interestingly, large Maf can form a heterodimer with Fos or Jun (17). In contrast, small Maf can interact with Fos but not with Jun (15). Interhelical electrostatic contacts of the Fos-Jun heterodimer (Fig. 3C) appear to be distinct from those of the MafG homodimer (Fig. 3B), and conserved interhelical interactions were not observed. Comparisons between the MafG homodimer (Fig. 3B) and the Fos-Jun heterodimer (Fig. 3C) suggest that additional interhelical contacts that determine the dimerization specificity should be formed upon the heterodimerization of MafG with Fos.
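The heptad register implied by these assignments can be sketched as follows. The only inputs taken from the text are that Lys76 occupies an “a” position and that the repeat has a period of seven; the function name is hypothetical, and the positions it returns for Leu79, Leu86, and Val90 are derived from that register rather than quoted from the figures.

```python
# Minimal sketch: assign heptad positions (a-g) from one anchored residue.
HEPTAD = "abcdefg"

def heptad_position(residue: int, anchor_residue: int = 76,
                    anchor_letter: str = "a") -> str:
    """Return the heptad letter of `residue`, given one anchored residue."""
    shift = (residue - anchor_residue + HEPTAD.index(anchor_letter)) % 7
    return HEPTAD[shift]

for name, number in [("Gln75", 75), ("Lys76", 76), ("Leu79", 79),
                     ("Gln82", 82), ("Lys83", 83), ("Leu86", 86),
                     ("Val90", 90), ("Asn97", 97)]:
    print(name, heptad_position(number))
```

Under this register the glutamines fall at “g” positions immediately preceding the lysines at “a”, and the two leucines fall at “d” positions, as expected for a leucine zipper.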
The recognition of the MARE by Maf consists of a broad specificity against the TRE/CRE core element but a strict requirement of the flanking region (17, 44). Each monomer of the dimeric MafG-DNA complex interacts with the major groove containing a TRE half-site (Fig. 4A and B) and the flanking regions (Fig. 4A and B). The invariant Asn61 is located deep within the major groove of the recognition site (Fig. 4A and B), and its electron density in the simulated-annealing Fo-Fc omit map for each subunit was unambiguously visible (Fig. 4C and D). We found that the orientation of the Asn61 side chain of MafG is distinct from the side-chain orientation of other bZip factors such as Fos and Skn1 (Fig. 4E and F). While the side chains of the invariant Asn residues in Fos and Skn1 orientate toward the TRE core, the side chain of MafG Asn61 bears toward the flanking regions. Arg57 is another residue that is well conserved in the bZip factors, but irrespective of its conservation among bZip factors, Arg57 is involved in DNA recognition only in Maf proteins. Our results indicate that Arg57 plays a critical role in the recognition of the extended DNA sequence. Indeed, the simulated-annealing Fo-Fc omit map revealed an unambiguous side-chain orientation of Arg57 in each subunit (Fig. 4C and D).
All contacts between the MafG homodimer and the MARE25 DNA duplex are summarized in Fig. 5A. In very good agreement with the case for AP1-type bZip factors, conserved Ala65 residues of MafG create van der Waals contacts with the T base at positions +1 and −3. Arg69 of subunit B forms a hydrogen bond with the G base at position 0 through a water molecule. A series of salt bridges is formed between Arg62 and the phosphate backbone of thymine at position +1, between Arg69 and the phosphate backbone of adenine at position −1, and between Lys71 and the phosphate oxygen of thymine at position −3. Arg56 of subunit B interacts with the phosphate backbone of thymine at position −7.
The DNA base recognition modes of the MafG monomers and Skn1 are depicted in Fig. 5B to F, concomitant with those of Pap1 and C/EBP. The conserved amino acid side chains of the protein motifs that make direct contacts with the DNA bases of the consensus DNA sequence as well as hydrogen bonds and van der Waals contacts are shown.
Asn61 of MafG subunit A forms a direct hydrogen bond with adenine (position −4) in the GA flanking sequence (Fig. 5B) and an indirect hydrogen bond with cytosine (position +2) of the opposite sequence (Fig. 5A) in the core region of MARE25. Asn61 of MafG subunit B makes van der Waals contacts with cytosine (position −4) in the GC flanking region of MARE25 (Fig. 5C). Thus, MafG Asn61 recognizes mainly the flanking region, although it forms weak indirect hydrogen bonding with the TRE core region. In stark contrast to MafG, the invariant Asn511 of Skn1 interacted only with cytosine (position +2) in the TRE region (Fig. 5D). AP1-type transcription factors such as Fos, Jun, and GCN4 show basically the same DNA recognition mode as Skn1 (7, 9, 39). In the Schizosaccharomyces pombe AP1-type factor Pap1, the corresponding Asn residue also recognizes the TRE core region (Fig. 5E) (6).
Arg57 in MafG subunit A forms critical hydrogen bonds with the O6 of guanine at position −5 and the O4 of thymine at position +4 (Fig. 5A and B), while Arg57 in MafG subunit B forms hydrogen bonds to the O6 of guanine at position −5 and the O6 of guanine at position +4 (Fig. 5A and C). It should be noted that Arg57 of MafG participates in DNA base recognition by changing its side-chain orientation; Arg57 in MafG shows a distinct side-chain orientation compared to the corresponding arginine residues of AP1-type and CNC-type factors. The residues corresponding to Arg57 of MafG are Arg507 in Skn1 and Arg143 in Fos. The side chain of Arg507 in Skn1 interacts with a backbone phosphate of DNA (see Fig. 7B), and that of Arg143 in Fos is exposed to solvent (7); neither side chain is involved in DNA base recognition.
The use of Arg57 for DNA recognition gives the Maf subfamily a unique and exceptionally long consensus protein sequence for DNA base recognition (13 residues, RxxxNxxYAxxCR) (Fig. 5B and C). In contrast, those of AP1-type/CNC-type factors (NxxAAxxC/SR) and the yeast AP1-type factor Pap1 (NxxAQxxFR) contain a 9-residue sequence, and the arginine residue corresponding to Arg57 of MafG is not included in the interaction with DNA (Fig. 5D and E). Another characteristic protein sequence is RxxNxxAVxxSR for the C/EBP subfamily, where the arginine residue involved in DNA recognition is 1 amino acid closer to the asparagine residue than in MafG. The arginine residue of C/EBP at the position corresponding to Arg57 of MafG in the consensus sequence does not participate in DNA base recognition (Fig. 5F).
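The consensus sequences compared above can be handled programmatically, which makes their lengths and spacings explicit. In the sketch below, "x" denotes any residue and the C/S alternative is written as a bracketed class; the helper names and the toy peptides used for matching are hypothetical and are not real protein sequences.

```python
# Minimal sketch: parse the DNA-recognition motifs, convert them to regular
# expressions, and map the Maf motif onto MafG residue numbers from Arg57.
import re

MOTIFS = {
    "Maf": "RxxxNxxYAxxCR",
    "AP1/CNC": "NxxAAxx[CS]R",   # "C/S" in the text, written as a class
    "Pap1": "NxxAQxxFR",
    "C/EBP": "RxxNxxAVxxSR",
}

def tokens(motif: str) -> list:
    """Split a motif into one token per residue position."""
    return re.findall(r"\[[A-Z]+\]|[A-Za-z]", motif)

def to_regex(motif: str) -> re.Pattern:
    """Compile a motif, treating 'x' as a wildcard."""
    return re.compile("".join("." if t == "x" else t for t in tokens(motif)))

def fixed_positions(motif: str, first_residue: int) -> dict:
    """Map residue numbers to the specified (non-x) motif positions."""
    return {first_residue + i: t
            for i, t in enumerate(tokens(motif)) if t != "x"}

lengths = {name: len(tokens(m)) for name, m in MOTIFS.items()}
maf_numbering = fixed_positions(MOTIFS["Maf"], 57)
print(lengths)
print(maf_numbering)
```

Numbered from Arg57, the Maf motif places the asparagine at 61, the tyrosine at 64, the alanine at 65, and the C-terminal arginine at 69, matching the residues discussed in the text. The Arg-to-Asn spacing is four residues in the Maf motif and three in the C/EBP motif.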
Thus, we surmise that the distinct side-chain orientation of Arg57 and Asn61 explains why MafG requires the flanking region for binding and recognizes the TRE core with specificity relatively broader than that of AP1-type and CNC-type transcription factors.
Tyr64 in the basic region has been shown to be one of the key determinants of the DNA recognition specificity of the Maf protein family (4, 18). Whereas Tyr64 is conserved in all Maf proteins, its matching residue in the other bZip transcription factors is alanine. The electron density in the simulated-annealing Fo-Fc omit map clearly shows the side-chain orientation of Tyr64 in each subunit (Fig. 4C and D). Contrary to our expectation that this residue would be involved in the recognition of unique bases in the MARE, Tyr64 does not form any direct contacts with DNA bases. Instead, the hydroxyl group of Tyr64 interacts with the phosphate backbone of guanine at position −5 (Fig. 5A).
To examine the contribution of Tyr64 to the stabilization of MafG-DNA complex formation, we performed a missing-phosphate assay, which is a modified EMSA using a nicked probe lacking one of the backbone phosphates. To allow the unidirectional binding of MafG to DNA, we adopted an asymmetric MARE variant (BglHS4-MARE) possessing a flanking GC sequence on one side as a probe (44) and a p45-MafG heterodimer as a trans-acting factor requiring a single GC in the MARE flanking region (Fig. 6A). The p45-MafG heterodimer efficiently bound to the control probe without a nick (Fig. 6B). In contrast, the removal of the 5′-phosphate of guanine at position −5 (nicked MARE-1) dramatically reduced the binding of the heterodimer, while the removal of the 3′-phosphate of cytosine at position +5 (nicked MARE-2) did not affect the binding (Fig. 6B). This result indicates that the interaction between Tyr64 and the backbone phosphate of guanine at position −5 is critical for the high-affinity binding of MafG to the MARE.
The crystal structure shows that Tyr64 of subunit A forms an indirect hydrogen bond with the side chain of Arg57 via a water molecule in subunit A (Fig. 4C). Tyr64 appears to stabilize the side-chain orientation of Arg57 that is optimal for the recognition of the flanking guanine. The hydrogen-bonding network extends from Tyr64 to Asn61 through the indirect hydrogen bond formation between Arg57 and Asn61, and the side chain of Asn61 in turn flips out and interacts with the extended DNA base. Therefore, Tyr64 is vital for maintaining the side chains of both Arg57 and Asn61 in a spatial arrangement ideal for extended GC recognition and consequently stabilizes Maf-type DNA recognition by forming a hydrogen-bonding network with these side chains and extended DNA bases. Corresponding water molecules were not observed in subunit B in the current crystal structure (Fig. 4D), perhaps due to the higher B factor (Table 1) of the corresponding region.
Thr58 is a Maf-specific residue that is well conserved in all Maf proteins. Intriguingly, Thr58 is involved in an interaction that appears to be important for establishing Maf-type DNA recognition. Thr58 makes a van der Waals contact with the backbone phosphate of cytosine at position +2 and forms a concurrent hydrogen bond with Gln54 (Fig. 7A). Gln54 is also well conserved in all Maf proteins and appears to stabilize the Thr58-phosphate interaction by fixing the side-chain orientation of Thr58. In the Skn1-DNA complex, Arg507, which corresponds to Arg57 of MafG, forms a hydrogen bond to the analogous backbone phosphate of cytosine at position +2 (38). Arg507 also makes a hydrogen-bonding network with the invariant Asn511 via a water molecule and a DNA backbone phosphate (Fig. 7B). This hydrogen-bonding network seems to lock the side chain of invariant Asn511 into a position that is most favorable for CNC-type DNA recognition (Fig. 4E and F). It is remarkable that the well-conserved arginine and asparagine residues of MafG (Arg57 and Asn61) and Skn1 (Arg507 and Asn511) form hydrogen-bonding networks in distinct manners, using different DNA backbone phosphates as scaffolds: that of guanine at position −5 in the case of MafG and that of cytosine at position +2 in the case of Skn1 (Fig. 5A and 7B). The Thr58-phosphate interaction in the MafG-DNA complex appears to prevent Arg57 from forming a hydrogen bond with the backbone phosphate of cytosine at position +2, which would otherwise be optimal for CNC-type DNA recognition. Thus, Thr58 stabilizes the alternative side-chain orientation of Arg57, which is well suited for extended GC base recognition.
The Maf-specific EHR element is located on the N-terminal side of the basic region. As shown in Fig. 8A, H1 and H2 of the EHR interact with the basic region in H3. Main-chain oxygen atoms of Leu29 and Met32 interact with the side chain of Arg56. These interactions appear to stabilize the interaction of Arg56 with the DNA backbone phosphate. The side chain of Asp26 in the EHR domain forms electrostatic interactions with Arg55 and with Arg62 via a water molecule, which secures the hydrogen bonds between Arg62 and the DNA backbone phosphate.
In a primary amino acid sequence alignment, the EHR is found to be a Maf-specific element. However, where the tertiary structure is concerned, a similarity between the EHR and the CNC domain of Skn1 was previously suggested (21). Our present study revealed similar structural features between the MafG EHR and Skn1 CNC domain. Comparison of the EHR with the Skn1 CNC domain is depicted in Fig. 2D. We also compared the interaction between the EHR and the basic region of MafG with the interaction between the CNC domain and the basic region of Skn1 (Fig. 8A and B, respectively). Ile477 and Met480 in the Skn1 CNC domain interact with Arg506 (Fig. 8B), which resembles the interaction found between the EHR (Leu29 and Met32) and Arg56 in MafG (Fig. 8A). The electrostatic interactions of Asp26 with Arg55 and Arg62 in MafG correspond to the hydrophobic contacts between Ala474 in the CNC domain and Ile505 in the basic region of Skn1. Although the overall structures and part of the interactions with the basic region resemble each other, the amino acids involved and each interaction among them are substantially different.
The truncation of the EHR significantly reduced the DNA binding capability of MafG in an EMSA experiment (Fig. 1C). DNA binding therefore appears to be stabilized by indirect contacts between the EHR and the DNA backbone phosphates. Considering the similarity of the interactions mediated by the MafG EHR and the Skn1 CNC domain, the latter likely also stabilizes DNA binding via interactions with the basic region. Taken together, these results suggest that the roles of the EHR and CNC domains are oriented more toward stabilizing DNA binding than toward conferring specificity to DNA recognition.
The current crystal structure analysis of a MafG homodimer in complex with a cis-acting MARE revealed a new mode of bZip protein-DNA recognition. This study stemmed from the observation that Maf proteins recognize an unusually long cis-acting sequence compared with the sequences bound by other bZip transcription factors, such as the AP1 and CREB/ATF families (27). Examination of the crystal structure revealed that Maf proteins recognize mainly the extended GC (or GA) sequence that flanks the TRE core of the MARE owing to the unique side-chain orientations of Maf Arg57 and Asn61.
It was previously reported that the replacement of tyrosine residue Y287 in c-Maf with alanine does not affect c-Maf binding to the MARE with base substitutions in the extended GC elements, but the mutation inhibits c-Maf binding to the MARE with base substitutions within the TRE core region (4). We recently verified that the substitution of the conserved alanine of the CNC family factor Nrf2 with tyrosine converts the binding specificity of Nrf2 to that of Maf proteins (18). We originally interpreted this observation as reflecting direct contacts between Tyr64 of MafG (corresponding to Y287 of c-Maf) and the DNA bases in the extended binding region, such that an alanine-to-tyrosine substitution might create alternative base contacts deep within the major groove. However, the present study has revealed that Arg57 and Asn61 play critical roles in the recognition of the MARE. This was a surprising revelation, as these two residues are not specific to Maf family proteins.
The current data also revealed the indispensable role that Tyr64 plays in MARE recognition by MafG. Tyr64 of subunit A makes contact with the backbone phosphate of guanine at position −5 and concomitantly interacts with the side chain of Arg57 via a water molecule. The resulting side-chain orientation of Arg57 enables engagement with the extended GC and GA bases as well as indirect hydrogen bonding with the invariant Asn61, which consequently recognizes the extended region of the MARE. Thus, Maf proteins recognize and bind to the extended DNA sequence in the MARE by virtue of the unique side-chain orientations of these two residues.
We also found that the Maf-specific Thr58 residue contributes to Maf-type DNA recognition by associating with the DNA backbone phosphate in a manner distinct from that of Tyr64. The Tyr64-phosphate interaction with guanine at position −5 stabilizes the hydrogen bond network that is optimal for MARE recognition. On the other hand, the Thr58-phosphate interaction with cytosine at position +2 stabilizes MARE recognition by destabilizing the hydrogen-bonding network that is preferable for CNC-type recognition. Consequently, the Maf-specific structural elements in the basic region, i.e., Tyr64 and Thr58, stabilize the side-chain orientations that are optimal for MARE recognition. This indicates that changes in structural elements that do not interact directly with DNA bases, but that specify the side-chain orientations of amino acids that do, generate diversity in the binding sequences of bZip factors by altering the fine structure of the basic region.
A previously reported Pap1 structural analysis revealed a somewhat related strategy for generating diversity in the cis-acting DNA sequence (6). Pap1 target sequences are different from those recognized by AP1 or CREB/ATF factors. The glutamine and phenylalanine residues unique to the basic region of Pap1 (NxxAQxxFR) interact directly with the unique bases of the target DNA. The invariant Asn86 of Pap1 (the N in this motif), corresponding to Asn61 of MafG, contributes to the distinct specificity of Pap1 through its alternatively positioned side chain. While Pap1 appears to have adopted the MafG-related strategy, Asn86 of Pap1 recognizes adenine at position +3 in the core region but not the extended DNA sequence (Fig. 5E). We surmise that using the same conserved amino acids in a different manner, by taking advantage of distinct side-chain orientations, may represent a common strategy for transcription factors to create diversity in their target sequences during molecular evolution.
Our previous nuclear magnetic resonance study suggested structural similarity between the MafG EHR and the CNC domain (21), which is consistent with the observations made in this study. The binding of transcription factors to their cognate DNA sequences is stabilized by interactions between the DNA and key residues in the basic regions and the EHR or the CNC domain without those residues directly contacting the DNA. These results support the contention that although direct base contacts may be restricted to the basic region, other domains may determine how the basic region is exploited to accept the specific DNA sequence. This phenomenon is also the case for VBP, a member of the PAR subfamily of bZip factors (8).
Biological processes mediated by Maf homodimers or Maf-containing heterodimers depend heavily on the presence of the GC (or GA) sequences flanking the core TRE/CRE sequence. Consistent with the critical role of the extended sequence in cis, Maf-type recognition specificity in trans was previously shown to define a unique array of target genes (18). As described above, the Nrf2(A502Y) mutant, in which Ala502 (corresponding to MafG Tyr64) is substituted with tyrosine, acquires the Maf-type binding specificity. Since most of the cytoprotective genes are activated by Nrf2-MafG, but not by Nrf2(A502Y)-MafG, the difference between the Maf-type and CNC-type recognition specificities appears to be critical for gene regulation in vivo. Our present study sheds light on the molecular mechanisms underlying such differential and biologically significant DNA recognition.
Several missense mutations in the cMAF gene that cause congenital cataracts have been identified in humans and mice (9, 24). Mutations reported previously for the basic region of human c-Maf include R299S, which corresponds to Arg62 of MafG (9). The mutation would make c-Maf defective in DNA binding since the corresponding MafG Arg62 is involved in a stabilizing network of electrostatic interactions with Asp26 and Arg55. Similarly, R291Q (Arg291 corresponds to Arg57 of MafG) is a mutation in the basic region of mouse c-Maf. The R291Q mutation in mouse c-Maf was previously reported to alter the DNA binding specificity (24). The mutation resulted in a selective alteration in the DNA binding affinity for target oligonucleotides containing variations of the core CRE and TRE, suggesting that the c-Maf R291Q mutant might acquire a DNA binding specificity similar to AP1-type DNA recognition. With a glutamine residue in place of Arg57, the side chain of Asn61 may fail to flip out to contact the flanking region of the MARE.
It is interesting that the hyperactivation of Maf-containing dimers often relates to pathological states (1, 36), which are due to the enhanced recruitment of p300. A quantitative enhancement of c-Maf transcription was also observed for roughly half of the patients with multiple myeloma, and c-MAF gene translocation has been found in 5 to 10% of the cases (10). An increased accumulation of the Nrf2-MafG heterodimer has been found to be quite common in lung cancer and confers drug resistance to tumor cells (33, 35). An elucidation of the Maf-type DNA recognition mechanisms may facilitate the development of small molecules that can disrupt the Maf-DNA interaction, thereby alleviating pathological states related to the hyperactivation of Maf-containing dimers.
In summary, the delineation of the crystal structure of the MafG homodimer in complex with a MARE oligonucleotide unequivocally demonstrated that two residues in the basic region of MafG, Arg57 and Asn61, which are well conserved among bZip transcription factors, play critical roles in MafG recognition of the MARE through direct contacts with the extended GC bases in the motif. Whereas the Maf-specific structural elements EHR and Tyr64 do not interact directly with the extended GC bases, these elements indirectly stabilize the side-chain orientations of the Arg57 and Asn61 residues and contribute to the specific binding of MafG to the unusually long MARE motifs.
We thank staff members of the BL-17A beam line at Photon Factory (KEK) for assistance during data collection, Hiroki Sato and Ryosuke Nakahata for help in the preparation of proteins, and Tania O'Connor for critical reading and editing of the manuscript.
This work was supported in part by grants-in-aid for scientific research from MEXT and JSPS: Scientific Research (H.K. and H.M.), Scientific Research on Priority Areas (H.M. and M.Y.), Targeted Protein Research Program (H.K., H.M., and M.Y.), Specially Promoted Research (M.Y.), and Protein 3000 project (T.T.). This work was also supported by grants from JST-ERATO (M.Y.), Tohoku University Global COE Program for the Conquest of Diseases with Network Medicine (M.Y.), and the 21st Century COE program (T.T.).
Published ahead of print on 21 September 2009.