We report the solution structure of the DNA binding domain of the Escherichia coli regulatory protein AraC determined in the absence of DNA. The 20 lowest energy structures, determined on the basis of 1507 unambiguous nuclear Overhauser restraints and 180 angle restraints, are well resolved with a pairwise backbone RMSD of 0.7 Å. The protein, free of DNA, is well folded in solution and contains seven helices arranged in two semi-independent subdomains, each containing one helix-turn-helix DNA binding motif, joined by a 19 residue central helix. This solution structure is discussed in the context of extensive biochemical and physiological data on AraC and with respect to the DNA-bound structures of the MarA and Rob homologs.
Members of the very large AraC family of prokaryotic regulatory proteins typically share approximately 20% sequence identity over their ~110 residue DNA binding domains (DBDs).1,2 Most members, including AraC, also contain a domain involved in controlling the protein’s activity. In AraC, the additional domain dimerizes the protein3,4, contains a binding site for arabinose, and possesses an N-terminal arm that plays a key role in controlling the arabinose-dependent DNA binding properties of the protein5,6. Considerable genetic data indicate that the N-terminal arms on the dimerization domains interact with the protein’s C-terminal DNA binding domains in the absence of arabinose,6,7 but not in the sugar’s presence4,8–11. Thus, the arms control the orientational freedom of the DNA binding domains5,8,12–14 (Fig. 1).
Elucidation of the structural details of many of the important, functional, intramolecular interactions in AraC requires high resolution structural information. To date, however, despite intense effort, X-ray structure determinations of either AraC or its DNA binding domain have been unsuccessful. Inability to crystallize these proteins, either alone or bound to DNA, may be a consequence of intra- or inter-domain structural flexibility and may also be a consequence of the poor solubility properties of AraC. Other members of the AraC family share the property of low solubility.15–18 Structure determination has been possible, however, for DNA-bound forms of two AraC family members, MarA19 and Rob.20 These findings, plus the fact that many DNA binding domains undergo substantial structural rearrangements upon binding to DNA,21 raise the possibility that DNA binding domains in the AraC family may be partially or even largely unfolded when free in solution. If so, they would substantially restructure or refold upon binding to DNA. Therefore, it is of considerable interest to determine the structure of the DNA binding domain of an AraC family member in the absence of DNA.
DNA encoding the AraC DNA binding domain region, amino acid residues 175–292, was inserted between the NcoI and SacI sites in the multiple cloning region of the pET24d vector. The C-terminal truncation was generated using QuikChange mutagenesis (Stratagene) to introduce two tandem stop codons (TAATAA) after amino acid position E281. All constructs were verified by DNA sequencing.
An isolated colony of cells, BL21(DE3), freshly transformed with the appropriate pET24d expression construct was grown for 8–16 hours in 5 mL YT medium22 containing 40 µg/mL kanamycin. Unlabeled protein was overexpressed for 3 hours in cultures grown in YT and induced with 0.4 mM IPTG once the A550 reached 0.8. 15N-labeled proteins were overexpressed using a modification of the method of Marley et al.23 A starter cell culture grown overnight in minimal glucose medium was used to inoculate new cultures, also in minimal glucose medium, which were grown with vigorous aeration at 37 °C to an A550 of ~0.8. Cells were harvested, washed with M9 salts22 lacking NH4Cl and immediately resuspended in 1/4 of the previous culture volume of minimal medium containing 1 g/L 15N NH4Cl and either glucose (1% w/v) or 13C-glucose (0.4% w/v) as the carbon source. After 5 minutes in labeling medium, overexpression was induced by the addition of IPTG to 0.4 mM and cultures were grown for an additional three hours at 37 °C, harvested and either frozen at −80 °C or processed immediately.
Frozen cells, ~5 g, were thawed on ice, resuspended in 3 volumes of lysis buffer (20 mM sodium phosphate, pH 6.0, 300 mM NaCl, 1 mM EDTA, 1 mM dithiothreitol, 5% glycerol) and lysed by two passages through a French press. 1/100 volume of 0.1 M phenylmethanesulfonyl fluoride, freshly dissolved in ethanol, was added immediately before and again after lysis. The lysate was diluted with an equal volume of lysis buffer and centrifuged for ten minutes at 12,000 rpm in a Sorvall SS-34 rotor. The pellet was resuspended in 3 volumes of 7 M guanidinium-HCl and 10 mM dithiothreitol and allowed to stand for 15 minutes at room temperature. The sample was diluted in half with 50 mM NaCl, 20 mM Na phosphate, pH 6 and dialyzed twice into 1 L of the same buffer. The dialysate was centrifuged for ten minutes at 12,000 rpm in a Sorvall SS-34 rotor and the supernatant loaded onto a 5 mL Heparin HiTrap column (Pharmacia) pre-equilibrated with 20 mM Na phosphate, pH 6. Protein was eluted from the column with a 90 mL, 0–1 M NaCl gradient. The DNA binding domain elutes at about 0.45 M NaCl, is the only significant elution peak, and is >95% pure based on SDS-PAGE. The pooled peak fractions from the Heparin HiTrap column were further chromatographed on a Mono-S column using the same buffers and protocol. Heavily overloaded samples run on 15% SDS polyacrylamide gels showed no trace of contaminating proteins. Typically, about 50–100 mg of purified DBD are obtained from 5 g of cells. Purified protein concentrations were calculated from OD280 using an extinction coefficient of 8,480 M−1cm−1 as calculated from the amino acid content by the method of Pace et al.24 Since the protein was refolded out of guanidinium-HCl, the fraction of active protein was assessed using a fluorimetric DNA binding assay.25 The protein was fully active within experimental error.
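The concentration calculation from OD280 described above can be sketched as follows. This is a minimal illustration, not the script used in this work. The per-residue coefficients are those of the Pace et al. method; the residue counts (one Trp, two Tyr, no cystines) are an assumption, chosen because they reproduce the stated value of 8,480 M−1cm−1, and the absorbance reading and path length are hypothetical.

```python
# Pace et al. molar extinction coefficients at 280 nm (M^-1 cm^-1 per residue).
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125


def extinction_coefficient(n_trp, n_tyr, n_cystine=0):
    """Predicted molar extinction coefficient at 280 nm from composition."""
    return n_trp * EPS_TRP + n_tyr * EPS_TYR + n_cystine * EPS_CYSTINE


def molar_concentration(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    return a280 / (epsilon * path_cm)


# ASSUMED composition (not stated in the text): 1 Trp, 2 Tyr, no disulfides.
epsilon = extinction_coefficient(n_trp=1, n_tyr=2)

# HYPOTHETICAL reading: A280 = 0.424 measured in a 1 cm cuvette.
conc_uM = molar_concentration(0.424, epsilon) * 1e6
print(f"epsilon = {epsilon} M^-1 cm^-1, concentration = {conc_uM:.1f} uM")
```

Concentrated samples would normally be diluted into the linear range of the spectrophotometer first, with the dilution factor multiplied back in afterwards.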
Samples were prepared by dialyzing labeled protein into 444.4 mM NaCl, 55.5 mM Na phosphate, pH 6. D2O was then added to 10% (v/v) giving a final composition of 400 mM NaCl, 50 mM Na phosphate. If necessary, samples were concentrated using a Millipore Centricon YM-10 concentrator. Typical final protein concentrations were between 200 and 600 µM. Samples in pure D2O were prepared using two passes through a buffer exchange column equilibrated with 400 mM NaCl, 50 mM Na phosphate, 100% D2O, pD 6 (apparent pH=5.626). All NMR experiments were carried out at 25 °C on various spectrometers operated by the Biomolecular NMR Center at the Johns Hopkins University. Most data used in the assignments were collected on a Varian INOVA spectrometer operating at a 1H Larmor frequency of 500 MHz. The 3D 15N- and 13C-edited NOESY spectra were obtained on a Varian INOVA spectrometer operating at a 1H Larmor frequency of 800 MHz and equipped with a cryogenic probe with z-axis pulsed field gradients. All aromatic HSQC and NOESY data were collected on a Bruker Avance spectrometer operating at a 1H Larmor frequency of 600 MHz and equipped with a TCI cryoprobe with z-axis pulsed field gradients. Data were processed using NMRPipe27 and analyzed using NMRView.28
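The dialysis buffer concentrations above are chosen so that adding D2O to 10% (v/v) brings the sample to its final composition. A short check of that arithmetic, assuming simple volumetric dilution (the function name is illustrative only):

```python
def after_d2o_addition(stock_mM, d2o_fraction=0.10):
    """Solute concentration once D2O makes up `d2o_fraction` of the final volume."""
    return stock_mM * (1.0 - d2o_fraction)


# Concentrations of the dialysis buffer, taken from the text.
nacl_final = after_d2o_addition(444.4)
phosphate_final = after_d2o_addition(55.5)
print(f"NaCl: {nacl_final:.2f} mM, phosphate: {phosphate_final:.2f} mM")
```

Both values come out within 0.1 mM of the nominal 400 mM NaCl and 50 mM Na phosphate.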
Chemical shift assignment of HN, N, CA and CB resonances was performed manually using four triple resonance experiments, HNCA, HNCACB, HN(CO)CA and CBCA(CO)NH, data not shown. The fully assigned 1H-15N-HSQC spectrum of the DBD containing residues 175–281 of AraC is shown in Supplementary Figure 1. The remainder of the chemical shift assignments were also obtained manually using a combination of several spectra including an aliphatic 13C-CTHSQC, aromatic sensitivity enhanced CH-HSQC, aromatic sensitivity enhanced CH-CTHSQC, 13C-edited aromatic 3D NOESY, HNCO, HCC(CO)NH-TOCSY, H_CC(CO)NH-TOCSY and HCCH-COSY. Overall completeness of assignments is 92.7%.
NOE spectra were converted to XEASY format using the program CARA.29 Automated NOE peak picking, identification and assignment were performed with the stand-alone ATNOS/CANDID program.30,31 The NMR solution structure was calculated using the program CYANA.32,33 Conversion of NOE peak intensities to distance restraints was done using automatic calibration. Initial dihedral angle restraints were generated using TALOS34 and were input during the CYANA calculations. The final structure was refined using CNS with explicit water as the solvent using scripts obtained from the RECOORD project web site.35 Structures were analyzed using PROCHECK-NMR.36 Graphical representations were generated using either Molmol37 or VMD.38
The structure and NMR restraints have been deposited with the RCSB PDB and the BMRB. The accession numbers are RCSB ID code rcsb100855, PDB ID code 2k9s and BMRB accession number 16001.
Initial NMR studies of the DNA binding domain of AraC suggested that the C-terminal portion of the domain is unstructured. The 1H-15N HSQC spectrum of the AraC DBD contained about ten resonances that were extremely intense and sharp relative to the others, suggesting that they derived from an unstructured segment. Chemical shift redundancy of these crosspeaks with several of the less intense peaks interfered with initial resonance assignment. As an eleven-residue C-terminal tail of AraC can be deleted without affecting the protein’s inducing or repressing properties39 and is absent from some natural homologs of the protein, we removed these residues. As expected, the 1H-15N-HSQC spectrum from this construct lacked the very sharp resonances, and all other backbone crosspeaks were of comparable intensity (Supplementary Fig. 1). We therefore used the C-truncated DBD for all subsequent work reported here.
The resonance assignments obtained as outlined in the Methods section, and the three-dimensional NOE spectroscopy experiments, were used as inputs for the structure calculation. Initial dihedral restraints were generated by chemical shift analysis and the final structure ensemble was generated using these restraints and NOE distance restraints. The 20 structures with the lowest final target function values (Fig. 2a) possessed an average backbone RMSD to mean of 0.7 Å. The NMR structure statistics are summarized in Table 1. The lowest energy structure of the 20, refined in explicit water, is shown in Fig. 2b.
The AraC DNA binding domain consists of seven alpha helices in two weakly interconnected subdomains. Each subdomain contains one helix-turn-helix (H-T-H) motif, helices 2 and 3 and helices 5 and 6 in the N- and C-terminal subdomains, respectively. The two subdomains are joined by a 19-residue central helix, helix 4. Only eight long-range NOE restraints are observed between the two subdomains, those involving residues L16, A17, V79 and G80. The absence of other contacts between the subdomains allows significant rotation of one H-T-H subdomain with respect to the other. This freedom is seen in the NMR structure ensemble in which either the N-terminal or the C-terminal subdomains can individually be overlaid well, but the relative positioning of the two subdomains shows considerably greater dispersion (Fig. 3). That dispersion leads to the relatively large RMSD values reported in Table 1. The backbone RMSD of the individual subdomains is considerably lower, 0.48 Å and 0.49 Å for the N- and C-terminal subdomains, respectively. Also consistent with the idea that the two subdomains may move somewhat independently of one another is the fact that calculating the structures of each subdomain separately, residues 1–57 and residues 57–107, using the same NOE data sets as above gives much better structural statistics, with an average backbone RMSD to mean of 0.47 Å for both. Our data provide no evidence for the existence of any significantly populated alternate structures or of a substantial amount of unstructured protein.
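The "average backbone RMSD to mean" statistic quoted above can be illustrated with a small sketch. It is not the analysis code used here and makes one important simplifying assumption: the models are taken to be already superimposed on the atoms of interest. Real analyses first perform a least-squares superposition, and the choice of atoms used for that fit (whole domain or a single subdomain) is exactly what produces the difference between the whole-domain and per-subdomain values. The toy coordinates are hypothetical.

```python
import math


def mean_structure(ensemble):
    """Average each atom's x, y, z over all models of a pre-superimposed ensemble."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[i][k] for model in ensemble) / n_models for k in range(3))
        for i in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate lists."""
    squared = sum(
        (a[k] - b[k]) ** 2 for a, b in zip(coords_a, coords_b) for k in range(3)
    )
    return math.sqrt(squared / len(coords_a))


def average_rmsd_to_mean(ensemble, atom_indices=None):
    """Mean over models of the RMSD to the average structure.

    atom_indices optionally restricts the calculation to a subset of atoms,
    for example the backbone atoms of one subdomain.
    """
    if atom_indices is not None:
        ensemble = [[model[i] for i in atom_indices] for model in ensemble]
    mean = mean_structure(ensemble)
    return sum(rmsd(model, mean) for model in ensemble) / len(ensemble)


# HYPOTHETICAL two-model, two-atom ensemble: atom 0 varies, atom 1 does not.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)],
]
print(average_rmsd_to_mean(toy))                    # both atoms
print(average_rmsd_to_mean(toy, atom_indices=[1]))  # invariant atom only
```

Restricting the calculation to the well-defined atom lowers the value, which mirrors the way the per-subdomain RMSDs fall below the whole-domain figure.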
Our results do not support the possibility mentioned above that the poor solubility of AraC results from unfolded DNA binding domains. Other possible sources of the protein’s low solubility and aggregation include oxidation of cysteine residues or domain swapping of helix-turn-helix regions. Oxidation seems unlikely, however, since fully reduced DBD at high concentration aggregates slowly even when stored under argon. The possibility of intermolecular domain swapping40 of the helix-turn-helix subdomains of AraC DBD appears compatible with the very weak interdomain interactions between the two helix-turn-helix regions as well as the likelihood that in the process of binding to DNA, one of the helix-turn-helix regions in the AraC DBD undergoes considerable rotation with respect to the other helix-turn-helix region. The increased molecular weight associated with such swapping could be detected with pulse gradient NMR techniques41,42 or other physical techniques sensitive to the hydrodynamic radius.
The sequence identity between AraC and the homologs MarA and Rob is only 21% and 25%, respectively, and similarity is ~45% for both. Nevertheless, the solution structure of the DNA binding domain of AraC is similar to the DNA-bound structures of MarA and Rob, with pairwise backbone RMSDs of 3.2 Å and 5 Å, respectively. The seven helical regions in the three proteins begin and end at about the same positions in the homology-aligned sequences (Fig. 4), and they have roughly similar positions and orientations in space with respect to each other, as shown in the overlay of AraC and MarA (Fig. 5).
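For readers unfamiliar with how a percent identity such as those above is obtained from an alignment, the following is a minimal sketch. Conventions for the denominator vary between programs; here columns with a gap in either sequence are excluded, which is an assumption and not necessarily the convention behind the 21% and 25% figures. The two short sequences are invented for illustration and are not fragments of AraC, MarA or Rob.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over alignment columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# HYPOTHETICAL aligned fragments: 6 gap-free columns, 5 of them identical.
identity = percent_identity("ACDE-FGH", "ACNEQF-H")
print(f"{identity:.1f}% identity")
```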
What are the implications of the AraC DBD structure for binding to DNA? If the two H-T-H regions of AraC were to interact homologously with B-form DNA sites one turn apart, the regions would be oriented similarly and would be separated by about 34 Å. In fact, in the structure, the second helices of the two H-T-H regions differ in orientation by nearly 50 degrees, and they are separated by only about 26 Å. Thus, either the interactions are not homologous and one turn apart or significant distortions in the protein and/or DNA occur upon binding. Analysis of DNA contacts by AraC shows that the major energetic contributions from bases in the araI1 DNA half-site derive from two regions of five or six base pairs separated by ten or eleven base pairs.43 Taken together these facts indicate that both AraC DBD and the DNA binding site are significantly distorted in the protein-DNA complex. This deduction is compatible with the observation that the binding of AraC to the full araI site generates a substantial bend, about 90°.44
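The geometric comparison above rests on two quantities: the angle between the axes of the two recognition helices and the distance between their centers. The sketch below shows one simple way such values could be computed from atomic coordinates; it is an illustration under stated assumptions, not the procedure used for the numbers in the text. The axis estimate (first-half centroid to second-half centroid) is deliberately crude, the coordinates are hypothetical and were constructed to give a 50° tilt and 26 Å separation, and the one-turn spacing uses the approximate B-form values of 10 base pairs per turn and 3.4 Å rise per base pair.

```python
import math


def centroid(points):
    """Geometric center of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def helix_axis(points):
    """Crude axis: vector from the centroid of the first half of the atoms
    to the centroid of the second half."""
    half = len(points) // 2
    start = centroid(points[:half])
    end = centroid(points[-half:])
    return tuple(e - s for s, e in zip(start, end))


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# HYPOTHETICAL helix traces (in Angstrom), built to differ by 50 degrees
# in direction and to sit 26 Angstrom apart.
theta = math.radians(50.0)
direction = (math.cos(theta), math.sin(theta), 0.0)
offsets = (-2.25, -0.75, 0.75, 2.25)
helix_a = [(2.25 + t, 0.0, 0.0) for t in offsets]
helix_b = [(2.25 + t * direction[0], t * direction[1], 26.0) for t in offsets]

tilt = angle_deg(helix_axis(helix_a), helix_axis(helix_b))
separation = distance(centroid(helix_a), centroid(helix_b))
one_turn = 10 * 3.4  # approximate spacing of sites one B-form turn apart

print(f"tilt = {tilt:.1f} deg, separation = {separation:.1f} A")
print(f"shortfall relative to one turn = {one_turn - separation:.1f} A")
```

With real structures, a least-squares line fit through the Cα atoms of each helix would give a more robust axis than the two-centroid estimate used here.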
The two H-T-H regions of MarA clearly do not interact homologously with DNA. In the crystal structure of MarA bound to DNA, the DNA-contacting helix of the second H-T-H region sits within the major groove of the DNA whereas the DNA-contacting helix of the first H-T-H region points more end-on into the major groove of the significantly distorted DNA binding site.19 In the Rob-DNA structure, the DNA is unbent and the second helix does not extend into the major groove at all; it rests on the side of the DNA binding site20. This raises the question of whether the crystal structure of Rob and the micF promoter accurately represents the solution structure or the biologically functional structure of Rob-DNA complexes, since in the complexes of Rob and the zwf and fumC promoters, Rob does bend the DNA45. In the DNA-bound structures of both MarA and Rob, the distance between the centers of the DNA-contacting helices is about 26 Å, about the same as we find in the DNA-free solution structure of AraC. Thus, as expected, the DNA to which MarA has bound is substantially bent, ~35°.
As mentioned earlier, the final eleven residues of the AraC DBD, unstructured in the 0.4 M NaCl used in the structure work, are not essential to the regulatory activities of AraC and were deleted for the structure determination. The poorly homologous C-terminal sequence in MarA is structured19, at least when the protein is bound to DNA and in the lower salt concentration crystallization buffer. Although nonessential, it is possible that the C-terminal tail of AraC is also structured when bound to DNA and in low salt concentration46. While truncation of unstructured peptides often facilitates crystal formation, attempts to crystallize the C-truncated DBD were unsuccessful and similar truncation in “full-length” AraC renders the protein considerably less soluble than normal.
In summary, the solution structure of the DNA binding domain of AraC contains two weakly interacting helix-turn-helix regions connected by an α-helix. It is similar, but not identical, to the DNA-bound structures of two family members with roughly 20% sequence identity. Either AraC or the DNA or both distort by on the order of 5 Å upon DNA binding. No evidence is seen that the DBD of AraC or any significant part of it is unstructured for a significant fraction of the time when the protein is free in solution. This structure and the dimerization domain structure will permit docking calculations to address the question of information transfer between the two domains that is required for normal AraC function.
Supplementary Figure 1. 1H-15N HSQC spectrum of the C-truncated DNA binding domain of AraC at 25 °C. The complete assignments of backbone and side chain amide chemical shifts are indicated. The numbering scheme starts with residue 1, which corresponds to residue 175 of full-length AraC. The data were acquired on a Varian INOVA spectrometer operating at a 1H frequency of 800 MHz and equipped with a cryogenic probe. 852 and 204 complex points were collected with sweep widths of 13333 and 3403 Hz in 1H and 15N, resulting in acquisition times of 64 ms (t2) and 60 ms (t1), respectively. The 15N carrier frequency was set at 118 ppm. Four transients were averaged for each t1 point with a recycle delay of 1.1 s.
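The acquisition times in the caption follow from the number of complex points divided by the sweep width in each dimension. A short check of that relationship, using only the values given in the caption (the function name is illustrative):

```python
def acquisition_time_ms(complex_points, sweep_width_hz):
    """Acquisition time in milliseconds: complex points / sweep width."""
    return 1000.0 * complex_points / sweep_width_hz


t2_ms = acquisition_time_ms(852, 13333)  # direct 1H dimension
t1_ms = acquisition_time_ms(204, 3403)   # indirect 15N dimension
print(f"t2 = {t2_ms:.1f} ms, t1 = {t1_ms:.1f} ms")
```

Rounded to the nearest millisecond, these reproduce the 64 ms and 60 ms quoted in the caption.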
We thank Dr. Ananya Majumdar and the Johns Hopkins Biomolecular NMR Center for extensive assistance with all NMR experiments. We thank Drs. Tingting Ju and Joel Tolman for use of their computers and assistance in the structure calculation. We thank Dr. Blake Hill for additional assistance and Katie Frato, Stephanie Dirla and Jennifer Seedorff for comments on the manuscript, and NIH grant GM18277 for support.