The New York Structural GenomiX Research Consortium (NYSGXRC) selected the protein encoded by the yxaF gene of Bacillus subtilis as a target for structure determination. The yxaF protein has 191 residues and a molecular mass of 21 kDa. At the time of target selection, it had no sequence homology to any structure in the Protein Data Bank (PDB). We aimed to determine the three-dimensional structure of this putative protein to better understand the relationship between its sequence, structure, and function. The protein is annotated as a putative helix-turn-helix (HTH) type transcriptional regulator.1,2 Many transcriptional regulators, such as TetR3 and QacR4, use a structurally well-defined DNA-binding HTH motif to recognize their target DNA sequences. DNA–HTH interactions have been studied extensively.5-8 Because the HTH motif is structurally conserved in many regulatory proteins, these DNA–protein complexes share some similarity in their DNA recognition patterns. Many such regulatory proteins have a ligand-binding domain in addition to the DNA-binding domain.4,9,10 Structural studies of ligand-binding regulatory proteins4,9-12 provide a wealth of information on ligand-binding, and possibly drug-binding, mechanisms. Understanding these mechanisms may help overcome drug resistance, an increasing challenge in medicine. The protein encoded by yxaF, hereafter called T1414, adopts a fold similar to those of the QacR repressor4 and the TetR/CamR repressor13 and possesses putative DNA- and ligand-binding domains. Here, we report the crystal structure of T1414 and compare it with structurally similar drug- and DNA-binding proteins.
The target gene for T1414 was amplified by PCR from Bacillus subtilis genomic DNA with the appropriate forward (ACTAGCAGAGGAGATTCACG) and reverse (CCCCTTTTTTCCTCACCAG) primers and Taq DNA polymerase (Qiagen) using standard methods. After gel purification, the PCR product was inserted into a pET vector modified for topoisomerase-directed cloning (Invitrogen) and designed to express the protein of interest followed by a C-terminal hexahistidine tag, and the construct was transformed into TOP10 cells. The sequence of the clone was verified, and protein expression and solubility were checked by standard procedures.
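As a quick sanity check on the primers listed above, their GC content and a rough melting temperature can be computed. The sketch below uses the Wallace rule (Tm ≈ 2°C per A/T plus 4°C per G/C), which is only a crude approximation for oligonucleotides of this length. The helper names are illustrative, and the rule is not necessarily what was used to design these primers.

```python
# Hypothetical helper functions; the Wallace rule is a rough estimate only.
FORWARD = "ACTAGCAGAGGAGATTCACG"
REVERSE = "CCCCTTTTTTCCTCACCAG"


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    primer = primer.upper()
    return sum(base in "GC" for base in primer) / len(primer)


def wallace_tm(primer: str) -> int:
    """Wallace-rule Tm in degrees C: 2 per A/T and 4 per G/C."""
    primer = primer.upper()
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


for name, seq in (("forward", FORWARD), ("reverse", REVERSE)):
    print(name, len(seq), "nt", f"GC={gc_fraction(seq):.0%}", f"Tm~{wallace_tm(seq)} C")
```

For these sequences the rule gives about 60°C for the forward primer (50% GC) and 58°C for the reverse primer (10 of 19 bases G or C).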
A medium-scale culture was grown by adding 500 mL LB medium, 25 mL of 10% glucose solution, 500 μL of 30 mg/mL kanamycin solution, and a scraping of transformed-cell glycerol stock to a 2 L baffled flask and shaking overnight at 250 rpm and 30°C. Ten milliliters of this culture was added to each of four flasks containing the same medium (without glycerol stock) for large-scale expression (2 L total), which were shaken at 250 rpm and 37°C until the OD at 595 nm reached approximately 0.8. The cultures were then induced with 200 μL of 1 M IPTG. After shaking overnight at 250 rpm and 21°C, the cultures were transferred to 1 L centrifuge bottles and spun at 6500 rpm for 10 min. After removal of the supernatant, the pellets were collected into 50 mL conical tubes (total mass 13.4 g) and frozen at −80°C.
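The final concentrations implied by the additions above follow from simple dilution (C_final = C_stock × V_stock / V_total). The sketch below assumes the volumes are additive. It also assumes that the 200 μL of IPTG was added to each 500 mL flask. The text does not say whether that volume was per flask or in total, so the IPTG figure is an assumption.

```python
# Dilution arithmetic: C_final = C_stock * V_stock / V_total.
# Volumes are assumed to be additive; IPTG is ASSUMED to be added per flask.


def final_conc(stock_conc: float, stock_vol_ml: float, total_vol_ml: float) -> float:
    """Concentration after diluting stock_vol_ml of stock into total_vol_ml."""
    return stock_conc * stock_vol_ml / total_vol_ml


starter_vol = 500 + 25 + 0.5          # mL: LB + glucose + kanamycin
large_flask_vol = starter_vol + 10    # mL: same medium plus 10 mL inoculum

kan_ug_per_ml = final_conc(30.0, 0.5, starter_vol) * 1000   # 30 mg/mL stock -> ug/mL
glucose_pct = final_conc(10.0, 25.0, starter_vol)           # 10% stock -> %
iptg_mM = final_conc(1000.0, 0.2, large_flask_vol)          # 1 M = 1000 mM stock

print(f"kanamycin ~{kan_ug_per_ml:.1f} ug/mL, glucose ~{glucose_pct:.2f}%, IPTG ~{iptg_mM:.2f} mM")
```

Under these assumptions the medium contains roughly 29 μg/mL kanamycin and 0.48% glucose, and induction is at about 0.37 mM IPTG. If the 200 μL were instead spread over the full 2 L, the IPTG concentration would be about a quarter of that.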
The pellet was resuspended in lysis buffer (35 mL per 10 g of cells) containing 50 μL of protease inhibitor cocktail (Sigma) and 5 μL of benzonase (Novagen), and the cells were lysed by repeated sonication with intervals of cooling. The lysate was clarified by centrifugation at 38,900 × g for 30 min. The protein was immobilized on Ni-NTA resin (Qiagen), which was placed on a drip column and washed with 25 mL of buffer A (50 mM Tris-HCl pH 7.8, 500 mM NaCl, 10 mM imidazole, 10 mM methionine, and 10% glycerol). The protein was then eluted into an Amicon concentrator (Millipore) with 15 mL of buffer A containing 500 mM imidazole. The eluate was concentrated to 6 mL and loaded onto an S200 gel filtration column. It was eluted with buffer containing 10 mM HEPES pH 7.5, 150 mM NaCl, 10 mM methionine, 10% glycerol, and 5 mM DTT. The yield was 54 mg of protein, which was concentrated to 26.8 mg/mL. Selenomethionine-labeled protein was produced and purified in a similar manner.
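The reported yield and concentration translate into a final volume and an approximate molar concentration. This sketch assumes a monomer mass of 21 kDa and ignores the mass added by the C-terminal His tag. The helper names are hypothetical.

```python
# Convert protein yield and mass concentration into volume and molarity.
# ASSUMPTION: monomer mass of 21 kDa; the His-tag contribution is ignored.
MONOMER_MASS_DA = 21_000.0


def volume_ml(total_mg: float, conc_mg_per_ml: float) -> float:
    """Volume that holds total_mg of protein at conc_mg_per_ml."""
    return total_mg / conc_mg_per_ml


def molar_mM(conc_mg_per_ml: float, mass_da: float = MONOMER_MASS_DA) -> float:
    """mg/mL equals g/L; dividing by g/mol gives mol/L, times 1000 gives mM."""
    return conc_mg_per_ml / mass_da * 1000.0


print(f"{volume_ml(54, 26.8):.2f} mL at 26.8 mg/mL ~ {molar_mM(26.8):.2f} mM monomer")
print(f"crystallization stock 10 mg/mL ~ {molar_mM(10.0):.2f} mM monomer")
```

This gives about 2 mL of roughly 1.3 mM protein. The 10 mg/mL solution used for crystallization corresponds to about 0.48 mM monomer.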
Native and Se-Met crystals of T1414 were grown at room temperature by sitting-drop vapor diffusion. Each drop consisted of 1 μL of 10 mg/mL protein solution and 1 μL of reservoir solution (0.2 M ammonium sulfate, 30% PEG MME 2000, 0.1 M sodium acetate pH 6.5) and was equilibrated against 100 μL of the same reservoir solution. Rod-shaped crystals appeared within days and diffracted to 2 Å. Crystals were cryoprotected by adding 20% glycerol to the mother liquor and frozen by immersion in liquid nitrogen. Diffraction data were collected at the selenium absorption edge on beamline X12C (National Synchrotron Light Source, Brookhaven National Laboratory) and processed using HKL2000.14 Crystals of T1414 belong to space group C2, with a calculated Matthews coefficient (Vm) of 2.1 Å³/Da assuming two molecules per asymmetric unit. Nine of the 10 possible Se sites were found using SOLVE.15 The resulting phases had an overall figure of merit (FOM) of 0.41. Phases were further improved, and density modification was carried out, using SHARP16 and SOLOMON.17
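The Matthews coefficient quoted above is the unit-cell volume divided by the total protein mass in the cell. The solvent content can be estimated from it as 1 − 1.23/Vm, a relation that assumes a protein partial specific volume of about 0.74 cm³/g. The cell dimensions are given in Table I, which is not reproduced here, so the sketch takes the cell volume as an input rather than using a specific value. Space group C2 has four asymmetric units per cell, so two molecules per asymmetric unit gives eight molecules per cell.

```python
# Matthews coefficient and estimated solvent content.
# The cell volume is an input: Table I values are not reproduced here.

ASYM_UNITS_PER_CELL_C2 = 4       # symmetry operators in space group C2
MOLECULES_PER_ASU = 2            # dimer in the asymmetric unit
MOLECULES_PER_CELL = ASYM_UNITS_PER_CELL_C2 * MOLECULES_PER_ASU


def matthews_coefficient(cell_volume_a3: float, molecules_per_cell: int, mass_da: float) -> float:
    """Vm in cubic angstroms per dalton."""
    return cell_volume_a3 / (molecules_per_cell * mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction, assuming partial specific volume ~0.74 cm3/g."""
    return 1.0 - 1.23 / vm


print(f"Vm = 2.1 A^3/Da -> solvent content ~{solvent_fraction(2.1):.0%}")
```

For Vm = 2.1 Å³/Da, this estimate gives roughly 41% solvent.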
The density-modified electron density map was of good quality, and ARP/wARP18 built 80% of the polypeptide chain automatically. The remainder of the atomic model was built manually into the experimental electron density map using O.19 The final atomic model was refined with CNS20 against the high-resolution native data. Rfree was calculated from a randomly selected 10% of the data, which were excluded from refinement. Two hundred thirty-six water molecules picked from the difference Fourier map were included upon convergence of the refinement. The final Rwork and Rfree are 23% and 29%, respectively. Rfree is somewhat high but falls within the expected statistical range.21 Data collection and refinement statistics are given in Table I. A Ramachandran plot calculated using PROCHECK22 showed 94% of residues in the most favored regions and 6% in the additionally and generously allowed regions. Coordinates and structure factors have been deposited in the Protein Data Bank (PDB ID: 1SGM).
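Rwork and Rfree are the same agreement index, R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, evaluated on two disjoint sets of reflections: the ~90% used in refinement and the ~10% held out. The sketch below illustrates both steps. It assumes Fcalc is already on the same scale as Fobs, whereas real refinement programs such as CNS also refine a scale factor. The helper names are hypothetical.

```python
import random


def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|).

    Assumes f_calc is already scaled to f_obs.
    """
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, fraction=0.10, seed=0):
    """Randomly assign reflection indices to work and free (test) sets."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = int(round(fraction * n_reflections))
    return sorted(indices[n_free:]), sorted(indices[:n_free])


work, free = split_free_set(1000)
print(len(work), len(free))  # prints "900 100"
```

Rwork would be r_factor evaluated on the work indices, and Rfree the same function on the free indices. The free reflections never influence the model, so Rfree serves as an unbiased check against overfitting.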
The structure of the T1414 dimer was determined at 2.0 Å resolution. Each monomer is entirely α-helical and consists of two domains [Fig. 1(A)]. Three helices (H1–H3) form the N-terminal DNA-binding domain with its characteristic HTH motif [Fig. 1(B)], and the remaining helices (H4–H10) form the C-terminal ligand-binding domain.
A DALI23 search revealed structural similarity between T1414 and six previously determined structures with Z-scores greater than 10: (1) a transcriptional regulator (1VI0), (2) the transcriptional regulator QacR (1JTY),4 (3) the putative transcriptional regulator YfiR (1RKT), (4) the TetR family repressor MarR (1T56),13 (5) a putative transcriptional repressor of the TetR/AcrR family (1T33), and (6) a γ-butyrolactone receptor or ArpA-like protein (1UI5).24 1RKT and 1T33 were determined by structural genomics programs and were not present in the PDB at the time of target selection. Despite very low sequence identity (<16%) between T1414 and each of these six proteins, all six are remarkably similar to T1414 in structure, with root mean square deviations (RMSD) for common Cα atoms below 2.5 Å. A CLUSTALW25 sequence alignment of these structures with T1414 shows considerable similarity within the ~65 N-terminal amino acids containing the putative HTH motif. In particular, Ala15, Phe47, and Glu52 are strictly conserved [Fig. 1(C)], whereas the remainder of the sequence shows very poor similarity [not shown in Fig. 1(C)]. Not surprisingly, the N-terminal DNA-binding domains are more similar to one another (15%–42% identity), whereas the C-terminal ligand-binding domains diverge considerably (2%–13% identity). Nevertheless, pairwise structural superpositions show clear topological similarity among the C-terminal domains, notwithstanding differences in the lengths and relative orientations of their α-helices.
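The Cα RMSD values quoted here are computed after optimally superposing one structure onto another. A common way to find that superposition is the Kabsch algorithm, sketched below with NumPy. DALI uses its own alignment procedure, so this illustrates the RMSD definition rather than reproducing the reported numbers. The function assumes the two coordinate arrays are already matched atom for atom, for example aligned Cα pairs.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched point sets P and Q (N x 3) after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Rotate P onto Q and compute the root mean square deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


P = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
Q = [[5, 5, 5], [5, 6, 5], [4, 5, 5], [5, 5, 6]]  # P rotated 90 degrees about z, then shifted
print(kabsch_rmsd(P, Q))  # essentially zero, up to floating-point error
```

The same function applies to the two noncrystallographic copies of T1414 or to any pair of aligned homologs. Only the atom correspondence has to be supplied.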
In the T1414 homodimer found in the asymmetric unit, about 2197 Å² (15.5%) of solvent-accessible surface area is buried on dimerization, suggesting that the dimer is the functional form. Across the noncrystallographic twofold axis, the monomers superimpose with an RMSD of 0.55 Å for all Cα atoms. C-terminal domain helices H7 and H8 contribute to dimerization. Dimer association is mediated mainly by hydrophilic interactions. These include a few direct and water-mediated hydrogen bonds involving residues Ala114, Glu118, Arg121, Thr169, and Lys171 from helices H7 and H8 of both monomers.
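The buried surface area is conventionally obtained from three solvent-accessible surface area (SASA) calculations: one for each monomer alone and one for the dimer. The sketch below shows that arithmetic with made-up illustrative numbers. The SASA values themselves would come from a surface-area program (which one was used here is not stated). Conventions also differ: some report half of this total, per monomer. The exact denominator behind the 15.5% figure is therefore not assumed here.

```python
# Buried surface area on complex formation.
# The numbers in the example are illustrative only, not values from this structure.


def buried_surface_area(sasa_a: float, sasa_b: float, sasa_complex: float) -> float:
    """Total SASA lost when A and B associate: SASA(A) + SASA(B) - SASA(AB)."""
    return sasa_a + sasa_b - sasa_complex


def buried_fraction(sasa_a: float, sasa_b: float, sasa_complex: float) -> float:
    """Buried area as a fraction of the summed monomer SASA."""
    return buried_surface_area(sasa_a, sasa_b, sasa_complex) / (sasa_a + sasa_b)


# Illustrative input: two identical 10,000 A^2 monomers forming a 17,000 A^2 dimer.
print(buried_surface_area(10_000.0, 10_000.0, 17_000.0),
      buried_fraction(10_000.0, 10_000.0, 17_000.0))
```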
Residues 28–47 of the N-terminal domain encompass the DNA-binding region. This region includes a well-defined HTH motif, formed by helices H2 and H3, of the kind responsible for DNA binding in many other transcriptional regulators.26 The two helices pack together at an angle of ~85° and are connected by a five-residue turn (Gly37–Ala38–Pro39–Lys40–Gly41). The HTH motif is stabilized by strong hydrogen-bonding and van der Waals interactions with helix H1. The DNA-binding domain is structurally similar to those of TetR3 and QacR4 (pairwise RMSDs of 1.1 Å). A superposition of the HTH motifs of T1414 (red), QacR (blue), and TetR (green) with accompanying bound DNA [Fig. 1(D)] reveals differences within the connecting turn region. The number of residues forming the HTH motif is the same in all three proteins. However, the proteins differ in the length of the so-called recognition helix (H3; four residues in T1414 and seven in both TetR and QacR) and in the number of residues in the turn (five in T1414, four in TetR, and three in QacR). The side chains of residues Leu43–Phe46 in the short recognition helix (H3) of T1414 adopt conformations essentially identical to those of the corresponding residues in QacR and TetR. In both TetR3 and QacR4, helix H3 makes extensive contacts with the major groove of the bound DNA.
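The ~85° packing angle between H2 and H3 is the angle between the two helix axes. The sketch below shows a crude way to estimate it from coordinates. Each axis is approximated by the vector from the centroid of the first few Cα atoms to the centroid of the last few, roughly one helical turn at each end, and the angle follows from the dot product. This is a simplifying assumption; dedicated tools fit helix axes more carefully. The estimate is especially rough for a helix as short as H3, and the function names are hypothetical.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def helix_axis(ca_coords, window=4):
    """Approximate axis: centroid of the last `window` CA atoms minus that of the first.

    The window shrinks for short helices so the two ends do not overlap.
    """
    w = min(window, len(ca_coords) // 2)
    if w < 1:
        raise ValueError("need at least two CA atoms")
    start = centroid(ca_coords[:w])
    end = centroid(ca_coords[-w:])
    return tuple(e - s for s, e in zip(start, end))


def interhelix_angle(axis_a, axis_b):
    """Angle in degrees (0 to 180) between two axis vectors."""
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    norm = math.sqrt(sum(a * a for a in axis_a)) * math.sqrt(sum(b * b for b in axis_b))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos_theta))


print(interhelix_angle((1, 0, 0), (0, 1, 0)))  # perpendicular axes -> 90 degrees
```

With real coordinates, the Cα atoms of H2 and H3 would be passed to helix_axis, and the two resulting vectors to interhelix_angle.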
The sequence and structural similarity of the T1414 DNA-binding domain to those of QacR4 and TetR3 may provide insights into its DNA-binding properties. We anticipate that T1414 recognizes DNA in a manner similar to TetR, which binds DNA as a dimer. Superposition of T1414 onto the QacR–DNA complex suggests that the T1414 dimer uses its two HTH motifs to recognize two half-sites within twofold-symmetric DNA recognition elements in the B. subtilis chromosome [Fig. 1(B)].
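The twofold-symmetric recognition elements proposed here would appear in sequence as inverted repeats: a half-site followed, after an optional spacer, by its reverse complement. The scan below is a hypothetical sketch for finding perfect inverted repeats in a DNA sequence. The half-site length and spacer are free parameters because the T1414 operator is not known. Real operators are often imperfect repeats, which this exact-match search would miss.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, half_len: int, spacer: int):
    """Start positions of half-site + spacer + reverse-complement half-site (exact match)."""
    seq = seq.upper()
    site_len = 2 * half_len + spacer
    hits = []
    for i in range(len(seq) - site_len + 1):
        left = seq[i:i + half_len]
        right = seq[i + half_len + spacer:i + site_len]
        if left == revcomp(right):
            hits.append(i)
    return hits


site = "TTAGAC" + "AT" + "GTCTAA"  # hypothetical half-site, spacer, and its reverse complement
print(find_inverted_repeats("CCCC" + site + "CCCC", half_len=6, spacer=2))  # prints [4]
```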
The T1414–DNA model shows no obvious steric clashes between protein and DNA. The spacing between the two HTH motifs of the dimer closely matches the spacing between successive DNA major grooves, and the modeled DNA engages both H3 recognition helices of the dimer. Calculated electrostatic potentials for the solvent-accessible surfaces of TetR, QacR, and T1414 all show concentrated positive potential within the protein–DNA interfaces, both observed and predicted. Surface-exposed T1414 residues in this area include several with positively charged side chains, such as Lys34 and Lys40. We therefore propose that T1414 binds DNA in a manner similar to that of QacR and TetR.
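The argument that the two HTH motifs match successive major grooves can be made quantitative. Major-groove segments on the same face of B-DNA recur once per helical turn, about 10.5 bp × 3.4 Å per bp ≈ 36 Å. The distance between the two recognition-helix centroids in the dimer can be compared with that pitch. The sketch below uses these canonical B-DNA parameters and hypothetical coordinates. The actual inter-helix distance in T1414 is not reported here.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def bdna_pitch(bp_per_turn: float = 10.5, rise_per_bp: float = 3.4) -> float:
    """Helical repeat of canonical B-DNA in angstroms."""
    return bp_per_turn * rise_per_bp


def helix_spacing_mismatch(helix_a, helix_b) -> float:
    """|distance between recognition-helix centroids - B-DNA pitch|, in angstroms."""
    spacing = math.dist(centroid(helix_a), centroid(helix_b))
    return abs(spacing - bdna_pitch())


# Hypothetical coordinates: two helix centroids placed exactly one pitch apart.
h3_a = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.5)]
h3_b = [(35.7, 0.0, 0.0), (35.7, 0.0, 1.5)]
print(f"pitch ~{bdna_pitch():.1f} A, mismatch {helix_spacing_mismatch(h3_a, h3_b):.2f} A")
```

A small mismatch is consistent with the two recognition helices fitting consecutive major grooves. A large one would imply that the DNA must bend or the dimer must rearrange.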
Comparison of the ligand-binding pockets of TetR3 and QacR4 with T1414 suggests that the putative ligand-binding pocket of T1414 is formed by residues from helices H5, H8, and H9, which create a narrow tunnel-like region. This tunnel, about 20 Å long with a variable diameter of 4–6 Å, is lined predominantly by hydrophobic amino acids. Residues Ile56, Val59, Val66, and Leu69 from helix H5; Val123, Val127, Phe128, Trp131, and Phe135 from helix H8; and three leucines (152, 155, and 168) and four isoleucines (156, 160, 164, and 166) from helix H9 form the inner wall of the tunnel. The mouth of the tunnel has a positively charged patch, whereas the opposite end is negatively charged [Fig. 1(E)]. Both regions may neutralize the charges of a bound dipolar ligand. In our X-ray structure, the tunnel is occupied by a set of well-defined water molecules, which would be expelled on ligand binding. In QacR and TetR, similar tunnel regions bind six structurally diverse cytotoxic drugs4 and tetracycline,11 respectively. The identity of the T1414 ligand is currently unknown.
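For a rough sense of the pocket's capacity, the tunnel can be treated as a cylinder with the dimensions given above. This is a simplifying assumption: the real tunnel is irregular and its cross-section varies. The numbers therefore only bracket the order of magnitude of the available volume.

```python
import math


def cylinder_volume(length_a: float, diameter_a: float) -> float:
    """Volume (cubic angstroms) of a cylinder: pi * r^2 * L."""
    radius = diameter_a / 2.0
    return math.pi * radius ** 2 * length_a


TUNNEL_LENGTH_A = 20.0  # from the text; the cylindrical shape is an assumption
for diameter in (4.0, 5.0, 6.0):
    print(f"{diameter:.0f} A diameter: ~{cylinder_volume(TUNNEL_LENGTH_A, diameter):.0f} A^3")
```

For a 20 Å tunnel this gives roughly 250–570 Å³, depending on the diameter assumed.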
Superposition of all six QacR–drug complexes onto T1414 suggests that small molecules could bind T1414 in a similar manner, because the ligand-binding tunnels appear to be conserved. Specifically, the conserved aromatic and hydrophobic side chains of Trp131, Phe87, Phe135, Ile84, Leu139, Met165, and Cys124 form a hydrophobic cluster in the putative drug-binding pocket of T1414. Superposition of the TetR–tetracycline complex11 onto T1414 shows that tetracycline occupies an analogous hydrophobic region. These comparisons suggest that T1414 also binds small molecules in its putative ligand-binding pocket.
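Once a ligand from a QacR or TetR complex has been superposed onto T1414, the residues that would line it can be listed with a simple distance cutoff. The sketch below assumes the coordinates are already in a common frame, uses a hypothetical 4 Å cutoff, and works on made-up coordinates. The residue labels in the example are taken from the text, but the positions are not real.

```python
import math


def pocket_residues(ligand_atoms, protein_atoms, cutoff=4.0):
    """Sorted residue labels with any atom within `cutoff` angstroms of any ligand atom.

    ligand_atoms: list of (x, y, z)
    protein_atoms: list of (residue_label, (x, y, z))
    """
    contacts = set()
    for label, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            contacts.add(label)
    return sorted(contacts)


# Made-up coordinates for illustration only.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [("Trp131", (4.5, 0.0, 0.0)), ("Phe87", (0.0, 3.8, 0.0)), ("Leu139", (12.0, 0.0, 0.0))]
print(pocket_residues(ligand, protein))  # prints ['Phe87', 'Trp131']
```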
Structure determination of T1414 and comparison of this dimeric HTH protein with QacR and TetR have furthered our understanding of this putative transcriptional regulator and its DNA-binding HTH motif. Guided by the structure, ligand- and DNA-binding studies should yield valuable information about T1414 and its close relatives in other bacteria.
We thank Dr. A. Saxena for providing data collection facilities at the NSLS.
Grant sponsor: National Institutes of Health; Grant number: GM62529.
*This article is a US government work and, as such, is in the public domain in the United States of America.