Introduction. Structural genomics centers use both NMR spectroscopic and X-ray crystallographic methods to determine three-dimensional structures of proteins on a genomic scale in a high-throughput mode and to deposit them in the PDB.1 The main goal of structural genomics is to determine a large number of protein structures to complement the ever-expanding database of genome sequences. Another role of structural genomics is to delineate the correspondence between sequence and structure space; protein structures from otherwise unrelated (i.e., 8-10% sequence identity) families often prove to have remarkably similar folds.2 This finding, in turn, allows better understanding of the structure-function relationships in those proteins for which structures either are not available or cannot be determined experimentally.
The New York Structural GenomiX Research Consortium (NYSGXRC) has targeted highly conserved protein families for which no representative structure is available. The putative phosphoheptose isomerases encoded by the genes gmhA from Vibrio cholerae (NYSGXRC id T1485) and gmhA1 from Campylobacter jejuni (NYSGXRC id T1512) were selected for crystallographic structure determination because there was no representative experimental structure in the PDB at the time of target selection. A BLAST3 search showed that these proteins share high sequence homology with >50 hypothetical proteins from several organisms. Here, we report the X-ray crystal structures of T1485 and T1512, compare them with other structures of similar fold, and discuss the putative active site.
Materials and Methods. Protein production. The target gene for T1485 was amplified via PCR from Vibrio cholerae genomic DNA with the appropriate forward and reverse primers and Taq DNA polymerase (Qiagen) by using standard methods. After gel purification, the PCR product was inserted into a pET vector modified for topoisomerase-directed cloning (Invitrogen), designed to express the protein of interest followed by a C-terminal hexa-histidine tag, and was transformed into TOP10 cells. The sequence of the clone was confirmed. The expression and solubility of the protein were checked by standard procedures. A medium-scale cell culture was grown by adding 500 mL of LB medium, 25 mL of 10% glucose solution, 500 μL of 30 mg/mL kanamycin solution, and a small scraping of transformed-cell glycerol stock to a 2-L baffled flask and shaking overnight at 250 rpm and 30°C. For large-scale expression, 10 mL of the overnight culture was added to each of six flasks containing similar culture medium. The cultures were shaken at 250 rpm and 37°C until the OD (595 nm) reached approximately 0.8. The cultures were then induced with 200 μL of 1 M IPTG. After shaking overnight at 250 rpm and 21°C, the contents of the flasks were poured into 1-L spin bottles and spun at 6500 rpm for 10 min. After removal of the supernatant, the pellets were collected into 50-mL conical tubes (total mass 35.0 g) and frozen at -80°C.
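The final concentrations in the overnight culture follow from the stated volumes by simple dilution (C1·V1 = C2·V2). The sketch below is only an illustration: it assumes the 10% glucose stock is weight/volume, neglects the volume of the glycerol-stock scraping, and uses a hypothetical helper name (`final_concentration`) that does not come from the original protocol.

```python
def final_concentration(stock_conc, stock_volume_ml, total_volume_ml):
    """Solve C1*V1 = C2*V2 for C2; the unit of stock_conc is preserved."""
    return stock_conc * stock_volume_ml / total_volume_ml


# Volumes stated in the text for the overnight culture (mL).
lb_ml = 500.0
glucose_ml = 25.0
kanamycin_ml = 0.5  # 500 uL

# Assumption: the glycerol-stock scraping adds negligible volume.
total_ml = lb_ml + glucose_ml + kanamycin_ml

# Assumption: the 10% glucose stock is % (w/v).
glucose_percent = final_concentration(10.0, glucose_ml, total_ml)
kanamycin_mg_per_ml = final_concentration(30.0, kanamycin_ml, total_ml)

print(f"glucose:   {glucose_percent:.2f} % (w/v)")
print(f"kanamycin: {kanamycin_mg_per_ml * 1000:.1f} ug/mL")
```

Under these assumptions the culture contains roughly 0.5% glucose and just under 30 μg/mL kanamycin.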
The pellet was resuspended in lysis buffer (35 mL/10 g) containing 50 μL of protease inhibitor cocktail (Sigma) and 5 μL of benzonase (Novagen) and subjected to repeated sonication with intervals of cooling. The lysate was then clarified by centrifugation at 38,900 g for 30 min. The protein was then immobilized on Ni-NTA resin (Qiagen), placed on a drip column, washed with 25 mL of buffer A (50 mM Tris-HCl pH 7.8, 500 mM NaCl, 10 mM imidazole, 10 mM methionine, and 10% glycerol), and eluted into an Amicon concentrator (Millipore) with 15 mL of buffer A containing 500 mM imidazole. The solution was concentrated to 6 mL, loaded onto an S200 gel filtration column, and eluted with buffer containing 10 mM Hepes pH 7.5, 150 mM NaCl, 10 mM methionine, 10% glycerol, and 5 mM DTT. The protein yield was 138 mg. Seleno-methionine labeled protein was produced and purified in a similar manner.
The target gene for T1512 was amplified via PCR from Campylobacter jejuni genomic DNA with the appropriate forward and reverse primers and Taq DNA polymerase (Qiagen) by using standard methods. The remaining steps were similar to those used for T1485. The protein yield was 36 mg.
Crystallization, data collection, and structure determination. Crystals of T1485 were grown at room temperature by the sitting-drop vapor diffusion method from a drop containing 1 μL of 15 mg/mL protein solution, 1 μL of the reservoir solution, and 0.8 μL of 15% (w/v) 1,2,3-heptanetriol as additive, equilibrated against 100 μL of the same reservoir solution, which contained 0.3 M ammonium acetate, 0.1 M BisTris, pH 5.5, and 30% (w/v) PEG3350. X-ray diffraction data were obtained at the National Synchrotron Light Source (NSLS; beam line X12C) at Brookhaven National Laboratory. Data reduction and scaling were performed by using HKL2000.4 T1485 has two molecules in the asymmetric unit, giving a Matthews coefficient (Vm) of 2.4 Å³/Da with an estimated solvent content of 49%. The structure was determined by using the multiwavelength anomalous dispersion method. Six Se sites located with SOLVE5 were used in phase refinement with SHARP.6 The phases were further improved by solvent flattening using SOLOMON.7
Crystals of T1512 were obtained by the hanging-drop vapor diffusion method at 20°C in 0.1 M sodium acetate, 0.1 M BisTris, pH 6.5, and 20% PEG2000 MME. X-ray diffraction data were collected at the NSLS (beam line X9A). The structure was determined via single-wavelength anomalous dispersion.8 Sixteen Se sites were located with SOLVE.9 Additional heavy-atom refinement with SHARP permitted identification of four additional Se sites. Phase refinement was performed with SHARP6 and improved by using RESOLVE9 and NCS averaging. The asymmetric unit contains four monomers, giving a Matthews coefficient (Vm) of 2.4 Å³/Da with a solvent content of 49%. An atomic model for one of the four monomers was built on the basis of the experimental map, and the remainder was generated by using noncrystallographic symmetry.
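The relationship between the Matthews coefficient and the solvent content quoted for both crystals can be sketched as follows. This is an illustration, not the authors' procedure: it assumes the conventional Matthews estimate, solvent fraction ≈ 1 - 1.23/Vm, which presumes a typical protein partial specific volume of about 0.74 cm³/g. The function and parameter names are hypothetical, and because the unit-cell volumes and molecular weights are not given in this text, only the Vm-to-solvent step is evaluated.

```python
def matthews_coefficient(cell_volume_a3, n_sym_ops, n_mol_per_asu, mol_weight_da):
    """Vm in A^3/Da: unit-cell volume divided by the total protein mass in the cell."""
    return cell_volume_a3 / (n_sym_ops * n_mol_per_asu * mol_weight_da)


def solvent_fraction(vm):
    """Conventional Matthews estimate; 1.23 assumes a typical protein density."""
    return 1.0 - 1.23 / vm


# Vm = 2.4 A^3/Da is the value reported in the text for both T1485 and T1512.
fraction = solvent_fraction(2.4)
print(f"estimated solvent content: {fraction * 100:.0f}%")
```

With Vm = 2.4 Å³/Da the estimate is 1 - 1.23/2.4 ≈ 0.49, consistent with the 49% solvent content reported above.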
Models were built by using ARP/wARP10 and O11 and refined with CNS.12 Solvent molecules located in the difference electron-density maps (|Fobs|-|Fcal|) were included in the final refinement. The data collection and refinement statistics are given in Table I.
Results and Discussion. Overall structure. The final refined model of T1485 consists of 176 amino acid residues (residues 1-82 and 98-191). No electron density was observed for residues 83-97 in either monomer A or monomer B. The Ramachandran diagram from PROCHECK13 shows 92% of residues in the most favored regions and 8% in additionally allowed regions. The final model of T1512 consists of 188 amino acid residues, with 94% of residues in the most favored regions and 6% in additionally allowed regions of the Ramachandran plot.13 The size of each monomer is approximately 61 × 23 × 30 Å³. The monomeric structures of T1485 and T1512 could be superimposed with a root mean square deviation (RMSD) of 0.89 Å (159 common Cα pairs), which came as no surprise, given the 40% sequence identity between the two polypeptide chains. The only significant differences between the two monomeric structures are the conformation of the loop region (residues 69-78) and the insertion of three residues (residues 19-21) between the two α-helices H1 and H2 in T1485. The discussion that follows pertains to both structures.
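For readers unfamiliar with the RMSD values quoted here, the quantity is the square root of the mean squared distance between paired Cα atoms. The sketch below is a minimal illustration under stated assumptions: the two coordinate lists are already optimally superimposed (the fitting step, for example a least-squares rotation, is not shown), the pairing of atoms is given, and the toy coordinates are hypothetical rather than taken from T1485 or T1512.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples, paired by index.

    Assumes the structures have already been superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total_sq = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total_sq += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total_sq / len(coords_a))


# Hypothetical toy data: every atom in toy_b is displaced by (3, 4, 0) A,
# so each paired distance is 5 A and the RMSD is also 5 A.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
toy_b = [(x + 3.0, y + 4.0, z) for x, y, z in toy_a]
print(rmsd(toy_a, toy_b))
```

In a real comparison the input lists would hold the common Cα pairs (159 pairs for T1485 against T1512) after superposition.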
Figures 1, 2(a), and 2(b) show the secondary structural elements and ribbon representations of the T1485 dimer and the T1512 tetramer, respectively. Both monomers adopt α/β structures, consisting of a standard parallel β-sheet flanked by five α-helices forming a three-layered αβα sandwich. The first α-layer consists of two helices (H3 and H7), whereas the third α-layer encompasses three helices (H4, H5, and H6). The middle β-sheet is five stranded (β2, β1, β3, β4, and β5) (Fig. 1). Each polypeptide chain folds into a compact, single domain, which resembles the flavodoxin-type nucleotide-binding motif.14 A crevice occurs at the C-terminal ends of β-strands 1 and 3. The most unusual feature of these two structures is the presence of a long helical region (residues 2-43) at the N-terminal region made of two helices (H1 and H2) with a short turn between them. The two helices H1 and H2 are oriented almost orthogonal to each other. Helix H1 is oriented away from the domain and extends toward another monomer related by the NCS twofold and interacts strongly with that monomer.
A DALI search15 identified 4 structures with Z-scores >10 and another 40 structures with Z-scores >5. We examined the top 4 similar structures. The Z-score values from the DALI searches were 18.0 with two hypothetical proteins (PDB Codes: 1NRI and 1JEO16), 14.0 with 1MOQ,17 a glutamine amidotransferase (Glucosamine 6-Phosphate synthase), and 11.1 with 1J5E,18 a Thermus thermophilus 30S ribosomal subunit. 1NRI and 1JEO came from other structural genomics centers and were not present in the PDB at the time of target selection. Both 1NRI and 1JEO could be superimposed on T1485 with an RMSD of 1.4 Å (1NRI: 148 common Cα pairs, 21% sequence identity; 1JEO: 130 common Cα pairs, 20% sequence identity). The other two structurally similar proteins detected with DALI, 1MOQ and 1J5E, each have one domain that resembles T1485 (1MOQ: RMSD = 1.5 Å, 108 common Cα pairs, 15% sequence identity; 1J5E: RMSD = 1.8 Å, 95 common Cα pairs, 17% sequence identity). Although pairwise sequence identities are low in all of the above comparisons, the polypeptide chain folds are remarkably similar. These comparisons provide examples of structures with low sequence similarity sharing a similar fold,2 and this may provide insight into their possible function.
Putative active site. Structural similarities of T1485 and T1512 with Phosphoglucose/Phosphomannose Isomerase (1TZC), Pseudomonas aeruginosa Phosphoheptose Isomerase (1X92), and Glucosamine 6-Phosphate Synthase (1MOQ, 1MOS, and 1XFG) led us to identify putative active sites within these newly determined structures. Our structural comparisons show the presence of a sugar isomerase (SIS) domain,19 which suggests that T1485 and T1512 may function as phosphosugar isomerases. As observed in other flavodoxin-type α/β structures, the putative active sites of T1485 and T1512 are located in the crevice formed by the C-terminal ends of β-strands β1 and β3.
The homologous structures are complexed with the following: 5-phosphoarabinonate20 (1TZC), D-Glycero-D-Mannopyranose-7-Phosphate (1X92), Glucosamine 6-Phosphate17 (1MOQ), 2-Amino-2-Deoxyglucitol 6-Phosphate21 (1MOS), and L-Glu Hydroxamate22 (1XFG). The sugar-binding sites are shallow clefts on the surface of the molecule formed by three loops, including loop1 between β1 and H3 (residues: 51-54), loop2 between β3 and H5 (residues: 121-124), and loop3 between β5 and H7 (residues 163-167). Most of the bound ligands have phosphate groups. Superposition of the bound 5-phosphoarabinonate (1TZC),20 D-Glycero-D-Mannopyranose-7-Phosphate (1X92), and phosphate substrates of Glucosamine 6-Phosphate Synthase (1MOQ and 1MOS) onto the structure of T1485 shows that the side-chain oxygen atoms (OG) of Ser119, Ser121, Ser124, and the backbone carbonyl oxygen atoms (O) of Thr120 and Ser121 could form hydrogen bonds with the ligand phosphate groups [Fig. 2(c)], if present in the structure of T1485. Other portions of these ligands could be accommodated in the hydrophilic pocket formed by Asn52 O, Tyr167 OH, and Gln172 OE1.
Superposition of L-Glu Hydroxamate bound to Glucosamine 6-Phosphate Synthase (1XFG) and the citrate molecule bound to hypothetical protein Mj124716 (1JEO) onto the structure of T1485 suggests that these ligands could interact with the same group of residues as suggested for phosphate-containing ligands. The concave feature on the surface of T1485 could accommodate even larger polar ligands by interacting with Asp169 OD1, Gln172 OE1, Gly54 O, and Asp58 OD1 [Fig. 2(d)].
The putative active site is predominantly composed of serine and threonine side-chains. The active site residues of 1TZC, 1MOQ, 1X92, and T1512 and those of the putative active site of T1485 are well conserved, particularly among phosphate-binding residues (Fig. 3). This region resembles structural motifs found in conserved phosphoheptose isomerases.23 Ser119 OG and Ser121 OG of T1485 adopt a conformation essentially identical to that of Ser87 OG and Ser89 OG of 1TZC, Ser119 OG and Ser121 OG of 1X92, and Ser347 OG and Ser349 OG of 1MOQ, respectively. In addition, Ser124 OG of T1485 adopts a conformation similar to that seen for Thr92 OG1 of 1TZC, Thr352 OG1 of 1MOQ, and Ser126 OG of 1X92.
A striking difference among these structures occurs for residues 162-167 of T1485. The conformation observed in T1485 directs the side-chain of Thr120 into the putative active site, whereas the corresponding residues in 1TZC (Tyr88) and 1MOQ (Gln348) point away from the active site. The putative active site residues of T1512 are strictly conserved with T1485. Because the residues involved in binding the phosphate moiety of the substrate are conserved at the level of amino acid sequence (Fig. 3) and structure [Fig. 2(c)] in T1512, we believe that the discussion above also pertains to T1512.
The arrangement of T1485 monomers in the asymmetric unit buries 2710 Å² of solvent-accessible surface on dimerization. Monomer-monomer association occurs mainly through hydrophobic interactions, with a modest number of direct and water-mediated hydrogen-bonding contacts. Helix H1 of monomer A makes extensive contacts with the same helix in monomer B. In addition to the helix-helix contacts, two other significant interactions occur between the helices H3 and H7 of monomers A and B. The dimer interface has a predominantly negative calculated electrostatic potential [Fig. 2(d)]. Superposition of the two monomers (RMSD = 0.65 Å) reveals distinct side-chain conformations for His61, His56, Phe165, and Gln172.
T1512 forms a compact homotetramer in the asymmetric unit, burying 7078 Å² of solvent-accessible surface on tetramerization. The oligomeric structure can be described as a dimer of dimers with 222 symmetry. Each T1512 dimer closely resembles the T1485 dimer, with α-helices participating in significant monomer-monomer interactions. Residues corresponding to the missing region of T1485 (83-97) form a helix in T1512. It is remarkable that four copies of this helix form the inner core of the homotetramer. A significant number of buried water molecules were detected within the intersubunit space and may serve to stabilize the quaternary structure of this protein.
Superposition of the T1512 tetramer onto the T1485 tetramer generated by crystallographic symmetry reveals that one of the dimers superposes well, whereas the other is translated by ~6 Å from the corresponding dimer of T1512. Whereas the loop region comprising residues 69-78 between helix H3 and strand β2 folds toward the dimer interface in T1485, the same region in T1512 folds away from the dimer interface. It is not clear whether this conformational change is due to the absence of the helix H3′ in T1485. The second dimer may be translated to avoid steric clashes between this region and helix H4 of a symmetry mate. This difference in tetramer assembly may therefore be due to crystal packing.
Conclusion. The structure determinations and discussions presented in this report can further our understanding of this conserved protein family. The structural information, combined with results of biochemical studies, may well yield valuable insights into the functional determinants of this protein family. Identification of the putative active site can provide useful hypotheses with which to plan future functional studies. Atomic coordinates of the final models and experimental structure factors have been deposited with the PDB (T1485: 1X94; T1512: 1TK9).
Grant sponsor: National Institutes of Health; Grant number: GM62529 under DOE Prime Contract No. DEAC02-98CH10886 to Brookhaven National Laboratory.