In most organisms, efficient d-galactose utilization requires the highly conserved Leloir pathway that converts d-galactose to d-glucose 1-phosphate. However, in some bacterial and fungal species alternative routes of d-galactose assimilation have been identified. In the so-called De Ley–Doudoroff pathway, d-galactose is metabolized into pyruvate and d-glyceraldehyde 3-phosphate in five consecutive reactions carried out by specific enzymes. The penultimate step in this pathway involves the phosphorylation of 2-oxo-3-deoxygalactonate to 2-oxo-3-deoxygalactonate 6-phosphate catalyzed by 2-oxo-3-deoxygalactonate kinase, with ATP serving as a phosphoryl-group donor. Here, a crystal structure of 2-oxo-3-deoxygalactonate kinase from Klebsiella pneumoniae determined at 2.1 Å resolution is reported, the first structure of an enzyme from the De Ley–Doudoroff pathway. Structural comparison indicates that the enzyme belongs to the ASKHA (acetate and sugar kinases/hsc70/actin) family of phosphotransferases. The protein is composed of two α/β domains, each of which contains a core common to all family members. Additional elements introduced between conserved structural motifs define the unique features of 2-oxo-3-deoxygalactonate kinase and possibly determine the biological function of the protein.
d-Galactose is a monosaccharide most commonly occurring in milk in the form of the disaccharide lactose, in which it is linked to d-glucose. Additionally, it is found in many plant tissues, including fruits and vegetables, which contain both monomeric galactose and β-1,4-linked galactosyl residues that form galactolipids in chloroplast membranes and galactan in cell walls (Acosta & Gross, 1995; Gross & Acosta, 1991). Galactose, together with xylose, arabinose and mannose, is also a constituent of hemicellulose (Schädel et al., 2010). In most organisms, galactose utilization is governed by the well conserved Leloir pathway (Fig. 1), which starts with the phosphorylation of α-d-galactose to d-galactose 1-phosphate (Gal-1-P) in a reaction controlled by galactokinase (GALK; EC 2.7.1.6; Trucco & Caputto, 1948). Galactose-1-phosphate uridyltransferase (GALT; EC 2.7.7.12) then catalyzes sugar exchange between Gal-1-P and UDP-glucose (UDP-G), leading to the formation of UDP-galactose (UDP-Gal) and free glucose 1-phosphate (G-1-P) (Caputto & Leloir, 1949; Caputto et al., 1950). In a subsequent event, the UDP-Gal product is converted to UDP-glucose by UDP-galactose-4-epimerase (GALE; EC 5.1.3.2; Leloir, 1951). Cycling through GALT and GALE reactions results in a complete conversion of Gal-1-P to G-1-P. Phosphoglucomutase then converts G-1-P to d-glucose 6-phosphate, which enters the glycolytic pathway as an energy source.
Perturbations in the Leloir pathway lead to galactose-induced toxicity, which has been described in humans and yeast (de Jongh et al., 2008 ; Mehta et al., 1999 ). In humans, deficiency of GALK, GALT or GALE activity, which is usually caused by mutations in the corresponding genes, is diagnosed as one of three types of a disease known as galactosemia (Lai et al., 2009 ; Leslie, 2003 ; Fridovich-Keil, 2006 ). The clinical syndromes of the disorder result from accumulation of Leloir-pathway intermediates, especially Gal-1-P, or of compounds synthesized in secondary routes of galactose metabolism, which include galactitol and d-galactonate. The pathways producing the two latter substances involve aldose reductase catalyzing galactose-to-galactitol conversion (Yabe-Nishimura, 1998 ) and galactose dehydrogenase participating in d-galactonate synthesis (Cuatrecasas & Segal, 1966a ,b ). Since galactitol and d-galactonate are not further metabolized they are predominantly excreted in the urine (Jakobs et al., 1995 ; Wehrli et al., 1997 ), but some is deposited in tissues.
On the other hand, in prokaryotes or fungi these metabolites are not useless as they represent intermediate products of fully functional energy-yielding routes alternative to the Leloir pathway. For example, there are reports of galactitol metabolism through l-tagatose (Schneider et al., 1995) or l-sorbose (Fekete et al., 2004). Moreover, in enteric bacteria galactitol can be transported and phosphorylated via the galactitol-specific phosphoenolpyruvate-dependent transferase system (PTSGat) to galactitol 1-phosphate, which is then converted to d-tagatose 6-phosphate in a reaction catalyzed by galactitol-1-phosphate dehydrogenase (Nobelmann & Lengeler, 1996; Shakeri-Garakani et al., 2004). Further metabolism resembles the tagatose 6-phosphate pathway governed by the lac operon encoding the lactose-phosphotransferase system (PTSLac; Fig. 1; van Rooijen et al., 1991). PTSLac represents yet another way to metabolize lactose-derived galactose and is utilized mostly by lactic acid bacteria (de Vos et al., 1990; Zeng et al., 2010).
Galactonate, a second byproduct of defects in the Leloir-pathway enzymes, is an intermediate in the De Ley–Doudoroff (DD) pathway of galactose utilization (De Ley & Doudoroff, 1957), which resembles the Entner–Doudoroff (ED) pathway of glucose metabolism (Entner & Doudoroff, 1952) but involves reactions and enzymes entirely different from those engaged in the ED and Leloir pathways. In an initial step of the DD pathway d-galactose is converted by galactose dehydrogenase (GalDH; EC 1.1.1.48) to d-galactono-γ-lactone, which is subsequently hydrolyzed by lactonase (EC 3.1.1.25) to d-galactonate (Fig. 1). d-Galactonate then undergoes dehydration carried out by galactonate dehydratase (GalDH; EC 4.2.1.6), leading to the formation of 2-oxo-3-deoxygalactonate (KDGal). In the next step, this compound is phosphorylated by 2-oxo-3-deoxygalactonate kinase (KDGal kinase; EC 2.7.1.58) to 2-oxo-3-deoxygalactonate 6-phosphate (6-P-KDGal), which decomposes into pyruvate and d-glyceraldehyde 3-phosphate. Both products are utilized by the glycolytic pathway. To date, the De Ley–Doudoroff pathway has only been described for a few bacterial species, including Pseudomonas saccharophila (De Ley & Doudoroff, 1957), Gluconobacter liquefaciens (Stouthamer, 1961), Azotobacter vinelandii (Wong & Yao, 1994), Rhizobium meliloti (Arias & Cerveñansky, 1986) and Stenotrophomonas maltophilia (Brechtel et al., 2002). In addition, a nonphosphorylative variation of the DD pathway has been reported for the filamentous fungus Aspergillus niger (Elshafei & Abdel-Fatah, 2001). A shorter version of the pathway, which lacks the first two enzymes, has been discovered in Escherichia coli (Deacon & Cooper, 1977) and Mycobacterium butyricum (Szumiło, 1981). Detailed analysis of the E. coli route has shown that the genes encoding the key enzymes are organized into the dgo operon (Cooper, 1978; Babbitt et al., 1995).
Transcription from the dgo operon is induced exclusively by d-galactonate. Genome-sequencing projects have identified other organisms that possibly utilize the DD pathway, including several human pathogens such as Klebsiella pneumoniae.
As the most common metabolic pathway of galactose assimilation with human health implications, the Leloir pathway has been extensively studied. A significant fraction of research has been focused on the characterization of the enzymes catalyzing individual reactions, including the determination of their three-dimensional structures (Holden et al., 2003 ). Less attention has been attracted by alternative routes such as the De Ley–Doudoroff pathway and others. In particular, none of the proteins belonging to this metabolic network have been characterized structurally. In this study, we present the first crystal structure of an enzyme belonging to the DD pathway, namely 2-oxo-3-deoxygalactonate kinase from the human pathogen K. pneumoniae, at 2.1 Å resolution. KDGal kinase belongs to a large protein family with over 1100 members mainly found in bacteria and also in viruses.
The 2-oxo-3-deoxygalactonate kinase gene from K. pneumoniae subsp. pneumoniae MGH 78578 was amplified by PCR using genomic DNA as a template and the following primers: 5′-TACTTCCAATCCAATGCCATGACGGCTCGCTACATCGCAATA-3′ and 5′-TTATCCACTTCCAATGTTAGTTTGCCACTGCATAAGCGATGCT-3′. The PCR product was cloned into the pMCSG19 vector (Donnelly et al., 2006 ) according to the ligation-independent cloning procedure (Aslanidis & de Jong, 1990 ; Eschenfeldt et al., 2009 ). The expression vector pAPC38818 was transformed into E. coli strain BL21 (DE3) harboring the plasmid pRK1037 (Nallamsetty et al., 2004 ). This construct allows the expression of a fusion protein containing an N-terminal maltose-binding protein (MBP) followed by a tobacco vein mottling virus (TVMV) protease cleavage site, a His6 tag, a TEV protease cleavage site and a target protein. MBP cleavage takes place in vivo owing to constitutive expression of TVMV protease from the pRK1037 plasmid, generating the His6-tagged target protein.
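As a consistency check on the construct design, the forward primer can be translated in frame. The following is a minimal sketch rather than part of the original protocol: it assumes that the primer is read in frame from its first base and that TEV protease cleaves after the Gln of a YFQ/S recognition sequence (a general property of TEV protease, not something stated in this section). The function names are hypothetical.

```python
# Standard genetic code; codons are ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Forward primer exactly as given in the text (5' to 3').
FORWARD_PRIMER = "TACTTCCAATCCAATGCCATGACGGCTCGCTACATCGCAATA"


def translate(dna):
    """Translate a DNA string in frame 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def after_tev_cleavage(peptide, site="YFQ"):
    """Return the part of the peptide that follows the assumed TEV site."""
    idx = peptide.find(site)
    if idx < 0:
        return peptide  # no site found; nothing is removed
    return peptide[idx + len(site):]


encoded = translate(FORWARD_PRIMER)
mature_n_terminus = after_tev_cleavage(encoded)
```

Under these assumptions the primer encodes YFQSNAMTARYIAI, and the sequence remaining after cleavage is SNAMTARYIAI, that is, a three-residue Ser-Asn-Ala extension followed by the native Met1.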
To produce the protein, the bacterial culture was grown at 310 K and shaken at 200 rev min−1 in enriched M9 medium (Donnelly et al., 2006 ) until it reached an OD600 of 1. After cooling the culture to 277 K, selenomethionine (SeMet) and a mixture of amino acids inhibiting the metabolic pathway of methionine synthesis were added (Van Duyne et al., 1993 ; Walsh et al., 1999 ). After an additional 60 min, kinase expression was induced with 0.5 mM isopropyl β-d-1-thiogalactoside (IPTG). The cells were incubated overnight at 291 K, harvested and resuspended in lysis buffer [500 mM NaCl, 5%(v/v) glycerol, 50 mM HEPES pH 8.0, 20 mM imidazole, 10 mM β-mercaptoethanol]. The SeMet-labeled protein was purified as described previously (Kim et al., 2004 ). Specifically, the protocol included Ni2+-affinity chromatography on an ÄKTAxpress system (GE Healthcare Life Sciences) followed by His6-tag cleavage using recombinant His-tagged TEV protease and an additional step of Ni2+-affinity chromatography performed to remove the protease, the uncut protein and the affinity tag. The pure protein was concentrated using Centricon filtration (Millipore, Bedford, Massachusetts, USA) in 20 mM HEPES pH 8.0 buffer, 250 mM NaCl and 2 mM dithiothreitol (DTT). The final protein yield was ~11 mg per litre of culture.
Size-exclusion chromatography was performed on a GF Superdex-200 (10/300) analytical column using FPLC (GE Healthcare Life Sciences). The column was equilibrated with buffer consisting of 20 mM HEPES pH 8.0, 250 mM NaCl and 2 mM DTT. A 200 µl protein sample at 24 mg ml−1 was injected onto the column. After the run, the column was equilibrated with the same buffer supplemented with 15 mM ATP. A 200 µl protein sample at a concentration of 23 mg ml−1 and supplemented with 10 mM ATP was injected onto the column. The chromatography was carried out at room temperature at a flow rate of 0.3 ml min−1.
The SeMet-labeled 2-oxo-3-deoxygalactonate kinase was screened for crystallization conditions with the help of a Mosquito liquid dispenser (TTP LabTech, Cambridge, Massachusetts, USA) using the sitting-drop vapor-diffusion technique in 96-well CrystalQuick plates (Greiner Bio-One, Monroe, North Carolina, USA). For each condition, 0.4 µl protein solution (30 mg ml−1) and 0.4 µl crystallization formulation were mixed and the mixture was equilibrated against a 135 µl reservoir. Three crystallization screens were used: Index (Hampton Research, Aliso Viejo, California, USA), ANL-1 (Qiagen, Valencia, California, USA) and ANL-2 (Qiagen) at both 297 and 277 K. Diffraction-quality crystals appeared at 277 K in 0.1 M magnesium formate, 15%(w/v) PEG 3350, corresponding to condition H8 from the Index screen.
An X-ray diffraction data set extending to 2.1 Å resolution was collected on the Structural Biology Center beamline 19-BM at the Advanced Photon Source, Argonne National Laboratory, USA. Prior to data collection, the SeMet-labeled protein crystal was cryoprotected in mother liquor supplemented with 20% glycerol and flash-cooled in liquid nitrogen. A single-wavelength anomalous diffraction (SAD) data set was collected near the Se K absorption edge (0.97921 Å) at 100 K. The diffraction data were integrated and scaled with the HKL-3000 suite (Minor et al., 2006). The processing statistics are given in Table 1.
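The data-collection wavelength can be related to photon energy through E = hc/λ. The short sketch below illustrates the conversion; the constant and the function name are introduced here for illustration and are not taken from the original work.

```python
# h*c expressed in keV·Å, so that energy (keV) = HC_KEV_ANGSTROM / wavelength (Å).
HC_KEV_ANGSTROM = 12.398419843


def wavelength_to_energy_kev(wavelength_angstrom):
    """Convert an X-ray wavelength in Å to a photon energy in keV."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Wavelength quoted in the text for the SAD data set.
energy_kev = wavelength_to_energy_kev(0.97921)
```

For 0.97921 Å this gives approximately 12.66 keV, which is in the region of the Se K absorption edge.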
The structure was determined by the SAD method using selenium peak data and the HKL-3000 software suite (Minor et al., 2006 ), utilizing SHELXD for the heavy-atom search and SHELXE for initial phasing (Sheldrick, 2008 ). Next, heavy-atom positions were refined and phases improved by iterations of MLPHARE (Otwinowski, 1991 ) and DM (Cowtan, 1994 ) with NCS averaging included in calculations. The improved experimental maps were used to build an initial model with RESOLVE (Terwilliger, 2003 ). Further model building was performed manually in Coot (Emsley & Cowtan, 2004 ), while crystallographic maximum-likelihood and TLS refinement (six groups per protein monomer; Winn et al., 2001 ) were carried out using PHENIX (Adams et al., 2010 ) and REFMAC5 (Murshudov et al., 2011 ; Winn et al., 2011 ). The refinement statistics are shown in Table 1 .
2-Oxo-3-deoxygalactonate kinase is composed of 292 amino-acid residues with a molecular weight of 31.1 kDa. SeMet-labeled protein was expressed and purified to homogeneity using the Midwest Center for Structural Genomics pipeline. For efficient production, KDGal kinase was expressed as a fusion protein containing additional elements at the N-terminus, namely MBP, which was introduced to enhance solubility, and a His6 tag that facilitates purification by Ni2+-affinity chromatography. The MBP was removed in situ during protein expression in E. coli by a TVMV protease-catalyzed reaction; the His6 tag was removed after metal-affinity chromatography using TEV protease. The final protein used for crystallization bears a non-native N-terminal SNA sequence.
The structure of KDGal kinase was determined by the SAD method using a diffraction data set collected from a crystal of SeMet-labeled protein containing five selenium sites per monomer. The crystal belonged to the orthorhombic space group P212121 and contained four monomers (A, B, C and D) in the asymmetric unit. The final model fitted into electron-density maps comprises residues Ser(−2)–Asn292 for protein molecule A and residues Met1–Asn292 for molecules B, C and D. The N-terminal Ser(−2)-Asn(−1)-Ala0 residues are cloning artifacts. In chains B, C and D these fragments have no visible electron density and have not been modeled.
In addition to the polypeptide chains, the asymmetric unit contents include 540 water molecules, 17 glycerol molecules and four formate ions. The structure was refined to final R and Rfree factors of 0.174 and 0.199, respectively. The final model has a 0.016 Å r.m.s. deviation from ideal bond lengths. The quality of the stereochemistry of the structure was validated using the MolProbity server (Chen et al., 2010), which indicated the expected distribution of the main-chain torsion angles on the Ramachandran plot.
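For reference, the conventional definition of the crystallographic R factor, which is not spelled out in the text, is sketched below. Here F_obs and F_calc denote the observed and calculated structure-factor amplitudes, and the scale factor between them is omitted for simplicity. The free R factor is the same expression evaluated only over the test-set reflections that were excluded from refinement.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```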
The structure of KDGal kinase is arranged into N- and C-terminal α/β domains separated by a deep cleft (Fig. 2). The core of the N-terminal domain (residues 1–82, 85–131 and 275–292) is established by a mixed β-sheet (E1) containing six strands, S3, S2, S1, S4, S8 and S7, where the S1, S4 and S8 strands are parallel. The β-sheet is sandwiched by two layers of helices. The solvent-exposed layer is formed by α-helix H1 and a short fragment adopting a 3₁₀-helix conformation (G2), while the layer neighboring a second domain is built from α-helices H3 and H4 and the C-terminal α-helix H11. Additionally, the N-terminal domain possesses a second antiparallel β-sheet (E2) created by strands S5, S9 and S10 that flanks one opening of the interdomain groove. The C-terminal domain (residues 83–84 and 132–274) is composed of a mixed β-sheet (E3) localized at the domain interface and a cluster of helices (H5, 3₁₀-helix G6, H7, H8, H9 and H10) forming an external part of the domain. The E3 sheet includes strands S6, S13, S12, S11, S14 and S15, the last three of which interact with each other in a parallel manner. Within the helical cluster a central role is played by the H9 helix, around which all other helices are regularly distributed. The secondary-structure elements are connected by 25 loops (L1–L25), with six of them (L1, L2, L10, L12, L15 and L16) representing β-hairpins between consecutive strands of the β-sheet. By analogy to similar proteins, a large groove located at the interface between the two domains encompasses a catalytic center composed of nucleotide-binding and substrate-binding sites (see below).
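Because both domains are built from discontinuous segments of the chain, it can be convenient to look up the domain of a given residue programmatically. The sketch below encodes only the residue ranges quoted above; the dictionary and function names are hypothetical.

```python
# Residue ranges (inclusive) for each domain, as quoted in the text.
DOMAIN_RANGES = {
    "N-terminal": [(1, 82), (85, 131), (275, 292)],
    "C-terminal": [(83, 84), (132, 274)],
}


def domain_of(residue):
    """Return the domain name for a residue number, or None if outside 1-292."""
    for name, ranges in DOMAIN_RANGES.items():
        for start, end in ranges:
            if start <= residue <= end:
                return name
    return None
```

With these ranges, Asp9, Glu119 and Asp276 are assigned to the N-terminal domain, whereas Ser226 and the two-residue crossover at positions 83–84 fall in the C-terminal domain.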
Comparison of the individual KDGal kinase molecules (A, B, C and D) does not show any significant differences between the monomers. Not surprisingly, the fragments that exhibit some flexibility belong to coil regions, with the most pronounced conformational variations localized to the N-termini and the L3 and L4 loops. Pairwise superpositions of Cα chains reveal the largest deviations to be between molecules A and B (r.m.s.d. of 0.56 Å), while the best agreement is observed for molecules C and D (r.m.s.d. of 0.36 Å).
The overall architecture of KDGal kinase classifies it as a member of the ASKHA (Acetate and Sugar Kinases/Hsc70/Actin) superfamily (Buss et al., 2001 ), which includes acetate (Buss et al., 2001 ) and sugar kinases (Anderson et al., 1978 ), heat-shock cognate 70 (hsc70; Flaherty et al., 1990 ) and actin (Kabsch et al., 1990 ). Members of this family either perform simple ATP hydrolysis (hsc70, actin) or catalyze the transfer of a phosphoryl group from ATP to their cognate substrates (kinases). Both activities depend on the presence of specific metal cations. Despite unambiguous fold similarity between KDGal kinase and other proteins belonging to this family, a DALI-based search for homologous structures (Holm & Rosenstrom, 2010 ) did not reveal any strong matches. The identified structural homologs show an amino-acid sequence identity of 12% or less. The closest relatives are butyrate kinase (PDB entry 1x9j; Z score 13.4, sequence identity 12%; Diao et al., 2009 ), type III pantothenate kinase (PDB entry 3djc; Z score 13.5, sequence identity 11%; Y. Patskovsky, J. B. Bonanno, R. Romero, M. Dickey, C. Logan, M. Maletic, S. Wasserman, J. Koss, M. J. Sauder, S. K. Burley & S. C. Almo, unpublished work) and l-rhamnulose kinase (PDB entry 2cgk; Z score 13.0, sequence identity 11%; Grueninger & Schulz, 2006 ). Additional homologs with Z scores above 12 include propionate, acetate, glycerol, gluconate and xylulose kinases. Such low sequence similarity suggests that KDGal kinase is a relatively distant cousin of other ASKHA representatives and bears some unique structural motifs. Indeed, detailed comparison reveals significant differences.
Typically, both domains of the ASKHA proteins contain a common core with the topology β1β2β3α1β4α2β5α3, while a distinctive feature of each family member is a specific pattern of insertions localized between the conserved elements. In the KDGal kinase structure the N-terminal fragment slightly breaks this canon because its ASKHA core is not fully preserved (Fig. 2 ). Specifically, there is no equivalent of helix α2. Instead, between the β4 and β5 elements (S4–S8 in KDGal kinase description) is an insertion contributing helices G2 and H3, strands S7 and S6 that extend sheets E1 and E3, and strand S5 from the E2 sheet. A second insertion is localized between the β5 and α3 elements (S8–H4) and provides major strands to the E2 sheet (S9 and S10). Therefore, the topology of the ASKHA core of the KDGal kinase N-terminal domain is β1β2β3α1β4β5α3. The C-terminal domain, on the other hand, is composed of the standard ASKHA motif with an insertion between the β3 and α1 elements (S13–H9) consisting of four extra helices (H5, G6, H7 and H8).
Another common feature of the ASKHA proteins is that the phosphoryl-group transfer is coupled to conformational rearrangement leading to domain closure (Hurley, 1996 ; Peters & Neet, 1978 ; Hoggett & Kellett, 1976 ; Miki & Kouyama, 1994 ). The transformation from the open to the closed state occurs through rotation around the hinge region established by the crossover α3 helices (Diao et al., 2009 ; Grueninger & Schulz, 2006 ; Nishimasu et al., 2007 ; Anderson et al., 1978 ; Bennett & Steitz, 1980 ). Crystal structures of different kinases complexed with substrates, nucleotides, metal ions or a combination of all ligands have demonstrated that it is substrate binding that triggers protein motion (Nishimasu et al., 2007 ; Buss et al., 2001 ; Diao et al., 2009 ). Simultaneous binding of nucleotide, although not essential for domain closure, may stimulate it further. On the other hand, trapping ATP alone does not induce any conformational change in these enzymes (Nishimasu et al., 2007 ; Simanshu et al., 2005 ; Diao & Hasson, 2009 ). We note, however, that for those ASKHA representatives which bind only ATP and Mg2+ ions to perform their functions, such as actin and hsc70, nucleotide alone is able to induce closure of the interdomain cleft (Chik et al., 1996 ; Wilbanks et al., 1995 ). Since in the current KDGal kinase structure neither nucleotide nor substrate is bound, it is very likely that it represents an open conformation. This hypothesis is further supported by comparisons with the closest structural relatives. For example, superposition of KDGal kinase onto the structure of ligand-free l-rhamnulose kinase (PDB code 2cgk), illustrating an open state of the protein (Grueninger & Schulz, 2006 ), shows better agreement than superposition with the fructose-bound closed form of l-rhamnulose kinase (PDB code 2cgj), especially in the regions involved in ligand binding (data not shown).
Analysis of several structures and sequences of ASKHA proteins identified five sets of conserved residues that establish signature motifs of the superfamily (Bork et al., 1992). They are denoted as PHOSPHATE1 (P1), PHOSPHATE2 (P2), ADENOSINE (A), CONNECT1 and CONNECT2. The first three motifs are involved in nucleotide binding, whereas the two latter segments define an interdomain hinge. The P1 and P2 regions correspond to the β1–β2 hairpins from the N- and C-terminal domains, respectively. The most conserved fragment of the P1 motif bears either a DxG or ExG sequence (where x is any residue). The P2 fragment is more variable. For instance, actin, hsc70 and pantothenate kinase have DxG sequences and sugar kinases have a conserved GT block, whereas butyrate kinases have a triplet of glycine residues. Sequence alignment of KDGal kinases reveals that their P1 motif includes a DWG sequence (residues 9–11, loop L1), while P2 possesses a sugar kinase-like GT signature (residues 139–140, loop L15) (Figs. 3 and 4). By analogy to other ASKHA members, Asp9 from the P1 fingerprint (S1 strand) is likely to coordinate a metal cation bound to ATP (see below), while an invariant Arg16 from the same motif (S2 strand) may interact with a phosphoryl group in a manner similar to that observed in glycerol kinase (Hurley et al., 1993) and l-rhamnulose kinase (Grueninger & Schulz, 2006).
The ADENOSINE motif is localized in the β4–α2 region of the C-terminal domain and the most important fragment of this segment contains the sequence -hhhxGfxh- [where h is purely hydrophobic (VLIFWY), f is partly hydrophobic (VLIFWYMCGATKHR) and x is any residue; Bork et al., 1992], with the -Gf- block usually consisting of two consecutive glycine residues. In several ASKHA-family representatives the loop linking elements β4 and α2 is ~7 residues long and part of it folds into an α-helix (Buss et al., 2001; Diao & Hasson, 2009; Simanshu et al., 2005) or 3₁₀-helix (Nishimasu et al., 2007; Grueninger & Schulz, 2006). The equivalent region of KDGal kinase, namely the L23 loop (Gly252–Ser253), is not only much shorter (two residues) but also does not have the consensus sequence (Fig. 3).
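The consensus patterns described for the P1 and ADENOSINE motifs translate directly into regular expressions. The sketch below is an illustration only and is not the procedure used by Bork et al. (1992) or in this work. The residue classes follow the definitions given above, the function name is hypothetical, and the second test peptide is invented to demonstrate a match. The first test fragment, MTARYIAIDWG, is inferred from the forward primer (residues 1–8) together with the DWG sequence reported at residues 9–11.

```python
import re

# Residue classes as defined in the text (after Bork et al., 1992).
H = "[VLIFWY]"          # purely hydrophobic
F = "[VLIFWYMCGATKHR]"  # partly hydrophobic
X = "."                 # any residue

P1_PATTERN = re.compile("[DE]" + X + "G")                     # DxG or ExG
ADENOSINE_PATTERN = re.compile(H + H + H + X + "G" + F + X + H)  # hhhxGfxh


def find_motif(sequence, pattern):
    """Return (1-based start position, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]


# N-terminal fragment of KDGal kinase inferred from the primer and the DWG motif.
p1_hits = find_motif("MTARYIAIDWG", P1_PATTERN)

# Invented peptide carrying a canonical hhhxGfxh segment (hypothetical sequence).
a_hits = find_motif("KDVVLSGGVLNE", ADENOSINE_PATTERN)
```

On the N-terminal fragment the P1 pattern matches DWG starting at residue 9, in agreement with the assignment above. Such short patterns are permissive, so any hit in a full-length sequence would still need to be checked against the structure.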
The CONNECT1 and CONNECT2 motifs correspond to the α3 helices from the N- and C-terminal domains, respectively, and fragments directly preceding these structural elements. The CONNECT1 fragment crosses from the N-terminal domain to the C-terminal domain, while the CONNECT2 motif crosses in the opposite direction, resulting in a close helix–helix contact. Upon substrate binding, these helices move with respect to each other and to the core β-sheets, leading to significant conformational changes that are required for appropriate positioning of reactants within the catalytic apparatus. Typically, the CONNECT motifs contain conserved alanine and glycine residues, which are localized at the helix–helix interface. These residues also have equivalents in KDGal kinase, where CONNECT1 bears an invariant Gly125 residue while CONNECT2 contains Ala278 and Gly282 (Fig. 3 ). In addition, it has been shown that the N-terminus of helix α3 from the CONNECT1 fragment contributes a residue (Asp, Glu or Gln) that functions as a general base during catalysis or is involved in metal binding. The corresponding residue in KDGal kinase is Glu119 (Figs. 3 and 4 ).
In the crystal lattice, the KDGal kinase molecules assemble into two types of noncrystallographic dimer (Figs. 5 a and 5 b). In the more compact dimer, molecule A interacts with molecule B. The same module is formed by molecule C and molecule D. These associations bury about 15% of solvent-accessible surface per monomer, which is 1719 Å2 for the A–B pair and 1759 Å2 for the C–D pair, as calculated by the PISA server (Krissinel & Henrick, 2007 ). The intermolecular interface is established by external fragments of the C-terminal domains and sheet E2 from the N-terminal domains. Specifically, the H8 helix from molecule A (or C) contacts strand S13 and sheet E2 from molecule B (or D) and vice versa. Additionally, the molecules interact via their H5 helices. Based on the interactions, these dimeric assemblies are predicted to be stable in solution. Analysis of the current structure indicates that residues from both protein molecules participate in the formation of the putative KDGal binding site (see below). Previously, the implication of the dimer interface in substrate recognition has been reported for pantothenate kinase (Fig. 5 a; Yang et al., 2008 ).
In the crystal structure, there is an alternative pairing scheme represented by molecules A and C (Fig. 5 b). For this dimer the contact area is smaller (~1400 Å2 per monomer) and the interface engages those parts of both domains that face an interdomain groove. As a result, the tip of the N-terminal domain of each monomer is trapped in a cleft belonging to the second monomer. In this pairing scheme a putative binding site for an ATP molecule located between the H5 and H10 helices is blocked (see below), locking the enzyme in an inactive state. Although the contact area as well as the network of interactions for the A–C dimer are comparable to those for A–B and C–D dimers, the limited number of hydrophobic interactions at the A–C interface suggests that this pairing may represent a crystal-packing artifact.
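The interface areas quoted for the two pairing schemes can be put on a common scale with simple arithmetic. The sketch below assumes that the 'about 15%' figure applies exactly to the 1719 Å2 A–B interface, which is only approximately true, and uses hypothetical function names.

```python
def implied_total_area(buried_area, buried_fraction):
    """Total solvent-accessible area implied by a buried area and its fraction."""
    return buried_area / buried_fraction


def buried_fraction(buried_area, total_area):
    """Fraction of the solvent-accessible surface that is buried."""
    return buried_area / total_area


# A-B interface: 1719 Å2 per monomer, reported as about 15% of the surface.
monomer_area = implied_total_area(1719.0, 0.15)

# A-C interface: roughly 1400 Å2 per monomer, expressed on the same scale.
ac_fraction = buried_fraction(1400.0, monomer_area)
```

Under this assumption the monomer surface is on the order of 11 500 Å2, and the A–C interface would bury roughly 12% of it, compared with about 15% for the A–B and C–D dimers.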
The combination of the two types of dimers generates a tetrameric assembly which can be described as a dimer of dimers, AB + CD (Fig. 5 c). Size-exclusion chromatography has demonstrated that the homologous protein from Mycobacterium sp. indeed exists as a tetramer in solution (Szumiło, 1983 ). However, size-exclusion chromatography of the Klebsiella KDGal kinase performed in the presence and absence of ATP shows only dimers. This suggests that the minimal biological unit is a dimer with an A–B-type interface.
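To interpret a size-exclusion result in terms of oligomeric state, the apparent mass can be compared with multiples of the 31.1 kDa monomer mass given above. The following sketch illustrates the comparison; the apparent masses used in the example are hypothetical, since none are reported in the text, and apparent masses from gel filtration also depend on molecular shape.

```python
MONOMER_KDA = 31.1  # calculated monomer mass quoted in the text


def expected_mass(n_subunits):
    """Expected mass in kDa of an oligomer with n identical subunits."""
    return n_subunits * MONOMER_KDA


def nearest_oligomer(apparent_mass_kda, max_subunits=4):
    """Return the subunit count whose expected mass is closest to the input."""
    return min(
        range(1, max_subunits + 1),
        key=lambda n: abs(expected_mass(n) - apparent_mass_kda),
    )


# Hypothetical apparent masses, for illustration only.
state_a = nearest_oligomer(60.0)   # closest to a dimer (62.2 kDa)
state_b = nearest_oligomer(120.0)  # closest to a tetramer (124.4 kDa)
```

A dimer and a tetramer are expected at about 62 and 124 kDa, respectively, so the two states should be readily distinguishable on an analytical Superdex-200 column.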
In ASKHA phosphotransferases the nucleotide molecule binds in a subcavity of the C-terminal domain, with the adenine ring sandwiched in a hydrophobic pocket created by the ADENOSINE motif and one of the helices from the β3–α1 insertion (Fig. 6 a). The ring is primarily anchored by hydrophobic interactions and only in rare cases does it form water-mediated hydrogen bonds with a protein molecule, as is observed, for example, in the structure of l-rhamnulose kinase (Grueninger & Schulz, 2006 ). The adenine amino group is typically exposed to solvent but occasionally forms hydrogen bonds with a protein molecule, as illustrated by the structure of hexokinase (Nishimasu et al., 2007 ). The ribose moiety usually interacts with the insertion elements from the C-terminal domain. These fragments of ASKHA proteins differ between family members; as a consequence, there is no universal pattern of sugar binding. The most conserved contacts between nucleotide and protein molecules involve phosphoryl groups, which typically bind to main-chain amide groups from the P2 loop (Buss et al., 2001 ; Diao & Hasson, 2009 ; Simanshu et al., 2005 ). Additional interactions with side chains have also been observed (Nishimasu et al., 2007 ). After domain closure, the P1 motif also stabilizes a nucleotide molecule within the active site. Interactions between the P1 element and phosphoryl groups occur either through direct hydrogen bonds, for example in l-rhamnulose (Grueninger & Schulz, 2006 ) and glycerol kinases (Hurley et al., 1993 ), or are mediated by a metal cation which is coordinated by a conserved carboxylate residue from the β1 strand, as has been observed in the structures of hexokinase (Nishimasu et al., 2007 ) and hsc70 (Flaherty et al., 1994 ).
Although most of the above elements are conserved in the KDGal kinase structure (see above), one significant difference is noticeable. Specifically, the protein has no well defined cavity for adenine-ring binding. First of all, as mentioned above, the region corresponding to the ADENOSINE motif does not have the usual length and amino-acid composition. In consequence, it does not superpose well with the equivalent fragments from related structures (Fig. 6 ). In addition to this, KDGal kinase does not contain an insertion helix that builds a second face of the groove. This part of the enzyme adopts a loop conformation (L19) and is localized far from the ADENOSINE motif. As a result, the putative adenine-binding site has no clear pocket definition and remains wide open. This feature is utilized by the A–C dimer, in which the putative nucleotide-binding site of one monomer is occupied by a fragment of the complementary molecule.
The lack of capping helix resembles the structure of type III pantothenate kinase (Fig. 6 a; Yang et al., 2008 ). However, in this case the insertion loop remains in close proximity to the ADENOSINE fragment, leaving the cavity much shallower than that observed in KDGal kinase. In consequence, the base moiety is not buried in a dedicated subcavity but is positioned above the P2 hairpin and oriented parallel to a helix that directly succeeds strand β1. The only interactions between the adenine ring and the protein molecule involve stacking with a phenylalanine side chain from the helix. It is likely that in KDGal kinase the nucleotide also binds in an unusual way, but the discrepancy in the organization of putative ATP-binding sites between the current enzyme and related proteins does not permit reliable prediction of protein–nucleotide interactions.
In the absence of natural ligand, the nucleotide-binding sites of molecules B and D of the KDGal kinase structure are filled with glycerol molecules soaked in during cryoprotection and solvent molecules (Fig. 7). Molecule B binds four glycerol moieties denoted as GOL1B, GOL2B, GOL3B and GOL4B, whereas in molecule D five ligand molecules could be identified: GOL1D, GOL2D, GOL3D, GOL4D and GOL5D. In both protein monomers GOL1 sits in a small niche near the H5 and H9 helices and is hydrogen bonded to the hydroxyl group of the conserved Ser226 residue. The GOL2 molecules are loosely packed above GOL1, near the L19 loops. The L15 loop (PHOSPHATE2) interacts with the GOL3 moieties. Specifically, its two main-chain amide groups form hydrogen bonds to glycerol –OH groups. The GOL4 molecules are anchored by a direct hydrogen bond to Asp276 and additional water-mediated interactions. GOL5D is bound to the side chain of Asp9.
The comparison of KDGal kinase with similar structures suggests the location of the substrate-binding pocket. The putative 2-oxo-3-deoxygalactonate-binding site is localized near the dimer interface (A–B or C–D pairing) in the vicinity of the E2 sheet, which represents one of the insertions of the N-terminal domain (Fig. 4). The main residues surrounding the pocket belong to the S10 strand, the H4 helix and the loop between them (L13). The side chains of Val115 and Met116 contributed by S10 as well as Arg117 from L13 point outside the cavity; thus, only their main-chain atoms line the pocket and possibly interact with a substrate molecule. Additional residues that are likely to participate in KDGal binding include the conserved Tyr78 from loop L6, His141 and Lys143 from L15 (PHOSPHATE2) as well as Glu119 from the H4 helix (CONNECT1). The latter amino acid may play the role of general base during catalysis and could be involved in Mg2+ chelation. The second protein monomer provides further residues, such as the invariant Phe205 and Arg208, that potentially anchor a KDGal molecule.
In the present study, we have determined a 2.1 Å resolution crystal structure of 2-oxo-3-deoxygalactonate kinase, which represents one of the enzymes utilized in the De Ley–Doudoroff pathway of galactose metabolism. Our analysis demonstrates that the protein belongs to the functionally diverse ASKHA superfamily of phosphotransferases. As in the other members of the family, the kinase is composed of two domains separated by a deep groove in which the catalytic reaction takes place. In the crystal lattice, protein molecules associate into a dimer of dimers, but a size-exclusion chromatography experiment indicated the presence of only dimeric species.
The putative substrate-binding site is created mostly by the N-terminal domain, especially by those fragments that are unique to KDGal kinase. On the other hand, the pocket in which the nucleotide docks is formed by the C-terminal domain, in which a loop that is found in all family members participates in extensive interactions with phosphoryl groups. Despite a clear similarity between KDGal kinase and other ASKHA proteins, in particular pantothenate kinase, it is obvious that the ATP-binding site from the current enzyme does not resemble any known architecture. Thus, it is difficult to predict how the nucleotide moiety would bind. To answer this question, a structure of a kinase–ATP complex will need to be determined.
We thank the members of the Structural Biology Center and the Midwest Center for Structural Genomics at Argonne National Laboratory for their help in conducting these experiments. This work was supported by National Institutes of Health grants GM074942 and GM094585, by the US Department of Energy, Office of Biological and Environmental Research under contract DE-AC02-06CH11357 (AJ) and by a scholarship from the Foundation for Polish Science (KM). The submitted manuscript has been created by UChicago Argonne LLC, Operator of Argonne National Laboratory (‘Argonne’). Argonne, a US Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357.