|Home | About | Journals | Submit | Contact Us | Français|
KPN03535 (gi|152972051) is a putative lipoprotein of unknown function that is secreted by Klebsiella pneumoniae MGH 78578. The crystal structure reveals that despite a lack of any detectable sequence similarity to known structures, it is a novel variant of the OB-fold and structurally similar to the bacterial Cpx-pathway protein NlpE, single-stranded DNA-binding (SSB) proteins and toxins. K. pneumoniae MGH 78578 forms part of the normal human skin, mouth and gut flora and is an opportunistic pathogen that is linked to about 8% of all hospital-acquired infections in the USA. This structure provides the foundation for further investigations into this divergent member of the OB-fold family.
KPN03535 (gi|152972051) is an orphan protein that is exclusively found in Klebsiella pneumoniae MGH 78578 (an opportunistic human pathogen belonging to enterbacteriales of gammaproteobacteria; Galperin et al., 2007 ; Gill et al., 2006 ; Frank & Pace, 2008 ; Ley et al., 2008 ) and K. pneumoniae 342 (three-residue substitution). It consists of 132 residues with a calculated pI of 9.40 and a predicted signal peptide. The N-terminus of KPN03535 has a lipoprotein signature, indicated by the presence of an LSGC motif (von Heijne, 1989 ), as well as predictions from LipoP 1.0 (Juncker et al., 2003 ). It is a singleton protein that has not been assigned to any Pfam family, but sequence-based fold-prediction methods (Ginalski et al., 2003 ) suggest similarity to members of the PF01336 family (OB-fold nucleic acid-binding domain). We determined the crystal structure of KPN03535 in order to explore this extremely divergent member of the commonly occurring OB-fold. Structural comparisons show similarities to the OB-fold-containing Cpx-pathway protein NlpE, single-stranded DNA-binding (SSB) proteins, bacterial OB-fold (BOF) and toxin proteins, which enable inferences about function that may now be tested biochemically. This structure should serve as a basis for understanding structure–function relationships in any newly discovered proteins with a similar sequence, such as those identified by whole microbial genome sequencing and metagenomic surveys of the human microbiome.
Clones were generated using the Polymerase Incomplete Primer Extension (PIPE; Klock et al., 2008 ) cloning method. The gene encoding KPN03535 (gi|152972051; Swiss-Prot A6TEE6) was amplified by polymerase chain reaction (PCR) from K. pneumoniae MGH 78578 genomic DNA using PfuTurbo DNA polymerase (Stratagene) and I-PIPE (Insert) primers (forward primer, 5′-ctgtacttccagggcGCTTCTAAAGCCTTTTATTCCGCGGGAG-3′; reverse primer, 5′-aattaagtcgcgttaTTTAACCACCTTGGGATTCTGTAGCGTC-3′; target sequence in upper case) that included sequences for the predicted 5′- and 3′-ends. The expression vector, pSpeedET, which encodes an amino-terminal tobacco etch virus (TEV) protease-cleavable expression and purification tag (MGSDKIHHHHHHENLYFQG), was PCR-amplified with V-PIPE (Vector) primers (forward primer, 5′-taacgcgacttaattaactcgtttaaacggtctccagc-3′; reverse primer, 5′-gccctggaagtacaggttttcgtgatgatgatgatgatg-3′). V-PIPE and I-PIPE PCR products were mixed to anneal the amplified DNA fragments together. Escherichia coli GeneHogs (Invitrogen) competent cells were transformed with the V-PIPE/I-PIPE mixture and dispensed on selective LB-agar plates. The cloning junctions were confirmed by DNA sequencing. Using the PIPE method, the gene segment encoding residues Met1–Leu22 was deleted for expression of soluble protein as these residues were initially predicted to correspond to either a signal peptide using SignalP (Bendtsen et al., 2004 ) or transmembrane helices using TMHMM-2.0 (Krogh et al., 2001 ). Expression was performed in selenomethionine-containing medium. At the end of fermentation, lysozyme was added to the culture to a final concentration of 250 µg ml−1 and the cells were harvested and frozen. After one freeze–thaw cycle, the cells were homogenized in lysis buffer [50 mM HEPES pH 8.0, 50 mM NaCl, 10 mM imidazole, 1 mM tris(2-carboxyethyl)phosphine–HCl (TCEP)] and the lysate was clarified by centrifugation at 32 500g for 30 min. The soluble fraction was passed over nickel-chelating resin (GE Healthcare) pre-equilibrated with lysis buffer, the resin was washed with wash buffer [50 mM HEPES pH 8.0, 300 mM NaCl, 40 mM imidazole, 10%(v/v) glycerol, 1 mM TCEP] and the protein was eluted with elution buffer [20 mM HEPES pH 8.0, 300 mM imidazole, 10%(v/v) glycerol, 1 mM TCEP]. The eluate was buffer-exchanged with TEV buffer (20 mM HEPES pH 8.0, 200 mM NaCl, 40 mM imidazole, 1 mM TCEP) using a PD-10 column (GE Healthcare) and incubated with 1 mg TEV protease per 15 mg of eluted protein. The protease-treated eluate was passed over nickel-chelating resin (GE Healthcare) pre-equilibrated with HEPES crystallization buffer (20 mM HEPES pH 8.0, 200 mM NaCl, 40 mM imidazole, 1 mM TCEP) and the resin was washed with the same buffer. The flowthrough and wash fractions were combined and concentrated for crystallization trials to 16.1 mg ml−1 by centrifugal ultrafiltration (Millipore). KPN03535 was crystallized by mixing 100 nl protein solution with 100 nl crystallization solution in a sitting drop over a 50 µl reservoir volume using the nanodroplet vapor-diffusion method (Santarsiero et al., 2002 ) with standard Joint Center for Structural Genomics (JCSG; http://www.jcsg.org) crystallization protocols (Lesley et al., 2002 ). The crystallization reagent contained 31% polyethylene glycol 600 and 0.1 M CHES pH 9.5. No further cryoprotectant was added to the crystal. A cube-shaped crystal with approximate dimensions 80 × 80 × 80 µm was harvested after 42 d at 293 K for data collection. Initial screening for diffraction was carried out using the Stanford Automated Mounting system (SAM; Cohen et al., 2002 ) at the Stanford Synchrotron Radiation Lightsource (SSRL; Menlo Park, California, USA). The diffraction data were indexed in the orthorhombic space group P212121. The molecular weight and oligomeric state were determined using a 1 × 30 cm Superdex 200 column (GE Healthcare) in combination with static light scattering (Wyatt Technology). The mobile phase consisted of 20 mM Tris pH 8.0, 150 mM NaCl and 0.02%(w/v) sodium azide.
Single-wavelength anomalous diffraction (SAD) data were collected to 2.46 Å resolution on beamline 9-2 at SSRL at the wavelength corresponding to the peak (λ1) of a selenium absorption edge using the Blu-Ice data-collection environment (McPhillips et al., 2002 ). A data set was collected at 100 K using a MAR Mosaic 325 CCD detector (Rayonix USA). The SAD data were integrated and reduced using MOSFLM (Leslie, 1992 ) and scaled with the program SCALA (Collaborative Computational Project, Number 4, 1994 ). Phasing was performed with SHELXD (Sheldrick, 2008 ) and autoSHARP (Vonrhein et al., 2007 ) [20 selenium sites per asymmetric unit, overall FOM (acentric/centric) = 0.34/0.12, overall phasing power (anomalous differences) = 1.2] and automated iterative model building was performed with RESOLVE (Terwilliger, 2003 ). Model completion and crystallographic refinement were performed with Coot (Emsley & Cowtan, 2004 ) and REFMAC5 (Collaborative Computational Project, Number 4, 1994 ) with TLS (one group per monomer) refinement (Winn et al., 2003 ) and medium NCS restraints for all chains. Data and refinement statistics are summarized in Table 1 .
The quality of the crystal structure was analyzed using the JCSG Quality Control server, which verifies the stereochemical quality of the model using AutoDepInputTool (Yang et al., 2004 ), MolProbity (Davis et al., 2004 ) and WHATIF 5.0 (Vriend, 1990 ), the agreement between the atomic model and the data using SFCHECK 4.0 (Vaguine et al., 1999 ) and RESOLVE (Terwilliger, 2000 ), the protein sequence using ClustalW (Thompson et al., 1994 ), atom occupancies using MOLEMAN2 (Kleywegt, 2000 ) and the consistency of NCS pairs. This analysis also evaluates difference in R cryst/R free, expected R free/R cryst and maximum/minimum B values by parsing the refinement log-file and PDB header. Protein quaternary structure analysis was performed using the PISA server (Krissinel & Henrick, 2005 ). Fig. 1 (b) was adapted from an analysis using PDBsum (Laskowski et al., 2005 ) and all other figures were prepared with PyMOL (DeLano, 2002 ). Atomic coordinates and experimental structure factors for KPN03535 have been deposited in the PDB under accession code 3f1z. A systematic search for other proteins of similar structure was conducted using several different methods including the DALI server (Holm et al., 2008 ), the protein structure comparison service SSM at the European Bioinformatics Institute (http://www.ebi.ac.uk/msd-srv/ssm; Krissinel & Henrick, 2004 ) and the flexible structure-alignment method FATCAT (Ye & Godzik, 2003 ).
Residues 1–22 of the full-length protein (1–154) were initially predicted to represent a signal peptide and were removed during cloning. The crystallized protein is comprised of a glycine left after cleavage of the expression and purification tag followed by KPN03535 residues 23–154. The final model contains ten monomers (chains A–J), two PEG molecules (PEG 600 fragments from the crystallization solution) and 323 water molecules in the asymmetric unit. The ten monomers are almost identical in structure and completeness and superimpose extremely well, with pairwise r.m.s.d. values ranging from only 0.2 to 0.4 Å. Residues 23–35 in chains A B, C, E and J, 23–36 and 154 in chain D, 23–35 and 154 in chain F, 23–36 in chains G and H, and 23–38 in chain I and the N-terminal glycine in all chains are disordered and have not been modeled. The Matthews coefficient (Matthews, 1968 ) is ~3.2 Å3 Da−1, with an estimated solvent content of ~62%. The Ramachandran plot produced by MolProbity (Davis et al., 2004 ) shows that 98.5% and 100% of amino acids are in the favored and allowed regions, respectively.
Residues 70–154 of the monomer form the OB-fold comprised of a five-stranded β-sheet (β1, β2, β3, β4 and β5) capped by a short α-helix (α) based on the standard OB-fold nomenclature (Murzin, 1993 ; Fig. 1 a). The capping helix is shorter than those observed in most other OB-fold proteins (Fig. 2 ). Residues 36–69 constitute three additional α-helices (α−2, α−1 and α0) which are not observed in other structures of the same fold. The curved β-sheet forming the β-barrel core of the OB-fold is highly conserved in size and structure, while the largest variations are seen in the three loops (L12, L23 and L45) that extend in different directions from the core and are often functionally important.
Crystal-packing and assembly analysis using PISA (Krissinel & Henrick, 2005 ) supported by analytical size-exclusion chromatography and static light scattering suggest that a monomer is the likely oligomeric state. In the crystal structure, the protein assembles as two stacked pentameric rings, formed by loose interdigitation of the ‘finger-like’ β1–L12–β2 structure, with outer and inner diameters of ~80 Å and ~40 Å, respectively, and a thickness of ~40 Å. The buried surface area of each monomer within each pentamer (~540 Å2) and each monomer in the interface between the two pentamers (~600 Å2) is low. The quaternary structure analysis does not suggest sufficiently strong and extensive interactions to enable complex formation in solution, suggesting that these pentamers could be a crystallization artifact. The N-terminus of each monomer extends into the solvent and probably does not have an impact on the oligomerization state. In the absence of any biochemical data, the functional oligomeric state of the protein remains unknown.
The only other reported bacterial lipoprotein containing an OB-fold is the C-terminal domain of E. coli NlpE (new lipoprotein E), which is an outer membrane lipoprotein in Gram-negative bacteria involved in the envelope stress response in the Cpx pathway. It activates the Cpx, two-component, signal transduction pathway composed of the inner membrane histidine kinase CpxA and the cytoplasmic response regulator CpxR (Raivio & Silhavy, 1997 ). The Cpx pathway controls the production of the periplasmic protease DegP and other proteins involved in fighting cellular stress (Snyder et al., 1995 ; Danese et al., 1995 ; Raivio et al., 1999 ). Other proteins are also implicated in the regulation of the Cpx pathway. For example, CpxP with an LTXXQ motif (Pfam PF07813) is involved in feedback inhibition of the Cpx pathway (Danese et al., 1998 ; Danese & Silhavy, 1998 ). In K. pneumoniae, a periplasmic CpxP-like protein with the LTXXQ motif, KPN03534, is the neighboring gene to KPN03535. Therefore, KPN03535, like KPN03534, may play a role in the Cpx pathway, similar to NlpE. KPN03535 superimposes fairly well on E. coli NlpE (PDB code 2z4i; Hirano et al., 2007 ; r.m.s.d. = 3.3 Å, 16% sequence identity, Z score 2.3; Fig. 2 a). Despite extremely low sequence identity, some residues are conserved in KPN03535 (Arg76, Asp100, Thr105, Lys107, Arg108 and Asn117) from structure-based sequence alignment. However, the functional roles of these residues in NlpE are not known.
Other bacterial OB-fold proteins that have an N-terminal signal sequence are toxins, such as the shiga toxin (PDB code 1r4p; Fraser et al., 2004 ; Fig. 2 b), cholera toxin (PDB code 3efx; Holmner et al., 2004 ; r.m.s.d. = 2.8 Å, 6% sequence identity, Z score 5.8) and a bacterial OB-fold (BOF; Ginalski et al., 2004 ) protein (1nnx; O. Lehmann, A. Galkin, S. Pullalarevu, E. Sarikaya, W. Krajewski, K. Lim, A. Howard & O. Herzberg, unpublished work; r.m.s.d. = 2.9 Å, 14% sequence identity; Z score 7.4; Fig. 2 c). Neither NlpE nor the toxins have all three of the N-terminal helices (α0, α−1, α−2) found in KPN03535, but α−2 is observed in cholera toxin (3efx) and α−1 is observed in BOF protein (1nnx). The capping helix α in KPN03535 is shorter than in the toxins and NlpE, although it is similar to that observed in BOF protein. The β-strands forming the curved β-barrel in all these structures are of similar length, but with differences in the loop sizes that connect the β-strands.
Single-stranded DNA-binding proteins (SSBs) also possess OB-folds and are involved in a multitude of cellular functions, such as DNA replication, transcription, recombination, repair, translation, cold-shock response and maintenance of telomeres (Theobald et al., 2003 ; Chase & Williams, 1986 ; Wold, 1997 ; Meyer & Laine, 1990 ; Lohman & Ferrari, 1994 ; Lohman et al., 1996 ). KPN03535 is structurally similar to OB-fold SSBs, including E. coli SSB (PDB code 1eyg; Raghunathan et al., 2000 ; r.m.s.d. 2.7 Å; 13% sequence identity; Z score 7.0; Fig. 2 d), E. coli PriB (PDB code 1v1q; Liu et al., 2004 ; r.m.s.d. 2.3 Å; 13% sequence identity, Z score 8.0; Fig. 2 e), Thermus thermophilus aspartyl-tRNA synthetase (PDB code 1l0w; Ng et al., 2002 ; r.m.s.d. 2.6 Å; 11% sequence identity; Z score 9.0; Fig. 2 f) and human mitochondrial SSB (PDB code 3ull; Yang et al., 1997 ; r.m.s.d. 2.7 Å; 8% sequence identity, Z score 7.1). The N-terminal α−1 and α0 secondary-structure elements in KPN03535 are partially conserved in aspartyl-tRNA synthetase, but not in the other structures. Many of the loops in OB-fold ssDNA-binding proteins are either involved in interactions with DNA or in quaternary interactions that result in the various oligomeric forms. For example, loop L45, which makes the most interactions with DNA in PriB (Huang et al., 2006 ) and aspartyl-tRNA synthetase, is similar to that of KPN03535, but is much longer in E. coli and in human mitochondrial SSBs. Among the surface-exposed Arg, Lys and aromatic residues that could be functionally relevant if KPN03535 were to bind DNA or RNA (Fig. 3 ), Arg84 and Lys85 of KPN03535 are conserved and correspond to Arg17 and Lys18 in PriB, where Lys18 is involved in ssDNA-binding (Huang et al., 2006 ). Arg83 and Arg99 of KPN03535 are conserved in aspartyl tRNA synthetase as Arg29 (equivalent to Arg28 in the E. coli aspartyl-tRNA synthetase that binds to tRNA; Eiler et al., 1999 ) and Arg39. Multiple structural alignment of various OB-fold proteins using the POSA method (Ye & Godzik, 2005 ) suggests that KPN03535 has a closer relationship to tRNA synthetases than to the BOF protein and is most distant from OB-fold toxins.
Analysis of the electrostatic surface potential indicates that KPN03535 most closely resembles PriB and aspartyl-tRNA synthetase (Fig. 4 ), with a prominent positively charged area similar to the DNA-binding region of these two proteins. Interestingly, this patch is different from that observed in the E. coli SSB, which reflects the known differences in ssDNA-binding modes of SSB and PriB. The basic nature of KPN03535 (pI 9.4) also hints at the possibility of oligonucleotide binding.
In conclusion, the crystal structure of KPN03535 reveals a novel divergent member of the prevalent OB-fold and suggests that it is most likely to be a nucleic acid-binding protein. As for the recently solved structure of MPN554 from Mycoplasma pneumoniae (Das et al., 2007 ), another novel OB-fold with unknown cellular function but with single-stranded DNA-binding properties, the structure of KPN03535 reveals that further exploration of the functionality of the OB-fold is necessary. Bacterial lipoproteins have many important functions and are potential vaccine candidates (Steere et al., 1998 ; Myers et al., 2007 ). K. pneumoniae is an opportunistic pathogen that is prevalent in immunocompromised patients in hospitals and in patients with liver disease (Hidron et al., 2008 ; Pope et al., 2008 ). Functional inferences that can be drawn from this crystal structure should now allow focused structure-assisted biochemistry to establish the exact molecular and cellular role for this protein.
Additional information about KPN03535 is available from TOPSAN (Krishna et al., 2010 ) http://www.topsan.org/explore?PDBid=3f1z.
This work was supported by the National Institute of General Medical Sciences, Protein Structure Initiative Grant U54 GM074898. Portions of this research were performed at the Stanford Synchrotron Radiation Lightsource (SSRL). SSRL is a national user facility at SLAC National Accelerator Laboratory, operated by Stanford University on behalf of the United States Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research and by the National Institutes of Health (National Center for Research Resources, Biomedical Technology Program and the National Institute of General Medical Sciences). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health. Genomic DNA from K. pneumoniae MGH 78578 (ATCC No. 700721D) was obtained from the American Type Culture Collection (ATCC).