The crystal structure of the product of the Bacillus subtilis ykuD gene was solved by the multiwavelength anomalous dispersion (MAD) method and refined using data to 2.0 Å resolution. The ykuD protein is a representative of a distinctly prokaryotic and ubiquitous family found among both pathogenic and nonpathogenic Gram-positive and Gram-negative bacteria. The deduced amino acid sequence reveals the presence of an N-terminal LysM domain, which occurs among enzymes involved in cell wall metabolism, and a novel, putative catalytic domain with a highly conserved His/Cys-containing motif of hitherto unknown structure. As the wild-type protein did not crystallize, a double mutant was designed (Lys117Ala/Gln118Ala) to reduce excess surface conformational entropy. As expected, the structure of the LysM domain is similar to the NMR structure reported for an analogous domain from Escherichia coli murein transglycosylase MltD. The molecular model also shows that the 112-residue-long C-terminal domain has a novel tertiary fold consisting of a β-sandwich with two mixed sheets, one containing five strands and the other, six strands. The two β-sheets form a cradle capped by an α-helix. This domain contains a putative catalytic site with a tetrad of invariant His123, Gly124, Cys139, and Arg141. The stereochemistry of this active site shows similarities to peptidotransferases and sortases, and suggests that the enzymes of the ykuD family may play an important role in cell wall biology.
In spite of rapid advances in microbial genomics and an expanding set of complete prokaryotic genomes, a significant portion of bacterial genes encode unannotated proteins of unknown structure and function. As the search for novel antibiotic targets intensifies in the face of the resurgent threat of infectious diseases, the structural characterization of such orphan proteins, or of entire families of bacterial proteins, is of particular importance.
One such structurally uncharacterized, ubiquitous family of exclusively prokaryotic proteins comprises putative enzymes containing the so-called ErfK/YbiS/YhnG domain (Pfam03734.8), as identified by the SMART database.1 These proteins are found in a wide variety of bacteria, ranging from pathogenic Bacillus anthracis, Vibrio cholerae, Clostridium tetani, and others, to nonpathogenic Escherichia coli, Thermotoga maritima, Bacillus subtilis, and so forth. All ErfK/YbiS/YhnG domains contain a highly conserved stretch of amino acids, ΦGΦHGTX₁₀(S/T)XGCΦR(M/L), where Φ is a hydrophobic residue, X denotes any amino acid, and X₁₀ denotes a run of ten such residues. The presence of absolutely conserved His and Cys residues suggests a catalytic function, probably as hydrolases or transferases (Fig. 1). Many of these proteins also contain the so-called LysM domain, which appears to have a peptidoglycan-binding function and is found in enzymes involved in cell wall degradation as well as in other proteins that could be associated with bacterial cell walls.2 Interestingly, homologous domains have also been discovered in a family of plant transmembrane receptor kinases, which are involved in the recognition of bacteria that secrete lipochitin oligosaccharide signal molecules.3 Only one structural study of a LysM domain has been reported to date.2 The association of the ErfK/YbiS/YhnG domain with the LysM module suggests that the former may also play a role in cell wall degradation or, more specifically, in peptidoglycan metabolism. This is of interest because proteins unique to prokaryotes are particularly suitable targets for antibiotics.
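To make the fingerprint concrete, the consensus can be written as a regular expression and used to scan protein sequences. The Python sketch below is illustrative only: the set of residues counted as hydrophobic (Φ) is our assumption rather than a definition taken from Pfam or SMART, the function name `find_fingerprint` is hypothetical, and the test sequence is synthetic, not an actual ErfK/YbiS/YhnG sequence.

```python
import re

# Assumption: residues counted as hydrophobic ("Φ"). This set is a common
# textbook choice, not a definition taken from Pfam or SMART.
HYDROPHOBIC = "AVLIMFWC"
PHI = f"[{HYDROPHOBIC}]"

# ΦGΦHGT X10 (S/T) X G C Φ R (M/L), written as a regular expression.
FINGERPRINT = PHI + "G" + PHI + "HGT" + "[A-Z]{10}" + "[ST][A-Z]GC" + PHI + "R[ML]"


def find_fingerprint(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every fingerprint hit."""
    # The lookahead lets overlapping hits be reported as well.
    pattern = re.compile(f"(?=({FINGERPRINT}))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]
```

A profile search (for example, with the Pfam hidden Markov model) is far more sensitive than a strict pattern like this; the regular expression only shows how the consensus positions map onto a sequence.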
The B. subtilis ykuD protein was selected as one of the targets for high-throughput structure determination by the Midwest Center for Structural Genomics (MCSG). It expressed well in E. coli but failed to crystallize despite extensive screening. We therefore applied the recently proposed approach of surface conformational entropy reduction to create surface patches capable of mediating crystal contacts.4 Accordingly, Lys117 and Gln118 were mutated to alanines, and the purified double mutant yielded high-quality crystals.
In this article we describe the ykuD structure in detail. Aside from the expected LysM module, the structure reveals a novel tertiary fold for the C-terminal ErfK/YbiS/YhnG domain. The stereochemistry of the residues in the conserved fingerprint stretch (i.e., His123, Gly124, Cys139, and Arg141) indicates that the ErfK/YbiS/YhnG domain very likely has catalytic properties. Based on the crystal structure, we propose a possible catalytic mechanism for the enzyme and speculate on the nature of the substrate. Our work sets the stage for the functional and biochemical characterization of a novel and probably important family of ubiquitous bacterial enzymes.
The ykuD gene was amplified by polymerase chain reaction (PCR) and cloned into the pMCSG7 vector.5 The vector encodes a leader sequence consisting of a hexahistidine affinity tag followed by an eight-amino acid spacer and the tobacco etch virus (TEV) protease recognition site. The protein was expressed in E. coli strain BL21(DE3) and purified by Ni2+-nitrilotriacetic acid (Ni-NTA) metal-affinity chromatography. The affinity tag was removed by incubation with recombinant TEV (rTEV) protease for 24 h. The protein was dialyzed against 20 mM 2-morpholinoethanesulfonic acid (MES) (pH 6.5), 300 mM NaCl, and 1.4 mM β-mercaptoethanol, and concentrated to 15 mg/mL for crystallization. Extensive screens yielded no crystals. Consequently, sites for surface mutagenesis to enhance crystallizability were identified based on secondary structure prediction (http://www.bioinfo.pl). The dipeptide Lys-Gln (residues 117 and 118) was identified as one such potentially solvent-exposed site, and both residues were mutated to Ala with the QuikChange™ mutagenesis kit (Stratagene). Crystals of this mutant were obtained at 21°C from 0.1 M acetate (pH 4.5), 0.2 M Li2SO4, 30% polyethylene glycol (PEG) 8000, and 10 mM CdCl2. The selenomethionine (Se-Met)-labeled protein was expressed in E. coli strain B834 and purified in the same way as the native protein. Crystals of the labeled protein were obtained from 0.1 M acetate (pH 5.1), 0.2 M Li2SO4, 30% PEG 3350, and 10 mM CdCl2.
Data were collected at beamline X9B of the National Synchrotron Light Source (NSLS), Brookhaven National Laboratory. All data were processed and scaled with the HKL2000 software package.6 The ykuD structure was solved at 2.05 Å resolution by multiwavelength anomalous dispersion (MAD) using data collected at the absorption peak, inflection point, and high-energy remote wavelengths. Data statistics are shown in Table I. The crystal belongs to space group P2₁2₁2₁, with unit cell parameters a = 56.2 Å, b = 63.9 Å, c = 93.6 Å. The Matthews coefficient is 2.4 Å³/Da, assuming two molecules per asymmetric unit. SHELXD⁷ was used to solve the anomalous substructure. Four selenium sites were used for phase calculation in SHARP.8 The phases were used as input to RESOLVE,9 which automatically built 297 of the 328 residues present in the asymmetric unit. The remaining portions of the model were built manually with O¹⁰ and COOT.11 The ykuD model was refined with REFMAC5¹² against the high-energy remote data to 2.0 Å resolution, with individual isotropic displacement parameters (B factors).
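The Matthews coefficient quoted above follows from the cell dimensions and the space group. The short Python sketch below assumes four asymmetric units per cell for P2₁2₁2₁ and Matthews' solvent approximation (1 − 1.23/V_M), and back-calculates the molecular mass per chain implied by V_M = 2.4 Å³/Da; the function names are hypothetical.

```python
# Reported cell (Å) for space group P2(1)2(1)2(1), which has four
# asymmetric units per cell; two molecules per asymmetric unit (from the text).
CELL = (56.2, 63.9, 93.6)
Z_ASU = 4
N_MOL = 2
VM_REPORTED = 2.4  # Å^3/Da
N_RESIDUES = 164   # residues per chain in the refined model


def orthorhombic_volume(a, b, c):
    """Cell volume (Å^3); all cell angles are 90° in an orthorhombic cell."""
    return a * b * c


def matthews_coefficient(volume, z, n_mol, mass_da):
    """V_M = V / (Z * n * M), in Å^3/Da."""
    return volume / (z * n_mol * mass_da)


def implied_mass(volume, z, n_mol, vm):
    """Invert V_M to obtain the mass per molecule (Da) that it implies."""
    return volume / (z * n_mol * vm)


def solvent_fraction(vm):
    """Matthews' approximation: solvent fraction = 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / vm


V = orthorhombic_volume(*CELL)
M = implied_mass(V, Z_ASU, N_MOL, VM_REPORTED)
print(f"cell volume:     {V:,.0f} Å^3")
print(f"implied mass:    {M:,.0f} Da per chain ({M / N_RESIDUES:.0f} Da per residue)")
print(f"solvent content: {solvent_fraction(VM_REPORTED):.0%}")
```

With the reported cell this gives a volume of about 336,000 Å³, an implied mass of about 17.5 kDa per chain (roughly 107 Da per residue for a 164-residue chain), and a solvent content of about 49%, a self-consistent picture for two molecules per asymmetric unit.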
The atomic coordinates and structure factors of the ykuD protein have been deposited in the Protein Data Bank (PDB) under the accession code 1Y7M.
The refined crystallographic model consists of two ykuD molecules in the asymmetric unit, each a single polypeptide chain of 164 residues, together with 2 Cd2+ ions, 4 sulfate ions, and 150 water molecules. The stereochemical quality of the model was analyzed with PROCHECK:13 88.6% of nonglycine residues lie in the most favored regions of the Ramachandran plot, and the remaining 11.4% in the additionally allowed regions. The side chains of 18 amino acids (11 in molecule A and 7 in molecule B) are not visible in the σA-weighted 2mFobs − DFcalc electron density map contoured at 1σ and were consequently excluded from refinement. The average B factors of the two LysM domains are consistently about 15% higher than those of the catalytic domains, consistent with a degree of dynamic disorder in these modules. The final R and Rfree values are 21.1% and 26.9%, respectively. The refinement statistics are shown in Table I.
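For readers less familiar with these statistics, R measures the agreement between observed and calculated structure-factor amplitudes, and Rfree is the same quantity evaluated on a subset of reflections excluded from refinement. The minimal sketch below assumes a single overall scale factor (k = 1 by default) and a randomly flagged test set; the 5% test-set fraction is a common convention, not a value reported for this structure, and the amplitudes used in testing are toy numbers.

```python
import random


def r_factor(f_obs, f_calc, scale=1.0):
    """R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def flag_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Randomly flag reflections for the test (free) set; True means 'free'."""
    rng = random.Random(seed)
    return [rng.random() < free_fraction for _ in range(n_reflections)]


def r_work_and_free(f_obs, f_calc, free_flags, scale=1.0):
    """Split reflections by flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    if not work or not test:
        raise ValueError("both the working and the free set must be non-empty")
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work], scale)
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test], scale)
    return r_work, r_free
```

Because the free reflections never influence the model, a gap between Rfree and R of the size reported here is the expected signature of a model that fits the data without severe overfitting.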
The ykuD protein is made up of two distinct domains: the 52-residue N-terminal LysM domain and the 112-residue C-terminal putative catalytic domain. The two ykuD molecules in the asymmetric unit form a noncrystallographic, head-to-tail dimer with the local twofold axis nearly parallel to the crystallographic c axis [Fig. 2(A)]. Gel filtration indicated that the monomer is the dominant species in solution; the noncrystallographic dimer is therefore unlikely to have functional significance.
The LysM domain was identified in ykuD by sequence analysis, and its structure is indeed similar to the solution (NMR) structure of the isolated LysM domain from the E. coli membrane-bound lytic murein transglycosylase D.2 Ours is the first reported crystal structure of any LysM-containing protein. The domain is known to have a general peptidoglycan-binding function and is characterized by a βααβ fold, with two α-helices packed on the same side of a two-stranded antiparallel β-sheet. Figure 2(B) shows the LysM domain of ykuD superimposed on its structural homolog. Although the two structures are very similar, there are some discrepancies, notably in part of helix B and the downstream loop (residues 22 to 36). The average B factors for that region are higher than for the rest of the domain (~25 Å² vs ~22 Å²), consistent with higher intrinsic mobility of the loop. Moreover, the LysM domain of ykuD does not form any significant crystal contacts, which may explain its reduced stability in the crystal.
The 112-residue ErfK/YbiS/YhnG domain represents a novel fold; no similar structure was detected with DALI.14 It is a β-sandwich with two mixed β-sheets, one containing five and the other six β-strands [Fig. 2(C and D)]. Strand 3 participates in both sheets, while strands 4 and 5 are almost contiguous, separated only by Ile98, whose atypical backbone conformation (ϕ = −100.6°, ψ = −50.9°) creates a kink between them. A single α-helix is found between strands 9 and 10. The most conserved residues of the fingerprint sequence [i.e., ¹²⁰ΦGΦHGT¹²⁵ and ¹³⁶(S/T)XGCΦR(M/L)¹⁴²] are located on the solvent-exposed concave face, within strands 7 and 9 [Fig. 2(C)].
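The backbone torsion angles quoted for Ile98 are signed dihedral angles defined by four consecutive backbone atoms (ϕ: C(i−1), N, Cα, C; ψ: N, Cα, C, N(i+1)). The Python sketch below computes them from Cartesian coordinates with a standard projection-and-atan2 formula using the IUPAC sign convention. In practice the coordinates would be read from the deposited model (PDB 1Y7M), which is not reproduced here, so the helper names are hypothetical and testing uses idealized points.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond, in [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(t / length for t in b1)
    # Remove the components along the central bond, leaving in-plane vectors.
    v = _sub(b0, tuple(_dot(b0, b1) * t for t in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * t for t in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Backbone phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Backbone psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

Using atan2 rather than an arccosine of the dot product preserves the sign of the angle, which is what distinguishes, for example, ψ = −50.9° from +50.9° on a Ramachandran plot.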
Because the fingerprint of the ErfK/YbiS/YhnG domain family (Fig. 1) contains strictly invariant His and Cys residues, which are often found in hydrolytic enzymes and transferases, the SMART database¹ suggests that the domain has a catalytic function; we will henceforth refer to it as the catalytic domain. As pointed out above, the most conserved residues cluster within the seventh and ninth β-strands. We note that the disposition of three residues (His123 and Gly124, located on β-strand 7, and Cys139, located on β-strand 9) bears distinct similarities to the so-called catalytic triads present in a wide range of enzymes, including serine proteases, cysteine proteases, esterases, lipases, and a number of more esoteric, nonhydrolytic enzymes.15 The role of the nucleophile is typically filled by a Ser or Cys, which interacts with the most invariant residue of the triad, a histidine; the histidine in turn is hydrogen-bonded to a neutral or negatively charged H-bond acceptor. The details vary from family to family.15 In most cases, the interaction of Ser with His in the ground state is mediated by a weak H-bond between the hydroxyl and the unprotonated Nϵ2 of the imidazole. During the acylation step, the proton released by Ser is captured by His, which adopts the protonated imidazolium form. In papain-like cysteine proteinases, the ground-state enzyme contains an ion pair in which Cys and His are both ionized, as a thiolate and an imidazolium, respectively, enhancing the nucleophilicity of the cysteine.16 This pattern is not universal, however: viral 3C proteases operate by a general base mechanism similar to that of serine proteases.17 In all cases, the conformation of the catalytic histidine is stabilized by additional interactions.
In serine hydrolases, the Nδ1 of the imidazole is suitably positioned to donate an H-bond to Asp or Glu,18,19 and in two notable cases the acceptor is a backbone carbonyl;20,21 in the papain family of cysteine proteases, the Nϵ2 of the catalytic histidine donates an H-bond to Asn, although enzymes with Asp in that position are also known.22,23
In ykuD, the three amino acids of the putative triad could form similar interactions. In the crystal structure, however, the side chains of Cys139 and His123 in both molecules are perturbed by an intimate, symmetric crystal contact that traps two Cd2+ ions and two sulfate ions at the intermolecular interface (Fig. 3). Cys139 directly coordinates Cd2+ (sulfur–metal distance of 2.33 Å); the metal is also coordinated by Nϵ2 of His119 from an adjacent protein molecule and by two oxygen atoms of a sulfate ion, completing a tetrahedral complex. A third oxygen of the sulfate accepts an H-bond from Nϵ2 of the putative catalytic His123. Thus, the thiolate of Cys139 points away from the imidazole of His123 rather than toward it, as would be expected in a catalytic triad. We believe, however, that this is a crystallization artifact, because a rotation about the χ1 torsion angle of Cys139 would be sufficient to generate a canonical interaction with Nϵ2 of His123. The Nδ1 of His123 donates an H-bond to the main-chain carbonyl of Gly124, in a manner similar to other triads. Overall, the mutual disposition of Cys139, His123, and Gly124 strongly suggests that this group constitutes a variant of the ubiquitous catalytic triad [Fig. 4(A)].
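The argument that a χ1 rotation alone would bring the Cys139 sulfur into a canonical position can be illustrated geometrically: rotating Sγ about the Cα–Cβ bond changes the N–Cα–Cβ–Sγ torsion without altering any bond length or angle. This Python sketch uses Rodrigues' rotation formula on hypothetical, idealized coordinates; the function names and the target of −60° are arbitrary choices, and whether any rotamer actually places Sγ within H-bonding distance of His123 Nϵ2 would have to be checked against the deposited model.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _unit(a):
    length = math.sqrt(_dot(a, a))
    return tuple(t / length for t in a)


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond."""
    b1 = _unit(_sub(p2, p1))
    b0, b2 = _sub(p0, p1), _sub(p3, p2)
    v = _sub(b0, tuple(_dot(b0, b1) * t for t in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * t for t in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def rotate_about_bond(point, start, end, angle_deg):
    """Rotate `point` about the start->end axis by angle_deg (right-hand rule)."""
    k = _unit(_sub(end, start))
    v = _sub(point, start)
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    k_x_v, k_dot_v = _cross(k, v), _dot(k, v)
    # Rodrigues' rotation formula.
    rotated = tuple(
        v[i] * c + k_x_v[i] * s + k[i] * k_dot_v * (1.0 - c) for i in range(3)
    )
    return tuple(p + r for p, r in zip(start, rotated))


def set_chi1(n, ca, cb, sg, target_deg):
    """Return a new SG position giving chi1 (N-CA-CB-SG) equal to target_deg."""
    delta = target_deg - dihedral(n, ca, cb, sg)
    # A right-handed rotation about CA->CB increases the torsion by `delta`.
    return rotate_about_bond(sg, ca, cb, delta)
```

Because Cβ lies on the rotation axis, the Cβ–Sγ bond length and the Cα–Cβ–Sγ angle are unchanged; only the torsion moves, which is why a side-chain rotamer change of this kind is a low-cost rearrangement.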
A triad alone is not sufficient to account for the full catalytic power of serine- and cysteine-dependent hydrolases and transferases. The incipient oxyanion formed during the acylation step is stabilized within a so-called oxyanion hole, a positively charged surface crevice with two or three strong H-bond donors.24,25 Our atomic model reveals that a similar structure is also present in ykuD. The aforementioned sulfate ion coordinated by the Cd2+ ion fortuitously serves as an approximate model of the tetrahedral intermediate. One of its oxygens forms H-bonds with the backbone amides of Lys137, the absolutely conserved Gly138, and Cys139, at distances of 3.19 Å, 3.24 Å, and 3.03 Å, respectively. Two or three of these amides could form the oxyanion hole, donating hydrogen bonds to the negatively charged oxygen of the tetrahedral intermediate of a proteolytic reaction [Fig. 4(A)]. This observation further supports the notion that the stereochemistry of the putative active site in ykuD is consistent with the canonical general base mechanism of serine/cysteine hydrolases.
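Whether each amide counts as an oxyanion-hole donor depends on the distance criterion adopted. The sketch below applies a simple distance-only cutoff to the three N···O distances reported for ykuD; the 3.5 Å cutoff is our assumption (a commonly used upper bound for N–H···O bonds), the function names are hypothetical, and angular criteria, which a full H-bond analysis would include, are ignored.

```python
# Donor-acceptor distances (Å) reported in the text for the sulfate oxygen
# bound in the putative oxyanion hole.
OXYANION_HOLE = {
    "Lys137 N": 3.19,
    "Gly138 N": 3.24,
    "Cys139 N": 3.03,
}

# Assumption: a distance-only criterion; 3.5 Å is a widely used upper bound
# for N/O donor-acceptor distances. Angular geometry is ignored.
HBOND_CUTOFF = 3.5


def classify_hbonds(distances, cutoff=HBOND_CUTOFF):
    """Return, sorted by name, the donors within `cutoff` of the acceptor."""
    return sorted(name for name, d in distances.items() if d <= cutoff)


def shortest_contact(distances):
    """Donor with the shortest donor-acceptor distance."""
    return min(distances, key=distances.get)


donors = classify_hbonds(OXYANION_HOLE)
print(f"{len(donors)} candidate oxyanion-hole donors: {', '.join(donors)}")
print(f"shortest contact: {shortest_contact(OXYANION_HOLE)}")
```

With a 3.5 Å cutoff all three amides qualify; tightening it to 3.2 Å leaves only Lys137 and Cys139, in line with the statement that two or three of these amides could form the oxyanion hole.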
Although the stereochemistry of the putative active site in ykuD is consistent with a hydrolytic or transferase function, akin to that of a protease or an esterase, the tertiary fold of the catalytic domain does not resemble any of the triad-containing families, such as chymotrypsin, subtilisin, papain, or the α/β hydrolases. We note, however, some limited topological similarities to the recently determined structures of sortases, transpeptidases located in the cell envelope of Gram-positive bacteria that anchor specific cell-surface proteins, via their LPXTG motif, to the pentaglycine cross-bridges of peptidoglycan.26–29 In each sortase, the putative active site is located in a shallow depression on the concave side of a β-sheet, with the Cys residue found at the end of a β-strand. Interestingly, even though the involvement of a His residue has been postulated in sortases, the crystal structures suggest that a dyad of Cys and an invariant Arg constitutes the active ensemble of residues.30 This has been corroborated by biochemical studies of Staphylococcus aureus sortase A, which showed that replacing the relevant Arg197 with other amino acids severely impairs catalysis.31 It has been postulated that Arg197 may facilitate the formation of a thiolate during the transpeptidation reaction catalyzed by sortases.31
Close inspection of the ykuD active site also reveals the side chain of Arg141 in immediate proximity to Cys139. The mutual disposition of the two residues is very similar to that seen in sortases [Fig. 4(B)]. Arg141 is conserved in all sequences of the ErfK/YbiS/YhnG domains (Fig. 1). Thus, ykuD and its homologs appear to have, within a short stretch of amino acids, all the active-site components found in cysteine-dependent proteinases and transpeptidases: a triad of Cys, His, and an H-bond acceptor stabilizing the imidazole; a well-formed oxyanion hole stabilizing the tetrahedral intermediate; and a positively charged residue enhancing the nucleophilicity of the Cys. It is tempting to speculate that the structural similarities between ykuD and sortases reflect functional parallels.
The biological function of ykuD, or indeed of any of its homologs containing the ErfK/YbiS/YhnG domain, remains unknown. The presence of the LysM domain in ykuD suggests that the enzyme may be active when bound to peptidoglycan, although we note that the LysM domain is absent from a number of proteins containing the ErfK/YbiS/YhnG domain. In B. subtilis, ykuD is known to be expressed among the spore proteins, and its transcription is regulated by the σK factor, which is active primarily during sporulation.32 However, the function of the protein cannot be exclusive to sporulation, because homologs are also found in a variety of Gram-negative bacteria, which generally do not sporulate.
To our knowledge, no structures are known for Cys-dependent, triad-containing esterases, thioesterases, or lipases; catalytic cysteines of this kind appear to function only in proteases and transpeptidases. The stereochemistry of the putative active site in ykuD, and in particular the presence of a well-developed oxyanion hole, supports the notion that the enzyme uses a general base mechanism and undergoes acylation. This is consistent with either a peptidase (protease) or a transpeptidase activity. Although we were unable to model an acceptor peptide (such as pentaglycine) confidently into the active site, there appears to be room for such an acceptor, so we cannot rule out that the enzyme is in fact a transpeptidase similar to the sortases, anchoring proteins in the spore coat and performing a similar function in the cell wall of Gram-negative bacteria.
Further studies are necessary to answer the outstanding questions regarding the function and activity of the ErfK/YbiS/YhnG domains. The ykuD crystal structure is an important step in this process and paves the way for future studies of this protein family. Once a substrate is identified and a functional assay is available, it will be possible to dissect the structure–function relationships in the putative active site by site-directed mutagenesis.
Because the wild-type protein did not crystallize, the determination of the ykuD crystal structure depended on the application of the surface conformational entropy reduction concept. It is therefore instructive to analyze the crystal contacts and assess the impact of the surface mutations on the lattice-forming interactions.
There are extensive contacts between crystallographically related dimers in the crystal, but two crystal contacts in particular appear to stem directly from the surface mutations and the crystallization conditions. The first contact is formed between two polypeptide chains, related by noncrystallographic symmetry, that belong to two adjacent noncrystallographic dimers (Fig. 3). Not surprisingly, it involves the two mutated residues, Ala117 and Ala118, which are located, as predicted, on a solvent-exposed loop. Although the two alanines are not involved in any direct intermolecular interactions, the wild-type Lys and Gln side chains within this loop would almost certainly have caused a steric obstacle, preventing two ykuD molecules from interacting in this way. The third residue involved in this contact is His119, which coordinates a cadmium ion (Nϵ2–Cd2+ distance of 2.27 Å) trapped between two protein molecules at the active site. The second crystal contact is mediated by a sulfate ion sequestered between the two Arg159 residues at the center of the noncrystallographic dimer. The distances between a sulfate oxygen and Nη1 of Arg159 from molecules A and B are 2.78 Å and 2.65 Å, respectively. Moreover, Nη2 of Arg159 from molecule A forms one more hydrogen bond (3.40 Å) with a sulfate oxygen. This interaction is additionally stabilized by His57 from molecule A and Tyr96 from molecule B.
Successful crystallization of ykuD depends on the mutations (the wild-type protein does not crystallize) as well as on the presence of both sulfate and Cd2+ ions in the precipitant. The structure shows that the sulfate ions help stabilize the noncrystallographic dimer, while the Cd2+ ions and the mutations are critical in forming cross-bridges between dimers, generating layers of molecules perpendicular to the c axis. We note that this crystal architecture resembles other successful cases of surface mutagenesis, in which noncrystallographic dimers are found in the asymmetric unit.33
This is the first crystallographic study of a representative of a ubiquitous family of exclusively prokaryotic proteins that contain both the LysM module, which mediates interactions with cell wall peptidoglycan, and the ErfK/YbiS/YhnG domain. The structure of B. subtilis ykuD reveals a well-formed putative catalytic site and suggests functional roles for specific conserved residues such as His123, Cys139, and Arg141. The stereochemistry of the active site strongly suggests that the enzyme is a peptidase or transpeptidase, although its substrate remains unknown. The presence of ykuD homologs in a variety of pathogenic bacteria, and the likelihood that these proteins play a role in cell wall maintenance, should motivate a thorough functional characterization of these interesting enzymes.
Grant sponsor: National Institute of General Medical Sciences; Grant numbers: GM62615 (to Z. S. Derewenda) and P-50-GM62414 (to A. Joachimiak).