High-throughput (HT) protein crystallography is severely impeded by the relatively low success rate of protein crystallization. Proteins whose structures are not solved in the HT pipeline owing to attrition in any phase of the project are referred to as the high-hanging fruit, in contrast to those proteins that yielded good-quality crystals and crystal structures, which are referred to as low-hanging fruit. It has previously been shown that proteins that do not crystallize in the wild-type form can have their surfaces engineered by site-directed mutagenesis in order to create patches of low conformational entropy that are conducive to forming intermolecular interactions. The application of this method to selected proteins from the Bacillus subtilis genome which failed to crystallize in the HT mode is now reported. In this paper, the crystal structure of the product of the YdeN gene is described. Of three prepared double mutants, i.e. E124A/K127A, E167A/E169A and K88A/Q89A, the last gave high-quality crystals and the crystal structure was solved by SAD at 1.8 Å resolution. The protein is a canonical α/β hydrolase, with an active site that is accessible to solvent.
Structural genomics, which synergistically involves both NMR and X-ray crystallography, is a global effort aimed at the determination of three-dimensional protein structures on a genomic scale in a high-throughput mode (Berman et al., 2000). Of particular interest in this early phase of the program is the determination of structures of proteins believed to represent new or barely known families. Every new structure deposited in the PDB increases the accuracy of homology modeling. This in turn allows better understanding of the structure–function relationships in those proteins for which structures cannot be directly determined experimentally (Stevens, 2000). However, the effectiveness of high-throughput (HT) crystallographic analysis is impeded by several bottlenecks, including the relatively low success rate of crystallization. It is estimated that of all proteins expressed in a soluble state, only between 10 and 30% form X-ray quality crystals (Claverie et al., 2002; Sulzenbacher et al., 2002). The most recent statistics available from the Midwest Structural Genomics Center (http://www.mcsg.anl.gov) show that of all successfully purified soluble proteins, less than 40% produced crystals, of which less than 60% were X-ray grade crystals that yielded useful diffraction. Thus, the first sweep of the structural genomics program is likely to harvest only the ‘low-hanging fruit’, i.e. readily expressed, stable and easily crystallizable proteins. Unfortunately, this may leave many biologically important proteins ‘hanging high and dry’.
To circumvent one of the bottlenecks, i.e. the crystallization problem, we proposed a novel approach to crystallization of proteins based on rational surface mutagenesis to create patches with low overall conformational entropy in order to facilitate the formation of crystal contacts (Derewenda, 2004). This approach proved successful in a model system of human RhoGDI and allowed the crystallization and crystal structure determination of new proteins (Longenecker, Garrard et al., 2001; Longenecker, Lewis et al., 2001; Mateja et al., 2002). We and others have also shown that the method is useful to generate crystal forms that diffract to much higher resolution than the wild-type protein, which may prove to be of importance in drug design (Mateja et al., 2002; Munshi et al., 2003).
We are now applying this strategy to those selected targets of the Bacillus subtilis structural genomics effort which failed to crystallize in the HT pipeline. This paper describes the crystallization by surface modification and structure determination of the product of the YdeN gene from B. subtilis (Midwest Center for Structural Genomics code APC 1086). The protein was originally thought to have a unique sequence and potentially a novel fold. Subsequent tertiary-structure prediction revealed that it shows similarities to α/β hydrolases and this information was used to design three double mutants, including the K88A/Q89A mutant, in which residues with high conformational entropy located on putative surface loops were targeted. The K88A/Q89A double mutant produced crystals of high quality that diffracted to 1.8 Å. The structure revealed that the protein is a member of the ubiquitous α/β hydrolase family and structural considerations suggest that it is a hydrolase active on a soluble ester, possibly a thioester, but is not an interfacially activated lipase. The molecular model, including H atoms and anisotropic displacement parameters, was refined to a conventional R factor of 12.4%.
The pMCSG7 expression vector containing the YdeN gene with a cleavable N-terminal hexa-His tag was obtained from the Midwest Center for Structural Genomics. Initial sequence-homology searches carried out at the beginning of the project indicated that the protein has no homologues of known structures and no identifiable domains. However, further analysis, including secondary- and tertiary-structure prediction using the Polish Bioinformatics Site Meta Server (Ginalski et al., 2003; Ginalski & Rychlewski, 2003; von Grotthuss et al., 2003), yielded results that strongly suggested that the protein belongs to the α/β hydrolase family. Despite the lack of identifiable sequence similarities, the fold-recognition algorithm generated an amino-acid alignment with bromoperoxidase A1, chloroperoxidase F, chloroperoxidase T (Hofmann et al., 1998) and 2-hydroxyl-6-oxo-6-phenylhexa-2,4-dienoic acid (HPDA) hydrolase (BphD enzyme) from Rhodococcus sp. (Nandhagopal et al., 2001). This alignment revealed the presence of three conserved canonical residues forming a putative catalytic triad, i.e. Ser, Asp and His, lending credibility to the result. As this work was in progress, similar annotation appeared in the NCBI Blast search.
Using a canonical model of an α/β hydrolase, three double mutants were designed so that the mutation sites were expected to be on solvent-exposed loops. The mutants were E124A/K127A, E167A/E169A and K88A/Q89A. The Quik-Change mutagenesis kit (Stratagene) was used to introduce the designed mutations into the expression plasmid.
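The selection of mutation sites can be illustrated with a minimal sketch. It is not the procedure used in this work, which relied on a structural model to locate solvent-exposed loops; it only scans a sequence for pairs of residues with high side-chain conformational entropy (Lys, Glu and Gln) that lie close together and labels them as alanine double mutants. The function name, the `max_gap` parameter and the toy sequence are hypothetical and are not taken from YdeN.

```python
# Residue types with high side-chain conformational entropy that are
# targeted for replacement by alanine (assumption: only K, E and Q are considered).
HIGH_ENTROPY = {"K", "E", "Q"}


def propose_double_mutants(sequence, max_gap=3, first_residue_number=1):
    """Return labels such as 'K13A/Q14A' for consecutive high-entropy
    residues separated by at most `max_gap` positions in the sequence.

    This is a sequence-only heuristic; it does not check whether the
    residues are actually on a solvent-exposed loop.
    """
    hits = [
        (index + first_residue_number, residue)
        for index, residue in enumerate(sequence)
        if residue in HIGH_ENTROPY
    ]
    labels = []
    # Compare each high-entropy residue with the next one along the chain.
    for (num1, res1), (num2, res2) in zip(hits, hits[1:]):
        if num2 - num1 <= max_gap:
            labels.append(f"{res1}{num1}A/{res2}{num2}A")
    return labels


# Toy sequence for illustration only (not the YdeN sequence).
toy_sequence = "MGSLEAVKTGLPKQDW"
candidates = propose_double_mutants(toy_sequence)
print(candidates)
```

With `max_gap=3` the spacing matches that of the three mutants described here, in which the two mutated positions are one, two or three residues apart. In practice each candidate would still need to be checked against a structural model or prediction of surface exposure.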
The wild-type and mutated proteins were expressed in the Escherichia coli BL21(DE3) strain. Cells were grown in LB medium containing 100 μg ml−1 ampicillin at 310 K until the OD600 reached 1.2. Cultures were induced with 1 mM IPTG. Expression was performed at 293 K for 17 h. Cells were harvested by centrifugation at 6000 rev min−1 for 30 min, resuspended in buffer A (50 mM Tris pH 8.5, 300 mM NaCl and 10 mM imidazole) and subsequently lysed by sonication. The soluble fraction was obtained by centrifugation at 14 000 rev min−1 for 40 min. The protein was purified using Ni-affinity chromatography. The supernatant was incubated with 10 ml Ni–NTA agarose for 2 h to allow His6-tagged protein to bind to the resin. After collecting the flowthrough, the column was washed with 1.5 l buffer A. The protein was eluted in an imidazole gradient (75–150 mM) and dialyzed against 50 mM Tris pH 8.0, 1.0 mM EDTA, 0.5 mM DTT buffer for protease cleavage. The His6 tag was removed during incubation with rTEV protease for 48 h at 283 K. The cut protein was loaded onto an Ni–NTA agarose column to separate it from uncut protein, rTEV and His6 tag. The protein was dialyzed against 20 mM Tris pH 8.0 and concentrated to 15 mg ml−1 (Bradford assay, BioRad). The SeMet-labeled protein was purified in the same way as the native protein and finally dialyzed against 20 mM Tris pH 8.0, 200 mM KH2PO4 buffer and concentrated to 15 mg ml−1.
The preliminary screens with the wild-type and mutated protein samples were carried out using a screen including 96 solutions, a subset of the 108 best crystallization conditions reported by the Joint Center for Structural Genomics (Page et al., 2003). The sitting-drop method was used throughout. No crystals were obtained for the wild-type protein. The optimized crystallization conditions of the K88A/Q89A mutant involved mixing 1 μl protein solution and 1 μl reservoir solution containing 30% PEG 1500 and 0.2 M KH2PO4 pH 5.6. Crystals appeared overnight. For the SeMet-containing crystals, the well solution was 20% PEG 8000 and 0.05 M KH2PO4 pH 4.6. Crystals were frozen using the harvesting solution containing 20% PEG 8000, 0.05 M KH2PO4 pH 4.6 and 10% ethylene glycol.
Data were collected from a single SeMet-substituted crystal at the SER-CAT (Southeast Regional Collaborative Access Team) beamline 22-ID at the Advanced Photon Source, Argonne National Laboratory, at nominal wavelengths of 0.97920 Å (peak), 0.97546 Å (remote) and 0.97931 Å (edge). All data were processed and scaled using HKL2000 (Otwinowski & Minor, 1997). The space group was identified as P212121, with unit-cell parameters a = 36.2, b = 54.1, c = 93.2 Å. The Matthews coefficient is 2.1 Å3 Da−1, assuming one polypeptide per asymmetric unit.
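The Matthews coefficient quoted above follows from the unit-cell volume divided by the total protein mass in the cell. The sketch below reproduces the arithmetic under two assumptions that are not stated in the text: an average residue mass of 110 Da (the actual molecular weight of the construct is not given here) and the standard factor of 1.23 for the approximate solvent fraction. The function names are illustrative.

```python
def matthews_coefficient(a, b, c, mol_weight_da, copies_per_cell):
    """Matthews coefficient V_M in A^3/Da for an orthorhombic cell.

    For an orthorhombic cell all angles are 90 degrees, so the
    volume is simply a * b * c.
    """
    cell_volume = a * b * c
    return cell_volume / (copies_per_cell * mol_weight_da)


def solvent_fraction(v_m):
    """Approximate solvent fraction from V_M (uses the common 1.23 factor)."""
    return 1.0 - 1.23 / v_m


# Unit-cell parameters from the text (in Angstrom).
A, B, C = 36.2, 54.1, 93.2

# Assumptions: 190 residues at an average of 110 Da each.
N_RESIDUES = 190
AVERAGE_RESIDUE_MASS_DA = 110.0

# P212121 has four asymmetric units per cell; one chain in each.
COPIES_PER_CELL = 4

v_m = matthews_coefficient(
    A, B, C, N_RESIDUES * AVERAGE_RESIDUE_MASS_DA, COPIES_PER_CELL
)
print(f"V_M = {v_m:.2f} A^3/Da, solvent fraction = {solvent_fraction(v_m):.2f}")
```

With the estimated mass the result is close to, but not identical to, the reported 2.1 Å3 Da−1; the small difference reflects the approximate molecular weight used in the sketch.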
Analysis of the MAD data using XPREP (Bruker AXS, Madison, USA) revealed incompatibilities with the assumed wavelengths, suggesting that the real wavelengths differed from the nominal values. The SAD method was used instead, based on the absorption-peak data. The anomalous substructure was solved with SHELXD (Schneider & Sheldrick, 2002). The two Se sites were fed into the autoSHARP procedure incorporating SHARP (de La Fortelle & Bricogne, 1997), SOLOMON (Abrahams & Leslie, 1996) and the CCP4 suite (Collaborative Computational Project, Number 4, 1994). The model was built automatically in ARP/wARP (Perrakis et al., 1999), which was able to build 184 out of 190 residues. The program O (Jones et al., 1991) was used for subsequent manual model improvement. Refinement using all data to 1.8 Å resolution was performed with REFMAC5 (Murshudov et al., 1997). The refinement of the model utilizing individual isotropic displacement parameters (B factors) converged with an R factor of 14.0% (Rfree = 18.4%). In the final stages of the refinement, H atoms were added to the model and anisotropic displacement parameters were used. This improved the agreement factors to 12.5 and 18.0%, respectively. MOLPROBITY (Lovell et al., 2003) and PROCHECK (Laskowski et al., 1993) were used as validation tools during the refinement process. Relevant crystallographic data are shown in Table 1.
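For readers less familiar with the agreement factors quoted above, the conventional R factor is the sum of the absolute differences between observed and calculated structure-factor amplitudes divided by the sum of the observed amplitudes; Rfree is the same quantity evaluated on a set of reflections held out of refinement. The following is a minimal sketch of that definition, not the REFMAC5 implementation, and it assumes the calculated amplitudes are already on the same scale as the observed ones. The amplitudes in the example are invented.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R factor.

    R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)

    Both sequences must list amplitudes for the same reflections in the
    same order; no scaling between them is applied here.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return numerator / denominator


# Invented amplitudes for three reflections, for illustration only.
observed = [100.0, 50.0, 25.0]
calculated = [90.0, 55.0, 25.0]
example_r = r_factor(observed, calculated)
print(f"R = {100 * example_r:.1f}%")
```

Applying the same function to the working and test sets separately would give R and Rfree, respectively.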
The refined model of YdeN includes residues 2–187 out of a total of 190 in the single polypeptide chain. The quality of the model was verified with PROCHECK (Laskowski et al., 1993), which showed that 88.0% of non-glycine residues are in the favored regions and 11.4% are in the allowed regions of the Ramachandran plot. Notably, Ser71 lies outside this range (φ = 55, ψ = −117°). It is located within a tight turn and its strained conformation (ε region in the Ramachandran plot) is a characteristic feature of all α/β hydrolases (Derewenda & Derewenda, 1991; Ollis et al., 1992). One other residue, His18, also shows the ε secondary structure (φ = 58, ψ = −134°) and the electron density for this residue is equally well resolved. The strained conformation of this buried residue is stabilized by both van der Waals and hydrogen bonds, including a close packing against the adjacent Trp19. The side chain of Arg87 is not visible in the σA-weighted 2mFobs – DFcalc electron-density map contoured at 1σ and it was consequently excluded from refinement. Six side chains (Asn41, Ser55, Gln58, Glu63, Glu124 and Asp147) have at least two conformers and the occupancies were adjusted to correspond to the electron density.
The refined YdeN model reveals the canonical tertiary fold of α/β hydrolases, albeit lacking the typically observed first two antiparallel strands (Fig. 1). Consequently, the molecule consists of six central parallel β-strands with the topology 213456 and eight α-helices. The strand–turn–helix motif following β3 has an architecture which is consistent with the nucleophilic elbow motif that is typical of all α/β hydrolases and is also observed in an unrelated enzyme, malonyl-CoA acyl transferase (Serre et al., 1995). The typical amino-acid sequence around the nucleophilic Ser in these proteins is Gly-X1-Ser-X2-Gly. The glycines are found in this motif because of the acute angle formed by the strand and the helix, which brings the side chains in these two positions into collision (Ollis et al., 1992). However, the first glycine in the elbow of YdeN is replaced with an Ala. This is not unusual; such departures from a canonical sequence have been observed before and are made possible by a slight opening of the elbow (Lawson et al., 1994; Uppenberg et al., 1994). The fingerprint sequence of the nucleophilic elbow of YdeN is similar to that of lipases from filamentous fungi, where X1 and X2 are also His and Leu, respectively (Derewenda et al., 1994).
Members of the ubiquitous α/β hydrolase superfamily share a common architecture consisting of a central predominantly parallel β-sheet motif connected by loops and helices. The β-sheets create a half-barrel in which the first and the last strand are twisted with respect to each other by approximately 90°. Insertions containing long Ω loops or even large subdomains frequently occur after strands β3, β4, β6, β7 or β8. These insertions typically participate in the formation of a substrate-binding site and define the enzyme’s specificity (Heikinheimo et al., 1999). In many cases, such as those of the pancreatic lipase (van Tilbeurgh et al., 1993), bromoperoxidase A2 (Hofmann et al., 1998) or the 2-hydroxyl-6-oxo-6-phenylhexa-2,4-dienoic acid (HPDA) hydrolase (Nandhagopal et al., 2001), the insertion constitutes a ‘lid’ which undergoes structural rearrangement during the catalytic process. Smaller enzymes in this family, such as carboxylesterase (Kim et al., 1997) or cutinase (Martinez et al., 1991; Nicolas et al., 1996), have solvent-accessible active sites and are active against water-soluble small esters or thioesters.
The YdeN protein is a relatively small α/β hydrolase, smaller than cutinase (Martinez et al., 1991). The extra β-strand at the C-terminus (strand β6) arises from a difference in the tertiary fold in that region. Aside from a distinct similarity to cutinase, the tertiary fold of YdeN is also similar to the haloperoxidase, dienelactone hydrolase, esterase or serine carboxypeptidase subfamilies as defined by Heikinheimo et al. (1999). In common with the other small hydrolases, YdeN has no lid covering its active site, which is consequently solvent-accessible.
An automated comparison using DALI also identified a number of α/β hydrolases with a high level of structural similarity, including bromoperoxidase A2 (Hofmann et al., 1998; r.m.s. difference 2.3 Å over 170 amino acids), thioesterase (Lawson et al., 1994; r.m.s. 2.6 Å over 170 amino acids), dienoate hydrolase (Nandhagopal et al., 2001; r.m.s. 2.5 Å over 167 amino acids) and carboxylesterase (Kim et al., 1997; r.m.s. 2.4 Å over 161 amino acids). A structural comparison of the active centers shows that all the triads are very similar. The r.m.s. differences for all side-chain atoms of the triad residues are 0.56, 0.58, 0.33 and 0.25 Å, respectively.
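The r.m.s. differences quoted above are root-mean-square distances between equivalent atoms after superposition. The sketch below shows only the final arithmetic step and assumes that the two coordinate sets are already superposed and paired atom by atom; DALI performs its own alignment and superposition, which is not reproduced here. The coordinates in the example are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between two paired sets of 3D coordinates.

    Each argument is a sequence of (x, y, z) tuples in Angstrom. The sets
    are assumed to be superposed already and listed in matching order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same length")
    if not coords_a:
        raise ValueError("coordinate sets must not be empty")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Invented coordinates: every atom is displaced by exactly 1 A along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
example_rmsd = rmsd(set_a, set_b)
print(f"r.m.s. difference = {example_rmsd:.2f} A")
```

For the active-site comparison, the paired atoms would be the side-chain atoms of the three triad residues in each structure.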
The three catalytic triad residues always occur in the same order along the polypeptide chain in all α/β hydrolases, i.e. first the nucleophile (Ser, Cys or Asp), then the carboxylic acid (Asp or Glu) and finally histidine. The three residues form an approximate mirror image of the functionally analogous triad of serine proteases (Ollis et al., 1992; Derewenda & Wei, 1995). In YdeN this sequence is preserved and the three residues involved in the putative catalytic site are Ser71, Asp137 and His164. However, there is no hydrogen bond between the Ser hydroxyl oxygen and the imidazole of histidine because the hydroxyl group is shifted away from His164 Nε2. Instead, the hydroxyl forms a hydrogen bond with an adjacent water molecule (Fig. 2). A comparison of diverse enzymes containing classical triads shows that the geometry of the Ser-His interaction disfavors a strong hydrogen bond in the ground state (Z. S. Derewenda, unpublished work). In several cases, the hydroxyl of the serine is clearly turned away from the His (Ho et al., 1997; Devedjiev et al., 2000). In YdeN, the unpaired hydrogen-bonding potential of His164 Nε2 is satisfied through an interaction with an additional ordered water molecule in the active site. On the other hand, His164 Nδ1 donates a typical hydrogen bond to Asp137 Oδ2 (2.74 Å). Asp137 Oδ1 is in turn held in its position by hydrogen bonds with the main-chain amides of Ile139 (2.81 Å) and Val140 (2.97 Å).
The function of the nucleophilic triad in triad-containing enzymes is complemented by the oxyanion hole, which stabilizes the incipient oxyanion during the course of the reaction via hydrogen bonds donated by amide groups (Matthews et al., 1975; Nicolas et al., 1996). These hydrogen bonds create a tetrahedrally distorted carbonyl C atom, which becomes susceptible to nucleophilic attack by the serine. The oxyanion hole can be formed by two amide groups, e.g. in serine proteases and carboxylesterase (Kim et al., 1997), or by three amide groups, as in acetylcholinesterase (Sussman et al., 1991) and the Streptomyces scabies esterase (Wei et al., 1995). In the last two cases, the oxyanion hole may be created by three main-chain amides or by two main-chain amides and one side-chain amide.
A close analysis of the stereochemistry of the putative active site of YdeN shows that main-chain amides of Tyr11 and Leu72 might serve to form an oxyanion hole. These amino acids correspond to Leu23 and Gln115 in carboxylesterase (Kim et al., 1997). There are no glutamine or asparagine side chains in the proximity of the active site that could contribute to the oxyanion hole.
YdeN belongs to a small family of homologous microbial proteins, including a conserved hypothetical protein from Mycoplasma penetrans (43% identity), a predicted esterase from Actinobacillus pleuropneumoniae serovar (40% identity) and a hypothetical protein from Photorhabdus luminescens (36% identity). The sequence alignment (Fig. 3) shows that all the residues that line the specificity pocket are closely conserved in this family, with only minor deviations. This suggests that all proteins may hydrolyze the same or closely related substrates.
The function of YdeN is not known, but the overall similarity to other α/β hydrolases strongly suggests that the protein is an enzyme with hydrolytic properties. Until very recently it was thought that all α/β hydrolases are enzymes, but the discovery of a non-catalytic protein from Mycobacterium tuberculosis with an analogous tertiary fold has changed this view (Wilson et al., 2004). Nonetheless, in the non-catalytic protein the nucleophile is absent and it is probably safe to assume that the presence of a complete triad is a strong indicator of hydrolytic function. Many α/β hydrolases are lipases, but the absence in YdeN of a lid that would allow interfacial activation suggests that it is active against water-soluble esters, although a thioesterase function cannot be ruled out.
There are a number of hydrophobic amino acids in close proximity to the triad, including Val140, Ile139, Leu109, Leu106, Leu103, Phe99 and His70. Together, they form a small but well defined specificity pocket (Fig. 2). Comparison with carboxylesterase from Pseudomonas fluorescens revealed that Ile139, Leu109, Leu106 and His70 in YdeN correspond to residues that build the specificity pocket found in the carboxylesterase, i.e. Val170, Ile70, Met73 and Phe113, respectively. Moreover, all of these residues are conserved among sequence homologues of YdeN. We conclude that it is very likely that YdeN is a carboxylesterase active on a water-soluble ester in which the substrate, possibly the acyl group, has a hydrophobic nature.
As we predicted on the basis of modeling, the two mutated residues Lys88 and Gln89 are located on a sharp loop connecting helix C to strand 4. The loop is in close proximity to a crystal contact (Fig. 4), but the two alanines are not involved in any direct intermolecular interactions between molecules in the crystal lattice. The wild-type Lys88 would have pointed towards the contact, but it is unlikely that it would seriously interfere with the crystal packing. However, it is noteworthy that, along with the preceding Arg87, a lysine in position 88 would contribute to a highly charged surface patch. The loop faces the N-terminus of the adjacent molecule, potentially leading to electrostatic repulsion in the wild-type sequence. It is therefore possible that, unlike other successful examples of crystallization by surface engineering, crystals of YdeN are not formed because of a reduction of excess conformational surface entropy but because of altered electrostatic interactions. If indeed more than one mechanism is at play leading to high-quality crystals, this further increases the potential of surface mutagenesis for crystallization.
This work was funded by NIGMS (grant GM62615 to ZSD and, in part, grant P-50-GM62414 to AJ and WM). We would like to thank the staff of SER-CAT for help with data collection. Data were collected at Southeast Regional Collaborative Access Team (SER-CAT) beamline 22-ID at the Advanced Photon Source (APS), Argonne National Laboratory. Use of the APS was supported by the US Department of Energy, Office of Science, Office of Basic Energy Sciences under Contract No. W-31-109-Eng-38.