The DUF1094 family contains over 100 bacterial proteins, all containing a conserved CXC motif, with unknown function. We solved the crystal structure of the Bacillus subtilis representative, the product of the yphP gene. The protein shows remarkable structural similarity to thioredoxins, with a canonical αβαβαββα topology, despite low amino acid sequence identity to thioredoxin. The CXC motif is found in the loop immediately downstream of the first β-strand, in a location equivalent to the CXXC motif of thioredoxins, with the first Cys occupying a position equivalent to the first Cys in canonical thioredoxin. The experimentally determined reduction potential of YphP is E°′ = −130 mV, significantly higher than that of thioredoxin and consistent with disulfide isomerase activity. Functional assays confirmed that the protein displays a level of isomerase activity that might be biologically significant. We propose a mechanism by which the members of this family catalyze isomerization using the CXC catalytic site.
The Bacillus subtilis yphP gene codes for a member of a superfamily of over 100 prokaryotic, highly conserved proteins (DUF1094), found predominantly in Firmicutes such as Staphylococcus and Bacillus. No function has been assigned to this family, and to date there is no structure described for any of its representatives. Tertiary structure prediction using 3D-Jury (1,2) suggests that YphP has a core domain with high similarity to the canonical thioredoxin (Trx) fold, as exemplified by the Trypanosoma brucei Trx (1r26) or Malassezia sympodialis Trx (2j23), even though the level of amino acid sequence identity is very low (~15%). This similarity raises the possibility that the DUF1094 proteins constitute a family of oxidoreductases with the ability to catalyze the formation, reduction, and/or isomerization of disulfide bonds. However, they lack the canonical CXXC sequence of the active site and instead contain a CGC motif, atypical of thioredoxins or related oxidoreductases.
The ubiquitous Trx superfamily includes proteins with a unique fold, in which the central, mixed β-sheet is sequestered between several short α-helices (3). These oxidoreductases contain a fingerprint CXXC sequence (where X denotes any amino acid), in which the two cysteines alternate between the reduced and oxidized states, allowing them to participate in disulfide-exchange reactions. The location of this active site sequence is highly conserved, at the end of the canonical β2 strand and into the following α-helix, so that the N-terminal Cys is in an extended conformation, while the remaining three amino acids assume α-helical secondary structure. Interestingly, the sequence of the XX dipeptide varies among the superfamily members and is important for modulating the redox properties of the protein (4). For example, in DsbA, which catalyzes the formation of disulfide bonds in proteins secreted into the periplasmic space of Escherichia coli, a CPHC motif results in the highest known reduction potential (E°′ = −122 mV), matching the oxidizing environment of the periplasm (5). In contrast, cytosolic thioredoxins contain a canonical CGPC motif with an E°′ value that is about 150 mV lower, i.e., −270 mV, which indicates a preference for an oxidized state and matches the reducing environment of the cytoplasm (6). Glutaredoxins contain a CPYC motif and their reduction potentials vary depending on structural context, so that the E. coli glutaredoxins 1 and 3 have reduction potentials of −233 and −198 mV, respectively (6). Finally, the protein disulfide isomerase (PDI) of the yeast endoplasmic reticulum has a CGHC sequence and a reduction potential of −180 mV (7).
In spite of this functional diversity, the general architecture of the active site motif is highly conserved, so that no member of the thioredoxin superfamily has been found with fewer or more than two intervening residues between the cysteines. Consequently, the functional importance of the number of the residues between the cysteines for the reduction potential has been studied in synthetic peptides (8). These experiments revealed that for peptides C(X)nC, where n ≤ 5, those with a single intervening residue have the least stable disulfide. It has also been shown that a CGC-NH2 tripeptide has a reduction potential of −167 mV, close to that of PDI, and that it exhibits disulfide isomerization activity (9). A CXC motif engineered into the active site of an E. coli thioredoxin by deleting the Pro residue from the CGPC active site showed likewise a destabilized disulfide with a reduction potential of ≥−200 mV. In its reduced form, this variant displayed disulfide isomerase activity that was 25-fold greater than that of the synthetic CGC peptide (9).
Surprisingly, amino acid sequence alignments showed that YphP, like all other members of the DUF1094 family, contains a CGC motif, involving Cys53 and Cys55, in a location analogous to the active site of thioredoxin. To gain insight into the structure and function of the DUF1094 family, we determined the crystal structure of the B. subtilis YphP at 2.1 Å resolution. The refined model confirmed that the protein is structurally closely related to the thioredoxin superfamily and that Cys53 occupies a site corresponding to the canonical N-terminal cysteine in the CXXC motif; Cys55 is located in place of the Pro in Trx. To better understand the function of YphP, we determined its reduction potential and analyzed its isomerase activity by using a continuous assay that measures the rate of isomerization of non-native to native disulfide bonds in tachyplesin I (10). We then used two single-site variants, C53A and C55A, to investigate the role of each of the two cysteine residues in the active site. Finally, we compared the activity of YphP to that of ΔP34 thioredoxin in the fluorescence assay. The results confirm that YphP represents a new family of oxidoreductases with a natural CXC motif within the active site. Both the reduction potential and the functional assays are consistent with protein disulfide isomerase activity.
The B. subtilis YphP (also denoted APC1446) was originally selected as one of the targets for high-throughput structure determination by the Midwest Center for Structural Genomics (http://www.mcsg.anl.gov). The open reading frame was amplified from genomic DNA with a recombinant KOD HiFi DNA polymerase (Novagen) from Thermococcus kodakaraensis using conditions and reagents provided by the vendor (Novagen). The gene was cloned into a pMCSG7 vector (11) using a modified ligation-independent cloning protocol (12). This construct produced a fusion protein with an N-terminal His6 tag and a recognition site for the tobacco etch virus (TEV) protease. Unfortunately, further progress was impeded by the recalcitrance of the protein to crystallization. To circumvent this problem, we used the entropy reduction approach (13−15). We identified three clusters with high conformational entropy residues: cluster 1, E39, K40, E42; cluster 2, K113, E114; and cluster 3, Q100, E101. Variants were generated in which residues in these clusters were replaced with either Tyr or Ala.
Wild-type protein was expressed with high yield in E. coli BL21(DE3)RIPL. The protein was purified using standard methods and a combination of nickel affinity chromatography (Ni-NTA agarose column; Qiagen), followed by rTEV proteolysis to cleave the His tag, and gel filtration in the presence of reducing agents (1 mM DTT, 5 mM β-mercaptoethanol) to circumvent aggregation caused by intermolecular disulfide formation. The protein was concentrated to 8−15 mg/mL. Variant proteins were purified in a similar manner.
A custom set of conditions (16) was used for crystallization screening against a reservoir of precipitant and an alternative reservoir containing salt (17). The cluster 1 Ala variant and cluster 3 Tyr variant yielded poorly reproducible crystals under a variety of conditions. In contrast, the cluster 3 Ala variant (i.e., Q100A, E101A) gave reproducible crystals in two different conditions: 3.2 M (NH4)2SO4 and 0.1 M citric acid buffer at pH 5.0 and 2.0 M (NH4)2SO4. Following optimization, the best crystals were obtained from 2.6 M (NH4)2SO4 and 0.1 M HEPES buffer at pH 7.6 with added 30% trimethylamine N-oxide; 5 mM β-mercaptoethanol in the reservoir was necessary for crystals to reach maximal dimensions.
To generate the selenomethionine- (SeMet-) labeled protein samples, the protein was expressed in E. coli B834 cells. Seed cultures of 20 mL were grown in Luria broth for 4 h at 37 °C. The seed cultures were spun down at 3000 rpm and used to inoculate autoinduction medium containing SeMet. The cultures were grown for 4.5 h at 37 °C, followed by ~16 h at 20 °C and another 24 h at 10 °C. Harvested cultures yielded ~35 g of wet mass/2 L. Protein was purified using the procedure described above and yielded more than 100 mg of pure SeMet-labeled protein. Protein was concentrated to ~25 mg/mL in a buffer containing 25 mM NaCl and 25 mM Tris buffer at pH 8.0 and 1 mM DTT. The structure was determined from SeMet-labeled crystals grown from 2 M (NH4)2SO4 and 100 mM Tris buffer at pH 6.5 and 5% PEP (pentaerythritol propoxylate) 629.
All single- and multiple-site mutations were introduced using the QuikChange site-directed mutagenesis kit (Stratagene) and confirmed by direct DNA sequencing. The respective protein variants were purified as described above for the wild type.
ΔP34 E. coli thioredoxin was produced and purified as we described previously (9).
The crystals of the Q100A, E101A variant are tetragonal, P41212, with unit cell dimensions a = b = 68.16 Å and c = 302.02 Å, and contain four molecules in the asymmetric unit. The crystals contain ~55% solvent. For data collection, a crystal was frozen in mother liquor containing additional 20% glycerol. Data were collected at beamline 5.0.2 of the Advanced Light Source (ALS). The multiwavelength anomalous diffraction (MAD) data, used to solve the structure, were collected at three wavelengths: 0.9795 Å (peak), 0.9796 Å (inflection point), and 0.9393 Å (high energy remote). The data extended to 2.3−2.5 Å with the merging R factors ranging between 11.9% and 13.6% (see Table 1). Positions of selenium atoms were identified with SHELXD (18). Crystallographic phasing was done with SOLVE (19). Phases were improved by iterative model building and refinement in RESOLVE (19) followed by ARP-WARP (20) and manual rebuilding with O (21) and COOT (22). Refinement was carried out with REFMAC5 (23) and with PHENIX (24) using the TLS (translation/libration/screw) approximation of thermal motion (25). Solvent accessibility of the Cys residues was calculated using the PISA server (26).
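The quoted solvent content can be cross-checked from the cell dimensions with a Matthews-coefficient estimate. The sketch below is illustrative only: the cell parameters and the four molecules per asymmetric unit are taken from the text, whereas the molecular mass (147 residues at an average of about 110 Da each) and the 1.23 Å³/Da protein volume factor are assumptions that are not stated in the paper.

```python
# Matthews coefficient and solvent content estimate for the tetragonal crystal form.
# Cell dimensions and molecules per asymmetric unit come from the text.
# ASSUMPTION: molecular mass approximated as 147 residues x 110 Da per residue.

CELL_A = 68.16           # Å
CELL_B = 68.16           # Å
CELL_C = 302.02          # Å
SYMMETRY_OPERATORS = 8   # general positions in space group P4(1)2(1)2
MOLECULES_PER_ASU = 4
ASSUMED_MASS_DA = 147 * 110.0


def matthews_coefficient(a, b, c, n_sym, n_mol, mass_da):
    """Return V_M in Å^3/Da for a cell with all angles equal to 90 degrees."""
    cell_volume = a * b * c
    return cell_volume / (n_sym * n_mol * mass_da)


def solvent_fraction(v_m):
    """Matthews' approximation: protein fills roughly 1.23 / V_M of the cell."""
    return 1.0 - 1.23 / v_m


v_m = matthews_coefficient(CELL_A, CELL_B, CELL_C,
                           SYMMETRY_OPERATORS, MOLECULES_PER_ASU,
                           ASSUMED_MASS_DA)
solvent = solvent_fraction(v_m)
print(f"V_M = {v_m:.2f} A^3/Da, estimated solvent fraction = {solvent:.2f}")
```

With the assumed mass, the estimate lands close to the ~55% solvent content reported above; a different mass assumption shifts the result by a few percent.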
The reduction potential of the disulfide bond in YphP was determined from the thiol−disulfide exchange equilibrium between YphP and glutathione. YphP (10 μM) was incubated in 50 mM sodium phosphate, pH 7.6, containing EDTA (1 mM) and various ratios of reduced/oxidized glutathione, E°′ = −0.252 V (27). The reaction mixture was blanketed with Ar(g), and the reaction was allowed to proceed for 60 min at 30 °C, after which it was quenched and YphP was precipitated by the addition of trichloroacetic acid to 10%. The precipitated protein was washed with acetone and resuspended in 100 mM sodium phosphate, pH 7.6, containing 5,5′-dithiobis(2-nitrobenzoic acid) (DTNB; 0.10 mg/mL). The concentration of thiols was quantified by the ensuing absorbance at 412 nm. The amount of reduced YphP was compared to that of YphP that had been incubated with DTT (1 mM) to obtain the fraction of reduced YphP (f in eq 2). The standard reduction potential of YphP was calculated by nonlinear least-squares analysis of the data with the Nernst equation (eq 2):
$$ f = \frac{e^{(E^{\circ} - E)\,nF/RT}}{1 + e^{(E^{\circ} - E)\,nF/RT}} $$

where n is the number of electrons, F is the Faraday constant (96485 J V⁻¹ mol⁻¹), R is the gas constant (8.314 J K⁻¹ mol⁻¹), T is the temperature (here, 303 K), E is the reduction potential of the solution, and E° is the reduction potential of YphP.
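As a sketch of how the fraction of reduced protein relates to the solution potential in a two-state Nernst model, the following Python functions may help. The constants F, R, T and the glutathione E°′ are from the text. The two-electron stoichiometry (n = 2), the standard form of the glutathione buffer equation, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math

F = 96485.0              # Faraday constant, J V^-1 mol^-1 (from the text)
R = 8.314                # gas constant, J K^-1 mol^-1 (from the text)
T = 303.0                # temperature, K (from the text)
E0_GLUTATHIONE = -0.252  # V (from the text)
N_ELECTRONS = 2          # ASSUMPTION: two-electron thiol-disulfide couple


def solution_potential(gsh_molar, gssg_molar):
    """Potential (V) of a glutathione buffer, GSSG + 2 H+ + 2 e- <-> 2 GSH.

    ASSUMPTION: standard Nernst form with concentrations in mol/L.
    """
    ratio = gsh_molar ** 2 / gssg_molar
    return E0_GLUTATHIONE - (R * T) / (N_ELECTRONS * F) * math.log(ratio)


def fraction_reduced(e_solution, e0_protein):
    """Fraction of protein in the dithiol form at solution potential e_solution (V)."""
    x = math.exp((e0_protein - e_solution) * N_ELECTRONS * F / (R * T))
    return x / (1.0 + x)


def protein_potential(e_solution, f_reduced):
    """Invert fraction_reduced for a single (E, f) observation with 0 < f < 1."""
    return e_solution + (R * T) / (N_ELECTRONS * F) * math.log(f_reduced / (1.0 - f_reduced))


# Hypothetical buffer composition, for illustration only (not data from the paper).
e_buffer = solution_potential(gsh_molar=1e-3, gssg_molar=1e-3)
print(f"E(buffer) = {e_buffer * 1000:.0f} mV")
print(f"f(reduced) for E0 = -130 mV: {fraction_reduced(e_buffer, -0.130):.2f}")
```

In practice, E° would be obtained by least-squares fitting of `fraction_reduced` to the measured fractions across all glutathione ratios, rather than by inverting a single point.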
The crystal structure was solved using the multiwavelength anomalous dispersion (MAD) method, and the atomic model was refined to an R factor of 19.4% (Table 1). The final model contains 584 amino acid residues (from four independent protein molecules), 249 water molecules, and 12 SO₄²⁻ ions. Of the residues, 93.4% are in the most favored regions of the Ramachandran plot, with the remaining residues in additionally allowed regions. Three out of the four molecules show interpretable electron density for all 144 residues with an additional cloning artifact of three residues (SNA) on the N-terminus. The fourth molecule is slightly less well defined, and only residues 1−143 were placed in the electron density. A number of electron density peaks in the solvent region were attributed to sulfate ions. Details of the data collection, refinement, and model quality statistics are listed in Table 1.
The four molecules in the asymmetric unit are arranged into a dimer of dimers in which the major contacts are mediated by an N-terminal helix, unique to YphP and not found in a canonical thioredoxin fold (Figure 1A). A comparison of the tertiary structure of YphP to other known structures using DALI (28) shows that, with the exception of the N-terminal helix, the protein is very similar to a typical thioredoxin, with the highest similarity (Z score of 11.6) to the human thioredoxin (29). The canonical αβαβαββα topology is well conserved, although there are differences in the mutual disposition and lengths of the individual helices and loops (Figure 1B). The tetrameric ensemble appears to constitute a crystallographic artifact, because gel filtration data (not shown) show that the protein is monomeric in solution.
The four crystallographically independent molecules display a degree of conformational variability with respect to the N-terminal α-helix, suggesting that the latter is flexible with respect to the core of the molecule. This may account, at least in part, for the recalcitrance of the wild-type protein to crystallization. Several intermolecular contacts define the molecular packing. Importantly, three out of four YphP molecules in the asymmetric unit form crystal contacts directly via the low-entropy patches generated by the Q100A, E101A substitutions. This pattern suggests that an oligomeric ensemble might first form transiently in solution and then be incorporated into the crystal via surface patches introduced by mutagenesis. A similar mechanism is observed for other proteins crystallized by this approach (Z. S. Derewenda, unpublished).
The CXC motif (Figure 2) is located within a loop following the first β-strand, as predicted by 3D-Jury (1) based on the amino acid sequence. Cys53 is in an extended conformation (ϕ = ~−70°, ψ = ~165°; the dihedral angles are similar in all four molecules), and its position within the active site loop is analogous to the first Cys of the canonical CXXC motif. Residue Gly54 (ϕ = ~−45°, ψ = ~−40°) is the first amino acid of the α1 helix, while Cys55 occupies the position of the second X (Pro in thioredoxin). Of the two cysteines, Cys53 is more solvent exposed (~42 Å²) than Cys55 (~17 Å²), although there is some variability among the four molecules, and there is evidence of static disorder of the side chain of Cys55. The distances between the two Sγ atoms in the four molecules range from 3.6 to 5.9 Å, ruling out intramolecular disulfide bonds in the crystal structure; thus, the crystal structure represents the reduced form of the enzyme, as expected from the presence of reducing agents. The sulfhydryl of Cys53 is involved in two hydrogen bonds: with the hydroxyl Oγ of Ser51 and with Nη1 of Arg121 (Figure 2), although specific distances vary depending on the side chain conformation. Both of these residues are completely conserved within the family, suggesting a functional role. Interactions of arginines with thiolates are known in other redox-regulated proteins. For example, in the hydroperoxide resistance protein Ohr, the active site cysteine Cys60, which is directly involved in peroxide reduction, accepts a hydrogen bond from Nη1 of Arg18 at 3.4 Å (30). In the B. subtilis chaperone Hsp33, the Zn2+-binding Cys235 is within 3.4 Å of Nη1 of Arg86 (31). By contrast, Cys55 has no similar interactions. These different environments suggest that the pKa value of Cys53 is lower than that of Cys55 and that Cys53 could be the key nucleophile in the active site.
In spite of the obvious similarity of the CGC motifs in YphP and the ΔP34 variant of Trx (9), the active sites of these two proteins need not be structurally equivalent. In Trx, the deletion of Pro34 might cause a compensating rotation of the α-helix, although there is no X-ray structure available to prove this notion. Also, given the amphipathic character of this helix, it is difficult to see how a deletion of Pro34 could be accomplished without disturbing the protein’s stability. In fact, the thermal stability of the ΔP34 variant has been shown to be lower than that of the wild-type protein by ~10 °C (9). In YphP, Cys55 simply occupies the place of Pro34 in Trx, while the position immediately downstream is occupied by an Ala, thus avoiding the strain that might characterize the ΔP34 Trx variant.
Figure 3 shows an alignment of representative amino acid sequences of proteins in the DUF1094 family, as listed in the Pfam database (32). This family comprises 102 sequences in 62 species. The majority (80 sequences) are found among the Firmicutes, mostly in the Bacillus and Staphylococcus genera. Interestingly, in these genera there are nearly always two homologous genes of this family in any species. The remaining sequences are in Acidobacteria and Bacteroidetes, and there is only a single gene in each species. The family shows a relatively high level of amino acid conservation with several completely conserved motifs in addition to the CGCA active site. Topologically, all of these conserved motifs are clustered primarily in loops around the active site, suggesting a possible role in substrate recognition. The atypical, N-terminal α-helix is the least conserved fragment in the DUF1094 family. We were unable to identify any homologues of these proteins in eukaryotic genomes.
The overall structural similarity of YphP to thioredoxins and the presence of the CGC motif in the putative active site suggest that the protein, as well as its homologues in the DUF1094 family, constitutes a family of oxidoreductases. To confirm this hypothesis, we carried out functional assays.
First, we determined the reduction potential, E°′, of YphP to be −130 ± 5 mV (Figure 4). As pointed out above, for the known oxidoreductases with CXXC motifs in their active sites, the reduction potentials range from −270 mV in the E. coli thioredoxin (6), a cytosolic reductant, to −122 mV in DsbA, which catalyzes oxidation of proteins secreted into the periplasm (5). The stability of the active site disulfide bond is tuned, in part, by the amino acid residues between the active site cysteines, i.e., the XX residues in the CXXC motif. The value measured for YphP is notably high, consistent with YphP being an isomerase/oxidase rather than a reductase. The only other case in which E°′ has been determined for a CXC motif within an intact protein is the ΔP34 variant of Trx; in that case E°′ ≥ −200 mV, at least 70 mV higher than in wild-type Trx (9). It seems that decreasing the number of intervening residues between the cysteines from two (CXXC) to one (CXC) provides at least one mechanism to increase the reduction potential of a thiol−disulfide oxidoreductase. It is noteworthy, however, that for a synthetic CysGlyCysNH2 peptide the reduction potential is E°′ = −167 mV (9). Thus, the structural scaffold of YphP is also responsible for a further increase of E°′ by about 40 mV.
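The comparison of reduction potentials can be made concrete by converting each difference in E°′ into a standard free energy via ΔG°′ = −nFΔE°′. The sketch below uses the E°′ values quoted in this article; the two-electron stoichiometry (n = 2) and the helper name `delta_g_kj_per_mol` are assumptions introduced for illustration.

```python
F = 96485.0      # Faraday constant, J V^-1 mol^-1
N_ELECTRONS = 2  # ASSUMPTION: two-electron thiol-disulfide couple

# E°' values in mV, as quoted in the text.
POTENTIALS_MV = {
    "Trx (E. coli)": -270,
    "Grx1 (E. coli)": -233,
    "Grx3 (E. coli)": -198,
    "PDI (yeast)": -180,
    "CGC-NH2 peptide": -167,
    "YphP": -130,
    "DsbA": -122,
}


def delta_g_kj_per_mol(e_acceptor_mv, e_donor_mv):
    """Standard free energy (kJ/mol) for electron transfer from donor to acceptor."""
    delta_e_volts = (e_acceptor_mv - e_donor_mv) / 1000.0
    return -N_ELECTRONS * F * delta_e_volts / 1000.0


reference = POTENTIALS_MV["Trx (E. coli)"]
for name, e_mv in sorted(POTENTIALS_MV.items(), key=lambda kv: kv[1]):
    shift = e_mv - reference
    dg = delta_g_kj_per_mol(e_mv, reference)
    print(f"{name:18s} {e_mv:5d} mV  {shift:+4d} mV vs Trx  {dg:7.1f} kJ/mol")
```

On this scale, the 37 mV gap between the CGC peptide and YphP corresponds to roughly 7 kJ/mol, which gives a sense of how much the protein scaffold may contribute beyond the CXC motif itself.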
We also showed that YphP was able to catalyze the isomerization of non-native to native disulfide bonds using tachyplesin I peptide as a substrate, yielding a kcat/KM value of (4.7 ± 0.03) × 10³ M⁻¹ s⁻¹. This activity is ~3% that of protein disulfide isomerase (PDI) (10) and 42% that of ΔP34 thioredoxin (Table 2). To investigate the roles of the cysteine thiols of YphP, we replaced each of the active site cysteine residues with alanine, thus generating single-site C53A and C55A variants. Unexpectedly, both of these variants catalyzed the isomerization of disulfide bonds in the scrambled tachyplesin substrate. The C53A and C55A variants had kcat/KM values of (2.6 ± 0.2) × 10³ and (3.3 ± 0.1) × 10³ M⁻¹ s⁻¹, respectively, only slightly lower than that of the wild-type protein. Yet, when both active site cysteine residues were replaced with alanines, resulting in the C53A/C55A variant, isomerase activity was abolished completely (Table 2).
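For orientation, the relative activities implied by these numbers can be tabulated with a few lines of Python. The kcat/KM values for YphP and its variants are those quoted above. The values for PDI and ΔP34 thioredoxin are back-calculated from the quoted percentages and are therefore approximate, illustrative figures rather than entries from Table 2.

```python
# kcat/KM values in M^-1 s^-1, as quoted in the text.
KCAT_KM = {
    "wild type": 4.7e3,
    "C53A": 2.6e3,
    "C55A": 3.3e3,
}


def relative_activity(variant, reference="wild type"):
    """Ratio of catalytic efficiencies, variant over reference."""
    return KCAT_KM[variant] / KCAT_KM[reference]


# ASSUMPTION: back-calculated from "~3% of PDI" and "42% of ΔP34 thioredoxin";
# these are rough implied values, not measurements reported in the paper.
implied_pdi = KCAT_KM["wild type"] / 0.03
implied_dp34_trx = KCAT_KM["wild type"] / 0.42

for variant in ("C53A", "C55A"):
    print(f"{variant}: {100 * relative_activity(variant):.0f}% of wild type")
print(f"Implied PDI kcat/KM: {implied_pdi:.2g} M^-1 s^-1")
print(f"Implied ΔP34 Trx kcat/KM: {implied_dp34_trx:.2g} M^-1 s^-1")
```

Both single-site variants retain more than half of the wild-type catalytic efficiency, which is the quantitative basis for describing them as only slightly less active.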
The canonical mechanism of disulfide bond isomerization by PDI involves an attack on the substrate disulfide bond by the thiolate of the N-terminal cysteine of the CXXC motif (33−35). A mixed disulfide is formed between PDI and the substrate, enabling the thiol and disulfides of the substrate to react until native disulfide bonds are formed in the substrate, so that PDI is no longer tethered to it by a mixed disulfide. Replacing the N-terminal Cys in PDI with a Ser eliminates enzymatic activity (36). In contrast, replacing the C-terminal Cys eliminates the ability of the enzyme to catalyze substrate reduction and oxidation but not isomerization (7,37). During the rearrangement of disulfides in the substrate protein, a stable mixed disulfide between PDI and the substrate could prevent the release of PDI. The C-terminal cysteine of PDI is thought to provide an “escape” mechanism in which a disulfide is formed within the active site of PDI to allow for substrate release (8−11,37). A similar role has been ascribed to the C-terminal cysteine residue of thioredoxin (38).
The crystal structure of YphP suggests that the protein functions as a disulfide isomerase using a mechanism similar to that of protein disulfide isomerase (Figure 5). In the reduced form, Cys53 is most likely the attacking nucleophile, activated by Arg121. When a stable intermolecular disulfide forms, an escape route is provided by Cys55 which, we suggest, also becomes activated by Arg121 swinging into a new position, so that an intramolecular disulfide can form. This putative mechanism involves a crucial role assigned to Arg121, which is absolutely conserved in the DUF1094 family (Figure 3). We suggest that the escape route involves transient formation of a Cys53−Cys55 disulfide, although we have not confirmed this experimentally.
Oxidized CXC sequences, which are unusual 11-member ring structures, are extremely rare in nature. The only examples in the PDB of high-resolution structures containing such motifs are the thermophilic Ak.1 protease from B. subtilis, 1DBI (39), the L40C variant of the NADH peroxidase, 1NHR (40), and Erv2p, 1JR8 (41) (Figure 6). A similar disulfide is also reported in the Mengo encephalomyelitis virus coat protein (42), although low resolution precludes detailed stereochemical analysis. Sep15 has a redox-active CX(C/U) motif, where “U” refers to selenocysteine and the use of C or U varies between organisms (43). Finally, there is evidence that a CXC disulfide plays a functional role in the chaperone Hsp33 (44), but no structure of the oxidized protein is available. In all three cases where high-resolution data are available, the torsion angles of the disulfide-containing ring suggest significant strain. It is interesting to note that the backbone conformations in the 11-member rings are different: in the Ak.1 protease and in Erv2p the residue in the X position is in the left-handed helical conformation, while in the NADH peroxidase variant a serine occupying the equivalent position is in the right-handed helical conformation, very similar to Gly in YphP (Figure 6). Based on these structural comparisons, it appears that a disulfide bond in YphP is stereochemically possible, with a slight conformational adjustment to Cys55.
The observed activity of the two variants with only one of either of the active site cysteines is easily rationalized in view of the fact that the escape mechanism is not necessary for the enzyme to function in assays with peptides, as described elsewhere (10). In the wild-type protein, Cys53 is likely to be the key nucleophile that attacks the substrate. It is more solvent exposed than Cys55, and the proximity of Arg121 makes its thiolate form more stable. However, in the absence of Cys53, e.g., in the C53A variant, Arg121 could easily swing toward Cys55, which is still sufficiently exposed to be reactive. This arrangement is different from that in Saccharomyces cerevisiae PDI, where the C-terminal Cys is buried and unable to catalyze isomerization in the absence of the N-terminal Cys. Thus, the C53A and C55A variants can both catalyze isomerization.
The B. subtilis YphP is a member of an unusual oxidoreductase superfamily of enzymes (DUF1094) with a natural CXC motif within the active site and a tertiary fold closely reminiscent of the thioredoxins. The experimentally determined reduction potential, E°′, is −130 ± 5 mV. Functional assays show that YphP has a level of disulfide isomerase activity that might be biologically significant with natural substrates. The enzymes in this family are highly conserved and occur primarily in the Staphylococcus and Bacillus species. All of these organisms contain canonical genes that encode thioredoxins, which are cellular reducing agents. It is possible that, unlike thioredoxins, the DUF1094 enzymes are cellular oxidizing agents or in vivo catalysts of oxidative protein folding. Further biological studies will be needed to answer this question.
We thank Dr. Frank Collart (Argonne National Laboratory) for providing an overexpressing clone of the YphP protein and Natalya Olekhnovitch (University of Virginia) for excellent technical assistance and the preparation of the YphP variants. The authors thank the staff at the BL 5.0.2 managed by the Berkeley Center for Structural Biology (BCSB) at the Advanced Light Source (ALS) for technical support. The BCSB is supported in part by the National Institutes of Health, National Institute of General Medical Sciences.
National Institutes of Health, United States
†This study was supported by NIH NIGMS Grant U54 GM074946-01US (Z.S.D.), Grant GM044783 (R.T.R.), and Grant U54 GM074942 (A.J.). K.L.G. was supported by Chemistry−Biology Interface Training Grant T32 BM008505 (NIH). The ALS is supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
‡The atomic coordinates and structure factors have been deposited in the Protein Data Bank, www.rcsb.org (PDB ID code 3FHK).