Elucidating the substrate specificity of an enzyme from its amino acid sequence and/or three-dimensional structure alone is a difficult and demanding problem for most proteins. The rapid sequencing of whole bacterial genomes has produced an explosion of uncharacterized enzymes whose functions cannot be reliably annotated on the basis of traceable homology to proteins of known function. Our approach to this problem is to address an entire enzyme superfamily simultaneously in an attempt to reveal the evolutionary lineages for the emergence of new catalytic functions and to provide a structural framework for deciphering substrate profiles from amino acid sequences alone. Over 6,000 proteins with unique amino acid sequences have been classified as members of the amidohydrolase superfamily (24). To date, more than 30 distinct reactions have been shown to be catalyzed by members of this superfamily using amino acid, carbohydrate, and nucleic acid derived substrates (5).
The substrate specificity of two related enzymes was examined in this investigation. Cc0300 from C. crescentus and Sgx9355e from an unknown bacterium found in the Sargasso Sea are currently annotated as Xaa-Pro dipeptidases in the NCBI database. Sequence comparisons have established that both proteins belong to the amidohydrolase superfamily. In this study the two enzymes were purified to homogeneity, and the enzyme from the Sargasso Sea was crystallized and its three-dimensional structure determined. Cc0300 is a zinc metalloprotein and the active site is likely to be populated by a binuclear metal center, similar to that found previously for urease and phosphotriesterase (25). In this binuclear metal center, the two metal ions are coordinated by an aspartic acid and four histidine residues. The metal ions are also bridged by a carboxylated lysine residue and a hydroxide from solvent. The structure of Sgx9355e is consistent with this conclusion but, unfortunately, the metal ions were lost during purification and/or crystallization.
The substrate specificities of Cc0300 and Sgx9355e were determined using small libraries of potential dipeptide substrates. Contrary to initial expectations, neither of these enzymes was able to hydrolyze any peptide that contained proline at the C-terminus. However, both enzymes were able to catalyze the hydrolysis of dipeptides that terminated in a hydrophobic amino acid, with no specificity for the amino acid at the N-terminus. N-acetyl and N-formyl derivatives of hydrophobic L-amino acids were also hydrolyzed by these two enzymes. Limited experiments with tripeptides demonstrated that hydrophobic amino acids can be released from the C-terminus, and thus these two enzymes are more accurately classified as carboxypeptidases with a requirement for a hydrophobic amino acid at the C-terminus.
The X-ray structure of Sgx9355e was determined with L-methionine bound in the active site. The positioning of methionine in the active site identifies the amino acid residues that determine the substrate specificity of this enzyme. In this complex L-methionine is positioned as the product derived from the hydrolysis of the C-terminal end of an oligopeptide. The α-carboxylate of methionine is ion-paired with His-237, a residue that originates from the loop that follows β-strand 5. The α-carboxylate of L-methionine is also hydrogen bonded to the backbone amide groups of Val-201 and Leu-202. The α-amino group of the bound L-methionine interacts with Asp-328. This residue is found at the end of β-strand 8 and is also expected to ligate the divalent metal ion in the Mα position. The interaction of the α-amino group of the C-terminal amino acid product with Asp-328 is consistent with this residue functioning as a general acid/base catalyst in the transfer of the proton from the bridging hydroxide to the leaving group amine (28). The side chain of the L-methionine is found in a small pocket that is formed from Thr-277, Ala-302, Val-305, and Ile-309. Thr-277 is found in a very short loop that immediately follows β-strand 7, whereas Ala-302, Val-305, and Ile-309 are found in a pair of helices that follow this loop. All of these residues, with the exception of Ile-309, are conserved in the amino acid sequence of Cc0300. In Cc0300 the isoleucine is substituted with a methionine.
We have previously determined the substrate profiles for two other peptidases from C. crescentus, Cc2672 and Cc3125 (9). Cc2672 is specific for the hydrolysis of oligopeptides that terminate in either L-lysine or L-arginine. The specificity of Cc3125 is more restrictive in the sense that only dipeptides (but not longer peptides) that terminate in L-lysine or L-arginine represent productive substrates. In addition to these two proteins we have determined the structure and substrate specificity of Sgx9359b from the Sargasso Sea (gi|44368820). This enzyme is specific for the hydrolysis of oligopeptides that terminate in L-arginine. The mode of binding of L-arginine to the active site of Sgx9359b is presented in . In this structure the α-carboxylate is ion paired to His-225 and hydrogen bonded to the backbone amide groups from Val-189 and Met-190. The α-amino group interacts with the side chain carboxylate of Asp-315. These residues are homologous to those residues found to interact with L-methionine in the structure of Sgx9355e (). As expected, the structural determinants for the recognition of the guanidino group of the bound L-arginine in Sgx9359b are different. The guanidino group is hydrogen bonded with the side chain amide group of Gln-296 and ion paired with the side chain carboxylates of Glu-289 and Asp-265. These residues occur in positions similar to those of three amino acids that define the substrate recognition in Sgx9355e (Ile-309, Ala-302, and Thr-277).
A sequence alignment of Sgx9355e, Cc0300, Cc2672, Cc3125, and Sgx9359b is presented in Figure 7. In this alignment a total of 71 residues are fully conserved among the five proteins. All six of the amino acids expected to bind to the two divalent cations in the active site are conserved, as is the histidine that ion pairs with the C-terminal carboxylate of peptide substrates. The three enzymes that recognize positively charged amino acids at the C-terminus of potential substrates (Cc2672, Cc3125, Sgx9359b) have either anionic or highly polar residues at the binding site, indicated by black triangles in Figure 7. Conversely, the two enzymes that recognize hydrophobic side chains at the C-terminus of potential substrates (Sgx9355e and Cc0300) have less polar amino acids at the equivalent positions.
Figure 7. Amino acid sequence comparison for Sgx9355e (gi|44371129), Cc0300 (gi|16124555), Cc2672 (gi|16126907), Cc3125 (gi|16127355) and Sgx9359b (gi|44368820). The predicted metal ligands for the two divalent metal ions are indicated with ovals.
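Counting fully conserved positions in an alignment such as this one is simple to script. The following is a minimal sketch, assuming the aligned sequences are available as equal-length strings with "-" for gaps. The function name and the toy sequences are hypothetical and are not taken from the five proteins compared here.

```python
def fully_conserved_columns(aligned):
    """Return 0-based column indices at which every sequence carries the
    same residue and no sequence has a gap ("-")."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    # zip(*...) walks the alignment column by column.
    for index, column in enumerate(zip(*aligned.values())):
        if "-" not in column and len(set(column)) == 1:
            conserved.append(index)
    return conserved


# Toy alignment for illustration only (not real sequence data).
toy_alignment = {
    "seq_a": "MHD-KTA",
    "seq_b": "MHDGKSA",
    "seq_c": "MHE-KTA",
}

print(fully_conserved_columns(toy_alignment))  # [0, 1, 4, 6]
```

Applied to a real five-sequence alignment, the length of the returned list would correspond to the number of fully conserved residues.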
The discovery of function for Cc0300 and Sgx9355e, coupled with the structure elucidation of Sgx9355e with a bound product, permits annotation of other proteins in the amidohydrolase superfamily with greater confidence. A search of the NCBI database of completely sequenced bacterial genomes finds 29 other protein sequences that are now predicted by analogy to hydrolyze short oligopeptides that terminate in hydrophobic amino acids. All of these proteins have a conserved histidine equivalent to His-237, threonine and alanine residues equivalent to Thr-277 and Ala-302, and an isoleucine, methionine, leucine, or valine residue equivalent to Ile-309 in the structure of Sgx9355e. A list of these sequences is presented in .
Proteins predicted to share substrate specificity with Cc0300 and Sgx9355e.
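The annotation criterion described above can be written as a simple residue-signature filter. The sketch below assumes that each candidate sequence has already been aligned to Sgx9355e and that the residues equivalent to His-237, Thr-277, Ala-302, and Ile-309 have been extracted into a dictionary keyed by Sgx9355e numbering. The function name, the dictionary layout, and the example entries are illustrative assumptions, not the procedure actually used to search the database. The "Sgx9359b-like" entry further assumes that Asp-265, Glu-289, and Gln-296 correspond to Thr-277, Ala-302, and Ile-309, respectively.

```python
# Residues allowed at positions equivalent to those of Sgx9355e,
# as stated in the text (one-letter amino acid codes).
SIGNATURE = {
    237: {"H"},                 # histidine that ion pairs with the C-terminal carboxylate
    277: {"T"},                 # threonine in the side chain pocket
    302: {"A"},                 # alanine in the side chain pocket
    309: {"I", "M", "L", "V"},  # hydrophobic residue in the side chain pocket
}


def matches_hydrophobic_signature(equivalent_residues):
    """Return True if the candidate carries an allowed residue at every
    signature position. `equivalent_residues` maps Sgx9355e residue numbers
    to the one-letter residue found at the aligned position of the candidate."""
    return all(
        equivalent_residues.get(position) in allowed
        for position, allowed in SIGNATURE.items()
    )


# Illustrative inputs (hypothetical dictionaries, residues taken from the text).
candidates = {
    "Sgx9355e": {237: "H", 277: "T", 302: "A", 309: "I"},
    "Cc0300": {237: "H", 277: "T", 302: "A", 309: "M"},
    "Sgx9359b-like": {237: "H", 277: "D", 302: "E", 309: "Q"},
}

for name, residues in candidates.items():
    print(name, matches_hydrophobic_signature(residues))
```

A filter of this kind only flags candidates for a predicted specificity. Any such assignment would still need to be confirmed experimentally, as was done here for Cc0300 and Sgx9355e.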