HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
Structure. Author manuscript; available in PMC 2009 July 22.
Published in final edited form as:
PMCID: PMC2714228

Discovery of a Dipeptide Epimerase Enzymatic Function Guided by Homology Modeling and Virtual Screening


We have developed a computational approach to aid the assignment of enzymatic function for uncharacterized proteins that uses homology modeling to predict the structure of the binding site and in silico docking to identify potential substrates. We apply this method to proteins in the functionally diverse enolase superfamily that are homologous to the characterized L-Ala-D/L-Glu epimerase from Bacillus subtilis. In particular, a protein from Thermotoga maritima was predicted to have different substrate specificity, which suggests that it has a different, but as yet unknown, biological function. This prediction was experimentally confirmed, resulting in the assignment of epimerase activity for L-Ala-D/L-Phe, L-Ala-D/L-Tyr, and L-Ala-D/L-His, whereas the enzyme is annotated incorrectly in GenBank as muconate cycloisomerase. Subsequently, crystal structures of the enzyme were determined in complex with three substrates, showing close agreement with the computational models and revealing the structural basis for the observed substrate selectivity.


Reliable assignment of function to proteins discovered in genome sequencing projects is a major challenge in genomic biology. Functional assignment of uncharacterized proteins is commonly accomplished by sequence analysis, but the assignment of function on the basis of homology can lead to incorrect or misleading annotations and cannot identify new functions. Assigning enzymatic function to proteins identified in genome sequencing efforts is challenging, in part because there is no simple relationship between measures of sequence similarity (e.g., sequence identity) and protein function. Highly similar proteins (60% sequence identity or greater) can catalyze distinct reactions, whereas highly divergent proteins can catalyze identical chemical transformations (Broun et al., 1998; de Souza et al., 1998; Seffernick et al., 2001; Tian and Skolnick, 2003). Misannotation of enzymatic function is severe in functionally diverse superfamilies, such as the enolase superfamily of (β/α)8 barrel enzymes considered here (Pegg et al., 2006; A. Schnoes and P.C.B., unpublished data). We and others have developed computational methods intended to help address this challenge. A central theme in several of these approaches is the use of structural as well as sequence information. One class of methods analyzes features of active sites (Barker and Thornton, 2003; Cammer et al., 2003; Polacco and Babbitt, 2006; Tremblay et al., 2006; Wangikar et al., 2003). Virtual metabolite screening against active sites has also been used successfully to identify enzyme substrates in retrospective (Favia et al., 2008; Hermann et al., 2006; Kalyanaraman et al., 2005; Macchiarulo et al., 2004) and prospective (Hermann et al., 2007; Song et al., 2007) studies. In general, this approach cannot be expected to predict the optimal substrate for an enzyme in terms of kcat/KM, in part because kcat is extremely difficult to predict. 
However, as in the field of drug design, virtual screening can help to prioritize for experimental testing compounds that are more likely to bind to the active site, which is a prerequisite for catalytic activity.

Despite remarkable advances in structural biology, including the contributions of the Protein Structure Initiative (PSI), the number of protein sequences identified through genome sequencing continues to vastly outpace the rate of structure determination. Consequently, for the foreseeable future, structural models of most protein sequences will only be available by homology modeling approaches. In principle, sufficiently accurate computational methods would enable the construction of models that could be used as surrogates for experimentally determined structures, for example for drug discovery (Jacobson and Sali, 2004; Kenyon et al., 2006) or for understanding sequence-structure-function relationships, our focus here. In a previous study, we demonstrated that it was possible to predict the substrate specificity of a divergent member of the enolase superfamily encoded by Bacillus cereus, based on docking against a homology model (Song et al., 2007). Subsequent enzymatic characterization and crystallographic analysis confirmed the predictions.

Here, we have undertaken functional assignment for proteins in the enolase superfamily that are related most closely to the experimentally characterized L-Ala-D/L-Glu epimerases (AEEs) from Escherichia coli and B. subtilis, which are believed to be involved in recycling peptidoglycan (Klenchin et al., 2004). Although these proteins have a variety of annotations in GenBank, the most likely annotation based on careful phylogenetic analysis is AEE (Glasner et al., 2006). However, some evidence suggested that some of these proteins might have other functions; for example, this clade of proteins contains a few sequences from plants and archaea that lack peptidoglycan and hence have no obvious reason to encode the AEE function in their genomes. The purpose of this study was to identify sequences in this clade that might have alternative functions and to identify their substrates. In particular, we identify an enzyme with dipeptide epimerase activity in Thermotoga maritima with a novel specificity for dipeptides with alanine in the first position and aromatic amino acids in the epimerized position. Specifically, L-Ala-L-Phe, L-Ala-L-Tyr, and L-Ala-L-His are epimerized with values of kcat/KM ~10⁴ M⁻¹ s⁻¹. Crystal structures of the enzyme have been determined in complex with three substrates, showing close agreement with the computationally predicted models and revealing the structural basis for the observed substrate selectivity.


Homology models were constructed for over 100 proteins in the MLE subgroup, including 82 sequences that clustered with the experimentally characterized E. coli and B. subtilis AEEs, by using the Protein Local Optimization Program from a multiple sequence alignment, as described in Experimental Procedures. Only 65 of these proteins are shown in the phylogenetic tree (Figure 1), which shows a representative subset sharing < 60% sequence identity. The template protein for all of the homology models was chosen to be 1TKK, the AEE from B. subtilis in complex with the L-Ala-L-Glu substrate. Apo structures are also available for both the B. subtilis and E. coli AEEs, but the binding sites are partially open, making them poorly suited to our purposes (Kalyanaraman et al., 2005).

Figure 1
Phylogenetic Tree of a Representative Subset of the Dipeptide Epimerase Group of the Enolase Superfamily

In general, one challenge associated with metabolite virtual screening is that existing metabolite libraries are undoubtedly incomplete (for example, the specific dipeptides that are shown to be substrates here are not included in KEGG). We hypothesized that all of the proteins considered in this study (Figure 1) were likely to be dipeptide epimerases, based on a phylogenetic tree of a larger subgroup in which the AEEs form a single clade (Glasner et al., 2006), and the conservation of catalytic residues and a DxD motif involved in binding the NH3+ terminus of the dipeptide in the AEEs. Accordingly, we restricted the virtual screening to the 400 possible L/L dipeptides. For computational efficiency, the protein was treated as rigid.
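The size of the screening library follows directly from the 20 standard amino acids taken two at a time with order preserved (20 × 20 = 400). A minimal sketch of that enumeration is given below; the one-letter alphabet and the function name are illustrative assumptions and are not part of the docking workflow used in this study. Note that Gly is achiral, so the L/L label is nominal for Gly-containing entries.

```python
from itertools import product

# One-letter codes for the 20 standard proteinogenic amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_dipeptides(alphabet: str = AMINO_ACIDS) -> list[str]:
    """Return every ordered (N-terminal, C-terminal) pair as a two-letter string."""
    return [n_term + c_term for n_term, c_term in product(alphabet, repeat=2)]


dipeptides = enumerate_dipeptides()
print(len(dipeptides))       # size of the L/L dipeptide library
print("AE" in dipeptides)    # L-Ala-L-Glu, the AEE substrate, is included
```

Because order matters (Ala-Phe and Phe-Ala are distinct compounds), a Cartesian product rather than a set of combinations is the appropriate construction.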

In control docking calculations with the B. subtilis AEE structure, L-Ala-L-Glu ranked 8 out of the 400 dipeptides; most of the other top-ranked dipeptides also had Glu or Asp at the epimerized position, and a small amino acid (Gly, Cys, Ser, Ala) in the first position (Table 1). Docking against a homology model of the E. coli AEE (32% sequence identity) led to similar results. It should be noted that both the E. coli and B. subtilis AEEs epimerize dipeptides other than L-Ala-L-Glu, which is believed to be the physiologically relevant substrate, albeit with slower kinetics. For example, both epimerize L-Ser-L-Glu and L-Ala-L-Met, and the E. coli AEE, which is the less specific of the two, epimerizes substrates such as L-Ala-L-His and L-Ala-L-Gln (Schmidt et al., 2001). Kinetic constants have been measured for only selected substrates, but suggest roughly 1 order of magnitude slower kinetics for nonphysiological substrates, i.e., kcat/KM for epimerizing L-Ala-D-Glu is 7.7 × 10⁴ and 4.7 × 10⁴ (M⁻¹ s⁻¹) for the E. coli and B. subtilis AEEs, respectively, whereas the corresponding rates for L-Ala-D-Met are 2.8 × 10³ and 2.2 × 10³.

Table 1
Top L/L Dipeptides from Docking against the Homology Models of TM0006, the E. coli AEE, and Four Other Representative Proteins, as Well as the Template Used for Those Models

For most of the other homology models, especially those clustering relatively closely with the E. coli and B. subtilis AEEs, the docking results were similar to those obtained with the E. coli and B. subtilis AEEs, and thus consistent with the AEE activity. That is, the top hits were dominated by compounds with small amino acids in the first position, and negatively charged amino acids in the second, epimerized position. Four representative examples are shown in Table 1.

However, for ~20 of the proteins, the predicted specificities were dramatically different. Two major classes of novel predicted specificity were observed: a small number of enzymes (6) were predicted to epimerize positively charged dipeptides, and a somewhat larger number (~15) were predicted to epimerize hydrophobic (in both C- and N-terminal positions) dipeptides. Of these, we have obtained extensive experimental results (kinetics and multiple crystal structures) for the protein from Thermotoga maritima (gi:15642781, TM0006), confirming the computational predictions. Screening and structural studies are underway for several others, and those studies will be reported in due course.

The docking results for the homology model of TM0006, which shares 27% sequence identity with the B. subtilis AEE, are shown in Table 1. In the C-terminal, epimerized position, the docking results suggested selectivity for primarily aromatic, hydrophobic amino acids, instead of the strong selectivity for Glu in B. subtilis AEE. In the N-terminal position, top hits included Ser/Thr/Cys as well as larger hydrophobic amino acids such as Ile.

Experimental screening of L/L dipeptide libraries by mass spectroscopy (MS) confirmed the specificity switch (Table 2). In the Gly-Xxx, Ala-Xxx, and Thr-Xxx libraries, the best substrates had Phe, Tyr, or Trp in the epimerized position. Aliphatic side chains (Met, Leu, Ile) were also tolerated, and Ala-His and Thr-His were good substrates. In the N-terminal position, any hydrophobic amino acid was tolerated in the Xxx-Phe, Xxx-Tyr, and Xxx-His libraries. Dipeptides with charged, or most polar amino acids in the first position were usually poor substrates. Furthermore, the enzyme displayed no detectable muconate lactonizing enzyme (MLE) activity (results not shown), demonstrating that the GenBank and UniProt/TrEMBL annotations are incorrect. AEEs are part of a larger subgroup within the enolase superfamily, whose members are more similar to each other than other subgroups within the superfamily. The known functions of the subgroup are MLE, o-succinylbenzoate synthase, and racemization of N-succinyl or N-acetyl amino acids. No activity for these other assigned functions within the MLE subgroup was observed (data not shown).

Table 2
Experimental Screening of TM0006 with L/L Dipeptides by Mass Spectroscopy to Detect Incorporation of Deuterium as a Result of Epimerization

Although MS screening of dipeptide libraries allowed us to simultaneously evaluate multiple substrates efficiently, we were only able to ascertain a rough approximation of activity. However, taking the MS screening results as a whole allowed us to prioritize our choice of substrates to carry out full kinetic assays. Kinetic constants were determined for selected dipeptide substrates by observing the change in optical rotation by polarimetry (Table 3). Using the E. coli and B. subtilis AEEs as standards, we expected that authentic substrates would exhibit values of kcat/KM in the 10⁴ M⁻¹ s⁻¹ range (Schmidt et al., 2001). Of the L-Ala-L-Xxx dipeptides assayed, L-Ala-L-Phe and L-Ala-L-His displayed values of (1.2 ± 0.2) × 10⁴ M⁻¹ s⁻¹ and (1.3 ± 0.6) × 10⁴ M⁻¹ s⁻¹, respectively. Although generally grouped with polar amino acids, histidine is also aromatic. Likewise, we found that L-Ala-L-Tyr was also epimerized with an appreciable efficiency of (9.1 ± 0.8) × 10³ M⁻¹ s⁻¹. In order to minimize the possibility that the authentic substrate was overlooked during mass spectroscopic screening, additional L-Ala-L-Xxx dipeptides were characterized. These dipeptides were specifically chosen to systematically sample the different classes of amino acid side chains in the second position, regardless of apparent turnover in MS assays. We found that L-Ala-L-Glu and L-Ala-L-Leu were epimerized with values of kcat/KM of (4.9 ± 1) × 10³ M⁻¹ s⁻¹ and (3.8 ± 1) × 10³ M⁻¹ s⁻¹, respectively. These results indicate that, although not optimal, negative and aliphatic side chains can also be accommodated in the C-terminal position. Finally, low turnover of L-Ala-L-Lys, (3.6 ± 0.2) × 10² M⁻¹ s⁻¹, indicates that a positively charged group in the epimerized position is detrimental.

Table 3
Kinetic Constants Obtained for Epimerization of Selected Dipeptide Substrates of TM0006

Kinetic constants were also determined for selected compounds from the L-Xxx-L-Phe and L-Xxx-L-His series. Although most of the dipeptides analyzed could serve as substrates, none had kinetic constants that approached the values of kcat/KM of 10⁴ M⁻¹ s⁻¹ observed for dipeptides with L-Ala in the first position. Some compounds such as L-Phe-L-Phe exhibited low values of kcat (0.21 s⁻¹), whereas others such as L-Lys-L-Phe and L-Ile-L-Phe had high values of KM. No detectable activity was observed for epimerization of L-Asp-L-Phe. Although L-Ala-L-His was a favored substrate with the value of kcat/KM essentially the same as that for L-Ala-L-Phe, other L-Xxx-L-His dipeptides were problematic substrates, with either no activity, inability to reach saturation, or evidence of substrate inhibition (Table 3). Taken together, the results support L-Ala as the optimal N-terminal residue.

Although the kinetic parameters determined for L-Ala-L-Phe, L-Ala-L-Tyr, and L-Ala-L-His at room temperature are in the range we expected for an authentic dipeptide epimerase, T. maritima is a hyperthermophile whose optimal growth occurs at 80°C. Although we were unable to perform the assays at temperatures elevated to this level, we were able to examine epimerization of L-Ala-L-Phe at 40°C and 50°C; the values of kcat/KM were found to be (4.1 ± 0.9) × 10⁴ M⁻¹ s⁻¹ and (5.4 ± 0.4) × 10⁴ M⁻¹ s⁻¹ at 40°C and 50°C, respectively. The values of kcat approximately double with each ~10°C increase (from 16 ± 7 s⁻¹ at 28°C to 35 ± 6 s⁻¹ at 40°C, and to 76 ± 20 s⁻¹ at 50°C). From these results we conclude that the measured kinetic parameters likely underestimate the physiological efficiency of the enzyme. The physiologically relevant substrate is currently unknown, but we consider L-Ala-L-Phe, L-Ala-L-Tyr, and L-Ala-L-His to be the most likely candidates based on their kinetic constants.
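The statement that kcat roughly doubles per 10°C can be made quantitative with the conventional Q10 temperature coefficient, Q10 = (k2/k1)^(10/(T2 − T1)). The sketch below applies that standard formula to the central kcat values quoted above; it is an illustrative calculation, not an analysis performed in this study, and it ignores the reported uncertainties. The function and variable names are assumptions.

```python
def q10(k1: float, k2: float, t1_c: float, t2_c: float) -> float:
    """Temperature coefficient: fold change in rate per 10 degree C increase."""
    return (k2 / k1) ** (10.0 / (t2_c - t1_c))


# Central kcat values (s^-1) for L-Ala-L-Phe at each assay temperature (deg C).
kcat = {28: 16.0, 40: 35.0, 50: 76.0}

q10_low = q10(kcat[28], kcat[40], 28, 40)   # 28 -> 40 C spans 12 degrees
q10_high = q10(kcat[40], kcat[50], 40, 50)  # 40 -> 50 C spans 10 degrees

print(f"Q10 (28-40 C): {q10_low:.2f}")
print(f"Q10 (40-50 C): {q10_high:.2f}")
```

Normalizing to a 10°C interval matters here because the first interval spans 12°C, so the raw ratio of 35/16 slightly overstates the per-10°C change.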

The homology model revealed the structural basis for the change in specificity (Figure 2). One critical determinant of specificity in the B. subtilis and closely related AEEs is Arg24, which coordinates the Glu side chain of the L-Ala-L-Glu ligand. The corresponding residue in TM0006 is Ser25 (Figure 3). Other members of the dipeptide epimerase group also have substitutions at this position, including the E. coli AEE, which has Gly24 at the equivalent position. The specificity for Glu in E. coli AEE and related proteins is provided by Arg and Lys side chains at other positions within the same pocket. The pocket in TM0006, however, is primarily hydrophobic, accounting for the change in specificity. With respect to the N-terminal position of the substrate, the ability to accommodate side chains larger than Ala/Ser/Thr is conferred in part by the substitution of Gly294 at the position equivalent to Ile298 in the B. subtilis AEE.

Figure 2
Stereo View Depictions of the Dipeptide-Binding Site in the B. subtilis AEE and TM0006
Figure 3
Portions of the Multiple Sequence Alignment of TM0006, Several of Its Closest Homologs Based on the Phylogenetic Tree, and the E. coli and B. subtilis AEEs

The crystal structure of TM0006 was subsequently determined as an apo structure as well as in complex with L-Ala-L-Phe, L-Ala-L-Leu, and L-Ala-L-Lys, at 1.9–2.3 Å resolution (Figure 4). During the preparation of this manuscript, an apo structure for an ortholog of TM0006 was released in the PDB (2ZAD; currently unpublished). This structure was not available when this work was performed, and it agrees closely with the apo structure determined here. The experimentally determined structure of the L-Ala-L-Phe complex is superimposed on the model generated by homology modeling and docking in Figure 5. The experimental structure confirmed the proposed binding mode; the ligands superimpose almost perfectly. The positions of most of the protein side chains in the immediate vicinity of the ligand were also predicted accurately, reflecting no major errors in the sequence alignment used to generate the homology model. The greatest discrepancy is between the predicted and observed position of Arg54, which forms a salt-bridging interaction with Glu242 in the crystal structure. In the computational model, Arg54 is swung out into solution. This error may be due to a slight shift in the backbone near Arg54 between the homology model and the crystal structure, to a limitation of the energy function used for constructing the homology model, or both. Arg54 may play some role in substrate specificity, because it comes within 4 Å of the Phe side chain of the dipeptide ligand in the crystal structure, possibly forming a favorable cation-pi interaction.

Figure 4
Overview of Structures of TM0006 Obtained by X-Ray Crystallography
Figure 5
Superposition of the Models of L-Ala-L-Phe Bound to TM0006, Based on Homology Modeling and Docking and Crystallography

The active sites of the other holo structures are shown in Supplemental Data (available online). The complex with L-Ala-L-Lys was determined to elucidate the structural basis for the relatively slow but detectable epimerization for this dipeptide, which is positively charged, in contrast to most of the other substrates, which are hydrophobic. The structure of the complex of TM0006 with L-Ala-L-Lys reveals that the positively charged nitrogen of the Lys side chain extends slightly out of the binding pocket through a narrow opening and is coordinated by water molecules.


We know of no other enzymes that epimerize hydrophobic dipeptides. The dipeptide epimerase from T. maritima clusters with a few other sequences that we also predict not to be AEEs, based on the sequence analysis, homology models, and docking results (Figures 1 and 3). So far, it has not been possible to express these proteins in soluble form for experimental screening. The other members of this small group include proteins from other thermophiles, Caldicellulosiruptor saccharolyticus and Chloroflexus aurantiacus, and an archaeon, Haloarcula marismortui. Most strikingly, the group also contains sequences from two plant genomes, Oryza sativa (rice) and Arabidopsis thaliana. Although the physiological relevance is not clear, the presence of closely related dipeptide epimerases in organisms lacking peptidoglycan is consistent with the change in specificity that we have described.

This study highlights the challenges facing functional assignment of enzymes, as well as a promising approach for overcoming some of these challenges. A central challenge is delineating, in sequence space, where one function ends and another begins. Overall sequence similarity among proteins is often unreliable, especially in mechanistically diverse superfamilies (Pegg et al., 2006). Changes in enzymatic function are often related to sequence changes in the binding site, and, as shown here, homology models can be used to identify proteins that are likely to have different functions than their closest functionally and structurally characterized homologs.

Docking methods can be used in conjunction with homology models to suggest specific small molecules that may be substrates, and, importantly, can suggest novel enzymatic functions, as we, to our knowledge, have done here. At this point, experimental testing remains necessary to confirm or refute these hypotheses. Predicting catalytic rates remains extremely difficult, and we have not attempted to do so. Predicting substrates likely to bind, as well as their binding modes, is more tractable, although still challenging, especially with homology models (Jacobson and Sali, 2004). In this case, most of the top dipeptides from docking were in fact substrates. Although the top docking hits were not necessarily the best substrates (L-Ala-L-Phe ranked 48 out of 400 dipeptides), they did capture the correct specificity at the epimerized position for aromatic side chains. Most importantly, the striking differences between the docking hit lists for TM0006 and the B. subtilis and E. coli AEEs allowed us to identify TM0006 as a candidate for experimental screening based on the high likelihood of it epimerizing distinct substrates.

We believe that the integrated use of computational methods (multiple sequence alignment, operon context, phylogenetic trees, homology modeling, and docking) applied on the scale of hundreds or thousands of proteins, in combination with experimental characterization (functional enzymology and structural biology) of a relatively small number of proteins that are predicted to have new functions, will be a powerful approach for accurate and large-scale functional annotation.


Computational Methods

All protein sequences annotated in the Structure Function Linkage Database (Pegg et al., 2005, 2006) as belonging to the muconate lactonizing enzyme (MLE) subgroup were used to construct the multiple sequence alignment. The proteins were first aligned by using Muscle v.3.52 (Edgar, 2004), and the initial alignment was manually refined by referring to structural alignments of the characterized MLE subgroup members (Glasner et al., 2006). The phylogenetic tree was constructed by using MrBayes v3.1.2 (Altekar et al., 2004; Ronquist and Huelsenbeck, 2003) under the WAG amino acid substitution model (Whelan and Goldman, 2001) and a gamma distribution to approximate the rate variation among sites. Positions in the alignment that had too many gaps or appeared to be mutationally saturated were excluded from phylogenetic analysis. Accession numbers and species abbreviations are listed in Table S1.

Homology models were created for over 100 sequences in the MLE subgroup, including 82 sequences that clustered with the experimentally characterized AEEs from B. subtilis and E. coli according to the phylogeny (Glasner et al., 2006). At the time of our investigation, there were only three crystal structures available among the sequences in this clade: holo (cocrystallized with L-Ala-L-Glu) and apo structures of B. subtilis AEE and an apo structure of the E. coli AEE. We used the holo structure of the B. subtilis AEE (1TKK) as a template to construct models for the 82 sequences. The models were built by using the Protein Local Optimization software (marketed as Prime by Schrödinger LLC). While constructing the models, we included both the metal ion and the cocrystallized ligand from the template. After building the models, we docked a dipeptide library against the binding site of these models by using the software Glide (v4.0108, Schrödinger LLC). The dipeptide library was prepared by using the software Ligprep (v2.0106, Schrödinger LLC).

Cloning, Expression, and Purification of the Dipeptide Epimerase from Thermotoga maritima

The gene for the dipeptide epimerase (gi:15642781) was amplified by PCR from Thermotoga maritima MSB8 genomic DNA by using the following primers: 5′-GGAGGTGTGACATATGTCGAGGATCGTGAACGTGAAGC-3′ and 5′-GAACT GCTGGATCCTCATTGATCTTTCACCCTCATTCTCG-3′ (Bio-Synthesis, Inc.) containing a 5′ NdeI site and a 3′ BamHI site, respectively. PCR reactions in 100 μl total volume contained 1 ng template, 1 mM MgSO4, 2.5 U platinum Pfx DNA polymerase (Invitrogen), 1× Pfx amplification buffer, 1× enhancer buffer, 0.4 mM of each dNTP, and 0.2 μM of each forward and reverse primer. The PCR reaction was performed with the following parameters: 94°C for 3 min, followed by 40 cycles of 94°C for 1 min, 47°C for 1.25 min, and 68°C for 3 min; the final extension time was 10 min at 68°C. After purification by gel extraction (QIAGEN), the amplified PCR product was restricted by using NdeI and BamHI restriction enzymes (New England Biolabs) per the manufacturer’s protocols. The gene was then ligated into the nontagged expression vector pET17b (Novagen) by using T4 DNA ligase (Fisher) and was transformed in E. coli XL1Blue cells for plasmid amplification and maintenance.
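As a quick sanity check on the cloning design, the primers can be scanned for the recognition sequences of the two restriction enzymes (NdeI, CATATG; BamHI, GGATCC). The sketch below is illustrative only: the primer strings are copied from the text above with whitespace removed, and the helper name is an assumption.

```python
# Recognition sequences of the two restriction enzymes used for cloning.
RECOGNITION_SITES = {"NdeI": "CATATG", "BamHI": "GGATCC"}

# Primers as given in the text (5' -> 3'); any whitespace is stripped below.
FORWARD_PRIMER = "GGAGGTGTGACATATGTCGAGGATCGTGAACGTGAAGC"
REVERSE_PRIMER = "GAACT GCTGGATCCTCATTGATCTTTCACCCTCATTCTCG"


def find_site(primer: str, enzyme: str) -> int:
    """Return the 0-based index of the enzyme's site in the primer, or -1 if absent."""
    sequence = "".join(primer.split()).upper()
    return sequence.find(RECOGNITION_SITES[enzyme])


nde_index = find_site(FORWARD_PRIMER, "NdeI")
bam_index = find_site(REVERSE_PRIMER, "BamHI")

print("NdeI site in forward primer at index", nde_index)
print("BamHI site in reverse primer at index", bam_index)
```

The NdeI site contains an ATG, which is why it is commonly placed so that it overlaps the start codon of the cloned gene.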

The cloned dipeptide epimerase from T. maritima was expressed in E. coli BL21 (DE3) cells for protein purification. In a typical protein preparation, 2 L LB media was shaken at 37°C without induction and harvested after 32 hr by centrifugation at 4800 rpm. The pelleted cells were resuspended in 60 ml buffer containing 10 mM Tris-HCl (pH 7.9) and 5 mM MgCl2. The suspension was lysed by sonication, and debris was cleared by centrifugation at 27,250 × g. The supernatant was applied to a DEAE Sepharose FF column (2.5 × 50 cm, GE Healthcare) and eluted with a linear gradient (1600 ml) of 0 to 1 M NaCl buffered with 10 mM Tris-HCl (pH 7.9) containing 5 mM MgCl2. Fractions containing the protein of interest were pooled and dialyzed three times against 10 mM Tris-HCl (pH 7.9) containing 5 mM MgCl2 before being applied to a Q Sepharose HP column (1.7 × 7 cm, GE Healthcare). The protein was eluted with a linear gradient (250 ml) of 0 to 0.5 M NaCl in 10 mM Tris-HCl (pH 7.9) containing 5 mM MgCl2. Fractions containing >99% pure protein were pooled and dialyzed into 20 mM Tris-HCl (pH 7.9) containing 100 mM NaCl and 5 mM MgCl2. The protein was concentrated to 10–15 mg/ml by using a Millipore Amicon apparatus fitted with a 10,000 NMWL ultrafiltration membrane and was stored at 4°C. Storage for more than 1 week resulted in an ~25% loss of activity.

Repeated attempts to achieve expression in an E. coli AEE knockout system failed. As an alternative, endogenous E. coli proteins were heat denatured during purification. After initial sonication and centrifugation, the cleared lysate (vide supra) was heated at 50°C for 60 min until the solution was opaque and viscous. Centrifugation at 27,250 × g for 50 min was repeated, and the lysate was further purified over DEAE/Q Sepharose columns as described above. This preparation was used to assess the validity of the AEE activity of the Thermotoga enzyme, which is elaborated on in the Supplemental Data. The E. coli AEE was purified as previously reported (Schmidt et al., 2001).

Screening of the Thermotoga Enzyme with Dipeptide Libraries

The procedure for solid-state synthesis of dipeptide libraries was reported previously (Song et al., 2007). For initial assessment of dipeptide epimerase activity, screens were set up with the following dipeptide libraries: Gly-L-Xxx, L-Ala-L-Xxx, D-Ala-L-Xxx, L-Thr-L-Xxx, L-Xxx-L-Phe, L-Xxx-L-Tyr, L-Xxx-L-His, and L-Xxx-L-Lys. In accordance with known activities in the MLE subgroup of the enolase superfamily, screens with N-succinyl-L-Xxx and N-acetyl-L-Xxx libraries were also performed. Screens were carried out in 50 μl D2O containing 20 mM NH4HCO3 (pD 7.9), 1 mM (each dipeptide) library, and 1 μM enzyme. The reaction was incubated at 37°C for 16 hr, quenched with 2 μl 5 M NH4OH, and evaporated to dryness. The samples were then resuspended in ddH2O and analyzed by ESI mass spectrometry for incorporation of solvent deuterium as indicated by a +1 mass shift. If activity was detected, the screen was repeated with a 10-fold reduction in enzyme, and time points were taken at 0.5, 1.5, and 3 hr for better assessment of preferred substrates.
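The +1 mass shift used as the readout corresponds to replacement of one carbon-bound hydrogen by deuterium. A minimal sketch of the expected monoisotopic masses for L-Ala-L-Phe is given below; the atomic masses are standard monoisotopic values, while the dictionaries and function names are illustrative assumptions and not software used in this study. Resuspension in H2O presumably returns exchangeable N-H and O-H positions to protium, so only a single labeled site is modeled.

```python
# Standard monoisotopic atomic masses (Da); "D" denotes deuterium (2H).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "D": 2.01410178,
                "N": 14.0030740, "O": 15.9949146}

# Elemental compositions of the free amino acids used in this example.
AMINO_ACID_FORMULA = {
    "Ala": {"C": 3, "H": 7, "N": 1, "O": 2},
    "Phe": {"C": 9, "H": 11, "N": 1, "O": 2},
}
WATER = {"H": 2, "O": 1}


def dipeptide_formula(first: str, second: str) -> dict[str, int]:
    """Sum two amino acid formulas and remove one water for the peptide bond."""
    total: dict[str, int] = {}
    for residue in (first, second):
        for element, count in AMINO_ACID_FORMULA[residue].items():
            total[element] = total.get(element, 0) + count
    for element, count in WATER.items():
        total[element] -= count
    return total


def monoisotopic_mass(formula: dict[str, int]) -> float:
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def deuterate(formula: dict[str, int], n_sites: int = 1) -> dict[str, int]:
    """Replace n_sites hydrogens with deuterium."""
    labeled = dict(formula)
    labeled["H"] -= n_sites
    labeled["D"] = labeled.get("D", 0) + n_sites
    return labeled


ala_phe = dipeptide_formula("Ala", "Phe")
mass_unlabeled = monoisotopic_mass(ala_phe)
mass_labeled = monoisotopic_mass(deuterate(ala_phe))
mass_shift = mass_labeled - mass_unlabeled

print(f"unlabeled: {mass_unlabeled:.4f} Da, shift: {mass_shift:.4f} Da")
```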

Kinetic Studies of the Thermotoga Enzyme with Dipeptide Substrates

Polarimetry measurements were made on a Jasco P-1010 Polarimeter. Dipeptides for kinetic characterization were purchased when possible (e.g., Sigma, Bachem, Research Organics, Indofine, or MP Biochemicals), with the following exceptions: L-Ile-L-Phe was synthesized according to the procedure of Theodoropoulos and Craig (1955), and L-Lys-L-Phe was synthesized according to that of Lapeyre et al. (2006). Syntheses for all other dipeptides are provided in Supplemental Data. Kinetic parameters were obtained by quantifying the change in optical rotation as a function of time as determined by polarimetry by using a 100 mm path cell and an Hg 405 nm filter. Assays were performed at room temperature (~28°C) in 1.4 ml total volume containing 20 mM Tris-Cl (pH 7.5) and 10 mM MgCl2, with variable enzyme and substrate concentrations. The molar ellipticities for epimerized dipeptides were determined by subtracting the optical rotation at equilibrium from the starting optical rotation. Rate constants were divided by two to account for reversibility. Values for kcat and KM were determined by fitting initial velocities to Michaelis-Menten curves by using the program EnzFitter (Madison, WI). Errors presented are standard deviations determined from a minimum of three independent sets of kinetic assays. Kinetic parameters determined at 40°C and 50°C were quantified as described above, by using a 100 mm path-length water-jacketed cell connected to a Fisher Isotemp water bath (model 9000). Temperatures were monitored via a sensor in direct contact with the reaction solution.
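To illustrate how kcat and KM are extracted from initial velocities, the sketch below fits the Michaelis-Menten equation, v = Vmax[S]/(KM + [S]), by using the Hanes-Woolf linearization ([S]/v versus [S]). This is a deliberately simple stand-in and not the nonlinear fitting performed by EnzFitter. The enzyme and substrate concentrations are hypothetical, and the velocities are synthetic, noise-free values generated from the kcat (16 s⁻¹) and kcat/KM (1.2 × 10⁴ M⁻¹ s⁻¹) quoted for L-Ala-L-Phe; they are not measured data.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial velocity at substrate concentration s."""
    return vmax * s / (km + s)


def hanes_woolf_fit(substrate: list[float], velocity: list[float]) -> tuple[float, float]:
    """Least-squares line through ([S], [S]/v): slope = 1/Vmax, intercept = KM/Vmax."""
    x = list(substrate)
    y = [s / v for s, v in zip(substrate, velocity)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


KCAT = 16.0                     # s^-1, central value quoted in the text
KCAT_OVER_KM = 1.2e4            # M^-1 s^-1, central value quoted in the text
KM_TRUE = KCAT / KCAT_OVER_KM   # M, implied by the two values above
ENZYME = 1.0e-7                 # M, hypothetical enzyme concentration

# Hypothetical substrate concentrations (M) bracketing the implied KM.
conc = [0.25e-3, 0.5e-3, 1.0e-3, 2.0e-3, 4.0e-3, 8.0e-3]
rates = [michaelis_menten(s, KCAT * ENZYME, KM_TRUE) for s in conc]

vmax_fit, km_fit = hanes_woolf_fit(conc, rates)
kcat_fit = vmax_fit / ENZYME

print(f"kcat = {kcat_fit:.2f} s^-1, KM = {km_fit * 1e3:.2f} mM")
```

With real, noisy data a direct nonlinear least-squares fit is generally preferred, because linear transformations distort the error structure of the measurements.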

Crystallization and Data Collection

Four different crystal forms (Table 4) were grown by the hanging-drop method at room temperature: (1) TM0006 in complex with Mg2+, (2) TM0006 in complex with Mg2+ and L-Ala-L-Leu, (3) TM0006 in complex with Mg2+ and L-Ala-L-Lys, and (4) TM0006 in complex with Mg2+ and L-Ala-L-Phe. The crystallization conditions were as follows:

Table 4
X-Ray Data Collection and Refinement Statistics for Crystals of TM0006
  1. For TM0006 in complex with Mg2+, the protein solution contained TM0006 (22.3 mg/ml) in 20 mM Tris (pH 7.9), 100 mM NaCl, and 10 mM MgCl2; the precipitant contained 10% PEG 6000, 0.1 M HEPES (pH 7.5), and 5% MPD. For this sample, crystals appeared in 7–8 days and exhibited diffraction consistent with the space group P21, with 16 molecules of TM0006 per asymmetric unit.
  2. For TM0006 in complex with Mg2+ and L-Ala-L-Leu, the protein solution contained TM0006 (33 mg/ml) in 20 mM Tris (pH 7.9), 100 mM NaCl, 10 mM MgCl2, and 20 mM L-Ala-L-Leu; the precipitant contained 3.6 M NaCl and 0.1 M CH3COONa (pH 4.5). For this and the remaining samples, crystals appeared in 2 days and exhibited a diffraction pattern consistent with space group P6122, with four molecules of dipeptide epimerase per asymmetric unit.
  3. For TM0006 in complex with Mg2+ and L-Ala-L-Lys, the protein solution contained TM0006 (33 mg/ml) in 20 mM Tris (pH 7.9), 100 mM NaCl, 10 mM MgCl2, and 40 mM L-Ala-L-Lys; the precipitant contained 3.4 M NaCl and 0.1 M CH3COONa (pH 4.5).
  4. For TM0006 in complex with Mg2+ and L-Ala-L-Phe, the protein solution contained TM0006 (33 mg/ml) in 20 mM Tris (pH 7.9), 100 mM NaCl, 10 mM MgCl2, and 60 mM L-Ala-L-Phe; the precipitant contained 3.0 M NaCl and 0.1 M CH3COONa (pH 4.5).

Prior to data collection, the crystals were transferred to a cryoprotectant solution composed of their mother liquor and 20% glycerol and were flash cooled in a nitrogen stream. X-ray diffraction data sets for TM0006 in complex with Mg2+ (Table 4, column 1), with Mg2+ and L-Ala-L-Leu (column 2), with Mg2+ and L-Ala-L-Lys (column 3), and with Mg2+ and L-Ala-L-Phe (column 4) were collected at the NSLS X4A beamline (Brookhaven National Laboratory) on an ADSC CCD detector to 2.1, 2.1, 1.9, and 2.3 Å resolution, respectively. Diffraction intensities were integrated and scaled with DENZO and SCALEPACK, respectively (Otwinowski and Minor, 1997). The data collection statistics are given in Table 4.

Structure Determination and Model Refinement

The structure of apo TM0006 was solved by molecular replacement with the fully automated molecular replacement pipeline BALBES (Long et al., 2008), by using only input diffraction and sequence data. The partially refined structure of apo TM0006 was output from BALBES without any manual intervention. Subsequently, several iterative cycles of manual rebuilding with TOM (Jones, 1985), refinement with CNS (Brunger et al., 1998), and automatic rebuilding with ARP (Lamzin and Wilson, 1993) resulted in a model with an Rcryst and Rfree of 0.246 and 0.277, respectively. The final structure contains 42,058 protein atoms, 1,072 water molecules, and 16 Mg2+ ions for 2 octamers of TM0006 in the asymmetric unit. Both TM0006 octamers are similar to the octamers observed in the E. coli and B. subtilis AEE epimerases (PDB files 1JPD and 1JPM, respectively). For the apo TM0006 structure (Table 4, column 1) and for the holo TM0006 structures (columns 2, 3, and 4), no nonglycine residues lie in the disallowed region of the Ramachandran plot. Residues 325–327 have no density in 4 out of 16 monomers and are not included in the final model. Also, flap regions 19–27 are not included in the final model for 12 monomers out of 16. The Mg2+ ions are well defined in all 16 monomers in the asymmetric unit. Each Mg2+ ion is coordinated by the side chains of Asp188, Glu216, and Asp241 and by three water molecules in each TM0006 monomer.
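For readers unfamiliar with the Rcryst and Rfree values quoted here, the following sketch shows the standard crystallographic R factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, computed separately over the reflections used in refinement and over a held-out test set. It is an illustration only, not the CNS implementation; the function names and the boolean `free_flags` list are hypothetical, and the calculated amplitudes are assumed to be already on the same scale as the observed ones.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by a hypothetical per-reflection flag (True marks the
    held-out test set) and return (R over working set, R over test set)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free
```

Because the test-set reflections are excluded from refinement, Rfree is typically somewhat higher than Rcryst, as in the values reported for these structures.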

The structure of TM0006 crystallized with Mg2+ and L-Ala-L-Leu was automatically solved and partially refined with BALBES by using the corresponding X-ray and sequence data. Subsequent iterative cycles of manual rebuilding with TOM, refinement with CNS, and automatic rebuilding with ARP were performed. The model was refined at 2.1 Å with an Rcryst of 0.245 and an Rfree of 0.259. The final structure contained residues 3–343, Mg2+ ions, and bound dipeptide with well-defined density in all four monomers of the asymmetric unit. The Mg2+ ion is coordinated by the side chains of Asp188, Glu216, and Asp241; by one water molecule; and by two oxygen atoms from the dipeptide carboxyl terminus.

The protein portion of the complex with Mg2+ and L-Ala-L-Leu was the starting point for the refinement of TM0006 crystallized with Mg2+ and L-Ala-L-Lys (Table 4, column 3) and TM0006 crystallized with Mg2+ and L-Ala-L-Phe (column 4). These three structures contain the same protein molecules crystallized in the same space group. Iterative cycles of manual rebuilding with TOM, refinement with CNS, and automatic rebuilding with ARP, with subsequent inclusion of water molecules, were performed for the complexes with L-Ala-L-Lys and with L-Ala-L-Phe. Mg2+ ions were clearly defined in each monomer of both complexes and have coordination identical to that found in the complex with L-Ala-L-Leu.

Final crystallographic refinement statistics are provided in Table 4.



This work was supported by National Institutes of Health grant P01-GM071790. M.P.J. is a consultant to Schrödinger, Inc.



Coordinates have been deposited in the PDB with accession codes 3DFY, 3DEQ, 3DER, and 3DES.


Supplemental Data include one table, two figures, and Supplemental Methods and can be found with this article online at


  • Altekar G, Dwarkadas S, Huelsenbeck JP, Ronquist F. Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics. 2004;20:407–415.
  • Barker JA, Thornton JM. An algorithm for constraint-based structural template matching: application to 3D templates with statistical analysis. Bioinformatics. 2003;19:1644–1649.
  • Broun P, Shanklin J, Whittle E, Somerville C. Catalytic plasticity of fatty acid modification enzymes underlying chemical diversity of plant lipids. Science. 1998;282:1315–1317.
  • Brunger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, et al. Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr D Biol Crystallogr. 1998;54:905–921.
  • Cammer SA, Hoffman BT, Speir JA, Canady MA, Nelson MR, Knutson S, Gallina M, Baxter SM, Fetrow JS. Structure-based active site profiles for genome analysis and functional family subclassification. J Mol Biol. 2003;334:387–401.
  • de Souza ML, Seffernick J, Martinez B, Sadowsky MJ, Wackett LP. The atrazine catabolism genes atzABC are widespread and highly conserved. J Bacteriol. 1998;180:1951–1954.
  • Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797.
  • Favia AD, Nobeli I, Glaser F, Thornton JM. Molecular docking for substrate identification: the short-chain dehydrogenases/reductases. J Mol Biol. 2008;375:855–874.
  • Glasner ME, Fayazmanesh N, Chiang RA, Sakai A, Jacobson MP, Gerlt JA, Babbitt PC. Evolution of structure and function in the o-succinylbenzoate synthase/N-acylamino acid racemase family of the enolase superfamily. J Mol Biol. 2006;360:228–250.
  • Hermann JC, Ghanem E, Li Y, Raushel FM, Irwin JJ, Shoichet BK. Predicting substrates by docking high-energy intermediates to enzyme structures. J Am Chem Soc. 2006;128:15882–15891.
  • Hermann JC, Marti-Arbona R, Fedorov AA, Fedorov E, Almo SC, Shoichet BK, Raushel FM. Structure-based activity prediction for an enzyme of unknown function. Nature. 2007;448:775–779.
  • Jacobson M, Sali A. Comparative protein structure modeling and its applications to drug discovery. Annu Rep Med Chem. 2004;39:259–276.
  • Jones TA. Interactive computer-graphics: Frodo. Methods Enzymol. 1985;115:157–171.
  • Kalyanaraman C, Bernacki K, Jacobson MP. Virtual screening against highly charged active sites: identifying substrates of α-β barrel enzymes. Biochemistry. 2005;44:2059–2071.
  • Kenyon V, Chorny I, Carvajal WJ, Holman TR, Jacobson MP. Novel human lipoxygenase inhibitors discovered using virtual screening with homology models. J Med Chem. 2006;49:1356–1363.
  • Klenchin VA, Schmidt DM, Gerlt JA, Rayment I. Evolution of enzymatic activities in the enolase superfamily: structure of a substrate-liganded complex of the L-Ala-D/L-Glu epimerase from Bacillus subtilis. Biochemistry. 2004;43:10370–10378.
  • Lamzin VS, Wilson KS. Automated refinement of protein models. Acta Crystallogr D Biol Crystallogr. 1993;49:129–147.
  • Lapeyre M, Leprince J, Massonneau M, Oulyadi H, Renard PY, Romieu A, Turcatti G, Vaudry H. Aryldithioethyloxycarbonyl (Ardec): a new family of amine protecting groups removable under mild reducing conditions and their applications to peptide synthesis. Chem Eur J. 2006;12:3655–3671.
  • Long F, Vagin AA, Young P, Murshudov GN. BALBES: a molecular-replacement pipeline. Acta Crystallogr D Biol Crystallogr. 2008;64:125–132.
  • Macchiarulo A, Nobeli I, Thornton JM. Ligand selectivity and competition between enzymes in silico. Nat Biotechnol. 2004;22:1039–1045.
  • Otwinowski Z, Minor W. Processing of X-ray diffraction data collected in oscillation mode. In: Carter CWJ, Sweet RM, Abelson JN, Simon MI, editors. Methods in Enzymology. New York: Academic Press; 1997. pp. 307–326.
  • Pegg SC, Brown S, Ojha S, Huang CC, Ferrin TE, Babbitt PC. Representing structure-function relationships in mechanistically diverse enzyme superfamilies. Pac Symp Biocomput. 2005;10:358–369.
  • Pegg SC, Brown SD, Ojha S, Seffernick J, Meng EC, Morris JH, Chang PJ, Huang CC, Ferrin TE, Babbitt PC. Leveraging enzyme structure-function relationships for functional inference and experimental design: the structure-function linkage database. Biochemistry. 2006;45:2545–2555.
  • Polacco BJ, Babbitt PC. Automated discovery of 3D motifs for protein function annotation. Bioinformatics. 2006;22:723–730.
  • Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–1574.
  • Schmidt DM, Hubbard BK, Gerlt JA. Evolution of enzymatic activities in the enolase superfamily: functional assignment of unknown proteins in Bacillus subtilis and Escherichia coli as L-Ala-D/L-Glu epimerases. Biochemistry. 2001;40:15707–15715.
  • Seffernick JL, de Souza ML, Sadowsky MJ, Wackett LP. Melamine deaminase and atrazine chlorohydrolase: 98 percent identical but functionally different. J Bacteriol. 2001;183:2405–2410.
  • Song L, Kalyanaraman C, Fedorov AA, Fedorov EV, Glasner ME, Brown S, Imker HJ, Babbitt PC, Almo SC, Jacobson MP, Gerlt JA. Prediction and assignment of function for a divergent N-succinyl amino acid racemase. Nat Chem Biol. 2007;3:486–491.
  • Theodoropoulos D, Craig LC. The synthesis of several isoleucyl peptides and certain of their property. J Org Chem. 1955;20:1169–1172.
  • Tian W, Skolnick J. How well is enzyme function conserved as a function of pairwise sequence identity? J Mol Biol. 2003;333:863–882.
  • Tremblay LW, Dunaway-Mariano D, Allen KN. Structure and activity analyses of Escherichia coli K-12 NagD provide insight into the evolution of biochemical function in the haloalkanoic acid dehalogenase superfamily. Biochemistry. 2006;45:1183–1193.
  • Wangikar PP, Tendulkar AV, Ramya S, Mali DN, Sarawagi S. Functional sites in protein families uncovered via an objective and automated graph theoretic approach. J Mol Biol. 2003;326:955–978.
  • Whelan S, Goldman N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol. 2001;18:691–699.