|Home | About | Journals | Submit | Contact Us | Français|
We characterize the molecular dynamics of a previously described computational de novo designed enzyme optimized to perform a multistep retrol-aldol reaction when engineered into a TIM barrel protein scaffold. The molecular dynamics simulations show that the protein dynamics under physiological conditions of temperature and aqueous environment distorts the designed geometric factors of the substrate-enzyme reaction intermediates, such that catalysis is limited by the primary retrol-aldol step of proton abstraction from the covalently bound substrate and its interactions with a histidine-aspartate dyad. These results emphasize that computational enzyme designs will benefit from considerations of dynamical fluctuations when optimizing active site geometries.
Recently, the de novo computational design of enzymes took a major step forward when Baker and co-workers engineered several motifs of novel catalytic activity toward non-natural substrates into existing protein scaffolds1,2 and subsequently verified them to be active in the laboratory. Although the experimentally measured activity of these designed enzymes is much lower than that of natural enzymes, the de novo structural-based design combined with directed evolution for the single step Kemp elimination reaction1 realized a 5-orders of magnitude improvement over the uncatalyzed reaction, and comparable catalytic efficiency to the catalytic antibody 34E4.3
The much more difficult de novo design case was for the multistep retrol-aldol reaction culminating in the carbon–carbon bond breaking of the non-naturally occurring substrate 4-hydroxy-4-(6-methoxy-2-naphthyl)-2-butanone2 (Figure 1a). In this case the Baker group's structural-based approach was to define an active site geometry composed of protein functional groups in orientations optimized to describe a composite structure of reaction intermediates and transition states for the multistep reaction.2 With this approach they determined four different retro-aldol reaction motifs designed to meet the static geometric criteria, and which were engineered into various TIM barrel and jelly roll protein scaffolds.2 For example, the so-called Motif III manifested in a TIM barrel construct involved a kinetic mechanism of a nucleophilic lysine attack on the substrate to generate a covalently bound substrate–protein complex (Figure 1b), followed by a histidine-aspartate dyad acting as a general base to abstract a proton from the substrate during the carbon–carbon bond breaking step (Figure 1c). Combined with other mutations relevant for optimizing active site side chain packing, the resulting so-called RA22 designed enzyme displayed a rate enhancement of ~3 orders of magnitude over the uncatalyzed reaction, but underperformed relative to a comparable catalytic antibody by ~2 to 3 orders of magnitude.4,5 The kinetics of RA22 were reported to be complex, showing a burst phase, which was taken as evidence that product inhibition was a limiting feature of the intended design, given that an X-ray crystal structure on a virtually identical RA22 sequence was within ~ 1 Å rmsd of the intended active site geometry.2
In this work, we focus on a completely different design criterion than static considerations of proximity alone, by asking whether the geometric criteria of the RA22 active site and its variants are satisfied temporally during a molecular dynamics simulation. We want to assay the influence of protein structural fluctuations executed at physiological temperatures and under aqueous solvent conditions on realizing the active site architecture and its intended substrate and intermediate interactions. Since our own experiments on RA22 fail to reproduce the burst phase reported by Jiang and co-workers,2 it appears that product inhibition is not limiting the catalytic efficiency. Instead we find that catalysis is inhibited by the primary retrol-aldol step of proton abstraction from the covalently bound substrate, due to dynamical fluctuations that fail to meet the geometric criteria of substrate interactions with the histidine–aspartate dyad on a temporal basis.
In our computational and experimental studies we used two sequences reported by the Baker group on RA22. The first was the intended design based on creation of 13 mutations into the TIM barrel protein scaffold as reported,2 which we refer to as RA22. In showing that the RA22 geometric design criteria in the engineered active site were met with these 13 mutations, the crystal structure of RA22 was reported in 3B5V, with the understanding that an additional 14th mutation involving S210A was to be ignored as that sequence did not have catalytic activity. However, 3B5V contained an additional 15 alanine mutations of scaffold residues lysine, asparagine, and glutamic acid on the surface of the TIM protein; we call this sequence Ala-RA22.
Codon-optimized genes for RA22, and its variant Ala-RA22, were synthesized by GenScript (Piscataway, NJ). The genes were directionally subcloned between NdeI and XhoI sites in a pET29b+ vector (Novagen, Madison, WI) encoding a carboxy-terminal 6xHis tag. The plasmids were transformed into BL21(DE3) (Stratagene, Inc.) and cultured in LB-broth at 37 °C. Protein expression was induced with isopropyl-β-d-thiogalactopyranoside at A600 = 0.6, and growth was permitted to continue for an additional 8 h. The cells were then pelleted by centrifugation for 10 min at 500 rpm and resuspended in 50 mM potassium phosphate buffer (pH 7) and 200 mM NaCl (loading buffer) with an additional 10 mM imidazole. The cells were then lysed by French press and the lysate cleared by centrifugation at 15000 rpm for 1 h. The cleared lysates were loaded onto a Ni-NTA agarose (Qiagen, Inc.) affinity column, washed with 10 column volumes of loading buffer with an additional 40 mM imidazole, and eluted with a 40–250 mM imidazole gradient. The purity of the protein, estimated at greater than 90%, was evaluated by SDS-page. Protein concentration was determined by absorbance at 280 nm using a Lambda 650 UV–vis spectrometer (PerkinElmer, Fremont, CA). The extinction coefficient of 24410 M−1 cm−1 was calculated from the amino acid sequence using the ProtParam Tool (Swiss Institute of Bioimformatics).
Evaluation of RA22 and Ala-RA22 kinetics was performed as previously described.2 The retro-aldol reaction of 4-hydroxy-4-(6-methoxy-2-anphthyl)-2-butanone (AsisChem, Cambridge, MA), was carried out in 2.7% acetonitrile, 25 mM HEPES, 100 mM NaCl (pH 7.5) at 28 °C; 180 μL buffer containing 16.44 μM protein and 5 μL substrate (1.25, 2.5, 5, 10, 20, and 40 mM) in acetonitrile. The progress of the reaction was followed using a Spectramax M2 plate reader (Molecular Devices, Sunnyvale, CA) in Costar 96 well black round bottom plystyrene non-binding surface microplates (Cornnig, Lowell, MA). Reactions were monitored over 5 h, with λex of 330 nm and λem of 452 nm and a cutoff filter at 435 nm. Measurements of known product concentrations were used to quantify the reaction fluorescence curves.
Reaction rates for RA22 and Ala-RA22 were determined using a linear fit to the fluorescence data. The fluorescence intensity at 452 nm was converted to product concentration values using a linear calibration equation, which was based on data collection at known product concentrations. The reaction rate fit was applied over 300 min of data for each of seven initial substrate concentrations, ignoring the initial 10 min to omit a lag phase we observed for both proteins at most concentrations. The reaction rate data at different substrate concentrations were fit to the Michaelis–Menten equation21 to obtain the Km and Kcat values for the reaction.
The initial structures for the apoenzymes were obtained from the RCSB for Ala-RA22 (3B5V.pdb) and the Supporting Information from Jiang and co-workers.2 The protein was modeled with the AMBER ff99SB protein force field,22 and the TIP4P-Ew model23 was used to describe the molecular solvent. Charges and parameters for the substrate were generated with the Antechamber module of the AMBER package (version 10)24 using the AM1-BCC charge model and the general AMBER Force Field (GAFF).25 Charges and parameters for the COV2 complex were generated with the Antechamber module using the AM1-BCC charge module, and the lysine portion of the molecule is described with standard lysine parameters while the substrate portion of the molecule is described with GAFF-derived parameters. Additionally, we reparameterized the force constant for the dihedral angle of the iminium bond, as the force constant assigned by Antechamber was not strong enough to maintain the double bond character.
We used the sander module in the AMBER package for all of the molecular dynamics simulations.24 The enzymes were solvated in a box of water and neutralized before undergoing four stages of minimization: three stages with loosening restraints (5.0, 1.0, and 0.1 kcal/mol/Å2) on the solute and one stage with no restraints. The system then underwent five steps of equilibration. In the first step, under constant volume, the temperature of the system was raised from 0 to 300 K using the Anderson thermostat with a 10.0 kcal/mol/Å2 harmonic restraint on the solute. The next three steps were under NVT conditions while the harmonic restraints were loosened (5.0 kcal/mol/Å2, 1.0 kcal/mol/Å2, and 0.5 kcal/mol/Å2). Finally, we simulated the system under NPT conditions using a weak barostat coupling at 1 bar for 1 ns. For the MD simulations, we used a 2 fs time step, with the long-range electrostatic interactions calculated using the Particle Mesh Ewald method and a cutoff of 9.0 Å for real space electrostatics and LJ interactions.
We analyzed the kinetics of the retro-aldol reaction for the RA22 and Ala-Ra22 enzymes as described in methods section. We obtained Km and Kcat values which were within the (large) error bars of the steady state rate reported by Jiang and co-workers2 (Figure 2a), but not within error of the Kcat that they obtained for the burst phase. In fact, for RA22 the Kcat/Km value we obtained, 0.017 ± 0.010 M−1 s−1, is very close to the steady-state value reported by Jiang and co-workers, 0.018 ± 0.006 M−1 s−1. However, we did not observe a burst phase in the first 20 min of our progress curve (Figure 2b). The characteristic double-exponential behavior of the previously reported burst phase can be clearly seen as an abrupt change in slope after 20 min in the inset of Figure 3a of the original study.2 In contrast, we observe a brief lag-phase followed by single-exponential behavior throughout the remainder of the experiment. Therefore we conclude that the RA22 enzyme does not exhibit burst phase kinetics, and thus product inhibition is not a limiting factor for catalysis by this enzyme. Consequently we focused on the designed catalysis steps, and the effect of thermal protein motion, as the likely source of the underperformance for RA22 and its variants.
We next conducted numerous molecular dynamics simulations of RA22 and Ala-RA22 to gain insight into how the protein and substrate atomic motions affect the performance of this designed enzyme. Primarily we are interested in seeing if the geometric parameters optimized for the retro-aldol reaction under Motif III are satisfied during the time course of molecular dynamics of the RA22 enzymes (Figure 1). Although there are numerous geometric criteria we can follow based on the Supporting Information given for RA22,2 it appears that certain geometric criteria exert a more critical influence than others on the reaction rate. These include (1) the distance between the nucleophilic lysine and the ketone carbon of the unbound substrate; (2) the hydrogen-bond distance and angles between the substrate and His233 in the subsequent covalently bound complex; and (3) the hydrogen-bond distance and angles between His233 and Asp53 to form the His–Asp dyad to manifest the general catalytic base. The first geometric criterion is necessary for the nucleophilic attack step that allows the covalent complex to form (Figure 1b), while the latter two geometric criteria are important for the critical second step of proton abstraction preceding the carbon–carbon bond-breaking of the substrate (Figure 1c).
The design of RA22 employed quantum chemistry calculations to optimize the orientation of the substrate's napthyl ring to be perpendicular to the imidazole ring of His233 (which we designate O1 in Figure 3a) for the imine formation, since it is consistent with the ligand orientation of the subsequent covalent intermediate complex in the carbon–carbon bond-breaking step. This is an aspect of the static composite structure approach that we question when we consider starting the substrate orientation with the plane of its aromatic groups parallel to the His233 imidazole ring to optimize π–π stacking (labeled O2 in Figure 3b). We use the criteria of the near attack conformation (NAC) hypothesis6–8 to assess the frequency of likely nucleophilic attack conformations, and to assess the relative favorability between the O1 and O2 orientations of step 1 in the RA22 design. Specifically, a successful nucleophilic attack for bond formation requires that the lysine ε-amino group and the substrate ketone carbon are at a van der Waals contact distance of dNAC < 3.25 Å, resembling the bond to be formed in the transition state; we also consider a more relaxed criteria of dNAC < 3.5 Å to account for potential discrepancies of the empirical force field.
The molecular dynamics simulations indicate that the parallel O2 orientation with respect to His233 is optimal for the nucleophilic attack by Lys159. In fact when the simulations are initiated with the O1 conformation, the napthyl ring of the substrate eventually rotates 90° and goes from the initial T-shaped π–π interaction to a parallel O2-like π–π interaction. As this rotation occurs, the substrate hydroxyl group engages in a hydrogen bond with the lysine ε-amino, which in turn pulls the lysine amine group further away from the substrate ketone (Figure 3c). The result is that the ε-amino group is never within nucleophilic attack distance of the substrate ketone over the course of the simulation when started in the O1 orientation (Table 1). On the other hand, when started in the O2 orientation, the substrate remains as initially positioned throughout the simulation. The O2 orientation is stabilized not only by π–π stacking with His233, but also through additional π–π stacking with Phe211. These combined interactions position the substrate near the lysine amine, such that it forms hydrogen-bonds with the substrate hydroxyl group and ketone oxygen, and satisfying the NAC condition (Figure 3d). By our temporal assay, we find that ~50% of the time the dynamics allows formation of the NAC geometry in the O2 orientation for the less stringent dNAC < 3.5 Å and ~15 to 20% of the time with the more stringent dNAC < 3.25 Å (Table 1).
The critical catalysis step in the RA22 enzyme design is the formation of a His233–Asp53 dyad which acts as a general base, that in turn aids the proton abstraction from the covalently bound substrate through its hydrogen-bonded interaction with His233 (Figure 1c). This reaction intermediate sets up the transition state that subsequently breaks the carbon–carbon bond to yield the ketone and napthyl moiety products. Because the composite transition structure was used to optimize both the nucleophilic-attack and the proton abstraction steps of the reaction, a single ligand orientation (O1) guided the design process. Although the O1 substrate orientation is highly disfavored in the nucleophilic attack step, nonetheless it is the orientation that was optimized in the composite transition structure that also accounts for the proton abstraction step.2 Given that we found that the O2 orientation is the productive nucleophilic attack geometry, we therefore consider both the O1 and O2 substrate orientations of the covalently bound substrate-enzyme complex. In either case, the MD simulations indicate that the RA22 active site fluctuations distorts the optimized geometric factors of this substrate-enzyme reaction intermediate, albeit in different ways.
The formation of the His–Asp dyad is satisfied when d2 < 3.0 Å, 110° < θ3 < 130°, and 115° < θ4 < 135° (Figure 1c). Table 2 shows that the His–Asp dyad geometric criteria are rarely met with the O2 ligand orientation, while they are met ~25% of the time with the O1 orientation. This is because for the majority of the simulation Asp53 is not hydrogen bonded with His233, but is instead interacting with water molecules. The inability of the active site to conform to the geometric constraints of the His–Asp dyad with any frequency leaves the His233 side-chain much floppier, so that it is never able to form the optimal θ1 ≈ 110° and θ2 ≈ 125° angles or d1 ≈ 2.8 Å distance necessary for hydrogen-bonding with the covalently bound substrate. The lack of a hydrogen-bonding constraint means that although individually the χ1, χ2, and χ3 geometric criteria for the lysine–imine complex are met with reasonable frequency, they are rarely met simultaneously. This failure to satisfy the geometric criteria in a concerted fashion—in particular the formation of the His–Asp dyad—would account for the poor kinetic performance observed experimentally. It is interesting to note that the same qualitative conclusions are reached whether considering RA22 or Ala-RA22, which suggests that the scaffold dynamics are very similar or possibly irrelevant to the observed active site dynamics, although, experimentally, Ala-RA22 is extremely prone to aggregation.
We have used molecular dynamics to analyze the performance of a de novo designed enzyme, RA22, to catalyze a multistep retrol-aldol reaction acting on a non-naturally occurring substrate. Although the reported evidence of burst phase kinetics would imply that product inhibition is a contributing source of the poor catalytic efficiency for RA22, our own experiments find no evidence of a burst phase and thus we conclude that the intended active site design is not being met for other reasons. In fact, we find that most aspects of the static design are distorted from the optimized composite active site structure, comprising the substrate-enzyme reaction intermediates and transition states, by the aqueous protein fluctuations at room temperature. By contrast, recent quantum and molecular mechanics studies on mechanistic details of the Kemp elimination reaction determined that a majority of the designed enzymes met the design criteria,9 which may be a result of greater design control on a reaction with single step mechanism.
While the intended design positioned the substrate napthyl ring edge to interact with the face of the His233 ring (O1 orientation) to set up the nucleophilic attack by lysine on the ketone portion of the substrate, we found that the substrate finds better energetics through π–π stacking with not just Phe211 (as the original design intended) but with π–π stacking with His233 (O2 orientation). While ab initio studies have reached consensus that, at room temperature, the face-to-face and face-edge geometries are essentially isoenergetic for benzene–benzene interactions,10 the relative energetics of these conformers can change when considering ring heteroatoms such as nitrogen,11 ring substituents,12 or a heterogeneous protein environment.13 These are highly subtle effects for fixed charge force fields to model with confidence; however, empirical force fields nonetheless do a reasonable job of representing aromatic side chain interactions in the folded state of proteins14 as well as model systems.15 This would suggest that reoptimization of the composite active site structure with respect to the O2 orientation may be beneficial for improving the catalytic design.
While the O2 orientation permits a near attack distance (dNAC < 3.25 Å or dNAC < 3.5 Å) to be met with reasonable frequency in the nucleophilic attack step, we find that the O1 orientation is preferred over O2 for the proton abstraction step from the covalently bound substrate, consistent with the intended design. However even in the O1 orientation, competing interactions with water and the active site dynamics prevented Asp53 from forming the His233–Asp53 dyad, which in turn propagates through the covalently bound complex so that His233 rarely forms a hydrogen-bond with the imine–lysine complex. We attribute the underperformance of the RA22 enzyme to the dynamical distortions of the needed base for proton abstraction at this critical enzymatic step.
Naturally occurring enzymes accelerate rates over uncatalyzed reactions by 8–9 orders of magnitude, with catalytic efficiencies as measured by Kcat/Km that are typically 107–108 with some exceptional enzymes yielding Kcat/Km upward of 1014. Introduction of novel activity and enhancement of native activity in naturally occurring enzymes is difficult because native enzyme catalysis and specificity are already well optimized through natural selection. Computational de novo protein design,16–19 when applied to the development of enzyme catalysis of chemical reactions that are not part of nature's repertoire, resets the performance bar to something more manageable, by serving as a plausible beginning for subsequent laboratory directed evolution that will enhance performance through rapid exploration of sequence space using mutation and recombination. This was demonstrated explicitly for a single step Kemp elimination reaction, which took the underperforming computational de novo designed enzymes with efficiencies that ranged over Kcat/Km ≈ 101–102, and under directed evolution increased their performance by an order of magnitude.1
Because the laboratory directed evolution mutations primarily occurred in the near satellite regions of the designed active site for Kemp elimination,1,5 it implicitly suggests that the nondesigned mutations were able to fine-tune the ability of the active site geometries to meet the design criteria on a temporal basis. This may be an optimal strategy for future protein engineering strategies.20 However, we believe that directed evolution may be inhibited inherently by the performance of an enzyme built from scratch based on a static design that only optimizes spatial proximity but not dynamical congruence. For example recent NMR and MD studies of native TIM showed that the active site loop residue dynamics were highly correlated and necessary for catalytic activity.26 We suggest that computational models and methods are up to the task to incorporate dynamics into the early design stage, an alternative approach to protein engineering that deserves further consideration.
We thank the NSF Cyberinfrastructure program for support of the work presented here, and the National Energy Research Scientific Computing Center for computational resources. KAB thanks the NIH Molecular Biophysics training grant T32 GM08295. All of us thank Enghui Yap, Dr. Po Hu, Dr. Jerome Nilmeier, and Dr. Alex Sodt for helpful discussions, and Youcef Ouadah for help on the experiments.