|Home | About | Journals | Submit | Contact Us | Français|
The ability to design effective enzymes is one of the most fundamental challenges in biotechnology and in some respects in biochemistry. In fact, such ability would be one of the most convincing manifestations of a full understanding of the origin of enzyme catalysis. In this work we explore the reliability of different simulation approaches, in terms of their ability to rank different possible active site constructs. This validation is done by comparing the ability of different approaches to evaluate the catalytic contributions of various residues in chorismate mutase. It is demonstrated that the empirical valence bond (EVB) model can serve as a practical yet accurate tool in the final stages of computer aided enzyme design (CAED). Other approaches for fast screening are also examined and found to be less accurate and mainly useful for qualitative screening of ionized residues. It is pointed out that accurate ranking of different options for enzyme design cannot be accomplished by approaches that cannot capture the electrostatic preorganization effect. This is in particular true with regard to current design approaches that use gas phase or small cluster calculations and then estimate the interaction between the enzyme and the transition state (TS) model rather than the TS binding free energy or the relevant activation free energy. The ability of the EVB model to provide a tool for quantitative ranking in the final stage of CAED may help in progressing towards the design of enzymes whose catalytic power is closer to that of native enzymes than to that of the current generation of designer enzymes.
Enzyme design has become a subject of major attention and major activity in recent years (1–5). Effective design is expected to have a great potential in industrial application and eventually in medicine. Furthermore, one can argue that the ability to design efficient enzymes is the best manifestation of a true understanding of enzyme catalysis. However, at present there has been a limited success in most attempt of rational enzyme design (6, 7), and the resulting constructs have been significantly less effective than the corresponding natural enzymes (1, 8).
Many proposals for the reasons of the limited success have been brought forward, but most are not based on actual validation studies (for review see (1, 8)). In fact, it has been argued (9) that the problems are due to the incomplete modeling of the transition state (TS) and, in part, to the limited awareness to the key role of the reorganization energy. However only demonstration of better performance in enzyme design can tell us what is exactly missing in current approaches. In fact, the ultimate reliability of computer aided enzyme design (CAED) is determined by the reliability of the calculations of mutational effects on catalysis. In other words, at the end of the day a reliable CAED approach must be able to rank different design options by predicting their activation barriers and the corresponding catalytic effect. Thus, it is possible to ask what are the missing factors in current design strategies and in particular in the final screening stages, by examining the performance of different approaches in calculations of mutational effects. In this respect it is useful to note that semiquantiative computational studies of the effect of mutations of enzyme catalysis date back the empirical valence bond (EVB) simulations of the effect of mutations in the catalytic power of trypsin (10) and to more qualitative transition state description (11). Subsequent calculations of mutational effect include EVB studies (e.g. (9, 12–16) and more recent (molecular orbital) quantum mechanics/molecular mechanics (QM/MM) studies (e.g. (17–22)). Relevant approaches that may be used for fast initial screening included the linear response approximation (LRA) evaluations of the contributions of different residues to the TS energy (23) and less systematic attempts of evaluating group contributions. The above studies, and in particular the quantitative one have established that the effects of the mutations are associated with the changes in reorganization energy upon mutations (9, 13, 14, 24).
Although recent design studies have produced encouraging results they have not been based on methods capable of estimating the energetics of the final design constructs (the approaches used are unable to predict the activation barriers in the generated active sites). More specifically current CAED approaches are based on fast and elegant construction of active sites that can provide reasonable interaction with the gas phase TS model. Here the emphasis is on the generation of protein structures with reasonable interactions with the TS model. While such approaches are very effective in generating structural candidates, the scoring function used is problematic, as it cannot reproduce the TS binding free energy or the catalytic effect. This is true regardless of the attempts to identify the estimated interaction energies with the TS binding energy. Our point can be easily verified by comparing the calculated scoring function to the experimental TS binding energy. Even more sophisticated methods such as the linear response approximation (LRA) method, that will be considered here, cannot provide quantitative scoring function. Thus the current CAED approaches basically focus on generating reasonable active sites with good hydrogen bonding etc. and leave the final decision to the experimental screening.
Our philosophy is quite different than the above approaches, since we believe that it is hard to design a function when the relevant property is not reproduce by the calculations. Thus we demand that the scoring function will reproduce the correct catalytic effect (a task that cannot be accomplished by current design approaches). Of course, the more expensive is the evolution of the free energy, the less practical is our approach as compared to experimental mutational studies. Thus we will focus on the price per performance issues. Of course, the EVB cannot be used in the preliminary enzyme design since the number of options is overwhelming. This step is not the focus of the present work and will only be considered in a preliminary way.
The above discussion can be summarized by stating that the experience from quantitative computer simulations of mutational studies has not been translated to a general guideline in current enzyme design methods, and the present work is aimed at moving in such a direction. This is done by trying to establish what it takes to get reliable results in the final screening stage, in terms of the relationship between accuracy and computer time. It is found that the EVB approach can provide a reliable way of performing the final screening steps in CAED while other approaches can be useful in more preliminary steps.
In this work we chose as a benchmark the ability to reproduce the catalytic activity of different mutants of different forms of enzyme chorismate mutase (CM) that catalyzes the reaction shown in Figure 1. The studied systems are the trimer Bacillus subtilis chorismate mutase (BsCM),(25) the homodimeric chorismate mutase from Escherichia coli (EcCM) (26) and the monomeric chorismate mutase mMjCM obtained by Hilvert and coworkers (27) by topological redesign of the thermostable EcCM homologue from M. jannaschii (MjCM). The coordinates for BsCM (X-Ray), EcCM (X-Ray) and mMjCM (NMR) structures were obtained from the Protein Data Bank, Brookhaven National Laboratory, with PDB access codes 1COM, 1ECM and 2GTV, respectively. The trimer resembles a beta-barrel structure in which a core beta-sheet is surrounded by helices whereas the dimer and monomer adopt a helix-bundle structure (see Figure 2). The active site residues that interact strongly with the transition state analogue (TSA) are depicted in Figure 3.
CM has been the subject of intensive design studies (1) and was also considered in the examination of the origin of the difference between transition states (TSs) and transition state analogues (TSAs) (1, 28). The selection of this system allows us to examine three different enzymes that catalyze the same chemical reaction. Furthermore, since the protein topology and active site residues are different in the three CM forms they provide a diverse test set while keeping the same reference reaction. Thus we have a system that clearly presents a major challenge to CAED.
Below we will review the simulation approaches that will be used in this study. We start with the clarification that we do not like to use approaches of the type used recently in enzyme design studies (6, 7), where the TS features are determined in the gas phase since such approaches cannot be used to rank in a quantitative the different design constructs.
We will start with what we believe is the most reliable current approach for evaluating activation barriers of enzymatic reactions, namely the EVB. This method was used by our group and others in quantitative studies of the catalytic power of many enzymes (e.g. (9)). The EVB has been described in great details elsewhere (12, 13) and was used recently in a study of CM (29, 30). We give here only key relevant points.
The EVB is a QM/MM method that can be considered as a mixture of force fields of reactant and products (or intermediate) in a way that retains the correct change in structure and charge distribution along the reaction coordinate. The reason for the remarkable reliability of the EVB is that it is calibrated on the reference solution reaction and then the calculations in the enzyme active site only reflects (consistently) the change in the environment, exploiting the fact that the reacting system is the same in enzyme and solution. Thus, the EVB approach has to be calibrated only once per in a study of a given type of enzymatic reaction and this is done while considering the uncatalyzed reaction. More specifically the EVB begins with the resonance states (or more precisely, diabatic states) corresponding to classical valence-bond structures. These basis states are mixed to describe the reacting system. The potential energies of the diabatic states (H11 and H22) and the mixing term (H12) are represented by the Hamiltonian matrix elements,
Here R and Q represent the atomic coordinates and charges, respectively, of the reactants or products (“solute”) in the diabatic states, and r and q are the coordinates and charges of the surrounding water or protein (“solvent”). is the energy of the ith diabatic state in the gas-phase, where all the fragments are taken to be at infinity; is the intramolecular potential of the solute system (relative to its minimum) in this state; represents the interaction between the solute atoms and the surrounding solvent atoms; and represents the potential energy of the solvent.
The adiabatic ground-state energy (Eg) and the corresponding eigenvector (Cg) are obtained by solving the secular equation
The simplicity of the EVB formulation makes it relatively straightforward to obtain analytical derivatives of the potential surface by using the Hellmann-Feynman theorem for the first derivatives of Eg, and thus to sample the EVB energy surface by molecular-dynamics (MD) simulations. This is done by a combined free energy perturbation (FEP) umbrella sampling (US) procedure that provides the free energy function (Δg‡(x)) that is needed to calculate the activation free energy (Δg‡). The FEP/US mapping procedure used to evaluate the EVB free energy surface is described elsewhere (12), and here we review only essential points for the simple case of two diabatic states. In such a case we use a mapping potential of the form,
where θm changes from 0 to 1 in n+1 fixed increments (θm = 0/n, 1/n, 2/n, …, n/n). The free energy ΔGm associated with changing λ from 0 to m/n can be evaluated by a free-energy perturbation (FEP) procedure. The free energy functional that corresponds to the adiabatic ground state surface, Eg, is obtained by the FEP-umbrella sampling (FEP/US) method, which can be written as,
In this expression, εm is the mapping potential that keeps the reaction coordinate x in the region of x′, m denotes an average over an MD trajectory on this potential, β = (kBT)−1, kB is the Boltzmann constant, and T is the temperature. If the changes in εm are sufficiently gradual, the free energy functional Δg(x′) obtained with several values of m overlap over a range of x′, and patching together the full set of Δg(x′) gives a complete free energy curve for the reaction.
The FEP-US approach also can be used to obtain the free energy functional of the individual diabatic states. For example, the free energy of the reactant state (Δg1) is,
In order to relate the origin of the catalytic effect to the EVB results it is convenient to approximate the activation free energy by the modified Marcus equation (12),
where is the so called “work term” that describes the free energy of bringing the donor and acceptor to the interaction distance (at the reactant state, λ is the reorganization energy, ΔG0 is the reaction free energy, Γ is the nuclear quantum mechanical correction and the 12 are the average values of H12 in the TS (x‡) and the reactant state (x0). The nature of this expression is defined schematically in Figure 4 (see also ref. (9)). The first two terms of Eq. 6 are those used in Marcus formulation for electron transfer reactions (31) and the rest of the expression represents the effect of the very strong mixing (H12) between the diabatic states.
The reorganization energy can also be obtained directly from the EVB diabatic free energies of Eq. 5, which follow the same trend as that shown in Figure 4. The work term is discussed in ref. (14, 32) and it is related to the potential of mean force (PMF) of bringing the donor and acceptor together. When the PMF is close to zero the work term it is similar to the cage effect discussed in many of our works (see (9)).
The reorganization energy can also be estimated by using the expression (9),
where Δε is the difference between εa and εb and Δε designates average over trajectories on εa.
The EVB calculations were evaluated by using the MOLARIS simulation program (33, 34) using the ENZYMIX force field. The EVB activation barriers were calculated at the configurations selected by using the same free energy perturbation umbrella sampling (FEP/US) approach used in all of our EVB studies. The simulation systems were solvated by the surface constrained all atom solvent (SCAAS) model (33) using a water sphere of 18 Å radius centered on the substrate and surrounded by 3 Å grid of Langevin dipoles and then by a bulk solvent, while long-range electrostatic effects were treated by the local reaction field (LRF) method (33). The EVB region consisted of the entire substrate, a total of 24 atoms. The FEP mapping was evaluated by 21 frames of 20 ps each for moving along the reaction coordinate with our all atom surface constrained spherical model. All the simulations were done at 300 K with a time step of 1 fs. In order to obtain reliable results we repeated the simulations several times with different initial conditions (obtained from arbitrary points in the relaxation trajectory). The final results were obtained from the average of the different simulations. The different mutations were generated from the native enzymes by 100 ps relaxation runs.
The EVB approach is quite expensive (when one insists on converging results) and requires major computer time for a reasonable convergence when one deals with electrostatic effects in protein interiors. In some cases it is possible to obtain reasonable results by semimacroscopic models that focus on the TS electrostatic energy. This is true in particular with regard to the semimacroscopic version of the protein dipole Langevin dipole (PDLD/S) method in its linear response approximation (PDLD/S-LRA) version (35) that provides a direct link between the microscopic and macroscopic concepts. Since this method was reviewed extensively, we review here only its main features. The PDLD/S-LRA method evaluates the change in electrostatic free energies upon transfer of a given ligand (l) from water to the protein by using,
where the ΔG is the free energy of changing the ligand in the given environment (i.e., water (w) or protein (p)). Using the cycles described in ref. (35), we start with the effective PDLD potentials;
where ΔGsol denotes the electrostatic contribution to the solvation free energy of the indicated group in water (e.g., denotes the solvation of the protein-ligand complex in water). To be more precise, ΔGsol should be scaled by 1 / (1 − 1/εw) but this small correction is neglected here. The values of the ΔGsol’s are evaluated by the Langevin dipole solvent model. l and l′ denote the polar and nonpolar ligand respectively. is the electrostatic interaction between the charges of the ligand and the protein dipoles in vacuum (this is a standard PDLD notation) and is the intramolecular electrostatic interaction within the ligand. We also evaluate the non electrostatic contributions to the binding energy as described in ref. (35). Now the PDLD/S results obtained with a single protein-ligand configuration cannot capture properly the effect of the protein reorganization (see discussion in ref. (35)) and a more consistent treatment should involve the use of the LRA or related approaches (e.g., (35)). This approach provides a reasonable approximation for the corresponding electrostatic free energies:
where the effective potential Ū is defined in Eq. 9, and l and l′ designate an MD average over the coordinates of the ligand-complex in their polar and nonpolar forms respectively. It is important to realize that the average of Eq. 10 is always done where both contributions to the relevant Ūelec are evaluated at the same configurations. That is, the PDLD/S energies of the polar and nonpolar states are evaluated at each averaging step by using the same structure. However, we generate two set of structures one from MD runs on the polar state and one from MD runs on the nonpolar state. This is basically the same approach used in the microscopic LRA but now with the effective potential Ūelec.
Even the PDLD/S-LRA approach is computationally demanding and would require enormous computer time if one would likes to use this approach in the preliminary screening stages which involve exploration of many possible mutations. A promising strategy for this stage may be provided by the evaluation of the so called “electrostatic group contributions” (36, 37). These contributions are defined here as the effect of “mutating” all the residual charges of the given group to zero. In principle, we can perform such mutations and evaluate the PDLD/S-LRA binding energy for the given native and mutant. However, when we are dealing with charged and polar residues, it is reasonable to start with the faster screening approximation introduced by Muegge et al (36, 37). This approach evaluates the electrostatic group contributions to the binding energy by looking at the terms in Eq. 9, this leads to,
where εx is taken as εx ≈ 4 for polar residues and εx = εeff ≈ 40 for ionized residues. This approach was examined in several test cases (e.g. (37, 38)) and apparently provides a reasonable result for an initial screening.
The calculations of Eqs. 10 and 11 were done by using the MOLARIS simulation program (33, 34) The PDLD/S-LRA simulations involved the generation of 5 configurations in the charged and uncharged forms of the TS by MD runs of 10 ps with a 1 fs time step at 300 K. The calculations use the SCAAS spherical boundary condition (33) and the LRF long range treatment (33). The averages of Eq. 11 are evaluated on the 5 configurations with the charged forms.
As stated above we took the CM system as our benchmark. More specifically we considered; a) the native EcCM and the V35I, V35A mutants that has been used recently by Mayo and coworkers (39), b) we also considered the monomer mMjCM and the F77W-mMjCM, Q88N-mMjCM, R51Q-mMjCM and D48G-mMjCM mutants that have been examined by Hilvert and coworkers (40). This system is of a particular interest since this enzyme behaves like a molten globule when it does not bind the substrate (41) and since this enzyme have a wider catalytic landscape than that of the native enzyme. We also considered c) the R11A-EcCM, R28A-EcCM and K39A-EcCM mutations of ref. (42) and d) the native Bs-CM and the R90G-BsCM and R90Cit-BsCM mutations of refs (43, 44), where the removal of an active site charge group leads to a major loss of catalytic power. The latter mutation consists of the substitution of a charged residue, Arg90, with a neutral hydrogen donor, citrulline (see ref. (44)).1 In all of these cases the observed values of kcat/Km are known and the observed values of kcat are also known except for the mutations mentioned in section c. Here we explore the performance of several screening approaches.
The difference between the activation barrier ΔΔg‡p that corresponds to kcat/KM (or more precisely to kcat/Kbind(RS)) and the activation barrier for the uncatalyzed reaction is related to the TS binding energy by (23),
The electrostatic contributions to this binding free energy can be estimated by the PDLD/S approach and by Eq. 11. Here we started by exploring first the most qualitative approach, evaluating the group contributions to the TS binding by using Eq. 11. The corresponding results are shown if Figure 5, for the EcCM dimer. As seen from the figure, the group contributions captured the effect of the positively charged active site groups. However, for the other residues the performance is not encouraging.
Next we examined the performance of the PDLD/S-LRA in the evaluation of the TS binding free energy. The performance of this approach is summarized in Table 1. As seen from the table this approach is useful when we deal with strong direct electrostatic interactions (e.g. the interaction with Arg 11, Arg 28 and Lys 39 for EcCM system) but less effective when it comes to less direct effects. Overall it seems that at present both approaches can mainly be used in a screening of the effect of ionized residues and in some cases of polar residues (23).
In order to move to a more accurate screening we had to move to the EVB approach. This approach can evaluate Δg‡cat (that corresponds to kcat) rather than the Δg‡p considered in the previous section. Here we explored the catalytic power of the trimer (BsCM), dimer (EcCM) and monomer (mMjCM) by calculating the EVB surfaces for different mutants and native proteins considered. The results are shown in Table 2. In each of the systems considered we started by generating the given sequence from the native protein proceeded to 100 ps MD simulations in order to relax the given structures. Five structures were saved during each relaxation process and then used to generate the EVB surface and obtain the activation free energies and reorganization energies.
The calculated activation barriers of the different mutants are compared to the corresponding observed values in Table 2 and Figure 6. As seen from the table the agreement between the calculated and observed results is excellent and can be considered to be a quantitative agreement. It should be noted in this respect that the performance of our calculations may not be fully appreciated by those who note the small deviations between the calculated an observed in Figure 6 while overlooking the fact that we actually reproduced quantitatively the absolute values of the activation free energies without any parameterization on the reaction in the enzyme. Obtaining such a quantitative prediction is very encouraging and cannot be expected from current design approaches that use gas phase calculations.
Thus the EVB approach can be used in the final screening stage of CAED approaches and perhaps can be defined as the “gold standard” of our approaches. However, this approach is quite expensive (see below) and it is important to explore less quantitative approaches. Since the activation barrier is correlated with the reorganization energy (see Figure 4 and ref 9), we started the search for a faster screening approach by examining the relationship between the reorganization energy obtained from the full EVB free energy functional, λ1 (see Figure 4) and from the LRA approach Eq.7, λ2. The results are shown in Table 3 and in both cases we have significantly less good agreement between the calculated and observed results than in Table 2. The situation can be improved in screening other enzymes, since in the case of CM the reorganization energy involves a large intermolecular contribution, which requires longer simulations for full convergence. Also in the case of CM we have a very large charge-charge interaction in the solute system and this makes the outer sphere reorganization energy a less stable quantity.
The approaches examined above used the native structure as the starting point for the screening process. Obviously, a more general treatment should start from configurations where at lease the initial orientation of the active site side chains is randomized. Although we have not focused on this issue in the present work we have developed a potentially very effective approach for this stage of the screening. That is, our approach of using a simplified folding model as a reference potential for explicit folding calculations (24, 45), which can be called Folding with Coarse Grained Reference (CGR) simulations, has been used recently in exploring the catalytic landscape of CM (30). This approach or its variants can be used in prescreening. For example, we can start by taking the structure of the native main chain and sample the positions of the simplified side chains with an explicit model of the substrate (see ref. (30)). Next one can convert the simplified model to a “simplified explicit model” where the solvent is treated implicitly, and use a Monte Carlo (MC) procedure to generate the explicit side chains. The difference between the explicit and the simplified potential can be used in the MC procedure. The simplified model can also be used for fast screening of the optimal residues and thus to suggest which residues will be included in the explicit treatment. Furthermore, we can also generate optimized simplified sequences and structures and then monitor the energy difference between the simplified and explicit model for the TS binding of different mutants. Those residues that would give lowest energy can then be screened by the EVB approach. In fact we can use this MC procedure for calculating the TS binding free energy.
In the present work we decided to leave the exploration of the full CGR approach to a subsequent study and to use a simpler related treatment, where the main chain was fixed and the configurations of several side chains were sampled by using first a rotamer library and then a MC optimization in the torsional space of the side chains of the simplified explicit model The lowest energy configurations found by the above treatment were then used as starting points in EVB calculations (with the explicitly solvated full microscopic model) by taking both the mutant Q88N-mMjCM and the native enzyme and trying to reproduce their observed activation energies. This was done while starting from randomized arrangements of the configurations of the mutated residue and several other residues. The point in this validation study is that we can view the native enzyme and the mutant as the targets in a design experiment (“pretending” that we do not know the actual structure). At any rate, our study considered different configurations of the mutated residue and neighboring residues while keeping the main chain fixed and generating optimized structures with low energy. The performance of this approach, for the case where we explore the configurational space of residues 84, 88 and 91, is considered in Figure 7 for the Q88N mutant. This is done in terms of the rapidly obtained energies of the simplified explicit model rather that in terms of the EVB free energy, which will be considered below. The figure indicates that the structures with a low RMS from the “correct” structure (the one generated by a single mutation of the native structure) have low energy (a similar pattern was obtained for the native enzyme). Thus the low energy structures correspond frequently to structures that are close to the actual active site structure and the use of the rapidly obtained low energy structures as starting point for the extensive EVB simulations may provide powerful screening approach.
The above procedure was examined first in the case of the native enzyme, where we explored the configurational space of residues 84, 88 and 91 and took the four lowest energy configurations and used them in EVB studies. It was found that starting with two of the low energy configurations reproduces the observed activation barrier (calculated barrier of about 15 kcal/mol).
The situation became more complicated with the Q88N mutant. That is, using the above procedure while randomizing only the structure of residue 88 and then picking the lowest energy configuration, gave EVB barriers of about 20 kcal/mol (close to the corresponding observed barrier). Similar result was obtained for mutant structures that were generated from the native enzyme. However, taking the mutant structures generated by using the rotamer library of three residues gave EVB barriers that were significantly lower than the observed barrier. Exploring the origin of this effect, by calculating the binding energy of the substrate, indicated that the system was locked in an unstable ground state that led to a reduction in Δg‡cat, while still having reasonable Δg‡p (similar problem occurs in our study of DNA polymerase β (46)). In order to deal with this problem we used an alternative approach where we calculated the binding free energy of the EVB transition state. These calculations were performed by building a TS from the combination of the two EVB states with a mapping parameter (the θm of Eq. 3) that corresponds to the TS (see (15) for a related approach). The charging free energy of the TS was then calculated by a FEP charging approach (for the water and enzyme systems) and taken as the approximated TS binding free energy. The non-electrostatic contribution was assumed to be similar for the native and mutant enzymes and it was also assumed that the solvation free energy of the free enzyme is similar for the native and mutant enzymes. Furthermore, estimating the non-electrostatic contribution by the LIE approximation (47) gave similar contributions for the native and mutant enzyme. We also imposed a weak constraint on the protein to prevent a major rearrangement at the state where the substrate is fully uncharged (in this state the repulsion between the protein positively charged groups leads to major reorganization in the absence of the substrate negative charges). This constraint is justified since the rigorous final state should be the state where the substrate is replaced by water. At any rate, now the TS binding energies of the native and mutant enzyme appeared to follow now the observed trend. That is, the TS of the mutant generated by using the rotamer library of three residues had about 2 kcal/mol less negative binding energy than the mutant generated starting from the native structure and 4 kcal/mol less negative than the binding energy of the native enzyme. This result is in a reasonable agreement with the observed difference in Δg‡p although the error range of these calculations is larger than in the EVB calculations. Nevertheless in cases where the active site is constructed by extensive conformational search it may be beneficial to evaluate the TS binding energy rather that to calculate Δg‡cat. Here we believe that the above FEP charging approach can be very effective. However, more studies along this line are clearly needed before selecting the most effective screening strategy.
In order to assess the effectiveness of a given CAED approach it is important to have a clear idea on the computational resources needed to obtain the given result. Here we report the computer time needed for each of the approaches considered above, or in other words the price per performance ratio. The relevant estimates are summarized in Table 4. As seen from the table we can screen 28 mutants by the EVB approach, while using 200 processors. Of course, it is possible to use much more massive power and to screen more mutants by the EVB approach. We can also screen 1000 mutants on 200 processors by using the PDLD/S-LRA approach but these calculations are significantly less accurate than the EVB calculations. Thus although the EVB screening may look like a major investment in computational resources it might provide at present the most practical way of getting accurate screening. In other words, while the PDLD/S-LRA may help in the preliminary screening stage, particularly for charged mutants, its relatively small price does not solve the fact that the results are not sufficiently accurate. Here the fact that the price of computer power is rapidly declining suggests that, at least in the final screening stage, we have to use the EVB approach.
The ability to design effective enzymes can be considered as the Holy Grail of biotechnology and for some as the ultimate proof of understanding of enzyme catalysis. As much as computer aided enzyme design is concerned the challenge is not different than that addressed in our early 1986 study of computer aided mutations (10). In fact the simulation method has not changed (the same EVB) but the available computer power increases by four orders of magnitude.
At present the EVB is probably the most effective tool for quantitative screening 28 active site mutants in 24 hours with 100 nodes. Focusing on a few types of typical mutants can further optimize this procedure. Furthermore, a qualitative PDLD/S-LRA screening of 1000 mutations with significant electrostatic effect can be accomplished in 24 hours with 100 nodes and 500,000 can be screened by the same computer power using the group contribution approach.
One of the obvious questions about the performance of our method is the difference between our EVB calculations and the approaches used in other enzyme design studies. The difference is that current enzyme design approaches focus on effective an elegant initial screen but do not use methods that can provide a quantitative scoring. For example, refs. (6, 7) used gas phase calculations to estimated the TS structure and charges and then generate in a rapid screening approach protein configurations that accommodated this TS. Although this approach provided impressive results, it cannot provide quantitative information about the relative energy of different design options. The best way to realize this point is to try to see whether the method used can reproduce the absolute activation free energies, and in our experience it will result with very large errors. Furthermore, the use of gas phase QM calculations for enzyme studies is notoriously problematic (see discussion in e.g. (48)). In other words, an approach that does not treat the protein TS electrostatic free energy in a consistent way cannot capture the preorganization effect and the corresponding catalytic effect. Here our strategy is based on the philosophy that a computational approach for studies of enzymes must be able to reproduce the free energy of the TS in the enzyme active site and that approaches that ignore this requirement will be unable to really reach highly catalytic enzymes unless this is done by experimental trial and error.
It might be instructive to respond to a referee question about the reason for the quantitative performance of the EVB with the ENZYMIX force field as compare to that of some other computer programs. Here we must start by stating that very few research groups specialized in the extensive averaging approach used here and with the use of powerful long-range treatment and reliable polarization boundary conditions. Also most QM/M approaches do not involve the EVB model. Furthermore, the EVB force field has been calibrated on observed solvation free energies rather than on less relevant properties, and has been validated repeatedly on highly relevant properties such as pKas (see a review in (49)). Overall we believe that the accuracy of MOLARIS in reproducing reorganization free energy is a well-established fact (e.g. see recent study ketosteroid isomerase (50) and B12 enzymes (51)) as well as the results of the present study. Nevertheless, other modern simulation programs with properly parameterized force fields would probably give similar results once they implement the full EVB treatment and repeat the procedure used here.
This study has not focused on the early screening steps but, nevertheless, we did consider some key elements that can be used in the early screening stage. For example, we can benefit from the use of the PDLD/S-LRA or the group contribution approach in identifying residues that can contribute to electrostatic stabilization of the TS. Furthermore, in subsequent study we intend to exploit the power of the simplified model and the coarse-grained-reference approach in the early stages of the screening process. This approach can be quite effective in minimizing the energy in the sequence and configurational space.
Imposing relevant constraints in addition to the results of the actual final screening can help the general design significantly. For example, it is useful to have an estimate of the protein stability in the proposed design. Here we consider our recent electrostatic approach (52) as an extremely powerful approach. In fact, the major elements of this approach are already incorporated in the simplified model. Of course, one can use bioinformatics information on similar active sites (53, 54) but this would not solve the problem of designing a new activity.
This study has not explored our ability to design an enzyme in a blind way but in a posteriori way, focusing on the last steps of the screening process. Thus one may argue that our approach is not necessarily a predictive approach, as the results were known before the calculations. While we agree with this assessment, as much as the full design is concerned, we like to point out that as much as the final screening is concerned, the MOLARIS-EVB calculations can be considered as a relatively robust black box which contains no prior information about the enzyme catalytic power. Since this program can be used by anyone it is easy to verify that the reported results do not reflect our bias and that they are fully reproducible Thus we contend that if the program continues to produce similar agreement between calculated and observed mutational effects when applied to other enzymes, it may be justified to consider it as a predictive tool. We are fully aware of the philosophy that a prediction must be something that is done before the results are known, and we have our share of correct predictions (e.g. the primary events in vision (55) and photosynthesis (56) and even predictions of mutational effects (see discussion of Fig 4 in (15)). However, we believe that a systematic validation of enough test cases is, in principle, equivalent to an assessment of a predictive power.
We would also like to clarify that the EVB has been found to be effective only when we are dealing with a reasonable active site constructs and that generating a reasonable active site structure is a challenge that was only addressed here in a preliminary way and where we clearly cannot claim that we have a predictive power.
Recent years witnessed an impressive advances in enzyme design (e.g. (6, 7)). However, the enzymes generated by current design efforts are still far less efficient that naturally evolved enzymes. Although we are yet to see further advances, it seems to us that a part of the problem in previous CAED has been the limited focus on modeling of the actual chemical step in the actual enzyme active site, and in fact using approaches that cannot capture the critical electrostatic preorganization effect. That is, the main problem in designing enzymes with native activity is related to the ability to predict the proper TS stabilization. This difficulty is due to a large part to the difficulty of predicting the preorganization effect. Attempts to evaluate the catalytic effect by using gas phase models or by looking at the electrostatic interaction between different residues and the TS are unlikely to reproduce the correct catalytic effect since (among other problems considered elsewhere (57)) it is impossible to assess the preorganization effect without considering the protein reorganization in the simulations. Here we overcome this challenge by actually calculating the activation barrier by the EVB approach. Thus the present work is not so much about effective early screening but mainly about the requirements of the final stage in the screening process. In this respect we believe that the EVB provides a very effective way for performing the final stage in the screening process. We also provide some promising directions for the initial screening steps.
Finally, it is useful to clarify that our approach can be used to augment the initial screening steps of current CAED approaches. Furthermore, while it is hard at present to compete with the enormous power of directed evolution, the EVB can clearly help in understanding why the evolved enzymes are not perfect and in offering clues on how they can be improved.
We gratefully acknowledge the University of Southern California’s High Performance Computing and Communication Center (HPCC) for computer time. We would like to thank Professor Don Hilvert for insightful discussions and for letting us have his results prior to publication. M.R. thanks the Generalitat Valenciana from Spain for the postdoctoral fellowship.
†This work was supported by Grant GM024492 from the National Institutes of Health (NIH).
1The R90Cit mutation was performed on the native BsCM rather than with BsCM* (44) that contains a D102E mutation because control experiments with this mutation confirms that the mutation does not alter significantly the catalytic power of BsCM.