|Home | About | Journals | Submit | Contact Us | Français|
Combined QM/MM methods provide an accurate and efficient energetic description of complex chemical and biological systems, leading to significant advances in the understanding of chemical reactions in solution and in enzymes. Progress in QM/MM methodology and application will be reviewed, with a focus on ab initio QM based approaches. Ab initio QM/MM methods capitalize on the accuracy and reliability of the associated quantum mechanical approaches, however at a much higher computational cost compared with semiempirical quantum mechanical approaches. Thus reaction path and activation free energy calculations based on ab initio QM/MM methods encounter unique challenges in simulation timescales and phase space sampling. Recent developments overcoming these challenges and enabling accurate free energy determination for reaction processes in solution and enzymes will be featured in this review, along with applications.
Understanding chemical reactions in solution and in enzymes is of ultimate importance. The majority of chemical reactions for synthesis or manufacture takes place in solution. Investigating solvent effects on reaction mechanisms has long been and continues to be an area of important and challenging inquiry. In biological systems, most biological functions are accomplished through the binding of ligands with proteins and/or a series of chemical reactions catalyzed by specific enzymes with different efficiency and specificity. The structure-function relationship and the catalytic role of enzymes is thus one of the most fundamental subjects in biochemical research. It is also essential for the development of new or better inhibitors and enzymes, which have important practical applications ranging from drug design to the development of novel catalysts in industrial processes. As an increasing amount of structural information for proteins and enzymes becomes available, the structure-function relationship becomes an even more important link in biological science. Quantitative tools such as simulations will make key contributions to the investigation of this topic.
In the process of chemical reactions, only a small number of atoms directly participate in bond forming or breaking processes. Many other atoms in the system do not undergo changes in electronic structure, but instead serve as a steric and electrostatic environment to influence the properties and reactivity of the active site. The QM/MM approach, first developed by Warshel and Levitt , is a multiscale/multiresolution simulation approach: the interesting part of the system, such as the active site of an enzyme, is described at the electronic level with quantum mechanics, while the rest of the system is described at the atomistic level with molecular mechanics. This combined quantum mechanical/molecular mechanical methodology allows reliable electronic structure calculations for enzymatic reactions with a realistic and atomistic description of enzyme environment. Such an approach takes advantage of the applicability and accuracy of the QM methods for chemical reactions in systems of several tens of atoms and of the computational efficiency of the MM description for the rest of the enzyme and solvent, which normally consists of many thousands of atoms. Development of combined QM/MM methods has enabled simulations of complex chemical and biological processes, leading to significant advances in our understanding. In particular, simulations have generated considerable insight into chemical reaction mechanisms in solution and in enzymes, as discussed in several recent reviews [2, 3, 4, 5, 6, 7, 8].
The QM/MM approach can be classified into two types according to the level of QM theory used. One type involves the use of semiempirical methods such as MNDO, AM1, PM3, empirical valence bond (EVB), and the recently developed self-consistent charge density functional tight binding (SCC-DFTB) method [9, 10, 11, 12]. The majority of the work in the field is carried out with these methods. The computational efficiency of semiempirical QM/MM is such that direct MD sampling is readily affordable, and thus free energy and reaction dynamics calculations can be routinely performed.
The other type of QM/MM calculation [13, 14, 15, 16, 17, 18, 19, 20, 7] uses ab initio QM via wavefunction theory or density functional theory. DFT is the most popular approach because of the optimal balance of efficiency and accuracy [21, 22, 23]. Recent development in approximate functionals has resulted in even better accuracy at a similar computational cost . However, because ab initio QM calculations are computationally much more demanding than semiempirical methods, rigorous statistical mechanics sampling and reaction dynamics calculations with an ab initio QM/MM method are most challenging.
In the present review, we focus on the theoretical development and application of ab initio QM/MM free energy simulation methods for chemical reactions in solution and in enzymes.
The fundamental properties of a chemical reaction process include the equilibrium distribution of states, kinetics, and the reaction mechanism. Thermodynamic laws tell us that equilibrium in a reaction process is determined by the free energy difference between the reactant and product states. This free energy difference determines the equilibrium, but not the reaction rate in general.
Reaction rate can be determined rigorously using theory based on classical statistical mechanics or quantum statistical mechanics. The exact classical rate constant k(T) at temperature T is [25, 3, 2]
where γ(T) is the transmission coefficient, β = 1/kBT, kB is the Boltzmann constant, and ΔG‡(T) is the molar activation free energy, i.e. the free energy difference between the reactant and TS. Within the transition state approximation, γ(T) = 1 , and ΔG‡(T) then determines the rate.
TS is a special state of the reacting system whose population controls how rapidly the system is able to transfer from the reactant valley to the product valley. If the dynamics of a system is projected onto an order parameter, or reaction coordinate, that characterizes the reaction process, TS is the point with the highest (free) energy on this one-dimension PMF. Identification of TS is crucial not only for the rate determination, but also for the understanding of the reaction mechanism.
For enzymatic reactions, experimental studies provide crucial and indispensable information concerning the reaction mechanism, thermodynamics and kinetics. However, experimental data is often insufficient for determining the detailed reaction mechanism . Most importantly, experimental study cannot directly determine the structure of the transition state, which is crucial for biomedicinal research such as inhibitor or drug design.
Complementary to experimental study, simulations can yield atomistic or even electronic information regarding the effects of site-specific interactions on the reaction process, the reaction path, and the structure of the transition state[14, 6, 2, 3, 12, 4, 27, 16]. Simulation studies of many enzymatic processes have addressed and also raised many important issues in enzyme catalysis such as covalent catalytic mechanisms [28, 29], contribution of the pre-organized electrostatic environment of enzymes[30, 31], the effects of strain and conformational dynamics of the enzyme-substrate complex, non-equilibrium dynamical effects, and quantum tunnel ling effects[32, 33].
Despite numerous simulation studies, the origin of enzymatic proficiency remains a current topic of interest because the contributions from different sources may vary in different enzymes [26, 34]. Accurate simulation methods will definitely help the understanding of those complex interactions in the enzymatic catalysis.
Solvent molecules play significant roles in chemical reactions in solution [9, 35, 36, 33]. Energetically, the complex electrostatic interactions between the solvent and solute molecules create a reaction environment drastically different from gas phase, resulted in different chemical equilibrium and reaction rate. Dynamically, fluctuation and diffusion of the solvent molecules serve an energy bath to the motions of the reaction moieties, resulted in different relaxation and barrier-crossing dynamics. Last but not least, direct participation of the solvent molecule in the reaction not only creates a concentration effect, but also may alter the reaction path.
Solution reactions are important references for enzymatic reactions. In most cases, the slow reaction rate of the non-catalyzed solution reaction is the evolutionary driving force for enzymes; the exceptional proficiencies of enzymes are usually determined by the degree of difficulty of the solution reaction . From this perspective, understanding enzymatic proficiency requires determining the differences between the enzyme catalyzed reaction and its corresponding solution reaction.
Solution reactions are often simulated with varying levels of approximation, particularly for simulations employing ab initio approaches. One common practice is to simplify the description of the solvent by using a continuum representation. This reduces the number of degrees of freedom in the simulation, but the isotropic continuum-medium model cannot always correctly reproduce the anisotropic, site-specific interactions between the solute and solvent molecules. To make reliable comparison, both the solution and enzymatic reactions must be simulated at the same level of accuracy.
The first requirement for simulating a reaction process is a potential energy surface that accurately describes the chemical changes with the redistribution of electrons. To account for this effect, in general quantum mechanics becomes necessary.
With currently available computing resources, a high-level ab initio QM description of the entire system is affordable only for small systems of less than 100 atoms. For reactions in solution or in enzymes, too many electronic degrees of freedom of the system make it difficult, if not impossible, to describe the whole system quantum mechanically. Although semiempirical QM methods using a linear scaling method such as the divide-and-conquer approach have been applied to large proteins and enzymes [38, 39, 40, 41, 42, 43], the computational speed is still a factor and the results may be less reliable because of the semiempirical approximations.
An effective and attractive solution to the energetic description of macromolecules is found in the combined QM/MM method [1, 11]. By partitioning the entire system into QM and MM subsystems, the total internal energy of the QM/MM system can be written as
The first two terms on the right hand side are the QM internal energy and the electrostatic energy between the QM and MM subsystems, for which the calculation is described in next section. The remaining three terms are the van der Waals energy between the QM and MM subsystems, the covalent interaction energy between the two subsystems, and the purely MM interaction energy of the MM subsystem, respectively, which are described with a given MM force field.
Due to the diversity in their forms and parameters, simulations performed with different MM force fields may show somewhat large discrepancies in the results . Instead of MM force fields, an effective fragment potential method has been developed to derive a simple potential energy function for complex systems from ab initio QM calculations .
Computing the first two terms on the right hand side of Eq. (2) is the core of the QM/MM methods. A straightforward procedure is to compute the two terms together by including the electrostatic potential from the MM atoms in the QM calculation; that is
where Heff is the effective QM Hamiltonian including the MM electrostatic potential. In this charge embedding scheme, the QM internal energy and the QM/MM electrostatic energy are computed together in a self-consistent manner. In other words, the polarization of the QM subsystem due to the presence of the MM subsystem is captured at the same level of QM theory.
In the mechanical embedding approach represented by the class of ONIOM methods, the systems are hierarchically split into different layers that are described at different levels of theory. The energy function of the outer layer automatically includes the energy of the inner layer but it is described with a lower-accuracy method.
When one or more covalent bonds connect the QM and MM subsystems, how to handle the boundary between the QM and MM subsystems becomes an important question. Several approaches have been developed [13, 47][48, 19, 49, 20, 50]. Traditionally, “link” hydrogen atoms are added to the MM-bonded QM atoms to saturate the valence orbital of the QM subsystem. Using additional hydrogen atoms to cap the QM subsystem is easy to implement, but the resulting system is thermodynamically different from the original one because the total number of atoms is different.
In contrast, two other approaches, namely the pseudobond method [13, 47] and the frozen local orbital method [48, 19, 49, 20, 50], do not bear such problems. In the pseudobond method, the MM-bonded QM atoms are assigned a special basis set and an effective core potential that is designed to mimic the correct covalent bonding involving the boundary QM atoms. By making such atoms with a free valence of 1, there is no need for additional atoms to saturate the QM/MM covalent bonds. Several methods, following the pseudobond method in using effective core potentials, have subsequently been developed, including: the quantum capping potential method [51, 52], the effective group potential technique , effective Hamiltonians from a minimum principle , variational optimization of effective core potentials for molecular properties  and multicentered valence-electron effective potentials .
In the frozen local orbital method, a set of specially designed local orbitals are assigned to the boundary QM or MM atoms to maintain closure of the QM subsystem. In different implementation schemes, the magnitude of the neighboring MM charges, the positions of the MM point charges, and the positions of the frozen orbitals can vary.
In connection with this QM/MM boundary problem, another important issue is how to compute the electrostatic interactions between the QM subsystem and the nearby MM atomic charges, in particular those MM atoms in covalent contact with the QM atoms. Direct inclusion of those MM point charges in the QM effective Hamiltonian can cause charge penetration and off-balance polarization of the QM electrons. To address this problem, rescaling of the MM charges  or smearing the MM charges by Gaussian distributions have been developed .
The importance of the correct description of long-range QM/MM electrostatic interactions has not been explored in most simulations, as many QM/MM simulations have been performed with stochastic boundary conditions. The inclusion of long-range QM/MM electrostatic interaction is not trivial because the well-established Ewald summation method may not be directly applicable. In fact, several technical concerns must be addressed.
The first issue concerns the treatment of a periodic QM subsystem. In biomolecular simulations, usually the simulation box must be large enough so that the interaction between the QM subsystem, which is often charged, with its images is negligible. The second concern is how to consider the periodically distributed MM charges in the QM calculation if the charge-embedding scheme is used. Even though Ewald-type methods have been developed for semiempirical methods[58, 59], a similar development for ab initio QM/MM methods is still lacking.
No doubt the long-range electrostatic interaction is important for stabilizing the enzyme structures and may also be important for enzyme functions. However, the key question in both concerns might be whether or not the long-range polarization is significant for QM subsystem. It has been observed that even for a charge-transfer reaction, the QM polarization effects are small when the MM point charges are more than 9 ~ 14 Å from the QM atoms . Beyond this distance, the MM point charges merely contribute by providing a static electrostatic potential. Therefore a simple technical scheme has been adopted in our simulations with periodic boundary condition [61, 62, 60] in which a cutoff of 9 ~ 14 Å was used for selecting MM charges for the QM SCF calculation. The QM calculation yielded polarized QM ESP charges that were in turn used to represent the QM subsystem during MD simulations with the long range electrostatic interactions treated by the Particle-mesh Ewald Method .
To avoid the technical difficulties associated with periodic QM/MM systems, Cui and co-workers have made important progress in approximating long-range QM/MM electrostatic interactions by implementing a stochastic generalized solvent boundary with the SCC-DFTB method [64, 65]. York and co-workers have developed a variational electrostatic projection method that employs a continuum solvent model for the long-range electrostatic interactions .
A reaction path provides a clear picture of a chemical reaction mechanism as well as quantitative energetic information regarding the reactant, transition state, and product, which are required for reaction rate calculations. A reaction path also serves as the structural basis for the detailed analysis of site-specific interactions involved in the reaction process and can be used in the design of new inhibitors.
For reactions of small molecules in the gas phase, reaction paths can be determined as MEP on the total PES. The TS on the PES of the entire system plays the most important role for the rate. The free energies of the reactant and the TS can be determined with the harmonic approximation, based on QM calculations of frequencies.
For reactions in solution and in enzymes, the TS on the PES of the entire system only represents one of many such states, because there are many degrees of freedom (e.g. for the solvent and the protein) that are not directly related to the chemical changes but modulate the shape of PES. In such cases, the progress of a reaction can also often be marked by the structural changes of a reduced set of atoms, e.g. the solute conformation for solution reactions or the conformation of the substrate plus certain active site residues for enzymatic reactions. Quite conveniently, the reduced set of coordinates can be the same as the QM subsystems. Thus, one can construct the MFEP, a reaction path defined on the PMF of a reduced set of coordinates [61, 67, 68, 69, 70]. The contributions from the rest of the system are ensemble-averaged quantities obtained during MD simulations.
Usually defined as a set of geometric or energetic parameters, the reaction coordinate characterize a reaction as the one-dimensional profile connecting reactants to products. The choice of the reaction coordinate is critical. It has been shown that the improper choice of a reaction coordinate can bias the simulation and yield slower convergence . The problem of an inappropriately chosen reaction coordinate is more severe in simulations using coordinate driving techniques in which the choice of the reaction coordinate not only strongly influences the efficiency of sampling the phase space, but also causes technical difficulties in generating continuous reaction path. In many cases, the determination of the reaction coordinate is non-trivial, especially for many complicated reactions catalyzed by enzymes as the changes in specific geometrical quantities (e.g. interatomic distances) might be stepwise or nonlinearly correlated. Furthermore, the environmental degrees of freedom may also contribute to the reaction coordinate as dynamic effects .
The mathematical methods for determining reaction path are the same for the MEP on the total PES as for the MFEP on the PMF surface of the QM subsystem. The classical method for determining a reaction path is the intrinsic reaction coordinate method in which the steepest descent path is computed from the TS by following the gradient downhill. This method is not efficient for macromolecular systems and also requires knowledge about the structure of the transition state, which is a difficult task without prior knowledge of the reaction path.
Several algorithms have been developed for the determination of reaction paths for large systems in which a chain of conformations along the reaction coordinate is simultaneously optimized. Commonly used methods include the nudged elastic band (NEB) method, the Ayala-Schlegel second order minimum energy path method, and the string methods . Our experience indicates that the NEB method is simple to implement but converges slowly and has difficulty locating transition states. The Ayala-Schlegel method can converge to the correct path but its application to large molecular systems is troublesome because it requires an initial guess that is enough close to the exact path. As an alternative, a quadratic string method (QSM) was developed in our laboratory which has been shown to yield better performance than the NEB method . It has also been found that for large systems it is often important to select a small subset of coordinates as the chemical metric to define the path length for properly positioning the states along the reaction path .
One key component of MEP determination is the minimization of the QM/MM total energy, in which the QM calculation of energy and gradient is the bottleneck. To reduce the number of QM energy and gradient evaluations, Zhang, Liu, and Yang first developed an efficient iterative optimization for such a minimization in which the QM and MM subsystem optimizations are carried out sequentially, rather than concurrently . The key is to use a simplified, often QM ESP charge based, energy function during the optimization of the MM subsystem. Friesner and co-workers introduced a correction term to improve the ESP charge approximation , and a similar method was proposed by Thiel and co-workers . To improve the ESP charge approximation, Morokuma and co-workers have employed multiple QM calculations in the MM optimization process, as well as ESP multipoles .
Instead of classical rate theory that depends on the determination of the TS, the reaction rate can be determined by the probability ratio of finding successful dynamic reaction paths in multi-dimensional phase space, following the transition path sampling method . The simulation of a large scale conformational transition in an enzyme has also been reported . The major obstacle for the application of this method to reaction processes is the high computational cost for ab initio QM/MM calculations because a large number of paths must be sampled to ensure a converged ratio of successful/unsuccessful paths.
The calculation of the QM internal energy and QM/MM electrostatic interactions in Eq. (3) is the bottleneck for QM/MM calculations. In the charge embedding scheme, rigorous treatment of this term requires a SCF calculation for each different QM or MM conformation, which is quite costly for ab initio QM methods. It would be desirable if one can develop an approximate, ab initio-based QM/MM energy function for phase space sampling and long timescale simulations without the need for SCF calculations.
Lu and Yang developed the ab initio RPP method . The approach involves separating the QM energy into two components: a QM internal energy and an electrostatic interaction energy between the QM and MM subsystems. Each component is then expanded analytically in terms of both the fluctuations of the QM geometry and the MM electrostatic potential. The resulting RPP provides a simple, analytic expression for the QM/MM total energy that is valid in the vicinity of the reaction path.
RPP approximates the electrostatic term as the Coulombic interactions between the QM ESP charges (and multipoles) and the MM atomic charges, i.e.
Here, Qi(rQM, rMM) is the ESP fitted charge of QM atom i, and qj is the point charge of MM atom j from the MM force field. Whereas the MM atomic charges are constant in common force fields, the QM electrostatic charges are clearly dependent on both rQM and rMM. The QM internal energy, E1(rQM, rMM), is then defined as
This QM internal energy is the energy of the QM system in the presence of the electrostatic potential of the MM atoms, minus the Coulombic interactions between the QM ESP charges and MM atomic charges. The QM energy also depends on both rQM and rMM. Obviously, the term captures the essence of the electrostatic interactions between the QM and MM subsystems. The significance of this QM ESP charge expression is that it provides us great flexibility as the interactions are now expressed in a classical pairwise MM form. To improve the accuracy of ESP fitting, we can add electrostatic multipoles to each atom or bond.
The most important advantage of this separation scheme is that we can now introduce polarization effects into both terms as high-order perturbations. In the RPP model, the QM ESP charges are assumed to respond linearly to changes in both the external electrostatic potentials at the atomic sites and the geometries of the QM system, as
where is a reference ESP charge for QM atom i. The QM reference charges are determined at a given QM geometry and with a given external MM electrostatic potential at the position of the QM atoms. After the perturbation of the QM geometry and the external MM electrostatic potential, the polarized charges are determined through two response kernels, namely [81, 82],
The second kernel can now be computed analytically with a recently introduced ESP fitting method. Just as in the polarization effects introduced in the term , we also expand the term E1(rQM, rMM) in a similar manner [81, 62].
The approximate QM/MM total energy becomes
It has been shown that this linear-polarization QM ESP charge model yields quite accurate energetics for reaction systems, given a good initial reference state [81, 61]. It is even possible to further simplify this QM ESP charge model by truncating the polarization effects of the QM ESP charges [61, 62]. Applications with these simplified QM ESP charge models in the ab initio QM/MM simulation of reaction processes have been successful.
Many methods have been developed for computing the free energies of important chemical processes on the basis of classical statistical simulations. Most popular methods include free energy perturbation, thermodynamic integration (TI), umbrella sampling, slow growth, and a recently developed fast growth method . If one is only interested in equilibrium states, a classical free energy simulation method might be employed together with a so-called “alchemical” molecular transformation technique to provide the free energy difference between stationary states of a reaction process.
It has been pointed out that the direct application of TI and FEP with a QM/MM energy function requires special care for handling the end states . In classical MM force fields, the energy functions are continuous and (almost always) infinitely differentiable with respect to the variables. As a result, without any difficulty one can create nonphysical states of the molecule, e.g. partial existence of an atom or a group, in the simulation process. Nonetheless QM methods in general do not easily allow a direct smooth process of creating or annihilating atoms. To overcome this difficulty, many methods have been developed including mixing of energy functions [9, 86], utilization of a reference state [87, 88, 89, 90], and dual-transformation .
Compared to their critical roles in the MM simulations, classical alchemical free energy simulations are less popular in the QM/MM simulations of the condensed phase reactions. Two exceptions are the calculations of residue pKa of protein molecules  and redox potentials [86, 89, 60].
When information regarding the reaction rate or the TS structure is sought, explicit modeling of the reaction process becomes necessary. Given an appropriate reaction coordinate, sampling-based free energy calculation methods can be applied to the study of reaction processes if the QM calculations are fast enough, as in semiempirical QM methods. In such cases, the system is driven from one state to another along the reaction coordinate and the PMF is computed during this driving process.
Umbrella sampling method computes the PMF by improving the sampling of high-energy, low-probability regions of phase space. Usually the results of umbrella sampling are processed with the weighted histogram analysis method, which provides reduced statistical error. Recently, Thiel and co-workers have developed an “umbrella integration” method which combines the advantages of umbrella sampling and thermodynamic integration .
Parrinello and co-workers  have developed a meta-dynamics method in which the energy surface of the target system is gradually modulated to allow enhanced sampling of rare conformational states.
Slow growth is in fact a variant of TI. Instead of sampling a few fixed points on the reaction/transformation path and computing a converged free energy gradient for each point, slow growth slowly drives the system from one state to the other using a very small step size. The work exerted during this process is summed up to yield an upper-bound estimate of the free energy difference between the two states. The reverse process yields a lower-bound estimate. Together, the results of multiple forward and backward processes yield a good estimate of the free energy difference . When the slow growth method is combined with the coordinate driving technique, the free energy change with respect to the reaction coordinate may be obtained.
In analogy to the slow growth method, a “fast growth” method  has been developed and has been used to simulate chemical reactions [96, 97]. Unlike the slow growth method in which each simulated work is an estimate of the true free energy, in fast growth the ensemble averaged work is the upper-bound (or lower-bound for the reverse process) estimate of the true free energy. This estimate slowly converges to the true free energy upon the convergence of the ensemble. Since each individual simulation is often only in pico-second timescale range, ab initio QM/MM simulations at this timescale might be possible. However, the main drawback of the fast growth method is that it requires many simulations to yield a good estimate of the free energy, which always possess a large amount of uncertainty .
It is important to realize that the phase space sampling methods based on PMF calculation along a reaction coordinate usually require converged sampling, and of course prior knowledge about the reaction coordinate. Typically, this method is affordable for semiempirical QM/MM methods. As for ab initio QM/MM, few simulations have been reported[99, 62]. With the steady improvement of computer speed, direct sampling methods may play more important roles in the reaction simulations.
Aimed to reduce the costs of ab initio QM calculations, two methods have been developed utilizing the information about the reaction coordinate and path. Jorgensen and co-workers have developed a QM-FE method, in which the reaction path optimized for gas-phase reaction process is used to carry out free energy simulation in condensed phase . Warshel and co-workers have developed a QM(ai)/MM method, the key of which is that the sampling and free energies are first computed with a simplified EVB potential, and then corrected to ab initio QM level with FEP combined with the linear response approximation [87, 90].
Realizing the importance of the reaction path, Yang and co-workers have devoted much effort into developing reaction path optimization and free energy calculation techniques based on ab initio QM/MM methods [14, 61, 62]. The first method, termed QM/MM-FE (or QM/MMFEP) , is carried out in two stages: optimization of the reaction path and calculation of the free energies for the path.
The reaction path determination has been designed as a sequential optimization process. To exemplify the advantages of this sequential approach, let us first examine the situation where the QM and MM degrees of freedom are optimized concurrently. In this case, each change of the QM and/or MM geometries will require a new QM evaluation of the energy and gradient. While the energy and gradient calculation is computationally inexpensive for semiempirical QM methods, it is expensive for ab initio QM methods. As it usually takes hundreds to thousands of minimization steps to obtain an adequately converged structure for the whole QM/MM system, the cost for ab initio QM calculations would be prohibitive. Inspired by the fact that in certain types of optimization problems it may be desirable to break up a process into an iterative sequence of two or more steps, we perform the optimization as follows: (i) optimize a subset of the system, A, with the rest of the system, B, held constant; (ii) optimize B with A fixed according to the results obtained in step (i). Thus in the QM/MM-FE method, instead of a concurrent optimization of the QM and MM degrees of freedom, an iterative, sequential optimization protocol was developed that has proven to be effective in reducing the number of QM energy and gradient evaluations. The main idea is that, starting from a given structure of the QM/MM molecular system, the optimization is separated into two processes. One first optimizes rQM with fixed rMM, which is at an approximate minimum in the MM degrees of freedom. Afterwards, the conformation of the MM subsystem, rMM, is optimized with fixed rQM. To reduce the number of QM evaluations, an approximate QM/MM total energy, , is used for the MM optimization. In this approximate QM/MM total energy, the electrostatic interactions between the QM and MM atoms are approximated by the Coulombic interactions between the point charges of the MM atoms and the ESP fitted charges of the QM atoms. This process is then iterated until convergence, which is normally achieved within a few iterations (often less than 10). Expressed as an algorithm, the ab initio QM/MM-FE optimization procedure is as follows:
This process can be carried out individually for a single point on the reaction path, e.g. the reactant and product states, or simultaneously for a chain of conformations along the reaction coordinate with a chain-of-states optimization algorithm such as the NEB method, the Ayala-Schlegel method, or the superlinearly convergent QSM.
Once the reaction path is determined, a FEP simulation with the approximate QM/MM energy function can be carried out for the optimized QM conformations on the reaction path, similar to the QM-FE method . The validity of QM/MM-FE has been confirmed by other laboratories [100, 101, 102].
The development of the QM/MM-FE method has provided a viable way for computing accurate free energies of reactions in enzymes. But one limitation has hampered the application of this method to the simulation of solution reactions. In the QM/MM-FE method, the reaction path is optimized on the QM/MM PES, starting from a given initial structure. As a result, the optimized path is influenced by the choice of the initial conformation. In many enzyme-substrate complexes, the dependence of the initial conformation may not cause problems because the active site of the enzyme is usually protected from bulk solvent. When the reaction occurs in solution, or the enzyme active site is exposed to solvent, this dependence will be significant. The rapid exchange of solvent molecules can also cause difficulty for the convergence of the path optimization process. Similar observations have been made for the enzymatic reactions in which the enzyme undergoes significant conformational changes during the reaction process.
To overcome this problem, the QM/MM-MFEP method has been developed in which the reaction path is optimized on the PMF surface of the QM degrees of freedom, instead of the total energy surface . Within the QM/MM context, the thermodynamics of the entire system is simplified by defining the PMF of a QM/MM system in terms of the QM conformation as
where E(rQM, rMM) is the total energy of the entire system expressed as a function of the coordinates of the QM and MM subsystems, rQM and rMM, respectively. The gradient of the PMF, also known as the free energy gradient, is then
which appears conveniently as the ensemble average of the gradient of the QM atoms, obtained from MD simulations of the MM atoms. In practice, we used the free energy perturbation method and its associated gradient, instead of Eqs. (12-13) [61, 62].
Our construction of the PMF and PMF gradient is different from other work in terms of the variables of the PMF. For reasons we have discussed in the section of reaction coordinates, we allow all of QM degrees of freedom to contribute to the reaction coordinate whereas others have often used one or a few predefined geometric terms. In the latter case, a Jacobian term may be required to correctly include the effects of geometrical constraints on the reaction coordinates.
Because the QM degrees of freedom are coupled with the MM degrees of freedom, a straightforward minimization algorithm requires each step in the optimization of the QM conformations to be associated with converged sampling of the MM ensemble. In this optimization scheme, every QM optimization step to a new conformation on the PMF surface is followed by a course of MD sampling of the MM conformations, usually with a simulation time of 100 ~ 1000 ps. Such extensive MM sampling is required so that the QM PMF and gradient obtained are sufficiently accurate for successful structure and reaction path optimization of the QM subsystem. Thus, this method is less efficient in practical simulations. First, 100 ~ 1000 ps of MD simulations on a system of ~ 10,000 atoms is expensive. Second, such MD simulations must be repeated for each step in the QM optimization process. The intrinsic fluctuations and limited simulation times for the MD sampling may also contribute to slow convergence in the path optimization if a new MD simulation is always begun immediately after every QM geometry optimization step.
As in the QM/MM-FE method, to improve the efficiency of the QM/MM-MFEP method, we can reformulate the concurrent optimization of the QM subsystem and the statistical sampling of the MM subsystem into an iterative step of sequential MD sampling of the MM system at a fixed QM structure and subsequent optimization of the QM subsystem within the fixed MM conformational ensemble . The approximate energy function , introduced in the QM/MM-FE method and well described in the RPP model, plays a crucial role in reducing the number of QM energy and gradient evaluations required. In the MD sampling of the MM conformations, acts as a reference sampling energy function to drive the motion of the MM atoms without performing a QM calculation at every MD step. In the optimization of the QM subsystem within the fixed-size MM conformational ensemble, it can be used to avoid QM calculations associated with each new conformation in the MM ensemble.
The algorithm of the QM/MM-MFEP method can be described as follows.
The key feature of our new QM/MM-MFEP algorithm is the iterative QM optimization in a fixed MM ensemble, Eq. (14) above. Because the MM ensemble, , is finite and remains fixed throughout the course of the QM optimization for , one can obtain the precise PMF, Eq. (15), and its gradient, Eq. (16), defined within this ensemble. This circumvents the difficult and costly convergence problems associated with MM sampling. The optimization of the PMF can be carried out efficiently using classical numerical optimization tools. Each optimized QM structure in turn provides the next reference QM structure and its energy function, , for the next round of MD sampling of the MM conformations. Each optimized QM structure should improve on the previous one by providing a better QM geometry and corresponding ESP charges for the MM simulation in the next cycle.
The use of a finite, fixed-size ensemble of MM conformations improves the utilization of the MM conformations and avoids repetitive MD sampling at each step of the QM structure optimization. Thus, instead of performing excessive MD samplings that likely have significant overlap with each other, a few cycles of MD simulation are sufficient to yield converged results in the current method. Applications have shown that our QM/MM-MFEP method converges as efficiently as the QM/MM-FE method.
The classical dynamic effects beyond the classical TS theory is described in the transmission coefficient γ(T), the determination of which requires an MD simulation starting from the TS . Main quantum mechanical effects can be included in the quantum PMF calculations. There are only several ab initio QM/MM studies reported so far. Cui calculated γ(T) for the proton transfer reaction catalyzed in TIM using variational transition state theory . Wang et al. used the reaction path potential, which enables rapid evaluation of the potential energy around the reaction path, to calculate γ(T)  and also the quantum PMF with a centroid path integral approach . When the reaction path and its TS captures the correct physics, TS theory is a good approximation, and γ(T) is nearly 1 [3, 103, 104, 105]. Wang’s calculations on TIM resulted in kinetic isotope effects in excellent agreement with experiments and revealed that the main QM effect is in the zero-point energy.
Many enzymatic reaction processes have been modeled by QM/MM methods, of which most were investigated with semiempirical QM/MM methods. As only a small number of cases have been studied using ab initio QM/MM calculations, the number of applications of ab initio QM/MM free energy simulations are even fewer because of the difficulty of combining ab initio QM with statistical sampling. A comprehensive list can be found in Ref . In this section, we briefly review several enzyme systems whose mechanisms have been scrutinized by ab initio QM/MM, some in combination with free energy simulation.
The enzyme orotidine 5′-monophosphate decarboxylase (ODCase) is one of the most proficient enzymes known and catalyzes the decarboxylation of orotidine 5′-monophosphate without any cofactor or metal ions. The solution reaction has been modeled with a continuum solvent model  which supported a carbene intermediate state. Semiempirical simulations have supported the direct decarboxylation mechanism [107, 108]. However the reaction barriers computed in later DFT simulations, including a combined CPMD and fast growth simulation, were still higher than the experimental measurement [109, 110, 97].
The enzyme 4-oxalocrotonate tautomerase (4OT) catalyzes the conversion between 2-oxo-4-hexenedioate and 2-oxo-3-hexenedioate. Cisneros et al. have carried out extensive QM/MM-FE simulations on the reaction mechanism and identified the absence of a catalytic acid and the role of an ordered water and the protein backbone in the catalysis. [16, 111, 112, 113] Thiel and co-works have also studied the mechanism of the enzyme and its mutant .
One enzyme that has undergone extensive characterization is triosephosphate isomerase (TIM), which catalyzes the reversible isomerization of dihydroxyacetone phosphate (DHAP) to D-glyceraldehyde-3-phosphate. The mechanism has been examined by semiempirical QM/MM methods [115, 116]. Later, several DFT-based QM/MM studies were been carried out [14, 117, 17, 118]. With a higher resolution structure of the enzyme, Friesner’s group modeled the reaction process and obtained results in good agreement with experimental data . The correlation between the reaction and the loop motion has been explored in an interesting way in which the QM/MM simulation is employed in combination with a loop structure prediction algorithm.
The reaction of catechol O-methyltransferase has been simulated by Kollman and co-workers with ab initio QM/MM and QM-FE methods [120, 121, 122]. Rod and co-workers also simulated the reaction mechanism with a recently developed quantum mechanical thermodynamic cycle perturbation method .
Enolase has been simulated by both semiempirical and ab initio QM/MM methods [123, 15]. Chorismate mutase catalyzes the Claisen rearrangement from chorismate to prephenate. In addition to many semiempirical QM/MM simulations, recently it was modeled by an ab initio QM/MM method with optimization of reaction path [124, 125, 126]. For its medicinal significance, β-lactamase has been explored by many groups. Different reaction steps have been examined in detail [127, 128, 129]. Acetylcholinesterase has been simulated by QM/MM-FE with an emphasis on the effects of conformational dynamics on the reaction process [130, 131]. cAMP-dependent protein kinase [132, 133], histone lysine methyltransferase SET7/9 [134, 135, 136], and bacterial peptide deformylase [137, 138] have been studied by the QM/MM-FE method. The reaction mechanism of methane monooxygenase has been explored in detail by Friesner and co-workers[139, 140], as well as the P450cam pathway .
An important issue in modeling enzyme catalysis is the role of conformational dynamics. Special interest has been raised recently for the contribution of conformational dynamics to enzyme catalysis . For the study on this topic, a great deal of attention must be focused on two considerations: the timescale of the conformational dynamics and the coupling scheme between the enzyme dynamics and the reaction process. As agreement has been reached that the transmission coefficient in the transition state rate equation is affected by the stochastic dynamics of the enzyme but its variation has a small effect on the reaction rate in general , the challenge is to understand if/how the collective conformational dynamics of the enzyme has an impact on the reaction rate .
In the current popular form of the QM/MM approach, MM force fields are not polarizable. As efforts are underway to develop polarizable MM force fields, schemes have been proposed to combine QM with polarizable MM force fields [146, 147, 148, 149, 150], but more testings might be needed before a specific scheme is employed. In two scenarios this issue will become more complicated. One is the mutual polarization at the QM/MM boundary; the other is the charge transfer effect between the QM and MM atoms. Generally, the implementation and testing of polarizable MM force fields would be one immediate important topic in the QM/MM methodology development.
With the recent development of ab initio QM/MM free energy simulation methods, realistic and accurate simulations of reactions in solution and in enzymes become feasible. Application of such methods will provide comprehensive understanding of reactions in solution and in enzymes.
We are grateful to the National Institute of Health and the National Science Foundation for support.