|Home | About | Journals | Submit | Contact Us | Français|
Evaluating the free energy landscape of proteins and the corresponding functional aspects presents a major challenge for computer simulation approaches. This challenge is due to the complexity of the landscape and the enormous computer time needed for converging simulations. The use of simplified coarse grained (CG) folding models offers an effective way of sampling the landscape but such a treatment, however, may not give the correct description of the effect of the actual protein residues. A general way around this problem that has been put forward in our early work (Fan et al, Theor Chem Acc (1999) 103:77-80) uses the CG model as a reference potential for free energy calculations of different properties of the explicit model. This method is refined and extended here, focusing on improving the electrostatic treatment and on demonstrating key applications. This application includes: evaluation of changes of folding energy upon mutations, calculations of transition states binding free energies (which are crucial for rational enzyme design), evaluation of catalytic landscape and simulation of the time dependent responses to pH changes. Furthermore, the general potential of our approach in overcoming major challenges in studies of structure function correlation in proteins is discussed.
The elucidation of the landscape of proteins is an important element in attempts to explore the energetics and kinetics of the folding process. Furthermore, the ability to explore and sample large regions in the accessible conformational space can help in improving the description of functional properties and in exploring possible relationships between landscape and functions. Unfortunately, detailed sampling of protein landscapes requires enormous computational resources that are not generally available. Thus it is important to develop multiscale approaches that will allow one to effectively generate the free energy surface for folding and related processes. Simplified protein models have been in use since our early work in 19751 and are being used extensively in folding studies2-8. Furthermore, recent efforts have extended our early 1975 idea into membranes (see MARTINI force field ref9). However such coarse grained (CG) models are, by their nature, approximated and it is important to be able to move from the coarse-grained description to more realistic explicit models. A general way to resolve this problem has been introduced by us some time ago10, where we proposed the use of a simplified model as a reference potential for explicit calculations of folding free energies. This idea, which can be now classified under the umbrella name of multiscale modeling, became even more important in recent years in view of the interest in the relationship between landscape and function7,8,11-13.
The idea of using reference potential in multiscale modeling has been exploited by us in a wide range of problems including accelerating QM/MM calculations (ref14-16) and path integral calculations of nuclear quantum mechanical effects (ref17,18). The idea of using CG model as reference potential in folding studies10 and related processes has been also explored recently by other workers (e.g. refs8,19,20). Here we describe the details of the current version of our multiscale landscape modeling and provide instructive illustrations of its use in calculating the change in protein stability upon mutation. We also outline other promising applications of this powerful model.
The simplified protein model used in this work is an extension of the model originally developed by Levitt and Warshel1 and modified in ref.10 In the current model we have made additional refinements of the original model, focusing on the introduction of the specialized solvation terms, designed to replicate the effects of the missing solvent molecules and on other factors that are not included explicitly.
Our simplified protein model, which is depicted schematically in Fig. 1, is created by replacing the side chain of each residue by an effective “atom” named (X) and an additional dummy atom named (D). The atom X is usually placed at the geometrical center of the heavy atoms of the corresponding side chains (with a residue dependent charge and van der Waals equilibrium radius), where, the distance between the newly placed X atom and the Cα atom, rCa−X, depends upon the type of the side chain of the residue it replaces. The dummy atoms are placed along the corresponding Cα - Cβ vectors and serve as tools for rotational transformations in the process of moving between the simplified and explicit models. That is, the Cα − D bond is used to define the Cα - Cβ bond of the explicit model. The actual rotational transformations are done by using simple rotation around the specific bond. The Dummy atoms do not have any charge or van der Waals interaction with the rest of the system. The backbone atoms of each residue are treated explicitly, and the interactions between main chain atoms are identical to those used in the explicit model but then modified to reflect the missing solvent terms. In the case of ionized residues, we shift the rCa−X of atom X to the charged center of the given residue.
The potential energy surface of the simplified (sp) model is written as
Where m and s designate, respectively, “main” and “side”, U0 is the gas phase potential of the indicated term, is the missing effect of the solvent and the protein. The more detailed expression on the right hand side involves the effective steric potential (Uef), the screened interaction between different charges (Q and q designate full charges of ionized groups and atomic residual charges, respectively), the ΔU terms represent the corrections to the indicated potential terms due to the solvent and is the self energy of the ionized groups. The nature of the different terms is described below.
The interaction between the simplified side chains is given by,
where is described by an “8-6” potential of the form,
where and . The parameters and define, respectively, the well depth and equilibrium distance. The unitless parameter Cscale has a value of 10 when is calculated for polar - non polar and ionized – non polar residues only. For side chain interactions between polar – polar and non polar – non polar residues, Cscale has a value of 1. These parameters were refined by minimizing the root-mean-square deviations between the calculated and observed values of both the atomic positions and the protein size (i.e., the radii of gyration) for a series of proteins. The corresponding refined parameters are given in Table I, together with the corresponding rCa−X parameters. The term represents the charge-charge interaction in the gas phase. This term is given by,
where Qi designates the charge of the ith ionized residue, εeff is the effective dielectric for charge-charge interactions. Here we reflect the idea established in many of our works (e.g. refs21,22) and used a large effective dielectric, εeff (εeff = 40) even for protein interiors. This type of dielectric has been found recently to provide very powerful insight in studies of protein stability (see refs21,23) and thus expected to be very useful in modeling the electrostatic contribution to the stability of the simplified model.
The effective electrostatic interactions between the side chains and the main chain and between the main chain groups and themselves are given by
Here Qi and qk designate the charges of ionized residues and residual atomic charges on the main chain respectively (there are no residual charges on the simplified unionized side chains). The different dielectrics represent the compensation of the gas phase electrostatic interactions by the solvent and the protein. The dielectric for the Q and q terms is taken (following our previous studies22) as εeff = 10 for charge (Q)-residual (q) charge interaction and εeff = 4 for residual charge(q)-residual charge(q) interaction. The potential, U, is given in kcal/mole, where Q and q are given in atomic charges and r in Å. The factor 332 in Eq. (5) is used for these units (see Warshel and Russell early work24 for further details). Note that is the sum of the gas phase qq part of Umm plus the screening by the surrounding solvent and the protein. This treatment also includes the residual charge interactions that are involved in hydrogen bonding.
The self energy term represents the interaction of the side chain with its environment and accounts for the crucial free energy of transferring the side chain to the protein when all other groups are not ionized (note that the effect of interaction with other ionized groups is accounted separately by ). The overall term is given by,
where i runs over all ionized residues and and are the number of nonpolar and polar residues in the neighborhood of the ith residue. The functions Unp and Upolar are given by
The number of nonpolar residues neighboring the ith ionized residue is expressed by the analytical function
where rnp is the nonpolar radius defining the cutoff range of nonpolar neighboring residues (typically rnp = 7Å), and rij is the separation between the ionized residue i and nonpolar residue j. The same expression is used for neighboring polar residues (Npolar)., where rp = 7Å. This treatment is aimed at capturing the fact that an ionized group has to pay very large amount of energy for moving from water to a nonpolar environment24,25 and is usually surrounded by polar residues or water molecules22,24.
It should be noted that the calculations of Np, and Nnp are done while treating the ionizable residues as non polar residues. This is done since the effect of charging these groups is taken into account separately by the term.
The parameters in Eqs. (6-10) were determined (see section 3.1.3) by comparing the results obtained with these equations to the results of the actual solvation energy evaluated by the semimacroscopic version of the Protein Dipole Langevin Dipole (PDLD/S) in its linear response approximation (LRA) version (PDLD/S-LRA).
In principle it would be useful to include in the hydrophobic contribution of non polar residues and the effect of the solvent on the self energy of polar residues. However, these contributions are included implicitly in and thus we only consider explicitly the of ionized residues. This is done by using different in Eq. (3) for hydrophobic and polar residues.
The secondary structure of proteins depends strongly on the solvation of the main chains. Thus we added the correction potential that is used to modify the gas phase potential. This solvation potential is given by,
The values of and are chosen to represent the minima of the α-helix and β-sheet regions of the Ramachandran plot, while Ai and have been selected to tune the simple model α-helix and β-sheet regions to match those of the explicit model (see section 3.1.1). The specific values of these parameters are listed in Table II.
Solvation effects play a crucial role in determining hydrogen bonding. A part of this effect is taken into account by the of Eq. (5). However this uniform screening does not yield the correct physics of hydrogen bonding as established by our simulated studies where the RMS between the calculated and observed structure drift upward over time. Furthermore, even the introduction of have not resolved this problem.
In order to obtain better hydrogen bonding representation we introduced a potential , where a small Gaussian barrier was added to the potential well used to describe hydrogen bonds. At room temperature, this barrier is easily crossed during hydrogen bond formation while preventing the already formed bonds from breaking. This treatment gives the functional form
The inclusion of both and greatly increases the long term stability of the simple model, with a maximum RMS deviation of less than 3.0 Å over the course of a 100 ps trajectory.
The greatest difficulty to multiscale simulations lies in accurately and efficiently recreating the original all atom system from the simplified model system. Our strategy is based on the approach introduced by Fan et. al.10 In this approach we considered the partition functions Qsp and Qep of the simplified and explicit models, respectively, as well as the ration between them, Qsp / Qep. This gives the free energy of transfer between the surfaces by
where β = 1/kBT, kB being the Boltzmann constant and T being the absolute Temperature. In order to evaluate Eq. (14), we start by expressing the simplified and explicit potentials as
where are the coordinates of the simplified model and are the coordinates of the explicit side chain atoms, relative to the centers of the corresponding side chains in the simplified model. Now our task is to evaluate the free energy of moving from the simplified to the explicit model, ΔGsp→ep, or −ΔGep→sp. This can be expressed as the ratio between the corresponding partition function
Thus we have
where the average is done formally over the combined + coordinate set. Eq. (18) can be evaluated by using a free energy perturbation approach with a mapping potential:
so that we have,
In many applications we are not interested in the total partition function and the total free energy, but in the potential of mean force (PMF) of the simplified and explicit models. In such cases we define the PMF by considering a partial partition function
where X is the parameter that defines the PMF. With Δgsp(X) and Δgep(X) defined in the thermodynamic cycle of Fig. 2. The key for the transition between the simplified and explicit PMFs is the ratio
Here we follow the same strategy as in Eq. (18) and obtain,
If we use the mapping potential of Eq. (19) we will have,
In evaluating the free energy terms of Eqs. (18) or (27), we start typically from the simplified model in a given configuration and construct the explicit model by replacing the dummy atom with the explicit Cβ. Then, using a rotamer library26 (with random search followed by brief steepest decent torsional minimization) we select the side chain configurations that minimize the potential energy of the system, while also minimizing the displacement between the center of the explicit side chain and the position of the simplified atom. After this treatment that considers all the rotational degrees of freedom within the side chains, we apply a steepest descent minimization over the Cartesian degrees of freedom within each side chain to obtain the relaxed side chains structure. Once the explicit structure has been minimized we can use Molecular Dynamics (MD) or Monte Carlo (MC) for the evaluation of the free energy of moving from the simplified to the explicit model.
A crucial element of the introduction of any analytic potential surface is the parameterization and validation process. This section considers aspects of the refinements that were not considered in the previous sections.
The optimization of the parameters that determine the solvation contribution from and involved the use of the alanine dipeptide as a model system. The explicit free energy surface for this system was generated both for a gas phase model and for a solution model (using the surface constrained all atom solvent (SCAAS) model27,28). In order to recreate the hydrated landscape by the simplified model, it was necessary to add to Uphi−psi four potential wells at the minima of the α- helix ( ~ -90°, ψ ~ -30°), β-sheet ( ~ -160°, ψ ~ 160°), γ′-turn ( ~ -90°, ψ ~ 60°), and L-α-helix ( ~ 60°, ψ ~ 40°) regions of the Ramachandran (RC) diagram. The relevant parameters are given in Table II.
The addition of the term (and in some respects the ) to the simplified model created a surface with minima at the typical regions obtained in other works (e.g. Lovell et al.29). Our RC diagram is presented in Fig. 3 for both the explicit and simplified models. Apparently the simplified surface is not perfect, but allows us to capture the main features of the explicit model. The notations for the different regions are taken from the ref29.
The term is strongly coupled to and it is important to examine whether in addition to reproducing reasonable phi-psi map, we can retain a reasonable description of stable helices and other structures that are determined by hydrogen bonding. With the we find that a small barrier to hydrogen bond formation (~1 kcal mol-1) is easily crossed at room temperature, but forms a significant barrier to the breaking of hydrogen bonds once they have been formed. By including ΔUHB in Eq. (1), we find that even full scale protein systems like ubiquitin or chorismate mutase21 are structurally stable (RMS displacement < 3 Å) for more than 100 ps of simulation.
The introduction of is perhaps the most important new element in our approach. This potential clearly improves the physical basis of our model but the reliability of the model still depends crucially on the parametization of the self energy potential. The refinement procedure was done by selecting different ionizable groups in several protein sites, evaluating their self energy using the PDLD/S-LRA approach28 and adjusting the parameters in to reproduce the best agreement between the PDLD/S results and those of the simplified model.
In the first step of this refinement procedure we started by considering in a fully hydrophobic, non polar, protein environment. For this purpose we created the relevant environment by using in-silico methods. That is, we mutated, in-silico, residues 21-56 (out of the total of 62 residues) of the protein SSO7d (PDB ID 1SSO) to all-hydrophobic residues, except Ile29 which we mutated into Glu. This created an all-hydrophobic environment surrounding the ionizable residue Glu29 (see Fig. 4), where Glu29 is buried, by residues 22 to 33 and 43 to 56. This model which is referred to as “the fully buried Glu 29” was then modified creating two additional environments for Glu29. In the second, which is referred to as a “semi-buried” environment, we deleted residues 49 to 56, making Glu29 more exposable to solvent. Finally for the third, which is refereed to here as “exposed” environment, we mutated Val22, Met24 and Phe47 into Glycine, making Glu29 fully exposed to solvent.
Next we calculated the solvation free energy of the ionized Glu29, in all 3 hydrophobic environments described above. We used the PDLD/S-LRA as well as simplified model's Eqs. (6 – 10), and adjusted the parameters in these equations to get the best agreement between the two values. The results for simplified and PDLD/S-LRA salvation energy for the artificial protein environments are given in the first three entries of Table III.
After refining the simplified model to obtain the best description of the effect of nonpolar residues on the self-energy, we turned to the refinement of the parameters that reflect the polar contributions. This was done by the same procedure described above but now for a number of real proteins sites (where we have significant number of polar residues). The protein sites were taken from three diverge proteins: Apoflavodoxin (PDB ID 1FTG) residue site Glu106, Tm DHFR (PDB ID 1CZ3) residue sites Arg28 and Asp117, and Interleukin (PDB ID 1IOB) residue sites Glu111 and Glu113. Following the same steps as with the artificial hydrophobic environments, we calculated by using both the PDLD/S-LRA and the simplified model's approach and then refined the parameters in Eqs. (6-10). The self energies obtained by both approaches are given in Table III. The final set of refined parameters for are given in Table IV. As seen from the tables, the agreement between simplified model , and PDLD/S-LRA is quite encouraging.
In considering the transition from simplified CG models to explicit models, we start by noting that Eq. (14) is not conceptually tied to a given type of reaction or process. Rather, it represents a general means of converting between two model representations of the same system. The thermodynamic cycle of this generic transformation is given in Fig. 2. Previously10, we have shown that this process can be exploited in calculating the PMF for unfolding processes, using the radius of gyration as a means of denaturing the system from the folded to the unfolded state. We have also shown that it is possible to use simplified model structures to obtain explicit structures of both folded and unfolded proteins12. In this work we explore other applications of our general strategy. This is done in the following subsection.
One possible application of our model is the evaluation of the effect of mutation on protein stability. Although this can be done by evaluating the folding PMF for both systems, it is much simpler to do so by using the thermodynamic cycle of Fig. 5. More specifically, ultimately we are interested in evaluating:
where and are the free energies of the mutant and native forms of the protein, respectively, evaluated by the explicit model. can be evaluated using the thermodynamic cycle of Fig. 5, yielding
where is the folding free energy of the mutant in the simplified model, while , are the changes in free energy of the transformation from the simplified model to the explicit model for the folded and unfolded mutant, respectively. The folding energy of the native system is produced through the same process as the mutant. However, rather than calculating the folding energy directly, we can use the thermodynamic cycle inside the grey box in Fig. 5 and obtain
It is therefore possible to calculate the change in folding free energy in the simple model by comparing the free energy of mutating the protein in the folded and unfolded systems. Ultimately, combining Eqs. (29 – 31), and the thermodynamic cycle of Fig. 5 gives,
While Eq. (32) appears more complicated than evaluating Eq. (29) directly, it has definite advantages. First, calculating ΔΔGuf−f via Eq. (32) requires only that we calculate the free energy change in the mutation process only in the simple model (this is done in the folded and unfolded states). With the exception of mutations to or from glycine, mutating residues within the simplified model is extremely straightforward. All that is required is the change of the parameters associated with the X atom for the residue being mutated. Furthermore, it is not necessary to force the folded system to unfold. Additionally, the unfolded state can be modeled using only the neighboring residues to the residue to be mutated, greatly simplifying the calculation of and reducing the computational cost involved. These computational savings can be reinvested by increasing the number and length of frames used to calculate and via the free energy perturbation method, which greatly improves the accuracy of the calculation.
As a test case for the performance of Eq. (32), we have chosen to examine the pseudo wild-type ubiquitin and the Asp21Asn mutant discussed in our previous work21 (see also ref30). The three dimensional representation of this system is shown in Fig. 6. We have chosen this system because it has been well studied by other means21, and at 76 residues is large enough and has a compact enough secondary structure to represent an actual system of interest. Our FEP calculations were performed by both MD and Monte Carlo simulations. The calculations started with 10,000 MD relaxation steps at 300 K, using 1 fs for each step. The main chain and residues far from the mutated residues were fixed during this relaxation runs. The resulting explicit structure was then used to generate the simplified structure. Four such simplified structures were generated. Starting from the given simplified structure we simulated the motions of the explicit model by either MD or MC simulations at 300 K with the mapping potential of Eq. (19) and evaluated the free energy of moving from the simplified to the explicit model by a FEP approach. The key aspect of our procedure is demonstrated in Fig. 7, where we describe the fluctuations of the gap between the explicit and simplified potential for trajectories on the given mapping potential of the native protein (in this case the sixth window with λ = 0.5). The same procedure was repeated for 11 frames and the overall free energy of moving from the simplified to the explicit potential was calculated ( ). The same procedure was used to evaluate for the Asp 21Asn mutant. The calculations for the unfolded protein ( and ) are shown in Table V.
Another important use of our approach is in the field of enzyme design where it can be used to evaluate the binding free energy of rate determining transition states. This can be done by focusing on the electrostatic free energy contribution ΔGbind′, while using the cycle of Fig. 8, which leads to the following expression:
Where and are the changes in the free energy of transforming the simplified model to the explicit model, for the fully charged TS in the native and mutant, respectively. is the difference in free energy between native and mutant simplified structures for the fully charged TS. and are the changes in the free energy of transforming the simplified model to the explicit model for the zero charged TS, in the native and mutant, respectively. is the difference in free energy between native and mutant simplified structures for the zero charged TS.
The calculations were performed on monomeric chorismate mutase (mMjCM) (whose active site is depicted in Fig. 9). The coordinates mMjCM (NMR) structure were taken from PDB access code 2GTV. The native protein preceded to 100ps MD simulations in order to obtain a relaxed structure. The resulting native explicit structure was then used to generate the simplified structure. Both the simplified and explicit models included an explicit empirical valence bond (EVB) description of the reacting system (see ref31 for details). Three such simplified structures were generated and used to simulate the fluctuations of the explicit model by MC moves of the torsional coordinates. The simulations were done at 300 K with the mapping potential of Eq. (19) and evaluated the free energy of moving from the simplified to the explicit model with full EVB transition state charges and zero EVB transition state charges. We have used a simulation length of 1 ps for each of the 11 mapping steps and the overall free energy of moving from the simplified to the explicit potential was calculated for the charged ( ) and zero charged transition state. We choose as test cases the mutations of Gln88 and Arg51 since both (particularly Gln88) were found to present a significant challenge in our recent exploration of the potential of the EVB in enzyme design31. Here we mutated Gln88 to Asn88 and repeated the calculation described above. We also mutated Arg51 to Gln51 and repeated the calculations. The results are shown in Table VI. Finally, we calculated by taking the corresponding change in average energy (see results in Table VI).
The use of the CG model as a reference potential can facilitate studies of other long-standing problems and here we list several new key applications.
The proposal that slow conformational motions on the millisecond time scale play a major role in enzyme catalysis has been featured in many recent high profile works (see discussion in ref32). Although there is no direct experimental support, and there are very clear conceptual and logical reasons why this idea is invalid (see ref32 and reference in that work), it is crucial to explore the dynamical proposal by simulation studies that can explore the relevant milliseconds time range. Furthermore, there are other fundamental functional problems whose analysis can be greatly helped from the ability to explore slow protein motions.
Recently we introduced a renormalization method were we move from the full model to the simplified model and then to an even simpler 2-D model in representing the landscape and dynamics in the space defined by the conformational and chemical coordinate in enzymes. In this treatment we force the CG model to have the same dynamical properties as the full model by a strategy similar to the one used in our modeling of ion channels33. The main element of our approach is the refinement of the friction in the CG model so that the response of this model to the application of a strong constraint will be similar to that of the full model. The use of this model and in particular its transfer to the 2-D model allows us to explore the time dependence of processes that occur on a long time scale (e.g. ref34 and ref32). The study of ref provided the first direct proof that the dynamical proposal is invalid and further progress in the use of the renormalization model is clearly expected.
There is a significant current interest in MD simulations that take into account changes in ionizations states during the simulated process (ref35 and ref36,37). However, the current model does not consider the time dependence of the proton transport (PTR) process and thus cannot reproduce the proper time dependence of the response of a protein to changes in pH. To advance on this challenging field we combined our approach of time dependent Monte Carlo (MC) of PTR processes (ref38) and the simplified protein model described here, in studies of pH dependent MD
Our model represents the energetics of the system by a simplified version of the EVB using the modified Marcus equation (see ref39) for the energetic of any possible PTR step, where the free energies for the different protonation states is obtained by the electrostatic energy of the CG model. The MC moves are based on the electrostatic energies of the CG model and then scaled by the characteristic PT time to correspond to the rate constant predicted by transition state theory. More specifically (ref40 and ref41), the free energy of each protonation state is expressed by
where m designates the vector of the charge states of the given configuration (i.e. m = q1(m), q2(m),… qn(m)). Here qk(m) is the actual charge of the kth group at the mth configuration. The Wkl qk(m) ql(m) term represents the charge-charge interactions.
The barrier for the PT moves is given by39
where is the free energy of the reaction, and Hij is the EVB off-diagonal term that mixes the two relevant states whose average values at the transition state x≠ and at the reactant state , are designated by the corresponding . The first term is the expression of the regular Marcus equation (see ref39), which corresponds to the intersection of Δg1 and Δg2 at x≠. The second and third terms represent, respectively, the effect of H12 at x≠ and .
Using Eq. (35) with simplified modifications38, we consider random jumps of the proton to any possible site, but accept only jumps to sites i+1 and i-1 (or in the more general case to sites that are less than 4.0 from site i (usually we use 3.5 Å cutoff but here we use a simplified model for the protein side chains). Furthermore, these jumps are accepted only if they satisfy the standard Metropolis criterion with regard to the free energy change (ΔGn+1 − ΔGn) or the closely related activation barrier , which is defined by the individual PT steps in this transition. The MC procedure is converted to a time-dependent simulation by exploiting the isomorphism between the probability obtained from the MC procedure and the probability factor of the transition state theory (TST). Since the probability of the MC jump satisfies the Boltzmann probability we can write:
Where τi→j is the real time required for a move from site i to site j and Ni→j is the number of MC steps that were required for this move. The factor S for PT between internal groups is given by38:
Where 0.165 is the average time, in picoseconds, it takes for a productive trajectory to reach the TS at room temperature (see supplementary information of ref38). The factor δ represents the solute and solvent reorganization barrier for a transfer between states of similar energy, i.e. . The MC approach guarantees that the rate of the PT steps will follow the relevant Boltzmann probability in TST (see ref38).
The PT to and from the bulk is determined by the same considerations but now using the free energy of Eq. (34) for the given pH. The MC move for ΔGN→N+1 is scaled by the time of H3O+ diffusion by the same considerations as in ref41 but now for pH = -1 (which corresponds to the pKa of H3O+). This allows us to express the time of transferring proton from the bulk to a water molecule on the surface of the protein and then to a given ionizable group, as:
Since some ionized groups are not in direct contact with the bulk solvent, we scale the MC move for PT to and from the bulk by a factor, F(R), that reflects the distance, Ri,bulk, between the given residue and the closest bulk water molecule.
Where a is a parameter of positive value. In this preliminary work, a is set to a = 10 Å. The distances R, are determined by using a 4 Å grid and excluding grid points that are within 4 Å from the protein (this large exclusion radius is due to the fact that we use a simplified protein model). The function F(R) was chosen to represent the chance of a bulk molecule to approach the protein residues. The grid is then updated once in every 1000 MD steps. Since the exponential factor of F(R) is included in the criteria of the MC move that determines the acceptance of the PT moves we can write:
Thus we can write for the PT from and to the bulk
The scaling by 0.16 allows us to use the same scaling for the MC for the PT from bulk and the MC for the PT between the protein groups and between the protein groups and the bulk.
The next issue is the coupling between the PT steps and the protein fluctuations. One option is to run Langevin dynamics (LD) of the CG model. This option requires a significant computational time, since the typical LD steps are about 10fs, where the MC steps correspond to 200fs. Another option is to also simulate the protein motion by an MC treatment, and then estimate the characteristic time for the MC moves. Finally, considering the fact that protein's structural changes are of interest mainly when they appear after the changes in its ionization state, it is possible to simply turn-on the protein relaxation, only after reaching new stable ionization states, then allow the protein to relax (with fixed ionization state) and finally turn on again the MC for the PT processes. While such approach may not be fully consistent, it will converge to the same final state as the more consistent and much more expensive fully coupled LD - MC approach. All the above options will be considered in the future. Regardless of the method used for simulating the protein fluctuations, we expect the overall method to be very instrumental in modeling pH induced conformational changes.
The preliminary implementation of the model demonstrated that calculated ionization state at the equilibrated system appeared to be quite similar to those obtained with the PDLD/S-LRA treatment. We also performed preliminary simulations of the time dependent PTR in protein SSO7d (PDB ID: 1SSO) following pH change from 0 to 7. The residue sites used for these simulations are: Lys4, Lys6, Lys8, Glu10, Glu11, Lys12, Asp15, Lys18, Lys20, Lys21, Lys24, Lys27, Asp34 and Glu35. The corresponding results are given in Fig. 10. The approach is promising but more validation is needed.
This work presented the current version of our multiscale approach that uses a simplified folding model as a reference potential for all-atom simulations. The simplified model used here includes some innovations in terms of the electrostatic treatment and in particular in terms of the representation of the self-energy. However, the main point in our treatment is that the simplified potential does not have to be perfect since it is only used as a reference for the explicit potential. Of course, the closer are the simplified and the explicit potential, the faster will be the convergence of the calculations of the free energy for the transfer between these potentials.
The power of our treatment was illustrated in calculations of mutational effects, in the calculations of transition state binding energies and in calculations of time dependent response to pH changes. Another application that was demonstrated in our previous work involves the evaluation of the potential of mean force (PMF). We also outlined other applications of our approach, including its implementation in studies of long time scale dynamical effects and the proposal that dynamical effects contribute to catalysis32. More systematic options of using the reference potential for exploring the dynamics of the full model has not been fully formalized but developments in this crucial direction are clearly expected.
The exact time saving of the CG model is an issue that requires careful analysis, which is left to subsequent studies. That is, some of the key questions like obtaining a PMF for protein unfolding are hard to explore with explicit models since performing the corresponding calculations, where the proper sampling is extremely challenging. Similarly the effect of proper sampling calculations of binding energies is not yet fully clear. Thus we are not trying to give here time estimates, and hope to address the issue in subsequent works. However we can give here a rather trivial example, noting that a structural relaxation of the protein 1SSO took 100ps took 7 hours on a 64bit Dual Intel Xeon node, with 20Gb memory, while the equivalent relaxation of the simplified model on the same node took less than 2 minutes.
Overall it is clear that multiscale modeling of proteins has advanced significantly from its early days in 1975 and that a more rigorous treatments should focus on minimizing the difference between the average of the difference between the simplified and explicit potential <(Usp-Uexp)>. In fact, minimizing this functional with respect to the parameters of the simplified model is probably the most promising ways of refining the CG model.
This work was supported by NIH grants GM 24492 and GM40283. M.R. thanks the Generalitat Valenciana from Spain for the postdoctoral fellowship. We gratefully acknowledge the University of Southern California's High Performance Computing and Communications Center for computer time.