|Home | About | Journals | Submit | Contact Us | Français|
The loop 287–290 (Ile, Phe, Arg, and Phe) of the protein AcetylCholineEsterase (AChE) changes its structure upon interaction of AChE with diisopropylphosphorofluoridate (DFP). Reversible dissociation measurements suggest that the free energy (F) penalty for the loop displacement is ΔF=Ffree − Fbound ~ −4 kcal/mol. Therefore, this loop has been the target of two studies by Olson’s group for testing the efficiency of procedures for calculating F. In this paper we test for the first time the performance of our “hypothetical scanning molecular dynamics” (HSMD) method and the validity of the related modeling for a loop with bulky sidechains in explicit water. Thus, we consider only atoms of the protein that are the closest to the loop (they constitute the “template”) where the rest of the atoms are ignored. The template’s atoms are fixed in the x-ray coordinates of the free protein, and the loop is capped with a sphere of TIP3P water molecules; also, the x-ray structure of the bound loop is attached to the free template. We carry out two separate MD simulations starting from the free and bound x-ray structures, where only the atoms of the loop and waters are allowed to move while the template-water and template-loop (AMBER) interactions are considered. The absolute Ffree and Fbound (of the loop and water) are calculated from the corresponding trajectories. A main objective of this paper is to assess the reliability of this model and for that several template sizes are studied capped with 80–220 water molecules. We find that consistent results for the free energy (which also agree with the experimental data above) require a template larger than a minimal size, and a number of water molecules, which lead approximately to the experimental density of bulk water. For example, we obtain ΔFtotal = ΔFwater +ΔFloop = −3.1 ± 2.5 and −3.6 ± 4 kcal/mol for a template consisting of 944 atoms and a sphere containing 160 and 180 waters, respectively. Our calculations demonstrate the important contribution of water to the total free energy. Namely, for water densities close to the experimental value ΔFwater is always negative leading thereby to negative ΔFtotal (while ΔFloop is always positive). Also, the contribution of the water entropy TΔSwater to ΔFtotal is significant. Various aspects related to the efficiency of HSMD are tested and improved, and plans for future studies are discussed.
Calculation of the entropy, S and (Helmholtz) free energy, F constitutes a central problem in computer simulation, in spite of the significant progress achieved in the last 50 years.1–10 This problem is in particular severe in structural biology due to the flexibility and strong long-range interactions characterizing bio-macromolecules such as proteins. Thus, the potential energy surface of a protein, E(x) is rugged (x is the 3N-dimensional vector of the Cartesian coordinates of the molecule’s N atoms), i.e., this surface is “decorated” by a tremendous number of localized wells and “wider” ones, defined over regions, Ωm (called microstates) – each consisting of many localized wells (an example for a microstate is the α-helical region of a peptide (see further discussion in our Refs. 11 and 12). A microstate Ωm, which typically constitutes only a tiny part of the entire conformational space Ω, can be represented by a sample (trajectory) generated by a local molecular dynamics (MD)13,14 simulation starting from a structure belonging to Ωm. MD studies have shown that a molecule will visit a localized well only for a very short time [several femtoseconds (fs)] while staying for a much longer time within a microstate,15,16 meaning that the microstates are of a greater physical significance than the localized wells. Typically one is interested in calculating the free energy of the most stable microstates, rather than calculating the total free energy (of Ω). Identifying these microstates, in particular that with the global free energy minimum, is the extremely daunting task of protein folding. Notice that the microstates discussed above are meta-stable states; however, non-stable microstates, such as a transition state, might also be of interest.
Differences ΔF (ΔS) are commonly calculated by thermodynamic integration (TI) over physical quantities such as the energy, temperature, and the specific heat, (“calorimetric TI”) as well as non-thermodynamic parameters [free energy perturbation (FEP) is also included in this category].1–10 This is a robust and highly versatile approach, which enables one calculating a small difference in the binding F of two ligands a to b in the active site of a large enzyme solvated by water. However, while the mutation process leading from a to b is well controlled by TI, conformational changes in the entire protein (i.e.,”jumps” of side chains among rotamers) occur constantly and therefore the results might converge only after extremely long simulation times. Also, it is sometimes difficult to control the size and shape of the active site after mutation and the correct position of b in it. In many cases one is interested in calculating ΔFmn between two microstates Ωm and Ωn (denoted for brevity m and n, respectively); however, if the structural variance between the microstates is significant the integration from m to n becomes difficult and for large molecules unfeasible. These drawbacks of TI can be overcome to a large extent by methods that provide the absolute Fm (Sm) from a given sample; thus, one is required to carry out (only) two separate local MD simulations of microstates m and n, calculating directly the absolute Fm and Fn hence their difference ΔFmn = Fm − Fn, where the complex TI process is avoided (however, this approach has its own limitations, since the fluctuation of S and E of an N-particle system grows as N1/2; on the other hand, the fluctuation of the exact free energy is zero as discussed in II.3 following eq 6. In practice, however, F is approximate and its fluctuation is finite but typically smaller than that of S and E. For a more extensive discussion on TI and other techniques for calculating differences, ΔF (ΔS), see Ref. 7.)
The absolute S and F can be calculated by harmonic17–19 and quasi-harmonic20–22,10 approximations and more general methods that are not limited to harmonic conditions, such as the local states (LS)23–25 and hypothetical scanning (HS)26–28 methods of Meirovitch, and other techniques.29,30 However, all these methods are not applicable as yet to diffusive systems such as explicit water. Notice that the absolute F can also be obtained with TI provided that a reference state R with known F is available and an efficient integration path R→m can be defined. A classic example is the calculation of F of liquid argon or water by integrating the free energy from an ideal gas reference state, where for such homogeneous systems TI constitutes a very efficient method. However, for non-homogeneous systems such integration might not be trivial, and in models of peptides and proteins defining adequate reference states and integration paths is a standing problem.7
In recent years we have developed a new method for calculating the absolute S and F from a single sample called the hypothetical scanning Monte Carlo (HSMC) (or HSMD, where MD is used).11,12,31–36 HSMC(D) is based on ideas of the LS and HS methods mentioned above. Namely, in all these methods each conformation i of a sample (generated by MC MD or any other technique) is reconstructed step-by-step (from nothing, see II.4) using transition probabilities (TPs). The product of these TPs leads to an approximation for the correct Boltzmann probability PiB from which various free energy functionals can be defined. The TPs of HSMC(D) are stochastic in nature calculated by MC or MD simulations, where all interactions are taken into account. In this respect HSMC(D) (unlike HS and LS) can be viewed as exact;31 the only approximation involved is due to insufficient MC(MD) sampling. HSMC(D) has unique features: it provides rigorous lower and upper bounds for F, which enable one to determine the accuracy from HSMC(D) results alone without the need to know the correct answer. Furthermore, F can be obtained from a very small sample and in principle even from any single conformation (e.g., see results for argon in Ref. 31).
HSMC(D) has been developed systematically as applied to liquid argon, TIP3P water,31,32 self-avoiding walks on a square lattice,33 and peptides,34–36 where for the first three models HSMC(D) results have been found to agree within error bars to TI results obtained by extensive MC or MD simulations. Also, for polyglycine molecules differences ΔFmn and ΔSmn for α-helix, extended, and hairpin microstates were calculated very reliably by HSMC.34–36
HSMD has also been applied to a mobile loop of the protein α-amylase,11–12 where the system is modeled by the AMBER96 force field37 alone and by AMBER96 with the GB/SA implicit solvent of Still and coworkers.38 In this work only protein atoms close to the loop (the template) were considered; however, they were kept fixed in their x-ray crystallographic positions, while only the loop’s atoms were moved by MD. A further development step of HSMD has been achieved recently, where the implicit solvent was replaced by explicit solvent, i.e., the α-amylase loop was capped with 70 TIP3P39 water molecules, which (together with the loop’s atoms) were moved in the MD simulation. Because the application of HSMD to water has not been optimized yet, the contribution of water to the free energy was calculated by a TI procedure incorporated within the framework of HSMD – this procedure is called HSMD-TI; in this TI process the water-loop interactions are gradually decreased to zero. Notice that the related statistical errors are relatively small because the loop structure is held fixed during the integration and only the water molecules are moved by MD. Previous studies of , the minimal number of water molecules required for obtaining stable structures of proteins40 and loops41 (based on a root mean square deviation criterion) suggested that in some cases might be small (even 5 or 10 for certain loops). However, for loops, no systematic study has been carried out to determine the smallest template size and the minimal number of water molecules that would lead to consistent free energy results, i.e. results that are unchanged for larger template/water systems.
This study is carried out here, where an additional goal is to optimize HSMD(TI) further. Thus, HSMD(TI) is applied to the four-residue loop 287–290 ( Ile, Phe, Arg, and Phe) of the protein acetylcholinesterase (AChE) from Torpedo California. AChE degrades the neurotransmitter acetylcholine (ACh), producing choline and an acetate group. It is mainly found at neuromuscular junctions and cholinergic synapses in the central nervous system, where its activity serves to terminate synaptic transmission. AChE has a very high catalytic activity - each molecule degrades about 5000 ACh molecules per second. Reduction in the activity of the cholinergic neurons is a well-known feature of Alzheimer's disease. Thus, AChE inhibitors are employed to reduce the rate at which ACh is broken down, thereby increasing the concentration of ACh in the brain and combating the loss of ACh caused by the death of cholinergic neurons. AChE is also the target of many nerve gases, particularly organophosphates inhibitors (e.g. sarin), which block the function of AChE and thus cause excessive ACh to accumulate in the synaptic clefts. The excess ACh causes neuromuscular paralysis leading to death.42
Of interest is the reaction of the inhibitor diisopropylphosphorofluoridate (DFP) with AChE43–48 which leads to a displacement of the loop’s backbone roughly by 4 Å. Moreover, comparison of experimental reversible dissociation constants measured for a variety of inhibitors of differing molecular size suggests that the free energy penalty for the loop displacement is on the order of 4 kcal/mol (i.e., Ffree − Fbound ~ −4 kcal/mol).43,45,49
The fact that the crystal structures of the free50 and bound43 enzyme are available, makes this loop a convenient target for testing free energy procedures, and indeed such tests were already carried out by Olson and collaborators.51,52 In the present study we shall also extend the scope of HSMD for an internal loop (the AchE loop is “hidden” in a cleft) with large sidechains; the earlier investigation of amylase has been applied to an external loop consisting of small sidechains.
Before describing the system and our procedures in detail, it should be stressed that our study relies heavily on the notion of a microstate introduced earlier. However, determining the exact limits of a microstate in conformational space is practically impossible and therefore it is commonly defined by an MC or MD sample initiated from a microstate’s structure. Thus, the microstate’s size typically increases with simulation time, t and estimation of E, S, and F depend on t as well. Therefore, estimation of ΔEmn, ΔSmn, and ΔFmn between two microstates should be conducted with care; for details, see Refs 11 and 12.
As was pointed out above we study the 4-residue loop, Ile, Phe, Arg, Phe, (287–290) of AChE in two microstates related to the free and bound loop structures. The starting point is the available crystal structures of AChE from the protein data bank (PDB), where the unbound (free) structure 2ace50 is based on residues 4 to 535 and 2dfp, the structure of AChE with DFP consists of residues 2 to 535 (the bound structure).43 To be able to compare these structures 2dfp was trimmed so that both structures consist of the 531 residues, 4–535 and crystal water molecules were retained.
A close analysis shows that these conformations differ overall by (all heavy atoms) root mean square deviation (RMSD) of 1.22 Å, while the loop conformations differ by RMSD=1.38, 2.91, and 2.71 Å for the backbone, sidechains, and all loop heavy atoms, respectively, meaning that most of the deviation is due to sidechains movement. The RMSD of the templates, i.e. all atoms excluding the loop’s atoms, is 1.10 Å.
We used the program TINKER,53 and the AMBER force field,37 where Lys, Arg, and His are positively charged and Asp and Glu have a negative charge. Hydrogens were added to the free structure (including the crystal water) and the potential energy was minimized, with all heavy atoms restrained to their crystal positions by harmonic forces (with a force constant of 100 kcal/Å2). Afterwards the loop and (TIP3P) water atoms were allowed to relax in the presence of a fixed template. The minimization eliminates bad atomic overlaps and strains in the original structures, while keeping the atoms reasonably close to the PDB coordinates.
Because the free and bound protein structures are similar, we adopt here the same strategy used in our previous studies,11–12 i.e., the structure of the bound loop (with hydrogens added) is attached to the free template. It would be impossible to compare the free energies of the free and bound microstates keeping the DFP attached to the bound template, because the two systems will be different. Thus, we calculate the free energy required to move the loop from the free to the bound microstate in the presence of the free template. In principle, one could carry out a similar study based on the bound template; however, many coordinates of the bound structure, 2dfp appear with large B-factors above 40, while the free structure (2ace) is much better resolved (B-factor values lower than 30). Therefore, the calculations were carried out with the free template. Also, taking into account the whole protein (of 8284 atoms) would be computationally prohibitive; therefore (as in Refs 11 and 12), the template size is reduced to the Ntmpl atoms closest to the loop, where the rest of the atoms of the protein are ignored. More specifically (see Figure 1), the center of mass of the backbone atoms of the free loop is calculated as a (3D) reference point denoted xcmb and a distance (Rtmpl) is chosen. If the distance of any atom of a residue from xcmb is less than Rtmpl, the entire residue is included in the template; otherwise, the residue is eliminated. To assess the minimal template size needed we have studied Rtmpl =11, 12, and 13 Å corresponding to Ntmpl=800, 944, and 1100 protein atoms and 20, 30 and 40 crystal water molecules, respectively.
After defining the template, the water molecules and the orientations of the polar hydrogens of the free and bound loops (on the same template) are subjected to an optimization procedure. This procedure (where all the loop’s heavy atoms are kept fixed), is carried out in several rounds, each consisting of a 0.1 ns MD run (1 fs time step) at high temperature (1250 K) followed by energy minimization; the process is stopped after a fixed number of unsuccessful rounds (typically 100), namely rounds which do not reduce the energy further. Although the heavy atoms are fixed, considerable gain in potential energy is achieved. Next, keeping the template atoms fixed, the system energy is minimized where the loop and water molecules are allowed to move freely. The final “free” and “bound” structures constitute starting points for a further analysis.
While our considerations thus far are based on the crystal structures, we seek to simulate the loop in solution. Therefore, it is not clear whether the positions of the crystal waters are relevant for the solution environment. In particular, water molecules that are caged within the crystal structure are expected to stay there during the simulation, and thus can be considered as part of the template. Therefore, the number and arrangement of these waters should be globally optimized, practically by energy minimization, which, however, is a non-trivial problem (see below).
Our strategy is to add more water molecules to the already existing crystallographic waters. Thus, we define a sphere centered at xcmb with a radius, Rwater (Rwater=Rtmpl+1 Å) where waters are added at random to the hemisphere oriented towards the exterior of the template. To hold these waters around the loop they are restrained with a flat-welled half-harmonic potential (with a force constant of 10 kcal mol−1Å−2) based on their distance from xcmb. That is, if the distance of a water oxygen from xcmb is greater than Rwater a harmonic restoring force is applied, otherwise the restraining force is zero. In a previous paper41 the minimal number of waters, required to cap a loop has been studied with respect to the loop stability, i.e., the change in the loop’s RMSD from the initial (PDB) structure during 4 ns MD simulations. The objective of the present paper is similar but based on a free energy criterion; thus, we seek to find that for shows stability. To find we study 80 ≤ Nwater ≤ 220 for different template sizes.
The fully hydrated system is then treated by a sequence of MD/minimization procedure similar to that outlined above, applied only to the hydrogen atoms of the loop and all water molecules restrained within the full sphere of radius Rwater. Again, at the end of this optimization the energy is minimized where the loop’s atoms and the water molecules are free to move. The conformation of the free loop changed minimally from its crystal structure (RMSD=0.22, 0.63, and 0.57 Å for backbone, side chains and all loop atoms, respectively), while the “bound” conformation changed more (RMSD=1.16, 2.46, and 2.32 Å for backbone, side chains and all loop atoms, respectively; see discussion of the results in Table 1).
For each of the optimized “free” and “bound” structures (with several template sizes, Ntmpl and several levels of hydration defined by Nwater) an MD run at 300 K is performed, where only the loop and water atoms are moved, while the template atoms are kept fixed. Thus, 1000 loop/water configurations are collected by retaining a configuration every 0.5 ps along the 0.5 ns MD trajectory. An equilibration run of 0.5 ns is carried out prior to the production run. The total potential energy Etotal is the sum of partial energies related to the loop and water (the template-template energy is constant and thus is ignored),
where Eloop-loop is the intra loop energy, Eloop-tmpl is the energy due to loop-template interactions; these energies define the total loop energy Eloop, and the interactions related to water are defined in a similar way, where their total is defined as Ewater. The reconstruction of the loop structure is carried out in internal coordinates; therefore, the loop conformations simulated by MD are transferred from Cartesians to the dihedral angles (i, ψi, and ωi (i=1,N=4), the bond angles θi,l (i=1,N, l=1,3), the side chain angles χ, and the corresponding bond angles. For convenience, all these angles (ordered along the backbone) are denoted by αk, k=1,37=K. We have argued in Refs. 11 and 36 that to a good approximation bond stretching can be ignored.
The partition function of the loop/water system is
where E(xloop,xN)=Etot defined in eq 1; xloop is the Cartesian coordinates of the loop in microstate m (for brevity we omit the letter m in most equations). xN is the 9N Cartesian coordinates of the water molecules, where for brevity we denote in the theoretical section Nwater by N (N=Nwater). However, it is convenient to change the variables of integration from xloop to internal coordinates, which makes the integral dependent also on a Jacobian.17,18,20 This transformation is applied under the assumption that the potentials of the bond lengths (“the hard variables”) are strong and therefore their average values can be assigned to their corresponding Jacobian, J, which to a good approximation can be taken out of the integral (however, see a later discussion in this section). For the same reason one can carry out the integration over the bond lengths (assuming that they are not correlated with the αk) and the remaining integral becomes a function of the K dihedral and bond angles (αk) 17,18,20 and a Jacobian that depends only on the bond angles,
where [αk] = [α1,…αK] and d[αk] = dα1…dαK. D(T) is a product of the integral over the bond lengths and their Jacobian, J. Due to the strong bond stretching potentials, D is assumed to be the same (i.e., constant) for different microstates of the same loop and therefore lnD cancels and can be ignored in calculations of free energy and entropy differences. The Jacobian of the bond angles, which should appear under the integral, is omitted because we have shown11 that it cancels out in entropy and free energy differences (our main interest). The Boltzmann probability density corresponding to Z (eq 3) is
and the exact entropy S and exact free energy F (defined up to an additive constant) are
It should be pointed out that the fluctuation of the exact F is zero,54,55 because by substituting ρB([αk]) (eq 4) inside the curly brackets of eq 6 one obtains, E([αk]) + kBT ln ρB([αk]) = −kT ln Z = F, i.e. the expression in the curly brackets is constant and equal to F for any set [αk] within m. This means that the free energy can be obtained from any single conformation if its Boltzmann probability density is known. However, the fluctuation of an approximate free energy (i.e., which is based on an approximate probability density) is finite and it is expected to decrease as the approximation improves.54,55,30–33 Because HSMC(D) provides an approximation for ρB([αk],xN), it enables one, in principle, to estimate the free energy of the system from any single structure31,33 [Notice, however, that calculation of ρB([αk],xN) for a single conformation depends on the entire microstate as is also evident from the HSMC(D) procedure discussed later].
With MD the bond stretching energy is taken into account in eq 6 (and in free energy functionals defined later) while the corresponding entropy is ignored. The contribution of this energy to the free energy becomes an additive constant if one accepts the assumptions about the stretching energy and the corresponding Jacobian made prior to eq 6. This is a very good approximation; however, if the bond stretching entropy should be considered, we have argued in Ref. 11, II.5 that it can be estimated approximately within the framework of HSMD.
HSMC(D) is based on the ideas of the exact scanning method where a system is constructed (from nothing) step-by-step using transition probabilities (TPs). The product of these TPs is equal to the Boltzmann probability (eq 4) from which the entropy and free energy can be calculated. Practically, a loop/water configuration is generated by initially building a loop structure followed by the construction of a configuration of the surrounding water molecules. In this way a sample of statistically independent system configurations can be obtained.
For simplicity this construction is described for a loop consisting of M Gly residues (with dihedral and bond angles denoted αk, 1≤ αk ≤ 6M=K)in microstate m; the loop is surrounded by Nwater water molecules moving within the volume defined by the sphere of radius Rwater, the template, and the loop. Starting from nothing, a conformation of the loop is built first by defining the angles αk step-by-step using transition probabilities (TPs) and adding the related atoms.53 Thus, at step k, k−1 angles α1, … ,αk−1; these angles and the related structure (the past) are kept constant, and αk is defined with the exact TP density ρ(αk|αk−1 … α1),
where Zfuture (αk, …,α1) is a future partition function. The term “future” indicates that the integration defining Zfuture is carried out over the variables αk+1,…,αK and the 9N coordinates xN of the water molecules which will be determined in future steps of the build-up process. In this integration the atoms treated in the past are held fixed in their coordinates (which are determined by α1 … αk), while αk+1,…,αK are varied in a restrictive way where the corresponding conformations of the “future” part of the loop remain in microstate m. Thus
where E (eq 1) is the total potential energy of the loop/template/water system, which also imposes the loop closure condition. The product of the TPs (eq 7) leads to the (Boltzmann) probability density of the entire loop conformation,
After the loop structure has been constructed a configuration of water molecules is generated step-by-step, where the TP density ρwater(xk|αK,…,α1,xk−1) for placing water molecule k at xk is defined in a similar way to eq 7 based on Zfuture ([αk],xk) and Zfuture([αk],xk−1) where the loop conformation is kept constant and the k−1 water molecules that have already been treated are fixed at their coordinates, xk−1 and the summation in Zfuture(xk) is over the as yet undecided N−k+1 water molecules. (Notice that xk denotes the 9 Cartesian coordinates of water molecule k, while xk denotes the set of Cartesian coordinates of the k molecules 1,2,….,k). The Boltzmann probability density of the water is
and the probability density ρB([αk],xN) of the loop/water configuration is the product of . One can define for m the loop entropy, Sloop,
Sloop is defined up to an additive constant. Extending the exact scanning procedure to side chains is straightforward.
This construction procedure (which is not feasible for a large loop/water system) provides the theoretical basis for HSMC(D). Thus, the exact scanning method is equivalent to any other exact simulation technique (in particular, to Metropolis MC or MD) in the sense that large samples generated by such methods lead to the same averages and fluctuations. Therefore, one can assume that a given MC or MD sample has rather been generated by the exact scanning method, which enables one to reconstruct each conformation i by calculating the TP densities that hypothetically were used to create it step-by-step; this is the basis for HSMC(D) (as well as the HS and LS methods).
The theory of HSMD is again described as applied to a loop consisting of M Gly residues. One starts by generating an MD sample of microstate m with water molecules; the conformations are then represented in terms of the dihedral and bond angles αk,1≤αk≤ 6M=K, and the variability range Δαk is calculated,
where αk(max) and αk(min) are the maximum and minimum values of αk found in the sample, respectively. Δαk, αk(max), and αk(min) enable one to verify that the sample has not “escaped” from microstate m.
System configuration ([αk],xN) (denoted i for brevity) is reconstructed in two stages, where the loop structure is reconstructed first followed by the reconstruction of the water configuration. Thus, at step k of stage 1, k−1 angles αk−1 … α1 have already been reconstructed and the TP density of αk, ρ(αk|αk−1,…,α1) is calculated from an MD sample of nf conformations (generated in Cartesian coordinates), where the entire future of the loop and water is moved by MD [i.e., the loop atoms defined by αk,…,αK and the water coordinates (xN)] while the past (the loop atoms defined by α1,…,αk−1) are held fixed at their values in conformation i. A small segment (bin) δαk is centered at αk(i) and the number of visits of the future chain to this bin during the simulation, nvisit, is calculated; one obtains,
where ρHS(αk|αk−1, ,α1) becomes exact for very large nf(nf → ∞) and a very small bin (δαk→ 0). This means that in practice ρHS(αk|αk−1, ,α1) will be somewhat approximate due to insufficient future sampling (finite nf), a relatively large bin size δαk, an imperfect random number generator, etc. This equation is suitable for HSMC. However, for practical reasons, with HSMD a pair of angles should be treated simultaneously, where each pair consisting of a dihedral angle and its successive bond angle (e.g., and the bond angle N-Cα-C’). Thus, at each step both αk and αk+1 are considered and nvisit is increased by 1 only if αk and αk+1 are both located within the limits of δαk and δαk+1, respectively; also, for Arg we treat 3 consecutive χ angles and in the future we plan to treat 4 angles. Therefore, for n consecutive angles eq 13 becomes
where in Ref. 11 we have shown that δαk and δαk+1 can be optimized. Notice that with HSMD the future loop conformations generated by MD at each step k remain in general within the limits of m, which is represented by the analyzed MD sample. The corresponding probability density is
(eq 9) is the Boltzmann probability density of [αK] in m. Thus, for microstate m, can be estimated from a Boltzmann sample (of size ns) generated by MD using the arithmetic average,
where ρHS(t,m) is the value of ρHS([αk]) obtained for configuration t of the sample of m. (with the bar) is an estimation of the ensemble average (eq 16); correspondingly, the ensemble averages of the energy are estimated from a sample of size ns and should appear with a bar as well. However, from now on only estimations will be considered and for simplicity, all of them will appear without the bar, like the energies defined in eq 1. (eq 16 and eq 17) constitutes a measure for the loop flexibility of a pure geometrical character, i.e. with no direct dependence on the interaction energy. When the converged value of is considered, it will be denoted by Sloop, which is expected to be the exact value within the statistical error. In the same way, the difference in the loop entropies between two microstates obtained for a specific set of parameters is denoted by while the converged difference is denoted by ΔSloop, thus
The difference in the loop energy between two microstates is (see eq 1),
One can also define a free energy difference for the loop, ΔFloop,
To reconstruct the water configuration one can use in principle the HSMC(D) procedure for fluids developed previously, which would lead to [as an approximation for (eq 10)] and then to the contribution of the water configuration to the free energy
However, this procedure for fluids has not been optimized yet and it is relatively time consuming.
Alternatively, as in Ref. 12, one can obtain Fwater ([αk],xN) by a thermodynamic integration (TI) procedure based on the same reference state for the free and bound structures. In this state the water-water and water-template interactions are preserved but the (fixed) loop structure [αk ] does not “see” the surrounding waters, i.e. the loop-water interactions (electrostatic and Lennard Jones) are switched off. These interactions are gradually increased (from zero) during an MD simulation of water (while the loop structure remains fixed at [αk]). For [αk ] of microstate m one obtains from the integration which is then averaged over the ns sample configurations (as in eq 17). As described in the Appendix (V.1), the integration is carried out in two stages but in an opposite direction to that described above, i.e., first the charges are gradually decreased to zero, followed by a similar decrease in the Lennard Jones (LJ) potential, leading to , respectively. Denoting the set of [αk] in the sample by t and omitting m one obtains,
The difference in the free energy of water between m and n denoted ΔFwater is
ΔEwater (see eq 1) is
The total free energy is
and the difference in the total free energy between microstates m and n is
The difference in the water entropy between m and n is
where the corresponding difference in the total entropy is
Notice that all entropies and free energies are defined up to an additive constant.
The HSMD reconstruction procedure needs further discussion. Thus, the MD simulation of the future chain at step k starts from the reconstructed conformation i, and every g=10 fs the current conformation is retained, where the ninit initial retained conformations are discarded for equilibration. The next nf (retained) future conformations are represented in internal coordinates and their contribution to nvisit (eqs 13 and 14) is calculated. An essential issue is how to guarantee an adequate coverage of microstate m, i.e., that the future chains will span its entire region (in particular the side chain rotamers) while avoiding their “overflow” to neighboring microstates, conditions that will occur for a too small and a too large nf, respectively. [Note that even at step k, where the “past” of the loop is kept fixed, the (future) unfixed part can leave the microstate during long MD simulations. Such an “overflow” is more likely to happen for small residues such as Gly and for small k.]
In previous work11,12,34–36 we developed procedures for keeping the loop in its original microstate and measures for estimating the extent of coverage of the microstate by the reconstructed samples (an important test for an adequate coverage is verifying that entropy differences are stable as nf is increased.) In this paper these procedures have not been applied because the maximal nf values used are not large and the microstates are concentrated (i.e., the Δαk values of eq 12 are relatively small; see discussions in section III).
As has already been pointed out, our main interest is in the difference ΔFmn (ΔSmn) between microstates, rather than in the absolute values themselves. For any practical set of nf, and bin sizes, will be approximate and thus the difference, might be approximate as well. However, if is found to be stable for significantly improving sets of parameters (i.e., better approximations) the stable value can be considered as the correct difference (within the statistical errors). Indeed, in the application of HSMD to peptides36 and loops11,12 relatively small values of nf have already led to stable differences, meaning that the systematic errors in both are comparable and thus are cancelled in (we define the deviation, as the systematic error.) In Ref. 11 we have provided theoretical arguments supporting this error cancellation, which, however, should be verified for each system studied. Thus, using HSMC(D)-TI, the objective is not to obtain the most accurate Fm and Fn, but to minimize computer time by finding the worst (i.e., the worst HSMC(D)-TI approximation) for which is still correct within a required statistical error. This strategy is also applied to approximations in the model used. For example, the harmonic boundary conditions that keep water within the hemisphere impose errors that are expected to be comparable for both microstates – thus one would anticipate them to get cancelled in free energy differences. Again, one should verify that the differences are stable for increasing values of Nwater and Rwater.
As stated above and earlier, an essential aim of this work is to find the minimal template size and minimal number of water molecules required for a reliable modeling of the loop behavior (i.e., that free energy differences for the minimal and larger models are approximately the same). We have tested three template sizes, defined by radii, Rtmpl =11,12, and 13 Å consisting of 783, 944, and 1141 atoms, respectively, where each template was studied with an increasing number of water molecules, Nwater, ranging from 80 to 220.
For each pair (Rtmpl,Nwater) the solvated free and bound structures were initially optimized as described in sections II.1 and II.2 leading to two optimized structures. Then, starting from each optimized structure, an 1 ns MD trajectory was generated at 300 K, where the initial 0.5 ns part was used for equilibration and a sample of 1000 structures was generated from the last (half) part of the trajectory by retaining a structure to a sample every 0.5 ps. From these samples (which represent the free and bound microstates) the free energy and entropy are calculated.
These simulations and the reconstruction simulations (for generating the future samples) were carried out with the velocity-Verlet algorithm57 based on a time step of 2 fs, where bonds involving hydrogens (including those of water) were frozen to their ideal values by the RATTLE algorithm;57 the Berendsen57 heat bath controlled the temperature. Cut-offs on long-range interactions were not imposed, and in the reconstruction process a structure was added to the sample every g=10 fs, where the ninit=250 initial structures (2.5 ps) were discarded for equilibration. The future samples were generated for several bin sizes, where results are presented for δαk=Δαk/l, l=5, 10, 20, 30, 40, and 50, centered at αk (i.e., αk ±δαk/2) (eqs 12–14). If the counts of the smallest bin are smaller than 50 the bin size is increased to the next size, and if necessary to the next one, etc. In the case of zero counts, nvisit is taken to be 1; however, an event of zero counts is very rare.
In Table 1 we present results for αk(min), αk(max) and Δαk (eq 12) for the backbone dihedral angles and ψ and the sidechains, χ. These values are based on the samples of 1000 conformations generated for the free and bound microstates for the template of Rtmpl=12 Å and Nwater=160. The table shows that the Δαk values are relatively small (in most cases smaller than 80°) and they are comparable to those obtained by other samples [based on different (Rtmpl,Nwater) pairs]. This suggests that |ΔSloop| (eq 18b) would be small as well (probably not larger than 2 kcal/mol; compare with Δαk in Table 1 of Ref. 12). These small Δαk values stem from the constraints imposed by the template on the inner loop.
For comparison we also provide in Table 1 the dihedral values of the crystal structures, αk(crystal). These angles enable one to determine whether the samples have escaped from their original microstates. While exact definition of a microstate is practically unfeasible (see discussion in section I.3 of Ref. 11), we have accepted an “escape” criterion for a dihedral angle when αk(crystal)+60° is smaller than αk(max) or αk(crystal)−60° is larger than αk(min), i.e., if some angle values fall beyond the range αk(crystal)±60°; these angles are bold-faced in the table. The table reveals that three and two backbone angles of the free and bound microstates, respectively have been escaped but the deviations are small; the corresponding sidechain (escaped) angles are five and three, among them are χ1 – χ4 of Arg(free) (where the deviation of χ1 is small) and the slightly deviating χ1 of Phe(bound). Thus, the original microstates are not completely retained (mainly for the sidechains) but this might be expected since we model the loops in solution rather in the crystal environment.
In Table 2 results are presented for Eloop-temp, Eloop-loop, Eloop and Etotal (eq 1), and for (eq 21), calculated as described in the Appendix, V.1; we also provide results for , which is the total free energy without the contribution of the loop entropy, Sloop (eqs 16 and 17). For each quantity, we have calculated the difference, Δ between the free and bound values. These results were obtained for Rtmpl= 12 and Rwater= 13 Å for 80 ≤ Nwater≤ 180. Similar results are presented in Table 3 for Rtmpl= 13 and Rwater= 14 Å and 80≤Nwater≤ 220.
To check the stability of these results, and assess their statistical errors they were calculated for an increasing sample size of ns=10, 20 (not shown), ns=40, and 80 for the higher Nwater values. In some calculations the thermodynamic integration is doubled. The tables show that the results presented (in particular those for the larger Nwater) are very stable. Statistical errors, s/(ns)1/2, where s is the standard deviation, are smaller than 2.5 and 1.5 kcal/mol for Etotal and the other quantities, respectively [because for ns=40 and 80 the conformations are selected from the trajectory every 12.5 and 6.25 ps, respectively, the energy correlations are expected to be low. Notice that the correlations of the correct free energy are zero, because every conformation leads to the correct result (see II.3); for an approximate free energy the correlations will be smaller than the energy correlations.] The errors in Δ are differences between results obtained for the largest and smaller ns values. The Δ results (in particular those for the larger Nwater) are again very stable, i.e., the errors in most cases are relatively small, within 2 kcal/mol, due to cancelation of the individual errors in the difference; for example, , (and thus ) (eq 21) change significantly in going from a single to a double integration, however, this change is comparable in the free and bound calculations and get cancelled in Δ. It should be pointed out that to verify our error estimation, for some of the larger values of Nwater (Rtmpl=12 and 13 Å) several TI runs (for calculating ) were carried out starting from different sets of velocities.
To estimate the minimum values of Rtmpl and Nwater it is sufficient to study ΔFsum - the difference in the sums of all contributions to the free energies excluding that of Sloop (eqs 16 and 17), where ΔSloop is expected to be small. The results of ΔFsum in Table 2 are positive, 31, 30 and 14 kcal/mol for the low density water, Nwater=80, 100, and 120, respectively, meaning that the bound microstate is more stable (has lower free energy) than the free microstate (again, neglecting Sloop). On the other hand, as Nwater is increased to 140, 160, and 180, ΔFsum becomes negative, −8 −1 and −5 kcal/mol, respectively, i.e., the free loop becomes the most stable. We also provide a measure for the density of water, ρwater=Nwater/(hemisphere volume), i.e., ρwater=Nwater/[2π(Rwater)3/3], which increases to 0.0304, 0.0348 and 0.0391 Å−3 for Nwater 140, 160, and 180, respectively. These values are comparable to the experimental density of water, ρwater=0.0350 Å−3, which corresponds to 154 waters in the hemisphere; however, these densities are somewhat approximate since the hemisphere is not totally empty but contains part of the template. Thus, the density is lower for waters arranged initially in crystal water positions, or those which are put randomly inside the template. Also, bulk water can move during the simulation to crevices inside the template, and some might “seep” to the back of the template; however, because Rwater - Rtmpl,=1 Å (see figure 1) this “escape of waters is avoided to a large extent, as can be learnt from computer graphics.
The same behavior is shown in Table 3, where ΔFsum is positive, 33, 21, 48, 14, and 14 kcal/mol for Nwater = 80, 100, 120,140, and 160, respectively, becoming negative, ΔFsum= −20, −18, and −3 kcal/mol for Nwater=180, 200, and 220, i.e., for ρwater= 0.0313, 0.0348, and 0.0383 Å−3 , respectively. Thus, as in Table 2, only ρwater values close to the experimental density, 0.0350 Å−3 (corresponding to 193 waters in the hemisphere) lead to negative ΔFsum, where the increase in Nwater (as compared to R tmpl=12 Å) is due to the increase in Rwater (from 13 to 14 Å).
We have also carried out calculations for Rtmpl= 11 and Rwater=12 Å but have found ΔFsum to be positive (~30 kcal/mol) for all values of Nwater, even for ρwater ~ 0.0350 Å−3 . This suggests that the template defined by Rtmpl= 11 Å (783 atoms) is too small. Indeed, computer graphics has shown that this template does not provide the required cover for the loop and as a result some dihedral angles, especially for Arg deviate significantly from their corresponding values in Table 1. On the other hand, the fact that for water densities close to the experimental value ΔFsum becomes negative for both Rtmpl= 12 and 13 Å suggests that a template defined by Rtmpl≥ 12 Å (944 atoms) would be adequate. The strongly fluctuating (negative) values ΔFsum= −20, −8, and −3 kcal/mol obtained for Rtmpl= 13 Å and Nwater≥ 180 probably reflect the difficulty to adequately optimize starting structures for MD simulations for these relatively large systems (see discussion in II.1 and II.2). Therefore, we calculate Sloop only for the smaller systems based on Rtmpl= 12 Å, where the (negative) ΔFsum results are closer to each other.
Sometimes the energy rather than the free energy has been used in the literature as a criterion of stability; therefore, it is of interest to compare ΔFsum to the corresponding values of ΔEtotal. Table 2 and Table 3 show that these quantities (while not equal) are correlated, both decrease as Nwater increases, with R2 = 0.74 and 0.79, respectively; see Figure, 2 and Figure, 3.
It should be pointed out that changes in Nwater affect more the energy values of the free microstate than those of the bound one. Thus, in Table 2, Eloop-tmpl is changed within the ranges δfree=102−52=50 and δbound=128−120=8 kcal/mol, where the corresponding ranges for Eloop-loop also differ significantly, δfree=46−10=36 versus δbound= −10+2=12 kcal/mol. Similar relations (but somewhat less pronounced) are observed in Table 3, where the ranges for Eloop-tmpl are δfree=91−68=23 and δbound =146−131=15, and for Eloop-loop are δfree=49−20=29 and δbound=10−2=8 kcal/mol. The results for show the same tendency, where δfree=84−63=21 and δbound =53−48=5 (Table 2), and δfree=105−41=64 and δbound =62−47=15 kcal/mol (Table 3). On the other hand, increases in most cases with Nwater for both microstates.
It is of interest to determine which energy components lead to the change of ΔFsum from positive to negative at Nwater =140 (Table 2) and Nwater =180 (Table 3). Thus, for each energy (and free energy) component in Table 2 we calculate two averages of Δ, one for the three lower values of Nwater (80, 100, and 120) and the second for Nwater=140, 160 and 180 (these two averages are denoted by Δ1). The calculations show that two components contribute to the decrease of ΔFsum, while one component contributes slightly to its increase. More specifically, Δ1(Eloop-tmpl) decreases by 16 (from 60 to 44), decreases by 18 (from −6 to −24), while Δ1(Eloop-loop) increases (slightly) by 4 (from −29 to −25 kcal/mol). In Table 3 the first group consists of Nwater=80, 100, 120, 140 and 160 and the second group of Nwater=180, 200 and 220. Unlike in Table 2, Δ1(Eloop-tmpl) is practically the same for both groups, Δ1(Eloop-loop) increases slightly (as in Table 2) by 3 from −31 to −28, while decreases significantly by 41 (from ~0 to −41 kcal/mol) and thus provides the sole contribution for the decrease of ΔFsum. Thus, a consistent effect on the change of ΔFsum is provided by the water component, . From a structural point of view, the results of Table 2 suggest that on average the loop moves (but not necessarily much) to decrease Δ1(Eloop-tmpl) [where is decreased as well]. In table 3 the loop moves less, its energy is unchanged, but , which consists of the loop-water interactions decreases significantly. It has been found difficult to verify this picture by a structural analysis. The consistent behavior of our model for the two templates (as a function of Nwater) is reflected by the similar behavior of ΔFsum for both templates (the Helmholtz free energy is the characteristic thermodynamic potential in the canonical ensemble and Fsum is very close the total Helmholtz free energy). The fact that some components of ΔFsum behave differently is not unexpected, since the template size, the number of buried crystal waters in the template, and the optimization of the water-template system is different for Rtmpl=12 and 13 Å.
The discussion in the last two paragraphs demonstrates that Nwater (which defines the water pressure) affects significantly the energy and free energy components of the free and bound microstates (in particular it leads to the change in the sign of ΔFsum). Hence, one would expect that these energetic changes will correlate with the local water density around the loop. Thus, we calculated (for Rtmpl=12 Å) the average number of water molecules within spheres of radii 3, 4, 5, and 6 Å around the loops’ residues. However, for each Nwater studied (Nwater=120, 160 and 180) these numbers were found to be comparable for the free and bound microstates. On the other hand (as expected), the loop structures (in terms of dihedral angles) have been affected by Nwater but not in a systematic way and therefore these changes are not discussed here.
Results for the loop entropy, (eq 17) appear in Table 4 for Rtmpl=12 and Rwater=13 Å. Two sets of results are presented for Nwater= 160 and 180 for the free and bound microstates and for their difference (see the discussion preceding eq 18a). These results were obtained by reconstructing ns=80 loop structures, distributed homogeneously along the entire sample of 1000 system configurations. The simulated future consists of the future part of the loop including all the surrounding water molecules. The results are presented for several values of nf - the sample size of the future chains (eqs 13 and 14), where nf= 200, 400, 1200, 1600, and 2000; these values of nf are used for pairs of angles, such as a backbone dihedral and the successive bond angle. However, for the side chains we also reconstruct a single χ angle and triplets of successive χ angles (e.g., for Arg) for which the maximal nf is 1000 and 4000 (rather than 2000), respectively (see eqs 13 and 14). The results are also presented as a function of bin size, δαk=Δαk/l (eqs 12–14) where l=30, 40 and 50, while for nf= 2000 we also provide results for bin sizes defined by l=5, 10 and 20. The statistical errors were obtained from the fluctuations and results obtained for a smaller sample of ns=40 configurations.
Being an upper bound, (18a) is expected to decrease as the approximation improves, i.e., with decreasing the bin size – an expectation that is fully satisfied. For example, for Nwater=160 and nf =2000, the values are 64.3, 61.2, 60.3, 60.1, 60.1, and 60.1 (kcal/mol + constant), i.e. they decrease for l=5, 10, and 20 and converge to 60.1 (±0.3) kcal/mol for l=30, 40 and 50. The same behavior is observed for all nf values in the table, where in some cases the central values slightly decrease for l=40 and 50 but should be considered as converged within the error bars. One would also expect to decrease as nf increases in each bin. The table reveals that such decrease always occurs in going from nf=200 to 400 but then a slight increase is observed for the larger nf. This increase of stems from the fact that for large nf the future part of the loop is expected to span larger regions of conformational space during the MD simulation, and might also leave the original microstate. Therefore, the number of visits nvisit (eqs 13 and 14) to the bin decreases as compared to the number of visits obtained for nf=200 or 400, i.e. the probability decreases as well and increases.
This problem can be cured by dividing a (long) trajectory of size nf into j shorter trajectories (“units”) each based on n’f < nf conformations where nf=jn’f, and each unit starts from the reconstructed structure i with a different set of velocities followed by a short equilibration. In this procedure (which was carried out in our previous studies11,12,34–36) the future part of the loop is expected to remain within the original microstate. In any case, within the present statistical errors the results for in Table 4 can be considered as converged to the correct values [for comparison, the four results for (m stands for free or bound) for Nwater=160 and 180 based on smaller samples of ns=40 (obtained for Δαk/50, nf =2000) differ from those of Table 4 by 0.3, 0.2, 0.2 and 0.1 kcal/mol, respectively.] Moreover, we show below that differences in entropy (TΔSloop) - our main interest - are very stable. Finally notice that our results are based on relatively small samples of ns=80 as compared to samples of ns ~600 used in our previous studies, i.e., the present calculations lead to a reduction in computer time by factor of ~7. Such a small sample is effective because it has been selected homogeneously from the larger sample of 1000 structures (based on a 0.5 ns MD trajectory).
The HSMD results for the entropy are compared to those obtained with the quasi-hermonic (QH) method from larger MD samples of 10,000 loop-water configurations, which are needed for achieving a reasonable precision. To avoid the “escape” of a sample from the original microstate, it consists of 10 separate samples of 1000 configurations (0.5 ns), each started from the same structure with a different set of initial velocities, where the initial trajectory of 0.5 ns is used for equilibration and is thus discarded. The central values of (eq A3 in the Appendix) exceed the HSMD results for TSloop (for ns=2000) by ~13–16 kcal/mol. These elevated results are in accord with being an upper bound and are comparable to the overestimation of values found in our previous studies.11,12,34–36
As stated above, we are mostly interested in the results for the difference in entropy between the free and bound microstates, TΔSloop (eq 18b). Table 4 shows that for Nwater=160, the converged value is TΔSloop(n = 80) =1.2±0.1 kcal/mol which “covers” the results (eq 18a) for Δαk/l, l ≥30 for all nf values, even for nf=200, i.e., the free microstate has the higher entropy. A table similar to Table 4 based on ns=40 (not shown) has led to TΔSloop(ns =40) =1.3±0.2 kcal/mol, which is equal to TΔSloop(ns = 80) within the error bars. These result [which further supports our estimation of TΔSloop(ns = 80) ] stems from the fact that the values of for both microstates are systematically larger than those of and these positive deviations are cancelled in .
For Nwater=180, TΔSloop(ns = 80) = −0.7±0.3 kcal/mol, representing the results for Δαk/l, l ≥30 for all nf values, i.e. the bound microstate has the higher entropy. Here, the results for for the free microstate are systematically higher than for (as for Nwater=160) while for the bound microstate, , due to a significantly large contribution to the entropy of by two structures that appear in the ns=80 sample but not in the ns=40 sample. In any case, based on Δαk/l, l ≥30, we obtain, TΔSloop (ns = 40)= −0.3±0.3 kcal/mol which again is equal within the error bars to the value obtained from the ns=80 sample. Notice that the values are significantly larger than their HSMD counterparts, while the signs are the same.
The computer time required to reconstruct a loop structure capped with Nwater=160 and 180 is 8.1 and 9.2 hours CPU on a 2.1 GHz Athlon processor, meaning that the entire reconstructions required 1296 and 1472 h CPU. However, we have shown that considering only 10% (nf=200) of the maximal reconstruction samples and using smaller samples of ns=40 (rather than 80) have led to sufficiently accurate entropy differences, meaning that the total computer time can be reduced to 65 and 74 h CPU, respectively. We have generated the relatively large reconstruction samples to verify the convergence of the results.
In summary, the fact that TΔSloop changes sign in going from Nwater=160 to 180 is not surprising since significant changes are also observed in other components of the energy in Table 2 (e.g., Eloop-loop = −10 and −37 kcal/mol for Nwater=160 and 180, respectively). Also (like in previous studies), it is demonstrated that a limited future sampling in the reconstruction process (e.g., nf=200) is sufficient for obtaining the correct TΔSloop, which enables one to reduce computer time significantly. This convergence of entropy differences stems from the cancellation of approximately equal systematic errors in as discussed in detail in section II.10 of Ref. 11.
In Table 5 we summarize the contribution of the loop and water to the free energy averaged over samples of of ns=80 configurations. We provide Ewater, (eq 1) which includes the water-water, water-template and water-loop interactions, and the contribution of water to the free energy, (eq 21); TSwater (which is not provided) can be obtained from Ewater and . Eloop, which contains the loop-loop and loop-template interactions (eq 1), leads together with TSloop (taken from Table 4) to Floop. The entropy and free energy are defined up to additive constants, which are cancelled in the differences - our main interest.
The table shows that for Nwater=160, the absolute values of ΔFwater and ΔFloop are comparable; however, the contribution of TΔSwater to ΔFwater is significant, being ~33% of ΔEwater (in absolute values), while the contribution of TΔSloop to ΔFloop is small where TΔSloop constitutes only ~3.6% of ΔEloop. For Nwater=180 the situation is even more extreme, where the contribution of TΔSwater to ΔFwater (in absolute values) is larger (by 155%) than that of ΔEwater while the corresponding contribution of TΔSloop is again small (6.5%). In other words, for Nwater=160, Ewater(free) < Ewater(bound) significantly and correspondingly also TSwater(free) < TSwater(bound) significantly. For Nwater=180, Ewater(free) is only slightly smaller that Ewater(bound) while TS water(free) is larger than TSwater(bound). In any case, the effect of the entropy of water is significant. Also, the results of Table 5 and results in Table 2 and Table 3 demonstrate the important contribution of water to the total free energy, where for water densities close to the experimental value is always negative leading thereby to negative ΔFtotal (while ΔFloop or ΔEloop are always positive; see also Table 6)
The total contributions of Etotal (=Ewater+Eloop) and TStotal (=TSwater+TSloop) to Ftotal (=Fwater+Floop) are summarized in Table 6 (again, for the entropy only the results for TΔStotal are given). For Nwater=160, ΔEtotal and TΔStotal are comparable with the same sign (as for water above), and thus leading to a small negative ΔFtotal= −3.1±2.5 kcal/mol. For Nwater=180, ΔFtotal= −3.6±4 kcal/mol is equal to the value obtained for Nwater=160 within a larger error; however, ΔFtotal(180) is based on positive ΔEtotal and TΔStotal values (i.e., both the energy and entropy of the free microstate are larger than their bound counterparts). The fact that the entropic effects are significant means that (as the table also demonstrates) ΔEtotal by itself does not constitute an adequate criterion of stability. We also provide in the table the results for ΔFsum from Table 2, which are very close to those of Ftotal due to the small contribution of TΔSloop, meaning that in these cases Fsum serves as a reliable measure of stability.
The above results for ΔFtotal are equal to the experimental value, ~ −4 kcal/mol within the error bars. Furthermore, one would expect |TΔSloop(Nwater=140)| to be small, similar to the results obtained for Nwater=160 and 180 (this is based on values for Δαk as those presented in Table 1). Therefore, ΔFtotal(Nwater=140) is approximately represented by ΔFsum(Nwater=140) = −8±2 kcal/mol (see Table 2), which is close to the experimental value; the same is expected for the calculations based on Nwater=220 (for Rtmpl=13 and Rwater=14 Å), where ΔFsum(Nwater=220) = −3±2 kcal/mol (Table 3). This agreement of our calculations with the experiment should be accepted with some caution because for Rtmpl=13 and Rwater=14 Å the ΔFsum values for Nwater=180 and 200 (−20±2 and −18±2 kcal/mol, respectively; see Table 4) are too low to lead to ΔFtotal values close to the experimental result. However, the significant difference between the ΔFsum results for Nwater=180, 200, and 220, suggests that the initial water configurations in the larger systems of Rtmpl=13 and Rwater=14 Å have not sufficiently optimized (see II.2). We find this global energy optimization to be the most uncertain, while the accuracy of the simulation results (including TI) is adequate.
In the present paper HSMD-TI has been applied to the mobile loop 287–290 (Ile, Phe, Arg, and Phe) of the protein Acetylcholineesterase (AChE), where the difference in free energy between the free and bound structures of the loop, Ffree−Fbound has been estimated experimentally to be ~ − 4 kcal/mol. In view of this result, the main objectives of the present paper have been related to modeling issues: are a fixed template and capped water constitute an adequate model? and if they are, what is the minimal template size and minimal number of water molecules required to obtain reliable results? Another objective has been to further improve the efficiency of various components of HSMD-TI, in particular due to the fact that HSMD is applied for the first time to an internal loop consisting of residues with large sidechains. To achieve these aims, we have carried out a systematic study consisting of several template sizes which are capped with 80–220 water molecules.
We have emphasized the difficulty to determine the optimal distribution of water. Thus, the hemisphere contains bulk water (which can be simulated adequately by MD) as-well- as internal water molecules that reside in crevices on the surface and inside the template; because some of these waters are practically not mobile (by MD), they can be considered as part of the template, and their number, spatial positions, and orientations affect significantly the system energy. Using crystal water as internal ones might not always be adequate as our objective is to model the system in solution rather than in the crystal. Alternatively, one can seek to optimize (prior to the production simulations) the distribution of internal waters by energy minimization which, however, is an extremely difficult task. We have adopted a strategy where the initial configuration of water consists of ~30 crystal water positions and water molecules distributed randomly in the hemisphere; this configuration is optimized by a procedure based on a series of high temperature MD runs followed by energy minimizations. Clearly, finding the global energy minimum is practically unfeasible, but this is a general modeling problem not specific to HSMD-TI.
We have found that minimizing the energy of water before each integration step (during the TI process for calculating the contribution of water to the free energy) improves the convergence of the TI procedure significantly. Also, we have shown that satisfactory accuracy can be obtained by reconstructing a relatively small number of system configurations (80 or even 40), provided that they are selected homogeneously from the entire sample. This leads to a reduction in computer time by a factor of 7 as compared to our previous studies. In the reconstruction of the sidechain of Arg, three successive χ angles were treated successfully in each step, suggesting that 4 successive backbone angles (i.e., two pairs of a dihedral and the following bond angle) could be treated as well, which would decrease computer time further.
The limited number of atoms in the loop and waters, and the (relatively small) constant template lead to relatively small statistical errors (which for an N-atom system increase as N1/2). As in our previous studies (and in accord with theoretical arguments discussed in Ref. 11), we find that systematic errors in are cancelled to a large extent in differences, (eq 18a), and a similar cancelation occurs for the free energy of water, ΔFwater and other energy components. It should be pointed out (that in agreement with our previous studies and Ref. 58) the quasi-harmonic approximation has been found to overestimate the entropy significantly, which might reflect strong long-range correlations and an-harmonic effects within the loop due to the loop-template, loop-loop and loop-water interactions; also, the results for overestimate the HSMD values for |ΔSloop|. Finally, notice that the calculations of the transition probabilities of different steps are completely independent and they are also independent of the integration of water. Therefore, the reconstruction steps and TI of water can be fully parallelized.
The main conclusions from the present study (besides the above points which are mostly of a technical character) is that our approximate model is reliable, at least for the loop studied. This model is based on the same constant template for the free and bound microstates, where the loop is capped with a sphere of water molecules. By studying several template sizes and an increasing number of water molecules, we have found that to obtain consistent results for the free energy, ΔFtotal=Ffree−Fbound the template should be larger than a minimal size, and the number of water molecules (in the hemisphere of Rwater -Rtmpl=1 Å) should lead approximately to the experimental density of bulk water. Also, our results for ΔFtotal agree with the experimental data, ~ −4 kcal/mol. Our results demonstrate the important contribution of water to the total free energy, where for water densities close to the experimental value ΔFwater is always negative leading thereby to negative ΔFtotal (while ΔFloop is always positive). Also, the contribution of the water entropy, TΔSwater to ΔFtotal is significant. The next step would be to apply HSMD-TI to the free and bound loops attached to their own templates (defined by the PDB structures) rather than to a common template as was done here.
This work was supported by NIH grant 2-R01 GM066090-4 A2.
As described earlier, in the TI process the interaction energy [electrostatic and Lennard Jones (LJ)] between a fixed loop structure and the (moving) water molecules is decreased gradually to zero (rather than increased from zero) at constant T and V, where the water-water and water-template potential energy is unchanged. For the (LJ) potential we have used the shifted scaling potential, introduced by Zacharias et al.,59
where the shift parameter, δ=3 Å2, prevents the divergence of the potential (and its derivative) at small pair separations; a similar scaling function is used for the Coulomb interactions. The free energy derivatives with respect to λ, F/λ is
where the derivative of the energy is calculated analytically. The integration with respect to λ is carried out by dividing the range [1,0] into 20 equal integration bins Δλi. The (λ=1→λ=0) integration of the electrostatic interactions (i.e. charge elimination) is carried out first (in the presence of intact LJ interactions) followed by a ε=λ =1→0 integration of the LJ interactions. Thus, the entire two-stage process is based on 40 F/λi integration steps.
The MD simulation consists of a 2 fs integration step. Each (Δλi) step starts with energy minimization (based on λi) of the last structure obtained in the simulation of the i-1 step, followed by 5 ps MD simulation for equilibration, which is discarded; the next 20 ps of MD simulation a configuration is retained every 0.02 ps, i.e., altogether 1000 configurations are used for evaluating <F/λi>. It should be pointed out that the energy minimization (which was not performed in paper 20) has contributed to a nice convergence of the integration results.
For a single loop structure the free energy integration requires approximately the same time for each procedure (electrostatic or LJ) i.e., ~2×9.2 =18.4 h CPU for Nwater=160 and ~10.5×2=21 h CPU for Nwater=180 on a 2.1 GHz Athlon processor; these times refer to the double TI, where the MD simulation at each step is doubled, i.e. it is based on 40 ps for each λi (see Table 2 and Table 3). These simulation lengths are adequate because the loop’s conformation is kept fixed (as well as the template) and only the water molecules are moved by MD during integration.
With the QH method introduced by Karplus and Kushick,20 the Boltzmann probability density of structures defining a microstate is approximated by a multivariate Gaussian. Thus,
where the covariance matrix, σ, is obtained from a local MD (MC) sample and N is (usually) the number of internal coordinates. Clearly, SQH constitutes an upper bound for S since correlations higher than quadratic are neglected; also, an-harmonic contributions are ignored, and QH is not suitable for diffusive systems such as water. While QH has been used extensively during the years, a systematic study of its performance has been carried out only recently by Gilson’s group55 who have found that the performance of QH deteriorates significantly in Cartesian coordinates and when applied to more than one microstate.7