Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2671935

Formats

Article sections

- Abstract
- Introduction
- Computational Methods
- Results
- Discussion
- Conclusions
- Supplementary Material
- References

Authors

Related links

J Am Chem Soc. Author manuscript; available in PMC 2010 April 1.

Published in final edited form as:

PMCID: PMC2671935

NIHMSID: NIHMS101858

Department of Chemical Engineering and Materials Science, 151 Amundson Hall, 421 Washington Avenue SE, University of Minnesota, Minneapolis, Minnesota, 55455, USA

The publisher's final edited version of this article is available at J Am Chem Soc

See other articles in PMC that cite the published article.

We present a novel approach for computing biomolecular interaction binding affinities based on a simple path integral solution of the Fokker-Planck equation. Computing the free energy of protein-ligand interactions can expedite structure-based drug design. Traditionally, the problem is seen through the lens of statistical thermodynamics. The computations can become, however, prohibitively long for the change in the free energy upon binding to be determined accurately. In this work we present a different approach based on a stochastic kinetic formalism. Inspired by Feynman's path integral formulation, we extend the theory to classical interacting systems. The ligand is modeled as a Brownian particle subjected to the effective non-bonding interaction potential of the receptor. This allows the calculation of the relative binding affinities of interacting biomolecules in water to be computed as a function of the ligand's diffusivity and the curvature of the potential surface in the vicinity of the binding minimum. The calculation is thus exceedingly rapid. In test cases, the correlation coefficient between actual and computed free energies is >0.93 for accurate data-sets.

The accumulation of atomic-resolution information of protein-ligand interactions available through public databases of X-ray crystallographic and NMR structures^{1} has paved the way for the development of several theoretical methods for binding free energy prediction^{2} based on atomistic representations of the protein-ligand complex. Typically, the theoretical foundation is based on statistical thermodynamics. For an interaction between a protein and a ligand, E and I, interacting in solution to form a complex E*I we can write the biomolecular reaction

$${\left[E\right]}_{aq}+{\left[I\right]}_{aq}\underset{{k}_{off}}{\overset{{k}_{on}}{\rightleftarrows}}{\left[{E}^{\ast}I\right]}_{aq}$$

(1)

At equilibrium we can write for the standard free energy of association

$$\Delta {G}^{\mathrm{o}}={{\mu}^{\mathrm{o}}}_{{E}^{\ast}I}-({{\mu}^{\mathrm{o}}}_{E}+{{\mu}^{\mathrm{o}}}_{I})=-RT\phantom{\rule{thinmathspace}{0ex}}\mathrm{ln}(1\u2215{K}_{i})$$

(2)

where *μ*^{o}_{E*I}, *μ*^{o}_{E}, *μ*^{o}_{I} are the standard chemical potentials of the complex and the individual species respectively, *R* is the ideal gas constant, *T* the temperature and 1/*K _{i} =k_{on}/k_{off}* is the binding equilibrium constant.

These macroscopic thermodynamic properties connect to microscopic properties determined by atomistic computer simulations through the classical statistical thermodynamics relationship

$$\Delta {G}^{\mathrm{o}}=\Delta {A}^{\mathrm{o}}+P\Delta {V}^{\mathrm{o}}=-RT\phantom{\rule{thinmathspace}{0ex}}\mathrm{ln}[\{{Q}_{{E}^{\ast}I}\u2215\left({N}_{Avo}{Q}_{W}\right)\}\u2215\left\{({Q}_{E}\u2215\left({N}_{Avo}{Q}_{W}\right))({Q}_{I}\u2215\left({N}_{Avo}{Q}_{W}\right))\right\}]+P\Delta {V}^{\mathrm{o}}$$

(3)

where *Q _{E*I} , Q_{E} , Q_{I} , Q_{W}* are the molecular, canonical ensemble partition functions of the complex, the individual species and the solvent respectively. In principle, the partition functions enumerate all the possible microscopic states of the molecules. In practice, the direct calculation of the partition function for as complex a system as a solvated protein is theoretically and computationally unfeasible, because of the configurational integral. For one of the interacting species, say inhibitor

$${{Z}^{\text{int}}}_{I}=\int \dots \int \text{exp}\{-E({x}_{I}^{N},{x}_{W}^{M})\u2215{k}_{B}T\}d{x}_{I}^{N}d{x}_{W}^{M}=\langle \text{exp}\phantom{\rule{thickmathspace}{0ex}}E({x}_{I}^{N},{x}_{W}^{M})\u2215{k}_{B}T\rangle $$

(4)

where ${x}_{I}^{N}$ and are the ${x}_{W}^{M}$ are the dimensions of the configurational space available to the *N* molecules of species *I* and the *M* molecules of solvent *W*. $E({x}_{I}^{N},{x}_{W}^{M})$ is the potential energy of interaction between *I* and *W*.

There have been ingenious efforts to approximate the calculation, decomposing the free energy into numerous components that can be tractably computed^{3}^{-}^{14}. Nonetheless, difficulties persist, mainly because the error in the calculations of free energy components is larger than the actual, absolute value of the binding strength. Therefore, although theoretically on firm ground, the statistical thermodynamics approach to use atomistic models of biological molecules to predict the free energy of binding can be fruitful only with the injection of empirically derived corrections or severe simplifying assumptions. We propose the following alternative approach.

Considering the phase orbit of a classical mechanical system in phase space and integrating out the solvent degrees of freedom, the path will resemble the motion of a Brownian particle described by a Langevin equation:

$$\frac{d\underset{\u2012}{x}}{dt}=-\frac{D}{{k}_{B}T}\frac{\partial U\left(\underset{\u2012}{x}\right)}{\partial \underset{\u2012}{x}}+\underset{\u2012}{W}\left(t\right)$$

(5)

where W(t) is Gaussian noise with average <W(t)>=0 and correlation *W(t)W(t*') = 2*Dδ(t–t'), D* is the diffusion of the system in phase space, *k _{B}* and

An equivalent Fokker-Planck equation is

$$\frac{dP(\underset{\u2012}{x},t)}{dt}=D\frac{\partial}{\partial \underset{\u2012}{x}}\left(\frac{1}{{k}_{B}T}\frac{\partial U\left(\underset{\u2012}{x}\right)}{\partial \underset{\u2012}{x}}P(\underset{\u2012}{x},t)\right)+D\frac{{\partial}^{2}P(\underset{\u2012}{x},t)}{\partial {\underset{\u2012}{x}}^{2}}$$

(6)

The solution can be written as a path integral ^{15}^{-}^{18}

$$P({\underset{\u2012}{x}}_{f},{t}_{f}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{\underset{\u2012}{x}}_{i},{t}_{i})={e}^{-(U\left({\underset{\u2012}{x}}_{f}\right)-U\left({\underset{\u2012}{x}}_{i}\right))\u2215{k}_{B}T}\int \mathcal{D}\underset{\u2012}{x}\left[t\right]{e}^{-{S}_{eff}\left[\underset{\u2012}{x}\right]}$$

(7)

where,

$${S}_{eff}\left[\underset{\u2012}{x}\right]={\int}_{{t}_{i}}^{{t}_{f}}d\tau \left[\stackrel{\u2022}{\underset{\u2012}{x}}\left(\tau \right)\u22154D+{V}_{eff}\left(\underset{\u2012}{x}\right)\right]$$

(8)

The effective potential is

$${V}_{eff}\left(\underset{\u2012}{x}\right)=\frac{D}{2}{\left(\frac{1}{{k}_{B}T}\frac{\partial U\left(\underset{\u2012}{x}\right)}{\partial \underset{\u2012}{x}}\right)}^{2}-\frac{D}{{k}_{B}T}\frac{{\partial}^{2}U\left(\underset{\u2012}{x}\right)}{\partial {\underset{\u2012}{x}}^{2}}$$

(9)

The conditional probability solution for remaining in the same state space position for small time intervals is

$$\underset{t\to 0}{\text{lim}}P(\underset{\u2012}{x},\underset{\u2012}{x};t)=\frac{1}{\sqrt[n]{4\pi Dt}}\text{exp}(-{V}_{eff}\left(\underset{\u2012}{x}\right)t)$$

(10)

Here *t* is very short, of the order of the average duration of solvent collisions, and n is the number of degrees of freedom.

In a lucid analysis of the Langevin equation, Grooth ^{19} surmises that the friction coefficient *γ* = *k _{B} T/D* = 2

$$\underset{t\to 0}{\text{lim}}P(\underset{\u2012}{x},\underset{\u2012}{x};t)=\sqrt[n]{\frac{{k}_{B}T}{8\pi {m}_{s}{D}^{2}}}\text{exp}\left(-\frac{2{m}_{s}D{V}_{eff}\left(\underset{\u2012}{x}\right)}{{k}_{B}T}\right)$$

(11)

If we consider the reversible reaction between the two areas of phase space representing the bound and unbound states of the ligand given by

$${\left[{I}_{bound}\right]}_{aq}\underset{{k}_{-1}}{\overset{{k}_{1}}{\rightleftarrows}}{\left[I\right]}_{aq}$$

(12)

where [*I*_{bound}]=[*E*I*], the transitional probability to stay in the bound conformation (denoted by *x _{B}*) is

$$\underset{t\to 0}{\text{lim}}P({\underset{\u2012}{x}}_{B},{\underset{\u2012}{x}}_{B};t)=\frac{{k}_{-1}}{{k}_{1}+{k}_{-1}}$$

(13)

Combining equations 11 and 13

$$\sqrt[n]{\frac{{k}_{B}T}{8\pi {m}_{s}{D}^{2}}}\text{exp}\left(-\frac{2{m}_{s}D{V}_{eff}\left(\underset{\u2012}{x}\right)}{{k}_{B}T}\right)=\frac{{k}_{-1}}{{k}_{1}+{k}_{-1}}$$

(14)

For equations 1 and 12 to be equivalent, since [*I _{bound}*]

$$\mathrm{ln}\left(1+\frac{{K}_{i}}{\left[E\right]}\right)=-\left(\frac{2{m}_{s}{D}^{2}}{{\left({k}_{B}T\right)}^{2}}\frac{{\partial}^{2}U\left(\underset{\u2012}{x}\right)}{\partial {\underset{\u2012}{x}}^{2}}+\mathrm{ln}\left(\sqrt[n]{\frac{{k}_{B}T}{8\pi {m}_{s}{D}^{2}}}\right)\right)$$

(15)

where *K _{i}* is the equilibrium dissociation constant for eq 1, [

Equations 14 and 15 provide an elegant means for computing the free energy of biomolecular interactions given the bound state structure. Eq 15 has only two variables – the diffusion coefficient of the ligand, *D*, and the second derivatives of the potential for the bound complex. The latter can be computed quickly given the bound state, as described later.

Diffusion coefficients for organic molecules can also be estimated to high accuracy by semiempirical derivatives of the Stokes-Einstein relation which typically give scaling functions *D~V ^{−0.6}* where

Even though, in theory, the right hand side (RHS) of eq 15 can be exactly determined, experimental errors would give rise to empirical coefficients for the two terms in RHS. In practice, therefore, these variables in eq 15 must be trained with a set of protein-ligand complexes with known binding free energies.

Here we present a systematic study of the application of eq 15 towards the prediction of binding affinities of three different enzyme-inhibitor systems – bovine trypsin, β-secretase and aldose reductase. We restrict our study to systems for which experimental inhibitor affinity measures (*K _{i}, IC_{50}*) as well as X-ray crystal structures for the bound complexes are available. We find that eq 15 is not only a convenient and accurate means for predicting binding affinities, but also provides useful insights on the effect of the oft-ignored assay conditions on measured binding affinities.

Diffusivities for organic molecules at infinite dilutions can be accurately determined by semiempirical derivatives of the Stokes-Einstein relationship of which the following equation by Wilke-Chang is perhaps the most popular,^{21}

$${D}_{WC}=\frac{7.4\times {10}^{-12}\sqrt{\chi {M}_{W}}T}{\eta {V}^{0.6}}$$

(16)

where *χ* indicates the degree of solvent association, *M _{w}* is the molar mass of the solvent (18.015 g mol

The first and second terms on the RHS of eq 15 (denoted as TermA and TermB, respectively) were computed using *D _{WC}* for diffusivity

$$\mathrm{ln}\phantom{\rule{thickmathspace}{0ex}}\left[{\u2023}_{{\rm K}}^{i}\right]=\mathrm{A}{\phantom{\rule{thinmathspace}{0ex}}}^{\ast}\phantom{\rule{thinmathspace}{0ex}}\text{Term}\mathrm{A}+\mathrm{B}{\phantom{\rule{thinmathspace}{0ex}}}^{\ast}\phantom{\rule{thinmathspace}{0ex}}\text{Term}\mathrm{B}+\mathrm{I}$$

(17)

where A and B are empirical constants and I is the intercept. For the trypsin dataset *ln*[*K _{i}*′] =

Since *χ* is a dimensionless empirical parameter, different values have been proposed for *χ* in the literature for different types of molecules – 2.6 by Wilke and Chang^{21}, 2.9 for organic electrolytes^{25}, 2.26 for nonelectrolytes^{26} and 1.61 for aromatics^{27}. In this study we used *χ* =2.6 and introduced a new empirical parameter , such that *D = *D _{WC}*, to correct for the use of

The modified eq 15 in three dimensions is therefore,

$$\mathrm{ln}\left(1+\frac{{K}_{i}}{\left[E\right]}\right)=-\kappa \left(\frac{2{m}_{s}{\phi}^{2}{{D}_{WC}}^{2}}{{\left({k}_{B}T\right)}^{2}}\frac{{\partial}^{2}U\left(\underset{\u2012}{x}\right)}{\partial {\underset{\u2012}{x}}^{2}}+\mathrm{ln}\left({\lambda}^{3}\sqrt[3]{\frac{{k}_{B}T}{8\pi {m}_{s}{\phi}^{2}{{D}_{WC}}^{2}}}\right)\right)$$

(18)

Note that parameters and *λ* are not necessary to obtain a useful equation for predicting activities. However, determination of helps to compare the effective diffusivities of the ligands between datasets, while parameter *λ* nondimensionalizes the probability of eq 10.

For the first derivative of the potential to be zero in *V _{eff}* (eq 9), the solvated enzyme/ligand complex has to be proximal to its lowest energy conformation. Protein/ligand complexes were therefore minimized prior to computation of the second derivatives. The ligand/receptor complexes were minimized in MOE

If we treat the inhibitor as a rigid body at the potential minumum, ^{2}*U*(*x*)/*x*^{2} of eq 15 is the trace of the resultant three dimensional Hessian matrix (*k = k _{xx} + k_{yy} + k_{zz}*). The components of the Hessian matrix were evaluated from a finite difference of the forces about the potential minimum as follows:

$${k}_{ij}=\frac{{\partial}^{2}U}{\partial i\partial j}=\frac{\partial {F}_{i}}{\partial j}=\frac{{F}_{i+\Delta i}-{F}_{i-\Delta i}}{2\Delta j}$$

with Δ*i*=Δ*j*=1×10^{−6} Å, where *i* and *j* refer to the three Cartesian dimensions x,y and z, *F _{i}* is the force at position

Statistical analysis and linear regression was carried out using JMP^{29}. The leave-one-out cross validation coefficient, *R ^{2}_{cv}*, and the root mean square error of cross validation (

TermA and TermB of eq 17 were computed for the entire dataset (Table 1). The linear regression fit for the entire dataset is presented in figure 1. While the R^{2} of the fit is not very significant (0.51), a close examination of figure 1 presents some remarkable trends. Significant clustering is observed for points belonging to the same protein, with either systematic over- or under prediction within data-points of those sub-sets. We assumed that these differences were due to either the specific details of the assay buffers, or the chemical characteristics of ligands themselves, both of which would affect the ligands’ diffusivities. The Wilke-Chang equation (eq 16) evaluates diffusivities for molecules at infinite dilution, and systematic deviations from the calculated diffusivities are expected that are dependent on the assay buffers. We therefore decided that each subset had to be trained separately for better fit. Also intriguing was the presence of two clusters in the β-secretase data-set, one of which was significantly underpredicted (figure 1) and belonged entirely to data from the work of the same group presented in two different papers^{31}^{,}^{32}. A comparison of the assays between this subset and the rest of the β-secretase complexes revealed that these assays had 10% dimethylsulfoxide (DMSO) in the assay buffer. DMSO is a highly viscous solvent and is bound to have significant effect on the diffusivity of the ligand molecules. Aminabhavi and Gopalakrishna^{33} report a viscoscity (mPa.s) increase to 1.787 from 0.891 as the mole fraction of water changes from 1 to 0.9. A 10% DMSO buffer, therefore increases the buffer viscoscity by ~30%, sufficient to cause the relative shift observed in figure 1. It is also quite likely that other effects of DMSO, such as altered solubility and hydrophobicity play an important role towards the activity as well. These data points were therefore separated into two, separate subsets for training the linear regression fit.

Linear regression fit for the entire data-set - ■ (subset 1), † (subset 2a), + (subset 2b) and ● (subset 3).

The statistics of the linear regression fit for the four different subsets belonging to bovine trypsin, β-secretase (with and without DMSO) and aldose reductase are presented in Table 2.

Experimental evaluations of the inhibitor affinity measurements are very sensitive to experimental conditions with typical coefficients of variation of ~20% in the reported analytical measurements. The best scenario therefore involves data obtained by the same laboratory and the use of the same assay. The trypsin data-set, therefore, was of very high quality, since all of the data points used in that set were obtained by the same team. *Katz et al* ^{34}^{,}^{35} conducted a high-throughput study involving the determination of over 300 high resolution crystal structures of trypsin bound with small molecules inhibitors (2-(2-phenol-indoles and 2-(2-phenol)-benzamidazoles) over a wide range of pH. Scanning through the available structures led to a data-set of 12 inhibitors with their inhibition constants (*K _{i}*s). This reduced data-set was due to the fact that all the inhibitors studied here exhibit a parabolic pH dependence with a minimum in

While there are no outliers in this data-set, the hydroxyl groups of structures 1o36 and 1o3g had to be protonated for a better fit. This was not unreasonable considering that the pKa's of the hydroxyl groups for the free molecules are predicted to be close to the pH of the assays. The activities are predicted to very high accuracy (R^{2} = 0.93, σ= 0.62, Table 2) in the prediction of **ln**[*K _{i}*′]. A σ= 0.62 implies that the

The β-secretase data-set is derived from the work of several independent laboratories each with differing assay conditions, substrates, substrate- and enzyme concentrations and measures of affinity (*K _{i}* and IC50s) (Table 1). The LHS of eq 18 has the term 1+

Linear regression fit for the β-secretase data set – 2a with complexes whose assays contained 10% DMSO ( – outlier, 2oht).

Linear regression fit for the β-secretase data set – 2b with complexes whose assays contained no DMSO ( – outliers).

The second data-set has five outliers, 1ym1, 1ym2, 2iso, 2irz and 2iqg. A sixth outlier, the ligand of 2oah, was assayed without separation of its stereoisomers, therefore justifying its exclusion from the fit. Due to the significant reduction in the accuracy of prediction with the inclusion of 2oht in the previous data-set, we chose to exclude 1ym1, 1ym2, 2iqg, 2iso and 2irz from the fit, even though we are aware that the exclusion of these points artificially improves the statistics of the fit. Supporting our choice is the fact that the four of the outliers are from the work of two different papers ^{38}^{,}^{39}, and three of these – 2iqg, 2iso, and 2irz (and 2ph6) use as a substrate a fusion protein containing maltose binding protein at the amino terminal end and 125 amino acids of the amyloid precursor protein amino terminal end. All the other assays use significantly smaller peptide substrate in the assay. The use of a constant *χ* in the Wilke-Chang equation for the empirical fit could lead to errors in the estimation of *D*, since its value suggested value varies from 1.61 to 2.9 depending on the chemical nature of the molecule^{25}^{,}^{27}. This is a flaw in our analysis since the specific chemical characteristics of the molecule were not considered in the empirical determination of *D*. The ligands of 1ym1 and 1ym2, for instance, have very little aromatic character to them while that is not true of the other ligands. Also, conditions specific to those assays may also be responsible for the observed deviation. Even so, the deviation in the predicted **ln**[*K _{i}*′]s for these outliers is within a factor of 2.5, which implies that the predicted activity is correct to an order of magnitude. In general, since the ligands in this study are significantly more complex than those used to drive the empirical fit in the Wilke-Chang equation, extrapolation to more complex molecules may increase the error.

The aldose reductase data-set ^{40}^{-}^{53} is expected to be most noisy, since the data spans over a decade of work and is obtained from the work of several different laboratories. In fact an examination of the assays shows that the affinity data was obtained from the use of a wide range of substrates (glucose, xylitol, xylose, glyceraldehydes, and benzyl alcohol). The LHS of eq 18 involves [*E*] whose value depends on [*E _{Total}*] and the ratio of [

Since eq 18 predicts the affinities to very high precision, noisy data is expected to lead to inaccurate empirical prediction formulas for the activity. This is observed in the fit for this data-set (figure 5), where the contribution of the TermA is not captured at all. This data-set has three outliers - 1pwl, 1z8a, 1z89. While we can not explain the inaccuracy in the prediction of 1pwl, an examination of the assay buffers of the ligands of 1z89 and 1z8a, not surprisingly, found 20% DMSO in the assay buffer. Here, (σ = 1.22) activities are predicted to within an order of magnitude, which is close to the expected level of noise in this data-set. The widely varying chemical composition of the ligands in this data-set will also increase errors in the evaluation of the diffusivities. A high sensitivity to the diffusivity is expected in the prediction of the activities due to the nature of eq 18, with the sensitivity being much higher for TermA than TermB, since a *D ^{2}* features in TermA while TermB involves ln (1/

In general it appears that eq 18 can predict free energies almost to the precision of the available data for a linear regression fit. Of great interest is the effect of the magnitude of the diffusion coefficient on the observed inhibitor affinities. The systematic error observed in the residuals in figure 1 (Table 1) appears to be due to the different correction factor () that was required for different enzyme assays. The correction factor, , was determined by equalizing the coefficients of TermA and TermB in the linear regression fit (Table 2). The value of explains the strong dependence on TermB in data-set 2a. This appears to be due to the lower diffusivities of these molecules in comparison to the trypsin dataset. Larger diffusivities of the ligands in the trypsin data-set reduce the accuracy of prediction with TermB alone in the trypsin dataset (R^{2} = 0.39, σ=1.55, F=5.84). The decrease in the diffusivities of data-set 2a is most likely due to the increased viscosity of the solution due to addition of DMSO, since viscosity appears in the denominator of Wilke-Chang equation. The sensitivity of the activity of the molecules to their diffusion coefficients highlights the importance of the conditions of the assay towards the observed activity.

An interesting implication of the strong dependence on TermB for subset 2a is that the energetics of the binding are of little consequence for predicting activity for such systems. The activity almost entirely depends on the diffusion coefficient. The Wilke-Chang equation implies that activity will be strongly correlated with molecular volume. Since molecular masses and volumes are strongly correlated (R^{2} = 0.96 for the entire data-set), the molecular mass would also be highly correlated with activity. This interesting fact has been observed by Kim and Skolnick,^{55} where a comparison of scoring functions of different docking algorithms led to the observation that molecular mass was as good a predictor of activity as the scoring functions themselves. There, the authors argue that this observation may be due to the fact that most rational drug design involves improvement of efficacy of drug leads through the addition of favorable functional moeties, thus leading to an increase in mass. Equation 18 implies that the reason for the observed correlation is more fundamental since the size of the molecule directly affects its diffusivity. In fact we find that the use of the logarithm of the molecular weight (**ln**(*M _{W}*)) and

Apart from the composition of the assay buffer, other factors such as temperature also strongly influence *D*. For instance, from the Wilke-Chang equation, the ratio of diffusivities of a molecule at infinite dilution at temperatures 310 and 298 would be 1.34. Thus, for a low molecular weight ligand, eq 18 would predict significantly different activities at different temperatures.

One common assumption made to simplify the configurational integral of eq 4 is that it can be approximated by “end-point” calculations that limit the computation to the solvated bound/unbound ligand/receptor alone. Equation 18 implies that such an approximation will capture only a part of the picture, neglecting the contribution of the molecules’ diffusivity towards binding affinity.

Despite the success of eq 18 with the trypsin and β-secretase data-sets and the fairly low computational cost of its evaluation, several problems exist with its usage on a large scale for predicting activities. The first comes from the limitations of the empirical nature of the Wilke-Chang equation for predicting diffusivities. It is not surprising therefore, that the highest accuracy in the prediction of activities was achieved for data-sets 1 and 2a where the ligands were of similar chemical nature. For a data-set with molecules of widely differing properties, such as data-set 3, the use of a single empirical coefficient will most likely lead to significant errors in the prediction of diffusion coefficients. Besides, the success of the Wilke-Chang equation has been demonstrated for a limited number of small organic molecules in solution and it is not clear if its extrapolation to larger and chemically more diverse molecules typical of drug candidates would lead to accurate predictions of diffusion coefficients. It appears therefore that independent methods for determining diffusivities may have to supplement drug design experiments.

A second difficulty of the method outlined here is with the determination of the trace of the Hessian matrix. The trace is very sensitive to the final minimized complex as well as to the minimization protocol used. The correlation coefficient, *R ^{2}*, of the linear regression fit for trypsin data-set decreases significantly when cut-offs (8 Å) were used for evaluating the non-bonded interactions. A Generalized-Born solvation model was necessary during minimization to simulate the effect of the solvent. This is expected since the minimum of

Also interesting is the relationship that results for *V _{eff}*, where the significant factor for unbinding is not the interaction potential as intuition suggests, but its second derivative. In other words, the unbinding free energy depends not on the depth of the potential well that the ligand finds itself in, but rather on the curvature of this potential in the vicinity of energy minimum. This is a consequence of the over-damping effect of the solvent that appears as an assumption in the Fokker-Plank equation.

Our greatest challenge in this work, was to find data-sets of sufficiently large size (for statistical significance) that met all of our criteria – namely the availability of affinity data together with crystal structures for every complex, as well as the absence of significant conformational rearrangements in the active-site associated with binding. The absence of metal atoms and significant interactions with crystal waters was also considered essential as we were interested in the simplest possible data-sets to test the theory. We believe that it is due to this choice that we obtained reasonably good fits despite significant simplification of the physics. It is quite likely that other rate-limiting steps such as secondary binding events prior to unbinding or conformational rearrangements in the active-site during unbinding would significantly complicate the picture and perhaps demand a less trivial solution.

Two other methods for affinity prediction that were derived involving assumptions, albeit very different from ours, that lead to a simplification of the underlying physics and hence a reduction of the computational expense can be mentioned here. The first of these is the Linear Interaction Energy (LIE) method developed by Aqvist and coworkers^{56}. The LIE method was derived as an approximation to free energy perturbation, where a linear response is assumed for electrostatic interactions, together with an empirical expression for nonpolar effects. The interaction energies are evaluated from the average of molecular dynamics simulation (MD) energies. The second method is the Mining Minima method developed by Gilson and coworkers^{57} and applies to the direct computation of the configurational integral of a molecule (eq 4) as a sum of the contribution of low energy states using a Monte Carlo (MC) technique. Both methods require conformational sampling through MD or MC techniques. In comparison, our method has the advantage that a single conformation is sufficient for free energy prediction.

In a recent review of docking and scoring functions Leach *et al*^{58} point out that the accuracy of scoring functions has reached a plateau and that there is need for a breakthrough to develop. The path-integral formalism presented in this work offers an alternate perspective on the problem of binding free energy prediction.

The practical complexity of the traditional statistical thermodynamics approach implies that the effect of the assay buffer cannot be reasonably accounted for without significantly increasing the already intractably high computational time of estimating the configurational integral of eq 4. The path integral formalism presented here is elegant not only for its high accuracy, but also for its computational expense, which is minimal.

In its current form, eq 18 has great potential for the development of improved scoring functions for docking algorithms. A more extensive verification of eq 18 is certainly necessary, and at this point this is mainly limited by lack of high quality data-sets such as that of bovine trypsin. It remains to be seen how eq 18 performs in real life drug design scenarios, where typically only a single bound complex is available as a template for the prediction of other bound complexes through docking. It would also greatly help if inhibition experiments were supplemented with experimental determination of the ligands’ diffusivities in their respective buffers, which would remove the reliance on empirical relations such as Wilke-Chang equation that was used in this work.

The authors would like to thank Dr. Abdallah Sayyed-Ahmad and Dr. Anthony D Hill for helpful discussions. This work was supported by a grant from NIH (GM 070989). Computational support from the Minnesota Supercomputing Institute (MSI) is gratefully acknowledged. This work was also partially supported by National Computational Science Alliance under MCA04T033.

Supporting Information Available: Contains derivations of important relations in the kinetics of competitive inhibition. This material is available free of charge via the Internet at http://pubs.acs.org.

1. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. Nucleic Acids Res. 2000;28:235–242. [PMC free article] [PubMed]

2. Pohorille A, Chipot C, editors. Free energy calculations - Theory and applications in chemistry and biology. Springer; Heildelberg: 2007.

3. Gräter F, Schwarzl SM, Dejaegere A, Fischer S, Smith JC. J Phys Chem B. 2005;109:10474–10483. [PubMed]

4. Oostenbrink C, van Gunsteren WF. Proc Natl Acad Sci U S A. 2005;102:6750–6754. [PubMed]

5. Raha K, Merz KM., Jr. J Med Chem. 2005;48:4558–4575. [PubMed]

6. Huang N, Kalyanaraman C, Bernacki K, Jacobson MP. Phys Chem Chem Phys. 2006;8:5166–5177. [PubMed]

7. Laurie AT, Jackson RM. Curr Protein Pept Sci. 2006;7:395–406. [PubMed]

8. Foloppe N, Hubbard R. Curr Med Chem. 2006;13:3583–3608. [PubMed]

9. Gilson MK, Zhou HX. Annu Rev Biophys Biomol Struct. 2007;36:21–42. [PubMed]

10. Mobley DL, Graves AP, Chodera JD, McReynolds AC, Shoichet BK, Dill KA. J Mol Biol. 2007;371:1118–1134. [PMC free article] [PubMed]

11. Ruvinsky AM. J Comput Aided Mol Des. 2007;21:361–370. [PubMed]

12. Sega M, Faccioli P, Pederiva F, Garberoglio G, Orland H. Phys Rev Lett. 2007;99:118102–118104. [PubMed]

13. Ghosh A, Elber R, Scheraga HA. P Natl Acad Sci USA. 2002;99:10394–10398. [PubMed]

14. Zeevaart JG, Wang LG, Thakur VV, Leung CS, Tirado-Rives J, Bailey CM, Domaoal RA, Anderson KS, Jorgensen WL. Journal of the American Chemical Society. 2008;130:9492–9499. [PMC free article] [PubMed]

15. Onsager L, Machlup S. Phys. Rev. 1953;91:1505–1512.

16. Feynman RP, Hibbs AR. Quantum mechanics and path integrals. McGraw-Hill; Maidenhead: 1965.

17. Wiegel FW. Physica. 1967;37:105–113.

18. Wiegel FW. Physica. 1967;33:734–736.

19. Grooth BGD. American Journal of Physics. 1999;67:1248–1252.

20. McQuarrie DA. Journal of Applied Probability. 1967;4:413–478.

21. Wilke CR, Chang P. A. I. Ch. E. Journal. 1955;1:264–270.

22. Le Bas G. The Properties of Gases and Liquids, Monograph. Longmans, Green and Co.; New York: 1915.

23. La-Scalea MA, Menezes CMS, Ferreira EI. J Mol Struc-Theochem. 2005;130:111–120.

24. MOE: The Molecular Operating Environment . 2006.08 ed. Chemical Computing Group: 1010 Sherbrooke Street West, Suite 910, Montreal, Canada H3A 2R7: 2005.

25. van der Wielen LAM, Zomerdijk M, Houwers J, Luyben KCAM. Chem Eng J. 1997;66:111–121.

26. Hayduk W, Laudie H. A. I. Ch. E. Journal. 1974;20:611–615.

27. Niesner R, Heintz A. J Chem Eng Data. 2000;45:1121–1124.

28. Halgren TA. J Computl Chem. 1996;17:490–519.

29. JMP Statistics Software . 4.0.4 ed. SAS Institute; Cary, NC, USA: 1989.

30. MATLAB . 7.2.0.294 (R2006a) ed. The MathWorks, Inc.; Natick, MA, USA: 2006.

31. Congreve M, Aharony D, Albert J, Callaghan O, Campbell J, Carr RAE, Chessari G, Cowan S, Edwards PD, Frederickson M, McMenamin R, Murray CW, Patel S, Wallis N. J Med Chem. 2007;50:1124–1132. [PubMed]

32. Murray CW, Callaghan O, Chessari G, Cleasby A, Congreve M, Frederickson M, Hartshorn MJ, McMenamin R, Patel S, Wallis N. J Med Chem. 2007;50:1116–1123. [PubMed]

33. Aminabhavi TM, Gopalakrishna B. J Chem Eng Data. 1995;40:856–861.

34. Katz BA, Elrod K, Luong C, Rice MJ, Mackman RL, Sprengeler PA, Spencer J, Hataye J, Janc J, Link J, Litvak J, Rai R, Rice K, Sideris S, Verner E, Young W. J Mol Biol. 2001;307:1451–1486. [PubMed]

35. Katz BA, Elrod K, Verner E, Mackman RL, Luong C, Shrader WD, Sendzik M, Spencer JR, Sprengeler PA, Kolesnikov A, Tai VW, Hui HC, Breitenbucher JG, Allen D, Janc JW. J Mol Biol. 2003;329:93–120. [PubMed]

36. Coburn CA, Stachel SJ, Li YM, Rush DM, Steele TG, Chen-Dodson E, Holloway MK, Xu M, Huang Q, Lai MT, DiMuzio J, Crouthamel MC, Shi XP, Sardana V, Chen ZG, Munshi S, Kuo L, Makara GM, Annis DA, Tadikonda PK, Nash HM, Vacca JP, Wang T. J Med Chem. 2004;47:6117–6119. [PubMed]

37. Hanessian S, Yang G, Rondeau JM, Neumann U, Betschart C, Tintelnot-Blomley M. J Med Chem. 2006;49:4544–4567. [PubMed]

38. Hanessian S, Yun H, Hou Y, Yang G, Bayrakdarian M, Therrien E, Moitessier N, Roggo S, Veenstra S, Tintelnot-Blomley M, Rondeau JM, Ostermeier C, Strauss A, Ramage P, Paganetti P, Neumann U, Betschart C. J Med Chem. 2005;48:5175–5190. [PubMed]

39. Rajapakse HA, Nantermet PG, Selnick HG, Munshi S, McGaughey GB, Lindsley SR, Young MB, Lai MT, Espeseth AS, Shi XP, Colussi D, Pietrak B, Crouthamel MC, Tugusheva K, Huang Q, Xu M, Simon AJ, Kuo L, Hazuda DJ, Graham S, Vacca JP. J Med Chem. 2006;49:7270–7273. [PubMed]

40. El-Kabbani O, Darmanin C, Oka M, Schulze-Briese C, Tomizaki T, Hazemann I, Mitschler A, Podjarny A. J Med Chem. 2004;47:4530–4537. [PubMed]

41. Wilson DK, Tarle I, Petrash JM, Quiocho FA. P Natl Acad Sci USA. 1993;90:9847–9851. [PubMed]

42. Urzhumtsev A, TeteFavier F, Mitschler A, Barbanton J, Barth P, Urzhumtseva L, Biellmann JF, Podjarny AD, Moras D. Structure. 1997;5:601–612. [PubMed]

43. Calderone V, Chevrier B, Van Zandt M, Lamour V, Howard E, Poterszman A, Barth P, Mitschler A, Lu JH, Dvornik DM, Klebe G, Kraemer O, Moorman AR, Moras D, Podjarny A. Acta Crystallogr D. 2000;56:536–540. [PubMed]

44. Kinoshita T, Miyake H, Fujii T, Takakura S, Goto T. Acta Crystallogr D. 2002;58:622–626. [PubMed]

45. Howard EI, Sanishvili R, Cachau RE, Mitschler A, Chevrier B, Barth P, Lamour V, Van Zandt M, Sibley E, Bon C, Moras D, Schneider TR, Joachimiak A, Podjarny A. Proteins-Strut Func Bioinf. 2004;55:792–804. [PubMed]

46. Ruiz F, Hazemann I, Mitschler A, Joachimiak A, Schneider T, Karplus M, Podjarny A. Acta Crystallogr D. 2004;60:1347–1354. [PubMed]

47. Van Zandt MC, Jones ML, Gunn DE, Geraci LS, Jones JH, Sawicki DR, Sredy J, Jacot JL, DiCioccio AT, Petrova T, Mitschler A, Podjarny AD. J Med Chem. 2005;48:3141–3152. [PubMed]

48. El-Kabbani O, Darmanin C, Schneider TR, Hazemann I, Ruiz F, Oka M, Joachimiak A, Schulze-Briese C, Tomizaki T, Mitschler A, Podjarny A. Proteins-Struct Func Bioinf. 2004;55:805–813. [PubMed]

49. Steuber H, Zentgraf M, Podjarny A, Heine A, Klebe G. J Mol Biol. 2006;356:45–56. [PubMed]

50. Brownlee JM, Carlson E, Milne AC, Pape E, Harrison DHT. Bioorg Chem. 2006;34:424–444. [PMC free article] [PubMed]

51. Steuber H, Zentgraf M, Gerlach C, Sotriffer CA, Heine A, Klebe G. J Mol Biol. 2006;363:174–187. [PubMed]

52. Petrova T, Ginell S, Mitschler A, Hazemann I, Schneider T, Cousido A, Lunin VY, Joachimiak A, Podjarny A. Acta Crystallogr D. 2006;62:1535–1544. [PubMed]

53. Steuber H, Heine A, Klebe G. J Mol Biol. 2007;368:618–638. [PubMed]

54. Ehrig T, Bohren KM, Prendergast FG, Gabbay KH. Biochemistry. 1994;33:7157–7165. [PubMed]

55. Kim R, Skolnick J. J Comput Chem. 2008;29:1316–1331. [PMC free article] [PubMed]

56. Hansson T, Marelius J, Åqvist J. J Comput Aided Mol Des. 1998;12:27–35. [PubMed]

57. Gilson MK, Given JA, Head MS. Chem Biol. 1997;4:87–92. [PubMed]

58. Leach AR, Shoichet BK, Peishoff CE. J Med Chem. 2006;49:5851–5855. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |