|Home | About | Journals | Submit | Contact Us | Français|
A central challenge in structure-based ligand design is the accurate prediction of binding free energies. Here, we apply alchemical free energy calculations in explicit solvent to predict ligand binding in a model cavity in T4 lysozyme. Even in this simple site, there are challenges. We made systematic improvements, beginning with single poses from docking, then including multiple poses, additional protein conformational changes, and using an improved charge model. Computed absolute binding free energies had an RMS error of 1.9 kcal/mol relative to previously determined experimental values. In blind prospective tests, the methods correctly discriminated between several true ligands and decoys in a set of putative binders identified by docking. In these prospective tests, the RMS error in predicted binding free energies relative to those subsequently determined experimentally was only 0.6 kcal/mol. X-ray crystal structures of the new ligands bound in the cavity corresponded closely to predictions from the free energy calculations, but sometimes differed from those predicted by docking. Finally, we examined the impact of holding the protein rigid, as in docking, with a view to learning how approximations made in docking affect accuracy and how they may be improved.
A central problem in ligand discovery and design is the prediction of ligand–receptor binding free energies. Current methods cover a spectrum of physical rigor and computational cost. Among physics-based methods, physics-based docking and scoring is computationally the least expensive. In this approach, ligand orientations (poses) are assigned scores, related to the intermolecular interaction energy, and ranked relative to other poses and other ligands1. A few scoring functions include an explicit or implicit estimate of the desolvation free energy of the receptor and ligand2. Receptor flexibility, strain energies3,1, and various entropies are usually neglected, as is any reference to the unbound protein and ligand states. These approximations put estimation of binding affinities well out of the reach of docking methods, although these methods can often correctly rank-order candidate molecules for testing.
At a higher level of theory are MM-GBSA/PBSA methods4,5,6. These methods estimate the absolute free energies of bound and unbound reference states. Enthalpies are estimated using average energies from a molecular mechanics force field, and combined with an entropy estimate and a solvation free energy from an implicit solvent model. The difficulty is that the binding free energy is a small difference between very large absolute free energies, requiring either very high accuracy in computing these large numbers, or cancellation of errors. Thus, while these approaches have had successes5,6, they also have several drawbacks, such as sensitivity to details of the implicit solvent model used7,8 and to the method used for estimating the entropy term. Such methods perform poorly on some test sets9,10.
At the highest level of rigor are various free energy methods, including the alchemical free energy calculations described below, and PMF-based methods11,12 (for recent reviews of free energy methods, see Refs.13,14,15). Here, we focus on alchemical free energy methods, which evaluate ratios of partition functions to estimate binding free energies, and thus include entropic and other contributions neglected at lower levels of theory. These methods, combined with some theoretical developments first laid out in the mid-1990s16,17,18 and refined later19, now allow absolute binding free energies to be computed rigorously and exactly, provided that the molecular mechanics forcefield used accurately describes the underlying physics, and that enough sampling can be performed that the estimates of the relevant thermodynamic averages are converged20.
If, in principle, alchemical free energy calculations allow for exact prediction of binding free energies, the requirements of accurate force fields and adequate sampling introduce error into the computed free energies. This error is often difficult to isolate in the complicated environment of protein active sites. In such sites, failures of sampling or force fields are exacerbated by binding-induced conformational changes, titratable groups, metal ions, and ordered waters, among other complications. Furthermore, when sampling is inadequate, alchemical free energy methods can easily give biased results; for example, computed free energies are often sensitive to the choice of the initial receptor or ligand structure21,22,23,24,25,26,20.
Here, to isolate sources of error, we study a highly simplified binding site using alchemical free energy methods and molecular dynamics. We focus on the binding of small aromatic ligands to the small, buried hydrophobic binding site in an engineered mutant of T4 lysozyme (the L99A site; Figure 1) that has been studied extensively experimentally27,28,29,30,31,32,33, with docking methods31,33, and in some previous computational free energy studies18,34,19,24. Here, we systematically evaluate the effect of various approximations on computed binding free energies. This model binding site provides a good starting point because it is simple and has been thoroughly characterized experimentally.
A second advantage of this model binding site also is that it provides an excellent opportunity for prospective predictions, since it is relatively easy to find new compounds that bind31. This is valuable, because it can be far easier to suggest explanations for previous observations than to actually make new predictions, and predictive ability provides a fundamental test for methods.
Here, we performed two sets of studies: Retrospective, in which we studied binding of ligands with previously measured affinities, and prospective, in which we predicted, in a blind test, the binding modes and affinities of several previously uncharacterized small molecules. After making predictions, we tested them experimentally, using isothermal titration calorimetry to measure affinities and X-ray crystallography to determine structures of the complexes.
We first computed binding free energies for a test set of 13 small neutral compounds using alchemical free energy calculations, as described in Methods (Table 1). Of these, binding affinities for 11 had previously been measured by isothermal titration calorimetry30, and two had previously been determined not to bind more strongly than an affinity of 10 mM using a thermal denaturation assay29,33.
We started with a simple approach. We used the best-scoring docking pose for each compound as a starting structure from which to simply compute binding free energies using our standard free energy calculation protocol discussed in Methods. We previously found that this single-pose approach often results in ligands remaining trapped in the vicinity of their starting orientation on simulation timescales20. Thus, with this approach, the free energy calculations effectively become an expensive re-scoring of docking poses, including conformational averaging and entropic effects, but only for a single orientation. We present results using this approach as (Table 1). In keeping with the approximations typically made in docking, we consider only single orientations for these results, meaning that we also neglect symmetry-equivalent orientations for molecules like toluene, phenol, and benzene, which in reality also contribute to binding17,19,24,20. The RMS error for relative to experiment is 3.51±0.04 kcal/mol, and the correlation coefficient (R) between computed free energies and experiment is 0.51±0.05. (These RMS and correlation calculations do not include the nonbinders, since free energies of association for these are not known.) (Figure 2(a)). This approach underestimates all the binding affinities. This is likely due to undersampling, a failure to adequately sample the most optimal binding conformations. Previous work had suggested this approach would fail in the case where the best docking pose is not the orientation that actually contributes most to binding20, so this outcome was expected.
Next, we account for the presence of multiple potential ligand binding modes separated by kinetic barriers. We compute binding free energies of different possible binding modes separately, and combine their contributions to get an overall binding free energy20 (see also Section 4.1). We refer to these free energies, which also include the contributions of orientations related by symmetry, as (Table 1). With this approach, the computed binding free energies are substantially closer to experiment in several cases (and are never worse) than those computed using the single-orientation approach, since occasionally the best-scoring pose is not the pose that contributes most to the binding free energy. This inclusion of these contributions reduces the RMS error in the computed free energies relative to experiment, from 3.51±0.04 kcal/mol to 2.55±0.03 kcal/mol, and raises the correlation coefficient, R, from 0.51±0.05 to 0.72±0.05.
The improvements with this approach come for several reasons. For three of the ligands (indene, indole, and 2,3-benzofuran) multiple orientations are within kT of one another and all contribute substantially. For isobutylbenzene, the best pose from docking is not in the orientation that contributes most to binding, so including multiple candidate orientations results in inclusion of the dominant orientation. For the remainder of the compounds, improvements come from inclusion of symmetry number corrections. These issues have been addressed in more detail in work on a related binding site20. In general, it is extremely difficult to predict in advance whether multiple orientations may be relevant.
The section above describes our treatment of relevant ligand orientations. However, the protein may also have relevant slow degrees of freedom which can be difficult to sample29,35. Here, a key change is the reorientation of the Val111 sidechain observed in X-ray structures in response to ligands such as n-butylbenzene, isobutylbenzene, o-xylene, and p-xylene (Ref. 29 and Figure 3). The energy barriers associated with this reorientation are large enough to prevent the sidechain from rotating on simulation timescales35. This leads to an apparent dependence of computed free energies on the initial protein structure used in simulations. For example, binding free energies that are computed from the holo protein structure are too negative (favorable) if the sidechain does not have time to re-orient as the ligand is removed, because the protein strain energy (the energetic cost of deforming the protein on binding) is not properly accounted for. On the other hand, if the apo protein structure is used, as we did here, binding free energies are too positive (unfavorable), as the ligand sterically clashes to some degree with the protein35. This dependence on the starting structure is simply due to kinetic trapping of the protein in conformations near its starting conformation.
To overcome the kinetic trapping of Val111, we use a recently-developed “confine-and-release” framework to obtain correct binding free energies that are independent of the starting structure35. Specifically, when the Val111 remains trapped, computed binding free energies are really “confined” binding free energies, with Val111 confined to a particular orientation, so we use umbrella sampling to compute the free energy of releasing the valine from its confinement in the bound and unbound states. Here, this is accomplished by forcing sampling of alternative orientations using a harmonic biasing potential, and recovering the free energy landscape for this degree of freedom35. We do this for all of the compounds considered here, although for many compounds, only the apo orientation of the valine is found to be relevant, as observed experimentally29. This is a rigorous way to account for kinetic trapping. The confine-and-release framework is a generalization (to protein degrees of freedom) of the biasing potential approaches applied previously to ligands in a number of studies16,18,19,20,22,24,11.
The confine-and-release approach, which yields the total estimated binding free energy ΔGbind, further improves the agreement of computed binding free energies with experiment (Table 1 and Figure 2(b)). With this approach, the RMS error relative to experiment further decreases from 2.55±0.03 kcal/mol to 2.24±0.04 kcal/mol, while the correlation with experiment remains unchanged (R=0.72±0.05). There is significant improvement in the agreement with experiment for a number of the ligands, especially isobutylbenzene, p-xylene, n-butylbenzene, o-xylene, and ethylbenzene. As mentioned above, for the first four of these, Val111 re-orientation on binding is observed in the co-crystal structure29. For ethylbenzene, the deposited structure does not show the Val111 rotated relative to the benzene bound structure, but the electron density seems to allow the possibility of either orientation29. A prediction of this work is that, if the crystal structure with ethylbenzene can be solved at higher resolution, the Val111 sidechain should be observed to adopt a conformation similar to that in the p-xylene bound structure.
The confine-and-release approach can be applied to a variety of protein conformational changes. Here, we chose to apply it specifically to a single Val111 dihedral. This choice was motivated by the fact that Val111 was previously observed (experimentally) to reorient on ligand binding29. Also, previous computational work led us to believe that sampling of this degree of freedom could be very slow24. We therefore examined our initial simulations to look for Val111 reorientation, and found the kinetic trapping described above35. This led us to apply the confine-and-release approach to this particular degree of freedom.
It is worth noting that this is not simply an issue of predicting a correct bound structure. Rather, using either the apo or holo structure leads to biased binding free energies (Tables 1 and and2)2) when the confine-and-release approach is not used. Only when we account for Val111 reorientation using confine-and-release do computed free energies become consistent between simulations beginning from apo and holo starting structures (below and Ref 35).
After the confine-and-release calculations, there was still particularly poor agreement with experiment for several ligands – especially o-xylene, indene, indole, isobutylbenzene, and 2,3-benzofuran. (The first three of these are the only binders in Figure 2(b) with computed binding free energies worse than −2 kcal/mol). One possible explanation is inadequate sampling – perhaps due to additional protein conformational rearrangements that are not being sampled. For example, for indene, isobutylbenzene, and o-xylene, helix F, which forms one side of the cavity, shifts around 2 Å on ligand binding, making the binding site larger29. Additionally, previous free energy calculations on the same system tended to overestimate binding free energies for some of these same compounds when beginning from the holo structures24.
However, inspection of simulation trajectories suggests that this helix motion is being sampled. As a more quantitative test, we repeated the calculations for selected compounds beginning from the holo structures. If computed binding free energies are different starting from apo and holo structures, even after applying the confine-and-release approach for Val111, it would indicate inadequate sampling. While computed binding free energies are significantly different for calculations started from the apo and holo structure before accounting for Val111 reorientation, the differences are essentially negligible (within uncertainty) when the confine-and-release approach is used to account for this change (Table 2). (The largest difference, using the confine-and-release approach, is for o-xylene, −0.64 ± 0.28 kcal/mol; since the uncertainty represents one standard deviation, this is still only a 2σ variation). This implies that sampling of these conformational changes is probably sufficient and that the error lies elsewhere.
Free energies computed using the holo starting structures also show that the holo protein structure of several of these ligands is unfavorable by roughly 4 kcal/mol in the absence of bound ligand (Table 2 and Ref.35). This is presumably because of steric clashes with the protein, and is the reason why only some of the ligands induce this conformational change on binding.
Next, we considered another possible source of error: the simulation parameters. There are different methods for assign partial charges for small molecules (for a recent discussion, see36). In the work reported above, we used AM1-CM237 partial atomic charges for the small molecules, as in docking studies. However, we found previously that AM1-BCC charges performed better than AM1-CM2 charges for hydration free energies, perhaps because they are more similar to the HF 6/31G* charges the force field was parameterized with36. Therefore, we tested the AM1-BCC charges here as well. Table 3 shows that AM1-BCC charges further reduce the RMS error between computed and experimental binding free energies from 2.24±0.04 to 1.89±0.04 kcal/mol, and the correlation coefficient, R, increases from 0.72±0.05 to 0.79±0.07.
One major challenge for docking methods is to discriminate between binders and non-binders. We have included two known non-binders (with affinities worse than 10 mM) in the set of molecules examined here: phenol and 2-fluorobenzaldehyde. For these two compounds, computed binding free energies indicate only weak affinity: −2.9 ± 0.1 kcal/mol and weaker (more positive) (Table 1). A 10mM detection threshold in binding affinity corresponds to a binding free energy of roughly −2.7 kcal/mol. Thus, the computed binding free energies for these two compounds are at the detection limit, essentially consistent with the experimental observation that they are non-binders.
Our free energy calculations are computationally expensive. Are the results any more accurate than those that can be obtained from molecular docking? As shown in Figure 4, DOCK scores for the ligands studied here correlate poorly with experimental binding free energies. In fact, they are anti-correlated (R=−0.69), the opposite of what one would like. Moreover, the two non-binders have DOCK scores similar to those for the majority of the true ligands (and much more favorable scores than several ligands), hence it is impossible to discriminate between binders and non-binders. In fairness to docking, it is worth noting that these nonbinders were included in our test set because they have proven challenging for docking to discriminate from the binders. Also, the first goal of docking is to separate likely from unlikely ligands, and it does seem to be performing remarkably well in this binding site, where about 80% of the top 100 docking hits would probably bind. Additionally, we find that docking also performs quite well in this site at generating sterically reasonable potential bound orientations. That said, the free energy calculations give substantially better affinity estimates and correlations than docking does, and are better at recognizing nonbinders.
The docking results discussed here used the benzene-bound protein structure, which is virtually identical to the apo structure. However, docking to alternate protein conformations seems not to result in significant improvements in quality of docking results, except when many different crystallographic protein conformations are used32.
Docking typically treats proteins as rigid. How big is the error introduced by this assumption? To test this, we held the protein rigid and repeated our free energy calculations, including the effects of ligand symmetries and multiple ligand orientations (and using AM1-CM2 charges). This led to essentially zero correlation (R=−0.05±0.09) between computed free energies and experimental values, and an RMS error of 19.78±0.06 kcal/mol (Figure 5(a)).
As a simple improvement on this rigid protein approximation, we also allowed the protein to relax to a different structure for each ligand. First, we minimized the entire protein in the presence of each ligand, using the initial docking geometry, and subsequently held the protein rigid during the simulations. This resulted in a correlation (R) of 0.82±0.09 and an RMS error of 4.92±0.07 kcal/mol relative to experiment (Figure 5(b)). Second, we minimized only a region of the protein around the binding site (Section 4.1.10) in the presence of each ligand, before holding the protein fixed during the simulations. This resulted in a correlation (R) of 0.32±0.08 and an RMS error of 4.06±0.06 kcal/mol relative to experiment (Figure 5(c)).
Overall, it appears that keeping the protein rigid while estimating binding free energies is detrimental to binding free energy estimation, even if minimization is performed separately for each ligand.
We also performed a blind test of these free energy methods. We selected five small molecules that were among the top-scoring molecules from a docking screen of a compound library (using protocols described previously33 ). We calculated binding free energies in the same manner as above, and compared the resulting dissociation constants with the detection threshold of 10 mM for the experimental thermal denaturation assay. We predicted that four of the molecules would bind and one would not 1 (the 10 mM threshold fell just between affinities for two of the compounds). We then tested these predictions experimentally using the upshift in thermal denaturation, in which the melting temperatures of the protein in the presence and absence of the ligand were compared27. All molecules were tested in their neutral forms, using either circular dichroism or fluorescence to monitor the transition from the folded to the unfolded state, and resulting Tm values were compared to that of the apo protein. All melts occurred reversibly in a manner consistent with two-state unfolding. 1,2-dichlorobenzene, n-methylaniline, 1-methylpyrrole and 1,2-benzenedithiol increased the Tm significantly, between 1.0 and 2.9°C (Table 4). Conversely, thieno-[2,3c]pyridine was not observed to increase the Tm, even at 2.5 mM concentration, consistent with the predictions of the free energy calculations. In contrast, docking had predicted all five would bind, but the free energy methods correctly identified the nonbinder (thieno[2,3-c]pyridine).
We then obtained crystal structures (Table 5 to determine how well these free energy calculations could predict the bound ligand conformations. We soaked three of the ligands into L99A lysozyme protein crystals. The crystals diffracted to between 1.7 Å and 2.07 Å on a home source. In all three structures, initial Fo−Fc electron density unambiguously identified the orientation of the ligand in the site; for dichlorobenzene, two orientations of the ligand were apparent.
In parallel, we predicted dominant bound orientations for each of these ligands (Section 4.1). Then we determined the structures for these three ligands, and compared with our predicted structures, the best predicted DOCK poses, and the electron density (Figure 7). These structures are deposited in the Protein Data Bank under codes 2OTY, 2OTZ, and 2OU0.
For 1-methylpyrrole, the X-ray, docking, and molecular dynamics poses were quite similar (Figure 7). The particular molecular dynamics snapshot that was selected as representative appears slightly twisted relative to the other two structures, but this is simply due to the arbitrary selection of a single MD conformation from an ensemble. The X-ray pose falls well within the range of structures sampled by the simulation from which this snapshot was chosen (and the underlying electron density is consistent with a range of structures seen in simulation). The RMSD between the docking pose and X-ray structure is 0.39 Å and that between the free energy snapshot and X-ray is 0.94 Å
For n-methylaniline, free energy methods predicted two orientations, with essentially equal occupancy probabilities (Figure 7). The X-ray and docking poses match well with one of these orientations, but there is no evidence in the electron density for the second orientation. The two orientations differ only by rotation around the C1–N bond. The RMSD between the docking pose and X-ray is 0.63 Å and that between the lower RMSD free energy snapshot and X-ray is 0.69 Å (the higher RMSD orientation is 1.29 Å away from X-ray).
For 1,2-dichlorobenzene, two separate orientations were observed in the crystal structure, differening by a rotation of around 60° in the plane of the aromatic ring. DOCK failed to properly identify either of these orientations as the best-scoring pose, whereas our free energy methods picked out one but indicated that the second was energetically unfavorable. In particular, the ligand would occasionally transition into this alternate orientation, but it would typically remain only transiently. We further tested this by conducting a separate set of calculations where the ligand was restrained to remain in the alternate orientation, but, with our parameter set, it appears substantially less favorable for binding than the orientation predicted to dominate using our free energy methods. This could be a force-field failure, inadequate sampling, or a difference between experimental and simulation conditions. The docking pose is 2.8–2.9 Å away from the X-ray orientations, while the free energy snapshot is only 0.77 Å away from the most similar X-ray orientation.
Next, we predicted binding free energies and then measured them by isothermal titration calorimetry. Since the AM1-BCC charge model worked best in retrospective studies, we used these charges for binding free energy predictions. Ligand titrations gave easily modeled curves using a low c-value protocol38 (Figure 8). Not only did binding free energy calculations give the correct rank ordering of binders, but computed binding free energies for these compounds were remarkably accurate (an RMS error of 0.57±0.09 kcal/mol, Table 4).
Alchemical free energy calculations using molecular dynamics can be used to compute fairly accurate binding free energies of ligands in the T4 lysozyme L99A binding site, with an RMS error of 1.89±0.04 kcal/mol in retrospective tests. This is a much higher accuracy than docking, where scores were inversely correlated with experiment, at least among the top-scoring ligands here. Admittedly, the docking program, DOCK, was never designed to predict binding affinities, and performs remarkably well at ranking likely ligands highly in large libraries33. Also, in these calculations, we are comparing with previously known results. A more rigorous test is to compare genuinely new predictions on untested candidate ligands with subsequent experiment.
Therefore, in a blind test, we predicted affinities and binding orientations for five previously uncharacterized compounds predicted by DOCK to bind, then tested these predictions experimentally. With alchemical free energy calculations, we correctly recognized the one non-binder, accurately predicted ligand bound orientations, and quantitatively predicted binding free energies. In each of these areas, free energy methods agreed better with experiment than docking did. Free energy methods, unlike docking, also correctly ranked the ligand binding affinities in these prospective tests. Thus, it appears that alchemical free energy methods can be truly predictive and can rescue docking failures.
The approach described here, including the retrospective study of previously published data, required no knowledge of the bound structure of the protein and ligand, and used the apo protein structure. Previous work on this same binding site (for example, in Ref.24) has required an accurate bound structure of the complex of interest as a starting point.
This present study is limited to a simplified model binding site, where many complications of other binding sites are absent. The L99A cavity studied here has no interface with bulk water, no ordered waters to displace, is small, and the dominant interactions appear to be mainly non-polar. Nevertheless, the use of this system has allowed us to be systematic in isolating and solving various sampling problems. We identified several keys to obtaining accurate binding free energies: First, multiple potential ligand orientations must be included; one cannot rely on the single top docking pose to be the dominant ligand orientation. There can be large energetic barriers between different ligand orientations which make timescales for orientational change long compared to simulation times. Second, even seemingly small protein conformational changes on ligand binding, such as the reorientation of a single sidechain, Val111, can be difficult to sample correctly in free energy calculations, yet it is essential to include these conformational changes to get correct binding free energies.
On this second point, it is interesting to note that previous computational work on this same binding site gave free energies that were more than 2 kcal/mol too negative for four of 10 known binders in calculations initiated from the holo structures24. We performed similar free energy calculations for these compounds, found that our calculations overestimated binding affinities for the same ligands, despite the fact that we use a different force field and different parameters – but only when we failed to account for the free energy associated with Val111 reorientation. Indeed, in the previous work, results were sensitive to the starting orientation of the Val111 sidechain for at least one ligand, indicating that this sidechain degree of freedom was inadequately sampled24. Our results indicate that if the free energy associated with this reorientation is not included, it can bias computed binding free energies by as much as 4 kcal/mol.
Given that these small conformational changes can contribute so substantially to overall binding free energies, it will likely prove essential to develop improved methods for computing free energies associated with larger-scale protein conformational changes, such as loop motions on ligand binding.
To address the rigid protein assumption typically used in docking, we also tried free energy calculations with the protein held rigid. Using a rigid apo structure for all ligands resulted in very large errors (RMS error 20 kcal/mol; zero correlation). Minimizing the protein separately in the presence of each ligand worked better, but RMS errors remained high (above 4 kcal/mol), and the approach lacked the ability to recognize non-binders. Apparently, for this binding site, holding the protein rigid cannot easily produce binding affinities that agree quantitatively (even within 4 kcal/mol RMS error) with experiment, but strategies involving minimization of the protein can provide some improvement over treating it as completely rigid. But for high accuracy, it may be necessary to include not only protein conformational change, but a correct accounting for the free energy costs associated with these protein conformational changes – which can be substantial, even at the single sidechain level. Lastly, our results indicated that higher-quality charges can lead to substantial improvements in binding affinities; thus, the AM1-BCC charges that performed best here may also be a better choice for docking.
Overall, our results indicate that free energy methods are reaching the point where they can be useful when used predictively. However, in the relatively simple system examined here, this reasonably high level of accuracy depends on carefully accounting for the presence of multiple potential ligand bound orientations, and the possibility of protein conformational changes on ligand binding. In principle extremely long molecular dynamics simulations can handle both of these, but in practice, the computational cost of such simulations is often prohibitive. For now, treating both problems requires deliberate sampling of the relevant degrees of freedom, and so some pre-knowledge of these degrees of freedom. We suspect that challenges observed in this model binding site will be found in biologically relevant binding sites as well. Despite these limitations, alchemical free energy methods hold great promise, both in predictive power, and in guiding improvement of more approximate physics-based methods.
We begin with the benzene-bound crystal structure of the T4 lysozyme L99A mutant30,29 (PDB entry 181L, which is virtually identical to the apo structure, 1L90, from which it has an RMSD of 0.14 Å), dock our ligands into the binding site, and determine which poses or orientations are kinetically stable and distinct. We then begin free energy calculations from stable, distinct orientations, as described previously20, to compute binding free energies. Finally, for each ligand, we combine contributions from these different orientations to rigorously estimate binding free energies.
Except where indicated otherwise, the initial protein structure for molecular dynamics simulations and free energy calculations is the benzene-bound structure of the L99A lysozyme mutant, which is essentially identical to the apo structure. This was prepared with the GROMACS39,40 3.3 utility PDB2GMX with default protonation states, using a GROMACS port41 of the AMBER 9642 force field. Since the cavity that makes up the binding site is completely hydrophobic without any nearby titratable groups, these protonation states present no difficulties. Following preparation, the protein was placed in a dodecahedral simulation box and surrounded by roughly 6,000 water molecules which were pre-equilibrated for 1 ns with the protein held fixed prior to the equilibration of the full system, which is discussed below.
We used DOCK 3.5.54 to fit the molecules of interest into the protein structure (Tables 1 and and4).4). We retained all of the generated poses (numbering in the thousands) and scores of the molecules, then sorted these by score. We then began with the best scoring pose and worked towards the worst, retaining every pose that was different by more than 2 Å RMSD from a better scoring pose, to generate a set of the best scoring, distinct ligand orientations. This typically resulted in 10–40 distinct poses, of which we retained only the group of top-scoring poses (typically 3–8).
From these poses, we generated general AMBER force field (GAFF)43 parameters for each ligand using ANTECHAMBER version 1.2.444, and AM1-CM2 charges45 as discussed previously20. These charges were employed in docking studies on the same system33, and we sought to separate parameter differences from methodology differences as much as possible. We also present results in this work where we use AM1-BCC46,47 charges computed with ANTECHAMBER.
From the resulting small-molecule AMBER topology and coordinate files, we generated GROMACS topology and coordinate files using the amb2gmx conversion utility48. These ligand topologies and coordinates were then merged with those for the pre-solvated protein system prior to simulation.
To further reduce the number of ligand orientations we consider in free energy calculations, we initiated separate 1 ns molecular dynamics simulations from all candidate orientations to identify those that are kinetically distinct20. We only retained one orientation of each set of orientations that interconvert easily within simulation timescales. This typically resulted in 1–4 kinetically distinct orientations which were used for the calculations presented in Section 2.
We then chose reference orientations for restraining the ligand in the binding site relative to the protein for subsequent free energy calculations. These are defined by picking a specific value in each of six relative protein-ligand degrees of freedom to which to restrain the ligand20. These values were chosen as the most probable value of each degree of freedom as determined from histograms computed during the 1 ns simulations, although in principle this choice is arbitrary19,20. The degrees of freedom used are as described previously19,20.
We carried out independent binding free energy calculations for each kinetically distinct orientation. Using the orientational decomposition procedure described previously20, we combined the effective binding free energies of each orientation into an overall binding free energy ( ). We also computed binding free energies that would have resulted had we only considered a single potential bound orientation and neglected symmetry number corrections, as done in docking ( ).
Binding free energy calculations were carried out in GROMACS 3.3 (with several crucial bugfixes described previously20 ) using the Bennett acceptance ratio (BAR)49,50 method to estimate free energy differences. To calculate absolute binding free energies, we used a thermodynamic cycle developed previously17,19,20. In this cycle, we begin with the ligand bound to the protein, then restrain the ligand harmonically to a reference orientation within the binding site. We then annihilate the ligand’s partial charges, then decouple its Lennard-Jones interactions with the rest of the system. The final ligand state is equivalent to a non-interacting ligand with no electrostatics, restrained, in vacuum or water. We then analytically calculate the free energy of removing the restraints, and compute the free energy of restoring first the Lennard-Jones and then the electrostatic interactions in water. This entire process forms a thermodynamic cycle that transfers the ligand from the binding site to bulk water in the standard state. If all of the component calculations are converged, this rigorously provides a measurement of the absolute binding free energy, ΔGo, for the forcefield and solvent model used 2. As part of each of the steps in the cycle, independent free energy calculations were conducted at a number of intermediate alchemical states (denoted by the parameter λ) which were the same as those described in our previous work20.
Following these binding free energy calculations and the predictions discussed below, we also computed binding free energies for the set of small molecules using AM1-BCC charges. To do this, we computed the free energy of changing AM1-CM2 to AM1-BCC charges in water for each compound, and then repeated the restraining and charging calculations for the compound in the protein for each orientation. Since the Lennard-Jones decoupling is done with compound’s electrostatics already turned off, it was unnecessary to repeat these calculations.
For all of the simulations discussed here (at each λ value), equilibration was performed as follows. First, velocities were assigned from a Maxwell-Boltzmann distribution at 300 K and the the system was subjected to 10 ps of isothermal molecular dynamics. This was followed by 100 ps of isothermal-isobaric dynamics with pressure regulated by the Berendsen weak-coupling scheme51 as discussed previously20. Following this, the simulation cell size was fixed and production simulations were run with isothermal dynamics, using the Langevin integrator for temperature control with a friction coefficient of 1 ps−1. Production simulations were 1 ns in length for simulations of the complex (at each alchemical intermediate state, or λ value), and 5 ns for the ligand in water, except where noted otherwise.
All remaining protocols are as discussed previously20, with several exceptions: First, PME parameters were modified from those used previously to increase accuracy. Here, we used a PME spline order of 6, a relative tolerance of 10−6, and a Fourier spacing of as close as possible to 1.0 Å. Additionally, we applied a long range van der Waals correction (in addition to the analytical correction employed previously20 ) to correct for the effect of truncating the long-range dispersive interactions at a finite cutoff. These interactions are everywhere attractive, and can contribute significantly to binding free energies due to the fact that proteins have a higher density of attractive sites than water22. While this issue will be discussed in detail elsewhere52, the approach used here was, briefly, to run as usual the set of simulations where the ligand Lennard-Jones interactions are decoupled (that is, the ligand-environment Lennard-Jones interactions are turned off). These simulations were then reprocessed with long (24 Å) cutoffs for Lennard-Jones interactions, and the weighted histogram analysis method (WHAM)53 was used to reweight the data from the simulations conducted with the short cutoff in order to estimate what the decoupling free energy would have been had we run with the longer cutoff. This is a relatively small correction (0.2–0.8 kcal/mol) in the direction of increased binding affinity. This correction would be larger had not an approximate analytical dispersion correction already been included in the original runs by using the GROMACS correction option ENERPRES, and tends to be larger for larger ligands22,52.
To aid convergence of the calculations for benzene, which has a very high symmetry number, we used an approach employed previously24 and restricted benzene to stay within a single symmetric orientation during our free energy calculations, then simply included the effect of symmetry as a symmetry number correction to the binding free energy20.
For some ligands, a valine sidechain (Val111) in the binding cavity is observed experimentally to change rotameric states on ligand binding. This conformational change is not typically sampled (or not well sampled) during our molecular dynamics simulations (discussed in Section 2). Neglect of this change leads to an underestimate of binding free energies for those ligands when the apo protein structure is used, and an overestimate when the holo protein structure is used. In the former case, the protein typically remains trapped in a conformation where the valine interferes with ligand binding; in the latter, when the ligand is removed from the binding site, the protein remains kinetically trapped with the sidechain in an unfavorable orientation, leading to neglect of protein strain energy in the free energy calculation. One way to properly account for the presence of these multiple conformations is to use the “confine-and-release” strategy35. The basic idea is to compute the binding free energy of the ligand to the protein, with the protein conformation confined (either kinetically or with artificial restraints) to a particular region of configuration space, then to compute the free energy of releasing the protein from confinement in the bound and unbound states. This provides a rigorous approach for computing binding free energies which include contributions from these conformational changes35.
We apply the confine-and-release approach to compute the binding free energies of all the compounds considered here. To do this, we first compute the binding free energy with the valine sidechain kinetically trapped in the orientation from the apo structure (which we check by monitoring the dihedral angle throughout all of our simulations). In some cases, the valine actually manages to briefly escape from its kinetic trap at one or several λ values in the alchemical part of the calculation; we discard any simulation snapshots where it had done so from our data analysis in order to apply the confine-and-release approach. Once we have confined binding free energies, computed with the sidechain trapped, we use umbrella sampling and WHAM53 to compute the potential of mean force (PMF) for rotating the valine sidechain in the bound and unbound states. From this, we compute the free energy of releasing the protein from confinement, and thus the binding free energies. We present our results with and without the confine-and-release approach, which provides a rigorous way to account for inadequate sampling.
Simulation details for the umbrella sampling calculations are as described previously35.
Since free energy calculations were conducted with two charge sets, two sets of umbrella sampling calculations for Val111 were carried out for each ligand –one where the ligand had AM1-BCC charges, and one where it had AM1-CM2 charges. This was important since the details of the ligand electrostatics can influence the free energy landscape associated with this reorientation.
To predict whether untested compounds would bind, we selected five small molecules predicted by docking (using protocols described previously33 ) to be binders. We followed the same protocols described above — docking to the binding site, retaining a number of different kinetically distinct starting orientations, running separate binding free energy calculations for each of these, and then combining them to get a total binding free energy. We also applied the confine-and-release approach to account for any reorientation of the Val111 sidechain on binding of these molecules. From computed binding free energies, we calculated dissociation constants, and then predict that those compounds with dissociation constants less than 10 mM (the experimental detection threshold for the thermal upshift assay) should bind. These predictions – and those of bound structures, below – were made with AM1-CM2 charges, as AM1-BCC charges were only examined later.
Predicting bound structures of these unknown ligands is challenging, since our method is intended to provide an accurate estimate of the binding free energy which includes contributions from a variety of different ligand and protein structures, and neglects any effects of the crystal environment which are present in X-ray structures, as well as other differences. Here, we attempted to identify the dominant bound structure by identifying which kinetically distinct ligand binding orientation contributes most favorably to the total binding free energy; to do this, we calculated occupancy probabilities of the different dominant orientations from their estimated binding free energies. The predicted orientation was the orientation with the highest probability of occupancy. Then, we predicted a bound structure by taking a representative snapshot from a simulation where the ligand remains, without restraints, stably within the region of configuration space corresponding to that orientation. Our predicted bound orientations, then, were single snapshots from molecular dynamics simulations. For one ligand, we predicted that two orientations would have nearly comparable occupancy probabilities, so we predicted that both would be observed.
After these predictions, we continued retrospective studies and found that the AM1-BCC charge model gave more accurate binding free energies. Therefore, we used the AM1-BCC charge set to make predictions for binding free energies prior to measuring these calorimetrically.
To compare our methodology more closely with docking, we repeated the free energy calculations using essentially the same protocols, but with the protein held rigid in its prepared starting structure, as is often done in docking. To do so, we used the GROMACS option of defining frozen groups that are held fixed during dynamics. Because there are so many fewer degrees of freedom when the protein is completely rigid, convergence was more rapid, and production simulations required only 100 ps at each λ value. Protocols were otherwise the same, and these calculations used AM1-CM2 charges.
We considered several choices for the rigid protein structure. First, we held the protein rigid in its starting structure (prepared from PDB code 181L). Second, motivated by testing approaches that could easily be applied in docking and scoring, we minimized the entire protein in the presence of each ligand individually in vacuum, and used each of these structures for the appropriate ligand. The RMSD to the starting prepared structure is typically around 0.5 Å with this approach. Finally, we modified this second protocol to allow only residues near the binding site (residues 78, 84, 85, 87, 88, 91, 98, 99, 100, 102, 103, 106, 111, 118, 121, 133, and 153) to move during minimization. With this protocol, changes in structure were very minor (often less than 0.01 Å RMSD (the RMSD reported is for the protein as a whole)).
Minimization protocols were as discussed above and previously20, except the order of minimization was reversed (steepest descents followed by L-BFGS) and, since minimization was done in vacuum, cutoff electrostatics was used instead of PME, using a cutoff of 11 Å.
Calculated uncertainties reported here are one standard deviation of the mean over 40 block bootstrap trials, where the block length is taken to be equal to the autocorrelation time, as described previously20.
To detect binding, L99A protein was denatured reversibly by temperature in the presence and absence of the putative ligand. Molecules that bind preferentially to the folded cavity-containing protein should stabilize it relative to the apo protein, raising its temperature of melting29.
Thermal denaturation experiments were carried out in a Jasco J-715 spectropolarimeter with a Jasco PTC-348WI Peltier-effect in-cell temperature control device and in-cell stirring. Each compound was screened in its neutral form. 1,2-benzenedithiol was assayed in a pH 3 buffer containing 25mM KCl, 2.9 mM phosphoric acid and 17 mM KH2PO4. Compounds 1,2-dichlorobenzene and 1-methylpyrrole were screened in a pH 5.4 buffer containing 100 mM sodium chloride, 8.6 mM sodium acetate, and 1.6 mM acetic acid. Compounds thieno[2,3-c]pyridine and n-methylaniline were screened in a pH 6.8 buffer composed of 50 mM potassium chloride, and 38% (v/v) ethylene glycol. All buffers are as previously described29. Thermal denaturation of the protein in the presence of compounds 1,2-dichlorobenzene, thieno[2,3-c]pyridine, and n-methylaniline were monitored by circular dichroism (CD) between 223 and 234 nm (although the 223 nm wavelength is the ideal wavelength for measuring the helical signal of T4 lysozyme, the higher wavelengths, which were less affected by absorbance from some of the compounds, can be used to monitor the edge of the helical signal). For 1,2-benzenedithiol and 1-methylpyrrole, which have high absorbance in the far UV region, thermal denaturation was measured by the intensity of the integrated fluorescence emission for all wavelengths above 300 nm, exciting at 290nm. Thermal melts were performed at a temperature ramp rate of 2 K/min. A least-squares fit of the two-state transition model was performed with the program EXAM54 to calculate Tm and van’t Hoff ΔH values for the thermal denaturations. The ΔCp was set to 1.94 kcal mol−1K−1.
Thermal denaturation of apo T4 lysozyme L99A was carried out with 0.02–0.04 mg/ml protein in the same buffer conditions described above. Compounds were included at concentrations as high as 10 mM. Each denaturation experiment was performed at least three times.
Quantitative estimates of association for ligand binding to L99A T4 lysozyme were obtained by isothermal titration calorimetry (ITC) using a Microcal VP-ITC calorimeter55 operated at 10°C with a reference power of 10 μcal/sec, a stirring speed of 300 rpm, and a data collection interval of four minutes per injection. An initial injection of 2 μL of ligand was followed by an additional 29 injections of 10 μL totaling 292 μL. These were added to 0.05 to 0.13 mM protein in the 1.4266 mL sample cell. The concentration of small molecule ligands in the syringe was adjusted such that the final molar ratio of ligand to protein was at least twofold by the end of the titrations. Protein concentrations were determined by molar absorptivity at 280 nm in 0.5 M NaCl, 0.1 M sodium phosphate buffer, pH 6.8. Ligand concentrations were determined by volume of material added to a known volume of buffer. Baseline mixing heats were estimated by injection of ligand into buffer. Reaction heat profiles were fit to the single binding site model using the ITC worksheet of ORIGIN version 7.0.
T4 lysozyme mutant L99A was overexpressed, purified, and crystals grown as described previously56. The crystals belong to space group P3221. Crystals were soaked overnight to four days in crystallization buffer containing as much as 50 mM compound. In addition, drops of neat compound were added to the cover slip surrounding the drop containing the crystal. After soaking, the crystals were cryoprotected with a 50:50 Paraton-N (Hampton Research, Aliso Viejo, CA), mineral oil mix. X-ray data were collected at 110 K with an in-house Raxis IV detector. Reflections were indexed, integrated, and scaled using the HKL package57. The complex structures were refined using REFMAC558. For model building and water placement, we used Coot59. The X-ray crystal structures have been deposited in the PDB as 2OTY, 2OTZ, and 2OU0.
We thank Sarah Boyce for help with protein preparation. We thank Benoît Roux (University of Chicago), Bill Swope (IBM Almaden), Jed Pitera (IBM Almaden), Guha Jayachandran (Stanford University), and Vijay Pande (Stanford University) for helpful discussions, and the reviewers for insightful comments on the manuscript. JDC was supported in part by HHMI and IBM pre-doctoral fellowships. DLM and KAD acknowledge NIH grant GM63592, Anteon Corporation grant USAF-5408-04-SC-0008, and a UCSF Sandler Award. BKS acknowledges NIH grant GM59957. This work was performed in part with the UCSF QB3 Shared Computing Facility.
1Initial predictions were made with AM1-CM2 charges, but AM1-BCC charges were tried later and led to the same outcome
2The sum of the components around the cycle is actually the negative of ΔGo.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.