There are two novel aspects to our study. First, prospective tests were undertaken with 20 new molecules not previously measured for this site. We calculated absolute binding free energies for 14 of them (including a fortuitous impurity) and relative binding free energies for six. As far as we know, this exceeds the sum total of all prospective tests of these methods in the literature over the last 20 years. Second, these experiments are conducted in a simple cavity site, where we can hope to learn systematically from successes and especially failures of the method by detailed comparisons of prediction to experiment.
To understand what we may learn from this model system, it is useful to summarize how well the calculations corresponded to experiment. We used three criteria, each successively more stringent, to assess the predictions. 1) Did the free energy methods distinguish ligands from non-binders? 2) Were the free energy predictions accurate and can the compounds be rank-ordered by affinity? 3) Are the crystallographic binding modes, and possible alternate ligand orientations, predicted? From the initial prospective calculations, predicting ligand versus non-binder was successful in 10 of 13 cases. Including thiophene-2-carboxaldoxime, for which there were, as we discovered, two relevant isomers to consider, raises these numbers to 11 of 14 correct. Including n-phenylglycinonitrile, for which a trivial error led to the assignment of the wrong partial charges, raises the number to 12 correct predictions out of 14 total at the binder/non-binder level. At the more stringent level of energetic accuracy, the RMS error in ΔGb,calc was 1.8 kcal/mol (1.6 kcal/mol after bugfixes described above). This number does not include the four weakly binding ligands whose affinity could not be measured; if we include estimates of affinity of these weak ligands compared to the calculated energies, the RMS error rises 2.3 kcal/mol (see Results). Although not entirely wrong, these predicted energies are too inaccurate for affinity progression in congeneric series, for instance. Rank-ordering of the entire test set by affinity was certainly unsuccessful. Finally, the lack of energetic accuracy was due in large part to the substantially overestimated affinities, which in turn can be traced to problems with convergence and to an error in the long range van der Waals correction energy (below). The predictions of ligand orientations and alternate conformations were mixed; six of the nine ligands had a predicted pose with an RMSD of ~2Å or better, but three predictions were entirely wrong. Interestingly, eight of the nine ligands induced a protein conformational change in helix F that was not predicted, because of a failure to sample relevant protein motions.
These discrepancies between the theoretical predictions and the experimental results may have three sources, putting aside the chance that the experiments themselves may be wrong (for this study, based as it is on full binding affinity measurements and crystallography, the experimental observables seem reliable). The first sort of error is purely mechanical, relating to improper working of the computational methods or incorrect user choices. These are rarely observed in retrospective studies, as they can be caught and fixed before they are communicated, but in prospective studies, of the sort described here, they can appear; indeed, their occurrence can reflect how easily a method is, in practice, to perform. Second, there can be problems with the sampling of relevant states in the simulation, such as a particular ligand orientation or protein conformation, and convergence to the proper equilibrium distribution among these states, as defined by the force field (we distinguish between “sampling” and “convergence” for this reason). Third, the force field itself is an approximation, and even once the simulation has converged to free energy estimates that are “correct” according to the force field, these may still disagree with experiment. Force field errors can only be detected after results are converged, so before considering force field errors we assess the quality of convergence.
As it happens, technical and scripting problems, such as the bugs in scripts and simulation packages and tools, or even the choice of the wrong stereo-isomer from the initial poses calculated by DOCK 6 27
, had a substantial impact; each led to specific failures (n-phenylglycinonitrile, thiophene-2-carboxaldoxime). An error in the calculation of the long range van der Waals forces also resulted in too favorable binding free energies, in some cases by as much as 2–3 kcal/mol. Another avoidable error occurred during the calculation of ligand restraining energies; this is an involved, if technically important point, and so we take a short detour to explain it here.
Distance and orientational restraints are used to maintain the ligand inside the binding site as its interactions with the protein are decoupled in the free energy simulation. The first step in the alchemical transformation is to compute the free energy cost of applying these restraints, which owes to the loss of translational and rotational freedom (Methods & Supplemental Material
). The cost of applying these restraints is calculated in steps of increasing restoring forces, the integration of which amounts to the total restraining free energy. For some calculations, the free energy of restraining the ligand to a single orientation (typically 1–2 kcal/mol) was unusually large (4–12 kcal/mol). Upon examining these simulations retrospectively, we found that particular ligands (for example 2-ethoxy-3,4-dihydro-2h-pyran) had erroneously begun their restraining simulations in an unfavorable, kinetically trapped orientation that was different from the target orientation of the restraints. Given the length of the simulations, this can present a serious convergence problem. The ligand may remain in that unfavorable orientation at steps in the restraining process where the restraints are not strong enough to pull the ligand across the kinetic barrier into the target orientation. As long as the ligand remains trapped, the restoring forces are being integrated, at ever increasing levels, into the overall restraining energy. When the barrier is significant, a large restraining energy is thus calculated. Thus by misidentifying an unfavorable trap as a favorable thermodynamic state, these unconverged restraining energies ultimately make binding appear artificially
too favorable, since they are added back to the ligands net free energy of binding, to “repay” the system for the cost of artificially restraining the ligand.
Taken together, these errors give some indication of how intricate these calculations remain, with many opportunities for the introduction of essentially trivial human error. Aggravating as this is, these errors can be traced to their origin and results typically improve when the errors are fixed.
In principle, an algorithmic origin of error comes from the calculation of ligand solvation energy, which might typically be laid at the door of the force field since sampling is less of a concern here. However, recent tests on computed hydration free energies of tens to hundreds of small molecules gave RMS errors to experiment that were typically in the 1.0 to 1.8 kcal/mol range, depending on compound polarity 13,28–30
. For example, phenol, one of the ligands here, had a computed hydration free energy that differed by only 0.9 kcal/mol to that observed experimentally. This leads us to believe modeling of ligand interactions with bulk solvent is probably not the dominant source of error in this study. 29
Of course, force field errors may also play a role in ligand-protein interaction energies, but as we will argue in the following sections, these errors, to the extent that they occurred, were largely obscured by what we believe to be problems with sampling and convergence.
Protein Conformational Change, Sampling Protein Motions & Convergence of the Calculations
Rerunning the calculations starting with the holo structures was an attempt to determine whether or not the simulations reached convergence and if under-sampling of helix F led to errors in the predicted ΔGb
values. Several scenarios may be considered. When apo and holo results agree, it suggests that either there is no convergence problem or there is a cancellation of errors. When they disagree, and the holo result is more favorable, it suggests a sampling problem in the apo state, possibly related to protein conformational changes on binding, as these will typically make binding appear more favorable 2,24
. The case where the apo result is more favorable than the holo is counter-intuitive, but owing to poor choices in initial ligand orientations, and consequent convergence problems in the restraining energies (see above), this did in fact occur.
Because we knew that the helix motion had not been sampled over the course of the apo calculation, we assumed that the second case would predominate, that is, the holo result would be more favorable for the eight ligands that induced protein conformational change upon binding. Instead, the opposite was true, with the holo result less
favorable than the apo result in nearly every case (Table 4 Supplementary Material
). This suggests that under-sampling of helix F in the apo calculation was not the dominant cause of errors observed in the ΔGb
values; instead several other sources of convergence problems, including ligand restraints discussed above, were responsible for these errors (Fig. 3 Supplementary Material
). Still, since F-helix closure was rarely observed in the course of the holo simulations following ligand decoupling as well, the energetic costs associated with breaking the helix during binding were poorly accounted for in these computed free energies; had it been, the holo results might have even been less
Despite being on the whole less favorable than the apo predictions, the holo results are less frequently overestimated, and are therefore in better overall agreement with the experimental binding free energies. Also, two of the ligands with the highest RMSD values to the crystallographic binding mode beginning from the apo prediction, nitrosobenzene and benzylacetate, were correctly predicted starting from the holo structure. This improvement reflects the prediction of a near-native orientation in the holo vs. an incorrect one in the apo pose prediction, and is consistent with large kinetic barriers separating different ligand binding modes 3,24
. For 2-ethoxy-3,4-dihydro-2h-pyran, however, even starting from the holo structure, the correct ligand orientation was not generated during the setup stage of the calculation (discussed in detail in Methods), and therefore was never sampled in the simulation. The end result is that the prediction for this ligand does not improve relative to the absolute binding free energy calculation from the apo structure.
Sampling the Ligand: Starting with the Wrong Orientation
Previously we had shown that decomposing configuration space by orientation, i.e. generating multiple starting orientations by docking and using simulations beginning from these to identify candidate binding modes, which we then consider separately, would allow adequate sampling (and also convergence) of ligand states separated by large kinetic barriers 3
. But this approach requires that docking generate candidate starting orientations which are close enough to the true binding mode that it can be found on nanosecond timescales. However, in L99A/M102Q a protein conformational change alters the size of the small binding site to allow larger ligands to bind and, in some cases, adversely affects the quality of the docking poses. This presents a problem for the alchemical free energy methods, which rely on having starting orientations that are reasonable. This proved to be the key to several of the mispredicted ligands in the apo calculations, most notably benzylacetate, which cannot be docked into the apo binding site with a good initial orientation () and is a hard failure, both predicted pose and affinity are incorrect.
Relative Free Energy Calculations
We examined a different problem, looking for prediction of relative affinities within a congeneric series of phenol derivatives. Since we already knew affinity and structures for two such ligands, phenol and catechol, we began relative binding free energy calculations for six others beginning with both of these two, separately. We further simplified the experiment by restricting the starting orientations of the ligands to symmetric binding modes about the ligand hydroxyl group that contacts Gln102. Starting from catechol, these calculations did well (, ) for all six compounds. For the same six compounds, performance diminished when we began with phenol. How can these results be reconciled?
For the relative free energy predictions, it was necessary to generate one or more starting orientations of each new ligand. In the absolute calculations, we consider many candidate binding modes for each ligand, but this is not a typical for relative free energy calculations. We therefore assumed that the position of the hydroxyl would typically be preserved, but we were unsure of the location for additional substituents. For example, for 2-methylphenol overlaid onto phenol, the methyl group could be on the left or right side of the binding site while preserving the hydroxyl position. Because of kinetic barriers to rotation of ligands within the binding site we considered both (see Material & Methods).
The results of the relative calculations reflect this initial sampling of the ligand orientation. There are only two potential ligand starting orientations for phenol. Also, because of the restriction of the hydroxyl position, phenol starts with only one position for the hydroxyl group (). If the simulations could be run long enough to sample all ligand orientations, the starting orientations would not substantially affect the results. But timescales for interconversion of orientations in this binding site are slow, so the choice of starting orientations affects the results. Indeed, it happens that the position of the hydroxyl position in the derivatives is not conserved, in several cases adopting the ortho hydroxyl position in catechol, (). Hence, for phenol the prospective predictions of the relative affinity are little better than random ().
Crystallographic orientations of the reference ligands phenol (orange, PDB ID 1LI2) and catechol (cyan, PDB ID 1XEP) overlayed on the apo reference structure (gray, PDB ID 1LGU). The two alternate hydroxyl positions are labeled A & B.
By the same logic, the performance of the catechol transformation is substantially better. Catechol has two hydroxyl groups, which translates to four starting orientations for the ligand, two reflections around each hydroxyl axis. Relative to phenol, the sampling of the hydroxyl position and, consequently, the ortho substituent, is better. The result is that one starts with reasonable sampling of ligand orientations, improving the predictions. Besides accurately predicting the relative free energy change (RMS error 1.1 kcal/mol), the compounds are also correctly rank-ordered, with the exception of 5-chloro-2-methylphenol. Pose fidelity to the crystallographic binding mode is high, and in three cases alternate orientations were correctly predicted. 5-chloro-2-methylphenol is the caveat to this set of transformations; its pose is accurately predicted (RMSD of 0.9Å), but the relative free energy is overestimated, possibly due to force field issues. For this ligand, and indeed all the free energy results, a good starting orientation is necessary but not sufficient for accurate prediction of both affinity and pose.