Modeling charge–charge interactions in docking is challenging because the gain in electrostatic energy upon ligand binding has to be balanced against desolvation energies. Both values are high in magnitude, as are their errors in computer simulations. This study allowed us to probe charge–charge interactions in a controlled environment, a small pocket completely buried from solvent. If we are able to get this balance right anywhere, it should be in such a relatively simple site. Correspondingly, mispredictions are particularly informative because they come much less entangled by the approximations necessary in more complicated sites. Five points stand out from this study. First, overall electrostatic and desolvation energy appear to be balanced well in the physics-based scoring function. No neutral compound ranked among the top 100 molecules, and the first dicationic molecule scores poorly at rank 3295. Second, from a practical standpoint, virtual screening with this cavity was successful. Fifteen of 16 chemically diverse compounds, which ranked in the top 5% of the database, did actually bind to the site when tested experimentally. For all ten high-ranking ligands for which the crystal structures in complex with CCP W191G were determined, the binding modes were predicted within <1 Å rmsd. Third, neither using a higher level of theory for calculating partial charges and desolvation energies, nor changing the dielectric constant in the cavity, improves these results. Fourth, although the overall performance is good, problems exist for neutral compounds. The only neutral ligand found that interacts with the deprotonated Asp135 (36) ranks poorly (1152nd), whereas the best scoring neutral decoy 27 ranks 147th. Fifth, analyzing false negative predictions points to weaknesses in current docking protocols and can guide the improvement of scoring functions and docking algorithms. Examples of such instructive false negatives are the alkyl amines 33–35. Their poor ranking owes to an inadequate handling of explicit water molecules during docking. Similarly, the binding mode of phenol was not predicted correctly, because pKa shifts were not considered. We consider these points further below.
The physics-based scoring function used here (equation (1)
) was surprisingly effective at enriching new ligands and predicting their binding geometries. We had expected the scoring function to have trouble balancing the interaction energy and desolvation terms, finding either more high-scoring neutral or dicationic hits than was warranted. Instead, the top scoring hits were dominated by singly charged cationic heterocycles, with the first neutral ligand ranked 147th (top 2.8% of the database) and the first dicationic molecule ranked 3295th (top 62.2% of the database). Of the 17 high-scoring molecules tested experimentally for binding, only two, 3,5-difluoroanline (27) and aminoresorcin (30) were not observed to bind (). It is debatable if aminoresorcin is really a false positive or rather a true negative prediction, since it does not even rank in the top 6% of the database. For four of the new high-ranking ligands binding constants were determined. They range from 20 μM to 60 μM putting them among the better ligands known for this cavity30
with a “ligand efficiency” for the best ligand close to the projected maximum.48–50
The geometric fidelity of the docking predictions was also high ( and ). At a first glance, predicting the correct pose might seem trivial, since most of the ligands are rigid and the pocket is small, but even in this simple system it can be a challenge. For instance, 2,6-diaminopyridine (14) does not form a double hydrogen bond with Asp235 via
its ring and exocyclic nitrogen atoms as one might expect, and as it is actually observed for 2-aminopyridine,30
2,5-diaminopyridine (15; ), 2-amino-5-picoline (16; ), 2-amino-4-picoline (17; ) and 2,4-diaminopyrimidine (18; ). Instead 2,6-diaminopyridine only forms one hydrogen bond via
the exocyclic amine group, and the protonated ring nitrogen does not have a hydrogen bonding partner at all (). Although this is not the binding mode we might intuitively predict for this ligand, it is correctly predicted in the docked geometry (). Also the binding mode of thiopheneamidinium (21) is predicted correctly, despite the absence of steric constraints to guide the position of the sulfur atom of the thiophene ring (). In summary, the quality of the docked geometries was typically high for the novel ligands, even in cases where distinguishing between the correct and incorrect pose involved a subtle balance of forces; even in a simple site, such balanced forces are often in play.
Along with the high hit rates came new and interesting chemotypes as ligands. Considering their small size, the enriched ligands are diverse and include disubstituted pyridines (14–17, 20, 25), pyrimidines (18, 19, 23), amidines (21, 22), alcohols (24, 26, 28), and non-aromatic ligands (22) (), none of which had previously been discovered. That said, all of these molecules are small and cationic; could they have been found by simpler methods, such as chemical similarity? Using Daylight fingerprints, only three of the 16 new ligands have a Tanimoto coefficient of 0.85 or better to the previously known ligands.29–31
Another way to pose this question is to ask how many of the supposedly novel docking hits would have been found by screening the database by similarity to the previously known ligands? Again using topological similarity as a metric, the enrichment of the docking-derived ligands from the similarity search was considerably lower than the structure-based docking enrichment (); most of the new chemotypes would not have been discovered solely by using a similarity search. Thus the docking-derived ligands seem genuinely novel, which is important for a model binding site as diverse ligands will avoid bias towards a particular type of chemotype when testing and improving scoring functions.
Taken together, the high enrichment of mono-cationic ligands and the high fidelity of the binding mode predictions suggest that the relatively simple, physics-based scoring function represented by equation (1)
can at least separate likely from unlikely ligands, getting the overall balance between electrostatic interactions and desolvation correct. On closer inspection, however, problems with the predictions do emerge. Not all ligand interactions were correctly predicted (), one high-ranking docking hit did not bind (27), and two neutral ligands (32 and 36) were ranked poorly. What do these problems tell us about weaknesses in our scoring functions and how might they be overcome?
We had previously found, in the neutral lysozyme cavities, that docking could be improved by moving to a higher level of theory in modeling ligand desolvation and partial atomic charges.25
Here, we investigated moving one step further, from a semi-empirical quantum mechanical method to a fully quantum mechanical method to calculate ligand partial charges and desolvation energies. Overall, moving to higher theory had little effect, with changes only in the relative ranking of the ligands and decoys ( and ; and ). The desolvation energies calculated by both methods can differ by several kcal/mol. The consequence for docking is that different ranks are predicted for specific ligands, without changing overall performance. Indeed, we may be reaching a limit on how well we can hope to do with even fairly sophisticated methods for calculating ligand desolvation. The error in the calculated energy for the transfer for a cation from water to vacuum with these methods is 3–4 kcal/mol.37,38
With our scoring function, a change of 3 kcal/mol can make a difference of about 200 rank units. To have a significant impact on molecular docking for virtual screening, a new method to calculate charges and desolvation energies must have a smaller error than this 3–4 kcal/mol uncertainty level that most of the current methods have for simple solvent transfer free energies.
There is no consensus as to what is the best dielectric constant to model electrostatics in a protein binding pocket.45–47
Based on strictly electronic effects, we used a dielectric constant of 2.25
This may be an extreme choice, given that we are docking to a rigid receptor. Also, changing the dielectric constant is a way to influence the weighting between the van der Waals term, ligand desolvation energy and electrostatic energy, and so, from a pragmatic standpoint, it seemed interesting to explore. We therefore repeated the docking screens using different dielectric constants for the protein binding site, leaving the external dielectric fixed at 78 (). In addition to the negatively charged CCP W191G, we also docked the database against the hydrophobic cavity in T4 lysozyme L99A and the slightly more polar cavity in T4 lysozyme L99A/M102Q. In all three systems, the best enrichment is obtained for values between 1.84 and 3.04. We also compared the performance obtained when the partial charges are either calculated in water or cyclohexane (Figure S2; Supplementary Data
). In all three systems, the enrichment is not influenced by these small changes. The similar behavior of these systems, in which the properties of the binding pocket range from completely hydrophobic to polar to charged, indicates that the physics-based scoring function used here is not grossly biased towards a particular type of interaction.
An attractive feature of model binding sites, such as CCP W191G, is that false positive and false negative predictions are often more informative than true predictions. We were thus almost disappointed by the high initial hit rate of the prospective docking screen. As we dug further, however, interesting problems did emerge. The neutral compound 3,5-difluoroaniline (27; ) ranks well, but does not bind to the cavity, whereas another neutral compound, 3-fluorocatechol (36) scores badly but does bind. Also, the alkyl amines 33–35 rank poorly, but bind to the cavity. These ligands form one hydrogen bond to a water molecule, which was not considered during docking (). If this water molecule is considered during docking, the scores of these alkyl amines improve, leading to a difference in more than 500 rank units. Unfortunately, simply adding a water molecule to the target is not a panacea. There are some molecules, like imidazoyl-methanol (24), that can dock with or without a water molecule despite the fact that it hydrogen bonds with it in the crystal structure (). Worse, most of the ligands will not bind with either of those water molecules present. To improve docking, algorithms are needed that treat the water structure flexibly, and that can balance the energetic costs and benefits of either binding or displacing ordered water molecules.51–54
Perhaps the most interesting mispredicted molecules are phenol (32) and 3-fluorocatechol (36), which are the first neutral ligands for this cavity (; and ). Neither of these molecules ranks well in the docking hit list. Admittedly, these neutral compounds are weaker ligands than many of the cationic ligands, though it is also true that compound 12 () binds in the millimolar range (1.5 mM).30
The poor ranking of the neutral compared to the charged ligands points to weaknesses in the docking protocol. Most likely, binding of phenol is associated with a pKa
shift of Asp235 (). Such pKa
shifts of either the ligand or the protein are not uncommon, but are not considered routinely in current docking protocols. Reliably modeling these changes would lead to better predictions. 3-Fluorocatechol interacts with the deprotonated Asp235 as modeled during docking, but still does not rank well. Thus, even though the docking screen performed well overall, there is room to improve the balance between electrostatic and desolvation energy; such an imbalance, obvious in this simple system, will become more deleterious in more complicated “drug-like” sites.