Molecular docking aims to evaluate the feasible binding geometries of a putative ligand with a target of known 3D structure. Typically, docking algorithms consist of both a search algorithm for the exploration of different ligand (and sometimes protein) conformations, and a scoring function for the calculation of ligand binding affinities. Ideally, the scoring function should be able to identify a solution with the correct ligand binding mode from alternative solutions, and ultimately be able to rank a set of ligands according to experimental binding affinity. In principle, the binding affinity should be calculated based on the first principle of thermodynamics. The most powerful approach is the absolute binding free energy (ABFE) approach
1-6, which uses extensive conformational sampling from molecular dynamics simulation, fully detailed atomic force fields, and a separate simulation of the solvation of the ligand, protein and associated complex. However, ABFE is too computationally expensive to be applied to screen millions of compounds. Furthermore, in spite of its cost, the prediction from ABFE is not always accurate
7. Tremendous efforts have been made to develop physical-based or knowledge-based docking scoring functions to efficiently predict binding affinity. However, docking scoring functions remain notoriously weak predictors of binding affinity. Indeed, following an evaluation of 10 docking programs and 37 scoring functions, Warren et al.
8 concluded that scoring functions are in need of significant improvements for predicting binding affinity. The principal reason for failure is the inability of the scoring function to reliably rank optimal ‘native-like’ ligand conformations above ‘non-native’ orientations
9. Thus, although in most cases the correct binding mode can be retrieved during the conformational search, assigning the lowest energy score to the correct binding pose has proved to be more challenging. This inevitably leads to poor correlation with experimentally determined binding affinities. In general, the prediction of binding affinity is a challenging task since it is not only the result of collective weak noncovalent interactions, but it also includes the ability of the ligand to access the binding site, the desolvation free energy of the ligand and the binding site, and entropy and enthalpy changes in the ligand, protein, and solvent
10. A realistic goal for docking scoring functions may be to discriminate active and inactive compounds and to rapidly filter out likely inactives in high-throughput screening campaigns.
Almost all existing docking scoring functions, including physical-based force fields, involve the fitting of data from experiments and calculations based on quantum mechanics. Docking scoring functions typically assign a common set of weights to the individual energy terms that contribute to the overall energy score, however, the weights assigned to the individual energy terms are, in reality, gene family-dependent. In order to achieve more accurate results, the docking score must be trained to derive a unique set of weights for each individual family. Furthermore, many docking algorithms incorrectly assume that individual interactions contribute towards the total binding affinity in an additive manner, and therefore derive their predictions of binding affinity from a linear combination of individual energy terms. However, this method is not theoretically sound, since it fails to consider the cooperative effects of noncovalent interactions. Such cooperativity has only been considered recently. These studies all highlight the need for docking scoring functions that are based on nonlinear models
10-13. Thus, the development of target-dependent docking scoring functions based on nonlinear models would significantly enhance their predictive ability, facilitating the rational design of ligands that bind to specific sites on target proteins
11.
The ability of machine learning methods to produce effective scoring functions has been demonstrated in a number of recent studies. For instance, both Deng
et al. 14 and Ballester and Mitchell
15 characterized protein-ligand complexes using feature vectors comprising the number of occurrences of specific protein-ligand atom type pairs interacting within predefined distance thresholds. While Deng
et al. trained Kernel a PLS (partial least squares) to predict protein-ligand binding affinity, Ballester and Mitchell trained a random forest classifier for the same purpose. The algorithm by Ballester and Mitchell (‘RF-Score’) was shown to outperform 16 existing scoring functions when using a benchmark consisting of 195 diverse protein-ligand complexes. In 2005, Springer
et al. 16 presented PostDOCK, a post-processing filter to distinguish true binding protein-ligand complexes from docking artefacts generated by the popular docking program DOCK 4.0.1
17. PostDOCK uses biochemical descriptors to characterize the protein-ligand interaction, including nonbonded van der Waals and electrostatic interaction force field terms from the DOCK score, terms for hydrogen bonding, metal bonding, lipophilicity, and a series of buried solvent accessible surface area descriptors. The authors used a random forest classifier to separate the binding and the nonbinding ligands from a test set of 44 structurally diverse protein targets, and showed that PostDOCK was able to outperform both the DOCK and ChemScore
18 scoring functions. More recently, Sato
et al. 19 developed Pharm-IF, a novel atom-based, pharmacophore-based interaction fingerprint. The fingerprint is calculated from the distances of pairs of ligand pharmacophore features that interact with protein atoms. Hydrogen bonds, ionic interactions and hydrophobic interactions are considered. Four machine learning algorithms (naïve Bayesian classifier, random forest, support vector machine and artificial neural network) were used to build prediction models based on Pharm-IF, and each was shown to outperform the GLIDE
20 docking algorithm across all five targets investigated. Durrant and McCammon
21 also recently demonstrated that neural networks can be used to successfully characterize the binding affinities of protein-ligand complexes. Their neural networks, which were trained by characterizing 4,141 protein-ligand complexes across 194 dimensions, were used to rescore the docked poses of a selected set of ligands. The authors demonstrated that their networks were able distinguish between well-docked and poorly docked ligands, and even between true ligands and decoy compounds when both were well-docked.
Classification support vector machines (SVMs) separate positive and negative data in multidimensional space by drawing an optimal hyperplane with a maximum distance between the two classes. As such, SVMs can potentially be applied in developing nonlinear models of docking scoring functions using high-throughput screening (HTS) data. Unfortunately, data imbalance, which thwarts SVMs, is a common feature amongst a wide range of experimental data, including the data here. It occurs when one of the classes in the dataset, commonly the class of interest, is represented by a significantly smaller number of samples than the other class(es). For instance, of the tens of thousands of compounds typically investigated in HTS, it is likely that only a small fraction will show activity against the target of interest. As a result, HTS data are commonly imbalanced, with the vast majority of compounds belonging to the negative class. The use of such imbalanced data in the training of classification SVMs tends to push the optimal hyperplane towards the positive class, resulting in a tendency of the model to classify observations into the negative class, and therefore have high precision but low recall
22.
Sampling strategies, such as oversampling the minority class, or undersampling the majority class are popular choices when addressing the issue of data imbalance
23. In a recent study, Tang et al.
24 proposed a granular sampling strategy for data rebalancing, which undersamples the majority class, while minimizing information loss. Their method treats all positive samples as important due to their rarity, and discards uninformative (i.e., non-SV) negative samples. Indeed, since only SVs are important for SVM model classification, the removal of non-SV samples has no significant effect on model performance. In this way, the granular sampling method identifies samples that are adjacent to the border between the two classes, and uses them to draw the separating hyperplane. Those samples that are far away from the hyperplane are discarded during model building. Li et al.
22 recently applied this granular sampling method to facilitate the analysis of highly imbalanced luciferase inhibition bioassay data. While the data imbalance associated with HTS data has been noted in a number of other studies
25-27, to our knowledge, there have been no methods reported to tackle this problem effectively.
To address the issue of data imbalance in SVM training during docking, we have developed a multiple planar SVM scheme to balance precision and recall. The multiple planar SVM randomly partitions the overrepresented negative cases into
n subsets.
n SVM models are then trained using each of the partitioned negative subsets, plus a common set of positive cases. The prediction is based on the combined score of the
n SVM models (). Compared to the granular sampling method, the multiple planar method is more computationally efficient. Indeed, the main drawback of the granular sampling method is that it is not possible to extract all informative negative samples using a single SVM. For instance, in the aforementioned work of Li et al.
22, 67 sampling rounds, each involving the modelling of a new SVM, were required before the results converged and the sampling process was considered complete. While the multiple planar method also requires the training of multiple SVM models, each of these models is a small fraction of the size of the overall dataset.
In this paper we show how the use of a SVM, trained by associating sets of individual energy terms retrieved from molecular docking with the known binding affinity of each compound in the training set, can be used to improve the correlation between known binding affinities and those predicted by a popular docking program for a series of known inhibitors. The docking program eHiTS
28 was selected for this study due to its ability to output a large number of individual energy terms contributing to the overall energy score. Although we only used eHiTS in this study, our methodology can be applied to any other docking software provided that it outputs individual energy terms. More generally, it is applicable to all docking algorithms if 3D structural fingerprints are used as feature vectors.
We have applied this methodology to the development of an enoyl acyl carrier protein reductase (InhA)-specific docking scoring model. InhA is the target of the first-line anti-tubercular drug isoniazid. The therapeutic effect of isoniazid relies on it forming a covalent conjugate with an NAD co-factor. Since the extensive use of isoniazid has prompted the emergence of drug-resistant
M.tb strains, the development of direct inhibitors of InhA is highly desirable
29. We are particularly interested in repurposing existing pharmaceuticals to directly inhibit InhA. Indeed, in a recent study, we have shown that the safe pharmaceutical Comtan, a catechol-
O-methyltransferase (COMT) inhibitor, which is used in the treatment of Parkinson's disease, is able to directly inhibit InhA
30. In this report, based on the SVM model and ligand binding site similarity, we propose that phosphodiesterase inhibitors could potentially be direct inhibitors of InhA.