The scoring function is at the heart of molecular docking: it guides a docking program in efficiently exploring the binding space of a ligand and is also responsible for evaluating the binding affinity once the correct binding pose has been identified. Therefore, the predictive power of scoring functions has a significant impact on the productivity of DBVS.
A multitude of scoring functions have been reported over the past decades (10) (Table), and new ones are still emerging. Current scoring functions, as reviewed in other works (23), can be roughly classified into three types: (a) Force field-based scoring functions employ a classical force field to compute the noncovalent ligand–target interactions, such as van der Waals and electrostatic energies. They are often augmented by a GB/SA or PB/SA term to account for solvation effects. (b) Empirical scoring functions calculate the overall binding free energy from several energetic terms, such as hydrogen bonding and hydrophobic interactions. The weighting factors of all terms are calibrated against a set of known complexes with experimentally determined structures and binding affinities. (c) Knowledge-based scoring functions compute the ligand–target interactions as a sum of distance-dependent statistical potentials between the ligand and the target. Notably, deriving such potentials requires only the structural information of ligand–target complexes, which is accumulating rapidly owing to advances in structural biology.
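The three classes above can be contrasted with a minimal Python sketch. It is an illustration only, not the functional form of any published scoring function: the 12-6 Lennard-Jones and Coulomb expressions, the inverse Boltzmann relation, the constant dielectric, and all function names, term names, and numbers are assumptions chosen for clarity.

```python
import math

# Approximate Coulomb conversion factor in kcal*angstrom/(mol*e^2); an assumed value.
COULOMB_CONSTANT = 332.06


def force_field_pair_energy(r, epsilon, sigma, q1, q2, dielectric=4.0):
    """(a) Force field style: 12-6 Lennard-Jones plus Coulomb energy for one atom pair."""
    lj = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = COULOMB_CONSTANT * q1 * q2 / (dielectric * r)
    return lj + coulomb


def empirical_score(terms, weights):
    """(b) Empirical style: weighted sum of interaction terms (weights come from regression)."""
    return sum(weights[name] * value for name, value in terms.items())


def knowledge_based_potential(observed_density, reference_density, kT=0.593):
    """(c) Knowledge-based style: inverse Boltzmann, w(r) = -kT * ln(g_obs / g_ref)."""
    return -kT * math.log(observed_density / reference_density)


# Toy usage with made-up numbers.
vdw_minimum = force_field_pair_energy(
    r=2 ** (1 / 6) * 3.5, epsilon=0.1, sigma=3.5, q1=0.0, q2=0.0
)
score = empirical_score(
    {"hbond": 2, "hydrophobic": 10}, {"hbond": -1.0, "hydrophobic": -0.1}
)
favorable = knowledge_based_potential(observed_density=2.0, reference_density=1.0)
```

In this sketch, an atom pair observed more often than in the reference state receives a negative (favorable) statistical potential, whereas the empirical score depends entirely on the fitted weights.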
Table. Examples of Current Scoring Functions
The performance of various scoring functions has been investigated in several comparative studies (73) with respect to their ability to reproduce known binding poses, predict binding affinities, and rank-order a compound library. State-of-the-art scoring functions differ in accuracy, and it is clear that no single scoring function consistently outperforms the others in all cases. Previous comparative studies conclude that today’s scoring functions are often capable of identifying the correct binding pose of a ligand, whereas highly accurate binding affinity prediction is still far from reach (73). Therefore, considerable efforts have been made to improve the performance of current scoring functions. Common strategies include adding terms to account for solvation and entropic effects (71), deriving more accurate energy terms from high-level quantum calculations (78), and consensus scoring, which combines multiple scoring functions (79). In this review, we highlight the recent progress in developing target-biased scoring functions as well as those that employ machine learning techniques.
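Consensus scoring can be realized in several ways. The sketch below shows one common variant, rank averaging, in which each scoring function contributes equally. The function name, the scoring function labels, and the scores are hypothetical, and lower scores are assumed to be better.

```python
def consensus_rank(score_table):
    """Order ligands by their mean rank across scoring functions.

    score_table maps a scoring function name to {ligand: score}.
    Lower scores are assumed to indicate stronger predicted binding.
    """
    ligands = sorted(next(iter(score_table.values())))
    mean_rank = {ligand: 0.0 for ligand in ligands}
    for scores in score_table.values():
        ordered = sorted(ligands, key=lambda ligand: scores[ligand])
        for rank, ligand in enumerate(ordered, start=1):
            mean_rank[ligand] += rank / len(score_table)
    return sorted(ligands, key=lambda ligand: mean_rank[ligand]), mean_rank


# Made-up scores from three hypothetical scoring functions on different scales.
scores = {
    "function_1": {"A": -9.0, "B": -7.0, "C": -5.0},
    "function_2": {"A": -30.0, "B": -40.0, "C": -10.0},
    "function_3": {"A": -8.5, "B": -6.0, "C": -6.5},
}
order, mean_rank = consensus_rank(scores)
```

Working with ranks rather than raw scores sidesteps the fact that individual scoring functions report values on different scales.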
Target-Biased Scoring Functions
Most of today’s scoring functions are generic models derived from large-scale experimental data on ligand–target complexes and are presumed to be applicable to all target classes. However, previous comparative studies have revealed that a universally accurate scoring function is still out of reach. A practical remedy might be to develop target-biased alternatives for specific targets or tasks (81).
Target-Biased Scoring Functions Derived by Re-parameterization
The most straightforward way to obtain a target-biased scoring function is probably to re-calibrate an existing all-purpose scoring function directly on certain target classes. For example, DrugScore-RNA (82) adopts the same framework as DrugScore (69) but is derived from 670 crystal structures of nucleic acid–ligand and nucleic acid–protein complexes. A similar idea has been implemented in the kinase family-specific potential of mean force (kinase-PMF) (68), a kinase-targeted scoring function adapted from the original PMF04 (67).
Tweaking the parameters of the original scoring functions toward specific targets is also a prevalent strategy for deriving target-biased scoring functions. For example, Teramoto and Fukunishi have applied a supervised scoring model to tailor the FlexX scoring function (F-score); the tailored function outperformed its former version on three of the five tested targets (83). The TOP approach suggested by Seifert (84) has employed an iterative taboo search to optimize the scoring function in ProPose and the original Böhm scoring function against three targets: CDK2, estrogen receptor, and COX2. By adding negative data on ligands that are known not to bind a particular target, Pham and Jain have tuned the scoring function in Surflex-Dock and observed substantially enhanced screening enrichment for HIV protease and poly(ADP-ribose) polymerase (85). An augmented Flo+ scoring function has been developed by Catana and Stouten using N-way partial least squares (PLS) (86), which significantly improved the correlation between observed and calculated pKi values from an R² of 0.5 to 0.8 on a relatively diverse set of ligand–target complexes spanning seven protein families. Therefore, it would be attractive if scoring functions offered extendable or customizable features.
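The general idea of re-parameterization can be sketched as refitting the weights of an empirical scoring function on complexes of a single target. The example below is a minimal illustration that uses ordinary least squares in place of the taboo search or PLS procedures cited above. The three descriptors, the "generic" weights, and the training data are synthetic assumptions.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(term_rows, affinities):
    """Least-squares weights for score = sum_k w_k * term_k (normal equations)."""
    k = len(term_rows[0])
    xtx = [[sum(row[i] * row[j] for row in term_rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(term_rows, affinities))
           for i in range(k)]
    return solve_linear(xtx, xty)


def squared_error(weights, term_rows, affinities):
    """Sum of squared differences between predicted and observed affinities."""
    return sum(
        (sum(w * t for w, t in zip(weights, row)) - y) ** 2
        for row, y in zip(term_rows, affinities)
    )


# Synthetic target-specific data: [hydrogen bonds, contact area, rotatable bonds].
term_rows = [[3, 120, 4], [1, 200, 2], [5, 80, 6], [2, 150, 1], [4, 60, 3]]
target_weights = [0.8, 0.02, -0.3]   # hidden "true" weights used to generate pKi
generic_weights = [0.5, 0.03, -0.2]  # stand-in for an all-purpose parameter set
pki = [sum(w * t for w, t in zip(target_weights, row)) for row in term_rows]

refit_weights = fit_weights(term_rows, pki)
generic_error = squared_error(generic_weights, term_rows, pki)
refit_error = squared_error(refit_weights, term_rows, pki)
```

Because the synthetic affinities are noise-free, the refit recovers the hidden weights exactly; with real data, the refit weights should be validated on complexes that were held out of the fit.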
Target-Biased Scoring Functions Requiring No Re-parameterization
The above-mentioned target-biased scoring functions typically require re-parameterization or special treatment of established scoring functions. Too often, however, existing scoring functions are available to end-users only as black boxes, so their parameters cannot readily be adjusted by an optimization algorithm. Several approaches have been proposed to address this issue. One of the earliest examples is MultiScore, which employs the raw scores from eight scoring functions to characterize the observed pKi values and has been found to work better for matrix metalloproteinases. The underlying idea differs slightly from that of consensus scoring (79) in that it assumes uneven contributions from the individual scoring functions. In a similar way, the AutoShim method incorporates the original Flo+ score as well as additional target-specific pharmacophore points (shims) as descriptors in PLS analysis (88). More recently, Cheng et al. have proposed a knowledge-guided strategy (KGS) based on the similarity principle, aiming to improve the accuracy of binding affinity prediction by current scoring functions (89). The KGS strategy computes the binding affinity of a query ligand–target complex from the known binding affinity of an appropriate reference complex, which is required to share a similar pattern of key ligand–target interactions with the query complex. The KGS strategy has been validated with both observed and docked ligand–target complex structures. Moreover, it can in principle work in concert with any scoring method, and its application is not limited to specific classes of ligand–target complexes.
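One plausible reading of the reference-based idea behind KGS is sketched below. It is not the published implementation: the Tanimoto comparison of interaction sets, the similarity threshold, the additive correction (reference affinity plus the score difference), and all names and numbers are assumptions made for illustration. The scores are assumed to be expressed in the same units as the affinities.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of interaction labels."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_from_reference(query, references, min_similarity=0.5):
    """Predict affinity as reference affinity plus a score difference.

    Returns None when no reference shares a sufficiently similar
    pattern of key interactions with the query complex.
    """
    best = max(
        references,
        key=lambda ref: tanimoto(query["interactions"], ref["interactions"]),
    )
    if tanimoto(query["interactions"], best["interactions"]) < min_similarity:
        return None
    return best["affinity"] + (query["score"] - best["score"])


# Made-up complexes; interaction labels are hypothetical.
references = [
    {"interactions": {"hb:1", "hb:2", "hp:3", "hp:4"}, "score": 6.5, "affinity": 8.0},
    {"interactions": {"hb:9"}, "score": 4.0, "affinity": 5.0},
]
query = {"interactions": {"hb:1", "hb:2", "hp:3"}, "score": 7.1}
unrelated = {"interactions": {"hp:7", "hp:8"}, "score": 5.0}

predicted = predict_from_reference(query, references)
no_match = predict_from_reference(unrelated, references)
```

The scoring function is used here only to estimate the difference between two similar complexes, which is why such a scheme can in principle wrap any scoring method without access to its parameters.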
Machine Learning and Scoring Functions
Machine learning techniques are powerful tools for constructing and optimizing predictive models. In recent years, there has been increasing interest in developing novel scoring functions by means of machine learning (90). A notable feature of these scoring functions is that they take the commonly observed ligand–target binding interactions into account implicitly, which obviates the need to model error-prone contributions, such as solvation and entropic effects, explicitly. Moreover, machine learning techniques such as neural networks (NN), support vector machines (SVM), and random forest (RF) are able to account for the nonlinear dependence among the various interactions involved in ligand–target binding. As a result, despite having a less concrete physicochemical basis, they have often demonstrated performance superior or at least comparable to that of classic scoring functions in binding affinity estimation.
The NNScore scoring function developed by Durrant and McCammon is based on an NN (91), a technique that attempts to computationally simulate the microscopic organization of the human brain. The input layer consists of 194 neurodes that are related to ligand–target interactions. Kinnings et al. have applied SVM to train a new scoring function for identifying inhibitors of Mycobacterium tuberculosis InhA, using as descriptors the individual energy terms obtained directly from the built-in scoring function of eHiTS. Amini et al. have introduced support vector inductive logic programming as a general approach to developing system-specific scoring functions (93). The descriptors they used are the distances from each fragment’s central ligand atom to target atoms. In the development of the PHOENIX scoring function, Tang et al. have adopted an indirect approach (94). They first modeled the enthalpy change (ΔH) and the entropy term (TΔS) independently by fitting relevant descriptors to experimentally measured calorimetric data through PLS and then calculated the binding free energy (ΔG) according to the thermodynamic cycle.
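The indirect route can be illustrated with a small sketch that fits the enthalpic and entropic components separately and then combines them through the standard relation ΔG = ΔH − TΔS. This is not the PHOENIX model: a single-descriptor least-squares fit stands in for PLS, and the descriptors (hydrogen bond count, rotatable bond count) and calorimetric values are synthetic assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Synthetic "calorimetric" training data in kcal/mol.
hbond_counts = [1, 2, 3, 4, 5]
delta_h = [-2.0, -4.0, -6.0, -8.0, -10.0]   # enthalpy change per complex
rotatable_bonds = [0, 2, 4, 6, 8]
t_delta_s = [1.0, 0.0, -1.0, -2.0, -3.0]    # T * entropy change per complex

h_slope, h_intercept = fit_line(hbond_counts, delta_h)
s_slope, s_intercept = fit_line(rotatable_bonds, t_delta_s)


def predict_delta_g(n_hbonds, n_rotatable):
    """Combine the two independently fitted models: dG = dH - T*dS."""
    predicted_h = h_slope * n_hbonds + h_intercept
    predicted_ts = s_slope * n_rotatable + s_intercept
    return predicted_h - predicted_ts


delta_g = predict_delta_g(n_hbonds=3, n_rotatable=4)
```

Modeling the two components separately makes it possible to inspect which part of a prediction is enthalpy driven and which is entropy driven, at the cost of requiring calorimetric rather than affinity-only training data.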
Similar to the idea of using the occurrence counts of ligand–target atom pairs as geometric descriptors to generate a scoring function (95), Li et al. have developed a target-specific scoring method, SVM-SP, by using SVM. SVM-SP employs 135 atom pair potentials as descriptors, which are derived in the same way as in traditional knowledge-based scoring functions. The effectiveness of SVM-SP has been strongly supported by the discovery of three novel micromolar hits against epidermal growth factor receptor. The recently released RF-score by Ballester and Mitchell (97) has been built with RF, using a set of descriptors based on the count of a particular ligand–target atom pair within a certain distance range. Despite the relatively coarse definition of ligand–target atom pairs, which considers only the atomic number and no distance dependence, RF-score strikingly outperformed all 16 state-of-the-art scoring functions in a recent benchmark (73