|Home | About | Journals | Submit | Contact Us | Français|
Druglikeness is a key consideration when selecting compounds during the early stages of drug discovery. However, evaluation of druglikeness in absolute terms does not adequately reflect the whole spectrum of compound quality. More worryingly, widely used rules may inadvertently foster undesirable molecular property inflation as they permit the encroachment of rule-compliant compounds toward their boundaries. We propose a measure of druglikeness based on the concept of desirability called Quantitative Estimate of Druglikeness (QED). The empirical rationale of QED reflects the underlying distribution of molecular properties. QED is intuitive, transparent, straightforward to implement in many practical settings and allows compounds to be ranked by their relative merit. We extend the utility of QED by applying it to the problem of molecular target druggability assessment by prioritizing a large set of published bioactive compounds. The measure may also capture the abstract notion of aesthetics in medicinal chemistry.
The concept of druglikeness provides useful guidelines for early stage drug discovery 1, 2. Analysis of the observed distribution of some key physicochemical properties of approved drugs, including molecular weight, hydrophobicity and polarity, reveals they preferentially occupy a relatively narrow range of possible values3. Compounds that fall within this range are described as “druglike.” Note that this definition holds in the absence of any obvious structural similarity to an approved drug. It has been shown that preferential selection of druglike compounds increases the likelihood of surviving the well-documented high rates of attrition in drug discovery4.
Druglikeness can be rationalized by consideration of how simple physicochemical properties impact molecular behavior in vivo, with particular respect to solubility, permeability, metabolic stability and transporter effects. Indeed druglikeness is often used as a proxy for oral bioavailability. However, druglikeness provides a broad composite descriptor that implicitly captures several criteria, with bioavailability amongst the most prominent.
In practical terms, assessment of druglikeness is most commonly manifested as rules, the original and most well known of which is Lipinski’s Rule of Five (Ro5)5. The rule states that a compound is more likely to exhibit poor absorption or permeation when two or more of the following physicochemical criteria are fulfilled: the molecular weight (MW) is greater than 500Da; the calculated logP (ClogP) is greater than 5; there are more than 5 hydrogen-bond donors or the number of hydrogen-bond acceptors (nitrogen and oxygen atoms) is greater than 10. The rule does not apply to substrates of biological transporters or natural products. Aside from its predictive power, the widespread adoption of the Ro5 as a guideline for compound evaluation can also be attributed to the fact that it is conceptually simple and straightforward to implement.
Lipinski’s insight - that the great majority of orally absorbed drugs occupy a privileged area of molecular property space5, 6 - has resulted in greater awareness of the importance of molecular properties in determining oral bioavailability. The rule has inspired numerous refinements and investigations into the concept of druglikeness: a comprehensive review of the area is provided by Ursu et al. 2. The rule of five is not without its critics7, yet in detail the issues tend to be with its qualitative nature, or the focus on oral drug space, as opposed to druglike thinking per se.
Paradoxically, since the publication of Lipinski’s seminal paper5 there appears to be a growing epidemic, of what Hann has termed “molecular obesity” 8 amongst new pharmacological compounds (Supplementary Figure 1). Compounds with higher molecular weight and lipophilicity have a higher probability of attrition at each stage of clinical development 4, 9-11. Thus, the inflation of physico-chemical properties that increases the risks associated with clinical development may partly explain the decline in productivity of small molecule drug discovery over the past two decades4. However, the mean molecular properties of new pharmacological compounds are still considered Lipinski compliant, despite the fact their property distributions are far from historical norms.
Whilst the Ro5 is predictive of oral bioavailability, 16% of oral drugs violate at least one of the criteria and 6% fail two or more (although this does include natural products and substrates of transporters) (Supplementary Figure 2a and Supplementary Table 1). Notably, high profile drugs such as atorvastatin (Lipitor) and montelukast (Singulair), fail more than one of the Lipinski rules (Supplementary Figure 2b). Despite Lipinski’s recommendation that the rule be considered as a guideline in reality it is routinely used to filter libraries of compounds. The implementation of rules as filters means that no discrimination is achieved beyond a qualitative pass or fail – all compounds that comply with the rules are considered equal, as are all that breach.
The response to such issues is not to define more refined rules. Instead, methods to quantify druglikeness are required 12-14. However, scoring schemes proposed to date, often derived by machine learning methods, have lacked the intuitiveness, transparency and ease of implementation of the Ro5. To quantify compound quality we apply the concept of desirability15 to provide a quantitative metric for assessing druglikeness that we call QED (Quantitative Estimate of Druglikeness). QED values can range between zero (all properties unfavourable) and one (all properties favourable). The desirability approach can be used to generate functions to describe any set of compounds depending on requirements. Here we will demonstrate the utility of the approach by describing desirability functions derived from a set of orally absorbed approved drugs.
Desirability provides a simple yet powerful approach to multi-criteria optimization. It is finding increasing utility in a number of applications in drug discovery including compound selection 16, library design 17, 18, molecular target prioritisation, central nervous system penetration 19 and estimating the reliability of screening data 20.The concept was originally introduced by Harrington15 in the area of process engineering and further refined by Derringer21. Desirability takes multiple numeric or categoric parameters measured on different scales and describes each by an individual desirability function. These are then integrated into a single dimensionless score. In the case of compounds, a series of desirability functions (d) are derived, each corresponding to a different molecular descriptor. Combining the individual desirability functions into the QED is achieved by taking the geometric mean of the individual functions, as shown in Equation 1.
Conventionally, desirability functions are defined arbitrarily, usually as monotonic decreasing or increasing functions, or “hump” functions at defined parameter ranges and inflection points. Importantly, whereas previous approaches have used functions defined by user experiences and expectations16, 19, our approach differs fundamentally in that the functions are derived empirically by describing the underlying property distributions of a set of approved drugs, much as the boundaries defined by Lipinski were. The data used comprises a carefully curated collection of 771 orally dosed approved drugs. Eight widely-used molecular properties were selected on the basis of published precedence for their relevance in determining druglikeness3, 5, 22, 23: molecular weight (MW), octanol-water partition coefficient (ALOGP)24, number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), molecular polar surface area (PSA), number of rotatable bonds (ROTB), the number of aromatic rings (AROM)25, 26 and number of structural alerts (ALERTS)27. The molecular properties were chosen on the basis that they have all been shown to influence the likelihood of attrition and can all be calculated robustly at high-throughput. Histograms showing the distribution of the eight molecular properties across the set of oral drugs are shown in Figure 1. We found that the property distribution data are consistently best modelled as asymmetric double sigmoidal (ADS) functions, which are also shown in Figure 1 over the same range. The general ADS function is shown in Equation 2 where d(x) is the desirability function for molecular descriptor x.
The parameters (a, b, c, d, e and f) for each of the ADS functions dMW, dALOGP, dHBD, dHBA, dPSA, dROTB, dAROM and dALERTS are shown in Supplementary Table 2, as are the R2 values and the rank amongst a library of non-linear functions.
The chosen molecular descriptors may vary in the importance of their contribution to druglikeness, so each can be weighted by their relative significance. The unweighted QED shown in Equation 1 would then be replaced with a weighted QED (QEDw), shown in Equation 3,
where d is the individual desirability function, w is the weight applied to each function and n is the number of descriptors. Rather than assigning weights subjectively, we rationalized that the optimal set of weights is that which maximises information content, which can be measured by calculating the Shannon entropy28, 29 (Supplementary Figures 3a and 3b). An exhaustive search of possible weight combinations was performed for the set of approved drugs (see Methods). Three resulting sets of weights were considered (Table 1): i) the set of weights that gave the maximal information content (QEDwmax), ii) the mean weights of the 1,000 weight combinations giving the highest information content (QEDwmo) and iii) all weights as unity i.e. unweighted (QEDwu). Interestingly, the QEDwmax series gave zero weight to the PSA and HBA parameters suggesting the information in these parameters is redundant. To help explain the relative weights we performed a Principal Component Analysis of the unweighted desirability functions (Supplementary Figure 3c). The results were consistent with the entropy analysis in that the least correlated descriptors were weighted most highly. The pair-wise cross correlations between each of the properties is shown in Supplementary Figure 3d and listed in Supplementary Table 4. The complete weighted QED is given in Equation 4.
The property descriptors that are considered, the weights given to those descriptors and the set of data that the functions are derived from, can all be varied according to requirements. In this study we have considered approved drugs that are dosed orally, but given appropriate data sets, desirability functions could be derived with relative ease to describe the relevant chemical space for parenteral administration, blood-brain barrier penetration30 or taxonomic species with different permeability barriers.
A benchmark study was designed to determine the relative performance of QED, the druglike classifiers defined by Lipinski5, Veber23 and Ghose22 and the quantitative score of Gleeson et al.31 in distinguishing a set of drugs from a background set of compounds. The issue of assessing whether a compound is objectively druglike or otherwise is non-trivial and, as we have already argued, such a binary classification is somewhat misleading. With this consideration in mind, for the purposes assessing the relative performance of rule-based classifiers and QED we attempted to benchmark their performance qualitatively. The Drugbank database 32 was used as the positive set whilst the small molecule ligands of the Protein Data Bank (PDB) were used as the negative set.
The results of the benchmark study are shown in Figure 2a and Supplementary Figure 4 in the form of a Receiver-Operator Characteristic (ROC) plot. QED outperforms the Ro5 and Ghose rules and performs marginally better than the Veber rule at a QED of 0.35 (the threshold at which Veber closely approaches that of QED). However, unlike rule-based approaches this threshold could be modulated to give different levels of sensitivity and specificity according to requirements. The Ghose rule is less sensitive but more specific than the Veber and Lipinski rules. Interestingly this benchmark suggests that the Veber rule outperforms the Ro5. QEDwmo and QEDwu outperform the quantitative measure of Gleeson regardless of threshold. QEDwmax outperforms Gleeson above 0.37, below which it performs comparably. Performance of QEDwmax, QEDwmo and unweighted QEDwu is generally comparable, suggesting that the optimally weighted index (QEDwmax) can provide similar discrimination despite using fewer molecular descriptors. The best performing measure alternates between QEDwmo and the unweighted QEDwu depending on the range being considered, with QEDwmax performing marginally worse. Given the somewhat artificial nature of the benchmark, in practical terms QEDwmo and QEDwu could be used interchangeably; only 18 (2.3%) of the DrugStore oral drugs have ΔQED (between QEDwmo and QEDwu ) of >0.15 and 100 drugs (13.0%) have ΔQED >0.10 (Supplementary Figures 5a and 5b). Therefore, QEDwmo is used in all further analyses described here.
Direct comparison of the Ro5 and QED is illustrated in Figures 2b and 2c for the set of 771 oral drugs. An advantage of QED is its ability to rank compounds whether they fail the Ro5 or not. Interestingly, oral drugs that fail the Ro5 show QED values over a very wide range from nearly 0 to 0.8 (Figure 2c). Figure 2d shows the differences in the distribution of QED scores for compounds in the ChEMBL database of small molecule bioactivities33, small molecule ligands from the PDB and the set of oral drugs from DrugStore used to derive the functions. Such comparative analyses provide the means of establishing the relative druglikeness of any library of compounds.
As beauty is in the eye of the beholder, so chemical attractiveness is in the eye of the chemist34, 35,36. A study that compared the ability of chemists to assess druglikeness, revealed that while chemists would agree on the ‘attractive’ or drug-like structures, subjective human analysis is inconsistent in rejecting undesirable or ‘ugly’ compounds35. In an attempt to use chemists collective experiences as a means to evaluate druglikeness Takaoka et al. found the correlation coefficient between druglike scores assigned by individual chemist to be 0.5 – 0.6 34. Lipinski has argued that pattern recognition is the forte of the chemist37, 38. Wipke and Rogers have described the chemist’s knowledge of chemical structures as Gestalt pattern recognition process39. Thus we suggest that QED is an objective score that may correlate with the tacit knowledge of chemists’ subjective assessment of druglikeness or chemical attractiveness. The advantage of a codified metric on which chemical attractiveness can be judged is its application to ranking very large numbers of compounds. To aid the interpretation, it may be useful to consider QED values in the context of the observed distribution of a large reference set. To illustrate this, QED values corresponding to key percentiles from the ChEMBL database are shown in Supplementary Table 3 and a complete list is provided in the Supplementary Information.
Compared to the binary classification of the Ro5, QED exhibits a continuous scale from the most druglike drugs (Figure 3a) to the least druglike (Figure 3b). Comparison of the most druglike drugs that fail Ro5 (Figure 3c) and the least druglike drugs that pass the Ro5 (Figure 3d) illustrate the potential of QED to objectively rank compounds by the elusive quality of chemical attractiveness.
To assess whether QED reflects chemists’ opinions of chemical attractiveness we compared QED with the manually assigned annotations for 17,117 diverse compounds scored by a survey of 79 chemists from across AstraZeneca’s chemistry community (see Supplementary Information). Each chemist was asked to provide a yes or no answer to the question “would you undertake chemistry on this compound if it were a hit?” for approximately 200 compounds each. Less than one third (31.8%) of the compounds were considered as attractive chemical starting point for hit optimisation (5,457). Of 11,660 compounds that were considered unattractive, 4,497 (38.6%) were considered to be “too complex” and 5,243 (45.0%) considered “too simple”, the remainder having no reason assigned. The mean QED is 0.67 (S.D. = 0.16) for the attractive compounds, 0.49 (S.D. = 0.23) for the unattractive compounds and 0.34 (S.D. = 0.24) for the unattractive compounds considered “too complex” (Figure 3e and 3f). The difference in QEDs between the attractive and unattractive compounds is statistically significant. The estimated difference in the medians of the attractive and unattractive compounds is 0.164 (Wilcoxon rank-sum test, 95% confidence interval 0.157-0.171). The equivalent value for the difference between the attractive and the “too complex” set is 0.349 (95% confidence interval 0.340-0.358).
A logical extension of the concept of compound druglikeness is to apply it to the problem of target druggability assessment. Hopkins and Groom 40 postulated that if there are physico-chemical limitations to the properties of compounds that are likely to be oral drugs (as Lipinski proposed 5, 6), then drug binding sites should have complementary properties. An implication of this idea is that not all ligand binding sites have the appropriate physico-chemical and topological properties to non-covalently bind small molecule drugs with sufficient affinity. Binding sites that do have these characteristics are described as druggable. Note that this definition is independent of any wider biological considerations. A number of algorithms have been developed to determine the druggability of proteins based on analysis of the structural and physico-chemical properties of an identified binding site 41-44. A common feature of structure-based druggability analysis methods is the classification of a binding site into the categories of druggable or undruggable based on predefined training data.
Much as we have argued for the benefits of considering druglikeness in quantitative terms, the druggability of a protein can also be considered as a continuum of chemical tractability 45 rather than as a simple binary categorical assignment, thereby enabling the prioritization of druggable binding sites. QED provides an efficient means to quantify and rank the druggability of targets according to the chemical attractiveness of their associated ligands. QEDs were calculated for each compound in the ChEMBL database33 of published bioactivity (release ChEMBL09) having an affinity <10uM for a defined human protein target. The resulting 167,045 compounds are associated with 1,729 human proteins.
Top ranking targets by three different schemes are shown in Table 2. The first scheme involves ranking targets by the mean QED of their associated ligands (Table 2 and Supplementary Table 5). The mean QED for all targets in the list is 0.478. For the targets of approved drugs the mean QED is 0.492 and for the targets of approved oral drugs the mean QED is 0.539 (with an average standard deviation for a target of 0.231). Drug targets are indeed enriched towards the more highly desirable targets with 70% of the drug targets being found in the top 50% of the prioritized target list.
Within a set of ligands for a target, it is useful to consider the QED of distinct chemical series, as even targets that are perceived as being relatively intractable may have a small proportion of associated chemical matter that is druglike and of potential interest. To approximate distinct chemical series, all by all Tanimoto similarity matrices46 were calculated for each of the 1,729 human protein targets in ChEMBL. Compound similarity is represented as a network with chemical series being identified as distinct subgraphs within the network using a Tanimoto similarity threshold of 0.7. We define a chemical series as a cluster comprising at least 5 compounds and an active chemical series as one where the proportion of actives is at least 0.7 (with an activity threshold of 10μM or Ligand Efficiency of 0.3). The number of compounds, series and active series for all ChEMBL targets is listed in the Supplementary Information. Chemical similarity networks for four targets are shown in Figure 4. The chemical network representation in Figure 4 illustrates the presence of highly desirable chemotypes even for some targets with low mean QED. The mean QED of the most druglike active series for each target provides the second ranking scheme, listed in Table 2 and Supplementary Table 6. The mean QED of the best compound cluster of all ChEMBL targets is 0.569 (where the cluster comprises at least 5 compounds). The mean QED of the best cluster of human drug targets is 0.693. The mean QED when considering only the best cluster of the targets of oral drugs is 0.766.
A third approach to ranking targets is to consider the degree of enrichment of druglike series. Here, targets are ranked by the proportion of active series that have a mean QED above that of the top 10% of the ChEMBL database (0.796) (Table 2 and Supplementary Table 7).
QED provides the means to rank chemical structures by their merit relative to a target function, which in this case are the properties of oral drugs. Furthermore, by extension of the concept to the set of ligands associated with a drug target, QED provides an efficient means to quantify and rank the druggability of targets. Lipinski’s Rule of Five has gained considerable traction in early stage drug discovery largely because it is predictive, intuitive and simple to implement. We believe QED compares favourably in each of these regards. Compared to the rule-based approaches QED offers a richer, more nuanced view of druglikeness. The QED functions are based on the underlying distribution data of drug properties and unlike rule-based metrics can identify cases when a generally unfavourable property can be tolerated where the other parameters are close to ideal. In so doing the phenomenon of druglikeness is evolved from a binary ‘black and white’ assessment to a more realistic and gradated description of the continuum of compound quality.
A non-redundant data set comprising 771 approved drugs was derived from the ChEMBL DrugStore database47. The selected compounds were all (i) marketed drugs, (ii) classified as small molecular weight therapeutics (i.e. no nutritional supplements, diagnostic agents or biologics), (iii) of specified molecular structure, (iv) composed of at least six atoms, (v) dependent on a biological macromolecule for their mode of action (i.e. exclude chelators and buffers), (vi) orally administered, (vii) systemically absorbed (i.e. exclude compounds whose site of action is in the gastro-intestinal (GI) tract e.g. orlistat targets gastric lipase, acarbose targets enteric alpha glucosidase).
Physico-chemical properties were calculated using the Pipeline Pilot Chemistry Collection (version 18.104.22.1680) from Accelrys (San Diego, CA, USA). The properties calculated were Molecular Weight (MW), octanol-water partition coefficient (ALOGP) (using the atom-based method by Ghose and Crippen24), number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), molecular polar surface area (PSA), number of rotatable bonds (ROTB) and the number of aromatic rings (AROM)25, 26. Finally, a substructure search was performed against each drug using a curated reference set of 94 functional moieties that are potentially mutagenic, reactive or have unfavourable pharmacokinetic properties27. The number of matches for each compound was captured (ALERTS). We chose to omit the acid dissociation constant (pKa) as the available high-throughput computational approaches do not provide sufficient accuracy48.
Histograms were plotted reflecting the distribution of each of the 8 molecular properties for the oral drugs. For discrete variables (HBD, HBA, ROTB, AROM, ALERTS) a bin size of 1 was used. For continuous variables (MW, ALOGP and PSA) the optimal bin size D was estimated by optimising the cost function C(Δ)49 (Equation 5):
where k and v are the mean and variance of the occupancy of bins of size Δ respectively. For PSA a local minimum C(Δ) was used. A library of functions was fitted to the distributions using TableCurve 2D version 5.01 (Systat Software, CA, USA). Asymmetric Double Sigmoidal (ADS) functions (Equation 2) were found to be the most consistently high-ranking non-linear functions (Supplementary Table 1) and also reflected the important underlying asymmetry. Each function was then normalized by dividing by the maximum function value d(x)max to give a value between 0 and 1.
The individual desirability functions were combined into the QED by taking the geometric mean, which, by logarithmic identities, can be expressed as the exponent of the arithmetic mean of the logarithm transformed identities (Equations 1 and 3).
where QEDw is the weighted QED calculated with a set of weights w. Each possible combination of weights between 0 and 1 at increments of 0.25 were exhaustively enumerated for all 8 molecular descriptors, giving 58 (390,625) weight combinations of for each of the 771 drugs. The combination of weights giving the highest entropy gives QEDwmax (Table 1). Inspection of the ranked weight combinations revealed a “spike” of higher entropy values over the highest-scoring 1,000 combinations (Supplementary Figure 3a). The mean of each individual molecular property weight over these 1,000 highest ranked entropy scores gives the mean optimal weighted QEDwmo (Table 1). QEDwmo may more accurately sample the high entropy combinations whilst attenuating the quantized nature of the weight increments. The robustness of this procedure was established by assessing the relationship of individual descriptor weights to the ranked entropy scores compared to a randomized series (Supplementary Figure 3b).
PCA was performed on the 8 unweighted desirability functions calculated on the ChEMBL database (release ChEMBL09) using Pipeline Pilot’s R Statistics Component Collection (Supplementary Figure 3c).
The benchmarking assessment involves assignment of positive and negative compound sets. The DrugBank database32 was used to derive the positive set. 771 compounds having the word “oral” in their “Route of Administration” field were selected. Whilst we endeavoured to obtain a truly independent positive set for the benchmark inevitably significant overlap was found between the DrugBank set and the drugs used to derive QED. 554 of the 771 compounds were structurally identical and a further 30 had significant structural similarity (Tanimoto score > 0.8). Small molecule ligands from the Protein Data Bank’s (PDB’s) Ligand Dictionary50 was selected as the negative set as it provides a large and diverse source of chemical tools, metabolites, natural products, crystallographic buffers as well as drugs. To prevent ambiguity, 475 compounds were removed that had significant structural similarity to the positive set (Tanimoto score > 0.8), leaving a negative set of 10,250.
The following performance measures were used:
Where MCC = Mathews Correlation Coefficient, TP = True Positives, TN = True Negatives, FP = False Positives and FN = False Negatives.
The ChEMBL database includes a highly heterogeneous assortment of published bioactivity data. Bioactivity endpoints were only considered when (i) there was a defined protein molecular target, (ii) the activity type was either IC50, Ki or Kd, (iii) the relation was ‘=’, ‘<’ or ‘<=’, (iv) standard units were defined as ‘nM’ and (v) the activity was greater than 10-6 nM (largely to remove misannotations). A broad range of bioactivity values are typically reported for a given combination of target and ligand due to a combination of biological, technical and annotation errors. Selection of the “correct” value is non-trivial, particularly when using large-scale automated procedures. Simple calculation of a mean is sensitive to outliers. As such, for each combination of target and ligand we identified the modal log unit of bioactivity and calculated the mean value of activities within that range. Consideration of only human targets results in 167,045 unique compounds being associated with 1,729 proteins, giving 310,551 compound target pairs.
For each protein target the Tanimoto structural similarity of each associated compound to every other associated compound was calculated using Pipeline Pilot (FCFP_4 fingerprints) to give an all-against-all similarity matrix. Compound networks were derived from these matrices using the Python package NetworkX.
This research received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement N° 223461 and the Scottish Universities Life Sciences Alliance. We thank John Overington for providing the DrugStore data, Ruth Brenk for the provision of SMARTS for structural alerts and Ian Carruthers for assistance with DrugStore database queries. We also express our thanks to the chemistry community of AstraZeneca for participating in the chemistry survey.
Supplementary information is available online at XXXX. We have implemented QED as simple functions in Python, SQL (Structure Query Language), Accelrys Pipeline Pilot and Microsoft Excel, the codes for which are available in the Supplementary Information. The Microsoft Excel example also includes data on the 771 oral drugs used to derive the desirability functions. Pre-calculated QED values and desirability functions for 657,736 compounds from ChEMBL (release ChEMBL09) are also available.
A.L.H. conceived the approach and designed the algorithm. G.R.B. implemented the algorithm, performed the calculations, identified the functions and performed the analysis. G.V.P. developed the use of Shannon entropy as the weighting scheme. J.B. wrote the Pipeline Pilot implementation. S.M. coordinated the survey of AstraZeneca chemists. A.L.H and G.R.B. wrote the manuscript. G.R.B. produced the figures. All authors commented on the manuscript.