The concept of druglikeness provides useful guidelines for early-stage drug discovery [1, 2]. Analysis of the observed distribution of some key physicochemical properties of approved drugs, including molecular weight, hydrophobicity and polarity, reveals that they preferentially occupy a relatively narrow range of possible values [3]. Compounds that fall within this range are described as “druglike.” Note that this definition holds in the absence of any obvious structural similarity to an approved drug. It has been shown that preferential selection of druglike compounds increases the likelihood of surviving the well-documented high rates of attrition in drug discovery [4].
Druglikeness can be rationalized by considering how simple physicochemical properties affect molecular behavior in vivo, particularly with respect to solubility, permeability, metabolic stability and transporter effects. Indeed, druglikeness is often used as a proxy for oral bioavailability. However, druglikeness is a broad composite descriptor that implicitly captures several criteria, of which bioavailability is amongst the most prominent.
In practical terms, druglikeness is most commonly assessed by means of rules, the original and best known of which is Lipinski’s Rule of Five (Ro5) [5]. The rule states that a compound is more likely to exhibit poor absorption or permeation when two or more of the following physicochemical criteria are fulfilled: the molecular weight (MW) is greater than 500 Da; the calculated logP (ClogP) is greater than 5; there are more than 5 hydrogen-bond donors; or the number of hydrogen-bond acceptors (nitrogen and oxygen atoms) is greater than 10. The rule does not apply to substrates of biological transporters or natural products. Aside from its predictive power, the widespread adoption of the Ro5 as a guideline for compound evaluation can also be attributed to the fact that it is conceptually simple and straightforward to implement.
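The rule as stated above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Properties` container, the function names and the example property values are all hypothetical, and the properties are assumed to have been calculated elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Properties:
    """Pre-computed properties of one compound (hypothetical container)."""
    mw: float     # molecular weight in Da
    clogp: float  # calculated logP
    hbd: int      # number of hydrogen-bond donors
    hba: int      # number of hydrogen-bond acceptors (N and O atoms)


def ro5_violations(p: Properties) -> int:
    """Count how many of the four Ro5 criteria the compound breaches."""
    return sum([
        p.mw > 500.0,
        p.clogp > 5.0,
        p.hbd > 5,
        p.hba > 10,
    ])


def ro5_flags_poor_absorption(p: Properties) -> bool:
    """Flag the compound when two or more criteria are breached."""
    return ro5_violations(p) >= 2


# Illustrative values only; these are not measurements of real compounds.
small = Properties(mw=350.0, clogp=2.5, hbd=2, hba=5)
large = Properties(mw=600.0, clogp=6.0, hbd=2, hba=5)
print(ro5_violations(small), ro5_flags_poor_absorption(small))
print(ro5_violations(large), ro5_flags_poor_absorption(large))
```

Returning the violation count alongside the boolean flag keeps slightly more information than a bare pass or fail, although it remains a coarse measure.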
Lipinski’s insight, that the great majority of orally absorbed drugs occupy a privileged area of molecular property space [5, 6], has resulted in greater awareness of the importance of molecular properties in determining oral bioavailability. The rule has inspired numerous refinements and investigations into the concept of druglikeness: a comprehensive review of the area is provided by Ursu et al. [2]. The rule of five is not without its critics [7], yet on closer inspection the criticisms tend to concern its qualitative nature or its focus on oral drug space, rather than druglike thinking per se.
Paradoxically, since the publication of Lipinski’s seminal paper [5], there appears to be a growing epidemic of what Hann has termed “molecular obesity” [8] amongst new pharmacological compounds (Supplementary Figure 1). Compounds with higher molecular weight and lipophilicity have a higher probability of attrition at each stage of clinical development [4, 9-11]. Thus, the inflation of physicochemical properties, which increases the risks associated with clinical development, may partly explain the decline in productivity of small-molecule drug discovery over the past two decades [4]. However, the mean molecular properties of new pharmacological compounds are still considered Lipinski compliant, despite the fact that their property distributions are far from historical norms.
Whilst the Ro5 is predictive of oral bioavailability, 16% of oral drugs violate at least one of the criteria and 6% fail two or more (although this does include natural products and substrates of transporters) (Supplementary Figure 2a and Supplementary Table 1). Notably, high-profile drugs such as atorvastatin (Lipitor) and montelukast (Singulair) fail more than one of the Lipinski criteria (Supplementary Figure 2b). Despite Lipinski’s recommendation that the rule be considered as a guideline, in reality it is routinely used to filter libraries of compounds. The implementation of rules as filters means that no discrimination is achieved beyond a qualitative pass or fail: all compounds that comply with the rules are considered equal, as are all that breach them.
The response to such issues is not to define more refined rules. Instead, methods to quantify druglikeness are required [12-14]. However, scoring schemes proposed to date, often derived by machine learning methods, have lacked the intuitiveness, transparency and ease of implementation of the Ro5. To quantify compound quality, we apply the concept of desirability [15] to provide a quantitative metric for assessing druglikeness that we call QED (Quantitative Estimate of Druglikeness). QED values can range between zero (all properties unfavourable) and one (all properties favourable). The desirability approach can be used to generate functions to describe any set of compounds depending on requirements. Here we demonstrate the utility of the approach by describing desirability functions derived from a set of orally absorbed approved drugs.
Desirability provides a simple yet powerful approach to multi-criteria optimization. It is finding increasing utility in a number of applications in drug discovery, including compound selection [16], library design [17, 18], molecular target prioritisation, central nervous system penetration [19] and estimating the reliability of screening data [20]. The concept was originally introduced by Harrington [15] in the area of process engineering and further refined by Derringer [21]. Desirability takes multiple numeric or categorical parameters measured on different scales and describes each by an individual desirability function. These are then integrated into a single dimensionless score. In the case of compounds, a series of desirability functions (d) are derived, each corresponding to a different molecular descriptor. The individual desirability functions are combined into the QED by taking their geometric mean, as shown in Equation 1.
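The geometric-mean combination can be sketched as follows, computed in log space as QED = exp((1/n) Σ ln d_i). This is a minimal sketch under stated assumptions: the function name is hypothetical, each d_i is assumed to already lie in the interval [0, 1], and returning zero when any d_i is zero is a choice made here to avoid taking the logarithm of zero.

```python
import math
from typing import Sequence


def qed_from_desirabilities(d_values: Sequence[float]) -> float:
    """Geometric mean of individual desirability values, each in [0, 1]."""
    if not d_values:
        raise ValueError("at least one desirability value is required")
    if any(d < 0.0 or d > 1.0 for d in d_values):
        raise ValueError("desirability values must lie in [0, 1]")
    # Assumption: a single zero desirability drives the whole score to zero,
    # which is what a geometric mean gives and avoids evaluating log(0).
    if any(d == 0.0 for d in d_values):
        return 0.0
    mean_log = sum(math.log(d) for d in d_values) / len(d_values)
    return math.exp(mean_log)


# Illustrative desirability values for eight descriptors (not real data).
example = [0.9, 0.8, 0.95, 0.7, 0.85, 0.6, 0.9, 1.0]
print(round(qed_from_desirabilities(example), 3))
```

Because the mean is geometric rather than arithmetic, one very unfavourable property pulls the overall score down more strongly than it would in a simple average.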
Conventionally, desirability functions are defined arbitrarily, usually as monotonic decreasing or increasing functions, or “hump” functions at defined parameter ranges and inflection points. Importantly, whereas previous approaches have used functions defined by user experiences and expectations [16, 19], our approach differs fundamentally in that the functions are derived empirically by describing the underlying property distributions of a set of approved drugs, much as the boundaries defined by Lipinski were. The data set comprises a carefully curated collection of 771 orally dosed approved drugs. Eight widely used molecular properties were selected on the basis of published precedent for their relevance in determining druglikeness [3, 5, 22, 23]: molecular weight (MW), octanol-water partition coefficient (ALOGP) [24], number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), molecular polar surface area (PSA), number of rotatable bonds (ROTB), number of aromatic rings (AROM) [25, 26] and number of structural alerts (ALERTS) [27]. The molecular properties were chosen on the basis that they have all been shown to influence the likelihood of attrition and can all be calculated robustly at high throughput. Histograms showing the distribution of the eight molecular properties across the set of oral drugs are shown in the accompanying figure. We found that the property distribution data are consistently best modelled as asymmetric double sigmoidal (ADS) functions, which are shown in the same figure over the same range. The general ADS function is shown in Equation 2, where d(x) is the desirability function for molecular descriptor x.
The parameters (a, b, c, d, e and f) for each of the ADS functions d_MW, d_ALOGP, d_HBD, d_HBA, d_PSA, d_ROTB, d_AROM and d_ALERTS are shown in Supplementary Table 2, as are the R² values and the rank amongst a library of non-linear functions.
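As a minimal sketch of how a six-parameter ADS desirability function might be evaluated, the code below assumes the form d(x) = a + b / (1 + exp(-(x - c + d/2) / e)) × (1 - 1 / (1 + exp(-(x - c - d/2) / f))). Both this form and the parameter values are assumptions made for illustration; the values are hypothetical and are not those of Supplementary Table 2.

```python
import math


def ads(x: float, a: float, b: float, c: float,
        d: float, e: float, f: float) -> float:
    """Asymmetric double sigmoidal function (assumed form, see lead-in).

    a: baseline offset, b: amplitude, c: centre, d: width of the plateau,
    e and f: steepness of the rising and falling flanks respectively.
    """
    rising = 1.0 + math.exp(-(x - c + d / 2.0) / e)
    falling = 1.0 + math.exp(-(x - c - d / 2.0) / f)
    return a + (b / rising) * (1.0 - 1.0 / falling)


# Hypothetical parameters giving a hump centred near x = 300, with a
# sharper rise (e = 40) than fall (f = 70). Not fitted to any data set.
params = dict(a=0.0, b=1.0, c=300.0, d=100.0, e=40.0, f=70.0)
for x in (100.0, 300.0, 500.0, 700.0):
    print(x, round(ads(x, **params), 4))
```

Using different values for e and f is what makes the curve asymmetric, which allows a skewed property distribution to be described with a single function. A function fitted to raw histogram counts would presumably need rescaling by its maximum before being used as a desirability in the interval [0, 1]; that step is left out here.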