The concept of druglikeness provides useful guidelines for early stage drug discovery [1, 2]. Analysis of the observed distribution of some key physicochemical properties of approved drugs, including molecular weight, hydrophobicity and polarity, reveals that they preferentially occupy a relatively narrow range of possible values [3]. Compounds that fall within this range are described as “druglike.” Note that this definition holds in the absence of any obvious structural similarity to an approved drug. It has been shown that preferential selection of druglike compounds increases the likelihood of surviving the well-documented high rates of attrition in drug discovery [4].
Druglikeness can be rationalized by considering how simple physicochemical properties affect molecular behavior in vivo, with particular respect to solubility, permeability, metabolic stability and transporter effects. Indeed, druglikeness is often used as a proxy for oral bioavailability. However, druglikeness is a broad composite descriptor that implicitly captures several criteria, with bioavailability amongst the most prominent.
In practical terms, druglikeness is most commonly assessed using rules, the original and best known of which is Lipinski’s Rule of Five (Ro5) [5]. The rule states that a compound is more likely to exhibit poor absorption or permeation when two or more of the following physicochemical criteria are fulfilled: the molecular weight (MW) is greater than 500 Da; the calculated logP (ClogP) is greater than 5; there are more than 5 hydrogen-bond donors; or the number of hydrogen-bond acceptors (nitrogen and oxygen atoms) is greater than 10. The rule does not apply to substrates of biological transporters or natural products. Aside from its predictive power, the widespread adoption of the Ro5 as a guideline for compound evaluation can also be attributed to the fact that it is conceptually simple and straightforward to implement.
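As an illustration of how little is needed to implement the rule, the following minimal sketch counts the Ro5 criteria met by a compound whose properties have already been calculated. The `Properties` container, the function names and the example values are hypothetical and are not taken from any particular toolkit; only the thresholds come from the rule as stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Properties:
    """Precomputed properties for one compound (hypothetical container)."""
    mw: float     # molecular weight in Da
    clogp: float  # calculated logP
    hbd: int      # number of hydrogen-bond donors
    hba: int      # number of hydrogen-bond acceptors (N and O atoms)


def ro5_violations(p: Properties) -> int:
    """Count how many of the four Ro5 criteria the compound fulfils."""
    return sum([
        p.mw > 500,
        p.clogp > 5,
        p.hbd > 5,
        p.hba > 10,
    ])


def ro5_flags_poor_absorption(p: Properties) -> bool:
    """The rule flags a compound when two or more criteria are fulfilled."""
    return ro5_violations(p) >= 2


if __name__ == "__main__":
    # Illustrative values only; these are not measurements of real compounds.
    small = Properties(mw=350.0, clogp=2.1, hbd=2, hba=5)
    large = Properties(mw=620.0, clogp=6.3, hbd=1, hba=8)
    print(ro5_violations(small), ro5_flags_poor_absorption(small))
    print(ro5_violations(large), ro5_flags_poor_absorption(large))
```

The output is a bare count or a yes/no flag, so it carries no information about how far a property lies from its threshold.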
Lipinski’s insight, that the great majority of orally absorbed drugs occupy a privileged area of molecular property space [5, 6], has resulted in greater awareness of the importance of molecular properties in determining oral bioavailability. The rule has inspired numerous refinements and investigations into the concept of druglikeness: a comprehensive review of the area is provided by Ursu et al. The rule of five is not without its critics [7], yet on closer inspection the objections tend to concern its qualitative nature or its focus on oral drug space, rather than druglike thinking per se.
Paradoxically, since the publication of Lipinski’s seminal paper [5] there appears to be a growing epidemic of what Hann has termed “molecular obesity” [8] amongst new pharmacological compounds (Supplementary Figure 1). Compounds with higher molecular weight and lipophilicity have a higher probability of attrition at each stage of clinical development [4, 9-11]. Thus, the inflation of physicochemical properties, which increases the risks associated with clinical development, may partly explain the decline in productivity of small molecule drug discovery over the past two decades [4]. However, the mean molecular properties of new pharmacological compounds are still considered Lipinski compliant, despite the fact that their property distributions are far from historical norms.
Whilst the Ro5 is predictive of oral bioavailability, 16% of oral drugs violate at least one of the criteria and 6% fail two or more (although this does include natural products and substrates of transporters) (Supplementary Figure 2a and Supplementary Table 1). Notably, high-profile drugs such as atorvastatin (Lipitor) and montelukast (Singulair) fail more than one of the Lipinski rules (Supplementary Figure 2b). Despite Lipinski’s recommendation that the rule be considered as a guideline, in reality it is routinely used to filter libraries of compounds. The implementation of rules as filters means that no discrimination is achieved beyond a qualitative pass or fail: all compounds that comply with the rules are considered equal, as are all that breach them.
The response to such issues is not to define more refined rules. Instead, methods to quantify druglikeness are required [12-14]. However, scoring schemes proposed to date, often derived by machine learning methods, have lacked the intuitiveness, transparency and ease of implementation of the Ro5. To quantify compound quality we apply the concept of desirability [15] to provide a quantitative metric for assessing druglikeness that we call QED (Quantitative Estimate of Druglikeness). QED values can range between zero (all properties unfavourable) and one (all properties favourable). The desirability approach can be used to generate functions to describe any set of compounds depending on requirements. Here we will demonstrate the utility of the approach by describing desirability functions derived from a set of orally absorbed approved drugs.
Desirability provides a simple yet powerful approach to multi-criteria optimization. It is finding increasing utility in a number of applications in drug discovery, including compound selection [16], library design [17, 18], molecular target prioritisation, central nervous system penetration [19] and estimating the reliability of screening data [20]. The concept was originally introduced by Harrington [15] in the area of process engineering and further refined by Derringer [21]. Desirability takes multiple numeric or categorical parameters measured on different scales and describes each by an individual desirability function. These are then integrated into a single dimensionless score. In the case of compounds, a series of desirability functions (d) are derived, each corresponding to a different molecular descriptor. Combining the individual desirability functions into the QED is achieved by taking the geometric mean of the individual functions, as shown in Equation 1:

\[ \mathrm{QED} = \exp\left( \frac{1}{n} \sum_{i=1}^{n} \ln d_i \right) \qquad (1) \]

where d_i is the desirability of the i-th molecular descriptor and n is the number of descriptors.
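The geometric-mean combination described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in this work: the function name, the input validation and the example values are assumptions made for the sketch. The mean is computed in log space, which is numerically equivalent to taking the n-th root of the product of the desirabilities.

```python
import math
from typing import Sequence


def qed_from_desirabilities(d: Sequence[float]) -> float:
    """Geometric mean of individual desirability values, each in [0, 1]."""
    if not d:
        raise ValueError("at least one desirability value is required")
    if any(v < 0.0 or v > 1.0 for v in d):
        raise ValueError("desirability values must lie between 0 and 1")
    # log(0) is undefined; a zero desirability forces the geometric mean to 0.
    if any(v == 0.0 for v in d):
        return 0.0
    return math.exp(sum(math.log(v) for v in d) / len(d))


if __name__ == "__main__":
    # Hypothetical desirabilities for eight descriptors (illustrative only).
    example = [0.9, 0.8, 0.95, 0.7, 0.85, 0.6, 0.9, 1.0]
    print(round(qed_from_desirabilities(example), 3))
```

One property of the geometric mean is worth noting: a single very low desirability pulls the combined score down sharply, which an arithmetic mean would not do.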
Conventionally, desirability functions are defined arbitrarily, usually as monotonically decreasing or increasing functions, or as “hump” functions with defined parameter ranges and inflection points. Importantly, whereas previous approaches have used functions defined by user experience and expectations [16, 19], our approach differs fundamentally in that the functions are derived empirically by describing the underlying property distributions of a set of approved drugs, much as the boundaries defined by Lipinski were. The data used comprise a carefully curated collection of 771 orally dosed approved drugs. Eight widely used molecular properties were selected on the basis of published precedent for their relevance in determining druglikeness [3, 5, 22, 23]: molecular weight (MW), octanol-water partition coefficient (ALOGP) [24], number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), molecular polar surface area (PSA), number of rotatable bonds (ROTB), number of aromatic rings (AROM) [25, 26] and number of structural alerts (ALERTS) [27]. The molecular properties were chosen on the basis that they have all been shown to influence the likelihood of attrition and can all be calculated robustly at high throughput. Histograms showing the distribution of the eight molecular properties across the set of oral drugs are shown in the accompanying figure. We found that the property distribution data are consistently best modelled as asymmetric double sigmoidal (ADS) functions, which are also shown in the figure over the same range. The general ADS function is shown in Equation 2, where d(x) is the desirability function for molecular descriptor x.
The parameters (a, b, c, d, e) for each of the ADS functions, such as d_MW, are shown in Supplementary Table 2, as are the R² values and the rank amongst a library of non-linear functions.
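To make the shape of an ADS function concrete, the sketch below evaluates one commonly used form of an asymmetric double sigmoid: a rising sigmoid multiplied by a falling sigmoid, each with its own steepness. Equation 2 itself is not reproduced in this text, so the exact functional form here is an assumption. In particular, the sketch uses a sixth parameter `f` for the steepness of the falling flank, which does not appear in the parameter list above. The parameter values are invented for illustration and are not those of Supplementary Table 2, and no normalisation to a maximum of one is applied.

```python
import math


def ads(x: float, a: float, b: float, c: float,
        d: float, e: float, f: float) -> float:
    """Asymmetric double sigmoid (assumed six-parameter form).

    a: baseline offset, b: amplitude, c: centre, d: width of the plateau,
    e: steepness of the rising flank, f: steepness of the falling flank.
    """
    rising = 1.0 / (1.0 + math.exp(-(x - c + d / 2.0) / e))
    falling = 1.0 - 1.0 / (1.0 + math.exp(-(x - c - d / 2.0) / f))
    return a + b * rising * falling


if __name__ == "__main__":
    # Hypothetical parameters for a molecular-weight-like descriptor.
    params = dict(a=0.0, b=1.0, c=300.0, d=200.0, e=30.0, f=60.0)
    for mw in (0.0, 150.0, 300.0, 450.0, 800.0):
        print(mw, round(ads(mw, **params), 4))
```

Because `e` and `f` differ, the curve falls away more slowly on the high side of the centre than it rises on the low side, which is the kind of asymmetry a skewed property histogram requires.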
Figure: Histograms of eight selected molecular properties for a set of 771 orally absorbed small molecule drugs.