Genome sequencing efforts and an increasingly molecular-level understanding of biological processes have uncovered myriad new biological targets of both fundamental and potential therapeutic interest. Pharmacological studies of these targets with small molecule ligands provide a powerful means to dissect their functions and to validate their therapeutic potential. Increasing access to high-throughput screening technologies in academia has facilitated the discovery of such molecules [1]. However, several classes of targets have proven particularly challenging in these efforts, and some have even been labeled ‘undruggable’ as a result. Despite this moniker, the identification of small molecules to address these challenging targets is, at its core, a chemical problem in molecular recognition and, thus, one that calls for a chemical solution.
In considering the limitations of chemical libraries in addressing challenging targets, it is important to recognize that the vast majority of these libraries are based on existing drugs. Leveraging these well-established structural classes has been viewed as a means to identify new small molecule ligands having desirable properties for subsequent drug development. However, a recent study by Overington and coworkers indicates that all current small molecule drugs address only 207 protein targets encoded in the human genome [2]. Moreover, 50% of all drugs are focused on only four protein classes: rhodopsin-like G-protein coupled receptors, nuclear receptors, and voltage- and ligand-gated ion channels. Additional factors further narrow the range of chemical structures found in drug-like libraries, including convenient synthetic accessibility, physicochemical properties thought to be desirable for various reasons, and, in the case of proprietary industrial libraries, each company’s intellectual property position. Accordingly, it has been estimated that only ≈10–14% of the proteins encoded in the human genome are ‘druggable’ using existing drug-like molecules [3].
How, then, might one develop small molecules to address the many challenging, but biologically and therapeutically interesting, targets that lie beyond this small subset? Chemical space [4], the complete set of all possible small molecules, has been variously calculated to comprise at least 10³⁰ structures, depending on the parameters used (for one example, see [5]). Only a tiny fraction of this total space can be tested in a typical large screening campaign involving on the order of 10⁶ molecules. Even at the lower limit of the estimates for chemical space, this fraction (10⁻²⁴) is approximately equivalent to being able to test only one cell in one person out of the entire population of Earth! However, despite these seemingly daunting numbers, it is likely that only a portion of total chemical space is relevant to biology, in that these molecules must be reasonably soluble and stable in aqueous media, and have appropriate structural features to bind to proteins and other biological targets with useful specificity; structural factors impacting cell permeability and pharmacokinetics impose further constraints upon molecules used in cellular and animal studies and beyond.
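The arithmetic behind this comparison can be checked with a short sketch. The library size and the lower-limit estimate of chemical space are taken from the text; the number of cells per person and the world population are round order-of-magnitude values assumed here only for illustration.

```python
import math


def screened_fraction(library_size: float, space_size: float) -> float:
    """Fraction of chemical space covered by one screening campaign."""
    return library_size / space_size


# Figures taken from the text.
LIBRARY_SIZE = 1e6   # molecules in a typical large screening campaign
SPACE_LOWER = 1e30   # lower-limit estimate of the size of chemical space

# Assumed round numbers for the analogy (not from the text).
CELLS_PER_PERSON = 3e13
WORLD_POPULATION = 7e9

fraction = screened_fraction(LIBRARY_SIZE, SPACE_LOWER)
one_cell_on_earth = 1.0 / (CELLS_PER_PERSON * WORLD_POPULATION)

print(f"fraction of chemical space screened: {fraction:.1e}")
print(f"one cell among all people on Earth:  {one_cell_on_earth:.1e}")

# The two quantities should agree to within about one order of magnitude.
gap = abs(math.log10(fraction / one_cell_on_earth))
print(f"orders of magnitude apart: {gap:.1f}")
```

Under these assumed values the two fractions differ by less than one order of magnitude, which is the sense in which the comparison is "approximately equivalent".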
A recent study by Shoichet and coworkers provides important insights into how existing screening collections overcome this numerical problem and target biologically relevant chemical space [6]. In their analysis of Reymond and coworkers’ universe of synthetically accessible molecules with ≤11 heavy atoms (C, N, O, F) [7], commercially available compounds and libraries exhibited much higher similarity to metabolites and natural products than did the complete set of all 26.4 million possible molecules. The authors conclude that the reason existing libraries are effective at all in identifying new small molecule ligands is that they are based, albeit largely unintentionally, on structures in naturally occurring molecules, which have coevolved with proteins that bind them. Indeed, the tremendous historical impact of natural products upon drug discovery is well-established [8].
Thus, rather than sampling chemical space randomly to address challenging biological targets, there is considerable interest in developing new libraries based on other classes of molecules that are biologically validated, but remain underrepresented in current screening collections (for an example, see: [9]). Importantly, the Shoichet study indicates that 83% of natural product scaffolds and 20% of metabolite scaffolds (with ≤11 heavy atoms; the percentages are likely higher for larger molecules) are absent from commercially available collections [6]. Accordingly, libraries based on specific, underrepresented scaffolds may address challenging targets by providing new pharmacophores and binding geometries.
In addition to specific underrepresented scaffolds, more general differences between biologically active natural products and existing synthetic drugs, based on structural and physicochemical properties, have also been described [10]. While large datasets are often used in such analyses, we favor the use of smaller datasets and commonly available software programs to provide increased accessibility to synthetic chemists while retaining robustness of the results. We have now updated our own previous analysis [12] to include 40 top-selling small molecule drugs [13] (39 of which are orally bioavailable), a collection of 60 diverse natural products (including the 24 identified by Ganesan as having led to an approved drug from 1970–2006 [14]) and 20 drug-like compounds from ChemBridge and ChemDiv (see Supplementary Information). Each compound was analyzed for 20 calculated structural and physicochemical parameters, and principal component analysis was then used to replot the data in a two-dimensional format representing 73% of the information in the full 20-dimensional dataset (Figure 1). The two unitless, orthogonal axes represent linear combinations of the original 20 parameters.
Figure 1. Principal component analysis of 20 structural and physicochemical characteristics of 40 top-selling drugs (red circles), 60 natural products (blue triangles), including Ganesan’s rule-of-five compliant (pink filled) and non-compliant (blue filled) ...
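For readers who wish to reproduce this kind of projection, the following is a minimal sketch of principal component analysis using NumPy. It is not the software or the exact procedure used for Figure 1: the descriptor matrix below is random placeholder data with the same shape as the dataset (120 compounds by 20 parameters), and scaling each parameter to unit variance before the decomposition is an assumption made here.

```python
import numpy as np


def pca_2d(descriptors):
    """Project compounds onto the first two principal components.

    descriptors: array of shape (n_compounds, n_parameters).
    Returns (scores, loadings, explained), where scores has shape
    (n_compounds, 2), loadings has shape (2, n_parameters), and
    explained holds the fraction of total variance on each axis.
    """
    X = np.asarray(descriptors, dtype=float)
    # Center and scale each parameter (assumed preprocessing step).
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are orthonormal linear combinations of the parameters.
    _, singular_values, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:2]
    scores = Z @ loadings.T
    variance = singular_values ** 2
    explained = variance[:2] / variance.sum()
    return scores, loadings, explained


# Placeholder data only: 40 drugs + 60 natural products + 20 library
# compounds, each with 20 hypothetical descriptor values.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))

scores, loadings, explained = pca_2d(X)
print("coordinates for plotting:", scores.shape)
print("variance captured by the two axes:", float(explained.sum()))
```

The rows of `loadings` are the component loadings: inspecting which of the original parameters carry the largest weights on each axis is what allows the axes to be interpreted in chemical terms.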
Notably, the top-selling drugs cluster largely in one region of the plot, and the drug-like libraries overlap with this region. The few outlier drugs are natural products or derivatives, and these molecules, along with the 60 natural products, span a much broader range of chemical space. Analysis of component loadings indicates that, in general, the natural products in this analysis feature higher polarity/decreased hydrophobicity and higher molecular weights (to left on x-axis) and more stereochemical features and fewer aromatic rings (to bottom on y-axis) compared to synthetic drugs and drug-like libraries (see Supplementary Information). Interestingly, two subsets of rule-of-five [15] compliant and non-compliant natural products identified by Ganesan do cluster in distinct regions of our plot, although both subsets are equal in size (12 molecules) and have resulted in equal numbers of orally bioavailable drugs (7 each) [14]. Thus, libraries that explore underrepresented regions of biologically relevant chemical space with respect to structural and physicochemical parameters may also address challenging targets by providing, for example, larger binding surfaces, polarity/charge states, and functional groups that are often excluded from drug-like libraries.
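As a reminder of what rule-of-five compliance involves, the sketch below counts violations of the four commonly cited thresholds (molecular weight above 500, calculated logP above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The two example compounds are hypothetical descriptor sets, not molecules from the analysis, and the number of violations tolerated before a compound is called non-compliant is left as a parameter because the criterion used in [14] is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Calculated properties of one compound (values supplied by the user)."""
    mol_weight: float
    clogp: float
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.clogp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def is_compliant(d: Descriptors, max_violations: int = 0) -> bool:
    """True if the compound exceeds no more than max_violations thresholds."""
    return rule_of_five_violations(d) <= max_violations


# Hypothetical examples for illustration only.
drug_like = Descriptors(mol_weight=350.0, clogp=3.2,
                        h_bond_donors=2, h_bond_acceptors=5)
macrocycle_like = Descriptors(mol_weight=750.0, clogp=2.0,
                              h_bond_donors=6, h_bond_acceptors=14)

for name, d in [("drug_like", drug_like), ("macrocycle_like", macrocycle_like)]:
    print(name, rule_of_five_violations(d), is_compliant(d))
```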
To meet the need for new libraries to address challenging targets, many academic labs are developing libraries that are based on underrepresented scaffolds and that probe underrepresented regions of chemical space. In so doing, it is often appropriate to ignore, or at least avoid strict adherence to, various ‘rules’ that have been developed for drug-like libraries. First, many of these rules have been established with a view toward developing drugs that are orally bioavailable; of course, this is not a primary concern in academic screening, where new probes are needed to investigate fundamental biological processes in biochemical and cellular systems and, in the limit, to validate new targets in animal models. Second, natural products are often cited as being exempt from such rules, due to the influence of carrier-mediated and active transport; interestingly, however, it has recently been suggested that such transport may be much more common across all drugs than previously assumed [16] and, thus, the exclusion of natural products from rule definitions becomes less straightforward. Third, and perhaps most importantly, these rules are based on retrospective analysis of existing drugs and do not provide a blueprint on how to escape from the narrow range of targets addressed by such drugs in the future.
Herein, we discuss three classes of biological targets that have proven challenging to address using existing drug-like libraries. While there is certainly some overlap between these categories, they provide a useful framework for discussion. For each, we provide recent examples of natural products and novel molecules derived from academic libraries that successfully engage these targets.