Existing drugs address a relatively narrow range of biological targets. As a result, libraries of drug-like molecules have proven ineffective against a variety of challenging targets, such as protein–protein interactions, nucleic acid complexes, and antibacterial modalities. In contrast, natural products are known to be effective at modulating such targets, and new libraries are being developed based on underrepresented scaffolds and regions of chemical space associated with natural products. This has led to several recent successes in identifying new chemical probes that address these challenging targets.
Genome sequencing efforts and an increasingly molecular level understanding of biological processes have uncovered myriad new biological targets of both fundamental and potential therapeutic interest. Pharmacological studies of these targets with small molecule ligands provide a powerful means to dissect their functions and to validate their therapeutic potential. Increasing access to high-throughput screening technologies in academia has facilitated the discovery of such molecules . However, several classes of targets have proven particularly challenging in these efforts, and some have even been labeled ‘undruggable’ as a result. Despite this moniker, the identification of small molecules to address these challenging targets is, at its core, a chemical problem in molecular recognition and, thus, one that begs a chemical solution.
In considering the limitations of chemical libraries in addressing challenging targets, it is important to recognize that the vast majority of these libraries are based on existing drugs. Leveraging these well-established structural classes has been viewed as a means to identify new small molecule ligands having desirable properties for subsequent drug development. However, a recent study by Overington and coworkers indicates that all current small molecule drugs address only 207 protein targets encoded in the human genome . Moreover, 50% of all drugs are focused on only four protein classes: rhodopsin-like G-protein coupled receptors, nuclear receptors, and voltage- and ligand-gated ion channels. Additional factors further narrow the range of chemical structures found in drug-like libraries, including convenient synthetic accessibility, physicochemical properties thought to be desirable for various reasons, and, in the case of proprietary industrial libraries, each company’s intellectual property position. Accordingly, it has been estimated that only ≈10–14% of the proteins encoded in the human genome are ‘druggable’ using existing drug-like molecules .
How, then, might one develop small molecules to address the many challenging, but biologically and therapeutically interesting, targets that lie beyond this small subset? Chemical space, the complete set of all possible small molecules, has been variously calculated to comprise 10^30–10^200 structures, depending on the parameters used (for one example, see ). Only a tiny fraction of this total space can be tested in a typical large screening campaign involving on the order of 10^6 molecules – even at the lower limit of the estimates for chemical space, this fraction (10^−24) is approximately equivalent to being able to test only one cell in one person out of the entire population of Earth! However, despite these seemingly daunting numbers, it seems likely that only a portion of total chemical space is relevant to biology, in that these molecules must be reasonably soluble and stable in aqueous media, and have appropriate structural features to bind to proteins and other biological targets with useful specificity; structural factors impacting cell permeability and pharmacokinetics impose further constraints upon molecules used in cellular and animal studies and beyond.
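As a rough check on this comparison, the sketch below recomputes the fraction of chemical space covered by a screen of about a million compounds and sets it beside the "one cell in one person" analogy. The world population (about 7 billion) and the number of cells in a human body (about 10^14) are order-of-magnitude assumptions that are not stated in the text.

```python
import math

# Figures taken from the text.
LIBRARY_SIZE = 10**6        # compounds in a typical large screening campaign
CHEM_SPACE_LOW = 10**30     # lower estimate of the size of chemical space
CHEM_SPACE_HIGH = 10**200   # upper estimate of the size of chemical space

# Assumptions (not from the text): order-of-magnitude values only.
WORLD_POPULATION = 7 * 10**9
CELLS_PER_PERSON = 10**14


def log10_fraction(numerator: int, denominator: int) -> float:
    """Return log10(numerator / denominator), working in log space."""
    return math.log10(numerator) - math.log10(denominator)


screened_low = log10_fraction(LIBRARY_SIZE, CHEM_SPACE_LOW)
screened_high = log10_fraction(LIBRARY_SIZE, CHEM_SPACE_HIGH)
one_cell = log10_fraction(1, WORLD_POPULATION * CELLS_PER_PERSON)

print(f"screened fraction (lower estimate): 10^{screened_low:.0f}")
print(f"screened fraction (upper estimate): 10^{screened_high:.0f}")
print(f"one cell among all people on Earth: 10^{one_cell:.1f}")
```

Under these assumptions the two quantities agree to within an order of magnitude at the lower estimate of chemical space, while the upper estimate makes the screened fraction smaller by a further 170 orders of magnitude.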
A recent study by Shoichet and coworkers provides important insights into how existing screening collections overcome this numerical problem and target biologically-relevant chemical space. In their analysis of Reymond and coworkers’ universe of synthetically accessible molecules with ≤11 heavy atoms (C, N, O, F), commercially available compounds and libraries exhibited much higher similarity to metabolites and natural products than did the complete set of all 26.4 million possible molecules. The authors conclude that the reason existing libraries are effective at all in identifying new small molecule ligands is that they are based, albeit largely unintentionally, on structures in naturally occurring molecules, which have coevolved with proteins that bind them. Indeed, the tremendous historical impact of natural products upon drug discovery is well established.
Thus, rather than sampling chemical space randomly to address challenging biological targets, there is considerable interest in developing new libraries based on other classes of molecules that are biologically validated, but remain underrepresented in current screening collections (for an example, see: ). Importantly, the Shoichet study indicates that 83% of natural product scaffolds and 20% of metabolite scaffolds (with ≤11 heavy atoms; the percentages are likely higher for larger molecules) are absent from commercially available collections . Accordingly, libraries based on specific, underrepresented scaffolds may address challenging targets by providing new pharmacophores and binding geometries.
In addition to specific underrepresented scaffolds, more general differences between biologically active natural products and existing synthetic drugs, based on structural and physicochemical properties, have also been described [10,11]. While large datasets are often used in such analyses, we favor the use of smaller datasets and commonly available software programs to provide increased accessibility to synthetic chemists while retaining robustness of the results. We have now updated our own previous analysis to include 40 top-selling small molecule drugs (39 of which are orally bioavailable), a collection of 60 diverse natural products (including the 24 identified by Ganesan as having led to an approved drug from 1970–2006), and 20 drug-like compounds from ChemBridge and ChemDiv (see Supplementary Information). Each compound was analyzed for 20 calculated structural and physicochemical parameters, and principal component analysis was then used to replot the data in a two-dimensional format representing 73% of the information in the full 20-dimensional dataset (Figure 1). The two unitless, orthogonal axes represent linear combinations of the original 20 parameters.
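To illustrate how such a projection works, the following is a minimal sketch of principal component analysis reduced to two axes. It is not the analysis reported here: the four descriptors and all compound values are made-up placeholders, and the eigenvectors are found by power iteration with deflation only to keep the example dependency-free, whereas the analysis described above used 20 parameters and commonly available software.

```python
import math


def standardize(rows):
    """Scale every descriptor column to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(z):
    """Sample covariance of centered data (a correlation matrix after scaling)."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(m, v):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigenpair(m, iters=500):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    v = [1.0 + 0.1 * i for i in range(len(m))]  # arbitrary non-zero start
    for _ in range(iters):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))
    return eigenvalue, v


def pca_2d(rows):
    """Return 2-D scores, the variance fraction they retain, and the loadings."""
    z = standardize(rows)
    cov = covariance(z)
    d = len(cov)
    lam1, v1 = top_eigenpair(cov)
    # Deflation: subtract the first component so the second can be found.
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    lam2, v2 = top_eigenpair(deflated)
    scores = [(sum(a * b for a, b in zip(r, v1)),
               sum(a * b for a, b in zip(r, v2))) for r in z]
    retained = (lam1 + lam2) / sum(cov[i][i] for i in range(d))
    return scores, retained, (v1, v2)


# Placeholder descriptors (NOT the paper's data): molecular weight, logP,
# number of stereocenters, number of aromatic rings.
compounds = {
    "drug-like A": [350, 3.5, 0, 3],
    "drug-like B": [420, 4.2, 1, 2],
    "drug-like C": [300, 2.8, 0, 2],
    "natural product A": [750, 0.5, 10, 0],
    "natural product B": [620, 1.2, 7, 1],
    "natural product C": [880, -0.5, 12, 0],
}

scores, retained, loadings = pca_2d(list(compounds.values()))
for name, (pc1, pc2) in zip(compounds, scores):
    print(f"{name:>18}  PC1={pc1:+.2f}  PC2={pc2:+.2f}")
print(f"variance retained by two components: {retained:.0%}")
```

The `retained` value plays the role of the 73% figure quoted for the full dataset, and the `loadings` show how much each original descriptor contributes to each axis. The sign of a principal component is arbitrary, so only the relative positions of the compounds are meaningful.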
Notably, the top-selling drugs cluster largely in one region of the plot, and the drug-like libraries overlap with this region. The few outlier drugs are natural products or derivatives, and these molecules, along with the 60 natural products, span a much broader range of chemical space. Analysis of component loadings indicates that, in general, the natural products in this analysis feature higher polarity/decreased hydrophobicity and higher molecular weights (to left on x-axis) and more stereochemical features and fewer aromatic rings (to bottom on y-axis) compared to synthetic drugs and drug-like libraries (see Supplementary Information). Interestingly, two subsets of rule-of-five  compliant and non-compliant natural products identified by Ganesan do cluster in distinct regions of our plot, although both subsets are equal in size (12 molecules) and have resulted in equal numbers of orally bioavailable drugs (7 each) . Thus, libraries that explore underrepresented regions of biologically-relevant chemical space with respect to structural and physicochemical parameters may also address challenging targets by providing, for example, larger binding surfaces, polarity/charge states, and functional groups that are often excluded from drug-like libraries.
To meet the need for new libraries to address challenging targets, many academic labs are developing libraries that are based on underrepresented scaffolds and that probe underrepresented regions of chemical space. In so doing, it is often appropriate to ignore, or at least avoid strict adherence to, various ‘rules’ that have been developed for drug-like libraries. First, many of these rules have been established with a view toward developing drugs that are orally bioavailable; of course, this is not a primary concern in academic screening, where new probes are needed to investigate fundamental biological processes in biochemical and cellular systems and, in the limit, to validate new targets in animal models. Second, natural products are often cited as being exempt from such rules, due to the influence of carrier-mediated and active transport; interestingly, however, it has recently been suggested that such transport may be much more common across all drugs than previously assumed  and, thus, the exclusion of natural products from rule definitions becomes less straightforward. Third, and perhaps most importantly, these rules are based on retrospective analysis of existing drugs and do not provide a blueprint on how to escape from the narrow range of targets addressed by such drugs in the future.
Herein, we discuss three classes of biological targets that have proven challenging to address using existing drug-like libraries. While there is certainly some overlap between these categories, they provide a useful framework for discussion. For each, we provide recent examples of natural products and novel molecules derived from academic libraries that successfully engage these targets.
Protein–protein interactions are pervasive in biology, but are classically challenging targets because they often involve large, flat binding interfaces comprised of non-contiguous amino acid residues, and lack cognate small-molecule binding partners [17,18]. As such, they have frequently proven problematic in screens of drug-like libraries . However, there is, a priori, no thermodynamic reason why such interactions should not be amenable to modulation with small molecules , and promising early results with Bcl-2 antagonists suggest that such molecules can, indeed, be translated to the clinic [21,22]. Three examples of recently identified small molecules that modulate protein–protein interaction targets are the natural products FR901464 and pladienolide B and the library-derived macrocycle robotnikinin (Figure 2).
The spliceosome is a macromolecular complex that orchestrates the process of mRNA splicing, in which non-coding intron sequences in pre-mRNA are excised to produce mature mRNA [23,24]. Aberrant splicing is associated with numerous genetic diseases, including cancer. Spliceosome function involves dynamic interactions of over a hundred proteins and five RNA components that form snRNP (small nuclear ribonucleoprotein) subunits. These snRNPs then undergo a dynamic process of assembly, rearrangement, and release in association with the pre-mRNA substrate, followed by two transesterification reactions that comprise the actual chemical splicing process. The microbial natural products FR901464 and pladienolide B, originally discovered as anticancer agents, were recently identified as spliceosome inhibitors by the Yoshida and Mizui groups [25,26]. In both cases, inhibition was attributed to binding to the SAP130 and/or SAP155 proteins in the U2 snRNP SF3b subcomplex, which coordinates pre-mRNA recognition and spliceosome assembly. While the individual functions of these proteins are still being elucidated, they are thought to be scaffolding proteins not directly associated with active site functions , suggesting that these compounds likely act by modulating protein–protein or other macromolecular interactions . Spliceostatin A, a methylated derivative of FR901464, was also shown to bind reversibly, despite the presence of the potentially reactive epoxide moiety, and caused aberrant nuclear release and translation of pre-mRNA, suggesting possible mechanisms for the selective antitumor activity of these compounds . Notably, E7107, an analog of pladienolide B, was also advanced to Phase I clinical trials .
The hedgehog signal transduction pathway is involved in embryonic development and cancer . Signaling is initiated by binding of the extracellular protein Sonic hedgehog (Shh) to a 12-transmembrane cell surface receptor Patched (Ptch1). Previously identified small molecule agonists and antagonists of this pathway act downstream of the Shh–Ptch1 interaction . Robotnikinin, a 12-membered macrocycle related to scaffolds found in many macrolide natural products, was recently developed as an inhibitor of this pathway, following screens of a 2070-membered macrocycle library by Peng, Schreiber, and coworkers [29,30]. This compound was shown to bind the active, 20 kDa N-terminal fragment of Shh and to disrupt the ability of Shh to relay its signal to Ptch1. While it is not yet known if robotnikinin acts by direct physical disruption of the Shh–Ptch1 interaction or by modulating interactions with other Shh-associated proteins, this work demonstrates that it is feasible to identify molecules that disrupt protein–protein interaction function using macrocycle libraries based on natural product scaffolds.
Nucleic acids are central to diverse biological processes, acting as substrates in the case of DNA and RNA, and as catalysts and regulators in the case of ribonucleoproteins and ribozymes. However, complexes involving nucleic acids are generally considered challenging targets for modulation with small molecules, due to the macromolecular nature of the interactions (protein–nucleic acid, nucleic acid–nucleic acid, and protein–protein), and the distinctive electrostatic features of nucleic acids [31,32]. Nevertheless, many natural products do, in fact, modulate such targets, including a variety of clinically approved drugs [33,34]. Two examples of recently identified small molecules that target nucleic acid complexes are the natural product avrainvillamide and a library-derived lactam carboxamide (Figure 3).
Nucleophosmin is a multifunctional protein that is implicated in numerous processes relevant to cell growth, proliferation, and transformation . Aberrant nucleophosmin function is associated with acute myeloid leukemia and anaplastic large-cell lymphomas . Due to its complex role in many different cellular activities, involving interactions with DNA, RNA, and other proteins, small molecules that modulate individual functions of nucleophosmin would be powerful pharmacological tools to dissect those functions. The fungal natural product avrainvillamide was originally discovered based on its anticancer activity . Recently, the antiproliferative effects of avrainvillamide have been attributed to its interaction with nucleophosmin by Myers and coworkers . Site-directed mutagenesis studies indicated that avrainvillamide binds nucleophosmin covalently at Cys-275. This site has been implicated in nucleic acid binding , suggesting that this natural product acts by directly targeting this protein–nucleic acid interaction.
The HOX family of transcription factors are master regulators of development, and aberrant expression is implicated in leukemia and other cancers . Recently, a lactam carboxamide inhibitor of a HOX transcription factor was discovered from a 400-membered library of carboxy-γ-lactams and carboxy-2-quinolones having stereochemically and functionally diverse substituents by Stadler, Shaw, and coworkers . Although the library was not explicitly inspired by natural products, a variety of highly substituted γ-lactam natural products are known, including lactacystin and salinosporamide (see also ). Screening of the library in 150 assays of protein–biopolymer interactions yielded the lactam carboxamide as an inhibitor of HOXA13 binding to DNA. The compound also inhibited transcriptional repression by HOXA13 in a luciferase reporter gene assay. This work demonstrates that cell-permeable inhibitors of transcription factor protein–DNA interactions can, indeed, be identified using novel chemical libraries.
The increasing incidence of antibacterial resistance poses a major threat to human health. Unfortunately, many pharmaceutical companies have abandoned their antibacterial discovery programs, and those that have continued them have found such targets to be difficult to address with existing libraries. At least part of this problem can be attributed to the relatively narrow range of chemical structures present in industrial libraries, resulting from the corresponding focus on a narrow range of human protein targets. In contrast, known antibacterials address a distinct set of bacterial targets, including non-protein targets, and exhibit significantly different structural and physicochemical properties compared to other classes of drugs. Accordingly, novel chemical libraries are needed to provide new starting points for antibacterial discovery. Two recent examples of new scaffolds that address antibacterial targets are the marine natural product abyssomicin C and the library-derived compound gemmacin (Figure 4).
Tetrahydrofolate (THF) is a metabolite involved in myriad biological processes shared by humans and bacteria, including DNA synthesis and repair. While inhibitors of THF biosynthesis have been used heavily to combat bacterial infections, many target biosynthetic steps that are also found in the mammalian host. In contrast, para-aminobenzoic acid biosynthesis is specific to bacterial THF biosynthesis and is not used in the human pathway. The natural product abyssomicin C was discovered from a marine actinomycete Verrucosispora as an inhibitor of para-aminobenzoic acid biosynthesis by Süssmuth and coworkers. This compound was proposed to act as a novel chorismate mimic, possibly involving covalent binding via a Michael acceptor moiety. Subsequently, total synthesis by Nicolaou and coworkers revealed that the isomer atrop-abyssomicin C was actually more active against methicillin-resistant Staphylococcus aureus, and reevaluation of the fermentation broth confirmed that this was, in fact, the major isomer produced. Further investigations by Süssmuth demonstrated that atrop-abyssomicin C binds the PabB subunit of 4-amino-4-deoxychorismate synthase covalently at Cys-263. Interestingly, the initial Michael addition to the enone may be followed by transannular attack of the resulting enol on the butenolide moiety, and this may lead to additional conformational changes impacting activity. It is worth noting that, despite the role of such electrophilic functionalities in many biologically important natural products, they are often excluded from drug-like libraries a priori.
Selective membrane disruption is one mechanism by which several known peptides exert their antimicrobial properties. Gemmacin was identified as a new antibacterial agent from a 242-membered library based on 18 natural product scaffolds by Spring and coworkers. This compound has broad-spectrum activity against Gram-positive bacteria, including methicillin-resistant S. aureus, and exhibits low toxicity against fungal and mammalian cells. To elucidate the mechanism of growth inhibition, gemmacin was assayed for common modes of antibacterial activity, including dihydrofolate reductase inhibition, protein synthesis inhibition, and ATP synthesis uncoupling. Detection of reactive oxygen species in a Spodoptera frugiperda cell line suggested membrane disruption as a likely candidate. Although this mode of action is shared by other molecules, the previous examples were peptidic and of considerably higher molecular weight. Notably, gemmacin is zwitterionic, a feature seldom found in drug-like libraries. This work emphasizes the utility of libraries with new scaffolds in identifying compounds to address drug-resistant bacteria.
Two facets of natural products discussed earlier are their specific scaffolds and their general structural and physicochemical properties. Along these lines, PubChem substructure searches for all three library-derived probes described above revealed that these scaffolds are, indeed, poorly represented in the NIH Molecular Libraries Small Molecule Repository (see Supplementary Information). Of the ≈342,000 compounds currently listed in the MLSMR (as of January 2010), the 12-membered ring core scaffold of robotnikinin was found only in the 249 members of the same library that was used to discover it, and only two 12-membered macrolactones were found. Substituted γ-lactams based on the lactam carboxamide scaffold were also underrepresented, with only 45 compounds having β,β-disubstitution and none having the β-carbonyl-β-aryl substituent pattern. Finally, the [3.2.1]bicyclic amine scaffold of gemmacin was found in only 30 compounds, almost all of which were N-substituted variants of a single cyclic imide scaffold, with no compounds having carbon substitution patterns similar to gemmacin.
In our principal component analysis, we were interested to find that all three of these new probes overlap with existing top-selling drugs (Figure 1). In contrast, the four natural products discussed above fall well outside of this region, overlapping with other natural products. This suggests that our current analysis of structural and physicochemical properties, while effective at identifying differences between synthetic drugs and natural products, is insufficient to discriminate between specific scaffolds that are common or underrepresented. This is, perhaps, not surprising, as the three-dimensional pharmacophores and binding geometries provided by these new scaffolds are difficult to incorporate into such parameter-based vector analyses. Thus, alternative computational approaches must be used to identify such scaffolds [6,50]. Nonetheless, it is important to note that two out of the three library-derived probes violate ‘rules’ that have been developed previously for orally bioavailable drugs and are often used to filter drug-like libraries (lactam carboxamide MW=589, XlogP=5.4, rotatable bonds=11; gemmacin MW=539, XlogP=5.3). This emphasizes the need to develop novel libraries that do not adhere strictly to these parameters in the search for molecules that can address challenging targets.
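The kind of filter that these two probes would fail can be sketched as follows. The cutoffs are the commonly cited ones (molecular weight ≤ 500 and calculated logP ≤ 5 from the rule of five, and ≤ 10 rotatable bonds from the Veber criteria); the text does not list them, so they are assumptions here, as are the `Probe` class and `violations` function, which are hypothetical names used for illustration only. Hydrogen-bond donor and acceptor counts are omitted because no values are given for them.

```python
from dataclasses import dataclass
from typing import List, Optional

# Commonly cited cutoffs; assumed here, since the text does not state them.
MAX_MW = 500.0             # rule of five: molecular weight
MAX_LOGP = 5.0             # rule of five: calculated logP
MAX_ROTATABLE_BONDS = 10   # Veber criterion


@dataclass
class Probe:
    """Hypothetical container for the descriptor values quoted in the text."""
    name: str
    mw: float
    xlogp: float
    rotatable_bonds: Optional[int] = None  # None when no value is reported


def violations(probe: Probe) -> List[str]:
    """List the cutoffs a probe exceeds, checking only the values available."""
    found = []
    if probe.mw > MAX_MW:
        found.append(f"MW {probe.mw:g} > {MAX_MW:g}")
    if probe.xlogp > MAX_LOGP:
        found.append(f"XlogP {probe.xlogp:g} > {MAX_LOGP:g}")
    if probe.rotatable_bonds is not None and probe.rotatable_bonds > MAX_ROTATABLE_BONDS:
        found.append(f"rotatable bonds {probe.rotatable_bonds} > {MAX_ROTATABLE_BONDS}")
    return found


# Values as quoted in the text for the two rule-violating probes.
probes = [
    Probe("lactam carboxamide", mw=589, xlogp=5.4, rotatable_bonds=11),
    Probe("gemmacin", mw=539, xlogp=5.3),
]

for probe in probes:
    print(f"{probe.name}: {'; '.join(violations(probe)) or 'no violations'}")
```

A library filtered strictly on these cutoffs would have discarded both compounds before screening, which is the practical point of the comparison.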
Efforts to identify small molecules that address challenging biological targets have met with increasing success using novel libraries based on natural products. Continued advances in chemical synthesis to enable efficient production of complex molecules, coupled with accessible computational tools to facilitate library design, should provide increased efficiency in this discovery process in the future. The ultimate goal of these ventures into underexploited regions of chemical space is to expand the range of ‘druggable’ targets, such that the identification of new ligands for currently challenging targets ultimately becomes routine. Success in this endeavor would have major positive impacts in chemical biology and drug discovery.
We thank Prof. Ganesan (University of Southampton) for stimulating discussions. D.S.T. is an Alfred P. Sloan Research Fellow. Financial support for our laboratory from the NIH (P41 GM076267), Starr Cancer Consortium, Tri-Institutional Stem Cell Initiative, William H. Goodwin and Alice Goodwin and the Commonwealth Foundation for Cancer Research, and MSKCC Experimental Therapeutics Center is gratefully acknowledged.