The Library of Integrated Network-based Cellular Signatures (LINCS) project is a large-scale coordinated effort to build a comprehensive systems biology reference resource. The goals of the program include the generation of a very large multidimensional data matrix and informatics and computational tools to integrate, analyze, and make the data readily accessible. LINCS data include genome-wide transcriptional signatures, biochemical protein binding profiles, cellular phenotypic response profiles and various other datasets for a wide range of cell model systems and molecular and genetic perturbations. Here we present a partial survey of this data facilitated by data standards and in particular a robust compound standardization workflow; we integrated several types of LINCS signatures and analyzed the results with a focus on mechanism of action (MoA) and chemical compounds. We illustrate how kinase targets can be related to disease models and relevant drugs. We identified some fundamental trends that appear to link Kinome binding profiles and transcriptional signatures to chemical information and biochemical binding profiles to transcriptional responses independent of chemical similarity. To fill gaps in the datasets we developed and applied predictive models. The results can be interpreted at the systems level as demonstrated based on a large number of signaling pathways. We can identify clear global relationships, suggesting robustness of cellular responses to chemical perturbation. Overall, the results suggest that chemical similarity is a useful measure at the systems level, which would support phenotypic drug optimization efforts. With this study we demonstrate the potential of such integrated analysis approaches and suggest prioritizing further experiments to fill the gaps in the current data.
systems-biology; data integration; drug profiling; chemical similarity; kinome profiles; transcriptional signatures
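The abstract above does not say which chemical similarity measure was used. As a minimal sketch, assuming compounds are represented as sets of "on" fingerprint bit positions, the widely used Tanimoto coefficient can be computed as follows. The function name and the example fingerprints are hypothetical and are not taken from the study.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Two empty fingerprints: treated as dissimilar here (an assumption).
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fingerprints given as on-bit positions, for illustration only.
compound_x = {1, 4, 7, 9}
compound_y = {1, 4, 8, 9, 12}
print(tanimoto(compound_x, compound_y))
```

A value of 1.0 means identical bit sets and 0.0 means no shared bits. Real workflows would typically derive the fingerprints with a cheminformatics toolkit.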
Depending on endoplasmic reticulum (ER) stress levels, the ER transmembrane multi-domain protein IRE1α promotes either adaptation or apoptosis. Unfolded ER proteins cause IRE1α lumenal domain homo-oligomerization, inducing trans auto-phosphorylation that further drives homo-oligomerization of its cytosolic kinase/endoribonuclease (RNase) domains to activate mRNA splicing of the adaptive XBP1 transcription factor. However, under high/chronic ER stress, IRE1α surpasses an oligomerization threshold that expands the RNase substrate repertoire to many ER-localized mRNAs, leading to apoptosis. To modulate these effects, we developed ATP-competitive IRE1α Kinase Inhibiting RNase Attenuators (KIRAs) that allosterically inhibit IRE1α’s RNase by breaking oligomers. One optimized KIRA, KIRA6, inhibits IRE1α in vivo and promotes cell survival under ER stress. Intravitreally, KIRA6 preserves photoreceptor functional viability in rat models of ER stress-induced retinal degeneration. Systemically, KIRA6 preserves pancreatic β-cells, increases insulin, and reduces hyperglycemia in Akita diabetic mice. Thus, IRE1α powerfully controls cell fate, but can itself be controlled with small molecules to reduce cell degeneration.
Motivation: Novel tools need to be developed to help scientists analyze large amounts of available screening data with the goal to identify entry points for the development of novel chemical probes and drugs. As the largest class of drug targets, G protein-coupled receptors (GPCRs) remain of particular interest and are pursued by numerous academic and industrial research projects.
Results: We report the first GPCR ontology to facilitate integration and aggregation of GPCR-targeting drugs and demonstrate its application to classify and analyze a large subset of the PubChem database. The GPCR ontology, based on the previously reported BioAssay Ontology, depicts available pharmacological, biochemical and physiological profiles of GPCRs and their ligands. The novelty of the GPCR ontology lies in the use of diverse experimental datasets linked by a model to formally define these concepts. Using a reasoning system, the GPCR ontology offers potential for knowledge-based classification of individuals (such as small molecules) as a function of the data.
Availability: The GPCR ontology is available at http://www.bioassayontology.org/bao_gpcr and the National Center for Biomedical Ontologies Web site.
Supplementary data are available at Bioinformatics online.
Bioinformatics and computer aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to be more precisely annotated in order to be useful to software methods. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing, and a simplified user interface designed to help scientists curate their data with minimum effort. We have carried out this work based on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype for which annotation of bioassay text within the domain of the training set can be accomplished very quickly. Well-trained annotations require single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface, and subsequently used to improve the underlying models. By drastically reducing the time required for scientists to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated and useful searching and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers.
Bioassay; Ontology; Machine learning; Natural language processing; Bayesian; Semantic curation
Cell lines have been widely used in biomedical research. The community-based Cell Line Ontology (CLO) is a member of the OBO Foundry library that covers the domain of cell lines. Since its publication two years ago, significant updates have been made, including new groups joining the CLO consortium, new cell line cells, upper level alignment with the Cell Ontology (CL) and the Ontology for Biomedical Investigations (OBI), and logical extensions.
Construction and content
Collaboration among the CLO, CL, and OBI has established consensus definitions of cell line-specific terms such as ‘cell line’, ‘cell line cell’, ‘cell line culturing’, and ‘mortal’ vs. ‘immortal cell line cell’. A cell line is a genetically stable cultured cell population that contains individual cell line cells. The hierarchical structure of the CLO is built based on the hierarchy of the in vivo cell types defined in CL and tissue types (from which cell line cells are derived) defined in the UBERON cross-species anatomy ontology. The new hierarchical structure makes it easier to browse, query, and perform automated classification. We have recently added classes representing more than 2,000 cell line cells from the RIKEN BRC Cell Bank to CLO. Overall, the CLO now contains ~38,000 classes of specific cell line cells derived from over 200 in vivo cell types from various organisms.
Utility and discussion
The CLO has been applied to different biomedical research studies. Example case studies include annotation and analysis of EBI ArrayExpress data, bioassays, and host-vaccine/pathogen interaction. CLO’s utility goes beyond a catalogue of cell line types. The alignment of the CLO with related ontologies combined with the use of ontological reasoners will support sophisticated inferencing to advance translational informatics development.
Cell line; Cell line cell; Immortal cell line cell; Mortal cell line cell; Cell line cell culturing; Anatomy
The lack of established standards to describe and annotate biological assays and screening outcomes in the domain of drug and chemical probe discovery severely limits the use of public and proprietary drug screening data to their maximum potential. We have created the BioAssay Ontology (BAO) project (http://bioassayontology.org) to develop common reference metadata terms and definitions required for describing relevant information of low- and high-throughput drug and probe screening assays and results. The main objectives of BAO are to enable effective integration, aggregation, retrieval, and analyses of drug screening data. Since we first released BAO on the BioPortal in 2010, we have considerably expanded and enhanced BAO and have applied the ontology in several internal and external collaborative projects, for example the BioAssay Research Database (BARD). We describe the evolution of BAO with a design that enables modeling complex assays including profile and panel assays such as those in the Library of Integrated Network-based Cellular Signatures (LINCS). One of the critical questions in evolving BAO is the following: how can we efficiently reuse and share specific parts of our ontologies among various research projects without violating the integrity of the ontology and without creating redundancies? This paper provides a comprehensive answer to this question with a description of a methodology for ontology modularization using a layered architecture. Our modularization approach defines several distinct BAO components and separates internal from external modules and domain-level from structural components. This approach facilitates the generation/extraction of derived ontologies (or perspectives) that can suit particular use cases or software applications.
We describe the evolution of BAO related to its formal structures, engineering approaches, and content to enable modeling of complex assays and integration with other ontologies and datasets.
A fundamental impediment to functional recovery from spinal cord injury (SCI) and traumatic brain injury is the lack of sufficient axonal regeneration in the adult central nervous system. There is thus a need to develop agents that can stimulate axon growth to re-establish severed connections. Given the critical role played by protein kinases in regulating axon growth and the potential for pharmacological intervention, small molecule protein kinase inhibitors present a promising therapeutic strategy. Here, we report a robust cell-based phenotypic assay, utilizing primary rat hippocampal neurons, for identifying small molecule kinase inhibitors that promote neurite growth. The assay is highly reliable and suitable for medium throughput screening, as indicated by its Z′-factor of 0.73. A focused structurally diverse library of protein kinase inhibitors was screened, revealing several compound groups with the ability to strongly and consistently promote neurite growth. The best performing bioassay hit robustly and consistently promoted axon growth in a postnatal cortical slice culture assay. This study can serve as a jumping-off point for structure activity relationship (SAR) and other drug discovery approaches towards the development of drugs for treating SCI and related neurological pathologies.
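The Z′-factor quoted above is the standard assay-quality statistic, Z′ = 1 − 3(σ_pos + σ_neg) / |μ_pos − μ_neg|, computed from positive and negative control wells. The sketch below shows the calculation only; the control readouts are invented for illustration and are not data from the study, and the function name is an assumption.

```python
from statistics import mean, stdev


def z_prime(positive, negative):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("control means are identical; Z' is undefined")
    # stdev() is the sample standard deviation (n - 1 denominator).
    return 1.0 - 3.0 * (stdev(positive) + stdev(negative)) / separation


# Hypothetical control-well readouts (e.g. neurite length per well).
positive_controls = [98.0, 102.0, 100.0, 101.0, 99.0]
negative_controls = [50.0, 52.0, 48.0, 51.0, 49.0]
print(round(z_prime(positive_controls, negative_controls), 3))
```

Values above roughly 0.5 are conventionally read as a well-separated, screening-ready assay, which is consistent with the reported 0.73.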
Large corpora of kinase small molecule inhibitor data are accessible to public sector research from thousands of journal articles and patent publications. These data have been generated employing a wide variety of assay methodologies and experimental procedures by numerous laboratories. Here we ask how applicable these heterogeneous datasets are for predicting kinase activities and which characteristics of the datasets contribute to their utility. We accessed almost 500,000 molecules from the Kinase Knowledge Base (KKB) and after rigorous aggregation and standardization generated over 180 distinct datasets covering all major groups of the human Kinome. To assess the value of the datasets we generated hundreds of classification and regression models. Their rigorous cross-validation and characterization demonstrated highly predictive classification and quantitative models for the majority of kinase targets if a minimum required number of active compounds or structure-activity data points were available. We then applied the best classifiers to compounds most recently profiled in the NIH Library of Integrated Network-based Cellular Signatures (LINCS) program and found good agreement of profiling results with predicted activities. Our results indicate that, although heterogeneous in nature, the publicly accessible datasets are exceedingly valuable and well suited to develop highly accurate predictors for practical Kinome-wide virtual screening applications and to complement experimental kinase profiling.
Sphingosine 1-phosphate (S1P) is a lysophospholipid signaling molecule that regulates important biological functions, including lymphocyte trafficking and vascular development, by activating G protein-coupled receptors for S1P, namely S1P1 through S1P5. Here we map the S1P3 binding pocket with a novel allosteric agonist (CYM-5541), an orthosteric agonist (S1P), and a novel bitopic antagonist (SPM-242). With a combination of site-directed mutagenesis, ligand competition assay, and molecular modeling, we concluded that S1P and CYM-5541 occupy different chemical spaces in the ligand binding pocket of S1P3. CYM-5541 allowed us to identify an allosteric site where Phe263 is a key gate-keeper residue for its affinity and efficacy. This ligand lacks a polar moiety and the novel allosteric hydrophobic pocket permits S1P3 selectivity of CYM-5541 within the highly similar S1P receptor family. On the other hand, a novel S1P3-selective antagonist, SPM-242, in the S1P3 pocket occupies the ligand binding spaces of both S1P and CYM-5541, showing its bitopic mode of binding. Therefore, our coordinated approach with biochemical data and molecular modeling, based on our recently published S1P1 crystal structure data in a highly conserved set of related receptors with a shared ligand, provides a strong basis for the successful optimization of orthosteric, allosteric, and bitopic modulators of S1P3.
The identification of a new class of potent and selective ROCK-II inhibitors is presented. Compound 5 (SR-3677) had an IC50 of ~3 nM in enzyme and cell-based assays, had an off-target hit rate of 1.4% against 353 kinases, and inhibited only 3 out of 70 nonkinase enzymes and receptors. Pharmacology studies showed that 5 was efficacious both in increasing ex vivo aqueous humor outflow in porcine eyes and in inhibiting myosin light chain phosphorylation.
Under endoplasmic reticulum (ER) stress, unfolded proteins accumulate in the ER to activate the ER transmembrane kinase/endoribonuclease (RNase), IRE1α. IRE1α oligomerizes, autophosphorylates, and initiates splicing of XBP1 mRNA, thus triggering the unfolded protein response (UPR). Here we show that IRE1α’s kinase-controlled RNase can be regulated in two distinct modes with kinase inhibitors: one class of ligands occupies IRE1α’s kinase ATP-binding site to activate RNase-mediated XBP1 mRNA splicing even without upstream ER stress, while a second class can inhibit the RNase through the same ATP-binding site, even under ER stress. Thus, alternative kinase conformations stabilized by distinct classes of ATP-competitive inhibitors can cause allosteric switching of IRE1α’s RNase, either on or off. As dysregulation of the UPR has been implicated in a variety of cell degenerative and neoplastic disorders, small molecule control over IRE1α should advance efforts to understand the UPR’s role in pathophysiology and to develop drugs for ER stress-related diseases.
PPARγ is involved in the expression of genes that control glucose and lipid metabolism. PPARγ is the molecular target of the thiazolidinedione (TZD) class of antidiabetic drugs. However, despite their clinical use, these drugs are associated with numerous adverse effects, which are attributed to their full activation of PPARγ transcriptional responses. PPARγ partial agonists are the focus of development efforts towards second-generation PPARγ modulators with favourable pharmacology: potent insulin sensitization without the severe adverse effects of full agonists. In order to identify novel PPARγ partial agonist lead compounds, we developed a virtual screening protocol based on 3D-ligand shape similarity and docking. 235 compounds were prioritized for experimental screening from a 340,000-compound MLSMR chemical library. Seven novel potent partial agonists were confirmed in cell-based transactivation and competitive binding assays. Our results illustrate a well-designed virtual screening campaign successfully identifying novel lead compounds as potential entry points for the development of antidiabetic drugs.
diabetes; drug design; partial agonists; PPARγ; virtual screening
Huge amounts of high-throughput screening (HTS) data for probe and drug development projects are being generated in the pharmaceutical industry and more recently in the public sector. The resulting experimental datasets are increasingly being disseminated via publicly accessible repositories. However, existing repositories lack sufficient metadata to describe the experiments and are often difficult to navigate by non-experts. The lack of standardized descriptions and semantics of biological assays and screening results hinders targeted data retrieval, integration, aggregation, and analyses across different HTS datasets, for example to infer mechanisms of action of small molecule perturbagens. To address these limitations, we created the BioAssay Ontology (BAO). BAO has been developed with a focus on data integration and analysis enabling the classification of assays and screening results by concepts that relate to format, assay design, technology, target, and endpoint. Previously, we reported on the higher-level design of BAO and on the semantic querying capabilities offered by the ontology-indexed triple store of HTS data. Here, we report on our detailed design, annotation pipeline, substantially enlarged annotation knowledgebase, and analysis results. We used BAO to annotate assays from the largest public HTS data repository, PubChem, and demonstrate its utility to categorize and analyze diverse HTS results from numerous experiments. BAO is publicly available from the NCBO BioPortal at http://bioportal.bioontology.org/ontologies/1533. BAO provides controlled terminology and uniform scope to report probe and drug discovery screening assays and results. BAO leverages description logic to formalize the domain knowledge and facilitate the semantic integration with diverse other resources. As a consequence, BAO offers the potential to infer new knowledge from a corpus of assay results, for example molecular mechanisms of action of perturbagens.
High-throughput screening data repositories, such as PubChem, represent valuable resources for the development of small molecule chemical probes and can serve as entry points for drug discovery programs. While the loose data format offered by PubChem allows for great flexibility, important annotations, such as the assay format and technologies employed, are not explicitly indexed. We have previously developed a BioAssay Ontology (BAO) and curated over 350 assays with standardized BAO terms. Here we describe the use of BAO annotations to analyze a large set of assays that employ luciferase- and β-lactamase-based technologies. We identified promiscuous chemotypes pertaining to different sub-categories of assays and specific mechanisms by which these chemotypes interfere in reporter gene assays. Our results show that the data in PubChem can be used to identify promiscuous compounds that interfere non-specifically with particular technologies. Furthermore, we show that BAO is a valuable toolset for the identification of related assays and for the systematic generation of insights that are beyond the scope of individual assays or screening campaigns.
compound promiscuity; assay ontology; reporter gene assays; high-throughput screening data analysis; cheminformatics
PPARγ is the functioning receptor for the thiazolidinedione (TZD) class of anti-diabetes drugs including rosiglitazone and pioglitazone [1]. These drugs are full classical agonists for this nuclear receptor, but recent data have shown that many PPARγ-based drugs have a separate biochemical activity, blocking the obesity-linked phosphorylation of PPARγ by Cdk5 [2]. Here we describe novel synthetic compounds that have a unique mode of binding to PPARγ, completely lack classical transcriptional agonism and block the Cdk5-mediated phosphorylation in cultured adipocytes and in insulin-resistant mice. Moreover, one such compound, SR1664, has potent anti-diabetic activity while not causing the fluid retention and weight gain that are serious side effects of many of the PPARγ drugs. Unlike TZDs, SR1664 also does not interfere with bone formation in culture. These data illustrate that new classes of anti-diabetes drugs can be developed by specifically targeting the Cdk5-mediated phosphorylation of PPARγ.
diabetes; anti-diabetic agent; PPARγ; agonism; synthetic ligands; structure
The lymphoid tyrosine phosphatase (Lyp, PTPN22) is a critical negative regulator of T cell antigen receptor (TCR) signaling. A single-nucleotide polymorphism (SNP) in the ptpn22 gene correlates with the incidence of various autoimmune diseases, including type 1 diabetes, rheumatoid arthritis, and systemic lupus erythematosus. Since the disease-associated allele is a more potent inhibitor of TCR signaling, specific Lyp inhibitors may become valuable in treating autoimmunity. Using a structure-based approach, we synthesized a library of 34 compounds that inhibited Lyp with IC50 values between 0.27 and 6.2 μM. A reporter assay was employed to screen for compounds that enhanced TCR signaling in cells, and several inhibitors displayed a dose-dependent, activating effect. Subsequent probing for Lyp's direct physiological targets by immunoblot analysis confirmed the ability of the compounds to inhibit Lyp in T cells. Selectivity profiling against closely related tyrosine phosphatases and in silico docking studies with the crystal structure of Lyp yielded valuable information for the design of Lyp-specific compounds.
T helper cells that produce Interleukin-17 (IL-17) (TH17 cells) are a recently identified CD4+ T-cell subset with characterized pathological roles in autoimmune diseases [1–3]. The nuclear receptors retinoic acid receptor-related orphan receptors α and γt (RORα and RORγt) have indispensable roles in the development of this cell type [4–7]. Here we present a first-in-class, high-affinity synthetic ligand, SR1001, specific to both RORα and RORγt that inhibits TH17 cell differentiation and function. SR1001 binds specifically to the ligand binding domains (LBDs) of RORα and RORγt, inducing a conformational change within the LBD that encompasses repositioning of helix 12, leading to diminished affinity for coactivators and increased affinity for corepressors and resulting in suppression of the receptors' transcriptional activity. SR1001 inhibited the development of murine TH17 cells as demonstrated by inhibition of IL-17A gene expression and protein production. Additionally, SR1001 inhibited the expression of cytokines when added to differentiated murine or human TH17 cells. Finally, SR1001 effectively suppressed the clinical severity of autoimmune disease in mice. Thus, our data demonstrate the feasibility of targeting the orphan receptors RORα and RORγt to specifically inhibit TH17 cell differentiation and function and indicate that this novel class of compounds has potential utility in the treatment of autoimmune diseases.
High-throughput screening (HTS) is one of the main strategies to identify novel entry points for the development of small molecule chemical probes and drugs and is now commonly accessible to public sector research. Large amounts of data generated in HTS campaigns are submitted to public repositories such as PubChem, which is growing at an exponential rate. The diversity and quantity of available HTS assays and screening results pose enormous challenges to organizing, standardizing, integrating, and analyzing the datasets and thus to maximizing the scientific and ultimately the public health impact of the huge investments made to implement public sector HTS capabilities. Novel approaches to organize, standardize and access HTS data are required to address these challenges.
We developed the first ontology to describe HTS experiments and screening results using expressive description logic. The BioAssay Ontology (BAO) serves as a foundation for the standardization of HTS assays and data and as a semantic knowledge model. In this paper we show important examples of formalizing HTS domain knowledge and we point out the advantages of this approach. The ontology is available online at the NCBO BioPortal at http://bioportal.bioontology.org/ontologies/44531.
After a large manual curation effort, we loaded BAO-mapped data triples into an RDF database store and used a reasoner in several case studies to demonstrate the benefits of formalized domain knowledge representation in BAO. The examples illustrate semantic querying capabilities where BAO enables the retrieval of inferred search results that are relevant to a given query, but are not explicitly defined. BAO thus opens new functionality for annotating, querying, and analyzing HTS datasets and the potential for discovering new knowledge by means of inference.
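The idea of retrieving results that are relevant to a query but not explicitly asserted can be illustrated with a small sketch. It uses only a transitive subclass closure, which is far simpler than the description logic reasoning the abstract refers to. The class names, hierarchy, and assay identifiers below are hypothetical placeholders and not actual BAO terms or PubChem records.

```python
def superclasses(cls, subclass_of):
    """Return every ancestor of cls reachable through subclass links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def assays_of_type(query_class, assay_types, subclass_of):
    """Find assays whose asserted type is query_class or any subclass of it."""
    matches = []
    for assay, asserted in sorted(assay_types.items()):
        if asserted == query_class or query_class in superclasses(asserted, subclass_of):
            matches.append(assay)
    return matches


# Hypothetical class hierarchy (child -> parents), for illustration only.
SUBCLASS_OF = {
    "luciferase reporter assay": ["reporter gene assay"],
    "beta-lactamase reporter assay": ["reporter gene assay"],
    "reporter gene assay": ["cell-based assay"],
    "kinase binding assay": ["biochemical assay"],
}

# Hypothetical annotations: each assay has exactly one asserted type.
ASSAY_TYPES = {
    "assay_A": "luciferase reporter assay",
    "assay_B": "kinase binding assay",
    "assay_C": "beta-lactamase reporter assay",
}

# No assay is annotated directly as a "cell-based assay"; both hits are inferred.
print(assays_of_type("cell-based assay", ASSAY_TYPES, SUBCLASS_OF))
# prints ['assay_A', 'assay_C']
```

A query for the general class returns assays annotated only with more specific classes, which is the kind of inferred retrieval a reasoner over an RDF store provides at much larger scale and expressivity.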
Tyrosine phosphorylation, controlled by the coordinated action of protein-tyrosine kinases (PTKs) and protein-tyrosine phosphatases (PTPs), is a fundamental regulatory mechanism of numerous physiological processes. PTPs are implicated in a number of human diseases and their potential as prospective drug targets is increasingly being recognized. Despite their biological importance, until now no comprehensive overview has been reported describing how all members of the human PTP family are related. Here we review the entire human PTP family and present a systematic knowledge-based characterization of global and local similarity relationships, which are relevant for the development of small molecule inhibitors. We use parallel homology modeling to expand the current PTP structure space and analyze the human PTPs based on local three-dimensional catalytic sites and domain sequences. Furthermore, we demonstrate the importance of binding site similarities in understanding cross-reactivity and inhibitor selectivity in the design of small molecule inhibitors.
Tyrosine dephosphorylation; phosphatases; tyrosine phosphatome; phosphatase inhibitors; structure-based drug design; sequence similarity; catalytic binding site similarity
We have studied the Sphingosine 1-phosphate (S1P) receptor system to better understand why certain molecular targets within a closely related family are much more tractable when identifying compelling chemical leads. Five medically important G protein-coupled receptors for S1P regulate heart rate, coronary artery caliber, endothelial barrier integrity, and lymphocyte trafficking. Selective S1P receptor agonist probes would be of great utility to study receptor subtype-specific function. Through systematic screening of the same libraries, we identified novel selective agonist chemotypes for each of the S1P1 and S1P3 receptors. uHTS for S1P1 was more effective than for S1P3, with many selective, low nanomolar hits of proven mechanism emerging. Receptor structure modeling and ligand docking reveal differences between the receptor binding pockets, which are the basis for sub-type selectivity. Novel selective agonists interact primarily in the hydrophobic pocket of the receptor in the absence of head-group interactions. Chemistry-space and shape-based analysis of the screening libraries in combination with the binding models explains the observed differential hit rates and enhanced efficiency for lead discovery for S1P1 vs. S1P3 in this closely related receptor family.