Phenotypes are the observable characteristics of an organism arising from its response to the environment. Phenotypes associated with engineered and natural genetic variation are widely recorded using phenotype ontologies in model organisms, as are signs and symptoms of human Mendelian diseases in databases such as OMIM and Orphanet. Exploiting these resources, several computational methods have been developed for integration and analysis of phenotype data to identify the genetic etiology of diseases or suggest plausible interventions. A similar resource would be highly useful not only for rare and Mendelian diseases, but also for common, complex and infectious diseases. We apply a semantic text-mining approach to identify the phenotypes (signs and symptoms) associated with over 6,000 diseases. We evaluate our text-mined phenotypes by demonstrating that they can correctly identify known disease-associated genes in mice and humans with high accuracy. Using a phenotypic similarity measure, we generate a human disease network in which diseases that have similar signs and symptoms cluster together, and we use this network to identify closely related diseases based on common etiological, anatomical as well as physiological underpinnings.
Falling costs in genomic laboratory experiments have led to a steady increase of genomic feature and variation data. Multiple genomic data formats exist for sharing these data, and whilst they are similar, they address slightly different views of the data and are consequently not fully compatible with each other. The fragmentation of data format specifications makes it hard to integrate and interpret data from multiple data providers for further analysis. As a solution, a new ontology is presented here for annotating and representing genomic feature and variation dataset contents. The Genomic Feature and Variation Ontology (GFVO) specifically addresses genomic data as it is regularly shared using the GFF3 (incl. FASTA), GTF, GVF and VCF file formats. GFVO simplifies data integration and enables linking of genomic annotations across datasets through common semantics of genomic types and relations.
Availability and implementation. The latest stable release of the ontology is available via its base URI; previous and development versions are available at the ontology’s GitHub repository: https://github.com/BioInterchange/Ontologies; versions of the ontology are indexed through BioPortal (without external class-/property-equivalences due to BioPortal release 4.10 limitations); examples and reference documentation are provided on a separate web page: http://www.biointerchange.org/ontologies.html. GFVO version 1.0.2 is licensed under the CC0 1.0 Universal license (https://creativecommons.org/publicdomain/zero/1.0) and is therefore de facto within the public domain; the ontology can be appropriated without attribution for commercial and non-commercial use.
Bioinformatics; Genomics; Ontology
There is no publicly available resource that provides the relative severity of adverse drug reactions (ADRs). Such a resource would be useful for several applications, including assessment of the risks and benefits of drugs and improvement of patient-centered care. It could also be used to triage predictions of drug adverse events.
The intent of the study was to rank ADRs according to severity.
We used Internet-based crowdsourcing to rank ADRs according to severity. We assigned 126,512 pairwise comparisons of ADRs to 2589 Amazon Mechanical Turk workers and used these comparisons to rank order 2929 ADRs.
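Turning pairwise severity judgments into a rank order can be illustrated with a minimal sketch. The abstract does not state which ranking procedure was used, so the win-fraction scoring below is an assumption chosen for simplicity, and the function name and toy judgments are hypothetical rather than data from the study.

```python
from collections import defaultdict


def rank_by_win_fraction(comparisons):
    """Rank items by the fraction of pairwise comparisons they 'won'.

    comparisons: iterable of (more_severe, less_severe) pairs.
    Returns the items sorted from most to least severe.
    """
    wins = defaultdict(int)
    seen = defaultdict(int)
    for more_severe, less_severe in comparisons:
        wins[more_severe] += 1
        seen[more_severe] += 1
        seen[less_severe] += 1
    # Sort by win fraction (descending), then by name as a stable tie-break.
    return sorted(seen, key=lambda item: (-wins[item] / seen[item], item))


# Hypothetical toy judgments, not data from the study.
toy_comparisons = [
    ("cardiac arrest", "nausea"),
    ("cardiac arrest", "rash"),
    ("rash", "nausea"),
    ("cardiac arrest", "rash"),
]
ranking = rank_by_win_fraction(toy_comparisons)
print(ranking)
```

With many workers and sparse comparisons, a model-based method (for example a Bradley-Terry style fit) would likely be more robust than raw win fractions; the sketch only shows the general shape of the computation.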
There is good correlation (rho=.53) between the mortality rates associated with ADRs and their rank. Our ranking highlights severe drug-ADR predictions, such as cardiovascular ADRs for raloxifene and celecoxib. It also triages genes associated with severe ADRs such as epidermal growth-factor receptor (EGFR), associated with glioblastoma multiforme, and SCN1A, associated with epilepsy.
ADR ranking lays a first stepping stone in personalized drug risk assessment. Ranking of ADRs using crowdsourcing may have useful clinical and financial implications, and should be further investigated in the context of health care decision making.
pharmacovigilance; adverse drug reactions; drug side effects; crowdsourcing; patient-centered care; alert fatigue
Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework.
We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes.
The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.
Electronic supplementary material
The online version of this article (doi:10.1186/s13007-015-0053-y) contains supplementary material, which is available to authorized users.
Semantic similarity measures over phenotype ontologies have been demonstrated to provide a powerful approach for the analysis of model organism phenotypes, the discovery of animal models of human disease, novel pathways, gene functions, druggable therapeutic targets, and determination of pathogenicity.
We have developed PhenomeNET 2, a system that enables similarity-based searches over a large repository of phenotypes in real-time. It can be used to identify strains of model organisms that are phenotypically similar to human patients, diseases that are phenotypically similar to model organism phenotypes, or drug effect profiles that are similar to the phenotypes observed in a patient or model organism. PhenomeNET 2 is available at http://aber-owl.net/phenomenet.
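A similarity-based search over phenotype profiles can be sketched as follows. The abstract does not specify which similarity measure PhenomeNET 2 uses; this sketch assumes a plain Jaccard index over phenotype sets closed under superclasses, and the toy hierarchy, term labels and function names are all hypothetical.

```python
def ancestors(term, parents):
    """Return the term plus all of its superclasses in a toy is-a hierarchy."""
    closure, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in closure:
            closure.add(current)
            stack.extend(parents.get(current, ()))
    return closure


def phenotype_similarity(profile_a, profile_b, parents):
    """Jaccard index over the superclass-closed phenotype sets."""
    closed_a = set().union(*(ancestors(t, parents) for t in profile_a))
    closed_b = set().union(*(ancestors(t, parents) for t in profile_b))
    if not closed_a and not closed_b:
        return 0.0
    return len(closed_a & closed_b) / len(closed_a | closed_b)


# Hypothetical toy hierarchy: child term -> list of parent terms.
toy_parents = {
    "small heart": ["abnormal heart size"],
    "enlarged heart": ["abnormal heart size"],
    "abnormal heart size": ["abnormal heart morphology"],
    "abnormal heart morphology": ["phenotypic abnormality"],
    "seizures": ["abnormal nervous system physiology"],
    "abnormal nervous system physiology": ["phenotypic abnormality"],
}

patient = {"enlarged heart"}
models = {"model_1": {"small heart"}, "model_2": {"seizures"}}

# Rank the models by similarity to the patient profile, best match first.
scores = {name: phenotype_similarity(patient, p, toy_parents) for name, p in models.items()}
best_first = sorted(scores, key=scores.get, reverse=True)
print(best_first, scores)
```

Closing each profile under superclasses is what lets two different but related phenotypes (here, two heart-size abnormalities) score higher than unrelated ones, even though the annotated terms themselves do not match.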
Phenotype-similarity searches can provide a powerful tool for the discovery and investigation of molecular mechanisms underlying an observed phenotypic manifestation. PhenomeNET 2 facilitates user-defined similarity searches and allows researchers to analyze their data within a large repository of human, mouse and rat phenotypes.
Phenotype; Semantic similarity; Ontology
Many ontologies have been developed in biology and these ontologies increasingly contain large volumes of formalized knowledge commonly expressed in the Web Ontology Language (OWL). Computational access to the knowledge contained within these ontologies relies on the use of automated reasoning.
We have developed the Aber-OWL infrastructure that provides reasoning services for bio-ontologies. Aber-OWL consists of an ontology repository, a set of web services and web interfaces that enable ontology-based semantic access to biological data and literature. Aber-OWL is freely available at http://aber-owl.net.
Aber-OWL provides a framework for automatically accessing information that is annotated with ontologies or contains terms used to label classes in ontologies. When using Aber-OWL, access to ontologies and data annotated with them is not merely based on class names or identifiers but rather on the knowledge the ontologies contain and the inferences that can be drawn from it.
Ontology-based data access; Linked data; OWL
Background: Recent years have seen a surge in projects that produce large volumes of structured, machine-readable biodiversity data. To make these data amenable to processing by generic, open source “data enrichment” workflows, they are increasingly being represented in a variety of standards-compliant interchange formats. Here, we report on an initiative in which software developers and taxonomists came together to address the challenges and highlight the opportunities in the enrichment of such biodiversity data by engaging in intensive, collaborative software development: The Biodiversity Data Enrichment Hackathon.
Results: The hackathon brought together 37 participants (including developers and taxonomists, i.e. scientific professionals that gather, identify, name and classify species) from 10 countries: Belgium, Bulgaria, Canada, Finland, Germany, Italy, the Netherlands, New Zealand, the UK, and the US. The participants brought expertise in processing structured data, text mining, development of ontologies, digital identification keys, geographic information systems, niche modeling, natural language processing, provenance annotation, semantic integration, taxonomic name resolution, web service interfaces, workflow tools and visualisation. Most use cases and exemplar data were provided by taxonomists.
One goal of the meeting was to facilitate re-use and enhancement of biodiversity knowledge by a broad range of stakeholders, such as taxonomists, systematists, ecologists, niche modelers, informaticians and ontologists. The suggested use cases resulted in nine breakout groups addressing three main themes: i) mobilising heritage biodiversity knowledge; ii) formalising and linking concepts; and iii) addressing interoperability between service platforms. Another goal was to further foster a community of experts in biodiversity informatics and to build human links between research projects and institutions, in response to recent calls to further such integration in this research domain.
Conclusions: Beyond deriving prototype solutions for each use case, areas of inadequacy were discussed and are being pursued further. It was striking how many possible applications for biodiversity data there were and how quickly solutions could be put together when the normal constraints to collaboration were broken down for a week. Conversely, mobilising biodiversity knowledge from its silos in heritage literature and natural history collections will continue to require formalisation of the concepts (and the links between them) that define the research domain, as well as increased interoperability between the software platforms that operate on these concepts.
Biodiversity informatics; Data enrichment; Hackathon; Intelligent openness; Linked data; Open source; Software; Semantic Web; Taxonomy; Web services
Over the past 15 years, the biomedical research community has increased its efforts to produce ontologies encoding biomedical knowledge, and to provide the corresponding infrastructure to maintain them. As ontologies are becoming a central part of biological and biomedical research, a communication channel to publish frequent updates and latest developments on them would be an advantage.
Here, we introduce the JBMS thematic series on Biomedical Ontologies. The aim of the series is to disseminate the latest developments in research on biomedical ontologies and provide a venue for publishing newly developed ontologies, updates to existing ontologies as well as methodological advances, and selected contributions from conferences and workshops. We aim to give this thematic series a central role in the exploration of ongoing research in biomedical ontologies and intend to work closely together with the research community towards this aim. Researchers and working groups are encouraged to provide feedback on novel developments and special topics to be integrated into the existing publication cycles.
The Semanticscience Integrated Ontology (SIO) is an ontology to facilitate biomedical knowledge discovery. SIO features a simple upper level composed of essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes. SIO specifies simple design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, and provides specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics. SIO provides an ontological foundation for the Bio2RDF linked data for the life sciences project and is used for semantic integration and discovery for SADI-based semantic web services. SIO is freely available to all users under a Creative Commons Attribution license. See website for further information: http://sio.semanticscience.org.
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
BioHackathon; Bioinformatics; Semantic Web; Web services; Ontology; Visualization; Knowledge representation; Databases; Semantic interoperability; Data models; Data sharing; Data integration
Motivation: Methods for computational drug target identification use information from diverse information sources to predict or prioritize drug targets for known drugs. One set of resources that has been relatively neglected for drug repurposing is animal model phenotype.
Results: We investigate the use of mouse model phenotypes for drug target identification. To achieve this goal, we first integrate mouse model phenotypes and drug effects, and then systematically compare the phenotypic similarity between mouse models and drug effect profiles. We find a high similarity between phenotypes resulting from loss-of-function mutations and drug effects resulting from the inhibition of a protein through a drug action, and demonstrate how this approach can be used to suggest candidate drug targets.
Availability and implementation: Analysis code and supplementary data files are available on the project Web site at https://drugeffects.googlecode.com.
Supplementary data are available at Bioinformatics online.
The identification of protein and gene names (PGNs) from the scientific literature requires semantic resources: terminological and lexical resources deliver the term candidates into PGN tagging solutions, and the gold standard corpora (GSC) train them to identify term parameters and contextual features. Ideally all three resources, i.e. corpora, lexica and taggers, cover the same domain knowledge, and thus support identification of the same types of PGNs and cover all of them. Unfortunately, none of the three serves as a predominant standard, and for this reason it is worth exploring how these three resources comply with each other. We systematically compare different PGN taggers against publicly available corpora and analyze the impact of the included lexical resource on their performance. In particular, we determine the performance gains through false positive filtering, which contributes to the disambiguation of identified PGNs.
In general, machine learning approaches (ML-Tag) for PGN tagging show higher F1-measure performance against the BioCreative-II and Jnlpba GSCs (exact matching), whereas the lexicon-based approaches (LexTag) in combination with disambiguation methods show better results on FsuPrge and PennBio. The ML-Tag solutions balance precision and recall, whereas the LexTag solutions have different precision and recall profiles at the same F1-measure across all corpora. Higher recall is achieved with larger lexical resources, which also introduce more noise (false positive results). The ML-Tag solutions perform best if the test corpus is from the same GSC as the training corpus. As expected, the false negative errors characterize the test corpora, whereas the profiles of the false positive mistakes characterize the tagging solutions. LexTag solutions that are based on a large terminological resource in combination with false positive filtering produce better results and, in contrast to ML-Tag solutions, also provide concept identifiers from a knowledge source.
The standard ML-Tag solutions achieve high performance, but not across all corpora, and thus should be trained using several different corpora to reduce possible biases. The LexTag solutions have different profiles for their precision and recall performance, but with similar F1-measure. This result is surprising and suggests that they cover a portion of the most common naming standards, but cope differently with the term variability across the corpora. The false positive filtering applied to LexTag solutions does improve the results by increasing their precision without significantly compromising their recall. The harmonisation of the annotation schemes in combination with standardized lexical resources in the tagging solutions will enable their comparability and will pave the way for a shared standard.
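The precision, recall and F1-measure under exact matching used to compare the taggers can be sketched as below. The span representation (document identifier, start offset, end offset), the function name and the toy annotations are assumptions for illustration; the corpora and evaluation scripts used in the study may differ in detail.

```python
def exact_match_scores(gold, predicted):
    """Compute precision, recall and F1 with exact span matching.

    gold, predicted: iterables of (doc_id, start, end) tuples.
    A prediction counts as a true positive only if the identical
    tuple is present in the gold standard.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy annotations, not taken from any of the corpora.
gold_spans = {("d1", 0, 4), ("d1", 10, 15), ("d2", 3, 8)}
tagger_spans = {("d1", 0, 4), ("d1", 10, 14), ("d2", 3, 8), ("d2", 20, 25)}

precision, recall, f1 = exact_match_scores(gold_spans, tagger_spans)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In the toy data, the span `("d1", 10, 14)` misses the gold boundary by one character and therefore counts as both a false positive and a false negative, which is why exact matching is the strictest of the common evaluation settings.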
Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical “term space” (the “Lexeome”), forms a key resource to achieve the full integration of the scientific literature with biomedical data resources: any identified named entity can immediately be normalized to the correct database entry. This goal does not only require that we are aware of all existing terms, but would also profit from knowing all their senses and their semantic interpretation (ambiguities, nestedness).
This study compiles a resource for lexical terms of biomedical interest in a standard format (called “LexEBI”), determines the overall number of terms, their reuse in different resources and the nestedness of terms. LexEBI comprises references for protein and gene entries and their term variants, and chemical entities, amongst other terms. In addition, disease terms have been identified from Medline and PubmedCentral and added to LexEBI. Our analysis demonstrates that the baseforms of terms from the different semantic types show little polysemous use. Nonetheless, the term variants of protein and gene names (PGNs) frequently contain species mentions, which should have been avoided according to protein annotation guidelines. Furthermore, the protein and gene entities as well as the chemical entities both comprise enzymes, leading to hierarchical polysemy, and a large portion of PGNs make reference to a chemical entity. Altogether, according to our analysis based on the Medline distribution, 401,869 unique PGNs in the documents contain a reference to 25,022 chemical entities, 3,125 disease terms or 1,576 species mentions.
LexEBI delivers the complete biomedical and chemical Lexeome in a standardized representation (http://www.ebi.ac.uk/Rebholz-srv/LexEBI/). The resource provides the disease terms as open source content, and fully interlinks terms across resources.
A major aim of the biological sciences is to gain an understanding of human physiology and disease. One important step towards such a goal is the discovery of the function of genes, which will lead to a better understanding of the physiology and pathophysiology of organisms and ultimately to better diagnosis and therapy. Our increasing ability to phenotypically characterise genetic variants of model organisms, coupled with systematic and hypothesis-driven mutagenesis, is resulting in a wealth of information that could potentially provide insight into the functions of all genes in an organism. The challenge we are now facing is to develop computational methods that can integrate and analyse such data. The introduction of formal ontologies that make their semantics explicit and accessible to automated reasoning promises the tantalizing possibility of standardizing biomedical knowledge, allowing for novel, powerful queries that bridge multiple domains, disciplines, species and levels of granularity. We review recent computational approaches that facilitate the integration of experimental data from model organisms with clinical observations in humans. These methods foster novel cross-species analysis approaches, thereby enabling comparative phenomics and leading to the potential of translating basic discoveries from the model systems into diagnostic and therapeutic advances at the clinical level.
Motivation: Many complex diseases are the result of abnormal pathway functions instead of single abnormalities. Disease diagnosis and intervention strategies must target these pathways while minimizing the interference with normal physiological processes. Large-scale identification of disease pathways and chemicals that may be used to perturb them requires the integration of information about drugs, genes, diseases and pathways. This information is currently distributed over several pharmacogenomics databases. An integrated analysis of the information in these databases can reveal disease pathways and facilitate novel biomedical analyses.
Results: We demonstrate how to integrate pharmacogenomics databases through integration of the biomedical ontologies that are used as meta-data in these databases. The additional background knowledge in these ontologies can then be used to enable novel analyses. We identify disease pathways using a novel multi-ontology enrichment analysis over the Human Disease Ontology, and we identify significant associations between chemicals and pathways using an enrichment analysis over a chemical ontology. The drug–pathway and disease–pathway associations are a valuable resource for research in disease and drug mechanisms and can be used to improve computational drug repurposing.
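An enrichment analysis of this kind can be illustrated with a one-sided over-representation test. The abstract does not state which statistic was used, so the hypergeometric tail probability below is an assumption (it is a common choice for ontology enrichment); the function name and the toy counts are hypothetical, and the multi-ontology aspect of the analysis is not modeled.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Return P(X >= overlap) under a hypergeometric null model.

    population: total number of entities considered (e.g. all genes)
    annotated:  entities in the population carrying the ontology class
    sample:     size of the set being tested (e.g. genes in a pathway)
    overlap:    entities in the sample that carry the ontology class
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy counts: 20 genes overall, 5 annotated to a disease class,
# a pathway of 4 genes, 3 of which carry the annotation.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, overlap=3)
print(f"p = {p_value:.4f}")
```

When many ontology classes are tested, the resulting p-values would normally need a multiple-testing correction; that step is omitted here.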
Motivation: The systematic observation of phenotypes has become a crucial tool of functional genomics, and several large international projects are currently underway to identify and characterize the phenotypes that are associated with genotypes in several species. To integrate phenotype descriptions within and across species, phenotype ontologies have been developed. Applying ontologies to unify phenotype descriptions in the domain of physiology has been a particular challenge due to the high complexity of the underlying domain.
Results: In this study, we present the outline of a theory and its implementation for an ontology of physiology-related phenotypes. We provide a formal description of process attributes and relate them to the attributes of their temporal parts and participants. We apply our theory to create the Cellular Phenotype Ontology (CPO). The CPO is an ontology of morphological and physiological phenotypic characteristics of cells, cell components and cellular processes. Its prime application is to provide terms and uniform definition patterns for the annotation of cellular phenotypes. The CPO can be used for the annotation of observed abnormalities in domains, such as systems microscopy, in which cellular abnormalities are observed and for which no phenotype ontology has been created.
Availability and implementation: The CPO and the source code we generated to create the CPO are freely available on http://cell-phenotype.googlecode.com.
Supplementary data are available at Bioinformatics online.
The use of model organisms to provide information on gene function has proved to be a powerful approach to our understanding of both human disease and fundamental mammalian biology. Large-scale community projects using mice, based on forward and reverse genetics, and now the pan-genomic phenotyping efforts of the International Mouse Phenotyping Consortium (IMPC), are generating resources on an unprecedented scale which will be extremely valuable to human genetics and medicine. We discuss the nature and availability of data, mice and ES cells from these large-scale programmes, the use of these resources to help prioritise and validate candidate genes in human genetic association studies, and how they can improve our understanding of the underlying pathobiology of human disease.
mouse; genetics; phenotyping; human; ontology; GWAS; CNV; database
High-throughput phenotyping projects in model organisms have the potential to improve our understanding of gene functions and their role in living organisms. We have developed a computational, knowledge-based approach to automatically infer gene functions from phenotypic manifestations and applied this approach to yeast (Saccharomyces cerevisiae), nematode worm (Caenorhabditis elegans), zebrafish (Danio rerio), fruitfly (Drosophila melanogaster) and mouse (Mus musculus) phenotypes. Our approach is based on the assumption that, if a mutation in a gene G leads to a phenotypic abnormality in a process P, then G must have been involved in P, either directly or indirectly. We systematically analyze recorded phenotypes in animal models using the formal definitions created for phenotype ontologies. We evaluate the validity of the inferred functions manually and by demonstrating a significant improvement in predicting genetic interactions and protein-protein interactions based on functional similarity. Our knowledge-based approach is generally applicable to phenotypes recorded in model organism databases, including phenotypes from large-scale, high-throughput community projects whose primary mode of dissemination is direct publication online rather than in the literature.
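The inference rule stated above (a phenotypic abnormality in a process implicates the mutated gene in that process) can be sketched as a simple lookup. This is only an illustration: the mapping from phenotype to affected process stands in for the formal ontology definitions mentioned in the abstract, and all gene names, phenotype labels and function names are hypothetical.

```python
def infer_functions(gene_phenotypes, phenotype_to_process):
    """Infer candidate process annotations for genes from their phenotypes.

    gene_phenotypes:      dict mapping a gene to its observed phenotypes
    phenotype_to_process: dict mapping a phenotype to the process it is
                          defined as an abnormality of (toy stand-in for
                          formal phenotype definitions)
    Returns a dict mapping each gene to the set of inferred processes;
    genes without any process-related phenotype are omitted.
    """
    inferred = {}
    for gene, phenotypes in gene_phenotypes.items():
        processes = {
            phenotype_to_process[p] for p in phenotypes if p in phenotype_to_process
        }
        if processes:
            inferred[gene] = processes
    return inferred


# Hypothetical toy data.
toy_definitions = {
    "abnormal cell migration": "cell migration",
    "decreased apoptosis": "apoptotic process",
}
toy_annotations = {
    "geneA": ["abnormal cell migration", "small body size"],
    "geneB": ["decreased apoptosis"],
    "geneC": ["small body size"],
}

inferred = infer_functions(toy_annotations, toy_definitions)
print(inferred)
```

Phenotypes that are not defined in terms of a process (here, "small body size") yield no inference, so a gene annotated only with such phenotypes receives no candidate function.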
As the number and size of biological knowledge resources for physiology grows, researchers need improved tools for searching and integrating knowledge and physiological models. Unfortunately, current resources—databases, simulation models, and knowledge bases, for example—are only occasionally and idiosyncratically explicit about the semantics of the biological entities and processes that they describe.
We present a formal approach, based on the semantics of biophysics as represented in the Ontology of Physics for Biology, that divides physiological knowledge into three partitions: structural knowledge, process knowledge and biophysical knowledge. We then computationally integrate these partitions across multiple structural and biophysical domains as computable ontologies by which such knowledge can be archived, reused, and displayed. Our key result is the semi-automatic parsing of biosimulation model code into PhysioMaps that can be displayed and interrogated for qualitative responses to hypothetical perturbations.
Strong, explicit semantics of biophysics can provide a formal, computational basis for integrating physiological knowledge in a manner that supports visualization of the physiological content of biosimulation models across spatial scales and biophysical domains.
PhenomeNet is an approach for integrating phenotypes across species and identifying candidate genes for genetic diseases based on the similarity between a disease and animal model phenotypes. In contrast to ‘guilt-by-association’ approaches, PhenomeNet relies exclusively on the comparison of phenotypes to suggest candidate genes, and can, therefore, be applied to study the molecular basis of rare and orphan diseases for which the molecular basis is unknown. In addition to disease phenotypes from the Online Mendelian Inheritance in Man (OMIM) database, we have now integrated the clinical signs from Orphanet into PhenomeNet. We demonstrate that our approach can efficiently identify known candidate genes for genetic diseases in Orphanet and OMIM. Furthermore, we find evidence that mutations in the HIP1 gene might cause Bassoe syndrome, a rare disorder with unknown genetic aetiology. Our results demonstrate that integration and computational analysis of human disease and animal model phenotypes using PhenomeNet has the potential to reveal novel insights into the pathobiology underlying genetic diseases.
phenotype; animal model; rare disease; orphan disease; Orphanet; biomedical informatics
Units are basic scientific tools that give meaning to numerical data. Their standardization and formalization support the reporting, exchange, processing, reproducibility and integration of quantitative measurements. Ontologies facilitate the integration of data and knowledge, allowing interoperability and semantic information processing between diverse biomedical resources and domains. Here, we present the Units Ontology (UO), an ontology currently being used in many scientific resources for the standardized description of units of measurement.
Researchers use animal studies to better understand human diseases. In recent years, large-scale phenotype studies such as Phenoscape and EuroPhenome have been initiated to identify genetic causes of a species' phenome. Species-specific phenotype ontologies are required to capture and report all findings and to automatically infer results relevant to human diseases. The integration of the different phenotype ontologies into a coherent framework is necessary to achieve interoperability for cross-species research.
Here, we investigate the quality and completeness of two different methods to align the Human Phenotype Ontology and the Mammalian Phenotype Ontology. The first method combines lexical matching with inference over the ontologies' taxonomic structures, while the second method uses a mapping algorithm based on the formal definitions of the ontologies. Neither method could map all concepts. Although the formal definitions method provides mappings for more concepts than the lexical matching method, it does not outperform lexical matching in a biological use case. Our results suggest that combining both approaches will yield better mappings in terms of completeness, specificity and application purposes.
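The lexical matching step of the first alignment method can be sketched as a comparison of normalized class labels. This is a simplified assumption of how such matching might work: it ignores synonyms and the inference over taxonomic structure described above, and the identifiers, labels and function names are hypothetical placeholders rather than real ontology entries.

```python
import re


def normalize(label):
    """Lowercase, drop punctuation and sort tokens so word order is ignored."""
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    return " ".join(sorted(tokens))


def lexical_matches(labels_a, labels_b):
    """Return sorted (id_a, id_b) pairs whose normalized labels are identical.

    labels_a, labels_b: dicts mapping class identifiers to labels.
    """
    index = {}
    for term_id, label in labels_b.items():
        index.setdefault(normalize(label), []).append(term_id)
    return sorted(
        (id_a, id_b)
        for id_a, label in labels_a.items()
        for id_b in index.get(normalize(label), [])
    )


# Hypothetical placeholder classes from two ontologies "A" and "B".
ontology_a = {
    "A:1": "Seizures",
    "A:2": "Abnormality of the heart",
    "A:3": "Short stature",
}
ontology_b = {
    "B:1": "seizures",
    "B:2": "heart abnormality",
    "B:3": "stature, short",
}

mappings = lexical_matches(ontology_a, ontology_b)
print(mappings)
```

The pair "Abnormality of the heart" and "heart abnormality" is left unmapped because the labels differ by more than case, punctuation and word order, which illustrates why purely lexical matching cannot map all concepts.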
Phenotype ontologies are used in species-specific databases for the annotation of mutagenesis experiments and to characterize human diseases. The Entity-Quality (EQ) formalism is a means to describe complex phenotypes based on one or more affected entities and a quality. EQ-based definitions have been developed for many phenotype ontologies, including the Human and Mammalian Phenotype ontologies.
We analyze formalizations of complex phenotype descriptions in the Web Ontology Language (OWL) that are based on the EQ model, identify several representational challenges and analyze potential solutions to address these challenges.
In particular, we suggest a novel, role-based approach to represent relational qualities such as concentration of iron in spleen, discuss its ontological foundation in the General Formal Ontology (GFO) and evaluate its representation in OWL and the benefits it can bring to the representation of phenotype annotations.
Our analysis of OWL-based representations of phenotypes can contribute to improving consistency and expressiveness of formal phenotype descriptions.
Ontologies are widely used in the biomedical community for annotation and integration of databases. Formal definitions can relate classes from different ontologies and thereby integrate data across different levels of granularity, domains and species. We have applied this methodology to the Ascomycete Phenotype Ontology (APO), enabling the reuse of various orthogonal ontologies and we have converted the phenotype associated data found in the SGD following our proposed patterns. We have integrated the resulting data in the cross-species phenotype network PhenomeNET, and we make both the cross-species integration of yeast phenotypes and a similarity-based comparison of yeast phenotypes across species available in the PhenomeBrowser. Furthermore, we utilize our definitions and the yeast phenotype annotations to suggest novel functional annotations of gene products in yeast.
Ontologies are now pervasive in biomedicine, where they serve as a means to standardize terminology, to enable access to domain knowledge, to verify data consistency and to facilitate integrative analyses over heterogeneous biomedical data. For this purpose, research on biomedical ontologies applies theories and methods from diverse disciplines such as information management, knowledge representation, cognitive science, linguistics and philosophy. Depending on the desired applications in which ontologies are being applied, the evaluation of research in biomedical ontologies must follow different strategies. Here, we provide a classification of research problems in which ontologies are being applied, focusing on the use of ontologies in basic and translational research, and we demonstrate how research results in biomedical ontologies can be evaluated. The evaluation strategies depend on the desired application and measure the success of using an ontology for a particular biomedical problem. For many applications, the success can be quantified, thereby facilitating the objective evaluation and comparison of research in biomedical ontology. The objective, quantifiable comparison of research results based on scientific applications opens up the possibility for systematically improving the utility of ontologies in biomedical research.
biomedical ontology; quantitative biology; ontology evaluation; evaluation criteria; ontology-based applications