An ontology is a formal representation of a domain, modeling the entities in the domain and their relations. When a domain is represented by multiple ontologies, there is a need to create mappings among these ontologies in order to facilitate the integration of data annotated with them and reasoning across ontologies. The objective of this paper is to recapitulate our experience in aligning large anatomical ontologies and to reflect on some of the issues and challenges encountered along the way. The four anatomical ontologies under investigation are the Foundational Model of Anatomy, GALEN, the Adult Mouse Anatomical Dictionary and the NCI Thesaurus; their underlying representation formalisms are all different. Our approach to aligning concepts directly is automatic, rule-based, and operates at the schema level, generating mostly point-to-point mappings. It uses a combination of domain-specific lexical techniques and structural and semantic techniques (the latter to validate the mappings suggested lexically). It also takes advantage of domain-specific knowledge: lexical knowledge from external resources such as the Unified Medical Language System, as well as knowledge augmentation and inference techniques. In addition to point-to-point mapping of concepts, we present the alignment of relationships and the mapping of concepts group-to-group. We have also successfully tested an indirect alignment through a domain-specific reference ontology. We present an evaluation of our techniques, both against a gold standard established manually and against a generic schema matching system. The advantages and limitations of our approach are analyzed and discussed throughout the paper.
Ontology; ontology alignment; knowledge representation; anatomy; Semantic Web
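The lexical step described above can be sketched as a normalize-and-look-up procedure. The toy example below (concept IDs and names are illustrative; the real system additionally exploits UMLS synonyms and validates candidates structurally and semantically) proposes a point-to-point mapping whenever two concepts' normalized names coincide:

```python
import re

def normalize(term):
    """Lowercase, strip punctuation, and sort words so that
    'Left lung' and 'Lung, Left' normalize to the same key."""
    words = re.sub(r"[^a-z0-9 ]", " ", term.lower()).split()
    return " ".join(sorted(words))

def lexical_alignment(ontology_a, ontology_b):
    """Return point-to-point candidate mappings between two
    {concept_id: preferred_name} dictionaries."""
    index_b = {normalize(name): cid for cid, name in ontology_b.items()}
    mappings = []
    for cid_a, name_a in ontology_a.items():
        key = normalize(name_a)
        if key in index_b:
            mappings.append((cid_a, index_b[key]))
    return mappings

# Illustrative concept IDs, not actual FMA/NCI Thesaurus entries.
fma = {"FMA:7195": "Left lung", "FMA:9600": "Mitral valve"}
nci = {"NCI:C32974": "Lung, Left", "NCI:C12745": "Heart"}
print(lexical_alignment(fma, nci))  # [('FMA:7195', 'NCI:C32974')]
```

Candidates produced this way are only suggestions; the structural and semantic validation that follows is what separates true correspondences from accidental name matches.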
The formal description of experiments for efficient analysis, annotation and sharing of results is a fundamental part of the practice of science. Ontologies are required to achieve this objective. A few subject-specific ontologies of experiments currently exist. However, despite the unity of scientific experimentation, no general ontology of experiments exists. We propose the ontology EXPO to meet this need. EXPO links SUMO (the Suggested Upper Merged Ontology) with subject-specific ontologies of experiments by formalizing the generic concepts of experimental design, methodology and results representation. EXPO is expressed in the W3C standard ontology language OWL-DL. We demonstrate the utility of EXPO and its ability to describe different experimental domains by applying it to two experiments: one in high-energy physics and the other in phylogenetics. The use of EXPO made the goals and structure of these experiments more explicit, revealed ambiguities, and highlighted an unexpected similarity. We conclude that EXPO is of general value in describing experiments and a step towards the formalization of science.
ontology; formalization; annotation; artificial intelligence; metadata
There is an increasing interest in developing ontologies and controlled vocabularies to improve the efficiency and consistency of manual literature curation, to enable more formal biocuration workflow results and ultimately to improve analysis of biological data. Two ontologies that have been successfully used for this purpose are the Gene Ontology (GO) for annotating aspects of gene products and the Molecular Interaction ontology (PSI-MI) used by databases that archive protein–protein interactions. The examination of protein interactions has proven to be extremely promising for the understanding of cellular processes. Manual mapping of information from the biomedical literature to bio-ontology terms is one of the most challenging components in the curation pipeline. It requires that expert curators interpret the natural language descriptions contained in articles and infer their semantic equivalents in the ontology (controlled vocabulary). Since manual curation is a time-consuming process, there is strong motivation to implement text-mining techniques to automatically extract annotations from free text. A range of text mining strategies has been devised to assist in the automated extraction of biological data. These strategies either recognize technical terms used recurrently in the literature and propose them as candidates for inclusion in ontologies, or retrieve passages that serve as evidential support for annotating an ontology term, e.g. from the PSI-MI or GO controlled vocabularies. Here, we provide a general overview of current text-mining methods to automatically extract annotations of GO and PSI-MI ontology terms in the context of the BioCreative (Critical Assessment of Information Extraction Systems in Biology) challenge. Special emphasis is given to protein–protein interaction data and PSI-MI terms referring to interaction detection methods.
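A minimal, dictionary-based sketch of the second strategy (retrieving passages as evidential support for an ontology term) is shown below; the trigger phrases and term labels are illustrative stand-ins, not actual PSI-MI entries:

```python
# Toy passage retrieval: sentences mentioning a trigger phrase are proposed
# as textual evidence for the corresponding interaction-detection term.
# Trigger phrases and term labels here are illustrative only.
TRIGGERS = {
    "two-hybrid": "two hybrid (interaction detection method)",
    "co-immunoprecipitation": "coimmunoprecipitation (interaction detection method)",
}

def retrieve_evidence(text):
    """Return (term, sentence) pairs for every trigger phrase hit."""
    evidence = []
    for sentence in text.replace("\n", " ").split(". "):
        for phrase, term in TRIGGERS.items():
            if phrase in sentence.lower():
                evidence.append((term, sentence.strip()))
    return evidence

abstract = ("We detected binding of A to B in a yeast two-hybrid screen. "
            "The complex was confirmed by co-immunoprecipitation.")
for term, sentence in retrieve_evidence(abstract):
    print(term, "<-", sentence)
```

Real BioCreative systems go well beyond exact phrase matching (morphological variants, context classification, ranking), but the input/output shape — text in, term-plus-evidence pairs out — is the same.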
Numerous ontologies have recently been developed in life sciences to support a consistent annotation of biological objects, such as genes or proteins. These ontologies underlie continuous changes which can impact existing annotations. Therefore, it is valuable for users of ontologies to study the stability of ontologies and to see how many and what kind of ontology changes occurred.
We present OnEX (Ontology Evolution EXplorer), a system for exploring ontology changes. Currently, OnEX provides access to about 560 versions of 16 well-known life science ontologies. The system is based on a three-tier architecture including an ontology version repository, a middleware component and the OnEX web application. Interactive workflows allow a systematic and explorative change analysis of ontologies and their concepts as well as the semi-automatic migration of outdated annotations to the current version of an ontology.
OnEX provides a user-friendly web interface to explore information about changes in current life science ontologies. It is available at .
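The kind of change analysis such a version repository enables can be illustrated with a simple diff of two ontology snapshots (a deliberately simplified sketch; OnEX itself tracks richer change operations than the three shown here):

```python
# Compare two {concept_id: name} snapshots from an ontology version series.
def diff_versions(old, new):
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    renamed = sorted(c for c in set(old) & set(new) if old[c] != new[c])
    return {"added": added, "removed": removed, "renamed": renamed}

# Hypothetical concept IDs, for illustration only.
v1 = {"GO:1": "cell growth", "GO:2": "cell death"}
v2 = {"GO:1": "cell growth", "GO:2": "programmed cell death", "GO:3": "autophagy"}
print(diff_versions(v1, v2))
```

A "renamed" concept here is simply one whose ID persists while its name changed — exactly the case where an existing annotation may need semi-automatic migration rather than deletion.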
The importance of ontologies in the biomedical domain is generally recognized. However, their quality is often too poor for large-scale use in critical applications, at least partially due to insufficient training of ontology developers.
To show the efficacy of guideline-based ontology development training on the performance of ontology developers. The hypothesis was that students who received training on top-level ontologies and design patterns perform better than those who only received training in the basic principles of formal ontology engineering.
A curriculum was implemented based on a guideline for ontology design. A randomized controlled trial on the efficacy of this curriculum was performed with 24 students from bioinformatics and related fields. After joint training on the fundamentals of ontology development the students were randomly allocated to two groups. During the intervention, each group received training on different topics in ontology development. In the assessment phase, all students were asked to solve modeling problems on topics taught differentially in the intervention phase. Primary outcome was the similarity of the students’ ontology artefacts compared with gold standard ontologies developed by the authors before the experiment; secondary outcome was the intra-group similarity of group members’ ontologies.
The experiment showed no significant effect of the guideline-based training on the performance of ontology developers: (a) the ontologies developed after specific training were only slightly, but not significantly, closer to the gold standard ontologies than those developed without prior specific training; (b) although significant differences were detected for certain ontologies, the intra-group similarity was not consistently influenced in one direction by the differential training.
Given its methodological limitations, this study cannot be interpreted as demonstrating a general failure of the guideline-based approach to ontology development. Further research is needed to determine whether specific development guidelines and practices in ontology design are effective.
The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.
Python; large-scale models; reproducibility; computational neuroscience; workflow; integration
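The hierarchically organized configuration files mentioned above can be sketched as recursive parameter merging, where a model variant only states what differs from its parent configuration (a simplified illustration under assumed parameter names; Mozaik's actual parameter format is richer):

```python
# Merge a child parameter dictionary over a base one: nested dictionaries
# are merged recursively, scalar values in the child override the base.
def merge_params(base, override):
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_params(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical parameters: a base model and a variant that only resizes V1.
base = {"sheets": {"V1": {"size": 1000, "cell": {"model": "IF_curr_exp"}}}}
variant = {"sheets": {"V1": {"size": 2000}}}
print(merge_params(base, variant))
```

The payoff of this declarative style is that the full experimental context — base plus overrides — is recorded as data, so analysis and visualization stages can query it instead of relying on hand-written bookkeeping.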
Several biomedical ontologies cover the domain of biological functions, including molecular and cellular functions. However, there is currently no publicly available ontology of anatomical functions.
Consequently, no explicit relation between anatomical structures and their functions is expressed in the anatomy ontologies available for various species. Such an explicit relation would be useful for accurately defining the classes of both the anatomy and the phenotype ontologies.
We provide an ontological analysis of functions and functional abnormalities. From this analysis, we derive an approach to the automatic extraction of anatomical functions from existing ontologies which uses a combination of natural language processing, graph-based analysis of the ontologies and formal inferences. Additionally, we introduce a new relation to link material objects to processes that realize the function of these objects. This relation is introduced to avoid a needless duplication of processes already covered by the Gene Ontology in a new ontology of anatomical functions.
Ontological considerations on the nature of functional abnormalities and their representation in current phenotype ontologies show that we can extract a skeleton for an ontology of anatomical functions by using a combination of process, phenotype and anatomy ontologies automatically. We identify several limitations of the current ontologies that still need to be addressed to ensure a consistent and complete representation of anatomical functions and their abnormalities.
The source code and results of our analysis are available at http://bioonto.de.
To develop and apply formal ontology creation methods to the domain of antimicrobial prescribing and to formally evaluate the resulting ontology through intrinsic and extrinsic evaluation studies.
We extended existing ontology development methods to create the ontology and implemented the ontology using Protégé-OWL. Correctness of the ontology was assessed using a set of ontology design principles and domain expert review via the laddering technique. We created three artifacts to support the extrinsic evaluation (set of prescribing rules, alerts and an ontology-driven alert module, and a patient database) and evaluated the usefulness of the ontology for performing knowledge management tasks to maintain the ontology and for generating alerts to guide antibiotic prescribing.
The ontology includes 199 classes, 10 properties, and 1,636 description logic restrictions. Twenty-three Semantic Web Rule Language rules were written to generate three prescribing alerts: 1) antibiotic-microorganism mismatch alert; 2) medication-allergy alert; and 3) non-recommended empiric antibiotic therapy alert. The evaluation studies confirmed the correctness of the ontology, usefulness of the ontology for representing and maintaining antimicrobial treatment knowledge rules, and usefulness of the ontology for generating alerts to provide feedback to clinicians during antibiotic prescribing.
This study contributes to the understanding of ontology development and evaluation methods and addresses one knowledge gap related to using ontologies as a clinical decision support system component—a need for formal ontology evaluation methods to measure their quality from the perspective of their intrinsic characteristics and their usefulness for specific tasks.
Ontology; Clinical decision support; Evaluation
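The three alert types can be illustrated with plain-Python rule checks. This is only a sketch: the paper encodes such logic as SWRL rules over OWL classes, and the coverage and recommendation tables below are hypothetical toy data, not clinical guidance:

```python
# Hypothetical knowledge tables (illustrative only, not clinical guidance).
COVERAGE = {
    "vancomycin": {"MRSA", "MSSA"},
    "ceftriaxone": {"E. coli", "MSSA"},
}
RECOMMENDED_EMPIRIC = {"pneumonia": {"ceftriaxone"}}

def prescribing_alerts(drug, organism, allergies, indication):
    """Return the list of alert labels triggered by a prescription."""
    alerts = []
    if organism is not None and organism not in COVERAGE.get(drug, set()):
        alerts.append("antibiotic-microorganism mismatch")
    if drug in allergies:
        alerts.append("medication-allergy")
    if indication in RECOMMENDED_EMPIRIC and drug not in RECOMMENDED_EMPIRIC[indication]:
        alerts.append("non-recommended empiric antibiotic therapy")
    return alerts

print(prescribing_alerts("vancomycin", "E. coli", set(), "pneumonia"))
```

Encoding the same checks as description-logic restrictions plus SWRL rules, as the paper does, keeps the knowledge maintainable in the ontology rather than buried in application code.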
As biomedical investigators strive to integrate data and analyses across spatiotemporal scales and biomedical domains, they have recognized the benefits of formalizing languages and terminologies via computational ontologies. Although ontologies for biological entities—molecules, cells, organs—are well-established, there are no principled ontologies of physical properties—energies, volumes, flow rates—of those entities. In this paper, we introduce the Ontology of Physics for Biology (OPB), a reference ontology of classical physics designed for annotating biophysical content of growing repositories of biomedical datasets and analytical models. The OPB's semantic framework, traceable to James Clerk Maxwell, encompasses modern theories of system dynamics and thermodynamics, and is implemented as a computational ontology that references available upper ontologies. In this paper we focus on the OPB classes that are designed for annotating physical properties encoded in biomedical datasets and computational models, and we discuss how the OPB framework will facilitate biomedical knowledge integration.
Researchers use animal studies to better understand human diseases. In recent years, large-scale phenotype studies such as Phenoscape and EuroPhenome have been initiated to identify genetic causes of a species' phenome. Species-specific phenotype ontologies are required to capture and report about all findings and to automatically infer results relevant to human diseases. The integration of the different phenotype ontologies into a coherent framework is necessary to achieve interoperability for cross-species research.
Here, we investigate the quality and completeness of two different methods to align the Human Phenotype Ontology and the Mammalian Phenotype Ontology. The first method combines lexical matching with inference over the ontologies' taxonomic structures, while the second method uses a mapping algorithm based on the formal definitions of the ontologies. Neither method could map all concepts. Although the formal definitions method provides mappings for more concepts than the lexical matching method does, it does not outperform lexical matching in a biological use case. Our results suggest that combining both approaches will yield better mappings in terms of completeness, specificity and application purposes.
Motivation: Many published manuscripts contain experiment protocols which are poorly described or deficient in information. This means that the published results are very hard or impossible to repeat. This problem is being made worse by the increasing complexity of high-throughput/automated methods. There is therefore a growing need to represent experiment protocols in an efficient and unambiguous way.
Results: We have developed the Experiment ACTions (EXACT) ontology as the basis of a method of representing biological laboratory protocols. We provide example protocols that have been formalized using EXACT, and demonstrate the advantages and opportunities created by using this formalization. We argue that the use of EXACT will result in the publication of protocols with increased clarity and usefulness to the scientific community.
Availability: The ontology, examples and code can be downloaded from http://www.aber.ac.uk/compsci/Research/bio/dss/EXACT/
Contact: Larisa Soldatova email@example.com
The recent availability of high-throughput data in molecular biology has increased the need for a formal representation of this knowledge domain. New ontologies are being developed to formalize knowledge, e.g. about the functions of proteins. As the Semantic Web is being introduced into the Life Sciences, the basis for a distributed knowledge-base that can foster biological data analysis is laid. However, there is still a dichotomy, in tools and methodologies, between the use of ontologies in biological investigation (that is, in relation to experimental observations) and their use as a knowledge-base.
RDFScape is a plugin developed to extend a software platform oriented to biological analysis with support for reasoning on ontologies in the Semantic Web framework. With this plugin we show how the use of ontological knowledge in biological analysis can be extended through inference. In particular, we present two examples involving ontologies that represent biological pathways: we demonstrate how these can be abstracted and visualized as interaction networks, and how reasoning on causal dependencies among pathway elements can be implemented.
The use of ontologies for the interpretation of high-throughput biological data can be improved through the use of inference. This allows the use of ontologies not only as annotations, but as a knowledge-base from which new information relevant for specific analysis can be derived.
Ontologies, which are made up of standardized and defined controlled vocabulary terms and their interrelationships, are comprehensive and readily searchable repositories for knowledge in a given domain. The Open Biomedical Ontologies (OBO) Foundry was initiated in 2001 with the aims of becoming an “umbrella” for life-science ontologies and promoting the use of ontology development best practices. A software application (OBO-Edit; *.obo file format) was developed to facilitate ontology development and editing. The OBO Foundry now comprises over 100 ontologies and candidate ontologies, including the NCBI organismal classification ontology (NCBITaxon), the Mosquito Insecticide Resistance Ontology (MIRO), the Infectious Disease Ontology (IDO), the IDOMAL malaria ontology, and ontologies for mosquito gross anatomy and tick gross anatomy. We previously developed a disease data management system for dengue and malaria control programs, which incorporated a set of information trees built upon ontological principles, including a “term tree” to promote the use of standardized terms. In the course of doing so, we realized that there were substantial gaps in existing ontologies with regards to concepts, processes, and, especially, physical entities (e.g., vector species, pathogen species, and vector surveillance and management equipment) in the domain of surveillance and management of vectors and vector-borne pathogens. We therefore produced an ontology for vector surveillance and management, focusing on arthropod vectors and vector-borne pathogens with relevance to humans or domestic animals, and with special emphasis on content to support operational activities through inclusion in databases, data management systems, or decision support systems. The Vector Surveillance and Management Ontology (VSMO) includes >2,200 unique terms, of which the vast majority (>80%) were newly generated during the development of this ontology.
One core feature of the VSMO is the linkage, through the has_vector relation, of arthropod species to the pathogenic microorganisms for which they serve as biological vectors. We also recognized and addressed a potential roadblock for use of the VSMO by the vector-borne disease community: the difficulty in extracting information from OBO-Edit ontology files (*.obo files) and exporting the information to other file formats. A novel ontology explorer tool was developed to facilitate extraction and export of information from the VSMO *.obo file into lists of terms and their associated unique IDs in *.txt or *.csv file formats. These lists can then be imported into a database or data management system for use as select lists with predefined terms. This is an important step to ensure that the knowledge contained in our ontology can be put into practical use.
ontology; vector; pathogen; surveillance; management
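The extraction step performed by such an explorer tool can be sketched as follows. This is a minimal parser covering only the `id:` and `name:` tags of `[Term]` stanzas (real *.obo files carry many more tag types, which this sketch ignores), and the term ID shown is hypothetical:

```python
import csv
import io

def parse_obo_terms(obo_text):
    """Collect {id, name} records from [Term] stanzas, skipping other stanzas."""
    terms, current = [], None
    for line in obo_text.splitlines():
        line = line.strip()
        if line == "[Term]":
            current = {}
            terms.append(current)
        elif line.startswith("["):      # [Typedef] etc. end the current term
            current = None
        elif current is not None and line.startswith("id:"):
            current["id"] = line[len("id:"):].strip()
        elif current is not None and line.startswith("name:"):
            current["name"] = line[len("name:"):].strip()
    return terms

def terms_to_csv(terms):
    """Write id/name rows as CSV text, ready for import as a select list."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "name"])
    for t in terms:
        writer.writerow([t.get("id", ""), t.get("name", "")])
    return buf.getvalue()

sample = """[Term]
id: VSMO:0000001
name: arthropod vector

[Typedef]
id: has_vector
"""
print(terms_to_csv(parse_obo_terms(sample)))
```

The resulting *.csv rows can be imported into a database or data management system as predefined select-list values, which is precisely the practical use the abstract describes.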
Ontologies help to identify and formally define the entities and relationships in specific domains of interest. Bio-ontologies, in particular, play a central role in the annotation, integration, analysis, and interpretation of biological data. Missing from the current set of bio-ontologies is one that includes phenotypic trait information found in livestock species. As a result, the Animal Trait Ontology (ATO) project being carried out under the auspices of the USDA-National Animal Genome Research Program is aimed at the development of a standardized trait ontology for farm animals and software tools to assist the research community in collaborative creation, editing, maintenance, and use of such an ontology. The ATO currently covers cattle, pig, and chicken, and will include other livestock species in the future. The ATO will eventually be linked to other species (e.g., human, rat, mouse) so that comparative analysis can be efficiently performed between species.
ontology; trait; phenotype; animal; cattle; chicken
Application-oriented ontologies are important for reliably communicating and managing data in databases. Unfortunately, they often differ in the definitions they use and thus do not live up to their potential. This problem can be reduced by using a standardized and ontologically consistent template for the top-level categories from a formal top-level foundational ontology, which would support ontological consistency within application-oriented ontologies and compatibility between them. The Basic Formal Ontology (BFO) is such a foundational ontology for the biomedical domain; developed following the single-inheritance policy, it provides the top-level template within the Open Biological and Biomedical Ontologies Foundry. To live up to this expected role, its three top-level categories of material entity (i.e., ‘object’, ‘fiat object part’, ‘object aggregate’) must be exhaustive: every concrete material entity must instantiate exactly one of them.
By systematically evaluating all possible basic configurations of material building blocks, we show that BFO's top-level categories of material entity are not exhaustive. We provide examples from biology and everyday life that demonstrate the necessity for two additional categories: ‘fiat object part aggregate’ and ‘object with fiat object part aggregate’. By distinguishing topological coherence, topological adherence, and metric proximity, we furthermore differentiate clusters and groups as two distinct subcategories for each of the three categories of material entity aggregates, resulting in six additional subcategories of material entity aggregates.
We suggest extending BFO to incorporate two additional categories of material entity as well as two subcategories for each of the three categories of material entity aggregates. With these additions, BFO would exhaustively cover all top-level types of material entity that application-oriented ontologies may use as templates. Our result, however, depends on the premise that all material entities are organized according to a constitutive
GALEN technology for re-usable terminologies using formal classification is being applied to the creation and maintenance of a reference terminology for drugs. GALEN's techniques are being used to address specific deficiencies of existing drug classifications that make it difficult to create and maintain guidelines to support prescribing in the care of patients with chronic diseases. The reference terminology is in two parts; firstly, a re-usable and automatically-classified 'ontology' is built with GALEN technology; this describes generic drugs, their composition in terms of chemicals and chemical classes, their actions, indications and interactions. Secondly, a 'dictionary' of prescribable proprietary products is integrated with this ontology. The result is a drug resource designed to support both the traditional uses of a drug knowledge base (e.g. prescribing and messaging), and the specialized demands of guideline authoring and execution.
The Protein Ontology (PRO) is designed as a formal and principled Open Biomedical Ontologies (OBO) Foundry ontology for proteins. The components of PRO extend from a classification of proteins on the basis of evolutionary relationships at the homeomorphic level to the representation of the multiple protein forms of a gene, including those resulting from alternative splicing, cleavage and/or post-translational modifications. Focusing specifically on the TGF-beta signaling proteins, we describe the building, curation, usage and dissemination of PRO.
PRO is manually curated on the basis of PrePRO, an automatically generated file with content derived from standard protein data sources. Manual curation ensures that the treatment of the protein classes and the internal and external relationships conform to the PRO framework. The current release of PRO is based upon experimental data from mouse and human proteins, wherein equivalent protein forms are represented by single terms. In addition to the PRO ontology, the annotation of PRO terms is released as a separate PRO association file, which contains, for each given PRO term, an annotation from the experimentally characterized sub-types as well as the corresponding database identifiers and sequence coordinates. The annotations are added in the form of relationships to other ontologies. Whenever possible, equivalent forms in other species are listed to facilitate cross-species comparison. Splice and allelic variants, gene fusion products and modified protein forms are all represented as entities in the ontology. Therefore, PRO provides for the representation of protein entities and a resource for describing the associated data. This makes PRO useful both for proteomics studies where isoforms and modified forms must be differentiated, and for studies of biological pathways, where representations need to take account of the different ways in which the cascade of events may depend on specific protein modifications.
PRO provides a framework for the formal representation of protein classes and protein forms in the OBO Foundry. It is designed to enable data retrieval and integration and machine reasoning at the molecular level of proteins, thereby facilitating cross-species comparisons, pathway analysis, disease modeling and the generation of new hypotheses.
Motivation: Describing biological sample variables with ontologies is complex due to the cross-domain nature of experiments. Ontologies provide annotation solutions; however, for cross-domain investigations, multiple ontologies are needed to represent the data. These are subject to rapid change, are often not interoperable and present complexities that are a barrier to biological resource users.
Results: We present the Experimental Factor Ontology, designed to meet cross-domain, application focused use cases for gene expression data. We describe our methodology and open source tools used to create the ontology. These include tools for creating ontology mappings, ontology views, detecting ontology changes and using ontologies in interfaces to enhance querying. The application of reference ontologies to data is a key problem, and this work presents guidelines on how community ontologies can be presented in an application ontology in a data-driven way.
Supplementary information: Supplementary data are available at Bioinformatics online.
The development of high-throughput experimentation has led to astronomical growth in biologically relevant lipids and lipid derivatives identified, screened, and deposited in numerous online databases. Unfortunately, efforts to annotate, classify, and analyze these chemical entities have largely remained in the hands of human curators using manual or semi-automated protocols, leaving many novel entities unclassified. Since chemical function is often closely linked to structure, accurate structure-based classification and annotation of chemical entities is imperative to understanding their functionality.
As part of an exploratory study, we have investigated the utility of semantic web technologies in automated chemical classification and annotation of lipids. Our prototype framework consists of two components: an ontology and a set of federated web services that operate upon it. The formal lipid ontology we use here extends a part of the LiPrO ontology and draws on the lipid hierarchy in the LIPID MAPS database, as well as literature-derived knowledge. The federated semantic web services that operate upon this ontology are deployed within the Semantic Annotation, Discovery, and Integration (SADI) framework. Structure-based lipid classification is enacted by two core services. Firstly, a structural annotation service detects and enumerates relevant functional groups for a specified chemical structure. A second service reasons over lipid ontology class descriptions using the attributes obtained from the annotation service and identifies the appropriate lipid classification. We extend the utility of these core services by combining them with additional SADI services that retrieve associations between lipids and proteins and identify publications related to specified lipid types. We analyze the performance of SADI-enabled eicosanoid classification relative to the LIPID MAPS classification and reflect on the contribution of our integrative methodology in the context of high-throughput lipidomics.
Our prototype framework is capable of accurate automated classification of lipids and facile integration of lipid class information with additional data obtained with SADI web services. The potential of programming-free integration of external web services through the SADI framework offers an opportunity for development of powerful novel applications in lipidomics. We conclude that semantic web technologies can provide an accurate and versatile means of classification and annotation of lipids.
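The structure-based classification logic can be caricatured in a few lines: a class applies when a molecule's detected functional groups satisfy the class description, and the most specific matching class wins. The class definitions below are illustrative toy data, not LIPID MAPS or LiPrO definitions, and the real framework performs this step by OWL reasoning over SADI services rather than set comparison:

```python
# Toy class descriptions: class name -> required functional groups.
CLASS_DEFS = {
    "fatty acid": {"carboxyl"},
    "prostaglandin": {"carboxyl", "hydroxyl", "cyclopentane ring"},
}

def classify(functional_groups):
    """Return the most specific class whose required groups are all present,
    or None if no class description is satisfied."""
    matches = [name for name, required in CLASS_DEFS.items()
               if required <= functional_groups]
    return max(matches, key=lambda n: len(CLASS_DEFS[n]), default=None)

print(classify({"carboxyl", "hydroxyl", "cyclopentane ring"}))
```

In the actual pipeline the "detected functional groups" come from the structural annotation service, and the subset check is replaced by a reasoner testing class membership against ontology class descriptions — which is what makes the approach extensible without reprogramming.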
A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.
An ontology represents the concepts and their interrelation within a knowledge domain. Several ontologies have been developed in biomedicine, which provide standardized vocabularies to describe diseases, genes and gene products, physiological phenotypes, anatomical structures, and many other phenomena. Scientists use them to encode the results of complex experiments and observations and to perform integrative analysis to discover new knowledge. A remaining challenge in ontology development is how to evaluate an ontology's representation of knowledge within its scientific domain. Building on classic measures from information retrieval, we introduce a family of metrics including breadth and depth that capture the conceptual coverage and parsimony of an ontology. We test these measures using (1) four commonly used medical ontologies in relation to a corpus of medical documents and (2) seven popular English thesauri (ontologies of synonyms) with respect to text from medicine, news, and novels. Results demonstrate that both medical ontologies and English thesauri have a small overlap in concepts and relations. Our methods can guide efforts to tighten the fit between ontologies and biomedical knowledge.
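The coverage-and-parsimony idea behind these metrics can be illustrated with a toy computation. This sketch assumes a bag-of-words corpus and an ontology reduced to its term labels; the paper's actual breadth and depth measures are more elaborate (e.g. frequency-weighted), and `coverage_metrics`, the documents, and the term list below are all invented for illustration.

```python
from collections import Counter

def coverage_metrics(ontology_terms, corpus_docs):
    """Toy breadth/parsimony-style scores.

    breadth:   fraction of the corpus vocabulary covered by the ontology
    parsimony: fraction of ontology terms actually used in the corpus
               (unused terms add bulk without adding coverage)
    """
    vocab = Counter(w for doc in corpus_docs for w in doc.lower().split())
    terms = {t.lower() for t in ontology_terms}
    covered = set(vocab) & terms
    breadth = len(covered) / len(vocab)
    parsimony = len(covered) / len(terms)
    return breadth, parsimony

# Tiny invented "medical corpus" and term list:
docs = ["fever and cough", "cough with chest pain", "fever rash"]
terms = ["fever", "cough", "rash", "nausea"]
b, p = coverage_metrics(terms, docs)
# 3 of 7 corpus words are covered; 3 of 4 ontology terms are used
```

A real evaluation would match multi-word concept labels and their synonyms rather than single tokens, but the trade-off the metrics expose is the same: adding terms raises breadth only if the corpus actually uses them.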
Motivation: To provide consistent computable descriptions of phenotype data, PomBase is developing a formal ontology of phenotypes observed in fission yeast.
Results: The fission yeast phenotype ontology (FYPO) is a modular ontology that uses several existing ontologies from the open biological and biomedical ontologies (OBO) collection as building blocks, including the phenotypic quality ontology PATO, the Gene Ontology and Chemical Entities of Biological Interest. Modular ontology development facilitates partially automated effective organization of detailed phenotype descriptions with complex relationships to each other and to underlying biological phenomena. As a result, FYPO supports sophisticated querying, computational analysis and comparison between different experiments and even between species.
Availability: FYPO releases are available from the Subversion repository at the PomBase SourceForge project page (https://sourceforge.net/p/pombase/code/HEAD/tree/phenotype_ontology/). The current version of FYPO is also available on the OBO Foundry Web site (http://obofoundry.org/).
Contact: firstname.lastname@example.org or email@example.com
Ontologies are intended to capture and formalize a domain of knowledge. The ontologies comprising the Open Biological Ontologies (OBO) project, which includes the Gene Ontology (GO), are formalizations of various domains of biological knowledge. Ontologies within OBO typically lack computable definitions that serve to differentiate a term from other similar terms. The computer is unable to determine the meaning of a term, which presents problems for tools such as automated reasoners. Reasoners can be of enormous benefit in managing a complex ontology. OBO term names frequently implicitly encode the kind of definitions that can be used by computational tools such as automated reasoners. The definitions encoded in the names are not easily amenable to computation, because the names are ostensibly natural language phrases designed for human users. However, these names are highly regular in their grammar, and can thus be treated as valid sentences in some formal or computable language. With a description of the rules underlying this formal language, term names can be parsed to derive computable definitions, which can then be reasoned over. This paper describes the effort to elucidate that language, called Obol, and the attempts to reason over the resulting definitions. The current implementation finds unique non-trivial definitions for around half of the terms in the GO, and has been used to find 223 missing relationships, which have since been added to the ontology. Obol has utility as an ontology maintenance tool, and as a means of generating computable definitions for a whole ontology.
The software is available under an open-source license from: http://www.fruitfly.org/~cjm/obol. Supplementary material for this article can be found at: http://www.
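The grammatical regularity that Obol exploits can be sketched with a single toy production: GO-style names of the form "regulation of X" decompose into a genus ("regulation") and a differentia naming the regulated process. The rule, the output structure, and `parse_go_name` below are illustrative only and are not Obol's actual grammar or data model.

```python
import re

# One toy production from a GO-style name grammar:
#   name ::= "regulation of" process
# which yields a computable genus-differentia definition.
PATTERN = re.compile(r"^regulation of (?P<process>.+)$")

def parse_go_name(name):
    """Parse a term name into a definition dict, or return None
    when the name is not covered by this toy grammar rule."""
    m = PATTERN.match(name)
    if m is None:
        return None
    return {"genus": "regulation",
            "differentia": [("regulates", m.group("process"))]}

d = parse_go_name("regulation of cell proliferation")
# d == {"genus": "regulation",
#       "differentia": [("regulates", "cell proliferation")]}
```

Once names are decomposed this way, a reasoner can infer, for example, that "regulation of cell proliferation" stands in a relationship to "cell proliferation" even when no such link was asserted by hand, which is how missing relationships of the kind reported above can be detected.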
Phenotype ontologies are used in species-specific databases for the annotation of mutagenesis experiments and to characterize human diseases. The Entity-Quality (EQ) formalism is a means to describe complex phenotypes based on one or more affected entities and a quality. EQ-based definitions have been developed for many phenotype ontologies, including the Human and Mammalian Phenotype ontologies.
We analyze formalizations of complex phenotype descriptions in the Web Ontology Language (OWL) that are based on the EQ model, identify several representational challenges and analyze potential solutions to address these challenges.
In particular, we suggest a novel, role-based approach to represent relational qualities such as concentration of iron in spleen, discuss its ontological foundation in the General Formal Ontology (GFO) and evaluate its representation in OWL and the benefits it can bring to the representation of phenotype annotations.
Our analysis of OWL-based representations of phenotypes can contribute to improving consistency and expressiveness of formal phenotype descriptions.
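The EQ pattern, and the relational-quality case discussed above, can be sketched as a small data structure. The class, field names, and the encoding of "concentration of iron in spleen" are illustrative assumptions, not PATO's, GFO's, or OWL's actual syntax; in particular, which entity bears a relational quality is exactly the kind of modeling choice the analysis addresses.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EQPhenotype:
    """Entity-Quality phenotype: an affected entity, a PATO-style
    quality, and, for relational qualities, a second entity that
    the quality relates the bearer to."""
    entity: str                    # the affected entity (e.g. an anatomy term)
    quality: str                   # a PATO-style quality
    towards: Optional[str] = None  # second entity of a relational quality

    def is_relational(self) -> bool:
        return self.towards is not None

# Simple quality: enlarged spleen
p1 = EQPhenotype(entity="spleen", quality="increased size")
# Relational quality: increased concentration of iron in spleen
p2 = EQPhenotype(entity="spleen", quality="increased concentration",
                 towards="iron")
```

The role-based approach proposed in the paper refines the relational case: instead of a bare `towards` link, the quality would inhere in a role played jointly by the two entities, which changes how the annotation is expressed in OWL.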
The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.
The Basic Formal Ontology (BFO) is a top-level formal foundational ontology for the biomedical domain. It has been developed to serve as an ontologically consistent template for the top-level categories of application-oriented and domain reference ontologies within the Open Biological and Biomedical Ontologies (OBO) Foundry. BFO is important because it enables OBO ontologies to reliably communicate and manage data and metadata within and across biomedical databases. Following its intended single-inheritance policy, BFO's three top-level categories of material entity (i.e. ‘object’, ‘fiat object part’, ‘object aggregate’) must be exhaustive and mutually disjoint. We have shown elsewhere that, to accommodate all types of constitutively organized material entities, BFO must be extended by additional categories of material entity.
Unfortunately, most biomedical material entities are cumulative-constitutively organized. We show that even the extended BFO does not exhaustively cover cumulative-constitutively organized material entities. We provide examples from biology and everyday life that demonstrate the necessity for ‘portion of matter’ as another material building block. This implies the necessity for further extending BFO by ‘portion of matter’ as well as three additional categories that possess portions of matter as aggregate components. These extensions are necessary if the basic assumption that all parts that share the same granularity level exhaustively sum to the whole should also apply to cumulative-constitutively organized material entities. By suggesting a notion of granular representation we provide a way to maintain the single inheritance principle when dealing with cumulative-constitutively organized material entities.
We suggest extending BFO to incorporate additional categories of material entity and rearranging its top-level material entity taxonomy. With these additions and the notion of granular representation, BFO would exhaustively cover all top-level types of material entities that application-oriented ontologies may use as templates, while still maintaining the single-inheritance principle.