Ontologies have increasingly been used in the biomedical domain, which has prompted the emergence of different initiatives to facilitate their development and integration. The Open Biological and Biomedical Ontologies (OBO) Foundry consortium provides a repository of life-science ontologies, which are developed according to a set of shared principles. This consortium has developed an ontology called OBO Relation Ontology aiming at standardizing the different types of biological entity classes and associated relationships. Since ontologies are primarily intended to be used by humans, the use of graphical notations for ontology development facilitates the capture, comprehension and communication of knowledge between its users. However, OBO Foundry ontologies are captured and represented basically using text-based notations. The Unified Modeling Language (UML) provides a standard and widely-used graphical notation for modeling computer systems. UML provides a well-defined set of modeling elements, which can be extended using a built-in extension mechanism named Profile. Thus, this work aims at developing a UML profile for the OBO Relation Ontology to provide a domain-specific set of modeling elements that can be used to create standard UML-based ontologies in the biomedical domain.
We have studied the OBO Relation Ontology, the UML metamodel and the UML profiling mechanism. Based on these studies, we have proposed an extension to the UML metamodel in conformance with the OBO Relation Ontology and we have defined a profile that implements the extended metamodel. Finally, we have applied the proposed UML profile in the development of a number of fragments from different ontologies. Particularly, we have considered the Gene Ontology (GO), the PRotein Ontology (PRO) and the Xenopus Anatomy and Development Ontology (XAO).
The use of an established and well-known graphical language in the development of biomedical ontologies provides a more intuitive form of capturing and representing knowledge than using only text-based notations. The use of the profile requires the domain expert to reason about the underlying semantics of the concepts and relationships being modeled, which helps preventing the introduction of inconsistencies in an ontology under development and facilitates the identification and correction of errors in an already defined ontology.
Ontologies in biomedicine facilitate information integration, data exchange, search and query of biomedical data, and other critical knowledge-intensive tasks. The OBO Foundry is a collaborative effort to establish a set of principles for ontology development with the eventual goal of creating a set of interoperable reference ontologies in the domain of biomedicine. One of the key requirements to achieve this goal is to ensure that ontology developers reuse term definitions that others have already created rather than create their own definitions, thereby making the ontologies orthogonal.
We used a simple lexical algorithm to analyze the extent to which the set of OBO Foundry candidate ontologies identified from September 2009 to September 2010 conforms to this vision. Specifically, we analyzed (1) the level of explicit term reuse in this set of ontologies, (2) the level of overlap, where two ontologies define similar terms independently, and (3) how the levels of reuse and overlap changed during the course of this year.
We found that 30% of the ontologies reuse terms from other Foundry candidates and 96% of the candidate ontologies contain terms that overlap with terms from the other ontologies. We found that while term reuse increased among the ontologies between September 2009 and September 2010, the level of overlap among the ontologies remained relatively constant. Additionally, we analyzed the six ontologies announced as OBO Foundry members on March 5, 2010, and identified that the level of overlap was extremely low, but, notably, so was the level of term reuse.
We have created a prototype web application that allows OBO Foundry ontology developers to see which classes from their ontologies overlap with classes from other ontologies in the OBO Foundry (http://obomap.bioontology.org). From our analysis, we conclude that while the OBO Foundry has made significant progress toward orthogonality during the period of this study through increased adoption of explicit term reuse, a large amount of overlap remains among these ontologies. Furthermore, the characteristics of the identified overlap, such as the terms it comprises and its distribution among the ontologies, indicate that the achieving orthogonality will be exceptionally difficult, if not impossible.
The Open Biomedical Ontologies (OBO) Foundry is a collection of freely available ontologically structured controlled vocabularies in the biomedical domain. Most of them are disseminated via both the OBO Flatfile Format and the semantic web format Web Ontology Language (OWL), which draws upon formal logic. Based on the interpretations underlying OWL description logics (OWL-DL) semantics, we scrutinize the OWL-DL releases of OBO ontologies to assess whether their logical axioms correspond to the meaning intended by their authors.
We analyzed ontologies and ontology cross products available via the OBO Foundry site http://www.obofoundry.org for existential restrictions (someValuesFrom), from which we examined a random sample of 2,836 clauses.
According to a rating done by four experts, 23% of all existential restrictions in OBO Foundry candidate ontologies are suspicious (Cohens' κ = 0.78). We found a smaller proportion of existential restrictions in OBO Foundry cross products are suspicious, but in this case an accurate quantitative judgment is not possible due to a low inter-rater agreement (κ = 0.07). We identified several typical modeling problems, for which satisfactory ontology design patterns based on OWL-DL were proposed. We further describe several usability issues with OBO ontologies, including the lack of ontological commitment for several common terms, and the proliferation of domain-specific relations.
The current OWL releases of OBO Foundry (and Foundry candidate) ontologies contain numerous assertions which do not properly describe the underlying biological reality, or are ambiguous and difficult to interpret. The solution is a better anchoring in upper ontologies and a restriction to relatively few, well defined relation types with given domain and range constraints.
Although policy providers have outlined minimal metadata guidelines and naming conventions, ontologies of today still display inter- and intra-ontology heterogeneities in class labelling schemes and metadata completeness. This fact is at least partially due to missing or inappropriate tools. Software support can ease this situation and contribute to overall ontology consistency and quality by helping to enforce such conventions.
We provide a plugin for the Protégé Ontology editor to allow for easy checks on compliance towards ontology naming conventions and metadata completeness, as well as curation in case of found violations.
In a requirement analysis, derived from a prior standardization approach carried out within the OBO Foundry, we investigate the needed capabilities for software tools to check, curate and maintain class naming conventions. A Protégé tab plugin was implemented accordingly using the Protégé 4.1 libraries. The plugin was tested on six different ontologies. Based on these test results, the plugin could be refined, also by the integration of new functionalities.
The new Protégé plugin, OntoCheck, allows for ontology tests to be carried out on OWL ontologies. In particular the OntoCheck plugin helps to clean up an ontology with regard to lexical heterogeneity, i.e. enforcing naming conventions and metadata completeness, meeting most of the requirements outlined for such a tool. Found test violations can be corrected to foster consistency in entity naming and meta-annotation within an artefact. Once specified, check constraints like name patterns can be stored and exchanged for later re-use. Here we describe a first version of the software, illustrate its capabilities and use within running ontology development efforts and briefly outline improvements resulting from its application. Further, we discuss OntoChecks capabilities in the context of related tools and highlight potential future expansions.
The OntoCheck plugin facilitates labelling error detection and curation, contributing to lexical quality assurance in OWL ontologies. Ultimately, we hope this Protégé extension will ease ontology alignments as well as lexical post-processing of annotated data and hence can increase overall secondary data usage by humans and computers.
Current efforts within the biomedical ontology community focus on achieving interoperability between various biomedical ontologies that cover a range of diverse domains. Achieving this interoperability will contribute to the creation of a rich knowledge base that can be used for querying, as well as generating and testing novel hypotheses. The OBO Foundry principles, as applied to a number of biomedical ontologies, are designed to facilitate this interoperability. However, semantic extensions are required to meet the OBO Foundry interoperability goals. Inconsistencies may arise when ontologies of properties – mostly phenotype ontologies – are combined with ontologies taking a canonical view of a domain – such as many anatomical ontologies. Currently, there is no support for a correct and consistent integration of such ontologies.
We have developed a methodology for accurately representing canonical domain ontologies within the OBO Foundry. This is achieved by adding an extension to the semantics for relationships in the biomedical ontologies that allows for treating canonical information as default. Conclusions drawn from default knowledge may be revoked when additional information becomes available. We show how this extension can be used to achieve interoperability between ontologies, and further allows for the inclusion of more knowledge within them. We apply the formalism to ontologies of mouse anatomy and mammalian phenotypes in order to demonstrate the approach.
Biomedical ontologies require a new class of relations that can be used in conjunction with default knowledge, thereby extending those currently in use. The inclusion of default knowledge is necessary in order to ensure interoperability between ontologies.
Biological sequences play a major role in molecular and computational biology. They are studied as information-bearing entities that make up DNA, RNA or proteins. The Sequence Ontology, which is part of the OBO Foundry, contains descriptions and definitions of sequences and their properties. Yet the most basic question about sequences remains unanswered: what kind of entity is a biological sequence? An answer to this question benefits formal ontologies that use the notion of biological sequences and analyses in computational biology alike.
We provide both an ontological analysis of biological sequences and a formal representation that can be used in knowledge-based applications and other ontologies. We distinguish three distinct kinds of entities that can be referred to as "biological sequence": chains of molecules, syntactic representations such as those in biological databases, and the abstract information-bearing entities. For use in knowledge-based applications and inclusion in biomedical ontologies, we implemented the developed axiom system for use in automated theorem proving.
Axioms are necessary to achieve the main goal of ontologies: to formally specify the meaning of terms used within a domain. The axiom system for the ontology of biological sequences is the first elaborate axiom system for an OBO Foundry ontology and can serve as starting point for the development of more formal ontologies and ultimately of knowledge-based applications.
Ontologies, which are made up by standardized and defined controlled vocabulary terms and their interrelationships, are comprehensive and readily searchable repositories for knowledge in a given domain. The Open Biomedical Ontologies (OBO) Foundry was initiated in 2001 with the aims of becoming an “umbrella” for life-science ontologies and promoting the use of ontology development best practices. A software application (OBO-Edit; *.obo file format) was developed to facilitate ontology development and editing. The OBO Foundry now comprises over 100 ontologies and candidate ontologies, including the NCBI organismal classification ontology (NCBITaxon), the Mosquito Insecticide Resistance Ontology (MIRO), the Infectious Disease Ontology (IDO), the IDOMAL malaria ontology, and ontologies for mosquito gross anatomy and tick gross anatomy. We previously developed a disease data management system for dengue and malaria control programs, which incorporated a set of information trees built upon ontological principles, including a “term tree” to promote the use of standardized terms. In the course of doing so, we realized that there were substantial gaps in existing ontologies with regards to concepts, processes, and, especially, physical entities (e.g., vector species, pathogen species, and vector surveillance and management equipment) in the domain of surveillance and management of vectors and vector-borne pathogens. We therefore produced an ontology for vector surveillance and management, focusing on arthropod vectors and vector-borne pathogens with relevance to humans or domestic animals, and with special emphasis on content to support operational activities through inclusion in databases, data management systems, or decision support systems. The Vector Surveillance and Management Ontology (VSMO) includes >2,200 unique terms, of which the vast majority (>80%) were newly generated during the development of this ontology. One core feature of the VSMO is the linkage, through the has_vector relation, of arthropod species to the pathogenic microorganisms for which they serve as biological vectors. We also recognized and addressed a potential roadblock for use of the VSMO by the vector-borne disease community: the difficulty in extracting information from OBO-Edit ontology files (*.obo files) and exporting the information to other file formats. A novel ontology explorer tool was developed to facilitate extraction and export of information from the VSMO *.obo file into lists of terms and their associated unique IDs in *.txt or *.csv file formats. These lists can then be imported into a database or data management system for use as select lists with predefined terms. This is an important step to ensure that the knowledge contained in our ontology can be put into practical use.
ontology; vector; pathogen; surveillance; management
Ontologies have emerged as a fast growing research topic in the area of semantic web during last decade. Currently there are 204 ontologies that are available
through OBO Foundry and BioPortal. Several excellent tools for navigating the ontological structure are available, however most of them are dedicated to a
specific annotation data or integrated with specific analysis applications, and do not offer flexibility in terms of general-purpose usage for ontology exploration.
We developed OntoVisT, a web based ontological visualization tool. This application is designed for interactive visualization of any ontological hierarchy for a
specific node of interest, up to the chosen level of children and/or ancestor. It takes any ontology file in OBO format as input and generates output as DAG
hierarchical graph for the chosen query. To enhance the navigation capabilities of complex networks, we have embedded several features such as search criteria,
zoom in/out, center focus, nearest neighbor highlights and mouse hover events. The application has been tested on all 72 data sets available in OBO format through
OBO foundry. The results for few of them can be accessed through OntoVisT-Gallery.
The database is available for free at http://ccbb.jnu.ac.in/OntoVisT.html
ontology; visualization tool; directed acyclic graphs; web server; gene ontology
Ontologies are commonly used in biomedicine to organize concepts to describe domains such as anatomies, environments, experiment, taxonomies etc. NCBO BioPortal currently hosts about 180 different biomedical ontologies. These ontologies have been mainly expressed in either the Open Biomedical Ontology (OBO) format or the Web Ontology Language (OWL). OBO emerged from the Gene Ontology, and supports most of the biomedical ontology content. In comparison, OWL is a Semantic Web language, and is supported by the World Wide Web consortium together with integral query languages, rule languages and distributed infrastructure for information interchange. These features are highly desirable for the OBO content as well. A convenient method for leveraging these features for OBO ontologies is by transforming OBO ontologies to OWL.
We have developed a methodology for translating OBO ontologies to OWL using the organization of the Semantic Web itself to guide the work. The approach reveals that the constructs of OBO can be grouped together to form a similar layer cake. Thus we were able to decompose the problem into two parts. Most OBO constructs have easy and obvious equivalence to a construct in OWL. A small subset of OBO constructs requires deeper consideration. We have defined transformations for all constructs in an effort to foster a standard common mapping between OBO and OWL. Our mapping produces OWL-DL, a Description Logics based subset of OWL with desirable computational properties for efficiency and correctness. Our Java implementation of the mapping is part of the official Gene Ontology project source.
Our transformation system provides a lossless roundtrip mapping for OBO ontologies, i.e. an OBO ontology may be translated to OWL and back without loss of knowledge. In addition, it provides a roadmap for bridging the gap between the two ontology languages in order to enable the use of ontology content in a language independent manner.
The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.
Biobanks are a critical resource for translational science. Recently, semantic web technologies such as ontologies have been found useful in retrieving research data from biobanks. However, recent research has also shown that there is a lack of data about the administrative aspects of biobanks. These data would be helpful to answer research-relevant questions such as what is the scope of specimens collected in a biobank, what is the curation status of the specimens, and what is the contact information for curators of biobanks. Our use cases include giving researchers the ability to retrieve key administrative data (e.g. contact information, contact's affiliation, etc.) about the biobanks where specific specimens of interest are stored. Thus, our goal is to provide an ontology that represents the administrative entities in biobanking and their relations. We base our ontology development on a set of 53 data attributes called MIABIS, which were in part the result of semantic integration efforts of the European Biobanking and Biomolecular Resources Research Infrastructure (BBMRI). The previous work on MIABIS provided the domain analysis for our ontology. We report on a test of our ontology against competency questions that we derived from the initial BBMRI use cases. Future work includes additional ontology development to answer additional competency questions from these use cases.
We created an open-source ontology of biobank administration called Ontologized MIABIS (OMIABIS) coded in OWL 2.0 and developed according to the principles of the OBO Foundry. It re-uses pre-existing ontologies when possible in cooperation with developers of other ontologies in related domains, such as the Ontology of Biomedical Investigation. OMIABIS provides a formalized representation of biobanks and their administration. Using the ontology and a set of Description Logic queries derived from the competency questions that we identified, we were able to retrieve test data with perfect accuracy. In addition, we began development of a mapping from the ontology to pre-existing biobank data structures commonly used in the U.S.
In conclusion, we created OMIABIS, an ontology of biobank administration. We found that basing its development on pre-existing resources to meet the BBMRI use cases resulted in a biobanking ontology that is re-useable in environments other than BBMRI. Our ontology retrieved all true positives and no false positives when queried according to the competency questions we derived from the BBMRI use cases. Mapping OMIABIS to a data structure used for biospecimen collections in a medical center in Little Rock, AR showed adequate coverage of our ontology.
Caused by intracellular Gram-negative bacteria Brucella spp., brucellosis is the most common bacterial zoonotic disease. Extensive studies in brucellosis have yielded a large number of publications and data covering various topics ranging from basic Brucella genetic study to vaccine clinical trials. To support data interoperability and reasoning, a community-based brucellosis-specific biomedical ontology is needed.
The Brucellosis Ontology (IDOBRU: http://sourceforge.net/projects/idobru), a biomedical ontology in the brucellosis domain, is an extension ontology of the core Infectious Disease Ontology (IDO-core) and follows OBO Foundry principles. Currently IDOBRU contains 1503 ontology terms, which includes 739 Brucella-specific terms, 414 IDO-core terms, and 350 terms imported from 10 existing ontologies. IDOBRU has been used to model different aspects of brucellosis, including host infection, zoonotic disease transmission, symptoms, virulence factors and pathogenesis, diagnosis, intentional release, vaccine prevention, and treatment. Case studies are typically used in our IDOBRU modeling. For example, diurnal temperature variation in Brucella patients, a Brucella-specific PCR method, and a WHO-recommended brucellosis treatment were selected as use cases to model brucellosis symptom, diagnosis, and treatment, respectively. Developed using OWL, IDOBRU supports OWL-based ontological reasoning. For example, by performing a Description Logic (DL) query in the OWL editor Protégé 4 or a SPARQL query in an IDOBRU SPARQL server, a check of Brucella virulence factors showed that eight of them are known protective antigens based on the biological knowledge captured within the ontology.
IDOBRU is the first reported bacterial infectious disease ontology developed to represent different disease aspects in a formal logical format. It serves as a brucellosis knowledgebase and supports brucellosis data integration and automated reasoning.
The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or ‘ontologies’. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.
The Protein Ontology (PRO) is designed as a formal and principled Open Biomedical Ontologies (OBO) Foundry ontology for proteins. The components of PRO extend from a classification of proteins on the basis of evolutionary relationships at the homeomorphic level to the representation of the multiple protein forms of a gene, including those resulting from alternative splicing, cleavage and/or post-translational modifications. Focusing specifically on the TGF-beta signaling proteins, we describe the building, curation, usage and dissemination of PRO.
PRO is manually curated on the basis of PrePRO, an automatically generated file with content derived from standard protein data sources. Manual curation ensures that the treatment of the protein classes and the internal and external relationships conform to the PRO framework. The current release of PRO is based upon experimental data from mouse and human proteins wherein equivalent protein forms are represented by single terms. In addition to the PRO ontology, the annotation of PRO terms is released as a separate PRO association file, which contains, for each given PRO term, an annotation from the experimentally characterized sub-types as well as the corresponding database identifiers and sequence coordinates. The annotations are added in the form of relationship to other ontologies. Whenever possible, equivalent forms in other species are listed to facilitate cross-species comparison. Splice and allelic variants, gene fusion products and modified protein forms are all represented as entities in the ontology. Therefore, PRO provides for the representation of protein entities and a resource for describing the associated data. This makes PRO useful both for proteomics studies where isoforms and modified forms must be differentiated, and for studies of biological pathways, where representations need to take account of the different ways in which the cascade of events may depend on specific protein modifications.
PRO provides a framework for the formal representation of protein classes and protein forms in the OBO Foundry. It is designed to enable data retrieval and integration and machine reasoning at the molecular level of proteins, thereby facilitating cross-species comparisons, pathway analysis, disease modeling and the generation of new hypotheses.
The Gene Ontology (GO) consists of nearly 30,000 classes for describing the activities and locations of gene products. Manual maintenance of an ontology of this size is a considerable effort, and errors and inconsistencies inevitably arise. Reasoners can be used to assist with ontology development, automatically placing classes in a subsumption hierarchy based on their properties. However, the historic lack of computable definitions within the GO has prevented the user of these tools.
In this paper we present preliminary results of an ongoing effort to normalize the GO by explicitly stating the definitions of compositional classes in a form that can be used by reasoners. These definitions are partitioned into mutually exclusive cross-product sets, many of which reference other OBO Foundry candidate ontologies for chemical entities, proteins, biological qualities and anatomical entities. Using these logical definitions we are gradually beginning to automate many aspects of ontology development, detecting errors and filling in missing relationships. These definitions also enhance the GO by weaving it into the fabric of a wider collection of interoperating ontologies, increasing opportunities for data integration and enhancing genomic analyses.
Ontology development is a rapidly growing area of research, especially in the life sciences domain. To promote collaboration and interoperability between different projects, the OBO Foundry principles require that these ontologies be open and non-redundant, avoiding duplication of terms through the re-use of existing resources. As current options to do so present various difficulties, a new approach, MIREOT, allows specifying import of single terms. Initial implementations allow for controlled import of selected annotations and certain classes of related terms.
OntoFox http://ontofox.hegroup.org/ is a web-based system that allows users to input terms, fetch selected properties, annotations, and certain classes of related terms from the source ontologies and save the results using the RDF/XML serialization of the Web Ontology Language (OWL). Compared to an initial implementation of MIREOT, OntoFox allows additional and more easily configurable options for selecting and rewriting annotation properties, and for inclusion of all or a computed subset of terms between low and top level terms. Additional methods for including related classes include a SPARQL-based ontology term retrieval algorithm that extracts terms related to a given set of signature terms and an option to extract the hierarchy rooted at a specified ontology term. OntoFox's output can be directly imported into a developer's ontology. OntoFox currently supports term retrieval from a selection of 15 ontologies accessible via SPARQL endpoints and allows users to extend this by specifying additional endpoints. An OntoFox application in the development of the Vaccine Ontology (VO) is demonstrated.
OntoFox provides a timely publicly available service, providing different options for users to collect terms from external ontologies, making them available for reuse by import into client OWL ontologies.
Clinical studies often use data dictionaries with controlled sets of terms to facilitate data collection, limited interoperability and sharing at a local site. Multi-center retrospective clinical studies require that these data dictionaries, originating from individual participating centers, be harmonized in preparation for the integration of the corresponding clinical research data. Domain ontologies are often used to facilitate multi-center data integration by modeling terms from data dictionaries in a logic-based language, but interoperability among domain ontologies (using automated techniques) is an unresolved issue. Although many upper-level reference ontologies have been proposed to address this challenge, our experience in integrating multi-center sleep medicine data highlights the need for an upper level ontology that models a common set of terms at multiple-levels of abstraction, which is not covered by the existing upper-level ontologies. We introduce a methodology underpinned by a Minimal Domain of Discourse (MiDas) algorithm to automatically extract a minimal common domain of discourse (upper-domain ontology) from an existing domain ontology. Using the Multi-Modality, Multi-Resource Environment for Physiological and Clinical Research (Physio-MIMI) multi-center project in sleep medicine as a use case, we demonstrate the use of MiDas in extracting a minimal domain of discourse for sleep medicine, from Physio-MIMI’s Sleep Domain Ontology (SDO). We then extend the resulting domain of discourse with terms from the data dictionary of the Sleep Heart and Health Study (SHHS) to validate MiDas. To illustrate the wider applicability of MiDas, we automatically extract the respective domains of discourse from 6 sample domain ontologies from the National Center for Biomedical Ontologies (NCBO) and the OBO Foundry.
Most biomedical ontologies are represented in the OBO Flatfile Format, which is an easy-to-use graph-based ontology language. The semantics of the OBO Flatfile Format 1.2 enforces a strict predetermined interpretation of relationship statements between classes. It does not allow flexible specifications that provide better approximations of the intuitive understanding of the considered relations. If relations cannot be accurately expressed then ontologies built upon them may contain false assertions and hence lead to false inferences. Ontologies in the OBO Foundry must formalize the semantics of relations according to the OBO Relationship Ontology (RO). Therefore, being able to accurately express the intended meaning of relations is of crucial importance. Since the Web Ontology Language (OWL) is an expressive language with a formal semantics, it is suitable to de ne the meaning of relations accurately.
We developed a method to provide definition patterns for relations between classes using OWL and describe a novel implementation of the RO based on this method. We implemented our extension in software that converts ontologies in the OBO Flatfile Format to OWL, and also provide a prototype to extract relational patterns from OWL ontologies using automated reasoning. The conversion software is freely available at http://bioonto.de/obo2owl, and can be accessed via a web interface.
Explicitly defining relations permits their use in reasoning software and leads to a more flexible and powerful way of representing biomedical ontologies. Using the extended langua0067e and semantics avoids several mistakes commonly made in formalizing biomedical ontologies, and can be used to automatically detect inconsistencies. The use of our method enables the use of graph-based ontologies in OWL, and makes complex OWL ontologies accessible in a graph-based form. Thereby, our method provides the means to gradually move the representation of biomedical ontologies into formal knowledge representation languages that incorporates an explicit semantics. Our method facilitates the use of OWL-based software in the back-end while ontology curators may continue to develop ontologies with an OBO-style front-end.
The Salivaomics Knowledge Base (SKB) is designed to serve as a computational infrastructure that can permit global exploration and utilization of data and information relevant to salivaomics. SKB is created by aligning (1) the saliva biomarker discovery and validation resources at UCLA with (2) the ontology resources developed by the OBO (Open Biomedical Ontologies) Foundry, including a new Saliva Ontology (SALO).
We define the Saliva Ontology (SALO; http://www.skb.ucla.edu/SALO/) as a consensus-based controlled vocabulary of terms and relations dedicated to the salivaomics domain and to saliva-related diagnostics following the principles of the OBO (Open Biomedical Ontologies) Foundry.
The Saliva Ontology is an ongoing exploratory initiative. The ontology will be used to facilitate salivaomics data retrieval and integration across multiple fields of research together with data analysis and data mining. The ontology will be tested through its ability to serve the annotation ('tagging') of a representative corpus of salivaomics research literature that is to be incorporated into the SKB.
Motivation: Ontologies and taxonomies have proven highly beneficial for biocuration. The Open Biomedical Ontology (OBO) Foundry alone lists over 90 ontologies mainly built with OBO-Edit. Creating and maintaining such ontologies is a labour-intensive, difficult, manual process. Automating parts of it is of great importance for the further development of ontologies and for biocuration.
Results: We have developed the Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG), a system which supports the creation and extension of OBO ontologies by semi-automatically generating terms, definitions and parent–child relations from text in PubMed, the web and PDF repositories. DOG4DAG is seamlessly integrated into OBO-Edit. It generates terms by identifying statistically significant noun phrases in text. For definitions and parent–child relations it employs pattern-based web searches. We systematically evaluate each generation step using manually validated benchmarks. The term generation leads to high-quality terms also found in manually created ontologies. Up to 78% of definitions are valid and up to 54% of child–ancestor relations can be retrieved. There is no other validated system that achieves comparable results.
By combining the prediction of high-quality terms, definitions and parent–child relations with the ontology editor OBO-Edit we contribute a thoroughly validated tool for all OBO ontology engineers.
Availability: DOG4DAG is available within OBO-Edit 2.1 at http://www.oboedit.org
Supplementary Information: Supplementary data are available at Bioinformatics online.
Biomedical ontologies exist to serve integration of clinical and experimental data, and it is critical to their success that they be put to widespread use in the annotation of data. How, then, can ontologies achieve the sort of user-friendliness, reliability, cost-effectiveness, and breadth of coverage that is necessary to ensure extensive usage?
Our focus here is on two different sets of answers to these questions that have been proposed, on the one hand in medicine, by the SNOMED CT community, and on the other hand in biology, by the OBO Foundry. We address more specifically the issue as to how adherence to certain development principles can advance the usability and effectiveness of an ontology or terminology resource, for example by allowing more accurate maintenance, more reliable application, and more efficient interoperation with other ontologies and information resources.
SNOMED CT and the OBO Foundry differ considerably in their general approach. Nevertheless, a general trend towards more formal rigor and cross-domain interoperability can be seen in both and we argue that this trend should be accepted by all similar initiatives in the future.
Future efforts in ontology development have to address the need for harmonization and integration of ontologies across disciplinary borders, and for this, coherent formalization of ontologies is a prerequisite.
Biomedical Ontologies; Ontology Harmonization; Quality Assurance; SNOMED CT
Life scientists need help in coping with the plethora of fast growing and scattered knowledge resources. Ideally, this knowledge should be integrated in a form that allows them to pose complex questions that address the properties of biological systems, independently from the origin of the knowledge. Semantic Web technologies prove to be well suited for knowledge integration, knowledge production (hypothesis formulation), knowledge querying and knowledge maintenance.
We implemented a semantically integrated resource named BioGateway, comprising the entire set of the OBO foundry candidate ontologies, the GO annotation files, the SWISS-PROT protein set, the NCBI taxonomy and several in-house ontologies. BioGateway provides a single entry point to query these resources through SPARQL. It constitutes a key component for a Semantic Systems Biology approach to generate new hypotheses concerning systems properties. In the course of developing BioGateway, we faced challenges that are common to other projects that involve large datasets in diverse representations. We present a detailed analysis of the obstacles that had to be overcome in creating BioGateway. We demonstrate the potential of a comprehensive application of Semantic Web technologies to global biomedical data.
The time is ripe for launching a community effort aimed at a wider acceptance and application of Semantic Web technologies in the life sciences. We call for the creation of a forum that strives to implement a truly semantic life science foundation for Semantic Systems Biology.
Access to the system and supplementary information (such as a listing of the data sources in RDF, and sample queries) can be found at .
Monitoring of insect vector populations with respect to their susceptibility to one or more insecticides is a crucial element of the strategies used for the control of arthropod-borne diseases. This management task can nowadays be achieved more efficiently when assisted by IT (Information Technology) tools, ranging from modern integrated databases to GIS (Geographic Information System). Here we describe an application ontology that we developed de novo, and a specially designed database that, based on this ontology, can be used for the purpose of controlling mosquitoes and, thus, the diseases that they transmit.
The ontology, named MIRO for Mosquito Insecticide Resistance Ontology, developed using the OBO-Edit software, describes all pertinent aspects of insecticide resistance, including specific methodology and mode of action. MIRO, then, forms the basis for the design and development of a dedicated database, IRbase, constructed using open source software, which can be used to retrieve data on mosquito populations in a temporally and spatially separate way, as well as to map the output using a Google Earth interface. The dependency of the database on the MIRO allows for a rational and efficient hierarchical search possibility.
The fact that the MIRO complies with the rules set forward by the OBO (Open Biomedical Ontologies) Foundry introduces cross-referencing with other biomedical ontologies and, thus, both MIRO and IRbase are suitable as parts of future comprehensive surveillance tools and decision support systems that will be used for the control of vector-borne diseases. MIRO is downloadable from and IRbase is accessible at VectorBase, the NIAID-sponsored open access database for arthropod vectors of disease.
It is a historical fact that a successful campaign against vector populations is one of the prerequisites for effectively fighting and eventually eradicating arthropod-borne diseases, be that in an epidemic or, even more so, in endemic cases. Based mostly on the use of insecticides and environmental management, vector control is now increasingly hampered by the occurrence of insecticide resistance that manifests itself, and spreads rapidly, briefly after the introduction of a (novel) chemical substance. We make use here of a specially built ontology, MIRO, to drive a new database, IRbase, dedicated to storing data on the occurrence of insecticide resistance in mosquito populations worldwide. The ontological approach to the design of databases offers the great advantage that these can be searched in an efficient way. Moreover, it also provides for an increased interoperability of present and future epidemiological tools. IRbase is now being populated by both older data from the literature and data recently collected from field.
Biological ontologies are now being widely used for annotation, sharing and retrieval of the biological data. Many of these ontologies are hosted under the umbrella of the Open Biological Ontologies Foundry. In order to support interterminology mapping, composite terms in these ontologies need to be translated into atomic or primitive terms in other, orthogonal ontologies, for example, gluconeogenesis (biological process term) to glucose (chemical ontology term). Identifying such decompositional ontology translations is a challenging problem. In this paper, we propose a network-theoretic approach based on the structure of the integrated OBO relationship graph. We use a network-theoretic measure, called the clustering coefficient, to find relevant atomic terms in the neighborhood of a composite term. By eliminating the existing GO to ChEBI Ontology mappings from OBO, we evaluate whether the proposed approach can re-identify the corresponding relationships. The results indicate that the network structure provides strong cues for decompositional ontology translation and the existing relationships can be used to identify new translations.
network theory; biomedical ontologies; ontology translation; open biomedical ontologies
The Open Biomedical Ontologies (OBO) Foundry is a coordinated community-wide effort to develop ontologies that support the annotation and integration of scientific data. In work supported by the National Database of Autism Research (NDAR), we are developing an ontology of autism that extends the ontologies available in the OBO Foundry. We undertook a systematic literature review to identify domain terms and relationships relevant to autism phenotypes. To enable user queries and inferences about such phenotypes using data in the NDAR repository, we augmented the domain ontology with an information model. In this paper, we show how our approach, using a combination of description logic and rule-based reasoning, enables high-level phenotypic abstractions to be inferred from subject-specific data. Our integrated domain ontology–information model approach allows scientific data repositories to be augmented with rule-based abstractions that facilitate the ability of researchers to undertake data analysis.