Related Articles

1.  Experience in Aligning Anatomical Ontologies 
An ontology is a formal representation of a domain, modeling the entities in the domain and their relations. When a domain is represented by multiple ontologies, there is a need to create mappings among these ontologies in order to facilitate the integration of data annotated with these ontologies and reasoning across ontologies. The objective of this paper is to recapitulate our experience in aligning large anatomical ontologies and to reflect on some of the issues and challenges encountered along the way. The four anatomical ontologies under investigation are the Foundational Model of Anatomy, GALEN, the Adult Mouse Anatomical Dictionary and the NCI Thesaurus. Their underlying representation formalisms are all different. Our approach to aligning concepts (directly) is automatic, rule-based, and operates at the schema level, generating mostly point-to-point mappings. It uses a combination of domain-specific lexical techniques and structural and semantic techniques (to validate the mappings suggested lexically). It also takes advantage of domain-specific knowledge (lexical knowledge from external resources such as the Unified Medical Language System, as well as knowledge augmentation and inference techniques). In addition to point-to-point mapping of concepts, we present the alignment of relationships and the mapping of concepts group-to-group. We have also successfully tested an indirect alignment through a domain-specific reference ontology. We present an evaluation of our techniques, both against a gold standard established manually and against a generic schema matching system. The advantages and limitations of our approach are analyzed and discussed throughout the paper.
PMCID: PMC2575410  PMID: 18974854
Ontology; ontology alignment; knowledge representation; anatomy; Semantic Web
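The lexical step of a rule-based alignment like the one described above can be illustrated with a small sketch. It assumes each ontology is a plain dictionary from concept ID to a list of names and synonyms. Labels are normalized by lower-casing, dropping punctuation and sorting tokens, and any pair whose normalized names coincide becomes a candidate mapping. This is not the authors' implementation. The structural and semantic validation they describe, and UMLS synonym lookup, are not shown. All IDs and labels are hypothetical.

```python
import re

def normalize(label: str) -> str:
    """Lower-case a label, drop punctuation, and sort its tokens so word order is ignored."""
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    return " ".join(sorted(tokens))

def lexical_matches(source: dict[str, list[str]],
                    target: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Pair concepts whose names or synonyms normalize to the same string."""
    index: dict[str, set[str]] = {}
    for target_id, names in target.items():
        for name in names:
            index.setdefault(normalize(name), set()).add(target_id)
    matches = set()
    for source_id, names in source.items():
        for name in names:
            for target_id in index.get(normalize(name), set()):
                matches.add((source_id, target_id))
    return matches

# Hypothetical concept IDs and labels, for illustration only.
fma_like = {"S:1": ["Left lung", "Lung, left"], "S:2": ["Heart"]}
ma_like = {"T:1": ["left lung"], "T:2": ["cardiac muscle"]}
print(lexical_matches(fma_like, ma_like))  # {('S:1', 'T:1')}
```

Sorting tokens makes "Lung, left" match "left lung". It can also conflate genuinely different terms, which is one reason lexical candidates need a validation step.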
2.  An ontology of scientific experiments 
The formal description of experiments for efficient analysis, annotation and sharing of results is a fundamental part of the practice of science. Ontologies are required to achieve this objective. A few subject-specific ontologies of experiments currently exist. However, despite the unity of scientific experimentation, no general ontology of experiments exists. We propose the ontology EXPO to meet this need. EXPO links the SUMO (the Suggested Upper Merged Ontology) with subject-specific ontologies of experiments by formalizing the generic concepts of experimental design, methodology and results representation. EXPO is expressed in the W3C standard ontology language OWL-DL. We demonstrate the utility of EXPO and its ability to describe different experimental domains, by applying it to two experiments: one in high-energy physics and the other in phylogenetics. The use of EXPO made the goals and structure of these experiments more explicit, revealed ambiguities, and highlighted an unexpected similarity. We conclude that EXPO is of general value in describing experiments and a step towards the formalization of science.
doi:10.1098/rsif.2006.0134
PMCID: PMC1885356  PMID: 17015305
ontology; formalization; annotation; artificial intelligence; metadata
3.  How to link ontologies and protein–protein interactions to literature: text-mining approaches and the BioCreative experience 
There is an increasing interest in developing ontologies and controlled vocabularies to improve the efficiency and consistency of manual literature curation, to enable more formal biocuration workflow results and ultimately to improve analysis of biological data. Two ontologies that have been successfully used for this purpose are the Gene Ontology (GO) for annotating aspects of gene products and the Molecular Interaction ontology (PSI-MI) used by databases that archive protein–protein interactions. The examination of protein interactions has proven to be extremely promising for the understanding of cellular processes. Manual mapping of information from the biomedical literature to bio-ontology terms is one of the most challenging components in the curation pipeline. It requires that expert curators interpret the natural language descriptions contained in articles and infer their semantic equivalents in the ontology (controlled vocabulary). Since manual curation is a time-consuming process, there is strong motivation to implement text-mining techniques to automatically extract annotations from free text. A range of text mining strategies has been devised to assist in the automated extraction of biological data. These strategies either recognize technical terms used recurrently in the literature and propose them as candidates for inclusion in ontologies, or retrieve passages that serve as evidential support for annotating an ontology term, e.g. from the PSI-MI or GO controlled vocabularies. Here, we provide a general overview of current text-mining methods to automatically extract annotations of GO and PSI-MI ontology terms in the context of the BioCreative (Critical Assessment of Information Extraction Systems in Biology) challenge. Special emphasis is given to protein–protein interaction data and PSI-MI terms referring to interaction detection methods.
doi:10.1093/database/bas017
PMCID: PMC3309177  PMID: 22438567
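One simple family of strategies mentioned above recognizes ontology term names directly in text. The sketch below assumes a lexicon that maps term names to term IDs, matches names case-insensitively on word boundaries, and prefers the longest name when matches overlap. The lexicon entries and the `METHOD:` IDs are hypothetical placeholders, not real PSI-MI identifiers. Systems evaluated in BioCreative are typically far more sophisticated than this dictionary lookup.

```python
import re

def find_terms(text: str, lexicon: dict[str, str]) -> list[tuple[str, str]]:
    """Return (term_id, matched_text) pairs in text order, longest names winning overlaps."""
    taken = [False] * len(text)
    hits = []
    for name in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if not any(taken[m.start():m.end()]):
                hits.append((m.start(), lexicon[name], m.group(0)))
                taken[m.start():m.end()] = [True] * (m.end() - m.start())
    return [(term_id, surface) for _, term_id, surface in sorted(hits)]

# Hypothetical lexicon of interaction detection method names.
lexicon = {
    "two hybrid": "METHOD:A",
    "yeast two hybrid": "METHOD:B",
    "coimmunoprecipitation": "METHOD:C",
}
sentence = "Binding was confirmed by yeast two hybrid and coimmunoprecipitation."
print(find_terms(sentence, lexicon))
```

Here "two hybrid" is suppressed because it lies inside the longer match "yeast two hybrid".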
4.  OnEX: Exploring changes in life science ontologies 
BMC Bioinformatics  2009;10:250.
Background
Numerous ontologies have recently been developed in life sciences to support a consistent annotation of biological objects, such as genes or proteins. These ontologies are subject to continuous change, which can impact existing annotations. Therefore, it is valuable for users of ontologies to study the stability of ontologies and to see how many and what kinds of ontology changes have occurred.
Results
We present OnEX (Ontology Evolution EXplorer), a system for exploring ontology changes. Currently, OnEX provides access to about 560 versions of 16 well-known life science ontologies. The system is based on a three-tier architecture including an ontology version repository, a middleware component and the OnEX web application. Interactive workflows allow a systematic and explorative change analysis of ontologies and their concepts as well as the semi-automatic migration of outdated annotations to the current version of an ontology.
Conclusion
OnEX provides a user-friendly web interface to explore information about changes in current life science ontologies. It is available at .
doi:10.1186/1471-2105-10-250
PMCID: PMC2746816  PMID: 19678926
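The kind of change analysis and semi-automatic annotation migration described for OnEX can be sketched in a few lines. The sketch assumes each ontology version is a dictionary from concept ID to label, and that a hypothetical `replaced_by` table records successors of deleted concepts. Annotations pointing at a deleted concept are remapped when a successor is known and otherwise flagged for manual review. This is not OnEX's actual algorithm or data model, and the IDs are placeholders.

```python
from dataclasses import dataclass

@dataclass
class VersionDiff:
    added: set[str]
    deleted: set[str]
    relabeled: set[str]

def diff_versions(old: dict[str, str], new: dict[str, str]) -> VersionDiff:
    """Compare two versions given as {concept_id: label} dictionaries."""
    old_ids, new_ids = set(old), set(new)
    return VersionDiff(
        added=new_ids - old_ids,
        deleted=old_ids - new_ids,
        relabeled={cid for cid in old_ids & new_ids if old[cid] != new[cid]},
    )

def migrate_annotations(annotations: list[tuple[str, str]], diff: VersionDiff,
                        replaced_by: dict[str, str]):
    """Split (object, concept) annotations into migrated ones and ones needing review."""
    migrated, review = [], []
    for obj, cid in annotations:
        if cid not in diff.deleted:
            migrated.append((obj, cid))                # concept still exists
        elif cid in replaced_by:
            migrated.append((obj, replaced_by[cid]))   # known successor
        else:
            review.append((obj, cid))                  # needs a curator
    return migrated, review

old = {"T:1": "a", "T:2": "b", "T:3": "c", "T:5": "e"}
new = {"T:1": "a", "T:2": "b (renamed)", "T:4": "d"}
diff = diff_versions(old, new)
migrated, review = migrate_annotations(
    [("P1", "T:1"), ("P2", "T:3"), ("P3", "T:5")], diff, replaced_by={"T:3": "T:4"})
```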
5.  Applying the functional abnormality ontology pattern to anatomical functions 
Background
Several biomedical ontologies cover the domain of biological functions, including molecular and cellular functions. However, there is currently no publicly available ontology of anatomical functions.
Consequently, no explicit relation between anatomical structures and their functions is expressed in the anatomy ontologies that are available for various species. Such an explicit relation would be useful for accurately defining the classes of both the anatomy and the phenotype ontologies.
Results
We provide an ontological analysis of functions and functional abnormalities. From this analysis, we derive an approach to the automatic extraction of anatomical functions from existing ontologies which uses a combination of natural language processing, graph-based analysis of the ontologies and formal inferences. Additionally, we introduce a new relation to link material objects to processes that realize the function of these objects. This relation is introduced to avoid a needless duplication of processes already covered by the Gene Ontology in a new ontology of anatomical functions.
Conclusions
Ontological considerations on the nature of functional abnormalities and their representation in current phenotype ontologies show that we can automatically extract a skeleton for an ontology of anatomical functions by combining process, phenotype and anatomy ontologies. We identify several limitations of the current ontologies that still need to be addressed to ensure a consistent and complete representation of anatomical functions and their abnormalities.
Availability
The source code and results of our analysis are available at http://bioonto.de.
doi:10.1186/2041-1480-1-4
PMCID: PMC2895731  PMID: 20618982
6.  Development and Evaluation of an Ontology for Guiding Appropriate Antibiotic Prescribing 
Journal of Biomedical Informatics  2011;45(1):120-128.
Objectives
To develop and apply formal ontology creation methods to the domain of antimicrobial prescribing and to formally evaluate the resulting ontology through intrinsic and extrinsic evaluation studies.
Methods
We extended existing ontology development methods to create the ontology and implemented the ontology using Protégé-OWL. Correctness of the ontology was assessed using a set of ontology design principles and domain expert review via the laddering technique. We created three artifacts to support the extrinsic evaluation (a set of prescribing rules, alerts and an ontology-driven alert module, and a patient database) and evaluated the usefulness of the ontology for performing knowledge management tasks to maintain the ontology and for generating alerts to guide antibiotic prescribing.
Results
The ontology includes 199 classes, 10 properties, and 1,636 description logic restrictions. Twenty-three Semantic Web Rule Language rules were written to generate three prescribing alerts: 1) antibiotic-microorganism mismatch alert; 2) medication-allergy alert; and 3) non-recommended empiric antibiotic therapy alert. The evaluation studies confirmed the correctness of the ontology, usefulness of the ontology for representing and maintaining antimicrobial treatment knowledge rules, and usefulness of the ontology for generating alerts to provide feedback to clinicians during antibiotic prescribing.
Conclusions
This study contributes to the understanding of ontology development and evaluation methods and addresses one knowledge gap related to using ontologies as a clinical decision support system component—a need for formal ontology evaluation methods to measure their quality from the perspective of their intrinsic characteristics and their usefulness for specific tasks.
doi:10.1016/j.jbi.2011.10.001
PMCID: PMC3272092  PMID: 22019377
Ontology; Clinical decision support; Evaluation
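The study generates alerts with SWRL rules over an OWL ontology. The plain-Python sketch below illustrates only the logic of two of the named alerts, the antibiotic-microorganism mismatch and the medication-allergy check. It is not SWRL and not the authors' rules. The knowledge base is a hypothetical toy with placeholder names and contains no clinical data.

```python
from dataclasses import dataclass, field

# Hypothetical toy knowledge base; placeholder names, not clinical data.
COVERAGE = {"antibiotic_A": {"organism_X"}, "antibiotic_B": {"organism_X", "organism_Y"}}
DRUG_CLASS = {"antibiotic_A": "class_1", "antibiotic_B": "class_2"}

@dataclass
class Patient:
    isolated_organisms: set[str]
    allergy_classes: set[str] = field(default_factory=set)

def prescribing_alerts(patient: Patient, antibiotic: str) -> list[str]:
    """Evaluate two rule-style checks analogous to the alerts named in the abstract."""
    alerts = []
    uncovered = patient.isolated_organisms - COVERAGE.get(antibiotic, set())
    if uncovered:  # antibiotic-microorganism mismatch
        alerts.append(f"mismatch: {antibiotic} does not cover {', '.join(sorted(uncovered))}")
    drug_class = DRUG_CLASS.get(antibiotic)
    if drug_class in patient.allergy_classes:  # medication-allergy
        alerts.append(f"allergy: {antibiotic} belongs to {drug_class}")
    return alerts

patient = Patient({"organism_X", "organism_Y"}, {"class_1"})
print(prescribing_alerts(patient, "antibiotic_A"))
```

In an ontology-driven design such as the one described, the coverage and class facts would live in the ontology rather than in Python dictionaries.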
7.  Physical Properties of Biological Entities: An Introduction to the Ontology of Physics for Biology 
PLoS ONE  2011;6(12):e28708.
As biomedical investigators strive to integrate data and analyses across spatiotemporal scales and biomedical domains, they have recognized the benefits of formalizing languages and terminologies via computational ontologies. Although ontologies for biological entities—molecules, cells, organs—are well-established, there are no principled ontologies of physical properties—energies, volumes, flow rates—of those entities. In this paper, we introduce the Ontology of Physics for Biology (OPB), a reference ontology of classical physics designed for annotating biophysical content of growing repositories of biomedical datasets and analytical models. The OPB's semantic framework, traceable to James Clerk Maxwell, encompasses modern theories of system dynamics and thermodynamics, and is implemented as a computational ontology that references available upper ontologies. In this paper we focus on the OPB classes that are designed for annotating physical properties encoded in biomedical datasets and computational models, and we discuss how the OPB framework will facilitate biomedical knowledge integration.
doi:10.1371/journal.pone.0028708
PMCID: PMC3246444  PMID: 22216106
8.  The EXACT description of biomedical protocols 
Bioinformatics  2008;24(13):i295-i303.
Motivation: Many published manuscripts contain experiment protocols which are poorly described or deficient in information. This means that the published results are very hard or impossible to repeat. This problem is being made worse by the increasing complexity of high-throughput/automated methods. There is therefore a growing need to represent experiment protocols in an efficient and unambiguous way.
Results: We have developed the Experiment ACTions (EXACT) ontology as the basis of a method of representing biological laboratory protocols. We provide example protocols that have been formalized using EXACT, and demonstrate the advantages and opportunities created by using this formalization. We argue that the use of EXACT will result in the publication of protocols with increased clarity and usefulness to the scientific community.
Availability: The ontology, examples and code can be downloaded from http://www.aber.ac.uk/compsci/Research/bio/dss/EXACT/
Contact: Larisa Soldatova lss@aber.ac.uk
doi:10.1093/bioinformatics/btn156
PMCID: PMC2718634  PMID: 18586727
9.  Quantitative comparison of mapping methods between Human and Mammalian Phenotype Ontology 
Journal of Biomedical Semantics  2012;3(Suppl 2):S1.
Researchers use animal studies to better understand human diseases. In recent years, large-scale phenotype studies such as Phenoscape and EuroPhenome have been initiated to identify genetic causes of a species' phenome. Species-specific phenotype ontologies are required to capture and report all findings and to automatically infer results relevant to human diseases. The integration of the different phenotype ontologies into a coherent framework is necessary to achieve interoperability for cross-species research.
Here, we investigate the quality and completeness of two different methods to align the Human Phenotype Ontology and the Mammalian Phenotype Ontology. The first method combines lexical matching with inference over the ontologies' taxonomic structures, while the second method uses a mapping algorithm based on the formal definitions of the ontologies. Neither method could map all concepts. Although the formal definitions method provides mappings for more concepts than the lexical matching method does, it does not outperform lexical matching in a biological use case. Our results suggest that combining both approaches will yield better mappings in terms of completeness, specificity and application purposes.
doi:10.1186/2041-1480-3-S2-S1
PMCID: PMC3448526  PMID: 23046555
10.  RDFScape: Semantic Web meets Systems Biology 
BMC Bioinformatics  2008;9(Suppl 4):S6.
Background
The recent availability of high-throughput data in molecular biology has increased the need for a formal representation of this knowledge domain. New ontologies are being developed to formalize knowledge, e.g. about the functions of proteins. As the Semantic Web is being introduced into the Life Sciences, the foundation is being laid for a distributed knowledge base that can foster biological data analysis. However, there is still a dichotomy, in tools and methodologies, between the use of ontologies in biological investigation, that is, in relation to experimental observations, and their use as a knowledge base.
Results
RDFScape is a plugin that has been developed to extend software for biological analysis with support for reasoning on ontologies in the Semantic Web framework. With this plugin we show how the use of ontological knowledge in biological analysis can be extended through inference. In particular, we present two examples involving ontologies that represent biological pathways: we demonstrate how these can be abstracted and visualized as interaction networks, and how reasoning on causal dependencies between elements of pathways can be implemented.
Conclusions
The use of ontologies for the interpretation of high-throughput biological data can be improved through the use of inference. This allows the use of ontologies not only as annotations, but as a knowledge-base from which new information relevant for specific analysis can be derived.
doi:10.1186/1471-2105-9-S4-S6
PMCID: PMC2367633  PMID: 18460179
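Reasoning on causal dependencies within pathways often amounts to following a relation transitively. The sketch below assumes pathway knowledge is available as (subject, predicate, object) triples and computes, for each element, everything reachable through a chosen predicate with breadth-first search. RDFScape itself works through Semantic Web reasoning over RDF/OWL. This plain-Python traversal is only an analogue, and the predicate name `activates` is a hypothetical placeholder.

```python
from collections import defaultdict, deque

def causal_closure(triples, predicate="activates"):
    """Map each subject to every node reachable via `predicate` (transitive closure)."""
    graph = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            graph[subject].add(obj)
    closure = {}
    for start in list(graph):
        seen = set()
        queue = deque(graph[start])
        while queue:
            node = queue.popleft()
            if node not in seen:
                seen.add(node)
                queue.extend(graph.get(node, ()))  # .get avoids creating new keys
        closure[start] = seen
    return closure

triples = [("A", "activates", "B"), ("B", "activates", "C"), ("C", "inhibits", "D")]
print(causal_closure(triples))
```

The `seen` set keeps the traversal finite even when the pathway contains feedback loops.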
11.  Animal trait ontology: The importance and usefulness of a unified trait vocabulary for animal species 
Journal of animal science  2008;86(6):1485-1491.
Ontologies help to identify and formally define the entities and relationships in specific domains of interest. Bio-ontologies, in particular, play a central role in the annotation, integration, analysis, and interpretation of biological data. Missing from the existing bio-ontologies is one that includes phenotypic trait information found in livestock species. As a result, the Animal Trait Ontology (ATO) project, carried out under the auspices of the USDA-National Animal Genome Research Program, is aimed at the development of a standardized trait ontology for farm animals and software tools to assist the research community in collaborative creation, editing, maintenance, and use of such an ontology. The ATO currently covers cattle, pig, and chicken, and will include other livestock species in the future. The ATO will eventually be linked to other species (e.g., human, rat, mouse) so that comparative analysis can be efficiently performed between species.
doi:10.2527/jas.2008-0930
PMCID: PMC2569847  PMID: 18272850
ontology; trait; phenotype; animal; cattle; chicken
12.  Top-Level Categories of Constitutively Organized Material Entities - Suggestions for a Formal Top-Level Ontology 
PLoS ONE  2011;6(4):e18794.
Background
Application oriented ontologies are important for reliably communicating and managing data in databases. Unfortunately, they often differ in the definitions they use and thus do not live up to their potential. This problem can be reduced when using a standardized and ontologically consistent template for the top-level categories from a top-level formal foundational ontology. This would support ontological consistency within application oriented ontologies and compatibility between them. The Basic Formal Ontology (BFO) is such a foundational ontology for the biomedical domain that has been developed following the single inheritance policy. It provides the top-level template within the Open Biological and Biomedical Ontologies Foundry. If it wants to live up to its expected role, its three top-level categories of material entity (i.e., ‘object’, ‘fiat object part’, ‘object aggregate’) must be exhaustive, i.e. every concrete material entity must instantiate exactly one of them.
Methodology/Principal Findings
By systematically evaluating all possible basic configurations of material building blocks we show that BFO's top-level categories of material entity are not exhaustive. We provide examples from biology and everyday life that demonstrate the necessity for two additional categories: ‘fiat object part aggregate’ and ‘object with fiat object part aggregate’. By distinguishing topological coherence, topological adherence, and metric proximity we furthermore provide a differentiation of clusters and groups as two distinct subcategories for each of the three categories of material entity aggregates, resulting in six additional subcategories of material entity.
Conclusions/Significance
We suggest extending BFO to incorporate two additional categories of material entity as well as two subcategories for each of the three categories of material entity aggregates. With these additions, BFO would exhaustively cover all top-level types of material entity that application oriented ontologies may use as templates. Our result, however, depends on the premise that all material entities are organized according to a constitutive granularity.
doi:10.1371/journal.pone.0018794
PMCID: PMC3080885  PMID: 21533043
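The requirement discussed above, that every material entity instantiates exactly one of BFO's three top-level material-entity categories, can be stated as a simple check. The sketch assumes a hypothetical mapping from entity names to the categories they were assigned. It reports entities that fall outside all categories, which would show the categories are not exhaustive, and entities assigned more than one, which would violate disjointness. Entity names are placeholders.

```python
BFO_MATERIAL_CATEGORIES = {"object", "fiat object part", "object aggregate"}

def partition_violations(assignments: dict[str, set[str]],
                         categories: set[str] = BFO_MATERIAL_CATEGORIES) -> dict[str, list[str]]:
    """Report entities violating 'exactly one category' (exhaustive and disjoint)."""
    uncovered = sorted(e for e, cats in assignments.items() if not (cats & categories))
    overlapping = sorted(e for e, cats in assignments.items() if len(cats & categories) > 1)
    return {"uncovered": uncovered, "overlapping": overlapping}

assignments = {
    "entity_1": {"object"},
    "entity_2": set(),                          # fits none of the three categories
    "entity_3": {"object", "object aggregate"},
}
print(partition_violations(assignments))
```

In the paper's terms, a non-empty `uncovered` list for real entities is the situation that motivates adding categories such as 'fiat object part aggregate'.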
13.  Prototype semantic infrastructure for automated small molecule classification and annotation in lipidomics 
BMC Bioinformatics  2011;12:303.
Background
The development of high-throughput experimentation has led to astronomical growth in the number of biologically relevant lipids and lipid derivatives identified, screened, and deposited in numerous online databases. Unfortunately, efforts to annotate, classify, and analyze these chemical entities have largely remained in the hands of human curators using manual or semi-automated protocols, leaving many novel entities unclassified. Since chemical function is often closely linked to structure, accurate structure-based classification and annotation of chemical entities is imperative to understanding their functionality.
Results
As part of an exploratory study, we have investigated the utility of semantic web technologies in automated chemical classification and annotation of lipids. Our prototype framework consists of two components: an ontology and a set of federated web services that operate upon it. The formal lipid ontology we use here extends a part of the LiPrO ontology and draws on the lipid hierarchy in the LIPID MAPS database, as well as literature-derived knowledge. The federated semantic web services that operate upon this ontology are deployed within the Semantic Annotation, Discovery, and Integration (SADI) framework. Structure-based lipid classification is enacted by two core services. Firstly, a structural annotation service detects and enumerates relevant functional groups for a specified chemical structure. A second service reasons over lipid ontology class descriptions using the attributes obtained from the annotation service and identifies the appropriate lipid classification. We extend the utility of these core services by combining them with additional SADI services that retrieve associations between lipids and proteins and identify publications related to specified lipid types. We analyze the performance of SADI-enabled eicosanoid classification relative to the LIPID MAPS classification and reflect on the contribution of our integrative methodology in the context of high-throughput lipidomics.
Conclusions
Our prototype framework is capable of accurate automated classification of lipids and facile integration of lipid class information with additional data obtained with SADI web services. The potential of programming-free integration of external web services through the SADI framework offers an opportunity for development of powerful novel applications in lipidomics. We conclude that semantic web technologies can provide an accurate and versatile means of classification and annotation of lipids.
doi:10.1186/1471-2105-12-303
PMCID: PMC3163564  PMID: 21791100
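The two-service pipeline described above first detects structural features and then reasons over class descriptions. It can be approximated by treating each class description as a set of required features and returning the most specific classes whose requirements are all met. The sketch assumes the feature detection has already happened. The feature strings and class requirements are hypothetical simplifications, not the LiPrO ontology or SADI services, and plain set inclusion stands in for OWL reasoning.

```python
# Hypothetical, simplified class descriptions: required structural features per class.
CLASS_REQUIREMENTS = {
    "fatty acyl": {"carboxylic acid", "long carbon chain"},
    "eicosanoid": {"carboxylic acid", "long carbon chain", "20 carbons"},
}

def classify(features: set[str]) -> list[str]:
    """Return the most specific classes whose required features are all present."""
    matching = {c for c, required in CLASS_REQUIREMENTS.items() if required <= features}
    most_specific = [
        c for c in matching
        if not any(CLASS_REQUIREMENTS[c] < CLASS_REQUIREMENTS[other] for other in matching)
    ]
    return sorted(most_specific)

print(classify({"carboxylic acid", "long carbon chain", "20 carbons", "hydroxyl"}))
```

Dropping a class whose requirements are a strict subset of another matching class's mimics how a reasoner reports the most specific inferred superclass.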
14.  A reference terminology for drugs. 
GALEN technology for re-usable terminologies using formal classification is being applied to the creation and maintenance of a reference terminology for drugs. GALEN's techniques are being used to address specific deficiencies of existing drug classifications that make it difficult to create and maintain guidelines to support prescribing in the care of patients with chronic diseases. The reference terminology is in two parts: firstly, a re-usable and automatically-classified 'ontology' is built with GALEN technology; this describes generic drugs, their composition in terms of chemicals and chemical classes, their actions, indications and interactions. Secondly, a 'dictionary' of prescribable proprietary products is integrated with this ontology. The result is a drug resource designed to support both the traditional uses of a drug knowledge base (e.g. prescribing and messaging), and the specialized demands of guideline authoring and execution.
PMCID: PMC2232568  PMID: 10566339
15.  TGF-beta signaling proteins and the Protein Ontology 
BMC Bioinformatics  2009;10(Suppl 5):S3.
Background
The Protein Ontology (PRO) is designed as a formal and principled Open Biomedical Ontologies (OBO) Foundry ontology for proteins. The components of PRO extend from a classification of proteins on the basis of evolutionary relationships at the homeomorphic level to the representation of the multiple protein forms of a gene, including those resulting from alternative splicing, cleavage and/or post-translational modifications. Focusing specifically on the TGF-beta signaling proteins, we describe the building, curation, usage and dissemination of PRO.
Results
PRO is manually curated on the basis of PrePRO, an automatically generated file with content derived from standard protein data sources. Manual curation ensures that the treatment of the protein classes and the internal and external relationships conform to the PRO framework. The current release of PRO is based upon experimental data from mouse and human proteins wherein equivalent protein forms are represented by single terms. In addition to the PRO ontology, the annotation of PRO terms is released as a separate PRO association file, which contains, for each given PRO term, an annotation from the experimentally characterized sub-types as well as the corresponding database identifiers and sequence coordinates. The annotations are added in the form of relationships to other ontologies. Whenever possible, equivalent forms in other species are listed to facilitate cross-species comparison. Splice and allelic variants, gene fusion products and modified protein forms are all represented as entities in the ontology. Therefore, PRO provides for the representation of protein entities and a resource for describing the associated data. This makes PRO useful both for proteomics studies where isoforms and modified forms must be differentiated, and for studies of biological pathways, where representations need to take account of the different ways in which the cascade of events may depend on specific protein modifications.
Conclusion
PRO provides a framework for the formal representation of protein classes and protein forms in the OBO Foundry. It is designed to enable data retrieval and integration and machine reasoning at the molecular level of proteins, thereby facilitating cross-species comparisons, pathway analysis, disease modeling and the generation of new hypotheses.
doi:10.1186/1471-2105-10-S5-S3
PMCID: PMC2679403  PMID: 19426460
16.  Benchmarking Ontologies: Bigger or Better? 
PLoS Computational Biology  2011;7(1):e1001055.
A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.
Author Summary
An ontology represents the concepts and their interrelation within a knowledge domain. Several ontologies have been developed in biomedicine, which provide standardized vocabularies to describe diseases, genes and gene products, physiological phenotypes, anatomical structures, and many other phenomena. Scientists use them to encode the results of complex experiments and observations and to perform integrative analysis to discover new knowledge. A remaining challenge in ontology development is how to evaluate an ontology's representation of knowledge within its scientific domain. Building on classic measures from information retrieval, we introduce a family of metrics including breadth and depth that capture the conceptual coverage and parsimony of an ontology. We test these measures using (1) four commonly used medical ontologies in relation to a corpus of medical documents and (2) seven popular English thesauri (ontologies of synonyms) with respect to text from medicine, news, and novels. Results demonstrate that both medical ontologies and English thesauri have a small overlap in concepts and relations. Our methods suggest efforts to tighten the fit between ontologies and biomedical knowledge.
doi:10.1371/journal.pcbi.1001055
PMCID: PMC3020923  PMID: 21249231
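The author summary says the breadth and depth metrics build on classic information-retrieval measures. The sketch below shows the most basic such measures, precision-like and recall-like overlap between an ontology's term set and the terms found in a corpus. It is explicitly not the paper's breadth or depth definition, which the abstract does not give. The term sets are hypothetical.

```python
def term_overlap(ontology_terms: set[str], corpus_terms: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of ontology terms against terms observed in a corpus.

    precision: share of ontology terms that actually occur in the corpus
    recall:    share of corpus terms that the ontology covers
    """
    shared = ontology_terms & corpus_terms
    precision = len(shared) / len(ontology_terms) if ontology_terms else 0.0
    recall = len(shared) / len(corpus_terms) if corpus_terms else 0.0
    return precision, recall

precision, recall = term_overlap({"a", "b", "c", "d"}, {"a", "b", "e"})
print(precision, recall)
```

A low recall would indicate that much of a domain's discourse is not represented in the ontology, the kind of gap between ontology and discourse that the paper aims to measure.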
17.  Modeling sample variables with an Experimental Factor Ontology 
Bioinformatics  2010;26(8):1112-1118.
Motivation: Describing biological sample variables with ontologies is complex due to the cross-domain nature of experiments. Ontologies provide annotation solutions; however, for cross-domain investigations, multiple ontologies are needed to represent the data. These are subject to rapid change, are often not interoperable and present complexities that are a barrier to biological resource users.
Results: We present the Experimental Factor Ontology, designed to meet cross-domain, application focused use cases for gene expression data. We describe our methodology and open source tools used to create the ontology. These include tools for creating ontology mappings, ontology views, detecting ontology changes and using ontologies in interfaces to enhance querying. The application of reference ontologies to data is a key problem, and this work presents guidelines on how community ontologies can be presented in an application ontology in a data-driven way.
Availability: http://www.ebi.ac.uk/efo
Contact: malone@ebi.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq099
PMCID: PMC2853691  PMID: 20200009
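An application ontology that presents community ontologies in a data-driven way needs, at minimum, the terms actually used in the data plus enough hierarchy to keep them connected. The sketch below builds such a view by collecting the selected terms and all of their ancestors. It assumes the ontology is given as a hypothetical child-to-parents dictionary. It is not EFO's tooling, and the terms are placeholders.

```python
def ontology_view(parents: dict[str, set[str]], selected: set[str]) -> set[str]:
    """Return the selected terms together with all of their ancestors."""
    view: set[str] = set()
    stack = list(selected)
    while stack:
        term = stack.pop()
        if term not in view:
            view.add(term)
            stack.extend(parents.get(term, ()))
    return view

parents = {
    "liver": {"organ"},
    "heart": {"organ"},
    "organ": {"anatomical entity"},
}
print(sorted(ontology_view(parents, {"liver"})))
```

Unused siblings such as "heart" are left out, which keeps the view small while preserving the path from each selected term up to the root.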
18.  Towards improving phenotype representation in OWL 
Journal of Biomedical Semantics  2012;3(Suppl 2):S5.
Background
Phenotype ontologies are used in species-specific databases for the annotation of mutagenesis experiments and to characterize human diseases. The Entity-Quality (EQ) formalism is a means to describe complex phenotypes based on one or more affected entities and a quality. EQ-based definitions have been developed for many phenotype ontologies, including the Human and Mammalian Phenotype ontologies.
Methods
We analyze formalizations of complex phenotype descriptions in the Web Ontology Language (OWL) that are based on the EQ model, identify several representational challenges and analyze potential solutions to address these challenges.
Results
In particular, we suggest a novel, role-based approach to represent relational qualities such as concentration of iron in spleen, discuss its ontological foundation in the General Formal Ontology (GFO) and evaluate its representation in OWL and the benefits it can bring to the representation of phenotype annotations.
Conclusion
Our analysis of OWL-based representations of phenotypes can contribute to improving consistency and expressiveness of formal phenotype descriptions.
doi:10.1186/2041-1480-3-S2-S5
PMCID: PMC3448528  PMID: 23046625
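The baseline EQ pattern that such analyses start from can be rendered mechanically as an OWL class expression. The sketch assumes one commonly used shape, `has_part some (Q and (inheres_in some E))`, in Manchester-like syntax. For relational qualities, an optional second entity is attached with `towards`. Relation and class names are placeholders rather than resolved IRIs. The sketch shows the plain EQ pattern, not the role-based representation proposed in this paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EQ:
    entity: str
    quality: str
    related_entity: Optional[str] = None  # second entity for relational qualities

    def to_manchester(self) -> str:
        """Render the phenotype as a Manchester-syntax class expression string."""
        inner = f"{self.quality} and (inheres_in some {self.entity})"
        if self.related_entity:
            inner += f" and (towards some {self.related_entity})"
        return f"has_part some ({inner})"

print(EQ("spleen", "increased_size").to_manchester())
print(EQ("spleen", "increased_concentration", "iron").to_manchester())
```

Placing the second entity under `towards` is one modeling choice among several. The representational problems this creates for relational qualities are what the paper's analysis addresses.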
19.  Obol: Integrating Language and Meaning in Bio-Ontologies 
Comparative and Functional Genomics  2004;5(6-7):509-520.
Ontologies are intended to capture and formalize a domain of knowledge. The ontologies comprising the Open Biological Ontologies (OBO) project, which includes the Gene Ontology (GO), are formalizations of various domains of biological knowledge. Ontologies within OBO typically lack computable definitions that serve to differentiate a term from other similar terms. The computer is unable to determine the meaning of a term, which presents problems for tools such as automated reasoners. Reasoners can be of enormous benefit in managing a complex ontology. OBO term names frequently implicitly encode the kind of definitions that can be used by computational tools, such as automated reasoners. The definitions encoded in the names are not easily amenable to computation, because the names are ostensibly natural language phrases designed for human users. These names are highly regular in their grammar, and can thus be treated as valid sentences in some formal or computable language. With a description of the rules underlying this formal language, term names can be parsed to derive computable definitions, which can then be reasoned over. This paper describes the effort to elucidate that language, called Obol, and the attempts to reason over the resulting definitions. The current implementation finds unique non-trivial definitions for around half of the terms in the GO, and has been used to find 223 missing relationships, which have since been added to the ontology. Obol has utility as an ontology maintenance tool, and as a means of generating computable definitions for a whole ontology.
The software is available under an open-source license from: http://www.fruitfly.org/~cjm/obol. Supplementary material for this article can be found at: http://www.interscience.wiley.com/jpages/1531-6912/suppmat.
doi:10.1002/cfg.435
PMCID: PMC2447432  PMID: 18629143
20.  Accommodating Ontologies to Biological Reality—Top-Level Categories of Cumulative-Constitutively Organized Material Entities 
PLoS ONE  2012;7(1):e30004.
Background
The Basic Formal Ontology (BFO) is a top-level formal foundational ontology for the biomedical domain. It was developed to serve as an ontologically consistent template for top-level categories of application-oriented and domain reference ontologies within the Open Biological and Biomedical Ontologies Foundry (OBO). BFO is important for enabling OBO ontologies to reliably communicate and manage data and metadata within and across biomedical databases. Following its intended single inheritance policy, BFO's three top-level categories of material entity (i.e. ‘object’, ‘fiat object part’, ‘object aggregate’) must be exhaustive and mutually disjoint. We have shown elsewhere that for accommodating all types of constitutively organized material entities, BFO must be extended by additional categories of material entity.
Methodology/Principal Findings
Unfortunately, most biomedical material entities are cumulative-constitutively organized. We show that even the extended BFO does not exhaustively cover cumulative-constitutively organized material entities. We provide examples from biology and everyday life that demonstrate the necessity for ‘portion of matter’ as another material building block. This implies the necessity for further extending BFO by ‘portion of matter’ as well as three additional categories that possess portions of matter as aggregate components. These extensions are necessary if the basic assumption that all parts that share the same granularity level exhaustively sum to the whole should also apply to cumulative-constitutively organized material entities. By suggesting a notion of granular representation we provide a way to maintain the single inheritance principle when dealing with cumulative-constitutively organized material entities.
Conclusions/Significance
We suggest extending BFO to incorporate additional categories of material entity and rearranging its top-level material entity taxonomy. With these additions and the notion of granular representation, BFO would exhaustively cover all top-level types of material entities that application-oriented ontologies may use as templates, while still maintaining the single inheritance principle.
doi:10.1371/journal.pone.0030004
PMCID: PMC3253816  PMID: 22253856
21.  An improved ontological representation of dendritic cells as a paradigm for all cell types 
BMC Bioinformatics  2009;10:70.
Background
Recent increases in the volume and diversity of life science data and information and an increasing emphasis on data sharing and interoperability have resulted in the creation of a large number of biological ontologies, including the Cell Ontology (CL), designed to provide a standardized representation of cell types for data annotation. Ontologies have been shown to have significant benefits for computational analyses of large data sets and for automated reasoning applications, leading to organized attempts to improve the structure and formal rigor of ontologies to better support computation. Currently, the CL employs multiple is_a relations, defining cell types in terms of histological, functional, and lineage properties, and the majority of definitions are written with sufficient generality to hold across multiple species. This approach limits the CL's utility for computation and for cross-species data integration.
Results
To enhance the CL's utility for computational analyses, we developed a method for the ontological representation of cells and applied this method to develop a dendritic cell ontology (DC-CL). DC-CL subtypes are delineated on the basis of surface protein expression, systematically including both species-general and species-specific types and optimizing DC-CL for the analysis of flow cytometry data. We avoid multiple uses of is_a by linking DC-CL terms to terms in other ontologies via additional, formally defined relations such as has_function.
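A minimal sketch, under assumed data structures, of how defining cell types by surface protein expression lets subsumption be computed rather than asserted: each type is modeled as sets of required-positive and required-negative markers, and one type is a subtype of another when its constraints include all of the other's. `CellType`, `is_subtype`, `classify` and the marker names used with them are hypothetical; this set-based stand-in is not the DC-CL's OWL encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellType:
    name: str
    positive: frozenset = frozenset()  # markers that must be expressed
    negative: frozenset = frozenset()  # markers that must be absent


def is_subtype(sub: CellType, sup: CellType) -> bool:
    """sub is_a sup when sub's marker constraints include all of sup's."""
    return sup.positive <= sub.positive and sup.negative <= sub.negative


def classify(observed_pos: set, observed_neg: set, types: list) -> list:
    """Return names of the most specific types whose marker definitions the cell satisfies."""
    matches = [t for t in types
               if t.positive <= observed_pos and t.negative <= observed_neg]
    # Drop any match for which another match is strictly more specific.
    return [t.name for t in matches
            if not any(o is not t and is_subtype(o, t) and not is_subtype(t, o)
                       for o in matches)]
```

Returning only the most specific matches mirrors, in simplified form, what a reasoner would report when annotating a flow cytometry event with its marker profile; a single-inheritance `is_a` hierarchy then falls out of the definitions instead of being asserted by hand.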
Conclusion
This approach brings benefits in the form of increased accuracy, support for reasoning, and interoperability with other ontology resources. Accordingly, we propose our method as a general strategy for the ontological representation of cells. DC-CL is available from .
doi:10.1186/1471-2105-10-70
PMCID: PMC2662812  PMID: 19243617
22.  Ontology design patterns to disambiguate relations between genes and gene products in GENIA 
Journal of Biomedical Semantics  2011;2(Suppl 5):S1.
Motivation
Annotated reference corpora play an important role in biomedical information extraction. A semantic annotation of the natural language texts in these reference corpora using formal ontologies is challenging due to the inherent ambiguity of natural language. The provision of formal definitions and axioms for semantic annotations offers the means for ensuring consistency and enables the development of verifiable annotation guidelines. Consistent semantic annotations facilitate the automatic discovery of new information through deductive inferences.
Results
We provide a formal characterization of the relations used in the recent GENIA corpus annotations. For this purpose, we both select existing axiom systems based on the desired properties of the relations within the domain and develop new axioms for several relations. To apply this ontology of relations to the semantic annotation of text corpora, we implement two ontology design patterns. In addition, we provide a software application to convert annotated GENIA abstracts into OWL ontologies by combining both the ontology of relations and the design patterns. As a result, the GENIA abstracts become available as OWL ontologies and are amenable to automated verification, deductive inferences and other knowledge-based applications.
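The kind of deductive inference and consistency checking that relation axioms enable can be sketched as forward chaining over annotation triples. In this Python sketch the relation names `part_of` and `has_part`, the choice of `part_of` as transitive and (as proper parthood) irreflexive, and the entity names are assumptions for illustration, not the axiom system the authors select for GENIA; a real pipeline would emit OWL and use a DL reasoner instead.

```python
from itertools import product
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def infer(triples: Iterable[Triple], transitive: Set[str],
          inverses: Dict[str, str]) -> Set[Triple]:
    """Close a set of triples under inverse and transitivity axioms (naive fixpoint)."""
    facts = set(triples)
    while True:
        new = set()
        # Inverse axiom: r(x, y) implies inv(r)(y, x).
        for s, r, o in facts:
            if r in inverses:
                new.add((o, inverses[r], s))
        # Transitivity axiom: r(x, y) and r(y, z) imply r(x, z).
        for (a, r1, b), (c, r2, d) in product(facts, repeat=2):
            if r1 == r2 and r1 in transitive and b == c:
                new.add((a, r1, d))
        if new <= facts:  # nothing new was derived: fixpoint reached
            return facts
        facts |= new


def irreflexivity_violations(facts: Set[Triple], irreflexive: Set[str]) -> Set[Triple]:
    """Triples r(x, x) for relations declared irreflexive signal an inconsistent annotation."""
    return {(s, r, o) for s, r, o in facts if r in irreflexive and s == o}
```

The pairwise `product` loop is quadratic per round, which is acceptable for the handful of relations in a single abstract but not for a whole corpus.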
Availability
Documentation, implementation and examples are available from http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/.
doi:10.1186/2041-1480-2-S5-S1
PMCID: PMC3239299  PMID: 22166341
23.  A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines 
BMC Bioinformatics  2011;12:61.
Background
Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts.
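To make the mapping onto dataflow concepts concrete, here is a minimal Python sketch of a workflow as a directed acyclic graph of components, executed in dependency order with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The `run_pipeline` function, the node layout and the toy workflow are hypothetical and do not reflect PaPy's API.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Tuple

# name -> (component function, names of upstream components it consumes)
Nodes = Dict[str, Tuple[Callable[..., Any], Tuple[str, ...]]]


def run_pipeline(nodes: Nodes, source: Any) -> Dict[str, Any]:
    """Run each component once its upstream components have produced output.

    Components with no upstream dependencies receive `source` as their input.
    TopologicalSorter raises graphlib.CycleError if the graph is not acyclic.
    """
    graph = {name: set(deps) for name, (_, deps) in nodes.items()}
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = nodes[name]
        args = [results[d] for d in deps] if deps else [source]
        results[name] = func(*args)
    return results


# Hypothetical toy workflow: split sequences, fan out to two components, then join.
workflow: Nodes = {
    "parse": (str.split, ()),
    "count": (len, ("parse",)),
    "upper": (lambda seqs: [s.upper() for s in seqs], ("parse",)),
    "report": (lambda n, seqs: f"{n}:{seqs[0]}", ("count", "upper")),
}
```

The topology of `deps` alone fixes the execution order, while each function supplies the step's transformation, matching the split between structure and functionality described above.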
Results
To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain-specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats).
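The batching trade-off can be sketched with the standard library alone: items are pulled lazily from the input in batches of `batch_size`, each batch is evaluated in parallel, and results stream out in input order. `batched_map` is a hypothetical helper, and `concurrent.futures.ThreadPoolExecutor` stands in for PaPy's own pooled local or remote resources; this is not PaPy's interface.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def batched_map(func: Callable[[T], U], items: Iterable[T],
                batch_size: int, workers: int = 4) -> Iterator[U]:
    """Lazily apply func to items, evaluating at most batch_size items at a time in parallel."""
    it = iter(items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            batch = list(islice(it, batch_size))  # pull only the next batch from upstream
            if not batch:
                return
            # Executor.map preserves input order within the batch.
            yield from pool.map(func, batch)
```

Stages compose as nested maps, e.g. `batched_map(g, batched_map(f, items, 8), 8)`. A larger `batch_size` exposes more parallelism but holds more items in memory at once, while `batch_size=1` is fully lazy but serial. For CPU-bound pure-Python components, `ProcessPoolExecutor` would be the usual swap, at the cost of requiring picklable functions.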
Conclusions
PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and includes extensive documentation and annotated usage examples.
doi:10.1186/1471-2105-12-61
PMCID: PMC3051902  PMID: 21352538
24.  Process attributes in bio-ontologies 
BMC Bioinformatics  2012;13:217.
Background
Biomedical processes can provide essential information about the (mal)functioning of an organism and are thus frequently represented in biomedical terminologies and ontologies, including the GO Biological Process branch. These processes often need to be described and categorised in terms of their attributes, such as rates or regularities. The adequate representation of such process attributes has recently been a contentious issue in bio-ontologies, and domain ontologies have correspondingly developed ad hoc workarounds that compromise interoperability and logical consistency.
Results
We present a design pattern for the representation of process attributes that is compatible with upper ontology frameworks such as BFO and BioTop. Our solution rests on two key tenets: firstly, that many of the sorts of process attributes which are biomedically interesting can be characterised by the ways that repeated parts of such processes constitute, in combination, an overall process; secondly, that entities for which a full logical definition can be assigned do not need to be treated as primitive within a formal ontology framework. We apply this approach to the challenge of modelling and automatically classifying examples of normal and abnormal rates and patterns of heart beating processes, and discuss the expressivity required in the underlying ontology representation language. We provide full definitions for process attributes at increasing levels of domain complexity.
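The first tenet, that a process attribute such as a rate can be characterised by how repeated parts constitute the whole process, has a simple procedural analogue: derive the rate and regularity of a heart-beating process from the onset times of its constituent beats. This Python sketch is not the paper's description-logic encoding. The class and function names are hypothetical, the 60 and 100 beats-per-minute cutoffs are commonly used adult resting thresholds assumed here, and the 0.1 regularity threshold is an arbitrary illustrative choice.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class BeatingProcess:
    # Onset times (seconds) of the repeated beat sub-processes that make up the whole.
    beat_onsets_s: List[float]

    def intervals(self) -> List[float]:
        if len(self.beat_onsets_s) < 2:
            raise ValueError("at least two beats are needed to define a rate")
        return [b - a for a, b in zip(self.beat_onsets_s, self.beat_onsets_s[1:])]

    def rate_bpm(self) -> float:
        """Rate attribute defined from the parts: beats per minute."""
        return 60.0 / mean(self.intervals())


def classify_rate(p: BeatingProcess, low: float = 60.0, high: float = 100.0) -> str:
    # Assumed cutoffs; not taken from the paper.
    r = p.rate_bpm()
    if r < low:
        return "slow"
    if r > high:
        return "fast"
    return "normal"


def classify_rhythm(p: BeatingProcess, max_cv: float = 0.1) -> str:
    """Regular if inter-beat intervals vary little (coefficient of variation <= max_cv)."""
    iv = p.intervals()
    return "regular" if pstdev(iv) / mean(iv) <= max_cv else "irregular"
```

In the paper's setting these categories would instead be defined classes that a reasoner assigns automatically; the sketch only shows that attributes such as "fast" or "irregular" need not be primitives once the repeated parts are represented.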
Conclusions
We show that a logical definition of process attributes is feasible, though limited by the expressivity of DL languages so that the creation of primitives is still necessary. This finding may endorse current formal upper-ontology frameworks as a way of ensuring consistency, interoperability and clarity.
doi:10.1186/1471-2105-13-217
PMCID: PMC3585786  PMID: 22928880
25.  A Comparison of Using Taverna and BPEL in Building Scientific Workflows: the case of caGrid 
With the emergence of “service oriented science,” the need arises to orchestrate multiple services to facilitate scientific investigation, that is, to create “science workflows.” We present here our findings in providing a workflow solution for the caGrid service-based grid infrastructure. We chose BPEL and Taverna as candidates and compared their usability in the lifecycle of a scientific workflow, including workflow composition, execution, and result analysis. Our experience shows that BPEL, as an imperative language, offers a comprehensive set of modeling primitives for workflows of all flavors, whereas Taverna offers a dataflow model and a more compact set of primitives that facilitates dataflow modeling and pipelined execution. We hope that this comparison study not only helps researchers select a language or tool that meets their specific needs, but also offers some insight into how a workflow language and tool can fulfill the requirements of the scientific community.
doi:10.1002/cpe.1547
PMCID: PMC2901112  PMID: 20625534
scientific workflow; functional programming; Taverna; BPEL; caGrid
