The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.
As biological and biomedical research increasingly reference the environmental context of the biological entities under study, the need for formalisation and standardisation of environment descriptors is growing. The Environment Ontology (ENVO; http://www.environmentontology.org) is a community-led, open project which seeks to provide an ontology for specifying a wide range of environments relevant to multiple life science disciplines and, through an open participation model, to accommodate the terminological requirements of all those needing to annotate data using ontology classes. This paper summarises ENVO’s motivation, content, structure, adoption, and governance approach. The ontology is available from http://purl.obolibrary.org/obo/envo.owl; an OBO format version is also available by switching the file suffix to “obo”.
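The suffix switch described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the ENVO tooling: the helper name `with_suffix` is hypothetical, and the sketch only derives the OBO-format address from the OWL address given in the text without attempting to download anything.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit

# OWL address as given in the text.
ENVO_OWL_URL = "http://purl.obolibrary.org/obo/envo.owl"


def with_suffix(url: str, suffix: str) -> str:
    """Return `url` with the file suffix of its path replaced by `suffix`.

    Hypothetical helper for illustration; accepts "obo" or ".obo".
    """
    parts = urlsplit(url)
    # Replace only the final extension of the path component.
    new_path = str(PurePosixPath(parts.path).with_suffix("." + suffix.lstrip(".")))
    return urlunsplit(parts._replace(path=new_path))


# Address of the OBO-format version, derived by switching the suffix.
ENVO_OBO_URL = with_suffix(ENVO_OWL_URL, "obo")
print(ENVO_OBO_URL)
```

Parsing the address rather than using a plain string replacement keeps the host name untouched even if it happened to contain the same characters as the suffix.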
Environment; Ecosystem; Biome; Ontology
We begin by describing recent developments in the burgeoning discipline of applied ontology, focusing especially on the ways ontologies are providing a means for the consistent representation of scientific data. We then introduce Basic Formal Ontology (BFO), a top-level ontology that is serving as a domain-neutral framework for the development of lower-level ontologies in many specialist disciplines, above all in biology and medicine. BFO is a bicategorial ontology, embracing both three-dimensionalist (continuant) and four-dimensionalist (occurrent) perspectives within a single framework. We examine how BFO-conformant domain ontologies can deal with the consistent representation of scientific data deriving from the measurement of processes of different types, and we outline on this basis the first steps of an approach to the classification of such processes within the BFO framework.
The Protein Ontology (PRO; http://proconsortium.org) formally defines protein entities and explicitly represents their major forms and interrelations. Protein entities represented in PRO corresponding to single amino acid chains are categorized by level of specificity into family, gene, sequence and modification metaclasses, and there is a separate metaclass for protein complexes. All metaclasses also have organism-specific derivatives. PRO complements established sequence databases such as UniProtKB, and interoperates with other biomedical and biological ontologies such as the Gene Ontology (GO). PRO relates to UniProtKB in that PRO’s organism-specific classes of proteins encoded by a specific gene correspond to entities documented in UniProtKB entries. PRO relates to the GO in that PRO’s representations of organism-specific protein complexes are subclasses of the organism-agnostic protein complex terms in the GO Cellular Component Ontology. The past few years have seen growth and changes to the PRO, as well as new points of access to the data and new applications of PRO in immunology and proteomics. Here we describe some of these developments.
The cancer stem cell (CSC) theory depicts such cells as having the capacity to produce both identical CSCs (symmetrical division) and tumor-amplifying daughter cells (asymmetric division). CSCs are thought to reside in niches similar to those of normal stem cells as described for neural, intestinal, and epidermal tissue, are resistant to chemotherapy, and are responsible for tumor recurrence. We recently described the niche-like nature of mouse renal adenocarcinoma (RENCA) cells following encapsulation in agarose macrobeads. In this paper we tested the hypothesis that encapsulated RENCA colonies function as an in vitro model of a CSC niche and that the majority of cells would undergo chemotherapy-induced death, followed by tumor recurrence. After exposure to docetaxel (5 µg/ml), 50% of cells were lost one week post-treatment while only one or two cells remained in each colony by 6 weeks. Surviving cells expressed OCT4 and reformed tumors at 16 weeks post-treatment. Docetaxel-resistant cells also grew as monolayers in cell culture (16–17 weeks post-exposure) or as primary tumors following transplantation to Balb/c mice (6 of 10 mice) or NOD.CB17-Prkdcscid/J mice (9 of 9 mice; 10 weeks post-transplantation or 28 weeks post-exposure). These data support the hypothesis that a rare subpopulation of OCT4+ cells is resistant to docetaxel and that these cells are sufficient for tumor recurrence. The reported methodology can be used to obtain purified populations of tumor-initiating cells, to screen for anti-tumor-initiating cell agents, and to investigate the in vitro correlate of a CSC niche, especially as it relates to chemo-resistance and tumor recurrence.
RENCA; macrobeads; encapsulation; tumor-initiating cells; OCT4
We have previously used simple empirical equations to reproduce the literature values of the ionization energies of isoelectronic sequences of up to four electrons, with very good agreement. We reproduce here a kinetic energy expression with corrections for relativity and Lamb shift effects which give excellent agreement with the literature values. These equations become more complex as the number of electrons in the system increases. Alternative simple quadratic expressions for calculating ionization energies of multielectron ions are discussed. A set of coefficients, when substituted into a simple expression, produces very good agreement with the literature values. Our work shows that Slater's rules are not appropriate for predicting trends or screening constants. This work provides very strong evidence that ionization energies are not functions of complete squares, and that electron transition/relaxation has to be taken into account when calculating ionization energies. We demonstrate clearly that for particular isoelectronic sequences, the ionizing electrons may occupy different orbitals, and in such cases more than one set of constants is needed to calculate the ionization energies.
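As an illustrative aside, the distinction between a general quadratic and a complete square can be written out explicitly. The symbols $Z$ (nuclear charge), $a$, $b$, $c$ and $\sigma$ are assumptions introduced here for exposition only; they are not the coefficients or the expressions used in this work. A general quadratic in $Z$ reduces to a complete square only when its coefficients satisfy a specific constraint:

```latex
\begin{align*}
  I_{\text{square}}(Z) &= a\,(Z - \sigma)^2 = aZ^2 - 2a\sigma Z + a\sigma^2, \\
  I_{\text{quad}}(Z)   &= aZ^2 + bZ + c, \\
  I_{\text{quad}} \equiv I_{\text{square}}
    &\iff b = -2a\sigma \ \text{and}\ c = a\sigma^2
    \quad\Longrightarrow\quad b^2 = 4ac .
\end{align*}
```

Under these assumed forms, fitted coefficients with $b^2 \neq 4ac$ cannot be rewritten as a single complete square, which is one way to read the statement above.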
The National Center for Biomedical Ontology is now in its seventh year. The goals of this National Center for Biomedical Computing are to: create and maintain a repository of biomedical ontologies and terminologies; build tools and web services to enable the use of ontologies and terminologies in clinical and translational research; educate their trainees and the scientific community broadly about biomedical ontology and ontology-based technology and best practices; and collaborate with a variety of groups who develop and use ontologies and terminologies in biomedicine. The centerpiece of the National Center for Biomedical Ontology is a web-based resource known as BioPortal. BioPortal makes available for research in computationally useful forms more than 270 of the world's biomedical ontologies and terminologies, and supports a wide range of web services that enable investigators to use the ontologies to annotate and retrieve data, to generate value sets and special-purpose lexicons, and to perform advanced analytics on a wide range of biomedical data.
Collaborative technologies; knowledge representations; knowledge acquisition and knowledge management; controlled terminologies and vocabularies; ontologies; knowledge bases; applications that link biomedical knowledge from diverse primary sources (includes automated indexing); statistical analysis of large datasets; methods for integration of information from disparate sources; discovery, and text and data mining methods; automated learning; information retrieval; HIT data standards; representing, identifying, and modeling biological structures; developing and refining EHR data standards (including image standards)
The IPAP Schizophrenia Algorithm was originally designed in the form of a flow-chart to help physicians optimise the treatment of schizophrenic patients in the spirit of guideline-based medicine. We take this algorithm as our starting point in investigating how artifacts of this sort can benefit from the facilities of high-quality ontologies. The IPAP algorithm exists thus far only in a form suitable for use by human beings. We draw on the resources of Basic Formal Ontology (BFO) in order to show how such an algorithm can be enhanced in such a way that it can be used in Semantic Web and related applications. We found that BFO provides a framework that is able to capture in a rigorous way all the types of entities represented in the IPAP Schizophrenia Algorithm in a way that yields a computational tool that can be used by software agents to perform monitoring and control of schizophrenic patients. We discuss the issues involved in building an application ontology for this purpose, issues which are important for any Semantic Web application in the life science and healthcare domains.
IPAP; Schizophrenia Algorithm; Realist ontology; Basic Formal Ontology; Electronic health record; Referent tracking; Automated decision support
The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary (‘ontology’) of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.
Bioinformatics; Comparative genomics; Genome annotation; Ontology; Plant anatomy; Terpene synthase
Premise of the study
Bio-ontologies are essential tools for accessing and analyzing the rapidly growing pool of plant genomic and phenomic data. Ontologies provide structured vocabularies to support consistent aggregation of data and a semantic framework for automated analyses and reasoning. They are a key component of the semantic web.
This paper provides background on what bio-ontologies are, why they are relevant to botany, and the principles of ontology development. It includes an overview of ontologies and related resources that are relevant to plant science, with a detailed description of the Plant Ontology (PO). We discuss the challenges of building an ontology that covers all green plants (Viridiplantae).
Ontologies can advance plant science in four key areas: (1) comparative genetics, genomics, phenomics, and development; (2) taxonomy and systematics; (3) semantic applications; and (4) education.
Bio-ontologies offer a flexible framework for comparative plant biology, based on common botanical understanding. As genomic and phenomic data become available for more species, we anticipate that the annotation of data with ontology terms will become less centralized, while at the same time, the need for cross-species queries will become more common, causing more researchers in plant science to turn to ontologies.
bio-ontologies; genome annotation; OBO Foundry; phenomics; plant anatomy; plant genomics; Plant Ontology; plant systematics; semantic web
There is a need recognized by the National Institute of Dental & Craniofacial Research and the National Cancer Institute to advance basic, translational and clinical saliva research. The goal of the Salivaomics Knowledge Base (SKB) is to create a data management system and web resource constructed to support human salivaomics research. To maximize the utility of the SKB for retrieval, integration and analysis of data, we have developed the Saliva Ontology and SDxMart. This article reviews the informatics advances in saliva diagnostics made possible by the Saliva Ontology and SDxMart.
BioMart; database; ontology; saliva
This paper addresses a family of issues surrounding the biological phenomenon of resistance and its representation in realist ontologies. The treatments of resistance terms in various existing ontologies are examined and found to be either overly narrow, internally inconsistent, or otherwise problematic. We propose a more coherent characterization of resistance in terms of what we shall call blocking dispositions, which are collections of mutually coordinated dispositions which are of such a sort that they cannot undergo simultaneous realization within a single bearer. A definition of ‘protective resistance’ is proposed for use in the Infectious Disease Ontology (IDO) and we show how this definition can be used to characterize the antibiotic resistance in Methicillin-Resistant Staphylococcus aureus (MRSa). The ontological relations between entities in our MRSa case study are used alongside a series of logical inference rules to illustrate logical reasoning about resistance. A description logic representation of blocking dispositions is also provided. We demonstrate that our characterization of resistance is sufficiently general to cover two other cases of resistance in the infectious disease domain involving HIV and malaria.
Infectious Disease Ontology; Basic Formal Ontology; MRSa
One way to detect, monitor and prevent adverse events with the help of Information Technology is by using ontologies capable of representing three levels of reality: what is the case, what is believed about reality, and what is represented. We report on how Basic Formal Ontology and Referent Tracking exhibit this capability and how they are used to develop an adverse event ontology and related data annotation scheme for the European ReMINE project.
ontology; referent tracking; adverse events; patient safety
Representing species-specific proteins and protein complexes in ontologies that are both human- and machine-readable facilitates the retrieval, analysis, and interpretation of genome-scale data sets. Although existing protein-centric informatics resources provide the biomedical research community with well-curated compendia of protein sequence and structure, these resources lack formal ontological representations of the relationships among the proteins themselves. The Protein Ontology (PRO) Consortium is filling this informatics resource gap by developing ontological representations and relationships among proteins and their variants and modified forms. Because proteins are often functional only as members of stable protein complexes, the PRO Consortium, in collaboration with existing protein and pathway databases, has launched a new initiative to implement logical and consistent representation of protein complexes.
We describe here how the PRO Consortium is meeting the challenge of representing species-specific protein complexes, how protein complex representation in PRO supports annotation of protein complexes and comparative biology, and how PRO is being integrated into existing community bioinformatics resources. The PRO resource is accessible at http://pir.georgetown.edu/pro/.
PRO is a unique database resource for species-specific protein complexes. PRO facilitates robust annotation of variations in composition and function contexts for protein complexes within and between species.
Biomedical ontologies exist to serve integration of clinical and experimental data, and it is critical to their success that they be put to widespread use in the annotation of data. How, then, can ontologies achieve the sort of user-friendliness, reliability, cost-effectiveness, and breadth of coverage that is necessary to ensure extensive usage?
Our focus here is on two different sets of answers to these questions that have been proposed, on the one hand in medicine, by the SNOMED CT community, and on the other hand in biology, by the OBO Foundry. We address more specifically the issue as to how adherence to certain development principles can advance the usability and effectiveness of an ontology or terminology resource, for example by allowing more accurate maintenance, more reliable application, and more efficient interoperation with other ontologies and information resources.
SNOMED CT and the OBO Foundry differ considerably in their general approach. Nevertheless, a general trend towards more formal rigor and cross-domain interoperability can be seen in both and we argue that this trend should be accepted by all similar initiatives in the future.
Future efforts in ontology development have to address the need for harmonization and integration of ontologies across disciplinary borders, and for this, coherent formalization of ontologies is a prerequisite.
Biomedical Ontologies; Ontology Harmonization; Quality Assurance; SNOMED CT
The goal of the OBO (Open Biomedical Ontologies) Foundry initiative is to create and maintain an evolving collection of non-overlapping interoperable ontologies that will offer unambiguous representations of the types of entities in biological and biomedical reality. These ontologies are designed to serve non-redundant annotation of data and scientific text. To achieve these ends, the Foundry imposes strict requirements upon the ontologies eligible for inclusion. While these requirements are not met by most existing biomedical terminologies, the latter may nonetheless support the Foundry’s goal of consistent and non-redundant annotation if appropriate mappings of data annotated with their aid can be achieved. To construct such mappings in reliable fashion, however, it is necessary to analyze terminological resources from an ontologically realistic perspective in such a way as to identify the exact import of the ‘concepts’ and associated terms which they contain. We propose a framework for such analysis that is designed to maximize the degree to which legacy terminologies and the data coded with their aid can be successfully used for information-driven clinical and translational research.
Ontology; terminology; mapping
Since 2002 we have been testing and refining a methodology for ontology development that is now being used by multiple groups of researchers in different life science domains. Gary Merrill, in a recent paper in this journal, describes some of the reasons why this methodology has been found attractive by researchers in the biological and biomedical sciences. At the same time he assails the methodology on philosophical grounds, focusing specifically on our recommendation that ontologies developed for scientific purposes should be constructed in such a way that their terms are seen as referring to what we call universals or types in reality. As we show, Merrill’s critique is of little relevance to the success of our realist project, since it not only reveals no actual errors in our work but also criticizes views on universals that we do not in fact hold. However, it nonetheless provides us with a valuable opportunity to clarify the realist methodology, and to show how some of its principles are being applied, especially within the framework of the OBO (Open Biomedical Ontologies) Foundry initiative.
We describe a novel approach to genetic association analyses with proteins sub-divided into biologically relevant smaller sequence features (SFs), and their variant types (VTs). SFVT analyses are particularly informative for study of highly polymorphic proteins such as the human leukocyte antigen (HLA), given the nature of its genetic variation: the high level of polymorphism, the pattern of amino acid variability, and that most HLA variation occurs at functionally important sites, as well as its known role in organ transplant rejection, autoimmune disease development and response to infection. Further, combinations of variable amino acid sites shared by several HLA alleles (shared epitopes) are most likely better descriptors of the actual causative genetic variants. In a cohort of systemic sclerosis patients/controls, SFVT analysis shows that a combination of SFs implicating specific amino acid residues in peptide binding pockets 4 and 7 of HLA-DRB1 explains much of the molecular determinant of risk.
This essay concerns the problems surrounding the use of the term “concept” in current ontology and terminology research. It is based on the constructive dialogue between realist ontology on the one hand and the world of formal standardization of health informatics on the other, but its conclusions are not restricted to the domain of medicine. The term “concept” is one of the most misused even in literature and technical standards which attempt to bring clarity. In this paper we propose to use the term “concept” in the context of producing defined professional terminologies with one specific and consistent meaning which we propose for adoption as the agreed meaning of the term in future terminological research, and specifically in the development of formal terminologies to be used in computer systems. We also discuss and propose new definitions of a set of cognate terms. We describe the relations governing the realm of concepts, and compare these to the richer and more complex set of relations obtaining between entities in the real world. On this basis we also summarize an associated terminology for ontologies as representations of the real world and a partial mapping between the world of concepts and the world of reality.
terminology; ontologies; concept systems
While classifications of mental disorders have existed for over one hundred years, it still remains unspecified what terms such as 'mental disorder', 'disease' and 'illness' might actually denote. While ontologies have been called in aid to address this shortfall since the GALEN project of the early 1990s, most attempts thus far have sought to provide a formal description of the structure of some pre-existing terminology or classification, rather than of the corresponding structures and processes on the side of the patient.
We here present a view of mental disease that is based on ontological realism and which follows the principles embodied in Basic Formal Ontology (BFO) and in the application of BFO in the Ontology of General Medical Science (OGMS). We analyzed statements about what counts as a mental disease provided (1) in the research agenda for the DSM-V, and (2) in Pies' model. The results were used to assess whether the representational units of BFO and OGMS were adequate as foundations for a formal representation of the entities in reality that these statements attempt to describe. We then analyzed the representational units specific to mental disease and provided corresponding definitions.
Our key contributions lie in the identification of confusions and conflations in the existing terminology of mental disease and in providing what we believe is a framework for the sort of clear and unambiguous reference to entities on the side of the patient that is needed in order to avoid these confusions in the future.
The Protein Ontology (PRO) provides a formal, logically-based classification of specific protein classes including structured representations of protein isoforms, variants and modified forms. Initially focused on proteins found in human, mouse and Escherichia coli, PRO now includes representations of protein complexes. The PRO Consortium works in concert with the developers of other biomedical ontologies and protein knowledge bases to provide the ability to formally organize and integrate representations of precise protein forms so as to enhance accessibility to results of protein research. PRO (http://pir.georgetown.edu/pro) is part of the Open Biomedical Ontology Foundry.
We propose a typology of representational artifacts for health care and life sciences domains and associate this typology with different kinds of formal ontology and logic, drawing conclusions as to the strengths and limitations for ontology of different kinds of logical resources, with a focus on description logics.
The four types of domain representation we consider are: (i) lexico-semantic representation, (ii) representation of types of entities, (iii) representations of background knowledge, and (iv) representation of individuals.
We advocate a clear distinction of the four kinds of representation in order to provide a more rational basis for the use of ontologies and related artifacts to advance integration of data and interoperability of associated reasoning systems.
We highlight the fact that only a minor portion of scientifically relevant facts in a domain such as biomedicine can be adequately represented by formal ontologies when the latter are conceived as representations of entity types. In particular, the attempt to encode default or probabilistic knowledge using ontologies so conceived is prone to produce unintended, erroneous models.
biomedical ontology; description logic; formal ontology; knowledge representation
The Salivaomics Knowledge Base (SKB) is designed to serve as a computational infrastructure that can permit global exploration and utilization of data and information relevant to salivaomics. SKB is created by aligning (1) the saliva biomarker discovery and validation resources at UCLA with (2) the ontology resources developed by the OBO (Open Biomedical Ontologies) Foundry, including a new Saliva Ontology (SALO).
We define the Saliva Ontology (SALO; http://www.skb.ucla.edu/SALO/) as a consensus-based controlled vocabulary of terms and relations dedicated to the salivaomics domain and to saliva-related diagnostics following the principles of the OBO (Open Biomedical Ontologies) Foundry.
The Saliva Ontology is an ongoing exploratory initiative. The ontology will be used to facilitate salivaomics data retrieval and integration across multiple fields of research together with data analysis and data mining. The ontology will be tested through its ability to serve the annotation ('tagging') of a representative corpus of salivaomics research literature that is to be incorporated into the SKB.
The desideratum of semantic interoperability has been intensively discussed in medical informatics circles in recent years. Originally, experts assumed that this issue could be sufficiently addressed by insisting simply on the application of shared clinical terminologies or clinical information models. However, the use of the term ‘ontology’ has been steadily increasing more recently. We discuss criteria for distinguishing clinical ontologies from clinical terminologies and information models. Then, we briefly present the role clinical ontologies play in two multicentric research projects. Finally, we discuss the interactions between these different kinds of knowledge representation artifacts and the stakeholders involved in developing interoperational real-world clinical applications. We provide ontology engineering examples from two EU-funded projects.
Clinical Ontologies; Formal Ontologies; Knowledge Representation
Ontologies describe reality in specific domains in ways that can bridge various disciplines and languages. They allow easier access and integration of information that is collected by different groups. Ontologies are currently used in the biomedical sciences, geography, and law. A Biomedical Ethics Ontology (BMEO) would benefit members of ethics committees who deal with protocols and consent forms spanning numerous fields of inquiry. There already exists the Ontology for Biomedical Investigations (OBI); the proposed BMEO would interoperate with OBI, creating a powerful information tool. We define a domain ontology and begin to construct a BMEO, focused on the process of evaluating human research protocols. Finally, we show how our BMEO can have practical applications for ethics committees. This paper describes ongoing research and a strategy for its broader continuation and cooperation.
ontology; taxonomy; ethics committee review; automation; semantics