Imagine if we could compute across phenotype data as easily as genomic data; this article calls for efforts to realize this vision and discusses the potential benefits.
Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.
The 2013 Rostock Symposium on Systems Biology and Bioinformatics in Aging Research was again dedicated to dissecting the aging process using in silico means. A particular focus was on ontologies, because these are a key technology to systematically integrate heterogeneous information about the aging process. Related topics were databases and data integration. Other talks tackled modeling issues and applications, the latter including talks focused on marker development and cellular stress as well as on diseases, in particular on diseases of kidney and skin.
A medical intervention is a medical procedure or application intended to relieve or prevent illness or injury. Examples of medical interventions include vaccination and drug administration. After a medical intervention, adverse events (AEs) may occur which lie outside the intended consequences of the intervention. The representation and analysis of AEs are critical to the improvement of public health.
The Ontology of Adverse Events (OAE), previously named Adverse Event Ontology (AEO), is a community-driven ontology developed to standardize and integrate data relating to AEs arising subsequent to medical interventions, as well as to support computer-assisted reasoning. OAE has over 3,000 terms with unique identifiers, including terms imported from existing ontologies and more than 1,800 OAE-specific terms. In OAE, the term ‘adverse event’ denotes a pathological bodily process in a patient that occurs after a medical intervention. Causal adverse events are defined by OAE as those events that are causal consequences of a medical intervention. OAE represents various adverse events based on patient anatomic regions and clinical outcomes, including symptoms, signs, and abnormal processes. OAE has been used in the analysis of several different sorts of vaccine and drug adverse event data. For example, using the data extracted from the Vaccine Adverse Event Reporting System (VAERS), OAE was used to analyse vaccine adverse events associated with the administrations of different types of influenza vaccines. OAE has also been used to represent and classify the vaccine adverse events cited in package inserts of FDA-licensed human vaccines in the USA.
OAE is a biomedical ontology that logically defines and classifies various adverse events occurring after medical interventions. OAE has successfully been applied in several adverse event studies. The OAE ontological framework provides a platform for systematic representation and analysis of adverse events and of the factors (e.g., vaccinee age) important for determining their clinical outcomes.
Ontology of Adverse Events; OAE; Adverse event; Ontology; Vaccine; Drug; Vaccine adverse event; VAERS; Drug adverse event; Design pattern
We have previously described the use of a double coated agarose-agarose porcine islet macrobead for the treatment of type I diabetes mellitus. In the current study, the long-term viral safety of macrobead implantation into pancreatectomized diabetic dogs treated with pravastatin (n = 3) was assessed while 2 dogs served as nonimplanted controls. A more gradual return to preimplant insulin requirements occurred after a 2nd implant procedure (days 148, 189, and >652) when compared to a first macrobead implantation (days 9, 21, and 21) in all macrobead implanted animals. In all three implanted dogs, porcine C-peptide was detected in the blood for at least 10 days following the first implant and for at least 26 days following the second implant. C-peptide was also present in the peritoneal fluid of all three implanted dogs at 6 months after 2nd implant and in 2 of 3 dogs at necropsy. Prescreening results of islet macrobeads and culture media prior to transplantation were negative for 13 viruses. No evidence of PERV or other viral transmission was found throughout the study. This study demonstrates that the long-term (2.4 years) implantation of agarose-agarose encapsulated porcine islets is a safe procedure in a large animal model of type I diabetes mellitus.
The encapsulation of porcine islets is an attractive methodology for the treatment of Type I diabetes. In the current study, the use of pravastatin as a mild anti-inflammatory agent was investigated in pancreatectomized diabetic canines transplanted with porcine islets encapsulated in agarose-agarose macrobeads and given 80 mg/day of pravastatin (n = 3) while control animals did not receive pravastatin (n = 3). Control animals reached preimplant insulin requirements on days 18, 19, and 32. Pravastatin-treated animals reached preimplant insulin requirements on days 22, 27, and 50. Two animals from each group received a second macrobead implant: control animals remained insulin-free for 15 and 21 days (AUC = 3003 and 5078 mg/dL/24 hr days 1 to 15) and reached preimplant insulin requirements on days 62 and 131. Pravastatin treated animals remained insulin-free for 21 and 34 days (AUC = 1559 and 1903 mg/dL/24 hr days 1 to 15) and reached preimplant insulin requirements on days 38 and 192. Total incidence (83.3% versus 64.3%) and total severity (22.7 versus 18.3) of inflammation on tissue surfaces were higher in the control group at necropsy. These findings support pravastatin therapy in conjunction with the transplantation of encapsulated xenogeneic islets for the treatment of diabetes mellitus.
The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.
We discuss recent progress in the development of cognitive ontologies and summarize three challenges in the coordinated development and application of these resources. Challenge 1 is to adopt a standardized definition for cognitive processes. We describe three possibilities and recommend one that is consistent with the standard view in cognitive and biomedical sciences. Challenge 2 is harmonization. Gaps and conflicts in representation must be resolved so that these resources can be combined for mark-up and interpretation of multi-modal data. Finally, Challenge 3 is to test the utility of these resources for large-scale annotation of data, search and query, and knowledge discovery and integration. As term definitions are tested and revised, harmonization should enable coordinated updates across ontologies. However, the true test of these definitions will be in their community-wide adoption which will test whether they support valid inferences about psychological and neuroscientific data.
ontology; cognition; mental functioning; neuroscience; annotation; integration; big data; brain science
As biological and biomedical research increasingly reference the environmental context of the biological entities under study, the need for formalisation and standardisation of environment descriptors is growing. The Environment Ontology (ENVO; http://www.environmentontology.org) is a community-led, open project which seeks to provide an ontology for specifying a wide range of environments relevant to multiple life science disciplines and, through an open participation model, to accommodate the terminological requirements of all those needing to annotate data using ontology classes. This paper summarises ENVO’s motivation, content, structure, adoption, and governance approach. The ontology is available from http://purl.obolibrary.org/obo/envo.owl; an OBO format version is also available by switching the file suffix to “obo”.
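The suffix switch described above can be sketched in a few lines. This is only an illustration and not part of ENVO's tooling: the helper name `obo_url_from_owl` is hypothetical, and the sketch rewrites the string without checking that the resulting URL actually resolves.

```python
def obo_url_from_owl(owl_url: str) -> str:
    """Return the URL with a trailing '.owl' replaced by '.obo'.

    Hypothetical helper for illustration; it performs no network access.
    """
    suffix = ".owl"
    if not owl_url.endswith(suffix):
        raise ValueError(f"expected a URL ending in {suffix!r}: {owl_url}")
    return owl_url[: -len(suffix)] + ".obo"


# The OWL URL is the one given in the text; the OBO URL is derived from it.
ENVO_OWL = "http://purl.obolibrary.org/obo/envo.owl"
ENVO_OBO = obo_url_from_owl(ENVO_OWL)
print(ENVO_OBO)
```

Rejecting inputs that do not end in `.owl` is a design choice of this sketch, so that an unexpected URL fails loudly instead of being silently altered.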
Environment; Ecosystem; Biome; Ontology
We are developing the Neurological Disease Ontology (ND) to provide a framework to enable representation of aspects of neurological diseases that are relevant to their treatment and study. ND is a representational tool that addresses the need for unambiguous annotation, storage, and retrieval of data associated with the treatment and study of neurological diseases. ND is being developed in compliance with the Open Biomedical Ontology Foundry principles and builds upon the paradigm established by the Ontology for General Medical Science (OGMS) for the representation of entities in the domain of disease and medical practice. Initial applications of ND will include the annotation and analysis of large data sets and patient records for Alzheimer’s disease, multiple sclerosis, and stroke.
ND is implemented in OWL 2 and currently has more than 450 terms that refer to and describe various aspects of neurological diseases. ND directly imports the development version of OGMS, which uses BFO 2. Term development in ND has primarily extended the OGMS terms ‘disease’, ‘diagnosis’, ‘disease course’, and ‘disorder’. We have imported and utilize over 700 classes from related ontology efforts including the Foundational Model of Anatomy, Ontology for Biomedical Investigations, and Protein Ontology. ND terms are annotated with ontology metadata such as a label (term name), term editors, textual definition, definition source, curation status, and alternative terms (synonyms). Many terms have logical definitions in addition to these annotations. Current development has focused on the establishment of the upper-level structure of the ND hierarchy, as well as on the representation of Alzheimer’s disease, multiple sclerosis, and stroke. The ontology is available as a version-controlled file at http://code.google.com/p/neurological-disease-ontology along with a discussion list and an issue tracker.
ND seeks to provide a formal foundation for the representation of clinical and research data pertaining to neurological diseases. ND will enable its users to connect data in a robust way with related data that is annotated using other terminologies and ontologies in the biomedical domain.
We begin by describing recent developments in the burgeoning discipline of applied ontology, focusing especially on the ways ontologies are providing a means for the consistent representation of scientific data. We then introduce Basic Formal Ontology (BFO), a top-level ontology that is serving as a domain-neutral framework for the development of lower level ontologies in many specialist disciplines, above all in biology and medicine. BFO is a bicategorial ontology, embracing both three-dimensionalist (continuant) and four-dimensionalist (occurrent) perspectives within a single framework. We examine how BFO-conformant domain ontologies can deal with the consistent representation of scientific data deriving from the measurement of processes of different types, and we outline on this basis the first steps of an approach to the classification of such processes within the BFO framework.
The Protein Ontology (PRO; http://proconsortium.org) formally defines protein entities and explicitly represents their major forms and interrelations. Protein entities represented in PRO corresponding to single amino acid chains are categorized by level of specificity into family, gene, sequence and modification metaclasses, and there is a separate metaclass for protein complexes. All metaclasses also have organism-specific derivatives. PRO complements established sequence databases such as UniProtKB, and interoperates with other biomedical and biological ontologies such as the Gene Ontology (GO). PRO relates to UniProtKB in that PRO’s organism-specific classes of proteins encoded by a specific gene correspond to entities documented in UniProtKB entries. PRO relates to the GO in that PRO’s representations of organism-specific protein complexes are subclasses of the organism-agnostic protein complex terms in the GO Cellular Component Ontology. The past few years have seen growth and changes to the PRO, as well as new points of access to the data and new applications of PRO in immunology and proteomics. Here we describe some of these developments.
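The metaclass scheme described above can be sketched as a small data structure. This is a simplified illustration under stated assumptions, not PRO's actual data model: the class and function names are hypothetical, the entity labels are placeholders, and the sketch assumes that the order family, gene, sequence, modification runs from least to most specific.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Metaclass(Enum):
    """Metaclasses named in the text (names here are illustrative)."""
    FAMILY = "family"
    GENE = "gene"
    SEQUENCE = "sequence"
    MODIFICATION = "modification"
    COMPLEX = "complex"  # separate metaclass for protein complexes


# Assumption: single-chain metaclasses ordered from least to most specific.
SPECIFICITY_ORDER = [
    Metaclass.FAMILY,
    Metaclass.GENE,
    Metaclass.SEQUENCE,
    Metaclass.MODIFICATION,
]


@dataclass(frozen=True)
class ProteinEntity:
    label: str
    metaclass: Metaclass
    organism: Optional[str] = None  # None means organism-agnostic

    @property
    def is_organism_specific(self) -> bool:
        return self.organism is not None


def more_specific(a: ProteinEntity, b: ProteinEntity) -> bool:
    """True if a's metaclass is more specific than b's.

    Only defined for single-chain metaclasses; a complex raises ValueError
    because it is not part of the specificity ordering.
    """
    return SPECIFICITY_ORDER.index(a.metaclass) > SPECIFICITY_ORDER.index(b.metaclass)


# Placeholder entities, not real PRO terms.
gene_level = ProteinEntity("example protein (gene level)", Metaclass.GENE)
human_gene_level = ProteinEntity(
    "example protein (gene level, human)", Metaclass.GENE, organism="Homo sapiens"
)
modified_form = ProteinEntity("example protein, modified form", Metaclass.MODIFICATION)

print(more_specific(modified_form, gene_level))
print(human_gene_level.is_organism_specific)
```

Modelling the organism as an optional attribute reflects the statement that every metaclass has organism-specific derivatives; an ontology would express this with subclass relations, not a field.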
The cancer stem cell (CSC) theory depicts such cells as having the capacity to produce both identical CSCs (symmetrical division) and tumor-amplifying daughter cells (asymmetric division). CSCs are thought to reside in niches similar to those of normal stem cells as described for neural, intestinal, and epidermal tissue, are resistant to chemotherapy, and are responsible for tumor recurrence. We recently described the niche-like nature of mouse renal adenocarcinoma (RENCA) cells following encapsulation in agarose macrobeads. In this paper we tested the hypothesis that encapsulated RENCA colonies function as an in vitro model of a CSC niche and that the majority of cells would undergo chemotherapy-induced death, followed by tumor recurrence. After exposure to docetaxel (5 µg/ml), 50% of cells were lost one week post-treatment while only one or two cells remained in each colony by 6 weeks. Surviving cells expressed OCT4 and reformed tumors at 16 weeks post-treatment. Docetaxel-resistant cells also grew as monolayers in cell culture (16–17 weeks post-exposure) or as primary tumors following transplantation to Balb/c mice (6 of 10 mice) or NOD.CB17-Prkdcscid/J mice (9 of 9 mice; 10 weeks post-transplantation or 28 weeks post-exposure). These data support the hypothesis that a rare subpopulation of OCT4+ cells are resistant to docetaxel and these cells are sufficient for tumor recurrence. The reported methodology can be used to obtain purified populations of tumor-initiating cells, to screen for anti-tumor-initiating cell agents, and to investigate the in vitro correlate of a CSC niche, especially as it relates to chemo-resistance and tumor recurrence.
RENCA; macrobeads; encapsulation; tumor-initiating cells; OCT4
We have previously used simple empirical equations to reproduce, with very good agreement, the literature values of the ionization energies of isoelectronic sequences of up to four electrons. We reproduce here a kinetic energy expression with corrections for relativity and Lamb shift effects which give excellent agreement with the literature values. These equations become more complex as the number of electrons in the system increases. Alternative simple quadratic expressions for calculating ionization energies of multielectron ions are discussed. A set of coefficients, when substituted into a simple expression, produces very good agreement with the literature values. Our work shows that Slater's rules are not appropriate for predicting trends or screening constants. This work provides very strong evidence that ionization energies are not functions of complete squares, and that electron transition/relaxation has to be taken into account when calculating ionization energies. We demonstrate clearly that for particular isoelectronic sequences, the ionizing electrons may occupy different orbitals, and in such cases more than one set of constants is needed to calculate the ionization energies.
The National Center for Biomedical Ontology is now in its seventh year. The goals of this National Center for Biomedical Computing are to: create and maintain a repository of biomedical ontologies and terminologies; build tools and web services to enable the use of ontologies and terminologies in clinical and translational research; educate their trainees and the scientific community broadly about biomedical ontology and ontology-based technology and best practices; and collaborate with a variety of groups who develop and use ontologies and terminologies in biomedicine. The centerpiece of the National Center for Biomedical Ontology is a web-based resource known as BioPortal. BioPortal makes available for research in computationally useful forms more than 270 of the world's biomedical ontologies and terminologies, and supports a wide range of web services that enable investigators to use the ontologies to annotate and retrieve data, to generate value sets and special-purpose lexicons, and to perform advanced analytics on a wide range of biomedical data.
Collaborative technologies; knowledge representations; knowledge acquisition and knowledge management; controlled terminologies and vocabularies; ontologies; knowledge bases; applications that link biomedical knowledge from diverse primary sources (includes automated indexing); statistical analysis of large datasets; methods for integration of information from disparate sources; discovery; and text and data mining methods; automated learning; information retrieval; HIT data standards; representing; identifying; and modeling biological structures; developing and refining ehr data standards (including image standards)
The IPAP Schizophrenia Algorithm was originally designed in the form of a flow-chart to help physicians optimise the treatment of schizophrenic patients in the spirit of guideline-based medicine. We take this algorithm as our starting point in investigating how artifacts of this sort can benefit from the facilities of high-quality ontologies. The IPAP algorithm exists thus far only in a form suitable for use by human beings. We draw on the resources of Basic Formal Ontology (BFO) in order to show how such an algorithm can be enhanced in such a way that it can be used in Semantic Web and related applications. We found that BFO provides a framework that is able to capture in a rigorous way all the types of entities represented in the IPAP Schizophrenia Algorithm, in a way that yields a computational tool that can be used by software agents to perform monitoring and control of schizophrenic patients. We discuss the issues involved in building an application ontology for this purpose, issues which are important for any Semantic Web application in the life science and healthcare domains.
IPAP; Schizophrenia Algorithm; Realist ontology; Basic Formal Ontology; Electronic health record; Referent tracking; Automated decision support
The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary (‘ontology’) of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.
Bioinformatics; Comparative genomics; Genome annotation; Ontology; Plant anatomy; Terpene synthase
Premise of the study
Bio-ontologies are essential tools for accessing and analyzing the rapidly growing pool of plant genomic and phenomic data. Ontologies provide structured vocabularies to support consistent aggregation of data and a semantic framework for automated analyses and reasoning. They are a key component of the semantic web.
This paper provides background on what bio-ontologies are, why they are relevant to botany, and the principles of ontology development. It includes an overview of ontologies and related resources that are relevant to plant science, with a detailed description of the Plant Ontology (PO). We discuss the challenges of building an ontology that covers all green plants (Viridiplantae).
Ontologies can advance plant science in four key areas: (1) comparative genetics, genomics, phenomics, and development; (2) taxonomy and systematics; (3) semantic applications; and (4) education.
Bio-ontologies offer a flexible framework for comparative plant biology, based on common botanical understanding. As genomic and phenomic data become available for more species, we anticipate that the annotation of data with ontology terms will become less centralized, while at the same time, the need for cross-species queries will become more common, causing more researchers in plant science to turn to ontologies.
bio-ontologies; genome annotation; OBO Foundry; phenomics; plant anatomy; plant genomics; Plant Ontology; plant systematics; semantic web
There is a need recognized by the National Institute of Dental & Craniofacial Research and the National Cancer Institute to advance basic, translational and clinical saliva research. The goal of the Salivaomics Knowledge Base (SKB) is to create a data management system and web resource constructed to support human salivaomics research. To maximize the utility of the SKB for retrieval, integration and analysis of data, we have developed the Saliva Ontology and SDxMart. This article reviews the informatics advances in saliva diagnostics made possible by the Saliva Ontology and SDxMart.
BioMart; database; ontology; saliva
This paper addresses a family of issues surrounding the biological phenomenon of resistance and its representation in realist ontologies. The treatments of resistance terms in various existing ontologies are examined and found to be either overly narrow, internally inconsistent, or otherwise problematic. We propose a more coherent characterization of resistance in terms of what we shall call blocking dispositions, which are collections of mutually coordinated dispositions which are of such a sort that they cannot undergo simultaneous realization within a single bearer. A definition of ‘protective resistance’ is proposed for use in the Infectious Disease Ontology (IDO) and we show how this definition can be used to characterize the antibiotic resistance in Methicillin-Resistant Staphylococcus aureus (MRSa). The ontological relations between entities in our MRSa case study are used alongside a series of logical inference rules to illustrate logical reasoning about resistance. A description logic representation of blocking dispositions is also provided. We demonstrate that our characterization of resistance is sufficiently general to cover two other cases of resistance in the infectious disease domain involving HIV and malaria.
Infectious Disease Ontology; Basic Formal Ontology; MRSa
One way to detect, monitor and prevent adverse events with the help of Information Technology is by using ontologies capable of representing three levels of reality: what is the case, what is believed about reality, and what is represented. We report on how Basic Formal Ontology and Referent Tracking exhibit this capability and how they are used to develop an adverse event ontology and related data annotation scheme for the European ReMINE project.
ontology; referent tracking; adverse events; patient safety
Representing species-specific proteins and protein complexes in ontologies that are both human- and machine-readable facilitates the retrieval, analysis, and interpretation of genome-scale data sets. Although existing protein-centric informatics resources provide the biomedical research community with well-curated compendia of protein sequence and structure, these resources lack formal ontological representations of the relationships among the proteins themselves. The Protein Ontology (PRO) Consortium is filling this informatics resource gap by developing ontological representations and relationships among proteins and their variants and modified forms. Because proteins are often functional only as members of stable protein complexes, the PRO Consortium, in collaboration with existing protein and pathway databases, has launched a new initiative to implement logical and consistent representation of protein complexes.
We describe here how the PRO Consortium is meeting the challenge of representing species-specific protein complexes, how protein complex representation in PRO supports annotation of protein complexes and comparative biology, and how PRO is being integrated into existing community bioinformatics resources. The PRO resource is accessible at http://pir.georgetown.edu/pro/.
PRO is a unique database resource for species-specific protein complexes. PRO facilitates robust annotation of variations in composition and function contexts for protein complexes within and between species.
Biomedical ontologies exist to serve integration of clinical and experimental data, and it is critical to their success that they be put to widespread use in the annotation of data. How, then, can ontologies achieve the sort of user-friendliness, reliability, cost-effectiveness, and breadth of coverage that is necessary to ensure extensive usage?
Our focus here is on two different sets of answers to these questions that have been proposed, on the one hand in medicine, by the SNOMED CT community, and on the other hand in biology, by the OBO Foundry. We address more specifically the issue as to how adherence to certain development principles can advance the usability and effectiveness of an ontology or terminology resource, for example by allowing more accurate maintenance, more reliable application, and more efficient interoperation with other ontologies and information resources.
SNOMED CT and the OBO Foundry differ considerably in their general approach. Nevertheless, a general trend towards more formal rigor and cross-domain interoperability can be seen in both and we argue that this trend should be accepted by all similar initiatives in the future.
Future efforts in ontology development have to address the need for harmonization and integration of ontologies across disciplinary borders, and for this, coherent formalization of ontologies is a prerequisite.
Biomedical Ontologies; Ontology Harmonization; Quality Assurance; SNOMED CT
The goal of the OBO (Open Biomedical Ontologies) Foundry initiative is to create and maintain an evolving collection of non-overlapping interoperable ontologies that will offer unambiguous representations of the types of entities in biological and biomedical reality. These ontologies are designed to serve non-redundant annotation of data and scientific text. To achieve these ends, the Foundry imposes strict requirements upon the ontologies eligible for inclusion. While these requirements are not met by most existing biomedical terminologies, the latter may nonetheless support the Foundry’s goal of consistent and non-redundant annotation if appropriate mappings of data annotated with their aid can be achieved. To construct such mappings in reliable fashion, however, it is necessary to analyze terminological resources from an ontologically realistic perspective in such a way as to identify the exact import of the ‘concepts’ and associated terms which they contain. We propose a framework for such analysis that is designed to maximize the degree to which legacy terminologies and the data coded with their aid can be successfully used for information-driven clinical and translational research.
Ontology; terminology; mapping
Since 2002 we have been testing and refining a methodology for ontology development that is now being used by multiple groups of researchers in different life science domains. Gary Merrill, in a recent paper in this journal, describes some of the reasons why this methodology has been found attractive by researchers in the biological and biomedical sciences. At the same time he assails the methodology on philosophical grounds, focusing specifically on our recommendation that ontologies developed for scientific purposes should be constructed in such a way that their terms are seen as referring to what we call universals or types in reality. As we show, Merrill’s critique is of little relevance to the success of our realist project, since it not only reveals no actual errors in our work but also criticizes views on universals that we do not in fact hold. However, it nonetheless provides us with a valuable opportunity to clarify the realist methodology, and to show how some of its principles are being applied, especially within the framework of the OBO (Open Biomedical Ontologies) Foundry initiative.
We describe a novel approach to genetic association analyses with proteins sub-divided into biologically relevant smaller sequence features (SFs), and their variant types (VTs). SFVT analyses are particularly informative for study of highly polymorphic proteins such as the human leukocyte antigen (HLA), given the nature of its genetic variation: the high level of polymorphism, the pattern of amino acid variability, and that most HLA variation occurs at functionally important sites, as well as its known role in organ transplant rejection, autoimmune disease development and response to infection. Further, combinations of variable amino acid sites shared by several HLA alleles (shared epitopes) are most likely better descriptors of the actual causative genetic variants. In a cohort of systemic sclerosis patients/controls, SFVT analysis shows that a combination of SFs implicating specific amino acid residues in peptide binding pockets 4 and 7 of HLA-DRB1 explains much of the molecular determinant of risk.
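The core idea of sequence features and variant types can be illustrated with a short sketch. It is not the authors' implementation: the function names are hypothetical, the sequences are invented toy strings and not real HLA alleles, and a sequence feature is assumed here to be simply a list of 1-based amino acid positions. The sketch groups alleles that share the same residues at a feature's positions; the association test across patients and controls is not shown.

```python
from collections import defaultdict
from typing import Dict, List


def variant_type(sequence: str, positions: List[int]) -> str:
    """Residues of `sequence` at the 1-based `positions` of a sequence feature."""
    return "".join(sequence[p - 1] for p in positions)


def group_alleles_by_variant_type(
    alleles: Dict[str, str], positions: List[int]
) -> Dict[str, List[str]]:
    """Map each variant type of the feature to the alleles that carry it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, seq in alleles.items():
        groups[variant_type(seq, positions)].append(name)
    return dict(groups)


# Toy data for illustration only (NOT real HLA sequences).
toy_alleles = {
    "allele_A": "MKTLLV",
    "allele_B": "MKSLLV",
    "allele_C": "MRTLIV",
    "allele_D": "MKTLIV",
}
toy_feature = [2, 3]  # hypothetical feature made of positions 2 and 3

groups = group_alleles_by_variant_type(toy_alleles, toy_feature)
for vt, names in groups.items():
    print(vt, names)
```

In this toy example, allele_A and allele_D differ elsewhere in the sequence but share one variant type at the feature, which is how several alleles can be pooled into a single category for analysis.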