Related Articles

1.  Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon 
Elucidating disease and developmental dysfunction requires understanding variation in phenotype. Single-species model organism anatomy ontologies (ssAOs) have been established to represent this variation. Multi-species anatomy ontologies (msAOs; vertebrate skeletal, vertebrate homologous, teleost, amphibian AOs) have been developed to represent ‘natural’ phenotypic variation across species. Our aim has been to integrate ssAOs and msAOs for various purposes, including establishing links between phenotypic variation and candidate genes.
Previously, msAOs contained a mixture of unique and overlapping content. This hampered integration and coordination due to the need to maintain cross-references or inter-ontology equivalence axioms to the ssAOs, or to perform large-scale obsolescence and modular import. Here we present the unification of anatomy ontologies into Uberon, a single ontology resource that enables interoperability among disparate data and research groups. As a consequence, independent development of TAO, VSAO, AAO, and vHOG has been discontinued.
The newly broadened Uberon ontology is a unified cross-taxon resource for metazoans (animals) that has been substantially expanded to include a broad diversity of vertebrate anatomical structures, permitting reasoning across anatomical variation in extinct and extant taxa. Uberon is a core resource that supports single- and cross-species queries for candidate genes using annotations for phenotypes from the systematics, biodiversity, medical, and model organism communities, while also providing entities for logical definitions in the Cell and Gene Ontologies.
The ontology release files associated with the ontology merge described in this manuscript are available at:
Current ontology release files are always available at:
PMCID: PMC4089931  PMID: 25009735
Evolutionary biology; Morphological variation; Phenotype; Semantic integration; Bio-ontology
2.  Logical Gene Ontology Annotations (GOAL): exploring gene ontology annotations with OWL 
Journal of Biomedical Semantics  2012;3(Suppl 1):S3.
Ontologies such as the Gene Ontology (GO) and their use in annotations make cross species comparisons of genes possible, along with a wide range of other analytical activities. The bio-ontologies community, in particular the Open Biomedical Ontologies (OBO) community, have provided many other ontologies and an increasingly large volume of annotations of gene products that can be exploited in query and analysis. As many annotations with different ontologies centre upon gene products, there is a possibility to explore gene products through multiple ontological perspectives at the same time. Questions could be asked that link a gene product’s function, process, cellular location, phenotype and disease. Current tools, such as AmiGO, allow exploration of genes based on their GO annotations, but not through multiple ontological perspectives. In addition, the semantics of these ontologies’ representations should, through automated reasoning, afford richer query opportunities of the gene product annotations than is currently possible.
To do this multi-perspective, richer querying of gene product annotations, we have created the Logical Gene Ontology, or GOAL ontology, in OWL that combines the Gene Ontology, Human Disease Ontology and the Mammalian Phenotype Ontology, together with classes that represent the annotations with these ontologies for mouse gene products. Each mouse gene product is represented as a class, with the appropriate relationships to the GO aspects, phenotype and disease with which it has been annotated. We then use defined classes to query these protein classes through automated reasoning, and to build a complex hierarchy of gene products. We have presented this through a Web interface that allows arbitrary queries to be constructed and the results displayed.
This standard use of OWL affords a rich interaction with Gene Ontology, Human Disease Ontology and Mammalian Phenotype Ontology annotations for the mouse, to give a fine partitioning of the gene products in the GOAL ontology. OWL in combination with automated reasoning can be effectively used to query across ontologies to ask biologically rich questions. We have demonstrated that automated reasoning can be used to deliver practical on-line querying support for the ontology annotations available for the mouse.
The GOAL Web page is to be found at
PMCID: PMC3337258  PMID: 22541594
3.  The clinical measurement, measurement method and experimental condition ontologies: expansion, improvements and new applications 
The Clinical Measurement Ontology (CMO), Measurement Method Ontology (MMO), and Experimental Condition Ontology (XCO) were originally developed at the Rat Genome Database (RGD) to standardize quantitative rat phenotype data in order to integrate results from multiple studies into the PhenoMiner database and data mining tool. These ontologies provide the framework for presenting what was measured, how it was measured, and under what conditions it was measured.
There has been a continuing expansion of subdomains in each ontology with a parallel 2–3 fold increase in the total number of terms, substantially increasing the size and improving the scope of the ontologies. The proportion of terms with textual definitions has increased from ~60% to over 80% with greater synchronization of format and content throughout the three ontologies. Representation of definition source Uniform Resource Identifiers (URI) has been standardized, including the removal of all non-URI characters, and systematic versioning of all ontology files has been implemented. The continued expansion and success of these ontologies has facilitated the integration of more than 60,000 records into the RGD PhenoMiner database. In addition, new applications of these ontologies, such as annotation of Quantitative Trait Loci (QTL), have been added at the sites actively using them, including RGD and the Animal QTL Database.
The improvements to these three ontologies have been substantial, and development is ongoing. New terms and expansions to the ontologies continue to be added as a result of active curation efforts at RGD and the Animal QTL database. Use of these vocabularies to standardize data representation for quantitative phenotypes and quantitative trait loci across databases for multiple species has demonstrated their utility for integrating diverse data types from multiple sources. These ontologies are freely available for download and use from the NCBO BioPortal website at (CMO), (MMO), and (XCO), or from the RGD ftp site at
PMCID: PMC3882879  PMID: 24103152
4.  The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments 
The Gene Ontology (GO) contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience.
Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases.
In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging in of the SAO, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community.
PMCID: PMC3852282  PMID: 24093723
Gene ontology; Cellular component ontology; Subcellular anatomy ontology; Neuroscience; Annotation; Ontology language; Ontology integration; Neuroscience information framework
5.  Multifunctional crop trait ontology for breeders' data: field book, annotation, data discovery and semantic enrichment of the literature 
AoB Plants  2010;2010:plq008.
The ‘Crop Ontology’ database we describe provides a controlled vocabulary for several economically important crops. It facilitates data integration and discovery from global databases and digital literature. This allows researchers to exploit comparative phenotypic and genotypic information of crops to elucidate functional aspects of traits.
Background and aims
Agricultural crop databases maintained in gene banks of the Consultative Group on International Agricultural Research (CGIAR) are valuable sources of information for breeders. These databases provide comparative phenotypic and genotypic information that can help elucidate functional aspects of plant and agricultural biology. To facilitate data sharing within and between these databases and the retrieval of information, the crop ontology (CO) database was designed to provide controlled vocabulary sets for several economically important plant species.
Existing public ontologies and equivalent catalogues of concepts covering the range of crop science information and descriptors for crops and crop-related traits were collected from breeders, physiologists, agronomists, and researchers in the CGIAR consortium. For each crop, relationships between terms were identified and crop-specific trait ontologies were constructed following the Open Biomedical Ontologies (OBO) format standard using the OBO-Edit tool. All terms within an ontology were assigned a globally unique CO term identifier.
Principal results
The CO currently comprises crop-specific traits for chickpea (Cicer arietinum), maize (Zea mays), potato (Solanum tuberosum), rice (Oryza sativa), sorghum (Sorghum spp.) and wheat (Triticum spp.). Several plant-structure and anatomy-related terms for banana (Musa spp.), wheat and maize are also included. In addition, multi-crop passport terms are included as controlled vocabularies for sharing information on germplasm. Two web-based online resources were built to make these COs available to the scientific community: the ‘CO Lookup Service’ for browsing the CO; and the ‘Crops Terminizer’, an ontology text mark-up tool.
The controlled vocabularies of the CO are being used to curate several CGIAR centres' agronomic databases. The use of ontology terms to describe agronomic phenotypes and the accurate mapping of these descriptions into databases will be important steps in comparative phenotypic and genotypic studies across species and gene-discovery experiments.
PMCID: PMC3000699  PMID: 22476066
6.  Evolutionary Characters, Phenotypes and Ontologies: Curating Data from the Systematic Biology Literature 
PLoS ONE  2010;5(5):e10708.
The wealth of phenotypic descriptions documented in the published articles, monographs, and dissertations of phylogenetic systematics is traditionally reported in a free-text format, and it is therefore largely inaccessible for linkage to biological databases for genetics, development, and phenotypes, and difficult to manage for large-scale integrative work. The Phenoscape project aims to represent these complex and detailed descriptions with rich and formal semantics that are amenable to computation and integration with phenotype data from other fields of biology. This entails reconceptualizing the traditional free-text characters into the computable Entity-Quality (EQ) formalism using ontologies.
Methodology/Principal Findings
We used ontologies and the EQ formalism to curate a collection of 47 phylogenetic studies on ostariophysan fishes (including catfishes, characins, minnows, knifefishes) and their relatives with the goal of integrating these complex phenotype descriptions with information from an existing model organism database (zebrafish). We developed a curation workflow for the collection of character, taxonomic and specimen data from these publications. A total of 4,617 phenotypic characters (10,512 states) for 3,449 taxa, primarily species, were curated into EQ formalism (for a total of 12,861 EQ statements) using anatomical and taxonomic terms from teleost-specific ontologies (Teleost Anatomy Ontology and Teleost Taxonomy Ontology) in combination with terms from a quality ontology (Phenotype and Trait Ontology). Standards and guidelines for consistently and accurately representing phenotypes were developed in response to the challenges that were evident from two annotation experiments and from feedback from curators.
The challenges we encountered and many of the curation standards and methods for improving consistency that we developed are generally applicable to any effort to represent phenotypes using ontologies. This is because an ontological representation of the detailed variations in phenotype, whether between mutant or wildtype, among individual humans, or across the diversity of species, requires a process by which a precise combination of terms from domain ontologies are selected and organized according to logical relations. The efficiencies that we have developed in this process will be useful for any attempt to annotate complex phenotypic descriptions using ontologies. We also discuss some ramifications of EQ representation for the domain of systematics.
PMCID: PMC2873956  PMID: 20505755
7.  Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation 
PLoS Biology  2009;7(11):e1000247.
A novel method for quantifying the similarity between phenotypes by the use of ontologies can be used to search for candidate genes, pathway members, and human disease models on the basis of phenotypes alone.
Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. 
This annotation and search method provides a novel and efficient means to identify gene candidates and animal models of human disease, which may shorten the lengthy path to identification and understanding of the genetic basis of human disease.
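As a rough illustration of how EQ statements can be compared through the term hierarchy, the following minimal sketch scores two EQ annotations by the Jaccard overlap of their subsuming (entity, quality) pairs. The toy hierarchy, the term names, and the choice of Jaccard are assumptions made for illustration only; the abstract does not specify which four similarity metrics were used, and this sketch ignores annotation frequency.

```python
from dataclasses import dataclass

# Hypothetical toy is_a hierarchy (child -> parents); terms are illustrative,
# not taken from any real ontology release.
TOY_IS_A = {
    "pectoral fin": {"paired fin"},
    "pelvic fin": {"paired fin"},
    "paired fin": {"fin"},
    "fin": {"anatomical structure"},
    "eye": {"anatomical structure"},
    "decreased size": {"size"},
    "increased size": {"size"},
    "size": {"quality"},
    "absent": {"quality"},
}


@dataclass(frozen=True)
class EQ:
    """One Entity-Quality statement: the affected entity and how it is affected."""
    entity: str
    quality: str


def ancestors(term, is_a=TOY_IS_A):
    """Return the term together with all of its is_a ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumers(eq):
    """All (entity ancestor, quality ancestor) pairs that subsume an EQ statement."""
    return {(e, q) for e in ancestors(eq.entity) for q in ancestors(eq.quality)}


def jaccard_similarity(a, b):
    """Shared subsumers divided by all subsumers of either statement."""
    sa, sb = subsumers(a), subsumers(b)
    return len(sa & sb) / len(sa | sb)


small_pectoral = EQ("pectoral fin", "decreased size")
small_pelvic = EQ("pelvic fin", "decreased size")
missing_eye = EQ("eye", "absent")

print(jaccard_similarity(small_pectoral, small_pelvic))
print(jaccard_similarity(small_pectoral, missing_eye))
```

In this toy setting the two fin phenotypes score higher than the fin and eye pair because they share more specific subsumers; a frequency-aware measure would additionally weight rare subsumers more heavily.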
Author Summary
Model organisms such as fruit flies, mice, and zebrafish are useful for investigating gene function because they are easy to grow, dissect, and genetically manipulate in the laboratory. By examining mutations in these organisms, one can identify candidate genes that cause disease in humans, and develop models to better understand human disease and gene function. A fundamental roadblock for analysis is, however, the lack of a computational method for describing and comparing phenotypes of mutant animals and of human diseases when the genetic basis is unknown. We describe here a novel method using ontologies to record and quantify the similarity between phenotypes. We tested our method by using the annotated mutant phenotype of one member of the Hedgehog signaling pathway in zebrafish to identify other pathway members with similar recorded phenotypes. We also compared human disease phenotypes to those produced by mutation in model organisms, and show that orthologous and biologically relevant genes can be identified by this method. Given that the genetic basis of human disease is often unknown, this method provides a means for identifying candidate genes, pathway members, and disease models by computationally identifying similar phenotypes within and across species.
PMCID: PMC2774506  PMID: 19956802
8.  Semantic integration of physiology phenotypes with an application to the Cellular Phenotype Ontology 
Bioinformatics  2012;28(13):1783-1789.
Motivation: The systematic observation of phenotypes has become a crucial tool of functional genomics, and several large international projects are currently underway to identify and characterize the phenotypes that are associated with genotypes in several species. To integrate phenotype descriptions within and across species, phenotype ontologies have been developed. Applying ontologies to unify phenotype descriptions in the domain of physiology has been a particular challenge due to the high complexity of the underlying domain.
Results: In this study, we present the outline of a theory and its implementation for an ontology of physiology-related phenotypes. We provide a formal description of process attributes and relate them to the attributes of their temporal parts and participants. We apply our theory to create the Cellular Phenotype Ontology (CPO). The CPO is an ontology of morphological and physiological phenotypic characteristics of cells, cell components and cellular processes. Its prime application is to provide terms and uniform definition patterns for the annotation of cellular phenotypes. The CPO can be used for the annotation of observed abnormalities in domains, such as systems microscopy, in which cellular abnormalities are observed and for which no phenotype ontology has been created.
Availability and implementation: The CPO and the source code we generated to create the CPO are freely available on
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3381966  PMID: 22539675
9.  The Teleost Anatomy Ontology: Anatomical Representation for the Genomics Age 
Systematic Biology  2010;59(4):369-383.
The rich knowledge of morphological variation among organisms reported in the systematic literature has remained in free-text format, impractical for use in large-scale synthetic phylogenetic work. This noncomputable format has also precluded linkage to the large knowledgebase of genomic, genetic, developmental, and phenotype data in model organism databases. We have undertaken an effort to prototype a curated, ontology-based evolutionary morphology database that maps to these genetic databases to facilitate investigation into the mechanistic basis and evolution of phenotypic diversity. Among the first requirements in establishing this database was the development of a multispecies anatomy ontology with the goal of capturing anatomical data in a systematic and computable manner. An ontology is a formal representation of a set of concepts with defined relationships between those concepts. Multispecies anatomy ontologies in particular are an efficient way to represent the diversity of morphological structures in a clade of organisms, but they present challenges in their development relative to single-species anatomy ontologies. Here, we describe the Teleost Anatomy Ontology (TAO), a multispecies anatomy ontology for teleost fishes derived from the Zebrafish Anatomical Ontology (ZFA) for the purpose of annotating varying morphological features across species. To facilitate interoperability with other anatomy ontologies, TAO uses the Common Anatomy Reference Ontology as a template for its upper level nodes, and TAO and ZFA are synchronized, with zebrafish terms specified as subtypes of teleost terms. We found that the details of ontology architecture have ramifications for querying, and we present general challenges in developing a multispecies anatomy ontology, including refinement of definitions, taxon-specific relationships among terms, and representation of taxonomically variable developmental pathways.
PMCID: PMC2885267  PMID: 20547776
Bioinformatics; devo-evo; fish; morphology; ontology; Teleostei
10.  The mouse pathology ontology, MPATH; structure and applications 
The capture and use of disease-related anatomic pathology data for both model organism phenotyping and human clinical practice requires a relatively simple nomenclature and coding system that can be integrated into data collection platforms (such as computerized medical record-keeping systems) to enable the pathologist to rapidly screen and accurately record observations. The MPATH ontology was originally constructed in 2000 by a committee of pathologists for the annotation of rodent histopathology images, but is now widely used for coding and analysis of disease and phenotype data for rodents, humans and zebrafish.
Construction and content
MPATH is divided into two main branches describing pathological processes and structures based on traditional histopathological principles. It does not aim to include definitive diagnoses, which would generally be regarded as disease concepts. It contains 888 core pathology terms in an almost exclusively is_a hierarchy nine layers deep. Currently, 86% of the terms have textual definitions and contain relationships as well as logical axioms to other ontologies such as the Gene Ontology.
Application and utility
MPATH was originally devised for the annotation of histopathological images from mice but is now being used much more widely in the recording of diagnostic and phenotypic data from both mice and humans, and in the construction of logical definitions for phenotype and disease ontologies. We discuss the use of MPATH to generate cross-products with qualifiers derived from a subset of the Phenotype and Trait Ontology (PATO) and its application to large-scale high-throughput phenotyping studies. MPATH provides a largely species-agnostic ontology for the descriptions of anatomic pathology, which can be applied to most amniotes and is now finding extensive use in species other than mice. It enables investigators to interrogate large datasets at a variety of depths, use semantic analysis to identify the relations between diseases in different species and integrate pathology data with other data types, such as pharmacogenomics.
PMCID: PMC3851164  PMID: 24033988
Pathology; Ontology; Disease; Mouse; Phenotype
11.  A knowledge based approach to matching human neurodegenerative disease and animal models 
Neurodegenerative diseases present a wide and complex range of biological and clinical features. Animal models are key to translational research, yet typically only exhibit a subset of disease features rather than being precise replicas of the disease. Consequently, connecting animal to human conditions using direct data-mining strategies has proven challenging, particularly for diseases of the nervous system, with its complicated anatomy and physiology. To address this challenge we have explored the use of ontologies to create formal descriptions of structural phenotypes across scales that are machine processable and amenable to logical inference. As proof of concept, we built a Neurodegenerative Disease Phenotype Ontology (NDPO) and an associated Phenotype Knowledge Base (PKB) using an entity-quality model that incorporates descriptions for both human disease phenotypes and those of animal models. Entities are drawn from community ontologies made available through the Neuroscience Information Framework (NIF) and qualities are drawn from the Phenotype and Trait Ontology (PATO). We generated ~1200 structured phenotype statements describing structural alterations at the subcellular, cellular and gross anatomical levels observed in 11 human neurodegenerative conditions and associated animal models. PhenoSim, an open source tool for comparing phenotypes, was used to issue a series of competency questions to compare individual phenotypes among organisms and to determine which animal models recapitulate phenotypic aspects of the human disease in aggregate. Overall, the system was able to use relationships within the ontology to bridge phenotypes across scales, returning non-trivial matches based on common subsumers that were meaningful to a neuroscientist with an advanced knowledge of neuroanatomy. The system can be used both to compare individual phenotypes and also phenotypes in aggregate. 
This proof of concept suggests that expressing complex phenotypes using formal ontologies provides considerable benefit for comparing phenotypes across scales and species.
PMCID: PMC3653101  PMID: 23717278
phenotype; ontology; Neuroscience Information Framework; neurodegenerative disease; semantics
12.  A Gross Anatomy Ontology for Hymenoptera 
PLoS ONE  2010;5(12):e15991.
Hymenoptera is an extraordinarily diverse lineage, both in terms of species numbers and morphotypes, that includes sawflies, bees, wasps, and ants. These organisms serve critical roles as herbivores, predators, parasitoids, and pollinators, with several species functioning as models for agricultural, behavioral, and genomic research. The collective anatomical knowledge of these insects, however, has been described or referred to by labels derived from numerous, partially overlapping lexicons. The resulting corpus of information—millions of statements about hymenopteran phenotypes—remains inaccessible due to language discrepancies. The Hymenoptera Anatomy Ontology (HAO) was developed to surmount this challenge and to aid future communication related to hymenopteran anatomy. The HAO was built using newly developed interfaces within mx, a Web-based, open source software package, that enables collaborators to simultaneously contribute to an ontology. Over twenty people contributed to the development of this ontology by adding terms, genus differentia, references, images, relationships, and annotations. The database interface returns an Open Biomedical Ontology (OBO) formatted version of the ontology and includes mechanisms for extracting candidate data and for publishing a searchable ontology to the Web. The application tools are subject-agnostic and may be used by others initiating and developing ontologies. The present core HAO data constitute 2,111 concepts, 6,977 terms (labels for concepts), 3,152 relations, 4,361 sensus (links between terms, concepts, and references) and over 6,000 text and graphical annotations. The HAO is rooted with the Common Anatomy Reference Ontology (CARO), in order to facilitate interoperability with and future alignment to other anatomy ontologies, and is available through the OBO Foundry ontology repository and BioPortal. 
The HAO provides a foundation through which connections between genomic, evolutionary developmental biology, phylogenetic, taxonomic, and morphological research can be actualized. Inherent mechanisms for feedback and content delivery demonstrate the effectiveness of remote, collaborative ontology development and facilitate future refinement of the HAO.
PMCID: PMC3012123  PMID: 21209921
13.  Improving ontologies by automatic reasoning and evaluation of logical definitions 
BMC Bioinformatics  2011;12:418.
Ontologies are widely used to represent knowledge in biomedicine. Systematic approaches for detecting errors and disagreements are needed for large ontologies with hundreds or thousands of terms and semantic relationships. A recent approach of defining terms using logical definitions is now increasingly being adopted as a method for quality control as well as for facilitating interoperability and data integration.
We show how automated reasoning over logical definitions of ontology terms can be used to improve ontology structure. We provide the Java software package GULO (Getting an Understanding of LOgical definitions), which allows fast and easy evaluation for any kind of logically decomposed ontology by generating a composite OWL ontology from appropriate subsets of the referenced ontologies and comparing the inferred relationships with the relationships asserted in the target ontology. As a case study we show how to use GULO to evaluate the logical definitions that have been developed for the Mammalian Phenotype Ontology (MPO).
Logical definitions of terms from biomedical ontologies represent an important resource for error and disagreement detection. GULO gives ontology curators a fast and simple tool for validation of their work.
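The comparison step described above, checking relationships inferred by a reasoner against those asserted in the target ontology, can be sketched as a set difference over subclass pairs. This is not GULO's actual interface (GULO is a Java package); the function names and example terms below are hypothetical, and the inferred edges are assumed to have been produced elsewhere by an OWL reasoner.

```python
def transitive_closure(edges):
    """Expand direct (child, parent) pairs to all (descendant, ancestor) pairs."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)
    closure = set()
    for start in parents:
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure


def compare_relationships(asserted_edges, inferred_edges):
    """Report subclass links found by only one of the two sources."""
    asserted = transitive_closure(asserted_edges)
    inferred = transitive_closure(inferred_edges)
    return {
        # Candidate missing links: the reasoner derives them, the curators did not assert them.
        "inferred_only": inferred - asserted,
        # Candidate disagreements: asserted by curators, not supported by the logical definitions.
        "asserted_only": asserted - inferred,
    }


# Hypothetical phenotype terms, for illustration only.
asserted = {
    ("abnormal fin size", "abnormal fin morphology"),
    ("abnormal fin morphology", "abnormal morphology"),
    ("small fin", "abnormal fin morphology"),
}
inferred = {
    ("abnormal fin size", "abnormal fin morphology"),
    ("abnormal fin morphology", "abnormal morphology"),
    ("small fin", "abnormal fin size"),
}

report = compare_relationships(asserted, inferred)
print(report["inferred_only"])
print(report["asserted_only"])
```

Comparing transitive closures rather than direct edges avoids flagging links that are merely redundant, so only genuinely new or unsupported relationships are reported to the curator.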
PMCID: PMC3224779  PMID: 22032770
14.  Semantics in Support of Biodiversity Knowledge Discovery: An Introduction to the Biological Collections Ontology and Related Ontologies 
PLoS ONE  2014;9(3):e89606.
The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.
PMCID: PMC3940615  PMID: 24595056
15.  The Vertebrate Trait Ontology: a controlled vocabulary for the annotation of trait data across species 
The use of ontologies to standardize biological data and facilitate comparisons among datasets has steadily grown as the complexity and amount of available data have increased. Despite the numerous ontologies available, one area currently lacking a robust ontology is the description of vertebrate traits. A trait is defined as any measurable or observable characteristic pertaining to an organism or any of its substructures. While there are several ontologies to describe entities and processes in phenotypes, diseases, and clinical measurements, one has not been developed for vertebrate traits; the Vertebrate Trait Ontology (VT) was created to fill this void.
Significant inconsistencies in trait nomenclature exist in the literature, and additional difficulties arise when trait data are compared across species. The VT is a unified trait vocabulary created to aid in the transfer of data within and between species and to facilitate investigation of the genetic basis of traits. Trait information provides a valuable link between the measurements that are used to assess the trait, the phenotypes related to the traits, and the diseases associated with one or more phenotypes. Because multiple clinical and morphological measurements are often used to assess a single trait, and a single measurement can be used to assess multiple physiological processes, providing investigators with standardized annotations for trait data will allow them to investigate connections among these data types.
The annotation of genomic data with ontology terms provides unique opportunities for data mining and analysis. Links between data in disparate databases can be identified and explored, a strategy that is particularly useful for cross-species comparisons or in situations involving inconsistent terminology. The VT provides a common basis for the description of traits in multiple vertebrate species. It is being used in the Rat Genome Database and Animal QTL Database for annotation of QTL data for rat, cattle, chicken, swine, sheep, and rainbow trout, and in the Mouse Phenome Database to annotate strain characterization data. In these databases, data are also cross-referenced to applicable terms from other ontologies, providing additional avenues for data mining and analysis. The ontology is available at
PMCID: PMC3851175  PMID: 23937709
Quantitative trait loci; Gene association; Trait ontology
16.  Ontological Discovery Environment: A system for integrating gene-phenotype associations 
Genomics  2009;94(6):377-387.
The wealth of genomic technologies has enabled biologists to rapidly ascribe phenotypic characters to biological substrates. Central to effective biological investigation is the operational definition of the process under investigation. We propose an elucidation of categories of biological characters, including disease relevant traits, based on natural endogenous processes and experimentally observed biological networks, pathways and systems rather than on externally manifested constructs and current semantics such as disease names and processes. The Ontological Discovery Environment (ODE) is an Internet accessible resource for the storage, sharing, retrieval and analysis of phenotype-centered genomic data sets across species and experimental model systems. Any type of data set representing gene-phenotype relationships, such as quantitative trait loci (QTL) positional candidates, literature reviews, microarray experiments, ontological or even meta-data, may serve as inputs. To demonstrate a use case leveraging the homology capabilities of ODE and its ability to synthesize diverse data sets, we conducted an analysis of genomic studies related to alcoholism. The core of ODE’s gene-set similarity, distance and hierarchical analysis is the creation of a bipartite network of gene-phenotype relations, a unique discrete graph approach to analysis that enables set-set matching of non-referential data. Gene sets are annotated with several levels of metadata, including community ontologies, while gene set translations compare models across species. Computationally derived gene sets are integrated into hierarchical trees based on gene-derived phenotype interdependencies. Automated set identifications are augmented by statistical tools which enable users to interpret the confidence of modeled results.
This approach allows data integration and hypothesis discovery across multiple experimental contexts, regardless of the face similarity and semantic annotation of the experimental systems or species domain.
PMCID: PMC2783409  PMID: 19733230
homology; combinatorial algorithms; microarray; ontology
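As a rough illustration of the set-set matching idea in the ODE abstract above, the sketch below stores gene-phenotype relations as a bipartite mapping (gene set name to member genes) and scores each pair of gene sets by their overlap. The Jaccard index is used here only as a familiar stand-in; the abstract does not say which similarity measure ODE uses. The gene set names and gene identifiers are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(gene_sets):
    """Score every unordered pair of gene sets, keyed by sorted set names."""
    return {
        (x, y): jaccard(gene_sets[x], gene_sets[y])
        for x, y in combinations(sorted(gene_sets), 2)
    }


def genes_to_sets(gene_sets):
    """Invert the mapping: the other side of the bipartite graph."""
    inverted = {}
    for name, genes in gene_sets.items():
        for gene in genes:
            inverted.setdefault(gene, set()).add(name)
    return inverted


# Hypothetical gene sets; identifiers are placeholders, not real genes.
GENE_SETS = {
    "qtl_candidates": {"geneA", "geneB", "geneC"},
    "microarray_hits": {"geneB", "geneC", "geneD"},
    "literature_set": {"geneE"},
}

if __name__ == "__main__":
    for pair, score in pairwise_similarity(GENE_SETS).items():
        print(pair, round(score, 2))
```

Keeping both directions of the mapping makes it cheap to ask either "which genes do two sets share?" or "which sets mention this gene?", which is the practical appeal of a bipartite representation.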
17.  Ontology for Vector Surveillance and Management 
Journal of medical entomology  2013;50(1):1-14.
Ontologies, which are made up of standardized and defined controlled vocabulary terms and their interrelationships, are comprehensive and readily searchable repositories for knowledge in a given domain. The Open Biomedical Ontologies (OBO) Foundry was initiated in 2001 with the aims of becoming an “umbrella” for life-science ontologies and promoting the use of ontology development best practices. A software application (OBO-Edit; *.obo file format) was developed to facilitate ontology development and editing. The OBO Foundry now comprises over 100 ontologies and candidate ontologies, including the NCBI organismal classification ontology (NCBITaxon), the Mosquito Insecticide Resistance Ontology (MIRO), the Infectious Disease Ontology (IDO), the IDOMAL malaria ontology, and ontologies for mosquito gross anatomy and tick gross anatomy. We previously developed a disease data management system for dengue and malaria control programs, which incorporated a set of information trees built upon ontological principles, including a “term tree” to promote the use of standardized terms. In the course of doing so, we realized that there were substantial gaps in existing ontologies with regard to concepts, processes, and, especially, physical entities (e.g., vector species, pathogen species, and vector surveillance and management equipment) in the domain of surveillance and management of vectors and vector-borne pathogens. We therefore produced an ontology for vector surveillance and management, focusing on arthropod vectors and vector-borne pathogens with relevance to humans or domestic animals, and with special emphasis on content to support operational activities through inclusion in databases, data management systems, or decision support systems. The Vector Surveillance and Management Ontology (VSMO) includes >2,200 unique terms, of which the vast majority (>80%) were newly generated during the development of this ontology.
One core feature of the VSMO is the linkage, through the has_vector relation, of arthropod species to the pathogenic microorganisms for which they serve as biological vectors. We also recognized and addressed a potential roadblock for use of the VSMO by the vector-borne disease community: the difficulty in extracting information from OBO-Edit ontology files (*.obo files) and exporting the information to other file formats. A novel ontology explorer tool was developed to facilitate extraction and export of information from the VSMO *.obo file into lists of terms and their associated unique IDs in *.txt or *.csv file formats. These lists can then be imported into a database or data management system for use as select lists with predefined terms. This is an important step to ensure that the knowledge contained in our ontology can be put into practical use.
PMCID: PMC3695545  PMID: 23427646
ontology; vector; pathogen; surveillance; management
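The export step described in the VSMO abstract above (turning an *.obo file into a list of terms and their unique IDs in *.csv form) can be sketched roughly as follows. This is not the authors' ontology explorer tool, only a minimal illustration that assumes the common OBO flat-file layout of `[Term]` stanzas with `id:` and `name:` tag lines. The sample stanzas and the `EX:` identifiers are invented for the example, and details such as trailing `!` comments or escaped characters are not handled.

```python
import csv
import io


def parse_obo_terms(obo_text):
    """Return (id, name) pairs for each [Term] stanza in OBO-format text."""
    terms = []
    current = None  # tags of the stanza being read; None outside a [Term]
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza header closes the previous [Term], if any.
            if current and "id" in current:
                terms.append((current["id"], current.get("name", "")))
            current = {} if line == "[Term]" else None
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag in ("id", "name") and tag not in current:
                current[tag] = value
    if current and "id" in current:
        terms.append((current["id"], current.get("name", "")))
    return terms


def terms_to_csv(terms):
    """Serialize (id, name) pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "name"])
    writer.writerows(terms)
    return buf.getvalue()


# Invented sample; real ontology files carry many more tags per stanza.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: EX:0000001
name: example vector species

[Term]
id: EX:0000002
name: example trap type

[Typedef]
id: has_vector
name: has_vector
"""

if __name__ == "__main__":
    print(terms_to_csv(parse_obo_terms(SAMPLE_OBO)), end="")
```

The resulting rows could then be loaded into a database table and used as a select list of predefined terms, which is the use the abstract describes.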
18.  Enhanced XAO: the ontology of Xenopus anatomy and development underpins more accurate annotation of gene expression and queries on Xenbase 
The African clawed frogs Xenopus laevis and Xenopus tropicalis are prominent animal model organisms. Xenopus research contributes to the understanding of genetic, developmental and molecular mechanisms underlying human disease. The Xenopus Anatomy Ontology (XAO) reflects the anatomy and embryological development of Xenopus. The XAO provides consistent terminology that can be applied to anatomical feature descriptions along with a set of relationships that indicate how each anatomical entity is related to others in the embryo, tadpole, or adult frog. The XAO is integral to the functionality of Xenbase, the Xenopus model organism database.
We significantly expanded the XAO in the last five years by adding 612 anatomical terms, 2934 relationships between them, 640 synonyms, and 547 ontology cross-references. Each term now has a definition, so database users and curators can be certain they are selecting the correct term when specifying an anatomical entity. With developmental timing information now asserted for every anatomical term, the ontology provides internal checks that ensure high-quality gene expression and phenotype data annotation. The XAO, now with 1313 defined anatomical and developmental stage terms, has been integrated with Xenbase expression and anatomy term searches and it enables links between various data types including images, clones, and publications. Improvements to the XAO structure and anatomical definitions have also enhanced cross-references to anatomy ontologies of other model organisms and humans, providing a bridge between Xenopus data and other vertebrates. The ontology is free and open to all users.
The expanded and improved XAO allows enhanced capture of Xenopus research data and aids mechanisms for performing complex retrieval and analysis of gene expression, phenotypes, and antibodies through text-matching and manual curation. Its comprehensive references to ontologies across taxa help integrate these data for human disease modeling.
PMCID: PMC3816597  PMID: 24139024
Anatomy; Bioinformatics; Data annotation; Developmental biology; Embryology; Model organism database; Ontology; Xenopus
19.  Benchmarking Ontologies: Bigger or Better? 
PLoS Computational Biology  2011;7(1):e1001055.
A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them.
Author Summary
An ontology represents the concepts and their interrelation within a knowledge domain. Several ontologies have been developed in biomedicine, which provide standardized vocabularies to describe diseases, genes and gene products, physiological phenotypes, anatomical structures, and many other phenomena. Scientists use them to encode the results of complex experiments and observations and to perform integrative analysis to discover new knowledge. A remaining challenge in ontology development is how to evaluate an ontology's representation of knowledge within its scientific domain. Building on classic measures from information retrieval, we introduce a family of metrics including breadth and depth that capture the conceptual coverage and parsimony of an ontology. We test these measures using (1) four commonly used medical ontologies in relation to a corpus of medical documents and (2) seven popular English thesauri (ontologies of synonyms) with respect to text from medicine, news, and novels. Results demonstrate that both medical ontologies and English thesauri have a small overlap in concepts and relations. Our methods suggest efforts to tighten the fit between ontologies and biomedical knowledge.
PMCID: PMC3020923  PMID: 21249231
20.  Nose to tail, roots to shoots: spatial descriptors for phenotypic diversity in the Biological Spatial Ontology 
Spatial terminology is used in anatomy to indicate precise, relative positions of structures in an organism. While these terms are often standardized within specific fields of biology, they can differ dramatically across taxa. Such differences in usage can impair our ability to unambiguously refer to anatomical position when comparing anatomy or phenotypes across species. We developed the Biological Spatial Ontology (BSPO) to standardize the description of spatial and topological relationships across taxa to enable the discovery of comparable phenotypes.
BSPO currently contains 146 classes and 58 relations representing anatomical axes, gradients, regions, planes, sides, and surfaces. These concepts can be used at multiple biological scales and in a diversity of taxa, including plants, animals and fungi. The BSPO is used to provide a source of anatomical location descriptors for logically defining anatomical entity classes in anatomy ontologies. Spatial reasoning is further enhanced in anatomy ontologies by integrating spatial relations such as dorsal_to into class descriptions (e.g., ‘dorsolateral placode’ dorsal_to some ‘epibranchial placode’).
The BSPO is currently used by projects that require standardized anatomical descriptors for phenotype annotation and ontology integration across a diversity of taxa. Anatomical location classes are also useful for describing phenotypic differences, such as morphological variation in position of structures resulting from evolution within and across species.
PMCID: PMC4137724  PMID: 25140222
Anatomy; Spatial relationships; Position; Axes; Reasoning; BSPO; Ontology; Phenotype
21.  Modular Ontology Techniques and their Applications in the Biomedical Domain 
In the past several years, various ontologies and terminologies such as the Gene Ontology have been developed to enable interoperability across multiple diverse medical information systems. They provide a standard way of representing terms and concepts thereby supporting easy transmission and interpretation of data for various applications. However, with their growing utilization, not only has the number of available ontologies increased considerably, but they are also becoming larger and more complex to manage. Toward this end, a growing body of work is emerging in the area of modular ontologies where the emphasis is on either extracting and managing “modules” of an ontology relevant to a particular application scenario (ontology decomposition) or developing them independently and integrating into a larger ontology (ontology composition). In this paper, we investigate state-of-the-art approaches in modular ontologies focusing on techniques that are based on rigorous logical formalisms as well as well-studied graph theories. We analyze and compare how such approaches can be leveraged in developing tools and applications in the biomedical domain. We conclude by highlighting some of the limitations of the modular ontology formalisms and put forward additional requirements to steer their future development.
PMCID: PMC3101570  PMID: 21625409
22.  Survey of modular ontology techniques and their applications in the biomedical domain 
In the past several years, various ontologies and terminologies such as the Gene Ontology have been developed to enable interoperability across multiple diverse medical information systems. They provide a standard way of representing terms and concepts thereby supporting easy transmission and interpretation of data for various applications. However, with their growing utilization, not only has the number of available ontologies increased considerably, but they are also becoming larger and more complex to manage. Toward this end, a growing body of work is emerging in the area of modular ontologies where the emphasis is on either extracting and managing “modules” of an ontology relevant to a particular application scenario (ontology decomposition) or developing them independently and integrating into a larger ontology (ontology composition). In this paper, we investigate state-of-the-art approaches in modular ontologies focusing on techniques that are based on rigorous logical formalisms as well as well-studied graph theories. We analyze and compare how such approaches can be leveraged in developing tools and applications in the biomedical domain. We conclude by highlighting some of the limitations of the modular ontology formalisms and put forward additional requirements to steer their future development.
PMCID: PMC3113511  PMID: 21686030
23.  A Prototype Symbolic Model of Canonical Functional Neuroanatomy of the Motor System 
Journal of biomedical informatics  2007;41(2):251-263.
Recent advances in bioinformatics have opened entirely new avenues for organizing, integrating and retrieving neuroscientific data in a digital, machine-processable format that can also be understood by humans, using ontological, symbolic data representations. Declarative information stored in ontological format can be perused and maintained by domain experts, interpreted by machines, and serve as a basis for a multitude of decision-support, computerized simulation, data mining, and teaching applications.
We have developed a prototype symbolic model of canonical neuroanatomy of the motor system. Our symbolic model is intended to support symbolic lookup, logical inference and mathematical modeling by integrating descriptive, qualitative and quantitative functional neuroanatomical knowledge. Furthermore, we show how our approach can be extended to modeling impaired brain connectivity in disease states, such as common movement disorders.
In developing our ontology, we adopted a disciplined modeling approach, relying on a set of declared principles, a high-level schema, Aristotelian definitions, and a frame-based authoring system. These features, along with the use of the Unified Medical Language System (UMLS) vocabulary, enable the alignment of our functional ontology with an existing comprehensive ontology of human anatomy, and thus allow for combining the structural and functional views of neuroanatomy for clinical decision support and neuroanatomy teaching applications.
Although the scope of our current prototype ontology is limited to a particular functional system in the brain, it may be possible to adapt this approach for modeling other brain functional systems as well.
PMCID: PMC2376098  PMID: 18164666
functional neuroanatomy; ontology; neural network; motor system; basal ganglia; disease model; Parkinson’s disease; Chorea; Hemiballism
24.  The Plant Ontology as a Tool for Comparative Plant Anatomy and Genomic Analyses 
Plant and Cell Physiology  2012;54(2):e1.
The Plant Ontology (PO) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary (‘ontology’) of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.
PMCID: PMC3583023  PMID: 23220694
Bioinformatics; Comparative genomics; Genome annotation; Ontology; Plant anatomy; Terpene synthase
25.  The neurological disease ontology 
We are developing the Neurological Disease Ontology (ND) to provide a framework to enable representation of aspects of neurological diseases that are relevant to their treatment and study. ND is a representational tool that addresses the need for unambiguous annotation, storage, and retrieval of data associated with the treatment and study of neurological diseases. ND is being developed in compliance with the Open Biomedical Ontology Foundry principles and builds upon the paradigm established by the Ontology for General Medical Science (OGMS) for the representation of entities in the domain of disease and medical practice. Initial applications of ND will include the annotation and analysis of large data sets and patient records for Alzheimer’s disease, multiple sclerosis, and stroke.
ND is implemented in OWL 2 and currently has more than 450 terms that refer to and describe various aspects of neurological diseases. ND directly imports the development version of OGMS, which uses BFO 2. Term development in ND has primarily extended the OGMS terms ‘disease’, ‘diagnosis’, ‘disease course’, and ‘disorder’. We have imported and utilize over 700 classes from related ontology efforts including the Foundational Model of Anatomy, Ontology for Biomedical Investigations, and Protein Ontology. ND terms are annotated with ontology metadata such as a label (term name), term editors, textual definition, definition source, curation status, and alternative terms (synonyms). Many terms have logical definitions in addition to these annotations. Current development has focused on the establishment of the upper-level structure of the ND hierarchy, as well as on the representation of Alzheimer’s disease, multiple sclerosis, and stroke. The ontology is available as a version-controlled file at along with a discussion list and an issue tracker.
ND seeks to provide a formal foundation for the representation of clinical and research data pertaining to neurological diseases. ND will enable its users to connect data in a robust way with related data that is annotated using other terminologies and ontologies in the biomedical domain.
PMCID: PMC4028878  PMID: 24314207
