Search tips
Search criteria

Results 1-25 (29)

Clipboard (0)

Select a Filter Below

more »
Year of Publication
more »
Document Types
1.  A method for increasing expressivity of Gene Ontology annotations using a compositional approach 
BMC Bioinformatics  2014;15:155.
The Gene Ontology project integrates data about the function of gene products across a diverse range of organisms, allowing the transfer of knowledge from model organisms to humans, and enabling computational analyses for interpretation of high-throughput experimental and clinical data. The core data structure is the annotation, an association between a gene product and a term from one of the three ontologies comprising the GO. Historically, it has not been possible to provide additional information about the context of a GO term, such as the target gene or the location of a molecular function. This has limited the specificity of knowledge that can be expressed by GO annotations.
The GO Consortium has introduced annotation extensions that enable manually curated GO annotations to capture additional contextual details. Extensions represent effector–target relationships such as localization dependencies, substrates of protein modifiers and regulation targets of signaling pathways and transcription factors as well as spatial and temporal aspects of processes such as cell or tissue type or developmental stage. We describe the content and structure of annotation extensions, provide examples, and summarize the current usage of annotation extensions.
The additional contextual information captured by annotation extensions improves the utility of functional annotation by representing dependencies between annotations to terms in the different ontologies of GO, external ontologies, or an organism’s gene products. These enhanced annotations can also support sophisticated queries and reasoning, and will provide curated, directional links between many gene products to support pathway and network reconstruction.
PMCID: PMC4039540  PMID: 24885854
Gene Ontology; Functional annotation; Annotation extension; Manual curation
2.  Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research 
F1000Research  2014;2:30.
Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species.
We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases.
This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from
PMCID: PMC3799545  PMID: 24358873
3.  The environment ontology: contextualising biological and biomedical entities 
As biological and biomedical research increasingly reference the environmental context of the biological entities under study, the need for formalisation and standardisation of environment descriptors is growing. The Environment Ontology (ENVO; is a community-led, open project which seeks to provide an ontology for specifying a wide range of environments relevant to multiple life science disciplines and, through an open participation model, to accommodate the terminological requirements of all those needing to annotate data using ontology classes. This paper summarises ENVO’s motivation, content, structure, adoption, and governance approach. The ontology is available from - an OBO format version is also available by switching the file suffix to “obo”.
PMCID: PMC3904460  PMID: 24330602
Environment; Ecosystem; Biome; Ontology
4.  The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data 
Nucleic Acids Research  2013;42(D1):D966-D974.
The Human Phenotype Ontology (HPO) project, available at, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online.
PMCID: PMC3965098  PMID: 24217912
5.  The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments 
The Gene Ontology (GO) ( contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience.
Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases.
In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging in of the SAO, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community.
PMCID: PMC3852282  PMID: 24093723
Gene ontology; Cellular component ontology; Subcellular anatomy ontology; Neuroscience; Annotation; Ontology language; Ontology integration; Neuroscience information framework
6.  Ontology based molecular signatures for immune cell types via gene expression analysis 
BMC Bioinformatics  2013;14:263.
New technologies are focusing on characterizing cell types to better understand their heterogeneity. With large volumes of cellular data being generated, innovative methods are needed to structure the resulting data analyses. Here, we describe an ‘Ontologically BAsed Molecular Signature’ (OBAMS) method that identifies novel cellular biomarkers and infers biological functions as characteristics of particular cell types. This method finds molecular signatures for immune cell types based on mapping biological samples to the Cell Ontology (CL) and navigating the space of all possible pairwise comparisons between cell types to find genes whose expression is core to a particular cell type’s identity.
We illustrate this ontological approach by evaluating expression data available from the Immunological Genome project (IGP) to identify unique biomarkers of mature B cell subtypes. We find that using OBAMS, candidate biomarkers can be identified at every strata of cellular identity from broad classifications to very granular. Furthermore, we show that Gene Ontology can be used to cluster cell types by shared biological processes in order to find candidate genes responsible for somatic hypermutation in germinal center B cells. Moreover, through in silico experiments based on this approach, we have identified genes sets that represent genes overexpressed in germinal center B cells and identify genes uniquely expressed in these B cells compared to other B cell types.
This work demonstrates the utility of incorporating structured ontological knowledge into biological data analysis – providing a new method for defining novel biomarkers and providing an opportunity for new biological insights.
PMCID: PMC3844401  PMID: 24004649
7.  Dovetailing biology and chemistry: integrating the Gene Ontology with the ChEBI chemical ontology 
BMC Genomics  2013;14:513.
The Gene Ontology (GO) facilitates the description of the action of gene products in a biological context. Many GO terms refer to chemical entities that participate in biological processes. To facilitate accurate and consistent systems-wide biological representation, it is necessary to integrate the chemical view of these entities with the biological view of GO functions and processes. We describe a collaborative effort between the GO and the Chemical Entities of Biological Interest (ChEBI) ontology developers to ensure that the representation of chemicals in the GO is both internally consistent and in alignment with the chemical expertise captured in ChEBI.
We have examined and integrated the ChEBI structural hierarchy into the GO resource through computationally-assisted manual curation of both GO and ChEBI. Our work has resulted in the creation of computable definitions of GO terms that contain fully defined semantic relationships to corresponding chemical terms in ChEBI.
The set of logical definitions using both the GO and ChEBI has already been used to automate aspects of GO development and has the potential to allow the integration of data across the domains of biology and chemistry. These logical definitions are available as an extended version of the ontology from
PMCID: PMC3733925  PMID: 23895341
8.  MouseFinder: candidate disease genes from mouse phenotype data 
Human Mutation  2012;33(5):858-866.
Mouse phenotype data represents a valuable resource for the identification of disease-associated genes, especially where the molecular basis is unknown and there is no clue to the candidate gene’s function, pathway involvement or expression pattern. However, until recently these data have not been systematically used due to difficulties in mapping between clinical features observed in humans and mouse phenotype annotations. Here, we describe a semantic approach to solve this problem and demonstrate highly significant recall of known disease-gene associations and orthology relationships. A web application (MouseFinder; has been developed to allow users to search the results of our whole-phenome comparison of human and mouse. We demonstrate its use in identifying ARTN as a strong candidate gene within the 1p34.1-p32 mapped locus for a hereditary form of ptosis.
PMCID: PMC3327758  PMID: 22331800
phenotype; candidate disease genes; model organism; mouse
9.  Phenotypic overlap in the contribution of individual genes to CNV pathogenicity revealed by cross-species computational analysis of single-gene mutations in humans, mice and zebrafish 
Disease Models & Mechanisms  2012;6(2):358-372.
Numerous disease syndromes are associated with regions of copy number variation (CNV) in the human genome and, in most cases, the pathogenicity of the CNV is thought to be related to altered dosage of the genes contained within the affected segment. However, establishing the contribution of individual genes to the overall pathogenicity of CNV syndromes is difficult and often relies on the identification of potential candidates through manual searches of the literature and online resources. We describe here the development of a computational framework to comprehensively search phenotypic information from model organisms and single-gene human hereditary disorders, and thus speed the interpretation of the complex phenotypes of CNV disorders. There are currently more than 5000 human genes about which nothing is known phenotypically but for which detailed phenotypic information for the mouse and/or zebrafish orthologs is available. Here, we present an ontology-based approach to identify similarities between human disease manifestations and the mutational phenotypes in characterized model organism genes; this approach can therefore be used even in cases where there is little or no information about the function of the human genes. We applied this algorithm to detect candidate genes for 27 recurrent CNV disorders and identified 802 gene-phenotype associations, approximately half of which involved genes that were previously reported to be associated with individual phenotypic features and half of which were novel candidates. A total of 431 associations were made solely on the basis of model organism phenotype data. Additionally, we observed a striking, statistically significant tendency for individual disease phenotypes to be associated with multiple genes located within a single CNV region, a phenomenon that we denote as pheno-clustering. Many of the clusters also display statistically significant similarities in protein function or vicinity within the protein-protein interaction network. Our results provide a basis for understanding previously un-interpretable genotype-phenotype correlations in pathogenic CNVs and for mobilizing the large amount of model organism phenotype data to provide insights into human genetic disorders.
PMCID: PMC3597018  PMID: 23104991
10.  Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research 
F1000Research  2013;2:30.
Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species.
We have generated a cross-species phenotype ontology for human, mouse and zebra fish that contains zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases.
This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from
PMCID: PMC3799545  PMID: 24358873
11.  A knowledge based approach to matching human neurodegenerative disease and animal models 
Neurodegenerative diseases present a wide and complex range of biological and clinical features. Animal models are key to translational research, yet typically only exhibit a subset of disease features rather than being precise replicas of the disease. Consequently, connecting animal to human conditions using direct data-mining strategies has proven challenging, particularly for diseases of the nervous system, with its complicated anatomy and physiology. To address this challenge we have explored the use of ontologies to create formal descriptions of structural phenotypes across scales that are machine processable and amenable to logical inference. As proof of concept, we built a Neurodegenerative Disease Phenotype Ontology (NDPO) and an associated Phenotype Knowledge Base (PKB) using an entity-quality model that incorporates descriptions for both human disease phenotypes and those of animal models. Entities are drawn from community ontologies made available through the Neuroscience Information Framework (NIF) and qualities are drawn from the Phenotype and Trait Ontology (PATO). We generated ~1200 structured phenotype statements describing structural alterations at the subcellular, cellular and gross anatomical levels observed in 11 human neurodegenerative conditions and associated animal models. PhenoSim, an open source tool for comparing phenotypes, was used to issue a series of competency questions to compare individual phenotypes among organisms and to determine which animal models recapitulate phenotypic aspects of the human disease in aggregate. Overall, the system was able to use relationships within the ontology to bridge phenotypes across scales, returning non-trivial matches based on common subsumers that were meaningful to a neuroscientist with an advanced knowledge of neuroanatomy. The system can be used both to compare individual phenotypes and also phenotypes in aggregate. This proof of concept suggests that expressing complex phenotypes using formal ontologies provides considerable benefit for comparing phenotypes across scales and species.
PMCID: PMC3653101  PMID: 23717278
phenotype; ontology; Neuroscience Information Framework; neurodegenerative disease; semantics
12.  A Unified Anatomy Ontology of the Vertebrate Skeletal System 
PLoS ONE  2012;7(12):e51070.
The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity.
PMCID: PMC3519498  PMID: 23251424
13.  The Plant Ontology as a Tool for Comparative Plant Anatomy and Genomic Analyses 
Plant and Cell Physiology  2012;54(2):e1.
The Plant Ontology (PO; is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary (‘ontology’) of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.
PMCID: PMC3583023  PMID: 23220694
Bioinformatics; Comparative genomics; Genome annotation; Ontology; Plant anatomy; Terpene synthase
14.  Ontologies as integrative tools for plant science 
American journal of botany  2012;99(8):1263-1275.
Premise of the study
Bio-ontologies are essential tools for accessing and analyzing the rapidly growing pool of plant genomic and phenomic data. Ontologies provide structured vocabularies to support consistent aggregation of data and a semantic framework for automated analyses and reasoning. They are a key component of the semantic web.
This paper provides background on what bio-ontologies are, why they are relevant to botany, and the principles of ontology development. It includes an overview of ontologies and related resources that are relevant to plant science, with a detailed description of the Plant Ontology (PO). We discuss the challenges of building an ontology that covers all green plants (Viridiplantae).
Key results
Ontologies can advance plant science in four keys areas: (1) comparative genetics, genomics, phenomics, and development; (2) taxonomy and systematics; (3) semantic applications; and (4) education.
Bio-ontologies offer a flexible framework for comparative plant biology, based on common botanical understanding. As genomic and phenomic data become available for more species, we anticipate that the annotation of data with ontology terms will become less centralized, while at the same time, the need for cross-species queries will become more common, causing more researchers in plant science to turn to ontologies.
PMCID: PMC3492881  PMID: 22847540
bio-ontologies; genome annotation; OBO Foundry; phenomics; plant anatomy; plant genomics; Plant Ontology; plant systematics; semantic web
15.  On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report 
PLoS Computational Biology  2012;8(2):e1002386.
A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the “functional similarity” between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the “ortholog conjecture” (or, more properly, the “ortholog functional conservation hypothesis”). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: 1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an “open world assumption” (absence of an annotation does not imply absence of a function), and 2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the conclusions have a justifiable biological basis.
Author Summary
Understanding gene function—how individual genes contribute to the biology of an organism at the molecular, cellular and organism levels—is one of the primary aims of biomedical research. It has been a longstanding tenet of model organism research that experimental knowledge obtained in one organism is often applicable to other organisms, particularly if the organisms share the relevant genes because they inherited them from their common ancestor. Nevertheless this tenet is, like any hypothesis, not beyond question. A recent paper has termed this hypothesis a “conjecture,” and performed a statistical analysis, the results of which were interpreted as evidence against the hypothesis. This statistical analysis relied on a computational representation of gene function, the Gene Ontology (GO). As representatives of the international consortium that produces the GO, we show how the apparent evidence against the “ortholog conjecture” can be better explained as an artifact of how molecular biology knowledge is accumulated. In short, a complementarity between knowledge obtained in mouse and human experimental systems was incorrectly interpreted as a disagreement. We discuss the proper interpretation of GO annotations and potential sources of bias, with an eye toward enhancing the informed use of the GO by the scientific community.
PMCID: PMC3280971  PMID: 22359495
16.  Hematopoietic Cell Types: Prototype for a Revised Cell Ontology 
The Cell Ontology (CL) aims for the representation of in vivo and in vitro cell types from all of biology. The CL is a candidate reference ontology of the OBO Foundry and requires extensive revision to bring it up to current standards for biomedical ontologies, both in its structure and its coverage of various subfields of biology. We have now addressed the specific content of one area of the CL, the section of the ontology dealing with hematopoietic cells. This section has been extensively revised to improve its content and eliminate multiple inheritance in the asserted hierarchy, and the groundwork was laid for structuring the hematopoietic cell type terms as cross-products incorporating logical definitions built from relationships to external ontologies, such as the Protein Ontology and the Gene Ontology. The methods and improvements to the CL in this area represent a paradigm for improvement of the entire ontology over time.
PMCID: PMC2892030  PMID: 20123131
ontology; hematopoietic cells; immunology
17.  Cross-Product Extensions of the Gene Ontology 
The Gene Ontology (GO) consists of nearly 30,000 classes for describing the activities and locations of gene products. Manual maintenance of an ontology of this size is a considerable effort, and errors and inconsistencies inevitably arise. Reasoners can be used to assist with ontology development, automatically placing classes in a subsumption hierarchy based on their properties. However, the historic lack of computable definitions within the GO has prevented the user of these tools.
In this paper we present preliminary results of an ongoing effort to normalize the GO by explicitly stating the definitions of compositional classes in a form that can be used by reasoners. These definitions are partitioned into mutually exclusive cross-product sets, many of which reference other OBO Foundry candidate ontologies for chemical entities, proteins, biological qualities and anatomical entities. Using these logical definitions we are gradually beginning to automate many aspects of ontology development, detecting errors and filling in missing relationships. These definitions also enhance the GO by weaving it into the fabric of a wider collection of interoperating ontologies, increasing opportunities for data integration and enhancing genomic analyses.
PMCID: PMC2910209  PMID: 20152934
18.  Evolution of the Sequence Ontology terms and relationships 
The Sequence Ontology is an established ontology, with a large user community, for the purpose of genomic annotation. We are reforming the ontology to provide better terms and relationships to describe the features of biological sequence, for both genomic and derived sequence. The SO is working within the guidelines of the OBO Foundry to provide interoperability between SO and the other related OBO ontologies. Here we report changes and improvements made to SO including new relationships to better define the mereological, spatial and temporal aspects of biological sequence.
PMCID: PMC3052763  PMID: 20226267
Sequence Ontology; biomedical ontology; genome annotation
19.  Uberon, an integrative multi-species anatomy ontology 
Genome Biology  2012;13(1):R5.
We present Uberon, an integrated cross-species ontology consisting of over 6,500 classes representing a variety of anatomical entities, organized according to traditional anatomical classification criteria. The ontology represents structures in a species-neutral way and includes extensive associations to existing species-centric anatomical ontologies, allowing integration of model organism and human data. Uberon provides a necessary bridge between anatomical structures in different taxa for cross-species inference. It uses novel methods for representing taxonomic variation, and has proved to be essential for translational phenotype analyses. Uberon is available at
PMCID: PMC3334586  PMID: 22293552
20.  Logical Development of the Cell Ontology 
BMC Bioinformatics  2011;12:6.
The Cell Ontology (CL) is an ontology for the representation of in vivo cell types. As biological ontologies such as the CL grow in complexity, they become increasingly difficult to use and maintain. By making the information in the ontology computable, we can use automated reasoners to detect errors and assist with classification. Here we report on the generation of computable definitions for the hematopoietic cell types in the CL.
Computable definitions for over 340 CL classes have been created using a genus-differentia approach. These define cell types according to multiple axes of classification such as the protein complexes found on the surface of a cell type, the biological processes participated in by a cell type, or the phenotypic characteristics associated with a cell type. We employed automated reasoners to verify the ontology and to reveal mistakes in manual curation. The implementation of this process exposed areas in the ontology where new cell type classes were needed to accommodate species-specific expression of cellular markers. Our use of reasoners also inferred new relationships within the CL, and between the CL and the contributing ontologies. This restructured ontology can be used to identify immune cells by flow cytometry, supports sophisticated biological queries involving cells, and helps generate new hypotheses about cell function based on similarities to other cell types.
Use of computable definitions enhances the development of the CL and supports the interoperability of OBO ontologies.
PMCID: PMC3024222  PMID: 21208450
21.  Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development 
BMC Bioinformatics  2010;11:530.
The Gene Ontology project supports categorization of gene products according to their location of action, the molecular functions that they carry out, and the processes that they are involved in. Although the ontologies are intentionally developed to be taxon neutral, and to cover all species, there are inherent taxon specificities in some branches. For example, the process 'lactation' is specific to mammals and the location 'mitochondrion' is specific to eukaryotes. The lack of an explicit formalization of these constraints can lead to errors and inconsistencies in automated and manual annotation.
We have formalized the taxonomic constraints implicit in some GO classes, and specified these at various levels in the ontology. We have also developed an inference system that can be used to check for violations of these constraints in annotations. Using the constraints in conjunction with the inference system, we have detected and removed errors in annotations and improved the structure of the ontology.
Detection of inconsistencies in taxon-specificity enables gradual improvement of the ontologies, the annotations, and the formalized constraints. This is progressively improving the quality of our data. The full system is available for download, and new constraints or proposed changes to constraints can be submitted online at
PMCID: PMC3098089  PMID: 20973947
22.  The Release 5.1 Annotation of Drosophila melanogaster Heterochromatin 
Science (New York, N.Y.)  2007;316(5831):1586.
The repetitive DNA that constitutes most of the heterochromatic regions of metazoan genomes has hindered the comprehensive analysis of gene content and other functions. We have generated a detailed computational and manual annotation of 24 megabases of heterochromatic sequence in the Release 5 Drosophila melanogaster genome sequence. The heterochromatin contains a minimum of 230 to 254 protein-coding genes, which are conserved in other Drosophilids and more diverged species, as well as 32 pseudogenes and 13 noncoding RNAs. Improved methods revealed that more than 77% of this heterochromatin sequence, including introns and intergenic regions, is composed of fragmented and nested transposable elements and other repeated DNAs. Drosophila heterochromatin contains “islands” of highly conserved genes embedded in these “oceans” of complex repeats, which may require special expression and splicing mechanisms.
PMCID: PMC2819280  PMID: 17569856
23.  The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration 
Nature biotechnology  2007;25(11):1251.
The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or ‘ontologies’. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.
PMCID: PMC2814061  PMID: 17989687
24.  Integrating phenotype ontologies across multiple species 
Genome Biology  2010;11(1):R2.
A phenotypic ontology that can be used for the analysis of phenotype-genotype data across multiple species, paving the way for truly cross species translational research.
Phenotype ontologies are typically constructed to serve the needs of a particular community, such as annotation of genotype-phenotype associations in mouse or human. Here we demonstrate how these ontologies can be improved through assignment of logical definitions using a core ontology of phenotypic qualities and multiple additional ontologies from the Open Biological Ontologies library. We also show how these logical definitions can be used for data integration when combined with a unified multi-species anatomy ontology.
PMCID: PMC2847714  PMID: 20064205
25.  Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation 
PLoS Biology  2009;7(11):e1000247.
A novel method for quantifying the similarity between phenotypes by the use of ontologies can be used to search for candidate genes, pathway members, and human disease models on the basis of phenotypes alone.
Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. This annotation and search method provides a novel and efficient means to identify gene candidates and animal models of human disease, which may shorten the lengthy path to identification and understanding of the genetic basis of human disease.
Author Summary
Model organisms such as fruit flies, mice, and zebrafish are useful for investigating gene function because they are easy to grow, dissect, and genetically manipulate in the laboratory. By examining mutations in these organisms, one can identify candidate genes that cause disease in humans, and develop models to better understand human disease and gene function. A fundamental roadblock for analysis is, however, the lack of a computational method for describing and comparing phenotypes of mutant animals and of human diseases when the genetic basis is unknown. We describe here a novel method using ontologies to record and quantify the similarity between phenotypes. We tested our method by using the annotated mutant phenotype of one member of the Hedgehog signaling pathway in zebrafish to identify other pathway members with similar recorded phenotypes. We also compared human disease phenotypes to those produced by mutation in model organisms, and show that orthologous and biologically relevant genes can be identified by this method. Given that the genetic basis of human disease is often unknown, this method provides a means for identifying candidate genes, pathway members, and disease models by computationally identifying similar phenotypes within and across species.
PMCID: PMC2774506  PMID: 19956802

Results 1-25 (29)