The Plant Ontology Consortium (POC, http://www.plantontology.org) is a collaborative effort among model plant genome databases and plant researchers that aims to create, maintain and facilitate the use of a controlled vocabulary (ontology) for plants. The ontology allows users to ascribe attributes of plant structure (anatomy and morphology) and developmental stages to data types, such as genes and phenotypes, to provide a semantic framework to make meaningful cross-species and database comparisons. The POC builds upon groundbreaking work by the Gene Ontology Consortium (GOC) by adopting and extending the GOC's principles, existing software and database structure. Over the past year, POC has added hundreds of ontology terms to associate with thousands of genes and gene products from Arabidopsis, rice and maize, which are available through a newly updated web-based browser (http://www.plantontology.org/amigo/go.cgi) for viewing, searching and querying. The Consortium has also implemented new functionalities to facilitate the application of PO in genomic research and updated the website to keep the contents current. In this report, we present a brief description of resources available from the website, changes to the interfaces, data updates, community activities and future enhancement.
Maintaining a bio-ontology in the long term requires improving and updating its contents so that it adequately captures what is known about biological phenomena. This paper illustrates how these processes are carried out, by studying the ways in which curators at the Gene Ontology have hitherto incorporated new knowledge into their resource.
Five types of circumstances are singled out as warranting changes in the ontology: (1) the emergence of anomalies within GO; (2) the extension of the scope of GO; (3) divergence in how terminology is used across user communities; (4) new discoveries that change the meaning of the terms used and their relations to each other; and (5) the extension of the range of relations used to link entities or processes described by GO terms.
This study illustrates the difficulties involved in applying general standards to the development of a specific ontology. Ontology curation aims to produce a faithful representation of knowledge domains as they keep developing, which requires the translation of general guidelines into specific representations of reality and an understanding of how scientific knowledge is produced and constantly updated. In this context, it is important that trained curators with technical expertise in the scientific field(s) in question are involved in supervising ontology shifts and identifying inaccuracies.
Gene Ontology; knowledge; maintenance; curation; ontology shifts
BrainInfo (http://braininfo.org) is a growing portal to neuroscientific information on the Web. It is indexed by NeuroNames, an ontology designed to compensate for ambiguities in neuroanatomical nomenclature. The 20-year old ontology continues to evolve toward the ideal of recognizing all names of neuroanatomical entities and accommodating all structural concepts about which neuroscientists communicate, including multiple concepts of entities for which neuroanatomists have yet to determine the best or ‘true’ conceptualization. To make the definitions of structural concepts unambiguous and terminologically consistent we created a ‘default vocabulary’ of unique structure names selected from existing terminology. We selected standard names by criteria designed to maximize practicality for use in verbal communication as well as computerized knowledge management. The ontology of NeuroNames accommodates synonyms and homonyms of the standard terms in many languages. It defines complex structures as models composed of primary structures, which are defined in unambiguous operational terms. NeuroNames currently relates more than 16,000 names in eight languages to some 2,500 neuroanatomical concepts. The ontology is maintained in a relational database with three core tables: Names, Concepts and Models. BrainInfo uses NeuroNames to index information by structure, to interpret users’ queries and to clarify terminology on remote web pages. NeuroNames is a resource vocabulary of the NLM’s Unified Medical Language System (UMLS, 2011) and the basis for the brain regions component of NIFSTD (NeuroLex, 2011). The current version has been downloaded to hundreds of laboratories for indexing data and linking to BrainInfo, which attracts some 400 visitors/day, downloading 2,000 pages/day.
NeuroNames; BrainInfo; ontology; web portal; neuroanatomical nomenclature; web search
An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling.
In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO; expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development and aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject.
The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area.
annotation; cardiovascular; development; Gene Ontology; heart
The goal of the Plant Ontology™ Consortium is to produce structured controlled vocabularies,
arranged in ontologies, that can be applied to plant-based database information
even as knowledge of the biology of the relevant plant taxa (e.g. development, anatomy,
morphology, genomics, proteomics) is accumulating and changing. The collaborators of the
Plant Ontology™ Consortium (POC) represent a number of core participant database
groups. The Plant Ontology™ Consortium is expanding the paradigm of the Gene
Ontology™ Consortium (http://www.geneontology.org). Various trait ontologies (agronomic
traits, mutant phenotypes, phenotypes, traits, and QTL) and plant ontologies (plant
development, anatomy [incl. morphology]) for several taxa (Arabidopsis, maize/corn/Zea mays
and rice/Oryza) are under development. The products of the Plant Ontology™ Consortium will be open-source.
The Gene Ontology (GO) Consortium (http://www.geneontology.org) (GOC) continues to develop, maintain and use a set of structured, controlled vocabularies for the annotation of genes, gene products and sequences. The GO ontologies are expanding both in content and in structure. Several new relationship types have been introduced and used, along with existing relationships, to create links between and within the GO domains. These improve the representation of biology, facilitate querying, and allow GO developers to systematically check for and correct inconsistencies within the GO. Gene product annotation using GO continues to increase both in the number of total annotations and in species coverage. GO tools, such as OBO-Edit, an ontology-editing tool, and AmiGO, the GOC ontology browser, have seen major improvements in functionality, speed and ease of use.
The effort of function annotation does not merely involve associating a gene with some structured vocabulary that describes
action. Rather the details of the actions, the components of the actions, the larger context of the actions are important
issues that are of direct relevance, because they help understand the biological system to which the gene/protein belongs.
Currently Gene Ontology (GO) Consortium offers the most comprehensive sets of relationships to describe gene/protein activity.
However, its choice to segregate gene ontology to subdomains of molecular function, biological process and cellular component
is creating significant limitations in terms of future scope of use. If we are to understand biology in its total complexity,
comprehensive ontologies in larger biological domains are essential. A vigorous discussion on this topic is necessary for the
larger benefit of the biological community. I highlight this point because larger-bio-domain ontologies cannot be simply
created by integrating subdomain ontologies. Relationships in larger bio-domain-ontologies are more complex due to larger size
of the system and are therefore more labor intensive to create. The current limitations of GO will be a handicap in derivation
of more complex relationships from the high throughput biology data.
gene; ontology; function; annotation; vocabulary
The Gene Ontology (GO) (http://www.geneontology.org) is a community bioinformatics resource that represents gene product function through the use of structured, controlled vocabularies. The number of GO annotations of gene products has increased due to curation efforts among GO Consortium (GOC) groups, including focused literature-based annotation and ortholog-based functional inference. The GO ontologies continue to expand and improve as a result of targeted ontology development, including the introduction of computable logical definitions and development of new tools for the streamlined addition of terms to the ontology. The GOC continues to support its user community through the use of e-mail lists, social media and web-based resources.
The Gene Ontology (GO) project () develops and uses a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see ). The GO Consortium continues to improve to the vocabulary content, reflecting the impact of several novel mechanisms of incorporating community input. A growing number of model organism databases and genome annotation groups contribute annotation sets using GO terms to GO's public repository. Updates to the AmiGO browser have improved access to contributed genome annotations. As the GO project continues to grow, the use of the GO vocabularies is becoming more varied as well as more widespread. The GO project provides an ontological annotation system that enables biologists to infer knowledge from large amounts of data.
Many biomedical terminologies, classifications, and ontological resources such as the NCI Thesaurus (NCIT), International Classification of Diseases (ICD), Systematized Nomenclature of Medicine (SNOMED), Current Procedural Terminology (CPT), and Gene Ontology (GO) have been developed and used to build a variety of IT applications in biology, biomedicine, and health care settings. However, virtually all these resources involve incompatible formats, are based on different modeling languages, and lack appropriate tooling and programming interfaces (APIs) that hinder their wide-scale adoption and usage in a variety of application contexts. The Lexical Grid (LexGrid) project introduced in this paper is an ongoing community-driven initiative, coordinated by the Mayo Clinic Division of Biomedical Statistics and Informatics, designed to bridge this gap using a common terminology model called the LexGrid model. The key aspect of the model is to accommodate multiple vocabulary and ontology distribution formats and support of multiple data stores for federated vocabulary distribution. The model provides a foundation for building consistent and standardized APIs to access multiple vocabularies that support lexical search queries, hierarchy navigation, and a rich set of features such as recursive subsumption (e.g., get all the children of the concept penicillin). Existing LexGrid implementations include the LexBIG API as well as a reference implementation of the HL7 Common Terminology Services (CTS) specification providing programmatic access via Java, Web, and Grid services.
The Cell Ontology (CL) aims for the representation of in vivo and in vitro cell types from all of biology. The CL is a candidate reference ontology of the OBO Foundry and requires extensive revision to bring it up to current standards for biomedical ontologies, both in its structure and its coverage of various subfields of biology. We have now addressed the specific content of one area of the CL, the section of the ontology dealing with hematopoietic cells. This section has been extensively revised to improve its content and eliminate multiple inheritance in the asserted hierarchy, and the groundwork was laid for structuring the hematopoietic cell type terms as cross-products incorporating logical definitions built from relationships to external ontologies, such as the Protein Ontology and the Gene Ontology. The methods and improvements to the CL in this area represent a paradigm for improvement of the entire ontology over time.
ontology; hematopoietic cells; immunology
The National Cancer Institute (NCI) is developing an integrated biomedical informatics infrastructure, the cancer Biomedical Informatics Grid (caBIG®), to support collaboration within the cancer research community. A key part of the caBIG architecture is the establishment of terminology standards for representing data. In order to evaluate the suitability of existing controlled terminologies, the caBIG Vocabulary and Data Elements Workspace (VCDE WS) working group has developed a set of criteria that serve to assess a terminology's structure, content, documentation, and editorial process. This paper describes the evolution of these criteria and the results of their use in evaluating four standard terminologies: the Gene Ontology (GO), the NCI Thesaurus (NCIt), the Common Terminology for Adverse Events (known as CTCAE), and the laboratory portion of the Logical Objects, Identifiers, Names and Codes (LOINC). The resulting caBIG criteria are presented as a matrix that may be applicable to any terminology standardization effort.
Terminology; Ontology; Auditing; Evaluation
We present an analysis of some considerations involved in expressing the Gene Ontology (GO) as a machine-processible ontology, reflecting principles of formal ontology.
GO is a controlled vocabulary that is intended to facilitate communication between
biologists by standardizing usage of terms in database annotations. Making such
controlled vocabularies maximally useful in support of bioinformatics applications
requires explicating in machine-processible form the implicit background information
that enables human users to interpret the meaning of the vocabulary terms.
In the case of GO, this process would involve rendering the meanings of GO into
a formal (logical) language with the help of domain experts, and adding additional
information required to support the chosen formalization. A controlled vocabulary
augmented in these ways is commonly called an ontology. In this paper, we make a
modest exploration to determine the ontological requirements for this extended version
of GO. Using the terms within the three GO hierarchies (molecular function,
biological process and cellular component), we investigate the facility with which
GO concepts can be ontologized, using available tools from the philosophical and
ontological engineering literature.
Representing species-specific proteins and protein complexes in ontologies that are both human- and machine-readable facilitates the retrieval, analysis, and interpretation of genome-scale data sets. Although existing protin-centric informatics resources provide the biomedical research community with well-curated compendia of protein sequence and structure, these resources lack formal ontological representations of the relationships among the proteins themselves. The Protein Ontology (PRO) Consortium is filling this informatics resource gap by developing ontological representations and relationships among proteins and their variants and modified forms. Because proteins are often functional only as members of stable protein complexes, the PRO Consortium, in collaboration with existing protein and pathway databases, has launched a new initiative to implement logical and consistent representation of protein complexes.
We describe here how the PRO Consortium is meeting the challenge of representing species-specific protein complexes, how protein complex representation in PRO supports annotation of protein complexes and comparative biology, and how PRO is being integrated into existing community bioinformatics resources. The PRO resource is accessible at http://pir.georgetown.edu/pro/.
PRO is a unique database resource for species-specific protein complexes. PRO facilitates robust annotation of variations in composition and function contexts for protein complexes within and between species.
The range of publicly available biomedical data is enormous and is expanding fast. This expansion means that researchers now face a hurdle to extracting the data they need from the large numbers of data that are available. Biomedical researchers have turned to ontologies and terminologies to structure and annotate their data with ontology concepts for better search and retrieval. However, this annotation process cannot be easily automated and often requires expert curators. Plus, there is a lack of easy-to-use systems that facilitate the use of ontologies for annotation. This paper presents the Open Biomedical Annotator (OBA), an ontology-based Web service that annotates public datasets with biomedical ontology concepts based on their textual metadata (www.bioontology.org). The biomedical community can use the annotator service to tag datasets automatically with ontology terms (from UMLS and NCBO BioPortal ontologies). Such annotations facilitate translational discoveries by integrating annotated data.
The skeleton is of fundamental importance in research in comparative vertebrate morphology, paleontology, biomechanics, developmental biology, and systematics. Motivated by research questions that require computational access to and comparative reasoning across the diverse skeletal phenotypes of vertebrates, we developed a module of anatomical concepts for the skeletal system, the Vertebrate Skeletal Anatomy Ontology (VSAO), to accommodate and unify the existing skeletal terminologies for the species-specific (mouse, the frog Xenopus, zebrafish) and multispecies (teleost, amphibian) vertebrate anatomy ontologies. Previous differences between these terminologies prevented even simple queries across databases pertaining to vertebrate morphology. This module of upper-level and specific skeletal terms currently includes 223 defined terms and 179 synonyms that integrate skeletal cells, tissues, biological processes, organs (skeletal elements such as bones and cartilages), and subdivisions of the skeletal system. The VSAO is designed to integrate with other ontologies, including the Common Anatomy Reference Ontology (CARO), Gene Ontology (GO), Uberon, and Cell Ontology (CL), and it is freely available to the community to be updated with additional terms required for research. Its structure accommodates anatomical variation among vertebrate species in development, structure, and composition. Annotation of diverse vertebrate phenotypes with this ontology will enable novel inquiries across the full spectrum of phenotypic diversity.
The @neurIST ontology is currently under development within the scope of the European project @neurIST intended to serve as a module in a complex architecture aiming at providing a better understanding and management of intracranial aneurysms and subarachnoid hemorrhages. Due to the integrative structure of the project the ontology needs to represent entities from various disciplines on a large spatial and temporal scale. Initial term acquisition was performed by exploiting a database scaffold, literature analysis and communications with domain experts. The ontology design is based on the DOLCE upper ontology and other existing domain ontologies were linked or partly included whenever appropriate (e.g., the FMA for anatomical entities and the UMLS for definitions and lexical information). About 2300 predominantly medical entities were represented but also a multitude of biomolecular, epidemiological, and hemodynamic entities. The usage of the ontology in the project comprises terminological control, text mining, annotation, and data mediation.
Medical Informatics Applications; Ontology design; Intracranial aneurysm; Subarachnoid hemorrhage; Terminology
The Plant Ontology Consortium (POC) (www.plantontology.org) is a collaborative
effort among several plant databases and experts in plant systematics, botany
and genomics. A primary goal of the POC is to develop simple yet robust
and extensible controlled vocabularies that accurately reflect the biology of plant
structures and developmental stages. These provide a network of vocabularies linked
by relationships (ontology) to facilitate queries that cut across datasets within
a database or between multiple databases. The current version of the ontology
integrates diverse vocabularies used to describe Arabidopsis, maize and rice (Oryza
sp.) anatomy, morphology and growth stages. Using the ontology browser, over 3500
gene annotations from three species-specific databases, The Arabidopsis Information
Resource (TAIR) for Arabidopsis, Gramene for rice and MaizeGDB for maize, can
now be queried and retrieved.
The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or ‘ontologies’. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.
The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new ‘phylogenetic annotation’ process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.
The Mammalian Phenotype Ontology (MP) is a structured vocabulary for describing mammalian phenotypes and serves as a critical tool for efficient annotation and comprehensive retrieval of phenotype data. Importantly, the ontology contains broad and specific terms, facilitating annotation of data from initial observations or screens and detailed data from subsequent experimental research. Using the ontology structure, data are retrieved inclusively, i.e., data annotated to chosen terms and to terms subordinate in the hierarchy. Thus, searching for “abnormal craniofacial morphology” also returns annotations to “megacephaly” and “microcephaly,” more specific terms in the hierarchy path. The development and refinement of the MP is ongoing, with new terms and modifications to its organization undergoing continuous assessment as users and expert reviewers propose expansions and revisions. A wealth of phenotype data on mouse mutations and variants annotated to the MP already exists in the Mouse Genome Informatics database. These data, along with data curated to the MP by many mouse mutagenesis programs and mouse repositories, provide a platform for comparative analyses and correlative discoveries. The MP provides a standard underpinning to mouse phenotype descriptions for existing and future experimental and large-scale phenotyping projects. In this review we describe the MP as it presently exists, its application to phenotype annotations, the relationship of the MP to other ontologies, and the integration of the MP within large-scale phenotyping projects. Finally we discuss future application of the MP in providing standard descriptors of the phenotype pipeline test results from the International Mouse Phenotype Consortium projects.
The Gene Ontology (GO) project (http://www.geneontology.org/) provides a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://www.sequenceontology.org/). The ontologies have been extended and refined for several biological areas, and improvements to the structure of the ontologies have been implemented. To improve the quantity and quality of gene product annotations available from its public repository, the GO Consortium has launched a focused effort to provide comprehensive and detailed annotation of orthologous genes across a number of ‘reference’ genomes, including human and several key model organisms. Software developments include two releases of the ontology-editing tool OBO-Edit, and improvements to the AmiGO browser interface.
Ontologies are rapidly becoming a necessity for the design of efficient information technology tools, especially databases, because they permit the organization of stored data using logical rules and defined terms that are understood by both humans and machines. This has as consequence both an enhanced usage and interoperability of databases and related resources. It is hoped that IDOMAL, the ontology of malaria will prove a valuable instrument when implemented in both malaria research and control measures.
The OBOEdit2 software was used for the construction of the ontology. IDOMAL is based on the Basic Formal Ontology (BFO) and follows the rules set by the OBO Foundry consortium.
The first version of the malaria ontology covers both clinical and epidemiological aspects of the disease, as well as disease and vector biology. IDOMAL is meant to later become the nucleation site for a much larger ontology of vector borne diseases, which will itself be an extension of a large ontology of infectious diseases (IDO). The latter is currently being developed in the frame of a large international collaborative effort.
IDOMAL, already freely available in its first version, will form part of a suite of ontologies that will be used to drive IT tools and databases specifically constructed to help control malaria and, later, other vector-borne diseases. This suite already consists of the ontology described here as well as the one on insecticide resistance that has been available for some time. Additional components are being developed and introduced into IDOMAL.
The use of ontologies to control vocabulary and structure annotation has added value to genome-scale data, and contributed to the capture and re-use of knowledge across research domains. Gene Ontology (GO) is widely used to capture detailed expert knowledge in genomic-scale datasets and as a consequence has grown to contain many terms, making it unwieldy for many applications. To increase its ease of manipulation and efficiency of use, subsets called GO slims are often created by collapsing terms upward into more general, high-level terms relevant to a particular context. Creation of a GO slim currently requires manipulation and editing of GO by an expert (or community) familiar with both the ontology and the biological context. Decisions about which terms to include are necessarily subjective, and the creation process itself and subsequent curation are time-consuming and largely manual.
Here we present an objective framework for generating customised ontology slims for specific annotated datasets, exploiting information latent in the structure of the ontology graph and in the annotation data. This framework combines ontology engineering approaches, and a data-driven algorithm that draws on graph and information theory. We illustrate this method by application to GO, generating GO slims at different information thresholds, characterising their depth of semantics and demonstrating the resulting gains in statistical power.
Our GO slim creation pipeline is available for use in conjunction with any GO-annotated dataset, and creates dataset-specific, objectively defined slims. This method is fast and scalable for application to other biomedical ontologies.
The Gene Ontology (GO) project (http://www.geneontology.org/) provides structured, controlled vocabularies and classifications that cover several domains of molecular and cellular biology and are freely available for community use in the annotation of genes, gene products and sequences. Many model organism databases and genome annotation groups use the GO and contribute their annotation sets to the GO resource. The GO database integrates the vocabularies and contributed annotations and provides full access to this information in several formats. Members of the GO Consortium continually work collectively, involving outside experts as needed, to expand and update the GO vocabularies. The GO Web resource also provides access to extensive documentation about the GO project and links to applications that use GO data for functional analyses.