Nat Biotechnol. Author manuscript; available in PMC 2010 January 30.
Published in final edited form as:
PMCID: PMC2814061

The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration


The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or ‘ontologies’. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.

In the search for what is biologically and clinically significant in the swarms of data being generated by today’s high-throughput technologies, a common strategy involves the creation and analysis of ‘annotations’ linking primary data to expressions in controlled, structured vocabularies, thereby making the data available to search and to algorithmic processing1. The most successful such endeavor, measured both by numbers of users and by reach across species and granularities, is the Gene Ontology (GO)2. There exist over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO3, of which half a million have been manually verified by specialist curators in different model-organism communities on the basis of the analysis of experimental results reported in 52,000 scientific journal articles. Data related to some 180,000 genes have been manually annotated in this way, an endeavor now being refined and systematized within the Reference Genome Project (US National Institutes of Health National Human Genome Research Institute grant 2P41HG002273-07), which will provide comprehensive GO annotations for both the human genome and a representative set of model-organism genomes in support of research on the primary molecular systems affecting human health.

From retrospective mapping to prospective standardization

The domain of molecular biology is marked by the availability of large amounts of well defined data that can be used without restriction as inputs to algorithmic processing. In the clinical domain, by contrast, only limited amounts of data are available for research purposes, and these still consist overwhelmingly of natural language text. Even where more systematic clinical data are available, the use of local coding schemes means that these data do not cumulate in ways useful to research4. One approach to solving this problem is the Unified Medical Language System (UMLS)5, a compendium of some 100 source vocabularies combined through a process of retrospective mapping based on the identification of synonymy relations between constituent terms. The UMLS has yielded very useful results for applications such as indexing and retrieval of documents. But because the separate vocabularies have no common architecture6,7, UMLS mappings do not meld their terms together into any single system8.

Increasingly, therefore, the need is being recognized for strategies of prospective standardization designed to bring about the progressive improvement and reciprocal alignment of the frameworks employed for the management, description and publication of biomedical data. Two conspicuous products of this trend are the US National Cancer Institute’s Cancer Biomedical Informatics Grid (caBIG) project9 and HL7’s Reference Information Model (RIM). caBIG seeks to integrate all cancer research data in a common cyberinfrastructure by standardizing the ways in which such data are acquired, formatted, processed and stored. The HL7 RIM, similarly, offers a standard for the exchange, management and integration of all information relevant to healthcare, from clinical genomics to hospital billing. However, because both caBIG and HL7 focus on the meta-level question of how data and information should be represented in computer and messaging systems, it can be argued that they fail to do justice to the object-level question of how best to represent the proteins, organisms, diseases or drug interactions that are of primary interest in biomedical research7,10.

A collaborative experiment in ontology development

In 2001, Ashburner and Lewis initiated a strategy to address this object-level question by creating OBO, an umbrella body for the developers of life-science ontologies. OBO applies the key principles underlying the success of the GO, namely, that ontologies be open, orthogonal, instantiated in a well-specified syntax and designed to share a common space of identifiers11. Ontologies must be open in the sense that they and the bodies of data described in their terms should be available for use without any constraint or license and so be applicable to new purposes without restriction. They are also receptive to modification as a result of community debate. They must be orthogonal to ensure additivity of annotations and to bring the benefits of modular development. They must be syntactically in good order to support algorithmic processing. And they must employ a common system of identifiers to enable backward compatibility with legacy annotations as the ontologies evolve.
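Two of these principles, orthogonality and a shared identifier space, lend themselves to simple automated checks. The following is a minimal sketch, not the consortium's own tooling: it assumes identifiers of the form PREFIX:digits (as in GO:0004872), and apart from that one GO identifier, all identifiers, the 'NDB' draft ontology and its contents are hypothetical values chosen for illustration.

```python
import re
from collections import defaultdict

# Assumed identifier shape: an alphabetic prefix, a colon, then digits.
ID_PATTERN = re.compile(r"^[A-Za-z]+:\d+$")

# Toy ontologies (name -> {identifier: label}); contents are illustrative only.
ONTOLOGIES = {
    "GO": {"GO:0004872": "receptor activity"},
    "CL": {"CL:0000000": "cell"},              # hypothetical identifier
    "NDB": {"NDB:0000001": "receptor activity",  # hypothetical draft ontology
            "ndb-17": "GABA receptor"},          # malformed on purpose
}

def malformed_identifiers(ontologies):
    """Return identifiers that do not follow the assumed PREFIX:digits shape."""
    return sorted(
        term_id
        for terms in ontologies.values()
        for term_id in terms
        if not ID_PATTERN.match(term_id)
    )

def overlapping_labels(ontologies):
    """Return labels that occur in more than one ontology (orthogonality check)."""
    owners = defaultdict(set)
    for name, terms in ontologies.items():
        for label in terms.values():
            owners[label].add(name)
    return {label: sorted(names) for label, names in owners.items() if len(names) > 1}
```

A label reported by `overlapping_labels` signals that one of the ontologies should reuse the other's term rather than define its own.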

OBO now comprises over 60 ontologies, and its role as an ontology information resource is supported by the NIH Roadmap National Center for Biomedical Ontology (NCBO) through its BioPortal12. At the same time, the developers of a subset of OBO ontologies have initiated the OBO Foundry, a collaborative experiment based on the voluntary acceptance by its participants of an evolving set of principles that extend those of the original OBO by requiring in addition that ontologies (i) be developed in a collaborative effort, (ii) use common relations that are unambiguously defined, (iii) provide procedures for user feedback and for identifying successive versions and (iv) have a clearly bounded subject-matter (so that an ontology devoted to cell components, for example, should not include terms like ‘database’ or ‘integer’). A graphical representation of the coverage of the initial Foundry ontologies is provided in Table 1.

Table 1
Coverage of initial Foundry ontologies

Progress thus far

Since the OBO Foundry was established, ontologies such as the GO and the Foundational Model of Anatomy (FMA)13 have been reformed and new ontologies created on the basis of its principles14-16. Perhaps most importantly, ontologies have been laid to rest. Before the OBO Foundry there existed at least four cell-type ontologies: one from Bard, Rhee and Ashburner17, another from Kelso et al.18, a third implicit within the GO and the fourth a subontology within the FMA. The first three now form a single cell-type ontology (CL)19, which is itself being integrated with the cell-type representations contained within the FMA.

The Foundry initiative also serves to align ontology development efforts carried out by separate communities, for example in research on different model organisms. The potential of such research to yield results valuable for the understanding of human disease rests on our ability to make reliable cross-species comparisons. Because so much model-organism data is localized to anatomical structures, drawing inferences on the basis of such comparisons has been hampered by the lack of coordination in anatomy ontology development among different communities. Some ontologies represent structure, others represent function, yet others represent stages of development, and some draw on combinations of these, in ways that close off opportunities for automatic reasoning. The Foundry has created a roadmap for the incremental resolution of this problem through the initiation of the Common Anatomy Reference Ontology (CARO)14, which is providing guidelines both for model-organism communities with legacy anatomy ontologies who wish to initiate reforms in the direction of compatibility and for communities who wish to build new ontologies from scratch. CARO is based on the top-level types of the FMA and is serving as a template for the creation of the Fish Multi-Species, Ixodidae and Argasidae (tick), mosquito and Xenopus anatomy ontologies, and also as a basis for reforms of the Drosophila and zebrafish anatomy ontologies19.

The Ontology for Biomedical Investigations (OBI) addresses the need for controlled vocabularies to support integration of experimental data, a need originally identified in the transcriptomics domain by the Microarray Gene Expression Data Society (MGED), which developed the MGED Ontology20 as an annotation resource for microarray data. In response to the recognition of convergent needs in areas such as protein and metabolite characterization, this effort was broadened to become what was initially known as FuGO (Functional Genomics Investigation Ontology)21. FuGO was further expanded in 2006 to include clinical and epidemiological research, biomedical imaging and a variety of further experimentation domains to become what is today OBI, an ontology designed to serve the coordinated representation of designs, protocols, instrumentation, materials, processes, data and types of analysis in all areas of biological and biomedical investigation. Twenty-five groups are now involved in building OBI, and the Foundry discipline has proven essential to its distributed development.

Unlike most OBO ontologies, which use the OBO file format and the associated OBO-Edit software favored by model-organism and other biologist communities, OBI uses the OWL-DL Web Ontology Language. The need to make OWL and OBO ontologies interoperable has sparked the creation of bidirectional OBO–OWL conversion tools22 that integrate data annotated in terms of the GO and other OBO ontologies with the bodies of data coming onstream within the framework of the Semantic Web23, an influential initiative to exploit OWL ontologies to encode knowledge in distributed computer systems24.

Models of good practice

Each Foundry ontology forms a graph-theoretic structure, with terms connected by edges representing relations such as ‘is_a’ or ‘part_of’ in assertions such as ‘serotonin is_a biogenic amine’ or ‘cytokinesis part_of cell proliferation’. Because relations in OBO ontologies were initially used in inconsistent ways25, the OBO Relation Ontology (RO)26 was developed to provide guidelines to ontology builders in the consistent formulation of relational assertions. These guidelines are already proving useful—for example, in the representation of anatomical change27 and in linking diverse image collections to phylogenetic datasets28.
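The graph-theoretic reading of an ontology can be illustrated with a small sketch. It stores each assertion as a (subject, relation, object) edge and follows edges of one relation transitively, as is appropriate for ‘is_a’. The two assertions quoted above are used as data; the extra ‘biogenic amine is_a amine’ edge and the function names are assumptions added for illustration.

```python
from collections import defaultdict

# Each assertion is a (subject, relation, object) edge.
EDGES = [
    ("serotonin", "is_a", "biogenic amine"),           # from the text
    ("cytokinesis", "part_of", "cell proliferation"),  # from the text
    ("biogenic amine", "is_a", "amine"),               # hypothetical extra edge
]

def build_graph(edges):
    """Index edges as graph[relation][subject] -> set of objects."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in edges:
        graph[relation][subject].add(obj)
    return graph

def ancestors(graph, term, relation="is_a"):
    """Return every term reachable from `term` by following `relation` edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in graph[relation].get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Keeping each relation in its own index means that a query over ‘is_a’ never silently follows a ‘part_of’ edge, which is the kind of conflation that consistent relation definitions are meant to prevent.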

Other areas in which the Foundry is providing guidelines include naming conventions29 and pathway representations30. The model of good practice in the formulation of definitions is the FMA13, a representation of types of anatomical entities built around two backbone hierarchies of ‘is_a’ and ‘part_of’ relations. The FMA imposes a rule whereby all definitions take the genus-species form:

  • an A = def. a B that C’s

where B is the ‘is_a’ parent of A, and C are the differentia marking out that subfamily of Bs which are also As. For example,

  • cell = def. an anatomical structure that has as its boundary the external surface of a maximally connected plasma membrane
  • plasma membrane = def. a cell component that has as its parts a maximal phospholipid bilayer in which instances of two or more types of protein are embedded.

Anchoring definitions in the ‘is_a’ hierarchy in this way diminishes the role of opinion in determining where terms should be placed in the hierarchy, thereby fostering consistency both within and between ontologies and helping to prevent common errors6,7,26.
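The definition rule can be sketched as a small data structure in which the genus is stored separately from the differentia, so that it can be compared mechanically with the term's ‘is_a’ parent. This is an illustrative sketch only; the class and function names are assumptions, and the two definitions are the examples given above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Definition:
    term: str
    genus: str        # B: the 'is_a' parent
    differentia: str  # C: what marks out the As among the Bs

    def render(self):
        article = "an" if self.genus[0].lower() in "aeiou" else "a"
        return f"{self.term} = def. {article} {self.genus} that {self.differentia}"

DEFINITIONS = [
    Definition("cell", "anatomical structure",
               "has as its boundary the external surface of a maximally "
               "connected plasma membrane"),
    Definition("plasma membrane", "cell component",
               "has as its parts a maximal phospholipid bilayer in which "
               "instances of two or more types of protein are embedded"),
]

# 'is_a' parents as implied by the two definitions (child -> parent).
IS_A = {"cell": "anatomical structure", "plasma membrane": "cell component"}

def inconsistent(definitions, is_a):
    """Return terms whose stated genus differs from their 'is_a' parent."""
    return [d.term for d in definitions if is_a.get(d.term) != d.genus]
```

An empty result from `inconsistent` means every definition agrees with the hierarchy; any term it returns has either a misplaced parent or a mis-stated genus.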

To maximize cross-ontology coordination, compound terms should be built as far as possible out of constituent terms drawn from Foundry ontologies linked using relational expressions from the RO31. This methodology of cross-products is being applied, in one of the biological projects driving the NCBO, to the annotation of Drosophila, zebrafish and human alleles for genes implicated in disease12,32. Specialist curators associate these alleles with phenotype descriptions formulated using terms drawn from more than one OBO Foundry ontology—for example, composing the Phenotypic Quality Ontology (PATO) term ‘increased concentration’ with the FMA term ‘blood’ and the ChEBI term ‘glucose’ to represent increased blood glucose phenotypes. Such creation of terms through explicit composition avoids the bottlenecks created where, as for example in the Mammalian Phenotype Ontology, each new term must be approved for inclusion in the ontology before it can be used in annotations. But the approach will work only if the resultant terms are unambiguous, and here the Foundry helps provide the necessary rigor. The orthogonality principle helps to reduce the need for arbitrary decisions between equivalent-seeming terms drawn from different ontologies, the PATO phenotypic-quality ontology provides templates for term formation, and the RO provides formally coherent glue for combination33.
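The cross-product idea can be sketched as follows, using the increased blood glucose example. The source does not say which relations join the three terms, so the relation names `inheres_in` and `towards` used here are assumptions, as are the class names and the restriction to the three ontologies named in the example.

```python
from dataclasses import dataclass

# Ontologies accepted as sources in this sketch (those named in the example).
FOUNDRY_SOURCES = {"PATO", "FMA", "ChEBI"}

@dataclass(frozen=True)
class Term:
    ontology: str
    label: str

@dataclass(frozen=True)
class ComposedTerm:
    quality: Term
    links: tuple  # ((relation, Term), ...)

    def sources(self):
        return {self.quality.ontology} | {term.ontology for _, term in self.links}

    def describe(self):
        parts = [f"{self.quality.label} [{self.quality.ontology}]"]
        parts += [f"{rel} {term.label} [{term.ontology}]" for rel, term in self.links]
        return " ".join(parts)

def compose(quality, *links):
    """Build a composed term, rejecting constituents from unknown ontologies."""
    composed = ComposedTerm(quality, tuple(links))
    unknown = composed.sources() - FOUNDRY_SOURCES
    if unknown:
        raise ValueError(f"terms from non-Foundry sources: {sorted(unknown)}")
    return composed

increased_blood_glucose = compose(
    Term("PATO", "increased concentration"),
    ("inheres_in", Term("FMA", "blood")),    # assumed relation name
    ("towards", Term("ChEBI", "glucose")),   # assumed relation name
)
```

Because the composed term is assembled from existing constituents, a curator can use it in an annotation immediately, without waiting for a new precomposed term to be approved.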

The current scope of the OBO Foundry initiative is summarized in Table 2. Foundry ontologies are created and maintained by biologists with a thorough knowledge of the underlying science. Where domain experts jointly control ontology, data, and annotations (as in the case of the GO/UniProt collaboration), all three can be curated in tandem in a way that provides a reality check at each stage of the process34. As results of experiments are described in annotations, this leads to extensions or corrections of the ontology, which in turn lead to better annotation35. The results of the Foundry’s work can then be applied by external groups as benchmarks, for example to help identify genes mutated at significant frequencies in human cancers36 or to identify cellular components involved in antigen processing37 or, in general, to refine otherwise noisy results of text- and data-mining38-41.

Table 2
OBO Foundry ontologies (as of April 2007)

The OBO Foundry applied


A demonstration of the utility of the Foundry methodology is provided by ongoing work to create the NeuronDB database within the SenseLab project. NeuronDB encompasses three types of neuronal property: voltage-gated conductances, neurotransmitters and neurotransmitter receptors. An initial representation of neurotransmitters defined an ‘is_a’ hierarchy with classes such as ‘neurotransmitter receptor’ and subclasses such as ‘GABA receptor’. In this initial ontology, receptors were not defined, and strictly speaking one would not have known, for example, whether a receptor was a protein or a protein complex. The Foundry provided a set of principles and at least one task that may be evaluated in making such choices: namely, the scope of each ontology should be clearly bounded and (by orthogonality) no term should appear in more than one ontology. Reviewing the existing ontologies, we found that the GO Molecular Function (GO MF) ontology already had classes such as ‘receptor activity’ (GO:0004872) and a number of subclasses that described receptor activities that were referred to in NeuronDB.

We reviewed one hundred thirty resultant receptor classes. Where they existed, we reused MF classes; where they did not, we created subclasses of existing MF classes and submitted the results to GO for future inclusion. Arranging NeuronDB to interoperate transparently with GO provided the further benefit that we can now take advantage of GO annotations to find the proteins that correspond to the receptor classes by searching annotations to the MF terms. This is a model for how small ontology builders can constructively contribute to the growth of shared resources while simultaneously benefiting users of their own ontologies.
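The benefit of reusing MF classes can be sketched as a lookup over annotations: given a receptor class, collect the gene products annotated to that MF term or to any of its subclasses. The hierarchy fragment, term labels other than ‘receptor activity’, and the protein names below are all hypothetical stand-ins, not actual GO content or GO annotations.

```python
# Hypothetical fragment of an MF 'is_a' hierarchy (child -> parent).
IS_A = {
    "GABA receptor activity": "receptor activity",
    "GABA-A receptor activity": "GABA receptor activity",
    "glutamate receptor activity": "receptor activity",
}

# Hypothetical annotations: gene product -> MF terms it is annotated to.
ANNOTATIONS = {
    "protein_1": {"GABA-A receptor activity"},
    "protein_2": {"glutamate receptor activity"},
    "protein_3": {"GABA receptor activity"},
}

def descendants(term, is_a):
    """Return `term` together with every term that reaches it via 'is_a'."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in is_a.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result

def proteins_for(term, is_a, annotations):
    """Gene products annotated to `term` or to any of its subclasses."""
    wanted = descendants(term, is_a)
    return sorted(p for p, terms in annotations.items() if terms & wanted)
```

Querying the parent class picks up annotations made at any level of specificity below it, which is why anchoring a local receptor class to an existing MF term makes those annotations available at no extra curation cost.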


In support of research on neurodegenerative and neurological disease within the Biomedical Informatics Research Network (BIRN)42, the BIRN Ontology Task Force is applying the Foundry principles to formally represent several large domains, including (i) neuroanatomy43, where annotations must capture not only the structural systems of parthood and topological connection but also cytoarchitectural parcellations such as the CA1, CA2 and CA3 regions of the hippocampus, (ii) functional systems, such as the basal ganglion circuits for motor planning and motor memory and (iii) neurochemistry (for example, of brainstem monoamine nuclei). The members of the BIRN Ontology Task Force see the Foundry as providing a framework within which these distinct axes can be algorithmically combined, and they are incorporating the results into BIRN’s neuroimage atlasing project and using them to integrate spatially mapped microarray expression data with mouse imaging results.

The Minimum Information for Biological and Biomedical Investigations (MIBBI)

This initiative represents the first new standards effort that takes OBO and the OBO Foundry as its role model44. MIBBI provides information resources to promote the consolidation of the many prescriptive checklists that specify core metadata items to be included when reporting results in a variety of experimentation domains45. The proliferation of such ‘minimum information’ checklists has made it increasingly difficult to obtain an overview of existing specifications, unnecessarily duplicating efforts and creating problems when third parties try to use described information. The MIBBI Portal operates analogously to OBO and the NCBO BioPortal as an open information resource for all initiatives addressing these problems; the MIBBI Foundry fosters collaborative development and integration of checklists into orthogonal modules46.

How to join

Like OBO, the OBO Foundry is an open community. Any individual or group working in the domain of biomedicine wishing to join the initiative is encouraged to do so, and all discussion forums are open to all interested parties without restriction. The recommended first step is to join one or more mailing lists in salient areas as a way to become familiar with the Foundry’s collaborative methodology and identify members with overlapping expertise. Those with new ontology resources are invited to submit them for informal consideration by existing members; this will be followed by a period in which compliance with the Foundry principles is addressed, especially as concerns potential conflicts in areas of overlap. Membership in the Foundry initiative then flows from a commitment to incremental implementation of these principles as they evolve over time, with the Foundry coordinators (currently Ashburner, Lewis, Mungall and Smith) serving as analogs of journal editors, whereby the division of labor that results from orthogonality helps ensure that development decisions are made by the authors of single ontologies. By joining the initiative, the authors of an ontology commit to working with other members to ensure that, for any particular domain, there is convergence on a single ontology. Criticism, too, is welcomed: the Foundry is an attempt to apply the scientific method to the task of ontology development, and thus it accepts that no resource will ever exist in a form that cannot be further improved.

Our long-term goal is that the data generated through biomedical research should form a single, consistent, cumulatively expanding and algorithmically tractable whole. Our efforts to realize this goal, which are still very much in the proving stage, reflect an attempt to walk the line between the flexibility that is indispensable to scientific advance and the institution of principles that is indispensable to successful coordination.


The Foundry is receiving ad hoc funding under the BISC Gene Ontology Consortium, MGED, NCBO and RNA Ontology grants. We are grateful to all of these sources, and also to the ACGT Project of the European Union and to the Humboldt and Volkswagen Foundations.


1. Yue L, Reisdorf WC. Pathway and ontology analysis: emerging approaches connecting transcriptome data and clinical endpoints. Curr Mol Med. 2005;5:11–21. [PubMed]
2. Gene Ontology Consortium. The Gene Ontology (GO) project in 2006. Nucleic Acids Res. 2006;34(database issue):D322–D326. [PMC free article] [PubMed]
3. Camon E, et al. The Gene Ontology Annotation (GOA) Project. Genome Res. 2003;13:662–672. [PubMed]
4. Kohane IS, et al. Building national electronic medical record systems via the World Wide Web. J Am Med Inform Assoc. 1996;3:191–207. [PMC free article] [PubMed]
5. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32(database issue):D267–D270. [PMC free article] [PubMed]
6. Ceusters W, Smith B, Kumar A, Dhaen C. Mistakes in medical ontologies: where do they come from and how can they be detected? Stud Health Technol Inform. 2004;102:145–164. [PubMed]
7. Ceusters W, Smith B, Goldberg L. A terminological and ontological analysis of the NCI Thesaurus. Methods Inf Med. 2005;44:498–507. [PubMed]
8. Campbell KE, Oliver DE, Shortliffe EH. The Unified Medical Language System. Toward a collaborative approach for solving terminologic problems. J Am Med Inform Assoc. 1998;5:12–16. [PMC free article] [PubMed]
9. Buetow KH. Cyberinfrastructure: empowering a ‘third way’ in biomedical research. Science. 2005;308:821–824. [PubMed]
10. Smith B, Ceusters W. HL7 RIM: an incoherent standard. Stud Health Technol Inform. 2006;124:133–138. [PubMed]
11. Ashburner M, Mungall CJ, Lewis SE. Ontologies for biologists: a community model for the annotation of genomic data. Cold Spring Harb Symp Quant Biol. 2003;68:227–236. [PubMed]
12. Rubin DL, et al. National Center for Biomedical Ontology: advancing biomedicine through structured organization of scientific knowledge. OMICS. 2006;10:185–198. [PubMed]
13. Rosse C, Mejino JLF. The Foundational Model of Anatomy ontology. In: Burger A, et al., editors. Anatomy Ontologies for Bioinformatics. Springer; New York: in the press.
14. Haendel M, et al. CARO: the Common Anatomy Reference Ontology. In: Burger A, et al., editors. Anatomy Ontologies for Bioinformatics. Springer; New York: in the press.
15. Leontis NB, et al. The RNA Ontology Consortium: an open invitation to the RNA community. RNA. 2006;12:533–541. [PubMed]
16. Natale DA, et al. Framework for a protein ontology. BMC Bioinformatics [online] in the press. [PMC free article] [PubMed]
17. Bard J, Rhee SY, Ashburner M. An ontology for cell types. Genome Biol [online] 2005;6:R21. [PMC free article] [PubMed]
18. Kelso J, et al. eVOC: a controlled vocabulary for unifying gene expression data. Genome Res. 2003;13:1222–1230. [PubMed]
19. Mabee PM, et al. Phenotype ontologies: the bridge between genomics and evolution. Trends Ecol Evol. 2007;22:345–350. [PubMed]
20. Whetzel PL, et al. The MGED Ontology: a resource for semantics-based description of microarray experiments. Bioinformatics. 2006;22:866–873. [PubMed]
21. Whetzel PL, et al. Development of FuGO: an ontology for functional genomics investigations. OMICS. 2006;10:199–204. [PMC free article] [PubMed]
22. Golbreich C, et al. OBO and OWL: leveraging semantic web technologies for the life sciences. In: Proceedings 6th International Semantic Web Conference (ISWC 2007). Springer; in the press.
23. Brinkley JF, Detwiler LT, Gennari JH, Rosse C, Suciu D. A framework for using reference ontologies as a foundation for the semantic web. Proc AMIA Fall Symposium. 2006:95–100. [PMC free article] [PubMed]
24. Lacy LW. Owl: Representing Information Using the Web Ontology Language. Trafford Publishing; Victoria, BC, Canada: 2005.
25. Smith B, Köhler J, Kumar A. On the application of formal principles to life science data: a case study in the Gene Ontology. Data Integration in the Life Sciences (DILS) Workshop. 2004:79–94.
26. Smith B, et al. Relations in biomedical ontologies. Genome Biol [online] 2005;6:R46. [PMC free article] [PubMed]
27. Bittner T, Goldberg LJ. Spatial location and its relevance for terminological inferences in bio-ontologies. BMC Bioinformatics. 2007;23:1674–1682. [PMC free article] [PubMed]
28. Ramírez MJ, et al. Linking of digital images to phylogenetic data matrices using a morphological ontology. Syst Biol. 2007;56:283–294. [PubMed]
29. Schober D, et al. Towards naming conventions for use in controlled vocabulary and ontology engineering. Bio-Ontologies Workshop, ISMB/ECCB; Vienna. 20 July 2007; pp. 87–90.
30. Ruttenberg A, Rees J, Zucker J. What BioPAX communicates and how to extend OWL to help it. OWL: Experiences and Directions Workshop Series. 2006.
31. Hunter L, Bada M. Enrichment of OBO ontologies. J Biomed Inform. 2007;40:300–315. [PMC free article] [PubMed]
32. Hill DP, Blake JA, Richardson JE, Ringwald M. Extension and integration of the Gene Ontology (GO): combining GO vocabularies with external vocabularies. Genome Res. 2002;12:1982–1991. [PubMed]
33. Mungall CJ. Obol: integrating language and meaning in bio-ontologies. Comp Funct Genomics. 2004;5:509–520. [PMC free article] [PubMed]
34. Camon E, et al. The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res. 2004;32(database issue):D262–D266. [PMC free article] [PubMed]
35. Blake J, Hill DP, Smith B. Gene Ontology annotations: what they mean and where they come from. Bio-Ontologies Workshop, ISMB/ECCB; Vienna. 20 July 2007; pp. 79–82.
36. Sjoblom T, et al. The consensus coding sequences of human breast and colorectal cancers. Science. 2006;314:268–274. [PubMed]
37. Lee JA, et al. Components of the antigen processing and presentation pathway revealed by gene expression microarray analysis following B cell antigen receptor (BCR) stimulation. BMC Bioinformatics [online] 2006;7:237. [PMC free article] [PubMed]
38. Rebholz-Schuhmann D, Kirsch H, Couto F. Facts from text—is text mining ready to deliver? PLoS Biol [online] 2005;3:e65. [PMC free article] [PubMed]
39. Witte R, Kappler T, Baker CJO. Ontology design for biomedical text mining. In: Baker CJO, Cheung K-H, editors. Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences. Springer; New York: 2007. pp. 281–313.
40. Zhang S, Bodenreider O. Aligning multiple anatomical ontologies through a reference. International Workshop on Ontology Matching (OM). 2006; pp. 193–197.
41. Luo F, et al. Modular organization of protein interaction networks. Bioinformatics. 2007;23:207–214. [PubMed]
42. Martone ME, Gupta A, Ellisman MH. E-neuroscience: challenges and triumphs in integrating distributed data from molecules to brains. Nat Neurosci. 2004;7:467–472. [PubMed]
43. Fong L, et al. An ontology-driven knowledge environment for subcellular neuroanatomy. OWL Experiences and Directions, 3rd International Workshop; Innsbruck, Austria. June 6–7 2007; in the press.
44. Taylor CF, et al. Promoting coherent minimum reporting requirements for biological and biomedical investigations: the MIBBI Project. Nat Biotechnol. in the press. [PMC free article] [PubMed]
45. Brazma A, et al. Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat Genet. 2001;29:365–371. [PubMed]
46. Sansone SA, et al. A strategy capitalizing on synergies: the Reporting Structure for Biological Investigation (RSBI) working group. OMICS. 2006;10:164–171. [PubMed]
47. Grenon P, Smith B, Goldberg L. Biodynamic ontology: applying BFO in the biomedical domain. In: Pisanelli DM, editor. Ontologies in Medicine. IOS; Amsterdam: 2004. pp. 20–38. [PubMed]