OBO, the Open Biomedical Ontologies project, originated as a collection of controlled vocabularies [1]. At that time, OBO ontologies consisted of terms, which were interconnected by typed binary relationships, such as is. Since then, OBO's scope has been extended towards medicine, and the project was therefore renamed from "Open Biological Ontologies" to "Open Biomedical Ontologies". It was supplemented by a formal language, the OBO file format, which grew in semantic complexity over time. The use of the Semantic Web standard ontology language OWL [2], which is based on description logics (DLs) [4], was encouraged, and tools for conversion between the OBO file format and OWL were proposed [5]. Finally, a set of principles was proposed for the coordinated development of non-overlapping ontologies [6].
All ontologies referred to in this paper are available via the OBO Foundry portal (http://www.obofoundry.org); we therefore refrain from giving references for the specific ontologies mentioned. The OBO site provides three different kinds of ontologies:
• The OBO Foundry ontologies, a selection of eight ontologies that, after expert review performed in 2010, were declared to sufficiently comply with the OBO Foundry principles. The following ontologies constitute the OBO Foundry collection: (i) CHEBI: Chemical Entities of Biological Interest; (ii) GO: Gene Ontology Cellular Component; (iii) GO: Gene Ontology Molecular Function; (iv) GO: Gene Ontology Biological Process; (v) PATO: Phenotypic Quality Ontology; (vi) PRO: Protein Ontology; (vii) XAO: Xenopus Anatomy Ontology; and (viii) ZFA: Zebrafish Anatomy Ontology.
• The OBO Foundry candidate ontologies and other ontologies of interest. This is a heterogeneous, steadily growing collection of currently 91 ontologies, only a few of which claim to follow the OBO Foundry principles. Within this set, six candidate ontologies were considered close to being included in the Foundry [7], viz.: (i) CL: Cell Ontology; (ii) FMA: Foundational Model of Anatomy Ontology; (iii) EnvO: Environment Ontology; (iv) HPO: Human Phenotype Ontology; (v) OBI: Ontology for Biomedical Investigations; and (vi) SO: Sequence Ontology.
• A collection of ontologies called 'Mappings between, logical definitions for, bridging, and relations for combining, ontologies', which contains 62 resources [8]. These consist of 22 mapping files, one ontology (BFO), four bridges, four relation ontologies and 31 cross-product ontologies with logical definitions.
In the work presented here, we aim to analyze the correctness of the use of logic by the OBO Foundry or close-to OBO Foundry ontologies and related mappings. We concentrate on OWL, as this is considered the language of choice for creating and exchanging ontologies [3]. OWL subscribes to a model-theoretic semantics, which leads to logically crisp and far-reaching entailments and therefore bears the risk of creating unintended implications or misinterpretations. We use the phrase "unintended consequence" to describe assertions, and entailments from assertions, which are contrary to the intention of the modeler. Their identification and prevention are crucial for good quality, as otherwise automated reasoning produces unreliable results. The paper is organized as follows. In the next subsections of the Background section we provide terminological clarifications and insight into OBO and OWL syntax and semantics. The Methods section describes the sampling, rating and evaluation of a key element in OWL ontologies, existential restrictions, which we hypothesize to be a major source of axioms leading to unintended consequences in biomedical ontologies. In the rather extensive Discussion section, classes of erroneous modeling decisions are illustrated by examples and possible alternatives are discussed. In the concluding section we summarize the lessons learnt from this experiment and give suggestions for improving ontology quality in the OBO Foundry.
Terminologies vs. Ontologies
Here we introduce the basic concepts underlying our work, highlighting the implications resulting from commitment to different paradigms for semantics (terminology vs. description logics) and syntax (OBO vs. OWL).
The need for standards to semantically annotate different kinds of resources has been addressed by controlled vocabularies and terminology systems [9], language-oriented artifacts that relate word senses by informal thesaurus-style relations. The need to facilitate the interpretation of these language-oriented artifacts by computers initiated a trend of formalizing their semantics, which was supported by logic-based ontologies. The Gene Ontology (GO) [10] was a pioneer for moving from a purpose-oriented annotation vocabulary to a more principled resource. GO has been one of the driving forces of OBO. It is also motivated by the evolution of ontological principles rooted in analytical philosophy, as well as by cross-fertilization between the Semantic Web [11] and Life Sciences communities [12].
OBO vs. OWL
The move from the OBO format to OWL mirrors this progress from the representation of term meanings towards the representation of the domain entities that the terms denote and their properties. The Web Ontology Language OWL [2], now available in its second release [3], provides an abstract syntax for a language encompassing different flavors of description logics (DLs), a family of decidable fragments of first-order logic [4]. In contrast, the OBO flatfile format [14] represents a semantic network of nodes (terms) and edges (relationships), together with metadata and linguistic information (synonyms). In its current state, the OBO format is not a formal language; the definition of a formal semantics for it is a work in progress. For a draft of the OBO syntax and semantics see http://berkeleybop.org/~cjm/obo2owl/obo-syntax.html. A preliminary implementation thereof is available at http://code.google.com/p/oboformat/. To further elucidate the distinction between these two formalisms, consider the following example from the mouse anatomy ontology.
This extract asserts the relationship part_of between the terms ankle and hindlimb in OBO format.
relationship: part_of MA:0000026 ! hindlimb
This assertion does not commit to a semantics in terms of the real world entities which are denoted by the terms. It does not allow us to infer that, e.g., all hindlimbs have ankles, or all ankles are part of a hindlimb. Descriptions at this level require some kind of ontological interpretation for the OBO syntax in terms of OWL axioms, as OWL axioms are explicitly quantified. One such interpretation is given by the OBO2OWL specification [15]. According to this specification, each relationship in OBO format translates to the following existential OWL restriction, illustrated in the compact OWL Manchester syntax [16]:
Ankle subClassOf part_of some Hindlimb
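The default translation described above can be sketched in a few lines of Python. This is only an illustration of the "relationship becomes existential restriction" reading, not the OBO2OWL converter itself: the function name `to_manchester`, the regular expression, and the choice to use the trailing `!` comment as a readable class name are all assumptions made for this example.

```python
import re

# Hypothetical pattern for a single OBO "relationship:" tag line.
# It is NOT taken from any OBO library; it only covers the simple form
# "relationship: <relation> <target id> ! <target name>".
RELATIONSHIP = re.compile(
    r"^relationship:\s+(?P<rel>\S+)\s+(?P<target_id>\S+)"
    r"(?:\s*!\s*(?P<target_name>.+))?$"
)


def to_manchester(source_name: str, obo_line: str) -> str:
    """Render one OBO relationship as an existential restriction (illustrative only)."""
    match = RELATIONSHIP.match(obo_line.strip())
    if match is None:
        raise ValueError(f"not a relationship line: {obo_line!r}")
    # Prefer the human-readable comment as class name; fall back to the raw ID.
    if match["target_name"]:
        target = match["target_name"].strip().capitalize()
    else:
        target = match["target_id"]
    return f"{source_name.capitalize()} subClassOf {match['rel']} some {target}"


axiom = to_manchester("ankle", "relationship: part_of MA:0000026 ! hindlimb")
print(axiom)
```

The point of the sketch is that the quantifier "some" is supplied by the translation, not by the OBO source line, which itself carries no quantification.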
Making proper use of description logics (and avoiding unintended consequences) requires understanding their very crisp notions of "class" and "relationship". A class such as Ankle is interpreted as the set of all individuals that meet the definitional criteria of that class, here: all particular ankles in the domain of mouse anatomy. A relationship such as has_part or part_of is interpreted as a set of pairs of individuals, here: all pairs of objects in the domain that are related as part and whole. So, all pairs of mouse Ankle instances with their respective Hindlimb instances are in the extension of the relation part_of.
It is the reference to instances that makes up the greatest difference between the OBO term-based approach and the OWL class approach, and which explains why the latter is semantically more precise. The description logics on which OWL is based, in contrast to OBO, cannot straightforwardly assert relationships directly between terms or classes. As shown above, relationships always hold between individuals and need to be quantified when classes are to be connected. Quantification can take the form of existential quantification ("some", ∃), universal value restriction ("only", ∀), or cardinality restrictions ("max n", "min n", "exactly n"). Our mouse limb example could therefore alternatively be translated into at least the following three OWL expressions:
(i) Ankle subClassOf part_of some Hindlimb
(ii) Ankle subClassOf part_of exactly 1 Hindlimb
(iii) Ankle subClassOf part_of only Hindlimb
(i, the existential restriction) expresses that every instance of the class Ankle is part of at least one instance of the class Hindlimb;
(ii, the cardinality restriction) is stricter and expresses that every instance of the class Ankle is part of exactly one instance of the class Hindlimb;
(iii, the universal restriction) expresses that an instance of the class Ankle can only be part of instances of the class Hindlimb.
In this case, the choice of (i) as the default target representation for the OBO to OWL translation looks plausible. At least for the relation part_of, option (ii) would be too strict, and representation (iii) would conflict with the transitivity of part_of, since an instance of the class Ankle is also part of the body that the hindlimb is part of.
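The three readings can be compared on a toy interpretation. The following Python sketch is an assumption-laden illustration: the individuals (`ankle1`, `mouse1`, and so on) are invented, classes are modeled as plain sets, and part_of as a set of pairs closed under transitivity. It is not a description logic reasoner; it only checks whether each axiom holds in this one model.

```python
# Invented individuals; class extensions are sets, the relation is a set of pairs.
ankle_ext = {"ankle1", "ankle2"}
hindlimb_ext = {"hindlimb1", "hindlimb2"}
direct_part_of = {
    ("ankle1", "hindlimb1"),
    ("ankle2", "hindlimb2"),
    ("hindlimb1", "mouse1"),
    ("hindlimb2", "mouse1"),
}


def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


part_of = transitive_closure(direct_part_of)


def fillers(x, relation):
    """All individuals that x is related to."""
    return {b for (a, b) in relation if a == x}


def holds_some(sub, relation, target):
    # (i): every instance has at least one filler in the target class
    return all(fillers(x, relation) & target for x in sub)


def holds_exactly(n, sub, relation, target):
    # (ii): every instance has exactly n fillers in the target class
    return all(len(fillers(x, relation) & target) == n for x in sub)


def holds_only(sub, relation, target):
    # (iii): every filler of every instance lies in the target class
    return all(fillers(x, relation) <= target for x in sub)


print("some:     ", holds_some(ankle_ext, part_of, hindlimb_ext))
print("exactly 1:", holds_exactly(1, ankle_ext, part_of, hindlimb_ext))
print("only:     ", holds_only(ankle_ext, part_of, hindlimb_ext))
```

In this particular model, (i) and (ii) hold, whereas (iii) fails because transitivity makes each ankle also part of `mouse1`, which is not a hindlimb. That (ii) happens to hold here is a property of the toy data, not a general guarantee.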
Generic (G) ontological dependence can be defined according to [17]:
x depends_G for its existence upon Fs =df necessarily, x exists only if some F exists
The first two representations, (i) and (ii) above, express ontological dependence between the two classes: there is no ankle without a hindlimb of which it is part. This follows from the semantics assigned to the some and exactly OWL constructs, namely that for each instance of the first class there is at least one instance of the related class. Thus, every instance of Ankle existentially depends on some instance of Hindlimb. Representation (iii) has a remarkable property, which may be easier to see in an equivalent formulation:
(iii') Ankle subClassOf not (part_of some not Hindlimb)
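For readers who prefer first-order notation, the axioms can be spelled out under the standard translation of description logic class expressions. The predicate names simply mirror the class and relation names used above; the counting quantifier $\exists^{=1}$ is shorthand for "there is exactly one".

```latex
\begin{align*}
\text{(i)}    \quad & \forall x\,\bigl(\mathit{Ankle}(x) \rightarrow \exists y\,(\mathit{part\_of}(x,y) \land \mathit{Hindlimb}(y))\bigr)\\
\text{(ii)}   \quad & \forall x\,\bigl(\mathit{Ankle}(x) \rightarrow \exists^{=1} y\,(\mathit{part\_of}(x,y) \land \mathit{Hindlimb}(y))\bigr)\\
\text{(iii)}  \quad & \forall x\,\bigl(\mathit{Ankle}(x) \rightarrow \forall y\,(\mathit{part\_of}(x,y) \rightarrow \mathit{Hindlimb}(y))\bigr)\\
\text{(iii')} \quad & \forall x\,\bigl(\mathit{Ankle}(x) \rightarrow \neg\exists y\,(\mathit{part\_of}(x,y) \land \neg\mathit{Hindlimb}(y))\bigr)
\end{align*}
```

(iii) and (iii') are equivalent because $\forall y\,(\varphi \rightarrow \psi)$ is the same as $\neg\exists y\,(\varphi \land \neg\psi)$. Neither contains a positive existential claim, so both are true of any $x$ for which no $y$ with $\mathit{part\_of}(x,y)$ exists.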
In contrast to (i) and (ii), proposition (iii) does not express any ontological dependence: it is satisfied even by an ankle that is part of nothing at all. Bearing in mind that the first representation is the one favored by the OBO2OWL conversion, which makes most of the native OBO ontologies available in OWL, the question is now whether its very strong claim about dependence can be upheld for each and every relational statement in OBO ontologies and cross products. There are many kinds of relational statements for which this claim is obviously too strong. We will certainly not want to interpret statements such as "Aspirin treats headache" or "Smoking causes cancer" in the sense that there is some headache for each and every aspirin tablet, or that there is no smoking event that is not a cause of some cancer.
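The aspirin case can be made concrete with a small counterexample search. The sketch below is hypothetical throughout: the tablets, the headache and the `treats` pairs are invented data, and the check is a plain set computation rather than an OWL reasoner.

```python
# Invented individuals: three aspirin tablets, only one of which is ever
# taken to treat a headache.
aspirin = {"tablet1", "tablet2", "tablet3"}
headache = {"headache1"}
treats = {("tablet1", "headache1")}

# Existential reading "Aspirin subClassOf treats some Headache":
# collect every tablet that has no headache it treats.
violations = sorted(
    t for t in aspirin if not any((t, h) in treats for h in headache)
)
print("tablets violating the existential axiom:", violations)

# Universal reading "Aspirin subClassOf treats only Headache":
# satisfied here, vacuously so for the tablets that treat nothing.
only_holds = all(h in headache for (t, h) in treats if t in aspirin)
print("universal axiom holds:", only_holds)
```

Under the existential reading, any unused tablet is a counterexample, which is exactly the kind of unintended consequence at issue; the universal reading tolerates such tablets but, as noted above, asserts no dependence at all.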