Main ontology
The main version of the ontology consists of over 6,500 classes [24] (all ontology statistics are based on the September 2011 release version and exclude classes that have been obsoleted or deprecated), representing a variety of anatomical structures, grouped according to high-level categories from CARO. These include anatomical systems such as 'nervous system' and 'circulatory system'; organs such as 'heart', 'eye', 'brain', 'mesonephros' and 'pancreas'; tissues such as 'adipose tissue', 'cardiac tissue' and 'mesenchymal tissue'; developmental structures such as 'neural tube', 'pancreatic bud' and 'embryonic cloaca'; and appendages or organism subdivisions such as 'feather', 'pelvic girdle' and 'limb'. For structures that are distributed over or repeated in multiple body parts, we provide explicit pre-coordinated compositional classes, for example, 'epithelium of lung', 'colonic mucosa', 'femoral epiphysis', 'forelimb skeleton', and 'apical ectodermal ridge of hindlimb'. Each class is in the UBERON namespace, and is uniquely identified by a URI of the form:
http://purl.obolibrary.org/obo/UBERON_nnnnnnn
In this paper we shorten URIs to ID form, and for readability we refer to classes using the class label (enclosed in single quotes), with relations in italics. In contrast to corresponding classes in scAOs, these classes are explicitly intended to be applicable across a range of taxa where appropriate. For example, the class 'lung' is applicable to both avian and mammalian lungs.
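The correspondence between the URI form and the shortened ID form is a plain string rewrite. The following Python sketch illustrates it; the function names and the regular expressions are illustrative assumptions, not part of any Uberon tooling.

```python
import re

# Base of the OBO PURL scheme quoted in the text.
OBO_BASE = "http://purl.obolibrary.org/obo/"

# Assumed patterns: an alphabetic prefix followed by a numeric local part.
_URI_PATTERN = re.compile(r"^http://purl\.obolibrary\.org/obo/([A-Za-z]+)_(\d+)$")
_ID_PATTERN = re.compile(r"^([A-Za-z]+):(\d+)$")


def uri_to_id(uri: str) -> str:
    """Shorten an OBO-style URI to ID form, e.g. UBERON:0002048."""
    match = _URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not an OBO-style URI: {uri!r}")
    prefix, local = match.groups()
    return f"{prefix}:{local}"


def id_to_uri(class_id: str) -> str:
    """Expand an ID such as UBERON:0002048 back to its URI."""
    match = _ID_PATTERN.match(class_id)
    if match is None:
        raise ValueError(f"not an OBO-style ID: {class_id!r}")
    prefix, local = match.groups()
    return f"{OBO_BASE}{prefix}_{local}"


lung_uri = id_to_uri("UBERON:0002048")
lung_id = uri_to_id(lung_uri)
```

Under these assumptions the two functions are inverses of each other for well-formed identifiers, and malformed input raises an error rather than being passed through silently.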
We provide multiple download and import options for the ontology, each varying in complexity and scope, ranging from a simple subset of the core ontology to a multi-ontology import. The download table is available as Additional file 1, and is also summarized on the main web page (http://uberon.org).
The ontology is richly axiomatized, using a variety of constructs from the language OWL2-DL. In this paper we describe these and present examples using OWL Manchester Syntax [25]. These axioms include (but are not limited to) the is_a, part_of and develops_from links typically found in AOs used to represent the composition and ontogeny of structures [26]. These are all represented in OWL as SubClassOf axioms together with existential restrictions (for example, 'pulmonary alveolus' SubClassOf part_of some 'lung', meaning every pulmonary alveolus is part of a lung, but not implying that all lungs have alveoli). We describe the other logical axioms in more detail in the sections that follow. The full set of relations is available as Additional file 2. In addition to these logical axioms, the ontology also includes non-logical annotations typically found in AOs, such as textual definitions, synonyms, comments and provenance metadata. Table shows some of the characteristics of Uberon compared against existing anatomical ontologies. It is larger than some model organism ontologies such as MA (mouse) and ZFA (zebrafish), but is dwarfed by the more detailed FMA, with 80,000 classes. Over 70% of the classes in Uberon have textual definitions, and over one-third have computable definitions that can be used by reasoners for automated classification.
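The one-directional reading of the existential restriction ('pulmonary alveolus' SubClassOf part_of some 'lung') can be checked against a toy set of individuals. The sketch below is only an illustration in Python; the individuals, the data layout and the function name are assumptions made for the example, and a real OWL reasoner works on class axioms rather than on enumerated instances.

```python
from typing import Dict, Set


def satisfies_subclass_of_some(
    types: Dict[str, Set[str]],
    part_of: Dict[str, Set[str]],
    sub_class: str,
    filler_class: str,
) -> bool:
    """True if every instance of sub_class is part_of some instance of filler_class."""
    for individual, classes in types.items():
        if sub_class not in classes:
            continue
        wholes = part_of.get(individual, set())
        if not any(filler_class in types.get(whole, set()) for whole in wholes):
            return False
    return True


# Hypothetical individuals: one alveolus, and two lungs, one of which has no alveoli.
types = {
    "alveolus_1": {"pulmonary alveolus"},
    "lung_1": {"lung"},
    "lung_2": {"lung"},
}
part_of = {"alveolus_1": {"lung_1"}}

axiom_holds = satisfies_subclass_of_some(types, part_of, "pulmonary alveolus", "lung")

# Lungs that contain no alveolus; the axiom above does not rule these out.
lungs_with_alveoli = {
    whole
    for individual, wholes in part_of.items()
    if "pulmonary alveolus" in types[individual]
    for whole in wholes
}
lungs_without_alveoli = sorted(
    individual
    for individual, classes in types.items()
    if "lung" in classes and individual not in lungs_with_alveoli
)
```

In this toy model the axiom holds even though one lung has no alveoli, which is the asymmetry described in the text.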
The main ontology is available in both Open Biomedical Ontologies (OBO) format and OWL [27]. We also provide a basic version of the ontology, which contains all the same classes, but only a simple subset of the relationships (currently is_a, part_of and develops_from) [28]. Both of these ontologies have been classified in advance using a reasoner. A number of optional extensions are provided, and are discussed in more detail below.
Multi-species bridging extensions
Most classes in the ontology are applicable across multiple species, and many are generalizations of classes in individual scAOs. For example, both FMA and ZFA contain classes called 'pelvic girdle', but with definitions inapplicable outside tetrapods and teleosts, respectively. The Uberon class 'pelvic girdle' (UBERON:0001271) subsumes the FMA and ZFA classes, and includes a generalized definition that is derived from the FMA definition, but has been modified to be applicable across vertebrates.
Figure depicts the Uberon class 'lung' together with classes from individual scAOs, and the relationships connecting them. The resulting structure allows integrative queries over multiple databases annotated using different ontologies, one of the main use cases driving the development of Uberon. For example, a query for genes expressed in the Uberon (generic) 'lung' should return gene expression data annotated to the scAO lung classes, as well as individual parts, such as the mouse 'lung alveolus' (MA:0000420).
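The integrative query described above amounts to collecting every class that is a subtype or a part of the query class, and then gathering the annotations attached to any of those classes. A minimal Python sketch follows; the class identifiers are those quoted in the text, while the gene names, the edge list and the function names are hypothetical placeholders.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (child, relation, parent)

# Bridging link from the mouse lung to the generic lung, plus one mouse part_of link.
EDGES: List[Edge] = [
    ("MA:0000415", "is_a", "UBERON:0002048"),
    ("MA:0000420", "part_of", "MA:0000415"),
]

# Hypothetical expression annotations (gene name -> annotated class).
ANNOTATIONS: Dict[str, str] = {
    "geneA": "MA:0000415",
    "geneB": "MA:0000420",
    "geneC": "UBERON:0001271",
}


def subtypes_and_parts(
    term: str, edges: Iterable[Edge], relations: Tuple[str, ...] = ("is_a", "part_of")
) -> Set[str]:
    """Return the term plus everything reachable by following edges in reverse."""
    children = defaultdict(set)
    for child, relation, parent in edges:
        if relation in relations:
            children[parent].add(child)
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def genes_expressed_in(term: str, edges: Iterable[Edge], annotations: Dict[str, str]) -> List[str]:
    """Genes annotated to the term, to any of its subtypes, or to any of their parts."""
    terms = subtypes_and_parts(term, list(edges))
    return sorted(gene for gene, annotated in annotations.items() if annotated in terms)


lung_genes = genes_expressed_in("UBERON:0002048", EDGES, ANNOTATIONS)
```

A graph traversal of this kind only approximates what a reasoner provides, but it conveys why annotations to the mouse 'lung alveolus' are returned for a query on the generic 'lung'.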
We have included over 17,000 connections between Uberon and scAO classes, derived through a combination of lexical matching, reasoning and manual curation (see Materials and methods). These connections are available in two different ways. In the main ontology, they are present as semantics-free cross-references ('xrefs' in OBO format). In addition, they are available as logical axioms distributed in separate bridging ontologies. These bridging axioms are imported together with the main ontology plus the relevant anatomical ontologies by means of taxonomically scoped 'collection' ontologies such as:
collected-metazoa.owl
collected-vertebrate.owl
collected-mammal.owl
The import hierarchy for each of these collection ontologies is illustrated in Figure . Each collector ontology imports the core ontology, bridging ontologies, and the individual species anatomy ontologies. The bridging ontologies contain either SubClassOf or EquivalentClasses axioms connecting the generic Uberon class to a taxonomic subtype or equivalent. For example, the mouse class 'lung' (MA:0000415) is declared equivalent to an Uberon class 'lung' (UBERON:0002048) that is part_of a mouse (NCBITaxon:10088).
As a general rule, we only include classes in Uberon where there is a need to generalize over existing ontology classes. Thus, 'lung' is present in Uberon as the super-class of the corresponding classes in MA, FMA, and Amphibian Anatomy Ontology (AAO). In some cases, there is the need to generalize a class from a single source ontology when it is relevant to multiple taxa, for example, 'brainstem nucleus' as found in the FMA. However, we do not include 'Weberian apparatus' because this structure is not found outside Otophysi, and this clade is already within the scope covered by the multi-species TAO. There would be no value in including this in the core ontology, as the class would be equivalent to the TAO class. Note that the combined vertebrate module that imports TAO would include the Weberian apparatus as part of the pan-vertebrate vertebral column.
One advantage to this taxon modularization approach is that it is relatively easy to include new AOs as they become available, and moreover, to seed them directly from existing applicable Uberon classes. For example, the currently in-preparation archosaur and chicken ontologies will be made interoperable with Uberon as per Figure . Bridging axioms will be created to these AOs, and a derived amniote ontology would include the union of the taxon-restricted amniote portion of Uberon, and the archosaur and chicken AOs.
Multi-species composite ontologies
The combined modules above allow for reasoning and queries involving classes from multiple ontologies, but the resulting ontology structure can pose problems for ontology search and navigation, due to the presence of multiple named classes for each taxonomic variant of a structure. For example, when collected-vertebrate is loaded into an ontology visualization environment, the midbrain is visible at least four times, once each for mouse, human and zebrafish, and once for the generic vertebrate layer. The parts of the midbrain are also represented using a different class in each species, resulting in an ontology structure that is difficult to navigate because of the duplication of labels and a complex lattice of multiple inheritance. In addition, query efficiency and reasoner classification time may be adversely affected by the proliferation of classes. To avoid these problems we provide 'composite' ontologies, in which the taxonomic equivalents are automatically merged into the generic Uberon class. If a class has no taxonomic equivalent in Uberon, we do not merge it, placing it at the appropriate place in the ontology. For example, in the composite-vertebrate anatomy file, the multiple scAO classes for 'midbrain' have been merged into a single Uberon class, representing the pan-vertebrate structure. This ontology also includes classes not in the Uberon namespace, such as 'torus longitudinalis', which is represented by the zebrafish anatomy class ZFA:0001360 and is linked to the generic 'midbrain' class via part_of relationships.
Each model organism anatomy contains relationships that cannot be guaranteed to apply outside that taxon. For example, the XAO includes an axiom that the parathyroid develops from the '3rd pharyngeal arch' (called '1st branchial arch' in XAO), but this cannot be generalized to all species with a parathyroid (Uberon includes a weaker axiom that states that all parathyroids develop from some pharyngeal arch, where the particular arch is not specified). When we merge the species class into the generic class we render these axioms safe by translating them into OWL General Class Inclusion (GCI) axioms. The composite vertebrate ontology contains the following axiom:
'parathyroid gland'(UBERON:0001132) and part_of some 'Xenopus'(NCBITaxon:8353) SubClassOf develops_from some 'pharyngeal arch 3'(UBERON:0003114)
That is, every parathyroid found in an instance of Xenopus developed from a third arch. This does not imply that a human parathyroid develops from the same arch.
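The translation of a species-specific axiom into a taxon-scoped GCI can be sketched as a rewrite over a table of taxonomic equivalents. The Python below is an illustration under stated assumptions: the XAO identifiers are made-up placeholders (the real XAO IDs are not given in the text), and the function names and the rendered string format are hypothetical rather than the actual merge procedure.

```python
from typing import Dict, List, Tuple

SpeciesAxiom = Tuple[str, str, str]   # (species class, relation, species class)
Gci = Tuple[str, str, str, str]       # (generic class, taxon, relation, generic class)

# Placeholder XAO identifiers, used for illustration only.
XAO_AXIOMS: List[SpeciesAxiom] = [
    ("XAO:parathyroid", "develops_from", "XAO:arch3"),
]

# Species class -> (generic Uberon class, taxon in which the equivalence holds).
EQUIVALENTS: Dict[str, Tuple[str, str]] = {
    "XAO:parathyroid": ("UBERON:0001132", "NCBITaxon:8353"),
    "XAO:arch3": ("UBERON:0003114", "NCBITaxon:8353"),
}


def merge_species_axioms(
    axioms: List[SpeciesAxiom], equivalents: Dict[str, Tuple[str, str]]
) -> List[Gci]:
    """Rewrite species axioms as taxon-scoped GCIs over the generic classes."""
    gcis: List[Gci] = []
    for subject, relation, target in axioms:
        if subject not in equivalents or target not in equivalents:
            # Classes without a generic equivalent are left unmerged.
            continue
        generic_subject, taxon = equivalents[subject]
        generic_target, _ = equivalents[target]
        gcis.append((generic_subject, taxon, relation, generic_target))
    return gcis


def render(gci: Gci) -> str:
    """Render a GCI in a Manchester-Syntax-like string without labels."""
    subject, taxon, relation, target = gci
    return f"{subject} and part_of some {taxon} SubClassOf {relation} some {target}"


composite_gcis = [render(gci) for gci in merge_species_axioms(XAO_AXIOMS, EQUIVALENTS)]
```

Because the left-hand side of each rendered axiom is restricted to the taxon, nothing follows about parathyroids in other species.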
Spatial and topological relationships
The ontology includes a rich set of spatial relationships; for example, every 'cranial nerve II' is continuous_with some 'retina', and every 'nerve fiber layer of the retina' is adjacent_to some 'inner limiting layer of the retina'. These can be used to enhance gene expression or phenotype queries, allowing the user to expand the query to include overlapping, continuous or adjacent regions. As well as being useful for end-user queries, many of these relations are vital for defining classes; for example, the interdigital regions between digits in human and mouse are defined by which digits they are adjacent to (see example in Table ). We also include a subset of the relations defined in the spatial ontology (BSPO), such as anterior_to. Of these spatial relations, the most widely used are in_left_side_of and in_right_side_of, which are used to define the lateral halves of bilaterally symmetric or paired structures. For example, the left lobe of the thyroid gland is defined as a 'lobe of thyroid gland' that is in_left_side_of some 'thyroid gland'. The class 'left kidney' is defined as a kidney that is part_of some 'left side of organism', which is itself defined using the in_left_side_of relation. A full list of all relations is provided in Additional file 2.
Life cycle stages
Uberon also includes a small sub-hierarchy of 29 life cycle stages (seeded from the stage ontology in the upper-level Bilaterian Ontology BILA), connected via is_a, part_of and preceded_by relations. Many of these stages are linked to and defined by a GO process (for example, the 'neurula stage' is linked to the GO process 'neurulation' via the coincides_with relation). There are relationships between anatomical entities and stages (for example, 'extra-embryonic structure' starts and ends during 'embryo stage'). Uberon stages subsume those of scAOs; for example, Uberon:'larva stage' would subsume the zebrafish stages 'larval:protruding mouth (72 hrs-96 hrs)' through 'larval:days 21-29'. Many temporal relations are required for all possible combinations of connections between stages, processes and anatomical entities; these are in the process of being formally defined (F Neuhaus, A Ruttenberg, and D Osumi-Sutherland, personal communication). See Additional file 2 for a description of these relations. Note that these links between anatomical structures, stages, and biological processes are not fully implemented and are intended as a first step towards temporal reasoning across developmental structures. At this time, these relations are coarse-grained; that is, we do not attempt to subsume individual Theiler and Carnegie stages [29].
Inter-ontology relationships
We have included relationships and other logical axioms that reference other ontologies in Uberon, such as the GO, the Neuro Behavior Ontology (NBO) [30], the CL [15], the Protein Ontology [31] and CHEBI [32].
For connections between anatomical structures and GO or NBO, we use the capable_of and has_function_in relationships [33]; for example, 'parathyroid gland' capable_of 'parathyroid hormone secretion'. For connecting to CL, we use has_part to indicate the cellular composition of different organ parts and tissues. In the future we may use a more specific relation such as has_granular_part.
Note that all inter-ontology relationships are excluded from the main ontology, but are included in a merged ontology that also includes subsets of the external ontologies referenced together with the graph closure of all referenced classes. The merged ontology is available at [34].
One of the uses of the merged ontology is enhancing similarity-based queries and link-mining analyses. Without the use of these inter-ontology axioms, a gene that is implicated in 'ataxia' would show little ontological similarity with a gene implicated in 'abnormal cerebellar morphology'; however, if there is a link between the cerebellum and the behavior 'gait', then a path can be established between these two phenotypes.
Managing taxonomic variation
One of the main challenges involved in developing any multi-species ontology (and, in many cases, single-species ontologies) is accommodating organism variation. In a 'canonical' human anatomy ontology we can assert axioms such as ('mammary gland' SubClassOf part_of some 'female thoracic region'), but this is false for many non-human mammalian mammary glands (and, in rare cases, some human mammary glands). We accommodate this variation by making the generic 'mammary gland' class location-neutral, and then introducing subclasses for each location in which this gland can appear, for example, 'thoracic mammary gland', 'abdominal mammary gland', and so on. Note that we assign the FMA class 'lactiferous gland' as the taxonomic equivalent of 'thoracic mammary gland', rather than the more general 'mammary gland', because most human mammary glands are part of the thoracic region. We call this the named subclass approach to variation.
In some cases this scheme can lead to inflation in the number of ontology classes, leading to unwieldy multiple inheritance. For example, the adenohypophysis has different developmental origins in different species: while in most basal fish and tetrapods the adenohypophyseal anlage invaginates to form Rathke's pouch, in teleost fish the adenohypophyseal placode does not invaginate but rather maintains its initial organization, forming a solid structure in the head [35]. If we were to use the named subclass scheme, we would introduce a class 'Rathkes pouch-derived adenohypophysis', but if we were to do this for all developmental variation, the results would be awkward and unnatural for end-users. Instead we take a different approach and create an OWL GCI axiom:
'adenohypophysis'(UBERON:0002196) and part_of some 'Tetrapoda'(NCBITaxon:32523) SubClassOf develops_from some 'Rathkes pouch'(UBERON:0006377)
The GCI approach accommodates taxonomic variation without inflating the ontology, at the expense of requiring OWL-aware tools to properly interpret the ontology. Note that these are similar to the GCIs that are created automatically when making the composite multi-species ontologies (see preceding section). The difference is that these are created manually, and encompass a wider variety of taxa. Generalizing developmental relationships across taxa can be controversial; there may be exceptions to the above rule within tetrapods, in which case we would replace 'Tetrapoda' with the appropriate taxon or set of taxa.
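One way to see what a taxon-scoped GCI contributes is to ask which relationships it licenses for a structure found in a given species. The Python sketch below does this with a heavily simplified taxonomy; the lineage table (which skips most ranks), the use of taxon labels rather than NCBITaxon identifiers, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Simplified, illustrative lineage: child taxon -> parent taxon.
TAXON_PARENT: Dict[str, str] = {
    "Homo sapiens": "Mammalia",
    "Mammalia": "Tetrapoda",
    "Tetrapoda": "Vertebrata",
    "Danio rerio": "Teleostei",
    "Teleostei": "Vertebrata",
}

# (class, taxon scope, relation, target class), following the axiom in the text.
GCIS: List[Tuple[str, str, str, str]] = [
    ("UBERON:0002196", "Tetrapoda", "develops_from", "UBERON:0006377"),
]


def in_taxon(taxon: Optional[str], ancestor: str, parents: Dict[str, str]) -> bool:
    """True if taxon equals ancestor or descends from it."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = parents.get(taxon)
    return False


def implied_relationships(
    cls: str, taxon: str, gcis: List[Tuple[str, str, str, str]], parents: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Relationships that the GCIs imply for instances of cls found in taxon."""
    return [
        (relation, target)
        for subject, scope, relation, target in gcis
        if subject == cls and in_taxon(taxon, scope, parents)
    ]


human = implied_relationships("UBERON:0002196", "Homo sapiens", GCIS, TAXON_PARENT)
zebrafish = implied_relationships("UBERON:0002196", "Danio rerio", GCIS, TAXON_PARENT)
```

Narrowing the scope of a GCI, as suggested for exceptions within tetrapods, would correspond here to changing the taxon in the second field of the tuple.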
Automation of ontology maintenance via logical axioms
In addition to simple relationships connecting classes, we have enhanced the ontology with a wide range of additional logical axioms. These primarily fall into three categories, examples of which are shown in Table : computable definitions, disjointness axioms and taxonomic constraints.
These axioms are intended primarily to assist with automated maintenance, quality control and classification of the ontology. This is particularly important for Uberon, which must remain in sync and consistent with multiple other ontologies.
Computable definitions
Over one-third of the classes in Uberon have computable definitions, encoded as equivalence axioms between a named class and an intersection of two or more class expressions. These definitions allow a reasoner to automatically compute subsumption relationships between classes; for example, 'epiphysis of finger' can be automatically classified as a subtype of 'epiphysis of digit'. Asserting these manually would take considerable curator resources, and would be error-prone. The use of computable definitions in Uberon aids maintenance and can reveal potentially missing classes in the scAOs.
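The classification step can be illustrated with a much-reduced stand-in for a reasoner. The Python sketch below assumes that each computable definition has the shape "genus and part_of some whole" and that 'finger' is asserted to be a kind of 'digit'; both the definitions and the subsumption rule are simplifying assumptions for the example, not the axioms as they appear in Uberon.

```python
from typing import Dict, Set, Tuple

# Asserted is_a links between the classes used in the definitions (assumed).
IS_A: Dict[str, Set[str]] = {"finger": {"digit"}}

# Assumed definitions: named class == genus and part_of some whole.
DEFINITIONS: Dict[str, Tuple[str, str]] = {
    "epiphysis of digit": ("epiphysis", "digit"),
    "epiphysis of finger": ("epiphysis", "finger"),
}


def ancestors_or_self(cls: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """Reflexive, transitive closure of is_a starting from cls."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_superclasses(
    definitions: Dict[str, Tuple[str, str]], is_a: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """A defined class is subsumed by another if its genus and its whole are both subsumed."""
    inferred: Dict[str, Set[str]] = {name: set() for name in definitions}
    for child, (child_genus, child_whole) in definitions.items():
        for parent, (parent_genus, parent_whole) in definitions.items():
            if child == parent:
                continue
            genus_ok = parent_genus in ancestors_or_self(child_genus, is_a)
            whole_ok = parent_whole in ancestors_or_self(child_whole, is_a)
            if genus_ok and whole_ok:
                inferred[child].add(parent)
    return inferred


classification = infer_superclasses(DEFINITIONS, IS_A)
```

A full OWL reasoner handles far more general class expressions; the point of the sketch is only that the subclass link is derived from the definitions rather than asserted by a curator.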
Disjointness axioms
If two classes are declared disjoint, it means that nothing can be an instance of both. If a class is inferred to be a subclass of two disjoint classes, the reasoner will flag it as unsatisfiable; this is a useful tool for detecting mistakes in the ontology, particularly in the context of an ontology that attempts to unify multiple other ontologies. We have created 410 disjointness constraints between classes in the ontology. In addition, we have created 751 spatial disjointness axioms in the ontology. For example, the brain and the spinal cord share no parts, and the central and peripheral nervous systems share no parts, though there may be some structures that overlap both, such as axon tracts. Uberon uses a standard mereological definition of parthood, such that if A is part_of B, then every part of A is part_of B. If A overlaps B, then A and B share some part in common. Many of these axioms in the neural portion of Uberon were derived from the Allen Brain Atlas [36], and have proved useful in fixing problems with the ontology and individual species ontologies.
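A spatial disjointness check of this kind can be sketched by taking the transitive closure of part_of and looking for any class that ends up inside two structures declared to share no parts. In the Python below, the 'misplaced structure' class and the part_of table are hypothetical, and the function names are illustrative; a reasoner would report such a class as unsatisfiable rather than return a list.

```python
from typing import Dict, List, Set, Tuple

# Illustrative part_of assertions; 'misplaced structure' is a deliberate error.
PART_OF: Dict[str, Set[str]] = {
    "cerebellum": {"brain"},
    "brain": {"central nervous system"},
    "spinal cord": {"central nervous system"},
    "misplaced structure": {"cerebellum", "spinal cord"},
}

# Pairs of structures declared to share no parts, as in the example in the text.
SHARE_NO_PARTS: List[Tuple[str, str]] = [("brain", "spinal cord")]


def all_wholes(cls: str, part_of: Dict[str, Set[str]]) -> Set[str]:
    """Everything cls is part of, using the transitivity of part_of."""
    seen: Set[str] = set()
    stack = list(part_of.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(part_of.get(current, ()))
    return seen


def shared_part_violations(
    part_of: Dict[str, Set[str]], share_no_parts: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Classes that are part of both members of a pair declared to share no parts."""
    violations = []
    for cls in sorted(part_of):
        wholes = all_wholes(cls, part_of)
        for first, second in share_no_parts:
            if first in wholes and second in wholes:
                violations.append((cls, first, second))
    return violations


violations = shared_part_violations(PART_OF, SHARE_NO_PARTS)
```

Note that a structure which merely overlaps both the brain and the spinal cord, such as an axon tract, would not be recorded with part_of links to both and so would not be flagged.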
Taxonomic constraints
We have adopted the GO system of taxonomic constraints [37], and added 216 only_in_taxon and never_in_taxon constraints to the ontology. These constraints are useful documentation for human users of the ontology, but their primary purpose is for automated consistency checking within the ontology and across ontologies. For example, if the FBbt class 'tibia' (FBbt:00004642), which represents a segment of an insect leg, were to be accidentally placed as a subclass or equivalent of 'tibia' (UBERON:0000979) based on the fact they share the same label, then a reasoner would infer that this class is formally unsatisfiable based on the three statements: (1) UBERON 'tibia' SubClassOf bone; (2) bones are never found in organisms that are not vertebrates; and (3) FBbt:00004642 can be found in D. melanogaster (Figure ). In addition to automatic error-checking, these constraints can be used to create taxon-specific sub-modules of the entire ontology as described above (see Materials and methods). For example, if the scope of interest of a particular application is limited to Aves, then we can generate a sub-module that excludes structures such as fins, teeth and mammal-specific brain structures.
We provide some pre-generated taxon subsets as part of the release process, including a basic-amniote subset and a basic-aves subset.
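The 'tibia' example can be traced with a small consistency check over only_in_taxon constraints. The Python sketch below is an illustration under stated assumptions: the taxonomy is a simplified lineage that skips most ranks, 'mouse tibia' is a placeholder label rather than a real MA identifier, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Simplified, illustrative lineage: child taxon -> parent taxon.
TAXON_PARENT: Dict[str, str] = {
    "Drosophila melanogaster": "Insecta",
    "Insecta": "Arthropoda",
    "Arthropoda": "Metazoa",
    "Mus musculus": "Mammalia",
    "Mammalia": "Vertebrata",
    "Vertebrata": "Metazoa",
}

# The first link is the accidental label-based placement described in the text.
IS_A: Dict[str, List[str]] = {
    "FBbt:00004642": ["UBERON:0000979"],
    "mouse tibia": ["UBERON:0000979"],
    "UBERON:0000979": ["bone"],
}

ONLY_IN_TAXON: Dict[str, str] = {"bone": "Vertebrata"}


def taxon_lineage(taxon: str, parents: Dict[str, str]) -> List[str]:
    """The taxon followed by all of its ancestors."""
    lineage = [taxon]
    while lineage[-1] in parents:
        lineage.append(parents[lineage[-1]])
    return lineage


def superclasses(cls: str, is_a: Dict[str, List[str]]) -> List[str]:
    """All asserted and transitively implied superclasses of cls."""
    found: List[str] = []
    stack = list(is_a.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(is_a.get(current, ()))
    return found


def taxon_violations(
    cls: str,
    taxon: str,
    is_a: Dict[str, List[str]],
    only_in_taxon: Dict[str, str],
    parents: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Inherited only_in_taxon constraints that the given taxon does not satisfy."""
    lineage = taxon_lineage(taxon, parents)
    return [
        (superclass, only_in_taxon[superclass])
        for superclass in superclasses(cls, is_a)
        if superclass in only_in_taxon and only_in_taxon[superclass] not in lineage
    ]


fly = taxon_violations("FBbt:00004642", "Drosophila melanogaster", IS_A, ONLY_IN_TAXON, TAXON_PARENT)
mouse = taxon_violations("mouse tibia", "Mus musculus", IS_A, ONLY_IN_TAXON, TAXON_PARENT)
```

The same test, applied to every class against a target taxon such as Aves, is one plausible way to decide which classes to leave out of a taxon-specific subset.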
Maintenance of cross-ontology links
Uberon connects to multiple other ontologies, particularly other anatomical ontologies. Many of these ontologies are constantly evolving. We perform regular all-by-all lexical matching between all anatomical ontologies (see Materials and methods) to identify potential new connections. However, we never rely entirely on lexical matching; we use the output of lexical matching as suggestions that are manually vetted, sometimes after opening a dialog with the maintainers of the external ontology. The use of disjointness axioms and taxonomic constraints in the ontology also assists in detecting incorrect associations. The equivalence axioms are also used to automatically associate species classes with generic classes.
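An all-by-all lexical match can be sketched as label normalization followed by a pairwise comparison of ontologies. The Python below is a minimal illustration, not the matching procedure described in Materials and methods: the normalization rule and function names are assumptions, and the labels are a tiny sample built from classes named in this paper.

```python
import re
from itertools import combinations
from typing import Dict, List, Tuple

# Tiny illustrative sample: ontology name -> {class ID: label}.
ONTOLOGIES: Dict[str, Dict[str, str]] = {
    "UBERON": {"UBERON:0000979": "tibia", "UBERON:0002048": "lung"},
    "FBbt": {"FBbt:00004642": "tibia"},
    "MA": {"MA:0000415": "Lung"},
}


def normalize(label: str) -> str:
    """Lower-case the label and collapse punctuation and whitespace (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def candidate_matches(ontologies: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Pairs of classes from different ontologies whose normalized labels coincide."""
    candidates = []
    for left, right in combinations(sorted(ontologies), 2):
        by_label: Dict[str, List[str]] = {}
        for class_id, label in ontologies[left].items():
            by_label.setdefault(normalize(label), []).append(class_id)
        for class_id, label in ontologies[right].items():
            key = normalize(label)
            for other_id in by_label.get(key, []):
                candidates.append((other_id, class_id, key))
    return sorted(candidates)


suggestions = candidate_matches(ONTOLOGIES)
```

The sample deliberately produces the insect 'tibia' as a candidate for the vertebrate 'tibia', which shows why the output is treated as a list of suggestions for curators and is further screened by disjointness axioms and taxonomic constraints.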
Provenance of metadata and relationships
Ontologies are constructed using information from multiple sources, including research articles, reviews, textbooks, encyclopedias, medical dictionaries and discussions with experts. It is important to track the provenance of all information collected in an ontology, and this is particularly important for an ontology such as Uberon, which as a matter of expedience frequently includes 'tertiary sources' such as Wikipedia, and other ontologies.
We attempt to include provenance identifiers for all definitions, synonyms and relationships. In each scenario, the item of provenance is an identifier that refers to an external source, such as a PubMed identifier or an ontology identifier. Multiple cross-references can be added to any piece of information. We include comprehensive Wikipedia cross-references, even where we have chosen to supplant the Wikipedia summary with our own definition. These can be used to build web pages that combine the structured information from the ontology with the text from Wikipedia. Of the 4,692 definitions in Uberon, 2,293 have an association with a Wikipedia page. Two of the most frequently used resources are the Mammalian Phenotype Ontology (MPO; 379) and the GO (324); both of these ontologies include a detailed implicit ontology of anatomical structures. There are 190 classes that take definitions from the FMA, but in many cases these are generalized to be applicable to non-humans. We are gradually refining definitions directly using the literature and expert review; at this time 100 definitions reference a PubMed ID or a standard textbook, though many of the definitions that cite an ontology term ID are indirectly citing a primary source.
We attempt to provide provenance for each synonym. For example, a synonym for the class 'cortex of kidney' (UBERON:0001125) is 'cortex renalis', which is marked as being used in Terminologia Anatomica and FMA:15581 (the Terminologia Anatomica synonyms are almost all derived from FMA). The FMA is the most commonly used source of external synonyms (4,133). In some cases, use of synonyms is contradictory; here we mark them as such and indicate the source of the synonym. An example of an inconsistent synonym is 'arm': the MA (and Uberon) use this to mean the part of the forelimb that includes stylopod and zeugopod, whereas in FMA it means just the stylopod region of the forelimb. The situation is analogous for 'leg'.
We also attempt to provide provenance on a per-relationship basis. This is particularly important for developmental relationships, which may not be straightforward to determine within a species and are even more difficult to generalize across species. In the future, we aim to provide evidence types as well as links to the source of the information, akin to GO annotations. Many of the relationships in Uberon have been sourced from other ontologies, but in most cases these have been checked to ensure they are applicable at the broader taxonomic level.
Use of Uberon enhances queries in single organisms
One of the original motivations for the creation of the ontology was to integrate datasets from different species. More recently, we have found that the use of Uberon can enhance query capabilities within a single species. For example, neither the FMA nor the MA has developmental relationships, so we cannot query for all pharyngeal arch-derived structures using these ontologies alone. However, using either one of these ontologies in combination with Uberon and the appropriate bridging ontology, we can perform a description logic query to find all pharyngeal arch-derived structures (such as the human premaxilla and the mouse palatal shelf epithelium). See Additional files 3 and 4 for a full list of structures.
An integrated anatomy ontology enables modular ontology construction
One of the main motivating factors for a multi-species anatomy ontology is the modular construction of other ontologies. For example, the environment ontology (ENVO) needs to include a number of organism-associated habitats, ranging from the gut of a termite to a human armpit. Similarly, GO and CL [15] classes such as 'blood vessel development' or 'cerebellar granule cell' are applicable across multiple species, and need to be defined in terms of generic anatomical classes rather than species-specific classes. These ontologies have traditionally included an implicit embedded anatomical ontology, but this leads to redundancy and is error-prone.
The GO classes can be made to explicitly refer to a generic anatomical type from Uberon to provide computable definitions (in a previous work we described the modular construction of the GO [38]). These include, for example, 'hepatic immune response', defined as being EquivalentTo 'immune response' and occurs_in some 'liver'. Conversely, GO is used to define structures in Uberon by the function they carry out; for example, 'parathyroid gland' capable_of 'parathyroid hormone secretion'. We regularly use these logical definitions to perform automated reasoning to find missing links in the GO. When doing this reasoning, we occasionally find some inconsistency between different ontologies that would be expected to conform. For example, we discovered inconsistencies between the GO and various scAOs in the treatment of the term 'gut'. The GO was therefore restructured to use the term consistently with Uberon. On manually resolving these inconsistencies between the existing relations and the relations implied by the embedded definitions, one or both ontologies are improved. We have now provided 1,473 logical definitions for GO classes using Uberon. These are supplemented by additional logical definitions for clade-specific classes outside the scope of Uberon, for which we use ZFA, FBbt and Plant Structure Ontology.
Similarly, the CL is applicable across species and refers to generic gross anatomical types for many of its location-specific classes. For example, 'splenic red pulp macrophage' refers to the macrophages within the gross anatomical structure 'splenic red pulp'. These location-specific classes require an ontology of anatomical structures, such as Uberon, to construct computable definitions. This augmentation is underway, and CL is adding computable definitions using Uberon and other OBO ontologies [33] using the capable_of relationship and the has_function_in relationships [33]. Conversely, the CL is used in Uberon to indicate the composition of tissues and organs, primarily through the has_part relation. Note that the CL and the extended version of the GO both provide links to Uberon, but these are not redundant. The majority of links from the GO to Uberon are in the development hierarchy, whereas links in the reverse direction typically connect organs to the functions they perform.