The need for standardizing biomedical vocabulary is not recent. As long ago as the 17th century, health authorities in London used a standard list of about 200 causes of death (later integrated into the International Classification of Diseases) to compile accurate health statistics known as the Bills of Mortality [1]. In addition to terms, scientists such as Linnaeus started formalizing the relations among biological entities, in order to represent and share their knowledge of the world [2].
The last decade has seen a marked increase in the number of artifacts created for representing biomedical entities, their terms and their relations, often referred to as vocabularies, terminologies and ontologies. As shown in Figure 1, the number of citations on ontologies and controlled vocabularies in the PubMed/MEDLINE database has grown by 600% to about 1200 per year. While some authors have proposed definitions for these artifacts [3] and attempted to characterize the distinctions among them [5], in practice, these names are often used interchangeably. This phenomenon is reflected in part by the fact that 5–10% of the PubMed/MEDLINE citations indexed under the MeSH descriptor "Vocabulary, Controlled" also contain the word "ontology" (dark section of the histogram in Figure 1). For the sake of simplicity, we henceforth refer to these various types of artifacts as ontologies. Another interesting trend in the past decade is the change in the relative importance of these ontologies, as measured by the number of mentions in PubMed/MEDLINE citations. As shown in Figure 2, the Gene Ontology (GO) has become the most cited ontology, with over 450 citations per year. In contrast, the relative footprint of the Unified Medical Language System (UMLS) seems smaller now than ten years ago, although its number of citations has remained essentially constant throughout the decade.
Figure 1. Evolution of the number of citations in PubMed/MEDLINE on ontologies and controlled vocabularies over the past 10 years (excluding DSM, the Diagnostic and Statistical Manual of Mental Disorders).
Figure 2. Evolution of the proportion of citations in PubMed/MEDLINE by ontology.
A number of recent reviews have presented the major biomedical ontologies to various audiences, most often with an emphasis on their design and structural characteristics, mentioning their use only in passing [6]. Other reviews have presented the role played by several biomedical ontologies in specific applications, such as clinical decision support [12] and discovery applications [13], or in a specific domain, such as bioinformatics [14]. In contrast, one review provides a functional perspective on biomedical ontologies [15]. The interested reader is referred to these reviews for more information about these ontologies.
In the present survey, we analyze some of the high-impact biomedical ontologies presented in [7] through the functional lens of [15], classifying their roles (somewhat arbitrarily) into three major categories: knowledge management, including the indexing and retrieval of data and information; data integration, exchange and semantic interoperability; and decision support and reasoning. The three categories, however, are not mutually exclusive, and we examine the various roles played by each ontology. For example, LOINC is used as a source of standard vocabulary for retrieval purposes [16], for the integration and exchange of laboratory data [17], and for "reliable execution of decision logic in clinical decision support systems" [12]. More generally, reference ontologies are designed independently of any particular application and are expected to be useful in a variety of tasks [19].
The ontologies under investigation in this survey include SNOMED CT, a comprehensive concept system for healthcare [21]; the Logical Observation Identifiers, Names, and Codes (LOINC), a vocabulary for laboratory tests and clinical observations [24]; the Foundational Model of Anatomy (FMA), a domain ontology of structural human anatomy [20]; the Gene Ontology, a controlled vocabulary for the functional annotation of gene products across species [28]; RxNorm, a controlled vocabulary of normalized names and codes for clinical drugs [31]; the National Cancer Institute Thesaurus, a public domain terminology that provides broad coverage of the cancer domain [34]; the International Classification of Diseases, the 115-year-old medical terminology, now part of a family of health classifications [37]; the Medical Subject Headings (MeSH), a controlled vocabulary for the indexing and retrieval of the biomedical literature [39]; and the Unified Medical Language System (UMLS), a terminology integration system in which all the above ontologies are integrated (with the exception of the FMA, which is soon to be integrated) [41]. Some characteristics of these ontologies (based on information present in the UMLS) are shown in Table 1, including scope, number of entities, distribution of the number of terms per entity, and existence of a subsumption hierarchy.
Table 1. Characteristics of some biomedical ontologies (including scope, number of entities, distribution of the number of terms per entity [minimum, maximum, median and average], and existence of a subsumption hierarchy), based on information present in the UMLS.
Due to the large number of publications on the subject, this review will necessarily be superficial. Its objective is to provide not an exhaustive list of references, but rather examples, hopefully typical, of biomedical ontologies in action. In order to reflect the state of the art, the selection of references is also somewhat biased towards recent journal articles.