|Home | About | Journals | Submit | Contact Us | Français|
The development of standard terminologies such as RadLex is becoming important in radiology applications, such as structured reporting, teaching file authoring, report indexing, and text mining. The development and maintenance of these terminologies are challenging, however, because there are few specialized tools to help developers to browse, visualize, and edit large taxonomies. Protégé (http://protege.stanford.edu) is an open-source tool that allows developers to create and to manage terminologies and ontologies. It is more than a terminology-editing tool, as it also provides a platform for developers to use the terminologies in end-user applications. There are more than 70,000 registered users of Protégé who are using the system to manage terminologies and ontologies in many different domains. The RadLex project has recently adopted Protégé for managing its radiology terminology. Protégé provides several features particularly useful to managing radiology terminologies: an intuitive graphical user interface for navigating large taxonomies, visualization components for viewing complex term relationships, and a programming interface so developers can create terminology-driven radiology applications. In addition, Protégé has an extensible plug-in architecture, and its large user community has contributed a rich library of components and extensions that provide much additional useful functionalities. In this report, we describe Protégé’s features and its particular advantages in the radiology domain in the creation, maintenance, and use of radiology terminology.
The language of biomedicine is rich and diverse, and many domains recognizing the need to standardize the language they use have been developing terminologies and ontologies. Terminologies and ontologies are related. Terminologies (also called “vocabularies” and “lexicons”) are collections of labels or terms (or “entities”) particular to a subject field or domain of human activity, developed for the purpose of documenting and promoting correct usage. Ontologies are similar to terminologies, but generally contain rich knowledge about their terms (through attributes and relations) with the goal of describing the entities that exist in a domain. Many terminologies and ontologies have been appearing throughout biomedicine, including genomics,1 molecular biology,2 anatomy,3 clinical research,4 and clinical care.5
The radiology community has also recently recognized the need for developing a standardized terminology. Hospitals contain a diversity of computerized information systems, and they often refer to the same procedures, findings, and diagnoses using different terminologies. The lack of standards in terminology conventions creates a barrier for comparing studies across enterprises and among health organizations, and it also hampers research because it is difficult to retrieve indexed case material in a consistent manner.
RadLex is a project to create a comprehensive medical imaging terminology for capturing, indexing, and retrieving a variety of radiology information resources.6 RadLex is being developed by acquiring the terms relevant to the radiology subdisciplines (abdominal, cardiovascular, musculoskeletal, pediatric, thoracic, and neuroradiology). The ultimate goal is to distribute RadLex widely in electronic form so that it can be incorporated into applications such as structured reporting, teaching file authoring, and indexing research data.
RadLex was initially developed using word processors and spreadsheets. However, it soon became evident that these tools were too limited for accessing and managing RadLex. First, as RadLex is a large taxonomy, it is difficult to visualize and navigate RadLex using conventional tools such as word processors and spreadsheets. Second, RadLex contains many attributes associated with each term, and it is difficult to maintain this information in correct structured form as RadLex grows. Third, although text documents and spreadsheets were a simple initial format for collecting the RadLex terms, they are not suitable for distributing RadLex on the Web or for making it accessible to applications. A tool tailored to the needs of terminology development and dissemination in radiology is needed.
Protégé (http://protege.stanford.edu)7 is an open-source tool for editing and managing terminologies and ontologies. It is the most widely used domain-independent, freely available, platform-independent technology for developing and managing terminologies, ontologies, and knowledge bases in a broad range of application domains. The community of registered Protégé users exceeds 70,000. In addition, the large and active Protégé user community is highly engaged in Protégé code development, regularly contributing enhancements to the software (http://protege.stanford.edu/community/wiki.html), as well as participating in online discussion groups devoted to modeling questions, technical-support issues, and requests for new features (http://protege.stanford.edu/community/archives.html).
Protégé has been used as the primary development environment for many ontologies in the life sciences. These projects include the Foundational Model of Anatomy,3 Cerner’s Clinical Bioinformatics ontology,8 the DICE TS,5 and the MGED Ontology.2 Recently, the RadLex project decided to adopt Protégé, and RadLex is currently distributed in Protégé XML format (http://radlex.org/radlex/docs/downloads.html).
In this report, we introduce Protégé and highlight its value to managing terminology in the radiology domain. Besides providing a suite of functionality that helps curators edit and manage terminologies such as RadLex, Protégé provides an extensible architecture to which custom functionality specific to radiology needs can be added, and it also provides a platform for deploying radiology applications that exploit terminologies such as RadLex.
Protégé is a suite of tools for ontology development and use. Before describing the tool itself, it is important to understand how terminologies and ontologies are built and stored in computers. Briefly, terminologies comprise lists of terms (also called “entities”), and they may also specify attributes (also called “slots”) for those terms (for example, RadLex contains a term called supraaortic valve area, which has an attribute Version Number). Ontologies are similar to terminologies, but contain rich relationships amongst terms, enabling them to represent knowledge in a domain (for example, a relationship SegmentOf linking supraaortic valve area to thoracic arota represents the fact that the supraaortic valve area is a segment of the thoracic aorta).
In the Protégé knowledge model, terminologies and ontologies are represented using “frames” (classes, slots, and facets).9 An ontology in Protégé consists of frames and axioms. Classes are the entities (“terms”) used in the domain of discourse. Classes are the sometimes-called “concepts” in terminologies. For example, if we wish to have a term in our RadLex vocabulary representing the supraaortic valve area, we would create the class supraaortic valve area in the Protégé ontology (Fig. 1).
Slots describe properties or attributes of classes. For example, if we wanted all RadLex terms in our ontology to have a string unique identifier called “Name,” which had a nonempty value, then we would create a Name slot and give it the necessary facets (Fig. 2). Facets describe characteristics of slots (such as cardinality, required values, etc). Facets allow us to express the fact that the name of a RadLex term class is required (Fig. 2). Using a set of slots, we can fully describe each RadLex term. For example, we can express the fact that the supraaortic valve area is a segment of the thoracic aorta, that “root of aorta” is a synonym, and that it has a definition and a RadLex identifier of “RID482” (Fig. 1).
Axioms specify additional constraints, but will not be described further here as they are not used in RadLex at this time. An instance is a frame built from at least one class (“instantiation”) that carries particular values for the slots. A “knowledge base” includes the ontology (classes, slots, facets, and axioms) as well as instances of particular classes with specific values for slots. The distinction between classes and instances is not an absolute one; however, because terminologies generally do not contain instances, we can simplify and limit our discussion to classes.
Classes, slots, and facets are the basic building blocks of terminologies. A tutorial on using these elements and Protégé for creating terminologies and ontologies is available10 and will not be repeated here. Instead, we will focus on the features of Protégé that are particularly relevant to accessing, editing, and sharing ontologies and using them in applications.
Protégé is implemented in Java and runs on a broad range of software platforms, including Windows, MacOS, Linux, and Unix. Protégé is built using a tiered architecture (Fig. 3), providing an ontology storage layer, a knowledge model layer, a graphical user interface (GUI), and an application programming interface (API). Users running the desktop application interact with ontologies using the GUI, while application programs access ontologies in Protégé via the API. The Protégé GUI uses the API to access the Protégé knowledge model of the ontology. Thus, the Protégé GUI, plug-ins, and user applications all access the ontology through the same API, making the Protégé architecture modular and highly flexible (Fig. 3).
In addition, Protégé has a plug-in architecture, permitting developers to extend Protégé’s core functionality in many ways without needing to modify the Protégé source code. There are more than 90 Protégé plug-ins providing advanced capabilities such as import/export, validation, and visualization of large ontologies (http://protege.cim3.net/cgi-bin/wiki.pl?ProtegePluginsLibraryByTopic), some of which we discuss below.
Import and export of ontology content is a critical feature of any tool, because there are many formats for storing ontologies and terminologies. There are several plug-ins for Protégé, enabling it to import and export ontologies in many different formats, including XML, RDF, OWL, and the native Protégé format.
If the ontology is stored in a file, the entire ontology is read into memory. Protégé also provides a relational database backend, which is useful when working with ontologies too large to reside in memory. Finally, through Protégé’s plug-in mechanism, developers can create their own custom import/export plug-ins to work with custom or specialized formats.
Protégé can be run as a stand-alone application or through a Protégé client in communication with a remote server (the latter is particularly useful for collaborative ontology development). When the Protégé desktop application is launched, the user can create a new ontology, open an existing ontology, or import an ontology from a variety of formats. Protégé creates a project file that records display-related information and the location of the ontology source file; the project file simplifies the process of reopening ontologies, and it maintains user GUI customizations.
Once an ontology is opened, the user works with it in the Protégé GUI (Fig. 4). The Protégé GUI is organized into separate panels (accessed by tabs along the top of the GUI) that provide different views into the contents of the ontology. The first tab is the Classes tab, which is the most common view used to browse ontologies and terminologies. On this tab, the ontology is shown as an expandable tree on the left of the GUI, and the attributes of a class selected by the user is shown on the right (Fig. 4). In the tree view, terms that are indented and below a term are called “child” terms, whereas the term above is the “parent.” The interpretation of a child term is that it is subsumed by the parent by an “is-a” relationship (unless the tree of classes is displayed according to a different slot). For example, Protégé shows that the RadLex term aorta is a child of systemic artery, which is interpreted as “aorta is-a systemic artery” (Fig. 4).
The tree view of RadLex can facilitate the process of curating the terminology to find problems to be corrected, because the various hierarchies can be easily expanded and browsed to detect inconsistencies. In this tree view, for example, we see that the supraaortic value area is a child of thoracic aorta (Fig. 4; “supraaortic value area is-a thoracic aorta”). As it is not correct to say that “supraaortic valve area is-a thoracic aorta,” the RadLex ontology should be changed in a future release (what was likely intended was to say “supraaortic value area part-of thoracic aorta,” which could be reflected by changing the Part-of value).
The classes tab permits full editing of the ontology. In the tree view of the ontology on this tab, users can drag and drop classes to reorganize the hierarchy, create and rename classes using intuitive GUI paradigms. The values for attributes of classes can also be directly edited. For example, we could change the name of the class, left coronary sinus, by simply selecting this class and changing the value of the “Preferred Name” slot in the GUI (Fig. 4, right panel).
The other tabs accessible from the Protégé GUI are the slots, forms, and instances tab, which are less often used by those curating terminologies. The slots tab enables users to view and edit all slots in an ontology; the instances tab displays all instances associated with the classes; and the forms view permits users to customize the layout of the elements on display forms.
Ontologies change over time. They are developed iteratively, with incremental changes contributed by people working collaboratively on them. RadLex will soon be releasing a new version, adding new terms as well as changes to the original terminology based on suggestions from users of the first version. There is a need to maintain and compare versions of ontologies, similar to the need for document version management in large projects. Document versioning systems enable users to compare versions, examine changes, and accept or reject changes. However, a versioning system for ontologies must compare and present structural changes rather than changes in the text of a document.
PROMPT is a plug-in to Protégé providing a suite of tools for aligning and comparing two ontologies. PROMPT contains PROMPTDIFF, a versioning environment for ontologies, allowing developers to track changes in ontologies over time.11,12 PROMPTDIFF is a version-comparison algorithm that produces a structural difference between two versions of an ontology. The results are presented to the users enabling them to view classes that were added, deleted, and moved. The users can then act on the changes, accepting or rejecting them.
Suppose we edited a copy of RadLex to update it based on community feedback. Consider that we move the class supraaortic valve area and place it under the anatomic location class, and rename it to supraaortic area. In addition, suppose we create a new class noncoraonary sinus under supraaortic area, and that we update the Part-of relationship on supraaortic area to say it is Part-of thoracic aorta. When we run PROMPTDIFF, comparing our edited version of RadLex against the original version, the PROMPT interface displays the following differences (Fig. 5):
In addition to viewing versions of ontologies, PROMPT also includes a tool to align two different ontologies to locate classes that are the same or similar to each other. Such functionality could be useful in RadLex for finding related terms in other vocabularies such as SNOMED and the ACR index.
The Protégé GUI provides a tree view of ontologies (Fig. 4). Although an expandable tree is suitable for many ontologies and terminologies, it can be difficult to visualize those that have multiple types of relationships among the terms. A tree shows the relationships among terms using one relationship (usually the is-a relationship). RadLex contains several types of relationships, which are best shown using a graph (Fig. 6). There are several types of ontology visualization components that have been developed for Protégé, two of which we highlight here in the context of RadLex.
OntoViz is a plug-in to Protégé that creates graph visualizations of ontologies (http://protege.cim3.net/cgi-bin/wiki.pl?OntoViz). OntoViz produces its graphs using the Graphviz software package.13 The types of visualizations are highly configurable and include ability to (1) visualize part of an ontology by selecting specific classes and instances, (2) display slots and slot edges, (3) customize the graph by specifying colors for nodes and edges, and (4) visualize the neighborhood of classes for particular classes or instances by applying various operators (e.g., subclasses, superclasses).
TGVizTab is a plug-in for Progégé for visualizing ontologies as a TouchGraph, a software implementation for producing an interactive graph that can be navigated by clicking on the nodes and edges in the graph. The TouchGraph library has been modified and integrated into TGVizTab for visualizing ontologies.14 TGVizTab is well suited for navigating RadLex, because of the size of the terminology: the TouchGraph display provides a compact and graphical way to visualize large numbers of classes (Fig. 7). Unlike OntoViz, the user can dynamically navigate the ontology and interrogate each of its components in real time. This visualization paradigm makes the terminology display simple and intuitive.
A key challenge for maintaining a large terminology such as RadLex is to collect and process the feedback from the large community of users. The current model for feedback on terminologies and ontologies is for users to contact the developers or to post comments to online discussion forums. The problem with this approach is that the user feedback is disconnected from the ontology under discussion; it is difficult for the ontology developer to keep track of these discussions, to prioritize the feedback, and to decide which parts of the ontology need updated first.
Collaborative Protégé15 is a recent extension of the Protégé system that supports collaborative ontology development by enabling the community to annotate ontologies. A set of annotation components can be accessed by people viewing terminologies such as RadLex, enabling them to associate their comments directly with the parts of the ontology to which they apply. Thus, for example, if a user noticed that there is a duplicate term in RadLex, that person could create an annotation on the class in question in which the nature of the problem is described (Fig. 8). At a later time, the RadLex curator can view the parts of the ontology for which users have made annotations, either acting on them or making annotations on the annotations themselves. All annotations can be viewed by the community as well, enabling a dialog on ontologies that is tied to the specific components under discussion. This mechanism can streamline the process of acquiring and reviewing community feedback on ontologies.
The Collaborative Protégé components provide search and filtering of user annotations based on different criteria. There are also two types of voting mechanisms that can be used for voting of change proposals. All annotations made by one user are seen immediately by other users, enabling a community-based discussion about ontologies to be directly linked to the applicable portions of the ontologies.
Despite the variety of plug-ins available for Protégé, many users will have custom needs related to how they work with ontologies, which will require direct programmatic access to the ontology content. For example, a RadLex curator may wish to find the classes that have missing values for slots (currently, many SNOMED and UMLS identifiers have not yet been collected for RadLex terms). To find these classes in RadLex, a program would be needed to traverse all classes in the RadLex ontology, looking for those that have missing values for the slots in question.
Protégé provides two programming interfaces that developers can use to access its ontology content. The Protégé Script Tab is a plug-in to Protégé that provides a scripting environment in several interpreted languages such as python, pearl, and ruby (http://protege.cim3.net/cgi-bin/wiki.pl?ProtegeScriptTab). The Script Tab provides the simplest and quickest programmatic access to ontologies in Protégé; it accessed via an additional tab in the Protégé GUI into which users can enter commands (Fig. 9). The commands will be applied to the ontology currently loaded into Protégé. Thus, in our example above, we could perform the query to find classes having missing values for slots by entering a few lines of code into the Script Tab and viewing the results. An example of the output of a few simple commands entered into the Script Tab is shown in Figure 9.
In addition to the Script Tab, Protégé provides an API. System developers can use the Protégé API, a Java interface to access and programmatically manipulate Protégé ontologies—this is the same API that the Protégé GUI uses to access ontologies (Fig. 3). Developers can access the Protege Java API from their Java programs by calling the appropriate methods to access the ontology in Protégé and to perform the desired operations. Documentation for guiding developers in using the Protégé API for application and plug-in development is available (http://protege.stanford.edu/doc/dev.html).
While Protégé is a useful tool for working with ontologies and terminologies, the ultimate reason to be creating these artifacts is to use them in applications. For example, if one wanted to use RadLex in a structured reporting application to enforce use of controlled terminology, it would be necessary for that application to access terms in the ontology. Protégé can be integrated with many applications, connecting external programs to ontologies via the Protégé API (as described above). In this section, we highlight an example application relevant to RadLex that was created in this manner—WebProtege.
A large terminology such as RadLex needs to be both widely available and locally editable. A common method for distributing a terminology is via the Web; however, once a user downloads the terminology, it will be become out of date as soon as a new version is posted. An alternative method for disseminating RadLex is to provide a Web-based program for viewing the terminology. In fact, RadLex is currently made available through a dedicated Web application (http://www.radlex.org/radlex/). However, because the version of RadLex that is deployed in the Web application is separate from the working draft, it is challenging to keep the development and production versions of RadLex tightly synchronized.
An approach whereby a single version of RadLex could be edited by curators and disseminated on the Web could be advantageous to make a current working draft available. We undertook to implement this strategy by adopting WebProtege, an ontology-enabled application built on the Protégé platform. The goal was to distribute Radlex in a Web application while providing the means to keep it synchronized with the version in development.
WebProtege is an application that allows users to share, browse, and edit ontologies using a Web browser (http://protege.cim3.net/cgi-bin/wiki.pl?WebProtege). The WebProtege application provides a graphical paradigm similar to the Protégé GUI; users can browse ontologies in a Web browser in a similar manner to viewing them in Protégé (Fig. 10). In addition, users can create and post comments about specific ontology components.
The Protégé architecture allows multiple applications as well as Protégé itself to access the same ontology (see Fig. 3). Thus, the maintenance and coordination of the terminology is greatly facilitated, because there need be only one copy of the terminology residing in a central location that all applications access through the Protégé API. A RadLex developer can use the Protégé GUI to edit the terminology, while applications that need RadLex use the Protégé API, accessing the same version that the developer has edited, and users browsing RadLex through WebProtege see the same version of the terminology as well.
In addition to the WebProtege application that we have described here, other applications that use ontologies can be built using similar principles. The common denominator is that in order for an application to use an ontology, the appropriate methods in the Protégé API are called to get references to classes (terms) and their attributes (slots).
The need for terminologies and ontologies in radiology is becoming increasingly apparent. Variation in language used in reporting the results of imaging studies is a recognized challenge,16 which can be mitigated by adopting standard terminologies. Radiology educators need a comprehensive lexicon for indexing online educational materials.6 Radiology researchers need their data to be indexed with terminologies so that search and data mining can be more effective. Ontologies can be used to create new intelligent Picture Archiving and Communication System applications.17
In general, there are two types of users of terminologies: terminology developers and terminology consumers. The developers of RadLex are in the former group, and those wishing to use RadLex or to create applications enabled by RadLex are in the latter group. Terminology developers need a tool that meets the functional requirements of terminology management:
Terminology consumers have requirements related to terminology dissemination and application development:
In this report, we have described Protégé, an open-source tool that can help users of terminologies and ontologies to develop them and use them in applications. Protégé provides tools to support the functional requirements for terminology developers and consumers as described above. The Protégé tools range from creating the terminology (GUI, visualization, and versioning support) to deployment and application development (user feedback management, API, and application support). We have illustrated these functions using RadLex as an example terminology. In fact, the RadLex project has adopted Protégé for managing the terminology, and Protégé has been useful in meeting the terminology curation needs of the RadLex project to date.
Although Protégé provides functionality that meets many needs of the user community, there are still some unmet challenges. First, terminologies evolve over time, and terms that are used in applications may be retired in future versions of the terminologies. When a term is deleted from a newer version of the terminology, applications that use an older version of the terminology need to be provided an appropriate replacement term.
A second challenge is that the compact visualizations provided by Protégé may not be sufficient in cases of very large terminologies. Although there are some visualization paradigms for large ontologies, it could be helpful to filter the terminology or produce a compact “view” of the ontology specific to context of the user. A final challenge is that users new to terminologies and ontologies may experience a learning curve in becoming acquainted with the concepts and learning how to create terminology-enabled applications. The Protégé web site contains tutorials and introductory material that can help the community get up to speed with this tool and its use in applications.
This work was supported by the National Center for Biomedical Ontology, under roadmap-initiative grant U54 HG004028 from the National Institutes of Health. This work was also supported by the Protégé resource, under grant LM007885 from the U.S. National Library of Medicine. We acknowledge the highly talented Protégé research and development team, Tim Redmond, Samson Tu, Tania Tudorache, and Jennifer Vendetti, and Ray Fergerson for developing the Protégé system. We also thank the Protégé user community for their outstanding contributions to the Protégé platform.