An important inconsistency currently exists in the literature on oral cancer. Reviewing this literature, one finds that the term oral cancer is defined and described with great variation. In a search in PubMed, at least 17 different terms were found for titles of papers reporting data on oral cancer. The variability of the terms used for designating anatomic regions and type of malignant neoplasms for reporting oral cancer has hampered the ability of researchers to effectively retrieve information concerning oral cancer. Therefore, it is sometimes extremely difficult to provide meaningful comparisons among various studies of oral cancer. Recently, a new ontological strategy that is rooted in consensus-based controlled vocabularies has been proposed to improve the consistency of data in dental research (Smith et al. in J Am Dent Assoc 141:1173–1175, 2010). In this paper, we analyzed the terminology dilemma on oral cancer and explained the current situation. We proposed a possible solution to the dilemma using an ontology-based approach. The advantages for applying this strategy are also discussed.
It has been recently pointed out in an article in the Journal of the American Dental Association (JADA) that the current explosion of information in biomedical research is creating significant problems for our ability to access relevant data that have been published in the literature. Smith et al.  asserted that there is presently “an uncontrolled explosion of different ways of describing information.” The paper goes on to suggest a “strategy to advance the consistency of data in the dental research community” with the goal of solving problems of retrieval and access. Smith et al.  suggested the necessity for creating ontologies that are rooted in “consensus-based controlled structural vocabularies” in order that “past, present, and future data” be maintained in an electronic database that permits searches enabling researchers from diverse research communities to readily obtain relevant information from the literature. The goal of the current paper is to explore the use of ontologies in the area of oral cancer.
In the JADA paper, Smith et al.  describe the advantages and benefits to building such ontologies. It is suggested that biomedical ontologies should be designed so as to be interoperable with other existing ontologies. The interoperability among different ontologies promotes the search and integration of various types of biomedical data. An ontology can be modified or corrected in accordance with new research findings and can also enable the integration of previously captured data. This integration is possible because an ontology allows the incorporation of traditional terms that have been previously used by the research community.
An impediment currently exists in the ability of the oral cancer research community to locate relevant information concerning oral cancer in the medical literature. Various uncontrolled, nonconsensus-based terms are presently being used to refer to cancer that occurs in the various regions of the mouth. Attempts to develop a controlled vocabulary for such terms have not been successful [2, 3]. In fact, some authors [2, 4] have concluded that it is perhaps “too late for a rational solution to this problem”. Although achieving a widely accepted definition of oral cancer may be a daunting task, our obligation as clinicians and researchers is to make a serious attempt to solve this problem by whatever means are available. If we let the current situation continue, it will remain extremely difficult to provide meaningful comparisons among various studies of oral cancer regarding incidence, mortality rates, and related epidemiologic and clinical data. As Moore et al. state, “given the importance of this disease on a worldwide basis, it would appear to be essential that some form of international consensus be derived regarding the classification of oral cancer.”
As noted above, establishing a consensus on oral cancer terminology is a difficult challenge. This paper has three goals: (1) to present an analysis of the literature on oral cancer in order to delineate the nature of the current terminological inconsistencies; (2) to describe and examine the methods of biomedical ontology, as presented in the Smith et al. paper, to determine their applicability for addressing the oral cancer terminology dilemma; and (3) to discuss the advantages of developing a biomedical ontology in the domain of oral cancer that go beyond providing a remedy for the terminology dilemma.
A review of the literature indicates that there is no consensus on the definition of the term oral cancer. The definition of the word “cancer” itself is clear enough and well accepted: cancer is a common term used to denote all malignant tumors . However, the definition of oral cancer is problematic. The meaning of the term oral cancer varies considerably in pertinent publications . The use of different terms and definitions for reporting oral cancer has hampered communication between clinicians and researchers.
The lack of consensus on the definition of oral cancer was well documented by Moore et al. in a letter published in the journal Oral Diseases . They reviewed the oral cancer literature and wrote that “defining oral cancer presents some important challenges to both clinicians and researchers.” From their point of view, the root of the problem is related to what they called a terminology dilemma. They provide evidence to support this view and conclude by stating that, “as yet there seems no uniformly accepted definition of oral cancer” . Similarly, McCartan  describes the oral cancer terminology as “unhelpful and confusing”.
The variety of terms that have been and are currently being used to describe oral cancer is illustrated in Table 1. A PubMed search revealed at least 17 different terms used in titles of papers reporting data on oral cancer.
The lack of consensus in the definition of oral cancer is reinforced by the variability in the specific anatomic designation of the disease in published reports. Some epidemiological studies focus on cancer that occurs in the area between the vermilion border of the lip and the junction of the soft and hard palate. Some studies add, to the criteria stated above, cancers of the oropharynx, nasopharynx, and hypopharynx, and others include cancer of the major salivary glands [25, 26]. Still others incorporate cancer of the external (skin) portion of the lip. Furthermore, some papers in the literature even fail to specify which anatomic sites are included [28–30].
Another issue related to the terminology dilemma is the histological types of malignant neoplasms that are included as oral cancer. It is clear that malignant neoplasms of the mucosal tissue (epithelium) are always considered to be oral cancer. What is not clear is whether malignant neoplasms in tissues adjacent to the oral mucosa should also be designated as oral cancer. Reviewing the literature, one finds that many studies only include squamous cell carcinomas . Others add cancer in adjacent tissues such as salivary glands  and connective tissue [13, 15, 25]. In addition, others also include malignant neoplasms of the skin of the lip , lymphoid [15, 25], muscle  and nerve  tissues as oral cancer. There also are studies that include metastases to the oral cavity and melanoma  as oral cancer.
The variability of the terms used for designating anatomic regions and types of malignant neoplasms for reporting oral cancer has hampered the ability of researchers to effectively retrieve information in the literature. For example, a researcher who wants to retrieve papers containing information concerning the specific location and type of malignant neoplasm found in various mouth regions is confronted with two problems. One problem is that the researcher must know the anatomic definition of oral cancer that is used by each author. Another problem is that the researcher must know what types of malignant neoplasms are included in the paper.
The terminology dilemma has developed because various investigators have selected terms for defining oral cancer with respect to location and type of malignancy without the formation of a consensus concerning the definition of those terms. One way to solve the terminology dilemma requires that the international community reach a consensus upon a standard definition of oral cancer. Attempts have been made in the oral cancer community to define oral cancer, but these attempts have not been successful . It seems that the current practice of using various, loosely defined terms for oral cancer is unlikely to change.
Another strategy to solve the terminology dilemma would be to keep the terms currently in use, as illustrated in Table 1, and create consensus-based standard definitions for each one of them (e.g., intraoral cancer, mouth cancer, oral cavity cancer, oropharyngeal cancer, etc.). This would be exceedingly difficult and not very productive even if successful. It would be difficult because we have identified at least 17 terms used in the oral cancer literature. In our judgment, it would be extremely difficult, if not impossible, to convince experts in the area of oral oncology to convene in order to develop a consensus-based definition for each one of them. It would not be productive because these variable terms have been developed in the absence of any attempt to establish a controlled vocabulary, which is the very reason for the dilemma we address. The goal is not to solve the dilemma by attempting to define these terms but to move forward with the approach that we are presenting in this paper.
To solve the terminology dilemma, we do not advocate attempting any of the previously mentioned solutions. Rather, we advocate for the development of an oral cancer ontology. The fundamental objective is to allow researchers to obtain specific information they need from the literature concerning oral cancer location and type, regardless of the title of the paper and the definition of oral cancer used by the authors. To this end we believe that the biomedical ontology-based method described by Smith et al.  can be employed to achieve this objective.
Smith et al. have stated that the ontological strategy is rooted in consensus-based controlled vocabularies. Even though there is no consensus-based definition of the terms used to define oral cancer, we can take advantage of other consensus-based terminologies that are currently being used by researchers in the field. A review of the oral cancer literature shows that two universally accepted terminologies appear in every paper concerning oral cancer. These terminologies are: (1) the standard anatomic terms used for identifying the locations of malignant neoplasms; and (2) the standard terms used to identify types of malignant neoplasms. Information concerning the locations and types of malignant neoplasms is always included in published reports of oral cancer concerning incidence and prevalence.
Standard anatomic terms of location are employed by researchers to report the sites in a paper that will be included in the study. For example, tongue, hard palate, gingiva, or floor of mouth are universally accepted terms. There is currently in existence a hierarchically-structured consensus-based terminology of anatomical terms that is commonly referred to in the oral cancer literature. This terminology is called the International Classification of Diseases for Oncology (ICD-O topography) of possible sites for malignant neoplasms . This ICD-O topography of anatomical sites is accepted as the standard by the World Health Organization (WHO) . This ICD-O topography for the mouth region is listed under the heading of “LIP, ORAL CAVITY, AND PHARYNX” and includes the following major subclasses:
Under each subclass, there are further subdivisions. For example, under the subclass, “palate,” can be found the following subdivisions:
It is obvious that this classification system is not complete, as suggested by the use of such phrases as “other and unspecified”, “other and ill-defined” and “not otherwise specified”. These phrases indicate a lack of specific information about the location of certain tumors. It is to be expected that in the future new standardized protocols will help to eliminate the designation of particular locations as “ill-defined” or “not otherwise specified”. When advances are made in the identification of tumor sites, this information can be added to the ontology. In addition, ontology-driven information systems have the capacity to retain legacy terms and designate their relationship to the new terms that have taken their place.
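The retention of legacy terms can be illustrated with a minimal sketch. It assumes that each retired term simply records the code of the term that replaced it; the `Term` class, the `replaced_by` field, the `resolve` helper, and the `OLD:1`/`NEW:1` codes are all hypothetical and are not part of ICD-O or of any existing ontology tool.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Term:
    code: str                          # identifier of the term (hypothetical here)
    label: str                         # human-readable name
    replaced_by: Optional[str] = None  # code of the successor term, if retired


def resolve(terms: Dict[str, Term], code: str) -> Term:
    """Follow replaced_by links until a current (non-retired) term is reached."""
    seen = set()
    term = terms[code]
    while term.replaced_by is not None:
        if term.code in seen:
            raise ValueError(f"cycle in replacement chain at {term.code}")
        seen.add(term.code)
        term = terms[term.replaced_by]
    return term


# Hypothetical example: a future revision replaces an unspecified site
# with a more specific one, while the legacy term stays queryable.
terms = {
    "OLD:1": Term("OLD:1", "palate, not otherwise specified", replaced_by="NEW:1"),
    "NEW:1": Term("NEW:1", "hypothetical more specific palate site"),
}

current = resolve(terms, "OLD:1")
print(current.code, current.label)
```

Because the legacy term is kept rather than deleted, papers annotated with it remain retrievable after the vocabulary is revised.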
We consider the ICD-O site classification system to be a consensus-based, hierarchically-organized controlled vocabulary that can be used as part of an oral cancer ontology. The ICD-O topography can serve as an important component of the oral cancer ontology we are proposing.
Many efforts have been made by several groups of pathologists to standardize the terminology used for malignant neoplasms. These efforts have resulted in widely accepted terminology that is currently used by pathologists in their diagnostic reports of malignant neoplasms. This terminology is included in the World Health Organization Classification of Tumors . Therefore, oral cancer researchers use terms from the WHO classification of tumors to describe the types of malignant neoplasms that are included in their studies. The types of malignant neoplasms that may occur in the mouth region are listed under the heading of “WHO CLASSIFICATION OF TUMOURS OF THE ORAL CAVITY AND OROPHARYNX” and include the following major subclasses:
Under each subclass are listed “finer-grained” subdivisions. For example, under the subclass, “malignant epithelial tumors,” can be found:
We consider the WHO classification of tumors to be a consensus-based, hierarchically-organized controlled vocabulary that can be used as part of the proposed oral cancer ontology. Thus, there are currently in place consensus-based hierarchically-organized terminologies for both location (ICD-O) and type (WHO) of malignant neoplasms. Both of these terminologies will be fundamental elements in the building of an oral cancer ontology. The National Cancer Institute thesaurus (NCIT) is another controlled vocabulary concerning malignant neoplasms . As will be discussed later, links to the NCIT will be possible using the oral cancer ontology.
The next step in the implementation of the ontological strategy is to instantiate the controlled terminologies into a database that can maintain the hierarchical structure of the terminologies. This requires establishing two coding systems. The first is the designation of a computer code to identify each term and the second is the designation of a code that establishes the hierarchical relationship between the terms. Fortunately, both the ICD-O site terminology and the WHO classification of tumor terminology have already provided identification codes (ICD-O morphology codes) for their terminologies and these codes will be used in the ontology. There are, however, no codes to establish the relationship between terms and these will have to be provided.
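As a rough illustration of the two coding systems, the sketch below stores an identification code for each term and a separate parent link that encodes the hierarchy. The palate codes are ICD-O topography codes given from memory and should be checked against the current ICD-O edition; the root identifier `C00-C14`, the `IS_A` table, and the helper functions are assumptions made for this example only.

```python
from typing import Dict, List

# Coding system 1: an identification code for each term.
LABELS: Dict[str, str] = {
    "C00-C14": "Lip, oral cavity and pharynx",  # hypothetical root identifier
    "C05": "Palate",
    "C05.0": "Hard palate",
    "C05.1": "Soft palate",
    "C05.2": "Uvula",
}

# Coding system 2: hierarchical relationships, stored as child -> parent.
# These links are not supplied by the source terminologies.
IS_A: Dict[str, str] = {
    "C05": "C00-C14",
    "C05.0": "C05",
    "C05.1": "C05",
    "C05.2": "C05",
}


def ancestors(code: str) -> List[str]:
    """Return the chain of parents from the given term up to the root."""
    path = []
    while code in IS_A:
        code = IS_A[code]
        path.append(code)
    return path


def descendants(code: str) -> List[str]:
    """Return every term below the given term, in sorted order."""
    found = []
    frontier = [code]
    while frontier:
        current = frontier.pop()
        children = [c for c, p in IS_A.items() if p == current]
        found.extend(children)
        frontier.extend(children)
    return sorted(found)


print(ancestors("C05.2"))
print(descendants("C05"))
```

Keeping the hierarchy in its own table means that the identification codes published with the terminologies can be reused unchanged.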
There is a well-established and growing science concerning the principles of ontology building, including methodologies for coding relationships between terms . There are software programs for instantiating terminologies such as we have described for ICD-O and WHO in ontological databases .
Once the two terminologies in the oral cancer ontology have been placed in an ontological database another coding step must be implemented. The logic of the papers we are interested in makes fundamental correlations between types of cancer and the locations in the mouth region in which they are found. Those relationships must be maintained in the ontology. To do so, relational codes must be established that link every term in the location terminology, with every term in the type of malignant neoplasm terminology.
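One simple way to picture these relational codes is as the cross product of the two terminologies, with one identifier per (type, location) pair. The sketch below assumes a small sample of codes, given from memory and to be verified against the official ICD-O and WHO publications; the `REL:` identifier format is hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

# Small sample of location codes (ICD-O topography) and type codes
# (ICD-O morphology); illustrative only.
LOCATIONS = ["C05.0", "C05.1", "C05.2"]   # hard palate, soft palate, uvula
TYPES = ["8070/3", "8074/3"]              # squamous cell ca., spindle cell ca.

# One hypothetical relational code for every (type, location) pair.
links: Dict[Tuple[str, str], str] = {
    (tumor_type, location): f"REL:{i:04d}"
    for i, (tumor_type, location) in enumerate(product(TYPES, LOCATIONS), start=1)
}

for pair, rel_code in links.items():
    print(rel_code, pair)
```

With m type terms and n location terms this produces m × n links, so a real implementation might create a link only when a paper is first annotated with that pair.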
A semantic query of the ontology might take the following form: give me all the papers in the literature that have studied the occurrence of spindle cell carcinomas in the soft palate and uvula. An answer to that question requires that all the terms in the ontology are cross-linked. Again, the methodology for achieving those cross-linkages has been established in the field of biomedical ontology .
The next step in the implementation of the “ontological strategy” of Smith et al.  requires establishing a relationship between the oral cancer ontology and the literature in oral cancer. This is accomplished through the method of annotation (tagging). Annotation in the case of the oral cancer ontology requires that location and type of cancer information in each relevant paper be coded into the ontology. Figure 1 illustrates the annotation process of oral cancer literature.
In Fig. 1, column 1 shows examples of terms from the consensus-based terminology for types of malignant neoplasms. The corresponding WHO codes for the types of malignant neoplasms are shown in column 2. Column 3 contains the corresponding ICD-O codes for the consensus-based terminology of a small sample of anatomical terms (column 4). The horizontal and crossed arrows represent the links established by the use of relational codes between the two terminologies in the ontology. The vertical arrows represent the codes used to maintain the hierarchical structure of the terminology in the ontology. Column 5 shows two hypothetical examples of papers in oral cancer literature that are linked to the ontology based on the location and type of oral cancer reported.
For example, paper #1 reports data regarding spindle cell carcinoma of the uvula. Retrieving the information on the location (uvula) and type of malignant neoplasm (spindle cell carcinoma) reported in paper #1 is accomplished by linking the code for paper #1 (0.1) to the appropriate terms in the ontology. The process of linking information in the literature to a computer-based ontology is called annotation. This is represented in Fig. 1 by the placing of the paper #1 code (0.1) under both spindle cell carcinoma and uvula. Thus, paper #1 is coded appropriately into the ontology in both terminologies. Paper #1, along with all related publication information concerning the paper, will be retrieved from the ontology by querying the ontology with the terms “spindle cell carcinoma” (type of malignant neoplasm), “uvula” (location of the oral cancer) or by the combined phrase “spindle cell carcinoma of the uvula”. A second example involves paper #2, which includes information about squamous cell carcinoma of the hard palate. Annotating the paper with the ontology will allow retrieval of the paper by querying the terms “squamous cell carcinoma”, “hard palate” or the combined phrase “squamous cell carcinoma of the hard palate”.
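The two annotation examples above can be sketched in a few lines. This is only an illustration under stated assumptions: the `AnnotationStore` class is hypothetical, the paper code "0.2" for paper #2 is assumed by analogy with the "0.1" code used for paper #1, and the ICD-O codes are given from memory and should be verified.

```python
from typing import Optional, Set, Tuple


class AnnotationStore:
    """Holds (paper, type, location) records; a hypothetical minimal design."""

    def __init__(self) -> None:
        self._records: Set[Tuple[str, str, str]] = set()

    def annotate(self, paper_id: str, type_code: str, location_code: str) -> None:
        """Link a paper to one tumor type found at one location."""
        self._records.add((paper_id, type_code, location_code))

    def query(self, type_code: Optional[str] = None,
              location_code: Optional[str] = None) -> Set[str]:
        """Return papers matching the given type, location, or both."""
        return {
            paper
            for paper, t, loc in self._records
            if (type_code is None or t == type_code)
            and (location_code is None or loc == location_code)
        }


store = AnnotationStore()
store.annotate("0.1", "8074/3", "C05.2")  # paper #1: spindle cell carcinoma, uvula
store.annotate("0.2", "8070/3", "C05.0")  # paper #2: squamous cell ca., hard palate

print(store.query(type_code="8074/3"))                         # by type
print(store.query(location_code="C05.0"))                      # by location
print(store.query(type_code="8074/3", location_code="C05.2"))  # combined
```

Storing the type and location together in one record, rather than indexing each term separately, keeps a combined query from matching a paper that reports the type at one site and a different tumor at the queried site.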
After annotating the location and types of malignant neoplasms reported in papers concerning oral cancer, the oral cancer ontology can be used to facilitate oral cancer literature retrieval. An oral cancer ontology will allow researchers to easily access the information that they need. For example, if the investigator is searching for oral cancer that specifically concerns the tongue, the ontology will retrieve all papers that report on cancer of the tongue. If the investigator narrows the search and is interested in papers that report a specific type of malignant neoplasm of the mouth region, such as squamous cell carcinoma, and in a specific location, such as the tongue, the ontology will locate all such papers. The ontology allows investigators to query the literature by using any combination of location and types of malignant neoplasms.
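A search of this kind can take advantage of the hierarchy, so that a query for a broad site also returns papers annotated with its subdivisions. The sketch below assumes a few tongue and palate codes recalled from ICD-O, which should be checked against the official edition, and entirely hypothetical paper identifiers; for brevity, only the location hierarchy is expanded, not the type hierarchy.

```python
from typing import Dict, List, Optional, Set, Tuple

# child -> parent links for a small sample of location codes (illustrative).
PARENT: Dict[str, str] = {
    "C02.0": "C02",  # dorsal surface of tongue -> tongue
    "C02.1": "C02",  # border of tongue -> tongue
}

# Hypothetical annotations: (paper, type code, location code).
ANNOTATIONS: List[Tuple[str, str, str]] = [
    ("paper-A", "8070/3", "C02.1"),
    ("paper-B", "8070/3", "C02.0"),
    ("paper-C", "8074/3", "C05.2"),
]


def subtree(code: str, parent: Dict[str, str]) -> Set[str]:
    """Return the code together with all of its descendants."""
    result = {code}
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            if par in result and child not in result:
                result.add(child)
                changed = True
    return result


def search(location: Optional[str] = None,
           tumor_type: Optional[str] = None) -> List[str]:
    """Find papers by location (including sub-sites), by type, or by both."""
    sites = subtree(location, PARENT) if location is not None else None
    return sorted({
        paper
        for paper, t, loc in ANNOTATIONS
        if (sites is None or loc in sites)
        and (tumor_type is None or t == tumor_type)
    })


print(search(location="C02"))                         # all tongue papers
print(search(location="C02.1", tumor_type="8070/3"))  # narrowed search
```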
Initially, this oral cancer ontology can be used to annotate existing published literature based on the location and type of malignant neoplasm. The long-term goal, however, is to expand this ontology and integrate new terminologies, as well as modify existing terminologies based on new knowledge that requires the addition of new terms or modifications of old terms . For example, consensus-based terminologies describing factors such as geographic data, risk factors, staging and grading on oral cancer can be added to the oral cancer ontology as they are developed over time.
Biomedical ontologies are now commonly used to facilitate searches of the literature in particular domains [38, 39]. The oldest and most widely used biomedical ontology is the Gene Ontology (GO), which has been in existence for over 10 years. GO provides a standardized vocabulary for gene product annotations in the areas of molecular function, biological processes, and cellular components. GO has been widely adopted as a vocabulary for annotating functional gene-product data. There are over 11 million annotations relating gene products described in research databases to terms in the GO. Research information regarding approximately 180,000 genes has been tagged using GO.
GO has facilitated retrieval of scientific journal articles associated with genes. Today, scientists commonly use GO tools to access past and present scientific literature. This is possible because the GO vocabulary encompasses all terms that have been used by experts to describe gene data. As of 2010, more than 50,000 articles on genes have been annotated through GO and the content is available for computer searches .
The proposed oral cancer ontology possesses a similar structure to that of the Gene Ontology. GO contains three controlled vocabularies that describe the functional attributes of gene products. The proposed oral cancer ontology contains two controlled vocabularies that describe the location and type of malignant neoplasm found in the mouth region. The GO terms have unique identifiers (codes). The location and type of malignancy of oral cancer will also have their own unique identifiers. Literature-based annotation using GO codes is an important way in which biological information about specific gene products can be captured and expressed in a searchable, computable form. In an oral cancer ontology, literature-based annotation using ICD-O and WHO codes can be the method by which the location and type of malignant neoplasms of the mouth region are captured and made accessible for computer-based searches.
There are several additional and important benefits in constructing the proposed oral cancer ontology on well-defined ontological principles, such as those used in the GO. An oral cancer ontology so constructed can always be modified to adapt to the discovery of new knowledge or to make changes in accepted terminologies, without compromising information already present in the ontology. High-quality biomedical ontologies are already in existence and can be linked to the oral cancer ontology. Table 2 shows a partial list of well-established current biomedical ontologies. An oral cancer ontology will give oral cancer researchers the ability to gain information related to problems in oral cancer from these other biomedical ontologies. The semantic interoperability of these ontologies permits the integration of information from many fields and at many levels of granularity, from the genetic (GO) to the cellular (cell type ontology) to the clinical (such as SNOMED-CT and the NCI Thesaurus).
The ICD-O codes concerning the locations of oral cancer can be linked to the Foundational Model of Anatomy (FMA), an ontology with a controlled vocabulary of the human body . As a comprehensive ontology of human anatomy, the FMA includes terms that represent the structural organization of the human body from the macromolecular to macroscopic levels. The types of malignant neoplasms found in the mouth region can be linked through FMA to other comprehensive terminologies related to cancer such as Mouse Pathology , and Cancer Research and Management ACGT Master Ontology . Currently, we are developing a standardized vocabulary for the surgical pathology domain and we are planning to link the mouth region cancer terminologies to that vocabulary.
We believe that developing an ontology for oral cancer will be an important step in permitting the field of oral oncology to take advantage of the rapidly developing computer-based ontological methodologies. The advantages of ontology development, as summarized by Smith et al., include:
How will the development of an ontology for the field of oral cancer be accomplished? This depends upon the degree to which members of the oral cancer community believe in the importance of the creation of ontologies similar to the one we have presented here. It is our view that oral cancer related ontologies can become as indispensable as the GO is to scientists working in the field of genomics. The oral cancer community, through its various organizations such as the American Academy of Oral and Maxillofacial Pathology, the International Association of Oral Pathologists, the International Academy of Oral Oncology, and the North American Society of Head and Neck Pathology, should address the issue of the importance of using the ontological methods described here to establish a comprehensive database that would integrate the past, current, and evolving knowledge of the field. This would permit the oral cancer community to meet the challenge posed by Smith et al. of dealing with the “uncontrolled explosion of different ways of describing information” by creating an electronic “consensus-based controlled structured vocabulary” concerning all aspects of oral oncology that would make the data available for searches and algorithmic processing.
We have proposed a solution to the terminology dilemma involving the common but ill-defined phrase “oral cancer”, a dilemma that has prevented effective retrieval of focused oral cancer information from the literature. We have proposed the creation of an oral cancer ontology, based on established ontological principles, that captures and integrates site (ICD-O) and type (WHO) of malignant neoplasm information from all papers on oral cancer. The proposed oral cancer ontology will enable researchers to easily access information regarding location and type of mouth region cancer. Furthermore, the oral cancer ontology can be expanded to include, for example, information concerning demographic data, risk factors, staging and grading. In addition, as the field of biomedical ontology expands, our system will help to promote biomedical data integration and semantic interoperability with other existing ontologies, thereby expanding the search capacities of scientists interested in advancing knowledge of cancer of the mouth region.
The authors wish to express their gratitude to Dr. Mirdza Neiders and Dr. Alfredo Aguirre for their invaluable help in reviewing the manuscript.