Building consumer health vocabularies by mapping to standardized medical vocabularies requires an approach for dealing with consumer terms that refer to concepts not yet represented in those vocabularies. The findings of this study suggest that the overlap between the conceptual universes underlying lay health language and professional terminologies is large. Of the 1,046 terms extracted by the OAC CHV development team, only 64 could not be mapped to existing UMLS concepts. Moreover, 47 of these terms denoted concepts that could be present in professional medical discourse. Some of these legitimate concepts could reasonably be expected to appear in future versions of the UMLS (for example, novel drugs and procedures), and most of them could be constructed by post-coordinating existing UMLS concepts. Only 17 terms referred to concepts that would make a health professional frown or shrug in puzzlement.
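For readers who want the proportions behind these counts, the arithmetic can be sketched as follows. The three counts come from the text above; the variable names and the `pct` helper are illustrative assumptions, not part of the study's tooling.

```python
# Counts reported in the text above.
total_terms = 1046      # terms extracted by the OAC CHV development team
unmapped = 64           # terms with no existing UMLS concept
plausible = 47          # unmapped terms plausible in professional discourse
truly_lay = unmapped - plausible  # terms with no professional counterpart


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "mapped_share_of_all": pct(total_terms - unmapped, total_terms),
    "unmapped_share_of_all": pct(unmapped, total_terms),
    "plausible_share_of_unmapped": pct(plausible, unmapped),
    "truly_lay_share_of_unmapped": pct(truly_lay, unmapped),
    "truly_lay_share_of_all": pct(truly_lay, total_terms),
}

for name, value in summary.items():
    print(f"{name}: {value}%")
```

Under these counts, roughly 6% of the extracted terms failed to map, and under 2% of all terms denoted truly lay concepts.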
The findings also point to some interesting differences between the concepts derived from the two datasets used in this study. While most concepts derived from the query-based set were narrower (more specific) than their closest UMLS relatives, and oncology was the single most represented domain, most concepts derived from the free text set had non-hierarchical relationships with their UMLS relatives. The most widely represented domain was sexual health, which was not surprising, given the high proportion of sexual health-oriented messages on the bulletin boards.
Implications for Vocabulary Building
These findings suggest that most of the labor in building consumer health vocabularies indeed lies in bridging consumer-preferred terms and physician-preferred terms referring to the same concept—for example, equating shakes and tremors, sugar and glucose, cancer and malignant neoplasm. This should be a cause for optimism, as translation between languages that describe the same realities is a manageable, albeit labor-intensive, task. In addition, almost all new concepts identified in the course of this study were found to be closely related to existing UMLS concepts. Finding such relationships makes the new concepts potentially useful for information retrieval.
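The bridging task described above can be pictured as a simple lookup from consumer-preferred to professional-preferred terms. This is a minimal sketch using only the three example pairs from the text; the table name, the function name, and the normalization step are assumptions made for illustration, and a real vocabulary would map terms to concept identifiers rather than to strings.

```python
# Example pairs taken from the text; a real vocabulary would be far larger.
CONSUMER_TO_PROFESSIONAL = {
    "shakes": "tremors",
    "sugar": "glucose",
    "cancer": "malignant neoplasm",
}


def to_professional(term):
    """Return the professional-preferred term, or None if no mapping is known."""
    normalized = term.strip().lower()  # assumed normalization: trim and lowercase
    return CONSUMER_TO_PROFESSIONAL.get(normalized)


print(to_professional("Sugar"))
print(to_professional("M-Spot"))
```

Terms that return `None` are the candidates for the closer manual review discussed in this study.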
The study also provides some pointers to conceptual differences in lay and professional thinking about health and disease, which require further investigation by vocabulary builders. The prevalence of terms that can be derived by post-coordination supports the findings of other researchers that patients organize their health knowledge differently from professionals. 17
From the professional perspective, cancer therapies may be organized according to their mechanism of action (e.g., chemotherapy, immunotherapy), irrespective of the cancer type. From the perspective of the patient, however, information needs are directly connected to a specific diagnosis and its effect on their life course; thus, cancer therapies are more likely to be organized according to the bodily systems affected by the disease (e.g., Colon Cancer Treatment, Cervical Cancer Treatment). Despite vocabulary builders' common-sense notion that lay concepts are “fuzzier” than professional ones, this study reveals that lay concepts are more likely to be “narrower-than” than “broader-than” their closest UMLS relatives. This finding also appears to support the notion that individuals' thinking about health issues is very specific to the details of the individual situation. Understanding patients' information organization is essential in building information portals and supporting information retrieval.
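One way to picture post-coordination is to check whether a lay term splits into two parts that each already exist in a concept inventory. This is a toy sketch only: the `EXISTING_CONCEPTS` set is a hypothetical stand-in (not actual UMLS content), and the two-part contiguous split is a simplifying assumption rather than the procedure used in this study.

```python
# Hypothetical stand-in for a concept inventory; NOT actual UMLS content.
EXISTING_CONCEPTS = {"colon cancer", "cervical cancer", "treatment"}


def post_coordinate(term, inventory=EXISTING_CONCEPTS):
    """Split a term into two contiguous parts that both exist in the inventory.

    Returns a (left, right) tuple, or None when no such split exists.
    """
    tokens = term.lower().split()
    for i in range(1, len(tokens)):
        left = " ".join(tokens[:i])
        right = " ".join(tokens[i:])
        if left in inventory and right in inventory:
            return (left, right)
    return None


print(post_coordinate("Colon Cancer Treatment"))
print(post_coordinate("Cervical Cancer Treatment"))
```

A term that decomposes this way is narrower than each of its components, which is consistent with the “narrower-than” pattern reported above.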
The findings also suggest that in some domains and settings, the number and breadth of semantic coverage of non-mapping concepts may be greater than in others. One of the goals of this study was a comparison of the non-mapping concepts in two sets, one extracted from MedlinePlus® queries and the other extracted from consumers' free text exchanges. The query-based set produced many more concepts that were narrower than their closest relatives and could be constructed by post-coordinating existing concepts. In contrast, the free text set produced many concepts having non-hierarchical relationships with existing UMLS concepts. This suggests that the degree of lay-professional language overlap observed in query analysis may be deceptive. When lay individuals communicate in what they perceive as a professional setting, they may adjust their language to that of health professionals. 18
However, when talking with people they perceive as peers, they may use a somewhat different language, the one with which they are more familiar and comfortable. They may also be more likely to operate with concepts that are not part of standard medical worlds. Furthermore, these conceptual differences may be more prominent in some domains than in others. The free text set included many terms from the domains of sexual health and wellness/beauty/physical fitness. These domains generated some concepts that were truly lay, such as the mysterious M-Spot. It is desirable for future studies to focus on identifying lay health concepts in the domains where deviations from traditional professional views are likely to abound (e.g., sexual health, alternative medicine). This study also suggests that uncovering truly lay health concepts is a slow process; innovative methodologies for streamlining the task are desirable.
Implications for Interpreting Lay Models of Health and Disease
The existence of lay concepts that do not overlap with professional medical concepts suggests that patients and consumers may have unique models of health and disease, which differ from those used by professionals. Does the scarcity of such concepts identified in this study suggest that the differences between lay and professional health models are negligible? Studies that investigate lay understanding of specific diseases point to the contrary. 17,19
In the background section of this paper, we proposed four possible relationships between lay and professional term/concept pairs. This study investigated the (relatively uncommon) situation in which lay individuals use terms that cannot be mapped to the professional vocabulary via automated or manual methods and that require the creation of new concepts. We did not, however, consider the case of lay usage of professional terms, in which lay individuals use existing professional terms but ascribe to them a meaning that differs from their professional definition; for example, depression. This case may be as common as it is difficult to investigate.
One can argue, however, that the use of almost any health term by a non-health professional will involve some vagueness or alteration of meaning. For example, as mentioned earlier, when consumers use the term “heart”, they are likely to know that it is an organ that pumps the blood through the body, but may not think of it as a “four-chambered organ that receives the blood from the veins and contracts to send it through the arteries.” In addition to containing fewer details and having some vagueness, concepts in lay models are likely to differ from the professional ones in their organization and relationship to one another. Understanding these relationships is important for connecting concepts in consumer health vocabularies.
Limitations of the Study and Directions for Future Research
The main limitation of this study is the lack of context in which communication was taking place, context that would help us interpret the full meaning ascribed by individuals to the health terms they used. Set A, the query data set, provided us with isolated search engine queries; Set B, the free-text data set, provided us with more context, but did not allow us to probe the message writers about the terms they used and what they meant. An additional problem with the query data set is the potential tendency of web portal users to imitate what they perceive as professional medical language, which may have limited the opportunity for discovering unique lay concepts. 18
This limitation is the negative but necessary aspect of our methodological approach, which allowed the extraction of large numbers of consumer health terms from the corpus and the establishment of usage frequency statistics for different terms. Other researchers have conducted patient surveys, analyzed transcripts of doctor-patient interactions, and recorded physicians' recall of patients' words they found difficult to understand. 3,20,21
While these methods provide more context in which to understand the usage of individual words, they are less systematic than those employed in this study, and are additionally more likely to yield regionalisms and extremely rare concepts. However, when examined with caution, the findings of such studies can be used to supplement the list of non-mapping concepts in our consumer health vocabulary.
The analysis of the free text corpus suggests that this may be a fruitful avenue for lay concept discovery. As mentioned previously, our analysis suggests that not all content domains may be equally abundant with lay health concepts. Additional studies should identify and explore promising domains. Future work should also focus on other categories of professional/lay concept mismatch, including cases where lay individuals ascribe unique meaning to professional-sounding terms. Finally, further studies are needed to characterize the difference between consumers' knowledge of health terms and their understanding of the underlying concepts (e.g., Keselman et al. 22). The field of consumer health vocabulary development is relatively new. As it matures and vocabularies grow, the creation of new, well-defined, uniquely lay concepts will become more prominent, and the quality of procedures for defining such concepts and relating them to the existing ones will affect the quality and usefulness of the vocabularies.