In this paper, we have described an approach and early results for characterizing the use and contents of free-text family history comments in the EHR. A manual review was conducted to identify and summarize reasons for use of the comments field. In addition, a semi-automated process was developed to identify and quantify key categories of information within a set of comments.
As reflected in , “Problem in list”, “Onset age – exact”, and “Living status” are among the top 5 reasons for comment use, which indicates that the comments field is being used to capture information that should be entered into the available structured fields (i.e., “Problem” and “Age of Onset” in the family medical history portion and “Status” in the family status portion). These reasons, along with “Multiple problems” and “Multiple relations”, may be addressed through training or user interface modifications that enable more flexible entry of information (currently, there are two modes for entering family medical history and status, and their efficiency varies with the reasons for documentation inferred in this study). Other frequent reasons, such as “Missing problem” and “Missing relation”, suggest that the locally customized lists of values for problems and relations could be expanded to include additional types of cancer and family members (guided by results from both parts of this study). For example, as shown in , “leukemia”, “uterine cancer”, and “brain” are among the top 10 concepts but are not in the list of values provided for problems; similarly, “cousin” is not currently in the list of values for relations. These findings are consistent with previous studies of structured “data-entry exit strategies”, which sought to understand the reasons for using free text rather than standardized codes for problems, diagnoses, and medications in the EHR22,29. The frequency of concepts for “maternal relative” and “paternal relative” also suggests a need for more flexible specification of side of family (i.e., maternal or paternal). While the list of relations includes some “pre-coordinated” values such as “Maternal Grandmother” and “Paternal Uncle”, it may be valuable to “post-coordinate” side of family (e.g., separately specifying “Maternal” and “Grandmother”) rather than attempting to anticipate every possible combination in the list.
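The difference between pre-coordinated and post-coordinated relations can be sketched in Python. This is an illustration under stated assumptions, not the data model of the EHR studied: `Relation`, `decompose`, `all_relations`, and `SIDES` are hypothetical names, and side of family is assumed to be maternal, paternal, or unspecified. Storing side and relative as separate fields lets a short list of relatives cover every side combination, and existing pre-coordinated values can be split into those fields.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Sequence

# Hypothetical controlled values for side of family; None means "unspecified".
SIDES = ("maternal", "paternal")


@dataclass(frozen=True)
class Relation:
    relative: str               # e.g. "grandmother", "uncle", "cousin"
    side: Optional[str] = None  # "maternal", "paternal", or None

    def label(self) -> str:
        """Render the post-coordinated relation as a display string."""
        text = self.relative if self.side is None else f"{self.side} {self.relative}"
        return text.title()


def decompose(value: str) -> Relation:
    """Split a pre-coordinated value such as 'Maternal Grandmother' into side and relative."""
    words = value.strip().lower().split(maxsplit=1)
    if len(words) == 2 and words[0] in SIDES:
        return Relation(relative=words[1], side=words[0])
    return Relation(relative=value.strip().lower())


def all_relations(relatives: Sequence[str]) -> List[Relation]:
    """Every relation the post-coordinated form can express: each relative with each side or none."""
    return [Relation(r, s) for r, s in product(relatives, (None,) + SIDES)]
```

With three relatives, the post-coordinated form expresses nine relations (each relative with a maternal, paternal, or unspecified side) without any of them being listed explicitly.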
The initial pipeline implemented in this study consisted of a pre-processor, MetaMap, and a post-processor. Challenges encountered included misspellings, acronyms, and abbreviations found throughout the comments, as well as variations in age and date formats. In this study, each of these challenges was addressed only partially, through a manual process; future work will involve developing more robust, automated methods for handling them. Next steps also include performing a formal evaluation to characterize false positives and false negatives, and determining what adjustments to the MetaMap configuration used in this study could improve performance. For this study, three configurations were tested (all source vocabularies, SNOMED CT only, and NCI Thesaurus only). They produced similar results, although the first two provided additional concepts, particularly for body parts, organs, or organ components (e.g., Entire Lung [C1278908] in addition to Lung [C0024109] for “lung”). Given the noise introduced by these two configurations, we chose to use the NCI Thesaurus only in order to demonstrate the feasibility of using MetaMap to study the contents of free-text family history comments; however, future work will involve incorporating additional sources, or potentially all sources, to enhance the results, and exploring strategies for filtering concepts as appropriate. For example, SNOMED CT20 and the HL7 Version 3.0 vocabulary21 could be included as additional source vocabularies to detect concepts such as “great grandmother”, which was not found when MetaMap was restricted to the NCI Thesaurus.
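A minimal sketch of the kind of pre- and post-processing described above, assuming Python and plain-text comments. The abbreviation dictionary, the age patterns, and the function names (`expand_abbreviations`, `extract_age_range`, `drop_entire_variants`) are hypothetical; the study's own handling of these issues was partly manual. MetaMap itself is not invoked here: `drop_entire_variants` assumes MetaMap output has already been reduced to a list of concept preferred names, and illustrates one possible filtering strategy for the “Entire Lung”/“Lung” redundancy.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical abbreviation dictionary; a real one would be curated from the comments.
ABBREVIATIONS = {
    "ca": "cancer",
    "dx": "diagnosed",
    "mgm": "maternal grandmother",
    "pgf": "paternal grandfather",
}
ABBREV_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b", re.IGNORECASE
)

# Hypothetical age formats, tried in order: "50-60", "40s"/"40's", "age 45"/"at 45", "62 y/o".
RANGE_RE = re.compile(r"\b(?P<low>\d{1,3})\s*-\s*(?P<high>\d{1,3})\b")
DECADE_RE = re.compile(r"\b(?P<decade>[1-9])0'?s\b")
AGE_WORD_RE = re.compile(r"\b(?:age|aged|at)\s+(?P<age>\d{1,3})\b", re.IGNORECASE)
AGE_SUFFIX_RE = re.compile(r"\b(?P<age>\d{1,3})\s*(?:yo|y/o|yrs?)(?!\w)", re.IGNORECASE)


def expand_abbreviations(text: str) -> str:
    """Replace whole-word abbreviations with their expansions (case-insensitive)."""
    return ABBREV_RE.sub(lambda m: ABBREVIATIONS[m.group(1).lower()], text)


def extract_age_range(text: str) -> Optional[Tuple[int, int]]:
    """Return the first recognized age expression as an inclusive (low, high) range."""
    if m := RANGE_RE.search(text):
        return int(m["low"]), int(m["high"])
    if m := DECADE_RE.search(text):
        start = int(m["decade"]) * 10
        return start, start + 9
    for pattern in (AGE_WORD_RE, AGE_SUFFIX_RE):
        if m := pattern.search(text):
            age = int(m["age"])
            return age, age
    return None  # e.g. non-specific terms such as "elderly"


def drop_entire_variants(names: List[str]) -> List[str]:
    """Post-filter: drop 'Entire X' concepts when the plain 'X' concept is also present."""
    present = {n.lower() for n in names}
    return [
        n for n in names
        if not (n.lower().startswith("entire ") and n.lower()[len("entire "):] in present)
    ]
```

Normalizing every age expression to an inclusive (low, high) range lets exact ages, decades, and ranges share one representation. In practice, date formats would need to be recognized before rules like these are applied, since a hyphenated date could otherwise be misread by `RANGE_RE` as an age range.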
To limit its scope, this study focused on cancer-related comments within the medical history portion of the family history section for a specific time period. Next steps include applying the approach to all comments in the medical history portion (which are associated with a range of conditions as well as a non-specific “Other” value) and to comments in the status portion. In addition, the techniques could be extended to clinical notes, building on previous efforts to extract family history information from notes12,13. A comparison of the various structured and unstructured sources of family history information in the EHR (e.g., free-text comments, clinical notes, and problem list) could then be performed to quantify how information is distributed across these sources and to determine whether the information is complementary, redundant, or potentially conflicting. Other comparisons include studying differences in the use and contents of comments based on provider characteristics (e.g., role, specialty, or practice) and patient characteristics (e.g., age, gender, or problem). These characteristics or contexts may significantly influence how, and what, family history information is documented, and could help guide EHR customization. For example, top concepts for Family Group () indicate that, aside from gender-neutral concepts, female relatives occur more frequently than male relatives; this may reflect the breast cancer-related entries in the dataset and supports the potential value of context-specific functionality (e.g., customized or ranked lists of familial relations based on the selected problem). A broader goal will be to test the generalizability of the approach by applying the methods to other sources of free-text comments in the EHR (e.g., for problems22) as well as to EHR systems at other institutions.
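One way to operationalize the proposed comparison is sketched below in Python. It assumes that each source has already been reduced to (relative, problem, onset age) tuples with normalized strings and at most one tuple per (relative, problem) pair. The function name `compare_sources`, the tuple shape, and the rule that a missing age does not count as a conflict are our assumptions for illustration, not definitions from this study.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical shape of one family history "fact": (relative, problem, onset age or None).
Fact = Tuple[str, str, Optional[int]]
Key = Tuple[str, str]


def compare_sources(a: Iterable[Fact], b: Iterable[Fact]) -> Dict[str, Set[Key]]:
    """Classify (relative, problem) pairs from two sources as complementary, redundant, or conflicting."""
    ages_a = {(rel, prob): age for rel, prob, age in a}
    ages_b = {(rel, prob): age for rel, prob, age in b}
    result: Dict[str, Set[Key]] = {"complementary": set(), "redundant": set(), "conflicting": set()}
    for key in ages_a.keys() | ages_b.keys():
        if key not in ages_a or key not in ages_b:
            # Pair documented in only one source.
            result["complementary"].add(key)
        elif ages_a[key] is None or ages_b[key] is None or ages_a[key] == ages_b[key]:
            # Same pair in both sources; ages agree, or one is missing (assumed not a conflict).
            result["redundant"].add(key)
        else:
            # Same pair in both sources with different onset ages.
            result["conflicting"].add(key)
    return result
```

For example, with toy inputs, a mother with breast cancer at 45 in both sources would be classified as redundant, a sister whose onset age differs between sources as conflicting, and a relative mentioned in only one source as complementary. Counting the members of each set gives a simple measure of how information is distributed across sources.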
There have been several initiatives focused on the representation and standardization of information related to family history (e.g., the American Health Information Community’s Family Health History Workgroup23 and the HL7 Clinical Genomics Family History Model24,25). In previous work11, we assessed the adequacy of the HL7 Clinical Genomics Family History Model and the HL7 Clinical Statement Model26,27 for representing family history information in a set of clinical notes. Although these existing models could represent most of the information, the results indicated that several enhancements were needed, including the ability to represent the paternal/maternal side of family and flexibility in handling age information such as different age events (e.g., current age, age of onset/diagnosis, and age of death), non-specific ages (e.g., elderly), and age ranges (e.g., 50-60). The findings from the present study further support the need for such enhancements and will be used to extend the Merged Family History Model created in that previous study. In addition to contributing to these modeling efforts, the results of this work may also be used to supplement relevant vocabularies or code systems (e.g., the HL7 V3 Vocabulary for RoleCode28, which defines a list of relatives) with additional values found in the comments (e.g., great aunt).
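The age-related enhancements listed above (distinct age events, non-specific ages, and age ranges) can be illustrated with a small Python data model. This is a hypothetical sketch, not the Merged Family History Model or any HL7 structure: `AgeEvent`, `AgeValue`, `parse_age`, and the set of non-specific terms are assumed names and values chosen for illustration.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AgeEvent(Enum):
    """Which event an age refers to."""
    CURRENT = "current age"
    ONSET = "age of onset/diagnosis"
    DEATH = "age of death"


# Hypothetical list of non-specific age terms; no numeric bounds are assigned to them.
NON_SPECIFIC_TERMS = {"elderly", "young", "middle-aged"}

EXACT_RE = re.compile(r"^(\d{1,3})$")
RANGE_RE = re.compile(r"^(\d{1,3})\s*-\s*(\d{1,3})$")


@dataclass(frozen=True)
class AgeValue:
    event: AgeEvent
    low: Optional[int] = None        # inclusive lower bound, in years
    high: Optional[int] = None       # inclusive upper bound, in years
    qualifier: Optional[str] = None  # non-specific term, e.g. "elderly"

    def is_range(self) -> bool:
        return self.low is not None and self.high is not None and self.low != self.high


def parse_age(text: str, event: AgeEvent) -> AgeValue:
    """Parse an exact age, an age range, or a non-specific term for the given age event."""
    cleaned = text.strip().lower()
    if m := EXACT_RE.match(cleaned):
        return AgeValue(event, int(m.group(1)), int(m.group(1)))
    if m := RANGE_RE.match(cleaned):
        low, high = sorted((int(m.group(1)), int(m.group(2))))
        return AgeValue(event, low, high)
    if cleaned in NON_SPECIFIC_TERMS:
        return AgeValue(event, qualifier=cleaned)
    raise ValueError(f"unrecognized age expression: {text!r}")
```

Keeping the event type separate from the value lets one parser serve current age, age of onset/diagnosis, and age of death, and storing a qualifier rather than inventing numeric bounds avoids assigning arbitrary ages to terms such as “elderly”.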
Collectively, the results from both parts of this study provide valuable insights into clinician thought processes, and specifically into how the comments field for family history has been used. These findings could help inform recommendations for enhancing system functionality and user training to improve the use and collection of family history information. In addition, the ability to automate the extraction, structuring, and encoding of information captured within family history comments may further improve use of this information by making it more accessible for patient care, decision support, and research. Complementing the approach described in this study with qualitative methods (e.g., interviews and focus groups with clinicians and researchers) could provide further insights into the needs and uses of family history for guiding enhancements and customizations in the EHR.