Although redundancy in electronic documentation is widely recognized, this is the first paper to describe the use of sequence alignment algorithms to quantify redundancy in clinical narrative documentation. The study established the feasibility of exploring redundancy in the narrative record with an algorithm typically used to align genetic sequences in bioinformatics and to detect plagiarism. The algorithm aided in quantifying and visualizing redundancy within and between document types, and its output afforded a natural visualization of the alignment of a sequence of notes. It revealed instances of copy and paste, as well as copied information that had subsequently been edited.
Because our study examines syntactic duplication, it serves only as a proxy for the study of conceptual redundancy. Despite this, it provides strong indications that redundancy is prevalent in our data set. Future studies involving a larger set of documents and conceptual alignment facilitated by a combination of natural language processing and manual review may better characterize redundancy with respect to quantity and content.
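As a rough illustration of how word-level alignment can quantify syntactic duplication between two notes, the following minimal sketch computes the share of words in a later note that align, in order, with words in an earlier note. It uses a longest-common-subsequence alignment as a simplified stand-in; it is not necessarily the algorithm or scoring scheme used in this study, and the function names (`tokenize`, `lcs_length`, `redundancy`) and the whitespace tokenization are assumptions made for the example.

```python
def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately crude tokenizer)."""
    return text.lower().split()


def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists.

    This is a simple form of global alignment in which matches score 1
    and gaps and mismatches score 0.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def redundancy(source, target):
    """Fraction of words in `target` that align with words in `source`."""
    s, t = tokenize(source), tokenize(target)
    if not t:
        return 0.0
    return lcs_length(s, t) / len(t)


earlier = "patient admitted with chest pain"
later = "patient admitted with chest pain now resolved"
print(redundancy(earlier, later))  # 5 of the 7 words in the later note are aligned
```

Because the measure is purely lexical, paraphrased content counts as new and coincidentally repeated words count as duplicated, which mirrors the proxy nature of syntactic alignment described above.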
Signout and progress notes, which are sequential and are generated daily throughout the admission, iteratively evolve. The evolution decelerates, however, as the admission progresses. As patients' problems that motivated the admission are diagnosed and treated, less new information is introduced into the clinical narrative.
Despite the overall decrease in the amount of unique information in progress notes over the course of an admission, there appears to be a slight increase toward the end of an admission (see ). Although statistical significance cannot be evaluated given our small dataset, an informal review of the aligned notes suggests that this brief rise may correspond to the introduction of new information associated with discharge planning. This leads us to speculate that some clinical events correspond to measurable information ‘injections’ in the narrative continuum of an admission. A consultation, for example, might result in the introduction of a significant amount of new information. Similarly, adverse events may result in measurable spikes in the amount of unique information introduced into the record.
Although outside the scope of this study, the exploration of such measurable information injections might extend the utility of the clinical narrative. Retrospective analysis with this method has the potential to be a valuable research tool, and it may even be possible to prospectively surveil narrative documentation for real-time identification of important clinical events, as well as excessive copy and paste.
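The idea of surveilling a sequence of notes for information injections can be sketched as follows. This is a hypothetical illustration, not a method evaluated in this study: `novelty`, `flag_injections`, and the threshold of 0.5 are assumptions, and Python's `difflib.SequenceMatcher` is used as a convenient stand-in for a true sequence alignment algorithm.

```python
from difflib import SequenceMatcher


def novelty(previous, current):
    """Fraction of words in `current` not matched to words in `previous`."""
    prev_words = previous.lower().split()
    curr_words = current.lower().split()
    if not curr_words:
        return 0.0
    # autojunk=False disables difflib's heuristic that ignores very frequent items.
    matcher = SequenceMatcher(None, prev_words, curr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - matched / len(curr_words)


def flag_injections(notes, threshold=0.5):
    """Return indices of notes whose novelty relative to the preceding note
    meets or exceeds `threshold` (an arbitrary, assumed cutoff)."""
    flagged = []
    for i in range(1, len(notes)):
        if novelty(notes[i - 1], notes[i]) >= threshold:
            flagged.append(i)
    return flagged


# Invented example notes, not data from the study.
notes = [
    "stable on antibiotics",
    "stable on antibiotics",
    "stable on antibiotics cardiology consulted for new murmur echo ordered",
]
print(flag_injections(notes))  # the third note (index 2) introduces mostly new words
```

The same per-note novelty score, read in the opposite direction, could flag notes that are almost entirely copied from their predecessor. Any practical threshold would need to be calibrated against reviewed clinical data.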
The subjective review of the content of information that is conserved across document types suggests that there exist circumscribed, persistent modules of information that appear in different notes and evolve in different ways. Not surprisingly, the past medical history, for example, remains relatively constant throughout the admission but appears in multiple documents. In next-generation EHR implementations, centralization of these different modules of information and an improved capacity to reference or link to existing narrative data could contribute to the reduction of redundancy in clinical notes. Rather than retyping sections of notes, one could imagine the EHR facilitating a manual process of daily report generation. A clinician could create pointers to information that is still relevant each day and add only what is new and clinically relevant. This might relieve the clinician of the burden of retyping or copying, and allow more attention and time for clinical decision-making and patient care.
Our study focused on the processing of free-text narrative, and we therefore did not address the issues of redundancy associated with structured documentation. For example, many EHRs allow clinicians to note only abnormal findings, or, in some instances of nursing, to document by exception only interventions beyond those in the institutional standard of care.14 The EHR subsequently generates a relatively large document that includes normal/standard of care as well as exceptions. While such an approach may ease redundancy in data entry, the potential impact on comprehensibility has not yet been quantified.
Our findings suggest future directions for investigation. Most importantly, we may need to consider a model for next-generation documentation where billing/compliance information becomes an epiphenomenon of clinical documentation—parallel, auditable, and separate from more salient clinical narrative. Such a model may move us from a debate over ‘good’ versus ‘bad’ redundancy to one of how to enable ‘smart’ redundancy. This would ensure that facts which are valuable for clinical communication are propagated (eg, an abnormal but stable physical exam), and that summary documents (eg, signout notes and discharge summaries) summarize so that they are semantically redundant but concise.
Limitations
Although quantifying information duplication through the direct alignment of words in very ‘noisy’ text is not optimal, conceptual alignment was outside the scope of this study. The use of lexical rather than conceptual alignment may slightly underestimate the amount of information duplication in clinical documents. On the other hand, we believe this is offset by the occasional alignment of words that are not contextually related, which very slightly overestimates the amount of information duplication in clinical documents.
We limited the scope of this study to four note types (admission, signout, progress, and discharge summary) because they are typically generated by multiple clinicians from the same service. We believe, however, that there are numerous other document types to and from which clinicians are likely to duplicate information. A more thorough study might include notes from rehabilitation and social work services, reports, procedure and consultation notes, and notes from previous admissions. In addition, we studied only a small sample of documents from one service of a single academic medical center, and only documents written electronically in WebCIS, a system used only at our institution. However, the WebCIS notes we studied were ‘free-text’, so the findings may be applicable to documents created in systems with similarly unstructured notes. Templated, dictated, and handwritten notes were not studied.