To develop and validate automated electronic note search strategies (automated digital algorithm) to identify Charlson comorbidities.
The automated digital algorithm was built by a series of programmatic queries applied to an institutional electronic medical record database. The automated digital algorithm was derived from secondary analysis of an observational cohort study of 1447 patients admitted to the intensive care unit from January 1 through December 31, 2006, and validated in an independent cohort of 240 patients. The sensitivity, specificity, and positive and negative predictive values of the automated digital algorithm and International Classification of Diseases, Ninth Revision (ICD-9) codes were compared with comprehensive medical record review (reference standard) for the Charlson comorbidities.
In the derivation cohort, the automated digital algorithm achieved a median sensitivity of 100% (range, 99%-100%) and a median specificity of 99.7% (range, 99%-100%). In the validation cohort, the sensitivity of the automated digital algorithm ranged from 91% to 100%, and the specificity ranged from 98% to 100%. The sensitivity of the ICD-9 codes ranged from 8% for dementia to 100% for leukemia, whereas specificity ranged from 86% for congestive heart failure to 100% for leukemia, dementia, and AIDS.
Our results suggest that automated electronic search strategies to extract Charlson comorbidities from the clinical notes contained within the electronic medical record are feasible and reliable. The automated digital algorithm outperformed ICD-9 codes for all the Charlson variables except leukemia, with greater sensitivity, specificity, and positive and negative predictive values.
Comorbidity is defined as any distinct clinical entity that preexists or occurs during a patient's primary disease.1 Various studies have documented the role of comorbidities in predicting a patient's outcome.2-6 The Charlson Comorbidity Index (CCI) was developed to estimate the long-term (1-year) mortality of patients admitted to the hospital or enrolled in research studies on the basis of the comorbid conditions.7 The CCI consists of 19 comorbid conditions, and each comorbidity is assigned a score of 1, 2, 3, or 6 based on the relative risk of 1-year mortality.7 The CCI has been validated in several different populations and is widely used in various health services research and critical care studies.3,8-10
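To make the weighting concrete, here is a minimal Python sketch that sums the weights Charlson et al assigned to the 19 conditions. The snake_case condition labels are hypothetical identifiers chosen for illustration, and the plain sum deliberately omits the hierarchy rules (for example, counting only the more severe of two related liver or diabetes categories) that some later implementations apply.

```python
# Charlson Comorbidity Index weights (Charlson et al, 1987).
# The dictionary keys are hypothetical labels chosen for this sketch.
CHARLSON_WEIGHTS = {
    # weight 1
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "connective_tissue_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes": 1,
    # weight 2
    "hemiplegia": 2,
    "moderate_severe_renal_disease": 2,
    "diabetes_with_end_organ_damage": 2,
    "any_tumor": 2,
    "leukemia": 2,
    "lymphoma": 2,
    # weight 3
    "moderate_severe_liver_disease": 3,
    # weight 6
    "metastatic_solid_tumor": 6,
    "aids": 6,
}


def charlson_score(conditions):
    """Sum the Charlson weights of the comorbidities present.

    Each condition is counted once; an unknown label raises KeyError so
    that typos are not silently ignored. No hierarchy rules are applied.
    """
    return sum(CHARLSON_WEIGHTS[c] for c in set(conditions))
```

For instance, a patient with congestive heart failure, uncomplicated diabetes, and a metastatic solid tumor would score 1 + 1 + 6 = 8.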
The use of electronic medical records (EMRs) is increasing, and these records are used not only in clinical practice but also in most epidemiologic and health care research. Recent mail survey findings from the National Ambulatory Medical Care Survey conducted by the Centers for Disease Control and Prevention reported an increase in the adoption of EMR systems by US office-based physicians from 18% in 2001 to 57% in preliminary 2011 results.11 As part of the current health system reform in the United States, the government has invested large sums of money to support and promote the adoption of EMR systems. The 2009 Health Information Technology for Economic and Clinical Health Act provides incentives for hospitals to implement EMR systems; hence, most physicians and hospitals intend to implement EMR systems within the next few years.12,13 Traditionally, the CCI was identified and calculated solely by manual medical record review. In 1992, Deyo et al14 developed an electronic application tool based on International Classification of Diseases, Ninth Revision (ICD-9) codes to calculate the CCI automatically. Currently, this method is applied in most research projects for baseline comorbidity adjustment,5,10 although the literature has reported various concerns about the accuracy and underreporting of comorbidities using ICD-9 codes.15-21
With the growing view of EMRs as a tool to reduce cost and improve safety,22 adoption of EMR systems in US hospitals is steadily increasing. However, information overload can hinder the effective use of EMRs, which may reduce performance and compromise patient safety.23 Thus, an automated electronic note search strategy to identify a patient's baseline comorbidities would be useful not only in medical research; early identification of comorbidities from the EMR might also help clinicians treat patients more efficiently.
We therefore aimed to develop and validate an automatic note search strategy (automated digital algorithm) based on EMR notes to identify CCI comorbidities. Our secondary aim was to compare the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the automatic note search strategy and of ICD-9 code search against comprehensive medical record review (reference standard) in detecting CCI comorbidities from the EMR.
The study was approved by the Mayo Clinic Institutional Review Board for the use of medical records for research.
The derivation cohort was drawn from a community-based, retrospective study of 1447 eligible patients (patients ≥18 years of age who gave research authorization) from Olmsted County, Minnesota, who were admitted to the intensive care unit (ICU) during 2006.24 The automated digital algorithm was further validated in a cohort of 240 patients who gave research authorization, randomly selected using JMP statistical software (version 9.0, SAS Institute Inc, Cary, NC) from a retrospective cohort of 651 patients with severe sepsis admitted to the ICU from January 2007 to December 2009.25 All patients who denied research authorization were excluded.
All CCI comorbidities in the 5-year interval before admission were manually collected by trained research fellows according to the definitions published by Charlson et al.7 Each record was reviewed by multiple reviewers. Against this reference standard, we compared 2 data extraction strategies: (1) CCI scores extracted by the automated digital algorithm and (2) CCI scores extracted by using ICD-9 codes.
The development of an automated electronic note search strategy (automated digital algorithm) requires considerable investment in both time and expertise. This retrospective study used data from the Mayo Clinic Life Sciences System (MCLSS), the single centralized database for all Mayo Clinic hospital data. The MCLSS is an exhaustive clinical data warehouse that stores patient demographic characteristics, diagnoses, and hospital, laboratory, flow sheet, clinical, and pathologic data gathered from various clinical and hospital source systems within the institution, and it maintains a near real-time model of Mayo's EMR system.26 We used the Data Discovery and Query Builder (DDQB) tool set to access the data contained within the MCLSS database. The DDQB can search for demographic characteristics, clinical data, hospital admissions information, diagnosis codes, procedure codes, laboratory test results, flow sheet data, pathology reports, and genetic data. A valid institutional review board number is required to retrieve patient data from the MCLSS for research. The DDQB also provides a text search function with which researchers can rapidly search for specific words or entities in the EMR.
The DDQB uses Boolean logic to build free text searches.26 All free text searches were run independently by 2 physician investigators (A.S. and B.S.). To initiate a query, we entered all the synonyms, abbreviations, and most common symptoms associated with the comorbidity. In addition, we excluded negative terms mentioned in the clinical notes to make the automated digital algorithms more specific (see Supplemental Appendix 1 for the list of excluded terms; available online at http://www.mayoclinicproceedings.org). For each CCI comorbidity, the automated digital algorithm searched the medical and surgical history section of each eligible patient's EMR during the 5 years before the date of admission. As an example, the automated digital algorithm for peptic ulcer disease is shown in the Figure. The automated digital algorithms for the CCI comorbidities were iteratively refined by adding or excluding terms, with the goal of raising sensitivity and specificity to 95% or more. The final search terms used to build the automated digital algorithm are shown in Supplemental Appendix 2 (available online at http://www.mayoclinicproceedings.org). To validate the automated digital algorithm, its sensitivity and specificity were calculated against the reference standard of comprehensive medical record review.
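The include/exclude logic described above can be sketched roughly as follows. This is not the DDQB, whose interface is not described here; it is a hypothetical Python approximation that ORs together the search terms for one comorbidity, discards a hit when an exclusion phrase appears earlier in the same sentence, and restricts the search to the medical and surgical history section within an approximate 5-year look-back window. The `Note` structure, the term lists, and the sentence-level negation rule are all assumptions; the study's actual terms are in its Supplemental Appendices.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Note:
    # Hypothetical, simplified representation of one clinical note section.
    note_date: date
    section: str
    text: str


# Illustrative terms only, not the study's search terms.
PUD_TERMS = ["peptic ulcer", "gastric ulcer", "duodenal ulcer", "PUD"]
EXCLUSION_TERMS = ["no history of", "denies", "negative for"]

HISTORY_SECTION = "medical and surgical history"
LOOKBACK = timedelta(days=5 * 365)  # approximate 5-year window (ignores leap days)


def _term_pattern(terms):
    # Whole-word, case-insensitive OR of all terms.
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def has_comorbidity(notes, admission_date, terms, exclusions=EXCLUSION_TERMS):
    """True if an in-window history note mentions a term that is not preceded
    by an exclusion phrase in the same sentence; otherwise treated as absent."""
    term_re = _term_pattern(terms)
    excl_re = _term_pattern(exclusions)
    start = admission_date - LOOKBACK
    for note in notes:
        if note.section.lower() != HISTORY_SECTION:
            continue  # only the medical and surgical history section is searched
        if not (start <= note.note_date < admission_date):
            continue  # outside the look-back window
        for sentence in re.split(r"[.;\n]", note.text):
            for match in term_re.finditer(sentence):
                if not excl_re.search(sentence[: match.start()]):
                    return True  # term present and not negated (term AND NOT exclusion)
    return False
```

A hit in a note dated after admission, in another note section, or preceded by "denies" in the same sentence is not counted, and, mirroring the manual review, no hit means the comorbidity is treated as absent.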
The MCLSS administrative database was used to calculate the ICD-9 coded CCI comorbidities according to the widely used algorithm of Deyo et al.14
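For comparison, the code-based approach reduces to looking up each administrative diagnosis code in a table of ICD-9 ranges. The Python sketch below encodes only 4 of the Deyo et al categories as an illustration; the dictionary is a hypothetical, partial representation that should be checked against the published table before any real use, and it ignores V and E codes as well as undotted code formats.

```python
# Partial, illustrative subset of the Deyo et al (1992) ICD-9-CM mapping,
# written as inclusive ranges of 3-digit ICD-9 categories.
DEYO_SUBSET = {
    "congestive_heart_failure": [(428, 428)],  # 428.x
    "dementia": [(290, 290)],                  # 290.x
    "peptic_ulcer_disease": [(531, 534)],      # 531.x-534.x
    "aids": [(42, 44)],                        # 042.x-044.x
}


def icd9_category(code):
    """Return the numeric 3-digit category of a dotted ICD-9 code, or None
    for V/E codes and malformed strings (not handled in this sketch)."""
    head = code.strip().split(".")[0]
    return int(head) if head.isdigit() else None


def deyo_flags(codes, mapping=DEYO_SUBSET):
    """Return the set of comorbidities whose ranges contain any of the codes."""
    flags = set()
    for code in codes:
        category = icd9_category(code)
        if category is None:
            continue
        for comorbidity, ranges in mapping.items():
            if any(low <= category <= high for low, high in ranges):
                flags.add(comorbidity)
    return flags
```

Because the lookup depends only on which codes were assigned, any comorbidity that was documented but not coded is missed, which is the underreporting mechanism discussed later in the article.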
Manual data extraction is the traditional method of ascertaining comorbidity data from clinical notes. Trained research fellows manually collected comorbidity data according to the definitions published by Charlson et al.7 Comorbid conditions are mostly recorded in the medical and surgical history section of the clinical notes, so to maintain uniformity and identify specific comorbid conditions efficiently, only the medical and surgical history sections of the clinical notes were reviewed. If a comorbid condition was not identified in this section of the EMR, it was considered absent. Research fellows involved in manual data extraction were masked to the results of the automated electronic note search strategy.
Sensitivity and specificity of both the automated digital algorithm and the ICD-9 code search were calculated by comparing the test results with the reference standard in the 2 cohorts. The PPV and NPV were calculated based on the formula:
The 95% confidence intervals were calculated using an exact test for proportions. JMP statistical software (version 9.0, SAS Institute Inc) was used for all data analysis.
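As a sketch of these calculations, assuming the standard 2 × 2 table definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), PPV = TP/(TP+FP), NPV = TN/(TN+FN)), the Python below computes the 4 measures and an exact Clopper-Pearson 95% interval using only the standard library. The study itself used JMP; this is an independent illustration, and the function names are hypothetical.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2 x 2 definitions; None where a denominator is zero."""
    def ratio(num, den):
        return num / den if den else None
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = sum(
        math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * log_p + (n - i) * log_q)
        for i in range(k + 1)
    )
    return min(total, 1.0)


def _bisect(f, target, increasing, iterations=100):
    """Solve f(p) = target on (0, 1) for a monotone f by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for x / n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    # Lower bound: P(X >= x | p) = alpha/2, which increases with p.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1.0 - _binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: P(X <= x | p) = alpha/2, which decreases with p.
    upper = 1.0 if x == n else _bisect(
        lambda p: _binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper
```

With no true positives and 1 false positive, as for AIDS by ICD-9 code in the validation cohort, the PPV is 0/(0 + 1) = 0 whatever the other cells contain. In practice, SciPy's `scipy.stats.binomtest(k, n).proportion_ci(method="exact")` returns the same Clopper-Pearson interval.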
During the study, 1447 consecutive eligible patients admitted to the ICU from January 1 through December 31, 2006, at Mayo Clinic, Rochester, MN, were included in the derivation cohort. The demographic characteristics and baseline comorbidity status of the derivation and validation cohorts are summarized in Table 1. The most prevalent comorbidities were chronic lung disease (24.3%) and diabetes mellitus (24.3%) in the derivation cohort and malignant tumor (32.1%) in the validation cohort. In the derivation cohort, the automated digital algorithm achieved a median sensitivity of 100% (range, 99%-100%) and a median specificity of 99.7% (range, 99%-100%). Table 2 summarizes the prevalence of CCI comorbidities and the sensitivity and specificity of the automated digital algorithm compared with the reference standard in the derivation and validation cohorts, together with the concordance and discordance (true-positive, true-negative, false-positive, and false-negative results) between the automated digital algorithm and the reference standard.
Table 3 summarizes the sensitivity, specificity, PPV, and NPV of the 2 data extraction strategies (automated digital algorithm and ICD-9 codes search) for all CCI comorbidities in the validation cohort. The sensitivity for identifying CCI comorbidities using the automated digital algorithm ranged from a minimum of 91% for lymphoma to a maximum of 100% for congestive heart failure (median, 98.5%; interquartile range [IQR], 94%-100%). Sensitivities for extracting comorbidities using ICD-9 codes ranged from a minimum of 8% for dementia to 100% for leukemia (median, 66%; IQR, 51%-76%). The automated digital algorithm achieved a median specificity of 99.6% (IQR, 99%-100%) compared with 97% (IQR, 91%-98%) for ICD-9 codes.
Across the CCI comorbidity domains, the PPV of the automated digital algorithm ranged from 80% to 100%, with the lowest for hemiplegia and the highest (100%) for connective tissue disease, leukemia, lymphoma, metastasis, mild liver disease, peptic ulcer disease, and peripheral vascular disease. The PPV for ICD-9 codes ranged from 0% to 100%, with the highest for leukemia and the lowest for AIDS (the ICD-9 codes from Deyo et al for AIDS, ie, 042.x-044.x, capture both patients with human immunodeficiency virus infection and patients with AIDS). In the CCI, the AIDS weight of 6 applies only to patients with AIDS, and the only patient identified as having AIDS by ICD-9 code in the validation cohort was a false-positive result; hence, the PPV was zero. The NPV ranged from 97% to 100% for the automated digital algorithm compared with 79% to 100% for ICD-9-Clinical Modification codes. The median PPV and NPV were 97.2% (IQR, 94%-100%) and 99.6% (IQR, 99%-100%), respectively, for the automated digital algorithm and 62.5% (IQR, 42%-78%) and 97.2% (IQR, 91%-99%), respectively, for the ICD-9 codes.
The results of this study suggest that the automatic note search strategy is a feasible and valid way to identify CCI comorbidities in the EMR. The sensitivities of the automatic note search strategy were considerably higher than those of ICD-9 codes for all CCI variables except leukemia, for which both search strategies had a sensitivity of 100%. The specificity and NPV of the automatic note search strategy were also equal or superior to those of the ICD-9 code search for all CCI comorbidities. In addition, our results are consistent with previous studies on the reliability and accuracy of electronic search; for example, Alsara et al26 also reported that electronic query resulted in accurate and highly efficient data extraction.
The CCI is widely used by health care researchers to predict short-term (30-day) and long-term (1-year) mortality in ICU patients.27-30 To detect meaningful differences in patient outcomes, it is essential to balance baseline comorbid conditions, and the CCI is one of the most commonly used tools to measure baseline comorbidities before ICU admission. A recent study by Christensen et al31 discussed the important role of the CCI combined with administrative data in predicting short- and long-term mortality for ICU patients. D'Hoore et al32 described the CCI as a practical way to perform risk adjustment from administrative databases, and Poses et al4 reported enhanced discrimination of inpatient mortality using the CCI. Currently, ICD-9 code search is frequently used to extract CCI comorbidities automatically.10,33,34 However, ICD-9–coded administrative databases lack clinical definitions for diagnoses, which causes variability in coding practices.35 Our finding that ICD-9 codes underreport comorbidities substantiates the findings of previous studies.19,20,36,37 The underreporting could be attributable to a greater emphasis, for financial reasons, on coding procedures and complications on admission than on comorbidities.20 Romano et al38 also found that the CCI comorbidities were not accurately defined in ICD-9 codes, which produced interobserver variation in the ICD-9 codes assigned to the comorbidities. Although automatic searches using ICD-9 codes to identify comorbidities have been used in many research projects, the criteria used by the staff who code medical records may differ from physicians' diagnostic criteria, which significantly limits the broad use of this method. The automatic note search strategies instead combined algorithm-defined keywords with a query restricted to a specific note section.
This approach made efficient use of the patient database query and greatly reduced the time required compared with manual medical record review (the mean time to manually review 1 patient's note for the CCI comorbidities ranged from 5 to 10 minutes). The implementation of an electronic strategy to extract information is useful not only for research purposes but potentially also for the treatment of patients.39 Because an automated digital algorithm provides accurate information about a patient's comorbidities, it could help physicians recognize comorbidities early and might improve treatment. Comorbidities act as prognostic factors for patient survival and treatment-related outcomes. Patients with higher CCI scores are at increased risk for readmission and hospitalization; thus, using automated digital algorithms to identify comorbidities early could be valuable and might be used in the future to prompt early palliative care consultations when needed. The high sensitivity and specificity of the automated digital algorithm make it an important tool for physicians and investigators in accurately estimating comorbidities, and it might help in making early decisions and avoiding medical errors.
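To put the 5- to 10-minute figure in perspective, here is a back-of-the-envelope Python calculation for the 2 cohort sizes. It assumes that this time applies per patient for a single reviewer; each record in this study was reviewed by multiple reviewers, which would multiply the total.

```python
def manual_review_hours(n_patients, minutes_per_patient):
    """Total manual review time in hours, assuming a fixed time per patient."""
    return n_patients * minutes_per_patient / 60


for cohort, n in [("derivation", 1447), ("validation", 240)]:
    low = manual_review_hours(n, 5)
    high = manual_review_hours(n, 10)
    print(f"{cohort}: {low:.1f}-{high:.1f} hours")
# derivation: 120.6-241.2 hours
# validation: 20.0-40.0 hours
```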
Another strategy to identify comorbidities uses the Systematized Nomenclature of Medicine–Clinical Terms (SNOMED-CT). Although this method performed better than the ICD-9 code search strategy, it also has significant limitations for broad use in clinical research.40 Chiang et al41 suggested that SNOMED-CT coding is imperfect and unreliable and requires physician training and repeated testing. Furthermore, at its current stage, SNOMED-CT does not satisfactorily distinguish exact terms at the clinical interface level for the study template.42
Our search strategies also had certain limitations. First, the performance of the automated digital algorithm and coding of the CCI depend on the quality of the database and the consistency of text entries, which limits the applicability of this approach to institutions with this or a similar database. However, the logic and the free text search concept could be generalized to other institutions, and the method could be adopted at sites willing to replicate the programming effort, because medical documentation training is similar across the country. As electronic clinical notes become standard, our approach should become more generalizable. Second, we focused only on a pertinent section (medical and surgical history) of the clinical notes to search for comorbidities, which might have caused us to miss information provided in other note sections, although the same validation process can be extended to other sections of clinical notes because the concept remains the same. Third, data can be missed because of errors or corruption in the data warehouse,43 although such errors are expected to affect only a small proportion of the database. Fourth, some of the CCI comorbidity definitions are outdated. Since the original CCI was developed in 1987, medicine has changed substantially. Certain diseases, such as AIDS, no longer carry the same relative risk of mortality as when the CCI was developed. Similarly, the criterion of an untreated thoracic or abdominal aneurysm 6 cm or larger for the diagnosis of peripheral vascular disease needs reassessment, because the latest guidelines advocate surgery when the aneurysm is 5.5 cm or larger.44 However, the search strategy could be refined to identify variables according to any new definitions by modifying the algorithm accordingly. Finally, because of the retrospective nature of the study, we included only comorbidities documented as meeting definite diagnostic criteria.
In conclusion, CCI comorbidities can be identified accurately using the automated digital algorithm. Its high sensitivity and specificity and ease of calculation should encourage physicians to implement the automated digital algorithm in clinical practice and medical research.
We thank all members of the Multidisciplinary Epidemiology and Translational Research in Intensive Care group for constant and constructive feedback.
For editorial comment, see page 811
Grant Support: This work was supported by the National Institutes of Health grant RC1 LM10468Z-01.