The Meaningful Use (MU) program has increased the national emphasis on electronic measurement of hospital quality.
The objective was to evaluate four stroke MU measures and one VHA stroke electronic clinical quality measure (eCQM) in national VHA data and to determine sources of error in using centralized electronic health record (EHR) data.
Our study is a retrospective cross-sectional study of stroke quality measure eCQMs vs. chart review in a national EHR. We developed local SQL algorithms to generate the eCQMs, then modified them to run on VHA Central Data Warehouse (CDW) data. eCQM results were generated from CDW data in 2130 ischemic stroke admissions in 11 VHA hospitals. Local and CDW results were compared to chart review.
We calculated the raw proportion of matching cases, sensitivity/specificity, and positive/negative predictive values (PPV/NPV) for the numerators and denominators of each eCQM. To assess overall agreement for each eCQM, we calculated a weighted kappa and prevalence-adjusted bias-adjusted kappa statistic for a three-level outcome: ineligible, eligible-passed, or eligible-failed.
In five eCQMs, the proportion of matched cases between CDW and chart ranged from 95.4 %–99.7 % (denominators) and 87.7 %–97.9 % (numerators). PPVs tended to be higher (range 96.8 %–100 % in CDW) with NPVs less stable and lower. Prevalence-adjusted bias-adjusted kappas for overall agreement ranged from 0.73–0.95. Common errors included difficulty in identifying: (1) mechanical VTE prophylaxis devices, (2) hospice and other specific discharge disposition, and (3) contraindications to receiving care processes.
Stroke MU indicators can be relatively accurately generated from existing EHR systems (nearly 90 % match to chart review), but accuracy decreases slightly in central compared to local data sources. To improve stroke MU measure accuracy, EHRs should include standardized data elements for devices, discharge disposition (including hospice and comfort care status), and recording contraindications.
The online version of this article (doi:10.1007/s11606-015-3562-5) contains supplementary material, which is available to authorized users.
Quality measurement use for ischemic stroke in the US has expanded considerably in the past decade.1,2 The Joint Commission requires eight quality measures in its Primary Stroke Center certification program, those participating in the Get With the Guidelines program from the American Stroke Association need to report these eight and other quality measures, and the Centers for Medicare and Medicaid now publicly report ischemic stroke mortality and readmissions on their Hospital Compare website.3 The Affordable Care Act included the eight stroke indicators in its Meaningful Use (MU) program,4 which provides incentives to institutions whose electronic health records (EHRs) are able to automatically collect and report hospital-based quality measures without manual chart review. It is unclear at this time how many EHRs are able to meet this standard, and the extent and nature of errors in the collection and reporting of these data are not known.
The Veterans Health Administration (VHA) does not participate in the MU EHR incentive program, but complies with other national quality measurement programs (e.g., HospitalCompare.gov) and has a very long history of data-driven quality management. In 2012, the VHA adopted three self-reported stroke quality indicators and in January 2015 started to conduct External Peer Review Program (EPRP) chart reviews for the eight inpatient stroke MU measures, a process that involves contracted abstractors manually reviewing charts 1 to 2 months after discharge.5 This process maximizes accuracy but can be costly and time consuming, and it often results in a delay between patient care and reporting of these measures. There is increasing interest in using electronic clinical quality measures (eCQMs) not only to reduce the cost of obtaining data but also to provide a closer pairing of the quality feedback to the time of patient care.
The purpose of this study was to develop and validate inpatient stroke eCQMs that are part of the MU program and are relevant to the VHA’s desire to monitor and improve inpatient stroke care. This project examined four stroke MU measures and one VHA stroke eCQM in national VHA data and determined sources of error in the eCQMs compared to standardized chart review.
This project was conducted as an operational initiative in partnership between the Indianapolis Center for Health Information and Communication and the VHA Office of Analytics and Business Intelligence (OABI). Project activities took place under a jointly approved Memorandum of Understanding between the Richard L. Roudebush VA Medical Center (RVAMC) and the OABI. Collection of the criterion standard chart review data took place as part of a prior research project, the Intervention for Stroke Performance Improvement using Redesign Engineering (INSPIRE) project;6 use of these prior chart review data for the purpose of validating the eCQMs was approved by the local IRB and VA R&D committees.
We used multiple sources of VHA data, including data from the local Veterans Health Information Systems and Technology (VistA) EHR files (which include individual patient record data in both nationally standardized and locally adapted data elements including ‘health factors’ and ‘orderable items’) and from the CDW (see Supplemental Appendix for a complete list of data elements used).7 VistA data from every VHA facility are updated every 24 h into the CDW, which also incorporates data from other VHA data systems. CDW data are stored in relational databases and organized into discrete data domains. Multiple types of patient identifiers allow identification of unique patients and linkage of individual patient data across CDW data tables. For this study, we used patient social security numbers and admission/discharge dates to match CDW data to the identified and chart review verified stroke admissions from the INSPIRE study.
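The linkage step described above can be sketched as a simple keyed join. This is a minimal illustration, not the study's actual query: the field names (`patient_id`, `admit_date`, `discharge_date`) and the in-memory row format are assumptions made for the example, and the real CDW tables and identifiers are described only in the Supplemental Appendix.

```python
from datetime import date


def admission_key(row):
    """Build the match key: patient identifier plus admission and discharge dates."""
    return (row["patient_id"], row["admit_date"], row["discharge_date"])


def link_admissions(chart_rows, cdw_rows):
    """Pair each chart-reviewed admission with the CDW admission sharing its key.

    Returns (linked, unmatched): linked is a list of (chart_row, cdw_row)
    pairs, and unmatched lists chart-reviewed admissions with no CDW match.
    """
    cdw_index = {admission_key(row): row for row in cdw_rows}
    linked, unmatched = [], []
    for row in chart_rows:
        match = cdw_index.get(admission_key(row))
        if match is None:
            unmatched.append(row)
        else:
            linked.append((row, match))
    return linked, unmatched


# Hypothetical records, for illustration only.
chart_rows = [
    {"patient_id": "A1", "admit_date": date(2011, 3, 1), "discharge_date": date(2011, 3, 6)},
    {"patient_id": "B2", "admit_date": date(2012, 7, 9), "discharge_date": date(2012, 7, 12)},
]
cdw_rows = [
    {"patient_id": "A1", "admit_date": date(2011, 3, 1), "discharge_date": date(2011, 3, 6)},
    {"patient_id": "B2", "admit_date": date(2012, 7, 10), "discharge_date": date(2012, 7, 12)},
]
linked, unmatched = link_admissions(chart_rows, cdw_rows)
```

In this toy data the second admission fails to link because the admission dates differ by one day, which is the kind of discrepancy an exact-date match would surface for manual review.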
We initially constructed Structured Query Language (SQL) queries to generate five eCQMs: four MU measures following the specifications for the CMS Stroke National Inpatient Quality Measures version 4.2a8 effective for calendar year 2013 and the VHA NIH Stroke Scale (NIHSS) indicator following the VHA Inpatient Evaluation Center (IPEC) specifications for this indicator. MU measures developed included: (1) STK-1: venous thromboembolism (VTE) prophylaxis; (2) STK-2: antithrombotic (AT) at discharge; (3) STK-5: AT by hospital day 2 (AT by HD2); (4) STK-10: consider for rehabilitation (Rehab). The SQL queries initially were developed to run on local VistA EHR data housed in the VISN 11 Regional Data Warehouse. We used text mining within the SQL queries to identify NIHSS performance in any notes and to assess for mentions of “hospice” in the discharge summary only (see Supplemental Appendix for indicator algorithms and data definitions). For each eCQM, we compared the SQL algorithm results to the results of local chart review on all confirmed ischemic stroke admissions to the RVAMC in the year 2013 (N = 98), identified from discharge ICD-9 codes as specified in the stroke national inpatient quality measure specifications.8 Although they are included in the national specifications, we did not include hemorrhagic stroke admissions because the criterion standard chart review cases from the INSPIRE study included only ischemic stroke admissions. Hemorrhagic stroke admissions are a small percentage of all strokes (about 10–15 %); therefore, we do not expect this exclusion to dramatically affect results.
We compared each eCQM denominator and numerator to the corresponding chart review indicator result. Chart review indicators were assessed in the INSPIRE project using the definitions as specified by CMS.6 Trained chart abstractors used a standardized chart review manual to guide abstraction of data elements. Throughout this study, we conducted a random 10 % interrater reliability assessment for all data elements included in the indicator algorithms; all kappa statistics for these data elements were > 0.80.
Starting with the local VistA data, we then categorized each eCQM denominator and numerator compared to the corresponding chart review result and examined all mismatches to identify modifications to the SQL algorithms to improve eCQM performance. Iterative improvement involved generating reports of all mismatches and categorizing them as false negative or false positive. Examination of chart review data and electronic algorithm results was done to determine specific sources of error. Our group met to review all errors and discuss solutions; agreed-upon solutions were included in the next revision, and a new mismatch report was generated. In each comparison, we reviewed the number of false-negative and false-positive errors, seeking to minimize the total number of errors in the denominator and numerator separately. For example, in the STK-10 indicator we introduced SQL string searches to identify hospice discharges. Denominator false-negative results in version 2 increased from 23 to 56 (+33 errors), but denominator false-positive results fell from 101 to 51 (-50) for a net benefit of 17 fewer errors, so this text-mining strategy was carried forward into subsequent versions. We did not use natural language processing to extract free text data elements as we felt this technique would not be easily applied in subsequent real-world use cases.
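The mismatch report and the version-to-version comparison described above can be illustrated with a short sketch. The function names are hypothetical and the study itself used SQL; the only figures taken from the text are the STK-10 denominator counts before and after the hospice string search was introduced.

```python
from collections import Counter


def classify(ecqm_flag, chart_flag):
    """Label one eCQM result against chart review (the criterion standard)."""
    if ecqm_flag and chart_flag:
        return "true_positive"
    if ecqm_flag and not chart_flag:
        return "false_positive"
    if not ecqm_flag and chart_flag:
        return "false_negative"
    return "true_negative"


def mismatch_report(pairs):
    """Count false positives and false negatives over (ecqm_flag, chart_flag) pairs."""
    counts = Counter(classify(ecqm, chart) for ecqm, chart in pairs)
    return {
        "false_negative": counts["false_negative"],
        "false_positive": counts["false_positive"],
    }


def net_error_change(before, after):
    """Change in total errors between two versions (negative means fewer errors)."""
    total_before = before["false_negative"] + before["false_positive"]
    total_after = after["false_negative"] + after["false_positive"]
    return total_after - total_before


# STK-10 denominator counts quoted in the text.
before = {"false_negative": 23, "false_positive": 101}
after = {"false_negative": 56, "false_positive": 51}
change = net_error_change(before, after)
print(change)  # -17
```

Summing both error types treats a false negative and a false positive as equally costly, which matches the rule stated in the text (minimize the total number of errors); a different weighting would be a separate design decision.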
Once eCQM performance had been optimized in the local VistA data, we mapped the data elements in each algorithm to the corresponding VHA CDW data element and table. We ran the SQL queries on the cohort of stroke admissions abstracted in the INSPIRE study (N = 2130, representing stroke admissions from 11 VAMCs between 2009 and 2012) and again compared denominators and numerators to the chart review results, categorized errors, and made iterative modifications of the algorithms as above to improve eCQM performance.
For each eCQM denominator and numerator, we calculated the raw proportion of matched cases, the sensitivity and specificity, and the negative and positive predictive value (NPV, PPV). In the CDW sample we also characterized the overall agreement of each eCQM at the level of the stroke admission as ineligible for that indicator, eligible-passed, or eligible-failed. Since the NPV and PPV calculation is influenced by the distribution of the data, and since some of the indicators have a low proportion of eligible patients and/or a low proportion failing the indicator, the resulting PPV/NPV and agreement calculation can be highly penalized for a single mismatch.9 To correct for this problem, we computed both a weighted kappa and prevalence-adjusted bias-adjusted kappa (PABAK) for the overall agreement of each indicator and also computed the recommended observed and expected proportions of agreement and the prevalence and bias indices for each eCQM.10,11
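The statistics named above can be sketched as follows. This is an illustration under stated assumptions rather than the study's analysis code: it computes an unweighted Cohen's kappa for brevity (the study reports a weighted kappa, whose weights are not given here), and it uses the usual k-category form of PABAK, (k·p_o − 1)/(k − 1), which for the three-level outcome reduces to (3·p_o − 1)/2. All counts in the example are hypothetical.

```python
LEVELS = ("ineligible", "eligible-passed", "eligible-failed")


def two_by_two_metrics(tp, fp, fn, tn):
    """Raw match, sensitivity, specificity, PPV and NPV against chart review."""
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "match": ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def three_level(eligible, passed):
    """Collapse denominator and numerator status into the three-level outcome."""
    if not eligible:
        return "ineligible"
    return "eligible-passed" if passed else "eligible-failed"


def agreement(ecqm_levels, chart_levels):
    """Observed/expected agreement, unweighted kappa, and PABAK for k levels."""
    ecqm_levels, chart_levels = list(ecqm_levels), list(chart_levels)
    n = len(chart_levels)
    k = len(LEVELS)
    p_o = sum(e == c for e, c in zip(ecqm_levels, chart_levels)) / n
    p_e = sum(
        (ecqm_levels.count(level) / n) * (chart_levels.count(level) / n)
        for level in LEVELS
    )
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    pabak = (k * p_o - 1) / (k - 1)
    return {"p_o": p_o, "p_e": p_e, "kappa": kappa, "pabak": pabak}


# Hypothetical denominator counts: only 5 truly ineligible admissions.
metrics = two_by_two_metrics(tp=90, fp=2, fn=5, tn=3)

# Hypothetical three-level outcomes for four admissions.
ecqm = [three_level(False, False), three_level(True, True),
        three_level(True, True), three_level(True, False)]
chart = [three_level(False, False), three_level(True, True),
         three_level(True, False), three_level(True, False)]
result = agreement(ecqm, chart)
```

With so few truly ineligible admissions in the hypothetical 2×2 table, two false positives are enough to pull specificity down to 0.6 while PPV stays near 0.98, which is the sensitivity to prevalence that motivates reporting PABAK alongside kappa.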
Patient characteristics of the local and CDW samples were similar, except that the local sample had a higher proportion of Whites and smokers, more severe stroke (NIH Stroke Scale 6.84 vs. 5.67), and a shorter length of stay (Table 1).
The raw proportion of matched cases in each denominator and numerator ranged from 91.2 %–100 % in the local sample and from 86.4 %–99.7 % in the CDW sample (Table 2). Sensitivity was high in local (range 92.9–100 %) and CDW (86.4–99.7 %) in both numerators and denominators; specificity was high for numerators in both local and CDW (90.9–100 % and 90.8–97.0 %, respectively), but was lower for denominators (as low as 57.1 % local and 30.8 % CDW), often due to low prevalence. Positive predictive values (PPVs) ranged from 96.4 %–100 % in the local sample and from 96.9 %–100 % in the CDW sample; they were overall higher than negative predictive values (NPVs). The overall indicator agreement is shown in Table 3, with prevalence-adjusted bias-adjusted kappa values ranging from 0.73–0.95.
The range of passing rates, mean passing rates, and difference between eCQM and chart passing rates are shown in Table 4. In general, the ranges in the difference in eCQM and chart review passing rates for each indicator were of relatively small magnitude and similar direction across facilities. The only indicator with a statistically significant difference in passing rate was VTE prophylaxis, which was 10.6 % lower (worse performance) in the eCQM.
The distribution of error rates in the eCQMs is shown in Figure 1. This illustrates that each eCQM has a unique pattern of errors: almost 90 % of the errors in the VTE prophylaxis eCQM result from numerator false negatives, while more than 60 % of the errors in the NIHSS eCQM result from numerator false positives. Both of these indicators have the majority of errors resulting from numerator mismatches (error in determining passing), while the AT by HD2 and the Consider for Rehab eCQMs have the majority of errors resulting from denominator mismatches (error in determining eligibility).
The specific types of eCQM errors considering all eCQM results combined are shown in Table 5. The two largest categories of error are devices not identified (58.0 % of all numerator false-negative results) and inaccurate hospice discharge (51.7 % of all denominator false-negative results). Medication errors at discharge had multiple causes, including counting prescriptions that were either prior to or following the admission (counted as “incorrect VHA prescription”) or lack of clear documentation of discharge medications; together these accounted for 27 % of all numerator false positives.
This study demonstrates that eCQMs can be constructed with nearly 90 % or greater matching of denominator and numerator status in patients admitted with acute ischemic stroke. PPVs for these measures are generally high, suggesting that eligibility assessments are largely accurate and a passing result is likely correct. We specifically demonstrated that stroke MU indicators can be relatively accurately generated from existing EHR systems, in this case the VHA EHR. Our data also demonstrate that a relatively small number of error types is responsible for a large number of the observed mismatches, suggesting specific areas on which EHR developers and informaticians could focus to make improvements in eCQM accuracy for MU measures by standardizing a relatively small number of EHR data elements.
Our agreement results appear substantially better than those of previous studies outside of the VHA,12,13 including Persell et al.,14 who found misclassification ranging from 15–81 % for those that failed quality measures for CAD; their numbers improved when they added free-text information. Two studies in the VHA did find relatively high concordance between eCQMs and chart review, similar to our results, but these looked specifically at different sets of data (in a discharge summary and the outpatient setting, respectively) rather than constructing inpatient indicators.15,16 Some studies report that eCQMs involving prescriptions with few contraindications seem to be captured well, with the exception of those with many documented contraindications in the text (such as warfarin for atrial fibrillation);17 however, we found that documentation of medications at the time of hospital discharge has a number of error sources. Discharge medication indicators that can be met by prescription of aspirin are complex because it is often purchased outside of the VHA and is not always recorded in the EHR as a non-VHA medication. Devices, especially those not routinely ordered, have also been shown to be consistently difficult to capture in eCQMs, as we observed in our study.14
Although studies have shown that workflow and documentation habits affect EHR-derived quality measures,18 compared to most other studies,14 our project more fully characterized the source and types of errors in eCQMs. This detailed information can be useful to systems seeking to implement these Meaningful Use measures or to improve use of the eCQMs overall. For example, many of the errors in assessing eligibility of several eCQMs come from unstandardized discharge disposition: inaccurate hospice status and other discharge disposition errors together account for 56 % of denominator false negatives and 29 % of denominator false positives. This suggests that improving the standardization of the EHR discharge disposition assignment could improve the accuracy of multiple eCQMs. Identifying EHR changes that increase acceptance of the measures by minimizing the type of errors that clinicians most dislike could also be helpful by reducing major sources of error in denominator false positives (i.e., incorrectly labeling ineligible patients as eligible) and reducing numerator false negatives (i.e., failing to identify an appropriately completed process). Our data would suggest that improvement of these errors may be accomplished by (1) documentation of contraindications, (2) standardized discharge disposition categories, (3) standardized device orders, and (4) more complete use of Bar Coded Medication Administration (BCMA) to document ED medication orders and medications active at discharge.
Some errors that we observed in this study are likely to be less problematic today as the VHA EHR has evolved; for example, at the time of this study there was no structured BCMA category for medication refusal. This category is now part of that system, and we suspect that these errors would be reduced in a current sample of stroke admissions. Also, BCMA is increasingly being implemented for outpatient medication administration, including in VHA Emergency Departments. Finally, the documentation of non-VHA medications may have improved over time, although it remains difficult to document medications clearly at the time of discharge, especially when they are not provided by the VHA because they are lower-cost non-prescription medications (e.g., aspirin) or are prescribed by another system.
The one indicator that most extensively employed SQL text string searching performed extremely well; this may be due to the relatively unique name of the stroke severity scale (the NIH Stroke Scale) and the few and common abbreviations for this scale. The errors typically encountered in this eCQM did not represent cases of finding text that referenced something else, but rather finding text references to the NIHSS that did not indicate completion (e.g., “Couldn’t do NIHSS because patient was uncooperative”). We also used text string searches for “hospice” in the Discharge Summary, which did improve indicator performance for eligibility assessment overall (reduced denominator false positives), but also produced some false-negative results when hospice was discussed but not actually provided.
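The two failure modes described above can be reproduced with a plain string search. The sketch below is illustrative only: the regular expressions are assumptions (the strings actually searched are given in the Supplemental Appendix, not in this text), the study ran its searches in SQL, and apart from the quoted "Couldn't do NIHSS" sentence the note texts are invented.

```python
import re

# Hypothetical search patterns; the study's exact strings are not reproduced here.
NIHSS_PATTERN = re.compile(r"\bNIHSS\b|\bNIH\s+stroke\s+scale\b", re.IGNORECASE)
HOSPICE_PATTERN = re.compile(r"\bhospice\b", re.IGNORECASE)


def mentions(pattern, note_text):
    """Return True if the pattern occurs anywhere in the note text."""
    return pattern.search(note_text) is not None


notes = [
    "NIHSS 6 on admission.",
    "Couldn't do NIHSS because patient was uncooperative",
    "Patient admitted with left-sided weakness.",
]
nihss_flags = [mentions(NIHSS_PATTERN, note) for note in notes]

discharge_summaries = [
    "Discharged home with hospice services.",
    "Hospice was discussed with family but declined; discharged to rehabilitation.",
]
hospice_flags = [mentions(HOSPICE_PATTERN, text) for text in discharge_summaries]

print(nihss_flags)    # [True, True, False]
print(hospice_flags)  # [True, True]
```

The second note is flagged even though the scale was not completed (a numerator false positive), and the second discharge summary is flagged as a hospice discharge even though hospice was only discussed, so an eligible patient would be excluded (a denominator false negative).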
Accurate quality measurement is critical for improving stroke care and outcomes. Stroke care in the VHA is not ideal,19 and measurement is the first step toward identifying deficiencies and addressing them; however, manual chart measurement can be onerous and time-consuming, leading to delays in recognition of suboptimal care. Accurate, automated eCQMs such as those developed in this study will not only meet Meaningful Use measure requirements, but hopefully will provide early feedback to frontline providers and administrators for institution of quality improvement in stroke. This will only be possible if certain data elements are standardized to ensure the accuracy of eCQMs.
Several limitations may reduce the generalizability of our study to other quality measures and systems. First, the quality measures we examined did not require interfaces from outside devices such as those used in laboratory reporting, and such indicators have been reported to be problematic. Second, fixing some errors in the CDW algorithm created additional errors, which limited how much the algorithm could be optimized (e.g., text searching for “hospice” generated both false negatives and positives); however, many of these errors could be corrected with standardized data. Lastly, data collected by EHR systems other than the VHA EHR might require a different set of algorithms,20 but many of the required data elements and identified errors are likely to be similar. We did not use natural language processing (NLP) in our project, although this method may be useful for increasing the accuracy of identifying highly text-based information, a methodology has been proposed to formalize NLP for clinical indicators,18 and NLP has been used successfully in VHA projects.13 Although we did not include all eight stroke MU measures in this report, we are currently finalizing two additional MU indicators (anticoagulation for atrial fibrillation and statin medication at discharge), and we expect that our results will generalize to these remaining MU indicators since we need no additional sources of data to construct these final indicators. We did not attempt to construct eCQMs for the remaining two MU indicators (thrombolysis for eligible patients and stroke education) since key data elements including time of stroke symptom onset and patient-specific risk factor education documentation are not part of current VHA EHR data and would thus require substantial NLP.
MU indicators for stroke can be measured relatively accurately in a centralized EHR, and the relatively few sources of error could be addressed with the data standardization germane to any data system. To improve stroke MU measure accuracy, EHRs should include standardized data elements for devices, discharge disposition including hospice and Comfort Care status, recording of contraindications, and medications given in the emergency department. Future research should examine whether EHR-based indicators are cost-effective and should focus on linking these eCQMs to patient care in near real time to support care decisions and increased care quality.
Michael S. Phipps reports being on the Clinical Advisory Board for Castlight Health, a direct-to-consumer healthcare information company. Jeff Fahner reports no conflicts. Danielle Sager reports no conflicts. Jessica Coffing reports no conflicts. Bailey Maryfield reports no conflicts. Linda S. Williams, MD, reports no conflicts.