As the country intensifies its efforts to standardize performance measures, and payers offer incentives to providers to adopt EHR systems, it is timely and important to assess the validity of claims data for identifying populations of interest and, consequently, for serving as the basis of measures used in quality improvement and public reporting. In the past, claims data were often used because no other data sources were readily available for large-scale analysis. Consequently, many of the existing methods for identifying a target patient population were created to accommodate the limitations of claims data. However, in the absence of true clinical data, it can be difficult to determine from claims data alone which patients have a condition of interest.
Commonly used administrative measures include a requirement that a patient have two visits with an encounter diagnosis of interest. The two-visit requirement was probably intended to ensure a certain level of specificity in determining whether a patient has the target condition.16
Unfortunately, the requirement may disqualify a significant number of patients who have the disease. In our study, we found that requiring two claims with diabetes as an encounter code excluded 25% of patients found to have diabetes by expert review, despite each patient having had multiple visits to the practice during the measurement period. Diabetes may not have been entered as an encounter diagnosis either because it was not addressed during a specific visit or because the person entering the billing diagnosis did not enter all the diagnoses associated with that encounter.
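The two-visit rule described above amounts to a simple threshold filter over encounter claims. The following sketch illustrates the logic; the patient identifiers, encounter records, and ICD-9 codes are hypothetical and do not come from the study data:

```python
from collections import Counter

# Hypothetical encounter claims: (patient_id, billing_diagnosis_code).
# ICD-9 250.xx codes denote diabetes mellitus; 401.9 is hypertension.
encounters = [
    ("pt1", "250.00"), ("pt1", "250.00"),  # two visits coded for diabetes
    ("pt2", "250.02"),                     # only one visit coded for diabetes
    ("pt3", "401.9"), ("pt3", "401.9"),    # diabetes never coded
]

def claims_denominator(encounters, code_prefix="250", min_visits=2):
    """Patients with at least min_visits encounters coded for the condition."""
    counts = Counter(pid for pid, dx in encounters if dx.startswith(code_prefix))
    return {pid for pid, n in counts.items() if n >= min_visits}

print(claims_denominator(encounters))  # {'pt1'}
```

Note that pt2, who has a diabetes claim but only one, falls out of the denominator entirely, mirroring the exclusion described above.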
In contrast, patients with a specific diagnosis can often be easily identified using coded data in an EHR, such as an entry on the problem list. The ability to reliably identify patients from diagnoses on the problem list depends on the use of standardized diagnosis codes. Although some EHR systems capture diagnoses as text strings, the majority of commercial EHR systems capture diagnoses in coded form. In our study, diabetes was found on the problem list for 94% of the patients identified as having diabetes by expert review in the clinic studied. The problem list in EHRs has been shown to be about twice as accurate as problem lists maintained on paper,17 although its reliability depends on the policies a clinic adopts for problem list maintenance and on the diligence with which those policies are followed. Building in benefits for EHR users based on reuse of diagnoses entered on the problem list (e.g., triggering reminders, providing decision support) increases users' motivation to keep the problem list up-to-date and accurate. EHR products should also allow for versioning of data within the EHR (e.g., problem lists, medication lists) so that the state of knowledge about an individual patient can always be reconstructed for a given point in time.
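Identification from a coded problem list, by contrast, is a membership test rather than a visit count. A minimal sketch, again with hypothetical patients and illustrative ICD-9 codes:

```python
# Hypothetical coded problem lists: patient_id -> set of ICD-9 codes.
problem_lists = {
    "pt1": {"250.00", "401.9"},  # diabetes and hypertension listed
    "pt2": {"250.02"},           # diabetes listed
    "pt3": {"401.9"},            # hypertension only
}

def problem_list_denominator(problem_lists, code_prefix="250"):
    """Patients with any diabetes code (ICD-9 250.xx) on the problem list."""
    return {pid for pid, codes in problem_lists.items()
            if any(c.startswith(code_prefix) for c in codes)}

print(sorted(problem_list_denominator(problem_lists)))  # ['pt1', 'pt2']
```

A single well-maintained problem-list entry suffices, which is why this approach does not depend on how often the condition happens to be billed.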
Calculating performance measures using only the subset of the target population identified by administrative data may significantly bias quality reports. In our study, patients with fewer than two face-to-face encounters for diabetes were significantly less likely to receive recommended diabetes care. For example, 97% of patients with two or more visits coded for diabetes had a glycohemoglobin measured within the preceding year, whereas only 68% of patients with known diabetes who were seen fewer than two times for diabetes care during the study period had the test done on time. Since an encounter diagnosis is often triggered by tests ordered during the visit, using the encounter diagnosis as the inclusion criterion for the denominator effectively selects patients who are actively receiving care for diabetes, and amounts to a self-fulfilling prophecy. The study clinic produces its own internal clinical quality measures, derived from the EHR and developed specifically for quality improvement (these are not identical to the measures used in the DOQ project), and determined that it satisfied the clinical guideline 78% of the time across the entire population of seniors served. The clinic manages its quality improvement programs using this internal clinical quality measure.
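The denominator effect described above can be made concrete with a toy calculation. The figures below are illustrative only, not the study's data; the point is that the same numerator produces a higher rate over the narrower, claims-derived denominator:

```python
def measure_rate(denominator, numerator):
    """Fraction of the denominator population meeting the quality criterion."""
    return len(denominator & numerator) / len(denominator)

# Illustrative data: which patients had a timely glycohemoglobin test.
had_timely_test = {"pt1", "pt4"}
claims_denom = {"pt1"}                  # >= 2 visits coded for diabetes
clinical_denom = {"pt1", "pt2", "pt4"}  # identified from the EHR problem list

print(measure_rate(claims_denom, had_timely_test))                  # 1.0
print(round(measure_rate(clinical_denom, had_timely_test), 2))      # 0.67
```

Because the patients most likely to be missed by the claims rule are also the ones least likely to have received the test, the claims-based rate overstates performance relative to the clinically defined population.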
We recognize that limiting the target population to those with two coded visits may help exclude patients who may not be appropriate to include in the quality report for a specific clinic. For instance, patients with fewer than two diabetes visits may be obtaining diabetes care elsewhere. Unfortunately, this methodology may also exclude patients under the care of the practice who have untreated or unrecognized diabetes. Performance measures with an inadvertent bias that excludes patients with inadequately treated conditions may provide an inaccurate picture of the care actually provided. Some would argue, from a total quality perspective, that performance measures should also provide incentives for screening appropriate populations for a diagnosis, such as diabetes.
Another consideration when developing quality reports is whether to include in the denominator patients who may have first seen the physician at the tail end of the reporting period, offering no opportunity for appropriate intervention by the physician. We believe that a physician group is managing a population in addition to individual patients. Thus, outreach programs that increase awareness, among individuals with certain diagnoses or risk factors, of the need for appropriate clinical attention may be an important responsibility of a health-care organization. Since health promotion and health-care delivery are community-based services, one could argue that the performance measures for those services should also be community based. We acknowledge that there are differences of opinion about this assignment of responsibility.
Recognizing the importance of having robust measures of clinical performance in assessing quality and administering quality incentives, we are concerned about the systematic bias introduced by using metrics such as those based on claims data when more reliable clinical measures can be defined for practices using EHRs. Measuring performance can have positive effects on patient care, but accuracy is critical. Definitions that cause a systematic overestimate of the quality delivered can cause organizational attention to be focused elsewhere when, in fact, additional work needs to be done. Measures that underestimate the quality of care delivered can frustrate providers trying to improve and deprive them of recognition they deserve. The most efficient solution is to reuse clinical data generated as a byproduct of clinical care. Ideally, data are entered once by the most appropriate professional for the purpose of providing care, and reused multiple times for measuring quality, paying for performance, and generating knowledge about the effectiveness of treatments. Reuse of data not only improves data quality but also reduces the cost of secondary use of data—a welcome relief for providers often burdened with reporting mandates as an additional task or practice cost.
Use of administrative claims data as input for performance measures is also problematic for provider organizations that are primarily capitated and do not need to collect billing data for their normal business operations. In this situation, it makes even more sense for quality data to be derived from EHR systems.
This study used manual review of the electronic health record as the gold standard. We did not identify or contact the individuals involved to corroborate any of the findings contained in the documentation. Assessing the accuracy of the documentation was beyond the scope of this study.
Although it would be tempting to migrate national performance measurement to clinically based measures, currently only a minority of practices use EHR systems,18 which precludes the immediate use of such measures in performance measurement. It would be unfortunate, however, if, as the number of practices using EHRs grew, the country were tethered to a measurement system that lagged behind the deployment of clinical information systems. If pay-for-performance incentive programs continue to use performance measures derived from administrative data, they could have the unintended consequence of rewarding practices that have not converted to electronic systems (which report administrative quality measures with a systematic bias toward higher performance) or, alternatively, of penalizing those that report quality measures based on clinical data from EHR systems (which capture a larger denominator of patients). We believe policymakers should design measurement systems, and the incentive programs based on them, to take advantage of the computer-based information systems that will become the new standard of care. Further study of this issue across a broader range of diagnoses should inform the development of clinically based quality measures. Likewise, a transition plan should be developed to migrate the nation’s use of administratively based quality measures to clinically based ones. Given the inherent bias of claims-based quality measures to overstate performance, a temporary adjustment or premium should be built into incentives for reporting quality measures based on actual clinical data from an EHR system. This premium could be time-limited to encourage more rapid adoption of EHR systems.
Developers of clinically based quality measures should take into account what data exist in coded form in EHRs. Efforts to standardize the relevant codes must be undertaken concurrently. In turn, EHR products should be required to adopt data standards that support the measurement of standard quality measures.