Quality report cards have the potential to substantially improve health care quality. Report cards can be used to identify high-performance hospitals, and these hospitals, in turn, can be studied to identify best practices (Epstein 1995). These best practices can then be disseminated to other hospitals. Report cards can also be used to steer patients toward high-performance hospitals (selective referral) and away from low-performance hospitals (selective avoidance). However, performance profiling is not without risk. In a seminal paper entitled “The Risks of Risk Adjustment,” Iezzoni (1997a) highlighted one of the fundamental problems of performance profiling: whether a hospital is identified as a low-performance or high-performance hospital depends substantially on which scoring system is used to construct the quality ranking. Therein lies one of the major “risks” of report cards: using them as the basis for evidence-based patient referrals will not necessarily lead to better population outcomes if the hospital rankings are inaccurate. A priori, report cards based on prediction models that do not distinguish between preexisting conditions and complications would be expected to yield less accurate quality assessments.
Because administrative data are widely available and relatively inexpensive, they are increasingly being used by third-party payers and the states to construct hospital and physician report cards. However, administrative data have a major limitation: most ICD-9-CM codes do not indicate whether a secondary diagnosis represents a preexisting condition or a complication. We hypothesized that the absence of date stamping in administrative data, by preventing the accurate specification of disease severity and comorbidities, would yield biased measures of hospital quality.
This study shows that risk-adjustment models based on routine administrative data frequently misidentified low-quality and high-quality hospitals. For example, one-half of the hospitals identified as low-quality CABG hospitals should not have been classified as low-quality outliers. Forty percent of the hospitals providing low-quality care to patients undergoing total hip replacement (THR) and one-third of the hospitals providing low-quality care to patients admitted with acute myocardial infarction (AMI) were “missed” by risk-adjustment models based on routine administrative data that did not include date stamp information. We found significant misclassification rates for four of the seven conditions included in this study: CABG surgery, PTCA, THR, and AMI.
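To make this comparison concrete, the sketch below illustrates how outlier classifications from the two models can be cross-tabulated to count “missed” and falsely flagged hospitals. The data frame, counts, and column names are hypothetical illustrations, not the study's actual variables.

```python
# Hypothetical sketch of the outlier-agreement comparison described above.
# The data frame and column names are illustrative, not the study's
# actual variables.
import pandas as pd

ratings = pd.DataFrame({
    "hospital_id": [1, 2, 3, 4, 5, 6],
    # outlier status under the date-stamp ("gold standard") model
    "outlier_datestamp":    ["low", "none", "high", "low", "none", "none"],
    # outlier status under the model without date stamps
    "outlier_no_datestamp": ["none", "none", "high", "low", "low", "none"],
})

# Cross-tabulate the two classifications; off-diagonal cells are
# hospitals whose quality-outlier status changes with date stamping.
print(pd.crosstab(ratings["outlier_datestamp"],
                  ratings["outlier_no_datestamp"]))

# "Missed" low-quality hospitals: low outliers under the date-stamp
# model that the no-date-stamp model fails to flag.
missed = ratings[(ratings["outlier_datestamp"] == "low")
                 & (ratings["outlier_no_datestamp"] != "low")]
print(f"Missed low-quality hospitals: {len(missed)} of "
      f"{(ratings['outlier_datestamp'] == 'low').sum()}")
```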
To our knowledge, only two other studies have assessed the impact of date-stamping ICD-9-CM codes in administrative data on the accuracy of quality report cards. Romano and Chan (2000) studied the impact of using only conditions present at admission to measure hospital quality in a stratified sample of 974 patients with AMIs. Chart reabstraction was used to identify preexisting conditions. They found that using only conditions present at admission had a “major impact on the classification of hospitals [quality] based on risk-adjusted mortality.” A study by Ghali, Quan, and Brant (2001), based on 50,397 patients undergoing CABG surgery at 23 Canadian hospitals over a 4-year period, showed that the use of a diagnosis-type indicator, similar to the CPAA modifier in California, also had a significant impact on hospital ranking. The findings of our study corroborate this earlier work by Romano and Ghali and extend it to other patient populations.
Clinical data are superior to administrative data for measuring quality. The limitations of administrative data have been well described: errors in the abstraction process, undercoding of comorbidities (Jollis et al. 1993), variation in the quality of charting by physicians (Pronovost and Angus 1999), lack of precise definitions for ICD-9-CM codes (Iezzoni 1997b), underreporting of “complication” codes (Jencks 1992), and “overcoding” of patient diagnoses to maximize reimbursement (Iezzoni 1997c). Despite these limitations, large-scale efforts to measure quality will continue to rely primarily on administrative data until the necessary information technology infrastructure is created to collect clinical data in a routine and cost-effective fashion. Moreover, clinical datasets present problems of their own: the possibility of “gaming” the data always exists when clinicians are responsible for collecting data on their own patients. Because administrative datasets seem destined in the near term to remain the foundation of quality assessment, it is imperative that these datasets be as robust as possible. The findings of this study strongly support the addition of CPAA modifiers to indicate whether an ICD-9-CM code describes a secondary diagnosis present at the time of hospital admission.
This study has several strengths. First, it is the first major study to evaluate the impact of date stamping administrative data on hospital quality reporting across a large spectrum of surgical and medical conditions. Second, the California OSHPD routinely monitors the data quality of the California SID; discharge data reports that do not meet error tolerance levels established by the state are sent back to the reporting institution for correction (California Patient Discharge Data Reporting Manual 2000). Third, the scope and size of this population-based study, and the fact that it did not exclude patients younger than 65, make it reasonable to assume that the study conclusions can be generalized to other medical conditions, as well as to populations outside of California.
This study has several important limitations. First, we have assumed that the CPAA modifier accurately identifies conditions present at admission. This data field has not been validated against either chart reabstraction or clinical data. However, an exploratory analysis suggests that it has construct validity. For ICD-9-CM codes that are very likely to code chronic conditions, the CPAA modifier indicated that these secondary diagnoses were present at admission 99 percent of the time. For ICD-9-CM codes that designate secondary diagnoses likely to represent complications, the CPAA modifier indicated that these secondary diagnoses were present at admission in only 12 percent of cases. Although hospitals apply the CPAA modifier uniformly when coding chronic conditions, there is some heterogeneity in the manner in which hospitals code the CPAA modifier for ICD-9-CM complication codes. Some of this heterogeneity appears to reflect complications from previous admissions being correctly coded as present on admission.
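As a concrete illustration of this construct-validity check, the sketch below computes the present-at-admission rate of the CPAA modifier separately for a handful of chronic-condition and complication codes. The code lists, field names, and sample records are hypothetical; the study's actual code groups were far larger.

```python
# Illustrative sketch of the construct-validity check described above.
# The code lists, field names, and records are assumptions for
# illustration; the study's actual code groups were far larger.
import pandas as pd

# ICD-9-CM codes that almost always denote chronic conditions
# (e.g., 496 = COPD) versus likely complications
# (e.g., 998.0 = postoperative shock).
CHRONIC_CODES = {"496", "250.00", "401.9"}
COMPLICATION_CODES = {"998.0", "998.5", "997.3"}

dx = pd.DataFrame({
    "icd9": ["496", "998.0", "401.9", "997.3", "250.00", "998.0"],
    "cpaa_present_at_admission": [1, 0, 1, 0, 1, 1],
})

def poa_rate(codes):
    """Share of secondary diagnoses in `codes` that the CPAA modifier
    flags as present at admission."""
    return dx.loc[dx["icd9"].isin(codes), "cpaa_present_at_admission"].mean()

# Construct validity: chronic codes should be flagged as present at
# admission nearly always; complication codes far less often. A nonzero
# complication rate may reflect complications of *previous* admissions
# correctly coded as present on admission.
print(f"Chronic codes present at admission: {poa_rate(CHRONIC_CODES):.0%}")
print(f"Complication codes present at admission: {poa_rate(COMPLICATION_CODES):.0%}")
```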
However, we did not directly evaluate the validity of the CPAA modifier for ICD-9-CM codes that could represent either preexisting conditions or complications (e.g., AMI). These are the very codes for which accurate coding of the CPAA modifier is necessary to ensure that hospitals do not receive “credit” for their complications. Such an evaluation of the CPAA modifier would require chart reabstraction. Our analyses of the CPAA modifier were based on the assumption that the accuracy with which hospitals coded the modifier for clear-cut chronic conditions (e.g., COPD) or clear-cut complications (e.g., postoperative shock) can be extrapolated to conditions that could be either. Our analysis therefore provides only indirect evidence of the validity of the CPAA modifier, which awaits further confirmation through systematic chart review.
Importantly, we did not find a significant association between hospital coding of complications and changes in hospital quality assessment with date stamping. It is therefore unlikely that miscoding of the CPAA modifier at the hospital level accounted for most of the changes in quality assessment that we observed when date stamp information was applied. However, this analysis was based on a limited number of ICD-9-CM codes, chosen because they would be expected to code complications rather than preexisting conditions. The lack of an association between the coding of the CPAA modifier for complications and changes in quality assessment provides indirect support for the hypothesis that miscoding of the CPAA modifier, in general, did not drive the changes in quality outlier status observed in this study. We did not evaluate the possible association between miscoding of the CPAA modifier for ICD-9-CM codes that could represent either preexisting conditions or complications because, short of conducting a chart audit, we had no way to verify whether the CPAA modifier was correctly coded in those cases. Therefore, because we did not perform chart reabstraction and explored only the association between the CPAA modifier for ICD-9-CM complication codes and changes in hospital quality, we cannot rule out the possibility that systematic biases in the coding of the CPAA modifier for these ambiguous conditions contributed to the observed misclassification of hospital quality outliers.
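One way such a hospital-level association can be probed is sketched below, by comparing complication-code present-at-admission rates between hospitals whose outlier status did and did not change. The choice of test, the variable names, and the data are illustrative assumptions, not the study's actual analysis.

```python
# Hedged sketch of one way the hospital-level association could be
# probed; the test, variable names, and data are illustrative
# assumptions, not the study's actual analysis.
import pandas as pd
from scipy.stats import mannwhitneyu

hosp = pd.DataFrame({
    # share of complication-type codes each hospital flags as present
    # at admission (a proxy for CPAA coding behavior)
    "complication_poa_rate": [0.05, 0.30, 0.10, 0.22, 0.08, 0.15],
    # whether the hospital's outlier status changed with date stamping
    "status_changed": [False, True, False, False, True, False],
})

changed = hosp.loc[hosp["status_changed"], "complication_poa_rate"]
unchanged = hosp.loc[~hosp["status_changed"], "complication_poa_rate"]

# Nonparametric comparison: do hospitals whose status changed code the
# CPAA modifier for complications differently from those that did not?
stat, p = mannwhitneyu(changed, unchanged)
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.3f}")
```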
Second, we have defined the “date stamp” models as the “gold standard” for identifying low-performance and high-performance hospitals. There is no perfect way of identifying “true” quality outliers, and while the “date stamp” has construct validity and improves the distinction between preexisting conditions and complications, it requires further validation. Nevertheless, the large substantive differences in outlier identification between the “date stamp” and “no date stamp” models show the importance of identifying complications before constructing risk-adjusted quality measures, and call into question the use of administrative data for these purposes in the absence of date stamp information.
Third, defining quality outliers as hospitals whose observed-to-expected (OE) mortality ratio is statistically different from 1 means that some hospitals will be classified differently because of relatively small shifts in the confidence interval for the OE ratio. Small quantitative changes on a numerical scale (the 95 percent confidence interval of the OE ratio) can result in large qualitative changes on a categorical scale (low- versus high-performance hospital). For example, a hospital will be reclassified as a low-performance hospital if the 95 percent confidence interval around its OE ratio shifts from 0.98–1.10 to 1.01–1.12. Despite the limitations of using confidence intervals to identify quality outliers, this approach is preferable to using point estimates of the OE ratio alone without conveying the extent of statistical uncertainty around them.
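To make the classification rule concrete, here is a minimal sketch that assumes the observed death count is Poisson with mean equal to the model-expected count, a common approach for OE (standardized mortality) ratios; the study's exact interval method is not specified here, and the counts in the usage example are invented.

```python
# Minimal sketch of the outlier rule, assuming the observed death count
# is Poisson with mean equal to the model-expected count (one common
# choice; the study's exact interval method is not specified here).
from scipy.stats import chi2

def oe_ratio_ci(observed, expected, alpha=0.05):
    """Point estimate and exact Poisson CI for the OE mortality ratio."""
    lower = chi2.ppf(alpha / 2, 2 * observed) / 2 / expected if observed else 0.0
    upper = chi2.ppf(1 - alpha / 2, 2 * (observed + 1)) / 2 / expected
    return observed / expected, lower, upper

def classify(observed, expected):
    """Label a hospital by whether the OE-ratio CI excludes 1."""
    _, lo, hi = oe_ratio_ci(observed, expected)
    if lo > 1:
        return "low-performance outlier"   # significantly more deaths than expected
    if hi < 1:
        return "high-performance outlier"  # significantly fewer deaths than expected
    return "not an outlier"

# Invented counts: a modest shift in observed deaths moves the CI
# across 1 and flips the hospital's categorical label.
print(classify(observed=260, expected=250))  # not an outlier
print(classify(observed=290, expected=250))  # low-performance outlier
```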
Fourth, in-hospital mortality was used instead of 30-day mortality as the outcome of interest because information on 30-day mortality was not available in our dataset. Although differences in discharge policies could bias comparisons of quality across hospitals, the use of in-hospital mortality would not be expected to bias the study findings because each hospital was compared with itself. Finally, mortality is only one of several important dimensions of health care quality. We chose to study mortality as the outcome of interest because most current efforts to evaluate health care quality through outcomes measurement are based on mortality and do not include other important outcomes such as functional status.