Quality measures may be associated with improved outcomes for two reasons. First, measured activities may directly improve care. Second, success on these measures may be a marker for other unmeasured aspects of high quality care. Our objective is to test the contribution of both possible effects.
We used 2004 Medicare data on hospital performance from Hospital Compare and risk-adjusted mortality rates derived from Medicare Part A claims.
We studied 3,657 acute care U.S. hospitals and compared observed differences in condition-specific hospital mortality rates based on hospital performance with expected differences in mortality from the clinical studies underlying the measures.
Differences in observed mortality rates across U.S. hospitals are larger than what would be expected if these differences were due only to the direct effects of delivering measured care.
Performance measures reflect care processes that both improve care directly and are also markers of elements of health care quality that are otherwise unmeasured. This finding suggests that process measures capture important information about care that is not directly measured, and that these unmeasured effects are in general larger than the measured effects.
Because measuring quality is seen as essential to improving it, quality is measured, and in many cases reported, for hospitals, health plans, nursing homes, home health agencies, and physicians. These efforts are intended to provide benchmarks and incentives to improve care, to influence consumer choice of providers (Marshall et al. 2000; Berwick, James, and Coye 2003), and to determine providers’ reimbursement (Centers for Medicare and Medicaid 2003; Epstein, Lee, and Hamel 2004; Doran et al. 2006).
Most commonly used quality measures are measures of process or measures of outcomes (Donabedian 1966), and debate remains regarding their relative merits. Patient outcomes are clinically tangible, but their value is reduced by low event rates, difficulties with risk-adjustment, and the long time-horizon needed to measure them. Measuring processes of care can overcome these shortcomings. Process measures generally require a shorter time-frame for assessment, are typically under greater control by clinicians, reflect specific targets for improvement, and ease the burden of risk adjustment by specifying the population of patients for whom such actions are known to be beneficial (Palmer 1997; Jencks et al. 2000; Landon et al. 2003). For these reasons, process measures are increasingly being adopted as tools to motivate quality improvement.
While process measurement has the potential to be a powerful tool for quality improvement, one limitation is that not all important processes of care can be measured. By necessity, measures of clinical processes are limited to what is measurable, and what is measurable is not always what is important. Nonetheless, process measurement may be a valuable quality improvement tool if measured care is positively correlated with care that is important but otherwise unmeasured. If this were the case, performance on process measures might reflect quality in two different ways.
First, most process measures are supported by clinical studies. For example, the benefit of giving aspirin to a patient with an acute myocardial infarction (AMI) is clear from randomized controlled trials, where aspirin administration in the setting of AMI reduces mortality by 2.5 percentage points, a benefit that is mediated by aspirin's pharmacologic effects (ISIS-2 [Second International Study of Infarct Survival] Collaborative Group 1988; Braunwald et al. 2000). An implicit assumption behind efforts to measure and improve aspirin administration for AMI is that because of these pharmacologic effects, higher rates of aspirin use will directly improve the outcomes of patients.
Second, if measured processes are markers for other important but unmeasured aspects of care, performance on process measures might provide a signal of quality that is broader than what is being measured, but reflects unmeasured quality of care. For example, hospitals that are better at administering aspirin for AMI (or better at documenting its administration) might also be better at other unmeasured care processes that advance patient safety, care coordination, emergency responsiveness, or other processes that may improve patient outcomes. In this way, aspirin use may not only improve outcomes in AMI directly, but also may be a marker of other unmeasured quality-enhancing processes. These unmeasured quality-enhancing processes might improve the care for AMI and also the care for other conditions distinct from AMI.
These two conceptions of process measures—as directly improving outcomes and as markers for other unmeasured but important processes of care—can coexist, and together they provide two justifications for using process measures in assessing provider quality of care. However, whether process measures identify better outcomes because these processes directly improve outcomes or because they are markers for other unmeasured processes will determine the degree to which policies that increase care processes also improve outcomes. To our knowledge, no prior research has examined whether compliance with processes of care is a marker for unmeasured care that improves outcomes.
Our objective in this study is to examine the association of process measures with observed differences in risk-adjusted mortality rates and expected differences in risk-adjusted mortality rates to test the hypothesis that performance on process measures not only directly improves patient outcomes, but is also a marker of unmeasured aspects of health care quality. We use the process measures employed by the Centers for Medicare and Medicaid Services (CMS) for hospital care of AMI, heart failure (HF), and pneumonia, and we compare the observed relationship between condition-specific hospital risk-adjusted mortality rates and hospital performance along the CMS process measures, with the differences in mortality expected from the clinical studies underlying the measures.
Recently the CMS, along with other health care organizations, began participating in the Hospital Quality Alliance, a public–private collaboration that seeks to make performance information on all acute care nonfederal hospitals accessible to the public, payers, and providers of care. These performance measures evaluate hospitals on their compliance with certain processes of care for patients with AMI, HF, pneumonia, and for surgical infection prevention. Hospitals’ performance on these process measures is available to the public through the CMS website, Hospital Compare (http://www.hospitalcompare.hhs.gov/).
We evaluated hospital performance based on publicly available data from the CMS on the original 10 measures included in Hospital Compare from January 1, 2004 to December 31, 2004 for AMI, HF, and pneumonia. Five of the measures assess quality of care for AMI: aspirin within 24 hours of arrival, β-blocker within 24 hours of arrival, angiotensin converting enzyme (ACE) inhibitor for left ventricular dysfunction, aspirin prescribed at discharge, and β-blocker prescribed at discharge. Two of the measures assess quality of care for HF: assessment of left ventricular function and the use of ACE-inhibitor for left ventricular dysfunction. Three of the measures assess quality of care for pneumonia: the timing of initial antibiotics, pneumococcal vaccination, and assessment of oxygenation within 24 hours of admission. We limited our evaluation to the original 10 measures because hospital reporting rates for these measures are tied to financial incentives for reporting through the Medicare Modernization Act, and are therefore nearly universal (CMS 2005; United States Government Accountability Office 2006). Seven additional measures are also included in Hospital Compare. However, they are not tied to financial incentives and have significantly lower levels of reporting (CMS 2005) and are therefore excluded from our analyses because a hospital's decision to report its data may be non-random (McCormick et al. 2002; United States Government Accountability Office 2006).
For each of the 10 measures, a hospital's performance is calculated as the proportion of patients who received the indicated care out of all the patients who were eligible for the indicated care. We included all U.S. acute care hospitals that participated in Hospital Compare during 2004. To ensure the stability of the measures, we also excluded hospitals with fewer than 25 patients in the denominator of a measure. This is the same convention the CMS uses to report performance.
We also measured hospital performance using condition-specific composite measures of performance. We calculated this measure using the same methodology used by the Joint Commission on Accreditation of Healthcare Organizations (JCAHO): the number of times an organization actually met the performance expectations for a condition (the numerator) divided by the number of opportunities it had to meet those expectations (the denominator) (JCAHO 2006).
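The opportunity-model arithmetic behind this composite can be sketched as follows. The function name and the hospital counts are illustrative only, not drawn from the study data.

```python
def composite_score(measures):
    """JCAHO-style opportunity model: the total number of times indicated
    care was delivered, divided by the total number of opportunities,
    pooled across all measures for a condition.

    `measures` is a list of (numerator, denominator) pairs, one per
    process measure.
    """
    met = sum(num for num, _ in measures)
    opportunities = sum(den for _, den in measures)
    return met / opportunities

# Hypothetical hospital with three pneumonia measures
# (antibiotic timing, pneumococcal vaccination, oxygenation assessment):
score = composite_score([(90, 100), (40, 60), (98, 100)])  # 228 / 260
```

Pooling opportunities rather than averaging per-measure rates means measures with larger eligible populations carry proportionally more weight in the composite.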
Using the 100 percent MedPAR file (containing all Medicare Part A claims) for 2004, we calculated condition-specific hospital risk-adjusted mortality rates for Medicare beneficiaries using the standard convention of the observed mortality rate divided by the expected mortality rate. Each patient's predicted probability of death is calculated using logistic regression, adjusted for 30 comorbidities defined by Elixhauser et al. as well as age, race, zip-code level median income and education, gender, insurance status, and whether the admission was emergent or elective (Elixhauser et al. 1998). The c-statistics for these models ranged from 0.70 to 0.74. A hospital's expected mortality rate for a condition is the summed predicted probabilities of death for all of the patients with that condition divided by the number of patients with that condition. The risk-adjusted mortality rate is the ratio of the observed to expected mortality rates standardized by the average nationwide mortality rate.
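As a minimal sketch of this calculation, assuming the per-patient predicted probabilities from the logistic model are already in hand (all numbers below are hypothetical):

```python
def risk_adjusted_rate(predicted_probs, observed_deaths, national_rate):
    """Risk-adjusted mortality rate as described in the text:
    (observed rate / expected rate), standardized by the national
    average mortality rate for the condition.

    predicted_probs: per-patient predicted probabilities of death from
    the risk-adjustment model (comorbidities, age, race, etc.).
    """
    n = len(predicted_probs)
    observed = observed_deaths / n                # observed mortality rate
    expected = sum(predicted_probs) / n           # expected mortality rate
    return (observed / expected) * national_rate

# Hypothetical 4-patient hospital, 1 death, 10% national rate:
rate = risk_adjusted_rate([0.1, 0.2, 0.3, 0.4], 1, 0.10)
```

A hospital whose observed deaths exactly equal the model's expectation lands at the national average; hospitals with fewer deaths than expected land below it.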
We calculated 1-year risk-adjusted mortality rates within each hospital for each condition: all patients who had been admitted to the hospital with the principal diagnosis of AMI (ICD-9-CM codes 410.0 through 410.9), HF (ICD-9-CM codes 402.01, 402.11, 402.91, 404.01, 404.03, 404.11, 404.13, 404.91, 404.93, or 428.0 through 428.9), or pneumonia (ICD-9-CM codes 480.8, 480.9, 481, or 482.0 through 487.0). In the pneumonia cohort, we also included patients who had a principal diagnosis of septicemia (ICD-9-CM codes 038.0 through 038.9) or respiratory failure (ICD-9-CM codes 518.81 or 518.84) and a diagnosis of pneumonia. Finally, in order to exclude patients who developed nosocomial pneumonias during their hospital stay, we excluded patients without an admitting diagnosis of pneumonia. Within the AMI cohort, we also identified a subgroup—those with a secondary diagnosis of HF. This cohort was used to calculate the risk-adjusted mortality rates to examine the relationship between risk-adjusted mortality rate and performance on one process measure (ACE-inhibitor for left ventricular dysfunction among patients with AMI) that is conditioned on a second diagnosis.
Mortality rates were also adjusted for hospitals’ profit status, number of beds, teaching status, and whether or not a facility performs open heart surgery (as a measure of hospital technology) (Aiken et al. 2002). These measures came from Medicare's 2004 Provider of Service File and were chosen because they are often used as implicit measures of hospital quality and are known to affect patient outcomes (Hartz et al. 1989; Ayanian et al. 1998; Aiken et al. 2002).
We conducted a literature review to estimate the expected effect of improved hospital performance on hospital mortality rates for each of the 10 quality measures. To do this, we first identified the literature used to make recommendations for clinical practice guidelines and their latest updates for the management of AMI (Braunwald et al. 2000, 2002; Smith et al. 2001; Antman et al. 2004), HF (Hunt et al. 2001, 2005;), and pneumonia (Bartlett et al. 2000; Niederman et al. 2001; Centers for Disease Control and Prevention 2002; Mandell et al. 2003). All 10 performance measures were recommended in these clinical practice guidelines and nine of the 10 recommendations were based on published empirical evidence. Seven of the 10 performance measures were based on results of randomized trials of non–high-risk populations. In these cases we used published trial results to estimate the mortality impact of the proposed intervention (Hjalmarson et al. 1981; Anonymous 1981a, 1981b, 1985, 1986, 1991, 1993; ISIS-2 [Second International Study of Infarct Survival] Collaborative Group 1988; Juul-Moller et al. 1992; Pfeffer et al. 1992; Kober et al. 1995; Ortqvist et al. 1998). Trials based on high-risk populations were excluded because their results are less generalizable and cannot easily be combined with other results to calculate a mean effect due to the non-equal error-variance in each study. In the case of two performance measures with no randomized trial evidence available (oxygenation assessment and timing of initial antibiotic therapy for pneumonia), we examined the results of retrospective cohort studies (Houck et al. 2004). 
Finally, for the performance measures that were either recommended in clinical practice guidelines but did not reference published empirical evidence (assessment of LV function for HF), or for which the literature identified from clinical practice guidelines did not provide estimates of absolute mortality reduction (oxygenation assessment for pneumonia) (Meehan et al. 1997), we conducted a literature review. For our search on assessment of LV function for HF, our search algorithm combined Medical Subject Heading (MeSH) terms: left ventricular function; HF; and mortality. For our search on assessment of oxygenation for pneumonia, our search algorithm combined MeSH terms: oxygen, blood gas analysis, or oximetry; pneumonia; and mortality. We reviewed additional references found in the bibliographies of retrieved articles. No additional articles were identified that reported absolute differences in mortality. Therefore, our final estimates included eight of the 10 quality measures. When more than one estimate was published, we averaged estimates weighted by the number of participants included in each study. Although this simplification assumes equal error-variance in each study, this assumption seems reasonable given fairly uniform experimental standards across studies. These results are summarized in Table 1.
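The participant-weighted averaging described above can be sketched as follows; the effect sizes and sample sizes in the example are hypothetical, not taken from the cited trials.

```python
def pooled_estimate(studies):
    """Average absolute mortality reductions across studies, weighted by
    the number of participants in each (this assumes equal error variance
    across studies, as noted in the text).

    `studies`: list of (absolute_mortality_reduction, n_participants).
    """
    total_n = sum(n for _, n in studies)
    return sum(effect * n for effect, n in studies) / total_n

# Two hypothetical trials of the same intervention:
pooled = pooled_estimate([(0.02, 1000), (0.04, 3000)])  # closer to 0.04
```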
The published empirical evidence supporting these interventions measured mortality differences at differing time horizons. The mean follow-up period in these studies ranged from 5 weeks to 50 months, with nine of the 13 studies having a mean follow-up of over 1 year. For our main analyses, we compared these expected mortality differences to the observed mortality differences at 1-year. However, we also compared the expected mortality differences to the observed mortality differences using the time horizon from the published studies. When a single measure was supported by multiple studies of varying time horizons, we used the mean of those horizons weighted by the number of participants included in the study.
We also estimated the expected effect of improved hospital performance on mortality rates if hospitals complied with multiple quality indicators within the conditions AMI and pneumonia, because mortality improvements from complying with multiple process measures are likely larger than those from complying with a single performance measure alone. We estimated the composite expected mortality differences within conditions based on the sum of the independent effects of each performance measure reported by the hospital.
For the observed mortality differences, we estimated the relationship between hospital performance and observed hospital mortality rates using a Bayesian approach. We chose a Bayesian approach because hospitals with smaller case loads provide less statistically stable estimates. Therefore, we applied Bayesian shrinkage to each hospital's observed condition-specific mortality rates (Christiansen and Morris 1997), which weights the hospital's mortality rates based on the degree of uncertainty in those rates (Stein 1981). We estimated the relationship between each hospital's risk-adjusted mortality rate and performance, while controlling for other hospital characteristics. In particular, we modeled the observed mortality in hospital i using a Bayesian binomial regression where yi is the observed number of deaths, ni is the number of patients in hospital i with a given condition, and zi is a covariate vector of hospital-level characteristics including hospital performance. Similarly, we modeled the expected mortality in hospital i using a Bayesian linear regression, where yi is the logit of the expected mortality rate, ni is the number of patients in hospital i with a given condition, and zi is a covariate vector of hospital-level characteristics including hospital performance. Using these regression results, we predicted risk-adjusted mortality rates for hospitals with performance scores in the 25th percentile and in the 75th percentile, which we found to be a plausible range of performance within health care markets. The relationship between each performance measure (individual and composite) and condition-specific risk-adjusted mortality rates was modeled separately. All Bayesian analyses were performed using WinBUGS 1.4 (http://www.mrc.bsu.cam.ac.uk/bugs/), a freely available software program. All statistical code is available upon request.
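The intuition behind the shrinkage step can be illustrated with a simplified empirical-Bayes normal-normal analogue. This is not the authors' WinBUGS binomial model; it is a sketch of the general idea that a hospital's estimate is pulled toward the overall mean in proportion to its sampling uncertainty.

```python
def shrink(rate, se, prior_mean, prior_var):
    """Normal-normal shrinkage sketch: the weight on a hospital's own
    observed rate falls as its standard error grows, so small hospitals
    with noisy rates are pulled more strongly toward the overall mean.
    """
    w = prior_var / (prior_var + se ** 2)  # weight on the hospital's own data
    return w * rate + (1 - w) * prior_mean

# A large hospital (tiny standard error) keeps its own rate almost exactly;
# a small hospital with the same observed rate is shrunk toward the mean:
large = shrink(0.30, 0.001, prior_mean=0.10, prior_var=0.01)
small = shrink(0.30, 0.100, prior_mean=0.10, prior_var=0.01)
```

Under these hypothetical values the small hospital's estimate lands midway between its observed rate and the overall mean, while the large hospital's barely moves.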
For the expected mortality differences, we predicted differences in hospital mortality rates with performance scores in the 25th percentile and in the 75th percentile. These predictions were based on empirical estimates of the impact of the measured intervention on mortality. For example, in the ISIS-2 trial, patients receiving aspirin at admission for an AMI had a mortality rate that was 2.5 percentage points lower than patients receiving placebo (ISIS-2 Collaborative Group 1988). To calculate the expected mortality difference from aspirin on admission for AMI between hospitals performing in the 25th percentile (where 91 percent of patients received aspirin on admission for AMI) to hospitals performing in the 75th percentile (where 98 percent of patients received aspirin on admission for AMI), we multiplied the absolute mortality reduction from aspirin (2.5 percent) by the absolute difference in aspirin use between the two percentiles (7 percent), yielding an expected absolute mortality difference of (2.5 percent × 7 percent=) 0.175 percent.
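The arithmetic in this example can be expressed directly, using the ISIS-2 aspirin values quoted in the text:

```python
def expected_difference(absolute_mortality_reduction, p25_performance, p75_performance):
    """Expected mortality difference between 25th- and 75th-percentile
    hospitals: the trial's absolute effect size multiplied by the gap
    in measure compliance between the two percentiles."""
    return absolute_mortality_reduction * (p75_performance - p25_performance)

# Aspirin on admission for AMI: 2.5-point mortality reduction (ISIS-2),
# 91% compliance at the 25th percentile vs. 98% at the 75th:
diff = expected_difference(0.025, 0.91, 0.98)  # 0.00175, i.e., 0.175 percent
```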
Differences in risk-adjusted mortality rates across hospitals might be due, in part, to differences in the types of patients that go to high-performing versus low-performing hospitals. If high-performing and low-performing hospitals attract different patients, as one might expect, and the expected benefit of indicated care varies with case mix, the observed relationship between hospital performance and mortality rates may be confounded. To address this potential limitation, we performed 1:1 within-caliper propensity score matching (Rosenbaum and Rubin 1983), matching patients at high-performing hospitals to patients at low-performing hospitals based on their comorbidities, age, race, zip-code level median income and education, gender, insurance status, and whether the admission was emergent or elective. Propensity score matching uses observable variables to create a “pseudo-experimental” analysis, which attempts to mimic an experiment in which patients are assigned to different treatment conditions (in this case low-performing versus high-performing hospitals), alleviating legitimate concerns about self-selection. After propensity score matching, we recalculated hospital risk-adjusted mortality rates and re-estimated the observed relationship between hospital performance and mortality.
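A simplified greedy version of 1:1 within-caliper matching can be sketched as follows. The text does not specify the matching algorithm or caliper width, so both are assumptions here; the propensity scores are assumed to come from a logistic model of hospital choice on the patient characteristics listed above.

```python
def caliper_match(treated_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching within a caliper.

    For each treated unit (patient at a high-performing hospital), find
    the closest remaining control unit (patient at a low-performing
    hospital) by propensity score; keep the pair only if the scores
    differ by no more than the caliper. Each control is used at most once.

    Returns a list of (treated_index, control_index) pairs.
    """
    available = dict(enumerate(control_scores))
    pairs = []
    for i, t in enumerate(treated_scores):
        if not available:
            break
        j, c = min(available.items(), key=lambda kv: abs(kv[1] - t))
        if abs(c - t) <= caliper:
            pairs.append((i, j))
            del available[j]
    return pairs
```

Treated units with no control inside the caliper are simply dropped, which is what makes the matched cohorts more comparable than the full samples.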
Of the 4,048 hospitals in the Hospital Compare database, 21 did not report performance on any of the eight measures. Two hundred and eighty-four hospitals were excluded because all the measures they reported were based on fewer than 25 patients. An additional 86 hospitals included in Hospital Compare were not identified in the 2004 MedPAR file and were dropped from the analyses. A total of 3,657 hospitals were included in the final analyses. Hospital performance for each of the measures is shown in Table 2.
Figure 1 shows observed and expected mortality differences between hospitals performing at the 25th versus 75th percentile for each measure. For four of the five measures of AMI performance, the expected mortality difference between hospitals accounts for one-third or less of the observed difference in hospital mortality rates. The one exception is ACE-inhibitors for left ventricular dysfunction in AMI; for this measure, the observed mortality rates are lower than expected. For the HF measure, ACE-inhibitor for left ventricular dysfunction, the observed relationship between mortality and performance is negative, but the difference in observed mortality between hospitals is not statistically significant. The observed benefit for pneumococcal vaccination and early administration of antibiotics for pneumonia is greater than expected. Results were similar whether we used a fixed 1-year time horizon for mortality differences or one based on the time horizons of the original clinical trials, except for the timing of initial antibiotic therapy for pneumonia: using a 30-day window to calculate the mortality rate decreased the observed reduction in mortality from 0.010 to 0.005. For the two composite measures of performance, AMI and pneumonia, the observed benefit remains larger than expected (Figure 1).
Using the propensity score matched cohorts, we re-estimated the observed relationship between hospital performance and mortality. This adjustment resulted in a smaller observed benefit (see Table 3). However, the observed benefit remained significantly larger than the expected benefit.
In this study of the association of process measures with observed and expected risk-adjusted mortality rates, we find that differences in mortality rates across U.S. hospitals are generally larger than what would be expected if these differences were due only to variation in performance on the measures themselves. These results support the hypothesis that performance on process measures not only directly affects patients’ outcomes, but is also a marker of unmeasured aspects of health care quality.
Prior work has found that hospital performance measures predict only small differences in mortality across hospitals (Werner and Bradlow 2006). This current study suggests that while the overall effect of hospital performance on mortality is small, part of that effect is unrelated to the direct effects of engaging in the care process that is assessed. These results have important implications.
First, if hospitals that perform well on these measures have better outcomes that are only partially attributable to their performance on these measures, improvements in measured performance will have uncertain results. On one hand, improvements in measured processes of care might yield improvements in outcomes greater than those expected from clinical trials if, when hospitals adopt ways to improve their measured performance, they also make changes that improve their unmeasured performance. This may improve outcomes for both measured and unmeasured conditions. For example, a hospital might adopt electronic health records or increase staffing in an effort to improve its ability to give aspirin on admission for AMI. If aspirin use rises, patient mortality may improve through aspirin's direct pharmacologic effect. At the same time, the unmeasured changes that led to the rise in aspirin use may reduce medication errors, improve response to emergencies, or improve care in other ways. These changes may improve care for AMI beyond aspirin's pharmacologic effect, and may improve care for patients with other conditions as well.
On the other hand, if unmeasured care does not improve with measured care, the benefit of improving measured processes of care on patient outcomes could be smaller than expected from observational studies. Observed improvements might also be lower than expected if hospitals divert resources from important but unmeasured areas to less important but measured areas. One randomized trial of a quality-improvement effort in nursing homes found that care for two targeted conditions improved, while care for two nontargeted conditions did not change (Mohide et al. 1988). Similarly, an observational study of a practice redesign intervention found that while targeted care processes improved in the intervention practices compared with control practices, there was no change in the nontargeted care processes in either practice setting (Ganz et al. 2007).
The uncertainty of how improvements in care processes will impact patient outcomes raises concerns about the effectiveness of tying process performance measures to performance-based quality improvement incentives, such as pay-for-performance. If improvements in care processes do not translate into significant improvements in patient outcomes, the effectiveness of pay-for-performance programs in improving health care quality may be limited.
Second, these results suggest the potential magnitude of unmeasured care. Randomized trials reveal that aspirin administration in the setting of AMI reduces mortality in absolute terms by 2.5 percentage points, a large effect that saves thousands of lives annually. Our results suggest that the unmeasured elements of care that are associated with aspirin administration but cannot be directly attributed to its pharmacologic effect are close to 10 times greater than aspirin's pharmacologic contribution. Because our results reveal that unmeasured performance predicts outcomes, the successes of process-based hospital quality improvement initiatives will depend on the relationship between measured and unmeasured care. These results suggest further opportunity to reduce mortality rates if we could learn from hospitals with exceptional performance. Hospitals with mortality rates that are lower than predicted by their performance are likely doing something else better. Further research to identify currently unmeasured factors that contribute to low mortality rates in high-performing hospitals could identify ways to improve mortality rates across all hospitals.
Finally, these results support using multiple performance measures to assess hospital quality. This may include different types of measures (structure, process, and outcomes) as well as more complete sets of process measures. As a larger portion of hospital performance is measured, less will be unmeasured. Composite measures may capture more information about quality of care, leaving a smaller residual of unmeasured care; indeed, the smaller differences between observed and expected mortality we found in the composite measures are consistent with this hypothesis.
Our study has several limitations. First, our calculations of observed mortality reductions are based on administrative data, which have limitations. Administrative data are not ideal for risk-adjustment as comorbidities may be under-reported. However, we use a well validated risk-adjustment model designed specifically for use with administrative data (Elixhauser et al. 1998; Stukenborg, Wagner, and Connors 2001; Southern, Quan, and Ghali 2004), and these risk models had high predictive power in our testing. Prior work has suggested that hospital profiles based on administrative data provide good surrogates to those based on clinical data (Austin 2005; Krumholz et al. 2006; Austin 2006). Nonetheless, it is possible that our estimates of the relationship between mortality and hospital performance are due in part to differences in hospital case mix. Because standard risk-adjustment with regression techniques may not fully account for this type of bias (Rubin 1997), we also performed propensity score matching. These additional analyses, comfortingly, did not alter our findings.
Additionally, our calculations of mortality in administrative data are based on all patients with a given condition, whereas the Hospital Compare performance measures are based on a subset of patients within a condition—those eligible for a given measure. Because our AMI mortality estimates include patients who were not eligible, for example, for aspirin, the mortality benefit we attribute to aspirin may be overstated.1 Thus, these inaccuracies stemming from the use of administrative data may understate the difference between observed and expected mortality rates.
Second, our estimates of expected mortality reductions are based on published clinical trials or other scientific evidence. The environment and patient population of these studies may be different than that of U.S. hospitals, and many of these studies were performed over 10 years ago. Indeed, clinical trials often use more homogeneous populations than those seen in daily practice (Fortin et al. 2006) in a more controlled environment. Therefore, these results may not accurately represent the expected value of the intervention in today's typical settings and our estimates of expected mortality reduction from clinical studies are likely overestimates. Because of this, our results give an upper-bound estimate of the direct effect of process measures. In all likelihood, the effect we should expect from complying with process measures is lower than what we predict here, suggesting that an even larger portion of the effect of the measures reflects the marker, or unmeasured, aspects of care.
Finally, our estimates of the composite expected mortality differences within conditions are based on a weighted sum of the independent effects of each process measure alone. These effects may not combine independently, and thus, our estimates of the combined effects of these process measures may also be misestimated. If individual measures combine synergistically, our composite underestimates the expected impact, whereas if they combine subadditively, it overestimates the expected impact.
This research provides new evidence that differences in observed mortality rates across U.S. hospitals are larger than what would be expected if these differences were due only to the direct effects of delivering measured care. This finding suggests that process measures capture important information about care that is not directly measured, and that these unmeasured effects are in general larger than the measured effects.
Rachel M. Werner is supported by a VA HSR&D Research Career Development Award. See electronic appendix for disclosure of authors’ roles. The authors gratefully thank Janell Olah for her assistance with manuscript preparation. The Centers for Medicare and Medicaid Services [CMS] have reviewed and approved the manuscript for compliance with the privacy and protection of the identity of individuals included in the datasets used. CMS did not review and approve the content or science. The authors reported no financial or other conflicts of interest.
1Patients are ineligible for aspirin (and thus for inclusion in the performance measures) if they have an aspirin allergy or active bleeding, or are on warfarin. Such patients may have a higher mortality rate either because they do not receive the pharmacological benefit of aspirin (in the case of an allergy) or because they have other medical conditions that put them at higher risk for death (in the case of active bleeding or warfarin).
The following supplementary material for this article is available online:
Appendix A. HSR Author matrix.
This material is available as part of the online article from: http://www.blackwell-synergy.com/doi/abs/10.1111/j.1475-6773.2007.00817.x