Background: Health care utilization among decedents is increasingly used as a measure of health care efficiency, but decedent-based measures may be biased estimates of the care received by “dying” patients.
Objective: To develop and validate new measures of hospital “end-of-life” treatment intensity.
Design: Retrospective cohort study using Pennsylvania Health Care Cost Containment Council (PHC4) discharge data (April 2001–March 2005) and Centers for Medicare and Medicaid Services (CMS) data (January 1999–December 2003).
Participants: Patients 65 and older admitted to 174 Pennsylvania acute care hospitals.
Measurements: Hospital-specific standardized ratios of intensive care unit (ICU) and life-sustaining treatment (LST) use among terminal admissions (decedents) and among admissions with a high probability of dying, and spending and use of hospitals, ICUs, and physicians among patients in their last 6 months of life.
Results: There was marked between-hospital variation in the use of the ICU and LSTs among decedents and among admissions with a high probability of dying. All hospital decedent and high probability of dying measures were highly correlated (p < 0.0001). In principal components factor analysis, all 4 of the last-6-months cohort-based measures, the decedent and high-risk admission-based ICU measures, and 8 of the 12 decedent and high probability of dying LST measures loaded onto a single factor, explaining 42% of the variation in the data.
Conclusions: Hospitals vary in their end-of-life use of specific life-sustaining treatments that are emblematic of aggressive end-of-life care. End-of-life intensity is a relatively stable hospital attribute that is robust to multiple measurement approaches.
There is substantial enthusiasm for using claims-based measures to profile the quality and efficiency of US hospitals. Recently, researchers have used claims data to measure the intensity of treatment among patients who died. Having found wide regional and hospital-level variations in intensity, they have concluded that high-intensity care is of poor quality (1) or inefficient (2–7). The general assumption is that the provision of intensive and expensive care to those who died was a waste of resources. Further, because population-based surveys suggest the majority of Americans prefer not to have intensive hospital-based care when they die (8, 9), such treatment appears misaligned with patient preferences.
However, these conclusions may be wrong if these measures, which are based upon the decedent follow-back (case-series) approach, are unreliable estimates of the care received by patients whom providers believed were “dying” (10). Many deaths are unexpected. Intensive treatment for patients who were not expected to die (but did) has different implications for quality and efficiency than intensive treatment for patients who were expected to die. Furthermore, it is possible that intensity of treatment actually improves survival. If so, the decedent cohorts used to measure each hospital’s intensity may not be comparable; patients who were “saved” by the higher-intensity hospital are missing from its decedent cohort.
The purpose of the current study is to describe the development of new measures of hospital “end-of-life” treatment intensity using Pennsylvania discharge data and to compare these new measures with the existing measures of end-of-life intensity that have the greatest public policy prominence, the Dartmouth Atlas measures (www.dartmouthatlas.org). Our new measures leverage a unique feature of Pennsylvania Health Care Cost Containment Council (PHC4) discharge data, a predicted risk of death upon admission based upon key clinical findings abstracted from the medical chart within the first 48 hours of admission, to identify patients whom providers might reasonably expect to die during the hospitalization. The comparative analyses herein are descriptive and do not include metrics for determining which measure of intensity is superior. Rather, we seek to explore the degree to which different measures could create divergent profiles of hospital intensity and, in turn, divergent and potentially contradictory targets for policy intervention.
The sample included all acute care hospitals in Pennsylvania (PA). Hospital-level treatment intensity information is based upon four years of Pennsylvania Health Care Cost Containment Council (PHC4) discharge data (April 2001 through March 2005), five years of Centers for Medicare and Medicaid Services (CMS) Medicare fee-for-service claims Part A hospital claims data (January 1999–December 2003), and four years of CMS fee-for-service Part B physician claims data (January 2000–December 2003). We describe the intensity measures from each of these sources in greater detail, below.
We chose PHC4 data for the current study because it contains a unique data element: the predicted probability of inpatient death at the time of admission. This predicted probability is based upon key clinical findings abstracted in the first 48 hours of admission by specially trained personnel using Medi-Qual’s proprietary Atlas clinical information system. Key clinical findings include historical information, laboratory results, vital signs, clinical symptoms and signs, pathophysiologic abnormalities, and composite pathophysiological scores. As reported in Medi-Qual technical white papers and by peer-reviewed independent assessment (11), inpatient mortality prediction using key clinical findings has better discrimination than administrative-data-only models.
We calculated each hospital’s standardized (observed-to-expected) ratio of ICU and life-sustaining treatments (LSTs) among two populations: 1) those who died during the admission (decedents), and 2) those who had a high probability of dying during the admission. We defined “high” as the upper 95th percentile of predicted probability of inpatient death at the time of admission, equivalent to a risk of 0.21 or higher (mean 0.41). We restricted our analyses to hospitals with at least 50 decedent and 50 high probability admissions during the four study years.
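The cohort definition above amounts to a simple percentile cut on the admission-level risk prediction. As an illustrative sketch only (the risk distribution and variable names below are our assumptions, not PHC4 data):

```python
import numpy as np

# Simulated predicted risks of inpatient death at admission (an illustrative
# right-skewed distribution; real values come from the proprietary Atlas model).
rng = np.random.default_rng(3)
pred_risk = rng.beta(1, 12, size=100_000)

# "High probability of dying" = at or above the 95th percentile of risk
cutoff = np.quantile(pred_risk, 0.95)
high_prob = pred_risk >= cutoff  # boolean flag per admission
```

By construction, roughly 5% of admissions are flagged, which is what makes the high-probability cohort comparable in size to the decedent cohort.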
The intensity measures that are based upon treatment of decedents use the mortality follow-back, or decedent case series, approach. The intensity measures based upon treatment of patients with a high probability of dying are designed to approximate a cohort of subjects that clinicians might prospectively identify as “dying” (10).
We calculated observed hospital-level rates of ICU admission and ICU length of stay (LOS) using revenue codes for general and subspecialty intensive care units (174, 175, 200–204, 206–214, and 219). We calculated observed LST use using ICD-9-CM procedure codes in any of the 6 data fields for inpatient procedures: intubation and mechanical ventilation (ICD-9 codes 96.04, 96.05, 96.7x), tracheostomy (ICD-9 codes 31.1, 31.21, 31.29), gastrostomy tube insertion (ICD-9 codes 43.11, 43.19, 43.2, 44.32), hemodialysis (ICD-9 code 39.95), enteral or parenteral nutrition (ICD-9 codes 96.6 and 99.15), and cardiopulmonary resuscitation (CPR, ICD-9 codes 99.60, 99.63). We chose these LSTs based upon the judgment of a multidisciplinary group of internists, intensivists, emergency physicians, palliative care physicians, and health services researchers, reinforced by semi-structured interviews with over 100 informants in 11 PA hospitals (12). To our knowledge, the reliability of coding for each of these procedures has not been specifically explored, but we expect that tracheostomy and mechanical ventilation are reliably coded because they have important implications for reimbursement and are easily verified under audit (thus, overcoding is susceptible to fraud detection and undercoding is financially inefficient).
Next, we calculated expected hospital-level rates of ICU admission, ICU LOS, intubation and mechanical ventilation, tracheostomy, gastrostomy tube insertion, hemodialysis, enteral or parenteral nutrition, and cardiopulmonary resuscitation. We used data pooled across all hospitals over the four years to identify patient-level predictors of ICU admission, ICU LOS, and receipt of the 6 LSTs in the two populations (decedent and high probability of dying). Variables eligible for model inclusion included patient demographics (age, sex, race), illness severity (predicted probability of dying during the hospitalization), diagnosis (the 25 diagnoses most prevalent among decedents, grouped using the clinical classification software (CCS) developed by the Agency for Healthcare Research and Quality), and coronary artery bypass grafting (a common procedure associated with ICU and mechanical ventilation use). We used stepwise procedures for model selection to avoid biased estimation and inference due to correlations among these covariates. The p-value of the Hosmer-Lemeshow goodness-of-fit test was < 0.0001 for all models. The logistic regression models each had an area under the receiver operating characteristic curve between 0.70 and 0.83, and the ICU LOS linear model had an R2 of 0.1 and a median residual of −0.15 days.
We applied the coefficients obtained from each of those models to calculate a predicted probability of ICU admission and receipt of each of 6 LSTs and a predicted ICU LOS for those with an ICU stay for each decedent and high probability of dying admission. We summed the predicted probabilities of ICU admission, individual LSTs, and ICU LOS across the decedent and high probability of dying admissions at each hospital to obtain the expected hospital-level rates of ICU admission, LSTs, and ICU LOS for decedent and high probability of dying admissions.
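This indirect standardization, summing model-predicted probabilities within each hospital to form an expected rate, can be sketched as follows. The simulation is illustrative only: it fits a pooled logistic model with a single covariate, whereas the actual models included demographics, diagnosis, and severity; all names and simulated values are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
hospital = rng.integers(0, 10, n)        # 10 hypothetical hospitals
risk = rng.beta(2, 5, n)                 # predicted risk of death at admission
X = np.column_stack([np.ones(n), risk])  # pooled model: intercept + risk only

# Simulate ICU admission driven by patient risk plus a hospital-level effect
effect = rng.normal(0, 0.5, 10)
p_true = 1 / (1 + np.exp(-(-2 + 4 * risk + effect[hospital])))
icu = (rng.random(n) < p_true).astype(float)

# Fit the pooled logistic model by Newton-Raphson (no hospital terms,
# so hospital-specific practice style shows up in the O/E ratio)
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    W = p * (1 - p)
    beta += np.linalg.solve((X * W[:, None]).T @ X, X.T @ (icu - p))

p_hat = 1 / (1 + np.exp(-X @ beta))
obs = np.bincount(hospital, weights=icu, minlength=10)    # observed count
expected = np.bincount(hospital, weights=p_hat, minlength=10)  # expected count
oe_ratio = obs / expected                                 # standardized ratio
```

At the maximum-likelihood solution of a logistic model with an intercept, expected counts sum to observed counts over the whole sample, so O/E ratios center near 1 statewide; a hospital above 1 uses the ICU more than its case mix predicts.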
Finally, we calculated 16 observed-to-expected treatment ratios for each hospital (8 measures in 2 groups: decedents and high probability of dying). To address instability of estimates, particularly for smaller hospitals, we used empiric Bayes shrinkage estimation (13). The advantage of shrinkage estimation is that it reduces the likelihood of inappropriately labeling a hospital as an “outlier” relative to other hospitals. A Bayesian framework uses prior knowledge about a situation to produce estimates for the true mean that lie somewhere between the observed average and the expected mean based on prior knowledge. Empirical Bayes uses the data both as the basis for prior knowledge and to adjust this knowledge; the expected mean used to seed the iterative analysis in our calculations was the mean in the overall sample.
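A minimal sketch of shrinkage under a normal-normal empirical Bayes model, with the prior mean seeded from the overall sample as described above. The paper's exact estimator (13) may differ, and all values below are hypothetical.

```python
import numpy as np

ratios = np.array([0.4, 0.8, 1.0, 1.1, 1.6, 2.2])  # hypothetical O/E ratios
n_cases = np.array([20, 150, 400, 300, 60, 15])    # cases behind each ratio

grand = np.average(ratios, weights=n_cases)        # prior mean from the data
se2 = ratios.var(ddof=1) / n_cases                 # crude within-hospital variance
# method-of-moments estimate of true between-hospital variance (floored at 0)
tau2 = max(np.average((ratios - grand) ** 2, weights=n_cases) - se2.mean(), 1e-6)

shrink = tau2 / (tau2 + se2)              # weight given to the observed ratio
eb = grand + shrink * (ratios - grand)    # shrunken (empirical Bayes) estimates
```

Hospitals with few cases (the first and last here) are pulled hardest toward the overall mean, which is precisely what reduces the chance of spuriously labeling a small hospital an "outlier."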
We chose the Dartmouth Atlas hospital-level intensity measures for comparison to our own because they have been widely published and have garnered the attention of policymakers who seek to improve the efficiency of the US health care system. Although there is no agreed-upon “gold standard” of measuring “end-of-life” treatment intensity, the Dartmouth Atlas measures clearly have the highest public visibility (14).
The Dartmouth measures differ from our PHC4 measures in several ways. First, they are not admission-based measures; they are cohort-based measures. Specifically, they represent cumulative acute care hospital and physician spending and utilization during the last 6 months of life among a cohort of chronically ill elders who died between 1999 and 2003. Second, the patients in the cohort are restricted to continuously enrolled Medicare fee-for-service beneficiaries with at least one hospital admission for one of 11 diagnoses known to confer a high risk of inpatient mortality (15) during their last 2 years of life, whereas our measures include all patients over age 65, regardless of insurance or diagnosis. Finally, the Dartmouth measures assume that patients are loyal to a particular hospital, but the PHC4 measures do not. Each member of the Dartmouth Atlas end-of-life cohort was assigned to the hospital at which they obtained the plurality of their inpatient care in the last 2 years of life. The Atlas then attributes the beneficiaries’ acute care hospital and physician utilization and spending to that hospital, and reports hospital-specific last-6-month inpatient spending, hospital days, ICU days, and physician visits for hospitals with at least 50 assigned beneficiaries over 5 years (5).
We calculated Pearson’s correlations and conducted exploratory principal components factor analysis among the three types of end-of-life intensity measures: PHC4 decedent admission-based ICU and LST use, PHC4 high probability of dying admission-based ICU and LST use, and Dartmouth Atlas decedent cohort-based last 6 months spending, hospital, ICU, and physician utilization. We performed all analyses using STATA 9.1 (StataCorp, College Station, TX).
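The analytic sequence, Pearson correlations followed by principal components of the correlation matrix, can be sketched with simulated data (four hypothetical hospital-level measures, one of which, like CPR in our results, is mostly noise; nothing below reflects the real 20-measure analysis):

```python
import numpy as np

rng = np.random.default_rng(1)
n_hosp = 170
latent = rng.normal(size=n_hosp)  # underlying hospital "intensity" trait
# three measures driven by the latent trait, one uncorrelated noise measure
measures = np.column_stack([
    latent + rng.normal(scale=0.5, size=n_hosp),
    latent + rng.normal(scale=0.5, size=n_hosp),
    latent + rng.normal(scale=0.7, size=n_hosp),
    rng.normal(size=n_hosp),
])

corr = np.corrcoef(measures, rowvar=False)  # Pearson correlation matrix
eigvals, eigvecs = np.linalg.eigh(corr)     # eigh returns ascending order
order = np.argsort(eigvals)[::-1]           # re-sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals[0] / eigvals.sum()           # variance share, factor 1
loadings = eigvecs[:, 0] * np.sqrt(eigvals[0])   # first-factor loadings
```

Measures driven by the shared trait load strongly on the first component, while the noise measure does not, mirroring how the ICU and most LST measures loaded together in our data while CPR and enteral/parenteral nutrition did not.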
There were 193 acute care hospitals in PA during the study years; 174 (90%) had at least 50 decedent and 50 high probability of dying admissions upon which to base the PHC4 measures. Of these 174, 169 (97%) also had Dartmouth Atlas acute care measures of end-of-life treatment intensity and 159 (91%) had Dartmouth Atlas physician measures of end-of-life treatment intensity. Study hospitals represented a broad spectrum: small to large, non-teaching to major academic centers, and lower to higher illness acuity, as reflected in the mean probability of inpatient mortality (Table 1). There was marked variation in “end-of-life” treatment intensity among hospitals, regardless of the intensity measure (Table 2).
We depict the predicted risk of death upon admission among all admissions in the PHC4 data and among the subset of admissions during which the patient died in Figure 1. The PHC4 decedent admission-based ICU and LST measures were based on 114,957 terminal admissions, 39% of which were also at a high probability of dying. The PHC4 high probability of dying admission-based ICU and LST measures were based on 132,585 admissions, 34% of which resulted in death. The Dartmouth Atlas decedent cohort-based last 6 months utilization measures reflect the utilization of 252,201 Medicare fee-for-service beneficiaries in their last 6 months of life.
We found that the loyalty assumption underlying the Dartmouth Atlas decedent cohort-based last 6 month utilization measures is frequently violated in PHC4 data. Specifically, 28.2% of patients over 65 are admitted to another hospital within 30 days of discharge from a previous hospital. This hospital “disloyalty” increases with the number of admissions; for example, among those with only 2 admissions over 4 years, 36.8% were seen in 2 different hospitals, while among those with 6 or more admissions in 4 years, 58.9% were seen in 2 or more different hospitals.
The PHC4 decedent and high probability of dying measures of hospital-level treatment intensity were highly correlated with one another (P < 0.0001; lowest ρ = 0.65 for gastrostomy tube placement; highest ρ = 0.95 for ICU admission; see Appendix Table 1 and Appendix Figure). Individual hospitals’ treatment patterns among all admissions, conditional upon predicted risk of death upon admission, generally followed a consistent pattern that marked the hospital as a “higher” or “lower” intensity hospital. For example, a large urban teaching hospital in Pittsburgh with an observed-to-expected ICU admission ratio among decedents of 1.21 and among high probability of dying admissions of 1.28 admitted a greater proportion of its patients to the ICU than the state average across all deciles of risk (Figure 2). A competing large urban teaching hospital 5.5 miles away by car with an observed-to-expected ICU admission ratio among decedents of 1.05 and among high probability of dying admissions of 1.00 admitted a lower proportion of its patients to the ICU than the state average across the lower 9 risk deciles and admitted near the statewide average for patients in the top 10% of admission risk of death (> 11.4%).
Similarly, in Philadelphia, a large urban teaching hospital with an observed-to-expected ICU admission ratio among decedents of 1.22 and among high probability of dying admission of 1.22 also admitted a greater proportion of its patients to the ICU than the state average across all deciles of admission risk of death. Less than 3 miles away, another large urban teaching hospital with an observed-to-expected ICU admission ratio among decedents of 0.94 and among high probability of dying admissions of 0.95 admitted fewer patients to the ICU than the state average for the lower 6 deciles of admission risk of death, and about the same as the state average for the top 4 deciles of admission risk of death.
Among the hospitals with PHC4 decedent and high probability of dying intensity measures as well as Dartmouth Atlas retrospective cohort-based measures, there were statistically significant positive correlations between twelve of the sixteen PHC4 measures and the four Dartmouth Atlas measures (Appendix Table 1). The four PHC4 measures that were not correlated with the Dartmouth Atlas measures were the standardized ratios of enteral/parenteral nutrition and CPR among decedents and high probability of dying admissions. Consistent with these findings, principal component factor analysis of the 20 intensity items yielded 2 factors with eigenvalues > 1. Sixteen of the hospital intensity measures loaded onto the first factor; the enteral/parenteral nutrition and CPR measures loaded onto the second (Appendix Table 2). The first factor explained 42% of the variation in the data and the second factor explained another 13% of the variation.
We conceptualized the first factor as a latent construct, “treatment intensity.” We repeated the factor analyses restricted to the PHC4 decedent-based and to the PHC4 high probability of dying admission-based measures and used the factor loadings for the standardized ratios of ICU admission, ICU LOS, intubation/mechanical ventilation, tracheostomy, gastrostomy, and hemodialysis to calculate a normalized factor score, or index of intensity, for decedent admissions and for high probability of dying admissions. The decedent and high probability of dying indices, like their underlying components, were highly correlated (Figure 3).
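Constructing the normalized intensity index from factor loadings reduces to a weighted sum of standardized measures. A sketch with hypothetical loadings and simulated O/E ratios (the six columns stand in for the ICU and LST ratios named above; none of these numbers are from the study):

```python
import numpy as np

rng = np.random.default_rng(2)
measures = rng.normal(1.0, 0.2, size=(170, 6))  # six simulated O/E ratios per hospital
loadings = np.array([0.9, 0.85, 0.8, 0.7, 0.6, 0.55])  # hypothetical factor loadings

# standardize each measure, weight by its loading, then normalize the score
z = (measures - measures.mean(axis=0)) / measures.std(axis=0)
raw = z @ loadings
index = (raw - raw.mean()) / raw.std()  # intensity index: mean 0, SD 1
```

The resulting index places each hospital on a single mean-zero, unit-variance scale, which is what allows decedent-based and high probability of dying-based indices to be compared directly, as in Figure 3.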
In this analysis of Pennsylvania hospitals, we demonstrate that hospitals vary markedly in “end-of-life” intensive care unit (ICU) and life-sustaining treatment (LST) use. Furthermore, we demonstrate correlation between measures relying on retrospectively identified decedents and measures relying on prospectively identifiable probability of inpatient mortality, and between our narrow procedure-based measures of intensity and the broader Dartmouth Atlas utilization measures. We conclude that “end-of-life” treatment intensity is a real hospital attribute, and we explore the implications of these findings for measurement in the paragraphs that follow.
There is no gold standard for the measurement of “end-of-life” treatment intensity. That is because it is not clear which treatments or measures should contribute to the numerator (i.e., spending? ICU use? life support?), and because it is not easy to determine, ex ante, who is at the end of life in order to identify the denominator. Moreover, despite the intuitive appeal of the widely adopted Dartmouth Atlas measures, their validity has never been tested. The foremost contribution of the current study is the demonstration of convergent validity of multiple “end-of-life” intensity measures. These measures varied in their construction of the numerator (the particular treatments or measurements) and the denominator (those who died vs. those who might die). Prior work has explored the robustness of hospital “end-of-life” intensity outlier status to multiple measures in the numerator among cancer decedents (16), but no prior work has assessed the correlation of measures with varying definitions of both the numerator and the denominator.
Our finding that 16 of the 20 measures we studied were significantly and often highly correlated suggests that hospital “end-of-life” treatment intensity is a real construct. Moreover, because we observed relative consistency in hospitals’ use of the ICU and LSTs across groups at lower and higher probability of dying, a hospital’s “end-of-life” intensity may simply be, as the Dartmouth Atlas argues, a measure of general treatment intensity; that is, an underlying approach to all patients.
Which measurement approach is preferable depends on whether one is trying to identify a risk-adjusted marker of general treatment intensity or whether one is fundamentally interested in end-of-life treatment. Regarding the denominator, the high probability of dying measures are theoretically better risk-adjusted markers of intensity because they compare “apples” to “apples.” This is particularly true if more intensive treatment results in a survival advantage, which would make the composition of decedent populations non-comparable between high- and low-intensity hospitals. With respect to understanding provider behavior toward dying patients, the high probability of dying measures also theoretically better reflect the “real world” of decision making under conditions of uncertainty. Yet the decedent-based measures are much more feasible to obtain from administrative data. And since decedent-based measures are robust proxies for hospital treatment intensity among patients with a high probability of dying, it may be pragmatic to use decedent-based measures.
Notwithstanding this pragmatic perspective, we note that some hospitals fall “off-diagonal” in correlations between decedent-based and high probability of dying measures. This may reflect noise, or it could be a window into a hospital’s capacity to “discriminate.” If the latter, one might argue that jointly reported measures could provide additional information about hospital efficiency. Specifically, a hospital whose intensity of treatment among high probability of dying patients is lower than its intensity of treatment among decedents may be triaging some of the highest-risk patients toward less intensive, perhaps palliative, treatment. That is, its high(er) decedent treatment intensity is due not to intensive treatment of hopelessly ill patients who died, but to intensive treatment of patients who were initially at relatively low probability of dying yet died (e.g., those whom providers believed might be “saved” or, alternatively, who “surprised” providers by dying).
Regarding the numerator, the Dartmouth Atlas measure of Medicare spending is attractive because it reflects the financial implications for the US Treasury. But for those interested in the treatment of dying patients, the underlying contributors to those costs (acute care admissions and physician visits) do not necessarily produce negative social costs; these broad categories of treatment may be desirable, even among patients expected to die soon. On the other hand, particular life-sustaining treatments may well produce negative social costs; mechanical ventilation, for example, is a treatment most Medicare beneficiaries indicate they prefer to avoid when faced with less than a year to live, even if it briefly extends life (9).
Finally, regarding whether to use a cohort-based approach like the Dartmouth Atlas or an admission-based approach like the one we used here, the correlation among these measures suggests that the frequently-violated assumption of loyalty may not produce substantial misspecification in the cohort-based measures.
Our study relies upon a predicted probability of dying estimate that we did not create ourselves, but which is derived from a proprietary model. Although available information suggests that the model is robust, the lack of transparency may raise doubts among some readers. Furthermore, even if the predicted probability of dying estimate were entirely trustworthy, model-based predictions of mortality risk and physician predictions are only modestly aligned (17), calling into question whether this measure is a reliable proxy for the clinical decision making under conditions of uncertainty that we sought to capture. Indeed, it is physicians’ estimates of the likelihood of survival in the intensive care unit and of a high likelihood of poor cognitive function, not objective severity of illness and organ dysfunction scores, that predict withdrawal of mechanical ventilation (18), one of the LSTs we studied.
We chose the 95th percentile of predicted probability of dying as our cut-off for patients at high probability of dying because the size of that population was approximately equivalent to the population of decedents. This group included many patients whose odds of survival were better than 50/50 and for whom intensive treatment is likely warranted. Hypothetically, using the 99th percentile (> 50% predicted probability of dying) might have been better, but such estimates would have been far too unstable, even with over 2 million observations. We did not explicitly compare the distribution of risk in the high probability of dying group between hospitals; however, we empirically adjusted for selection effects by standardizing our ratios based upon models that included patient-level risk of death data. It must be acknowledged that our approach to creating standardized ratios assumed that the overall Pennsylvania cohort represented expected “end-of-life” intensity. However, the norms of behavior in Pennsylvania differ significantly from those in other regions of the US with lower (Oregon) and higher (New York) “end-of-life” utilization rates as measured by the Dartmouth Atlas.
Of concern, neither the Dartmouth Atlas measures nor our PHC4 measures accounted separately for intermediate care and intensive care beds. Reassuringly, the most recently released Dartmouth Atlas data (2008) parse out intermediate- and high-intensity ICU use and find them to be correlated; however, variation in intermediate care beds accounts for a greater proportion of the total variation than high-intensity beds. Another concern is that our approach assumed homogeneity in each hospital’s intensity; however, it is likely that within-hospital variation in end-of-life intensity exists between ICUs (e.g., cardiothoracic surgery vs. medical). Finally, no studies have validated the reliability of coding for mechanical ventilation of < 96 hours duration and other life-sustaining treatments using chart-based audits, so these measures may be subject to coding inconsistencies. Indeed, the finding that CPR and enteral/parenteral nutrition did not correlate with the other LST measures suggests either that decision making for these LSTs is governed by different forces or, more likely, that they are unreliably coded. This aberrancy in our findings deserves further study.
From a research and measurement perspective, we conclude that hospital “end-of-life” treatment intensity is a real attribute and that mortality follow-back studies of hospital behavior produce reasonably unbiased estimates of hospitals’ intensity among patients with a high probability of dying, at least in the current delivery environment, in which hospitals do not appear to selectively target patients for whom intensive treatment is more likely to result in survival. From a policy perspective, wide variations in intensity raise further concerns about the efficiency of our health care system.
Future studies should explore the relationship of these measures to outcomes of real interest, such as survival, quality of life, and patient and family satisfaction, in order to discern whether one of these correlated measures (or an amalgam of them) better identifies hospital inefficiency.
Source of support: This study was funded by NIH grant K08 AG021921 (Barnato PI), with additional support from P01 AG019783 (Skinner PI) and 1UL1 RR024153 (Reis PI).
The authors acknowledge the Dartmouth Atlas (Elliott S. Fisher, Principal Investigator) for provision of data and Galen Switzer (University of Pittsburgh) and Douglas Staiger (Dartmouth College) for contributions to the analysis.