In this analysis of Pennsylvania hospitals, we demonstrate that hospitals vary markedly in “end-of-life” intensive care unit (ICU) and life-sustaining treatment (LST) use. We further demonstrate correlation between measures relying on retrospectively identified decedents and measures relying on prospectively identified probability of inpatient mortality, and between our narrow procedure-based measures of intensity and the broader Dartmouth Atlas utilization measures. We conclude that “end-of-life” treatment intensity is a real hospital attribute and explore the implications of these findings for measurement in the paragraphs that follow.
“End-of-life” treatment intensity is a real hospital attribute
There is no gold standard for the measurement of “end-of-life” treatment intensity. That is because it is not clear which treatments or measures should contribute to the numerator (i.e., spending? ICU use? life support?), and because it is not easy to determine, ex ante, who is at the end of life in order to identify the denominator. Moreover, despite the intuitive appeal of the widely adopted Dartmouth Atlas measures, the validity of these measures has never been tested. The foremost contribution of the current study is the demonstration of convergent validity of multiple “end-of-life” intensity measures. These measures varied in their construction of the numerator (the particular treatments or measurements) and the denominator (those who died vs. those who might die). Prior work has explored the robustness of hospital “end-of-life” intensity outlier status to multiple measures in the numerator among cancer decedents (16), but no prior work has assessed the correlation of measures with varying definitions of both the numerator and the denominator.
Our finding that 16 of the 20 measures we studied were significantly, and often highly, correlated suggests that hospital “end-of-life” treatment intensity is a real construct. Moreover, because we observed relative consistency in hospitals’ use of the ICU and LSTs across groups at lower and higher probability of dying, a hospital’s “end-of-life” intensity may simply be, as the Dartmouth Atlas argues, a measure of general treatment intensity; that is, an underlying approach to all patients.
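The convergent-validity check described above can be illustrated with a short sketch: given one value per hospital for each intensity measure, compute every pairwise correlation across hospitals. The measure names, hospital values, and the use of a plain Pearson coefficient here are illustrative assumptions, not the study’s actual data or statistical methods.

```python
from itertools import combinations
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def correlate_measures(measures):
    """measures: {measure_name: [one value per hospital]}.
    Returns {(name_a, name_b): r} for every pair of measures."""
    return {
        (a, b): pearson(measures[a], measures[b])
        for a, b in combinations(sorted(measures), 2)
    }

# Illustrative inputs only: two hypothetical intensity measures
# recorded for the same four hospitals.
example = {
    "icu_use_decedents": [0.32, 0.41, 0.27, 0.55],
    "icu_use_high_prob": [0.30, 0.44, 0.25, 0.52],
}
```

A matrix of such coefficients, one cell per measure pair, is the kind of evidence that supports treating intensity as a single underlying hospital attribute.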
Which measure is “best”?
The answer to this question depends on whether one is trying to identify a risk-adjusted marker of general treatment intensity or whether one is fundamentally interested in end-of-life treatment. Regarding the denominator, the high probability of dying measures are theoretically better risk-adjusted markers of intensity because they compare “apples” to “apples.” This is particularly true if more intensive treatment confers a survival advantage, which would make the composition of decedent populations non-comparable between high- and low-intensity hospitals. With respect to understanding provider behavior for dying patients, the high probability of dying measure also theoretically better reflects the “real world” of decision making under conditions of uncertainty. Yet the decedent-based measures are much more feasible to obtain from administrative data. And because decedent-based measures are robust proxies for hospital treatment intensity among patients with a high probability of dying, it may be pragmatic to use them.
Notwithstanding this pragmatic perspective, we note that there are hospitals that fall “off-diagonal” in correlations between decedent- and high probability of dying measures. This may reflect noise, or it could be a window into a hospital’s capacity to “discriminate.” If the latter, one might argue that jointly reported measures could provide additional information about hospital efficiency. Specifically, a hospital whose intensity of treatment among high probability of dying patients is lower than its intensity of treatment among decedents may be triaging some of the highest-risk patients toward less intensive, perhaps palliative, treatment. That is, the high(er) decedent treatment intensity is due not to intensive treatment of hopelessly ill patients who died, but to treatment of patients who were initially at relatively low probability of dying but who died (e.g., those whom providers believed might be “saved,” or, alternately, who “surprised” providers by dying).
Regarding the numerator, the Dartmouth Atlas measure of Medicare spending is attractive because it reflects the financial implications for the US Treasury. But for those interested in the treatment of dying patients, the underlying contributors to those costs (acute care admissions and physician visits) do not necessarily produce negative social costs; that is, these broad categories of treatment may be desirable, even among patients expected to die soon. On the other hand, particular life-sustaining treatments may produce negative social costs; for example, most Medicare beneficiaries indicate that, when faced with less than a year to live, they would prefer to avoid mechanical ventilation even if it briefly extends life (9).
Finally, regarding whether to use a cohort-based approach like the Dartmouth Atlas or an admission-based approach like the one we used here, the correlation among these measures suggests that the frequently-violated assumption of loyalty may not produce substantial misspecification in the cohort-based measures.
Limitations of our study
Our study relies upon a predicted probability of dying estimate that we did not create ourselves but that is derived from a proprietary model. Although available information suggests that the model is robust, its lack of transparency may raise doubts among some readers. Furthermore, even if the predicted probability of dying estimate were entirely trustworthy, model-based predictions of mortality risk and physician predictions are only modestly aligned (17), calling into question whether this measure is a reliable proxy for the clinicians’ decision making under conditions of uncertainty that we sought to capture. Indeed, it is physicians’ estimates of the likelihood of survival in the intensive care unit and of a high likelihood of poor cognitive function, not objective severity of illness and organ dysfunction scores, that predict withdrawal of mechanical ventilation (18), one of the LSTs we studied.
We chose the 95th percentile of predicted probability of dying as our cut-off for patients at high probability of dying because the size of that population was approximately equivalent to the population of decedents. This group included many patients whose chance of survival was better than 50/50 and for whom intensive treatment is likely warranted. Hypothetically, using the 99th percentile (> 50% predicted probability of dying) might have been better, but such estimates would have been far too unstable, even with over 2 million observations. We did not explicitly compare the distribution of risk in the high probability of dying group between hospitals; however, we empirically adjusted for selection effects by standardizing our ratios using models that included patient-level risk of death. It must be acknowledged that our approach to creating standardized ratios assumed that the overall Pennsylvania cohort represented expected “end-of-life” intensity; however, norms of behavior in Pennsylvania differ significantly from those in other regions of the US with lower (Oregon) and higher (New York) “end-of-life” utilization rates as measured by the Dartmouth Atlas.
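The standardization described above is, in essence, an observed-to-expected ratio: a hospital’s observed count of an intensive treatment divided by the count expected if cohort-wide, risk-specific rates applied to that hospital’s own patient mix. The sketch below illustrates this idea under simplifying assumptions; the risk strata and rates are hypothetical placeholders, and the study’s actual patient-level risk model is proprietary and not reproduced here.

```python
def standardized_ratio(admissions, expected_rate_by_stratum):
    """Indirectly standardized treatment ratio for one hospital.

    admissions: list of (risk_stratum, treated_flag) tuples, one per
        admission at this hospital (treated_flag is 0 or 1).
    expected_rate_by_stratum: cohort-wide treatment rate per risk
        stratum, serving as the 'expected' norm (here, hypothetically,
        the overall Pennsylvania cohort).
    """
    observed = sum(treated for _, treated in admissions)
    expected = sum(expected_rate_by_stratum[s] for s, _ in admissions)
    return observed / expected

# Illustrative inputs only: a hospital with two high-risk and two
# low-risk admissions, three of which received the treatment.
example_admissions = [("high", 1), ("high", 1), ("low", 0), ("low", 1)]
example_rates = {"high": 0.5, "low": 0.25}
```

A ratio above 1 means the hospital used the treatment more often than its case mix would predict; a ratio below 1 means less often.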
Of concern, neither the Dartmouth Atlas measures nor our PHC4 measures accounted separately for intermediate care and intensive care beds. Reassuringly, the most recently released Dartmouth Atlas data (2008) parse out intermediate- and high-intensity ICU use and find them to be correlated; however, variation in intermediate care beds accounts for a greater proportion of the total variation than high-intensity beds do. Another concern is that our approach assumed homogeneity in each hospital’s intensity; however, within-hospital variation in end-of-life intensity likely exists between ICUs (e.g., cardiothoracic surgery vs. medical). Finally, no studies have validated the reliability of coding for mechanical ventilation of < 96 hours’ duration and other life-sustaining treatments using chart-based audits, so these measures may be subject to coding inconsistencies. Indeed, the finding that CPR and enteral/parenteral nutrition did not correlate with the other LST measures suggests either that decision making for these LSTs is governed by different forces or, more likely, that they are unreliably coded. This aberrancy in our findings deserves further study.
From a research and measurement perspective, we conclude that hospital “end-of-life” treatment intensity is a real attribute and that mortality follow-back studies of hospital behavior produce reasonably unbiased estimates of hospitals’ intensity among patients with a high probability of dying, at least in the current delivery environment, in which hospitals do not appear to selectively target patients for whom intensive treatment is more likely to result in survival. From a policy perspective, the wide variation in intensity raises further concerns about the efficiency of our health care system.
Future studies should explore the relationship of these measures to outcomes of real interest, such as survival, quality of life, and patient and family satisfaction, in order to discern whether one or another of these correlated measures (or an amalgam of measures) is better at identifying hospital inefficiency.