Estimation of six-month prognosis is essential in hospice referral decisions, but accurate, evidence-based tools to assist in this task are lacking.
To develop a new prognostic model, the Patient-Reported Outcome Mortality Prediction Tool (PROMPT), for six-month mortality in community-dwelling elderly patients.
We used data from the Medicare Health Outcomes Survey (MHOS) linked to vital status information. Respondents were 65 years old or older, with self-reported declining health over the past year (n=21,870), identified from four MHOS cohorts (1998–2000, 1999–2001, 2000–2002, and 2001–2003). A logistic regression model was derived to predict six-month mortality, using sociodemographic characteristics, comorbidities, and health-related quality of life (HRQOL), ascertained by measures of activities of daily living (ADLs) and the Medical Outcomes Study Short Form-36 Health Survey (SF-36®); k-fold cross-validation was used to evaluate model performance, which was compared to existing prognostic tools.
The PROMPT incorporated 11 variables including four HRQOL domains: general health perceptions, ADLs, social functioning, and energy/fatigue. The model demonstrated good discrimination (c-statistic=0.75) and calibration. Overall diagnostic accuracy was superior to existing tools. At estimated six-month mortality risk cutpoints of 10%–70%, sensitivity and specificity ranged from 0.8%–83.4% and 51.1%–99.9%, respectively, and positive likelihood ratios at all mortality risk cutpoints ≥40% exceeded 5.0. Corresponding positive and negative predictive values were 23.1%–64.1% and 85.3%–94.5%. Over 50% of patients with estimated six-month mortality risk ≥30% died within 12 months.
The PROMPT, a new prognostic model incorporating HRQOL, demonstrates promising performance and potential value for hospice referral decisions. More work is needed to evaluate the model.
Prognostic estimates are important in numerous medical decisions, but perhaps nowhere do they play a more critical role than in the decision to initiate hospice care. This single decision formalizes the beginning of the end-of-life period and the transition from curative to palliative goals of care for many patients, and thus implicitly embodies some estimate of prognosis. In the U.S., the decision to initiate hospice care explicitly depends on prognostic estimates, since physicians must certify an expected survival of six months or less before patients can receive services under the Medicare Hospice Benefit.1, 2
It follows that difficulties in accurately and prospectively estimating six-month mortality may be an important determinant of the underutilization of hospice services—and the proportionate overutilization of aggressive curative interventions—known to characterize end-of-life care in the U.S.3–15 Physicians’ prognostic estimates are known to be generally inaccurate and optimistically biased.14, 16, 17 The extent and systematic nature of this inaccuracy and its correspondence with existing patterns of end-of-life care suggest that prognostic uncertainty—particularly with respect to six-month mortality—may be a key determinant of hospice underutilization.
Yet, few accurate and evidence-based prognostic tools exist to help clinicians estimate six-month mortality. Consensus guidelines were developed in 1996 by the National Hospice Organization (NHO) and adopted widely by clinicians and health policymakers.18 However, these guidelines were not evidence-based and have been shown to perform poorly in predicting six-month mortality.19, 20 Empirically-derived prognostic models also have shown limited accuracy. Perhaps the best-known, most rigorously-evaluated model, from the Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT), used disease characteristics and physiologic variables to predict six-month mortality in critically-ill patients surviving hospitalization for any of nine serious illnesses.21 However, in a subsequent validation effort in patients with advanced lung, heart, and liver disease and a 25% six-month mortality, this model also demonstrated poor performance.19
More recent modeling efforts are promising but have been limited to specific diseases such as cancer and dementia,22–26 or temporal endpoints other than six months. For example, models have been developed to predict short-term (less than six months) mortality27, 28 in terminally-ill patients already referred for hospice or palliative care, and longer-term (one year or more) mortality in hospitalized29 or community-dwelling elders.30, 31 These models are thus less useful for hospice referral decisions, although they demonstrate improved predictive performance, possibly because of their inclusion of patient-reported outcomes (PROs) such as self-reported functioning and well-being, or health-related quality of life (HRQOL). Accumulating evidence suggests that PROs assume greater prognostic power than other variables as the end of life approaches, presumably because the dying process represents a final common pathway characterized by a relatively small set of symptoms and functional impairments.14, 32–47 PROs are also attractive as prognostic variables because they are feasibly ascertainable directly from patients.
To our knowledge, however, there have been no previous attempts to integrate HRQOL into prognostic models for six-month mortality in general patient populations. Our objective in the current study was to develop a broadly-applicable prognostic model incorporating HRQOL to predict six-month mortality, the Patient-Reported Outcome Mortality Prediction Tool (PROMPT), and to explore whether such a model could have sufficient accuracy to inform hospice referral decisions.
This study used data from the Medicare Health Outcomes Survey (MHOS), an annual nationwide survey of Medicare managed care beneficiaries administered by the Centers for Medicare and Medicaid Services (CMS) since 1998 (www.hosonline.org).48–50 The MHOS surveys a random sample of 1000 Medicare beneficiaries from each managed care plan under contract with CMS (between 250 and 320 participating plans yearly). Participants complete a baseline survey and, if still enrolled in the same plan, a two-year follow-up survey. Institutionalized and disabled beneficiaries are included, but patients on Medicare solely because of end-stage renal disease are excluded. The MHOS utilizes a self-report questionnaire to collect data on patient sociodemographic characteristics, comorbidities, clinical symptoms, and HRQOL, as measured by activities of daily living (ADLs) and the Medical Outcomes Study Short Form-36 Health Survey (SF-36®, version 1).51 The MHOS is administered by mail, with telephone follow-up and administration to initial nonresponders.
We used data from four MHOS cohorts: 1998–2000, 1999–2001, 2000–2002, and 2001–2003. During this time, baseline and follow-up surveys were distributed to a total of 986,530 patients, of whom 912,703 were eligible for participation (more than 65 years old, managed care enrollees, not deceased). There were 634,892 total respondents (response rate 70%); we analyzed the last survey completed by any respondent (Fig. 1). We further limited the sample to patients for whom use of a prognostic tool for hospice decision making would be most clinically appropriate and useful, using the SF-36 health transition item to select patients reporting significantly declining health: “Compared to one year ago, how would you rate your health in general now?” This item is not scored in any SF-36 scale, and its use as an inclusion criterion has conceptual validity since physicians should be more apt to consider hospice referrals for patients with substantially declining health. We included only respondents who reported that their health was “much worse” (n=21,870); there were 3,295 deaths in this group, yielding a substantially higher observed six-month mortality than in the overall MHOS sample (15% vs. 2%).
Sociodemographic characteristics included in our analysis were self-reported age, sex, race/ethnicity, education, and current marital status.
Comorbidities included several self-reported diseases: hypertension, coronary artery disease, congestive heart failure, other heart conditions, stroke, chronic obstructive pulmonary disease, diabetes, and cancer. Arthritis, inflammatory bowel disease, and sciatica were ascertained but excluded from analyses because they are not leading causes of mortality in U.S. adults aged 65 years and older.52 Smoking status (current, former, never) also was ascertained.
HRQOL was measured in two ways. Functional status was assessed using six ADLs: bathing, dressing, eating, getting in or out of chairs, walking, and using the toilet. These items had three response options: “No, I do not have difficulty”/“Yes, I have difficulty”/“I am unable to do this activity.” We created a summary measure of the total number of ADLs for which respondents indicated “unable to do this activity;” scores ranged from 0–6, with higher scores indicating greater functional impairment. HRQOL also was ascertained using the SF-36, version 1,51 a widely-used, comprehensive, generic health status instrument comprising eight scales: physical functioning (10 items), role limitations because of physical health problems (four items), bodily pain (two items), general health perceptions (five items), energy/fatigue (four items), social functioning (two items), role limitations because of emotional problems (three items), and emotional well-being (five items). Response options range from two to six ordinal categories. SF-36 scale scores were normalized to the general U.S. population on a T-score metric (mean=50, standard deviation=10), with higher scores indicating better HRQOL.
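The two scoring steps described above can be sketched in a few lines. This is a minimal illustration, not the MHOS scoring algorithm: the item keys, the response label `"unable"`, and the population mean and standard deviation passed to `t_score` are hypothetical placeholders, since the actual SF-36 norms are not given in the text.

```python
# Hypothetical item keys for the six ADLs named in the text.
ADL_ITEMS = ("bathing", "dressing", "eating", "chairs", "walking", "toilet")
UNABLE = "unable"  # hypothetical label for "I am unable to do this activity"


def adl_score(responses):
    """Count the ADLs the respondent reports being unable to perform (0-6)."""
    return sum(1 for item in ADL_ITEMS if responses.get(item) == UNABLE)


def t_score(raw, pop_mean, pop_sd):
    """Rescale a raw scale score to a T-score metric (mean 50, SD 10).

    pop_mean and pop_sd stand in for general-population norms, which are
    assumptions here and must come from the instrument's documentation.
    """
    return 50.0 + 10.0 * (raw - pop_mean) / pop_sd
```

A respondent unable to bathe and walk, but only having difficulty eating, would receive an ADL score of 2 under this sketch; difficulty alone does not add to the count.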
The outcome variable was survival at six months since the last completed survey for each respondent. Vital status and date of death were obtained from the CMS Medicare Enrollment Database. Survey completion by proxy also was ascertained.
The large number of predictor variables resulted in a substantial proportion of subjects with missing data (n=6,154, representing 28% of the sample). The proxy survey completion variable had a disproportionately high frequency of missing data (11%), and to avoid dropping cases, we created a dummy variable for non-response to this item. Missing data for all other individual variables were less than 8%, and were handled through multiple imputation using the PROC MI and PROC MIANALYZE procedures of SAS software, version 9.1.3 (SAS Institute Inc., Cary, NC). Missing data were imputed using a Markov chain Monte Carlo method with multiple chains, creating 10 imputed “complete” datasets.
Because of the large number of potential predictors, especially those related to HRQOL, we made several decisions to facilitate variable selection. We incorporated SF-36 scales rather than individual items to maximize measurement precision and because scale scores are normed to the U.S. general population. To further reduce variables in the model and because an a priori theoretical justification for variable selection is lacking, we applied a backward elimination strategy with an Akaike’s information criterion stopping rule53 to a model including all predictor variables in each imputed dataset. This is equivalent to using a P-value of 0.157 for a variable with one degree of freedom.54 We then applied the majority method, including variables selected in five or more of the 10 imputed sets.55
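The stated equivalence between the Akaike stopping rule and a P-value of 0.157 can be checked directly: removing a one-degree-of-freedom term lowers the AIC whenever its likelihood ratio chi-square statistic is below 2, and the upper-tail probability of a chi-square variable with one degree of freedom at 2 is about 0.157. The sketch below shows that calculation together with a simple version of the majority rule; the function names and the example variable lists are illustrative assumptions, not the authors' code.

```python
import math


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


# AIC charges 2 per parameter, so a 1-df term is kept only if its
# likelihood ratio chi-square exceeds 2.
AIC_EQUIVALENT_P = chi2_1df_pvalue(2.0)


def majority_select(selected_per_dataset, min_count):
    """Keep variables chosen in at least min_count of the imputed datasets."""
    counts = {}
    for selected in selected_per_dataset:
        for name in set(selected):
            counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, c in counts.items() if c >= min_count)
```

With 10 imputed datasets, calling `majority_select(selections, 5)` corresponds to the rule of retaining variables selected in five or more sets.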
Age, sex, race/ethnicity, education, proxy status, hypertension, congestive heart failure, stroke, chronic obstructive pulmonary disease, presence of any cancer, smoking status, ADL score, and SF-36 scores for bodily pain, general health perceptions, emotional well-being, social functioning, and energy/fatigue were selected into a final model. Of the continuous variables, age showed a nonlinear association with six-month mortality, and was, therefore, modeled as a restricted cubic spline with knots at the 5th, 35th, 65th, and 95th percentiles of age.
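For readers unfamiliar with restricted cubic splines, the following sketch builds the spline basis for a single value using the common truncated power formulation, which constrains the curve to be linear beyond the outer knots. The knot values in `AGE_KNOTS` are hypothetical, because the text reports only the percentiles at which knots were placed, and the scaling by the squared knot range is one conventional choice rather than something the source specifies.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for x (truncated power form).

    Returns [x, s_1, ..., s_(k-2)] for k knots. Any linear combination of
    these terms is linear below the first knot and above the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least three knots")

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise zero.
        return u ** 3 if u > 0 else 0.0

    scale = (t[-1] - t[0]) ** 2  # keeps spline terms on roughly the scale of x
    width = t[-1] - t[-2]
    basis = [float(x)]
    for j in range(k - 2):
        term = (
            cube_plus(x - t[j])
            - cube_plus(x - t[-2]) * (t[-1] - t[j]) / width
            + cube_plus(x - t[-1]) * (t[-2] - t[j]) / width
        )
        basis.append(term / scale)
    return basis


# Hypothetical knot locations standing in for the 5th, 35th, 65th, and
# 95th percentiles of age; the actual values are not given in the text.
AGE_KNOTS = [66, 72, 78, 88]
```

With four knots, age enters the regression as three columns (one linear term and two spline terms), which is how a nonlinear age effect can be fit with few extra parameters.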
Some variables showed counterintuitive associations, with lower six-month mortality in both univariate and multivariate analyses: non-white race, low education, hypertension, stroke, greater bodily pain, and low emotional well-being. Some of these counterintuitive associations have been found in other studies,43, 56–59 and may reflect confounding by unmeasured variables (e.g., health care access and quality), selection biases that could have altered the relative influence of competing mortality risks (e.g., restriction to a managed care sample, selection according to self-reported declining health), or the effects of illness adaptation on participants’ HRQOL ratings.60, 61 Counterintuitive associations potentially diminish the “sensibility” or face validity of risk prediction models for clinicians, and many modelers thus recommend excluding the variables involved.62–64 Other modelers have additionally excluded race/ethnicity and education, both because their prognostic significance is confounded and because of ethical concerns that models incorporating these variables could contribute to health disparities.31, 65 For these reasons and to maximize parsimony, we conducted sensitivity analyses both including and excluding counterintuitively-associated variables in multivariate regression models. Model fit, discrimination, and calibration were similar, and, therefore, we excluded these variables. Regression coefficients and standard errors for variables in the final model were computed by averaging across the 10 imputed datasets, following Rubin’s method.66
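As an illustration of pooling across imputed datasets, the sketch below applies the standard form of Rubin's rules to one coefficient: the pooled estimate is the mean of the per-dataset estimates, and the pooled variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The function name and inputs are assumptions for illustration; the authors' analysis was carried out in SAS.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates and std_errors hold the per-dataset coefficient and its
    standard error. Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)
```

When the imputed datasets agree exactly, the between-imputation term vanishes and the pooled standard error reduces to the ordinary one; disagreement between imputations widens it.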
Because the MHOS sample is composed of Medicare managed care beneficiaries whose access to care, health status, and thus mortality might differ from the general U.S. population,67 we generated life tables68, 69 comparing overall survival of the MHOS sample with that of the year 2000 general U.S. population to assess representativeness. We also calculated descriptive statistics on sociodemographic and health-related characteristics of the sample.
We used k-fold cross-validation (k=10) to validate the model. Each of the 10 imputed datasets was randomly partitioned into k subsamples, with each subsample used once as the validation set and the remaining k−1 subsamples used as the training set. We assessed model discrimination by calculating the c-statistic, or area under the receiver operating characteristic curve (AUC), averaging the c-statistics across all imputed datasets and cross-validation samples.70 To further evaluate prognostic performance of the final model, we computed estimated six-month mortalities of individuals within each of the 10 imputed datasets, and obtained mortality estimates by averaging across them. We assessed calibration by dividing patients into seven overlapping groups according to their predicted six-month mortality risk (≥0.1, ≥0.2,…, ≥0.7) and comparing the average predicted six-month mortality in each group with the actual proportion of patients who died within six months. To assess calibration graphically across estimated risk strata, we used a non-parametric method (PROC LOESS; SAS Institute Inc., Cary, NC) to produce a smoothed high-resolution calibration curve with histogram plot.71
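The validation quantities described above can be sketched without any modeling library. The code below shows a random k-fold partition, the c-statistic computed as the proportion of concordant death/survivor pairs, and the comparison of mean predicted risk with observed mortality in overlapping "at or above threshold" groups. Function names are illustrative assumptions, model fitting within each fold is omitted, and the pairwise c-statistic loop is written for clarity rather than speed on a sample of this size.

```python
import random


def kfold_indices(n, k, seed=0):
    """Randomly partition indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def c_statistic(risks, died):
    """Share of (death, survivor) pairs in which the death had the higher
    predicted risk; tied pairs count as one half."""
    pos = [r for r, d in zip(risks, died) if d]
    neg = [r for r, d in zip(risks, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def calibration_by_threshold(risks, died, thresholds):
    """For each threshold, compare mean predicted risk with observed
    mortality among patients whose predicted risk is at or above it."""
    rows = []
    for t in thresholds:
        group = [(r, d) for r, d in zip(risks, died) if r >= t]
        if group:
            mean_pred = sum(r for r, _ in group) / len(group)
            observed = sum(1 for _, d in group if d) / len(group)
            rows.append((t, len(group), mean_pred, observed))
    return rows
```

In a full analysis, each fold would serve once as the validation set for a model fit on the other folds, and the resulting c-statistics would be averaged across folds and imputed datasets as the text describes.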
We then calculated sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios at different estimated mortality risk thresholds to compare performance characteristics of the PROMPT to those of the NHO guidelines and SUPPORT model, reported by Fox et al.19 Finally, we generated Kaplan-Meier curves to compare extended survival of respondents across all risk strata.
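The performance characteristics listed above all follow from a two-by-two table formed by classifying patients as "positive" when their estimated risk meets a chosen threshold. The sketch below is a minimal illustration under that assumption; the function names are hypothetical, and undefined ratios (zero denominators) are returned as `None` rather than guessed.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def diagnostic_metrics(risks, died, threshold):
    """Classify risk >= threshold as a positive test and summarize accuracy."""
    tp = fp = fn = tn = 0
    for r, d in zip(risks, died):
        positive = r >= threshold
        if positive and d:
            tp += 1
        elif positive:
            fp += 1
        elif d:
            fn += 1
        else:
            tn += 1
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    lr_pos = lr_neg = None
    if sens is not None and spec is not None:
        lr_pos = _ratio(sens, 1 - spec)  # sensitivity / (1 - specificity)
        lr_neg = _ratio(1 - sens, spec)  # (1 - sensitivity) / specificity
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }
```

Raising the threshold generally trades sensitivity for specificity, which is why the reported ranges span such wide intervals across cutpoints of 10% to 70%.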
The MHOS and general U.S. population survival curves were similar (Fig. 2), although the MHOS sample had slightly better survival than the U.S. population; the ages at median survival probability were 85 and 83 years, respectively, for the MHOS and U.S. populations. Table 1 shows the distribution of sociodemographic characteristics, comorbidities, and HRQOL.
Table 2 shows the 11 variables included in the PROMPT and their associations with six-month mortality in the entire study sample. HRQOL variables included ADLs, general health perceptions, social functioning, and energy/fatigue. The c-statistic obtained from tenfold cross-validation was 0.752, indicating good overall discrimination. The model was well-calibrated at lower estimated risk values; however, for estimated risk greater than 50%, it overestimated mortality, likely reflecting the small number of events at higher risk strata (Fig. 3).
Table 3 shows the performance characteristics of the PROMPT compared with the NHO guidelines and the SUPPORT prognostic model. At estimated six-month mortality risk thresholds of 10%–70%, model sensitivity and specificity were 0.8%–83.4% and 51.1%–99.99%, respectively, and corresponding positive and negative predictive values were 23.1%–64.1% and 85.3%–94.5%. Positive likelihood ratios exceeded 5.0 at all risk thresholds of 40% or more, while negative likelihood ratios were near 1.0. At comparable estimated risk thresholds, diagnostic performance was superior to the NHO and SUPPORT models.
Figure 4 shows Kaplan-Meier survival curves for respondents in different estimated risk strata. Observed six-month mortality corresponded well to estimated risk, supporting the model’s calibration, and by 12 months more than 50% of all patients in estimated risk strata of 30% or more had died.
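For reference, a Kaplan-Meier curve of the kind shown in Figure 4 is the running product of conditional survival probabilities at each observed death time. The sketch below is a minimal product-limit estimator, assuming follow-up times in any consistent unit and an event flag of 1 for death and 0 for censoring; it is an illustration, not the authors' software.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time for each patient.
    events: 1 if the patient died at that time, 0 if censored.
    Returns a list of (time, survival_probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group everyone whose follow-up ends at time t; patients censored
        # at t are treated as still at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Computing one curve per estimated risk stratum and reading each off at six and 12 months gives the kind of comparison the paragraph above reports.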
In this study we developed a new prognostic model, the PROMPT, to predict six-month mortality in community-dwelling elderly patients with self-reported declining health. The model was developed using a large diverse sample, and utilized 11 total variables, including HRQOL, ascertained by patient self-report. The model demonstrated good calibration and discrimination overall. Importantly, diagnostic performance at various thresholds of estimated risk was superior to existing non-disease-specific models. Specificity was high, and at strata of estimated six-month mortality risk of 40% or more, the model yielded positive likelihood ratios of moderate to large magnitude (5.0 or more) with respect to clinical prediction—exceeding the performance of previous models and increasing the post-test odds of death to an extent generally considered useful in clinical decision making. The model’s positive predictive value in our study population was also high and commensurate to estimated risk; 53% of patients in the 50% risk stratum died by six months, and the proportions of observed deaths were correspondingly greater in higher risk strata. On extended observation, over half of all patients with estimated six-month mortality risk of 30% or more died by 12 months.
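The link between a likelihood ratio and the post-test odds of death mentioned above can be made concrete. The sketch below converts a pretest probability to odds, multiplies by the likelihood ratio, and converts back. As an assumed illustration, it uses the sample's observed six-month mortality of 15% as the pretest probability and a positive likelihood ratio of 5.0, the lower bound reported for risk thresholds of 40% or more.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Update a pretest probability with a likelihood ratio via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Illustrative inputs: 15% pretest mortality, positive likelihood ratio 5.0.
# The result is 15/32, or about 0.47.
EXAMPLE_POST_TEST = post_test_probability(0.15, 5.0)
```

Under these inputs a positive result roughly triples the estimated probability of death within six months, from 15% to about 47%.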
These promising performance characteristics are particularly noteworthy given the types of variables used in our model, and the nature of our study population. Unlike prior prognostic efforts, the PROMPT included no physiologic or laboratory data and relatively little clinical data regarding disease characteristics or health services utilization. The study population was clinically heterogeneous, ambulatory, community-dwelling, and relatively healthy, with a lower pretest mortality risk, compared with populations included in other prognostic modeling efforts and for whom hospice care is typically considered by clinicians. Yet, in spite of these significant constraints on prognostic power, the PROMPT still demonstrated superior performance compared to existing tools such as the SUPPORT model and NHO guidelines. This supports the model’s robustness and potential transportability to more narrowly-defined populations with higher pretest probabilities of mortality in which prognostic performance would likely be more optimal. These conclusions remain preliminary, however, since our model has yet to be externally validated and directly compared to other tools.
The primary limitation of the PROMPT is one shared by all existing prognostic tools for predicting short-term mortality: insufficient sensitivity to “rule out” death in a substantial proportion of patients. This is not surprising given that there are undoubtedly numerous causal factors and trajectories in the dying process,72,73 and no prognostic model has accounted for them all. For the PROMPT, furthermore, comorbidities were ascertained by self-report only, and some important ones, e.g., dementia, renal and liver disease, were not assessed. A substantial proportion of surveys (38%) also were completed by proxy, presumably because patients were too ill or impaired to do so themselves. Finally, our selection of patients on the basis of self-reported decline in health likely also led to the inclusion of patients with acute, self-limited conditions with little impact on mortality. All of these factors likely limit the PROMPT’s sensitivity and use as an exclusive means of determining hospice eligibility because this would result in the denial of hospice services for a majority of dying patients. This limitation has led other modelers to conclude that the goal of determining individuals’ risk of six-month mortality is unrealistic.19, 25
Yet, we believe our modeling effort offers important insights for future research, and that the PROMPT has significant potential utility for clinical care. Our study adds to mounting evidence of the prognostic power of HRQOL. The PROMPT’s superior overall performance compared with efforts incorporating disease and physiologic variables alone supports the hypothesis that as death approaches, HRQOL assumes greater prognostic significance.32, 33 The prominent role of similar HRQOL variables in other prognostic models in elderly patients with advanced illness24, 74 further bears this out, supporting the PROMPT’s validity and the value of integrating HRQOL in future modeling efforts.
Furthermore, despite its low sensitivity in ruling out imminent death, the PROMPT has significant potential to improve end-of-life care given the prevailing underutilization of hospice services, overutilization of life-prolonging interventions, and lack of more accurate, evidence-based and explicit prognostic methods. These circumstances alone raise the possibility that use of the model could increase hospice utilization and advance care planning. Yet, the PROMPT’s greatest potential value lies in its ability to confirm a poor six-month prognosis. Its very high specificity across a range of estimated mortality risks (97% or more for all estimated risk cutpoints of 40% or greater) makes the PROMPT an extremely valuable tool for “ruling in” imminent death, with very few false positives. From both an ethical and a clinical standpoint, this function has at least as much clinical importance as ruling out death. The potential harm of a false negative estimate of six-month mortality is overly aggressive care at the end of life. Although undesirable, this outcome is arguably more tolerable than the potential irreversible harm of a false positive estimate: mistakenly labeling patients as “dying” and forgoing potentially beneficial or curative interventions. This ethical concern may be an important reason for physicians’ clinical reluctance to render prognoses,22, 75 and patients’ reluctance to accept them.76 The PROMPT’s ability to identify imminently dying patients with very few false positives addresses this concern, providing physicians and patients with the necessary reassurance to make critical decisions about end-of-life care.
A final limitation of the PROMPT’s performance is its weaker calibration at higher estimated risk levels, at which it overestimated mortality. This is likely a consequence of the small number of total deaths in these subgroups, reflecting the low overall mortality rate of the study sample (15%). In sicker populations with a higher pretest likelihood of mortality, it is possible that the model’s calibration would be improved.
However, this remains to be seen, and further evaluation is needed before the PROMPT can be implemented clinically. Although the large size and geographic and clinical heterogeneity of the study population enhance the model’s generalizability, it needs to be validated prospectively in other populations with differing comorbidities and experiences with health care. The target population of any predictive model determines both its clinical appropriateness and performance characteristics, including sensitivity and specificity,77 and ours consisted of community-dwelling elders with self-reported declining health. However, the PROMPT might be more accurate and useful in alternative populations, for example, patients identified on the basis of comorbidities and health care utilization (as in the SUPPORT study19, 21 and more recent prognostic model efforts in nursing home patients with dementia24), or physicians’ own prognostic estimates,78, 79 recently operationalized through what has been termed the “surprise question:”80–82 “Would I be surprised if this patient died in the next 12 months?” Applying our tool in such selected populations, which have higher pre-test probabilities of six-month mortality, would likely improve prognostic performance. Future research also might fruitfully examine whether testing strategies combining multiple prognostic tools and factors could further enhance prognostic power. For example, sensitivity might be increased by employing the PROMPT in parallel with other approaches, such as physicians’ prognostic estimates or the CMS Local Coverage Determination guidelines83 used by hospice providers to determine hospice eligibility.
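To see why parallel use of two tools might raise sensitivity, consider a rule that counts a patient as positive if either tool is positive. The sketch below gives the resulting sensitivity and specificity under the strong and untested assumption that the two tools err independently given the patient's true outcome; the function name and any inputs supplied for the second tool are hypothetical, since the text reports no such combined analysis.

```python
def parallel_combination(sens_a, spec_a, sens_b, spec_b):
    """Accuracy of an 'either test positive' rule.

    Assumes the two tests are conditionally independent given true
    outcome, which real prognostic tools may well violate.
    """
    # A death is missed only if both tests miss it.
    sensitivity = 1 - (1 - sens_a) * (1 - sens_b)
    # A survivor is correctly cleared only if both tests are negative.
    specificity = spec_a * spec_b
    return sensitivity, specificity
```

The gain in sensitivity comes at the cost of specificity, so any such strategy would need to be weighed against the value the text places on keeping false positives rare.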
Finally, more work is needed not only to evaluate and improve the PROMPT, but to understand the optimal means and outcomes of applying prognostic models clinically. The landmark SUPPORT study demonstrated that simply providing physicians with prognostic information may not alter end-of-life decision making or patterns of care.84 Various factors including physician and patient attitudes14, 22 and the structures and processes of health care may limit effective utilization of prognostic models. The feasibility of implementing a PRO-based model such as ours, which utilizes 11 variables but 28 individual data elements, remains to be determined. These and other potential barriers need to be better understood, along with the appropriate methods for communicating prognostic information in a sensitive, comprehensible manner. The current effort provides a foundation for such work and for efforts to refine existing models and to determine optimal strategies for their implementation.
This study was supported by intramural research funds from the National Cancer Institute, National Institutes of Health (NIH). Ron Hays was supported in part by NIA grants (P30AG021684 and P30-AG028748) and an NCMHD grant (2P20MD000182).
We thank Robert Arnold, Rachel Ballard-Barbash, Steve Clauser, and anonymous reviewers for helpful comments on an earlier version of the manuscript.
The authors declare no conflicts of interest.
The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the National Cancer Institute.