We compared single- and multi-item measures of general self-rated health (GSRH) to predict mortality and clinical events in a large population of veteran patients.
We analyzed prospective cohort data collected from 21,732 patients as part of the Veterans Affairs Ambulatory Care Quality Improvement Project (ACQUIP), a randomized controlled trial investigating quality-of-care interventions.
We created an age-adjusted logistic regression model for each predictor and outcome combination and estimated the odds of events by response category of the GSRH question. We compared the discriminative ability of the predictors by developing receiver operator characteristic curves and comparing the associated area under the curve (AUC)/c-statistic for the single- and multi-item measures.
All patients were sent a baseline assessment that included a multi-item measure of general health, the 36-item Medical Outcomes Study Short Form (SF-36), and an inventory of comorbid conditions. We compared the predictive and discriminative ability of the GSRH to the SF-36 physical component score (PCS), the mental component score (MCS), and the Seattle index of comorbidity (SIC). The GSRH is an item included in the SF-36, with the wording: “In general, would you say your health is: Excellent, Very Good, Good, Fair, Poor?”
The GSRH, PCS, and SIC had comparable AUC for predicting mortality (AUC 0.74, 0.73, and 0.73, respectively); hospitalization (AUC 0.63, 0.64, and 0.60, respectively); and high outpatient use (AUC 0.61, 0.61, and 0.60, respectively). The MCS had statistically poorer discriminatory performance for mortality and hospitalization than the other predictors (p<.001).
The GSRH response categories can be used to stratify patients with varying risks for adverse outcomes. Patients reporting “poor” health are at significantly greater odds of dying or requiring health care resources compared with their peers. The GSRH, collectable at the point of care, is comparable with longer instruments.
Health administrators, researchers, and policymakers use prediction models to forecast patient outcomes including morbidity, mortality, and health system utilization. Traditionally, administratively derived predictors have been used for such purposes; however, their limitations have led to the development of alternatives (Romano et al. 1993; Iezzoni et al. 1996; Iezzoni 1999; Schneeweiss and Maclure 2000; Schneeweiss et al. 2001; Schneeweiss et al. 2003). Measures of self-rated health are robust risk predictors that have gained in popularity as a substitute for administratively derived tools. These self-rated health measures are patient centered and predictive of subsequent health outcomes, even in patients without prior health problems. In several studies, patient self-rated health status has predicted such important patient outcomes as mortality and health system utilization (Miilunpalo et al. 1997; Curtis et al. 2002; Fan et al. 2002a, 2002b; Spertus et al. 2002; Knight et al. 2003). These measures remain consistent predictors of hospitalizations and mortality rates even after adjustment for clinically relevant factors (Clarke and Oxmann 2002; Lowrie et al. 2003).
Routine use of self-rated health measures for health care planning and delivery is partially limited by burdens associated with collection of health status information. Many self-rated health measures are multi-item scales that are often onerous to collect in routine practice settings. Single-item general self-rated health status (GSRH) measures may serve as a reasonable substitute for multi-item measures of self-rated health (Balkrishnan and Anderson 2001). They have the advantage of being less expensive and less burdensome to collect, and could be conceivably collected at the point of care with relative ease. In a health care setting that uses a relational, electronic database, this collection could occur as part of routine intake in the primary care setting. They are easy to score and interpret and, like the longer multi-item scales, these single-item measures have predictive validity for mortality and health care utilization in some populations (Idler and Benyamini 1997; Bierman et al. 1999; Balkrishnan et al. 2000). GSRH measures are relatively stable (Eriksson et al. 2001) and sensitive to change (Rodin and McAvay 1992; Diehr et al. 2001).
While the research regarding the use of a single-item GSRH measure as a risk assessment tool is promising, gaps exist (McHorney 1999; Diehr et al. 2001; Eriksson et al. 2001). For example, the performance of such tools is poorly understood in diverse patient populations, and in comparison with multi-item risk assessment tools. The objectives of this study were to determine whether a single-item GSRH measure could predict important outcomes in a large veteran outpatient population, and to compare its discriminative ability with established multi-item risk predictors.
We analyzed prospective cohort data collected as part of the Ambulatory Care Quality Improvement Project (ACQUIP). This multicenter, randomized trial was designed to study the effectiveness of primary care-based, quality-of-care interventions in a Veterans Affairs (VA) patient population (Fihn et al. 2004).
Patients were eligible to participate in the ACQUIP study if they were enrolled in the general internal medicine clinics, between March 1, 1997 and July 31, 1999, at one of seven VA medical centers: Birmingham, Alabama; Little Rock, Arkansas; San Francisco, California; West Los Angeles, California; White River Junction, Vermont; Richmond, Virginia; or Seattle, Washington. Patients were assessed at baseline, and follow-up health surveys were mailed at regular intervals to enrolled patients. The information from these surveys was linked to health resource utilization and clinical outcomes using the VA information system (VistA).
At time of enrollment, participants were sent a Health Checklist, which asked about sociodemographic characteristics and coexisting illnesses. The Health Checklist was returned by 35,383 (54 percent) of the enrolled patients. Patients who returned the Health Checklist were then sent an instrument that measured general health status, the 36-item Medical Outcomes Study Short Form (SF-36) (Ware 1998). During the study, 61 percent (n=21,732) of these participants returned at least one SF-36. The first SF-36 returned was used for the analysis. Participants who returned a completed SF-36 (n=21,732) were older (mean age 64 versus 58, p<.001), more likely to be male (p<.001), married (p<.001), employed (p<.001), and white (p<.001) than nonrespondents. Respondents had a somewhat higher prevalence of some chronic medical conditions including a prior myocardial infarction (18 versus 16 percent, p<.001), cancer (12 versus 10 percent, p<.001), and congestive heart failure (8 versus 7 percent, p<.001) than those who did not return the SF-36.
The main predictor was the single-item GSRH question from the SF-36, “In general, would you say your health is…” with a 5-category Likert response scale of Excellent, Very Good, Good, Fair, Poor. We compared the GSRH to multi-item scales that are calculated from the SF-36 and have been shown to predict mortality and utilization (Hornbrook and Goodman 1996; Fan et al. 2002). The SF-36 consists of eight subscales (physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional, and mental health) that can be summarized as mental component summary (MCS) and physical component summary (PCS) scores (Ware et al. 1994). The PCS and MCS are each normalized to a 100-point scale with a mean of 50 (SD±10). Higher scores reflect better functioning (Ware and Keller 1995). In the VA population, scores on the PCS and MCS tend to be below national norms, with mean scores of 47.8 (SD±12.2) and 37.1 (SD±11.9), respectively (Kazis et al. 1999; Au et al. 2001).
We also compared predictive accuracy of the GSRH to the Seattle index of comorbidity (SIC), which combines patients' self-reports of coexisting chronic conditions (prior myocardial infarction, cancer, chronic obstructive pulmonary disease, congestive heart failure, diabetes mellitus, pneumonia, and stroke), age, and tobacco use. Ordinarily, the SIC is scored with higher scores reflecting increasing levels of comorbidity and greater risk of mortality and hospitalization among elderly, male primary care patients (Fan et al. 2002). To facilitate interpretability of our results, we reversed the scaling of the SIC to mirror the other measures being evaluated.
The primary outcome was all-cause mortality in the year following baseline assessment of health status. We ascertained death from the VA Beneficiary Identification and Records Locator Subsystem (BIRLS) database, which records deaths of patients whose families apply for veteran's death benefits. The BIRLS system has a sensitivity for detecting mortality that ranges between 80.0 and 94.5 percent (Cowper et al. 2002).
Secondary outcomes included aspects of health services use during the 1-year period following the baseline of health status measurement. We treated hospitalization as a dichotomous variable considered positive if a patient had a hospital admission for any reason during the study interval. We did not ascertain admissions to facilities outside of the VA health system or to nursing homes. Use of outpatient services consisted of all medical visits within the VA system including primary and specialty care. We defined “high use” as the top 10th percentile of total visits for the year, which translated into more than seven visits.
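The two utilization outcomes described above can be sketched as simple derived flags. The following is a minimal illustration, not the study's code: the function name `high_use_flags`, the nearest-rank percentile rule, and the visit and admission counts are all assumptions made for the example.

```python
def high_use_flags(visit_counts, percentile=90):
    """Flag patients whose visit count exceeds the nearest-rank percentile cut point.

    Assumption: a nearest-rank percentile is used; the source does not say
    which percentile definition was applied.
    """
    ordered = sorted(visit_counts)
    n = len(ordered)
    # Integer form of rank = ceil(percentile * n / 100), avoiding float rounding
    rank = max(1, (percentile * n + 99) // 100)
    threshold = ordered[rank - 1]
    flags = [count > threshold for count in visit_counts]
    return threshold, flags


# Hypothetical annual outpatient visit counts for ten patients (illustration only)
visits = [0, 1, 2, 2, 3, 3, 4, 5, 7, 12]
threshold, high_use = high_use_flags(visits)

# Hospitalization is dichotomous: positive if there was any admission in the year
admissions = [0, 0, 1, 0, 0, 2, 0, 0, 0, 1]  # hypothetical counts
hospitalized = [count > 0 for count in admissions]

print("high-use threshold:", threshold)
print("high-use patients:", sum(high_use))
print("hospitalized patients:", sum(hospitalized))
```

With these made-up counts the cut point falls at seven visits, so only patients with more than seven visits are flagged, mirroring the definition in the text.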
We created an age-adjusted logistic regression model for each predictor and outcome combination. For this analysis, we included only those 21,732 patients who had completed both the Health Checklist and the SF-36, and who had at least 1 year of follow-up. The GSRH response options were modeled as a categorical variable, collapsing “excellent” and “very good” into one reference category in keeping with conventional practice (Kaplan and Camacho 1983; Idler and Kasl 1991). We calculated odds ratios for death, hospitalization, and high outpatient utilization. The c-statistic, or area under the receiver operator curve (AUC), assessed the model discrimination and the Hosmer–Lemeshow goodness-of-fit χ2 statistic (Hosmer, Applied Logistic Regression) assessed the model calibration. AUC values range from 0 to 1; a value of 1 represents perfect prediction and a value of 0.5 represents chance prediction, so the relevant range is 0.5–1.0. Hosmer and Lemeshow have suggested that a c-statistic or AUC value between 0.70 and 0.80 is acceptable and a value greater than 0.80 is excellent. Values higher than 0.90 are rarely observed. For reference, the c-statistic for the Framingham Heart Study risk calculator, a commonly used risk assessment tool, is 0.77 (Wilson et al. 1998). To compare the predictive ability of the risk prediction measures, the AUC/c-statistics were compared using the method of DeLong, DeLong, and Clarke–Pearson for correlated data. The standard error and 95 percent confidence intervals (CIs) were also calculated for each AUC using the method of DeLong et al. We used STATA 8.0 statistical software for all analyses.
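To make the discrimination statistics concrete, the sketch below computes an AUC/c-statistic from pairwise comparisons and applies a DeLong-style comparison of two correlated AUCs. It is an illustration only. The analyses in this study were run in STATA, and the function names, the normal-approximation p-value, and the small score vectors (standing in for model-predicted risks) are assumptions made for the example.

```python
import math


def _psi(case_score, control_score):
    # 1 if the case is ranked above the control, 0.5 for a tie, otherwise 0
    if case_score > control_score:
        return 1.0
    if case_score == control_score:
        return 0.5
    return 0.0


def placements(scores, events):
    """Per-case and per-control placement values used by the DeLong method."""
    cases = [s for s, e in zip(scores, events) if e == 1]
    controls = [s for s, e in zip(scores, events) if e == 0]
    v_cases = [sum(_psi(x, y) for y in controls) / len(controls) for x in cases]
    v_controls = [sum(_psi(x, y) for x in cases) / len(cases) for y in controls]
    return v_cases, v_controls


def auc(scores, events):
    """AUC = probability that a random case outscores a random control."""
    v_cases, _ = placements(scores, events)
    return sum(v_cases) / len(v_cases)


def _cov(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_compare(scores_a, scores_b, events):
    """Return (AUC difference, two-sided p-value) for two predictors
    measured on the same patients. Needs at least two cases and two controls."""
    a_cases, a_controls = placements(scores_a, events)
    b_cases, b_controls = placements(scores_b, events)
    m, n = len(a_cases), len(a_controls)
    var_a = _cov(a_cases, a_cases) / m + _cov(a_controls, a_controls) / n
    var_b = _cov(b_cases, b_cases) / m + _cov(b_controls, b_controls) / n
    cov_ab = _cov(a_cases, b_cases) / m + _cov(a_controls, b_controls) / n
    diff = sum(a_cases) / m - sum(b_cases) / m
    var_diff = var_a + var_b - 2.0 * cov_ab
    if var_diff <= 0.0:
        return diff, float("nan")  # degenerate case, e.g. identical predictors
    z = diff / math.sqrt(var_diff)
    return diff, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical data: 1 = event (e.g., death), scores = predicted risk from two models
events = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.1, 0.1]
scores_b = [0.6, 0.5, 0.7, 0.3, 0.6, 0.4, 0.5, 0.2, 0.3, 0.1]

diff, p_value = delong_compare(scores_a, scores_b, events)
print("AUC A:", round(auc(scores_a, events), 3))
print("AUC B:", round(auc(scores_b, events), 3))
print("difference:", round(diff, 3), "p:", round(p_value, 3))
```

Because both predictors are scored on the same patients, the covariance term matters: ignoring it would treat the two AUCs as independent and misstate the variance of their difference.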
The sample characteristics were representative of the VA patients nationally (Kazis et al. 1999). Subjects were predominantly older, white, male, and had multiple coexisting illnesses (Table 1). Approximately one-third reported receiving care outside of the VA. Most subjects reported health status below national norms. More than 50 percent reported “poor” or “fair” health on the single-item GSRH measure. Most patients reported multiple coexisting illnesses from the comorbidity index with a median SIC score of 4 (interquartile range 2–5). The most common chronic illnesses from the index were chronic obstructive pulmonary disease and diabetes. In all, 674 patients (3.1 percent) died and 3,255 (15 percent) were hospitalized during the year following the baseline of health status measurement. The median number of outpatient visits was 3 (interquartile range 2–5). Approximately 10 percent of patients in the cohort had no visits during the 1-year period, and an equal percent had seven or more visits.
Actual event rates for mortality, hospitalization, and high outpatient utilization during the 1-year study interval are displayed in Figure 1. For each of the outcomes, we noted a graded relationship, with the subjects who reported worse self-rated health having higher event rates than their peers who reported better self-rated health. For example, patients reporting “poor” GSRH had a mortality rate of 8 percent versus those patients with “excellent” GSRH whose mortality rate was less than 1 percent. Four times as many participants with “poor” GSRH were hospitalized during the study period as those reporting “excellent” GSRH. Patients with “poor” versus “excellent” health had a similarly higher prevalence of high outpatient use (14 versus 4 percent).
In age-adjusted models, GSRH predicted mortality, hospitalization, and high outpatient use. A graded relationship was observed, with higher odds of all-cause mortality with incrementally worse GSRH. Compared with persons reporting “excellent/very good” health status, the OR [95 percent CI] of mortality was 1.27 [0.88, 1.83], 2.46 [1.75, 3.47], and 6.84 [4.86, 9.63] for individuals reporting “good,” “fair,” and “poor” health, respectively. Patients with “poor” health had nearly four times the odds of hospitalization and three times the odds of being a high user of outpatient services in the ensuing year as their peers reporting “excellent/very good” health.
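For readers less familiar with how odds ratios relate to logistic regression output, the sketch below shows the usual conversion: the odds ratio is the exponentiated coefficient, and a Wald 95 percent CI is exponentiated from the coefficient plus or minus about 1.96 standard errors. The helper names are hypothetical, and the assumption that the reported intervals are Wald intervals is ours, not stated in the text. The example back-calculates the coefficient and standard error implied by the reported OR for “poor” health and then reconstructs the interval.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se):
    """Odds ratio and Wald 95% CI from a logistic coefficient and its standard error."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )


def coef_from_odds_ratio(odds_ratio, lower, upper):
    """Back out the log-odds coefficient and standard error implied by an OR and its CI.

    Assumes the interval is symmetric on the log scale (a Wald interval).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
    return beta, se


# Reported mortality OR for "poor" vs "excellent/very good" health: 6.84 [4.86, 9.63]
beta, se = coef_from_odds_ratio(6.84, 4.86, 9.63)
or_poor, lower, upper = odds_ratio_ci(beta, se)

print("implied coefficient:", round(beta, 3), "implied SE:", round(se, 3))
print("reconstructed OR and CI:", round(or_poor, 2), round(lower, 2), round(upper, 2))
```

The reconstructed interval matches the reported one to within rounding, which is consistent with an interval that is symmetric around the coefficient on the log-odds scale.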
In models including age, all evaluated predictors performed best when predicting mortality, as opposed to hospitalization and outpatient utilization (Table 2). In age-adjusted models, the GSRH, PCS, and SIC all demonstrated good predictive properties for identifying patients at risk for death in the year subsequent to baseline measurement with an AUC of approximately 0.73–0.74. All measures of self-rated health and comorbidity poorly predicted hospitalization and high outpatient use. The PCS performed best at hospitalization prediction with an AUC of 0.64, though this was not statistically different from the GSRH and SIC. For hospitalization, the GSRH, PCS, and SIC had statistically similar AUC, but the MCS performed statistically less well (p<.001). All predictors had similar, poor performance for predicting high outpatient service use in the year following the baseline of health status measurement. The MCS had statistically worse performance than the other measures with an AUC less than 0.70 for outcomes.
As a lone predictor, age had an AUC/c-statistic of 0.65 (95 percent CI: 0.63–0.67) for mortality, and the addition of GSRH improved the AUC/c-statistic to 0.74 (Table 3). The further addition of the PCS, MCS, or a combination of both to age improved the AUC/c-statistic to approximately 0.75. The AUC/c-statistic pattern was similar for the other outcomes under study, hospitalization and high outpatient use of services. There was little to no incremental value in adding additional measures to the predictive capacity of GSRH and age alone.
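The incremental-value comparison above can be sketched by fitting nested logistic models and comparing their AUCs. The example below is a minimal illustration on simulated data and does not reproduce the study's results. The data-generating coefficients, the gradient-ascent fitting routine, and the treatment of GSRH as a centered ordinal score (the study modeled it as categorical) are all assumptions chosen to keep the sketch short.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, events, steps=200, rate=0.5):
    """Fit logistic regression by batch gradient ascent on the mean log-likelihood.

    Returns [intercept, coefficient_1, ...]. A simple stand-in for a
    statistical package's maximum likelihood routine.
    """
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, events):
            err = y - sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w + rate * g / n for w, g in zip(weights, grad)]
    return weights


def predict(weights, rows):
    return [sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x))) for x in rows]


def auc(scores, events):
    """Pairwise AUC: share of case/control pairs ranked correctly (ties count half)."""
    cases = [s for s, e in zip(scores, events) if e == 1]
    controls = [s for s, e in zip(scores, events) if e == 0]
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Simulated cohort (assumption): standardized age, GSRH coded 0 (best) to 3 (worst)
random.seed(0)
age_z, gsrh, events = [], [], []
for _ in range(1000):
    a = random.gauss(0.0, 1.0)
    g = random.choice([0, 1, 2, 3])
    risk = sigmoid(-3.0 + 0.7 * a + 0.8 * g)
    age_z.append(a)
    gsrh.append(g - 1.5)  # centered to help gradient ascent converge
    events.append(1 if random.random() < risk else 0)

age_only = [[a] for a in age_z]
age_plus_gsrh = [[a, g] for a, g in zip(age_z, gsrh)]

auc_age = auc(predict(fit_logistic(age_only, events), age_only), events)
auc_age_gsrh = auc(predict(fit_logistic(age_plus_gsrh, events), age_plus_gsrh), events)
print("age only:", round(auc_age, 3))
print("age + GSRH:", round(auc_age_gsrh, 3))
```

These are in-sample AUCs, which tend to be optimistic; a fuller analysis would also test the difference between nested models, for example with the DeLong method named in the methods.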
We found the GSRH single-item measure to be predictive of mortality, hospitalizations, and high utilization of outpatient services. The discriminative ability of the GSRH question for predicting mortality in this cohort was good (AUC/c-statistic 0.74) and therefore adequate by usual standards. Importantly, the GSRH single-item measure performed as well as the multi-item self-reported health risk predictors with which it was compared. Patients who characterized their health as “poor” were at significantly greater risk of dying or requiring health care resources than those who reported their health as “fair” or better. While the GSRH measure stratified patients according to their risk for hospital and outpatient utilization, its performance, based upon an AUC/c-statistic of less than 0.70, was less than optimal, though similar to the multi-item measures.
These findings on GSRH and mortality confirm the relationship observed in other populations. Many studies have documented an increased risk of mortality for those patients reporting worse GSRH (Mossey and Shapiro 1982; Idler and Benyamini 1997; Kawada 2003), even when adjusting for key covariates. “Poor” GSRH has also been shown to predict subsequent hospitalization in Medicare recipients (Bierman et al. 1999) and outpatient visits (Miilunpalo et al. 1997).
In this article, we chose to compare GSRH to other measures of self-rated health and self-reported comorbidity that are established predictors of mortality and utilization. The ability of the GSRH to gauge the risk of death is comparable with other risk prediction tools including those that are administratively derived. For example, the commonly used Framingham Risk Calculator has an AUC/c-statistic of 0.77, although it is typically used to predict events over a much longer time horizon of 10 years (Wilson et al. 1998). Other administrative database risk predictors have comparable AUC/c-statistics to the GSRH, including the CDS-1 with an AUC/c-statistic of 0.70 and the Romano with an AUC/c-statistic of 0.77 (Schneeweiss et al. 2003).
Despite the potential utility of general health measures for identifying groups of subjects at risk, this measure of health is not routinely captured and used because of issues relating to respondent burden, data collection and analysis, and physician acceptance (Deyo and Patrick 1989). The strong psychometric properties of these single-item GSRH questions suggest that they can serve as reasonable substitutes for longer instruments, thus saving time and money while still providing valid, reliable information, an important goal when attempting to routinely collect health status information. Health system planners and public health agencies, for example, could potentially use the responses from patients' GSRH to target resource allocation and care delivery planning to those with the highest need.
Given the strong association between GSRH and important outcomes like mortality and health care utilization, collecting such information at outpatient visits could theoretically provide an inexpensive method to identify patients who might benefit from specific interventions such as disease or case management.
The results of this study must be interpreted in the context of its limitations. Because patients in the VA system have greater disease burden and higher health care needs than the general U.S. population, it is uncertain whether the predictive properties of the GSRH question observed in this study would be reproducible in other settings. Moreover, we evaluated only hospitalization and outpatient visits within the VA system and did not assess non-VA care because we were principally interested in understanding the utility of the GSRH as a risk prediction tool for identifying individuals at high risk within an institution. In this sample, approximately one-third of patients reported receiving at least some care outside of the VA system. VA patients tend to have worse health status than non-VA populations, and therefore these results may not be generalizable to non-VA populations, particularly those that are relatively healthy (Kazis et al. 1999). Finally, we asked the GSRH question as part of a larger survey. This may have led to contextual bias in the responses to the GSRH item. For example, patients may be more likely to report worse GSRH if they have just completed a multi-item checklist of their coexisting illnesses.
In summary, a single-item measure of general health identifies patients with an increased risk of mortality, hospitalization, and high outpatient use as effectively as multi-item measures of self-reported health and comorbidity. Further work is needed to determine if assessing patients' GSRH in routine clinical settings improves care by identifying groups at risk for increased mortality and other important health outcomes.
Supported by grants from the Department of Veterans Affairs SDR 96-002 and IIR 99-376. Dr. DeSalvo is supported by a grant from the Robert Wood Johnson Foundation Generalist Physician Faculty Scholar Program and NIH Award K12 HD 43451. The authors would like to thank Dr. John Peabody for his thoughtful comments on this manuscript.