To compare depression health state preference scores across four groups: (1) general population, (2) previous history of depression but not currently depressed, (3) less severe current depression, and (4) more severe current depression.
Primary data were collected from 95 general population, 163 primary care, and 83 specialty mental health subjects.
Stratified sampling frames were used to recruit general population and patient subjects. Subjects completed cross-sectional surveys. Key variables included rating scale and standard gamble scores assigned to depression health state descriptions developed from the Patient Health Questionnaire-9 (PHQ-9) and SF-12.
Each subject completed an in-person interview. Forty-nine subjects completed test/retest reliability interviews.
Depressed patient preference scores for three of six SF-12 depression health state comparisons were significantly lower than those of the general population using the rating scale, and two of six were significantly lower using the standard gamble. Depressed patient scores for five of six PHQ-9 depression health state comparisons were significantly lower than those of the general population using the rating scale, and two of six were significantly lower using the standard gamble.
Depressed patients report lower preference scores for depression health states than the general population. In effect, they perceived depression to be worse than the general public perceived it to be. Additional research is needed to examine the implications for cost-effectiveness ratios using general population preference scores versus depressed patient preference scores.
Health state preference scores assign a quantitative measure of value to specific health states constrained by death (given a score of 0) and perfect health (given a score of 1 or 100). The specific health states used in this context can be an individual's current health or a description of a hypothetical health state. Health state preference scores are obtained using a variety of methods (Drummond et al. 1997). Health state preference scores form the basis for calculating quality-adjusted life-years (QALYs). Cost per QALY ratios are increasingly used to inform health care resource allocation decisions (National Institute for Clinical Excellence 2004). However, important methodological issues remain regarding the measurement of health state preferences, including who should be the source of the health state preferences used in cost per QALY calculations.
The Panel on Cost-Effectiveness in Health and Medicine recommended using the general population as the source of health state preferences for the reference case analysis (Gold et al. 1996). The Panel's rationale was based on fairness and minimizing bias: the general population is blind to its own self-interest (unaware of future health problems) and is therefore able to provide a less biased assessment of health state preferences. However, in practice, researchers use many sources to generate health state preferences (Brauer et al. 2006). A recent review of cost–utility analyses published between 1998 and 2001 found that 30.3 percent of preference scores were derived from the community, 23.3 percent from patients, 21.0 percent from clinicians, and 18.7 percent from the authors (Brauer et al. 2006). While distinctions are drawn between utility, value, and preference scores (Gold et al. 1996), for simplicity, this paper uses the term “preference score” for all three.
Health state preferences obtained from different groups are often similar but can vary widely (Ubel, Loewenstein, and Jepson 2003). Specifically, health state preference scores obtained from patients who have experienced the condition may differ from preference scores obtained from groups who have not experienced the condition. Individuals with the condition may incorporate a greater range of experiences associated with a health state, may accommodate to their current state of health, or may change the way they rate their health in comparison with others (scale recalibration) (Ubel, Loewenstein, and Jepson 2003; Ubel et al. 2005). Within-group differences also exist. For example, the severity of illness (Badia et al. 1996; Lenert, Treadwell, and Schwartz 1999; Insinga and Fryback 2003) and the length of time since a health event (Adang et al. 1998; Smith et al. 2006) may impact health preference scores.
A number of studies have compared health state preference scores generated by different groups. Some of these studies have found differences based on health experience (Gabriel et al. 1999; Lenert, Treadwell, and Schwartz 1999; De Wit, Busschbach, and De Charro 2000; Postulart and Adang 2000; Insinga and Fryback 2003; Rashidi, Anis, and Marra 2006) while others have not (Balaban et al. 1986; Revicki, Shakespeare, and Kind 1996; Dolders et al. 2006). In general, studies that compare patient and general population health state preferences find that patients assign preference scores to less than perfect health states that are equal to or greater than the preference scores assigned by members of the general population (Sackett and Torrance 1978; Balaban et al. 1986; Froberg and Kane 1989b; De Wit, Busschbach, and De Charro 2000; Dolders et al. 2006). A conclusion that could be drawn from these studies is that using general population health state preferences might result in more favorable cost per QALY ratios than using patient preferences, except in cases of life-saving interventions (Brazier et al. 2005). For example, if the general population assigns a lower preference score than patients to a less than perfect health state, then using general population preferences for an intervention that restores perfect health would result in a larger QALY difference and a more attractive cost per QALY ratio. Conversely, a life-saving intervention for unhealthy patients could appear less cost-effective using general population preference scores because the patient would return to a health state the general population assigned a lower preference score to.
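The asymmetry in this argument can be written out as a simple worked form. This is an illustrative simplification, not a formula from the studies cited: it assumes an undiscounted gain lasting T years, an incremental cost ΔC, and a preference score u for the less than perfect health state (T, ΔC, and u are symbols introduced here).

```latex
\begin{aligned}
\text{Restorative (state } u \to 1\text{):}\quad
  &\Delta\mathrm{QALY} = (1 - u)\,T,
  &&\frac{\Delta C}{\Delta\mathrm{QALY}} = \frac{\Delta C}{(1 - u)\,T} \\
\text{Life-saving (death} \to u\text{):}\quad
  &\Delta\mathrm{QALY} = u\,T,
  &&\frac{\Delta C}{\Delta\mathrm{QALY}} = \frac{\Delta C}{u\,T}
\end{aligned}
```

If the general population assigns a lower u than patients do, the restorative denominator (1 − u)T grows and that ratio becomes more attractive, while the life-saving denominator uT shrinks and that ratio becomes less attractive.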
Our study explored whether depression experience influenced depression health state preferences and how this might affect cost per QALY calculations. We chose depression because it is often misunderstood and stigmatized by the general population (Link et al. 1999; Barney et al. 2006; Perry et al. 2007). The objective of this study was to compare depression health state preferences across four groups: (1) general population, (2) patients with past depression but not currently depressed, (3) patients with mild to moderate depression, and (4) patients with moderate to severe depression.
Our study was a cross-sectional, face-to-face survey of individuals sampled from the following recruitment sites: general population, primary care clinics, and specialty mental health clinics. Our recruitment target for the general population sample was 100, and we recruited 95. From the clinic sites, we attempted to recruit subjects with a broad range of depression severity (see Table 1). Our recruitment target from the clinic sites was 300, and we recruited 246. We also collected test–retest reliability data within 2 weeks of the baseline interview from 49 randomly selected subjects (15 from the general population and 34 from the clinic sites).
Eligibility criteria for all groups included (1) age 18–70 years, (2) ability to read and understand English, (3) a negative screen for significant cognitive impairment (impairment was indicated by a diagnosis of dementia or a score >8 on the Blessed Orientation–Memory–Concentration test), (4) no history of a schizophrenia diagnosis, (5) a negative screen for bipolar disorder, (6) no life-threatening condition, (7) residence within 60 miles of downtown Little Rock, and (8) access to a telephone. Subjects were compensated US$30 for completing the interview. The University of Arkansas for Medical Sciences (UAMS) Institutional Review Board approved the research protocol.
The general population group was recruited from Central Arkansas (Little Rock and surrounding areas) using a commercially available phone list. The Central Arkansas area was selected because the location corresponded with the clinic sites. The phone list included phone numbers, addresses, age, gender, and ethnicity. Potential subjects were selected from the phone list using a stratified random sampling plan to approximate the age, gender, and ethnicity demographic characteristics of Central Arkansas residents. The general population sampling plan did not include depression severity. Potential participants were mailed a postcard stating that they would receive a phone call in 2 weeks about the research study unless they called a toll-free telephone number to decline participation.
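The stratified selection step can be sketched as follows. This is a minimal illustration, not the study's sampling code: the record field names (`age_group`, `gender`, `ethnicity`), the target shares, and the proportional allocation with largest-remainder rounding are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(records, target_shares, n, seed=None):
    """Draw about n records whose stratum mix matches target_shares.

    records:       list of dicts from the phone list (hypothetical field names).
    target_shares: {(age_group, gender, ethnicity): population share}, summing to 1.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for record in records:
        key = (record["age_group"], record["gender"], record["ethnicity"])
        by_stratum[key].append(record)

    # Proportional allocation; largest-remainder rounding keeps the total at n.
    raw = {stratum: share * n for stratum, share in target_shares.items()}
    allocation = {stratum: int(value) for stratum, value in raw.items()}
    shortfall = n - sum(allocation.values())
    by_remainder = sorted(raw, key=lambda s: raw[s] - allocation[s], reverse=True)
    for stratum in by_remainder[:max(shortfall, 0)]:
        allocation[stratum] += 1

    sample = []
    for stratum, count in allocation.items():
        pool = by_stratum.get(stratum, [])
        # A stratum with too few listings yields fewer than its allocation.
        sample.extend(rng.sample(pool, min(count, len(pool))))
    return sample
```

Sampling without replacement within each stratum means no household is selected twice; the postcard and follow-up call would then go to every selected record.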
From the primary care and specialty mental health clinic sites affiliated with UAMS, we recruited three patient groups: patients who had past but not current depression, patients with current mild to moderate depression, and patients with current moderate to severe depression. Current depression severity was classified using published Patient Health Questionnaire-9 (PHQ-9) severity cut-off scores (Kroenke, Spitzer, and Williams 2001). Patients with a history of depression were recruited only from primary care sites; they reported that a clinician had diagnosed depression in the past and had a current PHQ-9 score <5. The groups with current depression were recruited from primary care and specialty mental health care sites. The mild to moderate depression group had a current PHQ-9 score of 5–14, and the moderate to severe depression group had a PHQ-9 score of 15 or more.
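A sketch of the patient group assignment rule, using the PHQ-9 cut-offs stated above. The function names are hypothetical, and the recruitment-site restriction (history-only patients came from primary care only) and the other eligibility criteria are deliberately left out.

```python
def phq9_total(item_scores):
    """Sum the nine PHQ-9 items, each scored 0-3, giving a total of 0-27."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("PHQ-9 needs nine item scores in 0..3")
    return sum(item_scores)

def study_group(total, clinician_diagnosed_past_depression):
    """Assign a clinic patient to a study group from the current PHQ-9 total."""
    if total >= 15:
        return "moderate to severe"
    if total >= 5:
        return "mild to moderate"
    if clinician_diagnosed_past_depression:
        return "history only"
    return None  # PHQ-9 < 5 and no prior diagnosis: no patient group applies

print(study_group(phq9_total([2, 2, 1, 2, 1, 1, 2, 1, 0]), False))
```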
Health state preference scores assign value to health state descriptions on a scale from 0 (equivalent to death) to 1 or 100 (equivalent to perfect health). The next two sections describe the methods we used to generate depression health state descriptions and preference scores.
To create hypothetical depression descriptions, we chose to use the format of two existing, well-validated, and widely used instruments: the PHQ-9 from the PRIME-MD and the Medical Outcomes Study SF-12. To create the PHQ-9 health state descriptions, we reviewed the PHQ-9 responses of 3,000 primary care subjects whose data were previously used to validate the PHQ-9 (Kroenke, Spitzer, and Williams 2001). A distribution of the responses (0–3) was generated for each of the nine items within each category of overall severity (none or minimal, mild, moderate, moderately severe, and severe). For example, for item #1 (little interest or pleasure in doing things), the most frequent response among subjects in the “none or minimal depression” category was 0 (“not at all”), and the most frequent response among subjects with severe depression was 3 (“nearly every day”). Using these item distributions, a modal depression health state description was created for mild, moderate, and severe depression (see Appendix SA2 for depression health state descriptions).
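The modal-description step can be sketched as below, assuming each subject's responses are stored as a list of nine item scores and that subjects are grouped by the published PHQ-9 total-score bands (0–4, 5–9, 10–14, 15–19, 20–27; Kroenke, Spitzer, and Williams 2001). How ties between equally frequent responses were resolved is not reported, so breaking ties toward the lower response is an assumption.

```python
from collections import Counter, defaultdict

def severity_category(total):
    """Map a PHQ-9 total (0-27) to the published overall severity bands."""
    if total < 5:
        return "none or minimal"
    if total < 10:
        return "mild"
    if total < 15:
        return "moderate"
    if total < 20:
        return "moderately severe"
    return "severe"

def modal_profile(responses):
    """Most frequent response (0-3) for each of the nine items in one severity group.

    Ties are broken toward the lower response (an assumption; not reported).
    """
    profile = []
    for item in range(9):
        counts = Counter(subject[item] for subject in responses)
        top = max(counts.values())
        profile.append(min(value for value, n in counts.items() if n == top))
    return profile

def modal_descriptions(all_responses):
    """Group subjects by severity band and return the modal item profile per band."""
    groups = defaultdict(list)
    for subject in all_responses:
        groups[severity_category(sum(subject))].append(subject)
    return {band: modal_profile(members) for band, members in groups.items()}
```

Each modal profile is then a vector of item responses that can be turned into the wording of a health state description.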
The SF-12 is a 12-item general measure of health status (Ware, Kosinski, and Keller 1996). The SF-12 contains one or two items for each of the following eight health dimensions: physical functioning, role functioning physical, bodily pain, general health perception, energy/vitality, social functioning, role emotional functioning, and mental health. To create the SF-12 outcome descriptions, we modified the depression descriptions previously reported using cluster analysis methods (Sugar et al. 1998). The modifications included (a) using single responses for each item rather than response ranges, (b) using the six SF-12 dimensions developed by Brazier and colleagues (Brazier et al. 1998; Brazier, Roberts, and Deverill 2002), and (c) adding a severe depression description. The modifications were needed to facilitate the mapping of valuations to individual items in the SF-12 and to include a severe depression description more consistent with specialty mental health subjects. The result was mild, moderate, and severe depression health state descriptions based on SF-12 items (see Appendix SA2).
The preference scoring procedures included simple ranking, rating scale, and standard gamble, in this order. The rating scale preceded the standard gamble to avoid the anchoring effect induced by the standard gamble (Llewellyn-Thomas et al. 1984; Froberg and Kane 1989a, b). Subjects were introduced to the preference score procedures using practice health states and then moved to the depression health state descriptions. The practice health states were “wearing glasses” and “blindness in both eyes.” The interviewers were trained to use hard copy rating scale and standard gamble props based on McMaster University specifications (Furlong et al. 1990). Interviewers randomly started with either the PHQ-9 or the SF-12 depression descriptions and presented the three severity descriptions from each instrument in random order.
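The randomized presentation can be sketched as follows, assuming descriptions are kept in instrument blocks (all three descriptions from one instrument, then the other) and that a seeded `random.Random` stands in for whatever randomization the interviewers actually used. The practice health states, which always came first, are omitted.

```python
import random

INSTRUMENTS = ("PHQ-9", "SF-12")
SEVERITIES = ("mild", "moderate", "severe")

def presentation_order(rng):
    """Random instrument first, then that instrument's severities in random order."""
    instruments = list(INSTRUMENTS)
    rng.shuffle(instruments)               # which instrument's block comes first
    order = []
    for instrument in instruments:
        levels = list(SEVERITIES)
        rng.shuffle(levels)                # random severity order within the block
        order.extend((instrument, level) for level in levels)
    return order

# One interview's sequence; a fixed seed makes the draw reproducible.
print(presentation_order(random.Random(2024)))
```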
Simple ranking of health states used hard copy index cards with the health state described on one side. The subject placed the PHQ-9 and SF-12 cards in order from most to least desirable. The simple rank order of health states was used as a validity check for the rating scale and standard gamble ratings.
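One way to express the ranking-based validity check is to list every pair of health states whose rating scale or standard gamble scores contradict the subject's own ranking. The text does not say how inconsistencies were counted or handled, so treating only strict reversals (ties allowed) as violations is an assumption of this sketch, and the state labels are illustrative.

```python
from itertools import combinations

def rank_violations(ranking, scores):
    """Pairs where the state ranked more desirable received a strictly lower score.

    ranking: state labels ordered from most to least desirable (simple ranking step).
    scores:  mapping from state label to that subject's rating scale or SG score.
    Equal scores for differently ranked states are not counted (an assumption).
    """
    return [(better, worse)
            for better, worse in combinations(ranking, 2)  # preserves ranking order
            if scores[better] < scores[worse]]

ranking = ["SF-12 mild", "SF-12 moderate", "SF-12 severe"]
print(rank_violations(ranking, {"SF-12 mild": 80, "SF-12 moderate": 60, "SF-12 severe": 65}))
```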
The rating scale was presented as a 100 mm line marked in five-unit intervals with end points defined as death (0) and perfect health (100). For a given health state, the respondent assigned a number between 0 and 100, which corresponded to the preference score.
The standard gamble method is consistent with von Neumann–Morgenstern expected utility axioms. The standard gamble incorporates choice and risk by setting up a choice between two alternatives: choice A, living in a particular health state with certainty, or choice B, a gamble on a hypothetical treatment for which the outcome is uncertain. The subject was told that a hypothetical treatment would lead to perfect health with a probability of p, or immediate death with a probability of 1−p. The subject was then asked to choose between choice A (the depression health state with certainty) and choice B (the gamble). The probability p was varied until the subject was indifferent between choices A and B; the preference score for health state A then equals p. We used a ping-pong search procedure in which gamble probabilities alternate between high and low values in an iterative search that closes in on the indifference point (Llewellyn-Thomas et al. 1984).
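The following is a rough simulation of a ping-pong search, assuming offers on a 0–100 percent grid in 5-point steps and a respondent whose choices are consistent with a single underlying preference score. The step size, starting values, and stopping rule are assumptions for illustration; the actual probability sequence used by Llewellyn-Thomas et al. (1984) and in this study may differ.

```python
def ping_pong_standard_gamble(prefers_gamble, grid=100, step=5):
    """Estimate the indifference probability p by alternating high and low offers.

    prefers_gamble(p) returns True if the subject takes the gamble (perfect health
    with probability p, immediate death with 1 - p) over the certain health state.
    Offers are integers on a 0..grid scale; the result is rescaled to 0..1.
    """
    lo, hi = 0, grid          # the indifference point lies between lo and hi
    offer_high = True         # start from the high end, then alternate
    while hi - lo > step:
        p = hi - step if offer_high else lo + step
        if prefers_gamble(p / grid):
            hi = p            # gamble accepted: indifference is at or below p
        else:
            lo = p            # certain state kept: indifference is at or above p
        offer_high = not offer_high
    return (lo + hi) / 2 / grid  # midpoint of the final bracket = preference score

# A simulated subject whose underlying score for a health state is 0.77.
print(ping_pong_standard_gamble(lambda p: p > 0.77))
```

Alternating between the two ends keeps the offers from drifting in one direction only, which is the point of the ping-pong design; the final bracket is one step wide, so the estimate is within half a step of the simulated score.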
Chronic depression was defined as feeling down, depressed, or hopeless most of the time over the past 2 years without feeling depression free for a period of 2 months or more during this time. Current depression treatment or ever being treated for depression included antidepressant medication or counseling. Current physical health comorbidity was determined from a list of 18 physical health problems.
Categorical demographic and clinical variables were compared using a χ2 test. Continuous demographic and clinical variables were compared using the general population as the reference category and a general linear model procedure with the Dunnett post hoc test to adjust for multiple comparisons. Because none of the depression health states were rated worse than death, no adjustments for this response were needed. Similar methods were used to explore the potential influence of current depression on preference scores assigned to hypothetical health states. To do this we examined the preference scores assigned to each subject's current health state and their current health without taking into account the effects of depression.
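For readers who want to reproduce the categorical comparisons, a dependency-free sketch of Pearson's chi-square test of independence follows. `scipy.stats.chi2_contingency` provides an equivalent test (note that it applies Yates' continuity correction to 2 × 2 tables by default). Whether a continuity correction was used here is not reported, so this sketch uses the uncorrected statistic. The Dunnett comparisons for continuous variables are not reproduced (recent SciPy releases provide `scipy.stats.dunnett`).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(k, y) = exp(-y) * sum_{i<k} y**i / i!, with k = df / 2.
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: Q(1/2, y) = erfc(sqrt(y)), then add y**(i - 1/2) * exp(-y) / Gamma(i + 1/2).
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) * math.exp(-y) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (i + 0.5)
    return total

def chi2_independence(table):
    """Uncorrected Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)
```

As a rough consistency check, assuming the marital status comparison in the results was a 4 × 2 table (df = 3), `chi2_sf(9.3, 3)` is about 0.026, in line with the reported p=.03.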
Test–retest preference scores were obtained on 14.4 percent (49/341) of the total sample: 15 from the general population and 34 from the patient groups. Test–retest reliability was determined using two approaches. First, we calculated the intraclass correlation coefficient. Second, we calculated the test–retest differences in hypothetical depression health state preference scores. These differences were compared between subjects from the general population and the patient groups using the Wilcoxon test and the Mann–Whitney U test.
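The form of intraclass correlation coefficient used is not reported. As an illustration only, the sketch below computes a one-way random-effects ICC(1,1) for paired test and retest scores and labels it with the Fleiss (1986) guideline bands (below 0.40 poor, 0.40 to 0.75 fair to good, above 0.75 excellent); a two-way model would give somewhat different values.

```python
def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for (test, retest) score pairs."""
    n, k = len(pairs), 2
    if n < 2:
        raise ValueError("need at least two subjects")
    subject_means = [(a + b) / k for a, b in pairs]
    grand_mean = sum(subject_means) / n
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for (a, b), m in zip(pairs, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

def fleiss_band(icc):
    """Fleiss (1986) interpretation bands for reliability coefficients."""
    if icc < 0.40:
        return "poor"
    if icc <= 0.75:
        return "fair to good"
    return "excellent"

print(fleiss_band(icc_oneway([(70, 75), (55, 50), (80, 85), (40, 60)])))
```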
Table 1 presents a demographic and clinical description of the general population and depression groups. Reflecting the epidemiology of depression, the percentage of females in the depression groups was greater than in the general population group (χ2=27.1, p<.001). Increasing depression severity was also associated with a lower percentage of subjects who were married or living with a partner, compared with the general population sample (χ2=9.3, p=.03).
As expected, depression severity and number of depression episodes were greater in the groups with current depression than the general population group. The group with depression history but no current depression was constrained to have PHQ-9 scores <5, resulting in this group having a lower depression score than the general population sample (p=.02). We did not stratify the general population sample by depression severity and seven subjects (7.4 percent) in the general population sample had PHQ-9 scores of 15 or greater, indicating moderate to severe depression. Depression chronicity (p<.001) and history of current (p<.001) or any (p<.001) depression treatment increased with depression severity. Physical health comorbidity also increased with greater depression severity with the general population reporting significantly less physical health comorbidity than all other groups.
Table 1 reports the preference scores associated with the three depression health states. The overall trend was for a decrease in preference scores as the depression severity of the respondent increased. The comparisons reported here are between the general population and the other groups because the general population is the recommended source for health state preferences.
Using the SF-12 health states (Table 1), we found significant differences between the general population and moderate to severe depression groups. More specifically, for all SF-12 depression health states (mild, moderate, and severe), the general population rating scale scores were significantly higher than the moderate to severe depression group scores (89.5 versus 83.6, p=.04; 72.2 versus 62.7, p=.001; 50.7 versus 42.3, p=.02, respectively). In addition, the mean standard gamble scores for the mild and moderate SF-12 depression health states were significantly higher in the general population group than in the moderate to severe depression group (0.87 versus 0.79, p=.02 and 0.77 versus 0.69, p=.01, respectively).
Using PHQ-9 health states (Table 1), five out of six general population rating scale scores were significantly higher than those of the patient groups with current depression. The proportionate differences between the general population and patient groups with current depression also appeared to increase with hypothetical depression health state severity. For example, the proportionate difference between the moderate to severe depression group and the general population group increased from 10 percent (67.2 versus 74.7) for the mild PHQ-9 health state to 21 percent (49.5 versus 62.6) for the moderate PHQ-9 health state to 29 percent (30.7 versus 43.5) for the severe PHQ-9 health state. No significant differences were found between the general population group and the depression history only group, except for a trend for the severe hypothetical depression health state (43.5 versus 35.8, p=.05). Standard gamble comparisons resulted in more limited differences. The mean general population standard gamble scores for the mild and moderate PHQ-9 depression health states were significantly higher than the mean moderate to severe depression group scores (0.78 versus 0.70, p=.03 and 0.70 versus 0.63, p=.03, respectively).
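The proportionate differences in this paragraph are one minus the ratio of the patient mean to the general population mean. A short check with the PHQ-9 rating scale means quoted here (the helper name is hypothetical):

```python
def proportionate_difference(patient_mean, general_population_mean):
    """Shortfall of the patient mean relative to the general population mean."""
    return 1 - patient_mean / general_population_mean

# PHQ-9 rating scale means: (moderate to severe depression group, general population)
PHQ9_RATING_SCALE = {
    "mild": (67.2, 74.7),
    "moderate": (49.5, 62.6),
    "severe": (30.7, 43.5),
}

for state, (patients, general) in PHQ9_RATING_SCALE.items():
    print(f"{state} PHQ-9 state: {proportionate_difference(patients, general):.0%}")
```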
As expected, patients with current depression rated their current health lower than the general population using the rating scale and standard gamble (see Table 1). For example, the general population rating scale score for current health was significantly higher than those of the mild to moderate and moderate to severe depression groups (85.2 versus 71.0, p<.001 and 85.2 versus 49.0, p<.001, respectively), and the general population standard gamble scores were also significantly higher than those of the mild to moderate and moderate to severe depression groups (0.83 versus 0.74, p=.02 and 0.83 versus 0.60, p<.001, respectively). However, when subjects were asked to rate their current health without taking into account the effects of depression, there were no significant differences between the general population and the current depression groups. The only significant difference for this rating was between the general population and the depression history only group on the standard gamble (0.86 versus 0.94, p=.049, respectively).
Test–retest preference score results for the hypothetical depression health states were obtained from 15 general population subjects and 34 patient subjects. The intraclass correlation coefficient for all subjects completing the test–retest procedure was in the fair to good range (Fleiss 1986): 0.519 for rating scale (visual analog scale, VAS) scores and 0.522 for standard gamble (SG) scores. We examined the mean rank difference for each group across the 12 hypothetical depression health state valuations (six health states, each valued with the rating scale and the standard gamble) using a nonparametric Mann–Whitney U test and found no statistically significant differences between the general population and patient groups. The absolute values of the mean test–retest differences ranged from 0.18 to 3.80 for VAS scores on the 0–100 scale and from 0.01 to 0.04 for SG scores on the 0–1 scale.
Studies of patient and nonpatient hypothetical health state preferences typically report either no difference or patient preferences exceeding nonpatient preferences (Balaban et al. 1986; Froberg and Kane 1989b; Boyd et al. 1990; Tsevat et al. 1998; Gabriel et al. 1999; De Wit, Busschbach, and De Charro 2000; Ubel et al. 2005; Dolders et al. 2006). Thus, general population preference scores would be expected to yield similar if not more favorable cost-effectiveness ratios compared with patient preference scores, except in the case of life-saving interventions. To our knowledge, this is the first study to report patient health state preferences that are consistently equal to or lower than general population preferences. Specifically, individuals with current depression reported lower depression health state preference scores than a general population sample; they perceived depression to be worse than the general public perceived it to be. This finding is most pronounced using the PHQ-9 depression health state descriptions and the rating scale preference method.
The data in this study do not allow us to determine whether discrepancies between patient and public preferences resulted because patients overestimated how bad depression is, or because the general public underestimated how bad it is, or whether both phenomena contributed. Individuals with depression might overestimate the impact of depression through negative cognitive distortions that are commonly associated with depression. For example, cognitive distortions such as all-or-nothing thinking or overgeneralizing negative events and rejecting positive events are common problems addressed in cognitive-behavioral therapy for depression. Extending this argument to current health state preferences, we would expect individuals with current depression to assign low ratings to their own health with or without depression, and we did not find evidence for this. Instead, depressed patient preference scores for current health state without depression were indistinguishable from the general population or patients with a history of depression only. Other investigators have coined the term “sadder but wiser” to describe depressed subjects' view of reality, while nondepressed subjects view their circumstances as more favorable than they really are (Alloy and Abramson 1979; Seligman 1998). At the very least, these results lend credibility to depressed subject preference scores.
The general population might underestimate the impact of depression because of stigma associated with the disease: the idea that depression is a personal weakness and depressed persons need to pull themselves up by their bootstraps like everyone else. A measure of public stigma was not included in this study, but recent studies suggest that public stigma associated with depression continues to exist (Link et al. 1999; Barney et al. 2006; Perry et al. 2007). Public stigma may result in the general population being less sympathetic to the suffering of individuals with depression and less willing to validate the impact of depression symptoms.
In general, the preference scores for the group with past but not current depression were not significantly different from the general population. The effects of depression on depression health state preference scores appear to be greater for subjects experiencing current depression than for those with a history of depression only. Because we did not conduct debriefing interviews with subjects, explanations for this observation are unclear. However, based on theoretical considerations, there could be a role for coping and appraisal related to current depression severity whereby depressed patients utilize more emotion-focused and less problem-focused coping strategies than subjects with a history of depression only when considering the preference scores for hypothetical depression health states (Lazarus and Folkman 1984; Matheson and Anisman 2003). Future research is needed to better understand the depression health state preference differences between currently depressed subjects and the other groups (general population and history of depression only).
More significant differences between the preference scores of the general population and individuals with depression were noted when using the rating scale versus standard gamble method (8 significant differences versus 4, respectively). There remains considerable debate about which preference score method is the gold standard (Gold et al. 1996; Green, Brazier, and Deverill 2000; Sherbourne et al. 2001). A concern raised about depressed patients assigning preference scores to health states is that suicidal ideation (a common symptom of depression) would result in depressed patients choosing death over any other outcome. Methods exist for assigning preferences to health states considered worse than dead (Macran and Kind 2001); however, we did not find evidence of this among depressed subjects in this study.
More differences in depressed patient versus general population preference scores were noted when using the PHQ-9 versus SF-12 depression health state descriptions (7 versus 5, respectively). All PHQ-9 depression health state preference scores were lower than SF-12 preference scores, especially for the mild PHQ-9 depression health state. Five of six PHQ-9 depression health state descriptions were significantly lower for depressed patients versus the general population using the rating scale preference method. The depression health state descriptions using the PHQ-9 included the DSM-IV depression symptoms and a generic description of functional impairment associated with work, home, and relationships, whereas the SF-12 descriptions are based on a more generic measure of functioning. Therefore, the PHQ-9 descriptions were more depression specific than the SF-12, and it appears that depressed patients assigned lower preference scores to the more depression-specific health state descriptions.
Overall, there are two implications of these findings. First, the use of general population preference scores for depression interventions could result in less favorable cost-effectiveness ratios compared with using depressed patient preference scores, because preference score differences between depressed health states and perfect health are smaller for the general population than for depressed patients. A cost-effectiveness ratio with a smaller denominator would result in a larger (less favorable) ratio. For example, a patient with moderate depression restored to perfect health would result in an SF-12 rating scale preference score change of 0.37 using the more severely depressed patient preference scores and 0.28 using the general population preference scores. Similarly, the same patient would have an SF-12 standard gamble preference score change of 0.30 using the more severely depressed patient preference scores and 0.23 using the general population preference scores. It is important to note that the differences between these change scores, 0.09 and 0.07, both exceed the minimal clinically important difference (approximately 0.03) reported for preference scores (Walters and Brazier 2003; Kaplan 2005). Thus, many of the differences in change score estimates based on depressed patient versus general population preferences were both clinically and statistically significant. If these preference score differences were part of an incremental cost per QALY analysis, the ratio would be approximately 30 percent greater using general population versus depressed patient preference scores. Preference weights for the SF-12 or PHQ-9 that are derived from depressed patients are not available at this time; therefore, we were not able to reanalyze existing cost-effectiveness analysis datasets using depressed patient preference weights.
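The arithmetic in this paragraph can be reproduced as follows. The incremental cost of 1,000 and the one-year duration are hypothetical placeholders (they cancel out of the relative comparison), and the change scores are the rounded values quoted above for the moderate SF-12 state.

```python
def cost_per_qaly(incremental_cost, qaly_gain):
    """Incremental cost-effectiveness ratio: cost divided by QALYs gained."""
    return incremental_cost / qaly_gain

# Change scores (1 - preference score) for restoring the moderate SF-12 state to
# perfect health for one year: (depressed patient scores, general population scores).
CHANGE_SCORES = {
    "rating scale": (0.37, 0.28),
    "standard gamble": (0.30, 0.23),
}
INCREMENTAL_COST = 1000.0  # hypothetical; any positive value gives the same percentages

for method, (patients, general) in CHANGE_SCORES.items():
    ratio_patients = cost_per_qaly(INCREMENTAL_COST, patients)
    ratio_general = cost_per_qaly(INCREMENTAL_COST, general)
    excess = ratio_general / ratio_patients - 1
    print(f"{method}: change-score gap {patients - general:.2f}, "
          f"general population ratio {excess:.0%} higher")
```

The rating scale case gives about 32 percent and the standard gamble case about 30 percent, consistent with the approximate figure in the text.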
Second, these findings may contribute to our understanding of the observation that mental health treatment resources are not keeping pace with physical health treatment resources (Beck et al. 2003; Schomerus, Matschinger, and Angermeyer 2006). If the general population underestimates the impact of depression, then there may be less motivation to invest health care resources for depression treatment (McKie and Richardson 2003).
This study had several limitations. Fewer standard gamble comparisons were statistically significant between depressed patients and the general population compared with rating scale comparisons. This is important because the standard gamble tends to be the preferred preference elicitation method among at least some health care economists. However, the connection between the standard gamble preference scores and how patients make health care decisions has been the subject of debate, and rating scale methods have been widely accepted as the most practical of the preference elicitation methods (Brazier et al. 1999). In addition, more statistically significant differences between depressed patients and the general population were noted for the depression-specific health states (PHQ-9) than the generic health states (SF-12). This is important because the Panel on Cost-Effectiveness in Health and Medicine recommends the use of generic measures (Gold et al. 1996). However, there are several high-profile economic evaluations of depression interventions that converted depression-specific symptom severity into generic QALYs (Schoenbaum et al. 2001; Simon et al. 2001; Katon et al. 2005), and there is some evidence to support the validity of these conversion formulas (Pyne et al. 2007). Subjects were recruited from a convenience sample and from a single state and therefore may not be representative of the universe of depressed patients or the general population.
In conclusion, depressed patients report lower depression health state preference scores than the general population. Given this finding, using general population preference scores may result in less favorable cost-effectiveness ratios than using depressed patient preference scores. At the very least, we recommend replication of our findings and consideration of depressed patient preference scores to calculate QALYs in sensitivity analyses for cost-effectiveness analyses of depression interventions.
Joint Acknowledgment/Disclosure Statement: This work was supported by NIMH R21 MH64681-01A1 and Veterans Affairs Research Career Development Award (Dr. Pyne). The authors acknowledge Christian Lynch and Silas Williams for their tremendous data collection efforts.
Additional supporting information may be found in the online version of this article:
Appendix SA1: Author Matrix.
Appendix SA2: PHQ-9 and SF-12 Depression Health State Descriptions.
Please note: Wiley-Blackwell is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.