The Panel on Cost-Effectiveness in Health and Medicine has called for an “off-the-shelf” catalogue of nationally representative, community-based preference scores for health states, illnesses, and conditions. A previous review of cost-effectiveness analyses found that 77% did not incorporate community-based preferences, and 33% used arbitrary expert or author judgment. These results highlight the necessity of making a wide array of appropriate, community-based estimates more accessible to cost-effectiveness researchers.
To provide nationally representative EQ-5D index scores for chronic ICD-9 codes.
The nationally representative Medical Expenditure Panel Survey (MEPS) was pooled (2000−2002) to create a data set of 38,678 adults. Ordinary least squares (OLS), Tobit, and censored least absolute deviations (CLAD) regression methods were used to estimate the marginal disutility of each condition, controlling for age, comorbidity, gender, race, ethnicity, income, and education.
Most chronic conditions, age, comorbidity, income, and education were highly statistically significant predictors of EQ-5D index scores. Homoskedasticity and normality assumptions were rejected, suggesting only CLAD estimates are theoretically unbiased. The magnitude and statistical significance of coefficients varied by analytic method. OLS and Tobit coefficients were on average 60% and 143% greater than CLAD, respectively. The marginal disutility of 95 chronic ICD-9 codes as well as unadjusted mean, median, and 25th and 75th percentiles are reported.
This research provides nationally representative, community-based EQ-5D index scores associated with a wide variety of chronic ICD-9 codes that can be used to estimate quality-adjusted life-years in cost-effectiveness analyses.
Advancements in medical technology within the reality of limited societal resources have led to the need to assess the value of competing interventions through cost-effectiveness analysis. The quality-adjusted life-year (QALY) method incorporates both survival and the impact of health-related quality of life (HRQL) associated with different health states into a single index, thereby facilitating comprehensive assessment of the effectiveness of an intervention.1,2 Preference weights are used in QALY estimation to adjust for the HRQL impact of different health states.
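As a minimal sketch of the QALY arithmetic described above, the following Python snippet multiplies the time spent in each health state by its preference weight and sums the results. The trajectories and weights are hypothetical values chosen for illustration only, and discounting is omitted for simplicity.

```python
def total_qalys(health_states):
    """Sum of (years in state) x (preference weight) over a sequence of states."""
    return sum(years * weight for years, weight in health_states)


# Hypothetical trajectories: (years lived in state, preference weight on a 0-1 scale)
with_intervention = [(5.0, 0.90), (5.0, 0.80)]
without_intervention = [(4.0, 0.80), (4.0, 0.60)]

qalys_gained = total_qalys(with_intervention) - total_qalys(without_intervention)
print(f"QALYs gained: {qalys_gained:.2f}")
```

In this illustration, the gain reflects both longer survival and higher preference weights, which is the sense in which the QALY combines the two into a single index.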
In 1993, the US Public Health Service convened the Panel on Cost-Effectiveness in Health and Medicine (PCEHM) to standardize the field of cost-effectiveness analysis. The PCEHM recommended the use of community-based, nationally representative preferences for use in cost-effectiveness analyses.1 In practice, however, adherence to this recommendation is poor. A review of 228 cost-effectiveness analyses found that 77% did not incorporate community-based preferences, and 33% used arbitrary expert or author judgment.3 These results highlight the necessity of making valid community-based estimates more accessible.
To accomplish this goal, the PCEHM recommended the collection of a national catalogue of preference weights that could be used by cost-effectiveness researchers without the burden of primary data collection. Such a repository would provide “off-the-shelf” access to a wide number of disease and condition-specific preference weights and would encourage comparisons across different studies and health interventions. The PCEHM further recommended that any such catalogue use a theory-based method to derive empirical weights for health states, illnesses, and conditions from a nationally representative, community-based sample of the US population.1
Previous systems offer important insight and improve our understanding of the sources of variability in preference weights but do not meet the criteria of the PCEHM. Before the PCEHM was convened, Fryback et al.4 measured preferences using the time tradeoff (TTO) method and the Quality of Well-being (QWB) scale for 1356 noninstitutionalized residents of Beaver Dam, Wisconsin. Although a seminal contribution to the understanding of condition-specific preferences, the Beaver Dam sample was not nationally representative. Gold et al.5 developed the Health and Limitations Index (HALex), an HRQL scoring system capable of defining 30 possible health states to provide condition-level weights for the US population. Time and resource constraints prevented the National Center for Health Statistics (NCHS) from directly eliciting empirical community-based preferences for the HALex. HALex scores are not based on empirical weights from a nationally representative community sample. Instead, scores were derived by correspondence analysis of value scores on 2 questions (attributes) in the National Health Interview Survey (NHIS).6 Tengs et al.7 and Bell et al.3 responded to the PCEHM call by collecting numerous preference estimates that had been used in previous studies. The estimates published by Tengs et al. and Bell et al. underscore the eclectic variability in preference weights and highlight one of the underlying motivations for the PCEHM's call for a national catalogue of standardized preference weights.
More recently, Sullivan et al.8 developed a preliminary catalogue of preference-based HRQL scores that meets all criteria of the PCEHM. The authors provide EQ-5D index scores of chronic conditions as classified by the Agency for Healthcare Research and Quality (AHRQ) for 58 clinical classification categories (CCCs) and 10 quality priority conditions (QPCs). CCCs are broad clinical groupings of several ICD-9 codes.9 QPCs are common, serious conditions for which standards for appropriate clinical care have been developed.10
Cost-effectiveness analyses address a wider variety of conditions, illnesses, and health states than those represented in this preliminary catalogue. The most common classification of diseases and conditions used by health researchers and clinicians is the International Classification of Diseases (ICD).11 ICD codes provide more refined and clinically homogeneous classification of distinct conditions. Preference-based HRQL scores associated with ICD-9 codes may be appropriate when QPC or CCC codes are too generic for the desired clinical condition. For example, ICD-9 413 “Angina Pectoris” is 1 of 5 distinct 3-digit ICD-9 codes that comprise CCC 101 “Coronary Atherosclerosis and Other Heart Disease.” In addition, the provision of preference-based scores associated with ICD-9 codes may correspond more directly to available cost data for use in cost-effectiveness models because medical billing data, and hence cost estimates, are often categorized by ICD-9 code.
The purpose of this research is to augment the preliminary national catalogue of preference-based HRQL scores for QPC and CCC codes developed by Sullivan et al.8 by providing EQ-5D index scores associated with a variety of chronic ICD-9 codes.
The Medical Expenditure Panel Survey (MEPS) is a nationally representative survey of the US civilian noninstitutionalized population with oversampling of Hispanics and blacks, collecting detailed information on demographic characteristics, health conditions, and health status.10 Medical condition diagnoses in MEPS are based on ICD-9 codes. The current research pooled 2000−2002 MEPS person-level and medical conditions files to construct an individual-level file with information on ICD-9 codes reported by respondents.
The EQ-5D consists of a 5-item descriptive system that measures 5 dimensions of health status (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) with 3 levels per dimension (no problem, some problems, and extreme problems). The combination of all possible dimensions and levels results in 243 unique health states. A multiattribute value function (MAVF) is used to map preferences for these health states. From this scoring algorithm, EQ-5D index scores are calculated based on responses to the 5-item questionnaire. The EQ-5D has been used as an HRQL and preference measure in a wide variety of diseases and conditions in more than 600 publications.12 In addition, it has been used as a general population health measure in numerous countries.12 The construct validity, reliability, and responsiveness of the EQ-5D have been documented extensively in both general and specific disease populations.12,13 The scoring algorithm for the EQ-5D index descriptive system used in this research is based on US community preferences.14
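The count of 243 health states follows from 3 levels on each of 5 dimensions (3^5 = 243). The short Python sketch below enumerates them. Coding the levels as 1, 2, and 3 and writing a state as a 5-digit string (e.g., "11111" for no problems on any dimension) is a labeling convention assumed here for illustration; the scoring algorithm itself is not reproduced.

```python
from itertools import product

DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = (1, 2, 3)  # assumed coding: 1 = no problem, 2 = some problems, 3 = extreme problems

# Every combination of one level per dimension, written as a 5-digit string
health_states = [
    "".join(str(level) for level in state)
    for state in product(LEVELS, repeat=len(DIMENSIONS))
]

print(len(health_states))  # number of unique health states
print(health_states[0], health_states[-1])  # best and worst described states
```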
For the current study, the EQ-5D was administered via a paper-and-pencil self-administered questionnaire (SAQ) given to adults 18 years or greater. The questionnaire was completed by a proxy for individuals unable to respond. There were 38,678 unique individuals 18 years of age or greater with valid EQ-5D scores included in this analysis. Individuals may have reported more than 1 condition. The sample design of the MEPS-HC survey includes stratification, clustering, multiple stages of selection, and oversampling of minority populations.15 Using the MEPS sampling weights, we adjusted for these factors and for survey nonresponse. Conditions with fewer than 100 observations were excluded from the reported results.
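To illustrate why sampling weights matter for point estimates, the following Python sketch computes a weighted mean and a weighted median. The scores and weights are hypothetical, and the functions are not the MEPS estimation procedure: they ignore stratification and clustering, which affect variance estimation and require design-based survey software.

```python
def weighted_mean(values, weights):
    """Mean of values, each counted in proportion to its sampling weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values and weights must be non-empty")


# Hypothetical respondents: EQ-5D index score and number of people each represents
scores = [1.0, 0.8, 0.6, 1.0]
weights = [1000, 3000, 500, 500]

print(sum(scores) / len(scores))        # unweighted mean
print(weighted_mean(scores, weights))   # weighted mean
print(weighted_median(scores, weights)) # weighted median
```

In this made-up example the respondent with a score of 0.8 represents most of the population, so the weighted mean falls below the unweighted mean.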
A total of 693 three-digit ICD-9 codes were mapped by professional coders from the medical conditions reported by survey respondents.10,11 We have included only chronic (lasting >1 year) conditions to ensure that the condition was experienced while the EQ-5D was administered via the SAQ. (The SAQ was administered during rounds 2 and 4 of the MEPS survey. MEPS data do not provide detailed information about the exact timing and duration of acute conditions and thus cannot be correlated with accuracy to the EQ-5D elicitation time.) Hwang et al.16 convened 2 physician panels to determine which ICD-9 codes reported in MEPS should be classified as chronic. This list was used to determine which ICD-9 codes could be classified as chronic conditions.
Preference-based HRQL scores exhibit a ceiling effect with a significant number of respondents rating themselves in full health.17 The ceiling effect is particularly evident with the EQ-5D index. As discussed elsewhere,8 in these circumstances, the censored least absolute deviations (CLAD) estimator developed by Powell et al.18 has theoretical advantages over ordinary least squares (OLS) regression and the Tobit model, which may produce biased estimates when the assumptions of homoskedasticity or normality are violated.18–20 To test for heteroskedasticity in the Tobit model, we used the likelihood ratio test described by Petersen and Waldman.21 This statistic has a limiting chi-squared distribution. In addition, the normality assumption was examined using the Hausman test.20
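The idea behind CLAD can be sketched in a few lines of Python. For scores censored from above at 1.0, the estimator chooses coefficients that minimize the sum of absolute deviations between the observed score and the censored linear prediction, min(1, x'b). The example below is a toy illustration under stated assumptions: the data are hypothetical and noise-free, there is a single regressor, and a coarse grid search stands in for the iterative algorithm used by dedicated software. It is not the program used in this research.

```python
from itertools import product

CEILING = 1.0  # EQ-5D index scores cannot exceed full health


def censored_prediction(intercept, slope, x, ceiling=CEILING):
    """Linear prediction capped at the ceiling."""
    return min(ceiling, intercept + slope * x)


def clad_objective(intercept, slope, xs, ys, ceiling=CEILING):
    """Sum of absolute deviations from the censored linear prediction."""
    return sum(
        abs(y - censored_prediction(intercept, slope, x, ceiling))
        for x, y in zip(xs, ys)
    )


# Hypothetical noise-free data: latent score = 1.10 - 0.05 * x, observed score capped at 1.0
TRUE_INTERCEPT, TRUE_SLOPE = 1.10, -0.05
xs = list(range(11))
ys = [censored_prediction(TRUE_INTERCEPT, TRUE_SLOPE, x) for x in xs]

# Coarse grid search over candidate coefficients (illustration only)
intercept_grid = [i / 100 for i in range(90, 131)]
slope_grid = [-j / 100 for j in range(0, 11)]
best_intercept, best_slope = min(
    product(intercept_grid, slope_grid),
    key=lambda params: clad_objective(params[0], params[1], xs, ys),
)

print(best_intercept, best_slope)
```

Because the objective depends only on absolute deviations from a censored prediction, it does not rely on normally distributed or homoskedastic errors, which is the theoretical advantage noted above.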
All regression analyses were done using STATA. CLAD was performed in STATA using an existing add-in (user-written program)22 and additional programming to incorporate sample weights. Programs were written in STATA to perform the Hausman and the likelihood ratio tests described above. The determination of model specification was guided primarily by the need to be consistent with the specification adopted in the preliminary catalogue.8
Unadjusted mean and median EQ-5D index scores, mean age, and number of chronic conditions reported for the US population in 2000−2002 are shown in Table 1. Table 1 also highlights characteristics that influence EQ-5D index scores. For example, whites are older and report more chronic conditions (possibly due to greater age) than other races. Mean EQ-5D index scores decline by increasing age category, are lower for females than males, are lower for blacks and American Indians compared to whites, are higher for other races and Hispanics compared to whites, and are consistently lower for lower levels of educational attainment and poverty status compared to higher levels.
The homoskedasticity assumption (likelihood ratio test = 401.2; P < 0.001) and the normality assumption (Hausman test statistic = 2015; P < 0.001) were rejected for the classic Tobit model, suggesting that Tobit estimates may be biased. As with many health status measures in population health surveys, it appears that the EQ-5D index is not normally distributed and exhibits a significant ceiling effect (46% reported EQ-5D index = 1.0 in this sample). Given these factors, only CLAD estimates are theoretically unbiased.
Full results of the CLAD, Tobit, and OLS regressions for the MEPS general population without disease indicator variables are displayed in Table 2. Age, the number of chronic conditions (NCC), NCC², greater levels of income (v. poor), and greater levels of education (v. no degree) were statistically significant in explaining the variation in EQ-5D index scores across all 3 methods. Greater comorbidity burden and higher levels of income had a relatively large magnitude impact on EQ-5D index scores.
The relative percentage differences in the coefficients for each analytic method are also displayed in Table 2. It is evident that point estimates vary significantly by choice of analytic method. Coefficients varied by up to 382% for Tobit estimates of bachelor's degree (0.0823) compared to CLAD (0.0171) and 191% for OLS (0.0497) v. CLAD. On average, statistically significant OLS coefficients were 60% greater in magnitude than CLAD, and Tobit estimates were 143% greater than CLAD and 69% greater than OLS.
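The relative differences quoted above can be checked with simple arithmetic: the percentage by which one coefficient exceeds another is (coefficient − reference) / reference × 100. The Python sketch below applies this to the bachelor's degree coefficients as printed in the text. Because those coefficients are rounded to 4 decimal places, the results may differ slightly from the published percentages, which were presumably computed from unrounded estimates.

```python
def percent_greater(coefficient, reference):
    """Percentage by which `coefficient` exceeds `reference`."""
    return (coefficient - reference) / reference * 100.0


# Bachelor's degree coefficients as printed in the text (rounded)
clad, ols, tobit = 0.0171, 0.0497, 0.0823

print(f"Tobit v. CLAD: {percent_greater(tobit, clad):.1f}%")
print(f"OLS v. CLAD:   {percent_greater(ols, clad):.1f}%")
```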
Table 3 presents EQ-5D index scores for chronic ICD-9 codes resulting from CLAD regression, as well as unadjusted mean, median, and 25th and 75th percentiles. The number of individuals reporting the condition, mean age, and NCC by percentile are also presented. Estimates of the marginal disutility of each condition after adjusting for age, gender, race, ethnicity, education, income, and comorbidity, are reported for each condition. The complete set of regression coefficients for all ICD-9 codes is available online at the journal's Web site, mdm.sagepub.com.
This research augments the preliminary national catalogue of preference-based scores developed by Sullivan et al.8 by providing EQ-5D index scores associated with a wide range of chronic ICD-9 codes. The preliminary catalogue of chronic conditions was categorized by QPC and CCC codes.8 The current research adds to the existing catalogue and provides analysts with a standardized and flexible source of nationally representative preference-based scores associated with a variety of chronic ICD-9 codes that can be used in QALY estimation for cost-effectiveness analysis.
Availability of nationally representative, community-based preference scores overcomes one of the most important barriers to appropriate estimation of cost-effectiveness.1 An “off-the-shelf” catalogue of preference-based scores derived from a theory-based method provides a standardized and consistent source of preference weights.1 The preference-based scores for chronic ICD-9 codes reported in this research are based on the EQ-5D, a theory-based, previously validated health status classification system that meets all of the criteria established by the PCEHM. Specific examples of how to incorporate these estimates in cost-effectiveness analyses are provided elsewhere (see the online appendix at the journal's Web site, mdm.sagepub.com).
The current list of “off-the-shelf” EQ-5D index scores for ICD-9 codes differs from the previous catalogue for CCC and QPCs. First, the ICD-9 codes in this research provide more refined classification of distinct conditions and may be more appropriate if CCC codes are too generic for the desired clinical conditions. For example, CCC 089 “Blindness and Other Vision Defects” had no associated disutility in previous research,8 which prima facie contradicts clinical expectation. However, CCC 089 is a combination of 4 distinct ICD-9 codes (367 “Disorders of Refraction,” 368 “Visual Disturbance,” 369 “Blindness and Low Vision,” and V41 “Problems with Specific Functions”). In comparison, ICD-9 369 “Blindness and Low Vision” provides a more clinically homogeneous condition classification and has an associated disutility of −0.0498 (Table 3). Hence, while CCC 089 and ICD-9 369 appear similar in name, there is a significant difference in the specificity of clinical categorization. We recommend that analysts familiarize themselves with the ICD-9 to CCC crosswalk to determine the appropriateness of a given condition categorization.23 In addition, ICD-9 codes may correspond more practically to available cost data for use in cost-effectiveness models. Estimation of the cost of a given condition in analyses of existing data (i.e., claims data) is often based on ICD-9 codes. In this case, the provision of EQ-5D index scores associated with ICD-9 codes would provide researchers with the flexibility of having scores that more directly correspond to the costs of the condition. Second, the 10 QPCs reported in the previous research are based on a different question frame. For the QPCs reported previously, patients were asked if they had ever been diagnosed with the condition of interest (e.g., stroke),8 whereas ICD-9 and CCC condition classifications are based on reports of experiencing a given condition within a given year. 
Hence, ICD-9 436 “Cerebral Vascular Attack (CVA)” corresponds to an individual who, at some point within the year that the EQ-5D was administered, experienced a CVA (Table 3), and CCC 109 “Acute Cerebrovascular Disease” from the preliminary catalogue8 corresponds to an individual who, at some point within the year, reported any 1 of 6 three-digit ICD-9 codes (430, 431, 432, 433, 434, or 436). In contrast, QPC “Stroke” from the previous research8 corresponds to an individual who has previously had a stroke (not necessarily within the past year).
The decision to use broader QPC or CCC codes v. more clinically homogeneous ICD-9 codes in QALY estimation will depend on the requirements of the end-user. However, where a broader approach to the clinical condition of interest is desired (e.g., the generic QPC “Coronary Heart Disease”), or when the sample size of individuals reporting a specific ICD-9 code is too small to ensure accurate estimation of EQ-5D index scores, the CCC or QPC estimates provided in the previous research may be more appropriate.
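As a hedged illustration of how a marginal disutility might enter a QALY calculation, the Python sketch below adds the −0.0498 reported for ICD-9 369 (Table 3) to a baseline score and converts the difference into QALYs lost over a time horizon. The baseline score, the 10-year horizon, and the simple additive application are assumptions made here for illustration; the worked examples in the online appendix should be consulted for the recommended approach.

```python
DISUTILITY_ICD9_369 = -0.0498  # marginal disutility reported for ICD-9 369 (Table 3)


def score_with_condition(baseline_score, marginal_disutility):
    """Add a (negative) marginal disutility to a baseline score, capped at 1.0."""
    return min(1.0, baseline_score + marginal_disutility)


baseline = 0.90  # hypothetical baseline EQ-5D index score (assumption)
years = 10       # hypothetical time horizon, undiscounted (assumption)

adjusted = score_with_condition(baseline, DISUTILITY_ICD9_369)
qalys_lost = (baseline - adjusted) * years

print(f"Adjusted score: {adjusted:.4f}")
print(f"QALYs lost over {years} years: {qalys_lost:.3f}")
```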
Administration of the EQ-5D in MEPS could not be correlated to the exact timing of reported conditions. Both were reported within a given year. Hence, the current research has restricted the scope of the analysis to chronic ICD-9 codes (lasting ≥1 year), as classified by Hwang et al.16 Some of the ICD-9 codes used in this research may not have a chronic impact for all individuals reporting the condition. There were 5 ICD-9 condition coefficients from the regression results that were positive and were omitted. The following 4 were statistically significant and positive: ICD-9 199 malignant neoplasm, 367 disorders of refraction, 600 hyperplasia of prostate, and other skin hypertrophy/atrophy. Positive disutility from chronic conditions is not consistent with clinical expectation. It is possible that some of these conditions were not truly chronic and their impact was not captured in the EQ-5D index score when elicited, that there are unobserved variables that are positively associated with EQ-5D index scores for these conditions, and/or that the EQ-5D health status instrument is not sensitive to these conditions. Condition coefficients that were not statistically significant should be used with extreme caution, incorporating appropriate assessment of the greater uncertainty in the estimates.
The ability of survey respondents to report accurate condition data may be a source of inaccuracy for the conditions listed, and this bias may be exacerbated in blacks and Hispanics. Although the ICD-9 codes were verified and error rates for professional coders did not exceed 2.5% in MEPS,10 they were based on self-report. Previous research has shown that blacks, whites, and Hispanics differ in reporting of disease labeling, levels of illness, and disability, and there is evidence that self-reported conditions may be underreported in general.24–26
As discussed, MEPS data do not contain information on severity of disease. Future research examining the impact of disease severity on preference scores for different conditions is needed. In addition, future research validating the condition estimates in this general population is necessary.
The construct validity, reliability, and responsiveness of the EQ-5D index have been documented extensively in both general and specific disease populations. Although considered a major advantage of the EQ-5D, the parsimony in the items and levels of the questionnaire may result in a potential lack of discrimination. Previous research has suggested that EQ-5D index scores based on the US scoring algorithm exhibit strong ceiling effects.8 This characteristic has necessitated the use of censored regression methods to derive unbiased estimates. The minimum possible decrement in the EQ-5D index score is 0.140 for the US weights. The inability to quantify a health state between 0.86 and 1.0 may lead to lack of discrimination for mild health states. This has also been suggested as a limitation of the UK scoring algorithm for the EQ-5D.27–29 Similar to the QWB and the SF-6D, the scoring function of the EQ-5D is based on a linear additive model, which assumes no interactions in preferences between attributes and may be a limitation of the EQ-5D index. In contrast, the HUI relies on a multiplicative functional form.30 In addition, the US scoring algorithm uses TTO values to estimate preferences based on a MAVF rather than standard gamble (SG)-derived utilities based on a multiattribute utility function (MAUF). Although not free of criticism,1,31,32 the SG results in utilities that are more consistent with the inherent uncertainty in health decisions required by von Neumann-Morgenstern expected utility theory.33,34 Despite these limitations, there is no consensus that other preference-based health status classification systems are superior to the EQ-5D index.1,29,35,36 Each has its own advantages and limitations, and the EQ-5D is the only theory-based health status classification system with a scoring function based on US general population preferences that is currently available in a nationally representative data set.
Although consensus guidelines exist on the appropriate use of preference-based HRQL scores, inappropriate use continues to be widespread. The wide variability and inconsistencies across estimates of preference-based HRQL scores in prior cost-effectiveness analyses3,7 underscore the need for consistency in reference-case cost-effectiveness evaluations. This need for consistency led the PCEHM to call for a national catalogue of preference-based HRQL scores1 as an important step toward promoting comparability of cost-effectiveness analyses. Although there are limitations in the current research, it provides an important addition to the preliminary catalogue developed by Sullivan et al.8 and moves toward the consistency called for by the PCEHM. Perhaps the availability of accessible EQ-5D index scores associated with chronic ICD-9 codes will provide analysts with the flexibility and breadth of “off-the-shelf” preference-based HRQL scores needed to improve the consistency of QALY estimation in cost-effectiveness analysis. Additional studies are necessary to validate these scores within condition-specific populations.
The PCEHM has called for an “off-the-shelf” catalogue of nationally representative, community-based preference scores for health states as well as illnesses and conditions. This research augments the preliminary catalogue developed by Sullivan et al.8 by providing EQ-5D index scores associated with a wide variety of chronic ICD-9 codes that can be used by researchers to estimate QALYs in cost-effectiveness analyses.
This research was supported by grant R03 AG027348-01. A preliminary version of this research was given as a podium presentation at the Society for Medical Decision Making in October 2004. The authors thank Bill Lawrence and an anonymous reviewer for helpful comments on this manuscript.