Previous mapping algorithms estimating EQ-5D index scores from the SF-12 were based on preferences from a UK community sample. However, preferences based on the general US population are most appropriate for cost-effectiveness analyses done from the societal perspective in the United States.
To provide a mapping algorithm for estimating EQ-5D index scores from the SF-12 based on a nationally representative sample and using preferences based on the general US population.
The Medical Expenditure Panel Survey (MEPS) 2002 and 2000 data were used as independent derivation and validation sets, respectively, to estimate the relationship between SF-12 scores and EQ-5D index scores, controlling for sociodemographic characteristics and comorbidity burden. Prediction equations for end-users who only have access to SF-12 scores were derived and compared. The empirical performance of the censored least absolute deviations (CLAD), Tobit, and ordinary least squares (OLS) analytic methods was compared by calculating the mean prediction error in the validation set.
The fully specified CLAD model resulted in the lowest mean prediction error, followed by OLS and Tobit. The CLAD prediction equation based only on SF-12 scores performed better than the fully specified OLS and Tobit models.
The current research provides an algorithm for mapping EQ-5D index scores from the SF-12. This algorithm may provide analysts with an avenue to obtain appropriate preference-based health-related quality-of-life scores for use in cost-effectiveness analyses when only SF-12 data are available.
Health status measures provide a numeric score representing a profile of health status across several domains, such as physical and mental health. The SF-12 and its longer form, the SF-36, are examples of generic health status measures. Although these measures provide important information about health-related quality of life (HRQL), they do not incorporate preferences for health states and cannot be used in cost-effectiveness analyses. In contrast, preference-based HRQL instruments are appropriate for the calculation of quality-adjusted life-years (QALYs) in cost-effectiveness analysis (CEA).1,2 QALYs rely on preference weights as the metric by which to reflect the HRQL impact of different health states.
Preference-based HRQL instruments can be separated into 3 categories: direct preference measures, multiattribute health status classification systems (MAHSCS), and estimation or mapping methods.3 Examples of direct preference measures include the standard gamble (SG),4 time tradeoff (TTO),5 and rating scales such as the Visual Analog Scale (VAS).6 MAHSCS offer a practical and theoretically appealing means of incorporating community-based preferences that is consistent with the underlying theory of CEA and QALYs.7 The EQ-5D, Health Utilities Index (HUI), and Quality of Well-being (QWB) Scale are examples of MAHSCS with empirically based, community-derived preference functions.7-10 However, only the EQ-5D has a preference scoring function based on the US general population.
Like MAHSCS, mapping estimation methods rely on the theoretical relationship between health status and preferences to derive preference estimates.3 Mapping methods enable analysts to estimate preference-based HRQL scores when MAHSCS or directly elicited preferences are not feasible or available. The SF-12 is a valid and reliable measure of generic health status and has been used extensively.11 In cases where analysts would like to conduct a cost-effectiveness analysis and preference-based HRQL scores are not available but SF-12 scores are, it is practical to use a mapping algorithm to estimate EQ-5D index scores from SF-12 scores. Previous studies have developed mapping algorithms to estimate EQ-5D index scores from the SF-12 in nationally representative US populations.12,13 However, the preferences underpinning the EQ-5D index scores were based on a community sample in the United Kingdom.14 UK-based TTO scores have been shown to be significantly different from US-based TTO scores,15 implying that the EQ-5D index scores may also differ. Recently, preferences have been derived for the EQ-5D index based on a community sample in the United States.16 These scores would be more appropriate for cost-effectiveness analyses done from the societal perspective in the United States.1 The purpose of this research is to provide a mapping algorithm for estimating EQ-5D index scores from the SF-12 based on a nationally representative US population and using the recently available US community-based preferences.16
The Medical Expenditure Panel Survey (MEPS) is a nationally representative survey of the US civilian noninstitutionalized population with oversampling of Hispanics and blacks, collecting detailed information on demographic characteristics, health conditions, health status, use of medical care services, health insurance coverage, income, and employment.17 The current research used the 2000 and 2002 MEPS public use data.17 The 2000 and 2002 data were used to provide a larger sample and because they each contain data on unique individuals that do not overlap (unlike MEPS 2001 data, which have panels that overlap with 2000 and 2002) but are equally nationally representative. The SF-12 and EQ-5D were administered via the self-administered questionnaire (SAQ), a paper-and-pencil questionnaire administered to adults aged 18 years or older. If an individual was unable to respond to the SAQ, the questionnaire was completed by a proxy. There were 14,286 individuals with valid MCS-12, PCS-12, and EQ-5D index scores in the MEPS 2000 data set and 23,647 in the MEPS 2002 data set. The 2000 and 2002 medical conditions files were also linked to the full-year consolidated data file to estimate the comorbidity burden of each individual. In the medical conditions files, 693 three-digit ICD-9 codes were mapped by professional coders from the medical conditions reported by survey respondents.18 Clinical classification categories (CCC) were derived in MEPS from clinically meaningful combinations of ICD-9 codes into 259 mutually exclusive categories.17,19 The number of chronic conditions (NCC) was calculated as the total number of chronic CCC codes reported by an individual. Only chronic (lasting >1 year) conditions were included to ensure that the condition was experienced while the SAQ was administered.
The EQ-5D consists of a 5-item descriptive system that measures 5 dimensions of health status (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) with 3 levels per dimension (no problem, some problems, and extreme problems). There are 243 unique health states deriving from the combination of all possible dimensions and levels of the 5 questions. Preferences for these health states are mapped using a multiattribute value function (MAVF). The scoring algorithm for the EQ-5D index descriptive system used in this research is based on US community preferences.16
From this scoring algorithm, EQ-5D index scores are calculated based on responses to the 5-item questionnaire. The EQ-5D has been used as an HRQL and preference measure in a wide variety of diseases and conditions in more than 600 publications, and its construct validity, reliability, and responsiveness have been documented extensively.20,21
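As a quick check on the count of health states described above, the descriptive system can be enumerated directly. This is an illustrative sketch only: the dimension labels follow the text, while the numeric level coding (1 to 3) is an assumption made for the example and no preference weights are applied.

```python
from itertools import product

# The 5 dimensions named in the text.
DIMENSIONS = [
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
]
# Assumed coding: 1 = no problem, 2 = some problems, 3 = extreme problems.
LEVELS = (1, 2, 3)

# Every combination of one level per dimension is one health state.
health_states = list(product(LEVELS, repeat=len(DIMENSIONS)))
n_states = len(health_states)  # 3 ** 5
print(n_states)
```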
The SF-12 Health Survey is a general health status instrument with 12 questions producing 2 summary scores, the Physical Component Summary (PCS-12) and the Mental Component Summary (MCS-12). These summary scales are scored such that higher scores represent better physical and emotional function, respectively, and are standardized so that the mean score is 50 and standard deviation is 10 in the general population.11
EQ-5D index scores in MEPS exhibit a ceiling effect, with a significant number of respondents rating themselves in full health.22,23 This has been shown for the HUI as well.24 Ignoring the bounded nature of preference-based HRQL scores and using ordinary least squares (OLS) will result in biased and inconsistent estimates.25 The Tobit model provides a consistent and efficient alternative for estimating such data.25,26 The Tobit model specification with upper censoring at 1.0 is defined as follows:
$$y_i^* = x_i\beta + u_i, \qquad y_i = \begin{cases} y_i^* & \text{if } y_i^* < 1.0 \\ 1.0 & \text{if } y_i^* \ge 1.0 \end{cases}$$
where $y_i^*$ is a latent measure of the preference score, $x_i$ is a vector of independent variables affecting preference scores, $\beta$ is a vector of unknown parameters, $u_i$ are residuals that are independently and normally distributed with mean 0 and a common variance $\sigma^2$, and $y_i$ is the actual preference score as measured by the EQ-5D.
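To make the role of the censoring point concrete, the log-likelihood implied by a Tobit model with upper censoring at 1.0 can be sketched as below. This is a minimal illustration under the normality and common-variance assumptions stated above, not the STATA code used in the study; the function names and the example inputs are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(y, x, beta, sigma, upper=1.0):
    """Log-likelihood of a Tobit model with upper censoring at `upper`.

    y: observed scores; x: list of covariate rows; beta: coefficients.
    """
    total = 0.0
    for yi, xi in zip(y, x):
        mu = sum(b * v for b, v in zip(beta, xi))  # linear predictor x_i * beta
        if yi >= upper:
            # Censored observation: contributes P(latent score >= upper).
            total += math.log(1.0 - normal_cdf((upper - mu) / sigma))
        else:
            # Uncensored observation: contributes the normal density.
            total += math.log(normal_pdf((yi - mu) / sigma) / sigma)
    return total


# Hypothetical single-observation checks (intercept-only model).
ll_uncensored = tobit_loglik([0.8], [[1.0]], [0.8], 0.1)
ll_censored = tobit_loglik([1.0], [[1.0]], [1.0], 0.1)
```

In practice the parameters would be chosen to maximize this quantity; the sketch only evaluates it at given values.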
However, in the face of heteroskedasticity or nonnormality, the Tobit model also produces biased estimates.25 In contrast, because the censored least absolute deviations (CLAD) estimator does not depend on distributional or homoskedasticity assumptions of the errors and is robust to censoring, it produces consistent estimates even in the face of heteroskedasticity, nonnormality, and censoring.27 Based on these theoretical limitations that have been confirmed in the empirical econometric literature,24,28,29 the current research estimates EQ-5D index scores using CLAD, Tobit, and OLS.
To test for heteroskedasticity and normality in the Tobit model, we used the likelihood ratio test described by Petersen and Waldman30 and the Hausman test, respectively.25 All analyses were conducted in STATA. CLAD was performed in STATA using an existing user-written add-in31 as well as additional programming to incorporate sample weights. Programs were written in STATA to perform the Hausman and the likelihood ratio tests.
The MEPS sample design includes stratification, clustering, multiple stages of selection, and over-sampling of minority populations.32 Unless otherwise noted, we incorporate MEPS sampling weights in all analyses, ensuring nationally representative estimates. The CLAD algorithm used in this research controls for MEPS sampling weights and produces nationally representative point estimates. However, unlike OLS and Tobit, CLAD does not make explicit assumptions about error structure. The result is that the MEPS variance structure cannot be formally included when using CLAD. As a result, bootstrapping based on the MEPS variance structure was used to calculate robust standard errors.
Independent derivation and validation data sets were used to compare the empirical performance of CLAD, Tobit, and OLS. The MEPS 2002 data were used as the derivation data set to regress EQ-5D index scores on independent variables. The MEPS 2000 data were then used as the validation data set to predict EQ-5D index scores using the regression coefficients from the derivation data set. Any EQ-5D index scores predicted >1 were truncated at 1.0 to remain consistent with the scale bounds. The absolute value of the difference between the predicted and actual EQ-5D index score (prediction error) was then calculated for each individual in the validation data set. The total prediction error, mean prediction error, and 95% confidence intervals were then calculated based on the prediction error in the validation data set (MEPS 2000) to measure the relative performance of the 3 analytic methods. CLAD was then used on the final data set (pooled MEPS 2000 and 2002) to produce the mapping algorithm for future use.
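The validation step described above (truncate predictions at 1.0, take the absolute difference from the actual score, then average) can be sketched as follows. The function name, the optional weights argument, and the example values are hypothetical; the sketch covers only the point estimate and not the confidence intervals.

```python
def mean_prediction_error(actual, predicted, weights=None, upper=1.0):
    """Weighted mean absolute prediction error with truncation at `upper`."""
    if weights is None:
        weights = [1.0] * len(actual)  # unweighted if no sampling weights given
    total_weight = sum(weights)
    weighted_sum = 0.0
    for a, p, w in zip(actual, predicted, weights):
        p = min(p, upper)               # truncate predictions above the scale bound
        weighted_sum += w * abs(a - p)  # absolute prediction error for this person
    return weighted_sum / total_weight


# Hypothetical validation records: actual scores and model predictions.
actual = [1.0, 0.8, 0.6]
predicted = [1.04, 0.75, 0.7]
mpe_unweighted = mean_prediction_error(actual, predicted)
mpe_weighted = mean_prediction_error(actual, predicted, weights=[2.0, 1.0, 1.0])
```

The first prediction (1.04) is truncated to 1.0 and therefore contributes no error, which shows how truncation can only reduce the measured error for respondents in full health.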
The approach to model specification was guided by an attempt to include all relevant variables that have been shown to have a theoretically meaningful and/or statistically significant relationship with EQ-5D index scores in previously published multivariate models. Based on previous research22,23 and mapping algorithms,12,13 the following variables were considered: PCS-12, MCS-12, PCS-12², MCS-12², age, sex, race, ethnicity, income, education, and NCC. The correlation of these variables was then examined. To avoid multicollinearity, variables with correlations ≥0.8 were omitted from the model specification. Although some variables that are not statistically significant may be retained with this approach and end-users likely do not have access to information for all of these variables, it was considered important to have a correct model specification to obtain unbiased estimates. After estimating the full model, alternatives are derived to enable mapping based only on MCS-12 and PCS-12 scores for end-users. In the final recommended algorithm, the coefficients for the intercept, MCS-12, PCS-12, and the MCS-12 · PCS-12 interaction were taken from the fully specified model results, and a second constant was then derived to substitute for information about sociodemographic variables that may not be available. This approach was taken to ensure that end-users with only MCS-12 and PCS-12 information could map EQ-5D index scores while ensuring that consistent estimates were obtained from the fully specified model.
Table 1 provides descriptive statistics for the MEPS 2000 and 2002 sample used for the final regressions. Sociodemographic characteristics, mean age, number of chronic conditions, EQ-5D index, and PCS-12 and MCS-12 scores are reported for the 2000 and 2002 MEPS sample. Mean EQ-5D index and PCS-12 scores decline by increasing age category. EQ-5D index, PCS-12, and MCS-12 scores appear lower for females than males, are lower for blacks and American Indians compared to whites, are higher for other races and Hispanics (except MCS-12) compared to whites and non-Hispanics, and are lower for lower levels of educational attainment and poverty status compared to higher levels. The mean NCC reported increases consistently by age category, is greater for females than males, and generally declines for higher levels of educational attainment and income.
The correlation of considered variables resulted in elimination of only 2 variables. Both PCS-12² and MCS-12² had extremely high correlation coefficients (0.99) with the PCS-12 and MCS-12, respectively, and were excluded from the model specification. All other variables had acceptable correlations and were included in the final model specification.
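The screening rule described above (omit variables whose pairwise correlation is ≥0.8) can be illustrated with a short sketch. The data values are invented for the example, the helper names are hypothetical, and applying the threshold to the absolute value of the Pearson correlation is an assumption rather than something the text specifies.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear(columns, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| at or above the threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged


# Invented scores, used only to show a score and its square moving together.
pcs = [30.0, 40.0, 50.0, 60.0]
mcs = [55.0, 35.0, 60.0, 45.0]
columns = {
    "PCS-12": pcs,
    "PCS-12 squared": [v * v for v in pcs],
    "MCS-12": mcs,
}
flagged = flag_collinear(columns)
print(flagged)
```

Over a limited positive range, a score and its square are nearly collinear, which is the pattern that led to dropping the squared terms.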
The homoskedasticity assumption (likelihood ratio test = 506; P < 0.001) and the normality assumption (Hausman test statistic = -2851; P < 0.001) were rejected for the classic Tobit model, suggesting that Tobit estimates are likely biased. As with many health status measures in population health surveys, it appears that the EQ-5D index is not normally distributed and exhibits a significant ceiling effect (46% reported EQ-5D index = 1.0). Given these factors, the CLAD estimates are theoretically the only consistent estimates of EQ-5D index scores.
The mean and total prediction error are compared across individuals (Table 2) to measure the relative empirical performance of CLAD, Tobit, and OLS in predicting actual EQ-5D index scores in the validation set. Surprisingly, in this experiment, OLS performed better than Tobit. The mean prediction error incorporates MEPS sampling weights and is nationally representative. The actual and predicted EQ-5D index scores are displayed graphically in Figure 1. Predicted scores <0.4 were extremely rare and are not shown. Of note is the fact that the minimum possible decrement from full health (1.0) is 0.14, resulting in no scores between 0.86 and 1.0 on the EQ-5D index.
To address a practical dilemma faced by end-users who do not have access to variables other than PCS-12 and MCS-12 scores, alternative prediction equations based only on PCS-12 and MCS-12 scores were developed. Previous algorithms address this need by providing model specifications restricted to PCS-12 and MCS-12 scores alone.12,13 In the current approach, it was considered important not to estimate a model restricted to only PCS-12 and MCS-12 scores because it may result in model misspecification. Hence, our approach uses the results of the full model specification to ensure accurate estimates of the impact of PCS-12 and MCS-12 scores. The first alternative is to use the prevalence estimates provided in Table 1 along with the coefficients for each respective variable (i.e., coefficient for near poor multiplied by the proportion of near poor in Table 1) from the full regression to estimate a “weighted average” composite effect of these variables. The “weighted average” (WA) approach results in a constant of -0.01647 for CLAD (CLAD WA), -0.00677 for OLS, and -0.03132 for Tobit in the final sample.
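The “weighted average” constant described above is a sum of coefficient-times-prevalence products over the non-SF-12 variables. The sketch below shows the arithmetic only; the variable names, coefficients, and prevalences are made-up placeholders and are not the values in Table 1 or Table 3.

```python
def weighted_average_constant(coefficients, prevalence):
    """Sum of coefficient * sample proportion over the non-SF-12 variables."""
    return sum(coefficients[name] * prevalence[name] for name in coefficients)


# Placeholder values for illustration only (not taken from the tables).
coefficients = {"near_poor": -0.010, "female": -0.005}
prevalence = {"near_poor": 0.20, "female": 0.50}

wa_constant = weighted_average_constant(coefficients, prevalence)
print(round(wa_constant, 4))
```

An end-user would add this single constant to the SF-12 part of the prediction in place of the individual sociodemographic terms.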
The second alternative was to take advantage of the specification of the full model and derive a constant that minimized the absolute prediction error. The full model specification can be characterized as follows:
$$\text{EQ-5D} = \alpha + \gamma_i + \eta_i + \varepsilon$$
where $\alpha$ = the intercept, $\gamma_i$ = the available SF-12 variables, $\eta_i$ = all other (non-SF-12) variables, and $\varepsilon$ = the error term.
Because the end-user has limited information about variables in $\eta_i$, we want to offer a constant that will minimize the absolute prediction error. This constant, $\hat{\eta}$, is the solution to the following optimization problem:
$$\hat{\eta} = \arg\min_{\eta} \sum \left| \text{EQ-5D} - (\alpha + \gamma_i + \eta) \right|$$
Note that $\alpha$ and $\gamma_i$ are not estimated in this equation. We already have consistent estimates of $\alpha$ and $\gamma_i$ from the results of the fully specified regression (and these coefficients were used in the minimization problem above). This method ensures that the regression coefficients from the full model specification are used. $\hat{\eta}$ is a constant representing the composite effect of all non-SF-12 variables. We estimated $\hat{\eta}$ to be -0.01067. Table 2 shows that both the CLAD WA and CLAD minimized prediction error (CLAD MPE) methods result in good empirical prediction, and CLAD MPE performs better than the fully specified OLS regression in the validation set.
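A constant that minimizes a sum of absolute errors is a (weighted) median of the residuals left after the fixed part of the prediction is subtracted, so the minimization described above can be sketched without a numerical optimizer. This is an illustration of that general property, not necessarily the procedure used in the study: the function names and data are hypothetical, and truncation of predictions at 1.0 is ignored for simplicity.

```python
def weighted_median(values, weights):
    """Smallest value at which cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def mpe_constant(actual, sf12_part, weights=None):
    """Constant minimizing sum of w * |actual - (sf12_part + constant)|."""
    if weights is None:
        weights = [1.0] * len(actual)
    residuals = [a - p for a, p in zip(actual, sf12_part)]
    return weighted_median(residuals, weights)


def total_abs_error(actual, sf12_part, constant):
    """Unweighted total absolute prediction error for a given constant."""
    return sum(abs(a - (p + constant)) for a, p in zip(actual, sf12_part))


# Hypothetical data: actual scores and the intercept-plus-SF-12 part of the fit.
actual = [0.80, 0.90, 1.00, 0.70, 0.85]
sf12_part = [0.82, 0.91, 1.03, 0.69, 0.86]
constant = mpe_constant(actual, sf12_part)
print(round(constant, 4))
```

Because the SF-12 coefficients are held fixed, only the single constant is chosen here, mirroring the idea that the intercept and SF-12 terms come from the fully specified regression.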
Regression results for the final model specification in the entire data set (MEPS 2000 and 2002) are shown in Table 3. CLAD, Tobit, and OLS regressions are estimated to compare results empirically. The statistical significance of the coefficients in Table 3 varies across methods for Hispanic, other race, income, and education. Pseudo-R² for OLS and Tobit were calculated as 1 - LLconstant/LLfull, whereas pseudo-R² for CLAD was calculated as 1 - (sum of raw deviations/sum of absolute deviations). LLconstant is the log likelihood of the regression of EQ-5D index scores on a constant, and LLfull is the log likelihood of the regression of EQ-5D index scores on the full regression model. The sum of the raw deviations is similar to the sum of the absolute deviations for the regression of EQ-5D index scores on a constant, whereas the sum of absolute deviations in the denominator is for the full model. Although all 3 measures of pseudo-R² are measuring something similar (the amount of variance explained by the full model compared to a model including no explanatory variables), they are not comparable across methods. CLAD is not based on maximum likelihood (ML) estimation, and hence a pseudo-R² based on log likelihood cannot be computed. Although OLS does not use ML, a log likelihood can be calculated and compared to Tobit. However, OLS ignores the fact that 46% of the EQ-5D index scores are clustered at 1.0 and therefore explains a much smaller amount of variance than Tobit (which estimates the variance of a latent variable with possible scores ≥ 1.0 and thus results in a much larger pseudo-R²). Given the lack of comparability of the pseudo-R² values, the comparison of prediction error (Table 2) may be a more appropriate measure of the relative performance of the 3 methods for the purpose of mapping closest to actual EQ-5D index scores.
The purpose of this research was to develop an algorithm to allow mapping of SF-12 scores to EQ-5D index scores based on a nationally representative sample. There are 2 published algorithms to map EQ-5D index scores from the SF-12 in nationally representative data sets.12,13 The current research differs in several important respects from the previous research. First, the mapping algorithm provided in this research is based on a nationally representative US sample and maps to the EQ-5D index values based on the general US population,16 providing the first mapping algorithm of preference-based EQ-5D index scores that meet the criteria of the Panel on Cost-Effectiveness in Health and Medicine (PCEHM) for reference-case cost-effectiveness analyses from the societal perspective in the United States. The US and UK community-based TTO scores have been shown to differ significantly, implying that EQ-5D index scores may also differ.15 Although very useful for understanding the relationship between the SF-12 health status instrument and the EQ-5D index, previous algorithms mapped to EQ-5D index values based on a community sample in the United Kingdom14 and do not meet the PCEHM criteria for the United States.1 Second, the current research takes a different analytical approach, addressing the characteristics of the data (i.e., 46% censoring, nonnormality, and heteroskedasticity) and comparing different econometric approaches.
This mapping algorithm may be useful to researchers who need preference-based HRQL scores when only SF-12 scores are available. In these cases, analysts could use the algorithm from this research to derive EQ-5D index scores from available values of the SF-12.
Some comments on the practical application of this algorithm may be useful to end-users. First, in most cases, only the MCS-12 and PCS-12 scores are available, and values for sociodemographic variables are missing. In these cases, it is recommended that the CLAD MPE model with the derived constant (-0.01067) be used. This method uses the following prediction equation: EQ-5D = 0.057867 + 0.010367·PCS-12 + 0.00822·MCS-12 - 0.000034·PCS-12·MCS-12 - 0.01067. When predicting EQ-5D index scores, it is important to note that predicted scores >1.0 are possible. Hence, analysts should truncate any predicted score greater than 1 at 1.0, as was done in the empirical comparison. It is important that end-users incorporate appropriate sensitivity analyses of point estimates mapped from SF-12 scores. More detailed information on the appropriate use of sensitivity analysis is provided elsewhere.33 One can use the standard error or 95% confidence intervals (CIs) of the mean predicted EQ-5D score to derive an estimate of the uncertainty in the point estimates. The mean, standard error, and 95% CI for the CLAD MPE-predicted EQ-5D score in the full MEPS sample were 0.8912, 0.0010636, and 0.8891-0.8933, respectively. However, an equation for calculating an adjusted standard error for mapping is provided by Franks and others:12
where var is the sample variance of the individual predictions in a sample of size N, and R² = 0.41.
Researchers can calculate EQ-5D index scores and adjusted standard errors by entering PCS-12 and MCS-12 scores and the sample standard deviation at the journal’s Web site, mdm.sagepub.com.
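For analysts who prefer to apply the recommended CLAD MPE prediction equation directly, a minimal sketch follows. The coefficients and the truncation at 1.0 are taken from the equation stated above; the function name and the example inputs are hypothetical, and the adjusted standard error is not computed here.

```python
def predict_eq5d_clad_mpe(pcs12, mcs12):
    """Map PCS-12 and MCS-12 scores to a predicted EQ-5D index score."""
    score = (
        0.057867
        + 0.010367 * pcs12
        + 0.00822 * mcs12
        - 0.000034 * pcs12 * mcs12
        - 0.01067  # derived constant for the non-SF-12 variables
    )
    return min(score, 1.0)  # truncate predictions above full health


# Hypothetical inputs: population-mean scores and very high scores.
at_norm = predict_eq5d_clad_mpe(50.0, 50.0)
at_high = predict_eq5d_clad_mpe(65.0, 65.0)
print(round(at_norm, 4), at_high)
```

With both summary scores at the population norm of 50, the predicted value is about 0.89, close to the mean predicted score reported for the full MEPS sample; the second call shows the truncation taking effect.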
Although it would have been feasible to develop a model based on all 12 items of the SF-12, it may not be practical for end-users (many of whom will only have data on the PCS-12 and MCS-12 scales). In addition, the model specification becomes unwieldy and difficult for applied use, as discussed by Franks and others.12
The current research compares the empirical performance of different analytic methods to estimate EQ-5D index scores in an independent, large, nationally representative validation set, which has implications for the broader econometric analysis of preference-based HRQL scores that tend to be clustered at 1.0. In this nationally representative data set, EQ-5D index scores were found to have a high degree of censoring (46%), with errors that were not normally distributed and exhibited heteroskedasticity. Theoretically, these properties suggest that CLAD is the only unbiased estimator of EQ-5D index scores.25 However, given the statistical properties of EQ-5D index scores in this data set, it is surprising how well OLS performed and how poorly Tobit performed. The mean prediction error with OLS was not significantly higher than with CLAD and was lower than with Tobit. The 95% CI of the prediction errors overlapped for all 3 analytic methods. Future research is needed to examine the most appropriate theoretical approach to estimating preference-based HRQL scores with these statistical properties, as well as the relative empirical performance of analytic methods.
The current research is not without limitations. There is a significant amount of variance that is unexplained by the SF-12 and sociodemographic variables in this analysis. This uncertainty should be incorporated in probabilistic sensitivity analyses in cost-effectiveness analyses using these estimates. In addition, the method of truncating predicted scores above 1.0 may improve the empirical performance of all 3 methods but likely improves the performance of CLAD and Tobit more than OLS. Unlike previous mapping models, the current model does not include squared PCS-12 and MCS-12 terms due to the lack of apparent nonlinearity and potential for multicollinearity. Predicted EQ-5D scores for the full CLAD model range from 0.33 to 1.04. This range covers approximately 98% of actual EQ-5D scores in MEPS. The range for the simplified CLAD model (CLAD MPE) was 0.42 to 1.02, which covers >96% of all actual EQ-5D scores. If using the minimum and maximum values of actual PCS-12 and MCS-12 scores, the CLAD MPE model-predicted EQ-5D scores range from 0.20 to >1.0. This range covers >99.9% of actual EQ-5D scores in MEPS and demonstrates the potential range of the model given actual SF-12 scores in MEPS. Nonetheless, it should be noted that the mapping algorithms provided in this research may not provide the full range of individual EQ-5D scores possible at the lower bounds.
Although the construct validity, reliability, and responsiveness of the EQ-5D have been documented extensively in both general and specific disease populations, the EQ-5D is not without limitations. The parsimony in the 5 items and 3 levels of the EQ-5D questionnaire translates into ease of administration and efficiency of preference assessment but also results in a potential lack of discrimination. As discussed, 46% of the nationally representative MEPS sample had an EQ-5D score of 1.0. In addition, from the full health state of 1.0, the minimum possible decrement in the EQ-5D index score is 0.14. (However, it should be noted that using the mapping algorithm provided in this article will provide predicted scores within this range.) Taking these limitations into account, there is no consensus that other theoretically based MAHSCS are superior to the EQ-5D.34-37 Each has its own advantages and limitations. The EQ-5D is currently the only instrument with a preference scoring algorithm based on the general US population that can be mapped from the SF-12 in a nationally representative data set.
The generalizability of the estimates provided in this research is limited to the MEPS sample of individuals with valid EQ-5D index, MCS-12, and PCS-12 scores. Individuals with complete responses for all 3 scores are likely to have slightly better health than others. MEPS is also restricted to the noninstitutionalized population. Nonetheless, it is rare to find both instruments in a nationally representative population, and the sample used in this research is likely as generalizable to the nation as is available.
In a recent examination of cost-utility analyses, 77% did not incorporate community-based preferences, and 33% used arbitrary expert or author judgment.38 Although not perfect, the mapping algorithm provided in this research may provide analysts an avenue to obtain preference-based HRQL scores appropriate for reference-case cost-effectiveness analyses in the United States when only SF-12 scores are available.
This research was funded by grant 1R03AG027348-01. The authors would like to thank 2 anonymous reviewers for helpful comments.