provides descriptive statistics for the MEPS 2000 and 2002 sample used for the final regressions. Sociodemographic characteristics, mean age, number of chronic conditions (NCC), EQ-5D index, and PCS-12 and MCS-12 scores are reported for the 2000 and 2002 MEPS sample. Mean EQ-5D index and PCS-12 scores decline with increasing age category. EQ-5D index, PCS-12, and MCS-12 scores appear lower for females than for males, lower for blacks and American Indians than for whites, higher for other races and Hispanics (except MCS-12) than for whites and non-Hispanics, and lower at lower levels of educational attainment and poverty status than at higher levels. The mean NCC increases consistently with age category, is greater for females than for males, and generally declines at higher levels of educational attainment and income.
Sociodemographic Characteristics, Mean Age, Chronic Conditions, EQ-5D Index, and PCS-12 and MCS-12 Scores of US Adults in the Final 2000 and 2002 MEPS Sample
Correlation analysis of the candidate variables led to the elimination of only 2 variables. The squared terms PCS-12² and MCS-12² had extremely high correlation coefficients (0.99) with PCS-12 and MCS-12, respectively, and were excluded from the model specification. All other variables had acceptable correlations and were included in the final model specification.
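The screening step can be illustrated with a minimal sketch. The text does not state the cutoff or the order in which variables were screened, so the threshold of 0.95, the greedy keep-first rule, and the names `pearson` and `screen_collinear` are hypothetical. The example shows why a score far from zero (such as PCS-12) and its square are nearly collinear.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_collinear(columns, threshold=0.95):
    """Greedy screen (hypothetical rule): walk variables in insertion order and
    drop a variable if |r| with any already-kept variable exceeds `threshold`."""
    kept, dropped = [], []
    for name, values in columns.items():
        if any(abs(pearson(values, columns[k])) > threshold for k in kept):
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped
```

With score-like values between 10 and 70, a column of squares is flagged while an unrelated covariate is retained. Because the squared term adds almost no information beyond the linear term, it is the one dropped.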
The homoskedasticity assumption (likelihood ratio test = 506; P < 0.001) and the normality assumption (Hausman test statistic = -2851; P < 0.001) were rejected for the classic Tobit model, suggesting that the Tobit estimates are likely biased. As with many health status measures in population health surveys, the EQ-5D index does not appear to be normally distributed and exhibits a significant ceiling effect (46% reported an EQ-5D index of 1.0). Given these factors, of the 3 estimators considered, only CLAD is theoretically expected to yield consistent estimates of EQ-5D index scores.
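As a minimal sketch (not the authors' code), CLAD for an outcome censored from above at 1.0 can be computed with the iterative procedure attributed to Buchinsky: fit a median (LAD) regression, drop observations whose fitted value exceeds the censoring point, refit on the rest, and repeat until the retained set stops changing. The LAD step below is approximated by iteratively reweighted least squares (IRLS), a swapped-in technique rather than the linear-programming solvers statistical packages typically use. NumPy is assumed, and `lad_irls`, `clad_upper`, and `predict_clad` are hypothetical names.

```python
import numpy as np


def lad_irls(X, y, n_iter=500, eps=1e-6, tol=1e-10):
    """Approximate least absolute deviations (median) regression via IRLS."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting values
    for _ in range(n_iter):
        resid = y - X @ beta
        # LAD weights 1/|r|, floored at eps so near-zero residuals stay finite.
        w = 1.0 / np.maximum(np.abs(resid), eps)
        Xw = X * w[:, None]
        new_beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # (X'WX) b = X'Wy
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


def clad_upper(X, y, cap=1.0, max_outer=100):
    """CLAD for outcomes censored from above at `cap`: refit LAD on the
    observations whose fitted value from the current fit is <= cap."""
    keep = np.ones(len(y), dtype=bool)
    beta = lad_irls(X, y)
    for _ in range(max_outer):
        new_keep = X @ beta <= cap  # evaluated on the full sample
        if np.array_equal(new_keep, keep):
            break  # retained set is stable
        keep = new_keep
        beta = lad_irls(X[keep], y[keep])
    return beta


def predict_clad(X, beta, cap=1.0):
    """CLAD predictions are the linear index censored at the ceiling."""
    return np.minimum(X @ beta, cap)
```

The design relies on the fact that the conditional median of min(y*, 1.0) equals the linear index wherever that index is at most 1.0, so LAD on the retained observations needs no normality or homoskedasticity assumption, which is why CLAD remains consistent where Tobit does not.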
The mean and total prediction errors are compared across individuals () to measure the relative empirical performance of CLAD, Tobit, and OLS in predicting actual EQ-5D index scores in the validation set. Surprisingly, OLS performed better than Tobit in this experiment. The mean prediction error incorporates MEPS sampling weights and is nationally representative. The actual and predicted EQ-5D index scores are displayed graphically in . Predicted scores below 0.4 were extremely rare and are not shown. Note that the minimum possible decrement from full health (1.0) is 0.14, so no EQ-5D index scores fall between 0.86 and 1.0.
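The comparison can be sketched as follows, under two assumptions the text does not confirm: that "prediction error" means the absolute difference between actual and predicted EQ-5D index scores, and that the total is an unweighted sum over validation-set individuals while the mean uses the MEPS person-level sampling weights. The function name `prediction_errors` is hypothetical.

```python
def prediction_errors(actual, predicted, weights):
    """Return (weighted mean absolute error, unweighted total absolute error).

    Assumes prediction error = |actual - predicted| and that `weights` are
    survey sampling weights (e.g., MEPS person weights).
    """
    if not (len(actual) == len(predicted) == len(weights)):
        raise ValueError("actual, predicted and weights must have equal length")
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    weighted_mean = sum(w * e for w, e in zip(weights, abs_err)) / sum(weights)
    return weighted_mean, sum(abs_err)
```

Calling this once per estimator (OLS, Tobit, CLAD) on the same validation set and weights gives directly comparable error summaries, which is the role the prediction-error comparison plays in the text.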
Comparison of Mean Prediction Error for OLS, Tobit, and CLAD Estimators and Previous Prediction Algorithms in the Validation Set (MEPS 2000)
Actual, ordinary least squares (OLS), censored least absolute deviations (CLAD), and Tobit predicted EQ-5D index scores in the 2000 and 2002 Medical Expenditure Panel Survey (MEPS) sample.
To address a practical dilemma faced by end users who have no access to variables other than PCS-12 and MCS-12 scores, alternative prediction equations based only on PCS-12 and MCS-12 scores were developed. Previous algorithms addressed this need by providing model specifications restricted to PCS-12 and MCS-12 scores alone.12,13
In the current approach, we considered it important not to estimate a model restricted to PCS-12 and MCS-12 scores alone, because doing so may result in model misspecification: the PCS-12 and MCS-12 coefficients would absorb the effects of omitted covariates correlated with them. Hence, our approach uses the results of the full model specification to obtain accurate estimates of the effects of PCS-12 and MCS-12 scores. The first alternative is to use the prevalence estimates provided in along with the coefficient for each respective variable from the full regression (e.g., the coefficient for near poor multiplied by the proportion of near poor in ) to estimate a “weighted average” composite effect of these variables. This weighted-average (WA) approach yields a constant of -0.01647 for CLAD (CLAD WA), -0.00677 for OLS, and -0.03132 for Tobit in the final sample.
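The weighted-average constant is a sum of coefficient × prevalence products over the non-SF-12 covariates. A minimal sketch, assuming each covariate enters linearly and that its sample mean (the prevalence, for an indicator such as near poor) is available; the function name and the numbers in the test are hypothetical, not the paper's estimates.

```python
def weighted_average_constant(coefs, means):
    """Composite effect of non-SF-12 covariates evaluated at their sample
    means: sum of coefficient * mean (mean = prevalence for indicators)."""
    missing = set(coefs) - set(means)
    if missing:
        raise KeyError(f"no sample mean for: {sorted(missing)}")
    return sum(coefs[name] * means[name] for name in coefs)
```

An end user would then predict EQ-5D from the intercept, the PCS-12 and MCS-12 terms of the full model, and this single constant, so the PCS-12 and MCS-12 coefficients keep their full-model values.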
The second alternative was to take advantage of the specification of the full model and derive a constant that minimized the absolute prediction error. The full model specification can be characterized as follows:
where α is the intercept, γ_i denotes the available SF-12 variables, η_i denotes all other (non-SF-12) variables, and ε is the error term.
Because the end user has limited information about the variables in η_i, we want to offer a constant that minimizes the absolute prediction error.
This constant is the solution to the following optimization problem:
Note that α and γ_i are not estimated in this equation. Consistent estimates of α and γ_i are already available from the fully specified regression (and these coefficients were used in the minimization problem above). This method ensures that the regression coefficients from the full model specification are used.
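One way to write the full model and the minimization problem is sketched below in hypothetical notation: β_i and δ_j are the coefficients on the SF-12 variables γ_i and the non-SF-12 variables η_j, n = 1, …, N indexes individuals, and c is the constant that stands in for the non-SF-12 terms. Hats mark values carried over from the fully specified regression.

```latex
\text{EQ-5D}_n = \alpha + \sum_i \beta_i \gamma_{in} + \sum_j \delta_j \eta_{jn} + \varepsilon_n

\hat{c} = \arg\min_{c} \sum_{n=1}^{N}
  \Bigl| \text{EQ-5D}_n - \hat{\alpha} - \sum_i \hat{\beta}_i \gamma_{in} - c \Bigr|
```

Because a sum of absolute deviations around a single constant is minimized by the sample median, ĉ is the median of the partial residuals EQ-5D_n − α̂ − Σ_i β̂_i γ_in, provided the predictions are not also capped at 1.0 (an assumption of this sketch).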
The resulting constant represents the composite effect of all non-SF-12 variables; we estimated it to be -0.01067. shows that both the CLAD WA and CLAD minimized prediction error (CLAD MPE) methods result in good empirical prediction, and CLAD MPE performs better than the fully specified OLS regression in the validation set.
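Under the uncapped-prediction assumption, the MPE constant can be computed as a (weighted) median of partial residuals. In this minimal sketch, `actual` holds observed EQ-5D index scores, `sf12_pred` holds α̂ plus the PCS-12 and MCS-12 terms from the full CLAD fit, and `weights` are optional; all function names are hypothetical, and the optional weighting is an assumption rather than something the text states.

```python
def weighted_median(values, weights):
    """Lower weighted median: smallest v whose cumulative weight reaches
    half the total weight. It minimizes the weighted sum of |v_k - c|."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= half:
            return v
    return pairs[-1][0]


def abs_error_sum(actual, sf12_pred, c, weights):
    """Weighted sum of |actual - (sf12_pred + c)|, the objective being minimized."""
    return sum(w * abs(a - p - c) for a, p, w in zip(actual, sf12_pred, weights))


def mpe_constant(actual, sf12_pred, weights=None):
    """Constant c minimizing abs_error_sum: the weighted median of the
    partial residuals actual - sf12_pred."""
    if weights is None:
        weights = [1.0] * len(actual)
    partial_resid = [a - p for a, p in zip(actual, sf12_pred)]
    return weighted_median(partial_resid, weights)
```

If the minimization was instead carried out on predictions capped at 1.0, the median shortcut no longer applies and a one-dimensional search over c using `abs_error_sum` with capped predictions would be needed.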
Regression results for the final model specification in the entire data set (MEPS 2000 and 2002) are shown in . CLAD, Tobit, and OLS regressions are estimated to compare results empirically. The statistical significance of the coefficients in varies across methods for Hispanic ethnicity, other race, income, and education. Pseudo-R² for OLS and Tobit was calculated as 1 - LLconstant/LLfull, whereas pseudo-R² for CLAD was calculated as 1 - (sum of raw deviations/sum of absolute deviations). LLconstant is the log likelihood of the regression of EQ-5D index scores on a constant, and LLfull is the log likelihood of the regression of EQ-5D index scores on the full regression model. The sum of the raw deviations corresponds to the sum of the absolute deviations for the regression of EQ-5D index scores on a constant only, whereas the sum of absolute deviations in the denominator is for the full model. Although all 3 pseudo-R² measures capture something similar (the amount of variance explained by the full model relative to a model with no explanatory variables), they are not comparable across methods. CLAD is not based on maximum likelihood (ML) estimation, so a pseudo-R² based on the log likelihood cannot be computed for it. Although OLS does not use ML, a log likelihood can be calculated for it and compared with that of Tobit. However, OLS does not account for the 46% of EQ-5D index scores that are clustered at 1.0 and therefore explains a much smaller amount of variance than Tobit, which models the variance of a latent variable that can take values above 1.0 and thus yields a much larger pseudo-R². Given the lack of comparability of the pseudo-R² values, the comparison of prediction error () may be a more appropriate measure of the relative performance of the 3 methods for the purpose of mapping closest to actual EQ-5D index scores.
Final Model Specification Results in the 2000 and 2002 MEPS Sample