A number of limitations must be considered in interpreting these results. First, only a restricted set of common conditions was included in the analysis, and some were pooled to form larger disorder groups. A number of burdensome conditions, such as dementia and psychosis, were not included. Expansion and disaggregation are clearly needed in future research. Second, diagnoses of chronic physical conditions were based on self-reports that could have been biased. Such bias might account for the generally higher prevalence estimates of these conditions in developed than in developing countries. Third, we focused on 12-month prevalence of conditions but 30-day health valuations, as these were the time frames included in the WMH surveys. This difference in recall periods would be expected to lead to an under-estimate of the severity of the active phases of episodic conditions (e.g., migraine), although it should yield an accurate estimate of the average severity of conditions in a typical month (30-day) of the year (12-month). A related limitation is that even a 12-month time frame is relatively short compared to the time frames used in some other health valuation studies (e.g., 10 years or lifetime).
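The recall-period point above can be illustrated with simple arithmetic. The sketch below is not part of the WMH analysis; the function name and all numbers are hypothetical, and it assumes an episodic condition that lowers the VAS score by a fixed amount only in months when it is active.

```python
def average_monthly_decrement(active_decrement, active_months, window_months=12):
    """Expected VAS decrement in a randomly chosen month of the window.

    Assumes (hypothetically) that the condition lowers the score by
    `active_decrement` points in active months and by 0 otherwise.
    """
    if not 0 <= active_months <= window_months:
        raise ValueError("active_months must lie between 0 and window_months")
    return active_decrement * active_months / window_months


# Hypothetical episodic condition: 20-point decrement while active,
# active in 3 of the past 12 months.
diluted = average_monthly_decrement(20.0, 3)
print(diluted)  # 5.0
```

Under these assumed numbers, a 30-day valuation averaged over all 12-month cases reflects the typical month (5 points) rather than the active phase (20 points).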
Another limitation is that the highly skewed distribution of VAS scores and non-additive effects of comorbid conditions might have led to instability of results. Even though we explored use of GLM rather than OLS and examined a number of different model specifications to capture effects of comorbidity, it is possible that future research will discover better specifications either of functional form or of joint associations of comorbid conditions with health valuations. In particular, the use of data mining techniques such as regression tree analysis (Breiman, 2001; Breiman et al. 1984; Friedman, 1991) might provide useful insights into better specification of interaction effects. A related limitation is that we assumed that the VAS is an interval scale. As noted above in the section on analysis methods, this assumption has been called into question in some previous studies (Krabbe et al. 2006; Parkin & Devlin, 2006). Nonlinear monotonic transformations have been proposed to approximate interval scale properties (Krabbe, 2008; Craig et al. 2009). It would be very useful in future methodological research to explore the extent to which these different methods influence results.
Another limitation is that our estimates were based only on the overall adult population in developed and developing countries. The ratings of conditions might be quite different in different population segments (e.g., elderly, women, poor) or in different countries. Future research is needed to investigate these specifications. The use of anchoring vignettes has been shown to help address this problem (Salomon et al. 2004). In addition, a number of statistical methods exist to improve the accuracy of comparisons across sub-samples and populations that could profitably be used in future applications (Tandon et al. 2002).
Another limitation is that our results are based on VAS scores assigned by respondents to their own health states rather than to health states based on hypothetical vignettes. While there is general agreement that perceptions of people in the general population should be taken into consideration in making health valuations (Gudex et al. 1996), concerns have been raised that bias exists in the perceptual ratings of community respondents based on their own illness experiences (Stiggelbout & de Vogel-Voogt, 2008) and their familiarity with the experiences of people close to them (Krabbe et al. 2006), resulting in a general preference for health valuations made by experts (Marquie et al. 2003). Furthermore, bias in self-reports in the WMH data might have been greater for mental than physical conditions because so many questions were asked in the survey about mental conditions and the VAS was administered only at the end of the survey. It would be useful to investigate this potential bias in future applications by randomizing the order of presentation of the VAS question in the survey. Methods have been developed to integrate VAS responses with responses based on other valuation methods (e.g., time trade-off, willingness to pay) that might also profitably be used in future studies to evaluate these biases (Salomon & Murray, 2004).
A less obvious limitation, finally, is that the simulation method evaluated marginal effects of individual conditions. This method can be faulted because it implicitly assumes that the presence vs. absence of a single condition can be changed while holding constant all other conditions. This assumption would be plausible if all comorbid conditions were either causes or risk markers (Kraemer et al. 1997) of focal conditions. However, in cases where the comorbid condition is a consequence of the focal condition or where two or more conditions are reciprocally related, the simulation method used here will under-estimate the effect of the focal condition (assuming that comorbidity is positive) by controlling for one or more of the intervening pathways through which that condition influences VAS scores.
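The under-estimation described above can be seen in a toy additive model. This is only an illustrative sketch with hypothetical numbers, not the simulation method used in the paper: a focal condition A lowers the VAS score directly and also raises the probability of a comorbid condition B, which lowers the score further.

```python
def expected_vas(a, b, base=100.0, effect_a=-10.0, effect_b=-10.0):
    """Expected VAS under a hypothetical additive model (a, b are 0 or 1)."""
    return base + effect_a * a + effect_b * b


def mean_vas_given_a(a, p_b):
    """Mean VAS for a given A status when B occurs with probability p_b."""
    return p_b * expected_vas(a, 1) + (1.0 - p_b) * expected_vas(a, 0)


def total_effect(p_b_given_a1, p_b_given_a0):
    """Effect of A when B is allowed to follow A (B is a consequence of A)."""
    return mean_vas_given_a(1, p_b_given_a1) - mean_vas_given_a(0, p_b_given_a0)


def controlled_effect(b):
    """Effect of A with B held fixed, i.e. controlling for the mediator."""
    return expected_vas(1, b) - expected_vas(0, b)


# Hypothetical prevalences of B: 50% among people with A, 10% among those without.
total = total_effect(0.5, 0.1)
held_constant = controlled_effect(0)
print(round(total, 6), round(held_constant, 6))  # -14.0 -10.0
```

With these assumed values, holding B constant recovers only the direct effect of A (10 points) and misses the part of the total effect (14 points) that operates through B.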
This under-estimation could be removed by deleting controls for all conditions that are thought to mediate the total effect of the focal condition. However, in the case where these comorbid conditions are reciprocally related to the focal condition, exclusion of the comorbid conditions from the prediction equation will lead to over-estimation of the effect of the focal condition. The only plausible way to address that issue is to develop a methodology of partial control: that is, to control for the subset of comorbid conditions that have causal effects on the focal conditions but not for the subset that occur as a consequence of the focal condition. An innovative methodology known as g-estimation has been developed to do this (Young et al. 2010), but this method requires access to large-scale longitudinal epidemiological data that monitor onset and course of comorbid conditions over time. As a result of this data requirement, g-estimation has been used only minimally (Taubman et al. 2009) and has never, to our knowledge, been used to study health valuation. This method is nonetheless very promising and deserves to be explored in future studies aimed at sorting out the effects of comorbidity on health valuation.
Within the context of these limitations, our results show clearly that sensible estimates can be obtained of condition-specific effects on VAS while taking comorbidity into consideration. As noted in the introduction, a similar approach could be used to study informant ratings by using a series of hypothetical vignettes of people with comorbid conditions rather than pure conditions. We find that the consideration of comorbidity makes a substantial difference to ratings. In particular, condition-specific ratings are lower when comorbidity is taken into consideration due to a general pattern of sub-additive interactions among comorbid conditions in predicting VAS scores. This sub-additive pattern is consistent with the findings of the one previous study we know of that carried out a similar type of analysis (Verbrugge et al. 1989). Furthermore, we found substantial between-condition variation in the extent to which adjustment for comorbidity influences estimates.
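A small numerical sketch may help show why sub-additive interactions lower condition-specific estimates. The function and all numbers below are hypothetical and are not taken from the WMH data; the sketch assumes two conditions whose joint VAS decrement is smaller than the sum of their separate decrements.

```python
def marginal_decrement(d_focal, d_other, d_joint, p_other_given_focal):
    """Average decrement attributable to the focal condition.

    d_focal: decrement when the focal condition occurs alone
    d_other: decrement when the comorbid condition occurs alone
    d_joint: decrement when both occur (sub-additive if < d_focal + d_other)
    p_other_given_focal: share of focal cases that also have the other condition
    """
    alone = d_focal
    # Among comorbid cases, the focal condition adds only what the joint
    # decrement exceeds the other condition's own decrement by.
    with_other = d_joint - d_other
    return (1.0 - p_other_given_focal) * alone + p_other_given_focal * with_other


# Hypothetical values: 10 and 8 points separately, 14 points jointly (< 18),
# with half of the focal cases comorbid.
adjusted = marginal_decrement(10.0, 8.0, 14.0, 0.5)
print(adjusted)  # 8.0
```

Under these assumptions the comorbidity-adjusted estimate (8 points) is lower than the pure-condition estimate (10 points), and the size of the gap depends on how common and how sub-additive the comorbidity is, which is one way between-condition variation in the adjustment could arise.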
Although the substantive findings regarding effects of individual conditions on VAS should be interpreted with caution given the limitations enumerated above, it is noteworthy that neurological conditions, insomnia, and major depression were estimated to be the most severe conditions at the individual level. The neurological conditions we considered included epilepsy and seizure disorders, Parkinson’s disease, and multiple sclerosis, all of which have been shown to have high disability in previous studies (Jacoby & Baker, 2008; Singer et al. 1999). The high ranking of insomnia is surprising because previous studies, although documenting a high societal-level burden of insomnia, have generally found this to be due to high prevalence in conjunction with moderate individual-level burden rather than to high individual-level burden (Roth et al. 2006). The high individual-level severity of insomnia in our study probably reflects the fact that we required a greater sleep disruption (at least two hours of either delay in sleep onset or disruption in sleep maintenance per night most nights of the week for at least one month in the past year) than previous studies of insomnia (Ohayon, 2002). The high individual-level estimate we found for depression, finally, is consistent with much previous research (Donohue & Pincus, 2007; Gabilondo et al. 2009; Wang et al. 2008).
The rank-ordering of the individual-level VAS estimates was found to be quite similar in developing and developed countries. However, several exceptions were found that should be investigated in future studies. Digestive conditions (stomach/intestine ulcer and irritable bowel disorder) were rated considerably more severe in developed than in developing countries, possibly reflecting a different mix of cases. The individual-level estimated severity of drug abuse, in comparison, was substantially higher in developing than in developed countries. Differential willingness to admit drug problems might have been involved in this result, as reported prevalence of drug abuse was much lower in developing than in developed countries, possibly indicating that the cases we learned of in developing countries were more severe than those in developed countries (Schmidt & Room, 1999).
Comparison of our individual-level condition severity estimates with estimates in an earlier WMH analysis of condition-specific role impairment (Ormel et al. 2008) finds that the conditions rated most severe in that earlier study were generally also rated among the most severe in the current investigation. However, a number of differences in relative ratings exist that could be attributed either to differences in the outcome (i.e., a global VAS score versus a measure of condition-specific role impairment) or to the fact that our previous analysis did not adjust for comorbidity.
Our results regarding societal-level associations are less innovative because, consistent with previous studies, we merely multiplied the prevalence estimates of the conditions by the individual-level estimates of condition severity to arrive at societal-level estimates of burden. As in previous studies that compared individual-level and societal-level estimates (Andlin-Sobocki et al. 2005; Saarni et al. 2007; Whiteford, 2000), the rank-ordering of conditions differs considerably between the two, with societal-level estimates influenced importantly by variation in prevalence and the conditions estimated to be most burdensome at the societal level dominated by high-prevalence conditions.
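The multiplication described above, and the way it can reorder conditions, can be sketched as follows. The condition names, prevalences, and severity values are hypothetical placeholders, not estimates from the WMH surveys.

```python
# Hypothetical inputs: condition -> (prevalence as a proportion,
# individual-level VAS decrement in points).
conditions = {
    "condition_a": (0.02, 20.0),  # rare but severe
    "condition_b": (0.15, 6.0),   # common but mild
    "condition_c": (0.08, 9.0),   # intermediate
}


def societal_burden(prevalence, severity):
    """Societal-level burden as prevalence times individual-level severity."""
    return prevalence * severity


def rank_desc(scores):
    """Return condition names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


individual = {name: sev for name, (prev, sev) in conditions.items()}
societal = {name: societal_burden(prev, sev) for name, (prev, sev) in conditions.items()}

print(rank_desc(individual))  # ['condition_a', 'condition_c', 'condition_b']
print(rank_desc(societal))    # ['condition_b', 'condition_c', 'condition_a']
```

In this made-up example the rare, severe condition ranks first at the individual level but last at the societal level, where the high-prevalence condition dominates.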
While our results argue clearly for the importance of considering comorbidity when estimating disease burden, the best way to do this is not obvious. The approach we took here has the advantage of considering comorbidities in their true distribution in the population rather than requiring hypothetical scenarios to be generated that might or might not adequately characterize the actual distribution of complex comorbidities in the population. However, methods also exist to allow the effects of individual conditions to be estimated using expert ratings of hypothetical patient scenarios that include information about complex profiles of comorbidity (Jasso, 2006; Saarni et al. 2007). Indeed, the actual distributions of comorbidity found in community surveys like the WMH surveys could be used to generate these vignettes so as to guarantee that they represent the distribution and range of patterns in the population. As many health policy researchers favor condition severity ratings made by experts rather than the ratings made by respondents in community surveys for a variety of other reasons (Insinga & Fryback, 2003; Marquie et al. 2003; Ormel et al. 2008; Schnadig et al. 2008), it might be that the best approach would be to build information about comorbidity into conventional expert rating scenarios. However, valuations of the sort presented here based on community samples also would seem to have value in representing the perceptions of actual people with real conditions in the population. It remains a challenge for the field to develop a way of integrating data of these different sorts.