In this study we performed an international comparison of HRQoL-based health expectancy. We found that QALE at age 20 ranged from 33 years in Armenia to almost 61 years in Japan. Within this set of countries, female QALE was generally higher than male QALE. In terms of QALE, Hungary and Slovenia performed better than Armenia but worse than the other countries. The relatively low health expectancy in a country such as Armenia may be expected given its lower levels of health spending and national income and its different socioeconomic circumstances. The United States performed worse in terms of QALE than the other Western high-income countries in the dataset. Many studies have found such unfavorable health outcomes in the US, and several explanations have been proposed, such as an inefficient health care system, substantial disparities in access to health care, and behavioral factors such as unhealthy diets [44].
In the final part of the analysis, we decomposed the differences in QALE using counterfactual scenarios. The relative contribution of mortality, health states, and health-state values differed among countries. For example, the high QALE of Japanese males was largely a result of a low prevalence of problems in the EQ-5D domains. In contrast, the high QALE of Spanish females was largely explained by low mortality rates. Interestingly, in various cases the EQ-5D profiles contributed more to differences in QALE than mortality did. Lower mortality generally went hand in hand with better HRQoL, although there were exceptions. For example, Dutch females had a lower life expectancy than Spanish females, yet they experienced fewer problems in the EQ-5D domains. As a result, the difference in HRQoL-based health expectancy between these two countries was smaller than the difference in life expectancy. The decomposition also confirmed that international comparisons of health expectancy based on country-specific values are substantially influenced by differences in value sets.
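To make the counterfactual logic concrete, the sketch below computes a Sullivan-type QALE (life-table person-years weighted by mean HRQoL, divided by survivors at the start age) and then swaps one component at a time (mortality, health-state prevalence, or value set) from a second population into the first. It is a minimal illustration under stated assumptions, not the decomposition code used in this study: the `Population` structure, the two age bands, the three health states, and all numbers are hypothetical, and one-at-a-time substitution is only one possible scheme.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Population:
    """Hypothetical inputs for one country-gender stratum."""
    person_years: tuple[float, ...]             # life-table person-years per age band, from age 20
    survivors_at_start: float                   # life-table survivors at age 20
    prevalence: tuple[tuple[float, ...], ...]   # per age band: share of people in each health state
    values: tuple[float, ...]                   # value set: HRQoL score of each health state


def qale(p: Population) -> float:
    """Sullivan-type QALE: person-years weighted by mean HRQoL, per survivor."""
    total = 0.0
    for years, shares in zip(p.person_years, p.prevalence):
        mean_hrqol = sum(share * value for share, value in zip(shares, p.values))
        total += years * mean_hrqol
    return total / p.survivors_at_start


def decompose(a: Population, b: Population) -> dict[str, float]:
    """Change in A's QALE when a single component is taken from B."""
    base = qale(a)
    scenarios = {
        "mortality": replace(a, person_years=b.person_years,
                             survivors_at_start=b.survivors_at_start),
        "health_states": replace(a, prevalence=b.prevalence),
        "values": replace(a, values=b.values),
    }
    return {name: qale(s) - base for name, s in scenarios.items()}


# Hypothetical example: two age bands (20-59, 60+), three health states.
a = Population((3900.0, 1900.0), 100.0,
               ((0.80, 0.15, 0.05), (0.50, 0.35, 0.15)), (1.0, 0.75, 0.35))
b = Population((3950.0, 2100.0), 100.0,
               ((0.85, 0.12, 0.03), (0.55, 0.33, 0.12)), (1.0, 0.80, 0.40))
print(round(qale(a), 2), round(qale(b), 2))
print({k: round(v, 2) for k, v in decompose(a, b).items()})
```

Because the components interact, the three one-at-a-time effects need not add up exactly to `qale(b) - qale(a)`; sequential or averaged substitution orders are common alternatives.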
Differences in health expectancy across countries may stem from various factors, including methodological issues and cultural differences. Of the three main SMPH elements (mortality, nonfatal health outcomes, and valuation), we first focus on the value sets. A remarkable result was the difference in QALE across the six TTO value sets: the German value set generated QALE up to seven years higher than the UK value set. The ranking of countries varied less across value sets, particularly among the high-performing and low-performing countries. We did, however, find rank switches in the group of average performers. This may be expected, because the differences in QALE were relatively small in this middle group, with many overlapping confidence intervals (see Figure ). The ranking of these country-gender strata is therefore particularly sensitive to the choice of value set: around 50% of the country-gender strata shifted two or more rank positions across value sets. Interestingly, the relative change in QALE associated with the value-set choice differed across countries. The impact was greatest in low-performing countries such as Armenia, Hungary, and Slovenia. We also found that a country's ranking did not consistently improve when its own values were used. For example, Germany did not reach a higher rank with the German value set than with the Japanese value set.
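The rank-change statistic can be illustrated with a small sketch. It ranks strata separately under each value set (1 = highest QALE), takes each stratum's range of ranks, and reports the share whose range spans at least two positions. The stratum labels, value-set labels, and QALE numbers are hypothetical placeholders, and breaking ties by input order is an assumption rather than the rule used in the study.

```python
def ranks_by_value_set(qale: dict[str, dict[str, float]]) -> dict[str, dict[str, int]]:
    """Rank strata (1 = highest QALE) separately under each value set."""
    value_sets = list(next(iter(qale.values())))
    ranks: dict[str, dict[str, int]] = {stratum: {} for stratum in qale}
    for vs in value_sets:
        # sorted() is stable, so ties keep their input order
        ordered = sorted(qale, key=lambda stratum: qale[stratum][vs], reverse=True)
        for position, stratum in enumerate(ordered, start=1):
            ranks[stratum][vs] = position
    return ranks


def share_with_rank_change(qale: dict[str, dict[str, float]], min_shift: int = 2) -> float:
    """Share of strata whose best and worst rank differ by at least min_shift."""
    ranks = ranks_by_value_set(qale)
    shifted = sum(1 for r in ranks.values() if max(r.values()) - min(r.values()) >= min_shift)
    return shifted / len(ranks)


# Hypothetical QALE (years) for four strata under three value sets.
example = {
    "A": {"vs1": 60.0, "vs2": 58.0, "vs3": 59.0},
    "B": {"vs1": 55.0, "vs2": 57.0, "vs3": 52.0},
    "C": {"vs1": 54.0, "vs2": 53.0, "vs3": 56.0},
    "D": {"vs1": 53.0, "vs2": 58.5, "vs3": 51.0},
}
print(ranks_by_value_set(example))
print(share_with_rank_change(example))
```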
In the literature, variation in health valuation has largely been explained by methodological differences across valuation studies and by differences in wealth and education among populations [27]. In our case, the available value sets represented the preferences of Western countries with similar levels of education and wealth. Although we cannot exclude that methodological differences played a role, we argue that they cannot fully explain the variation that was found (see also [46]). All studies were conducted using face-to-face interviews, applied the TTO technique to elicit values, and included nationally representative samples. To estimate the valuation function, they used similarly specified least squares regression models relating the TTO outcome to the EQ-5D domain levels, and they accounted for within-individual error correlation [46]. The main difference was the US model, which specified the N2 and N3 interaction terms and the marginal HRQoL effects differently. The US value set allowed the marginal reduction in HRQoL to decrease as the number of domains with any problems, or with extreme problems, increased. Still, the extent to which the US valuation function generated different HRQoL scores depended not only on the interaction terms and marginal effects but also on the values attached to the individual domains and levels. Additional file 3 shows, for each value set, the HRQoL score associated with selected health states to illustrate these differences.
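The model structure described above can be sketched as an additive valuation function for EQ-5D-3L states: a constant applied once any problem is present, a decrement for each domain at level 2 or 3, an N3 term applied once any domain is at level 3, and an optional offset that shrinks the loss for each additional problem domain, loosely mimicking a diminishing marginal effect. This is a hypothetical illustration, not any published value set or the exact US specification: the function name, the `d1` offset, and every coefficient are placeholders chosen only to show how the terms combine.

```python
DOMAINS = ("MO", "SC", "UA", "PD", "AD")  # mobility, self-care, usual activities, pain/discomfort, anxiety/depression


def eq5d_value(state: str, decrements: dict[tuple[str, int], float],
               constant: float, n3: float, d1: float = 0.0) -> float:
    """Hypothetical additive valuation of an EQ-5D-3L state such as '21232'."""
    levels = [int(ch) for ch in state]
    if len(levels) != len(DOMAINS) or any(level not in (1, 2, 3) for level in levels):
        raise ValueError(f"not an EQ-5D-3L state: {state!r}")
    if all(level == 1 for level in levels):
        return 1.0  # full health
    loss = constant  # applied once when any domain has a problem
    for domain, level in zip(DOMAINS, levels):
        if level > 1:
            loss += decrements[(domain, level)]
    if 3 in levels:
        loss += n3  # N3: applied once when any domain is at level 3
    problem_domains = sum(level > 1 for level in levels)
    # hypothetical: a positive d1 shrinks the loss of each problem domain beyond the first
    loss -= d1 * max(problem_domains - 1, 0)
    return 1.0 - loss


# Placeholder coefficients, not taken from any value set.
DEC = {(d, lvl): (0.05 if lvl == 2 else 0.20) for d in DOMAINS for lvl in (2, 3)}
print(eq5d_value("21232", DEC, constant=0.10, n3=0.25))
print(eq5d_value("21232", DEC, constant=0.10, n3=0.25, d1=0.02))
```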
Consequently, we argue that a more conceptual discussion is needed. Cross-country variation in values may reflect cultural differences or differences in the availability of certain social services (and therefore in the perceived or expected impact of health impairments). Naturally, health-state values also differ among individuals [47]. It may be argued that national or global value sets should capture this within-population variation in values. In other words, the samples in elicitation studies need to be representative with respect to the relevant population characteristics (as for the other elements of SMPH). Cross-national differences in values need to be taken into account in health-system performance assessments and international comparisons of population health. In such studies, country-specific value sets may be preferred, since each health system should deliver outcomes according to the preferences of the population it serves and whose resources it uses. Moreover, the varying impact of health problems across countries needs to be accounted for. Some previous international comparisons of SMPH have used global value sets, on the grounds that health values are reasonably consistent across countries. However, the results of this study, like those of, for example, Üstün et al. [26], point to the contrary and show that variation in values may affect SMPH outcomes. A drawback of country-specific value sets is that they are not always available, as was the case in this study and in previous studies (e.g. [21]). In our opinion, the best solution is then to calculate health expectancy with several foreign value sets and to compare the results (as in Table ). Additionally, the use of country-specific value sets in international comparisons may deserve close scrutiny from an equity perspective, particularly if values, true health status, and wealth are related. Populations with less exposure to what constitutes "full health" may attach a smaller HRQoL loss to certain health problems. As a result, a given health intervention will generate fewer benefits in these populations, which may be considered undesirable from an equity perspective. However, this argument has not been tested empirically, and it may be less relevant when only high-income countries with similar levels of health are included, as in our study.
The issue of value-set choice does not pertain only to HRQoL-based health expectancy. All SMPH that use multiple health states, diseases, levels of disability, or other morbidity measures rely on a valuation function or a set of weights. Only measures such as disability-free life expectancy do not involve value sets. Such approaches classify people into two groups: with or without disability or disease. The proportion of people without any disability is then simply multiplied by the number of life years lived in each stratum. These are rather crude methods that ignore differences in severity.
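The contrast between a dichotomous measure and a graded one can be shown with the Sullivan-type calculation described above. In the hypothetical sketch below, two populations share the same life table and the same share of people without disability, so their disability-free life expectancy is identical, but one has only mild and the other only severe problems among those with disability. Only the HRQoL-weighted measure separates them. All numbers, including the health-state values, are illustrative assumptions.

```python
def sullivan(person_years, weights, survivors_at_start):
    """Sum of person-years per age band times a per-band weight, per survivor."""
    return sum(years * w for years, w in zip(person_years, weights)) / survivors_at_start


def dfle(person_years, state_shares, survivors_at_start):
    """Disability-free LE: the weight is the share in the first (no disability) state."""
    return sullivan(person_years, [shares[0] for shares in state_shares], survivors_at_start)


def hrqol_weighted_le(person_years, state_shares, state_values, survivors_at_start):
    """Graded measure: the weight is the mean HRQoL in each age band."""
    mean_hrqol = [sum(s * v for s, v in zip(shares, state_values)) for shares in state_shares]
    return sullivan(person_years, mean_hrqol, survivors_at_start)


person_years = (3900.0, 1900.0)   # hypothetical, age bands 20-59 and 60+
values = (1.0, 0.8, 0.3)          # hypothetical: no disability, mild, severe
mild = ((0.8, 0.2, 0.0), (0.5, 0.5, 0.0))
severe = ((0.8, 0.0, 0.2), (0.5, 0.0, 0.5))
for label, shares in (("mild", mild), ("severe", severe)):
    print(label, dfle(person_years, shares, 100.0),
          hrqol_weighted_le(person_years, shares, values, 100.0))
```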
Two other issues need to be raised regarding the valuation part of SMPH. First, an advantage of EQ-5D-type instruments, particularly when an economic perspective is required, may be that their value sets have been elicited with a choice-based method (the TTO technique). Economists generally consider choice-based methods the preferred way to elicit people's preferences. The extent to which the elicitation method affects cross-country differences is largely unknown. Some have argued that different elicitation methods generate rather similar cross-country variation in values, but more research on this issue is needed [47]. Second, we need to address the question of whose values should be used. The value sets we used all represented general population values. Various authors have compared population values with patient values [48]. From an economic perspective, population values may be preferred, since health systems consume public means and should therefore allocate their resources and outcomes according to the preferences of the general population [48]. However, the general public has been found to attach a much greater HRQoL loss to particular health problems than patients do. Although patients are better informed about the impact of morbidity, their values are affected by adaptation [52]. Expert opinion has also been used in previous international studies on SMPH [24]. The question is to what extent experts can assess the impact of different health states or diseases on people in general, and on different populations. As a result, this debate remains unresolved.
As demonstrated by the decomposition, differences in QALE are also affected by differences in health states. Two major measurement issues should be discussed in this respect. First, although all studies used the same standardized EQ-5D instrument, the mode of administration differed across studies. Telephone surveys in particular have been shown to generate more positive HRQoL scores than self-administered or interviewer-administered surveys [54]. The surveys included in our study were conducted as face-to-face interviews (Armenia, Greece, Japan, Spain, and the UK) or as self-administered postal surveys (the other countries). Only part of the German data came from a telephone survey. The second measurement issue concerning nonfatal health outcomes is response heterogeneity: people in an objectively equal health state may respond differently to the same health question. Response heterogeneity can be explained by differences across populations in norms and expectations, in awareness, and in access to health care. It may affect the validity and the cross-population comparability of all SMPH that use self-reported health data (in terms of health states, disability, or disease) [55]. At the same time, the effect of response heterogeneity may be somewhat dampened if similar mechanisms also play a role in the valuation of nonfatal health outcomes. Some have argued that response heterogeneity may be less of a problem if the morbidity measure includes different severity levels, since most threshold issues arise at the mild severity levels, which carry relatively small weights [1]. Moreover, the problem may be greater for self-rated general health questions, and some authors have even used EQ-5D-type questions as more objective health measures [56]. Still, it remains unclear to what extent the reporting of EQ-5D health states, and hence our international comparison, has been subject to response bias. Whether response bias in the measurement of morbidity is related to variation in the valuation of morbidity needs further investigation.
From a practical point of view, HRQoL-type data may be preferred, since this approach may turn out to be less resource-intensive in terms of data gathering and analysis than, for example, disease-based methods [22]. The latter approach requires information on many types of diseases and on the impact of each disease in terms of disability. At an international level, data availability may be limited, which could reduce the accuracy of the results. Furthermore, the presence of comorbidity complicates disease-based calculations [58]. On the other hand, an advantage of disease-based measures may be that clinical or administrative records on the prevalence of diseases can be used, and such data do not suffer from self-report problems.
The following should be kept in mind when interpreting our results. First, the EQ-5D surveys were conducted in different years. The same holds for the value sets, and preferences may change over time. It is unclear whether this is the case and to what extent it may have affected the results. We did see that value sets from similar years still showed substantial differences, such as those from the Netherlands and the US, or those from Germany and Japan. Future research could clarify to what extent health-related preferences change over time. Second, certain population groups were not included in the EQ-5D samples, such as people younger than 20 years and, in most surveys, people older than 85. We therefore did not calculate QALE at birth and were unable to differentiate HRQoL within the 85-plus group. In addition, the surveys did not include the institutionalized population; owing to a lack of comparable data, it is unclear to what extent this influenced the cross-country variation. Further, it was unclear whether all potential determinants of HRQoL were sufficiently represented. Third, we did not take uncertainty in mortality into account, because this information was not included in the WHO life tables. However, given the large population sizes, there will be little uncertainty in the life tables, so the uncertainty in health expectancy arises mainly in the morbidity part of these measures [21]. Finally, as discussed before, different researchers may have used slightly different protocols and analyses, which may have affected the differences in value sets [46].
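Since mortality was treated as fixed, the remaining uncertainty can be sketched with a nonparametric bootstrap over survey respondents: resample the HRQoL scores within each age band, recompute QALE with the life table held constant, and read off percentile limits. This is a swapped-in illustration of one common approach, not necessarily the procedure used to produce the confidence intervals in this study, and the scores, person-years, number of replicates, and seed are hypothetical.

```python
import random
from statistics import mean


def qale_from_scores(person_years, scores_by_band, survivors_at_start):
    """QALE with a fixed life table and the mean observed HRQoL per age band."""
    return sum(years * mean(scores)
               for years, scores in zip(person_years, scores_by_band)) / survivors_at_start


def bootstrap_qale_ci(person_years, scores_by_band, survivors_at_start,
                      reps=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval; only the survey (morbidity) part is resampled."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        # resample respondents with replacement within each age band
        resampled = [rng.choices(scores, k=len(scores)) for scores in scores_by_band]
        estimates.append(qale_from_scores(person_years, resampled, survivors_at_start))
    estimates.sort()
    lower = estimates[int(alpha / 2 * reps)]
    upper = estimates[int((1 - alpha / 2) * reps) - 1]
    return lower, upper


person_years = (3900.0, 1900.0)  # hypothetical life table, age bands 20-59 and 60+
scores = ([1.0, 1.0, 0.85, 0.8, 1.0, 0.69, 1.0, 0.76],   # hypothetical EQ-5D scores
          [0.72, 0.59, 1.0, 0.52, 0.8, 0.66, 0.85, 0.73])
print(qale_from_scores(person_years, scores, 100.0))
print(bootstrap_qale_ci(person_years, scores, 100.0))
```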
In conclusion, we recommend that future international comparisons of SMPH thoroughly discuss their choice of value set, including the theoretical and practical issues involved, and perform sensitivity analyses where possible and necessary. In addition, more qualitative research is needed on the determinants of differences in valuation within and across populations. This will improve the interpretation and usefulness of HRQoL-based, and other, summary measures of population health.