Health expectancies are key indicators for monitoring the health of populations, as well as for informing debates about compression or expansion of morbidity. However, current methodologies for estimating them are not entirely satisfactory. They are either of limited applicability due to high data requirements (the multistate method) or based upon questionable assumptions (the Sullivan method).
This paper proposes a new method, called the “intercensal” method, which relies on the multistate framework using widely available data. The method uses age-specific proportions “healthy” at two successive, independent cross-sectional health surveys, and, together with information on general mortality, solves for the set of transition probabilities that produces the observed sequence of proportions healthy. The system is solved by making realistic parametric assumptions about the age patterns of transition probabilities. Using data from the Health and Retirement Study and from the National Health Interview Survey, the method is tested against both the multistate method and the Sullivan method. We conclude that the intercensal approach is a promising framework for the indirect estimation of health expectancies.
Health expectancies are population health indicators which combine information on both quantity of life (through mortality) and quality of life (usually through disability). Health expectancies measure the number of years that an individual can expect to live in defined health statuses. For example, the life expectancy with disability corresponds to the number of years that one can expect to live with some disability. The disability-free life expectancy corresponds to the number of years that one can expect to live in the absence of disability. Disability and disability-free life expectancies add up to the total, conventional life expectancy.
Health expectancies are important indicators for several reasons. First, they allow the monitoring of the health of populations with a greater level of detail than traditional life expectancies (Mathers et al. 2003). International comparisons of life expectancies may hide important differences in levels of morbidity and disability. This becomes particularly critical as countries advance through the epidemiological transition and experience increasing proportions of deaths due to degenerative diseases often preceded by a period of disability. The World Health Organization has recognized the importance of health expectancies as population health indicators and has estimated them for 191 member states (World Health Organization 2004).
Second, trends in health expectancies are useful indicators for addressing the question of whether current increases in life expectancy are being matched by similar increases in healthy life. In recent decades, several hypotheses have emerged: the expansion-of-morbidity hypothesis, which states that seriously chronically-ill individuals are being kept alive by medical interventions, creating increasing demands on health and social care systems while generating limited improvements in the well-being of the population (Gruenberg 1977; Olshansky et al. 1991); the compression-of-morbidity hypothesis, which states that the average age at onset of disability increases faster than life expectancy, creating a decrease in the number of years spent with disability (Fries 1980, 1983, 1993); and the dynamic-equilibrium hypothesis, an intermediate hypothesis which states that when disability is defined as severe morbidity, the number of years spent with disability remains relatively constant as life expectancy increases (Manton 1982). These are important debates with implications for the future costs of health and social care systems in low-mortality countries. An international research network, the Network on Health Expectancy (Réseau Espérance de Vie En Santé, or REVES), is committed to promoting the use of health expectancies and to improving and harmonizing calculation methods (http://reves.site.ined.fr/en/home/).
In spite of the importance of health expectancies as population health indicators, current methodologies for estimating them are not entirely satisfactory. They are either of limited applicability because of high data requirements (the multistate method), or based on questionable assumptions (the Sullivan method). The multistate (or increment-decrement) life table method uses a framework that is consistent with the standard life expectancy calculations (Rogers et al. 1990). In a standard period life table, age-specific mortality rates are estimated for one period, and the life expectancy is calculated by simulating a fictitious (or synthetic) cohort of individuals exposed throughout their life time to the mortality rates of that particular period. Thus the period life expectancy summarizes mortality conditions of a particular period. The multistate life table also uses period conditions when calculating the number of years that one can expect to live in a particular health status. Period conditions are summarized with a set of age-specific “transition” rates observed during the period of interest. In the case of health expectancies involving only two health statuses (such as disability-free vs. disabled or, more generally, any “healthy” vs. “unhealthy” dichotomy), there are four possible transitions: from healthy to unhealthy; from healthy to death; from unhealthy to healthy; and from unhealthy to death. Health expectancies are calculated by simulating a synthetic cohort exposed throughout their life time to the transition rates of the period of interest. These health expectancies refer to individuals in a given health state at a given age, and are thus called “conditional” health expectancies. From this simulation are derived “period” or “equilibrium” proportions of healthy individuals, i.e., the proportions of individuals of a given age who are healthy in the synthetic cohort. 
By weighting conditional health expectancies with these equilibrium proportions, one can obtain “unconditional” health expectancies, i.e., the number of years that one can expect to live above age x in a particular health state, regardless of state at age x.
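In symbols, this weighting can be sketched as follows. The notation is introduced here only for illustration and is an assumption of the sketch: \pi_i(x) is the equilibrium proportion in state i at age x, and e_x^{ij} is the expected number of years lived in state j above age x by someone in state i at age x.

```latex
e_x^{\,\cdot j} \;=\; \sum_{i \in \{H,\,U\}} \pi_i(x)\, e_x^{\,ij},
\qquad \pi_H(x) + \pi_U(x) = 1 .
```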
In spite of its methodological soundness and the richness of its output, the multistate method has one major drawback: it requires data from longitudinal surveys (Mathers 2002). These surveys, which are expensive and complex to carry out, are not widely available. As a result, the multistate method has produced only sporadic health expectancy estimates. These data requirements dramatically reduce the attractiveness of the multistate method, which at present does not appear to be a realistic tool for producing long time series of health expectancies or making broad international comparisons of population health (Cambois et al. 1999).
Because of the high data requirements of the multistate method, research on health expectancies has more often relied on the Sullivan method (also called observed-prevalence method) (Sullivan 1971; Cambois et al. 1999). The Sullivan method is based on the assumption that the observed cross-sectional age-specific proportions of healthy individuals are equal to the “equilibrium” proportions defined above. Under this assumption, one can apply these observed age-specific proportions healthy to the total survival curve for the synthetic cohort, which produces a curve for the proportion of births that are healthy at age x. The area under this “healthy” survival curve gives unconditional health expectancies. If the assumption holds, these unconditional health expectancies are equal to the unconditional health expectancies produced by the multistate method. Because of this assumption, the Sullivan method requires only data from one cross-sectional census or survey, together with knowledge of overall mortality (usually an official life table).
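As a minimal sketch of the Sullivan calculation just described, the Python snippet below applies observed proportions healthy to the person-years of a life table. The function name and the three-interval life table are hypothetical and purely illustrative; they are not data from this paper.

```python
def sullivan_health_expectancy(lx, Lx, prop_healthy):
    """Healthy and total life expectancy at the first age of the table.

    lx           -- survivors at the start of each age interval (life table l_x)
    Lx           -- person-years lived in each age interval (life table L_x)
    prop_healthy -- observed cross-sectional proportion healthy in each interval
    """
    if not (len(lx) == len(Lx) == len(prop_healthy)):
        raise ValueError("inputs must have the same length")
    # Person-years lived healthy: observed prevalence applied to period survival.
    healthy_years = sum(L * p for L, p in zip(Lx, prop_healthy))
    total_years = sum(Lx)
    return healthy_years / lx[0], total_years / lx[0]


# Hypothetical life table with three age intervals (illustrative numbers only).
lx = [1000.0, 800.0, 400.0]
Lx = [9000.0, 6000.0, 2000.0]
prop_healthy = [0.9, 0.8, 0.5]

hle, le = sullivan_health_expectancy(lx, Lx, prop_healthy)
```

The difference `le - hle` is the corresponding expectancy in the unhealthy state, since the two expectancies add up to total life expectancy.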
However, there is no guarantee that the assumption of the Sullivan method holds in real situations. The currently-observed proportions healthy are the product of a history of mortality and disability that spans about a century, whereas equilibrium proportions are the product of mortality and disability for the current period only. The Sullivan method thus produces an indicator which is affected by the past history of the population (Brouard and Robine 1992). It is not a “pure” period index indicating the number of healthy years that one should expect to live under current conditions. In fact, the Sullivan method can be viewed as a composite method which combines synthetic cohort information (period survival) with actual cohort information (proportions healthy). Another drawback of the Sullivan method is that it produces only estimates of unconditional health expectancies. Age-specific transition rates and conditional health expectancies cannot be estimated with the Sullivan method.
There is some debate in the literature about the magnitude of the bias in the Sullivan method. Some authors have concluded on the basis of actual data that the differences between observed and equilibrium proportions healthy are significant, and that health expectancy estimates based on the Sullivan method should not be used for conclusions on compression or expansion of morbidity (Rogers et al. 1990; Barendregt et al. 1994, 1997; Lievre et al. 2003). On the other hand, Mathers and Robine (1997) have concluded on the basis of simulated scenarios that the differences are not very large and that the Sullivan method is acceptable for monitoring long-term changes in population health. These simulations were later criticized by Barendregt and colleagues, who believe that the chosen scenarios produced results that were too favorable to the Sullivan method (Barendregt et al. 1997). In spite of disagreement about the size of the biases, there is consensus that the multistate method is theoretically sounder than the Sullivan method (Laditka and Hayward 2003) and that one should be careful in interpreting Sullivan-based measures in terms of compression or expansion of morbidity (Nusselder 2003). In sum, it appears that the Sullivan method is the most commonly used method more because of its reliance on widely available data than because of its reliability. Uncertainties about scenarios of compression or expansion of morbidity may be due in part to these methodological issues.
This paper explores a new approach for estimating health expectancies which relies on the multistate framework, but uses widely available data. The method uses observed proportions of healthy individuals at two successive, independent cross-sectional surveys, and, together with information on general mortality, solves for the set of period transition probabilities that produces the observed sequence of proportions healthy. The system is solved by making realistic parametric assumptions about the age pattern of transition probabilities. The proposed approach aims at resolving a major drawback of the multistate method by relying on widely available data. Indeed, cross-sectional proportions healthy are widely available from health surveys routinely conducted in various parts of the world and for various time periods, as well as from certain censuses (Myers and Lamb 1992). General mortality information is also widely available. This new approach, which applies when two health statuses (in addition to death) are considered, produces both conditional and unconditional health expectancies.
This is not the first time that two successive cross-sections are used for estimating period conditions. There is a family of indirect demographic methods, called “intercensal” methods, which uses demographic identities to estimate a single-decrement or multiple-decrement period life table from two cross-sections (Preston et al. 2001; Schmertmann 2002). The proposed method can be considered as an extension of the intercensal framework to the estimation of a multistate (increment-decrement) life table. We thus refer to this method as an “intercensal” method, although it can use survey data as well.
A number of authors have examined how successive cross-sectional data can be used to estimate a multistate system in the absence of direct information on actual transitions (Willekens 1982, 1999; Schoen and Jonsson 2003). However, the existing approaches (i.e., the Iterative Proportional Fitting (IPF) approach and the Relative State Attraction (RSA) approach) require absolute counts of individuals by state at two cross-sections. They thus have limited applicability when the data come from two independent sample surveys, where the absolute counts of individuals at two cross-sections cannot be related to one another in a meaningful way. Unlike the IPF and the RSA approaches, the intercensal method does not use absolute counts of individuals as inputs and can thus be applied to data from independent sample surveys. In addition, it makes less constraining assumptions than the RSA approach, especially with regard to mortality (Schoen and Jonsson 2003).
In a different line of research, Davis et al. (2001) and Imai and Soneji (2007) have used successive cross-sections to estimate health expectancies. Their goal, however, was not to reconstruct the full increment-decrement system, which they recognize cannot be recovered with their methodology, but to estimate unconditional health expectancies, as in the Sullivan method. Moreover, they use successive cross-sections not to estimate inter-survey period conditions, but to track actual cohorts as they appear in these cross-sections and estimate corresponding cohort health expectancies. The approach they developed can thus be viewed as a version of the Sullivan method for the estimation of cohort health expectancies. In this paper, we go beyond previous research by proposing a methodology that uses data from two independent cross-sectional sample surveys with the aim of recovering the multi-state system prevalent during the inter-survey period.
In this section, the terms “healthy” and “unhealthy” refer to any dichotomous definition of health states. The basic equation of the intercensal approach expresses the observed population proportion of healthy individuals at age x+n and time t+n, Π(x+n, t+n) (=healthy individuals at age x+n and time t+n/all individuals at age x+n and time t+n), in terms of the observed proportion of healthy individuals in the same cohort at time t, Π(x, t), and the transition probabilities prevailing between t and t+n. The derivation of the equation involves following cohorts of healthy and unhealthy individuals between t and t+n, and comparing their respective evolution to that of the entire cohort, regardless of health status.
Suppose the following transition probabilities, which all refer to conditions of period [t, t+n]:
The observed population proportion of healthy individuals at age x+n and time t+n, Π(x+n, t+n), is equal to:
Equation (1) can be modified as follows:
By defining nrx = nqxUD/nqxHD (unhealthy/healthy mortality ratio) and using the fact that death probabilities for the entire population are weighted averages of the death probabilities for healthy and unhealthy populations (nqx = Π(x, t) nqxHD + [1-Π(x, t)] nqxUD), Equation (2) can be rewritten as:
If the only available information consists of two independent cross-sections of the population (with age and health status as variables), as well as a conventional life table prevailing between t and t+n, the known quantities of Equation (3) are Π(x, t), Π(x+n, t+n) and nqx. The other quantities, nqxUH, nqxHU, and nrx, are unknown (nqxHD and nqxUD follow from nrx and nqx). When data are available for k age groups, Equation (3) expands to a system of k equations.
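The relationships described in this section can be sketched as follows, working only from the definitions given in the text. The shorthand is an assumption of this sketch: \Pi_t stands for Π(x, t), \Pi_{t+n} for Π(x+n, t+n), the interval-length prefix n is dropped, each transition probability refers to the state occupied at the end of the interval, and the cohort is closed to migration. The arrangement of terms may differ from that of Equations (1) to (3).

```latex
\begin{align*}
\Pi_{t+n}
  &= \frac{\Pi_t\,(1 - q_x^{HU} - q_x^{HD}) + (1-\Pi_t)\, q_x^{UH}}
          {\Pi_t\,(1 - q_x^{HD}) + (1-\Pi_t)\,(1 - q_x^{UD})} \\[4pt]
  &= \frac{\Pi_t\,(1 - q_x^{HU} - q_x^{HD}) + (1-\Pi_t)\, q_x^{UH}}{1 - q_x},
  \qquad q_x = \Pi_t\, q_x^{HD} + (1-\Pi_t)\, q_x^{UD} \\[4pt]
q_x^{HD} &= \frac{q_x}{\Pi_t + (1-\Pi_t)\, r_x},
  \qquad r_x = \frac{q_x^{UD}}{q_x^{HD}} \\[4pt]
\Pi_{t+n}\,(1 - q_x)
  &= \Pi_t\,(1 - q_x^{HU}) - \frac{\Pi_t\, q_x}{\Pi_t + (1-\Pi_t)\, r_x}
     + (1-\Pi_t)\, q_x^{UH}
\end{align*}
```

In the last line, the only quantities not observed in the two cross-sections or the life table are q_x^{HU}, q_x^{UH} and r_x.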
This system of k equations cannot be solved as it stands, because it has 3k unknowns. However, the quantities nqxUH, nqxHU, and nrx do not vary randomly with age. On the contrary, they correspond to health processes that are clearly related to age (recovery from disability, incidence of disability, and disabled/active mortality ratio in the case of the active vs. disabled dichotomy). As discussed in the next section, these quantities follow some rather simple functions of age. Knowledge of the age patterns of these three functions will reduce the number of unknowns in the system of equations, and allow us to search for optimal solutions.
There is a body of literature showing that the four sets of transition rates in the active/disabled/dead multistate framework follow some well-defined age-patterns. For example, Rogers et al. (1990) and Crimmins et al. (1994) have shown that such transition rates are well described by exponential functions at ages 60 and above.
where hpxj is the monthly probability of remaining in state j between x and x+h, and where hqxjk is the monthly probability of moving from state j to state k between x and x+h (j = U, H and k = U, H, D). This age pattern is used as a basis for estimating transition probabilities with the IMaCh procedure, which uses longitudinal data. Lièvre et al. have applied this model to data from the Longitudinal Study on Aging (LSOA) and produced estimates of monthly transition probabilities for the sampled population. We converted these monthly transition probabilities into annual and 2-year probabilities (allowing multiple transitions during the two-year period) and found that these probabilities (i.e., nqxHU, nqxUH, nqxHD, nqxUD for n=1 or n=2) are very well fitted with simple exponential functions at ages 60 and above. This means that exponential assumptions for one or two-year transition probabilities are consistent with Lièvre et al.'s assumptions for monthly transition probabilities.
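The conversion from monthly to two-year probabilities can be illustrated by chaining monthly transition matrices, which allows multiple transitions within the interval. The Python sketch below is not the IMaCh procedure: the exponential coefficients in `monthly_matrix` are hypothetical, and the probabilities are assumed constant within each month.

```python
import math


def monthly_matrix(age):
    """Hypothetical monthly transition matrix over states (H, U, D) at a given age.

    The exponential coefficients are illustrative only, not LSOA estimates.
    """
    q_hu = 0.002 * math.exp(0.05 * (age - 60))
    q_hd = 0.0005 * math.exp(0.08 * (age - 60))
    q_uh = 0.02 * math.exp(-0.03 * (age - 60))
    q_ud = 0.003 * math.exp(0.05 * (age - 60))
    return [
        [1 - q_hu - q_hd, q_hu, q_hd],
        [q_uh, 1 - q_uh - q_ud, q_ud],
        [0.0, 0.0, 1.0],  # death is absorbing
    ]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def multi_month_matrix(start_age, months):
    """Chain monthly matrices, letting age advance by one month at each step."""
    result = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for m in range(months):
        result = matmul(result, monthly_matrix(start_age + m / 12))
    return result


two_year = multi_month_matrix(70.0, 24)
q_hu_2yr, q_hd_2yr = two_year[0][1], two_year[0][2]  # 2q70HU and 2q70HD
```

Each row of `two_year` gives the distribution over (H, U, D) at age 72 for someone in the corresponding state at age 70.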
Our analysis of data from the 1998 and 2000 Health and Retirement Study (HRS) confirms the validity of exponential assumptions for two-year transition probabilities. The 1998 HRS is a nationally representative sample of elderly adults aged 50 and above living in households in the United States in 1998. After deleting 460 cases that were known to be alive but lost, and 63 cases whose survival status was unknown at the 2000 follow-up, the 1998 sample has 10,809 cases (i.e., 4,635 men and 6,174 women) between the exact ages of 64 and 94. Among them, 521 men and 542 women died by the 2000 interview. The analysis does not use sample weights because our preliminary investigation indicates no substantial difference between the unweighted and weighted data in the age patterns of health state transition probabilities.
We follow the convention of defining the two living states (active and disabled) in terms of functional limitations in the following six activities of daily living (ADLs): dressing, walking across the room, bathing or showering, eating, getting in or out of bed, and using the toilet (Katz et al. 1973). A respondent is defined as disabled if s/he reported any difficulty lasting more than three months with at least one of the ADLs, or reported that s/he “cannot” or “does not” do at least one of them; otherwise the respondent is defined as active. In the remainder of this paper, “H” refers to the active state, and “U” refers to the disabled state. Information for month and year of death is available from reports by proxy respondents, or from linkage with the 2000 National Death Index.
Figure 1 shows two-year transition probabilities for males and females in two-year age groups (centered at exact ages 65, 67, …, 93). As expected, the incidence of disability (2qxHU) increases with age, while recovery from disability (2qxUH) decreases with age. Death probabilities for active (2qxHD) and disabled (2qxUD) individuals both increase with age, but probabilities are higher and increase with age at a lower rate for the disabled. The four sets of transition probabilities are well fitted by exponential functions of age, as shown by the exponential curves in Figure 1. The observed and fitted age patterns of mortality for active and disabled individuals imply that the ratio of disabled to active mortality (nrx) is greater than one and follows a declining exponential function of age. These results are consistent with Lièvre et al.'s results with the LSOA data.
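To illustrate what fitting an exponential function of age involves, the Python sketch below estimates α and β in q(x) = α exp(βx) by ordinary least squares on the logarithm of the probabilities. The fitting procedure behind Figure 1 is not specified in the text, so the log-linear approach is an assumption of this sketch, and the input series is synthetic rather than taken from the HRS.

```python
import math


def fit_exponential(ages, probs):
    """Fit q(x) = alpha * exp(beta * x) by least squares on log(q)."""
    logs = [math.log(p) for p in probs]
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, logs))
    beta = sxy / sxx                           # slope of log(q) on age
    alpha = math.exp(mean_y - beta * mean_x)   # intercept, back-transformed
    return alpha, beta


# Synthetic probabilities at exact ages 65, 67, ..., 93 (hypothetical parameters).
ages = list(range(65, 95, 2))
true_alpha, true_beta = 0.001, 0.07
probs = [true_alpha * math.exp(true_beta * x) for x in ages]

alpha, beta = fit_exponential(ages, probs)
```

With observed rather than synthetic probabilities, the spread of log(q) around the fitted line gives a direct check on how well the exponential assumption holds.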
Knowledge of the age-pattern of nqxUH, nqxHU and nrx allows us to develop a system of equations (one equation for each age group) in which there is a relatively small number of unknown parameters. If all three functions follow an exponential function of age, i.e., nqxUH = α1 exp(β1 x), nrx = α2 exp(β2 x) and nqxHU = α3 exp(β3 x), we obtain the following equation for each age group:
Yx, Ax, Bx, Cx and Dx are the known quantities of Equation (3). Equation (5) can be viewed as a model in which a dependent variable, Yx, is related to five independent variables (x, Ax, Bx, Cx, Dx) in a nonlinear fashion.
The core idea of the proposed approach is to use nonlinear optimization techniques to estimate the unknown parameters of Equation (5). Optimization techniques solve systems of equations using iterative procedures. The first step of the optimization consists of defining an “objective” for the optimization, i.e., the quantity that needs to be minimized. The most common objective is the minimization of the sum of squared residuals, but other objectives can be specified. The system is usually solved through a procedure in which initial values of the parameters are iteratively improved until the objective is met (Bates and Watts 1988). In this paper, we use the CONOPT non-linear optimization solver interfaced with the programming language GAMS. This solver is useful for our purpose, because it allows us to specify bounds on the parameters, as discussed below in section 3.3 (Rosenthal 2008).
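The estimation idea can be sketched in Python with SciPy's bounded least-squares routine in place of CONOPT/GAMS. Several choices here are assumptions of the sketch rather than the paper's setup: the residual is written directly in terms of the proportion healthy at t+n instead of the Yx, Ax, Bx, Cx, Dx form of Equation (5), ages are centred at 65 to improve numerical scaling, only simple box bounds are imposed, and all input values are synthetic.

```python
import numpy as np
from scipy.optimize import least_squares

AGES = np.arange(65, 95, 2)   # two-year age groups centred at 65, 67, ..., 93
REF_AGE = 65.0                # centring age (assumption of this sketch)


def curves(params, ages=AGES):
    """Exponential age patterns for qUH, r (= qUD/qHD) and qHU."""
    a1, b1, a2, b2, a3, b3 = params
    z = ages - REF_AGE
    return a1 * np.exp(b1 * z), a2 * np.exp(b2 * z), a3 * np.exp(b3 * z)


def predict_next(params, prop_t, q_total):
    """Proportion healthy at t+2 implied by the parameters and known inputs."""
    q_uh, ratio, q_hu = curves(params)
    q_hd = q_total / (prop_t + (1.0 - prop_t) * ratio)
    healthy_next = prop_t * (1.0 - q_hu - q_hd) + (1.0 - prop_t) * q_uh
    return healthy_next / (1.0 - q_total)


def residuals(params, prop_t, prop_next, q_total):
    """Predicted minus observed proportion healthy at t+2, by age group."""
    return predict_next(params, prop_t, q_total) - prop_next


# Synthetic, noise-free inputs generated from hypothetical "true" parameters.
true_params = np.array([0.30, -0.03, 4.0, -0.03, 0.05, 0.07])
prop_t = np.linspace(0.92, 0.60, AGES.size)
_, ratio_true, _ = curves(true_params)
q_hd_true = 0.02 * np.exp(0.09 * (AGES - REF_AGE))
q_total = prop_t * q_hd_true + (1.0 - prop_t) * ratio_true * q_hd_true
prop_next = predict_next(true_params, prop_t, q_total)

# Box bounds: probabilities in [0, 1], qUH and r declining, qHU rising, r >= 1 at 65.
lower = np.array([0.0, -0.2, 1.0, -0.2, 0.0, 0.0])
upper = np.array([1.0, 0.0, 20.0, 0.0, 1.0, 0.2])
start = np.array([0.20, -0.01, 3.0, -0.01, 0.08, 0.05])

fit = least_squares(residuals, start, bounds=(lower, upper),
                    args=(prop_t, prop_next, q_total))
```

Here `fit.x` holds the six estimated parameters and `fit.cost` is half the sum of squared residuals; comparing `fit.x` with `true_params` shows how closely the generating values are recovered from a given starting point.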
Once the parameters are solved for, it is possible to produce age-specific estimates of nqxUH, nqxHU and nrx, which, together with knowledge of nqx, are sufficient to recover the full set of period transition probabilities (nqxHU, nqxUH, nqxHD, nqxUD) consistent with the observed changes in proportions active and the observed overall death probabilities. Knowledge of these transition probabilities allows the estimation of a full multistate life table and corresponding period health expectancies (both conditional and unconditional), using classic multistate life table construction techniques (Preston et al. 2001).
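The recovery step and the life table calculation can be sketched as follows. The split of total mortality uses the two identities stated in the text (nqx as a weighted average, and nrx = nqxUD/nqxHD). The health expectancy function is a simplified stand-in for the classic construction techniques: it assumes that person-years in each interval equal the average of the state occupancy at its start and end. All numerical inputs are hypothetical.

```python
def split_mortality(q_total, prop_healthy, ratio):
    """Recover (qHD, qUD) from total mortality, proportion healthy and r."""
    q_hd = q_total / (prop_healthy + (1.0 - prop_healthy) * ratio)
    return q_hd, ratio * q_hd


def health_expectancies(q_hu, q_uh, q_hd, q_ud, start_state, n=2.0):
    """Partial years lived healthy and unhealthy, conditional on the starting state.

    Assumption of this sketch: linear change in state occupancy within intervals.
    """
    healthy, unhealthy = (1.0, 0.0) if start_state == "H" else (0.0, 1.0)
    years_h = years_u = 0.0
    for hu, uh, hd, ud in zip(q_hu, q_uh, q_hd, q_ud):
        next_h = healthy * (1.0 - hu - hd) + unhealthy * uh
        next_u = healthy * hu + unhealthy * (1.0 - uh - ud)
        years_h += n / 2.0 * (healthy + next_h)
        years_u += n / 2.0 * (unhealthy + next_u)
        healthy, unhealthy = next_h, next_u
    return years_h, years_u


# Hypothetical two-year inputs for three age groups (illustrative only).
q_total = [0.04, 0.07, 0.12]
prop_h = [0.90, 0.85, 0.75]
ratio = [3.0, 2.5, 2.0]
q_hu = [0.06, 0.09, 0.14]
q_uh = [0.25, 0.20, 0.15]

pairs = [split_mortality(q, p, r) for q, p, r in zip(q_total, prop_h, ratio)]
q_hd = [hd for hd, _ in pairs]
q_ud = [ud for _, ud in pairs]

from_active = health_expectancies(q_hu, q_uh, q_hd, q_ud, "H")
from_disabled = health_expectancies(q_hu, q_uh, q_hd, q_ud, "U")

# Unconditional values weight the conditional ones by an assumed proportion
# active at the starting age.
prop_active_at_start = 0.90
unconditional = tuple(prop_active_at_start * a + (1.0 - prop_active_at_start) * d
                      for a, d in zip(from_active, from_disabled))
```

Each returned pair is (years active, years disabled) over the six years covered by the three hypothetical age groups.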
It is important to note here that this approach involves probabilities, rather than rates, in the multistate system. Proportions of healthy individuals at two cross-sections are linked to one another via probabilities in Equation (3), and the parameters that are estimated in Equation (5) allow us to recover probabilities rather than rates. There are two reasons for relying on probabilities rather than rates. First, using probabilities produces a simpler relationship linking proportions of healthy individuals at two different dates, embodied in Equation (1). If we wanted to rely on rates, Equation (1) would have to involve a rate-to-probability conversion and its associated assumptions. This would substantially increase the complexity of the non-linear estimation process. Given that this method relies on very little input data, with only as many observations as age groups, we decided to rely on the most parsimonious relationship between the known and unknown quantities. The second reason is that the main goal of this method is to estimate health expectancies, which requires knowledge of probabilities. If the method produced rates, we would still have to perform a rate-to-probability conversion in order to estimate health expectancies. In brief, using rates rather than probabilities would be at once more complex and unnecessary. One disadvantage of working with probabilities, though, is that the metric of probabilities (and, thus, the corresponding parametric assumptions) depends on the length of the age interval. By contrast, the metric of rates computed over discrete age intervals will not differ from that of the underlying continuous process (as long as they are expressed with the same exposure metric such as person-years). Age patterns of rates thus better reflect the underlying continuous process of interest, and can be more easily tailored to the various age interval configurations via the rate-to-probability conversion.
We believe, however, that for this particular method, the advantages of working with probabilities outweigh the disadvantages.
In this paper, we test our method using two different strategies. We first rely on simulations, with the goal of testing the method's ability to recover the parameters that generated the simulated data. We then apply the method to two actual cross-sections from the National Health Interview Survey (NHIS), and compare its performance with that of the multistate approach based on longitudinal data from the HRS.
In a first series of tests, we generated a fictitious population of active and disabled individuals exposed to known transition probabilities, and tested the method's performance in the ideal situation where the parametric assumptions are exactly met and where there is no sampling variability.
For this purpose, we used the HRS 1998-2000 transition probabilities for males and females combined, which we fitted with exponential functions, as shown in Figure 1. This produced the “true” parameters for the three unknown functions of Equation (5). These parameters are shown in Table 1. Values of the fitted functions are shown in Table 2.
We then generated population proportions of active individuals at time t, Π(x,t), by smoothing the HRS 1998 proportions in two-year age intervals centered at exact age x, between ages 65 and 93, for both sexes combined. (This series is arbitrary; the test could be conducted with any series of proportions active at time t.)
These proportions of active individuals at time t were then projected to time t+2, using equation (1) and the fitted transition probabilities. This step required knowledge of 2qx (total mortality), which we calculated using the fitted mortality curves for the active and the disabled, along with the population proportions active at time t (nqx = Π(x, t) nqxHD + [1-Π(x, t)] nqxUD).
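The projection step can be sketched in Python as follows. The function name and the input values are hypothetical; the calculation applies the weighted-average identity for total mortality quoted above and follows the active and disabled groups over one interval.

```python
def project_proportion(prop_t, q_hu, q_uh, q_hd, q_ud):
    """Project the proportion active over one interval for a single age group.

    Returns the proportion active among survivors at the end of the interval
    and the total death probability implied by the state-specific ones.
    """
    # Total mortality is the prevalence-weighted average of the two groups.
    q_total = prop_t * q_hd + (1.0 - prop_t) * q_ud
    # Active survivors: those who stay active plus those who recover.
    active_survivors = prop_t * (1.0 - q_hu - q_hd) + (1.0 - prop_t) * q_uh
    return active_survivors / (1.0 - q_total), q_total


# Hypothetical values for one age group (illustrative only).
prop_next, q_total = project_proportion(
    prop_t=0.8, q_hu=0.1, q_uh=0.2, q_hd=0.05, q_ud=0.15
)
```

Applying the function to each age group in turn yields the full series of proportions active at t+2 together with the matching total mortality schedule.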
Population proportions of active individuals at time t and t+2 are shown in Table 2, together with the age-specific transition probabilities (including total mortality) that agree with these two cross-sections. The health expectancies that correspond to these age-specific transition probabilities are shown in Table 3. (In order to avoid the use of age extrapolation in these tests, we calculated partial health expectancies at age 65, truncated at age 95.) These are the “true” health expectancies that the new method seeks to estimate.
The population proportions active at time t and t+2 shown in Table 2, together with overall mortality values (nqx), give us all the information to calculate the “known” elements of Equation (5), i.e., Ax, Bx, Cx, Dx, and Yx. We can then use these data as inputs in the optimization solver, with the goal of recovering the original parameters that generated the data and estimating corresponding health expectancies.
Using these inputs, the optimization solver converged to the true parameters. This occurred regardless of the choice of starting values in the solver. This means that in a situation where there is no sampling variability and where the parametric assumptions for the transition probabilities are perfectly met in the population, the unknown parameters can be exactly recovered with this approach. The unknown parameters were also exactly recovered with less sophisticated procedures, such as Stata's non-linear regression procedure.
In reality, even if the parametric assumptions are met in the population, the observed proportions active at two dates may not come from population data but from sample data. It is thus important to introduce sampling variability in the data for a more realistic test of the method's performance. We thus drew random samples of various sizes from the two population cross-sections generated in the above section. These simulated samples were then used as inputs in the intercensal method, thus testing the method's performance in a situation where the parametric assumptions are met and where the only source of uncertainty comes from sampling variability.
Specifically, we took the population proportions of active individuals shown in Table 2 as the “true” proportions. At each age group at time t and t+2, each individual was a random draw with the probability of being active equal to the population proportions active. The sample sizes are 20,000, 10,000 and 5,000 individuals in 1998 and 2000, with an age distribution following that of the US population in 1998 and 2000. For each pair of samples, the proportions active at time t and t+2, together with overall mortality values (nqx), were used as inputs in the optimization solver. (No sampling variability was introduced in the overall mortality values, since we expect mortality data to come from official, population-based estimates.) As above, the goal is to recover the original parameters that generated the data and estimate corresponding health expectancies. These estimated health expectancies can be compared to the “true” health expectancies shown in Table 3.
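The sampling step can be illustrated with a short Python sketch in which each sampled individual is an independent draw with probability of being active equal to the population proportion for that age group. The proportions, group sizes and seed below are hypothetical; they do not reproduce Table 2 or the US age distribution.

```python
import random


def sample_proportions(true_props, group_sizes, rng):
    """Draw one simulated cross-section and return sampled proportions active."""
    sampled = []
    for p, size in zip(true_props, group_sizes):
        # Each individual is active with probability p, independently of others.
        active = sum(1 for _ in range(size) if rng.random() < p)
        sampled.append(active / size)
    return sampled


rng = random.Random(12345)            # fixed seed so the draw is reproducible
true_props = [0.92, 0.85, 0.70]       # hypothetical population proportions active
group_sizes = [3000, 1500, 500]       # hypothetical sample sizes by age group

sample_t = sample_proportions(true_props, group_sizes, rng)
```

Repeating the draw for both cross-sections, and then many times over, produces the pairs of samples whose estimates form a sampling distribution.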
CONOPT, the optimization solver which we used, allows us to specify constraints for the unknown parameters. It is useful to take advantage of these constraints now that we introduced sampling variability in our simulated data (no constraints were necessary when we used population data as inputs). Drawing from our review of the empirical evidence on age patterns of disability, we used the following constraints in our age range 65-93: (1) probabilities nqxHU and nqxUH need to be between 0 and 1 (by definition); (2) the slope of nqxHU needs to be positive (the probability of developing disability increases with age); (3) the slope of nqxUH needs to be negative (the probability of recovering from disability decreases with age); (4) nrx can only be above 1 (the mortality for the disabled must be greater than that for the active); (5) nrx must decrease with age (the mortality differential between the disabled and active decreases with age). In addition to these constraints, we used a set of bounds for the parameters. We chose bounds that were large enough to encompass plausible variability in the functions nqxUH, nqxHU and nrx, while ensuring that the above constraints are met. We tested two sets of bounds, a narrower one and a wider one. These bounds are provided in Table 4 and shown in Figure 2. Relative to the observed data, the wide bounds include very high and very low levels of the three functions nqxUH, nqxHU and nrx. While we would need to have empirical estimates of transition probabilities in a wide range of contexts to be able to fine tune these bounds, the wide bounds appear large enough to encompass a wide variety of situations.
We used the minimum of the sum of squared residuals (min Σx (Ŷx − Yx)²) as the objective for the optimization solver. With both narrow and wide bounds, all 1,000 samples had locally optimal solutions, which is all that can be guaranteed for a nonlinear model. The convergence took 20 to 30 iterations on average. We used various combinations of starting values for the 6 unknown parameters of Equation (5). Regardless of the choice of starting values, the optimizer always converged to the same solution.
We evaluate the intercensal method by comparing health expectancies estimated with this method to the “true” health expectancies shown in Table 3. Results for each conditional or unconditional health expectancy are presented in Figures 3 to 5 according to the following configurations: (1) Sample size (20K, 10K or 5K per cross-section); (2) Bounds in nonlinear optimization (narrow, wide). For each combination of these configurations, we present how health expectancy estimates are distributed among 1,000 pairs of samples randomly drawn from the simulated population at time t and t+2. Each sampling distribution is presented in two ways: (1) box plots showing medians and quartiles (boxes represent inter-quartile ranges or IQR, whiskers extend to 1.5 times the IQR below the first quartile or above the third quartile, and estimates beyond that range are considered outliers and are represented by dots); (2) means and 90% uncertainty intervals (means are represented with dots, and whiskers around these dots represent the range between the 5th and 95th percentile).
These sampling distributions are compared to the true health expectancies, represented with a dashed line. Results are shown in Figure 3 for health expectancies conditional on being active at age 65, Figure 4 for health expectancies conditional on being disabled at age 65, and Figure 5 for unconditional health expectancies.
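The two summaries of each sampling distribution described above can be computed along the following lines. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and the "inclusive" quantile rule is one of several conventions, so the values may differ slightly from those produced by the plotting software used for the figures.

```python
from statistics import mean, quantiles


def summarize(estimates):
    """Box-plot statistics and a 90% uncertainty interval for a list of estimates."""
    q1, median, q3 = quantiles(estimates, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences at 1.5 times the IQR beyond the quartiles; points outside are outliers
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [e for e in estimates if e < lower_fence or e > upper_fence]
    # n=20 yields cut points at 5%, 10%, ..., 95%
    cut_points = quantiles(estimates, n=20, method="inclusive")
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "outliers": outliers,
        "mean": mean(estimates),
        "p5": cut_points[0],
        "p95": cut_points[-1],
    }
```

Applied to the 1,000 estimates of a given health expectancy, `summarize` returns the quantities plotted in each panel: the box (`q1`, `median`, `q3`), the dots beyond the whiskers (`outliers`), and the mean with its 5th to 95th percentile range.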
We observe the following patterns in these results. Overall, the intercensal method appears to work remarkably well for health expectancies conditional on being active at age 65 (Figure 3), and for unconditional health expectancies (Figure 5). For these health expectancies, the method offers little bias and sampling error; the means and medians of the sampling distributions are typically within a few tenths of a year of the true value, and the width of the 90% uncertainty interval rarely exceeds one year.
There is more bias and more sampling error for health expectancies conditional on being disabled at age 65 (Figure 4), especially when wide bounds are used. For these values, means and medians of sampling distributions are typically within one year of the true values, and uncertainty can be quite substantial. This may be due to the fact that these conditional health expectancies for the disabled are greatly affected by the estimated value of 2qxUH at age 65. Indeed, 2q65UH is high in comparison with the other transition probabilities at that age, which means that small relative errors in the value of that probability (estimated with α1) will have a large impact on the health expectancies conditional on being disabled. On the positive side, errors in these conditional health expectancies have little impact on the unconditional health expectancies shown in Figure 5, because they pertain to a small proportion of the population at age 65.
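The weighting argument above can be made concrete with a small multistate calculation. The sketch below is an illustration under our own assumptions, not the paper's procedure: transition probabilities refer to status at the end of each interval, person-years within an interval are approximated linearly, and the function names are hypothetical. The keys follow the notation used for the results (HH, HU, UH, UU for conditional expectancies, .H and .U for unconditional ones).

```python
def conditional_expectancies(q_hu, q_uh, q_hd, q_ud, n=2):
    """Partial health expectancies by initial state over successive n-year intervals."""
    result = {}
    for start, (h, u) in (("H", (1.0, 0.0)), ("U", (0.0, 1.0))):
        years_h = years_u = 0.0
        for hu, uh, hd, ud in zip(q_hu, q_uh, q_hd, q_ud):
            # State occupancy at the end of the interval
            h_next = h * (1.0 - hu - hd) + u * uh
            u_next = h * hu + u * (1.0 - uh - ud)
            # Linear approximation of person-years lived in each state
            years_h += n * (h + h_next) / 2.0
            years_u += n * (u + u_next) / 2.0
            h, u = h_next, u_next
        result[start + "H"] = years_h
        result[start + "U"] = years_u
    return result


def unconditional_expectancies(cond, prop_active):
    """Weight conditional expectancies by the proportion active at the initial age."""
    return {
        ".H": prop_active * cond["HH"] + (1.0 - prop_active) * cond["UH"],
        ".U": prop_active * cond["HU"] + (1.0 - prop_active) * cond["UU"],
    }
```

Because the conditional expectancies for the disabled enter the unconditional values with weight `1 - prop_active`, an error in UH or UU is scaled down by that small weight when the disabled are a small share of the population at the initial age.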
Comparing different combinations of sample size and bounds, we find the following patterns in terms of bias and sampling error. First, as expected, larger samples produce better results, but the gain is not always substantial. Second, the choice of narrower bounds in the nonlinear optimization provides better results. The gain, however, is not very substantial, except in the case of health expectancies conditional on being disabled at age 65 (Figure 4). For these health expectancies, the use of narrow bounds produces much less bias and sampling error.
For the unconditional expectancies, we are also able to compare the performance of the intercensal method to results from the Sullivan method. (As discussed in the introduction, the Sullivan method does not provide conditional health expectancies.) These Sullivan estimates, shown in Figure 5, were calculated by combining simulated sampled proportions at time t with overall mortality between t and t+2. We find that the intercensal method produces slightly less bias than the Sullivan estimates. The Sullivan estimates, however, have less sampling variability. This is due in part to the fact that, in the Sullivan method, sampling variability comes from only one sample, while in the intercensal method, sampling variability comes from two independent samples. This additional variability, however, does not come at the cost of greater bias, which is slightly smaller for the intercensal method. This smaller bias may be due in part to the fact that the intercensal method, like the direct multistate method, produces estimates that refer to a synthetic cohort subject to the transition probabilities prevailing between t and t+2 (like the true values), while the Sullivan method produces results that are affected by transitions occurring before time t.
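For reference, the Sullivan calculation combines life-table person-years with observed prevalence. The following is a minimal sketch of the standard formula with hypothetical argument names; the inputs (person-years lived in each age group, proportion active in each age group, and survivors at the initial age) are assumed to be taken from a period life table and a single cross-section.

```python
def sullivan_expectancies(person_years, prop_active, survivors_at_start):
    """Partial active, disabled and total life expectancy by the Sullivan method.

    person_years:       life-table person-years lived in each age group
    prop_active:        observed proportion active in each age group
    survivors_at_start: life-table survivors at the initial age
    """
    # Person-years weighted by the cross-sectional proportion active
    active_years = sum(L * p for L, p in zip(person_years, prop_active))
    total = sum(person_years) / survivors_at_start
    active = active_years / survivors_at_start
    return {"active": active, "disabled": total - active, "total": total}


# Made-up values for two age groups
print(sullivan_expectancies([200.0, 150.0], [0.9, 0.8], 100.0))
```

Only one cross-section enters the calculation, which is consistent with the smaller sampling variability noted above, and the prevalence it uses reflects transitions that occurred before time t.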
In order to test the limits of this approach, we also ran the optimization solver with no bounds, keeping only the five constraints specified above. The no-bounds results were less satisfactory. First, about 5% of the samples encountered infeasibility and unboundedness. Second, while the IQR and the 90% uncertainty intervals for the feasible solutions were comparable in range to the wide-bounds results, a few solutions, up to 15 out of 1000 samples, had values far outside the range observed. While the no-bounds results are useful to test the limits of the estimation procedure, they put us in an unnecessarily difficult situation by ignoring a priori information about the order of magnitude of the functions. For example, it might not be useful to allow the procedure to yield results where the death probability of the disabled is more than 12 times that of the active at age 65. While the proposed narrow and wide bounds could be fine-tuned by examining transition probabilities in a wide range of contexts, we believe there is more a priori information to be used than just the five constraints specified above.
Since the purpose of this method is to estimate health expectancies, we focused our discussion of the results on errors in this outcome measure. However, transition probabilities are also an outcome of interest for a number of purposes. We observe that probabilities are less accurately estimated than the health expectancies produced by them. Values of nqxHU and nqxUH, in particular, tend to be seriously underestimated. These errors largely offset each other when calculating health expectancies.
In order to illustrate this point, we treated the two waves of HRS as cross-sectional and calculated proportions active in 1998 and 2000. We combined these values with official NCHS mortality probabilities for the year 1999 in the intercensal procedure. We then compared the estimated transition probabilities with those directly observed when the HRS is used longitudinally. This comparison, presented in Figure 6, shows that 2rx, and consequently, 2qxHD and 2qxUD, are rather well estimated with the procedure. Values of nqxHU and nqxUH, however, are systematically underestimated. These patterns of errors are rather common in our simulations. They affect health expectancies conditional on being disabled, as we saw in Figure 4. However, they largely offset each other in the estimation of the other health expectancies.
The above simulations provide a test of the intercensal method in situations where the parametric assumptions are exactly met and where the only source of error comes from sampling variability. They do not fully replicate real-life situations where the underlying assumptions might not be exactly met, and where the three sources of information (two cross-sections and overall mortality) come from three independent sources, each with their own potential errors.
To perform a more realistic test, we applied the intercensal method to data from the NHIS. The NHIS is a continuing annual household survey of the civilian, noninstitutionalized population of all ages in the United States. Each week a probability sample of households is interviewed. In this test, we used the 1999 and 2001 annual samples. The 1999 annual sample consists of 37,573 households, yielding 97,059 persons. The 2001 annual sample consists of 38,932 households, yielding 100,761 persons. As the NHIS top codes all respondents aged 85 and over as 85, we restricted the 1999 sample to those aged 64-81 and the 2001 sample to those aged 66-83, resulting in 9854 and 8994 cases, respectively. As in the case of our earlier HRS analysis, we did not use sample weights in the NHIS samples.
Our definition of health states in the NHIS is not identical to the one we used in the HRS. This is due to the fact that, instead of asking about having any difficulty as in the HRS, the NHIS health questions ask about needing help. As the subject may have difficulty in functioning without needing help, the prevalence of functional limitations in the NHIS is generally lower than that in the HRS for the same type of activities. To make the level of prevalence comparable between the two surveys, we expanded the list of activities in the NHIS to include: (1) personal care needs such as eating, bathing, dressing, or getting around inside the home; (2) routine needs such as everyday household chores, doing necessary business, shopping, or getting around for other purposes; and (3) being limited in the kind or amount of work one can do. Cases with a positive response to any of these items were coded as disabled.
For overall mortality pertaining to the inter-survey period, we used U.S. mortality data from the Human Mortality Database for the year 2000. These data are based on official life tables from NCHS. Because of the age restriction in the NHIS, we used only 9 age groups (65-81) as inputs. The input data are shown in Table 5.
In this more realistic test, it is not possible to compare the method's estimates to some unambiguous “truth”, because no exhaustive population information exists on transitions among health statuses that would allow us to estimate health expectancies in a non-parametric fashion. Information on transitions in and out of health statuses typically comes from samples, and sample sizes are usually too small to allow non-parametric calculations. However, in recent years, the IMaCh approach mentioned earlier has emerged as the most reliable method for estimating health expectancies in a synthetic cohort using longitudinal data. We thus used health expectancies produced using IMaCh as our reference against which results of the intercensal method were compared. These IMaCh estimates can be viewed as “direct” multistate estimates, because they use direct observations of individual transitions between states, contrasting with the intercensal estimates which can be viewed as “indirect” multistate estimates. (As explained earlier, IMaCh makes assumptions about the age pattern of transitions that are comparable to those made in the intercensal method.) The data source used for the IMaCh-based reference estimates is the HRS for a period comparable to the period used for NHIS. We obviously cannot use data from the NHIS with IMaCh, because the NHIS data are not longitudinal. We purposely did not use the HRS data as input in the intercensal method for this comparison, because these cross-sections would not be truly independent and would thus provide a test that might favor the new method. Results of this test are shown in Figure 7. (Here also, in order to avoid extrapolation beyond the age for which we have data, we calculated partial health expectancies at age 65, with a truncation age of 83.)
Results indicate that, compared to the IMaCh approach using HRS, the intercensal method provides remarkably comparable estimates for unconditional health expectancies (.H and .U), and for health expectancies conditional on being active at age 65 (HH, HU and H.). For these health expectancies, errors range from 0.3 to 0.8 years with narrow bounds, and from 0.4 to 1.2 years with wide bounds. As in our earlier test, the errors are larger for health expectancies conditional on being disabled at age 65, especially when wide bounds are used. Active life expectancy (UH) tends to be underestimated, while life expectancy in the disabled state (UU) tends to be overestimated. These two errors balance out for the total life expectancy conditional on being disabled (U.), producing an error of 0.4 years when narrow bounds are used, and 0.9 years when wide bounds are used. These results are consistent with the results of the earlier simulations.
For unconditional health expectancies, we are also able to compare the results of the intercensal method to those of the Sullivan method. The intercensal method and the Sullivan method perform similarly well, with little difference between wide bounds and narrow bounds. Here also, this is consistent with the results of the simulations.
In light of the simulations and examples developed in this paper, the intercensal approach appears as a promising framework for indirectly estimating health expectancies. In this section, we discuss the advantages and disadvantages of this approach, compared to the two most common approaches that have been used so far: the multistate approach using IMaCh, and the Sullivan method.
The most important advantage of the intercensal approach, compared to IMaCh, is that it allows the calculation of conditional and unconditional health expectancies without resorting to longitudinal data. Transitions among health statuses, and deaths by health status, are not needed. Only two independent cross-sections, producing proportions active, and a life table pertaining to the inter-survey period are needed. These data are widely available. The two methods rely on comparable parametric assumptions. The disadvantage is that the intercensal method produces health expectancy estimates that do not always agree with the underlying population health expectancies. However, except in the case of conditional expectancies for the disabled, the errors are not substantial. The population of disabled individuals at age 65 is a small population group, and therefore unconditional health expectancies are not substantially affected. We thus believe that the intercensal approach is a promising alternative to the data-costly IMaCh approach.
The advantages relative to the Sullivan method are many. First of all, unlike the Sullivan approach, the intercensal method allows the estimation of conditional health expectancies. Health expectancies vary substantially depending on one's health status at a given age, and therefore conditional health expectancies are important health indicators. A second advantage is that the intercensal method, like IMaCh, does not rely on the assumption that the observed prevalence of disability in the population is equal to that of the synthetic cohort. It is therefore more theoretically correct. Another advantage is that the intercensal method produces unconditional health expectancy estimates that are slightly less biased than those resulting from the Sullivan method. The disadvantage relative to the Sullivan method is that it requires two cross-sections, instead of one in the case of the Sullivan method. This is a minor disadvantage given that successive cross-sections are widely available. A related disadvantage is that the unconditional health expectancies produced by the intercensal method have more sampling variability than those produced by the Sullivan method. This additional sampling variability is modest, however, and seems to be an acceptable price to pay in view of the advantages that the intercensal method offers compared to the Sullivan method.
The intercensal method developed in this paper allows the estimation of health expectancies (conditional and unconditional) without the use of longitudinal data. It only requires two successive cross-sections of the population and overall mortality pertaining to the inter-survey period. Assumptions regarding the age pattern of transitions are similar to those made by IMaCh.
The intercensal method works perfectly when there is no sampling variability and when parametric assumptions are exactly met in the population. When we introduce sampling variability, the method appears to work remarkably well for estimating health expectancies conditional on being active at age 65, and for unconditional expectancies.
The amount of error is somewhat larger for health expectancies conditional on being disabled at age 65. Fortunately, this is a small population group at age 65, and therefore these errors do not compromise the multistate system and the other health expectancy estimates. Nonetheless, further research is needed for improving the performance of the method for these conditional health expectancies.
One direction for future research is to examine the age patterns of transitions in and out of disability using other available longitudinal data sets besides the HRS. This would help determine whether the parametric assumptions made in this paper hold in a variety of settings. Additional information on transition probabilities would also allow us to examine in greater detail the amount of variation in transition probabilities, and would help in choosing more sensible bounds for the values of the parameters. The age pattern of transition probabilities also needs to be studied for various widths of age intervals (e.g. 3 years, 5 years, etc.), and for other healthy vs. unhealthy dichotomies.
We believe, however, that the method introduced in this paper provides a useful framework for estimating health expectancies in the absence of longitudinal data.
The authors are grateful to Jason Fine, Stephen Wright, and anonymous reviewers for their useful comments. This research was supported by core grants to the Center for Demography and Ecology at the University of Wisconsin-Madison (R24 HD047873) and to the Center for Demography of Health and Aging at the University of Wisconsin-Madison (P30 AG017266), and by a research award from the Graduate School, University of Wisconsin-Madison.
Publisher's Disclaimer: This open-access work is published under the terms of the Creative Commons Attribution NonCommercial License 2.0 Germany, which permits use, reproduction & distribution in any medium for non-commercial purposes, provided the original author(s) and source are given credit. See http://creativecommons.org/licenses/by-nc/2.0/de/