The SPACE program is a statistical program developed to estimate the MSLT functions from survey data. It consists of multiple sets of PC-based SAS^{®} programs with different modeling capacity. However, they are structured similarly: each set of the program contains two components – the data component and the statistical component. The data component prepares the input data sets – both for the full analysis sample and for the bootstrap samples drawn from the full sample – for the statistical component. The statistical component estimates transition rates or probabilities and the MSLT functions based on the estimated parameters. The output includes estimates of MSLT functions and their variances.

2.1 The MSLT Model

The MSLT model characterizes population movement over time in a finite, discrete and mutually exclusive state space as a Markov process. Two types of MSLT models can be fitted in the SPACE program: one follows a first-order Markov chain where the transition probabilities are conditional only on the current age and status, and another follows a semi-Markov process (SMP) model where the transition probabilities are age, status and duration dependent. Since we compare to the IMaCh estimates in this study, we will only consider the duration-independent MSLT model in detail here.

The SPACE program takes the traditional event history approach to estimating the MSLT model parameters by assuming that the observed events are independent, as in the IMaCh and GSMLT programs, and complete (i.e., no missing events between two successive observations). When a person has the same status on both occasions, it is assumed that no event has occurred between the two observations; if the statuses differ, it is assumed that only one event has occurred. This assumption is restrictive, as spells of short duration may occur frequently between interviews (Hardy et al. 2005). As interviews become less frequent (e.g., every five years), the number of events that the model fails to capture is likely to rise and additional bias can be introduced into the estimates of MSLT functions (Gill et al. 2005; Wolf and Gill 2009).

The SPACE program uses a class of discrete-time hazard models to estimate the age-specific transition probabilities or rates from sample data. The SPACE program can fit a multinomial logistic regression as in Laditka and Wolf (1998) to estimate the transition probabilities directly. The equation takes the following basic form:

$$\ln\left[\frac{p_{ij}(age, t)}{p_{ii}(age, t)}\right] = a_{ij} + b_{ij}\,age_{t} + c_{ij}X \qquad (1)$$

where *p*_{ij}(*age*, *t*) is the transition probability from the current state *i* to state *j* (*i*, *j* = 1,2,3,…,*n*, *i* ≠ *j*) over the annual interval at *age*_{t}, *a*_{ij} is the intercept, *b*_{ij} is the coefficient for age at the beginning of the annual interval, and *c*_{ij} is the set of coefficients for the other covariates *X*.
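As an illustration of how fitted coefficients translate into annual transition probabilities, the following Python sketch evaluates a multinomial logit for one origin state. It assumes that remaining in the current state is the reference category; the function name, the data layout, and the coefficient values are hypothetical and are not taken from the SPACE program or from the estimates reported here.

```python
import math

def transition_probs(origin, age, coefs, covariates=None):
    """Return {destination: probability} for one origin state over an annual interval.

    coefs maps (origin, destination) -> (a, b, c), where a is the intercept,
    b the age coefficient, and c a dict of covariate coefficients.
    Assumption: staying in the origin state is the reference category,
    so its linear predictor is fixed at 0.
    """
    covariates = covariates or {}
    scores = {origin: 0.0}
    for (i, j), (a, b, c) in coefs.items():
        if i != origin:
            continue
        scores[j] = a + b * age + sum(c.get(k, 0.0) * v for k, v in covariates.items())
    denom = sum(math.exp(s) for s in scores.values())
    return {j: math.exp(s) / denom for j, s in scores.items()}

# Hypothetical coefficients for origin state 2 (active); illustration only.
example_coefs = {
    (2, 1): (-7.0, 0.07, {"female": 0.2}),   # active -> disabled
    (2, 3): (-9.0, 0.09, {"female": -0.4}),  # active -> dead
}
probs = transition_probs(2, 70, example_coefs, {"female": 1})
```

By construction the returned probabilities sum to one, and the log ratio of each transition probability to the probability of staying equals the corresponding linear predictor.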

Alternatively, the SPACE program can estimate the transition rates as in Hayward and Grady (1990) and Crimmins, Hayward, and Saito (1994). The model takes the following form:

$$\ln \mu_{ij}(age_{t}) = a_{ij} + b_{ij}\,age_{t} + c_{ij}X \qquad (2)$$

where *μ*_{ij} is the instantaneous transition rate from the current state *i* to state *j* (*i*, *j* = 1,2,3,…,*n*, *i* ≠ *j*) over the annual interval at *age*_{t}, and *a*_{ij}, *b*_{ij}, and *c*_{ij} are the intercept, the coefficient for age, and the set of coefficients for the other covariates *X*, respectively. The transition rates are converted into transition probabilities using the formula in Crimmins, Hayward, and Saito (1994:165).

The SPACE program imposes no limit on the number of non-absorbing states in the MSLT models; the number of states depends only on the data and research objective. Also, multiple categorical covariates measured at study baseline (e.g., gender, race/ethnicity, education, etc.) can be included. These covariates may each have more than two categories and their values remain fixed during the survey period.^{10} Finally, the SPACE program permits uneven lengths in observation intervals, which are calculated as the number of years between interviews.^{11} The program converts input data from person-year format to annual interval format in which each line of data represents the movement from one interview to the next over the period of one year. If the length of the interval between interviews is two or more years, then a corresponding number of annual intervals are created to facilitate the computation of annual transition parameters. The events are assumed to occur randomly in one of these annual intervals. This approach results in the events occurring, on average, in the middle of the observation interval.
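The conversion to annual intervals described above can be sketched in Python as follows. This is a minimal illustration under the stated assumptions (at most one event per observation interval, placed in a randomly chosen annual interval); the function name, the record layout, and the handling of absorbing states are hypothetical and do not reproduce the SAS data component.

```python
import random

def expand_to_annual(start_age, start_state, end_state, n_years,
                     rng=random, absorbing=frozenset()):
    """Split one observation interval of n_years into annual intervals.

    At most one event is assumed. If the state changed, the change is placed
    in one annual interval chosen uniformly at random.
    Returns a list of (age_at_start, state_at_start, state_at_end) rows.
    Rows after entry into an absorbing state (e.g., death) are not created.
    """
    if n_years < 1:
        raise ValueError("n_years must be at least 1")
    changed = start_state != end_state
    event_year = rng.randrange(n_years) if changed else None
    rows = []
    state = start_state
    for k in range(n_years):
        next_state = end_state if changed and k == event_year else state
        rows.append((start_age + k, state, next_state))
        if next_state in absorbing:
            break
        state = next_state
    return rows

# A two-year gap with a move from active (2) to disabled (1), observed at age 70.
rows = expand_to_annual(70, 2, 1, 2, random.Random(0))
```

Because the event year is drawn uniformly, the event falls on average in the middle of the observation interval, as noted above.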

The outputs of these regressions are age-specific transition probability estimates for all possible transitions, conditional on the other covariates included in the model. The SPACE program provides two options to estimate MSLT functions: the deterministic approach as in IMaCh or GSMLT or micro-simulation. The deterministic approach can only estimate HE, while micro-simulation can be used to estimate HE and a variety of other MSLT functions. The details of micro-simulation are discussed in section 2.4.

2.2 Data

The study population for this analysis is drawn from the 1998–2002 panels of the Medicare Current Beneficiary Survey (MCBS). The MCBS is a nationally representative, multi-stage, longitudinal panel survey of the Medicare population, sponsored by the Centers for Medicare and Medicaid Services (CMS), and conducted continuously since 1991. The survey gathers data on a wide range of topics including health status, socio-demographic information, and the use and cost of medical services. Survey records are linked to administrative data on use of and expenditures on Medicare-covered services (hospital, physician, etc.) and on vital status.

The MCBS follows a rotating panel design. Each year a new panel that is representative of the current Medicare population is selected from the list of eligible beneficiaries. Each person in the panel is scheduled to receive 12 interviews over a four-year period, with information on self-reported health status collected once a year in the Fall. The panel is not renewed during the four-year period, but the rates of attrition are small and decline over time. Given the adjustment for survey non-response in sample weights, the bias in estimates is substantially reduced or eliminated (Kautter et al. 2006).

The MCBS has all the elements of a complex survey. Strata are created based on the characteristics of primary sampling units (PSUs), which are basically large geographical areas (i.e., a Metropolitan Statistical Area (MSA) or group of contiguous counties). The largest MSAs in the country are selected with probability one; each is essentially a “stratum.” Within these certainty strata, the individual zip-code clusters are considered PSUs for variance estimation. For each of the non-certainty strata, two PSUs are selected. The analysis sample used in this study has a total of 112 strata and 1,168 PSUs. The individual Medicare beneficiaries are then selected in the third stage (i.e., within each zip-code cluster) stratified by seven age groups (under 45, 45 to 64, 65 to 69, 70 to 74, 75 to 79, 80 to 84 and 85 and over). The oldest people (85 and over) and disabled people (64 and under) are oversampled to allow detailed analysis of their health status and health care needs.

For this study, we use the 1998–2002 panels that contain 14,892 elderly beneficiaries. We exclude 1,017 persons of Hispanic origin or other racial/ethnic groups to focus on non-Hispanic whites and blacks only, since the IMaCh program does not accept more than two categories at a time for any covariate. The full analysis sample contains 50,830 person-year observations for 13,875 persons of age 65 and older.

For explication purposes, we use a simple dichotomous measure of health based only on the presence of limitations in activities of daily living (ADLs). A person is considered disabled (*i* = 1) if he or she either responds “yes” to having difficulty with one or more of the six ADLs (bathing, dressing, eating, transferring, walking, and using the toilet), or responds “does not do the activity because of a health or physical problem.” Otherwise, this person is considered non-disabled or ‘active’ (*i* = 2). For those who report limitations with any of the activities, the survey also asks whether they receive help from another person and/or use special equipment. We do not consider “receiving help” in defining functional disability in this study. Survey respondents can move between the disabled and non-disabled states over time, while ‘dead’ (*i* = 3) is an absorbing state.

2.3 Estimation algorithms for model parameters

The regressions specified above are carried out by two SAS procedures: PROC LOGISTIC and PROC LIFEREG. These procedures offer powerful modeling capabilities. For example, users can use the current state as one of the covariates, instead of using it to stratify the data. Users can select the best functional form for the whole sample, or choose different forms for subsets of the data.^{12} They can also relax the linear relationship between age and the *logit* to test other functional forms (e.g., logarithm or polynomial functions), or even to evaluate different forms of the link function (e.g., cumulative, multinomial or complementary log-log). This degree of flexibility is not available in the IMaCh and GSMLT programs.

2.4 Micro-simulation and bootstrapping

Micro-simulation is the main computation technique used by the SPACE program to estimate MSLT functions. Compared to the deterministic approach, micro-simulation holds substantial advantages in terms of the scope of statistics one can calculate. With the deterministic approach, one essentially moves the entire population through the transition matrices, with little insight into individual dynamics. As a result, only a few summary statistics can be derived. Micro-simulation, however, simulates the life path of all members of the population such that a wide variety of summary statistics of the population dynamics can be derived.

The population simulated in the SPACE program is not arbitrary. It is characterized by the estimated transition parameters, conditional on the covariates included in the regression models. To briefly describe how it works, suppose we want to simulate the life histories of a 100,000-person cohort of 65-year old black men. For a hypothetical member of this cohort, we first randomly assign an initial health status, say, active, at age 65 based on the weighted health distribution for black men at age 65 from the input data. We then evaluate possible health changes between age 65 and 66 by comparing a random number from the uniform distribution with the transition probabilities for the age 65–66 interval, given his current status of active health. If his health status changes to disabled, then we generate a new random number from the uniform distribution to compare with the transition probabilities for the age 66–67 interval, conditional on being disabled at age 66. The result of this comparison determines if his health status changes again between age 66 and 67, and is repeated one year at a time until his eventual death. Once this process is repeated for all members of the cohort, we have a complete record of individual health histories from which MSLT functions can be easily calculated by averaging over the individual records. For example, total life expectancy (TLE) is computed by the average number of years lived for the simulated 100,000-person cohort. HE, including expected length of time spent in both active health (ALE) and disability (DLE), is computed by the average number of years spent in each health state.
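The simulation steps described above can be sketched in Python as follows. This is a simplified illustration, not the SPACE implementation: the transition probabilities come from a hypothetical function `toy_probs` rather than from fitted models, each simulated year is credited in full to the state occupied at the start of the interval, and the cap at age 120 is an added assumption.

```python
import random

ACTIVE, DISABLED, DEAD = "active", "disabled", "dead"

def toy_probs(state, age):
    """Hypothetical annual transition probabilities, for illustration only."""
    p_die = min(1.0, 0.01 * 1.09 ** (age - 65) * (2.0 if state == DISABLED else 1.0))
    if state == ACTIVE:
        p_move = min(0.05 + 0.004 * (age - 65), 1.0 - p_die)
    else:
        p_move = min(0.15, 1.0 - p_die)
    other = DISABLED if state == ACTIVE else ACTIVE
    return {DEAD: p_die, other: p_move, state: 1.0 - p_die - p_move}

def draw_state(probs, u):
    """Pick the state whose cumulative probability interval contains u."""
    cum = 0.0
    states = list(probs)
    for s in states:
        cum += probs[s]
        if u < cum:
            return s
    return states[-1]  # guard against floating-point rounding

def simulate_life(initial_state, start_age, probs_fn, rng, max_age=120):
    """Simulate one life path; return years spent in each living state."""
    years = {ACTIVE: 0, DISABLED: 0}
    state, age = initial_state, start_age
    while state != DEAD and age < max_age:
        years[state] += 1  # simplification: full year credited to the starting state
        state = draw_state(probs_fn(state, age), rng.random())
        age += 1
    return years

def simulate_cohort(n, start_age, p_active_at_start, probs_fn, seed=0):
    """Average the individual records to obtain ALE, DLE, and TLE."""
    rng = random.Random(seed)
    totals = {ACTIVE: 0, DISABLED: 0}
    for _ in range(n):
        init = ACTIVE if rng.random() < p_active_at_start else DISABLED
        years = simulate_life(init, start_age, probs_fn, rng)
        for s in totals:
            totals[s] += years[s]
    ale, dle = totals[ACTIVE] / n, totals[DISABLED] / n
    return {"ALE": ale, "DLE": dle, "TLE": ale + dle}

result = simulate_cohort(2000, 65, 0.8, toy_probs, seed=1)
```

Keeping the individual records, rather than only their totals, would also allow medians and other percentiles of years lived in each state to be computed.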

It is worth noting that the simulation is based upon parameters of a period life table, where the experience of an eighty-year old today is assumed to hold for the next twenty years when a current sixty-year old reaches his or her eightieth birthday. This is unlikely to be the actual experience of any individual. Therefore, the simulated life paths should not be regarded as the actual experience of a single cohort over time.

In order to test the group differences in MSLT functions, we need to estimate their variances. Evaluating group differences in MSLT functions is different from evaluating the differences in parameter estimates in the event-specific models since they arise from a complex set of transitions that are not immediately obvious. For example, suppose one wants to test the hypothesis that males and females differ in their expected health over the life cycle (i.e. expected years in various health states). Health expectancy is determined by age-specific rates of movement into and out of the health states, including the risk of death from each health state. How sex is associated with health expectancy, however, may be unclear for a number of reasons. Sex may significantly affect some transitions and not others. Or, sex effects, even though statistically significant, may be offsetting. It is also possible that sex effects may be non-significant for the whole set of transitions, yet the consistency of effects for a lengthy period of time may combine in a way in which the sex effects are reinforced and magnified with age. The hypothesis of sex differences in health is therefore global in the sense that it takes into account sex differences in *all* of the transitions defining the process of life cycle health.

In addition, variance estimation for a complex survey needs to consider sources of variability due to stratification and multi-stage clustering that are not present in a simple random sample (SRS). Treating a stratified sample as an SRS usually overestimates the variance, while treating a clustered sample as an SRS usually underestimates the variance. Although the net effect is often not obvious, it is nonetheless clear that ignoring the complex sampling design can lead to incorrect statistical inference (Lohr 1999).

To address these issues, the SPACE program uses a version of the rescaling bootstrap method developed specifically for complex surveys (Korn and Graubard 1999:32–33; Rao and Wu 1988; Sitter 1992a), which has been implemented in recent demographic studies (e.g., Cai and Lubitz 2007; Cai et al. 2006). This approach samples *n*_{h} − 1 PSUs with replacement within stratum *h*, where *n*_{h} is the number of PSUs in stratum *h*. For each PSU *i* sampled from stratum *h*, the original sample weight is multiplied by [*n*_{h} / (*n*_{h} − 1)] *m*_{i}, where *m*_{i} is the number of times PSU *i* is selected. If a rare event is not represented in a particular bootstrap sample, the sample can be redrawn.
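A minimal Python sketch of one rescaled bootstrap replicate is given below. It assumes the standard Rao and Wu rescaling factor, *n*_{h}/(*n*_{h} − 1) times the number of times a PSU is drawn; the function name and the input layout are hypothetical, and the redrawing of samples that miss a rare event is not shown.

```python
import random
from collections import Counter

def rescaled_bootstrap_multipliers(psus_by_stratum, rng):
    """Return {(stratum, psu): weight multiplier} for one bootstrap replicate.

    psus_by_stratum maps a stratum id to its list of PSU ids (at least two).
    Within stratum h, n_h - 1 PSUs are drawn with replacement. A PSU drawn
    m_i times receives the multiplier n_h / (n_h - 1) * m_i; a PSU that is
    never drawn receives 0 and so drops out of the replicate.
    """
    multipliers = {}
    for h, psus in psus_by_stratum.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two PSUs")
        draws = Counter(rng.choice(psus) for _ in range(n_h - 1))
        for psu in psus:
            multipliers[(h, psu)] = n_h / (n_h - 1) * draws[psu]
    return multipliers

# Hypothetical design: two strata with two and four PSUs.
design = {"A": [1, 2], "B": [1, 2, 3, 4]}
mult = rescaled_bootstrap_multipliers(design, random.Random(42))
```

Each person's replicate weight is then the original sample weight times the multiplier of his or her PSU. Within every stratum the multipliers sum to *n*_{h}, which is one way to check an implementation.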

It is worth noting that this particular procedure has two potentially offsetting sources of bias. First, this procedure resamples only at the PSU level and thus will underestimate the variance for a multistage survey. This source of bias is not likely to be significant, however, since the additional variability due to sub-sampling at later stages is usually negligible compared to variability at the PSU level (Lohr 1999). Second, this procedure draws the bootstrap samples with replacement, which may lead to overestimation of the variance for data sampled without replacement. This second source of bias may be negligible if the first-stage sampling fraction is small (Rao 1988). If not, then alternative procedures specifically developed for without-replacement samples (e.g., Bickel and Freedman 1984; Sitter 1992b) can be considered. These procedures are more difficult to implement, however, and require knowledge of the sampling fraction, which is typically not available to researchers using the public versions of the survey data.

The bootstrap method usually requires more computation, and its theoretical properties in complex surveys are not fully studied (Lohr 1999). There is also evidence from simulation studies that the bootstrap method may not outperform the jackknife and the Balanced Repeated Replication (BRR) methods for stratified one-stage SRS with replacement (Kovar, Rao, and Wu 1988). However, the bootstrap method also has a number of advantages. It can be used to estimate the variance for a broader class of statistics, including sample quantiles. It can also provide consistent variance estimators for surveys with imputed data, and has a “higher potential to be extended to other complex problems” than the BRR and jackknife approaches (Shao and Tu 1995:280). From a practitioner’s perspective, it is reasonable to conclude that the bootstrap method is a suitable all-purpose variance estimator for MSLT functions.

2.5 Comparison of SPACE with IMaCh and GSMLT

There are several major differences among the three programs. The first is their assumptions about the “completeness” of observed data. The IMaCh program takes the eMC approach, assuming that observed data are incomplete and allowing multiple events between successive interviews. The SPACE and GSMLT programs, on the other hand, take the traditional event history approach, assuming that the observed data are complete and allowing only one (or zero) event between successive observations. Although the assumption of eMC is conceptually more realistic, a recent study shows that the estimates of HE and transition probabilities from both approaches are surprisingly similar – both are biased (Wolf and Gill 2009). HE estimates may be insensitive to the length of the interval between interviews for up to two years (Gill et al. 2005), but it is possible that estimates based on both approaches become more biased as interviews become less frequent.

Another difference lies in their treatment of the design factors of the survey data. Currently, the GSMLT program makes no adjustment for survey design and assumes input data are from a SRS, while the IMaCh program makes only limited adjustment by assuming that the survey design affects only the sample weight. Since the sample weight reflects the probability of the selection of *individuals*, not *clusters*, it cannot remove the bias in variance estimates in clustered survey data (Lohr 1999).

The three programs also differ in the scope of statistics they produce. The SPACE program permits users to estimate a variety of MSLT functions because of its use of micro-simulation, while the other two programs offer limited choices. The IMaCh and GSMLT programs focus on HE estimates. Although the GSMLT program also estimates MSLT functions that can be expressed as a function of the transition probabilities, many statistics that may be of interest to researchers do not meet this criterion (e.g., the probability of death within two years after two prior episodes of disability, or the median of healthy life years). In addition, the complexity of expressing the mathematical relation of MSLT functions to transition probabilities may be challenging for many users.

2.6 Application

In this section we present the results from an application of the SPACE program and compare them with those from the IMaCh program. Table 1 shows selected characteristics of sampled persons in the study population. The majority of sampled persons are female, non-Hispanic, white, and between the ages of 65 and 74 and free of ADL limitations. The educational achievement of the panels shows some improvement between 1998 and 2002. The proportion with less than a high school education dropped from 29 percent to 24 percent, while the proportion of high school graduates and those with more than high school education (including college and vocational training) increased from 71 percent to 76 percent. The prevalence of active health dropped slightly from 62 percent in the 1998 panel to 59 percent in the 2002 panel, while the prevalence of ADL limitations increased slightly.

| **Table 1** Characteristics of the analysis sample of 1998–2002 panels in MCBS |

2.6.1 Comparison of IMaCh and SPACE estimates

The coefficient estimates of the logistic regressions from both programs are shown in . We fit the same logistic regressions of the form of eq. (1) for both the IMaCh and SPACE programs. In , the IMaCh coefficients are estimated with both one-month and 12-month transition intervals, and the SPACE coefficients are estimated with an annual interval. For both programs, the gender coefficients indicate that elderly women are more likely than elderly men to become disabled, while less likely to recover and to die. The race/ethnicity coefficients indicate that elderly non-Hispanic blacks are more likely to become disabled and die, while less likely than elderly non-Hispanic whites to recover from disability.

The SPACE coefficient estimates for the “annual” interval are slightly different from the IMaCh estimates with a 12-month interval, reflecting the differences in the measurement of the gap between interviews. The SPACE program does not take into account the variation in the actual interval between interviews. It arranges the longitudinal data into pairs of observations and estimates the transition probabilities between one time point and the next. In the case of MCBS data, although it is *designed* with 12-month intervals, the actual gap ranges from 8 months to 16 months (). This variation in time interval is ignored by the SPACE program, but not by the IMaCh program, which estimates month-to-month transitions.

shows the equilibrium prevalence of health states that is used by the IMaCh program to estimate HE, as well as the smoothed prevalence estimates for SPACE estimates. The SPACE prevalence estimates are similar to the observed prevalence, while the IMaCh prevalence estimates of disability, whether using the one-month or 12-month interval, are noticeably lower.

Using the IMaCh estimates of coefficients with monthly transition interval in and the period prevalence estimates in , the IMaCh estimates of HE at age 65 by gender and race/ethnicity are shown in Table 3. We also include in Table 3 two sets of variance estimates – the bootstrap estimates that consider survey design and the IMaCh estimates that do not – in order to evaluate the degree of bias in variance estimates when complex survey design is ignored. The bootstrap variance estimates are obtained by first randomly selecting 250 bootstrap samples from the full analytic sample, and then using them as input data sets to the IMaCh program to derive 250 sets of IMaCh point estimates of life expectancy.^{13} The standard deviation of these 250 estimates is considered the bootstrap SE of the original IMaCh point estimates and is compared to the IMaCh estimates that do not reflect the complex sampling design of MCBS.

| **Table 3** IMaCh estimates of health expectancy at age 65 and their standard errors |

A common measure of such bias in survey research is the design effect (DEFF) – the ratio of the variance estimates that consider survey design (i.e., the bootstrap estimates) to the estimates that do not (i.e., the IMaCh estimates). Since stratification and clustering have opposing effects on sampling variability, the value of the ratio may suggest the relative size of these design factors: if the ratio is greater than one then the clustering effect may be stronger; if the ratio is less than one then the stratification effect may be stronger. Table 3 shows that all of the bootstrap estimates are larger than the IMaCh estimates, an indication of the larger clustering effect in MCBS. In some cases the bootstrap estimates are much larger. For example, the bootstrap variance of DLE for non-Hispanic white females is 75% larger than the IMaCh estimate.
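The two quantities used in this comparison can be written down compactly. The Python sketch below computes a bootstrap SE as the standard deviation of the replicate estimates and the design effect as a ratio of variances. The function names are hypothetical, and the use of the sample standard deviation (divisor *n* − 1) is an assumption; no values from the tables are reproduced.

```python
import statistics

def bootstrap_se(replicate_estimates):
    """Bootstrap SE: standard deviation of the replicate point estimates.

    Assumption: the sample standard deviation (divisor n - 1) is used.
    """
    return statistics.stdev(replicate_estimates)

def design_effect(se_design, se_srs):
    """DEFF: variance that reflects the survey design divided by the
    variance computed as if the data were a simple random sample."""
    return (se_design / se_srs) ** 2

# A design-based SE twice the SRS-based SE implies a fourfold variance ratio.
deff = design_effect(2.0, 1.0)
```

Because DEFF is defined on variances, a bootstrap variance that is 75% larger than the model-based variance corresponds to a DEFF of 1.75 and to an SE that is about 32% larger.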

2.6.2 Convergence of bootstrap variance estimates

A practical issue in the implementation of the bootstrap method is how many bootstrap samples to draw. Efron (1987) suggested that 100 samples are sufficient for variance estimates, while other researchers have argued for a much higher number of replications, especially given the rapidly declining cost of computation (e.g., Booth and Sarkar 1998; Chernick 1999). From a practitioner’s perspective, a straightforward approach is to check the convergence pattern of bootstrap variance estimates as more and more samples are drawn. The analyst can typically decide on the number of samples to draw when the variance estimates begin to stabilize. In , we plot the standard errors (SEs) for SPACE estimates of HE at age 65. While the patterns are different across gender and racial/ethnic groups, all four figures show that the SEs begin to stabilize after the first 500 samples. The fluctuations after 500 samples are very small, except perhaps for the TLE estimates for 65-year old black men and black women. The SEs for these two groups are 0.60 and 0.52 with 2000 samples, representing a small difference of six and seven percent, respectively, from their corresponding values with 500 samples. The convergence patterns for the percentile estimates of years lived and spent in active health and disability are similar to and not shown here. Since these small differences are not likely to affect the results of hypothesis tests, we choose to use the SEs with 500 bootstrap samples in Table 4.^{14}

| **Table 4** SPACE estimates of the average and percentiles of years expected to live and spend in active health and disability at age 65, by gender and race/ethnicity |
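The convergence check described in section 2.6.2 can be sketched in Python as follows: the SE is recomputed from the first *k* replicate estimates for increasing *k*, and the relative change between two values of *k* is inspected. The function names are hypothetical, and the replicate estimates are synthetic draws generated only to make the example runnable; they are not SPACE output.

```python
import random
import statistics

def running_se(replicate_estimates, step=100):
    """Return [(k, SE from the first k replicates)] for k = step, 2*step, ..."""
    return [
        (k, statistics.stdev(replicate_estimates[:k]))
        for k in range(step, len(replicate_estimates) + 1, step)
    ]

def relative_change(se_a, se_b):
    """Absolute difference between two SEs, relative to se_b."""
    return abs(se_a - se_b) / se_b

# Synthetic replicate estimates (illustration only).
rng = random.Random(0)
replicates = [rng.gauss(17.0, 0.5) for _ in range(2000)]
path = running_se(replicates, step=500)
change = relative_change(path[0][1], path[-1][1])  # 500 vs. 2000 replicates
```

An analyst could stop adding replicates once the relative change falls below a tolerance chosen in advance; the tolerance itself is a judgment call.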

2.6.3 The distribution of years lived and spent in different health states

Table 4 shows the SPACE estimates of the average, median, 25^{th} and 75^{th} percentiles of the number of years lived, as well as the number of years spent in active health and disability at age 65, by gender and race/ethnicity, based on transition probabilities and prevalence estimated from the full analysis sample as shown in . We also include the TLE at age 65 based on the National Vital Statistics System in 2000 as a comparison (Arias 2002). The SPACE estimates of TLE in Table 4 indicate some small differences from the 2000 Vital Statistics: the largest is 4.7 percent for white women. The differences are caused mostly by the lack of control for variation in the actual time interval between interviews in the SPACE program. We verified this source of difference by manually calculating the transition probabilities using the SPACE coefficient estimates and the IMaCh coefficient estimates with a 12-month transition interval in . The IMaCh estimates of TLE with a 12-month interval are closer to the Vital Statistics in year 2000 than the SPACE estimates.

For both gender and racial/ethnic groups, estimates of the median, 25^{th} and 75^{th} percentile of the number of years expected to live and to spend in active health at age 65 indicate generally symmetric distributions, while the distributions of years spent in disability have a longer tail on the right. Due to space constraints, we only present the distributions of years spent in active health and disability at age 65 for non-Hispanic white and black men in and . shows that the distribution of years spent in active health for white males is generally shifted to the right of the distribution for black males. In substantive terms, this difference in distributions can be illustrated by the fact that 61% of black men spend fewer years in active health than the median number of years spent in active health for white males. In addition, the graph shows that the distribution of years spent in active health for black men is more tightly clustered around its median of 11.5 years compared to white men; the average deviation from the median is 3.5 years for blacks compared to 4.4 years for whites. Black men thus not only have a lower ALE (i.e., average number of years in active health) than whites but it also appears that they are more homogenous with regard to the distribution of years spent in active health than are whites.

indicates that the distributions of years spent in disability for black and white men are highly skewed to the right. About half of all men are expected to have two or fewer disabled years; about 25 percent will spend five or more years disabled after age 65. As is evident, the distributions are very similar for the two race groups, consistent with the results in . Thus, it appears that the distributions of years spent in active health differ substantially for black and white men, yet the distributions of years spent in disability do not.