|Home | About | Journals | Submit | Contact Us | Français|
We used a longitudinal data set covering 13 years from the Cardiovascular Health Study to evaluate the properties of a recently developed approach to deal with left censoring that fits a semi-Markov process (SMP) model by using an analog to the stochastic EM algorithm—the SMP-EM approach. It appears that the SMP-EM approach gives estimates of duration-dependent probabilities of health changes similar to those obtained by using SMP models that have the advantage of actual duration data. SMP-EM estimates of duration-dependent transition probabilities also appear more accurate and less variable than multi-state life table estimates.
Markov models (or multi-state life table models in demography) are widely used to analyze life course events, such as changes in health status at the level of an individual  or population [2–5], living arrangements [6, 7], poverty status , labor force participation [9–11], etc. Much of the appeal of a multi-state life table model lies in its simple assumption of a first-order Markov chain, where the transition probability is assumed to be conditional on the current state only . In reality, however, many researchers would agree that the process is likely to be more complicated. Some recent studies on the pattern of changes in functional health find evidence for duration dependence [13–16], suggesting the need for duration models such as the semi-Markov process (SMP) model. The SMP model conditions the transition probability on both the current state and its duration since the last event and has been used in various studies of event histories [17–20]. Although it is conceptually superior to the multi-state life table model, broader application of the SMP model has been hindered by the difficult ‘initial conditions’ problem in many longitudinal observational data sets .
The ‘initial conditions’ problem refers to the situation where longitudinal monitoring begins after a subject has entered his/her current state. If the duration of the current state between the entry time and the beginning of observation, which is termed the ‘backward recurrence time’ in Cox , is known, then there are a number of ways to model the event probabilities [18, 22, 23]. But often the backward recurrence time is unknown (i.e. the data are left-censored). Some shortcuts to simplify the problem have been proposed. Allison  suggested excluding all left-censored spells , where a spell is defined as the sojourn in the current state. This approach, which will be denoted henceforth by ‘SMP-EX’, typically results in substantial loss of information and may lead to unreliable estimates in small samples. Moreover, the SMP-EX approach does not provide all of the information needed to conduct inferences via simulated cohorts, which is a useful method for inference. Another approach is to assume that each unknown backward recurrence time is 0 (e.g. [25, 26]). This is generally incorrect and may lead to severe bias in parameter estimates unless the hazard rates are constant , in which case there is no duration effect and thus no need to fit an SMP model. Since many of the nation’s largest longitudinal health surveys (e.g. the Medicare Current Beneficiary Survey (MCBS), the Health and Retirement Survey, and the Longitudinal Studies of Aging I and II) are left-censored, there is a potentially serious problem when applying the SMP model directly to these data to study health transitions.
Cai et al.  proposed an approach to solving the ‘initial conditions’ problem in left-censored data. This approach, denoted by ‘SMP-EM’, uses an analog to the stochastic expectation-maximization (EM) algorithm , which is a variant of the deterministic EM algorithm . Cai et al. suggested that the unknown backward recurrence times can be treated as missing data and imputed directly. The imputed values of backward recurrence times are used in simulations of individual trajectories, both over the course of the iteration between the E-step and the M-step and in the estimation of summary measures of population health once the algorithm converges. Using the 1992–2002 MCBS, Cai et al. found that the algorithm yielded robust and quick convergence to common stationary distributions of coefficient estimates under different starting values for the unobserved backward recurrence times. They examined the probabilities of health events and reported statistically significant differences between the SMP-EM and multi-state life table estimates of total and healthy life expectancy at age 65.
Since the MCBS data have a relatively short follow-up period—respondents remain in the survey for a maximum of 3.5 years, Cai et al.  could not offer a detailed evaluation of the SMP-EM approach. In order to better evaluate the approach, it is necessary to use data with a longer observation period and, preferably, with information on backward recurrence times so that the SMP-EM estimates can be compared with SMP estimates that use actual duration data. The Cardiovascular Health Study (CHS) is desirable for this purpose. The CHS has a long observation period from baseline in 1989–1990 to the present and gathers health status information every six months from respondents during clinic visits and telephone interviews. In addition, subjects who report any limitations in instrumental activities of daily living (IADL) or activities of daily living (ADL) at the initial baseline in-person interview are asked how long they have had the limitation(s). A recalled backward recurrence time is therefore available for those with IADL and/or ADL limitations. These two unique data features, which are rare among longitudinal data sets, allow us to evaluate in detail the accuracy of SMP-EM estimates.
This study will compare SMP-EM estimates with the observed values from the CHS, SMP estimates using actual duration data, SMP-EX estimates, and, finally, multi-state life table model estimates. We will focus on the characteristics of the disablement process (i.e. disability incidence, recovery from disability, and death). These comparisons should help us better understand the properties of the SMP-EM estimates to guide future applications of the approach in event history analysis.
The CHS is a longitudinal study of coronary heart disease and stroke in adults aged 65 years and older . It has also been used in longitudinal studies of changes in health status over time . When the study started in 1989–1990, 5201 men and women were recruited from four communities in the U.S. (Forsyth County, NC; Sacramento County, CA; Washington County, MD; and Pittsburgh, PA). An additional 687 African Americans were added in 1992–1993, bringing the total sample size of CHS to 5888 people aged 65 years and older. Eligible participants were sampled from Medicare enrollment files. Those eligible included all persons living in the household of each individual sampled from the medicare sampling frame, who were 65 years or older at the time of examination, were non-institutionalized, were expected to remain in the area for the next three years, were able to give informed consent and did not require a proxy respondent at baseline. Potentially eligible individuals who were wheelchair-bound in the home at baseline or were receiving hospice treatment, radiation therapy, or chemotherapy for cancer were excluded.
Similar to Cai et al. , we measure health using functional limitations, which are defined on the basis of IADL and ADL limitations. An IADL limitation is defined as reportedly having difficulty in performing any of the six IADL activities without assistance because of health problems: using the telephone, doing light housework, doing heavy housework, preparing one’s own meals, shopping for personal items, and managing money. An ADL limitation is defined as reportedly having difficulty in performing any of the six ADL activities because of health problems: bathing/showering, dressing, eating, getting in or out of bed or chairs, walking, and using the toilet. Given the sample size of the CHS, we classify a living person into two mutually exclusive health states: active health and disability. Disability is defined as having one or more limitations in either IADL or ADL; active health is free of any limitations in both IADL and ADL. Death is the third and absorbing health state.
We analyze the CHS data collected between baseline (1989–1990) and 2002–2003, the latest years in which information on deaths of sampled persons are collected. From baseline to 1998–1999, the study conducted annual clinic visits, which were followed by telephone interviews six months later. Detailed questions on functional limitations were asked during every clinic visit. Telephone interviews were less detailed; they asked the sampled persons whether their overall functional status had changed since last interview and, if so, in what direction. Annual clinic visits were discontinued after 1998–1999 and were replaced by phone interviews every six months through 2002–2003. We use these six-month phone interviews as the source of annual functional status from 1998–1999 to 2002–2003. After exclusion of 159 sampled persons because of issues with the data such as having only one interview record, the final analysis sample includes up to 13 years of observations for 5729 persons aged 65 years and older totaling 60 471 person-years and 14 222 spells.
The difficulty in fitting the SMP model directly to left-censored data is due to the unknown values of the backward recurrence time, denoted by R. Cai et al.  suggested addressing this issue by imputing the R-values via an iterative stochastic EM algorithm. The stochastic E-step imputes the unknown R-values using a cohort simulated based on the current estimates of transition probabilities and the current estimated distribution of backward recurrence times at the cohort’s baseline. The M-step updates the transition probabilities and the cohort’s estimated baseline distribution of backward recurrence times using the current imputed values of R and all other known data values. The algorithm iterates between the E- and M-steps until convergence, after which estimates of interest are obtained by averaging over a successive number of iterations using the M-step estimates for those iterations. Given the similarities between the CHS data and the MCBS data used in Cai et al.  in terms of sampled persons’ characteristics and measures of health, many of the analytic strategies are similar. We will provide only a brief description here of the setup and implementation of the SMP-EM approach. The details can be found in .
The stochastic E-step imputes the unknown R-value for each left-censored spell in the CHS by drawing a value R* randomly from the R-values of matching persons in the simulated cohort. The values of R from which R* is drawn are calculated by subtracting the observed forward recurrence time (i.e. the duration between the beginning of observation and the first event or the end of study) of the left-censored spell in the CHS from the total lengths of all simulated spells in the cohort with matching characteristics such as gender, age, and current health status.
In the E-step for each iteration in our application, the simulation of the cohort began at an age younger than the youngest age in the CHS sample at baseline to facilitate imputation for all left-censored spells in the CHS. Given that the youngest persons in both the CHS and MCBS samples came under observation at age 65, we decided to follow the example in  and simulate a cohort beginning at age 55. The distribution of functional statuses at age 55 was backward extrapolated (via logistic regression) from estimates at age 65 and over for the CHS. In addition, although we started recording simulated health states for the cohort at age 55, the first spells of the simulated 55-year-olds were assumed to be already in progress at age 55 (i.e. the 55-year-olds all have R≥0). For simplicity, as in , the R-values for the 55-year-olds were assumed to have the same distributions as those of the 65-year-olds with matching characteristics. Thus, the R-value for each 55-year-old was randomly drawn from the corresponding estimated distribution of R-values for the 65-year-olds in the CHS based on the preceding E- and M-steps.
Although the choice of a baseline age of 55 is somewhat arbitrary, there are reasons to believe that it is not likely to introduce serious error. The idea behind the SMP-EM approach is to recognize the observed sample as the survivors of the younger cohort. In the present study, approximately 90 per cent of the simulated 55-year-olds survived to age 65, a proportion comparable to the general U.S. population . We also examined the distributions of R* produced by other starting ages such as ages 50 and 60, and they were similar to the distribution of R* using the age 55. Thus, the use of age 55 appeared reasonable.
The cohort size for each iteration of the E-step was 5000 persons. The resulting number of spells was large enough to allow the R-values for all left-censored spells in the CHS to be imputed. The left-censored spells completed by imputation were then joined with the other spells in the CHS with observed beginnings to form an updated set of ‘pseudo-complete’ data for the M-step described next.
The M-step estimates the transition probabilities using Lancaster’s conditional likelihood approach . In our study, the transition probabilities were estimated using a discrete-time hazard model since the CHS schedules interviews at discrete-time intervals and events are only known to have occurred somewhere between annual observations . A discrete-time hazard model is analogous to a piecewise constant hazard model in continuous time, where within each time interval the hazard rate is constant and the length of exposure is exponentially distributed. We fitted a separate multinomial logistic regression for each current health state. The dependent variable was the functional status at the end of each annual interval. The independent variables were age at the beginning of the spell, and duration of the current status and its natural logarithm. It is important to note that the discrete-time model implicitly assumes no more than one event between successive interviews. This assumption is likely unrealistic since there is evidence that widely spaced interviews miss a large number of disability events of short duration . We will discuss the limitation of this assumption in greater detail later.
Both the current R*-values and the age-duration-state-specific transition probability estimates are used in the simulation of another 55-year-old cohort in the next iteration of the E-step. As the algorithm iterates between the E-step and the M-step, the transition probabilities and R* are updated until the algorithm converges. In this study, we initialized the stochastic EM algorithm by setting R = 0 for each left-censored case; the values of R were then updated in subsequent iterations. Convergence occurred rather quickly within the first 25 iterations. We then used the estimated R-distributions at age 65 and the transition probability estimates from iterations 26 to 35 to simulate 10 large cohorts (100 000 persons) of 65-year-olds, one cohort per iteration, and computed the average estimates of summary values from these 10 cohorts as the SMP-EM estimates.
We evaluate the SMP-EM approach through a number of comparisons with observed values from the CHS sample and with results of using other approaches, including the multi-state life table approach. Table I summarizes the list of evaluations, their methods and data, and the measures being compared. Prediction accuracy is assessed by mean absolute percentage error (MAPE) . To calculate MAPE for a procedure, we first calculate the absolute percentage difference (|(ŷ − y)/y|*100 per cent) between the empirical value or the ‘true’ estimate (i.e. the SMP estimate using known values of backward recurrence time), denoted by y, and the estimated value from the procedure, denoted by ŷ, within each level of duration or age. We then add up these absolute percentage differences across all levels of duration or age and divide the sum by the number of levels to obtain the average absolute percentage error.
Since the SMP-EM approach imputes the R-values, we first examine whether the imputation of R is accurate, and whether SMP estimates using imputed backward recurrence times R* are accurate (Questions 1 and 2 of Table I, and Figures 1–4). Given the unique richness of the CHS data set, we perform comparisons using two different sub-samples, one using only those sampled persons with IADL or ADL limitations at baseline and the other using the first six years of observation of the 13-year study as a mock pre-baseline period. The first sub-sample takes advantage of the fact that persons with IADL or ADL limitations at baseline are asked to recall how long they have had the limitations; the second sub-sample avoids possible recall inaccuracies (inherent in the first sub-sample) by treating part of the observed spells as (possibly truncated) backward recurrence times. We use ‘SMP-RD’ to denote SMP estimates for the first sub-sample of disabled persons and ‘SMP-OB’ to denote estimates for the second sub-sample with up to six years of pre-baseline period. They are considered the ‘true’ SMP estimates since they utilize known values of R.
The SMP-RD and SMP-OB results in Figures 3 and and44 are derived from simulated cohorts of 100 000 65-year-olds. The simulations used both transition probability estimates and a distribution of R-values at baseline as inputs. The SMP estimates of transition probabilities were obtained via the same methods as in the M-step of the SMP-EM approach, although an iterative procedure was not needed since the known values of R were used. For the distribution of R-values at baseline, we used empirical distributions based on the recalled backward recurrence times at age 65 for SMP-RD estimates or the possibly truncated observed backward recurrence times at age 65 for SMP-OB estimates.
The second comparison is designed to assess whether SMP-EM estimates are more accurate than multi-state life table estimates (Questions 3 and 4 of Table I, and Table III) in estimating the dynamics of the disablement process (disability incidence and recovery and death). We used a repeated random split-sample design (50 sets of prediction/validation samples) here in order to avoid overfitting of the sample data. Each set of split samples contained a prediction sample, which was used to fit the SMP-EM and multi-state life table models, and a validation sample, which was used to compare observed individual trajectories with predicted values from the two approaches. To make the situation more difficult for the SMP-EM approach, we treated the recalled values of the backward recurrence time as unobserved in the prediction sample. For both models, predictions were made according to the baseline characteristics (age, functional status, and in the case of SMP-EM, backward recurrence time) for the persons in the validation sample. For disabled respondents, we used recalled values of the backward recurrence time; for active respondents, we drew a value from the estimated conditional distribution of R from the SMP-EM procedure that corresponded to the sampled person’s sex and age at study baseline. Predictions were right-censored at the end of the actual observation period for each of the persons in the validation sample to facilitate direct comparison with the CHS sample. MAPEs and standard deviations of predictions across the 50 split samples were calculated.
Table II presents the coefficient estimates and their bootstrap standard errors (SEs) for the multinomial logistic regression models fitted in this study. Bootstrap SEs reflecting the stratified sample design of the CHS were produced via the bootstrap algorithm described by Korn and Graubard . We used only 100 bootstrap samples in this study due to the high cost of computation. For each of the two sub-samples of the CHS, the SMP-EM intercept and coefficient for age are usually similar to the other SMP estimates, with only one relative difference larger than 100 per cent. But there are larger differences in the coefficients for duration and log (duration), including several relative differences larger than 100 per cent. In one instance (SMP-RD versus SMP-EM for the event of remaining disabled), the duration coefficients for the methods being compared have opposite signs. For the full CHS sample, the SMP-EX and SMP-EM coefficient estimates are generally similar too, except for the duration coefficient. Because, the SMP-EX method excludes all left-censored spells, the estimated SEs for the SMP-EX method are larger than those for the SMP-EM method, which uses the partial information in the left-censored spells.
Figures 1 and and22 compare the histograms of imputed R*’s with the R’s that are recalled and observed. The histogram of recalled R’s in the sub-sample of disabled shows a large concentration around one and two years of length, which the distribution of imputed R’s has failed to reproduce (Figure 1). The distributions of imputed and recalled durations at later years are more similar, however.
For the sub-sample with up to six years of pre-baseline period, the histogram of observed R’s (truncated at the end of the sixth year) indicates a bi-modal shape (Figure 2). This bi-modal shape is not reproduced by imputing R*: the distribution of imputed R*’s (also truncated at the end of the sixth year) misses the concentration of R at the lower end; instead it shows a significant concentration at the upper end.
Figure 3(A) and (B) compares the probabilities of recovery and death from the state of current disability within the next year as a function of duration using the sub-sample of disabled elderly. The SMP-EM and SMP-RD estimates of the probabilities of recovery (Figure 3(A)) and death (Figure 3(B)) indicate similar patterns, with a cross-over at the 4th year of duration when SMP-RD begins to show higher probabilities of recovery. The SMP-RD estimates of the probabilities of death seem to level off after six years in disability, while the SMP-EM estimates continue to rise with duration. The lack of agreement in these two figures reflects in part the effect of differences in estimates of the duration effect in Table II. Figure 4(A–C) compares the estimated probabilities of disability incidence, recovery and death using the sub-sample with up to six years of pre-baseline period. In all three figures, the values and patterns of the SMP-EM and SMP-OB estimates are similar.
These comparisons suggest that SMP-EM is a useful approach to dealing with the left-censoring problem in longitudinal data. The SMP-EM estimates of event probabilities are reasonably close to the ‘true’ SMP estimates using the known values of R. Although the shapes of the histograms in Figures 1 and and22 did not match exactly, especially in Figure 2 where the ‘true’ distribution is bimodal, the mismatch apparently did not affect the estimates of transition probabilities substantially.
To compare the accuracy of SMP-EM and multi-state life table estimates of transition probabilities by duration and age, Table III gives the MAPE, based on the results from 50 sets of prediction-validation samples. Panel A (duration-specific) indicates that the average errors of SMP-EM predictions are 43, 16 and 13 per cent for predicting the probabilities of incidence, recovery and death, respectively; the respective average errors of multi-state life table predictions are 61, 131, and 47 per cent. The SMP-EM estimates also exhibit smaller variability than the multi-state life table estimates. The standard deviations of the 50 sets of SMP-EM predictions, averaged over all duration intervals, are 1.7, 0.3, and 0.2 per cent, respectively, for disability incidence, recovery and death, while the average standard deviations for multi-state life table predictions are larger at 3.2, 1.8 and 1.3 per cent, respectively (not shown in Table III). Panel B (age-specific) shows that the differences are much smaller than those in Panel A. On average, the SMP-EM estimates have the same or smaller prediction errors for age-specific probabilities of death and disability incidence, but larger prediction errors for age-specific probabilities of recovery. The average SMP-EM prediction errors over 20 age groups, ages 65–85, are 9, 17, and 7 per cent, respectively, for probabilities of disability incidence, recovery and death; the respective average multi-state life table prediction errors are 9, 12 and 10 per cent. Larger SMP-EM biases in predicting the probabilities of recovery from disability in large part are due to the poorer performance of SMP-EM predictions at ages 65–75. In terms of the variability of estimates, the SMP-EM approach again outperforms the multi-state life table method, although to a lesser extent: the average standard deviations of SMP-EM estimates are 0.7, 0.8, and 0.3 percents for disability incidence, recovery and death, respectively, versus those of multi-state life table estimates at 1.8, 2.4, and 0.7 per cent (not shown here).
This study evaluates the SMP-EM approach in detail by taking advantage of the uniquely rich information in the CHS data set. We showed that changes in functional disability status appear to follow the pattern of a duration-dependent process, and the SMP-EM method provided more accurate estimates of duration-specific transition probabilities than the multi-state life table model. In contrast, the SMP-EM and multi-state life table approaches performed similarly overall for estimating age-specific transition probabilities, which is not surprising since the quantity being estimated was not duration-specific. Although the SMP-EM approach has to impute the unknown backward recurrence times, its estimates of event probabilities were similar to those obtained by using SMP estimates that have the advantage of actual data on backward recurrence time.
Taken together, the results in this study are positive for the SMP-EM approach and should encourage further exploration of its application in event history analysis. In a few cases, however, the performance of the SMP-EM approach was less than ideal. For example, the distribution of imputed R* did not match the bi-modal distribution of R in the split sample, although the mismatch apparently did not affect the accuracy of estimating duration-dependent transition probabilities substantially.
It is worth noting that two of the assumptions that all models in this study employ are unrealistic. First, the number of events between successive observations is assumed to be binary. If the health states at two adjacent interviews are identical, then it is assumed that no event has occurred between interviews; if the health states are different, then it is assumed that only one event has occurred. This assumption ignores the possibility of potentially multiple events of short durations that cannot be detected with this type of ‘partially observed’ data . There is evidence that the majority of disability episodes last only 1–2 months . Therefore, failure to take this into account is likely to result in inaccurate estimates of the number of transitions and the length of each episode of disability events, although Gill et al.  also suggest that estimates of summary measures such as the life expectancy are likely accurate, as long as the observation frequency is one year or less as in CHS. Several recent studies have proposed ways to estimate short-interval transition probability matrices for multi-state life table models [37, 39, 40]. There is clearly a need to relax the assumption concerning the number of events between successive observations for the SMP-EM method also. Second, the renewal process that underlies that SMP model assumes that events are independent of each other, conditional on the covariates included in the model. In multi-state life table models, events that occur in each age interval are also considered conditionally independent. It is naïve to assume that a limited number of covariates have accounted for all the differences among sampled persons. Additional research is therefore needed to extend both the SMP-EM and multi-state life table models to incorporate some type of correlation among recurrent events, possibly along the lines of the frailty-type model used in survival analysis.
We believe that it is reasonable to conclude that the SMP-EM approach is useful for estimating the relationship of disability transitions to duration, even though the estimation of the duration effect and the ad hoc assumptions used in the SMP-EM approach are both subject to potential specification errors. Since the form of duration dependence may be especially difficult to evaluate in samples with short follow-up periods, the estimates may be dominated by the imputed durations. Future applications of the SMP-EM approach should pay particular attention to the estimation of the duration effect.
The authors want to thank the anonymous reviewers for their helpful comments.
‡This article is a U.S. Government work and is in the public domain in the U.S.A.
Disclaimer: The findings and conclusions in this paper are those of the authors and do not necessarily represent the views of the National Center for Health Statistics, Centers for Disease Control and Prevention.