Home | About | Journals | Submit | Contact Us | Français |

**|**Biostatistics**|**PMC3138070

Formats

Article sections

- Abstract
- INTRODUCTION
- RECURRENT EVENT DATA ANALYSIS
- MODEL CHECKING TECHNIQUES
- SIMULATIONS
- SCHIZOPHRENIA DATA EXAMPLE
- DISCUSSION
- SUPPLEMENTARY MATERIAL
- References

Authors

Related links

Biostatistics. 2011 July; 12(3): 535–547.

Published online 2010 December 6. doi: 10.1093/biostatistics/kxq071

PMCID: PMC3138070

Chiung-Yu Huang^{*}

Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA ; Email: huangchi/at/niaid.nih.gov

Xianghua Luo

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA

Biostatistics Research Branch, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA

Received 2010 July 8; Revised 2010 October 22; Accepted 2010 November 1.

Copyright Published by Oxford University Press 2010.

This article has been cited by other articles in PMC.

Recurrent events are the natural outcome in many medical and epidemiology studies. To assess covariate effects on the gaps between consecutive recurrent events, the Cox proportional hazards model is frequently employed in data analysis. The validity of statistical inference, however, depends on the appropriateness of the Cox model. In this paper, we propose a class of graphical techniques and formal tests for checking the Cox model with recurrent gap time data. The building block of our model checking method is an averaged martingale-like process, based on which a class of multiparameter stochastic processes is proposed. This maneuver is very general and can be used to assess different aspects of model fit. Numerical simulations are conducted to examine finite-sample performance, and the proposed model checking techniques are illustrated with data from the Danish Psychiatric Central Register.

Recurrent events are frequently encountered in many biomedical and epidemiology studies, where clinical important events occur repeatedly over the course of follow-up. In many applications, the time between consecutive events, that is the gap time, is a natural outcome of interest. For example, the gap between psychiatric admissions can be viewed as one dimension of the course of schizophrenia, as a decreasing trend in the length between admissions suggests a progressive course of the disorder. When events are of the same type, Wang and Chang (1999) and Peña *and others* (2001) generalized the Kaplan–Meier estimator to estimate the common marginal survival function of gap times, and Huang and Chen (2003), Chang (2004), Lu (2005), and Sun *and others* (2006) considered semiparametric estimation problems. When events are of different types, various nonparametric methods, such as Visser (1996), Wang and Wells (1998), Lin *and others* (1999), and Huang and Wang (2005), as well as semiparametric methods, such as Huang (1999) and Chang (2000), have been developed in the literature.

This work is motivated by the analysis of psychiatric admissions from the Danish Psychiatric Central Register (DPCR), a nationwide psychiatric case registry in Denmark that recorded all psychiatric contacts since 1969. Researchers have shown that the risk of readmission due to relapse of schizophrenia is stable over time (Olesen and Mortensen, 2002; Olesen and Parner, 2006). Moreover, marginal Cox proportional hazards models have been employed to assess the effects of baseline covariates, such as onset age, gender, and marital status, on the risk of rehospitalization. The validity of statistical inference, however, depends critically on the adequacy of the assumed Cox model. While various model checking methods have been suggested for univariate survival time data (Barlow and Prentice, 1988; Therneau *and others*, 1990; Lin *and others*, 1993) and for clustered survival time data (Spiekerman and Lin, 1996), these techniques are not appropriate for recurrent gap time data due to its peculiar stochastic structure. To our knowledge, no formal model checking procedures have been studied in the literature for assessing the Cox model in the gap time setting.

In this article, we introduce diagnostic methodologies for the Cox model with recurrent gap times. In Section 2, we briefly review the pseudo-partial likelihood estimator studied in Huang and Chen (2003). The model checking methods based on the averaged martingale-like process are described in Section 3. We present results of simulation studies in Section 4 and illustrate the proposed methods with the DPCR data in Section 5. Some concluding remarks are given in Section 6.

Suppose *n* independent subjects are recruited into a longitudinal study after experiencing an initial event and the recurrences of the same event are recorded during the study period. Let *i* be the index for a subject, and let *j* index the sequence of recurrent events of an individual, where *j* = 0 indexes the initial event. Denote by *T*_{ij} the gap time between the *j* − 1th and *j*th events of the *i*th subject.

In practice, subjects are usually sampled independently, but gap times from the same subject could be correlated. We assume that there exists a subject-specific frailty variable *γ*_{i} so that, conditioning on *γ*_{i}, the individual recurrent event process is a renewal process; that is, on the individual level, the gap times *T*_{i1},*T*_{i2},… are independent and identically distributed (iid). Thus, unconditioning on *γ*_{i}, the recurrent gap times, *T*_{i1},*T*_{i2},…, are identically (but not independently) distributed. Note that, under our setting, both the distribution of the subject-specific frailty variable *γ*_{i} and the relationship between *γ*_{i} and (*T*_{i1},*T*_{i2}…) are left unspecified. Let **Z**_{i} = (*Z*_{1i},…,*Z*_{pi})^{T} be a vector of time-independent covariates associated with subject *i*. To assess the effect of **Z**_{i} on the distribution of the gap times, we further assume that the common hazard function of the gap times *T*_{ij},*j* = 1,2,…, satisfies the Cox proportional hazards model

where **β** is a *p*×1 vector of parameters and the baseline hazard function *λ*_{0}(*t*) is an arbitrary function with . The observation of recurrent events is subject to right censoring. We denote by *C*_{i} the time between the initial event and the end of follow-up, and assume that *C*_{i} is independent of *γ*_{i} and (*T*_{i1},*T*_{i2},…) conditional on **Z**_{i}. Let *m*_{i} ≥ 1 denote the index satisfying and , with ∑_{1}^{0}·0. Thus, *m*_{i} is the number of observed gap times, where the first *m*_{i} − 1 gap times are completely observed and the last gap time is censored by . Denote the observed data for subject *i* by {*Y*_{i1},…,*Y*_{imi};*m*_{i},**Z**_{i}}, where *Y*_{ij} = *T*_{ij} for *j* = 1,…*m*_{i} − 1 and . The observed data across the *n* subjects are assumed to be iid.

Intuitively, the observed recurrent gap time data can be viewed as clustered survival time data. However, as pointed out by Wang and Chang (1999), the sequential structure of recurrent events presents many challenges for statistical analysis. For example, only the last gap time is censored, and the censored gap time *T*_{imi} tends to be longer than the uncensored ones due to intercept sampling. Moreover, the gap time *T*_{ij},*j* > 1, is subject to dependent censoring *C*_{i} − *T*_{i1} − … − *T*_{ij − 1}, as the gap times are correlated. Finally, subjects who are at a higher risk of recurrent event tend to have more recurrent events; thus the cluster size *m*_{i} is informative about the gap time distribution. Hence, methods for clustered survival time data cannot be directly applied to analyze recurrent gap time data due to its peculiar structure.

We introduce 2 empirical processes for the estimation procedure of the Cox model with recurrent gap time data. Define Δ_{i} = *I*(*m*_{i} > 1) and *m*_{i}^{*} = max{*m*_{i} − 1,1}. Hence, for people with no observed events Δ_{i} = 0 and *m*_{i}^{*} = 1, and for people with observed events Δ_{i} = 1 and *m*_{i}^{*} is the number of uncensored gap times. Let *N*_{ij}(*t*) = Δ_{i}*I*(*Y*_{ij} ≤ *t*) and *R*_{ij}(*t*) = *I*(*Y*_{ij} ≥ *t*). For subject *i*, we define the “averaged counting process” and the “averaged at-risk process” as

respectively. Note that if subject *i* has observed events, a weight 1/*m*_{i}^{*} is assigned to each of the *m*_{i}^{*} uncensored gaps, and the last censored gap is not used in the formulation of *N*_{i}^{*}(*t*) and *R*_{i}^{*}(*t*). Thus, *N*_{i}^{*}(*t*) and *R*_{i}^{*}(*t*) have jump sizes of 1/*m*_{i}^{*} and − 1/*m*_{i}^{*}, respectively, at each uncensored gap time, which is different from the univariate survival analysis where the jump size of the individual counting and at-risk processes is always 1 and − 1.

To estimate the Cox model, Huang and Chen (2003) proposed to solve the pseudo-partial score equation

where and *τ* satisfies pr(*Y*_{i1} ≥ *τ*) > 0. Note that we define **a**^{0} = **a**,**a**^{1} = **a** and **a**^{2} = **a****a**^{T} for a column vector **a**. The pseudo-partial likelihood estimator is defined as the solution to **U**^{*}(**b**) = 0. As pointed out by Wang and Chang (1999), for an individual with *m*_{i} > 1, the uncensored gap times *Y*_{i1},…,*Y*_{imi*} of subject *i* are identically distributed conditional on *γ*_{i},*m*_{i} and *C*_{i}. By utilizing this property, it can be shown that **U**^{*} and the partial score function derived by using only the first gap times {(*Y*_{i1},**Z**_{i},Δ_{i}),*i* = 1,,*n*} converge uniformly to the same limit, ensuring the consistency of .

Following the spirit of conventional counting process martingales, for subject *i* we define the averaged martingale-like process

With the use of *M*_{i}^{*}(*t*), the asymptotic representation of the pseudo-partial likelihood estimator can be expressed as

where is assumed to be positive definite at **b** = **β** and (**b**,*t*) is the limit of **S**^{(1)*}(**b**,*t*)/*S*^{(0)*}(**b**,*t*) as *n*→*∞*.

The averaged martingale-like process *M*_{i}^{*}(*t*) is expected to be informative about model misspecifications because it is the difference between the observed and the expected proportion of uncensored gaps that are shorter than *t*. Note, however, that *M*_{i}^{*}(*t*) is not a martingale because of the sequential structure of recurrent events. The stochastic process *M*_{i}^{*}(*t*) can be estimated by the averaged residual process

where the estimated baseline cumulative hazard function is given by the Breslow-type estimator with

The averaged residual process is different from the martingale residual processes considered by Lin *and others* (1993) and Spiekerman and Lin (1996), as it is constructed based on correlated gap times and is inversely weighted by the corresponding cluster size *m*_{i}^{*}. Interestingly, the averaged residual process behaves similarly to the ordinary martingale residual process in that, for any *t*, and for large *n*.

We now describe objective diagnostic techniques for the Cox model by grouping the averaged residuals cumulatively with respect to follow-up time and/or covariate values. Specifically, we consider a general class of multiparameter stochastic processes

(3.1)

where *f*(·) is a prespecified bounded function and . Under the assumed Cox proportional hazards model, the processes are expected to fluctuate randomly around zero. We show in Appendix A of the supplementary material (available at Biostatistics online) that the process is asymptotically equivalent to , where

Note that are the *n*-limits of

and

It can be shown that, as *n*→*∞*, the multiparameter process converges to a zero-mean Gaussian process with covariance function *E*{Ψ_{i}^{*}(*t*_{1},**z**_{1})Ψ_{i}^{*}(*t*_{2},**z**_{2})^{T}} for any pairs (*t*_{1},**z**_{1}) and (*t*_{2},**z**_{2}). Replacing the unknown quantities in Ψ_{i}^{*} by the corresponding sample estimates, we define the function

where

In Appendix B of the supplementary material (available at Biostatistics online), we show that *E*{Ψ_{i}^{*}(*t*_{1},**z**_{1})Ψ_{i}^{*}(*t*_{2},**z**_{2})^{T}} can be consistently estimated by .

In practice, we can approximate the limiting distribution of *W*^{*}(*t*,**z**) through Monte-Carlo simulations. Let *G*_{1},…,*G*_{n} be independent standard normal random variables which are independent of the data. Define . Conditional on the data, and thus conditioned on is a sum of *n* independent, zero-mean random vectors for every (*t*,**z**). Arguing as in section 2.4 of Spiekerman and Lin (1996), we can show that and converge weakly to the same limiting process. Hence, to approximate the null distribution of *W*^{*}(*t*,**z**), we can simulate a number of realizations from by repeatedly generating standard normal random samples (*G*_{1},…,*G*_{n}). To assess how unusual the observed residual pattern is under the assumed model, we plot the observed multiparameter process along with a few, say 20, processes simulated from . An approximate *P* value for the supremum test can be estimated by the empirical probability that the observed value of exceeds the observed value of through simulation.

Two important aspects of the Cox model can contribute to the lack of fit to the data: the functional form of covariates might be misspecified, and the proportional hazards assumption may fail. In this section, we demonstrate the use of the results described in Section 3.1 in examining these 2 aspects of the Cox model by considering 2 special cases of *W*^{*}(·,·).

When fitting a Cox model, one would like to determine the functional form of a covariate that best explains its effect on the gap time distribution. For example, we may want to assess whether the effect of schizophrenia onset age is linear on the logged hazard. The appropriateness of the functional form of the *k*th component of **Z** can be examined by plotting the partial-mean process for *Z*_{ki},

against *z*. If the functional form of *Z*_{ki} is appropriate, then the observed process is expected to fluctuate around zero for all values of *z*; otherwise one would expect a systematic trend. Note that the partial-mean process *W*_{z}^{*}(*z*) is a special case of *W*^{*}(*t*,**z**), with *f*(·) = 1 and *t* = *τ* in (3.1). Hence, the null distribution of *W*_{z}^{*} can be approximated through simulating the corresponding zero-mean Gaussian process , and the *P* value can be obtained by using the supremum test .

Figure 1 shows plots of the partial-mean process, along with 20 realizations of the null distribution, when fitting a proportional hazards model *λ*(*t*|*Z*) = *λ*_{0}(*t*)exp(*β**Z*) to data generated from various alternative models. Figure 1(a) shows that the observed partial-mean process falls well inside the range determined by the simulated processes when fitting the Cox model with the correct functional form. Figure 1(b) shows that the observed partial-mean process has a curvilinear pattern when a quadratic term is missing from the fitted model. The fitted model overestimates the hazards for lower end values of *Z* and underestimates the hazards for large values of *Z*, suggesting that a concave function of *Z* could fit better than the linear function of *Z*. Figure 1(c) and (d) consider 2 other examples of concave functions, and log(*Z*), and it is easy to see that the pattern is similar to Figure 1(b). Figure 1(e) gives an example of convex function exp(*Z*), and its pattern is opposite to Figure 1(b). Finally, Figure 1(f) considers a dichotomized *Z*. Interestingly, the observed partial-mean process has a sharp drop near the cutoff value of *Z*, and falls inside the range determined by the simulated processes when moving away from the cutoff value.

In evaluating the effect of electroshock therapy on schizophrenia patients, it is often assumed that the hazard ratio of the intervention group versus the non-intervention group is constant over time. However, the results of Cox regression can be misleading if proportional hazards assumption is not valid. To check the proportional hazards assumption, we consider the pseudo-score process

which is also a special case of *W*^{*}(*t*,**z**), with *f*(**Z**_{i}) = **Z**_{i} and **z** = *∞* in (3.1). The results stated before for *W*^{*}(*t*,**z**) can be used to simulate the distributions of the standardized pseudo-score process where is the estimated variance–covariance matrix for given by

Graphical inspections of the proportional hazards assumption with respect to the *k*th component of can be performed based on , where is the *k*th component of and is the *k*th diagonal element of . Moreover, the *P* value for the overall assessment of the proportional hazards assumption can be obtained by using the supremum test , where

Figure 2 shows plots of standardized pseudo-score processes when fitting the proportional hazards model, *λ*(*t*|*Z*) = *λ*_{0}(*t*)exp(*β**Z*), to data generated from alternative models where the proportionality assumption may fail to hold. Figure 2(a) shows that the observed process falls well inside the range determined by the simulated processes when the proportional hazards assumption holds. Figure 2(b) considers the covariate effect to be increasing with time, while Figure 2(c) considers the scenario where there are no covariate effects before *t* = 0.5 and the covariate effect follows proportional hazards assumption after *t* = 0.5. Both plots show evidence of nonproportional hazards for the covariate. It is interesting to see that in Figure 2(c) the observed pseudo-score process for large *t* falls well inside the range determined by the approximating processes, suggesting that the proportional hazards assumption may hold for later time points.

To simulate correlated gap times for subject *i*, we generate a subject-specific variable *A*_{i} and episode-specific variables *B*_{ij} independently from zero-mean normal distributions with variances *ρ* and 1 − *ρ*, respectively, where *ρ*[0,1]. Let *T*_{ij}^{(0)} = − log{1 − Φ(*A*_{i} + *B*_{ij})}, where Φ(·) is the cumulative density function of the standard normal distribution. It can be verified that *T*_{i1}^{(0)},*T*_{i2}^{(0)},… are realizations of correlated exponential random variables with unit means. Thus by applying a transformation to *T*_{ij}^{(0)}, we can easily impose covariate effects and/or change the shape of the hazard function. As an example, has a marginal cumulative baseline hazard function Λ(*t*) = *a*×*t*^{κ}.

Two sets of simulation studies are conducted to examine the performance of the proposed supremum tests in checking the functional form of a covariate and the proportional hazards assumption. For each set of simulations, we consider combinations of different sample sizes *n* = 50,100 and *ρ* = 0,0.25,0.5,0.75. The censoring times are generated from a uniform(0,*τ*_{c}) distribution, where *τ*_{c} is chosen such that the proportion of subjects with zero events is approximately 25%. The size and power of the test statistics are estimated using 1000 simulated data sets. The significance level of the supremum tests is set at 0.05.

The first set of simulations assesses the performance on checking the functional form of a covariate *Z* in the Cox model. We fit a proportional hazards model *λ*(*t*|*Z*) = *λ*_{0}(*t*)exp(*Z**β*) to recurrent gap time data generated under various models described in Table 1. We compare the performance of the proposed method with the diagnostic test proposed by Lin *and others* (1993), henceforth LWY, that only uses the first gap times {(*Y*_{i1},Δ_{i},**Z**_{i}),*i* = 1,…,*n*}. Table 1 summarizes the estimated power and size of the proposed supremum test and the LWY testing procedure. When the functional form of the covariate is correctly specified, the estimated size of the proposed test is close to the predetermined significance level (0.05) across various degrees of within-subject association in recurrent gap times. As expected, the proposed method always yields greater power than the LWY method. Moreover, the power of the proposed supremum test decreases when the within-subject correlation increases, despite the increase in the average number of gaps, whereas the power of LWY test remains constant. Such a phenomenon is frequently observed in correlated data where the power increases with a decrease in within-subject correlation.

Size and power of the supremum test sup * _{z}*|

The second set of simulations assesses the performance on checking the proportional hazards assumption. We generate gap times with marginal hazard functions *λ*(*t*|*Z*) = 2*t*exp(0.3*Z*),*λ*(*t*|*Z*) = exp[*Z*{0.8 + 0.3log(*t*)}] and *λ*(*t*|*Z*) = exp{*Z**I*(*t* > 0.5)}, where the assumption of proportionality in *Z* holds for the former and fails for the other 2 scenarios. The supremum test based on the standardized pseudo-score process is applied to the simulated data and is compared with the LWY test using the first gap times. Table 2 summarizes the estimated sizes and powers of the 2 model checking methods. When the assumption of proportional hazards holds, the estimated size of both tests are close to the nominal level. As expected, the proposed test consistently yields higher power than the LWY test that only uses data on the first gap times.

We illustrate the proposed methods using data collected by the nationwide DPCR (Munk-Jørgensen and Mortensen, 1997). The DPCR has monitored contacts with psychiatric institutions throughout Denmark since 1969 and has included all contacts in emergency rooms and outpatient clinics since 1995. Each admission record includes date of admission, date of discharge, one main discharge diagnosis and up to 3 auxiliary discharge diagnoses. The data used in our analysis is from a cohort of 286 individuals with first contact with Danish psychiatric services during the period from April 1, 1970 to December 31, 1970 with a diagnosis of schizophrenia. For our cohort, 63% were male, 20% were married, and the median onset age of schizophrenia was 26 with a range from 14 to 88. We are interested in assessing the effects of onset age, gender, and marital status on the risk of rehospitalization due to a schizophrenic episode. Although previous literature (Olesen and Mortensen, 2002; Olesen and Parner, 2006) shows that the readmission risk seems to be stable over time, to avoid potential long-term pattern changes in recurrent gap times, we set the maximum follow-up time for each person to be 3 years. Forty percent of these individuals (171 out of 286) did not experience another hospitalization during the 3 year follow-up. On average, each patient experienced 1.7 hospitalizations after the initial contact.

We apply the nonparametric trend test proposed by Wang and Chen (2000) to the recurrent gap times of the study cohort and find that there is no significant change in the distributional pattern of gap times. Univariate Cox regression using the method by Huang and Chen (2003) shows that 3 risk factors, age of schizophrenia onset (*P* < 0.001), male (*P* = 0.09), and unmarried (*P* = 0.02) are associated with a shorter gap time between repeated hospitalizations. When all 3 variables are included in the Cox model, the effects of onset age remains highly significant (*P* < 0.001), while the effects of gender (*P* = 0.85) and marital status (*P* = 0.47) are insignificant. The multivariate Cox model estimates that a person who has onset at 20 has almost a 25% higher instantaneous risk of rehospitalization than a person who has an onset at 30, adjusting for gender and marital status. Further analysis reveals that both gender and marital status were highly correlated with onset age. The median onset age was 24 and 34 for males and females, and was 23 and 38 for unmarried and married patients of the studied cohort. Hence, in the remaining analysis, we will only focus on the effect of onset age on the distribution of recurrent gap times.

To check the assumption of proportional hazards for the candidate model with age at onset, denoted by *Z*, as the only covariate, we plot the standardized pseudo-score process *W*_{ph}^{*}(*t*) from the fitted model. Figure 3(a) shows the observed standardized pseudo-score process versus follow-up time since last hospitalization along with approximating processes simulated from the null hypothesis of a correct model fit. The observed process seems to be typical of the null hypothesis simulations, and the proportional hazards assumption fails to be rejected with an estimated *P* value 0.29 that is based on 1000 simulated processes.

Diagnostic plots for the DPCR data. Upper panel: plots of the standardized pseudo-score process *W*_{ph}^{*}(*t*) for checking the proportional hazards assumption. The covariates in the fitted models are (a) *Z*, (b) , and (c) log(*Z*). Lower panel: plots of the partial-mean **...**

Next, we plot the partial-mean process *W*_{z}^{*}(*z*) and apply the supremum test to check the linear functional form of *Z*. Figure 3(d) displays the observed partial-mean process along with 20 approximating processes generated from the null distribution. In addition to the linear Cox model, we also consider 2 different functional forms, and log(*Z*), of *Z*. The corresponding plots are given by Figures 3(e) and 3(f). The estimated *P* values for the 3 models are 0.42, 0.71, and 0.77, respectively. These results suggest that the linear form of onset age, although reasonable, may not fit as well as the proportional hazards models with either or log(*Z*) as the covariate.

Plots of standardized pseudo-score process for the other 2 candidate models are also obtained to examine the assumption of proportional hazards. As shown in Figures 3(b) and (c), the estimated *p* values for the models with and log(*Z*) as the only covariate are 0.50 and 0.75, respectively. Based on these results, the logarithm transformation of the schizophrenia onset age seems to be more appropriate than the other functional forms considered in this section. The pseudo-partial likelihood method estimates the coefficient of log*Z* to be − 1.04 (SE 0.21) in the Cox model.

The proposed methodology can be employed to develop other model assessment techniques for the Cox model. For example, the link function of the proportional hazards model can be checked by plotting against the set of unique values of global test can be constructed by considering for all possible values of **z** = (*z*_{1},…,*z*_{p}) and *t*. Arguing as in Section 3.1, the distributions of *W*_{l}^{*}(*z*) and *W*_{g}^{*}(**z**,*t*) also can be approximated through simulating the corresponding zero-mean Gaussian process. Finally, a smoothed plot of versus a covariate omitted from the fitted model provides approximately the correct functional form of that covariate.

In this article, we introduced 2 stochastic processes, the averaged counting process *N*_{i}^{*}(*t*) and the averaged at risk process *R*_{i}^{*}(*t*), to formulate the averaged martingale-like process *M*_{i}^{*}(*t*). Different graphical techniques and formal tests were proposed by grouping the estimated process , that is the averaged residual process, cumulatively with respect to follow-up time and/or covariate values. The proposed methodology is very flexible. In fact, by properly modifying the definition of *N*_{i}^{*}(*t*) and *R*_{i}^{*}(*t*), the proposed methods can be generalized to check non-Cox models. As an example, consider the accelerated failure time (AFT) model log(*T*_{ij}) = *Z*_{i}^{T}**β** + *ε*_{ij}, where *ε*_{ij}'s have a common distribution. We can define Model checking methods for the AFT model can be developed based on the corresponding averaged martingale-like process. Plots of the multiparameter stochastic processes with simulated trajectories under the AFT model will help assess model fit. If the AFT model does not fit well, we can repeat this exercise for the Cox model and a linear transformation model. At the end, we can select the model that has the best-fitting residual process.

A limitation of the proposed methodology is its requirement that the recurrent gap times are conditional iid. When the conditionally iid assumption fails to hold, the exchangeability among uncensored gap times may not hold and thus the proposed method may not be appropriate. In practice, before fitting the proposed model, one should check whether there is a trend in the distributional pattern of gap times by applying trend tests such as the ones proposed by Wang and Chen (2000). Moreover, one may compare the results using the proposed model checking methods with that using the methods considered by Lin *and others* (1993) for the first gap times. If the 2 approaches give different conclusions, it is likely that the conditional iid assumption is violated and one should consider other statistical models that better capture the difference in episode-specific distributional patterns.

Supplementary material is available at http://biostatistics.oxfordjournals.org.

The authors thank Drs Preben Bo Mortensen and William Eaton for generously providing anonymous Denmark schizophrenia data for illustrating the proposed methods. *Conflict of Interest:* None declared.

- Barlow WE, Prentice RL. Residuals for relative risk regression. Biometrika. 1988;75:65–74.
- Chang S-H. A two-sample comparison for multiple ordered event data. Biometrics. 2000;56:183–189. [PubMed]
- Chang S-H. Estimating marginal effects in accelerated failure time models for serial sojourn times among repeated events. Lifetime Data Analysis. 2004;10:175–190. [PubMed]
- Huang C-Y, Wang M-C. Nonparametric estimation of the bivariate recurrence time distribution. Biometrics. 2005;61:392–402. [PubMed]
- Huang Y. The two-sample problem with induced dependent censorship. Biometrics. 1999;55:1108–1113. [PubMed]
- Huang Y, Chen YQ. Marginal regression of gaps between recurrent events. Lifetime Data Analysis. 2003;9:293–303. [PubMed]
- Lin DY, Sun W, Ying Z. Nonparametric estimation of the gap time distribution for serial events with censored data. Biometrika. 1999;86:59–70.
- Lin DY, Wei LJ, Ying ZL. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika. 1993;80:557–572.
- Lu WB. Marginal regression of multivariate event times based on linear transformation models. Lifetime Data Analysis. 2005;11:389–404. [PubMed]
- Munk-Jørgensen P, Mortensen PB. The Danish Psychiatric Central Register. Danish Medical Bulletin. 1997;44:82–84. [PubMed]
- Olesen AV, Mortensen PB. Readmission risk in schizophrenia: selection explains previous findings of a progressive course of disorder. Psychological Medicine. 2002;32:1301–1307. [PubMed]
- Olesen AV, Parner ET. Correcting for selection using frailty models. Statistics in Medicine. 2006;25:1672–1684. [PubMed]
- Peña EA, Strawderman RL, Hollander M. Nonparametric estimation with recurrent event data. Journal of the American Statistical Association. 2001;96:1299–1315.
- Spiekerman CF, Lin DY. Checking the marginal Cox model for correlated failure time data. Biometrika. 1996;83:143–156.
- Sun LQ, Park DH, Sun JG. The additive hazards model for recurrent gap times. Statistica Sinica. 2006;16:919–932.
- Therneau TM, Grambsch PM, Fleming TR. Martingale-based residuals for survival models. Biometrika. 1990;77:147–160.
- Visser M. Nonparametric estimation of the bivariate survival function with application to vertically transmitted AIDS. Biometrika. 1996;83:507–518.
- Wang M-C, Chang S-H. Nonparametric estimation of a recurrent survival function. Journal of the American Statistical Association. 1999;94:146–153. [PMC free article] [PubMed]
- Wang M-C, Chen YQ. Nonparametric and semiparametric trend analysis for stratified recurrence times. Biometrics. 2000;56:789–794. [PubMed]
- Wang W, Wells MT. Nonparametric estimation of successive duration times under dependent censoring. Biometrika. 1998;85:561–572.

Articles from Biostatistics (Oxford, England) are provided here courtesy of **Oxford University Press**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |