Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2875299

Formats

Article sections

- Summary
- 1. Introduction
- 2. Model Setup
- 3. Estimation Procedure
- 4. Simulation Studies
- 5. Discussion
- References

Authors

Related links

Biometrics. Author manuscript; available in PMC 2010 May 25.

Published in final edited form as:

Published online 2009 May 12. doi: 10.1111/j.1541-0420.2009.01266.x

PMCID: PMC2875299

NIHMSID: NIHMS117740

The publisher's final edited version of this article is available at Biometrics

See other articles in PMC that cite the published article.

Recurrent event data analyses are usually conducted under the assumption that the censoring time is independent of the recurrent event process. In many applications the censoring time can be informative about the underlying recurrent event process, especially in situations where a correlated failure event could potentially terminate the observation of recurrent events. In this paper, we consider a semiparametric model of recurrent event data that allows correlations between censoring times and recurrent event process via frailty. This flexible framework incorporates both time-dependent and time-independent covariates in the formulation, while leaving the distributions of frailty and censoring times unspecified. We propose a novel semiparametric inference procedure that depends on neither the frailty nor the censoring time distribution. Large sample properties of the regression parameter estimates and the estimated baseline cumulative intensity functions are studied. Numerical studies demonstrate that the proposed methodology performs well for realistic sample sizes. An analysis of hospitalization data for patients in an AIDS cohort study is presented to illustrate the proposed method.

Recurrent event data arise in a wide variety of settings, including public health, biomedical research, reliability studies, and social sciences. In these studies each subject is at risk of experiencing repeated events, and the observation of recurrent events is terminated at or before the end of the study. For example, an HIV patient may experience multiple opportunistic infections during follow-up, and a schizophrenic patient may be repeatedly admitted to a psychiatric hospital. The recording of recurrent events could stop early if the patient withdraws or dies before the study period ends. In the examples, the analysis of recurrent event data is more relevant than time to first infection or hospital admission, because recurrent events are more informative about a patient's underlying health condition and can be used to assess whether there is evidence for deterioration over the long-term course of the disease.

A key feature of the recurrent event data structure is that the event recurrence times within a subject are stochastically ordered and typically correlated. Hence recurrent event data can be viewed as clustered “survival” time data. However, methods for clustered survival time data cannot be applied directly because the cluster size, i.e. the number of events of a subject, is correlated with the underlying distribution of recurrent events. Various methodologies have been proposed to study the risk of event occurrence over time. Extending the methodologies of survival analysis, Prentice, Williams and Peterson (1981), Andersen and Gill (1982), and Chang and Wang (1999) considered conditional models based on intensity and hazard functions. Others, such as Pepe and Cai (1993), Lawless and Nadeau (1995), and Lin et al. (2000), proposed marginal multiplicative rate models based on the number of recurrent events. Schaubel, Zeng, and Cai (2006) considered semiparametric additive rate models, where the effect of covariates acts additively on the marginal rate of recurrent event. As a useful alternative, Ghosh (2004) and Sun and Su (2008) proposed accelerated rate models where the covariate effect transforms the time scale of the baseline rate function. Finally, Zeng and Lin (2006) proposed a class of semiparametric transformation models to allow for more flexible modeling of the recurrent event process.

The analysis of recurrent event data is usually conducted with the assumption of independent censoring; hence, at any time point, the collection of subjects under observation is a random sample of the study population defined at the time origin. In many instances, however, censoring can be informative about the recurrent event process, especially when a failure event serves as a part of the censoring mechanism and precludes further observation of recurrent events. The afforementioned problem in dealing with recurrent event data under informative censoring complicates analyses for the ALIVE (AIDS Link to Intraveneous Experience) study (Vlahov et al. 1991), where nearly 3000 injection drug users (IDU) from Baltimore Maryland were recuited through extensive community outreach efforts. During the study period substantial information was collected including repeated hospitalizations, HIV test results, race, gender, and death. Hospitalization is an important integrated measure of an IDU's health status, as it reflects the negative health consequences from a variety of causes, e.g. violence, opportunistic infections, liver-related complications, or complications of injection drug use, such as infections of the skin and soft tissue (abscess, cellulitis, and necrotizing fasciitis), bacteremia, endocarditis, and osteomyelitis. Both time-dependent covariates, e.g. HIV status, and time-independent covariates, e.g. race and gender, might influence the rehospitalization risk. The potential informative censoring makes it difficult to properly assess the effect of these risk factors. The censoring among HIV-negative users is likely to be caused by informative dropouts from healthier IDUs. Moreover, HIV positive IDUs tend to be hospitalized more often and also have a higher mortality rate. A naive analysis of the hospitalization rate is likely to yield a biased estimate of the effect of HIV status on overall hospitalization.

To account for the dependence of recurrent events and the censoring event, researchers have proposed two types of approaches, marginal models and frailty models. Ghosh and Lin (2003) studied a correlated marginal model for the joint distribution of recurrent events and the failure time. By artificially censoring some observed event times, the authors developed a semiparametric estimation procedure for estimating the effect of time-independent covariates without modeling the correlation between the recurrent event process and the failure time. Lancaster and Intrator (1998) considered a joint fully parametric model of the recurrent event process and the failure time, where the dependency between the two outcomes is induced by sharing an unobserved frailty in the intensity of recurrent event process and the hazard of the failure event. Extending the work of Lancaster and Intrator (1998), Liu, Wolfe, and Huang (2004), Ye, Kalbfleisch, and Schaubel (2007), Huang and Liu (2007) and Rondeau et al. (2007) studied joint semiparametric models with a specified frailty distribution and estimated the model parameters by maximizing the semiparametric likelihoods. On the other hand, Wang, Qin, and Chiang (2001), Huang and Wang (2004), Huang, Wang, and Zhang (2006), Sun, Tong, and He (2007) proposed nonparametric and semiparametric estimation procedures where, in addition to the baseline intensity function and the baseline hazard function, the distribution of the unobserved frailty was also left unspecified. The estimation procedures proposed by these authors, however, cannot deal with time-dependent covariates.

We consider a semiparametric model for recurrent event data that accounts for both time-dependent and time-independent covariates in modeling the risk of recurrent events. This model allows the censoring time to be correlated with the recurrent event process via an unobserved frailty, relaxing the independent censoring assumption required by the usual risk set methods. Current methods that deal with time-dependent covariates require either an independent censoring assumption (Lin et al. 2000) or a parametric specification of the frailty distribution (Lancaster and Intrator 1998, Liu et al. 2004, and Ye et al. 2007). In practice, however, these methods may suffer from assumption violations due to informative censoring or lack of model checking techniques for the distribution of unobserved frailty. In this paper we present a novel semiparametric estimation procedure that depends on neither the distributions of frailty variables nor the failure times. We propose to estimate the regression coeffcients of time-dependent covariates by applying the conditional likelihood method to comparable pairs of event times, so the the nuisance parameters are eliminated from the conditional likelihood function. We then construct a weighted truncation product-limit estimator, adjusting for sampling bias in the risk sets, to estimate the shape function of the recurrent event process. Finally, we solve estimating equations formulated based on expected number of observed events to estimate the effects of time-independent covariates. We apply this approach to the ALIVE study and show that HIV-positive status is associated with an increased risk in repeated hospitalization.

Suppose that [0, *τ*] is the time period of interest or the study time interval, where the recurrent events could potentially be observed up to *τ*. Let subscript *i* be the index for a subject, *i* = 1, 2, …, *n*. For subject *i*, let *N _{i}*(

We introduce a nonnegative valued frailty variable *Z _{i}*, with

$${\lambda}_{i}\left(t\right)={Z}_{i}{\lambda}_{0}\left(t\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\phantom{\rule{thinmathspace}{0ex}}\{{\mathit{X}}_{i}{\left(t\right)}^{\prime}\beta +{\mathit{W}}_{i}^{\prime}\gamma \},\phantom{\rule{1em}{0ex}}t\in [0,\tau ],$$

(1)

where *β* and *γ* are *p* × 1 and *q* × 1 vectors of parameters, and the baseline intensity function λ_{0}(*t*) is an arbitrary continuous function with ${\Lambda}_{0}\left(t\right)={\int}_{0}^{t}{\lambda}_{0}\left(u\right)\mathit{du}$. We further assume that, conditional on (*Z _{i}*,

Note that two different types of informative censoring are often encountered in real application: informative dropouts and terminal events that could preclude further occurrence of recurrent events. For the case of informative dropouts the proposed model can be interpreted straightforwardly. For the latter case, recurrent events after the terminal event can be considered latent and modelled as if they could have occurred, analogous to latent failure times in the setting of competing risks (see discussions in Ghosh and Lin 2003). The validity of the proposed estimation procedure in Section 3 only relies on the population structure for recurrent events occurring before the terminal event.

Assumption (1) implies that the occurrence of recurrent events follows a proportional intensity model, where the unobserved frailty *Z _{i}* inflates/deflates the intensity. Because of the memoryless property of a Poisson process, conditional on

We denote by *m _{i}* the number of recurrent events that occurred before time

For estimation of the regression parameter *β*, an initial attempt might use the conditional technique used in Wang et al. (2001) to eliminate nuisance parameters from the likelihood. Under (1) the event times (*t*_{i}_{1}, …, *t _{imi}*)of the

$$\frac{{Z}_{i}{\lambda}_{0}\left(t\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\phantom{\rule{thinmathspace}{0ex}}\{{\mathit{X}}_{i}{\left(t\right)}^{\prime}\beta +{\mathit{W}}_{i}^{\prime}\gamma \}}{{\int}_{0}^{{Y}_{i}}{Z}_{i}{\lambda}_{0}\left(u\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\phantom{\rule{thinmathspace}{0ex}}\{{\mathit{X}}_{i}{\left(t\right)}^{\prime}\beta +{\mathit{W}}_{i}^{\prime}\gamma \}\mathit{du}},\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}0\u2a7dt\u2a7d{Y}_{i}.$$

Note that both the unobserved frailty *Z _{i}* and the time-independent covariates

$$\underset{i=1}{\overset{n}{\Pi}}\underset{j=1}{\overset{{m}_{i}}{\Pi}}\frac{{\lambda}_{0}\left(t\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\left\{{\mathit{X}}_{i}{\left({t}_{\mathit{ij}}\right)}^{\prime}\beta \right\}}{{\int}_{0}^{{Y}_{i}}{\lambda}_{0}\left(u\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\phantom{\rule{thinmathspace}{0ex}}\left\{{\mathit{X}}_{i}{\left(u\right)}^{\prime}\beta \right\}\mathit{du}}=\underset{i=1}{\overset{n}{\Pi}}\underset{j=1}{\overset{{m}_{i}}{\Pi}}\frac{d{\pi}_{i}\left({t}_{\mathit{ij}}\right)}{{\pi}_{i}\left({Y}_{i}\right)},$$

(2)

where *dπ _{i}*(

$$d{\pi}_{i}\left(t\right)=\frac{{\lambda}_{0}\left(t\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\left\{{\mathit{X}}_{i}{\left(t\right)}^{\prime}\beta \right\}}{{\int}_{0}^{\tau}{\lambda}_{0}\left(u\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\left\{{\mathit{X}}_{i}{\left(u\right)}^{\prime}\beta \right\}\mathit{du}}I(0\u2a7dt\u2a7d\tau ),$$

and ${\pi}_{i}\left(t\right)={\int}_{0}^{t}d{\pi}_{i}\left(u\right)$. Note that *π _{i}* defines a proper distribution function with

When the recurrent event model only includes time-independent covariates, the conditional density function further reduces to λ_{0}(*t*)/Λ_{0}(*Y _{i}*), which is in the form of a truncated density function. It is easy to see that for this special case the conditional likelihood depends only on λ

Because (2) is computationally equivalent to the semiparametric likelihood of a set of independently right-truncated random variables, we can reformulate the problem as estimating the regression parameter ** β** using the data {

For any two event times *t _{ij}* and

$$\begin{array}{cc}\hfill & \frac{d{\pi}_{i}\left({t}_{\mathit{ij}}\right)d{\pi}_{k}\left({t}_{\mathit{kl}}\right)}{d{\pi}_{i}\left({t}_{\mathit{ij}}\right)d{\pi}_{k}\left({t}_{\mathit{kl}}\right)+d{\pi}_{i}\left({t}_{\mathit{kl}}\right)d{\pi}_{k}\left({t}_{\mathit{ij}}\right)}\hfill \\ \hfill & \phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}=\frac{\mathrm{exp}\left[{\{{\mathit{X}}_{i}\left({t}_{\mathit{ij}}\right)+{\mathit{X}}_{k}\left({t}_{\mathit{kl}}\right)\}}^{\prime}\beta \right]}{\mathrm{exp}[{\{{\mathit{X}}_{i}\left({t}_{\mathit{ij}}\right)+{\mathit{X}}_{k}\left({t}_{\mathit{kl}}\right)\}}^{\prime}\beta +\mathrm{exp}\left[{\{{\mathit{X}}_{i}\left({t}_{\mathit{kl}}\right)+{\mathit{X}}_{k}\left({t}_{\mathit{ij}}\right)\}}^{\prime}\beta \right]}=\frac{1}{1+\mathrm{exp}\left\{{\rho}_{\mathit{ik}}{({t}_{\mathit{ij}},{t}_{\mathit{kl}})}^{\prime}\beta \right\}},\hfill \end{array}$$

(3)

where * ρ_{ik}*(

$$\underset{i<k}{\Pi}\underset{j=1}{\overset{{m}_{i}}{\Pi}}\underset{l=1}{\overset{{m}_{k}}{\Pi}}{\left[\frac{1}{1+\mathrm{exp}\left\{{\rho}_{\mathit{ik}}{({t}_{\mathit{ij}},{t}_{\mathit{kl}})}^{\prime}\beta \right\}}\right]}^{{\delta}_{\mathit{ijkl}}}.$$

The score function derived from the logged pairwise pseudolikelihood can be expressed as

$$\underset{i<k}{\Sigma}\int \int I(t\u2a7d{Y}_{\mathit{ik}},u\u2a7d{Y}_{\mathit{ik}})\left[\frac{-\mathrm{exp}\left\{{\rho}_{\mathit{ik}}{(t,u)}^{\prime}\beta \right\}}{1+\mathrm{exp}\left\{{\rho}_{\mathit{ik}}{(t,u)}^{\prime}\beta \right\}}{\rho}_{\mathit{ik}}(t,u)\right]{\mathit{dN}}_{i}\left(t\right){\mathit{dN}}_{k}\left(u\right),$$

where *Y _{ik}* =

$$\mathit{h}({d}_{i},{D}_{k};\beta )=\int \int I(t\u2a7d{Y}_{\mathit{ik}},u\u2a7d{Y}_{\mathit{ik}})\left[\frac{-\mathrm{exp}\left\{{\rho}_{\mathit{ik}}{(t,u)}^{\prime}\beta \right\}}{1+\mathrm{exp}\left\{{\rho}_{\mathit{ij}}{(t,u)}^{\prime}\beta \right\}}{\rho}_{\mathit{ik}}(t,u)\right]{\mathit{dN}}_{i}\left(t\right){\mathit{dN}}_{k}\left(u\right),$$

and denote the score function by

$${\mathit{S}}_{p}\left(\beta \right)=\frac{1}{\left(\begin{array}{c}\hfill n\hfill \\ \hfill 2\hfill \end{array}\right)}\underset{i<k}{\Sigma}\mathit{h}({D}_{i},{D}_{k};\beta ).$$

It is easy to see that ** h** is permutation symmetric in its arguments and

*Assume that X(t) is bounded by M and E*[

Note that the variance covariance matrix ** V** can be estimated by ${\widehat{\mathit{V}}}_{2}^{-1}{\widehat{\mathit{V}}}_{1}{\widehat{\mathit{V}}}_{2}^{-1}$, where

$${\widehat{\mathit{V}}}_{1}=\frac{4}{n}\underset{i=1}{\overset{n}{\Sigma}}\frac{1}{\left(\begin{array}{c}\hfill n-1\hfill \\ \hfill 2\hfill \end{array}\right)}\underset{i<j<k}{\Sigma}\mathit{h}({D}_{i},{D}_{j};\widehat{\beta})\mathit{h}{({D}_{i},{D}_{k};\widehat{\beta})}^{\prime},$$

and

$${\widehat{\mathit{V}}}_{2}=-\frac{1}{\left(\begin{array}{c}\hfill n\hfill \\ \hfill 2\hfill \end{array}\right)}\underset{i<k}{\Sigma}\frac{\partial \mathit{h}({D}_{i},{D}_{k};\widehat{\beta})}{\partial \beta}.$$

Also note that based on the score function * S_{p}*, a test statistic for testing the hypothesis

$$\frac{1}{\left(\begin{array}{c}\hfill n\hfill \\ \hfill 2\hfill \end{array}\right)}\underset{j<k}{\Sigma}{N}_{i}\left({Y}_{\mathit{ik}}\right){N}_{k}\left({Y}_{\mathit{ik}}\right){\int}_{0}^{{Y}_{\mathit{ik}}}\{{\mathit{X}}_{i}\left(u\right)-{\mathit{X}}_{k}\left(u\right)\}\left\{\frac{{\mathit{dN}}_{i}\left(u\right)}{{N}_{i}\left(u\right)}-\frac{{\mathit{dN}}_{k}\left(u\right)}{{N}_{k}\left(u\right)}\right\}.$$

Interestingly, this test statistic for the effects of time-dependent covariates does not require information about time-independent covariates, and hence is useful for checking proportionality assumption in the usual proportional rate model.

By definition *π _{i}*(

$$\underset{{s}_{l}>t}{\Pi}\left\{1-\frac{{d}_{l}\left(\beta \right)}{{R}_{1}\left(\beta \right)}\right\},$$

where ${d}_{l}\left(\beta \right)=\frac{1}{n}{\Sigma}_{i=1}^{n}{\Sigma}_{j=1}^{{m}_{i}}I({t}_{\mathit{ij}}={s}_{l})\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\{-{\mathit{X}}_{i}{\left({t}_{\mathit{ij}}\right)}^{\prime}\beta \}$ and ${R}_{l}\left(\beta \right)=\frac{1}{n}{\Sigma}_{i=1}^{n}{\Sigma}_{j=1}^{{m}_{i}}I({t}_{\mathit{ij}}\u2a7d{s}_{l}\u2a7d{Y}_{i})\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\{-{\mathit{X}}_{i}{\left({t}_{\mathit{ij}}\right)}^{\prime}\beta \}$. Note that ** β** = 0 implies that the assigned weight is a unit weight and the proposed estimator reduces to the product-limit estimator that maximizes the nonparametric likelihood function for truncated data. By replacing

$$\widehat{F}(t;\widehat{\beta})=\underset{{s}_{l}>t}{\Pi}\left\{1-\frac{{d}_{1}\left(\widehat{\beta}\right)}{{R}_{1}\left(\widehat{\beta}\right)}\right\}.$$

The large sample properties of (*t*) are stated in Lemma 1, with proofs given in Appendix.

*Assume that (a)* Λ_{0}(*τ*) > 0, *(b) Pr*(*Y* > *τ*, *Z* > 0) > 0, *and (c) G*(*u*) = *E*{*ZI*(*Y* *u*)exp(** W**′

To estimate the regression parameters of time-independent covariates, we note that, conditioning on {*Z _{i}*,

$$E\{{m}_{i}\mid {Z}_{i},{Y}_{i},{\mathit{W}}_{i},{\mathit{x}}_{i}\left({Y}_{i}\right)\}={Z}_{i}{\Lambda}_{0}\left(\tau \right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\left\{{\mathit{W}}_{i}^{\prime}\gamma \right\}\int I({Y}_{i}\u2a7eu)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\left\{{\mathit{X}}_{i}{\left(u\right)}^{\prime}\beta \right\}\mathit{dF}\left(u\right).$$

Thus, following the assumption *E*[*Z _{i}* |

$$E\left\{\frac{{m}_{i}}{{\int}_{0}^{{Y}_{i}}\mathrm{exp}\left\{{\mathit{X}}_{i}{\left(u\right)}^{\prime}\beta \right\}\mathit{dF}\left(u\right)}\mid {\mathit{W}}_{i}\right\}={\Lambda}_{0}\left(\tau \right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}{\mathit{W}}_{i}^{\prime}\gamma ).$$

We propose to estimate ** γ** and Λ

$$\frac{1}{n}\underset{i=1}{\overset{n}{\Sigma}}{\mathit{W}}_{i}^{\ast}\left[\frac{{m}_{i}}{{\int}_{0}^{{Y}_{i}}\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\left\{{\mathit{X}}_{i}{\left(u\right)}^{\prime}\widehat{\beta}\right\}d\widehat{F}\left(u\right)}-\mathrm{exp}\left\{{\left({\mathit{W}}_{i}^{\ast}\right)}^{\prime}\eta \right\}\right]=0,$$

(4)

where ${\mathit{W}}_{i}^{\ast}={(1,{\mathit{W}}_{i}^{\prime})}^{\prime}$ and ** η** = (lnΛ

Following Theorem 1 and Lemma 1, we can establish the asymptotic properties for ** γ** and Λ

Assume that the conditions specified in Theorem 1 and Lemma 1 hold, $\sqrt{n}(\widehat{\gamma}-\gamma )$ converges weakly to a multivariate normal distribution with mean 0 and *variance-covariance matrix E*(−* ξ/γ*)

$$4F{\left(t\right)}^{2}\mathrm{exp}\left(2{\eta}_{1}\right)E\left[\{f({D}_{1},{D}_{2})+\kappa ({D}_{1},{D}_{2};t,\beta )\}\{f({D}_{1},{D}_{3})+\kappa ({D}_{1},{D}_{3};t,\beta )\}\right],$$

*where f*(*D _{i}*,

We conduct Monte Carlo simulation studies to evaluate the finite-sample properties of the proposed estimator. For all simulations, we generate 1,000 simulated datasets, each with *n* = 100 and *n* = 400 independent subjects. For subject *i*, the frailty *Z _{i}* is generated from a gamma distribution with unit mean and variance 4, and the time-independent covariate

We compare the results from the proposed estimation procedure to that from the Lin et al. (2000), termed as LWYY, method. The empirical bias, empirical standard error and the mean square error of the estimates based on 1,000 samples are shown in Table 1. Figures Figures11 and and22 show the estimates and the pointwise 95% confidence intervals of the baseline cumulative intensity function. As summarized in the table, the average length of follow-up period is approximately 5.8, and the average number of observed recurrent events ranges from 3.3 to 4.7 in these simulation studies. The estimators of *β* and *γ* from the proposed method perform well in that the biases in the estimates of *β* and *γ* are small, and the averages of Λ_{0}(*t*) are almost indistinguishable from the true curve. On the other hand, using the LWYY method, which requires independent censoring to draw valid inference, results in biased estimation as well as greater mean squared errors of the covariate effects. Under the specified conditions in our simulations, simulated control subjects (*W _{i}* = 0) with higher frailty values tend to have a longer observation period. Thus risk sets are more likely to consist of sicker subjects at later time points. As a result, the LWYY method that compares subjects within risk sets underestimates the difference between treatment group and control group in the intensity of recurrent events. Note that, compared to the LWYY method, the relative efficiency of the proposed estimator depends on the degree of association between the censoring mechanism and the recurrent event process. Under the independent censoring assumption, the proposed method is expected to be less efficient than the LWYY method since the estimation steps involves procedures to handle the informative censoring which results in loss of estimation efficiency.

Plots of estimated _{0}(*t*) with pointwise 95% confidence intervals for Scenario I: Λ_{0}(*t*) = *t*/2 (—: true curve, - - - : empirical average, : pointwise 95% confidence interval).

Plots of estimated _{0}(*t*) with pointwise 95% confidence intervals for Scenario II: Λ_{0}(*t*) = *t*^{3/2}/6 (—: truecurve, --- : empirical average, : pointwise 95% confidence interval).

We use our method to analyze data from the AIDS Link to Intravenous Experience (ALIVE) study (Vlahov et al. 1991). During the period February 1988 through March 1989, a total of 2,946 active injection drug users in the city of Baltimore, Maryland were recruited into the ALIVE study through extensive community outreach efforts. Participants underwent a screening interview in which data on sociodemographic factors, history of drug use and sexual practice over the previous 10 years were obtained. Information on HIV testing was collected at follow-up visits scheduled at 6-month intervals. The date of seroconversion was estimated as the midpoint between the dates of the first positive HIV test and the date of the last negative HIV test, if available. We analyze hospital admissions recorded between July 16, 1993 and December 31, 1997 from 1,896 intravenous drug users who had at least one follow-up visit at the study clinic during the same period of time. Among these participants, there were 1,412 (74%) males and 1,781 (95%) Africa Americans. The number of hospital admissions ranges from 0 to 19, averaging 3.5 per subject, whereas 1,026 (54%) subjects had no hospitalization record. A total of 244 (13%) participants died during the four-and-a-half study period, among them 200 were male (82%) and 233 (95%) were black.

We apply the proposed method to study patient's risk of hospitalization since time 0 (July 16, 1993). The covariates of interest in our analysis include patient's HIV status (*X _{i}*(

For model (1), the estimated is 0.16 with 95% bootstrap confidence interval (−0.62, 0.99), and the estimated _{1} and _{2} are −0.26 and −0.06, with corresponding 95% confidence intervals (−0.45, −0.08) and (−0.06, 0.19). The HIV-positive status is associated with 17% increase in the risk of hospitalization, but the difference is not statistically significant. African Americans have a insignificantly lower rate of hospitalization compared to non-African Americans, while males have a significantly lower risk (23% lower) than females. Figure 3 shows the estimated _{0}(*t*) and the pointwise 95% bootstrap confidence intervals. The estimated baseline cumulative rate function is close to a straight line, suggesting that the risk of hospitalization is approximately constant over time. For comparison, we also apply the LWYY method to analyze the ALIVE data. The estimated is 0.50 with 95% bootstrap confidence interval (−0.36, 0.67), and the estimated _{1} and _{2} are −0.27 and −0.18, with corresponding 95% confidence intervals (−0.44, −0.10) and (−0.48, 0.19). The direction of covariate effects estimated in the LWYY model are consistent with the estimates under the proposed model.

In this paper, we develop a flexible estimation procedure that does not require parametric assumptions about the distributions of the frailty variable and the censoring time. In particular, we apply the modified pairwise pseudolikelihood method (Liang and Qin 2000) based on all comparable pairs of event times to estimate ** β**. The proposed pseudolikelihood-based estimation procedure involves neither frailty nor the time-independent covariates, and hence is very useful for checking proportionality assumption in the widely used proportional rate models.

Naturally we can consider applying the pseudolikelihood method to three or more comparable event times, and expect a potential gain in efficiency. Due to comparability constraints, however, the number of comparable triples may not be much more than the number of comparable pairs. Hence the efficiency gain can be very limited, while the computation time may increase substantially. As an alternative to increase efficiency, we consider the disjoint time intervals [0, *Y*_{(1)}], (*Y*_{(1)}, *Y*_{(2)}], …, (*Y*_{(n−1)}, *Y*_{(n)}], where (*Y*_{(1)}, …, *Y*_{(n)}) are order statistics of the censoring times {*Y*_{1}, …, *Y _{n}*}. Note that any event time is comparable with all other event times within the same time interval. Thus, we can formulate a pseudolikelihood for each disjoint interval by conditioning on the order statistics of all the event times within that interval, and the denominator of the pseudolikelihood can be approximated by drawing a subset of all possible permutations of event times within the interval. We then estimate

As discussed in Section 3.2, the conditional likelihood (2) is computationally equivalent to the likelihood of a set of independent observations that are subject to two sources of bias: right truncation and biased sampling. In the literature, the product-limit estimator is used for inference under random truncation (Wang et al. 1986), while inverse probability weighting estimators are often used to correct for sampling bias (Horvitz and Thompson 1952). In our setting, however, both sources of bias are present simultaneously. To correct the bias in estimating the shape function of the recurrent event process, we propose an estimator that properly combines the inverse probability weighting technique with the product limit estimator by assigning each event time in the risk set a weight proportional to the inverse of its sampling probability. Interestingly, the estimation of the shape function does not require information about time-independent covariates and the unobserved frailty, and is reduced to the usual product limit estimator when ** β** = 0.

Note that the censoring time in our model formulation is treated as a nuisance. In many applications, especially when a failure event precludes the observation of recurrent events, study interests are placed on the joint inference of the recurrent event process and the failure times. Shared frailty models such as the fully parametric approach by Lancaster and Intrator (1998) and semiparametric approaches by Liu et al. (2004) and Huang and Wang (2004) have been used to study recurrent events and failure time data jointly. The first two models require a parametric assumption for the unobserved frailty distribution, and thus suffer from lack of model checking techniques; moreover, censoring mechanism other than the failure event of interest is required to be random for validating their inferential results. On the other hand, the Huang and Wang (2004) model is more robust in that the frailty distribution is left unspecified and censoring mechanism other than the failure event is allowed to be correlated with the recurrent event process and the failure times. Their estimation procedure, however, inherits the properties of the estimator studied by Wang et al. (2001), and hence can not handle time-dependent covariates. By properly combining the new methodology proposed in this paper and the “borrow-strength estimation procedure” studied in Huang and Wang (2004), we can easily extend their joint inference procedure to handel both time-dependent and time-independent covariates. The properties of the new estimation procedure will be studied elsewhere.

This article does not intend to develop formal model checking methods. Under informative censoring, model checking is expected to be a difficult task in general. we simply suggest a possible approach for validating model assumptions. A rigorous study will be done elsewhere. To check on the proportional intensity model assumption imposed by (1), we could replace *Z _{i}* with ${\widehat{Z}}_{i}={m}_{i}\u2215\left\{{\int}_{0}^{{Y}_{i}}{\lambda}_{0}\left(u\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\{{\mathit{X}}_{i}{\left(t\right)}^{\prime}\widehat{\beta}+{\mathit{W}}_{i}^{\prime}\widehat{\gamma}\}\mathit{du}\right\}$ to derive the Schoenfeld residuals (Schoenfeld 1982). If the assumption of proportional intensity model holds, the derived residuals are expected to randomly scattered around 0.

Mei-Cheng Wang's work was supported by National Institutes of Health grant R01 AI078835. The authors thank Drs. David Vlahov and Steffanie Strathdee for providing the data from the ALIVE study, which was supported by NIDA grants DA 04334 and DA 08009. Comments from Drs. Mike Proschan and Dean Follmann have improved the presentation of this paper.

Because − log[1 + exp{* ρ_{ik}*(

For convenience, we denote *a*^{2} = *aa*′ for any vector *a*. Applying Taylor expansion to *S _{p}*(

Define the functions $Q\left(u\right)={\int}_{0}^{u}G\left(\upsilon \right)d{\Lambda}_{0}\left(\upsilon \right)$ and *R*(*u*) = *G*(*u*)Λ_{0}(*u*). It can be shown that n the empirical averages $\stackrel{~}{Q}(u;\beta )=\frac{1}{n}{\Sigma}_{i=1}^{n}{\Sigma}_{j=1}^{{m}_{i}}I({t}_{\mathit{ij}}\u2a7du)\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\left(\right\{-{\mathit{X}}_{i}{\left({t}_{\mathit{ij}}\right)}^{\prime}\beta \}$ and $\stackrel{~}{R}(u;\beta )=\frac{1}{n}{\Sigma}_{i=1}^{n}{\Sigma}_{j=1}^{{m}_{i}}I({t}_{\mathit{ij}}\u2a7du\u2a7d{Y}_{i})\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\{-{\mathit{X}}_{i}{\left({t}_{\mathit{ij}}\right)}^{\prime}\beta \}$ are unbiased estimators of *Q*(*u*)and *R*(*u*) if ** β** is the true parameter value. Let (

Further, it is easy to see that ${\int}_{t}^{\tau}R{\left(u\right)}^{-1}\mathit{dQ}\left(u\right)=-\mathrm{ln}F\left(t\right)$. Note that assumptions (a) and (b) implies that *R*(*τ*) > 0. Thus for any constant *c** > inf{*y* : Λ_{0}(*y*) > 0}, one has *R*(*u*) > 0 for *c** *u* < *τ*. As *n* → ∞, (*u*) and (*u*) converge almost surely to *Q*(*u*) and *R*(*u*) uniformly in *u* [*c**, *τ*]. By approximation techniques for product-limit estimators and the inequality 0 < − ln(1 − *u*^{−1}) − *u*^{−1} < *u*^{−1}(*u* − 1)^{−1}, for *u* > 1, we can show that $-\mathrm{ln}\phantom{\rule{thinmathspace}{0ex}}\widehat{F}\left(t\right)-{\int}_{t}^{\tau}d\widehat{Q}\left(u\right)\u2215\widehat{R}\left(u\right)\to 0$ almost surely for each *t* [*c**, *τ*], with the usual convention 0/0 = 0. Hence,

$$\widehat{F}\left(t\right)=\mathrm{exp}\left\{-{\int}_{t}^{\tau}\frac{d\widehat{Q}\left(u\right)}{\widehat{R}\left(u\right)}\right\}+{O}_{p}\left({n}^{-1\u22152}\right).$$

Because the mapping of ${\int}_{t}^{\tau}d\widehat{Q}\left(u\right)\u2215\widehat{R}\left(u\right)$ from the two empirical processes, under mild regularity conditions, is compactly differentiable with respect to the supremum norm and the two processes converge weakly to their limits (see example 2.11.16 of van der Vaart and Wellner 1996), we apply the functional delta method to ${\int}_{t}^{\tau}d\widehat{Q}\left(u\right)\u2215\widehat{R}\left(u\right)$ and establish its asymptotic representation

$$\begin{array}{cc}\hfill {\int}_{t}^{\tau}\frac{d\widehat{Q}\left(u\right)}{\widehat{R}\left(u\right)}-{\int}_{t}^{\tau}\frac{\mathit{dQ}\left(u\right)}{R\left(u\right)}& ={\int}_{t}^{\tau}\frac{d\{\stackrel{~}{Q}(u;\widehat{\beta})-\stackrel{~}{Q}(u;\beta )\}}{R\left(u\right)}-{\int}_{t}^{\tau}\frac{\{\stackrel{~}{R}(u;\widehat{\beta})-\stackrel{~}{R}(u;\beta )\}\mathit{dQ}\left(u\right)}{R{\left(u\right)}^{2}}\hfill \\ \hfill & \phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}+{\int}_{t}^{\tau}\frac{d\stackrel{~}{Q}(u;\beta )}{R\left(u\right)}-{\int}_{t}^{\tau}\frac{\stackrel{~}{R}(u;\beta )}{R{\left(u\right)}^{2}}\mathit{dQ}\left(u\right)+{o}_{p}\left({n}^{-1\u22152}\right)\hfill \\ \hfill & =\frac{1}{\left(\begin{array}{c}\hfill n\hfill \\ \hfill 2\hfill \end{array}\right)}\underset{i<k}{\Sigma}\varphi ({D}_{i},{D}_{k};t,\beta )+\frac{1}{n}\underset{i=1}{\overset{n}{\Sigma}}\psi ({D}_{i};t,\beta )+{o}_{p}\left({n}^{-1\u22152}\right),\hfill \end{array}$$

where $\varphi ({D}_{i},{D}_{k};t,\beta )={\int}_{t}^{\tau}\{R{\left(u\right)}^{-1}d{\mathit{V}}_{\stackrel{~}{Q}}\left(u\right)-{\mathit{V}}_{\stackrel{~}{R}}\left(u\right)R{\left(u\right)}^{-2}\mathit{dQ}\left(u\right)\}{\mathit{V}}_{2}^{-1}\mathit{h}({D}_{i},{D}_{k};\beta )$ and $\psi ({D}_{i};t,\beta )={\Sigma}_{j=1}^{{m}_{i}}I(t<{t}_{\mathit{ij}}\u2a7d\tau )R{\left({t}_{\mathit{ij}}\right)}^{-1}-{\int}_{t}^{\tau}I({t}_{\mathit{ij}}\u2a7du\u2a7d{Y}_{i})\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\{-{\mathit{X}}_{i}{\left({t}_{\mathit{ij}}\right)}^{\prime}\beta \}R\left(u\right){}^{-2}\mathit{dQ}\left(u\right)$. Let *κ*(*D _{i}*,

Let *H* bet the joint probability measure of (** W***,

$$\begin{array}{cc}\hfill \xi ({D}_{i},{D}_{k})& =\int \frac{-{\mathit{w}}^{\ast}m}{{\left[{\int}_{0}^{y}\mathrm{exp}\left\{\mathit{x}{\left(u\right)}^{\prime}\beta \right\}\mathit{dF}\left(u\right)\right]}^{2}}[{\int}_{0}^{y}\mathit{x}{\left(u\right)}^{\prime}{\mathit{V}}_{2}^{-1}\mathit{h}({D}_{1},{D}_{k};\beta )\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\left\{\mathit{x}{\left(u\right)}^{\prime}\beta \right\}\mathit{dF}\left(u\right)\phantom{]}\hfill \\ \hfill & \phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}\phantom{\rule{1em}{0ex}}+\phantom{[}{\int}_{0}^{y}\mathrm{exp}\left\{\mathit{x}{\left(u\right)}^{\prime}\beta \right\}d\left\{\kappa ({D}_{i},{D}_{k};u,\beta )F\left(u\right)\right\}]\mathit{dH}({\mathit{w}}^{\ast},m,\mathit{x},y)\hfill \\ \hfill & +\frac{1}{2}{\mathit{W}}_{i}^{\ast}\left\{\frac{{m}_{i}}{{\int}_{0}^{{Y}_{i}}\mathrm{exp}\left\{{\mathit{X}}_{i}{\left(u\right)}^{\prime}\beta \right\}\mathit{dF}\left(u\right)}-\mathrm{exp}\left\{\phantom{\rule{thickmathspace}{0ex}}{\left({\mathit{W}}_{i}^{\ast}\right)}^{\prime}\eta \right\}\right\}\hfill \\ \hfill & +\frac{1}{2}{\mathit{W}}_{k}^{\ast}\left\{\frac{{m}_{k}}{{\int}_{0}^{{Y}_{k}}\mathrm{exp}\left\{{\mathit{X}}_{k}{\left(u\right)}^{\prime}\beta \right\}\mathit{dF}\left(u\right)}-\mathrm{exp}\left\{{\left({\mathit{W}}_{k}^{\ast}\right)}^{\prime}\eta \right\}\right\}.\hfill \end{array}$$

It can be verified that ${\left(\begin{array}{c}\hfill n\hfill \\ \hfill 2\hfill \end{array}\right)}^{-1}{\Sigma}_{i<k}\kappa ({D}_{i},{D}_{k};t,\beta )$ is a U-statistic with *E*{** ξ**(

To study the asymptotic normality of $\sqrt{n}\{{\widehat{\Lambda}}_{0}\left(t\right)-{\Lambda}_{0}\left(t\right)\}$, we write

$$\begin{array}{cc}\hfill \sqrt{n}\{{\widehat{\Lambda}}_{0}\left(t\right)-{\Lambda}_{0}\left(t\right)\}& =\sqrt{n}F\left(t\right){e}^{{\eta}_{1}}({\widehat{\eta}}_{1}-{\eta}_{1})+{\sqrt{\mathit{ne}}}^{{\eta}_{1}}\{\widehat{F}\left(t\right)-F\left(t\right)\}+{o}_{p}\left(1\right)\hfill \\ \hfill & ={\sqrt{\mathit{ne}}}^{{\eta}_{1}}F\left(t\right)\frac{1}{\left(\begin{array}{c}\hfill n\hfill \\ \hfill 2\hfill \end{array}\right)}\underset{i<k}{\Sigma}\{f({D}_{i},{D}_{k})+\kappa ({D}_{i},{D}_{k};t,\beta )\}+{o}_{p}\left(1\right),\hfill \end{array}$$

where *f*(*D _{i}*,

- Andersen EB. Asymptotic Properties of Conditional Maximum-likelihood Estimators. Journal of the Royal Statistical Society, Series B. 1970;32:283–301.
- Andersen PK, Gill RD. Cox's regression model for counting processes: a large sample study. Annals of Statistics. 1982;10:1100–1120.
- Chang S-H, Wang M-C. Conditional regression analysis for recurrence time data. Journal of the American Statistical Association. 1999;94:1221–1230.
- Farrington CP, Whitaker HJ. Semiparametric analysis of case series data. Journal of the Royal Statistical Society, Series C. 2006;55:553–594.
- Ghosh D. Accelerated rates regression models for recurrent failure data. Lifetime Data Analysis. 2004;10:247–261. [PubMed]
- Ghosh D, Lin DY. Semiparametric analysis of recurrent events in the presence of dependent censoring. Biometrics. 2003;59:877–885. [PubMed]
- Huang C-Y, Wang M-C. Joint modeling and estimation for recurrent event processes and failure time data. Journal of the American Statistical Association. 2004;99:1153–1165. [PMC free article] [PubMed]
- Huang C-Y, Wang M-C, Zhang Y. Analysing panel count data with informative observation times. Biometrika. 2006;93:763–775. [PMC free article] [PubMed]
- Huang X, Liu L. A joint frailty model for survival time and gap times between recurrent events. Biometrics. 2007;63:389–397. [PubMed]
- Hoeffding W. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statististics. 1948;19:293–325.
- Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association. 1952;47:663–685.
- Kalbfleisch JD. Likelihood methods and nonparametric tests. Journal of the American Statistical Association. 1978;73:167–170.
- Lancaster T, Intrator O. Panel data with survival: hospitalization of HIV-positive patients. Journal of the American Statistical Association. 1998;93:46–53.
- Lawless JF, Nadeau C. Some simple robust methods for the analysis of recurrent events. Technometrics. 1995;37:158–168.
- Liang K-Y, Qin J. Regression analysis under non-standard situations: a pairwise pseudolikelihood approach. Journal of the Royal Statistical Society, Series B. 2000;62:773–786.
- Lin DY, Wei LJ, Yang I, Ying Z. Semiparametric regression for the mean and rate functions of recurrent events. Journal of the Royal Statistical Society, Series B. 2000;62:711–730.
- Liu L, Wolfe RA, Huang XL. Shared frailty models for recurrent events and a terminal event. Biometrics. 2004;60:747–756. [PubMed]
- Pepe MS, Cai J. Some graphical displays and marginal regression analyses for recurrent failure times and time dependent covariates. Journal of the American Statistical Association. 1993;88:811–820.
- Prentice RL, Williams BJ, Peterson AV. On the regression analysis of multivariate failure time data. Biometrika. 1981;68:373–379.
- Rondeau V, Mathoulin-Pelissier S, Jacqmin-Gadda H, Brouste V, Soubeyran P. Joint frailty models for recurring events and death using maximum penalized likelihood estimation: application on cancer events. Biostatistics. 2007;8:708–721. [PubMed]
- Schaubel DE, Zeng D, Cai J. A semiparametric additive rates model for recurrent event data. Lifetime Data Analysis. 2006;12 [PubMed]
- Serfling RJ. Approximation Theorems of Mathematical Statistics. Wiley; New York: 1980.
- Sun L, Su B. A class of accelerated means regression models for recurrent event data. Lifetime Data Analysis. 2008;14:357–375. [PubMed]
- Sun J, Tong X, He Xin. Regression analysis of panel count data with dependent observation times. Biometrics. 2007;63:1053–1059. [PubMed]
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer-Verlag; New York: 1996.
- Vlahov D, Anthony JC, Muñoz A, Margolick J, Nelson KE, Celentano DD, Solomon L, Polk BF. The ALIVE study, a longitudinal study of HIV-1 infection in intravenous drug users: description of methods and characteristics of participants. The Journal of Drug Issues. 1991;21:759–776. [PubMed]
- Wang M-C, Jewell NP, Tsai W-Y. Asymptotic properties of the product limit estimate under random truncation. Annals of Statistics. 1986;14:1597–1650.
- Wang M-C, Qin J, Chiang C-T. Analyzing recurrent event data with informative censoring. Journal of the American Statistical Association. 2001;96:1057–1065. [PMC free article] [PubMed]
- Ye Y, Kalbfleisch JD, Schaubel ES. Semiparametric analysis of correlated recurrent and terminal events. Biometrics. 2007;63:78–87. [PubMed]
- Zeng D, Lin DY. Efficient estimation of semiparametric transformation models for counting processes. Biometrika. 2006;93:627–640.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |