Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC4935555

Formats

Article sections

Authors

Related links

Stat Med. Author manuscript; available in PMC 2017 August 15.

Published in final edited form as:

Stat Med. 2016 August 15; 35(18): 3049–3065.

Published online 2016 February 16. doi: 10.1002/sim.6901PMCID: PMC4935555

NIHMSID: NIHMS756693

Shanshan Li,^{a} Yifei Sun,^{b} Chiung-Yu Huang,^{b,}^{c,}^{*} Dean A. Follmann,^{d} and Richard Krause^{d}

Although recurrent event data analysis is a rapidly evolving area of research, rigorous studies on estimation of the effects of intermittently observed time-varying covariates on the risk of recurrent events have been lacking. Existing methods for analyzing recurrent event data usually require that the covariate processes are observed throughout the entire follow-up period. However, covariates are often observed periodically rather than continuously. We propose a novel semiparametric estimator for the regression parameters in the popular proportional rate model. The proposed estimator is based on an estimated score function where we kernel smooth the mean covariate process. We show that the proposed semiparametric estimator is asymptotically unbiased, normally distributed and derive the asymptotic variance. Simulation studies are conducted to compare the performance of the proposed estimator and the simple methods carrying forward the last covariates. The different methods are applied to an observational study designed to assess the effect of Group A streptococcus (GAS) on pharyngitis among school children in India.

In many epidemiology and biomedical settings, data on risk factors and events that occur repeatedly over time are collected. Modeling and estimation of covariate effects on the occurrence of recurrent events has been a much discussed topic in the past few decades; see [1] and [2] for comprehensive reviews. Statistical methods for recurrent event data analysis usually require that the covariate processes are observed throughout the entire follow-up period, continuing to the end of the study or until loss to follow-up [3, 4, 5, 6, 7]. In many applications, however, the values of time-varying covariates are only observed periodically. As a result, the missing covariates in the estimation functions need to be replaced with some estimated values. For continuous measures, one can obtain the predicted value at any time by assuming a mixed effect model or by smoothing the neighboring observed covariate values. See [8] for a nice summary. However, these approaches do not apply for binary measures, as there is not a ‘smoothed’ estimate for binary data. A natural alternative that works for both continuous and binary measures is the last covariate carried forward (LCCF) approach. Under LCCF, the last known value of the covariate is used forward in time until a new value is measured. Thus the true covariate process is approximated by a step function with jumps at the measurement times. Analogous to the bias induced by the LCCF method for time to event analysis [9, 10, 11], the LCCF method is expected to lead to biased estimation in the recurrent event data analysis.

The particular research that motivated this work is an observational study designed to evaluate the effect of Group A streptococcus (GAS) on the risk of developing pharyngitis (sore throat). Pharyngitis is most frequently due to viruses, but several bacteria, including Group A streptococci (GAS), persist as a common cause of pharyngitis even in the era of antibiotics. GAS pharyngitis is more prevalent in children than adults, and mainly occurs in winter and early spring. A World Health Organization report estimates that there are over 616 million new cases per year of GAS pharyngitis, of which over 550 million occur in less developed countries. To study the effect of GAS on pharyngitis, a total of 305 school children were recruited in a rural area near Vellore, India. During the follow-up period, cases of pharyngitis were identified weekly. Throat swabs were obtained on those with pharyngitis to identify the presence of GAS. Additionally, monthly throat cultures were obtained on the school children to determine the GAS carriage rate. In our analysis, occurrences of pharyngitis are the recurrent events of interest and GAS colonization status is a time-varying covariate.

We apply the popular semiparametric proportional rate model [12, 13, 3] to evaluate the effect of GAS colonization on pharyngitis. This model allows for an arbitrary baseline rate function of pharyngitis, and is an analogue to Cox regression for recurrent events. The parameters in the proportional rate model can be estimated by maximizing the pseudo-partial likelihood function. To properly construct the pseudo-partial likelihood, the GAS colonization status must be known exactly in everyone who is still under observation in the study whenever someone has an event. However, in the Indian pharyngitis study, the GAS status were observed monthly while the events of pharyngitis were assessed weekly. As a result, the covariate values at each event time were observed for children who had pharyngitis, but were possibly missing for other children in the corresponding risk set. Naively imputing the missing value with the last observed covariate may lead to biased estimation. Instead, we propose to solve an estimated score equation that is constructed by kernel smoothing the observed covariate values collected around each event time. The proposed estimator can be applied to handle both continuous and binary covariate processes, and it is shown to be asymptotically unbiased and normally distributed. We derive the asymptotic variance by properly incorporating the uncertainty of the estimated covariate functions in the estimation procedure. Simulations are conducted to compare the bias and efficiency of the proposed method with that of the LCCF approach. We apply the methods to the Indian study of GAS and pharyngitis.

Let subscript *i* be the index for a subject,
*i* = 1, 2, ..., *n*. For subject
*i*, let ${N}_{i}^{\ast}(t)$ be the number of recurrent events occurring at
or before time *t* [0,
*τ*], where the recurrent events could
potentially be observed beyond a prespecified time point
*τ*. Thus the counting process ${N}_{i}^{\ast}(t)$ has a jump of size one when an event (such as
the sore throat in the Indian pharyngitis study) occurs. Many authors, including
[14] and
[15], have considered
modeling the intensity function of the underlying recurrent event process
{ ${N}_{i}^{\ast}(\xb7)$, *t* [0,
*τ*]}, where the intensity function
is the instantaneous risk of event occurrence conditioning on the preceding
event history. Recent research has focused on modeling the marginal rate
function of the recurrent event process. The rate function
*λ _{i}*(

Let *Z _{i}*(

$$\lambda \{t\mid {Z}_{i}(t)\}={\lambda}_{0}(t)exp\{{\beta}^{\top}{Z}_{i}(t)\},$$

(1)

where *β* is a *p
×* 1 vector of the regression parameters and
*λ*_{0}(*t*) is an arbitrary
baseline rate function. The regression parameter
*β _{j}* is interpreted as the logarithm of
the ratio of the rate function at time

In most applications, the underlying counting process ${N}_{i}^{\ast}(\xb7)$ is subject to censoring due to loss to
follow-up or end of the study. Let *C _{i}* denote the
time to loss to follow-up or end of the study for subject

$$l(\beta )=\sum _{i=1}^{n}{\int}_{0}^{\tau}\left(\beta {Z}_{i}(t)-log\phantom{\rule{0.16667em}{0ex}}\left[\sum _{k=1}^{n}{Y}_{k}(t)exp\{\beta {Z}_{k}(t)\}\right]\right)\phantom{\rule{0.16667em}{0ex}}{dN}_{i}(t),$$

which is equivalent to solving the (normalized) pseudo-partial score function

$$U(\beta )={n}^{-1}\sum _{i=1}^{n}{\int}_{0}^{\tau}\left\{{Z}_{i}(t)-\frac{{S}^{(1)}(t,\beta )}{{S}^{(0)}(t,\beta )}\right\}\phantom{\rule{0.16667em}{0ex}}{dN}_{i}(t)$$

(2)

for zero, with ${S}^{(k)}(t,\beta )={n}^{-1}{\sum}_{i=1}^{n}{Y}_{i}(t){Z}_{i}{(t)}^{k}exp\{\beta {Z}_{i}(t)\}$, *k* = 0, 1, 2. The
construction of the pseudo-partial score function requires that
*Z _{i}*(

In practice, however, covariates are typically measured periodically. An
ad-hoc method to deal with missing covariate values at time *t*
is to fill in the missing values with the last observed covariate values before
time *t*. In studies of recurrent events, it is common that
covariate values are collected when an event occurs in addition to regular
follow-up visits. Thus there are two possible ways to carry forward the last
observation. One is to carry forward all covariates regardless of whether the
measurement is from an event or from the regular follow-up visits. For example,
in the Indian pharyngitis study, one may carry forward the status of GAS
colonization collected at the time of the pharyngitis event as well as the GAS
carriage status collected at the monthly visits. We call this all covariates
carried forward (ACCF) method. A second possibility is to carry forward only
regular follow-up visits, that is, GAS carriage data in the Indian pharyngitis
study. We term this the carriage covariates carried forward (CCCF) method.
Research has shown that the last covariate carried forward approach leads to
biased inferential results of the covariate effects in the survival setting
[9, 10]. To our knowledge no previous work in the
literature investigated the bias induced by the ACCF method or the CCCF method
in recurrent event data analysis.

In this section, we propose new statistical methods for recurrent event
data analysis when the data on the time-varying covariates are collected at
regular discrete time points in all subjects as well as at the exact times of an
event, for the subject having the event. As noted by many authors, including
[16], the
pseudo-partial score function *U* in (2) is a functional of four empirical
processes ${n}^{-1}{\sum}_{i=1}^{n}{Z}_{i}(t){dN}_{i}(t),{n}^{-1}{\sum}_{i=1}^{n}{dN}_{i}(t)$,
*S*^{(0)}(*t*,
*β*), and
*S*^{(1)}(*t*,
*β*). In the Indian pharyngitis study, throat swabs
were obtained on those with pharyngitis in order to identify the presence of
GAS, that is, covariates were collected at event visits. Therefore, under our
setting, *Z _{i}*(

To see this, we first consider the simple case where the time-dependent
covariate *Z _{i}*(

$$\frac{{S}^{(1)}(t,\beta )}{{S}^{(0)}(t,\beta )}\to \frac{{e}^{\beta}r(t)}{{e}^{\beta}r(t)+1-r(t)}$$

in probability as *n* →
∞. Thus the limiting function of *U* can be estimated
consistently if a consistent estimator of *r*(*t*)
can be obtained using available data.

Intuitively, one may estimate *r*(*t*),
that is, the prevalence rate of GAS colonization at time *t* in
the data example, by dividing the number of positive throat cultures collected
around time *t* by the total number of swabs collected around
time *t*. We consider employing a kernel estimator for
*r*(*t*) which computes a locally weighted
average of the covariate values. Let
*O _{i}*(

$${\widehat{r}}_{h}(t)=\frac{{n}^{-1}{\sum}_{i=1}^{n}{\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{i}(u){Z}_{i}(u){dO}_{i}(u)}{{n}^{-1}{\sum}_{i=1}^{n}{\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{i}(u){dO}_{i}(u)},\phantom{\rule{0.38889em}{0ex}}t\in [h,\tau -h].$$

(3)

To avoid bias estimation in the boundary region, we set
* _{h}*(

Define the covariate collection function
*m*(*t*) by
*m*(*t*)*dt* =
*E*{*dO*_{1}(*t*)},
thus *m*(*t*) is the instantaneous
“risk” of a regular visit occurring at time *t*.
We can show that, provided *m*(*t*) > 0 for
*t* [0, *τ*]
and under the regularity conditions given in the appendix, ${n}^{-1}{\sum}_{i=1}^{n}{\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{i}(u){Z}_{i}(u){dO}_{i}(u)$ converges in probability to
*r*(*t*)*m*(*t*)*G*(*t*)
uniformly in *t* as *h* → 0 and
*nh*^{2} → ∞. Similarly, we have ${n}^{-1}{\sum}_{i=1}^{n}{\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{i}(u){dO}_{i}(u)$ uniformly converges in probability to
*m*(*t*)*G*(*t*)
as *h* → 0 and *nh*^{2} →
∞. Thus the uniform consistency of
* _{h}*(

The proposed estimator
* _{h}*(

To estimate *β*, we construct the estimated score
function

$${\widehat{U}}_{h}(\beta )={n}^{-1}\sum _{i=1}^{n}{\int}_{0}^{\tau}\left\{{Z}_{i}(t)-\frac{{e}^{\beta}{\widehat{r}}_{h}(t)}{{e}^{\beta}{\widehat{r}}_{h}(t)+1-{\widehat{r}}_{h}(t)}\right\}\phantom{\rule{0.16667em}{0ex}}{dN}_{i}(t).$$

(4)

Let (*β*) be the limit of the
pseudo-partial likelihood *U*(*β*) in
(2) with the complete
covariate data. It follows directly from the consistency of
* _{h}*(

Note that although the proportional rate model (1) postulates the risk of experiencing
recurrent events at time *t* given the covariate history up to
*t*, the proposed estimation procedure borrows information
from covariate values beyond time *t* to derive a consistent
estimate of *r*(*t*) at time *t*.
The validity of the proposed approach relies on the fact that the proportional
rate model is formulated on the basis of the rate function, that is, the risk of
experiencing recurrent events unconditioning on the event history. Under this
marginal model, the estimated score function and the pseudo-partial score
function converge to the same limit, provided that
*r*(*t*) can be estimated consistently.

Interestingly, in the special case where the expected value of the
covariate process over all individuals is known to be constant, that is,
*r*(*t*) = *r*, one can
show that

$$U(\beta )={n}^{-1}\sum _{i=1}^{n}{\int}_{0}^{\tau}\left\{{Z}_{i}(t)-\frac{{e}^{\beta}r}{{e}^{\beta}r+1-r}\right\}\phantom{\rule{0.16667em}{0ex}}{dN}_{i}(t)+{o}_{p}({n}^{-1/2}).$$

Thus, solving *U*(*β*) = 0
yields

$${e}^{\beta}=\frac{{\sum}_{i=1}^{n}{\int}_{0}^{\tau}{Z}_{i}(t){dN}_{i}(t)}{{\sum}_{i=1}^{n}{\int}_{0}^{\tau}\{1-{Z}_{i}(t)\}{dN}_{i}(t)}\times \frac{1-r}{r}+{o}_{p}({n}^{-1/2}),$$

(5)

where ${n}_{1}={\sum}_{i=1}^{n}{\int}_{0}^{\tau}{Z}_{i}(t){dN}_{i}(t)$ and ${n}_{0}={\sum}_{i=1}^{n}{\int}_{0}^{\tau}\{1-{Z}_{i}(t)\}{dN}_{i}(t)$ are the numbers of positive and negative throat
cultures at sick visits in the Indian pharyngitis study. By replacing
*r* in (5)
with the proportion of positive throat cultures at all monthly visits, it is
interesting to see that exp(*β*) can be consistently
estimated by a simple cross-product ratio

$$\frac{{n}_{1}}{{n}_{0}}\times \frac{{z}_{0}}{{z}_{1}},$$

(6)

where ${z}_{1}={\sum}_{i=1}^{n}{\int}_{0}^{\tau}{Z}_{i}(t){dO}_{i}(t)$ and ${z}_{0}={\sum}_{i=1}^{n}{\int}_{0}^{\tau}\{1-{Z}_{i}(t)\}{dO}_{i}(t)$are the numbers of positive and negative throat
cultures at the regular follow-up visits. In other words, the log cross-product
ratio,
log{(*n*_{1}*z*_{0})/(*n*_{0}*z*_{1})},
is a consistent estimator for *β*.

So far we have focused on the case where
*Z _{i}*(

$$\frac{{n}^{-1}{\sum}_{i=1}^{n}{\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{i}(u){Z}_{i}{(u)}^{k}exp\{\beta {Z}_{i}(u)\}{dO}_{i}(u)}{{n}^{-1}{\sum}_{i=1}^{n}{\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{i}(u){dO}_{i}(u)},\phantom{\rule{0.38889em}{0ex}}t\in [h,\tau -h].$$

Define ${\widehat{S}}_{h}^{(k)}(t,\beta )={n}^{-1}{\sum}_{i=1}^{n}{\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{i}(u){Z}_{i}{(u)}^{k}exp\{\beta {Z}_{i}(u)\}{dO}_{i}(u)$. To avoid the bias in the boundary region, we
set ${\widehat{S}}_{h}^{(k)}(t,\beta )={\widehat{S}}_{h}^{(k)}(h,\beta )$ for *t* [0,
*h*), and set ${S}_{h}^{(k)}(t,\beta )={\widehat{S}}_{h}^{(k)}(\tau -h,\beta )$ for *t*
(*τ* − *h*,
*τ*]. In the Appendix, we show that ${\widehat{S}}_{h}^{(k)}(t,\beta )$ converges uniformly in probability to their
limits
*s*^{(}^{k}^{)}(*t*,
*β*)*m*(*t*), where
*m*(*t*)*dt* =
*E*[*dO _{i}*(

$${\widehat{\mathcal{E}}}_{h}(t,\beta )=\frac{{n}^{-1}{\sum}_{i=1}^{n}{\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{i}(u){Z}_{i}(u)exp\{\beta {Z}_{i}(u)\}{dO}_{i}(u)}{{n}^{-1}{\sum}_{i=1}^{n}{\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{i}(u)exp\{\beta {Z}_{i}(u)\}{dO}_{i}(u)}\to \mathcal{E}(t,\beta )$$

(7)

uniformly in probability as *h*
→ 0 and *nh*^{2} → ∞. It is easy
to verify that
* _{h}*(

We propose to replace
*S*^{(1)}(*t*,
*β*)/*S*^{(0)}(*t*,
*β*) with
* _{h}*(

$${\widehat{U}}_{h}(\beta )={n}^{-1}\sum _{i=1}^{n}{\int}_{0}^{\tau}\{{Z}_{i}(t)-{\widehat{\mathcal{E}}}_{h}(t,\beta )\}{dN}_{i}(t).$$

(8)

Because
*Û _{h}*(

Under conditions (A1)–(A9),
* _{h}* is a consistent
estimator of the true parameter

As suggested by Theorem 1, the asymptotic distribution of the
proposed estimator does not depend on the choice of bandwidth as long as the
bandwidth condition is satisfied, that is, *h* =
*O*(*n*^{−}* ^{v}*)
with 1/4 <

$${PE}_{k}(h)=-\sum _{i\in {D}_{k}}{\int}_{0}^{\tau}\left({\widehat{\beta}}_{(-k)}{Z}_{i}(t)-log\phantom{\rule{0.16667em}{0ex}}\left[\frac{{\sum}_{j\in {D}_{k}}{\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{j}(u)exp\{{\widehat{\beta}}_{(-k)}{Z}_{j}(u)\}{dO}_{j}(u)}{{\sum}_{j\in {D}_{k}}{\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{j}(u){dO}_{j}(u)}\right]\right)\phantom{\rule{0.16667em}{0ex}}{dN}_{i}(t),$$

where
_{(−}_{k}_{)}
is estimated using data from individuals not in
*D _{k}* with bandwidth

We conduct a series of numerical simulation studies to evaluate the
finite-sample performance of different estimators with moderate sample size. For
each simulation, we generate 1000 simulated datasets, each with 300 subjects. In the
first set of simulation studies, we consider the scenarios where
*Z _{i}*(

We compare the performances of four estimators of
*β*: (a) CPR, the cross-product ratio estimator, (b) ESF, the
estimated score function approach, (c) ACCF, the rate ratio estimator with all
carriage and event covariates carried forward, (d) CCCF, the rate ratio estimator
with carriage covariates carried forward. We consider two scenarios:
*g*(*t*) = 4 for *t*
[0, 20] and *g*(*t*)
= 4*I*(*t* ≤ 10) +
6*I*(10 < *t* ≤ 20). The prevalence
rate of GAS for the former scenario is 20% at any time point, while for the
latter scenario the prevalence rate is 20% for *t* ≤
10 and drops to 14% for *t* > 10. Each subject has 20
scheduled visits on [0, 20], with one visit per unit time interval.
The time of visit in each interval is uniformly distributed. We evaluate the
behavior of the estimators under various degrees of missingness, where the
probability of missing a pre-scheduled visit is set to be 0%, 20%,
40%, and 60% for all visits. The last observed regular visit is
treated as the censoring time so the recurrent events are only observed up to the
last observed regular visit. To estimate the pseudo-partial score function, we
employ the Gaussian kernel with the quartiles of the kernel density function at
±0.25*h*, where for each simulated dataset the bandwidth
*h* is chosen by applying the 10-fold cross-validation method
described in Section 2.3.

Table 1 summarizes the empirical bias
and the empirical standard deviation of 1000 estimated regression parameters, the
estimated asymptotic standard error, the bootstrap standard error based on 500
bootstrapped samples, the coverage rate of the 95% bootstrap confidence
interval, and the relative efficiency that compares the mean square error of an
estimator to that of the rate ratio estimator under the perfect scenario where the
covariate process is monitored continuously throughout the entire study period. When
*g*(*t*) = 4, there is no time trend in
the covariates. The constant prevalence rate assumption holds, and thus the CPR
estimator is consistent. The bias and relative efficiency of ESF is comparable to
that of CPR. The ACCF method yields substantial bias, especially when the missing
probability is high, while the CCCF method remains consistent. Some intuition for
this behavior can be developed. Suppose prevalence is constant. At any time
*t* with *β* > 0, an estimate of the
prevalence rate using all covariate data will tend to be biased too high as events
just prior to *t* will tend to have *Z* = 1.
An estimate of the prevalence just using the carriage data collected at the monthly
visits will not be biased and this differential behavior seems to have consequences
for estimation of *β* as well. For all four estimators, the
bootstrap standard errors are very close to the empirical standard deviations and
the coverage rates of the 95% bootstrap confidence intervals are close to
the nominal level (0.95), supporting the appropriateness of the bootstrap approach.
Moreover, for the proposed ESF estimator, the asymptotic standard errors are close
to the empirical standard deviations.

Simulation results for with a binary
covariate process. Bias and ES are the empirical bias (×1000) and
empirical standard deviation (×1000) of 1000 regression parameter
estimates; ASE is the estimated asymptotic standard **...**

In the scenarios where *g*(*t*) =
4*I*(*t* ≤ 10) +
6*I*(10 < *t* ≤ 20), there is a
decreasing time trend in the covariate process. The simple CPR estimator is
substantially biased because the steady state assumption is violated. Moreover, both
the ACCF and CCCF methods are also biased and their bias increases with the missing
probability. The bias of the proposed ESF estimator is small, and its relative
efficiency is much higher than its competitors. Both the bootstrap standard errors
and the asymptotic standard errors of the ESF estimator track the empirical standard
deviations well.

The second set of simulation studies examines the performance of ESF, ACCF,
CCCF methods in scenarios where *Z _{i}*(

Simulation results for with a continuous
covariate process. Bias and ES are the empirical bias (×1000) and
empirical standard deviation (×1000) of 1000 regression parameter
estimates; ASE is the estimated asymptotic **...**

In order to assess the sensitivity of the proposed estimation procedure to
the bandwidth selection, we have compared the simulation results with different
choices of bandwidth (results not shown). It is found that the estimated regression
coefficients are very similar, differing only in the third decimal place. In
addition, as pointed out by an anonymous reviewer, the condition
*m*(*t*) > 0 for all *t* may not
be satisfied in some studies. To compare the proposed estimator with its competitors
in the case where *m*(*t*) > 0 fails to hold, we
have conducted additional simulation where we set the timing of regular visits to be
fixed (results not shown). It is found that the proposed estimator also shows
smaller bias and similar/better relative efficiency (in terms of mean square errors)
when compared with its competitors. In other words, our estimator still outperforms
its competitors even when this technical assumption
*m*(*t*) > 0 fails to hold.

All analyses are performed in R, version 3.1.0. The computation environment
is a multi-core Linux cluster with more than 680 cores running in the average of 2.5
GHz speed and 4.4 TB of memory. In the simulations reported above, it takes a few
seconds to obtain parameter estimation for the proposed estimator with a given
bandwidth *h*, and it takes less than 5 mins to do the bandwidth
selection using cross-validation. The other methods also obtain the estimates in a
few seconds.

Pharyngitis is one of the most common reasons patients seek the care of a physician, and GAS is one of the common causes of pharyngitis. A meta-analysis of the prevalence of GAS in the pharynx revealed that children with pharyngitis had a GAS prevalence rate of 37% (CI: 32% – 43%), and children with no clinical evidence of infection had a 12% prevalence rate of GAS (CI: 9% – 14%) [21]. Because GAS pharyngitis is a communicable disease, family members and school classmates of the patient are frequently infected. Furthermore, patients with viral pharyngitis, which is more common than streptococcal pharyngitis, may have incidental GAS pharyngeal carriage, with a rate similar to that of the symptom-free children. Streptococcal pharyngitis in most cases cannot be distinguished on clinical grounds from viral pharyngitis. For this reason the throat culture for the detection of Group A streptococci remains the diagnostic gold standard.

We analyze data from the Indian pharyngitis study to evaluate the effect of
GAS colonization on the risk of developing pharyngitis. Between March 2002 and March
2004, 305 school children from Vellore India, aged between 7 and 11, were examined
weekly for pharyngitis. For those with pharyngitis, a throat culture was obtained to
identify the presence of GAS. Additionally, monthly throat cultures were obtained on
all the study children to determine the prevalence of GAS. Note that, although the
regular visits were scheduled on a monthly basis, the actual observation times were
irregularly spaced across subjects to balance the workload. Similarly, although the
event visits were scheduled weekly, the actual observation times were irregularly
spaced. Therefore, the “continuous observation times” assumption is
approximately satisfied in our application. All patients with GAS infections were
treated with antibiotics, which usually shortens the infectious period to 24 hours.
In case the antibiotic therapy was not effective, a two-week rule was applied to
determine an episode of pharyngitis, that is, a pharyngitis event occurred within 14
days after a previous episode was considered as the same episode. We fit the
proportional rate model [3]
with GAS colonization status as a time-varying covariate and applied the methods
discussed in Section 2 to estimate the covariate effect. The time origin for the
recurrent event analysis is set to be the first day of the study, that is, March 11,
2002. In general, the assumption that *O _{i}*(·) is
independent of { ${N}_{i}^{\ast}(\xb7)$,

Table 3 presents a tally of outcomes based on the regular carriage visits and the pharyngitis event visits. Over the 2 years of the study, 641 pharyngitis events occurred or roughly one per child per year. With 2827 monthly visits, the carriage data corresponds to a total followup of about 236 children years. About 17% of the throat cultures collected at pharyngitis visits were GAS positive, while 11% of the throat cultures obtained at the carriage visits were GAS positive. Figure 1 presents the estimated prevalence rate of GAS colonization using data from the monthly carriage visits. It is seen that the GAS prevalence rate was changing over time in the observation period, with a range from 0.05 to 0.2.

A tabulation of the number of carriage and pharyngitis visits broken down by
Group A streptococcus throat colonization from 305 school-children from Vellore
India followed for 2 years.

We report the rate ratio estimates using four different methods. The simple CPR method estimates a rate ratio of 1.61 with a 95% confidence interval (CI) of (1.31–1.99). However, this estimator is likely to be biased because GAS colonization varies with season. The ACCF method that carries forward all available covariate data yields a rate ratio of 1.28 (95% CI: 1.01–1.58), while the CCCF method that carries forward only carriage covariate yields a rate ratio of 1.37 (95% CI: 1.01–1.76). Finally, applying 10-fold cross-validation for the proposed ESF approach with a Gaussian kernel, we estimate a rate ratio of 1.46 (95% CI: 1.18–1.86). Confidence intervals for all four methods are obtained by the percentile method of the nonparametric bootstrap for clustered data with 1000 bootstrap samples, where the sampling unit is the child. Similarly as observed in the simulation studies, the CCCF and the proposed approach give comparable results, while the other two estimators appear to yield results deviated from the proposed estimator. Since the simulations in the previous section show that the proposed estimated score function approach is flexible and robust, we advocate the proposed approach for inference here. We conclude that there is significant effect of GAS on the risk of developing pharyngitis among school children. This suggests that population methods of control such as vaccine development may be worth considering for this population. The risk of pharyngitis increases by 46% (CI: 18%–86%) for a child who is colonized with GAS.

This article aims to estimate the effect of covariates on recurrent events when time-varying covariates are measured at pre-scheduled follow-up visits (regular visits) and in subjects when they have an event (event visits). The former type of visits carries no information about the underlying recurrent event process, whereas the event-based sampling of covariate values typically provides a biased representation of the underlying covariate process. If higher values of a covariate are associated with an event, then event based sampling will tend to get larger covariates than from the underlying process. This is a subtle but important point and often leads to substantial bias using the “obvious” method of last-covariate value carried forward, as we have shown.

We propose to solve an estimated score function which estimates the mean covariate process in the pseudo-partial score function by kernel smoothing the observed covariate data at regular visits. While the idea of applying the smoothing technique to individual components in an estimating equation is not brand new in longitudinal data analysis and survival analysis, see for example [22], we believe that applying the smoothing technique to tackle the commonly encountered problem of not observing covariate values for every individual in the risk sets is innovative in recurrent event analysis.

Though illustrated with univariate covariate process, the proposed method can be extended to multivariate covariate processes by employing multivariate kernel smoothing techniques in a straightforward manner. Note that the bandwidths for different covariates in multivariate kernel smoothing are allowed to be different. Overall, the estimated score function approach is the most flexible and robust approach as it can accommodate any kind of covariate process. The cross-product ratio estimator and the last covariate carried forward approach require more restrictive assumptions for unbiased and efficient inference.

An alternative approach to deal with intermittently observed time-dependent covariates is to jointly model the recurrent event process and the covariate process. This requires postulation of a specific stochastic model for the covariate process. For binary covariate process, one can assume a transition probability model [10]. For continuous covariate process, one can assume a mixed effects model [11, 23]. The validity of the joint modeling approach relies on the assumptions of the covariate process. It may lead to substantial bias if the model for the covariate process does not hold. Another alternative approach, as mentioned by one reviewer, is to fill in the missing covariates using simple smoothing techniques such as taking the mean of the neighboring values [24]. This is a simple and straightforward approach, however, it has a few drawbacks. First of all, it is not applicable to the binary covariate measures. Secondly, it is more computationally intensive than our proposed approach, as it requires estimation of the missing covariate values for each individual. Thirdly, because the covariate estimates are based on individual data, this approach may lead to substantial bias when the individual observations are sparse in time. Lastly, it lacks theoretical justification.

For future research, as suggested by one reviewer, it would be interesting to investigate the problem of infrequently updated covariate values in other recurrent event data analysis settings. For example, the self-controlled case series (SCCS) method is an alternative method to analyze recurrent event data when the event occurrence rate is low [25]. The two-state infection model [2] and the multi-state model [26] can be used in studies where individuals experience more than one types of events. Existing estimation procedures for these models require the covariate process be observed continuously throughout the observation period. It would be interesting to develop approaches to deal with missing covariate values under these different settings. Further investigation is warranted.

The authors are grateful to the editor, associate editor, and referees for their comments which helped improve this paper. This research was supported by the National Cancer Institute grant R01CA193888.

- (A1){
*N*(·),_{i}*O*(·),_{i}*Y*(·),_{i}*Z*(·)} are independent and identically distributed._{i} - (A2)
*N*(_{i}*τ*) is bounded.*λ*(·), the rate function of^{c}*N*(·), is of bounded variation._{i} - (A3)The true parameter
*β*_{0}lies in a compact set in , and the baseline rate function*λ*_{0}(*t*) is absolutely continuous. - (A4)The covariate process
*Z*(_{i}*t*) has uniformly bounded total variation, namely, ${\int}_{0}^{\tau}\mid {dZ}_{i}(t)\mid +\mid {Z}_{i}(0)\mid \le c$ for some*c*> 0 for all*i*. Without loss of generality, we assume*Z*(_{i}*t*) ≥ 0. - (A5)The censoring time
*C*is independent of ${N}_{i}^{\ast}(\xb7)$ conditional on_{i}*Z*(·) with_{i}*G*(*τ*) = pr(*C*≥_{i}*τ*) > 0. - (A6)The functions
*s*^{(}^{k}^{)}(*t*,*β*) =*E*[*Y*(_{i}*t*)*Z*(_{i}*t*)exp {^{k}*βZ*(_{i}*t*)}],*k*= 0, 1, have bounded second derivatives for*t*[0,*τ*]. - (A7)The observation time process
*O*(·) is independent of { ${N}_{i}^{\ast}(\xb7)$,_{i}*Z*(·),_{i}*C*} and is bounded. Moreover, the covariate collection function_{i}*m*(*t*) at time*t*, defined by*m*(*t*)*dt*=*E*[*dO*(_{i}*t*)], is positive and has bounded second derivative for*t*[0,*τ*]. - (A8)The kernel function
*K*(·) is a symmetric density function with a bounded support. - (A9)
*h*=*O*(*n*), where 1^{−v}*/*4*< v <*1*/*2.

We first show the uniform consistency of
when *h*
→ 0, and *nh*^{2} → ∞.
Because the functional defined by
*Û _{h}* is continuous with respect
to the supreme norm topology, it is sufficient to show that the four
processes ${n}^{-1}{\sum}_{i=1}^{n}{\int}_{0}^{\tau}{Z}_{i}(t)dN(t),\phantom{\rule{0.16667em}{0ex}}{n}^{-1}{\sum}_{i=1}^{n}{N}_{i}(t),\phantom{\rule{0.16667em}{0ex}}{\widehat{S}}_{h}^{(1)}(t,\beta )$ and ${\widehat{S}}_{h}^{(0)}(t,\beta )$ converge in probability to their limits
uniformly for

$$P(\underset{\beta \in \mathcal{B},t\in [0,\tau ]}{sup}\sqrt{n}\mid {\widehat{R}}^{(k)}(t,\beta )-{r}^{(k)}(t,\beta )\mid \phantom{\rule{0.16667em}{0ex}}>x)\le {c}_{k}{x}^{{v}_{k}}{e}^{-2{x}^{2}},$$

(9)

where *c _{k}* and

$$\begin{array}{l}\underset{\beta \in \mathcal{B},t\in [h,\tau ,h]}{sup}\mid {\widehat{S}}_{h}^{(k)}(t,\beta )-E\{{\widehat{S}}_{h}^{(k)}(t,\beta )\}\mid =\underset{\beta \in \mathcal{B},t\in [h,\tau -h]}{sup}\left|{\int}_{t-h}^{t+h}\{{\widehat{R}}^{(k)}(u,\beta )-{r}^{(k)}(u,\beta )\}d{K}_{h}(t-u)\right|\\ \le {h}^{-1}\underset{\beta \in \mathcal{B},t\in [0,\tau ]}{sup}\mid {\widehat{R}}^{(k)}(t,\beta )-{r}^{(k)}(t,\beta )\mid \xb7V(K),\end{array}$$

where
*V*(*K*) is the variation of the
kernel function *K*.

By equation (9),
for any *ε* > 0, we have

$$\begin{array}{l}P({h}^{-1}\underset{\beta \in \mathcal{B},t\in [0,\tau ]}{sup}\mid {\widehat{R}}^{(k)}(t,\beta )-{r}^{(k)}(t,\beta )\mid \phantom{\rule{0.16667em}{0ex}}>\epsilon )=P(\underset{\beta \in \mathcal{B},t\in [0,\tau ]}{sup}\sqrt{n}\mid {\widehat{R}}^{(k)}(t,\beta )-{r}^{(k)}(t,\beta )\mid \phantom{\rule{0.16667em}{0ex}}>\sqrt{n}h\epsilon )\\ <{c}_{k}{(\sqrt{n}h\epsilon )}^{{v}_{k}}{e}^{-2n{h}^{2}{\epsilon}^{2}}\to 0,\phantom{\rule{0.38889em}{0ex}}\text{as}\phantom{\rule{0.38889em}{0ex}}n{h}^{2}\to \infty .\end{array}$$

Therefore, ${sup}_{\beta \in \mathcal{B},t\in [h,\tau -h]}\left|{\widehat{S}}_{h}^{(k)}(t,\beta )-E\{{\widehat{S}}_{h}^{(k)}(t,\beta )\}\right|$ converge to 0 in probability as
*nh*^{2} → ∞. Also, since ${sup}_{\beta \in \mathcal{B},t\in [h,\tau -h]}\left|E\{{\widehat{S}}_{h}^{(k)}(t,\beta )\}-{s}^{(k)}(t,\beta )m(t)\right|=O({h}^{2}),\phantom{\rule{0.16667em}{0ex}}{sup}_{\beta \in \mathcal{B},t\in [0,h)}\mid {s}^{(k)}(t,\beta )m(t)-{s}^{(k)}(h,\beta )m(h)\mid \phantom{\rule{0.16667em}{0ex}}=O(h)$, and
sup_{β}_{,}_{t}_{[0,}_{h}_{)}
|*s*^{(}^{k}^{)}(*t*,
*β*)*m*(*t*)
−
*s*^{(}^{k}^{)}(*h*,
*β*)*m*(*h*)
= *O*(*h*), and
sup_{β}_{,}_{t}_{(}_{τ}_{−}_{h}_{,}_{τ}_{]}|*s*^{(}^{k}^{)}(*t*,
*β*)*m*(*t*)−*s*^{(}^{k}^{)}(*τ*−*h*,
*β*)*m*(*τ*−*h*)|
= *O*(*h*), the uniform
consistency of ${\widehat{S}}_{h}^{(k)}(t,\beta )$ holds. Moreover, the monotone bounded
stochastic process ${n}^{-1}{\sum}_{i=1}^{n}{N}_{i}(t)$ converges in probability to its limits
Λ* ^{c}*(

Next, we prove the asymptotic normality of $\sqrt{n}{\widehat{U}}_{h}({\beta}_{0})$, which can be written as

$$\sqrt{n}{\widehat{U}}_{h}({\beta}_{0})={n}^{-1/2}\sum _{i=1}^{n}\left\{{\int}_{0}^{\tau}{Z}_{i}(t){dN}_{i}(t)-{\int}_{0}^{\tau}\frac{{s}^{(1)}(t,{\beta}_{0})}{{s}^{(0)}(t,{\beta}_{0})}{dN}_{i}(t)\right\}+{n}^{-1/2}\sum _{i=1}^{n}\left\{{\int}_{0}^{\tau}\frac{{s}^{(1)}(t,{\beta}_{0})}{{s}^{(0)}(t,{\beta}_{0})}{dN}_{i}(t)-{\int}_{0}^{\tau}\frac{{\widehat{S}}_{h}^{(1)}(t,{\beta}_{0})}{{\widehat{S}}_{h}^{(0)}(t,{\beta}_{0})}{dN}_{i}(t)\right\}.$$

(10)

Because *N*(*t*) is a bounded
monotone stochastic process, ${n}^{-1/2}{\sum}_{i=1}^{n}\{{N}_{i}(t)-{\mathrm{\Lambda}}^{c}(t)\}$ converges weakly to a zero mean tight
Gaussian process. Moreover, $\left\{{\scriptstyle \frac{{s}^{(1)}(t,{\beta}_{0})}{{s}^{(0)}(t,{\beta}_{0})}}-{\scriptstyle \frac{{\widehat{S}}_{h}^{(1)}(t,{\beta}_{0})}{{\widehat{S}}_{h}^{(0)}(t,{\beta}_{0})}}\right\}$ has total bounded variation and
converges in probability to 0. By Lemma 4.2 of [29], the second term on the right
hand side of Equation
(10) is

$$\begin{array}{l}{n}^{-1/2}\sum _{i=1}^{n}\left\{{\int}_{0}^{\tau}\frac{{s}^{(1)}(t,{\beta}_{0})}{{s}^{(0)}(t,{\beta}_{0})}{dN}_{i}(t)-{\int}_{0}^{\tau}\frac{{\widehat{S}}_{h}^{(1)}(t,{\beta}_{0})}{{\widehat{S}}_{h}^{(0)}(t,{\beta}_{0})}{dN}_{i}(t)\right\}=-{n}^{1/2}{\int}_{0}^{\tau}\left\{\frac{{\widehat{S}}_{h}^{(1)}(t,{\beta}_{0})}{{\widehat{S}}_{h}^{(0)}(t,{\beta}_{0})}-\frac{{s}^{(1)}(t,{\beta}_{0})}{{s}^{(0)}(t,{\beta}_{0})}\right\}\phantom{\rule{0.16667em}{0ex}}{\lambda}^{c}(t)dt+{o}_{p}(1)\\ =-{n}^{1/2}{\int}_{0}^{\tau}\left\{\frac{{\widehat{S}}_{h}^{(1)}(t,{\beta}_{0})}{{\widehat{S}}_{h}^{(0)}(t,{\beta}_{0})}-\frac{{s}^{(1)}(t,{\beta}_{0})m(t)}{{\widehat{S}}_{h}^{(0)}(t,{\beta}_{0})}\right\}\phantom{\rule{0.16667em}{0ex}}{\lambda}^{c}(t)dt-{n}^{-1/2}{\int}_{0}^{\tau}\left\{\frac{{s}^{(1)}(t,{\beta}_{0})m(t)}{{\widehat{S}}_{h}^{(0)}(t,{\beta}_{0})}-\frac{{s}^{(1)}(t,{\beta}_{0})}{{s}^{(0)}(t,{\beta}_{0})}\right\}\phantom{\rule{0.16667em}{0ex}}{\lambda}^{c}(t)dt+{o}_{p}(1)\\ \stackrel{\mathit{def}}{=}I+II+{o}_{p}(1).\end{array}$$

For ease of presentation, we introduce a few new notations. Let
*g* be a function of bounded variation on [0,
*τ*]. Define ${S}_{i}^{(k)}(t,\beta )={\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{i}(u){Z}_{i}{(u)}^{k}exp\{{Z}_{i}(u)\beta \}{dO}_{i}(u)$ for *t*
[*h*,
*τ*−*h*], and ${S}_{i}^{(k)}(t,\beta )={S}_{i}^{(k)}(h,\beta )$ for *t*
[0, *h*] ${S}_{i}^{(k)}(t,\beta )={S}_{i}^{(k)}(\tau -h,\beta )$ for *t*
[*τ*−*h*,
*τ*]. Thus ${\widehat{S}}_{h}^{(k)}(t,\beta )={n}^{-1}{\sum}_{i=1}^{n}{S}_{i}^{(k)}(t,\beta )$. We also establish the following two
properties in order to prove the asymptotic distribution of
*I* and *II*. (Detailed proof for the
two properties can be found in the end of the Appendix.)

- It can be shown that the function classes { ${\int}_{0}^{u}g(t){S}_{i}^{(k)}(t,\beta )dt$} and { ${\int}_{0}^{u}g(t){Y}_{i}(t){Z}_{i}{(t)}^{k}exp\{{\beta}_{0}{Z}_{i}(t)\}{dO}_{i}(t)$} are bounded and monotone in
*u*and thus is Donsker. By the functional central limit theorem, for*u*[0,*τ*], we have$$\sqrt{n}{\int}_{0}^{u}g(t){\widehat{S}}_{h}^{(k)}(t,{\beta}_{0})dt-\frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\int}_{0}^{u}g(t){Y}_{i}(t){Z}_{i}{(t)}^{k}exp\{{\beta}_{0}{Z}_{i}(t)\}{dO}_{i}(t)=O(\sqrt{n}{h}^{2})+{O}_{p}(\sqrt{h}).$$Note that the right hand side of the above equation will be*o*(1) if_{p}*nh*^{4}→ 0 and*h*→ 0. - Under condition (A9), along the same line of [30], can be shown that $\sqrt{n}{\int}_{0}^{\tau}g(t)\{{\widehat{S}}^{(0)}{(t,{\beta}_{0})}^{-1}-{s}^{(0)}{(t,{\beta}_{0})}^{-1}m{(t)}^{-1}\}\{{\widehat{S}}^{(k)}(t,{\beta}_{0})-{s}^{(k)}(t,{\beta}_{0})m(t)\}dt={o}_{p}(1)$.

For *I*, we have

$$\begin{array}{l}I=-{n}^{1/2}{\int}_{0}^{\tau}\left\{\frac{{\widehat{S}}_{h}^{(1)}(t,{\beta}_{0})}{{s}^{(0)}(t,{\beta}_{0})m(t)}-\frac{{s}^{(1)}(t,{\beta}_{0})}{{s}^{(0)}(t,{\beta}_{0})}\right\}\phantom{\rule{0.16667em}{0ex}}{\lambda}^{c}(t)dt+{o}_{p}(1)\\ =-\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\left\{{\int}_{0}^{\tau}\frac{{\lambda}^{c}(t)}{{s}^{(0)}(t,{\beta}_{0})m(t)}{Y}_{i}(t){Z}_{i}(t)exp\{{\beta}_{0}{Z}_{i}(t)\}{dO}_{i}(t)-{\int}_{0}^{\tau}\frac{{s}^{(1)}(t,{\beta}_{0})}{{s}^{(0)}(t,{\beta}_{0})}{\lambda}^{c}(t)dt\right\}+{o}_{p}(1),\end{array}$$

where the first equation holds by using
(ii) and the second equation holds from (i). Similarly, for
*II*, we have

$$\begin{array}{l}II=\sqrt{n}{\int}_{0}^{\tau}{s}^{(1)}(t,{\beta}_{0})\frac{{\widehat{S}}_{h}^{(0)}(t,{\beta}_{0})-{s}^{(0)}(t,{\beta}_{0})m(t)}{{s}^{(0)}{(t,{\beta}_{0})}^{2}m(t)}{\lambda}^{c}(t)dt+{o}_{p}(1)\\ =\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\left\{{\int}_{0}^{\tau}\frac{{s}^{(1)}(t,{\beta}_{0}){\lambda}^{c}(t)}{{s}^{(0)}{(t,{\beta}_{0})}^{2}m(t)}{Y}_{i}(t)exp\{{\beta}_{0}{Z}_{i}(t)\}{dO}_{i}(t)-{\int}_{0}^{\tau}\frac{{s}^{(1)}(t,{\beta}_{0})}{{s}^{(0)}(t,{\beta}_{0})}{\lambda}^{c}(t)dt\right\}+{o}_{p}(1).\end{array}$$

Hence we have $\sqrt{n}{\widehat{U}}_{h}({\beta}_{0})={n}^{-1/2}{\sum}_{i=1}^{n}\psi ({\beta}_{0})+{o}_{p}(1)$, where

$$\psi ({\beta}_{0})={\int}_{0}^{\tau}\{{Z}_{i}(t)-\mathcal{E}(t,{\beta}_{0})\}{dN}_{i}(t)-{\int}_{0}^{\tau}{Y}_{i}(t)\{{Z}_{i}(t)-\mathcal{E}(t,{\beta}_{0})\}exp\{{\beta}_{0}{Z}_{i}(t)\}{\{{s}^{(0)}(t,{\beta}_{0})m(t)\}}^{-1}{\lambda}^{c}(t){dO}_{i}(t).$$

Define
* _{h}*(

$${\widehat{\mathrm{\Gamma}}}_{h}(\beta )={n}^{-1}\sum _{i=1}^{n}{\int}_{0}^{\tau}\left[{\left\{\frac{{\widehat{S}}_{h}^{(1)}(t,\beta )}{{\widehat{S}}_{h}^{(0)}(t,\beta )}\right\}}^{2}-\frac{{\widehat{S}}_{h}^{(2)}(t,\beta )}{{\widehat{S}}_{h}^{(0)}(t,\beta )}\right]\phantom{\rule{0.16667em}{0ex}}{dN}_{i}(t)$$

and

$$\mathrm{\Gamma}(\beta )={\int}_{0}^{\tau}\left[{\left\{\frac{{s}^{(1)}(t,\beta )}{{s}^{(0)}(t,\beta )}\right\}}^{2}-\frac{{s}^{(2)}(t,\beta )}{{s}^{(0)}(t,\beta )}\right]\phantom{\rule{0.16667em}{0ex}}{\lambda}^{c}(t)dt.$$

Arguing as before, we can show that
* _{h}*(

Let ${R}_{i}^{(k)}(t,\beta )$ denote the *i*th
term of
^{(}^{k}^{)}(*t*,
*β*), that is, ${R}_{i}^{(k)}(t,\beta )={\int}_{0}^{t}{Y}_{i}(u){Z}_{i}{(u)}^{k}exp\{{Z}_{i}(u)\beta \}{dO}_{i}(u)$ Let
*a _{i}*(

We first show that
sup_{u}_{[0,}_{h}_{]}|*Ea _{i}*(

$$\begin{array}{l}\underset{u\in [0,h]}{sup}\mid E{a}_{i}(u)\mid =\underset{u\in [0,h]}{sup}\left|E{\int}_{0}^{u}g(t){S}_{i}^{(k)}(t,{\beta}_{0})dt-E{\int}_{0}^{u}g(t){dR}_{i}^{(k)}(t,{\beta}_{0})\right|\\ =\underset{u\in [0,h]}{sup}\left|E{\int}_{0}^{u}{\int}_{0}^{2h}{K}_{h}(h-s)g(t){dR}_{i}^{(k)}(s,{\beta}_{0})dt-E{\int}_{0}^{u}g(t){dR}_{i}^{(k)}(t,{\beta}_{0})\right|\\ =\underset{u\in [0,h]}{sup}\left|{\int}_{0}^{u}g(t){\int}_{0}^{2h}{K}_{h}(h-s){dr}^{(k)}(s,{\beta}_{0})dt-{\int}_{0}^{u}g(t){dr}^{(k)}(t,{\beta}_{0})\right|\\ \le \underset{u\in [0,h]}{sup}\left|{\int}_{0}^{u}g(t){s}^{(k)}(h,{\beta}_{0})m(h)dt-{\int}_{0}^{u}g(t){s}^{(k)}(t,{\beta}_{0})m(t)dt\right|+O({h}^{2})\\ \le \underset{u\in [0,h]}{sup}\left|{\int}_{0}^{u}g(t){f}_{1}^{(k)}(h)(h-t)dt\right|+O({h}^{2})\\ =O({h}^{2}).\end{array}$$

(11)

And

$$\underset{u\in [0,h]}{sup}E\mid {a}_{i}(u)\mid \phantom{\rule{0.16667em}{0ex}}\le h\xb7\left\{\underset{t\in [0,h]}{sup}E\mid g(t){S}_{i}^{(k)}(t,{\beta}_{0})\mid +\underset{t\in [0,h]}{sup}\mid g(t){s}_{i}^{(k)}(t,{\beta}_{0})m(t)\mid \right\}=O(h).$$

(12)

Similarly, we can show
sup_{u}_{[}_{τ}_{−}_{h}_{,}_{τ}_{]}
| *Ea _{i}*(

For *h* < *u* <
*τ* − *h*, it can
be shown that

$$\begin{array}{l}\underset{u\in [h,\tau -h]}{sup}\mid E{a}_{i}(u)\mid =\underset{u\in [h,\tau -h]}{sup}\left|E{\int}_{0}^{u}g(t){S}_{i}^{(k)}(t,{\beta}_{0})dt-E{\int}_{0}^{u}g(t){dR}_{i}^{(k)}(t,{\beta}_{0})\right|\\ =\underset{u\in [h,\tau -h]}{sup}\left|E{\int}_{0}^{h}g(t){S}_{i}^{(k)}(t,{\beta}_{0})dt-E{\int}_{0}^{h}g(t){dR}_{i}^{(k)}(t,{\beta}_{0})\right|+\\ \underset{u\in [h,\tau -h]}{sup}\left|E{\int}_{h}^{u}{\int}_{t-h}^{t+h}\{g(t)-g(s)\}{K}_{h}(t-s){dR}_{i}^{(k)}(s,{\beta}_{0})dt\right|+\\ \underset{u\in [h,\tau -h]}{sup}\left|E{\int}_{0}^{2h}{\int}_{h}^{s+h}{K}_{h}(t-s)\mathit{dtg}(s){dR}_{i}^{(k)}(s,{\beta}_{0})-E{\int}_{h}^{2h}g(s){dR}_{i}^{(k)}(s,{\beta}_{0})\right|+\\ \underset{u\in [h,\tau -h]}{sup}\left|E{\int}_{u-h}^{u}{\int}_{s+h}^{u}{K}_{h}(t-s)\mathit{dtg}(s){dR}_{i}^{(k)}(s,{\beta}_{0})+E{\int}_{u}^{u+h}{\int}_{s-h}^{u}{K}_{h}(t-s)\mathit{dtg}(s){dR}_{i}^{(k)}(s,{\beta}_{0})\right|\\ =Ia+Ib+Ic+Id\end{array}$$

Note that *Ia* =
*O*(*h*^{2}) from (11). Also, we have

$$Ib\le \underset{u\in [h,\tau -h]}{sup}\left|{\int}_{h}^{u}{\int}_{t-h}^{t+h}\{g(t)-g(s)\}{K}_{h}(t-s){s}^{(k)}(t,{\beta}_{0})m(t)\mathit{dsdt}\right|+O({h}^{2})=O({h}^{2}).$$

Let ${f}_{2}^{(k)}(t)$ denote the derivative of
*g*(*t*)*s*^{(}^{k}^{)}(*t*,
*β*_{0})*m*(*t*),
then

$$\begin{array}{l}Ic=\underset{u\in [h,\tau -h]}{sup}\left|E{\int}_{0}^{2h}{\int}_{h}^{s+h}{K}_{h}(t-s)\mathit{dtg}(s){dR}_{i}^{(k)}(s,{\beta}_{0})-E{\int}_{h}^{2h}g(s){dR}_{i}^{(k)}(s,{\beta}_{0})\right|\\ =\underset{u\in [h,\tau -h]}{sup}\left|{\int}_{0}^{2h}{\int}_{1-s/h}^{1}K(x)\mathit{dxg}(s){s}^{(k)}(s,{\beta}_{0})m(s)ds-{\int}_{h}^{2h}g(u){s}^{(k)}(s,{\beta}_{0})m(s)ds\right|\\ =\underset{u\in [h,\tau -h]}{sup}\left|{\int}_{0}^{h}{\int}_{1-s/h}^{1}K(x)\mathit{dxg}(s){s}^{(k)}(s,{\beta}_{0})m(s)ds+{\int}_{h}^{2h}{\int}_{1-s/h}^{-1}K(x)\mathit{dxg}(s){s}^{(k)}(s,{\beta}_{0})m(s)ds\right|\\ =\underset{u\in [h,\tau -h]}{sup}|{\int}_{0}^{h}{\int}_{1-s/h}^{1}K(x)dx\left\{g(s){s}^{(k)}(s,{\beta}_{0})m(s)-g(0){s}^{(k)}(0,{\beta}_{0})m(0)\right\}ds-{\int}_{h}^{2h}{\int}_{-1}^{1-s/h}K(x)dx\left\{g(s){s}^{(k)}(s,{\beta}_{0})m(s)-g(0){s}^{(k)}(0,{\beta}_{0})m(0)\right\}ds|\\ \le \underset{u\in [h,\tau -h]}{sup}\left|{\int}_{0}^{h}{\int}_{1-s/h}^{1}K(x)dx{f}_{2}^{(k)}(0)\xb7\mathit{sds}-{\int}_{h}^{2h}{\int}_{-1}^{1-s/h}K(x)dx{f}_{2}^{(k)}(0)\xb7\mathit{sds}\right|+O({h}^{2})\\ =O({h}^{2}).\end{array}$$

$$\begin{array}{l}E(Id)=\underset{u\in [h,\tau -h]}{sup}\left|E{\int}_{u-h}^{u}{\int}_{s+h}^{u}{K}_{h}(t-s)\mathit{dtdg}(s){dR}_{i}^{(k)}(s)+E{\int}_{u}^{u+h}{\int}_{s-h}^{u}{K}_{h}(t-s)\mathit{dtdg}(s){dR}_{i}^{(k)}(s)\right|\\ =\underset{u\in [h,\tau -h]}{sup}\left|-{\int}_{u-h}^{u}{\int}_{(u-s)/h}^{1}K(x)\mathit{dxg}(s){u}^{(k)}(s,{\beta}_{0})m(s)ds+{\int}_{u}^{u+h}{\int}_{-1}^{(u-s)/h}K(x)\mathit{dxg}(s){u}^{(k)}(s,{\beta}_{0})m(s)ds\right|\\ =\underset{u\in [h,\tau -h]}{sup}|-{\int}_{u-h}^{u}{\int}_{(u-s)/h}^{1}K(x)dx\left\{g(s){u}^{(k)}(s,{\beta}_{0})m(s)-g(u){u}^{(k)}(u,{\beta}_{0})m(u)\right\}ds+{\int}_{u}^{u+h}{\int}_{-1}^{(u-s)/h}K(x)dx\left\{g(s){u}^{(k)}(s,{\beta}_{0})m(s)-g(u){u}^{(k)}(u,{\beta}_{0})m(u)\right\}ds|\\ =O({h}^{2}).\end{array}$$

Therefore, we have
sup_{u}_{[}_{h}_{,}_{τ}_{−}_{h}_{]}
|*Ea _{i}*(

Combining the results for *u*
[*h*,
*τ*−*h*],
*u* [0,
*h*] and *u*
[*τ*−*h*,
*τ*], we have
sup_{u}_{[0,}_{τ}_{]}
|*Ea _{i}*(

We now give the proof of (ii) using similar arguments as in
[30]. To
ensure a non-zero denominator ${\widehat{S}}_{h}^{(0)}$, we define ${\stackrel{\sim}{S}}_{h}^{(0)}(t,\beta )={\widehat{S}}_{h}^{(0)}(t,\beta )$ when ${\widehat{S}}_{h}^{(0)}(t,\beta )>a/log(n)$ and ${\stackrel{\sim}{S}}_{h}^{(0)}(t,\beta )=a/log(n)$ if ${\widehat{S}}_{h}^{(0)}(t,\beta )\le a/log(n)$, where *a* is a
constant. Then we have $\sqrt{n}{sup}_{t\in [0,\tau ]}\mid {\stackrel{\sim}{S}}_{h}^{(0)}(t,{\beta}_{0})-{\widehat{S}}_{h}^{(0)}(t,{\beta}_{0})\mid ={o}_{p}(1)$ by calculating its
*L*_{2}-norm using Equation (9).

Define ${\widehat{S}}_{-i}^{(0)}(t,{\beta}_{0})={n}^{-1}{\sum}_{k\ne i}{\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{k}(u)exp\{{Z}_{k}(u){\beta}_{0}\}{dO}_{k}(u)$ and ${\widehat{S}}_{-ij}^{(0)}(t,{\beta}_{0})={n}^{-1}{\sum}_{k\ne i,j}{\int}_{0}^{\tau}{K}_{h}(t-u){Y}_{k}(u)exp\{{Z}_{k}(u){\beta}_{0}\}{dO}_{k}(u)$. We further define ${\stackrel{\sim}{S}}_{-i}^{(0)}(t,{\beta}_{0})=I({\widehat{S}}_{-i}^{(0)}(t,{\beta}_{0})>a/logn)\xb7{\widehat{S}}_{-i}^{(0)}(t,{\beta}_{0})+I({\widehat{S}}_{-i}^{(0)}(t,{\beta}_{0})\le a/logn)\xb7a/logn$, and ${\stackrel{\sim}{S}}_{-ij}^{(0)}(t,{\beta}_{0})=I({\widehat{S}}_{-ij}^{(0)}(t,{\beta}_{0})>a/logn)\xb7{\widehat{S}}_{-ij}^{(0)}(t,{\beta}_{0})+I({\widehat{S}}_{-ij}^{(0)}(t,{\beta}_{0})\le a/logn)\xb7a/logn$. We also define ${L}_{i}(t)={\int}_{0}^{t}{S}_{i}^{(k)}(u,{\beta}_{0})du-{\int}_{0}^{t}{s}^{(k)}(u,{\beta}_{0})m(u)du,\phantom{\rule{0.16667em}{0ex}}{h}_{i}(t)=g(t){\stackrel{\sim}{S}}_{-i}^{(0)}{(t,{\beta}_{0})}^{-1}-g(t){s}^{(0)}{(t,{\beta}_{0})}^{-1}m{(t)}^{-1},{h}_{ij}(t)=g(t){\stackrel{\sim}{S}}_{-ij}^{(0)}{(t,{\beta}_{0})}^{-1}-g(t){s}^{(0)}{(t,{\beta}_{0})}^{-1}m{(t)}^{-1}$.

We first show $\sqrt{n}{\int}_{0}^{\tau}g(t)\{{\stackrel{\sim}{S}}_{-i}^{(0)}{(t,{\beta}_{0})}^{-1}-{s}^{(0)}{(t,{\beta}_{0})}^{-1}m{(t)}^{-1}\}\{{\widehat{S}}^{(k)}(t,{\beta}_{0})-{s}^{(k)}(t,{\beta}_{0})m(t)\}dt$, which is ${n}^{-1/2}{\sum}_{i=1}^{n}{\int}_{0}^{\tau}{h}_{i}(t){dL}_{i}(t)$, converges in probability to 0 by
proving *L*_{2}-convergence. Note that

$$\frac{1}{n}E\phantom{\rule{0.16667em}{0ex}}{\left[\sum _{i=1}^{n}{\int}_{0}^{\tau}{h}_{i}(u){dL}_{i}(u)\right]}^{2}\phantom{\rule{0ex}{0ex}}=\frac{1}{n}E\phantom{\rule{0.16667em}{0ex}}\left[\sum _{i=1}^{n}\sum _{j=1}^{n}{\int}_{0}^{\tau}{\int}_{0}^{\tau}\{{h}_{ij}(u)+{h}_{i}(u)-{h}_{ij}(u)\}\phantom{\rule{0.16667em}{0ex}}\{{h}_{ij}(v)+{h}_{j}(v)-{h}_{ij}(v)\}\phantom{\rule{0.16667em}{0ex}}{dL}_{i}(u){dL}_{j}(v)\right]\phantom{\rule{0ex}{0ex}}=\frac{1}{n}E\phantom{\rule{0.16667em}{0ex}}[\sum _{i=1}^{n}\sum _{j=1}^{n}{\int}_{0}^{\tau}{\int}_{0}^{\tau}{h}_{ij}(u){h}_{ij}(v){dL}_{i}(u){dL}_{j}(v)+2\sum _{i=1}^{n}\sum _{j=1}^{n}{\int}_{0}^{\tau}{\int}_{0}^{\tau}{h}_{ij}(u)\{{h}_{j}(v)-{h}_{ij}(v)\}{dL}_{i}(u){dL}_{j}(v)\phantom{\rule{0ex}{0ex}}+\sum _{i=1}^{n}\sum _{j=1}^{n}{\int}_{0}^{\tau}{\int}_{0}^{\tau}\{{h}_{i}(u)-{h}_{ij}(u)\}\{{h}_{j}(v)-{h}_{ij}(v)\}{dL}_{i}(u){dL}_{j}(v)]\stackrel{\mathit{def}}{=}Ie+If+Ig.$$

We then prove *Ie*, *If*,
*Ic* are all *o*(1) term. To prove
this we first note that
sup_{t}_{[0,}_{τ}_{]}
| *EL _{j}*(

$$E\{\underset{u\in [0,\tau ]}{sup}{\mid {h}_{i}(u)\mid}^{2}\}=o(1),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}E\{\underset{u\in [0,\tau ]}{sup}{\mid {h}_{ji}(u)\mid}^{2}\}=o(1),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}E\{n\underset{u\in [0,\tau ]}{sup}{\mid {h}_{ji}(u)-{h}_{i}(u)\mid}^{2}\}=o(1).$$

Therefore, for *Ie*, we have

$$\begin{array}{l}Ie=(n-1)E\phantom{\rule{0.16667em}{0ex}}\left\{{\int}_{0}^{\tau}{h}_{ij}(u){dL}_{i}(u){\int}_{0}^{\tau}{h}_{ij}(v){dL}_{j}(v)\right\}+\frac{1}{n}\sum _{i=1}^{n}E\phantom{\rule{0.16667em}{0ex}}{\left\{{\int}_{0}^{\tau}{h}_{i}(u){dL}_{i}(u)\right\}}^{2}\\ \le (n-1)E\{\underset{u\in [0,\tau ]}{sup}{\mid {h}_{ij}(u)\mid}^{2}\}{\left\{{\int}_{0}^{\tau}\mid E{S}_{i}^{(k)}(u,{\beta}_{0})-{s}^{(k)}(u,{\beta}_{0})m(u)\mid du\right\}}^{2}+\frac{1}{n}\sum _{i=1}^{n}E\phantom{\rule{0.16667em}{0ex}}{\left\{{\int}_{0}^{\tau}{h}_{i}(u){dL}_{i}(u)\right\}}^{2}\\ =o(1).\end{array}$$

For *If*, we have

$$\begin{array}{l}\mid If\mid =\left|2(n-1)E\phantom{\rule{0.16667em}{0ex}}\left\{{\int}_{0}^{\tau}{h}_{ij}(u){dL}_{i}(u){\int}_{0}^{\tau}\{{h}_{j}(v)-{h}_{ij}(v)\}{dL}_{j}(v)\right\}\right|\\ =2(n-1)\left|E{\int}_{0}^{\tau}{h}_{ij}(u){dL}_{i}(u)[E{L}_{j}(t)\{{h}_{j}(\tau )-{h}_{ij}(\tau )\}-{\int}_{0}^{\tau}E{L}_{j}(u)d\{{h}_{j}(u)-{h}_{ij}(u)\}]\right|\\ \le o(1)+2(n-1)E\phantom{\rule{0.16667em}{0ex}}\left[\left|{\int}_{0}^{\tau}{h}_{ij}(u){dL}_{i}(u)\right|{\int}_{0}^{\tau}\mid E{L}_{j}(u)\mid \left|d\phantom{\rule{0.16667em}{0ex}}\left\{\frac{{n}^{-1}g(u){S}_{i}^{(0)}(u)}{{\stackrel{\sim}{S}}_{-ij}^{(0)}(u){\stackrel{\sim}{S}}_{-j}^{(0)}(u)}\right\}\right|\right]\\ \le o(1)+2(n-1){n}^{-1}E\phantom{\rule{0.16667em}{0ex}}[\left|{\int}_{0}^{\tau}{h}_{ij}(u){dL}_{i}(u)\right|\xb7O({\{log(n)\}}^{3})\xb7{\int}_{0}^{\tau}\mid E{L}_{j}(u)\mid \phantom{\rule{0.16667em}{0ex}}\left\{{S}_{i}^{(0)}(u)\mid dg(u)\mid +g(u)\mid {\stackrel{.}{S}}_{i}^{(0)}(u)\mid du+g(u){S}_{i}^{(0)}(u)\mid {\widehat{S}}_{-j}^{(0)}(u)\mid du+g(u){S}_{i}^{(0)}(u)\mid {\widehat{S}}_{-ij}^{(0)}(u)\mid du\right\}]\\ =o(1)+2(n-1){n}^{-1}E\left|{\int}_{0}^{\tau}{h}_{ij}(u){dL}_{i}(u)\right|\xb7O({\{log(n)\}}^{3})\xb7O(1)=o(1),\end{array}$$

where
(·) denotes the derivative of
*S*(·). For *Ig*, by
Cauchy-Schwartz Inequality,

$$\begin{array}{l}\mid Ig\mid \phantom{\rule{0.16667em}{0ex}}\le 1/n\sum _{i=1}^{n}\sum _{j=1}^{n}\mid E{\int}_{0}^{\tau}\{{h}_{i}(u)-{h}_{ij}(u)\}{dL}_{i}(u){\int}_{0}^{\tau}\{{h}_{j}(v)-{h}_{ij}(v)\}{dL}_{j}(v)\mid \\ \le 1/n\sum _{i=1}^{n}\sum _{j=1}^{n}{[E{\{{\int}_{0}^{\tau}\{{h}_{i}(u)-{h}_{ij}(u)\}{dL}_{i}(u)\}}^{2}]}^{1/2}{[E{\{{\int}_{0}^{\tau}\{{h}_{j}(u)-{h}_{ij}(u)\}{dL}_{j}(u)\}}^{2}]}^{1/2}\\ \le 1/n\sum {\sum}_{i\ne j}E\underset{u\in [0,\tau ]}{sup}{\mid {h}_{1}(u)-{h}_{12}(u)\mid}^{2}E{\{{\int}_{0}^{t}\mid {dL}_{1}(u)\mid \}}^{2}=o(1).\end{array}$$

Therefore, we have $1/\sqrt{n}{\sum}_{i=1}^{n}{\int}_{0}^{\tau}{h}_{i}(u){dL}_{i}(u)={o}_{p}(1)$. Moreover, since

$$\frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\int}_{0}^{\tau}g(u)\{{\stackrel{\sim}{S}}_{h}^{(0)}{(u,{\beta}_{0})}^{-1}-{\stackrel{\sim}{S}}_{-i}^{(0)}{(u,{\beta}_{0})}^{-1}\}{dL}_{i}(u)={o}_{p}(1),$$

we have proved the equation $\sqrt{n}{\int}_{0}^{\tau}g(t)\{{\stackrel{\sim}{S}}^{(0)}{(t,{\beta}_{0})}^{-1}-{s}^{(0)}{(t,{\beta}_{0})}^{-1}m{(t)}^{-1}\}\{{\widehat{S}}^{(k)}(t,{\beta}_{0})-{s}^{(k)}(t,{\beta}_{0})m(t)\}dt={o}_{p}(1)$. Since $\sqrt{n}{sup}_{t\in [0,\tau ]}\mid {\stackrel{\sim}{S}}_{h}^{(0)}(t,{\beta}_{0})-{\widehat{S}}_{h}^{(0)}(t,{\beta}_{0})\mid ={o}_{p}(1)$, we have

$$\sqrt{n}{\int}_{0}^{\tau}g(t)\{{\widehat{S}}^{(0)}{(t,{\beta}_{0})}^{-1}-{s}^{(0)}{(t,{\beta}_{0})}^{-1}m{(t)}^{-1}\}\{{\widehat{S}}^{(k)}(t,{\beta}_{0})-{s}^{(k)}(t,{\beta}_{0})m(t)\}dt={o}_{p}(1).$$

Hence (ii) is proved.

1. Andersen PK, Borgan O, Gill RD, Keiding N. Statistical Models Based on Counting Processes. Springer-Verlag; New York: 1993.

2. Cook RJ, Lawless JF. The Statistical Analysis of Recurrent Events. New York: Springer; 2007.

3. Lin DY, Wei LJ, Yang I, Ying Z. Semiparametric regression for the mean and rate functions of
recurrent events. Journal of the Royal Statistical Society, Ser B. 2000;62:711–730.

4. Liu L, Wolfe RA, Huang X. Shared frailty models for recurrent events and a terminal
event. Biometrics. 2004;60(3):747–756. [PubMed]

5. Schaubel DE, Cai J. Analysis of clustered recurrent event data with application to
hospitalization rates among renal failure patients. Biostatistics. 2005;6(3):404–419. [PubMed]

6. Farrington CP, Whitaker HJ. Semiparametric analysis of case series data. Journal of the Royal Statistical Society, Series C. 2006;55:553–594.

7. Huang CY, Qin J, Wang MC. Semiparametric analysis for recurrent event data with
time-dependent covariates and informative censoring. Biometrics. 2010;66:39–49. [PMC free article] [PubMed]

8. Andersen PK, Liestol K. Attenuation caused by infrequently updated covariates in survival
analysis. Biostatistics. 2003;4(4):633–649. [PubMed]

9. Prentice RL. Covariate measurement errors and parameter estimates in a failure
time regression time. Biometrika. 1982;69:331–342.

10. Faucett CL, Schenker N, Elashoff RM. Analysis of censored survival data with intermittently observed
time-dependent binary covariates. Journal of the American Statistical Association. 1998;93:427–437.

11. Tsiatis AA, Davidian M. Joint modelling of longitudinal and time-to-event data: An
overview. Statistica Sinica. 2004;14:809–834.

12. Lawless JF, Nadeau C. Some simple robust methods for the analysis of recurrent
events. Technometrics. 1995;37(2):158–168.

13. Lawless JF, Nadeau C, Cook RJ. Proceedings of the First Seattle Symposium in Biostatistics: Survival
Analysis. New York: Springer; 1997. Analysis of mean and rate functions for recurrent
events; pp. 37–49.

14. Prentice RL, Williams BJ, Peterson AV. On the regression analysis of multivariate failure time
data. Biometrika. 1981;68:373–379.

15. Andersen PK, Gill RD. Cox’s regression model for counting processes: A large
sample study. Annals of Statistics. 1982;10:1100–1120.

16. Huang CY, Wang MC. Joint modeling and estimation for recurrent event processes and
failure time data. Journal of the American Statistical Association. 2004;99:1153–1165. [PMC free article] [PubMed]

17. Nadaraya EA. On estimating regression. Theory of Probability and its Applications. 1964;9:141–142.

18. Waston GS. Smooth regression analysis. The Indian Journal of Statistics, Series A. 1964;26:359–372.

19. Silverman BW. Density Estimation for Statistics and Data Analysis. Chapman and Hall; 1986.

20. Tian L, Zucker D, Wei L. On the Cox model with time-varying regression
coefficients. Journal of the American statistical Association. 2005;100(469):172–183.

21. Shaikh N, Leonard E, Martin JM. Prevalence of streptococcal pharyngitis and streptococcal
carriage in children: A meta-analysis. Pediatrics. 2010;126:557–564. [PubMed]

22. Sun Y. Estimation of semiparametric regression model with longitudinal
data. Lifetime Data Analysis. 2010;16(2):271–298. [PMC free article] [PubMed]

23. Liu L, Huang X. Joint analysis of correlated repeated measures and recurrent
events processes in the presence of death, with application to a study on
acquired immune deficiency syndrome. Journal of the Royal Statistical Society, Series C. 2009;58(1):65–81.

24. Raboud J, Reid N, Coates R, Farewell V. Estimating risks of progressing to aids when covariates are
measured with error. Journal of the Royal Statistical Society. Series A. 1993:393–406.

25. Whitaker HJ, Hocine MN, Farrington CP. The methodology of self-controlled case series
studies. Statistical methods in medical research. 2009;18:7–26. [PubMed]

26. Tom BD, Farewell VT. Intermittent observation of time-dependent explanatory variables:
a multistate modelling approach. Statistics in Medicine. 2011;30(30):3520–3531. [PubMed]

27. Van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer; 1996.

28. Nan B, Wellner JA. A general semiparametric Z-estimation approach for case-cohort
studies. Statistica Sinica. 2013;23(3):1155–1180. [PMC free article] [PubMed]

29. Kosorok MR. Introduction to Empirical Processes and Semiparametric
Inference. New York: Springer; 2007.

30. Mammen E, Nielsen JP. A general approach to the predictability issue in survival
analysis with applications. Biometrika. 2007;94(4):873–892.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |