Ann Inst Stat Math. Author manuscript; available in PMC 2013 April 1.

Published in final edited form as:

Ann Inst Stat Math. 2012 April 1; 64(2): 415–438.

doi: 10.1007/s10463-010-0317-2; PMCID: PMC3259712

NIHMSID: NIHMS250291

Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data.

A common feature of survival data is the presence of right censored observations. Censoring can occur, for example, if individuals withdraw from a study before dying or if a study ends before all subjects have died. Additionally, when multiple causes of death are operating, the time to death from one cause can be censored by a death from a different cause. For instance, in a clinical trial one might distinguish between deaths attributable to the disease of interest and deaths due to all other causes. Without loss of generality, we focus on a particular cause of death and treat all other causes as censoring mechanisms with respect to the death time of interest.

Let *T* and *C*^{(0)} be random variables representing the time to death from the cause of interest and the time to usual (right) censoring, respectively. Let *T*^{(1)}, *T*^{(2)}, ···, *T*^{(r)} be the times to death from all other causes. In our problem, *T* may be censored by *C*^{(0)}, *T*^{(1)}, ···, *T*^{(r−1)} or *T*^{(r)}. Let *C* = min(*C*^{(0)}, *T*^{(1)}, ···, *T*^{(r)}), where *C* denotes the censoring random variable. We assume that *T* and *C* are independent and we observe *X* = min(*T*, *C*) and *δ* = *I*(*T* ≤ *C*), where *I*(·) is the indicator function. Let *F*, *G*, and *L* be the cumulative distribution functions for *T*, *C*, and *X*, respectively. Finally, let $\lambda (t)={\lim }_{\epsilon \to {0}^{+}}{\epsilon }^{-1}P(t\le T<t+\epsilon \mid T\ge t)$ be the hazard function of *T*.

Censored survival time problems frequently are characterized in terms of hazard functions, and thus the estimation of *λ*(*t*) has received much attention. Suppose the data consist of *n* independent and identically distributed pairs {(*X*_{i}, *δ*_{i}), *i* = 1, ···, *n*}. When every censoring indicator is observed, a basic kernel estimator of *λ*(*t*) is

$${\lambda}_{n}(t)=\frac{1}{{h}_{n}}\sum _{i=1}^{n}K\left(\frac{t-{X}_{i}}{{h}_{n}}\right)\frac{{\delta}_{i}}{n-{R}_{i}+1},$$

(1)

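where *R*_{i} is the rank of *X*_{i} among *X*_{1}, ···, *X*_{n}, *K*(·) is a kernel function, and *h*_{n} is a bandwidth sequence.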
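To make the estimator in (1) concrete, the following is a minimal Python sketch. The function names `biweight` and `kernel_hazard` are illustrative choices rather than names from the paper, the biweight kernel is simply one common choice of *K*, and the sketch assumes there are no ties among the observation times.

```python
import numpy as np

def biweight(u):
    """Biweight kernel: (15/16)(1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def kernel_hazard(t, x, delta, h, kernel=biweight):
    """Evaluate the kernel hazard estimator (1) at the points t.

    x     : observed times X_i (assumed distinct)
    delta : censoring indicators (1 = death from the cause of interest)
    h     : bandwidth h_n
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x)) + 1        # R_i, the rank of X_i (1..n)
    jumps = delta / (n - ranks + 1)              # delta_i / (n - R_i + 1)
    k = kernel((t[:, None] - x[None, :]) / h)    # K((t - X_i) / h_n)
    return (k @ jumps) / h
```

Each observed death contributes a jump of size 1/(number still at risk), and the kernel spreads that jump over a window of half-width *h*_{n} around *X*_{i}.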

This paper addresses the problem in which cause of death is unknown for a subset of individuals, and thus some of the censoring indicators are missing. For example, van der Laan and McKeague (1998) describe epidemiological studies in which death certificates were missing for some people, mainly due to emigration or inconclusive hospital case notes and autopsy results. They point out that it can be impossible to determine whether death was due to the cause of interest in these cases. Missing causes of death also arise in carcinogenicity experiments. In some studies only a subset of animals are examined for tumors to cut costs; occasionally tissues autolyze or are cannibalized by cage mates before a necropsy can be performed; and pathologists are not always able to determine each tumor’s role in causing death. For example, Kalbfleisch and Prentice (1980) provide data on mice who died from leukemia, other known (non-leukemia) causes, or unknown causes; and Dinse (1986) presents data on mice whose status of nonrenal vascular disease at death was classified as absent, incidental, fatal, or unknown. This last data set is analyzed in Section 6.

The general problem of analyzing censored survival data with missing cause-of-death data (or missing censoring indicators) has received much attention. Dinse (1982) derived the nonparametric maximum likelihood estimator of the survival function in this situation; see, also, the estimators of Dinse (1986), Lo (1991), McKeague and Subramanian (1998), van der Laan and McKeague (1998), and Subramanian (2004, 2006). Other authors have considered hypothesis testing and regression modeling. Goetghebeur and Ryan (1990) derived a modified log rank test to compare survival rates in two groups; Dewanji (1992) suggested an improvement to that approach; and Goetghebeur and Ryan (1995) extended their earlier results to the proportional hazards regression model. Tsiatis *et al.* (2002) used multiple imputation methods to evaluate treatment differences in survival. Recently, Gao and Tsiatis (2005) developed a semiparametric procedure to estimate regression coefficients in a linear transformation competing risks model. Klein and Moeschberger (2003, Chapter 6) point out the importance of kernel estimation for hazard functions in the presence of censored data. In this paper, we concentrate on nonparametrically estimating the hazard function, *λ*(*t*), by extending well-known kernel smoothing methods to allow for missing data.

Suppose that *X* is always observed, but the censoring indicator *δ* is missing for some subjects. Define a missingness indicator *ξ* which is 1 if *δ* is observed and is 0 otherwise. Therefore, we observe either {*X*, *δ*, *ξ* = 1} or {*X*, *ξ* = 0}. Throughout this paper, we assume that *δ* is missing at random (MAR), which implies that *ξ* and *δ* are conditionally independent given *X: P*(*ξ* = 1|*X*, *δ*) = *P*(*ξ* = 1|*X*). The MAR assumption is common in statistical analyses involving missing data and is reasonable in many practical situations; see, for example, Little and Rubin (1987, Chapter 1).

When some censoring indicators are missing, the hazard estimator in (1) cannot be applied directly. One simple solution is to use only the complete cases, {*X*, *δ*, *ξ* = 1}, and to ignore all subjects with missing indicators, {*X*, *ξ* = 0}. However, the resulting complete case (CC) estimator is highly inefficient if there is a significant degree of missingness; see, e.g., van der Laan and McKeague (1998). Also, the CC estimator is consistent and unbiased only when the censoring indicators are missing completely at random (MCAR), which is a special case of MAR where *ξ* is independent of both *X* and *δ: P*(*ξ* = 1|*X*, *δ*) = *P*(*ξ* = 1); see, e.g., Tsiatis *et al.* (2002).

Imputation has become a popular method for handling missing data; see, for example, Rubin (1987), Lipsitz *et al.* (1998), Robins and Wang (2000), and Wang and Rao (2002). The popularity of this approach stems largely from the fact that once the missing values are imputed, standard techniques for analyzing complete data can be readily applied. The inverse probability weighted procedure is also widely used in missing data situations; see, for example, Robins and Rotnitzky (1992), Robins *et al.* (1994), and Zhao *et al.* (1996). These two approaches are usually applied to regression problems with missing responses or covariates, but here we adapt them to handle missing censoring indicators.

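This paper develops three kernel estimators for the hazard function: a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. The regression surrogate estimator is based on a particular expression for *λ*(*t*), and the imputation estimator and inverse probability weighted estimator are motivated by the regression surrogate estimator. All three estimators are of the form given in (1), except that some or all of the *δ*_{i} values are replaced by other quantities. The regression surrogate estimator replaces every *δ*_{i} with a kernel regression estimate of its conditional expectation given *X*_{i}; the imputation estimator makes this replacement only for the missing *δ*_{i}; and the inverse probability weighted estimator combines each observed *δ*_{i}, weighted by the inverse of its estimated probability of being observed, with the same regression estimate.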

The paper is organized as follows. Section 2 defines three nonparametric hazard estimators. Section 3 shows that these estimators are uniformly strongly consistent and asymptotically normal under the MAR assumption, and derives asymptotic representations for the mean squared error (MSE) and mean integrated squared error (MISE). Section 4 gives a data-driven bandwidth selection procedure. Section 5 reports simulation results for evaluating finite sample performance; Section 6 illustrates our methods by applying them to data from an animal experiment; and Section 7 provides a few concluding remarks. Finally, the main results are proved in the Appendices.

The hazard function of interest, *λ*(*t*), can be expressed as

$$\lambda (t)=\frac{f(t)}{1-F(t)}=\frac{[1-G(t)]f(t)}{1-L(t)}=\frac{1}{1-L(t)}\frac{{dL}_{1}(t)}{dt},$$

(2)

where *f*(*t*) = *dF*(*t*)/*dt* and *L*_{1}(*t*) = *P* (*X* ≤ *t*, *δ* = 1). As noted by Dikta (1998), we can write
${L}_{1}(t)={\int}_{0}^{t}m(s)dL(s)$, where *m*(*s*) = *P* (*δ* = 1|*X* = *s*) is the conditional expectation of the censoring indicator given the observation time, and thus (2) yields

$$\lambda (t)=\frac{m(t)}{1-L(t)}\frac{dL(t)}{dt}.$$

(3)
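The middle equalities in (2) rest on the independence of *T* and *C*. As a brief sketch, assuming in addition that *G* is continuous so that *P*(*C* ≥ *s*) = 1 − *G*(*s*) (a simplifying assumption made here only for illustration), and writing *l*(*t*) = *dL*(*t*)/*dt*:

$$\begin{aligned}1-L(t)&=P(T>t,\,C>t)=[1-F(t)][1-G(t)],\\ L_{1}(t)&=P(T\le t,\,T\le C)=\int_{0}^{t}[1-G(s)]f(s)\,ds,\\ \frac{dL_{1}(t)}{dt}&=[1-G(t)]f(t)=m(t)\,l(t).\end{aligned}$$

Dividing the last line by 1 − *L*(*t*) cancels the factor 1 − *G*(*t*), which gives *λ*(*t*) = *f*(*t*)/[1 − *F*(*t*)] and hence both (2) and (3).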

Given (3), we use kernel smoothing to define a regression surrogate estimator of *λ*(*t*):

$${\widehat{\lambda}}_{n,S}(t)=\frac{1}{{h}_{n}}\int K\left(\frac{t-s}{{h}_{n}}\right)\frac{{m}_{n}(s){dL}_{n}(s)}{1-{L}_{n}(s-)},$$

(4)

where
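${m}_{n}(s)={\displaystyle \sum _{i=1}^{n}}{\xi}_{i}{\delta}_{i}W\left({\scriptstyle \frac{s-{X}_{i}}{{b}_{n}}}\right)/{\displaystyle \sum _{i=1}^{n}}{\xi}_{i}W\left({\scriptstyle \frac{s-{X}_{i}}{{b}_{n}}}\right),{L}_{n}(s)={n}^{-1}{\displaystyle \sum _{i=1}^{n}}I({X}_{i}\le s)$, and *s*− is the time just before *s*. Here ${m}_{n}(s)$ is the Nadaraya-Watson kernel regression estimator of *m*(*s*), *W*(·) is a kernel function, *b*_{n} is a bandwidth sequence, and *L*_{n}(*s*) is the empirical distribution function of *X*.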

As *n*[1 − *L*_{n}(*X*_{i}−)] = *n* − *R*_{i} + 1, the estimator in (4) can be written as

$${\widehat{\lambda}}_{n,S}(t)=\frac{1}{{h}_{n}}\sum _{i=1}^{n}K\left(\frac{t-{X}_{i}}{{h}_{n}}\right)\frac{{m}_{n}({X}_{i})}{n-{R}_{i}+1}.$$

(5)

If no censoring indicators are missing, the regression surrogate estimator in (5) reduces to a pre-smoothed Nelson-Aalen type estimator (Cao *et al.*, 2005; Cao and Jacome, 2004; Jacome *et al.*, 2008). Similarly, the basic kernel estimator in (1), which is appropriate when none of the censoring indicators are missing, coincides with the regression surrogate estimator in (5) if every *δ*_{i} is replaced by ${m}_{n}({X}_{i})$. Our imputation estimator of *λ*(*t*) makes this replacement only for subjects whose censoring indicators are missing:

$${\widehat{\lambda}}_{n,I}(t)=\frac{1}{{h}_{n}}\sum _{i=1}^{n}K\left(\frac{t-{X}_{i}}{{h}_{n}}\right)\frac{{\xi}_{i}{\delta}_{i}+(1-{\xi}_{i}){m}_{n}({X}_{i})}{n-{R}_{i}+1}.$$

(6)

Finally, define *π*(*x*) = *P*(*ξ* = 1|*X* = *x*), plus its Nadaraya-Watson kernel regression estimator
${\pi}_{n}(x)={\displaystyle \sum _{i=1}^{n}}{\xi}_{i}\mathrm{\Omega}\left({\scriptstyle \frac{x-{X}_{i}}{{\gamma}_{n}}}\right)/{\displaystyle \sum _{i=1}^{n}}\mathrm{\Omega}\left({\scriptstyle \frac{x-{X}_{i}}{{\gamma}_{n}}}\right)$, where Ω(·) is a kernel function and *γ*_{n} is a bandwidth sequence. Our inverse probability weighted estimator of *λ*(*t*) is

$${\widehat{\lambda}}_{n,W}(t)=\frac{1}{{h}_{n}}\sum _{i=1}^{n}K\left(\frac{t-{X}_{i}}{{h}_{n}}\right)\frac{[{\xi}_{i}/{\pi}_{n}({X}_{i})]{\delta}_{i}+[1-{\xi}_{i}/{\pi}_{n}({X}_{i})]{m}_{n}({X}_{i})}{n-{R}_{i}+1}.$$

(7)

In this case, *δ*_{i} in (1) is replaced by $[{\xi}_{i}/{\pi}_{n}({X}_{i})]{\delta}_{i}+[1-{\xi}_{i}/{\pi}_{n}({X}_{i})]{m}_{n}({X}_{i})$.
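The three estimators in (5), (6), and (7) differ only in the quantity substituted for *δ*_{i}, which the following Python sketch makes explicit. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, uniform kernels are used for *W* and Ω and a biweight kernel for *K*, ties among the *X*_{i} are not handled, and a 0/0 ratio in a kernel regression is arbitrarily set to 0.

```python
import numpy as np

def uniform_kernel(u):
    return np.where(np.abs(u) <= 1.0, 0.5, 0.0)

def biweight(u):
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def kernel_ratio(x_eval, x, num_w, den_w, bw, kernel=uniform_kernel):
    """Nadaraya-Watson type ratio sum(num_w * K) / sum(den_w * K).

    Assumption: where the denominator is zero the ratio is set to 0.
    """
    k = kernel((x_eval[:, None] - x[None, :]) / bw)
    num, den = k @ num_w, k @ den_w
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def three_hazard_estimators(t, x, delta, xi, h, b, gamma):
    """Return the estimators (5), (6), (7) on the grid t as a dict S/I/W.

    delta may hold any placeholder (e.g. NaN) wherever xi == 0.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    x = np.asarray(x, dtype=float)
    xi = np.asarray(xi, dtype=float)
    d = np.where(xi == 1, np.asarray(delta, dtype=float), 0.0)  # xi_i * delta_i
    n = x.size
    m_n = kernel_ratio(x, x, xi * d, xi, b)               # m_n(X_i)
    pi_n = kernel_ratio(x, x, xi, np.ones(n), gamma)      # pi_n(X_i)
    # xi_i / pi_n(X_i); zero for incomplete cases, so pi_n = 0 is harmless
    ipw = np.where(xi == 1, 1.0 / np.where(pi_n > 0, pi_n, 1.0), 0.0)
    substitutes = {
        "S": m_n,                           # regression surrogate, eq. (5)
        "I": d + (1.0 - xi) * m_n,          # imputation, eq. (6)
        "W": ipw * d + (1.0 - ipw) * m_n,   # inverse probability weighted, eq. (7)
    }
    ranks = np.argsort(np.argsort(x)) + 1                 # R_i
    k = biweight((t[:, None] - x[None, :]) / h) / h
    return {name: k @ (s / (n - ranks + 1)) for name, s in substitutes.items()}
```

When no indicators are missing, the substitutes for the imputation and weighted estimators both reduce to *δ*_{i}, so those two estimators coincide with the basic estimator in (1). The weighted substitute can be negative for an individual subject, so the weighted estimate is not guaranteed to be nonnegative in small samples.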

Just like other nonparametric kernel methods, our methods are robust to the choice of kernel functions. Some common kernel functions include the uniform, biweight, and Epanechnikov kernel functions. Bandwidth selection for *h*_{n} and robustness of our methods with respect to *b*_{n} and *γ*_{n} are discussed in Sections 4 and 5.

All three of our hazard estimators use a nonparametric estimator for *m*(*x*). One natural alternative is a semiparametric approach that assumes a specific parametric model, say *m*(*x*|*θ*), where *θ* is a finite-dimensional parameter and *m*(·|·) is a known function. Another alternative for handling missing data is the use of multiple imputation. Conditional on any observation time *X _{i}* for which the censoring indicator is missing (

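This section discusses asymptotic properties of the estimators proposed in Section 2. Let ${\widehat{\lambda}}_{n}(t)$ denote any one of the proposed estimators ${\widehat{\lambda}}_{n,S}(t)$, ${\widehat{\lambda}}_{n,I}(t)$, or ${\widehat{\lambda}}_{n,W}(t)$. The following theorem, which is proved in Appendix A, establishes uniform strong consistency.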

Under the assumptions given in Appendix A, we have

$$\underset{0\le t\le \tau}{sup}\left|{\widehat{\lambda}}_{n}(t)-\lambda (t)\right|\stackrel{a.s.}{\to}0,$$

where 0 < τ < τ_{L} and τ_{L} = inf{t: L(t) = 1}.

The next theorem, which is proved in Appendix B, establishes the asymptotic normality of ${\widehat{\lambda}}_{n}(t)$.

Under the assumptions given in Appendix B, we have

$$\sqrt{{nh}_{n}}\left({\widehat{\lambda}}_{n}(t)-\lambda (t)-{(-1)}^{k}\frac{{\lambda}^{(k)}(t){h}_{n}^{k}}{k!}\int {u}^{k}K(u)du\right)\stackrel{\mathcal{L}}{\to}N\left(0,{\sigma}^{2}(t)\right)$$

for any fixed 0< t < τ_{L}, where λ^{(k)}(t) is the kth derivative of λ(t).

The asymptotic variance, *σ*^{2}(*t*), in the above theorem is

$${\sigma}^{2}(t)=\frac{\lambda (t)}{1-L(t)}\int {K}^{2}(u)du+\frac{[1-\pi (t)]m(t)[1-m(t)]l(t)}{\pi (t){[1-L(t)]}^{2}}\int {K}^{2}(u)du,$$

(8)

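where *l*(*t*) = *dL*(*t*)/*dt*. A consistent estimator of *σ*^{2}(*t*), say
${\widehat{\sigma}}_{n}^{2}(t)$, can be obtained by replacing *λ*(*t*), *L*(*t*), *π*(*t*), *m*(*t*) and *l*(*t*) in (8) by consistent estimators, such as ${\widehat{\lambda}}_{n}(t)$, ${L}_{n}(t)$, ${\pi}_{n}(t)$, ${m}_{n}(t)$ and a kernel density estimator of *l*(*t*), respectively.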
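The variance formula in (8) separates into a complete-data term and a penalty for missing indicators that vanishes when *π*(*t*) = 1. The Python sketch below evaluates (8) from pointwise plug-in values and builds a normal-approximation interval from Theorem 2. The function names are hypothetical, and the interval ignores the bias term of order *h*_{n}^{k}, which is an assumption of this sketch (reasonable only under undersmoothing) rather than a recommendation from the paper.

```python
import math

def sigma2(lam, L, pi, m, l, int_k2):
    """Asymptotic variance (8) from pointwise values of its ingredients.

    lam, L, pi, m, l : values (or estimates) of lambda(t), L(t), pi(t), m(t), l(t)
    int_k2           : integral of K(u)^2 du (5/7 for the biweight kernel)
    """
    complete_data = lam / (1.0 - L)
    missing_penalty = (1.0 - pi) * m * (1.0 - m) * l / (pi * (1.0 - L) ** 2)
    return (complete_data + missing_penalty) * int_k2

def wald_interval(lam_hat, s2_hat, n, h, z=1.959964):
    """Approximate 95% interval lam_hat +/- z * sqrt(s2_hat / (n h)), bias ignored."""
    half_width = z * math.sqrt(s2_hat / (n * h))
    return lam_hat - half_width, lam_hat + half_width
```

Because the penalty term is nonnegative, missing indicators can only inflate the first-order variance relative to the fully observed case.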

The previous theorem implies that the asymptotically optimal bandwidth for a fixed value of *t*, which minimizes the asymptotic mean squared error, is

$${h}_{\mathit{opt},t}={\left(\frac{{\sigma}^{2}(t)}{2k{[{\lambda}^{(k)}(t)/k!]}^{2}{\left[\int {u}^{k}K(u)du\right]}^{2}}\right)}^{{\scriptstyle \frac{1}{2k+1}}}{n}^{-{\scriptstyle \frac{1}{2k+1}}}.$$

This result also can be obtained by applying part (i) of the following theorem, which establishes the asymptotic MSE and MISE representations, as derived in Appendix C.

Under the assumptions given in Appendices B and C:

- (i) For any fixed 0 < t < τ_{L}, we have $$E{[{\widehat{\lambda}}_{n}(t)-\lambda (t)]}^{2}={h}_{n}^{2k}{\left(\frac{{\lambda}^{(k)}(t)}{k!}\int {u}^{k}K(u)du\right)}^{2}+\frac{{\sigma}^{2}(t)}{{nh}_{n}}+o\left({({nh}_{n})}^{-1}\right)+o({h}_{n}^{2k}).$$
- (ii) If ∫ λ(t)w(t)/[1 − L(t)] dt < ∞ and ∫ l(t)w(t)/[1 − L(t)]^{2} dt < ∞, we have $$E\int {[{\widehat{\lambda}}_{n}(t)-\lambda (t)]}^{2}w(t)dt={h}_{n}^{2k}\left(\int {[{\lambda}^{(k)}(t)/k!]}^{2}w(t)dt\right){\left(\int {u}^{k}K(u)\phantom{\rule{0.16667em}{0ex}}du\right)}^{2}+{({nh}_{n})}^{-1}\int {\sigma}^{2}(t)w(t)dt+o\left({({nh}_{n})}^{-1}\right)+o({h}_{n}^{2k}),$$ where w(t) is a weight function used to eliminate endpoint effects. A typical weight function is w(t) = 1 for t in some interval, say [τ_{L}, τ_{U}], and w(t) = 0 otherwise.

As a consequence of Theorem 3(ii), the asymptotically optimal bandwidth that minimizes the asymptotic mean integrated squared error is

$${h}_{\mathit{opt}}={\left(\frac{\int {\sigma}^{2}(t)w(t)dt}{2k\left[\int {({\lambda}^{(k)}(t)/k!)}^{2}w(t)dt\right]{\left[\int {u}^{k}K(u)du\right]}^{2}}\right)}^{{\scriptstyle \frac{1}{2k+1}}}{n}^{-{\scriptstyle \frac{1}{2k+1}}}.$$

(9)
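Both optimal bandwidths come from the same one-variable minimization. As a sketch, write *B* for the squared-bias constant and *V* for the variance constant in Theorem 3 (shorthand introduced here for illustration, not the paper's notation) and drop the *o*(·) terms:

$$\begin{aligned}A(h)&=h^{2k}B+\frac{V}{nh},\\ \frac{dA(h)}{dh}&=2k\,h^{2k-1}B-\frac{V}{nh^{2}}=0\iff h^{2k+1}=\frac{V}{2kB\,n},\\ h^{*}&=\left(\frac{V}{2kB}\right)^{\frac{1}{2k+1}}n^{-\frac{1}{2k+1}}.\end{aligned}$$

For the pointwise criterion, $B={\left([{\lambda}^{(k)}(t)/k!]\int {u}^{k}K(u)du\right)}^{2}$ and $V={\sigma}^{2}(t)$; for the integrated criterion, $B=\int {[{\lambda}^{(k)}(t)/k!]}^{2}w(t)dt\,{\left(\int {u}^{k}K(u)du\right)}^{2}$ and $V=\int {\sigma}^{2}(t)w(t)dt$. Substituting $h^{*}$ back into $A(h)$ shows that the minimal error is of order $n^{-2k/(2k+1)}$, for example $n^{-4/5}$ when *k* = 2.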

Obviously, *h*_{opt} depends on the unknown functions *σ*^{2}(*t*) and *λ*^{(k)}(*t*), so it cannot be computed directly from the data.

Theorem 3 shows that the choice of bandwidths b_{n} and γ_{n} does not affect the first-order term of the mean squared error, though it might affect higher order terms. Consequently, the selection of b_{n} and γ_{n} is not critical to the estimator ${\widehat{\lambda}}_{n}(t)$, a result which is also verified in our simulation study. Thus, in the next section, we consider the selection of h_{n} only.

Cross-validation techniques based on least squares regression have been applied to density estimation for censored data by Marron and Padgett (1987). These techniques were extended to hazard estimation by Sarda and Vieu (1991) and Patil (1993a) for uncensored data, and by Patil (1993b) and González-Manteiga *et al.* (1996) for censored data. We further extend the least squares cross-validation approach to the case of hazard estimation for censored data with missing censoring indicators.

In our situation, the least squares cross-validated bandwidth is the minimizer of

$$CV({h}_{n})=\int {\widehat{\lambda}}_{n}^{2}(t)w(t)dt-\frac{2}{n}\sum _{i=1}^{n}\frac{{\widehat{\lambda}}_{n}^{(-i)}({X}_{i})}{1-{L}_{n}({X}_{i})}w({X}_{i}){Q}_{n}({X}_{i},{\delta}_{i},{\xi}_{i}),$$

(10)

where * _{n}*(

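$$\frac{{\text{ISE}}_{w}({h}_{\mathit{opt},n})}{{\inf }_{{h}_{n}\in {\mathcal{H}}_{n}}{\text{ISE}}_{w}({h}_{n})}$$

where ${\mathcal{H}}_{n}$ is a set of bandwidths satisfying certain regularity conditions and ISE_{w} is the integrated weighted squared error: ${\text{ISE}}_{w}({h}_{n})=\int {[{\widehat{\lambda}}_{n}(t)-\lambda (t)]}^{2}w(t)dt$.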

We conducted a simulation study to compare the finite sample properties of our estimators with those of the complete case estimator, say * _{CC}*(

The Weibull distribution is very flexible and is often used to analyze lifetime data. Thus, we generated the failure time *T* and censoring time *C* from a Weibull distribution with shape parameter *τ* and scale parameter *η*, denoted by *W*(*τ*, *η*). Given *T* and *C*, we defined *X* = min(*T*, *C*) and *δ* = *I*(*T* ≤ *C*) for each subject. We fixed (*τ*, *η*) = (3, 1) for *T*, and we specified (*τ*, *η*) = (2, 1.96) for *C* to obtain a 20% censoring rate, (*τ*, *η*) = (2, 1.25) for *C* to obtain a 40% censoring rate, and (*τ*, *η*) = (2, 0.74) for *C* to obtain a 70% censoring rate. We used the logistic model *π*(*x*) = [1 + exp(−*θ*_{1} − *θ*_{2}*x*)]^{−1} to classify some of the censoring indicators as missing. Given *X* = *x*, the missingness indicator *ξ* was set to 1 with probability *π*(*x*); otherwise *ξ* was set to 0 (and *δ* was treated as missing). We denoted *π*(*x*) by *π*_{1}(*x*) or *π*_{2}(*x*) when the corresponding average missingness rate was approximately 20% or 40%, respectively. When the censoring rate (CR) was 20%, we set (*θ*_{1}, *θ*_{2}) to (0.7, 0.87) for *π*_{1}(*x*) and (0.32, 0.1) for *π*_{2}(*x*). Similarly, for CR = 40%, we set (*θ*_{1}, *θ*_{2}) to (0.7, 0.98) for *π*_{1}(*x*) and (0.33, 0.1) for *π*_{2}(*x*); and for CR = 70%, we set (*θ*_{1}, *θ*_{2}) to (0.7, 1.28) for *π*_{1}(*x*) and (0.33, 0.13) for *π*_{2}(*x*). We generated 1000 samples of size *n* = 30, 60, and 120 for each choice of CR and *π* (*x*). We used the biweight kernel function
$K(u)={\scriptstyle \frac{15}{16}}{(1-{u}^{2})}^{2}$ if |*u*| ≤ 1 and *K*(*u*) = 0 otherwise, and the uniform kernel functions
$W(u)=\mathrm{\Omega}(u)={\scriptstyle \frac{1}{2}}$ if |*u*| ≤ 1 and *W*(*u*) = Ω(*u*) = 0 otherwise. Finally, we took *w*(*t*) = 1 if *t* ∈ [0, 2], and 0 otherwise.
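The data-generating design described above can be sketched in a few lines of Python. This is an illustration, not the authors' simulation code: `simulate_sample` is a hypothetical name, and the sketch assumes that *W*(*τ*, *η*) has survival function exp{−(*t*/*η*)^{τ}}, a parameterization under which the scale values quoted above give censoring rates close to the stated ones.

```python
import numpy as np

def simulate_sample(n, eta_c, theta, rng):
    """Draw one sample (X, delta, xi) with delta hidden (NaN) when xi = 0.

    eta_c : Weibull scale for the censoring time C (shape fixed at 2)
    theta : (theta_1, theta_2) of the logistic missingness model pi(x)
    """
    t = rng.weibull(3.0, size=n)                  # T ~ W(3, 1)
    c = eta_c * rng.weibull(2.0, size=n)          # C ~ W(2, eta_c)
    x = np.minimum(t, c)
    delta = (t <= c).astype(float)
    pi = 1.0 / (1.0 + np.exp(-theta[0] - theta[1] * x))
    xi = (rng.random(n) < pi).astype(float)       # xi = 1 with probability pi(x)
    delta_obs = np.where(xi == 1, delta, np.nan)  # delta unobserved when xi = 0
    return x, delta_obs, xi
```

For example, `simulate_sample(60, 1.96, (0.7, 0.87), np.random.default_rng(0))` draws one sample of size 60 under the 20% censoring design with missingness model *π*_{1}(*x*). Because *π*(*x*) depends on *X* only, the missingness is MAR by construction.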

First, we investigated how the bandwidth *h _{n}* obtained via least squares cross-validation varied with bandwidths

Figure 1a. CV-Optimal bandwidth *h*_{opt,n} against bandwidths *b*_{n} and *γ*_{n}. The *h*_{opt,n} surface is the average *h*_{opt,n} over 1000 replicate samples of size *n* = 60, with a censoring rate of *CR* = 20% and a missingness rate given by *π*_{2}(*x*). These results **...**

Alternatively, an optimal bandwidth *h*_{n} can be obtained by minimizing the mean integrated squared error; refer to the left part of the equation in Theorem 3(ii) and notice that the true hazard function is known to be *λ*(*t*) = 3*t*^{2} in our simulations.
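Since the true hazard is known in a simulation, the MISE on the left side of Theorem 3(ii) can be approximated by averaging the integrated squared error over replicate samples and then minimized over a grid of bandwidths. The Python sketch below illustrates this under stated assumptions: the names are hypothetical, the true hazard is that of a Weibull distribution with shape 3 and scale 1, the weight is *w*(*t*) = 1 on the range of `grid`, and `estimator(grid, sample, h)` stands for any user-supplied hazard estimator.

```python
import numpy as np

def true_hazard(t):
    """Hazard of a Weibull distribution with shape 3 and scale 1."""
    return 3.0 * np.asarray(t, dtype=float) ** 2

def ise(hazard_hat, grid, hazard=true_hazard):
    """Integrated squared error over grid, computed with the trapezoid rule."""
    err2 = (np.asarray(hazard_hat, dtype=float) - hazard(grid)) ** 2
    return float(np.sum(0.5 * (err2[1:] + err2[:-1]) * np.diff(grid)))

def mise_optimal_bandwidth(estimator, samples, bandwidths, grid):
    """Return (h, MISE) minimizing the Monte Carlo average of the ISE."""
    mise = [np.mean([ise(estimator(grid, s, h), grid) for s in samples])
            for h in bandwidths]
    best = int(np.argmin(mise))
    return bandwidths[best], mise[best]
```

A grid search of this kind is only available in simulations; with real data the true hazard is unknown and a data-driven criterion such as cross-validation is needed instead.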

Second, we studied the effects of *b _{n}* and

Mean integrated squared error as a function of bandwidths *b*_{n} and *γ*_{n}. The MISE surface is calculated from 1000 simulated values of ${\widehat{\lambda}}_{n,W}(t)$, with *n* = 60, a censoring rate of *CR* = 20%, a missingness rate given by *π*_{2}(*x*), **...**

Next, Table 1 gives the MISE values for all estimators and every combination of *n*, CR, and *π*(*x*). We used
$({b}_{n},{\gamma}_{n})=({n}^{-{\scriptstyle \frac{1}{3}}},{n}^{-{\scriptstyle \frac{1}{3}}})$, and we set *h _{n}* equal to the MISE-optimal bandwidth. In most cases, regardless of the choice of

Table 1. Mean integrated squared error (MISE) by sample size (n), censoring rate (CR), and missingness rate (π) for five hazard estimators.

Finally, plots of the true hazard rate and average curves associated with all five estimators are presented in Figure 3 for samples of size *n* = 60. In each of the four subplots, which correspond to the four combinations of *π*(*x*) and the lower two censoring rates, the dotted line shows the true hazard rate *λ*(*t*) = 3*t*^{2}. The other curves are time-specific averages of the 1000 hazard estimates corresponding to *λ _{n}*(

This section illustrates our methods by applying them to some data from an animal experiment. These data were previously analyzed by Dinse (1986), who reported the survival time and disease status at death for 58 female mice. At necropsy, each mouse was examined for nonrenal vascular disease (NRVD). Survival was measured in days and NRVD status at death was classified as absent, incidental, unknown, or fatal. An occurrence of NRVD was considered incidental if it was present but not responsible for death and fatal if the mouse died as a direct or indirect result of its disease. In some cases, NRVD was found to be present, but its role in causing death was unknown.

We applied our methods to estimate the hazard function for death due to NRVD, say *λ*(*t*), among the subset of mice with the disease. Time to death (*X*) was known for all mice. We used the same kernel functions and cross-validation bandwidth selection method as in Section 5. Also, we used *w*(*t*) = *I*(*X*_{(1)} < *t* < *X*_{(n)}), where *X*_{(1)} = min{*X*_{1}, *X*_{2}, ···, *X*_{n}} and *X*_{(n)} = max{*X*_{1}, *X*_{2}, ···, *X*_{n}}.

Theoretically, all three proposed estimators have the same asymptotic representation (see Lemma 1 in Appendix B) and hence they all have the same asymptotic normal distribution (see Theorem 2). This asymptotic equivalence is similar to that obtained by Cheng (1994) for the marginal average estimator and the imputation estimator in the nonparametric regression context, and to that obtained by Wang *et al.* (2004) for the marginal average estimator, the regression imputation estimator, and the marginal propensity score weighted estimator in the semiparametric regression context. Furthermore, it is shown that our estimators have the same asymptotic MSE and MISE representations (see Theorem 3). In small samples, however, simulation results show that _{n}_{,}* _{S}*(

On the other hand, the inverse probability weighted approach enjoys the so-called “double robustness” property (see Scharfstein *et al.*, 1999). That is, if *m*(*x*) and *π*(*x*) are specified by parametric models *m*(*x*|*θ*) and *π*(*x*|*β*), respectively, where *θ* and *β* are finite-dimensional parameters, the corresponding weighted estimator is consistent as long as one of the two models is specified correctly. This property implies that the weighted estimator is consistent if either *m*(*x*) or *π*(*x*) is estimated nonparametrically and the other is specified to be a known function, regardless of whether the specification is correct or not. However, the efficiency of the weighted estimator depends on the bias between the specified model and the true one; the larger the bias, the larger the loss of efficiency. The regression surrogate and imputation methods do not share this property.

This research was supported by National Science Fund for Distinguished Young Scholars, National Natural Science Foundation of China (10671198), National Science Fund for Creative Research Groups, a grant from the Research Grants Council of Hong Kong, China (HKU 7050/06P), a grant from Key Lab of Random Complex Structures and Data Science, CAS and the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01-ES-102685). The authors thank Shyamal Peddada and Norman Kaplan for their comments and suggestions.

We begin by making the following assumptions:

(A.*mπ*): *m*(·) and *π*(·) are uniformly continuous functions.

(A.*lλ*): *l*(·) and *λ*(·) are continuous functions.

(A.K): *K*(·) is a probability density kernel function with bounded support and bounded variation.

(A.*W*Ω): *W*(·) and Ω(·) are bounded kernel functions with bounded support.

(A.*h _{n}*):

(A.*b _{n}*):

(A.*γ _{n}*):

First we prove that Theorem 1 is true for ${\widehat{\lambda}}_{n,I}(t)$. Note that

$$\begin{array}{l}{\widehat{\lambda}}_{n,I}(t)-\lambda (t)=\left[{\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{{\xi}_{i}{\delta}_{i}+(1-{\xi}_{i})m({X}_{i})}{n-{R}_{i}+1}}-\lambda (t)\right]+\left[{\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right)\phantom{\rule{0.16667em}{0ex}}\left({\scriptstyle \frac{{\xi}_{i}{\delta}_{i}+(1-{\xi}_{i}){m}_{n}({X}_{i})}{n-{R}_{i}+1}}-{\scriptstyle \frac{{\xi}_{i}{\delta}_{i}+(1-{\xi}_{i})m({X}_{i})}{n-{R}_{i}+1}}\right)\right]\\ :={T}_{n1}(t)+{T}_{n2}(t).\end{array}$$

(11)

As *n*[1 − *L*_{n}(*X*_{i}−)] = *n* − *R*_{i} + 1, the first term in (11) can be written as

$$\begin{array}{l}{T}_{n1}(t)=\left[{\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{{\delta}_{i}}{n-{R}_{i}+1}}-\lambda (t)\right]+\left[{\scriptstyle \frac{1}{{nh}_{n}}}\sum _{i=1}^{n}{\scriptstyle \frac{({\xi}_{i}-1)[{\delta}_{i}-m({X}_{i})]}{1-{L}_{n}({X}_{i}-)}}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right)\right]\\ :={T}_{n11}(t)+{T}_{n12}(t).\end{array}$$

(12)

Based on the results of Wang (1999), by (A.*lλ*), (A.K) and (A.*h*_{n}) we have for the first term in (12):

$$\underset{0\le t\le \tau}{sup}\left|{T}_{n11}(t)\right|\stackrel{a.s.}{\to}0.$$

(13)

The second term in (12) can be rewritten as

$$\begin{array}{l}{T}_{n12}(t)=\left[{\scriptstyle \frac{1}{{nh}_{n}}}\sum _{i=1}^{n}{\scriptstyle \frac{({\xi}_{i}-1)[{\delta}_{i}-m({X}_{i})]}{1-L({X}_{i})}}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right)\right]+\left[{\scriptstyle \frac{1}{{nh}_{n}}}\sum _{i=1}^{n}{\scriptstyle \frac{({\xi}_{i}-1)[{\delta}_{i}-m({X}_{i})][{L}_{n}({X}_{i}-)-L({X}_{i})]}{[1-{L}_{n}({X}_{i}-)][1-L({X}_{i})]}}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right)\right]\\ :={T}_{n12}^{[1]}(t)+{T}_{n12}^{[2]}(t).\end{array}$$

(14)

Define the following shorthand notation:
${L}_{n1}(t)={\scriptstyle \frac{1}{n}}{\displaystyle \sum _{i=1}^{n}}I[{X}_{i}\le t,{\delta}_{i}=1]$, ${\stackrel{~}{L}}_{n1}(t)={\scriptstyle \frac{1}{n}}{\displaystyle \sum _{i=1}^{n}}I[{X}_{i}\le t,{\xi}_{i}=1]$, ${L}_{n11}(t)={\scriptstyle \frac{1}{n}}{\displaystyle \sum _{i=1}^{n}}I[{X}_{i}\le t,{\delta}_{i}=1,{\xi}_{i}=1]$, ${\stackrel{~}{L}}_{1}(t)=P(X\le t,\xi =1)$ and *L*_{11}(*t*) = *P*(*X* ≤ *t*, *δ* = 1, *ξ* = 1). The first term in (14) can be written

$${T}_{n12}^{[1]}(t)={\scriptstyle \frac{1}{{h}_{n}}}\int K\left({\scriptstyle \frac{t-s}{{h}_{n}}}\right)\left[{\scriptstyle \frac{{dL}_{n11}(s)-m(s){d\stackrel{~}{L}}_{n1}(s)-{dL}_{n1}(s)+m(s){dL}_{n}(s)}{1-L(s)}}\right].$$

As *dL*_{1}(*s*) = *m*(*s*)*dL*(*s*) by definition, and ${dL}_{11}(s)=m(s){d\stackrel{~}{L}}_{1}(s)$ under the MAR assumption, this same term also can be rewritten as

$${T}_{n12}^{[1]}(t)={\scriptstyle \frac{1}{{h}_{n}}}\int K\left({\scriptstyle \frac{t-s}{{h}_{n}}}\right)\left[{\scriptstyle \frac{d[{L}_{n11}(s)-{L}_{11}(s)]}{1-L(s)}}-{\scriptstyle \frac{m(s)d[{\stackrel{~}{L}}_{n1}(s)-{\stackrel{~}{L}}_{1}(s)]}{1-L(s)}}-{\scriptstyle \frac{d[{L}_{n1}(s)-{L}_{1}(s)]}{1-L(s)}}+{\scriptstyle \frac{m(s)d[{L}_{n}(s)-L(s)]}{1-L(s)}}\right].$$

(15)

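Denote by *L*^{*}(·) the distribution or sub-distribution function *L*(·), *L*_{1}(·), ${\stackrel{~}{L}}_{1}(\cdot )$, or *L*_{11}(·); and denote by ${L}_{n}^{*}$ the corresponding empirical function *L*_{n}(·), *L*_{n1}(·), ${\stackrel{~}{L}}_{n1}(\cdot )$, or *L*_{n11}(·). It can then be shown that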

$$\underset{0\le t\le \tau}{sup}\left|{T}_{n12}^{[1]}(t)\right|=O\left({h}_{n}^{-1}{n}^{-{\scriptstyle \frac{1}{2}}}\sqrt{\log \log n}\right),\phantom{\rule{0.38889em}{0ex}}a.s.$$

(16)

With respect to ${T}_{n12}^{[2]}(t)$, similar to (16) it can be shown that ${sup}_{0\le t\le \tau}\left|{T}_{n12}^{[2]}(t)\right|\stackrel{a.s.}{\to}0$. This together with (16) proves that ${sup}_{0\le t\le \tau}\left|{T}_{n12}(t)\right|\stackrel{a.s.}{\to}0$, and thus, together with (13), we have ${sup}_{0\le t\le \tau}\left|{T}_{n1}(t)\right|\stackrel{a.s.}{\to}0$. To prove Theorem 1, it remains to prove that ${sup}_{0\le t\le \tau}\left|{T}_{n2}(t)\right|\stackrel{a.s.}{\to}0$. First, we rewrite *T*_{n2}(*t*) as

$${T}_{n2}(t)=\frac{1}{{nh}_{n}}\sum _{i=1}^{n}K\left(\frac{t-{X}_{i}}{{h}_{n}}\right)\frac{(1-{\xi}_{i})[{m}_{n}({X}_{i})-m({X}_{i})]}{1-{L}_{n}({X}_{i}-)}.$$

Then, by (A.*lλ*), (A.*mπ*), (A.*W*Ω) and (A.*b*_{n}), we have

$$\underset{0\le t\le \tau}{sup}\left|{T}_{n2}(t)\right|\le \underset{0\le x\le \tau}{sup}\left|{m}_{n}(x)-m(x)\right|\underset{0\le t\le \tau}{sup}\left|{\scriptstyle \frac{1}{{nh}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{1}{1-{L}_{n}({X}_{i}-)}}\right|\stackrel{a.s.}{\to}0.$$

(17)

This proves Theorem 1 for ${\widehat{\lambda}}_{n,I}(t)$.

We assume that the following conditions are true:

(B.*mlπλ*): *m*(·), *l*(·), *π*(·), and *λ*(·) have bounded derivatives of order *k* > 1.

(B.K): *K*(·) is a continuous kernel function of order *k* > 1 with bounded support.

(B.WΩ): *W*(·) and Ω(·) are bounded kernel functions of order *k* > 1 with bounded support.

(B.*h _{n}*):

(B.*h _{n}b_{n}*):

(B.*h _{n}γ_{n}*):

(B.*h _{n}b_{n}γ_{n}*):

Conditions (B.h_{n}), (B.h_{n}b_{n}), (B.h_{n}γ_{n}), and (B.h_{n}b_{n}γ_{n}) are clearly satisfied for k = 2 and
${h}_{n}=O\left({n}^{-{\scriptstyle \frac{1}{5}}}\right),{b}_{n}=O\left({n}^{-{\scriptstyle \frac{1}{3}}}\right)$, and
${\gamma}_{n}=O\left({n}^{-{\scriptstyle \frac{1}{3}}}\right)$.

If ${\widehat{\lambda}}_{n}(t)$ denotes any one of the proposed estimators ${\widehat{\lambda}}_{n,S}(t)$, ${\widehat{\lambda}}_{n,I}(t)$, or ${\widehat{\lambda}}_{n,W}(t)$, then under the above assumptions, we have

$${\widehat{\lambda}}_{n}(t)-\lambda (t)={\scriptstyle \frac{{\stackrel{~}{f}}_{n}(t)-{E\stackrel{~}{f}}_{n}(t)}{1-L(t)}}+{\scriptstyle \frac{1}{{nh}_{n}}}\sum _{i=1}^{n}{\scriptstyle \frac{[{\xi}_{i}-\pi ({X}_{i})][{\delta}_{i}-m({X}_{i})]}{\pi ({X}_{i})[1-L({X}_{i})]}}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right)+{(-1)}^{k}{h}_{n}^{k}{\scriptstyle \frac{{\lambda}^{(k)}(t)\int {u}^{k}K(u)du}{k!}}+{o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right),$$

where ${\stackrel{~}{f}}_{n}(t)={\scriptstyle \frac{1}{{h}_{n}}}\int K\left({\scriptstyle \frac{t-s}{{h}_{n}}}\right){dL}_{n1}(s)$.

(a) First we prove that Lemma 1 is true for ${\widehat{\lambda}}_{n,S}(t)$. Note that

$$\begin{array}{l}{\widehat{\lambda}}_{n,S}(t)-\lambda (t)=\left[{\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{{\delta}_{i}}{n-{R}_{i}+1}}-\lambda (t)\right]\\ +{\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{m({X}_{i})-{\delta}_{i}}{n-{R}_{i}+1}}+{\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{{m}_{n}({X}_{i})-m({X}_{i})}{n-{R}_{i}+1}}\\ :={U}_{n1}(t)+{U}_{n2}(t)+{U}_{n3}(t).\end{array}$$

(18)

The first term in (18) can be rewritten as

$$\begin{array}{l}{U}_{n1}(t)=\left[{\scriptstyle \frac{1}{{h}_{n}}}\int K\left({\scriptstyle \frac{t-s}{{h}_{n}}}\right){\scriptstyle \frac{{d\widehat{F}}_{n}(s)}{1-{\widehat{F}}_{n}(s)}}-{\scriptstyle \frac{1}{{h}_{n}}}\int K\left({\scriptstyle \frac{t-s}{{h}_{n}}}\right){\scriptstyle \frac{dF(s)}{1-F(s)}}\right]+\left[{\scriptstyle \frac{1}{{h}_{n}}}\int K\left({\scriptstyle \frac{t-s}{{h}_{n}}}\right){\scriptstyle \frac{dF(s)}{1-F(s)}}-\lambda (t)\right]\\ :={U}_{n11}(t)+{U}_{n12}(t),\end{array}$$

(19)

where * _{n}*(

$${U}_{n11}(t)=\frac{{\stackrel{~}{f}}_{n}(t)-{E\stackrel{~}{f}}_{n}(t)}{1-L(t)}+{O}_{p}({({nh}_{n})}^{-1})+{O}_{p}\left({n}^{-{\scriptstyle \frac{1}{2}}}\right).$$

(20)

Under conditions (B.*mlπλ*) and (B.K), the second term in (19) can be written

$${U}_{n12}(t)=\frac{1}{{h}_{n}}\int K\left(\frac{t-s}{{h}_{n}}\right)\lambda (s)ds-\lambda (t)={(-1)}^{k}\frac{{h}_{n}^{k}{\lambda}^{(k)}(t)}{k!}\int {u}^{k}K(u)du+o({h}_{n}^{k}).$$

(21)
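As a numerical illustration of the expansion in (21), the following minimal sketch compares the exact smoothing bias with its leading term for a second-order kernel ($k=2$). The Epanechnikov kernel, the hazard $\lambda(s)=e^{s/2}$, and the function names are assumptions chosen for illustration only; they are not taken from the paper.

```python
import math
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, a second-order kernel (k = 2) supported on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def smoothing_bias(lam, t, h, grid_size=20001):
    """Numerically evaluate (1/h) * int K((t - s)/h) lam(s) ds - lam(t)."""
    u = np.linspace(-1.0, 1.0, grid_size)
    # Substituting s = t - h*u turns the integral into int K(u) lam(t - h*u) du.
    y = epanechnikov(u) * lam(t - h * u)
    integral = np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0  # trapezoidal rule
    return integral - lam(t)

def leading_bias(lam_k_at_t, h, k=2, kernel_moment=0.2):
    """Leading term (-1)^k h^k lam^(k)(t) int u^k K(u) du / k! from (21).

    kernel_moment defaults to int u^2 K(u) du = 1/5 for the Epanechnikov kernel.
    """
    return (-1) ** k * h**k * lam_k_at_t * kernel_moment / math.factorial(k)

# Hypothetical hazard lam(s) = exp(s / 2), so lam''(t) = exp(t / 2) / 4.
lam = lambda s: np.exp(0.5 * s)
t, h = 1.0, 0.1
exact = smoothing_bias(lam, t, h)
approx = leading_bias(0.25 * np.exp(0.5 * t), h)
print(exact, approx)
```

For a smooth hazard the two printed values should agree up to the $o(h_n^k)$ remainder, and the agreement should improve as `h` shrinks.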

As $n[1-L_{n}(X_{i}-)]=n-R_{i}+1$, the second term in (18) is

$${U}_{n2}(t)=\frac{1}{{nh}_{n}}\sum _{i=1}^{n}K\left(\frac{t-{X}_{i}}{{h}_{n}}\right)\frac{m({X}_{i})-{\delta}_{i}}{1-L({X}_{i})}+{o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right),$$

(22)
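Passing from the denominators $n-R_{i}+1$ in (18) to $n[1-L(X_{i})]$ in (22) rests on the identity $n-R_{i}+1=n[1-L_{n}(X_{i}-)]$. The sketch below checks that identity numerically, assuming that $R_{i}$ denotes the rank of $X_{i}$ among $X_{1},\dots,X_{n}$, that $L_{n}$ is the empirical distribution function of the $X_{i}$, and that there are no ties. These readings of the notation, and the function names, are assumptions.

```python
import numpy as np

def risk_set_sizes(x):
    """Return n - R_i + 1 for each i, where R_i is the rank of x[i] (no ties)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n
    return n - ranks + 1

def n_times_one_minus_left_limit_edf(x):
    """Return n * [1 - L_n(x_i-)], with L_n the empirical d.f. of x."""
    x = np.asarray(x, dtype=float)
    # L_n(x_i-) = #{j : x_j < x_i} / n, hence n[1 - L_n(x_i-)] = #{j : x_j >= x_i}.
    return (x[None, :] >= x[:, None]).sum(axis=1)

rng = np.random.default_rng(0)
sample = rng.exponential(size=50)  # continuous draws, so ties have probability zero
print(np.array_equal(risk_set_sizes(sample),
                     n_times_one_minus_left_limit_edf(sample)))
```

Both functions count the observations still at risk just before $X_{i}$, which is why the denominators can be replaced by $n[1-L(X_{i})]$ up to the stated remainder.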

and under conditions (B.*mlπλ*) and (B.WΩ), the third term in (18) is

$$\begin{array}{l}{U}_{n3}(t)={\scriptstyle \frac{1}{{nh}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{{\scriptstyle \frac{1}{{nb}_{n}}}{\displaystyle \sum _{j=1}^{n}}{\xi}_{j}[{\delta}_{j}-m({X}_{j})]W\left({\scriptstyle \frac{{X}_{i}-{X}_{j}}{{b}_{n}}}\right)}{[1-L({X}_{i})]\pi ({X}_{i})l({X}_{i})}}+{\scriptstyle \frac{1}{{nh}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{{\scriptstyle \frac{1}{{nb}_{n}}}{\displaystyle \sum _{j=1}^{n}}{\xi}_{j}[m({X}_{j})-m({X}_{i})]W\left({\scriptstyle \frac{{X}_{i}-{X}_{j}}{{b}_{n}}}\right)}{[1-L({X}_{i})]\pi ({X}_{i})l({X}_{i})}}+{o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right)\\ :={U}_{n31}(t)+{U}_{n32}(t)+{o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right).\end{array}$$

(23)

Under conditions (B.K), (B.WΩ), and (B.$h_{n}b_{n}$), the first term in (23) is

$${U}_{n31}(t)={\scriptstyle \frac{1}{{nh}_{n}}}\sum _{j=1}^{n}{\scriptstyle \frac{{\xi}_{j}[{\delta}_{j}-m({X}_{j})]}{\pi ({X}_{j})[1-L({X}_{j})]}}K\left({\scriptstyle \frac{t-{X}_{j}}{{h}_{n}}}\right)+{o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right).$$

(24)

Under conditions (B.*mlπλ*), (B.K), and (B.WΩ), and following steps similar to those in the proof of equation (17) in Wang and Rao (2002), we can show that

$${U}_{n32}(t)={o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right).$$

(25)

Taken together, equations (23), (24), and (25) prove

$${U}_{n3}(t)=\frac{1}{{nh}_{n}}\sum _{j=1}^{n}\frac{{\xi}_{j}[{\delta}_{j}-m({X}_{j})]}{\pi ({X}_{j})[1-L({X}_{j})]}K\left(\frac{t-{X}_{j}}{{h}_{n}}\right)+{o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right),$$

(26)

and equations (18)–(22) and (26) prove Lemma 1 for $\widehat{\lambda}_{n,S}(t)$.

(b) Next we prove that Lemma 1 is true for $\widehat{\lambda}_{n,I}(t)$. Note that

$${T}_{n11}(t)={U}_{n1}(t).$$

(27)

Based on equation (14), conditions (B.K) and (B.*mlπλ*), and the MAR assumption, it is straightforward to prove

$${T}_{n12}(t)=\frac{1}{{nh}_{n}}\sum _{i=1}^{n}\frac{({\xi}_{i}-1)[{\delta}_{i}-m({X}_{i})]}{1-L({X}_{i})}K\left(\frac{t-{X}_{i}}{{h}_{n}}\right)+{o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right).$$

(28)

Under equations (12), (19), (20), (21), (27) and (28), it follows that

$${T}_{n1}(t)={\scriptstyle \frac{{\stackrel{~}{f}}_{n}(t)-{E\stackrel{~}{f}}_{n}(t)}{1-L(t)}}+{\scriptstyle \frac{1}{{nh}_{n}}}\sum _{i=1}^{n}{\scriptstyle \frac{({\xi}_{i}-1)[{\delta}_{i}-m({X}_{i})]}{1-L({X}_{i})}}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right)+{(-1)}^{k}{\scriptstyle \frac{{h}_{n}^{k}{\lambda}^{(k)}(t)}{k!}}\int {u}^{k}K(u)du+{o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right).$$

(29)

Similar to equation (26), we can show that

$${T}_{n2}(t)=\frac{1}{{nh}_{n}}\sum _{i=1}^{n}\frac{{\xi}_{i}[1-\pi ({X}_{i})][{\delta}_{i}-m({X}_{i})]}{\pi ({X}_{i})[1-L({X}_{i})]}K\left(\frac{t-{X}_{i}}{{h}_{n}}\right)+{o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right).$$

(30)

Together, equations (11), (29), and (30) prove that Lemma 1 is true for $\widehat{\lambda}_{n,I}(t)$.

(c) Finally, we prove that Lemma 1 is true for $\widehat{\lambda}_{n,W}(t)$. Note that

$$\begin{array}{l}{\widehat{\lambda}}_{n,W}(t)-\lambda (t)=\left[{\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{{\xi}_{i}{\delta}_{i}/{\pi}_{n}({X}_{i})+[1-{\xi}_{i}/{\pi}_{n}({X}_{i})]{m}_{n}({X}_{i})}{n-{R}_{i}+1}}-{\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{{\xi}_{i}{\delta}_{i}/{\pi}_{n}({X}_{i})+[1-{\xi}_{i}/{\pi}_{n}({X}_{i})]m({X}_{i})}{n-{R}_{i}+1}}\right]\\ +\left[{\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{{\xi}_{i}{\delta}_{i}/{\pi}_{n}({X}_{i})+[1-{\xi}_{i}/{\pi}_{n}({X}_{i})]m({X}_{i})}{n-{R}_{i}+1}}-{\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{{\xi}_{i}{\delta}_{i}/\pi ({X}_{i})+[1-{\xi}_{i}/\pi ({X}_{i})]m({X}_{i})}{n-{R}_{i}+1}}\right]\\ +\left[{\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{{\xi}_{i}{\delta}_{i}/\pi ({X}_{i})+[1-{\xi}_{i}/\pi ({X}_{i})]m({X}_{i})}{n-{R}_{i}+1}}-\lambda (t)\right]\\ :={R}_{n1}(t)+{R}_{n2}(t)+{R}_{n3}(t).\end{array}$$

(31)

Under conditions (B.K), (B.*mlπλ*), (B.WΩ), (B.*h _{n}b_{n}*), and (B.

$$\begin{array}{l}{R}_{n1}(t)={\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{[1-{\xi}_{i}/{\pi}_{n}({X}_{i})][{m}_{n}({X}_{i})-m({X}_{i})]}{n-{R}_{i}+1}}\\ ={\scriptstyle \frac{1}{n}}\sum _{j=1}^{n}{\xi}_{j}[{\delta}_{j}-m({X}_{j})]{\scriptstyle \frac{1}{{h}_{n}{b}_{n}}}\int K\left({\scriptstyle \frac{t-s}{{h}_{n}}}\right){\scriptstyle \frac{{\scriptstyle \frac{1-\pi (s)}{\pi (s)}}W\left({\scriptstyle \frac{s-{X}_{j}}{{b}_{n}}}\right)}{[1-L(s)]\pi (s)}}ds+{o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right)\\ ={O}_{p}({n}^{-{\scriptstyle \frac{1}{2}}})+{o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right)={o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right).\end{array}$$

(32)

The second term in (31) can be rewritten as

$${R}_{n2}(t)=\frac{1}{{nh}_{n}}\sum _{i=1}^{n}K\left(\frac{t-{X}_{i}}{{h}_{n}}\right)\frac{{\xi}_{i}[{\delta}_{i}-m({X}_{i})][\pi ({X}_{i})-{\pi}_{n}({X}_{i})]}{[1-{L}_{n}({X}_{i}-)]\pi ({X}_{i}){\pi}_{n}({X}_{i})}.$$
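The algebra behind this rewriting can be spelled out as follows. The sketch uses only the second bracket of (31) and assumes the no-ties identity $n-R_{i}+1=n[1-L_{n}(X_{i}-)]$, with $L_{n}$ read as the empirical distribution function of the $X_{i}$. The difference of the two numerators for a single observation is

$$\begin{array}{l}{\scriptstyle \frac{{\xi}_{i}{\delta}_{i}}{{\pi}_{n}({X}_{i})}}+\left[1-{\scriptstyle \frac{{\xi}_{i}}{{\pi}_{n}({X}_{i})}}\right]m({X}_{i})-{\scriptstyle \frac{{\xi}_{i}{\delta}_{i}}{\pi ({X}_{i})}}-\left[1-{\scriptstyle \frac{{\xi}_{i}}{\pi ({X}_{i})}}\right]m({X}_{i})\\ ={\xi}_{i}[{\delta}_{i}-m({X}_{i})]\left[{\scriptstyle \frac{1}{{\pi}_{n}({X}_{i})}}-{\scriptstyle \frac{1}{\pi ({X}_{i})}}\right]\\ ={\scriptstyle \frac{{\xi}_{i}[{\delta}_{i}-m({X}_{i})][\pi ({X}_{i})-{\pi}_{n}({X}_{i})]}{\pi ({X}_{i}){\pi}_{n}({X}_{i})}},\end{array}$$

and dividing by ${h}_{n}(n-{R}_{i}+1)={nh}_{n}[1-{L}_{n}({X}_{i}-)]$ and summing over $i$ gives the displayed form of ${R}_{n2}(t)$.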

Using arguments similar to those used to derive (26), we can prove that

$${R}_{n2}(t)={o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right).$$

(33)

The third term in (31) can be rewritten as

$$\begin{array}{l}{R}_{n3}(t)={\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{m({X}_{i})}{n-{R}_{i}+1}}-\lambda (t)+{\scriptstyle \frac{1}{{h}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{{\xi}_{i}[{\delta}_{i}-m({X}_{i})]}{\pi ({X}_{i})(n-{R}_{i}+1)}}\\ ={\scriptstyle \frac{{\stackrel{~}{f}}_{n}(t)-{E\stackrel{~}{f}}_{n}(t)}{1-L(t)}}+{\scriptstyle \frac{1}{{nh}_{n}}}\sum _{i=1}^{n}{\scriptstyle \frac{[{\xi}_{i}-\pi ({X}_{i})][{\delta}_{i}-m({X}_{i})]}{\pi ({X}_{i})[1-L({X}_{i})]}}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right)+{(-1)}^{k}{h}_{n}^{k}{\scriptstyle \frac{{\lambda}^{(k)}(t)\int {u}^{k}K(u)du}{k!}}+{o}_{p}\left({({nh}_{n})}^{-{\scriptstyle \frac{1}{2}}}\right).\end{array}$$

(34)

Equations (31)–(34) together prove that Lemma 1 is true for $\widehat{\lambda}_{n,W}(t)$.

By the Lyapounov central limit theorem, we have

$$\sqrt{{nh}_{n}}\frac{{\stackrel{~}{f}}_{n}(t)-{E\stackrel{~}{f}}_{n}(t)}{1-L(t)}\stackrel{\mathcal{L}}{\to}N\left(0,{\sigma}_{1}^{2}(t)\right)$$

(35)

and

$$\frac{1}{\sqrt{{nh}_{n}}}\sum _{i=1}^{n}\frac{[{\xi}_{i}-\pi ({X}_{i})][{\delta}_{i}-m({X}_{i})]}{\pi ({X}_{i})[1-L({X}_{i})]}K\left(\frac{t-{X}_{i}}{{h}_{n}}\right)\stackrel{\mathcal{L}}{\to}N\left(0,{\sigma}_{2}^{2}(t)\right),$$

(36)

where

$${\sigma}_{1}^{2}(t)=\frac{\lambda (t)}{1-L(t)}\int {K}^{2}(u)du$$

and

$${\sigma}_{2}^{2}(t)=\frac{1-\pi (t)}{\pi (t)}\frac{m(t)[1-m(t)]l(t)}{{[1-L(t)]}^{2}}\int {K}^{2}(u)du.$$

Under the MAR assumption, we can prove

$$\mathit{Cov}\left(\sqrt{{nh}_{n}}\frac{{\stackrel{~}{f}}_{n}(t)-{E\stackrel{~}{f}}_{n}(t)}{1-L(t)},\frac{1}{\sqrt{{nh}_{n}}}\sum _{i=1}^{n}\frac{[{\xi}_{i}-\pi ({X}_{i})][{\delta}_{i}-m({X}_{i})]}{\pi ({X}_{i})[1-L({X}_{i})]}K\left(\frac{t-{X}_{i}}{{h}_{n}}\right)\right)=0.$$

(37)

Equations (35)–(37), together with Lemma 1, prove that Theorem 2 is true.
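Because the two leading terms are asymptotically uncorrelated by (37), their variances $\sigma_{1}^{2}(t)$ and $\sigma_{2}^{2}(t)$ can be evaluated separately and added. The helper below is a minimal sketch that evaluates both expressions from plug-in values. The function name, the argument names, and the default $\int K^{2}(u)du=3/5$ (the Epanechnikov kernel) are assumptions, and Theorem 2 itself is not restated here.

```python
def asymptotic_variances(lam_t, L_t, pi_t, m_t, l_t, kernel_l2=0.6):
    """Evaluate sigma_1^2(t) and sigma_2^2(t) from plug-in values at t.

    lam_t: hazard lambda(t); L_t: distribution function L(t) of X;
    pi_t: probability pi(t) that the censoring indicator is observed;
    m_t: conditional probability m(t); l_t: density l(t) of X;
    kernel_l2: int K^2(u) du (3/5 for the Epanechnikov kernel).
    """
    if not (0.0 < pi_t <= 1.0):
        raise ValueError("pi_t must lie in (0, 1]")
    if not (0.0 <= L_t < 1.0):
        raise ValueError("L_t must lie in [0, 1)")
    sigma1_sq = lam_t / (1.0 - L_t) * kernel_l2
    sigma2_sq = ((1.0 - pi_t) / pi_t
                 * m_t * (1.0 - m_t) * l_t / (1.0 - L_t) ** 2
                 * kernel_l2)
    return sigma1_sq, sigma2_sq

# Hypothetical plug-in values, for illustration only.
s1, s2 = asymptotic_variances(lam_t=1.0, L_t=0.5, pi_t=0.5, m_t=0.5, l_t=1.0)
print(s1, s2, s1 + s2)
```

When no indicators are missing ($\pi(t)=1$), the second component vanishes and only $\sigma_{1}^{2}(t)$ remains.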

We start by making the following assumption:

(C.*π*): $\inf_{x}\pi (x)>0$.

First we prove that Theorem 3 is true for $\widehat{\lambda}_{n,I}(t)$. We have

$${ET}_{n11}^{2}(t)={\scriptstyle \frac{1}{{nh}_{n}}}\left({\scriptstyle \frac{\lambda (t)}{1-L(t)}}\int {K}^{2}(u)du\right)+{h}_{n}^{2k}{\left({\scriptstyle \frac{{\lambda}^{(k)}(t)}{k!}}\int {u}^{k}K(u)du\right)}^{2}+o\left({\scriptstyle \frac{1}{{nh}_{n}}}\right)+o\left({h}_{n}^{2k}\right).$$

(38)

Under the MAR assumption, we have

$$E\left(\frac{({\xi}_{i}-1)[{\delta}_{i}-m({X}_{i})]}{1-{L}_{n}({X}_{i}-)}K\left(\frac{t-{X}_{i}}{{h}_{n}}\right)\,\Big|\,{X}_{1},{X}_{2},\dots ,{X}_{n}\right)=0,$$

which leads to the following result:

$$\begin{array}{l}{ET}_{n12}^{2}(t)={\scriptstyle \frac{1}{{n}^{2}{h}_{n}^{2}}}\sum _{i=1}^{n}E\left\{E\left({\left[{\scriptstyle \frac{({\xi}_{i}-1)[{\delta}_{i}-m({X}_{i})]}{1-{L}_{n}({X}_{i}-)}}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right)\right]}^{2}\,\Big|\,{X}_{1},{X}_{2},\dots ,{X}_{n}\right)\right\}\\ ={\scriptstyle \frac{1}{{nh}_{n}}}{\scriptstyle \frac{[1-\pi (t)]m(t)[1-m(t)]l(t)}{{[1-L(t)]}^{2}}}\int {K}^{2}(u)du+o\left({\scriptstyle \frac{1}{{nh}_{n}}}\right).\end{array}$$

(39)
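The constant in (39) can be traced through a short calculation. The sketch assumes that ${\xi}_{i}$ and ${\delta}_{i}$ are binary indicators with $\pi (x)=P({\xi}_{i}=1|{X}_{i}=x)$ and $m(x)=P({\delta}_{i}=1|{X}_{i}=x)$, and that MAR means ${\xi}_{i}$ and ${\delta}_{i}$ are conditionally independent given ${X}_{i}$. These definitions are stated outside this section and are taken here as assumptions. Since ${({\xi}_{i}-1)}^{2}=1-{\xi}_{i}$,

$$\begin{array}{l}E\left({({\xi}_{i}-1)}^{2}{[{\delta}_{i}-m({X}_{i})]}^{2}\,\Big|\,{X}_{i}\right)=E\left(1-{\xi}_{i}\,|\,{X}_{i}\right)E\left({[{\delta}_{i}-m({X}_{i})]}^{2}\,|\,{X}_{i}\right)\\ =[1-\pi ({X}_{i})]\,m({X}_{i})[1-m({X}_{i})].\end{array}$$

Writing $g(x)=[1-\pi (x)]m(x)[1-m(x)]/{[1-L(x)]}^{2}$, replacing ${L}_{n}$ by $L$, and substituting $s=t-{h}_{n}u$ then gives

$${\scriptstyle \frac{1}{{nh}_{n}^{2}}}\int {K}^{2}\left({\scriptstyle \frac{t-s}{{h}_{n}}}\right)g(s)l(s)ds={\scriptstyle \frac{1}{{nh}_{n}}}\int {K}^{2}(u)g(t-{h}_{n}u)l(t-{h}_{n}u)du={\scriptstyle \frac{1}{{nh}_{n}}}g(t)l(t)\int {K}^{2}(u)du+o\left({\scriptstyle \frac{1}{{nh}_{n}}}\right),$$

which matches the leading term in (39).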

For $T_{n2}(t)$, we have

$$\begin{array}{l}{ET}_{n2}^{2}(t)=E\left\{{\left[{\scriptstyle \frac{1}{{nh}_{n}}}\sum _{i=1}^{n}K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{(1-{\xi}_{i})}{1-{L}_{n}({X}_{i}-)}}\left[{\scriptstyle \frac{{\displaystyle \sum _{j=1}^{n}}{\xi}_{j}{\delta}_{j}W\left({\scriptstyle \frac{{X}_{i}-{X}_{j}}{{b}_{n}}}\right)}{{\displaystyle \sum _{j=1}^{n}}{\xi}_{j}W\left({\scriptstyle \frac{{X}_{i}-{X}_{j}}{{b}_{n}}}\right)}}-m({X}_{i})\right]\right]}^{2}\right\}\\ ={M}_{n}(t)+o\left({M}_{n}(t)\right),\end{array}$$

(40)

where

$${M}_{n}(t)=E{\left[\frac{1}{{nh}_{n}}\sum _{j=1}^{n}{\xi}_{j}[{\delta}_{j}-m({X}_{j})]\frac{1}{{nb}_{n}}\sum _{i=1}^{n}K\left(\frac{t-{X}_{i}}{{h}_{n}}\right)\frac{(1-{\xi}_{i})W\left({\scriptstyle \frac{{X}_{i}-{X}_{j}}{{b}_{n}}}\right)}{[1-L({X}_{i})]\pi ({X}_{i})l({X}_{i})}\right]}^{2}.$$

Under conditions (B.K), (B.WΩ) and (C.*π*), and the fact that *b _{n}*/

$$\begin{array}{l}{M}_{n}(t)=E\left\{{\left[{\scriptstyle \frac{1}{{nh}_{n}}}\sum _{i=1}^{n}{\xi}_{i}[{\delta}_{i}-m({X}_{i})]K\left({\scriptstyle \frac{t-{X}_{i}}{{h}_{n}}}\right){\scriptstyle \frac{1-\pi ({X}_{i})}{[1-L({X}_{i})]\pi ({X}_{i})}}\right]}^{2}\right\}+o\left({\scriptstyle \frac{1}{{nh}_{n}}}\right)\\ ={\scriptstyle \frac{1}{{nh}_{n}}}{\scriptstyle \frac{{[1-\pi (t)]}^{2}m(t)[1-m(t)]l(t)}{{[1-L(t)]}^{2}\pi (t)}}\int {K}^{2}(u)du+o\left({\scriptstyle \frac{1}{{nh}_{n}}}\right).\end{array}$$

(41)

Equations (11), (12), and (38)–(41) together prove Theorem 3 for $\widehat{\lambda}_{n,I}(t)$.

- Blum JR, Susarla V. Maximal deviation theory of density and failure rate function estimates based on censored data. In: Krishnaiah PR, editor. Multivariate analysis. New York: North-Holland; 1980. pp. 213–222.
- Cao R, Jácome MA. Presmoothed kernel density estimator for censored data. Journal of Nonparametric Statistics. 2004;16:289–309.
- Cao R, López-de-Ullibarri I, Janssen P, Veraverbeke N. Presmoothed Kaplan-Meier and Nelson-Aalen estimators. Journal of Nonparametric Statistics. 2005;17:31–56.
- Cheng PE. Nonparametric estimation of mean functionals with data missing at random. Journal of the American Statistical Association. 1994;89:81–87.
- Dewanji A. A note on a test for competing risks with missing failure type. Biometrika. 1992;79:855–857.
- Diehl S, Stute W. Kernel density and hazard function estimation in the presence of censoring. Journal of Multivariate Analysis. 1988;25:299–310.
- Dikta G. On semiparametric random censorship models. Journal of Statistical Planning and Inference. 1998;66:253–279.
- Dinse GE. Nonparametric estimation for partially-complete time and type of failure data. Biometrics. 1982;38:417–431.
- Dinse GE. Nonparametric prevalence and mortality estimators for animal experiments with incomplete cause-of-death data. Journal of the American Statistical Association. 1986;81:328–336.
- Gao GZ, Tsiatis AA. Semiparametric estimators for the regression coefficients in the linear transformation competing risks model with missing cause of failure. Biometrika. 2005;92:875–891.
- Goetghebeur EJ, Ryan LM. A modified log rank test for competing risks with missing failure type. Biometrika. 1990;77:207–211.
- Goetghebeur EJ, Ryan LM. Analysis of competing risks survival data when some failure types are missing. Biometrika. 1995;82:821–833.
- González-Manteiga W, Cao R, Marron JS. Bootstrap selection of the smoothing parameter in nonparametric hazard rate estimation. Journal of the American Statistical Association. 1996;91:1130–1140.
- Jácome MA, Gijbels I, Cao R. Comparison of presmoothing methods in kernel density estimation under censoring. Computational Statistics. 2008;23:381–406.
- Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. New York: John Wiley & Sons; 1980.
- Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958;53:457–481.
- Klein JP, Moeschberger ML. Survival Analysis. New York: Springer; 2003.
- Lipsitz SR, Zhao LP, Molenberghs G. A semiparametric method of multiple imputation. Journal of the Royal Statistical Society, Series B. 1998;60:127–144.
- Little RJA, Rubin DB. Statistical analysis with missing data. New York: John Wiley & Sons; 1987.
- Lo S-H. Estimating a survival function with incomplete cause-of-death data. Journal of Multivariate Analysis. 1991;39:217–235.
- Marron JS, Padgett JW. Asymptotically optimal bandwidth selection for kernel density estimators from randomly right-censored samples. The Annals of Statistics. 1987;15:1520–1535.
- McKeague IW, Subramanian S. Product-limit estimators and Cox regression with missing censoring information. Scandinavian Journal of Statistics. 1998;25:589–601.
- Nelson W. Theory and applications of hazard plotting for censored failure data. Technometrics. 1972;14:945–966.
- Patil PN. Bandwidth choice for nonparametric hazard rate estimation. Journal of Statistical Planning and Inference. 1993a;35:15–30.
- Patil PN. On the least squares cross-validation bandwidth in hazard rate estimation. The Annals of Statistics. 1993b;21:1792–1810.
- Ramlau-Hansen H. Smoothing counting process intensities by means of kernel functions. The Annals of Statistics. 1983;11:453–466.
- Liu RYC, Van Ryzin J. A histogram estimator of the hazard rate with censored data. The Annals of Statistics. 1985;13:592–605.
- Robins JM, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell N, Dietz K, Farewell V, editors. AIDS epidemiology - methodological issues. Boston: Birkhäuser; 1992. pp. 297–331.
- Robins JM, Rotnitzky A, Zhao LP. Estimation of Regression Coefficients when Some Regressors Are Not Always Observed. Journal of the American Statistical Association. 1994;89:846–866.
- Robins JM, Wang N. Inference for imputation estimators. Biometrika. 2000;87:113–124.
- Rubin DB. Multiple imputation for nonresponse in surveys. New York: Wiley; 1987.
- Sarda P, Vieu P. Smoothing parameter selection in hazard estimation. Statistics & Probability Letters. 1991;11:429–434.
- Scharfstein DO, Rotnitzky A, Robins J. Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion). Journal of the American Statistical Association. 1999;94:1096–1146.
- Subramanian S. Asymptotically efficient estimation of a survival function in the missing censoring indicator model. Journal of Nonparametric Statistics. 2004;16:797–817.
- Subramanian S. Survival analysis for the missing censoring indicator model using kernel density estimation techniques. Statistical Methodology. 2006;3:125–136.
- Tanner MA. A note on the variable kernel estimator of the hazard function from randomly censored data. The Annals of Statistics. 1983;11:994–998.
- Tanner MA, Wong WH. The estimation of the hazard function from randomly censored data by the kernel method. The Annals of Statistics. 1983;11:989–993.
- Tsiatis AA, Davidian M, McNeney B. Multiple imputation methods for testing treatment differences in survival distributions with missing cause of failure. Biometrika. 2002;89:238–244.
- van der Laan MJ, McKeague IW. Efficient estimation from right-censored data when failure indicators are missing at random. The Annals of Statistics. 1998;26:164–182.
- Wang QH. Some bounds for the error of an estimator of the hazard function with censored data. Statistics & Probability Letters. 1999;44:319–326.
- Wang QH, Linton O, Härdle W. Semiparametric regression analysis with missing response at random. Journal of the American Statistical Association. 2004;99:334–345.
- Wang QH, Rao JNK. Empirical likelihood-based inference under imputation for missing response data. The Annals of Statistics. 2002;30:896–924.
- Zhao LP, Lipsitz SR, Lew D. Regression analysis with missing covariate data using estimating equations. Biometrics. 1996;52:1165–1182.
