Under an additive hazards model, the hazard function for failure time *T* given covariate *Z* is assumed to be of the form

where

*λ*_{0}(

*t*) is an unspecified baseline hazard function and

*β*_{0} is a

*p*-vector of unknown regression parameters. In the case where all data are observed,

Lin & Ying (1994) introduced a pseudoscore function for the parameter vector

*β*_{0} and showed that the resulting estimator is consistent and asymptotically normal, with an easily estimated covariance matrix.

When censoring indicators are missing for right-censored data, we observe

*n* independent and identically distributed vectors (

*X*_{i}, ξ_{i}, ξ_{i}δ_{i}, Z_{i}, R_{i}) (

*i* = 1, …,

*n*), where

*ξ*_{i} is an indicator that

*δ*_{i} is not missing, and

*R*_{i} is an auxiliary covariate that is not used to model the hazard but may be used to describe the probability that

*δ*_{i} is missing. The probability that

*δ*_{i} is missing is characterized by the distribution of

*ξ*_{i} given

*δ*_{i} and

*W*_{i} = (

*X*_{i}, Z_{i}, R_{i})

*,* which is Bernoulli with probability

*P*{

*ξ*_{i} = 1|

*δ*_{i}, W_{i} =

*w*}. Under the MAR assumption (

Little & Rubin 1987), we have

Another function of interest is *π*(*w*) = *P*{*δ*_{i} = 1|*W*_{i} = *w, ξ*_{i} = 1}, which is the conditional probability of an uncensored observation, given that *δ*_{i} is observed and *W*_{i} = *w*.

A naive method for estimating

*β*_{0} is to simply ignore the missing data and to apply the pseudoscore function of

Lin & Ying (1994) to the complete data only. Such a procedure (called the complete case estimator) may not only lose efficiency due to discarding incomplete observations, but may also generate biased estimators, even when the censoring indicators are MAR. If either

*ρ*(

*w*) or

*π*(

*w*) is modeled correctly, we can use the approach of

Lu & Liang (2008) to obtain the IPW and DR estimators. In many situations, however, knowledge of

*ρ*(

*w*) and

*π*(

*w*) is limited, and thus both models may be misspecified. In this article, no parametric models are assumed for these two probabilities; rather, both are estimated nonparametrically by kernel smoothers. We begin by introducing the simple weighted estimator, which is derived under the MAR assumption.

Because

*ρ*(

*W*_{i}) is a function of continuous variables such as

*X*_{i}, we estimate it with the Nadaraya-Watson estimator based on complete observations. Specifically, let

*d* denote the number of continuous elements of

*W*_{i} and let

*K* be an

*r*th-order (

*r* >

*d*) kernel function of

*d* variables with finite support that satisfies ∫

*K*(

*u*)

*du* = 1, ∫

*u*^{m}K(

*u*)

*du* = 0,

*m* = 1

*,* …

*, r* − 1, ∫

*u*^{r}K(

*u*)

*du* ≠ 0, and ∫

*K*(

*u*)

^{2}*du* < ∞, where

*u* can be a scalar or a vector. If

*u* is a vector, say

*u* = (

*u*_{1}, …,

*u*_{d})′, then

*u*^{m} denotes

. The motivation for using higher-order kernels is to reduce the order of magnitude of the bias of the curve estimator, leading to a faster rate of convergence of the mean integrated squared error (

Wand & Schucany 1990). This type of kernel function may be constructed in various manners. For instance,

Wand & Schucany (1990) gave a univariate Gaussian-based kernel of order 2

*r*:

where

^{(2}^{r}^{−1)}(

*u*_{1}) is the (2

*r* − 1)-th derivative of the standard normal density function

(

*u*_{1}).

Hall & Marron (1988) proposed a class of univariate kernels of order

*r*:

Define *K*_{h}(·) = *K*(·*/h*), where *h* is a bandwidth sequence, and *K*(*u/h*) = *K*(*u*_{1}*/h,* …*, u*_{d}/h) for *u* = (*u*_{1}, …, *u*_{d})′. Write *W*_{i} = (*W*_{1}_{i}, W_{2}_{i}), where *W*_{1}_{i} and *W*_{2}_{i} include all continuous and discrete elements of *W*_{i}, respectively. Then the Nadaraya-Watson estimator of *ρ*(*w*) is given by

where

*w* = (

*w*_{1}*, w*_{2}). The choice of the kernel function

*K* usually has little effect on the estimator

(

*w*), and thus the estimator of

*β*_{0}, but the bandwidth sequence

*h* typically does influence these estimators, both theoretically and practically. We assume that

*h* satisfies

*nh*^{2}^{r} → 0 and

*nh*^{2}^{d} → ∞ as

*n* → ∞. If

*h* =

*O*(

*n*^{−1}^{/p}) for some integer

*p* > 2

*d*, then a reasonable choice for

*r* is the smallest even integer such that

*r* ≥

*p* −

*d* (

Qi, Wang & Prentice 2005). For example, when

*d* = 2, we might choose

*p* = 5 and

*r* = 4. In a similar manner, we can estimate

*π*(

*w*) by

Note that the kernel function

*K* and bandwidth sequence

*h* used in (

3) need not be identical to those used in (

4), and the bandwidth can be different for each component of

*W*_{1}_{i}. For example, we can define

*h* = (

*h*_{1}, …,

*h*_{d})′ for different bandwidths, and write

*K*(

*u/h*) =

*K*(

*u*_{1}*/h*_{1}, …,

*u*_{d}/h_{d}). Here, we use the same

*K* and

*h* in both for notational convenience.

Let

denote the baseline cumulative hazard function. Using the inverse probability weighted approach, consider the following estimating equations for

*β*_{0} and Λ

_{0}:

where

,

*Y*_{i}(

*t*) =

*I*(

*X*_{i} ≥

*t*), and

*τ* is a prespecified positive constant such that

*P*(

*X*_{i} ≥

*τ*) > 0. The resulting simple weighted estimators for

*β*_{0} and Λ

_{0} have the following closed forms:

and

where

*a*^{2} =

*aa*′ for any vector

*a* and

In practice, we often choose *τ* to be the largest observation time, say *τ* = max{*X*_{i}}.

Let

(

*t*) =

*E*[

*Y*_{i}(

*t*)

*Z*_{i}(

*t*)]

*/E*[

*Y*_{i}(

*t*)]. Define

*N*_{i}(

*t*) =

*I*(

*X*_{i} ≤

*t*) and

. The asymptotic properties of

are given in the following theorem.

Theorem 1

Under regularity conditions (C1)–(C6), which are stated in the

Appendix,

is consistent and n

^{1/2}(

−β

_{0}) is asymptotically normal with mean zero and covariance matrix V = A

^{−1}ΣA

^{−1} + A

^{−1}Σ

^{*}A

^{−1}, where

Note that the first term in

*V* is the asymptotic variance of the

Lin & Ying (1994) estimator based only on the complete data (

*ξ*_{i} 1) and the second term represents the effect of the missing censoring indicators. If we let

, then the covariance matrix

*V* can be consistently estimated by

=

*Â*^{−1}(

+

^{*})

*Â*^{−1}*,* where

Define

and

The asymptotic properties of

_{0}(

*t*) are given in the next theorem.

Theorem 2

Under the assumptions of Theorem 1,

_{0}(t) converges in probability to Λ

_{0}(t) uniformly in t

[0, τ], and n

^{1/2}{

_{0}(t) − Λ

_{0}(t)g converges weakly on [0, τ] to a zero-mean Gaussian process with covariance function at (t, s) (t ≤ s) equal to

The covariance function Γ(

*t, s*) can be consistently estimated by substituting

,

and

for the unknowns

*β*_{0},

*ρ* and

*π* in the appropriate empirical estimators, and by replacing the (unobserved) processes

with

. For an individual with a given covariate vector

*z*_{0}, the corresponding estimator of the survival function

*S*(

*t, z*_{0}) is

Using the functional delta-method and Theorem 2, we can obtain the asymptotic properties of *Ŝ*(*t, z*_{0}), which can be applied to construct confidence bands for *S*(*t, z*_{0}).

When the missingness probability

*ρ*(

*w*) is known or a parametric model is specified for

*ρ*(

*w*), the simple weighted estimator uses only the complete case data (i.e., only individuals with

*ξ*_{i} = 1), and the fully augmented weighted estimator (also called the double robust estimator) incorporates contributions from the incomplete observations (i.e., individuals with

*ξ*_{i} = 0), thus the fully augmented weighted estimator is more efficient than the corresponding simple weighted estimator (

Lu & Liang 2008). In addition, the fully augmented weighted estimator has the so-called double-robustness property; that is, the estimator is consistent if one can correctly specify either the missingness probability

*ρ*(

*w*) or the conditional probability of an uncensored observation

*π*(

*w*) (

Wang & Chen 2001). However, estimating

*ρ*(

*w*) nonparametrically enables the simple weighted estimator

to follow the same asymptotic distribution as the fully augmented weighted estimator

_{a} (described next). This indicates that

is equivalent asymptotically to

_{a}. These conclusions are consistent with the results of

Qi, Wang & Prentice (2005) for proportional hazards regression with missing covariates.

The fully augmented weighted estimators for

*β*_{0} and Λ

_{0} are the solutions to the following estimating equations:

The resulting fully augmented weighted estimators for

*β*_{0} and Λ

_{0} have the following closed forms:

and

Similar to Theorems 1 and 2, the asymptotic properties of

_{a} and

_{a} are given in the following theorem.

Theorem 3

Under the assumptions of Theorem 1, we have:

_{a} is consistent and n^{1/2}(_{a} − β_{0}) is asymptotically normal with mean zero and covariance matrix V:_{a}(t) converges in probability to Λ_{0}(t) uniformly in t [0, τ], and n^{1/2}{_{a}(t) − Λ_{0}(t)} converges weakly on [0, τ] to a zero-mean Gaussian process with covariance function Γ(t, s) at (t, s) (t ≤ s).

For the fully augmented weighted method, the covariance matrix

*V* and covariance function Γ(

*t, s*) can be consistently estimated by substituting

_{a},

and

for

*β*_{0},

*ρ* and

*π* in the appropriate empirical estimators, and by replacing the processes

with

.

Theorems 1, 2 and 3 show that both the simple and fully augmented weighted estimators have the same asymptotic normal distribution, and the resulting estimators of the baseline cumulative hazard function converge to the same Gaussian process. This means that the simple weighted estimators with nonparametric

(

*w*) are as efficient as the kernel-assisted fully augmented weighted estimators. One intuitive explanation for this is that the incomplete observations are indirectly incorporated in the simple weighted estimator by using the inverse of

(

*w*) as a weight.

Note that

_{0}(

*t*) and

_{a}(

*t*) may not always be monotonic in

*t*, in which case simple modifications such as those discussed in

Lin & Ying (1994) can be made to ensure monotonicity while preserving asymptotic properties.