Under an additive hazards model, the hazard function for failure time T given covariate Z is assumed to be of the form
where
λ0(
t) is an unspecified baseline hazard function and
β0 is a
p-vector of unknown regression parameters. In the case where all data are observed,
Lin & Ying (1994) introduced a pseudoscore function for the parameter vector
β0 and showed that the resulting estimator is consistent and asymptotically normal, with an easily estimated covariance matrix.
When censoring indicators are missing for right-censored data, we observe
n independent and identically distributed vectors (
Xi, ξi, ξiδi, Zi, Ri) (
i = 1, …,
n), where
ξi is an indicator that
δi is not missing, and
Ri is an auxiliary covariate that is not used to model the hazard but may be used to describe the probability that
δi is missing. The probability that
δi is missing is characterized by the distribution of
ξi given
δi and
Wi = (
Xi, Zi, Ri)
, which is Bernoulli with probability
P{
ξi = 1|
δi, Wi =
w}. Under the MAR assumption (
Little & Rubin 1987), we have
Another function of interest is π(w) = P{δi = 1|Wi = w, ξi = 1}, which is the conditional probability of an uncensored observation, given that δi is observed and Wi = w.
A naive method for estimating
β0 is to simply ignore the missing data and to apply the pseudoscore function of
Lin & Ying (1994) to the complete data only. Such a procedure (called the complete case estimator) may not only lose efficiency due to discarding incomplete observations, but may also generate biased estimators, even when the censoring indicators are MAR. If either
ρ(
w) or
π(
w) is modeled correctly, we can use the approach of
Lu & Liang (2008) to obtain the IPW and DR estimators. In many situations, however, knowledge of
ρ(
w) and
π(
w) is limited, and thus both models may be misspecified. In this article, no parametric models are assumed for these two probabilities; rather, both are estimated nonparametrically by kernel smoothers. We begin by introducing the simple weighted estimator, which is derived under the MAR assumption.
Because
ρ(
Wi) is a function of continuous variables such as
Xi, we estimate it with the Nadaraya-Watson estimator based on complete observations. Specifically, let
d denote the number of continuous elements of
Wi and let
K be an
rth-order (
r >
d) kernel function of
d variables with finite support that satisfies ∫
K(
u)
du = 1, ∫
umK(
u)
du = 0,
m = 1
, …
, r − 1, ∫
urK(
u)
du ≠ 0, and ∫
K(
u)
2du < ∞, where
u can be a scalar or a vector. If
u is a vector, say
u = (
u1, …,
ud)′, then
um denotes

. The motivation for using higher-order kernels is to reduce the order of magnitude of the bias of the curve estimator, leading to a faster rate of convergence of the mean integrated squared error (
Wand & Schucany 1990). This type of kernel function may be constructed in various manners. For instance,
Wand & Schucany (1990) gave a univariate Gaussian-based kernel of order 2
r:
where
(2r−1)(
u1) is the (2
r − 1)-th derivative of the standard normal density function
![[var phi]](/corehtml/pmc/pmcents/x03C6.gif)
(
u1).
Hall & Marron (1988) proposed a class of univariate kernels of order
r:
Define Kh(·) = K(·/h), where h is a bandwidth sequence, and K(u/h) = K(u1/h, …, ud/h) for u = (u1, …, ud)′. Write Wi = (W1i, W2i), where W1i and W2i include all continuous and discrete elements of Wi, respectively. Then the Nadaraya-Watson estimator of ρ(w) is given by
where
w = (
w1, w2). The choice of the kernel function
K usually has little effect on the estimator
![[rho with circumflex]](/corehtml/pmc/pmcents/x03C1x0302.gif)
(
w), and thus the estimator of
β0, but the bandwidth sequence
h typically does influence these estimators, both theoretically and practically. We assume that
h satisfies
nh2r → 0 and
nh2d → ∞ as
n → ∞. If
h =
O(
n−1/p) for some integer
p > 2
d, then a reasonable choice for
r is the smallest even integer such that
r ≥
p −
d (
Qi, Wang & Prentice 2005). For example, when
d = 2, we might choose
p = 5 and
r = 4. In a similar manner, we can estimate
π(
w) by
Note that the kernel function
K and bandwidth sequence
h used in (
3) need not be identical to those used in (
4), and the bandwidth can be different for each component of
W1i. For example, we can define
h = (
h1, …,
hd)′ for different bandwidths, and write
K(
u/h) =
K(
u1/h1, …,
ud/hd). Here, we use the same
K and
h in both for notational convenience.
Let

denote the baseline cumulative hazard function. Using the inverse probability weighted approach, consider the following estimating equations for
β0 and Λ
0:
where

,
Yi(
t) =
I(
Xi ≥
t), and
τ is a prespecified positive constant such that
P(
Xi ≥
τ) > 0. The resulting simple weighted estimators for
β0 and Λ
0 have the following closed forms:
and
where
a
2 =
aa′ for any vector
a and
In practice, we often choose τ to be the largest observation time, say τ = max{Xi}.
Let
![[z macron]](/corehtml/pmc/pmcents/zmacr.gif)
(
t) =
E[
Yi(
t)
Zi(
t)]
/E[
Yi(
t)]. Define
Ni(
t) =
I(
Xi ≤
t) and

. The asymptotic properties of
![[beta]](/corehtml/pmc/pmcents/bgrcirc.gif)
are given in the following theorem.
Theorem 1
Under regularity conditions (C1)–(C6), which are stated in the
Appendix,
![[beta]](/corehtml/pmc/pmcents/bgrcirc.gif)
is consistent and n
1/2(
![[beta]](/corehtml/pmc/pmcents/bgrcirc.gif)
−β
0) is asymptotically normal with mean zero and covariance matrix V = A
−1ΣA
−1 + A
−1Σ
*A
−1, where
Note that the first term in
V is the asymptotic variance of the
Lin & Ying (1994) estimator based only on the complete data (
ξi ![[equivalent]](/corehtml/pmc/pmcents/equiv.gif)
1) and the second term represents the effect of the missing censoring indicators. If we let

, then the covariance matrix
V can be consistently estimated by

=
Â−1(
![[Sigma]](/corehtml/pmc/pmcents/x03A3x0302.gif)
+
*)
Â−1, where
Define

and
The asymptotic properties of
0(
t) are given in the next theorem.
Theorem 2
Under the assumptions of Theorem 1,
0(t) converges in probability to Λ
0(t) uniformly in t
![[set membership]](/corehtml/pmc/pmcents/x2208.gif)
[0, τ], and n
1/2{
0(t) − Λ
0(t)g converges weakly on [0, τ] to a zero-mean Gaussian process with covariance function at (t, s) (t ≤ s) equal to
The covariance function Γ(
t, s) can be consistently estimated by substituting
![[beta]](/corehtml/pmc/pmcents/bgrcirc.gif)
,
![[rho with circumflex]](/corehtml/pmc/pmcents/x03C1x0302.gif)
and
![[pi]](/corehtml/pmc/pmcents/picirc.gif)
for the unknowns
β0,
ρ and
π in the appropriate empirical estimators, and by replacing the (unobserved) processes

with

. For an individual with a given covariate vector
z0, the corresponding estimator of the survival function
S(
t, z0) is
Using the functional delta-method and Theorem 2, we can obtain the asymptotic properties of Ŝ(t, z0), which can be applied to construct confidence bands for S(t, z0).
When the missingness probability
ρ(
w) is known or a parametric model is specified for
ρ(
w), the simple weighted estimator uses only the complete case data (i.e., only individuals with
ξi = 1), and the fully augmented weighted estimator (also called the double robust estimator) incorporates contributions from the incomplete observations (i.e., individuals with
ξi = 0), thus the fully augmented weighted estimator is more efficient than the corresponding simple weighted estimator (
Lu & Liang 2008). In addition, the fully augmented weighted estimator has the so-called double-robustness property; that is, the estimator is consistent if one can correctly specify either the missingness probability
ρ(
w) or the conditional probability of an uncensored observation
π(
w) (
Wang & Chen 2001). However, estimating
ρ(
w) nonparametrically enables the simple weighted estimator
![[beta]](/corehtml/pmc/pmcents/bgrcirc.gif)
to follow the same asymptotic distribution as the fully augmented weighted estimator
a (described next). This indicates that
![[beta]](/corehtml/pmc/pmcents/bgrcirc.gif)
is equivalent asymptotically to
a. These conclusions are consistent with the results of
Qi, Wang & Prentice (2005) for proportional hazards regression with missing covariates.
The fully augmented weighted estimators for
β0 and Λ
0 are the solutions to the following estimating equations:
The resulting fully augmented weighted estimators for
β0 and Λ
0 have the following closed forms:
and
Similar to Theorems 1 and 2, the asymptotic properties of
a and
a are given in the following theorem.
Theorem 3
Under the assumptions of Theorem 1, we have:
a is consistent and n1/2(
a − β0) is asymptotically normal with mean zero and covariance matrix V:
a(t) converges in probability to Λ0(t) uniformly in t
[0, τ], and n1/2{
a(t) − Λ0(t)} converges weakly on [0, τ] to a zero-mean Gaussian process with covariance function Γ(t, s) at (t, s) (t ≤ s).
For the fully augmented weighted method, the covariance matrix
V and covariance function Γ(
t, s) can be consistently estimated by substituting
a,
![[rho with circumflex]](/corehtml/pmc/pmcents/x03C1x0302.gif)
and
![[pi]](/corehtml/pmc/pmcents/picirc.gif)
for
β0,
ρ and
π in the appropriate empirical estimators, and by replacing the processes

with

.
Theorems 1, 2 and 3 show that both the simple and fully augmented weighted estimators have the same asymptotic normal distribution, and the resulting estimators of the baseline cumulative hazard function converge to the same Gaussian process. This means that the simple weighted estimators with nonparametric
![[rho with circumflex]](/corehtml/pmc/pmcents/x03C1x0302.gif)
(
w) are as efficient as the kernel-assisted fully augmented weighted estimators. One intuitive explanation for this is that the incomplete observations are indirectly incorporated in the simple weighted estimator by using the inverse of
![[rho with circumflex]](/corehtml/pmc/pmcents/x03C1x0302.gif)
(
w) as a weight.
Note that
0(
t) and
a(
t) may not always be monotonic in
t, in which case simple modifications such as those discussed in
Lin & Ying (1994) can be made to ensure monotonicity while preserving asymptotic properties.