
Can J Stat. Author manuscript; available in PMC 2011 September 1.

Published in final edited form as:

Can J Stat. 2010 September; 38(3): 333–351.

doi: 10.1002/cjs.10072

PMCID: PMC3010164

NIHMSID: NIHMS202676

Xinyuan SONG, Department of Statistics, Shatin, N. T., Hong Kong, P. R. China;

Xinyuan SONG: kh.ude.khuc.ats@gnosyx; Liuquan SUN: nc.ca.tma@qls; Xiaoyun MU: nc.ca.ssma@yxum; Gregg E. DINSE: vog.hin.shein@esnid

In this article, the authors consider a semiparametric additive hazards regression model for right-censored data that allows some censoring indicators to be missing at random. They develop a class of estimating equations and use an inverse probability weighted approach to estimate the regression parameters. Nonparametric smoothing techniques are employed to estimate the probability of non-missingness and the conditional probability of an uncensored observation. The asymptotic properties of the resulting estimators are derived. Simulation studies show that the proposed estimators perform well. They motivate and illustrate their methods with data from a brain cancer clinical trial.

In the analysis of failure time data, the cause of failure may be unknown for some subjects for a variety of reasons (e.g., autopsies were not performed or medical records were missing). We motivate and illustrate our methods with data on patients from a brain cancer clinical trial, where we evaluate the effect of two potential explanatory variables on a measure of quality of life. All patients were initially ambulatory, but over time some lost their mobility, some had a progression of their cancer, and some experienced both events. To assess quality of life, we define “survival time” as the time to non-ambulatory progression. Thus, patients who progressed and were no longer ambulatory contributed uncensored times, patients who progressed but were still ambulatory or who had not progressed by the end of the study contributed censored times, and patients who progressed but whose ambulatory status was unknown contributed times with missing censoring indicators. We apply our regression analysis to evaluate the effects of sex and age on the time to non-ambulatory progression.

Specifically, let *T* be the failure time, let *Z* be a *p*×1 vector of covariates, and let *C* be a censoring time that is assumed to be conditionally independent of *T* given *Z*. Data are available on *Z* and *X* = *T* ∧ *C*, but the censoring indicator *δ* = *I*(*T* ≤ *C*) may be missing. If the probability that *δ* is missing does not depend on either the true value of *δ* or the values of *X* and *Z*, then a missing *δ* is said to be missing completely at random (MCAR). Alternatively, if the probability that *δ* is missing depends on the values of *X* and *Z* but not on the true value of *δ*, then a missing *δ* is said to be missing at random (MAR); see Little & Rubin (1987).

Under the MCAR assumption and in the absence of covariates, Dinse (1982) obtained a nonparametric maximum likelihood estimator (NPMLE) of the survival function using an EM algorithm. Lo (1991) proved that there are infinitely many NPMLEs and some of them may be inconsistent; he consequently constructed a consistent and asymptotically normal estimator. Gijbels, Lin & Ying (1993, 2007) and McKeague & Subramanian (1998) proposed further improvements on these estimators. When covariates are present, Gijbels, Lin & Ying (1993) initiated research on estimation under the Cox model. McKeague & Subramanian (1998) provided an alternative approach to estimation. Subramanian (2000) considered estimation under proportionality of conditional hazards. Zhou & Sun (2003) studied the additive hazards regression model.

Under the MAR assumption, van der Laan & McKeague (1998) first addressed efficient estimation of the survival function and proposed a sieved nonparametric maximum likelihood estimator. Further developments along the lines of efficient estimation can be found in Subramanian (2004, 2006) and Wang & Ng (2008). Goetghebeur & Ryan (1995) and Lu & Tsiatis (2001) analyzed competing risks data with missing cause of failure under proportional hazards regression models. Gao & Tsiatis (2005) considered the linear transformation competing risks model with missing cause of failure. Recently, Lu & Liang (2008) studied competing risks data with missing cause of failure under the semiparametric additive hazards model, and suggested the inverse probability weighted (IPW) and double robust (DR) estimators. To obtain these estimators, however, they imposed parametric models for two components: the probability that the censoring indicator is not missing and the conditional probability of a given failure type.

In this article, we propose estimators for the regression parameters in a semiparametric additive hazards model, where the failure times are subject to right censoring and some censoring indicators are missing at random. We provide simple and fully augmented weighted estimators that incorporate incomplete data nonparametrically. Unlike Lu & Liang (2008), no parametric models are assumed for the missingness probability or the conditional probability of an uncensored observation; instead, we use nonparametric kernel smoothing techniques to estimate these probabilities. The resulting estimators have closed forms and are easy to implement. Under the usual MAR assumption, both the simple and fully augmented weighted estimators are consistent and asymptotically equivalent, i.e., they have the same asymptotic normal distribution. In addition, the asymptotic properties of the estimated baseline cumulative hazard function are also established for the model.

The remainder of the paper is organized as follows. Section 2 presents the simple and fully augmented weighted estimators and their asymptotic properties under the MAR assumption. Section 3 reports simulation results that show the proposed estimators perform well. In Section 4, our methods are applied to analyze the brain cancer data described earlier. Our concluding remarks follow in Section 5 and technical proofs are relegated to the Appendix.

Under an additive hazards model, the hazard function for failure time *T* given covariate *Z* is assumed to be of the form

$$\lambda (t\mid Z)={\lambda}_{0}(t)+{\beta}_{0}^{\prime}Z,$$

(1)

where *λ*_{0}(*t*) is an unspecified baseline hazard function and *β*_{0} is a *p*-vector of unknown regression parameters. In the case where all data are observed, Lin & Ying (1994) introduced a pseudoscore function for the parameter vector *β*_{0} and showed that the resulting estimator is consistent and asymptotically normal, with an easily estimated covariance matrix.

When censoring indicators are missing for right-censored data, we observe *n* independent and identically distributed vectors (*X _{i}, ξ_{i}, ξ_{i}δ_{i}, Z_{i}, R_{i}*) (

$$P\{{\xi}_{i}=1\mid {\delta}_{i},\phantom{\rule{0.16667em}{0ex}}{W}_{i}=w\}=P\{{\xi}_{i}=1\mid {W}_{i}=w\}\equiv \rho (w).$$

(2)

Another function of interest is *π*(*w*) = *P*{*δ _{i}* = 1|

A naive method for estimating *β*_{0} is to simply ignore the missing data and to apply the pseudoscore function of Lin & Ying (1994) to the complete data only. Such a procedure (called the complete case estimator) may not only lose efficiency due to discarding incomplete observations, but may also generate biased estimators, even when the censoring indicators are MAR. If either *ρ*(*w*) or *π*(*w*) is modeled correctly, we can use the approach of Lu & Liang (2008) to obtain the IPW and DR estimators. In many situations, however, knowledge of *ρ*(*w*) and *π*(*w*) is limited, and thus both models may be misspecified. In this article, no parametric models are assumed for these two probabilities; rather, both are estimated nonparametrically by kernel smoothers. We begin by introducing the simple weighted estimator, which is derived under the MAR assumption.

Because *ρ*(*W _{i}*) is a function of continuous variables such as

$$K({u}_{1})=\frac{{(-1)}^{r}{\phi}^{(2r-1)}({u}_{1})}{{2}^{r-1}(r-1)!{u}_{1}},$$

where *φ*^{(2*r*−1)}(*u*_{1}) is the (2*r* − 1)-th derivative of the standard normal density function *φ*(*u*_{1}). Hall & Marron (1988) proposed a class of univariate kernels of order *r*:

$$K({u}_{1})={\pi}^{-1}{\int}_{0}^{\infty}\cos(t{u}_{1})\exp(-{t}^{r})\,dt.$$

Some higher-order polynomial kernels can be found in Müller (1984) and Gasser, Müller & Mammitzsch (1985).
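As a quick numerical check of the Gaussian-based formula above, the following Python sketch evaluates the case *r* = 2, for which *φ*^{(3)}(*u*) = (3*u* − *u*^{3})*φ*(*u*) and the kernel reduces to *K*(*u*) = (3 − *u*^{2})*φ*(*u*)/2. The function names and the integration grid are illustrative choices and are not part of the original article.

```python
import numpy as np

def gaussian_kernel_order4(u):
    """Gaussian-based kernel with r = 2: K(u) = 0.5 * (3 - u^2) * phi(u)."""
    u = np.asarray(u, dtype=float)
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # standard normal density
    return 0.5 * (3.0 - u**2) * phi

def kernel_moment(kernel, order, lim=12.0, num=240001):
    """Approximate the integral of u^order * K(u) by a Riemann sum on [-lim, lim]."""
    u = np.linspace(-lim, lim, num)
    du = u[1] - u[0]
    return float(np.sum(u**order * kernel(u)) * du)

# Moments of order 0, 1, 2, 3 and 4 of the kernel.
moments = [kernel_moment(gaussian_kernel_order4, j) for j in range(5)]
```

The zeroth moment is 1, the first three moments vanish, and the fourth moment is nonzero, which is the defining property of a fourth-order kernel. The product of two such univariate factors gives the bivariate kernel in (10).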

Define *K _{h}*(·) =

$$\widehat{\rho}(w)=\frac{{\sum}_{i=1}^{n}{\xi}_{i}{K}_{h}({w}_{1}-{W}_{1i})I({W}_{2i}={w}_{2})}{{\sum}_{i=1}^{n}{K}_{h}({w}_{1}-{W}_{1i})I({W}_{2i}={w}_{2})},$$

(3)

where *w* = (*w*_{1}*, w*_{2}). The choice of the kernel function *K* usually has little effect on the estimator *ρ̂*(*w*), and thus on the estimator of *β*_{0}, but the bandwidth sequence *h* typically does influence these estimators, both theoretically and practically. We assume that *h* satisfies *nh*^{2*r*} → 0 and

$$\widehat{\pi}(w)=\frac{{\sum}_{i=1}^{n}{\xi}_{i}{\delta}_{i}{K}_{h}({w}_{1}-{W}_{1i})I({W}_{2i}={w}_{2})}{{\sum}_{i=1}^{n}{\xi}_{i}{K}_{h}({w}_{1}-{W}_{1i})I({W}_{2i}={w}_{2})}.$$

(4)
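The kernel estimators (3) and (4) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes a product Gaussian kernel, the scaling *K*_{h}(*u*) = *K*(*u*/*h*) (normalizing constants cancel in both ratios), a single discrete component *W*_{2}, and evaluation only at the observed points *W*_{i}. The function name `kernel_probabilities` is hypothetical.

```python
import numpy as np

def kernel_probabilities(W1, W2, xi, delta, h):
    """Kernel estimates of rho(W_i) and pi(W_i) at the observed points.

    W1: (n, d) continuous components; W2: (n,) discrete component;
    xi: 1 if the censoring indicator is observed, else 0;
    delta: censoring indicator (ignored wherever xi == 0);
    h: scalar or length-d bandwidth (assumes K_h(u) = K(u / h)).
    """
    xi = np.asarray(xi, dtype=float)
    n = xi.shape[0]
    W1 = np.asarray(W1, dtype=float).reshape(n, -1)
    W2 = np.asarray(W2)
    # Zero out delta where it is missing so it never enters the sums.
    d_obs = np.where(xi == 1, np.asarray(delta, dtype=float), 0.0)

    # Product Gaussian kernel evaluated on all pairwise scaled differences.
    u = (W1[:, None, :] - W1[None, :, :]) / np.asarray(h, dtype=float)
    K = np.exp(-0.5 * np.sum(u**2, axis=2))
    # Indicator I(W_2i = w_2): only subjects in the same stratum contribute.
    K = K * (W2[:, None] == W2[None, :])

    rho_hat = (K @ xi) / K.sum(axis=1)        # estimator (3)
    # Estimator (4); undefined (0/0) if a stratum has no complete cases.
    pi_hat = (K @ (xi * d_obs)) / (K @ xi)
    return rho_hat, pi_hat
```

The pairwise kernel matrix costs O(*n*^{2}) memory, which is acceptable for sample sizes of a few hundred such as those in this article.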

Note that the kernel function *K* and bandwidth sequence *h* used in (3) need not be identical to those used in (4), and the bandwidth can be different for each component of *W*_{1*i*}. For example, we can define

Let
${\mathrm{\Lambda}}_{0}(t)={\int}_{0}^{t}{\lambda}_{0}(s)ds$ denote the baseline cumulative hazard function. Using the inverse probability weighted approach, consider the following estimating equations for *β*_{0} and Λ_{0}:

$$\sum _{i=1}^{n}{\int}_{0}^{\tau}\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}{Z}_{i}\left[d{N}_{i}^{u}(t)-{Y}_{i}(t){\beta}^{\prime}{Z}_{i}dt-{Y}_{i}(t)d{\mathrm{\Lambda}}_{0}(t)\right]=0,$$

(5)

$$\sum _{i=1}^{n}\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}\left[d{N}_{i}^{u}(t)-{Y}_{i}(t){\beta}^{\prime}{Z}_{i}dt-{Y}_{i}(t)d{\mathrm{\Lambda}}_{0}(t)\right]=0,$$

(6)

where
${N}_{i}^{u}(t)=I({X}_{i}\le t,{\delta}_{i}=1)$, *Y _{i}*(

$$\widehat{\beta}={\left[\sum _{i=1}^{n}{\int}_{0}^{\tau}\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}{Y}_{i}(t){\{{Z}_{i}-\overline{Z}(t)\}}^{\otimes 2}dt\right]}^{-1}\sum _{i=1}^{n}{\int}_{0}^{\tau}\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}\{{Z}_{i}-\overline{Z}(t)\}d{N}_{i}^{u}(t)$$

and

$${\widehat{\mathrm{\Lambda}}}_{0}(t)={\int}_{0}^{t}\frac{{\sum}_{i=1}^{n}\widehat{\rho}{({W}_{i})}^{-1}{\xi}_{i}[d{N}_{i}^{u}(s)-{Y}_{i}(s){\widehat{\beta}}^{\prime}{Z}_{i}ds]}{{\sum}_{i=1}^{n}\widehat{\rho}{({W}_{i})}^{-1}{\xi}_{i}{Y}_{i}(s)},$$

where *a*^{⊗2} = *aa*′ for any vector *a* and

$$\overline{Z}(t)=\frac{{\sum}_{i=1}^{n}\widehat{\rho}{({W}_{i})}^{-1}{\xi}_{i}{Y}_{i}(t){Z}_{i}}{{\sum}_{i=1}^{n}\widehat{\rho}{({W}_{i})}^{-1}{\xi}_{i}{Y}_{i}(t)}.$$

In practice, we often choose *τ* to be the largest observation time, say *τ* = max{*X _{i}*}.
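The closed form for the simple weighted estimator can be computed exactly because the weighted mean of *Z* over the risk set is piecewise constant between ordered observation times. The Python sketch below assumes the usual at-risk process *Y*_{i}(*t*) = *I*(*X*_{i} ≥ *t*), takes *τ* = max{*X*_{i}}, and expects the weights *ρ̂*(*W*_{i}) to be supplied; the function name is hypothetical and the code is an illustration rather than the authors' implementation.

```python
import numpy as np

def simple_weighted_beta(X, Z, xi, delta, rho_hat):
    """Simple (inverse probability) weighted estimator of beta_0."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = np.asarray(Z, dtype=float).reshape(n, -1)
    xi = np.asarray(xi, dtype=float)
    # delta only matters where xi == 1; missing values are zeroed out.
    d_obs = np.where(xi == 1, np.asarray(delta, dtype=float), 0.0)
    w = xi / np.asarray(rho_hat, dtype=float)   # weights xi_i / rho_hat(W_i)

    p = Z.shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    prev = 0.0
    for t in np.unique(X):                      # sorted distinct times
        at_risk = X >= t                        # Y_i(s) = 1 on (prev, t]
        wr = w[at_risk]
        if wr.sum() > 0:                        # skip if no complete case at risk
            zbar = wr @ Z[at_risk] / wr.sum()   # weighted mean Zbar(t)
            C = Z[at_risk] - zbar
            A += (t - prev) * (C * wr[:, None]).T @ C
            ev = X == t
            b += (w[ev] * d_obs[ev]) @ (Z[ev] - zbar)
        prev = t
    return np.linalg.solve(A, b)
```

For the toy data *X* = (1, 2), *Z* = (0, 1) with both indicators observed and equal to 1, the function returns −1, which can be verified by hand from the closed form.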

Let (*t*) = *E*[*Y _{i}*(

Under regularity conditions (C1)–(C6), which are stated in the Appendix, β̂ is consistent and n^{1/2}(β̂ − β_{0}) is asymptotically normal with mean zero and covariance matrix V = A^{−1}ΣA^{−1} + A^{−1}Σ^{*}A^{−1}, where

$$\begin{array}{l}\mathrm{\sum}=E\left[{\int}_{0}^{\tau}{\{{Z}_{i}-\overline{z}(t)\}}^{\otimes 2}d{N}_{i}^{u}(t)\right],\\ {\mathrm{\sum}}^{\ast}=E\left[\pi ({W}_{i})(1-\pi ({W}_{i}))(1-\rho ({W}_{i}))\rho {({W}_{i})}^{-1}{B}_{i}^{\otimes 2}\right],\end{array}$$

and

$$A=E\left[{\int}_{0}^{\tau}{Y}_{i}(t){\{{Z}_{i}-\overline{z}(t)\}}^{\otimes 2}dt\right].$$

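Note that the first term in *V* is the asymptotic variance of the Lin & Ying (1994) estimator based only on the complete data (*ξ*_{*i*} ≡ 1) and the second term represents the effect of the missing censoring indicators. If we let
${\widehat{B}}_{i}={\int}_{0}^{\tau}\{{Z}_{i}-\overline{Z}(t)\}d{N}_{i}(t)$, then the covariance matrix *V* can be consistently estimated by $\widehat{V}={\widehat{A}}^{-1}\widehat{\Sigma}{\widehat{A}}^{-1}+{\widehat{A}}^{-1}{\widehat{\Sigma}}^{\ast}{\widehat{A}}^{-1}$, where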

$$\begin{array}{l}\widehat{\mathrm{\sum}}={n}^{-1}\sum _{i=1}^{n}{\int}_{0}^{\tau}\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}{\{{Z}_{i}-\overline{Z}(t)\}}^{\otimes 2}d{N}_{i}^{u}(t),\\ {\widehat{\mathrm{\sum}}}^{\ast}={n}^{-1}\sum _{i=1}^{n}\widehat{\pi}({W}_{i})(1-\widehat{\pi}({W}_{i}))(1-\widehat{\rho}({W}_{i}))\widehat{\rho}{({W}_{i})}^{-1}{\widehat{B}}_{i}^{\otimes 2},\end{array}$$

and

$$\widehat{A}={n}^{-1}\sum _{i=1}^{n}{\int}_{0}^{\tau}\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}{Y}_{i}(t){\{{Z}_{i}-\overline{Z}(t)\}}^{\otimes 2}dt.$$
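The plug-in quantities above combine into a sandwich estimate that mirrors the form *V* = *A*^{−1}Σ*A*^{−1} + *A*^{−1}Σ^{*}*A*^{−1}. The Python sketch below is a hedged illustration: it assumes *N*_{i}(*t*) = *I*(*X*_{i} ≤ *t*) and *τ* = max{*X*_{i}}, so that each integral against *dN*_{i} reduces to the residual *Z*_{i} − *Z̄*(*X*_{i}), and it expects those residuals and the matrix *Â* to be computed beforehand. The function and argument names are hypothetical.

```python
import numpy as np

def sandwich_covariance(A_hat, E, xi, delta, rho_hat, pi_hat):
    """Plug-in estimate A^{-1} (Sigma + Sigma*) A^{-1}.

    A_hat: (p, p) matrix A-hat (already divided by n);
    E: (n, p) array whose rows are Z_i - Zbar(X_i);
    xi, delta, rho_hat, pi_hat: length-n arrays as in the text.
    """
    E = np.asarray(E, dtype=float)
    n = E.shape[0]
    xi = np.asarray(xi, dtype=float)
    d_obs = np.where(xi == 1, np.asarray(delta, dtype=float), 0.0)
    rho_hat = np.asarray(rho_hat, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)

    outer = E[:, :, None] * E[:, None, :]       # (n, p, p) outer products
    # Sigma-hat: weighted outer products over observed uncensored cases.
    sigma = np.einsum('i,ijk->jk', xi / rho_hat * d_obs, outer) / n
    # Sigma*-hat: extra variability due to missing censoring indicators.
    extra = pi_hat * (1.0 - pi_hat) * (1.0 - rho_hat) / rho_hat
    sigma_star = np.einsum('i,ijk->jk', extra, outer) / n

    A_inv = np.linalg.inv(np.asarray(A_hat, dtype=float))
    return A_inv @ (sigma + sigma_star) @ A_inv
```

Because the limiting result is stated for *n*^{1/2}(β̂ − *β*_{0}), standard errors for the coefficients would be the square roots of the diagonal of the returned matrix divided by *n*.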

Define $d(t)={\int}_{0}^{t}\overline{z}(s)ds$ and

$$D(t)={\int}_{0}^{t}\left\{\frac{E[{Y}_{i}(s){Z}_{i}^{\otimes 2}]}{E[{Y}_{i}(s)]}-\overline{z}{(s)}^{\otimes 2}\right\}{\beta}_{0}ds.$$

The asymptotic properties of Λ̂_{0}(*t*) are given in the next theorem.

Under the assumptions of Theorem 1, Λ̂_{0}(t) converges in probability to Λ_{0}(t) uniformly in t ∈ [0, τ], and n^{1/2}{Λ̂_{0}(t) − Λ_{0}(t)} converges weakly on [0, τ] to a zero-mean Gaussian process with covariance function at (t, s) (t ≤ s) equal to

$$\begin{array}{l}\mathrm{\Gamma}(t,s)={\int}_{0}^{t}\frac{dE\{{N}_{i}^{u}(u)\}}{{(E\{{Y}_{i}(u)\})}^{2}}+E\left\{\pi ({W}_{i})(1-\pi ({W}_{i}))(1-\rho ({W}_{i}))\rho {({W}_{i})}^{-1}{\int}_{0}^{t}\frac{d{N}_{i}(u)}{{(E\{{Y}_{i}(u)\})}^{2}}\right\}-d{(t)}^{\prime}{A}^{-1}\left\{\pi ({W}_{i})(1-\pi ({W}_{i}))(1-\rho ({W}_{i}))\rho {({W}_{i})}^{-1}{\int}_{0}^{s}\frac{({Z}_{i}-\overline{z}(u))d{N}_{i}(u)}{E\{{Y}_{i}(u)\}}\right\}\\ -d{(s)}^{\prime}{A}^{-1}\left\{\pi ({W}_{i})(1-\pi ({W}_{i}))(1-\rho ({W}_{i}))\rho {({W}_{i})}^{-1}{\int}_{0}^{t}\frac{({Z}_{i}-\overline{z}(u))d{N}_{i}(u)}{E\{{Y}_{i}(u)\}}\right\}+d{(t)}^{\prime}{A}^{-1}(\mathrm{\sum}+{\mathrm{\sum}}^{\ast}){A}^{-1}d(s)-d{(t)}^{\prime}{A}^{-1}D(s)-d{(s)}^{\prime}{A}^{-1}D(t).\end{array}$$

The covariance function Γ(*t, s*) can be consistently estimated by substituting *β̂*, *ρ̂* and *π̂* for the unknowns *β*_{0}, *ρ* and *π* in the appropriate empirical estimators, and by replacing the (unobserved) processes
${N}_{i}^{u}$ with
$\widehat{\rho}{({W}_{i})}^{-1}{\xi}_{i}{N}_{i}^{u}$. For an individual with a given covariate vector *z*_{0}, the corresponding estimator of the survival function *S*(*t, z*_{0}) is

$$\widehat{S}(t,{z}_{0})=exp\{-{\widehat{\mathrm{\Lambda}}}_{0}(t)-{\widehat{\beta}}^{\prime}{z}_{0}t\}.$$

Using the functional delta-method and Theorem 2, we can obtain the asymptotic properties of *Ŝ*(*t, z*_{0}), which can be applied to construct confidence bands for *S*(*t, z*_{0}).
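To make the survival estimator concrete, the Python sketch below evaluates the weighted baseline cumulative hazard and the resulting survival estimate at the distinct observed times. It is a simplified illustration under the same assumptions as before (*Y*_{i}(*t*) = *I*(*X*_{i} ≥ *t*), weights *ξ*_{i}/*ρ̂*(*W*_{i}) supplied by the caller); the function names are hypothetical, and no confidence bands are computed.

```python
import numpy as np

def baseline_cumhaz(X, Z, xi, delta, rho_hat, beta):
    """Weighted baseline cumulative hazard at the distinct observed times."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = np.asarray(Z, dtype=float).reshape(n, -1)
    xi = np.asarray(xi, dtype=float)
    d_obs = np.where(xi == 1, np.asarray(delta, dtype=float), 0.0)
    w = xi / np.asarray(rho_hat, dtype=float)
    beta = np.atleast_1d(np.asarray(beta, dtype=float))

    times = np.unique(X)
    cumhaz = np.zeros(times.shape[0])
    total, prev = 0.0, 0.0
    for k, t in enumerate(times):
        at_risk = X >= t
        s0 = w[at_risk].sum()                   # weighted size of the risk set
        if s0 > 0:
            zbar = w[at_risk] @ Z[at_risk] / s0
            jump = (w * d_obs)[X == t].sum() / s0
            # Jump from weighted events minus the linear drift beta' Zbar.
            total += jump - (t - prev) * float(beta @ zbar)
        prev = t
        cumhaz[k] = total
    return times, cumhaz

def survival_at_times(times, cumhaz, beta, z0):
    """S-hat(t, z0) = exp{-Lambda0-hat(t) - beta' z0 t} at the given times."""
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    z0 = np.atleast_1d(np.asarray(z0, dtype=float))
    return np.exp(-cumhaz - float(beta @ z0) * times)
```

As with other additive hazards fits, the estimated cumulative hazard is not guaranteed to be monotone in finite samples, so the output is best inspected before use.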

When the missingness probability *ρ*(*w*) is known or a parametric model is specified for *ρ*(*w*), the simple weighted estimator uses only the complete case data (i.e., only individuals with *ξ _{i}* = 1), and the fully augmented weighted estimator (also called the double robust estimator) incorporates contributions from the incomplete observations (i.e., individuals with

The fully augmented weighted estimators for *β*_{0} and Λ_{0} are the solutions to the following estimating equations:

$$\sum _{i=1}^{n}{\int}_{0}^{\tau}{Z}_{i}\left[\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}d{N}_{i}^{u}(t)+\left(1-\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}\right)\widehat{\pi}({W}_{i})d{N}_{i}(t)-{Y}_{i}(t){\beta}^{\prime}{Z}_{i}dt-{Y}_{i}(t)d{\mathrm{\Lambda}}_{0}(t)\right]=0,$$

(7)

$$\sum _{i=1}^{n}\left[\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}d{N}_{i}^{u}(t)+\left(1-\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}\right)\widehat{\pi}({W}_{i})d{N}_{i}(t)-{Y}_{i}(t){\beta}^{\prime}{Z}_{i}dt-{Y}_{i}(t)d{\mathrm{\Lambda}}_{0}(t)\right]=0.$$

(8)

The resulting fully augmented weighted estimators for *β*_{0} and Λ_{0} have the following closed forms:

$${\widehat{\beta}}_{a}={\left[\sum _{i=1}^{n}{\int}_{0}^{\tau}{Y}_{i}(t){\{{Z}_{i}-{\overline{Z}}^{\ast}(t)\}}^{\otimes 2}dt\right]}^{-1}\sum _{i=1}^{n}{\int}_{0}^{\tau}\{{Z}_{i}-{\overline{Z}}^{\ast}(t)\}\times \left[\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}d{N}_{i}^{u}(t)+\left(1-\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}\right)\widehat{\pi}({W}_{i})d{N}_{i}(t)\right]$$

and

$${\widehat{\mathrm{\Lambda}}}_{a}(t)={\int}_{0}^{t}\frac{{\sum}_{i=1}^{n}\left[\widehat{\rho}{({W}_{i})}^{-1}{\xi}_{i}d{N}_{i}^{u}(s)+(1-\widehat{\rho}{({W}_{i})}^{-1}{\xi}_{i})\widehat{\pi}({W}_{i})d{N}_{i}(s)-{Y}_{i}(s){\widehat{\beta}}_{a}^{\prime}{Z}_{i}ds\right]}{{\sum}_{i=1}^{n}{Y}_{i}(s)},$$

where

$${\overline{Z}}^{\ast}(t)=\frac{{\sum}_{i=1}^{n}{Z}_{i}{Y}_{i}(t)}{{\sum}_{i=1}^{n}{Y}_{i}(t)}.$$
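The fully augmented closed form differs from the simple weighted one in two ways: the risk-set mean is unweighted, and every subject contributes an event mass that mixes the observed indicator with the kernel estimate *π̂*(*W*_{i}). The Python sketch below illustrates this under the assumptions *Y*_{i}(*t*) = *I*(*X*_{i} ≥ *t*), *N*_{i}(*t*) = *I*(*X*_{i} ≤ *t*) and *τ* = max{*X*_{i}}; the function name is hypothetical.

```python
import numpy as np

def augmented_beta(X, Z, xi, delta, rho_hat, pi_hat):
    """Fully augmented weighted estimator of beta_0."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = np.asarray(Z, dtype=float).reshape(n, -1)
    xi = np.asarray(xi, dtype=float)
    d_obs = np.where(xi == 1, np.asarray(delta, dtype=float), 0.0)
    w = xi / np.asarray(rho_hat, dtype=float)
    # Event mass at X_i: weighted observed indicator plus the augmentation
    # term (1 - xi/rho_hat) * pi_hat, which is nonzero for every subject.
    mass = w * d_obs + (1.0 - w) * np.asarray(pi_hat, dtype=float)

    p = Z.shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    prev = 0.0
    for t in np.unique(X):
        at_risk = X >= t
        zbar = Z[at_risk].mean(axis=0)          # unweighted mean Zbar*(t)
        C = Z[at_risk] - zbar
        A += (t - prev) * C.T @ C
        ev = X == t
        b += mass[ev] @ (Z[ev] - zbar)
        prev = t
    return np.linalg.solve(A, b)
```

When all indicators are observed and *ρ̂* ≡ 1, the augmentation term vanishes and the function reduces to the Lin & Ying (1994) closed form.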

Similar to Theorems 1 and 2, the asymptotic properties of * _{a}* and

Under the assumptions of Theorem 1, we have:

- (i) β̂_{a} is consistent and n^{1/2}(β̂_{a} − β_{0}) is asymptotically normal with mean zero and covariance matrix V.
- (ii) Λ̂_{a}(t) converges in probability to Λ_{0}(t) uniformly in t ∈ [0, τ], and n^{1/2}{Λ̂_{a}(t) − Λ_{0}(t)} converges weakly on [0, τ] to a zero-mean Gaussian process with covariance function Γ(t, s) at (t, s) (t ≤ s).

For the fully augmented weighted method, the covariance matrix *V* and covariance function Γ(*t, s*) can be consistently estimated by substituting * _{a}*, and for

Theorems 1, 2 and 3 show that both the simple and fully augmented weighted estimators have the same asymptotic normal distribution, and the resulting estimators of the baseline cumulative hazard function converge to the same Gaussian process. This means that the simple weighted estimators with nonparametric *ρ̂*(*w*) are as efficient as the kernel-assisted fully augmented weighted estimators. One intuitive explanation for this is that the incomplete observations are indirectly incorporated in the simple weighted estimator by using the inverse of *ρ̂*(*w*) as a weight.

Note that _{0}(*t*) and * _{a}*(

We conducted simulation studies to examine and compare the finite-sample performance of the simple and fully augmented weighted estimators proposed in Section 2, and also to compare their performance with that of the full data and complete-case analyses under the MAR model. In these studies, we considered three situations for the covariate *Z*: (a) *Z* was assumed to follow a Bernoulli distribution with success probability 0.5; (b) *Z* was generated from a uniform distribution on (0*,* 1); (c) *Z* = (*Z*_{1}*, Z*_{2})′*,* where *Z*_{1} follows a uniform distribution on (0*,* 1) and *Z*_{2} follows a Bernoulli distribution with success probability 0.5. The underlying additive hazards model for the failure time *T* was taken to be
$\lambda (t\mid Z)=1+{\beta}_{0}^{\prime}Z$, where *β*_{0} = 0, 0.5 and 1 when *Z* is a scalar, and *β*_{0} = (0, 0)′ and *β*_{0} = (1, −1)′ for the two-dimensional covariate. The censoring time *C* was generated from a uniform distribution on (0*, c*), where *c* was selected to give a censoring rate of either 15% or 55%.

The missingness indicators were generated from the logistic model

$$\rho (W)=\frac{exp({\theta}^{\prime}W)}{1+exp({\theta}^{\prime}W)},$$

(9)

where *W* = (*X, Z*), *X* = *T* ∧ *C,* and *θ* was chosen to produce a missingness rate of 50% under each censoring level. When *Z* was a Bernoulli random variable, there was only one (*d* = 1) continuous element in *W*, and we used the univariate Gaussian kernel function *K*(*u*) = (2*π*)^{−1/2} exp(−*u*^{2}*/*2) and a bandwidth of *h* = 0.5*n*^{−1/3}, with a sample size of *n* = 100. When *Z* was a uniform random variable or a two-dimensional covariate as in (c), there were two (*d* = 2) continuous elements in *W*, and we used the bivariate Gaussian-based kernel function of order 4 (Wand & Schucany 1990)

$$K({u}_{1},{u}_{2})=\frac{1}{8\pi}(3-{u}_{1}^{2})(3-{u}_{2}^{2})exp(-({u}_{1}^{2}+{u}_{2}^{2})/2)$$

(10)

and a bandwidth vector of *h* = (*h*_{1}*, h*_{2})′ = (1.5*n*^{−1/5}*, n*^{−1/5})′, with a sample size of *n* = 400. We took *τ* to be the largest observed value of *X,* so that all data were used in the analysis. All simulation studies were based on 1000 replications for each combination of parameters.
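One replication of design (a) could be generated along the following lines. Under the model *λ*(*t* | *Z*) = 1 + *β*_{0}*Z* the hazard is constant in *t*, so *T* is exponential with rate 1 + *β*_{0}*Z*. The values of `c` and `theta` below are placeholders chosen only for illustration; the article selected them to reach target censoring and missingness rates and does not report them here. The function name is likewise hypothetical.

```python
import numpy as np

def simulate_dataset(n, beta0, c, theta, rng):
    """One simulated data set with a Bernoulli(0.5) covariate, design (a)."""
    Z = rng.binomial(1, 0.5, size=n).astype(float)
    # Constant hazard 1 + beta0 * Z  =>  exponential failure times.
    T = rng.exponential(1.0 / (1.0 + beta0 * Z))
    C = rng.uniform(0.0, c, size=n)             # censoring times on (0, c)
    X = np.minimum(T, C)
    delta = (T <= C).astype(float)

    # Logistic model (9) for the probability that delta is observed.
    W = np.column_stack([X, Z])
    rho = 1.0 / (1.0 + np.exp(-(W @ theta)))
    xi = rng.binomial(1, rho).astype(float)

    # Hide the censoring indicator wherever xi == 0.
    delta_obs = np.where(xi == 1, delta, np.nan)
    return X, Z, xi, delta_obs

rng = np.random.default_rng(2010)
X, Z, xi, delta_obs = simulate_dataset(
    n=100, beta0=0.5, c=3.0, theta=np.array([0.2, -0.3]), rng=rng
)
```

Because missingness depends only on the observed *W* = (*X, Z*) and not on *δ*, this mechanism is MAR by construction.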

Our simulation results are summarized in Tables 1 and 2. In these tables, Bias is the sample mean of the estimate minus the true value; MSE is the sample mean of the squared differences between the estimate and the true value; and CP is the 95% empirical coverage probability for *β*_{0} based on a normal approximation. Similar summaries for the full-data and complete-case estimators are calculated for comparison.
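The three summary measures can be computed from the replicated estimates as in the short Python sketch below. The function name and argument layout are illustrative assumptions; the coverage calculation uses the normal-approximation interval, estimate ± 1.96 × standard error.

```python
import numpy as np

def summarize(estimates, std_errors, true_value, z=1.959964):
    """Return (Bias, MSE, CP) over simulation replications."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    bias = est.mean() - true_value                  # mean estimate minus truth
    mse = np.mean((est - true_value) ** 2)          # mean squared error
    covered = np.abs(est - true_value) <= z * se    # does the 95% CI cover?
    return bias, mse, covered.mean()
```

For a vector-valued *β*_{0}, the function would be applied to each component separately.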

Tables 1 and 2 show that the complete case estimator is highly biased in all situations, with coverage probabilities that are too small, whereas the simple and fully augmented weighted estimators are nearly unbiased, with very reasonable coverage probabilities. Furthermore, the simple and fully augmented weighted estimators have similar MSE values, which are only slightly larger than those of the full data estimator and are often much smaller than those of the complete case estimator. These results suggest that our proposed estimators are more efficient than the complete case estimator and are adequate for practical use. We also simulated data under different parameter configurations and obtained similar results.

We compared the proposed methods and the parametric approach of Lu & Liang (2008) under MAR and MCAR assumptions. Data were simulated under correctly and incorrectly specified parametric models, using the same setup as in Table 1 with a censoring rate of 55% and a missingness rate of 50%, where *Z* follows a Bernoulli distribution with a sample size of *n* = 200 and *β*_{0} = 0 and 1. The results are presented in Table 3. In Table 3, LIPW1 and LIPW2 stand for the inverse probability weighted (IPW) estimators of Lu & Liang (2008) when using the logistic model and the constant model for *ρ*(*w*), respectively; LDR1 and LDR2 stand for the double robust (DR) estimators of Lu & Liang (2008) when using the logistic model and the constant model for *ρ*(*w*), respectively. In all cases, we used a constant model for *π*(*w*), which is misspecified.

Table 3. Comparison of the proposed method with the parametric approach of Lu & Liang (2008) under MAR and MCAR for a missingness rate of 50%.

It can be seen from Table 3 that the proposed methods are essentially unbiased in all the settings, and the parametric approach of Lu & Liang (2008) is also unbiased when the parametric model for *ρ*(*w*) is correctly specified. Furthermore, the proposed estimators are as efficient as the DR estimator of Lu & Liang (2008), and are more efficient than the IPW estimator of Lu & Liang (2008). When both *ρ*(*w*) and *π*(*w*) are misspecified, however, both the IPW and DR estimators of Lu & Liang (2008) are biased under MAR. The key advantage of our method is that it provides reasonable estimation without making parametric modeling assumptions about *ρ*(*w*) and *π*(*w*). Rather than assuming parametric models, our approach uses nonparametric smoothing techniques to estimate these probabilities. In addition, the proposed estimators are more efficient than the complete case estimator under MCAR. So, if MCAR is true, our proposed approach still works well and does not lose efficiency.

We also conducted simulation studies to examine the performance of the proposed methods when MNAR (missing not at random) is true. In the study, the setup was the same as in Table 1, where *Z* follows a uniform distribution on (0, 1) with *β*_{0} = 0 and *n* = 400, except that the censoring rate was set to be 20%, and the missingness probability was given by

$$\rho (W,\delta )=\frac{exp({\theta}_{1}^{\prime}W+{\theta}_{2}\delta )}{1+exp({\theta}_{1}^{\prime}W+{\theta}_{2}\delta )},$$

where *θ*_{1} and *θ*_{2} were chosen to produce a missingness rate of either 20% or 50%. The results are summarized in Table 4. It can be seen from Table 4 that the proposed estimation procedures perform well when the missingness rate is low (say, 20%), but when the missingness rate is high (say, 50%), the proposed estimators are a little biased. However, the biases are relatively small compared to those of the complete case estimator.

We applied our methods to the brain cancer data mentioned earlier. We analyzed the data on all 387 patients who entered the clinical trial with a form of brain cancer known as glioblastoma. Dinse (1982) used a subset of these data to illustrate his nonparametric maximum likelihood analysis, which did not account for covariates. All patients were ambulatory when they entered the trial, but over time some lost their mobility, some had a progression of their cancer, and some experienced both events. As a measure of quality of life, we defined “survival time” as the time to non-ambulatory progression, and we evaluated the effects of sex and age on this event time.

Of the 387 patients, 86 progressed and were non-ambulatory, 24 progressed but were still ambulatory, 220 did not progress by the end of the study, and 57 progressed but had an unknown ambulatory status. Thus, our analysis treated these outcomes as 86 uncensored times, 244 censored times, and 57 times with a missing censoring indicator. There were 144 women and 243 men, ranging in age from 14 to 74 years, and the length of time on study (or until progression) varied from 2 to 1088 days.

Let *X* be the observed time (in days), measured from the beginning of the trial, and let *δ* indicate whether the patient had progressed and was non-ambulatory. We defined *Z*_{1} to be a binary indicator of the patient’s sex, which was 1 for men and 0 for women, and *Z*_{2} to be the age at trial entry (in years), which was treated as a continuous covariate. Since *W* = (*X, Z*_{1}*, Z*_{2}) contains two continuous elements, we used the bivariate Gaussian-based kernel function of order 4 for *K*, as defined in (10), with a bandwidth vector of *h* = (*h*_{1}*, h _{2}*)′ = (34, 10)′. We used

The analysis of the brain cancer data is summarized in Table 5, which gives the results for our simple weighted estimator (SWE) and our fully augmented weighted estimator (FAWE). For comparison, Table 5 also gives the results of the complete case (CC) analysis. None of the three methods suggested that men and women had different hazard rates for non-ambulatory progression. On the other hand, our two estimators showed that age is important (*p* = 0.037 for SWE and *p* = 0.011 for FAWE), but the CC analysis did not (*p* = 0.367). Specifically, the hazard rate for non-ambulatory progression increased as patients grew older, which is consistent with worsening quality of life. The age coefficients were of similar magnitude for all three methods, but the standard error was much larger for the CC analysis than for our SWE and FAWE analyses. Thus, as a result of excluding data, the complete case analysis missed the age effect on non-ambulatory progression that our approaches appropriately identified.

Model (1) has the limitation that the linear predictor
${\beta}_{0}^{\prime}Z$ needs to be constrained to ensure non-negativity for the right side of (1). One may avoid this constraint by using a nonnegative link function, such as
${\lambda}_{0}(t)+exp({\beta}_{0}^{\prime}Z)$. The ideas presented in this paper can be applied to any regression function
$g({\beta}_{0}^{\prime}Z)$, where *g*(·) is a known link function. In addition, our approach can be extended to incorporate missing covariates (Qi, Wang & Prentice 2005) in the situation where both the failure indicators and the covariates are partially observed.

Nonparametric kernel estimation can be done for a small number of continuous covariates, but for categorical covariates, it would usually require stratified kernel estimation within each stratum defined by the categorical covariates. In practice, when there are too many categories, it may be desirable to specify a more flexible model for the missingness probability, such as a partially linear additive model, and then use local kernel regression to estimate the missingness probability. Here we focus on a kernel estimation approach for *ρ*(*w*) and *π*(*w*). Of course, other smoothing techniques such as the local polynomial method (Fan & Gijbels 1996) may be used and require the same assumptions. Furthermore, *n*^{1/2}-rate asymptotic normality of the proposed estimators indicates that an appropriate choice for the bandwidth sequence *h* depends only on the second order terms of the mean square error of the estimators, and thus bandwidth selection may not be critical for estimating *β*_{0} and Λ_{0}.

Since the estimating functions in (5) to (8) were obtained in a somewhat ad hoc fashion, it might be worthwhile investigating possible improvements that could result from other approaches, such as the one suggested by McKeague & Sasieni (1994) or perhaps a nonparametric maximum likelihood approach. Alternatively, estimation procedures based on the general Aalen additive model (Aalen 1980) or the linear transformation model (Gao & Tsiatis 2005) with missing censoring information might also be worthy of investigation.

Another limitation of the approach given here is that the covariates *Z* are time-invariant. In some applications, we might want to incorporate time-dependent covariates. Thus, a more general approach might extend model (1) to a time-varying version:

$$\lambda (t\mid Z(t))={\lambda}_{0}(t)+{\beta}_{0}{(t)}^{\prime}Z(t),$$

where *β*_{0}(*t*) is an unknown *p*-vector of time-varying regression coefficients and *Z*(*t*) is a vector of covariates that may depend on time. However, the proposed estimation procedure cannot be extended in a straightforward manner to deal with time-dependent covariates because of the curse of dimensionality created by *Z*(*t*) and a need for alternative smoothing techniques for estimating *β*_{0}(*t*). In addition, when the dimension of *Z*(*t*) is high, the probabilities *ρ*(*w*) and *π*(*w*) can be modeled parametrically (Lu & Liang 2008). As a different approach, perhaps dimension-reduction techniques could be extended in conjunction with a partially linear model (Liang, Härdle & Carroll 1999) for *ρ*(*w*) and *π*(*w*).

The authors would like to thank the Editor (Paul Gustafson), the Associate Editor, two reviewers and Shyamal Peddada for their constructive and insightful comments and suggestions that greatly improved the paper. Xinyuan Song’s research was fully supported by two grants from the Research Grant Council of the Hong Kong Special Administrative Region. Liuquan Sun’s research was fully supported by the National Natural Science Foundation of China Grants, the National Basic Research Program of China (973 Program) and Key Laboratory of RCSDS, CAS. Gregg Dinse’s research was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences.

We will use the same notation defined in the previous sections and assume that the following regularity conditions hold:

- (C1) Λ_{0}(*τ*) < ∞, *Pr*(*X* ≥ *τ*) > 0, *Z* is bounded, and ${\inf}_{0\le t\le \tau}[{\lambda}_{0}(t)+{\beta}_{0}^{\prime}Z]>0$ a.e.
- (C2) The probability (density) *f*(*w*) of *W*_{*i*} is bounded away from 0, and has *r* continuous and bounded partial derivatives with respect to the continuous components of *W*_{*i*} a.e.
- (C3) The missingness probability *ρ*(*w*) is bounded away from 0, and has *r* continuous and bounded partial derivatives with respect to the continuous components of *W*_{*i*} a.e.
- (C4) The conditional probability *π*(*w*) has *r* continuous and bounded partial derivatives with respect to the continuous components of *W*_{*i*} a.e.
- (C5) $A=E[{\int}_{0}^{\tau}{Y}_{i}(t){\{{Z}_{i}-\overline{z}(t)\}}^{\otimes 2}dt]$ is nonsingular.
- (C6) *nh*^{2*r*} → 0 and *nh*^{2*d*} → ∞, as *n* → ∞.
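As a worked check of condition (C6), suppose that *r* denotes the order of the kernel and that the bandwidth has the form *h* = *cn*^{−*a*} for constants *c* > 0 and *a* > 0 (this parametrization is an illustrative assumption). Then (C6) holds exactly when 1/(2*r*) < *a* < 1/(2*d*). Taking *r* = 2 for the Gaussian kernel and *r* = 4 for kernel (10), the two bandwidth rates used in Section 3 appear to satisfy the condition:

$$\begin{array}{l}d=1,\ r=2,\ h\propto {n}^{-1/3}:\quad n{h}^{2r}\propto {n}^{1-4/3}={n}^{-1/3}\to 0,\quad n{h}^{2d}\propto {n}^{1-2/3}={n}^{1/3}\to \infty ,\\ d=2,\ r=4,\ h\propto {n}^{-1/5}:\quad n{h}^{2r}\propto {n}^{1-8/5}={n}^{-3/5}\to 0,\quad n{h}^{2d}\propto {n}^{1-4/5}={n}^{1/5}\to \infty .\end{array}$$

The constants 0.5, 1.5 and 1 in the simulation bandwidths affect finite-sample behavior but not these limits.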

We give the proof of Theorem 3 and outline the proof of Theorem 1; Theorem 2 can be proven in the same manner. For notational convenience, we assume that all components of *W _{i}* are continuous in the following proof.

Substituting Λ̂_{*a*} into equation (7), we find that

$$U(\beta )=\sum _{i=1}^{n}{\int}_{0}^{\tau}\{{Z}_{i}-{\overline{Z}}^{\ast}(t)\}\phantom{\rule{0.16667em}{0ex}}\left[\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}d{N}_{i}^{u}(t)+\left(1-\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}\right)\widehat{\pi}({W}_{i})d{N}_{i}(t)-{Y}_{i}(t){\beta}^{\prime}{Z}_{i}dt\right].$$

Let ${M}_{i}(t)={N}_{i}^{u}(t)-{\int}_{0}^{t}{Y}_{i}(s)\{{\lambda}_{0}(s)+{\beta}_{0}^{\prime}{Z}_{i}\}ds$. Then we can write

$$U({\beta}_{0})={U}_{1}({\beta}_{0})+{U}_{2}({\beta}_{0})+{U}_{3}({\beta}_{0}),$$

(A.1)

where

$$\begin{array}{l}{U}_{1}({\beta}_{0})=\sum _{i=1}^{n}{\int}_{0}^{\tau}\{{Z}_{i}-\overline{Z}(t)\}d{M}_{i}(t),\\ {U}_{2}({\beta}_{0})=\sum _{i=1}^{n}{\int}_{0}^{\tau}\{{Z}_{i}-\overline{Z}(t)\}\phantom{\rule{0.16667em}{0ex}}\left(\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}-1\right)d{N}_{i}^{u}(t),\end{array}$$

and

$${U}_{3}({\beta}_{0})=\sum _{i=1}^{n}{\int}_{0}^{\tau}\{{Z}_{i}-\overline{Z}(t)\}\phantom{\rule{0.16667em}{0ex}}\left(1-\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}\right)\widehat{\pi}({W}_{i})d{N}_{i}(t).$$

Note that *U*_{1}(*β*_{0}) is a martingale integral. Thus, it follows that

$${U}_{1}({\beta}_{0})=\sum _{i=1}^{n}{\int}_{0}^{\tau}\{{Z}_{i}-\overline{z}(t)\}d{M}_{i}(t)+{o}_{p}({n}^{1/2}).$$

(A.2)

Define
${\mathrm{\Phi}}_{n}(t)={n}^{-1}{\sum}_{i=1}^{n}(\widehat{\rho}{({W}_{i})}^{-1}{\xi}_{i}-1){N}_{i}^{u}(t)$, and write $\Phi_n(t) = \Phi_{n1}(t) + \Phi_{n2}(t) + \Phi_{n3}(t)$, where

$$\begin{array}{l}{\mathrm{\Phi}}_{n1}(t)={n}^{-1}\sum _{i=1}^{n}\left({\xi}_{i}-\rho ({W}_{i})\right)\frac{{N}_{i}^{u}(t)}{\rho ({W}_{i})},\\ {\mathrm{\Phi}}_{n2}(t)={n}^{-1}\sum _{i=1}^{n}\left(\rho ({W}_{i})-\widehat{\rho}({W}_{i})\right)\frac{{N}_{i}^{u}(t)}{\rho ({W}_{i})},\end{array}$$

and

$${\mathrm{\Phi}}_{n3}(t)={n}^{-1}\sum _{i=1}^{n}\left({\xi}_{i}-\widehat{\rho}({W}_{i})\right)\phantom{\rule{0.16667em}{0ex}}\left(\rho ({W}_{i})-\widehat{\rho}({W}_{i})\right)\frac{{N}_{i}^{u}(t)}{\widehat{\rho}({W}_{i})\rho ({W}_{i})}.$$
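As a check on this decomposition, the three coefficients of $N_i^u(t)$ in $\Phi_{n1}$, $\Phi_{n2}$ and $\Phi_{n3}$ sum to the coefficient in $\Phi_n$. Writing $\rho_i = \rho(W_i)$ and $\widehat{\rho}_i = \widehat{\rho}(W_i)$ (shorthand introduced only for this verification),

$$\frac{\xi_i-\rho_i}{\rho_i}+\frac{\rho_i-\widehat{\rho}_i}{\rho_i}+\frac{(\xi_i-\widehat{\rho}_i)(\rho_i-\widehat{\rho}_i)}{\widehat{\rho}_i\rho_i}=\frac{\xi_i-\widehat{\rho}_i}{\rho_i}\left(1+\frac{\rho_i-\widehat{\rho}_i}{\widehat{\rho}_i}\right)=\frac{\xi_i-\widehat{\rho}_i}{\widehat{\rho}_i}=\frac{\xi_i}{\widehat{\rho}_i}-1.$$

The first term isolates the sampling variation in $\xi_i$, the second the estimation error in $\widehat{\rho}$, and the third is a product of the two.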

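By the uniform strong law of large numbers (Pollard, 1990), $\sup_{0\le t\le \tau}|\Phi_{n1}(t)| \to 0$ almost surely. By the definition of $\widehat{\rho}$, the second term can be written as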

$${\mathrm{\Phi}}_{n2}(t)={n}^{-2}\sum _{i=1}^{n}\sum _{j=1}^{n}\frac{(\rho ({W}_{i})-{\xi}_{j}){K}_{h}({W}_{i}-{W}_{j}){N}_{i}^{u}(t)}{\rho ({W}_{i}){h}^{d}\widehat{f}({W}_{i})},$$

where $\widehat{f}(w) = (nh^{d})^{-1}\sum_{i=1}^{n}K_h(w-W_i)$. Define

$${\mathrm{\Phi}}_{n21}(t)={n}^{-2}\sum _{i=1}^{n}\sum _{j=1}^{n}\frac{(\rho ({W}_{i})-{\xi}_{j}){K}_{h}({W}_{i}-{W}_{j}){N}_{i}^{u}(t)}{\rho ({W}_{i}){h}^{d}f({W}_{i})},$$

and

$${\mathrm{\Phi}}_{n22}(t)={n}^{-2}\sum _{i=1}^{n}\sum _{j=1}^{n}\frac{(\rho ({W}_{i})-{\xi}_{j}){K}_{h}({W}_{i}-{W}_{j}){N}_{i}^{u}(t)(\widehat{f}({W}_{i})-f({W}_{i}))}{\rho ({W}_{i}){h}^{d}f{({W}_{i})}^{2}}.$$

A straightforward calculation yields that $E\{\Phi_{n21}(t)\} = O(h^{r}) \to 0$, and

$$\underset{0\le t\le \tau}{sup}\mid {\mathrm{\Phi}}_{n}(t)\mid \phantom{\rule{0.16667em}{0ex}}={o}_{p}(1).$$

(A.3)

The functional central limit theorem (Pollard 1990) implies that

$$\underset{0\le t\le \tau}{sup}\mid {\overline{Z}}^{\ast}(t)-\overline{z}(t)\mid \phantom{\rule{0.16667em}{0ex}}={O}_{p}({n}^{-1/2}).$$

(A.4)

Using (A.3) and (A.4), we have

$${\int}_{0}^{\tau}\{{\overline{Z}}^{\ast}(t)-\overline{z}(t)\}d{\mathrm{\Phi}}_{n}(t)={o}_{p}({n}^{-1/2}).$$

Hence,

$$\begin{array}{l}{U}_{2}({\beta}_{0})=\sum _{i=1}^{n}{\int}_{0}^{\tau}\{{Z}_{i}-\overline{z}(t)\}\phantom{\rule{0.16667em}{0ex}}\left(\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}-1\right)d{N}_{i}^{u}(t)-n{\int}_{0}^{\tau}\{{\overline{Z}}^{\ast}(t)-\overline{z}(t)\}d{\mathrm{\Phi}}_{n}(t)\\ =\sum _{i=1}^{n}\left(\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}-1\right){\delta}_{i}{B}_{i}+{o}_{p}({n}^{1/2}).\end{array}$$

(A.5)

In a similar manner, we obtain

$${U}_{3}({\beta}_{0})=\sum _{i=1}^{n}\left(1-\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}\right)\widehat{\pi}({W}_{i}){B}_{i}+{o}_{p}({n}^{1/2}).$$

(A.6)

Thus, it follows from (A.1), (A.2), (A.5) and (A.6) that

$$\begin{array}{l}U({\beta}_{0})=\sum _{i=1}^{n}{\int}_{0}^{\tau}\{{Z}_{i}-\overline{z}(t)\}d{M}_{i}(t)+\sum _{i=1}^{n}\left(\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}-1\right)\phantom{\rule{0.16667em}{0ex}}({\delta}_{i}-\widehat{\pi}({W}_{i})){B}_{i}+{o}_{p}({n}^{1/2})\\ =\sum _{i=1}^{n}{\int}_{0}^{\tau}\{{Z}_{i}-\overline{z}(t)\}d{M}_{i}(t)+\sum _{i=1}^{n}\left(\frac{{\xi}_{i}}{\rho ({W}_{i})}-1\right)\phantom{\rule{0.16667em}{0ex}}({\delta}_{i}-\pi ({W}_{i})){B}_{i}+{R}_{n1}+{R}_{n2}+{o}_{p}({n}^{1/2}),\end{array}$$

where

$${R}_{n1}=\sum _{i=1}^{n}\left(1-\frac{{\xi}_{i}}{\rho ({W}_{i})}\right)\phantom{\rule{0.16667em}{0ex}}(\widehat{\pi}({W}_{i})-\pi ({W}_{i})){B}_{i},$$

and

$${R}_{n2}=\sum _{i=1}^{n}\left(\widehat{\rho}({W}_{i})-\rho ({W}_{i})\right)\phantom{\rule{0.16667em}{0ex}}(\widehat{\pi}({W}_{i})-{\delta}_{i})\frac{{\xi}_{i}{B}_{i}}{\widehat{\rho}({W}_{i})\rho ({W}_{i})}.$$

Let *m*(*w*) = *ρ*(*w*)*f*(*w*) and
$\widehat{m}(w)={({nh}^{d})}^{-1}{\sum}_{i=1}^{n}{\xi}_{i}{K}_{h}(w-{W}_{i})$. Then by the Taylor expansion of $1/\widehat{m}(W_i)$ at $m(W_i)$, we have

$$\sum _{i=1}^{n}(\widehat{\pi}({W}_{i})-\pi ({W}_{i})){B}_{i}={R}_{n11}+{R}_{n12}+{o}_{p}({n}^{1/2}),$$

where

$${R}_{n11}={n}^{-1}\sum _{i=1}^{n}\sum _{j=1}^{n}\frac{{\xi}_{j}({\delta}_{j}-\pi ({W}_{i})){K}_{h}({W}_{i}-{W}_{j}){B}_{i}}{{h}^{d}m({W}_{i})},$$

and

$${R}_{n12}=-{n}^{-1}\sum _{i=1}^{n}\sum _{j=1}^{n}\frac{{\xi}_{j}({\delta}_{j}-\pi ({W}_{i})){K}_{h}({W}_{i}-{W}_{j})(\widehat{m}({W}_{i})-m({W}_{i})){B}_{i}}{{h}^{d}m{({W}_{i})}^{2}}.$$

Define

$${R}_{n11}^{\ast}={n}^{-1/2}{R}_{n11}-{n}^{-1/2}\sum _{i=1}^{n}\frac{{\xi}_{i}({\delta}_{i}-\pi ({W}_{i})){B}_{i}}{\rho ({W}_{i})}.$$

Some straightforward calculation gives $E\{{R}_{n11}^{\ast}\}=O({n}^{1/2}{h}^{r})\to 0$, and $\mathit{Var}\{{R}_{n11}^{\ast}\}=O({nh}^{2r}+{({nh}^{2})}^{-1})\to 0$, which imply that

$${R}_{n11}=\sum _{i=1}^{n}\frac{{\xi}_{i}({\delta}_{i}-\pi ({W}_{i})){B}_{i}}{\rho ({W}_{i})}+{o}_{p}({n}^{1/2}).$$

Similarly, we have $R_{n12} = o_p(n^{1/2})$. Thus,

$$\sum _{i=1}^{n}(\widehat{\pi}({W}_{i})-\pi ({W}_{i})){B}_{i}=\sum _{i=1}^{n}\frac{{\xi}_{i}({\delta}_{i}-\pi ({W}_{i})){B}_{i}}{\rho ({W}_{i})}+{o}_{p}({n}^{1/2}).$$

(A.7)

In a similar manner, we obtain

$$\sum _{i=1}^{n}\frac{{\xi}_{i}}{\rho ({W}_{i})}\left(\widehat{\pi}({W}_{i})-\pi ({W}_{i})\right){B}_{i}=\sum _{i=1}^{n}\frac{{\xi}_{i}({\delta}_{i}-\pi ({W}_{i})){B}_{i}}{\rho ({W}_{i})}+{o}_{p}({n}^{1/2}).$$

(A.8)

It follows from (A.7) and (A.8) that $R_{n1} = o_p(n^{1/2})$, and a similar argument gives $R_{n2} = o_p(n^{1/2})$. Therefore,

$$U({\beta}_{0})=\sum _{i=1}^{n}{\int}_{0}^{\tau}\{{Z}_{i}-\overline{z}(t)\}d{M}_{i}(t)+\sum _{i=1}^{n}\left(\frac{{\xi}_{i}}{\rho ({W}_{i})}-1\right)\phantom{\rule{0.16667em}{0ex}}({\delta}_{i}-\pi ({W}_{i})){B}_{i}+{o}_{p}({n}^{1/2}).$$

(A.9)
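The second sum in (A.9) has mean zero, which can be checked directly. As a sketch, assume that the missing-at-random condition takes the form $E[\xi_i \mid W_i, \delta_i] = \rho(W_i)$ and that $B_i$ depends on the data only through $W_i$; both are assumptions here, since neither is restated in this appendix. Conditioning on $(W_i, \delta_i)$ then gives

$$E\left[\left(\frac{\xi_i}{\rho(W_i)}-1\right)(\delta_i-\pi(W_i))B_i \,\Big|\, W_i,\delta_i\right]=\left(\frac{E[\xi_i\mid W_i,\delta_i]}{\rho(W_i)}-1\right)(\delta_i-\pi(W_i))B_i=0,$$

so each summand has conditional mean zero and the augmentation term does not shift the mean of $U(\beta_0)$.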

The law of large numbers and the multivariate central limit theorem show that *n*^{−1}*U*(*β*_{0}) → 0 in probability and *n*^{−1/2}*U*(*β*_{0}) converges in distribution to a normal random variable with mean zero and variance matrix Σ + Σ^{*}. Note that

$${\widehat{\beta}}_{a}-{\beta}_{0}={\left\{-\frac{\partial U(\beta )}{\partial \beta}\right\}}^{-1}U({\beta}_{0}),$$

and

$$-{n}^{-1}\frac{\partial U(\beta )}{\partial \beta}={n}^{-1}\sum _{i=1}^{n}{\int}_{0}^{\tau}{Y}_{i}(t){\{{Z}_{i}-{\overline{Z}}^{\ast}(t)\}}^{\otimes 2}dt\to A$$

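almost surely by the uniform strong law of large numbers (Pollard 1990). Then it follows from (A.9) that $\widehat{\beta}_a$ is consistent and $n^{1/2}(\widehat{\beta}_a-\beta_0)$ is asymptotically normal with mean zero and covariance matrix $A^{-1}(\Sigma+\Sigma^{\ast})A^{-1}$.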

First write

$${\widehat{\mathrm{\Lambda}}}_{a}(t)-{\mathrm{\Lambda}}_{0}(t)={\int}_{0}^{t}\frac{{\sum}_{i=1}^{n}d{M}_{i}(s)}{{\sum}_{i=1}^{n}{Y}_{i}(s)}-{({\widehat{\beta}}_{a}-{\beta}_{0})}^{\prime}{\int}_{0}^{t}{\overline{Z}}^{\ast}(s)ds+\sum _{i=1}^{n}\left(\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}-1\right)\phantom{\rule{0.16667em}{0ex}}({\delta}_{i}-\widehat{\pi}({W}_{i})){\int}_{0}^{t}\frac{d{N}_{i}(s)}{{\sum}_{i=1}^{n}{Y}_{i}(s)}.$$

Note that

$$\underset{0\le t\le \tau}{sup}\left|{n}^{-1}\sum _{i=1}^{n}{Y}_{i}(t)-E[{Y}_{1}(t)]\right|={O}_{p}({n}^{-1/2}).$$

Following similar arguments as in the proof of (i), we obtain

$${\widehat{\mathrm{\Lambda}}}_{a}(t)-{\mathrm{\Lambda}}_{0}(t)={n}^{-1}{\int}_{0}^{t}\frac{{\sum}_{i=1}^{n}d{M}_{i}(s)}{E\{{Y}_{i}(s)\}}-d{(t)}^{\prime}({\widehat{\beta}}_{a}-{\beta}_{0})+{n}^{-1}\sum _{i=1}^{n}\left(\frac{{\xi}_{i}}{\rho ({W}_{i})}-1\right)\phantom{\rule{0.16667em}{0ex}}({\delta}_{i}-\pi ({W}_{i})){\int}_{0}^{t}\frac{d{N}_{i}(s)}{E\{{Y}_{i}(s)\}}+{o}_{p}({n}^{-1/2})$$

(A.10)

uniformly on [0, *τ*]. In view of the consistency of $\widehat{\beta}_a$, it follows from the uniform strong law of large numbers and the multivariate central limit theorem that sup

Note that the estimator is the solution to $U^{\ast}(\beta) = 0$, where

$${U}^{\ast}(\beta )=\sum _{i=1}^{n}{\int}_{0}^{\tau}\frac{{\xi}_{i}}{\widehat{\rho}({W}_{i})}\left\{{Z}_{i}-\overline{Z}(t)\right\}\phantom{\rule{0.16667em}{0ex}}\left[d{N}_{i}^{u}(t)-{Y}_{i}(t){\beta}^{\prime}{Z}_{i}dt\right].$$

Then it can be checked that

$${U}^{\ast}({\beta}_{0})={U}_{1}^{\ast}({\beta}_{0})+{U}_{2}^{\ast}({\beta}_{0}),$$

(A.11)

where

$${U}_{1}^{\ast}({\beta}_{0})=\sum _{i=1}^{n}{\int}_{0}^{\tau}\frac{{\xi}_{i}}{\rho ({W}_{i})}\left\{{Z}_{i}-\overline{Z}(t)\right\}d{M}_{i}(t),$$

and

$${U}_{2}^{\ast}({\beta}_{0})=\sum _{i=1}^{n}{\int}_{0}^{\tau}{\xi}_{i}\left(\frac{1}{\widehat{\rho}({W}_{i})}-\frac{1}{\rho ({W}_{i})}\right)\phantom{\rule{0.16667em}{0ex}}\left\{{Z}_{i}-\overline{Z}(t)\right\}d{M}_{i}(t).$$

Similarly to (A.2), we get

$${U}_{1}^{\ast}({\beta}_{0})=\sum _{i=1}^{n}{\int}_{0}^{\tau}\frac{{\xi}_{i}}{\rho ({W}_{i})}\left\{{Z}_{i}-\overline{z}(t)\right\}d{M}_{i}(t)+{o}_{p}({n}^{1/2}).$$

(A.12)

From an argument similar to that in the proof of (A.7), we have

$${U}_{2}^{\ast}({\beta}_{0})=-\sum _{i=1}^{n}{\int}_{0}^{\tau}\frac{{\xi}_{i}-\rho ({W}_{i})}{\rho ({W}_{i})}\phantom{\rule{0.16667em}{0ex}}\left\{{Z}_{i}-\overline{z}(t)\right\}d{M}_{i}(t)+\sum _{i=1}^{n}\left(\frac{{\xi}_{i}}{\rho ({W}_{i})}-1\right)\phantom{\rule{0.16667em}{0ex}}({\delta}_{i}-\pi ({W}_{i})){B}_{i}+{o}_{p}({n}^{1/2}).$$

(A.13)

It follows from (A.11)–(A.13) that

$${U}^{\ast}({\beta}_{0})=\sum _{i=1}^{n}{\int}_{0}^{\tau}\phantom{\rule{0.16667em}{0ex}}\{{Z}_{i}-\overline{z}(t)\}d{M}_{i}(t)+\sum _{i=1}^{n}\left(\frac{{\xi}_{i}}{\rho ({W}_{i})}-1\right)\phantom{\rule{0.16667em}{0ex}}({\delta}_{i}-\pi ({W}_{i})){B}_{i}+{o}_{p}({n}^{1/2}),$$

which implies that $n^{-1/2}U^{\ast}(\beta_0)$ converges in distribution to a normal random variable with mean zero and variance matrix $\Sigma + \Sigma^{\ast}$. Then it follows from the Taylor expansion of $U^{\ast}(\beta)$ that $n^{1/2}(\widehat{\beta} - \beta_0)$, with $\widehat{\beta}$ denoting the solution to $U^{\ast}(\beta) = 0$, is asymptotically normal with mean zero and covariance matrix $V = A^{-1}(\Sigma + \Sigma^{\ast})A^{-1}$.

*MSC 2000*: Primary 62N01; secondary 62G05.

Xinyuan SONG, Department of Statistics, Shatin, N. T., Hong Kong, P. R. China.

Liuquan SUN, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P.R. China.

Xiaoyun MU, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P.R. China.

Gregg E. DINSE, Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA.

- Aalen OO. A model for nonparametric regression analysis of counting processes. In: Klonecki W, Kozek A, Rosinski J, editors. Mathematical Statistics and Probability Theory, Lecture Notes in Statistics. 2. Springer-Verlag; New York: 1980. pp. 1–25.
- Dinse GE. Nonparametric estimation for partially-complete time and type of failure data. Biometrics. 1982;38:417–431.
- Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. Chapman & Hall; London: 1996.
- Gao G, Tsiatis AA. Semiparametric estimators for the regression coefficients in the linear transformation competing risks model with missing cause of failure. Biometrika. 2005;92:875–891.
- Gasser T, Müller HG, Mammitzsch V. Kernels for nonparametric curve estimation. Journal of the Royal Statistical Society Series B. 1985;47:238–252.
- Gijbels I, Lin D, Ying Z. Non- and semi-parametric analysis of failure time data with missing failure indicators. Tech Report 039–93. Mathematical Sciences Research Institute; Berkeley: 1993.
- Gijbels I, Lin D, Ying Z. Non- and semi-parametric analysis of failure time data with missing failure indicators. IMS Lecture Notes-Monograph Series. Inverse Problems: Tomography, Networks and Beyond. 2007;54:203–223.
- Goetghebeur EJ, Ryan L. Analysis of competing risks survival data when some failure types are missing. Biometrika. 1995;82:821–833.
- Hall P, Marron JS. Choice of kernel order in density estimation. The Annals of Statistics. 1988;16:161–173.
- van der Laan MJ, McKeague IW. Efficient estimation from right-censored data when failure indicators are missing at random. The Annals of Statistics. 1998;26:164–182.
- Liang H, Härdle W, Carroll RJ. Estimation in a semiparametric partially linear errors-in-variables model. The Annals of Statistics. 1999;27:1519–1535.
- Lin DY, Ying Z. Semiparametric analysis of the additive risk model. Biometrika. 1994;81:61–71.
- Little RJA, Rubin DB. Statistical analysis with missing data. Wiley; New York: 1987.
- Lo S-H. Estimating a survival function with incomplete cause-of-death data. Journal of Multivariate Analysis. 1991;39:217–235.
- Lu K, Tsiatis AA. Multiple imputation methods for estimating regression coefficients in the competing risks model with missing cause of failure. Biometrics. 2001;57:1191–1197.
- Lu W, Liang Y. Analysis of competing risks data with missing cause of failure under additive hazards model. Statistica Sinica. 2008;18:219–234.
- McKeague IW, Sasieni PD. A partly parametric additive risk model. Biometrika. 1994;81:501–514.
- McKeague IW, Subramanian S. Product-limit estimators and Cox regression with missing censoring information. Scandinavian Journal of Statistics. 1998;25:589–601.
- Müller HG. Smooth optimum kernel estimators of densities, regression curves and modes. The Annals of Statistics. 1984;12:766–774.
- Pollard D. Empirical Processes: Theory and Applications. Institute of Mathematical Statistics; Hayward, California: 1990.
- Qi L, Wang CY, Prentice RL. Weighted estimators for proportional hazards regression with missing covariates. Journal of the American Statistical Association. 2005;100:1250–1263.
- Subramanian S. Efficient estimation of regression coefficients and baseline hazard under proportionality of conditional hazards. Journal of Statistical Planning and Inference. 2000;84:81–94.
- Subramanian S. Asymptotically efficient estimation of a survival function in the missing censoring indicator model. Journal of Nonparametric Statistics. 2004;16:797–817.
- Subramanian S. Survival analysis for the missing censoring indicator model using kernel density estimation techniques. Statistical Methodology. 2006;3:125–136.
- van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer-Verlag; New York: 1996.
- Wand MP, Schucany WR. Gaussian-based kernels. The Canadian Journal of Statistics. 1990;18:197–204.
- Wang CY, Chen HY. Augmented inverse probability weighted estimator for Cox missing covariate regression. Biometrics. 2001;57:414–419.
- Wang Q, Ng KW. Asymptotically efficient product-limit estimators with censoring indicators missing at random. Statistica Sinica. 2008;18:749–768.
- Zhou X, Sun L. Additive hazards regression with missing censoring information. Statistica Sinica. 2003;13:1237–1257.
