PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptNIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
Can J Stat. Author manuscript; available in PMC Sep 1, 2011.
Published in final edited form as:
Can J Stat. Sep 2010; 38(3): 333–351.
doi:  10.1002/cjs.10072
PMCID: PMC3010164
NIHMSID: NIHMS202676
Additive hazards regression with censoring indicators missing at random
Xinyuan SONG, Liuquan SUN, Xiaoyun MU, and Gregg E. DINSE
Xinyuan SONG, Department of Statistics, Shatin, N. T., Hong Kong, P. R. China;
Xinyuan SONG: xysong/at/sta.cuhk.edu.hk; Liuquan SUN: slq/at/amt.ac.cn; Xiaoyun MU: muxy/at/amss.ac.cn; Gregg E. DINSE: dinse/at/niehs.nih.gov
Abstract
In this article, the authors consider a semiparametric additive hazards regression model for right-censored data that allows some censoring indicators to be missing at random. They develop a class of estimating equations and use an inverse probability weighted approach to estimate the regression parameters. Nonparametric smoothing techniques are employed to estimate the probability of non-missingness and the conditional probability of an uncensored observation. The asymptotic properties of the resulting estimators are derived. Simulation studies show that the proposed estimators perform well. They motivate and illustrate their methods with data from a brain cancer clinical trial.
Keywords: Additive hazards model, censoring, kernel smoother, missing at random, weighted estimating equation
In the analysis of failure time data, the cause of failure may be unknown for some subjects for a variety of reasons (e.g., autopsies were not performed or medical records were missing). We motivate and illustrate our methods with data on patients from a brain cancer clinical trial, where we evaluate the effect of two potential explanatory variables on a measure of quality of life. All patients were initially ambulatory, but over time some lost their mobility, some had a progression of their cancer, and some experienced both events. To assess quality of life, we define “survival time” as the time to non-ambulatory progression. Thus, patients who progressed and were no longer ambulatory contributed uncensored times, patients who progressed but were still ambulatory or who had not progressed by the end of the study contributed censored times, and patients who progressed but whose ambulatory status was unknown contributed times with missing censoring indicators. We apply our regression analysis to evaluate the effects of sex and age on the time to non-ambulatory progression.
Specifically, let T be the failure time, let Z be a p×1 vector of covariates, and let C be a censoring time that is assumed to be conditionally independent of T given Z. Data are available on Z and X = T ^C, but the censoring indicator δ = I(TC) may be missing. If the probability that δ is missing does not depend on either the true value of δ or the values of X and Z, then a missing δ is said to be missing completely at random (MCAR). Alternatively, if the probability that δ is missing depends on the values of X and Z but not on the true value of δ, then a missing δ is said to be missing at random (MAR); see Little & Rubin (1987).
Under the MCAR assumption and in the absence of covariates, Dinse (1982) obtained a nonparametric maximum likelihood estimator (NPMLE) of the survival function using an EM algorithm. Lo (1991) proved that there are infinitely many NPMLEs and some of them may be inconsistent; he consequently constructed a consistent and asymptotically normal estimator. Gijbels, Lin & Ying (1993, 2007) and McKeague & Subramanian (1998) proposed further improvements on these estimators. When covariates are present, Gijbels, Lin & Ying (1993) initiated research on estimation under the Cox model. McKeague & Subramanian (1998) provided an alternative approach to estimation. Subramanian (2000) considered estimation under proportionality of conditional hazards. Zhou & Sun (2003) studied the additive hazards regression model.
Under the MAR assumption, van der Laan & McKeague (1998) first addressed efficient estimation of the survival function and proposed a sieved nonparametric maximum likelihood estimator. Further developments along the lines of efficient estimation can be found in Subramanian (2004, 2006) and Wang & Ng (2008). Goetghebeur & Ryan (1995) and Lu & Tsiatis (2001) analyzed competing risks data with missing cause of failure under proportional hazards regression models. Gao & Tsiatis (2005) considered the linear transformation competing risks model with missing cause of failure. Recently, Lu & Liang (2008) studied competing risks data with missing cause of failure under the semiparametric additive hazards model, and suggested the inverse probability weighted (IPW) and double robust (DR) estimators. To obtain these estimators, however, they imposed parametric models for two components: the probability that the censoring indicator is not missing and the conditional probability of a given failure type.
In this article, we propose estimators for the regression parameters in a semiparametric additive hazards model, where the failure times are subject to right censoring and some censoring indicators are missing at random. We provide simple and fully augmented weighted estimators that incorporate incomplete data nonparametrically. Unlike Lu & Liang (2008), no parametric models are assumed for the missingness probability or the conditional probability of an uncensored observation; instead, we use nonparametric kernel smoothing techniques to estimate these probabilities. The resulting estimators have closed forms and are easy to implement. Under the usual MAR assumption, both the simple and fully augmented weighted estimators are consistent and asymptotically equivalent, i.e., they have the same asymptotic normal distribution. In addition, the asymptotic properties of the estimated baseline cumulative hazard function are also established for the model.
The remainder of the paper is organized as follows. Section 2 presents the simple and fully augmented weighted estimators and their asymptotic properties under the MAR assumption. Section 3 reports simulation results that show the proposed estimators perform well. In Section 4, our methods are applied to analyze the brain cancer data described earlier. Our concluding remarks follow in Section 5 and technical proofs are relegated to the Appendix.
Under an additive hazards model, the hazard function for failure time T given covariate Z is assumed to be of the form
equation M1
(1)
where λ0(t) is an unspecified baseline hazard function and β0 is a p-vector of unknown regression parameters. In the case where all data are observed, Lin & Ying (1994) introduced a pseudoscore function for the parameter vector β0 and showed that the resulting estimator is consistent and asymptotically normal, with an easily estimated covariance matrix.
When censoring indicators are missing for right-censored data, we observe n independent and identically distributed vectors (Xi, ξi, ξiδi, Zi, Ri) (i = 1, …, n), where ξi is an indicator that δi is not missing, and Ri is an auxiliary covariate that is not used to model the hazard but may be used to describe the probability that δi is missing. The probability that δi is missing is characterized by the distribution of ξi given δi and Wi = (Xi, Zi, Ri), which is Bernoulli with probability P{ξi = 1|δi, Wi = w}. Under the MAR assumption (Little & Rubin 1987), we have
equation M2
(2)
Another function of interest is π(w) = P{δi = 1|Wi = w, ξi = 1}, which is the conditional probability of an uncensored observation, given that δi is observed and Wi = w.
A naive method for estimating β0 is to simply ignore the missing data and to apply the pseudoscore function of Lin & Ying (1994) to the complete data only. Such a procedure (called the complete case estimator) may not only lose efficiency due to discarding incomplete observations, but may also generate biased estimators, even when the censoring indicators are MAR. If either ρ(w) or π(w) is modeled correctly, we can use the approach of Lu & Liang (2008) to obtain the IPW and DR estimators. In many situations, however, knowledge of ρ(w) and π(w) is limited, and thus both models may be misspecified. In this article, no parametric models are assumed for these two probabilities; rather, both are estimated nonparametrically by kernel smoothers. We begin by introducing the simple weighted estimator, which is derived under the MAR assumption.
Because ρ(Wi) is a function of continuous variables such as Xi, we estimate it with the Nadaraya-Watson estimator based on complete observations. Specifically, let d denote the number of continuous elements of Wi and let K be an rth-order (r > d) kernel function of d variables with finite support that satisfies ∫K(u)du = 1, ∫umK(u)du = 0, m = 1,, r − 1, ∫urK(u)du ≠ 0, and ∫K(u)2du < ∞, where u can be a scalar or a vector. If u is a vector, say u = (u1, …, ud)′, then um denotes equation M3. The motivation for using higher-order kernels is to reduce the order of magnitude of the bias of the curve estimator, leading to a faster rate of convergence of the mean integrated squared error (Wand & Schucany 1990). This type of kernel function may be constructed in various manners. For instance, Wand & Schucany (1990) gave a univariate Gaussian-based kernel of order 2r:
equation M4
where [var phi](2r−1)(u1) is the (2r − 1)-th derivative of the standard normal density function [var phi](u1). Hall & Marron (1988) proposed a class of univariate kernels of order r:
equation M5
Some higher-order polynomial kernels can be found in Müller (1984) and Gasser, Müller & Mammitzsch (1985).
Define Kh(·) = K/h), where h is a bandwidth sequence, and K(u/h) = K(u1/h,, ud/h) for u = (u1, …, ud)′. Write Wi = (W1i, W2i), where W1i and W2i include all continuous and discrete elements of Wi, respectively. Then the Nadaraya-Watson estimator of ρ(w) is given by
equation M6
(3)
where w = (w1, w2). The choice of the kernel function K usually has little effect on the estimator [rho with circumflex](w), and thus the estimator of β0, but the bandwidth sequence h typically does influence these estimators, both theoretically and practically. We assume that h satisfies nh2r → 0 and nh2d → ∞ as n → ∞. If h = O(n−1/p) for some integer p > 2d, then a reasonable choice for r is the smallest even integer such that rpd (Qi, Wang & Prentice 2005). For example, when d = 2, we might choose p = 5 and r = 4. In a similar manner, we can estimate π(w) by
equation M7
(4)
Note that the kernel function K and bandwidth sequence h used in (3) need not be identical to those used in (4), and the bandwidth can be different for each component of W1i. For example, we can define h = (h1, …, hd)′ for different bandwidths, and write K(u/h) = K(u1/h1, …, ud/hd). Here, we use the same K and h in both for notational convenience.
Let equation M8 denote the baseline cumulative hazard function. Using the inverse probability weighted approach, consider the following estimating equations for β0 and Λ0:
equation M9
(5)
equation M10
(6)
where equation M11, Yi(t) = I(Xit), and τ is a prespecified positive constant such that P(Xiτ) > 0. The resulting simple weighted estimators for β0 and Λ0 have the following closed forms:
equation M12
and
equation M13
where a[multiply sign in circle]2 = aa′ for any vector a and
equation M14
In practice, we often choose τ to be the largest observation time, say τ = max{Xi}.
Let [z macron](t) = E[Yi(t)Zi(t)]/E[Yi(t)]. Define Ni(t) = I(Xit) and equation M15. The asymptotic properties of [beta] are given in the following theorem.
Theorem 1
Under regularity conditions (C1)–(C6), which are stated in the Appendix, [beta] is consistent and n1/2([beta]−β0) is asymptotically normal with mean zero and covariance matrix V = A−1ΣA−1 + A−1Σ*A−1, where
equation M16
and
equation M17
Note that the first term in V is the asymptotic variance of the Lin & Ying (1994) estimator based only on the complete data (ξi [equivalent] 1) and the second term represents the effect of the missing censoring indicators. If we let equation M18, then the covariance matrix V can be consistently estimated by V = Â−1([Sigma] + [Sigma]*)Â−1, where
equation M19
and
equation M20
Define equation M21 and
equation M22
The asymptotic properties of [Lambda with circumflex]0(t) are given in the next theorem.
Theorem 2
Under the assumptions of Theorem 1, [Lambda with circumflex]0(t) converges in probability to Λ0(t) uniformly in t [set membership] [0, τ], and n1/2{[Lambda with circumflex]0(t) − Λ0(t)g converges weakly on [0, τ] to a zero-mean Gaussian process with covariance function at (t, s) (t ≤ s) equal to
equation M23
The covariance function Γ(t, s) can be consistently estimated by substituting [beta], [rho with circumflex] and [pi] for the unknowns β0, ρ and π in the appropriate empirical estimators, and by replacing the (unobserved) processes equation M24 with equation M25. For an individual with a given covariate vector z0, the corresponding estimator of the survival function S(t, z0) is
equation M26
Using the functional delta-method and Theorem 2, we can obtain the asymptotic properties of Ŝ(t, z0), which can be applied to construct confidence bands for S(t, z0).
When the missingness probability ρ(w) is known or a parametric model is specified for ρ(w), the simple weighted estimator uses only the complete case data (i.e., only individuals with ξi = 1), and the fully augmented weighted estimator (also called the double robust estimator) incorporates contributions from the incomplete observations (i.e., individuals with ξi = 0), thus the fully augmented weighted estimator is more efficient than the corresponding simple weighted estimator (Lu & Liang 2008). In addition, the fully augmented weighted estimator has the so-called double-robustness property; that is, the estimator is consistent if one can correctly specify either the missingness probability ρ(w) or the conditional probability of an uncensored observation π(w) (Wang & Chen 2001). However, estimating ρ(w) nonparametrically enables the simple weighted estimator [beta] to follow the same asymptotic distribution as the fully augmented weighted estimator [beta]a (described next). This indicates that [beta] is equivalent asymptotically to [beta]a. These conclusions are consistent with the results of Qi, Wang & Prentice (2005) for proportional hazards regression with missing covariates.
The fully augmented weighted estimators for β0 and Λ0 are the solutions to the following estimating equations:
equation M27
(7)
equation M28
(8)
The resulting fully augmented weighted estimators for β0 and Λ0 have the following closed forms:
equation M29
and
equation M30
where
equation M31
Similar to Theorems 1 and 2, the asymptotic properties of [beta]a and [Lambda with circumflex]a are given in the following theorem.
Theorem 3
Under the assumptions of Theorem 1, we have:
  • [beta]a is consistent and n1/2([beta]a − β0) is asymptotically normal with mean zero and covariance matrix V:
  • [Lambda with circumflex]a(t) converges in probability to Λ0(t) uniformly in t [set membership] [0, τ], and n1/2{[Lambda with circumflex]a(t) − Λ0(t)} converges weakly on [0, τ] to a zero-mean Gaussian process with covariance function Γ(t, s) at (t, s) (t ≤ s).
For the fully augmented weighted method, the covariance matrix V and covariance function Γ(t, s) can be consistently estimated by substituting [beta]a, [rho with circumflex] and [pi] for β0, ρ and π in the appropriate empirical estimators, and by replacing the processes equation M32 with equation M33.
Theorems 1, 2 and 3 show that both the simple and fully augmented weighted estimators have the same asymptotic normal distribution, and the resulting estimators of the baseline cumulative hazard function converge to the same Gaussian process. This means that the simple weighted estimators with nonparametric [rho with circumflex](w) are as efficient as the kernel-assisted fully augmented weighted estimators. One intuitive explanation for this is that the incomplete observations are indirectly incorporated in the simple weighted estimator by using the inverse of [rho with circumflex](w) as a weight.
Note that [Lambda with circumflex]0(t) and [Lambda with circumflex]a(t) may not always be monotonic in t, in which case simple modifications such as those discussed in Lin & Ying (1994) can be made to ensure monotonicity while preserving asymptotic properties.
We conducted simulation studies to examine and compare the finite-sample performance of the simple and fully augmented weighted estimators proposed in Section 2, and also to compare their performance with that of the full data and complete-case analyses under the MAR model. In these studies, we considered three situations for the covariate Z: (a) Z was assumed to follow a Bernoulli distribution with success probability 0.5; (b) Z was generated from a uniform distribution on (0, 1); (c) Z = (Z1, Z2)′, where Z1 follows a uniform distribution on (0, 1) and Z2 follows a Bernoulli distribution with success probability 0.5. The underlying additive hazards model for the failure time T was taken to be equation M34, where β0 = 0, 0.5 and 1 for the case Z is a scalar, and β0 = (0, 0)′ and β0 = (1, −1)′ for the two-dimensional covariate. The censoring time C was generated from a uniform distribution on (0, c), where c was selected to give a censoring rate of either 15% or 55%.
The missingness indicators were generated from the logistic model
equation M35
(9)
where W = (X, Z), X = T ^ C, and θ was chosen to produce a missingness rate of 50% under each censoring level. When Z was a Bernoulli random variable, there was only one (d = 1) continuous element in W, and we used the univariate Gaussian kernel function K(u) = (2π)−1/2 exp(−u2/2) and a bandwidth of h = 0.5n−1/3, with sample size of n = 100. When Z was a uniform random variable or a two-dimensional covariate as in (c), there were two (d = 2) continuous elements in W, and we used the bivariate Gaussian-based kernel function of order 4 (Wand & Schucany 1990)
equation M36
(10)
and a bandwidth vector of h = (h1, h2)′ = (1.5n−1/5, n−1/5)′, with sample size of n = 400. We took τ to be the largest observed value of X, so that all data were used in the analysis. All simulation studies were based on 1000 replications for each combination of parameters.
Our simulation results are summarized in Tables 1 and and2.2. In these tables, Bias is the sample mean of the estimate minus the true value; MSE is the sample mean of the squared differences between the estimate and the true value; and CP is the 95% empirical coverage probability for β0 based on a normal approximation. Similar summaries for the full-data and complete-case estimators are calculated for comparison.
Table 1
Table 1
Simulation results for one covariate with a missingness rate of 50%
Table 2
Table 2
Simulation results for a two-dimensional covariate vector with a missingness rate of 50%
Tables 1 and and22 show that the complete case estimator is highly biased in all situations, with coverage probabilities that are too small, whereas the simple and fully augmented weighted estimators are nearly unbiased, with very reasonable coverage probabilities. Furthermore, the simple and fully augmented weighted estimators have similar MSE values, which are only slightly larger than those of the full data estimator and are often much smaller than those of the complete case estimator. These results suggest that our proposed estimators are more efficient than the complete case estimator and are adequate for practical use. We also simulated data under different parameter configurations and obtained similar results.
We compared the proposed methods and the parametric approach of Lu & Liang (2008) under MAR and MCAR assumptions. Data were simulated under correctly and incorrectly specified parametric models, using the same setup as in Table 1 with a censoring rate of 55% and a missingness rate of 50%, where Z follows a Bernoulli distribution with a sample size of n = 200 and β0 = 0 and 1. The results are presented in Table 3. In Table 3, LIPW1 and LIPW2 stand for the inverse probability weighted (IPW) estimators of Lu & Liang (2008) when using the logistic model and the constant model for ρ(w), respectively; LDR1 and LDR2 stand for the double robust (DR) estimators of Lu & Liang (2008) when using the logistic model and the constant model for ρ(w), respectively. In all cases, we used a constant model for π(w), which is misspecified.
Table 3
Table 3
Comparison of the proposed method with the parametric approach of Lu and Liang (2008) under MAR and MCAR for a missingness rate of 50%
It can be seen from Table 3 that the proposed methods are essentially unbiased in all the settings, and the parametric approach of Lu & Liang (2008) is also unbiased when the parametric model for ρ(w) is correctly specified. Furthermore, the proposed estimators are as efficient as the DR estimator of Lu & Liang (2008), and are more efficient than the IPW estimator of Lu & Liang (2008). When both ρ(w) and π(w) are misspecified, however, both the IPW and DR estimators of Lu & Liang (2008) are biased under MAR. The key advantage of our method is that it provides reasonable estimation without making parametric modeling assumptions about ρ(w) and π(w). Rather than assuming parametric models, our approach uses nonparametric smoothing techniques to estimate these probabilities. In addition, the proposed estimators are more efficient than the complete case estimator under MCAR. So, if MCAR is true, our proposed approach still works well and does not lose efficiency.
We also conducted simulation studies to examine the performance of the proposed methods when MNAR (missing not at random) is true. In the study, the setup was the same as in Table 1, where Z follows a uniform distribution on (0, 1) with β0 = 0 and n = 400, except that the censoring rate was set to be 20%, and the missingness probability was given by
equation M37
where θ1 and θ2 were chosen to produce a missingness rate of either 20% or 50%. The results are summarized in Table 4. It can be seen from Table 4 that the proposed estimation procedures perform well when the missingness rate is low (say, 20%), but when the missingness rate is high (say, 50%), the proposed estimators are a little biased. However, the biases are relatively small compared to those of the complete case estimator.
Table 4
Table 4
Simulation results for the proposed method under MNAR
We applied our methods to the brain cancer data mentioned earlier. We analyzed the data on all 387 patients who entered the clinical trial with a form of brain cancer known as glioblastoma. Dinse (1982) used a subset of these data to illustrate his nonparametric maximum likelihood analysis, which did not account for covariates. All patients were ambulatory when they entered the trial, but over time some lost their mobility, some had a progression of their cancer, and some experienced both events. As a measure of quality of life, we defined “survival time” as the time to non-ambulatory progression, and we evaluated the effects of sex and age on this event time.
Of the 387 patients, 86 progressed and were non-ambulatory, 24 progressed but were still ambulatory, 220 did not progress by the end of the study, and 57 progressed but had an unknown ambulatory status. Thus, our analysis treated these outcomes as 86 uncensored times, 244 censored times, and 57 times with a missing censoring indicator. There were 144 women and 243 men, ranging in age from 14 to 74 years, and the length of time on study (or until progression) varied from 2 to 1088 days.
Let X be the observed time (in days), measured from the beginning of the trial, and let δ indicate whether the patient had progressed and was non-ambulatory. We defined Z1 to be a binary indicator of the patient’s sex, which was 1 for men and 0 for women, and Z2 to be the age at trial entry (in years), which was treated as a continuous covariate. Since W = (X, Z1, Z2) contains two continuous elements, we used the bivariate Gaussian-based kernel function of order 4 for K, as defined in (10), with a bandwidth vector of h = (h1, h2)′ = (34, 10)′. We used τ = 1088, which was the largest observed value of X.
The analysis of the brain cancer data is summarized in Table 5, which gives the results for our simple weighted estimator (SWE) and our fully augmented weighted estimator (FAWE). For comparison, Table 5 also gives the results of the complete case (CC) analysis. None of the three methods suggested that men and women had different hazard rates for non-ambulatory progression. On the other hand, our two estimators showed that age is important (p = 0.037 for SWE and p = 0.011 for FAWE), but the CC analysis did not (p = 0.367). Specifically, the hazard rate for non-ambulatory progression increased as patients grew older, which is consistent with worsening quality of life. The age coefficients were of similar magnitude for all three methods, but the standard error was much larger for the CC analysis than for our SWE and FAWE analyses. Thus, as a result of excluding data, the complete case analysis missed the age effect on non-ambulatory progression that our approaches appropriately identified.
Table 5
Table 5
Estimates of regression coefficients for sex and age (in years), along with estimated standard errors and significance levels, from the analysis of time (in days) to non-ambulatory progression for patients in the brain cancer clinical trial.
Model (1) has the limitation that the linear predictor equation M38 needs to be constrained to ensure non-negativity for the right side of (1). One may avoid this constraint by using a nonnegative link function, such as equation M39. The ideas presented in this paper can be applied to any regression function equation M40, where g(·) is a known link function. In addition, Our approach can be extended to incorporate missing covariates (Qi, Wang & Prentice 2005) in the situation where both the failure indicators and the covariates are partially observed.
Nonparametric kernel estimation can be done for a small number of continuous covariates, but for categorical covariates, it would usually require stratified kernel estimation within each strata defined by the categorical covariates. In practice, when there are too many categories, it may be desirable to specify a more flexible model for the missingness probability, such as a partially linear additive model, and then use local kernel regression to estimate the missingness probability. Here we focus on a kernel estimation approach for ρ(w) and π(w). Of course, other smoothing techniques such as the local polynomial method (Fan & Gijbels 1996) may be used and require the same assumptions. Furthermore, n1/2-rate asymptotic normality of the proposed estimators indicates that an appropriate choice for the bandwidth sequence h depends only on the second order terms of the mean square error of the estimators, and thus bandwidth selection may not be critical for estimating β0 and Λ0.
Since the estimating functions in (5) to (8) were obtained in a somewhat ad hoc fashion, it might be worthwhile investigating possible improvements that could result from other approaches, such as the one suggested by McKeague & Sasieni (1994) or perhaps a nonparametric maximum likelihood approach. Alternatively, estimation procedures based on the general Aalen additive model (Aalen 1980) or the linear transformation model (Gao & Tsiatis 2005) with missing censoring information might also be worthy of investigation.
Another limitation of the approach given here is that the covariates Z are time-invariant. In some applications, we might want to incorporate time-dependent covariates. Thus, a more general approach might extend model (1) to a time-varying version:
equation M41
where β0(t) is an unknown p-vector of time-varying regression coefficients and Z(t) is a vector of covariates that may depend on time. However, the proposed estimation procedure cannot be extended in a straightforward manner to deal with time-dependent covariates because of the curse of dimensionality created by Z(t) and a need for alternative smoothing techniques for estimating β0(t). In addition, when the dimension of Z(t) is high, the probabilities ρ(w) and π(w) can be modeled parametrically (Lu & Liang 2008). As a different approach, perhaps dimension-reduction techniques could be extended in conjunction with a partially linear model (Liang, Härdle & Carroll 1999) for ρ(w) and π(w).
Acknowledgments
The authors would like to thank the Editor (Paul Gustafson), the Associate Editor, two reviewers and Shyamal Peddada for their constructive and insightful comments and suggestions that greatly improved the paper. Xinyuan Song’s research was fully supported by two grants from the Research Grant Council of the Hong Kong Special Administration Region. Liuquan Sun’s research was fully supported by the National Natural Science Foundation of China Grants, the National Basic Research Program of China (973 Program) and Key Laboratory of RCSDS, CAS. Gregg Dinse’s research was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences.
APPENDIX
We will use the same notation defined in the previous sections and assume that the following regularity conditions hold:
  • (C1) Λ0(τ) < ∞, Pr(Xτ) > 0, Z is bounded, and equation M42 a.e.
  • (C2) The probability (density) f(w) of Wi is bounded away from 0, and has r continuous and bounded partial derivatives with respect to the continuous components of Wi a.e.
  • (C3) The missingness probability ρ(w) is bounded away from 0, and has r continuous and bounded partial derivatives with respect to the continuous components of Wi a.e.
  • (C4) The conditional probability π(w) has r continuous and bounded partial derivatives with respect to the continuous components of Wi a.e.
  • (C5) equation M43 is nonsingular.
  • (C6) nh2r → 0 and nh2d → ∞, as n → ∞.
We give the proof of Theorem 3 and outline the proof of Theorem 1; Theorem 2 can be proven in the same manner. For notational convenience, we assume that all components of Wi are continuous in the following proof.
Proof of Theorem 3(i)
Substituting [Lambda with circumflex]a into equation (7), we find that [beta]a is the solution to U (β) = 0, where
equation M44
Let equation M45. Then we can write
equation M46
(A.1)
where
equation M47
and
equation M48
Note that U1(β0) is a martingale integral. Thus, it follows that
equation M49
(A.2)
Define equation M50, and write Φn(t) = Φn1(t) + Φn2(t) + Φn3(t), where
equation M51
and
equation M52
By the uniform strong law of large numbers (Pollard, 1990), sup0≤tτn1(t)| = o(1) almost surely. It can be checked that
equation M53
where f(w) = (nhd)−1Kh(wWi), which is a kernel density estimate of f(w). By a Taylor expansion of 1/f(Wi) about f (Wi), Φn2(t) can be written as Φn21(t) − Φn22(t) + op(1), where
equation M54
and
equation M55
A straightforward calculation yields that En21(t)} = O(hr) → 0, and Varn21(t)} = O(h2r + (nh2d)−1) → 0, which imply Φn21(t) = op(1). Similarly, Φn22(t) = op(1), and thus it follows that Φn2(t) = op(1). Likewise, Φn3(t) = op(1). Therefore, we have Φn(t) = op(1). Note that Φn(t) is monotone and bounded in t. Consequently, we obtain
equation M56
(A.3)
The functional central limit theorem (Pollard 1990) implies that
equation M57
(A.4)
Using (A.3) and (A.4), we have
equation M58
Hence,
equation M59
(A.5)
In a similar manner, we obtain
equation M60
(A.6)
Thus, it follows from (A.1), (A.2), (A.5) and (A.6) that
equation M61
where
equation M62
and
equation M63
Let m(w) = ρ(w)f(w) and equation M64. Then by the Taylor expansion of 1/mWi) at m(Wi), we can write
equation M65
where
equation M66
and
equation M67
Define
equation M68
Some straightforward calculation gives equation M69, and equation M70, which imply that
equation M71
Similarly, we have Rn12 = op(n1/2), and thus
equation M72
(A.7)
In a similar manner, we obtain
equation M73
(A.8)
It follows from (A.7) and (A.8) that Rn1 = op(n1/2). Likewise, Rn2 = op(n1/2). Thus,
equation M74
(A.9)
The law of large numbers and the multivariate central limit theorem show that n−1U(β0) → 0 in probability and n−1/2U(β0) converges in distribution to a normal random variable with mean zero and variance matrix Σ + Σ*. Note that
equation M75
and
equation M76
almost surely by the uniform strong law of large numbers (Pollard 1990). Then it follows from (A.9) that [beta]a is consistent and n1/2([beta]aβ0) is asymptotically normal with mean zero and covariance matrix V = A−1(Σ + Σ*)A−1.
Proof of Theorem 3(ii)
First write
equation M77
Note that
equation M78
Following similar arguments as in the proof of (i), we obtain
equation M79
(A.10)
uniformly on [0, τ]. In view of the consistency of [beta]a, it follows from the uniform strong law of large numbers and the multivariate central limit theorem that sup0≤tτ |[Lambda with circumflex]a(t) − Λ0(t)| → 0 in probability, and n1/2{[Lambda with circumflex]a(t) − Λ0(t)} converges in finite dimensional distributions to a zero-mean Gaussian process. The first term on the right-hand side of (A.10) is tight as it is a martingale integral. The second term is tight because n1/2([beta]aβ0) converges in distribution and d(t) is a deterministic function. Note that for each i, (ξiρ(Wi)−1 − 1) (δiπ(Wi)) equation M80 can be written as sums of monotone processes. Then the tightness of the third term follows from Example 2.11.16 of van der Vaart & Wellner (1996). Thus, n1/2{[Lambda with circumflex]a(t)− Λ0(t)} is tight and converges weakly to a zero-mean Gaussian process with covariance function Γ(s, t) at (s, t).
Outlined proof of Theorem 1
Note that [beta] is the solution to U*(β) = 0, where
equation M81
Then it can be checked that
equation M82
(A.11)
where
equation M83
and
equation M84
Similarly to (A.2), we get
equation M85
(A.12)
From an argument similar to that in the proof of (A.7), we have
equation M86
(A.13)
It follows from (A.11)–(A.13) that
equation M87
which implies that n−1/2U*(β0) converges in distribution to a normal random variable with mean zero and variance matrix Σ + Σ*. Then it follows from the Taylor expansion of U*([beta]) that n1/2([beta]β0) is asymptotically normal with mean zero and covariance matrix V = A−1(Σ + Σ*)A−1.
Footnotes
MSC 2000: Primary 62N01; secondary 62G05.
Contributor Information
Xinyuan SONG, Department of Statistics, Shatin, N. T., Hong Kong, P. R. China.
Liuquan SUN, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P.R. China.
Xiaoyun MU, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P.R. China.
Gregg E. DINSE, Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA.
  • Aalen OO. A model for nonparametric regression analysis of counting processes. In: Klonecki W, Kozek A, Rosinski J, editors. Mathematical Statistics and Probability Theory, Lecture Notes in Statistics. 2. Springer-Verlag; New York: 1980. pp. 1–25.
  • Dinse GE. Nonparametric estimation for partially-complete time and type of failure data. Biometrics. 1982;38:417–431. [PubMed]
  • Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. Chapman & Hall; London: 1996.
  • Gao G, Tsiatis AA. Semiparametric estimators for the regression coefficients in the linear transformation competing risks model with missing cause of failure. Biometrika. 2005;92:875–891.
  • Gasser T, Müller HG, Mammitzsch V. Kernels for nonparametric curve estimation. Journal of the Royal Statistical Society Series B. 1985;47:238–252.
  • Gijbels I, Lin D, Ying Z. Tech Report 039–93. Mathematical Sciences Research Institute; Berkeley: 1993. Non- and semi-parametric analysis of failure time data with missing failure indicators.
  • Gijbels I, Lin D, Ying Z. Non- and semi-parametric analysis of failure time data with missing failure indicators. IMS Lecture Notes-Monograph Series. Inverse Problems: Tomography, Networks and Beyond. 2007;54:203–223.
  • Goetghebeur EJ, Ryan L. Analysis of competing risks survival data when some failure types are missing. Biometrika. 1995;82:821–833.
  • Hall P, Marron JS. Choice of kernel order in density estimation. The Annals of Statistics. 1988;16:161–173.
  • van der Laan MJ, McKeague IW. Efficient estimation from right-censored data when failure indicators are missing at random. The Annals of Statistics. 1998;26:164–182.
  • Liang H, Härdle W, Carroll RJ. Estimation in a semiparametric partially linear errors-invariable model. The Annals of Statistics. 1999;27:1519–1535.
  • Lin DY, Ying Z. Semiparametric analysis of the additive risk model. Biometrika. 1994;81:61–71.
  • Little RJA, Rubin DB. Statistical analysis with missing data. Wiley; New York: 1987.
  • Lo S-H. Estimating a survival function with incomplete cause-of-death data. Journal of Multivariate Analysis. 1991;39:217–235.
  • Lu K, Tsiatis AA. Multiple imputation methods for estimating regression coefficients in the competing risks model with missing cause of failure. Biometrics. 2001;57:1191–1197. [PubMed]
  • Lu W, Liang Y. Analysis of competing risks data with missing cause of failure under additive hazards model. Statistica Sinica. 2008;18:219–234.
  • McKeague IW, Sasieni PD. A partly parametric additive risk model. Biometrika. 1994;81:501–514.
  • McKeague IW, Subramanian S. Product-limit estimators and Cox regression with missing censoring information. Scandinavian Journal of Statistics. 1998;25:589–601.
  • Müller HG. Smooth optimum kernel estimators of densities, regression curves and modes. The Annals of Statistics. 1984;12:766–774.
  • Pollard D. Empirical Processes: Theory and Applications. Institute of Mathematical Statistics; Hayward, California: 1990.
  • Qi L, Wang CY, Prentice RL. Weighted estimators for proportional hazards regression with missing covariates. Journal of the American Statistical Association. 2005;100:1250–1263.
  • Subramanian S. Efficient estimation of regression coefficients and baseline hazard under proportionality of conditional hazards. Journal of Statistical Planning and Inference. 2000;84:81–94.
  • Subramanian S. Asymptotically efficient estimation of a survival function in the missing censoring indicator model. Journal of Nonparametric Statistics. 2004;16:797–817.
  • Subramanian S. Survival analysis for the missing censoring indicator model using kernel density estimation techniques. Statistical Methodology. 2006;3:125–136. [PMC free article] [PubMed]
  • van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Spring-Verlag; New York: 1996.
  • Wand MP, Schucany WR. Gaussian-based kernels. The Canadian Journal of Statistics. 1990;18:197–204.
  • Wang CY, Chen HY. Augmented inverse probability weighted estimator for Cox missing covariate regression. Biometrics. 2001;57:414–419. [PubMed]
  • Wang Q, Ng KW. Asymptotically efficient product-limit estimators with censoring indicators missing at random. Statistica Sinica. 2008;18:749–768.
  • Zhou X, Sun L. Additive hazards regression with missing censoring information. Statistica Sinica. 2003;13:1237–1257.