
Lifetime Data Anal. Author manuscript; available in PMC 2012 April 1.


Published online 2010 June 18. doi: 10.1007/s10985-010-9175-8

PMCID: PMC3020262

NIHMSID: NIHMS223231

Qihua Wang, Department of Mathematics and Statistics, Yunnan University, Kunming 650091, China; Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China.

Gregg E. Dinse: dinse@niehs.nih.gov

The publisher's final edited version of this article is available at Lifetime Data Anal

**Abstract**

Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.

**1 Introduction**

Linear regression is a popular statistical tool that has been used successfully in many areas. In survival analysis, a log-transformation of the response variable converts a conventional linear model to an accelerated failure time model, which is an appealing alternative to the Cox (1972) proportional hazards model because of its direct interpretation (*cf*. Reid 1994). Linear regression has been studied extensively for the analysis of randomly right censored data, and various procedures have been developed; see, for example, Miller (1976), Buckley and James (1979), Koul et al. (1981), Leurgans (1987), Ritov (1990), Tsiatis (1990), Lai and Ying (1991), Ying (1993), Jin et al. (2003), and Li and Wang (2003). All of these methods assume that every censoring indicator is observed.

In practice, some censoring indicators may be unobserved, and we develop methods for this situation. For example, we analyzed data from a clinical trial involving patients with glioblastoma, a form of brain cancer. One measure of post-treatment quality of life is the length of time until a patient declines to a state in which mobility is lost and the cancer has returned. We evaluated data on adult glioblastoma patients over 50 years of age. Progression status was always known, but ambulatory status was unknown in some cases. By the end of the study, some patients were non-ambulatory and had progressed, some had progressed but remained ambulatory, some had progressed but their ambulatory status was unrecorded, and some had not progressed. Our analysis focused on the time to non-ambulatory progression, and these four types of events contributed uncensored observations, censored observations, observations with a missing censoring indicator, and censored observations, respectively. In particular, we were interested in how sex and age affected the time to non-ambulatory progression. Our methods have applications in many other areas, including analyses with missing death certificates in epidemiological studies, unperformed necropsies in animal experiments, and unidentified component failures in quality control settings.

The methods we propose are formulated in the context of a censored survival problem with missing censoring indicators, but they can be used to analyze competing risks data on event-specific failure times when some event indicators are unknown. In practice, the usual competing risks analysis focuses on a particular event type of interest and treats the times to failure from other events as censored observations on the event time of interest. Thus, one can view an unknown failure type as a missing censoring indicator and use our methods.

If censoring indicators are missing, standard estimators for censored data are not directly applicable. One solution is to ignore cases with missing data and apply conventional analyses to the complete cases only. However, the efficiency of this complete-case approach decreases as the degree of missingness increases. Also, the complete-case estimator is consistent only when censoring indicators are missing completely at random (MCAR). Alternatively, some analyses directly incorporate survival data with missing censoring indicators. For example, Dinse (1982), Lo (1991), McKeague and Subramanian (1998), and Subramanian (2004, 2006) proposed survival estimators. Within the regression context, Goetghebeur and Ryan (1990, 1995), Dewanji (1992), Lu and Tsiatis (2001), and Tsiatis et al. (2002) developed methods under a proportional hazards model; Zhou and Sun (2003) and Lu and Liang (2008) worked with an additive hazards model; and Gao and Tsiatis (2005) employed a linear transformation competing risks model. Linear regression techniques, however, have received little attention in situations where censoring indicators are missing, and thus we now focus on that particular problem.

Suppose there are *n* subjects and their observations follow the linear regression model

$${Y}_{i}={X}_{i}^{\top}\beta +{\epsilon}_{i},\qquad i=1,\ldots ,n,$$

where *Y _{i}* is a response variable, *X _{i}* is a vector of covariates, β is a vector of unknown regression parameters, and ε_{i} is a random error with *E*(ε_{i}|*X _{i}*) = 0. Let *C _{i}* denote a censoring time, and suppose we observe *Ỹ _{i}* = min(*Y _{i}*, *C _{i}*), the censoring indicator δ_{i} = *I*(*Y _{i}* ≤ *C _{i}*) when it is available, and the missingness indicator ξ_{i}, which is 1 if δ_{i} is observed and 0 otherwise.

We observe {*Ỹ*, *X*, δ} for subjects with complete data (ξ = 1) and {*Ỹ*, *X*} for subjects with a missing censoring indicator (ξ = 0). Typically the missingness mechanism is assumed to produce data that are missing at random (MAR), which implies *pr* (ξ = 1|*Ỹ*, *X*, δ) = *pr* (ξ = 1|*Ỹ*, *X*). Alternatively, we assume *pr* (ξ = 1|*Ỹ*, *X*, δ) = *pr* (ξ = 1|*Ỹ*), which is more restrictive than the MAR assumption but less restrictive than the common MCAR assumption that *pr* (ξ = 1|*Ỹ, X*, δ) = *pr* (ξ = 1). We made an assumption stricter than MAR to avoid postulating a parametric model for *pr* (ξ = 1|*Ỹ, X*) and to sidestep problems encountered when applying a nonparametric method to sparse data, such as can arise in high-dimensional situations. Under the weaker MAR assumption, we expect to obtain similar results for a reasonable model in the parametric case or with sufficient data in the nonparametric case, but we leave such analyses for future work.

In this paper, we propose three estimators for β. Our methods extend an approach developed by Koul et al. (1981) for the situation in which all censoring indicators are observed. They used the formula

$${Y}_{i}^{*}=\frac{{\delta}_{i}{\tilde{Y}}_{i}}{1-{G}_{n}({\tilde{Y}}_{i})}$$

(1)

to transform observed data (*Ỹ _{i}*, δ_{i}) into synthetic data ${Y}_{i}^{*}$, where *G _{n}*(·) denotes the Kaplan–Meier estimator of *G*(·), the distribution function of the censoring variable. Ordinary least squares can then be applied to the synthetic responses.
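As a quick numerical illustration of the synthetic data idea (our own sketch, not from the paper), the transformation can be applied with a *known* censoring distribution; the toy setup below (uniform responses, unit-exponential censoring) and all names are our own:

```python
import numpy as np

def koul_synthetic(y_obs, delta, surv_c):
    """Synthetic responses Y*_i = delta_i * Y~_i / (1 - G(Y~_i)),
    where surv_c(t) = 1 - G(t) is the censoring survival function."""
    return delta * y_obs / surv_c(y_obs)

# Toy check that the synthetic mean recovers E(Y) despite censoring:
# Y ~ U(0, 1) with E(Y) = 0.5, and C ~ Exp(1) so 1 - G(t) = exp(-t).
rng = np.random.default_rng(0)
n = 200_000
y_true = rng.uniform(0.0, 1.0, n)
c = rng.exponential(1.0, n)
y_obs = np.minimum(y_true, c)            # observed response Y~ = min(Y, C)
delta = (y_true <= c).astype(float)      # censoring indicator
y_star = koul_synthetic(y_obs, delta, lambda t: np.exp(-t))
```

Regressing `y_star` on covariates by ordinary least squares then estimates β; replacing the known `surv_c` with an estimate is the route the rest of the paper generalizes.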

The article is organized as follows. Section 2 develops a synthetic data approach based on regression calibration. Sections 3 and 4 define imputation and inverse probability weighted estimators, respectively. Section 5 presents a simulation study of the finite-sample performance of all three estimators, Section 6 illustrates the methods with data from a brain cancer clinical trial, and Section 7 concludes with a discussion.

**2 Least squares regression calibration estimator**

If we define ${Z}_{i}={({\tilde{Y}}_{i},{X}_{i}^{\top})}^{\top}$ and *m*(*z _{i}*) = *E*(δ_{i}|*Z _{i}* = *z _{i}*), and let ${Y}_{i,R}=m({Z}_{i}){\tilde{Y}}_{i}/\{1-G({\tilde{Y}}_{i})\}$, then

$$E({Y}_{i,R}|{X}_{i})=E\left[E\left\{\frac{{\tilde{Y}}_{i}{\delta}_{i}}{1-G({\tilde{Y}}_{i})}\,\Big|\,{\tilde{Y}}_{i},{X}_{i}\right\}\Big|{X}_{i}\right]=E({Y}_{i}|{X}_{i})={X}_{i}^{\top}\beta .$$

For known *m*(·) and *G*(·), this result suggests the following least squares estimator of β:

$${\tilde{\beta}}_{R}={\left({\sum}_{i=1}^{n}{X}_{i}{X}_{i}^{\top}\right)}^{-1}{\sum}_{i=1}^{n}{X}_{i}{Y}_{i,R}.$$

In practice, however, *m*(·) and *G*(·) are usually unknown, and hence ${\tilde{\beta}}_{R}$ cannot be computed. One simple solution is to calculate ${\tilde{\beta}}_{R}$ using estimates of *m*(·) and *G*(·).

For example, consider a parametric model for *m*(*z*) of the form *m*_{0}(*z*, θ), where *m*_{0}(·, ·) is a known continuous function and θ is a vector of unknown parameters. The parameter vector θ can be estimated by ${\widehat{\theta}}_{n}$, the maximizer of the likelihood

$${\prod}_{i=1}^{n}{\{{m}_{0}({Z}_{i},\theta )\}}^{{\xi}_{i}{\delta}_{i}}{\{1-{m}_{0}({Z}_{i},\theta )\}}^{{\xi}_{i}(1-{\delta}_{i})},$$

and thus *m*(*z*) can be estimated by ${\widehat{m}}_{n}(z)={m}_{0}(z,{\widehat{\theta}}_{n})$. As for estimating *G*(·), we begin by defining π(*ỹ*) = *E*(ξ|*Ỹ* = *ỹ*), which can be estimated nonparametrically by

$${\widehat{\pi}}_{n}(\tilde{y})={\displaystyle \sum _{i=1}^{n}{\xi}_{i}W}\phantom{\rule{thinmathspace}{0ex}}\left(\frac{\tilde{y}-{\tilde{Y}}_{i}}{{b}_{n}}\right)/{\displaystyle \sum _{i=1}^{n}W}\phantom{\rule{thinmathspace}{0ex}}\left(\frac{\tilde{y}-{\tilde{Y}}_{i}}{{b}_{n}}\right)\phantom{\rule{thinmathspace}{0ex}},$$

where *W*(·) is a kernel function and *b _{n}* is a bandwidth sequence. Next, we define a Horvitz and Thompson (1952) type inverse probability weighted kernel estimator of *u*(*ỹ*) = *E*(δ|*Ỹ* = *ỹ*):

$${\widehat{u}}_{n}(\tilde{y})={\displaystyle \sum _{i=1}^{n}{\delta}_{i}}\phantom{\rule{thinmathspace}{0ex}}\left(\frac{{\xi}_{i}}{{\widehat{\pi}}_{n}({\tilde{Y}}_{i})}\right)\phantom{\rule{thinmathspace}{0ex}}K\phantom{\rule{thinmathspace}{0ex}}\left(\frac{\tilde{y}-{\tilde{Y}}_{i}}{{h}_{n}}\right)/{\displaystyle \sum _{i=1}^{n}\left(\frac{{\xi}_{i}}{{\widehat{\pi}}_{n}({\tilde{Y}}_{i})}\right)\phantom{\rule{thinmathspace}{0ex}}K\phantom{\rule{thinmathspace}{0ex}}}\left(\frac{\tilde{y}-{\tilde{Y}}_{i}}{{h}_{n}}\right)\phantom{\rule{thinmathspace}{0ex}},$$

where *K*(·) is a kernel function and *h _{n}* is a bandwidth sequence. Finally, similar to Dikta (1998) and Wang and Ng (2008), we define the following estimator of *G*(·):

$${\widehat{G}}_{n}(\tilde{y})=1-{\prod}_{i:{\tilde{Y}}_{i}\le \tilde{y}}{\left(\frac{n-{R}_{i}}{n-{R}_{i}+1}\right)}^{1-{\widehat{u}}_{n}({\tilde{Y}}_{i})},$$

where *R _{i}* denotes the rank of *Ỹ _{i}* among *Ỹ*_{1}, …, *Ỹ _{n}*. We then define the least squares regression calibration estimator

$${\widehat{\beta}}_{R}={\left({\sum}_{i=1}^{n}{X}_{i}{X}_{i}^{\top}\right)}^{-1}{\sum}_{i=1}^{n}{X}_{i}{\widehat{Y}}_{i,R},$$

where *Ŷ*_{i,R} is *Y _{i,R}* with *m*(·) and *G*(·) replaced by ${\widehat{m}}_{n}(\cdot)$ and ${\widehat{G}}_{n}(\cdot)$.
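Putting the pieces of this section together, a minimal end-to-end sketch (our code, not the authors'): a logistic form is assumed for *m*_{0}, the kernels and bandwidths are illustrative, and a left-continuous version of ${\widehat{G}}_{n}$ is used so the denominator never vanishes at the largest observation.

```python
import numpy as np

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_m0_logistic(Z, delta, xi, n_iter=25):
    # Complete-case Newton-Raphson fit of an assumed logistic model
    # m0(z, theta) = expit(z @ theta), maximizing
    # prod_i m0^(xi_i delta_i) * (1 - m0)^(xi_i (1 - delta_i)).
    d = np.where(xi == 1, delta, 0.0)          # unobserved deltas never used
    theta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = expit(Z @ theta)
        grad = Z.T @ (xi * (d - p))            # score of the weighted likelihood
        hess = (Z * (xi * p * (1.0 - p))[:, None]).T @ Z
        theta += np.linalg.solve(hess, grad)
    return theta

def pi_hat(y_obs, xi, b):
    # Kernel estimate of pi(y) = E(xi | Y~ = y) at each Y~_i,
    # uniform kernel W(u) = 1/2 on |u| <= 1.
    w = 0.5 * (np.abs(y_obs[:, None] - y_obs[None, :]) <= b)
    return (w * xi).sum(axis=1) / w.sum(axis=1)

def u_hat(y_obs, delta, xi, pi_vals, h):
    # Inverse probability weighted kernel estimate of E(delta | Y~ = y),
    # biweight kernel K(u) = (15/16)(1 - u^2)^2 on |u| <= 1.
    d = np.where(xi == 1, delta, 0.0)
    t = (y_obs[:, None] - y_obs[None, :]) / h
    K = np.where(np.abs(t) <= 1.0, (15.0 / 16.0) * (1.0 - t ** 2) ** 2, 0.0)
    w = K * (xi / pi_vals)[None, :]
    return (w * d).sum(axis=1) / np.maximum(w.sum(axis=1), 1e-12)

def one_minus_G_hat(y_obs, u_vals):
    # Product-limit estimate of 1 - G, evaluated just before each Y~_i
    # (left-continuous, so it stays positive); assumes no ties.
    n = len(y_obs)
    order = np.argsort(y_obs)
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)         # R_i = rank of Y~_i
    factors = ((n - ranks) / (n - ranks + 1.0)) ** (1.0 - u_vals)
    out = np.empty(n)
    out[order] = np.concatenate(([1.0], np.cumprod(factors[order])[:-1]))
    return out

def beta_R(X, y_obs, delta, xi, b=0.5, h=0.5):
    # Regression calibration: replace EVERY delta_i by the fitted m0(Z_i).
    Z = np.column_stack([np.ones_like(y_obs), y_obs, X[:, 1:]])
    m_vals = expit(Z @ fit_m0_logistic(Z, delta, xi))
    u_vals = u_hat(y_obs, delta, xi, pi_hat(y_obs, xi, b), h)
    y_R = m_vals * y_obs / one_minus_G_hat(y_obs, u_vals)
    return np.linalg.solve(X.T @ X, X.T @ y_R)
```

On data simulated as in Sect. 5 (intercept plus one covariate), `beta_R` returns estimates close to the true coefficients; the bandwidth values here are placeholders, not the paper's choices.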

Define ${\sigma}_{r,s}=E\left[\frac{\pi (\tilde{Y})}{{m}_{0}(Z,\theta )\{1-{m}_{0}(Z,\theta )\}}\,\frac{\partial {m}_{0}(Z,\theta )}{\partial {\theta}_{r}}\,\frac{\partial {m}_{0}(Z,\theta )}{\partial {\theta}_{s}}\right]$ for 1 ≤ *r, s* ≤ *d* + 1 and assume:

- (C.1) *m*_{0}(*z*, θ) has bounded first-order derivatives with respect to θ.
- (C.2) σ_{r,s} < ∞ for 1 ≤ *r*, *s* ≤ *d* + 1, and the matrix *A*(θ) = (σ_{r,s}) is positive definite.
- (C.3) $E\{{\Vert X\Vert}^{2}{\tilde{Y}}^{2}/{\bar{H}}^{2}(\tilde{Y})\}<\infty$, where $H(t)=pr(\tilde{Y}\le t)$ and $\bar{H}(t)=1-H(t)$.
- (C.4) $E\{{\Vert X\Vert}^{2}\}<\infty$.
- (C.5) π(·) has a bounded first-order derivative.
- (C.6) *W*(·) is of order 1 with bounded support.
- (C.7) $n{b}_{n}\to \infty$ and $n{b}_{n}^{2}\to 0$.

The following theorem addresses the asymptotic normality of ${\widehat{\beta}}_{R}$.

**Theorem 2.1** *Under assumptions (C.1), (C.2), (C.3) and (C.4), we have*

$${n}^{1/2}({\widehat{\beta}}_{R}-\beta )\stackrel{d}{\to}N(0,{V}_{R}),$$

*where* ${V}_{R}={\Sigma}_{X}^{-1}{\Omega}_{R}{\Sigma}_{X}^{-1}$, *with* ${\Sigma}_{X}=E(X{X}^{\top})$ *and* Ω_{R} *a limiting covariance matrix given in the Appendix*.

*The structure of* Ω_{R} *is complex; thus, estimating V _{R} by replacing the unknowns in* Ω_{R} *with consistent estimates is unattractive. Instead, we use the following jackknife-type variance estimator:*

$${\widehat{V}}_{J}={\left({\sum}_{i=1}^{n}{X}_{i}{X}_{i}^{\top}\right)}^{-1}\left[{\sum}_{i=1}^{n}{({\widehat{Y}}_{i,R}-{X}_{i}^{\top}{\widehat{\beta}}_{R})}^{2}{X}_{i}{X}_{i}^{\top}\right]{\left({\sum}_{i=1}^{n}{X}_{i}{X}_{i}^{\top}\right)}^{-1}.$$

(2)
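In code, the sandwich form (2) is a few lines (our sketch; names are illustrative):

```python
import numpy as np

def v_j(X, y_synth):
    """Sandwich variance estimate (2): A^{-1} B A^{-1}, with
    A = sum x_i x_i^T and B = sum (yhat_i - x_i^T beta_hat)^2 x_i x_i^T."""
    A = X.T @ X
    beta_hat = np.linalg.solve(A, X.T @ y_synth)   # least squares fit
    resid2 = (y_synth - X @ beta_hat) ** 2         # squared synthetic residuals
    B = (X * resid2[:, None]).T @ X
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv
```

For an intercept-only design this reduces exactly to the (ddof = 0) sample variance of the synthetic responses divided by *n*, a convenient sanity check.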

**Theorem 2.2** *Under assumptions (C.1), (C.2), (C.3) and (C.4), we have*

$$n{\widehat{V}}_{J}\stackrel{p}{\to}{V}_{R}.$$

*Proofs of Theorem 2.1 and Theorem 2.2 are given in the Appendix*.

**3 Least squares imputation estimator**

Imputation is a popular method for dealing with missing data, and this section describes such an approach. When some censoring indicators are missing, some synthetic data in (1) are not defined. However, in the expression for ${Y}_{i}^{*}$, we can replace *G _{n}*(·) by ${\widehat{G}}_{n}(\cdot)$ and replace each missing δ_{i} by its estimated conditional expectation ${\widehat{m}}_{n}({Z}_{i})$, which yields the imputed synthetic responses

$${\widehat{Y}}_{i,I}=\frac{\{{\xi}_{i}{\delta}_{i}+(1-{\xi}_{i}){\widehat{m}}_{n}({Z}_{i})\}{\tilde{Y}}_{i}}{1-{\widehat{G}}_{n}({\tilde{Y}}_{i})},\qquad i=1,\ldots ,n.$$

Analogous to Sect. 2, we define a least squares imputation estimator for β:

$${\widehat{\beta}}_{I}={\left({\sum}_{i=1}^{n}{X}_{i}{X}_{i}^{\top}\right)}^{-1}{\sum}_{i=1}^{n}{X}_{i}{\widehat{Y}}_{i,I}.$$
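The imputed responses and the resulting estimator are easy to sketch (our code; the inputs `m_hat` and `one_minus_G` stand for the estimates of Sect. 2 and are hypothetical here):

```python
import numpy as np

def y_hat_I(y_obs, delta, xi, m_hat, one_minus_G):
    # Keep delta_i where observed (xi_i = 1); impute m_hat(Z_i) where missing.
    d = np.where(xi == 1, delta, 0.0)      # unobserved deltas never used
    return (xi * d + (1.0 - xi) * m_hat) * y_obs / one_minus_G

def beta_I(X, y_obs, delta, xi, m_hat, one_minus_G):
    # Least squares fit to the imputed synthetic responses.
    y_I = y_hat_I(y_obs, delta, xi, m_hat, one_minus_G)
    return np.linalg.solve(X.T @ X, X.T @ y_I)
```

When every ξ_{i} = 1 this reduces to the synthetic data (1); when every ξ_{i} = 0 it reduces to the regression calibration responses of Sect. 2.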

Under the assumed missingness mechanism, ${\widehat{\beta}}_{I}$ also can be motivated by the relationship

$$E({Y}_{I}|X)=E\left(\frac{[E(\xi |\tilde{Y})\delta +\{1-E(\xi |\tilde{Y})\}m(Z)]\tilde{Y}}{1-G(\tilde{Y})}\,\Big|\,X\right)=E({Y}_{R}|X)={X}^{\top}\beta ,$$

where *Y _{I}* = {ξδ + (1 − ξ)*m*(*Z*)}*Ỹ*/{1 − *G*(*Ỹ*)}.

The following theorem addresses the asymptotic normality of ${\widehat{\beta}}_{I}$.

**Theorem 3.1** *Under assumptions (C.1), (C.2), (C.3) and (C.4), we have*

$${n}^{1/2}({\widehat{\beta}}_{I}-\beta )\stackrel{d}{\to}N(0,{V}_{I}),$$

*where* ${V}_{I}={\Sigma}_{X}^{-1}{\Omega}_{I}{\Sigma}_{X}^{-1}$ *for a limiting covariance matrix* Ω_{I}.

Similar to Sect. 2, we estimate *V _{I}* by the jackknife methods described earlier. Clearly

$$\begin{array}{ll}var({Y}_{I}|X)\hfill & =E\phantom{\rule{thinmathspace}{0ex}}\left(\frac{[E(\xi |\tilde{Y})\delta +\{1-E(\xi |\tilde{Y})\}{m}^{2}(Z)]{\tilde{Y}}^{2}}{{\{1-G(\tilde{Y})\}}^{2}}|X\right)-{E}^{2}({Y}_{I}|X)\hfill \\ \hfill & =E\phantom{\rule{thinmathspace}{0ex}}\left(\frac{[\pi (\tilde{Y})E(\delta |Z)+\{1-\pi (\tilde{Y})\}{m}^{2}(Z)]{\tilde{Y}}^{2}}{{\{1-G(\tilde{Y})\}}^{2}}|X\right)-{E}^{2}({Y}_{R}|X)\hfill \\ \hfill & =E\phantom{\rule{thinmathspace}{0ex}}\left(\frac{\pi (\tilde{Y})m(Z)\phantom{\rule{thinmathspace}{0ex}}\{1-m(Z)\}{\tilde{Y}}^{2}}{{\{1-G(\tilde{Y})\}}^{2}}|X\right)+var({Y}_{R}|X)\ge var({Y}_{R}|X).\hfill \end{array}$$

**4 Least squares inverse probability weighted estimator**

Inverse probability weighting is another popular approach for dealing with missing data. If we define ${Y}_{W}=[\{\xi /\pi (\tilde{Y})\}\delta +\{1-\xi /\pi (\tilde{Y})\}m(Z)]\tilde{Y}/\{1-G(\tilde{Y})\}$, then under the assumed missingness mechanism

$$E({Y}_{W}|X)=E\left(\frac{[\{E(\xi |\tilde{Y})/\pi (\tilde{Y})\}\delta +\{1-E(\xi |\tilde{Y})/\pi (\tilde{Y})\}m(Z)]\tilde{Y}}{1-G(\tilde{Y})}\,\Big|\,X\right)=E({Y}_{R}|X)={X}^{\top}\beta .$$

In practice, *m*(·), π(·) and *G*(·) are usually unknown, but we can use estimates of these quantities to calculate inverse probability weighted synthetic data:

$${\widehat{Y}}_{i,W}=\frac{\left[\left\{\frac{{\xi}_{i}}{{\widehat{\pi}}_{n}({\tilde{Y}}_{i})}\right\}{\delta}_{i}+\left\{1-\frac{{\xi}_{i}}{{\widehat{\pi}}_{n}({\tilde{Y}}_{i})}\right\}{\widehat{m}}_{n}({Z}_{i})\right]{\tilde{Y}}_{i}}{1-{\widehat{G}}_{n}({\tilde{Y}}_{i})},\qquad i=1,\ldots ,n.$$

Note that *Ŷ _{i,W}* is very similar to *Ŷ _{i,I}*: both impute ${\widehat{m}}_{n}({Z}_{i})$ when δ_{i} is missing, and they differ only in how the observed censoring indicators are weighted. The least squares inverse probability weighted estimator of β is then

$${\widehat{\beta}}_{W}={\left({\sum}_{i=1}^{n}{X}_{i}{X}_{i}^{\top}\right)}^{-1}{\sum}_{i=1}^{n}{X}_{i}{\widehat{Y}}_{i,W}.$$
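A matching sketch for the weighted responses (our code; the inputs are hypothetical stand-ins for the estimates of Sect. 2). When ξ_{i} = 0 it reduces to the calibration response, and when ${\widehat{\pi}}_{n}\equiv 1$ with all ξ_{i} = 1 it reduces to (1):

```python
import numpy as np

def y_hat_W(y_obs, delta, xi, pi_hat_vals, m_hat, one_minus_G):
    # Observed deltas get weight xi_i / pi_hat(Y~_i); the remaining
    # weight (possibly negative) goes to the estimated mean m_hat(Z_i).
    d = np.where(xi == 1, delta, 0.0)
    w = xi / pi_hat_vals
    return (w * d + (1.0 - w) * m_hat) * y_obs / one_minus_G
```

The weight structure is what delivers the double-robustness property discussed below: correctness of either the π(·) estimate or the *m*(·) model suffices for consistency.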

The following theorem addresses the asymptotic normality of ${\widehat{\beta}}_{W}$.

**Theorem 4.1** *Under assumptions (C.1) through (C.7), we have*

$${n}^{1/2}({\widehat{\beta}}_{W}-\beta )\stackrel{d}{\to}N(0,{V}_{W}),$$

*where* ${V}_{W}={\Sigma}_{X}^{-1}{\Omega}_{W}{\Sigma}_{X}^{-1}$ *for a limiting covariance matrix* Ω_{W}.

As described earlier, *V _{W}* can be estimated by jackknife methods. We see

$$\begin{array}{ll}var({Y}_{W}|X)\hfill & =E\phantom{\rule{thinmathspace}{0ex}}[\frac{\left\{\frac{E(\xi |\tilde{Y})\delta}{{\pi}^{2}(\tilde{Y})}+\frac{2E(\xi |\tilde{Y})\delta m(Z)}{\pi (\tilde{Y})}-\frac{2E(\xi |\tilde{Y})\delta m(Z)}{{\pi}^{2}(\tilde{Y})}\right\}{\tilde{Y}}^{2}}{{\{1-G(\tilde{Y})\}}^{2}}\phantom{\rule{thinmathspace}{0ex}}+\frac{\left\{{m}^{2}(Z)-\frac{2E(\xi |\tilde{Y}){m}^{2}(Z)}{\pi (\tilde{Y})}+\frac{E(\xi |\tilde{Y}){m}^{2}(Z)}{{\pi}^{2}(\tilde{Y})}\right\}{\tilde{Y}}^{2}}{{\{1-G(\tilde{Y})\}}^{2}}|X]-{E}^{2}({Y}_{W}|X)\hfill \\ \hfill & =E\phantom{\rule{thinmathspace}{0ex}}[\frac{\left\{\frac{E(\delta |Z)}{\pi (\tilde{Y})}+2E(\delta |Z)m(Z)-\frac{2E(\delta |Z)m(Z)}{\pi (\tilde{Y})}\right\}{\tilde{Y}}^{2}}{{\{1-G(\tilde{Y})\}}^{2}}\phantom{\rule{thinmathspace}{0ex}}+\frac{\left\{{m}^{2}(Z)-2{m}^{2}(Z)+\frac{{m}^{2}(Z)}{\pi (\tilde{Y})}\right\}{\tilde{Y}}^{2}}{{\{1-G(\tilde{Y})\}}^{2}}|X]-{E}^{2}({Y}_{R}|X)\hfill \\ \hfill & =E\phantom{\rule{thinmathspace}{0ex}}\left[\frac{m(Z)\left\{1-m(Z)\right\}{\tilde{Y}}^{2}}{\pi (\tilde{Y}){\left\{1-G(\tilde{Y})\right\}}^{2}}|X\right]+var({Y}_{R}|X)\ge var({Y}_{I}|X)\ge var({Y}_{R}|X).\hfill \end{array}$$

One advantage of this approach is that if either *m*(·) or π(·) is modeled correctly, the estimator ${\widehat{\beta}}_{W}$ will be consistent, which is the so-called “double-robustness” property (see, *e.g.*, Scharfstein et al. 1999; Lunceford and Davidian 2004; and Wang et al. 2004). Note that we estimate π(·) nonparametrically, and thus ${\widehat{\beta}}_{W}$ enjoys this robustness even if the model for *m*(·) is not specified correctly. This is an appealing property of ${\widehat{\beta}}_{W}$, but it must be weighed against the estimator's larger asymptotic variance.

**5 Simulation study**

The finite-sample properties of the proposed estimators were evaluated via Monte Carlo simulation, with emphasis on the effects of sample size, percentage of observations censored, and percentage of censoring indicators missing. Response data were generated from the linear model: *Y* = α + β*X* + ε, with α = 3 and β = 0.5. The random variables ε, *X*, and *C* were independently generated from N(0,1), U(0,2), and N(μ, 4) distributions, respectively. For each subject, the observed response was *Ỹ* = *min*(*Y*, *C*) and the censoring indicator was δ = *I* (*Y* ≤ *C*). Conditional on *Ỹ* = *ỹ*, the probability that the censoring indicator was missing, 1 − π(*ỹ*), was determined by the logistic model: *log*{π(*ỹ*)/[1 − π(*ỹ*)]} = θ_{1} + θ_{2}*ỹ*.

Censoring rates of 20 and 40% were obtained by setting μ = 5.395 and 4.067, respectively. Values for θ_{1} and θ_{2} were selected by specifying the proportion of subjects with a missing censoring indicator at *ỹ* = 0 and by specifying the average proportion (over all *ỹ*) with a missing censoring indicator. We simulated data so that the average missingness rate was 20 or 40% and so that the missingness rate increased or decreased with *ỹ*. We chose θ_{1} so that 1 − π(0) = {1 + *exp*(θ_{1})}^{−1} was 0.15 or 0.25 when the average missingness rate was 20%, and so that 1 − π(0) was 0.30 or 0.50 when the average missingness rate was 40%. We also examined situations in which none of the censoring indicators were missing, both for 20 and 40% censoring, as well as the case where there was no censoring at all.
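The data-generating design above can be sketched as follows (our code; the θ values passed in are placeholders to be chosen by the recipe in the text):

```python
import numpy as np

def simulate(n, mu, theta1, theta2, rng):
    """One sample from the simulation design: Y = 3 + 0.5 X + eps with
    X ~ U(0,2), eps ~ N(0,1), censoring C ~ N(mu, 4), and missingness of
    the censoring indicator governed by logit pi(y) = theta1 + theta2 y."""
    x = rng.uniform(0.0, 2.0, n)
    y = 3.0 + 0.5 * x + rng.normal(0.0, 1.0, n)
    c = rng.normal(mu, 2.0, n)                 # variance 4 (sd = 2)
    y_obs = np.minimum(y, c)
    delta = (y <= c).astype(float)
    pi = 1.0 / (1.0 + np.exp(-(theta1 + theta2 * y_obs)))
    xi = (rng.uniform(size=n) < pi).astype(float)   # xi = 1: delta observed
    return x, y_obs, delta, xi
```

With μ = 5.395 roughly 20% of the observations are censored, and μ = 4.067 gives roughly 40%, matching the rates quoted in the text.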

We generated 10,000 Monte Carlo random samples of size *n* = 50, 100, 200 and 400 under each combination of censoring and missingness. Every data set was analyzed under the model: *Y* = α + β*X* + ε. For each configuration, we averaged over the 10,000 data sets to estimate the mean squared error (MSE), bias, and standard error (SE) associated with the slope estimators ${\widehat{\beta}}_{R}$, ${\widehat{\beta}}_{I}$, and ${\widehat{\beta}}_{W}$. The results for the intercept estimators ${\widehat{\alpha}}_{R}$, ${\widehat{\alpha}}_{I}$, and ${\widehat{\alpha}}_{W}$ were qualitatively the same, so they are not presented. When none of the censoring indicators were missing, we also calculated the unbounded Koul et al. (1981) estimator:

$${\widehat{\beta}}_{K}={\left({\sum}_{i=1}^{n}{X}_{i}{X}_{i}^{\top}\right)}^{-1}{\sum}_{i=1}^{n}{X}_{i}{Y}_{i}^{*},$$

where ${Y}_{i}^{*}$ is defined in (1). We view ${\widehat{\beta}}_{K}$ as a reference for comparisons in our simulations.

We used the uniform kernel function $W(u)=\frac{1}{2}$ if |*u*| ≤ 1 and *W*(*u*) = 0 otherwise, and the biweight kernel function $K(u)=\frac{15}{16}(1-2{u}^{2}+{u}^{4})$ if |*u*| ≤ 1 and *K*(*u*) = 0 otherwise. The bandwidths were *h _{n}* =

The MSE, bias, and SE for ${\widehat{\beta}}_{K}$, ${\widehat{\beta}}_{R}$, ${\widehat{\beta}}_{I}$, and ${\widehat{\beta}}_{W}$ are presented in Tables 1, 2, and 3, respectively. The average jackknife estimate of SE is also shown in Table 3. Results are given only for situations where the average missingness rate increased with *ỹ*, as the results for the decreasing cases were virtually identical. The Koul et al. (1981) estimator (${\widehat{\beta}}_{K}$) is included as a benchmark when no censoring indicators are missing. The first row of each table corresponds to the special case of no censoring, where all of the estimators are identical because each synthetic response $({Y}_{i}^{*},{\widehat{Y}}_{i,R},{\widehat{Y}}_{i,I},{\widehat{Y}}_{i,W})$ reduces to the observed response *Y _{i}*.

**Table 1** Mean squared errors for the regression calibration (${\widehat{\beta}}_{R}$), imputation (${\widehat{\beta}}_{I}$), and inverse probability weighted (${\widehat{\beta}}_{W}$) estimators of β by censoring rate (CR), missingness rate (MR), and sample size (n), based on 10,000 Monte Carlo samples

**Table 2** Biases for the regression calibration (${\widehat{\beta}}_{R}$), imputation (${\widehat{\beta}}_{I}$), and inverse probability weighted (${\widehat{\beta}}_{W}$) estimators of β by censoring rate (CR), missingness rate (MR), and sample size (n), based on 10,000 Monte Carlo samples

**Table 3** Standard errors (and mean jackknife estimates) for the regression calibration (${\widehat{\beta}}_{R}$), imputation (${\widehat{\beta}}_{I}$), and inverse probability weighted (${\widehat{\beta}}_{W}$) estimators of β by censoring rate (CR), missingness rate (MR), and sample size (n), based on 10,000 Monte Carlo samples

Table 1 shows that when no censoring indicators were missing, the MSEs for ${\widehat{\beta}}_{R}$ were less than or equal to those for ${\widehat{\beta}}_{K}$, whereas the MSEs for ${\widehat{\beta}}_{I}$ and ${\widehat{\beta}}_{W}$ were slightly larger, at least in the 40% censoring case. For any given configuration, with or without missing censoring indicators, the MSE for ${\widehat{\beta}}_{R}$ never exceeded the MSE for ${\widehat{\beta}}_{I}$ or ${\widehat{\beta}}_{W}$. As expected, the MSEs for all three estimators decreased (i.e., improved) as sample sizes increased, as censoring rates decreased, and as average missingness rates decreased (see Table 1).

The estimators also were evaluated with respect to bias and SE, the squares of which contribute equally to MSE. Table 2 gives the biases, all of which are negative when censoring is present; thus, each approach tended to underestimate the slope parameter. One interesting result is that applying our methods with missing censoring indicators usually produced less bias than applying the standard estimator of Koul et al. (1981) with complete censoring information. In fact, with a 40% censoring rate, the bias for each of our estimators (with missingness rates of either 20 or 40%) was always less than for ${\widehat{\beta}}_{K}$ with no missing censoring indicators. Among the proposed estimators, the bias was consistently smallest for the inverse probability weighted estimator, ${\widehat{\beta}}_{W}$. Also, biases decreased as sample sizes increased and as censoring rates decreased, but were not affected as much by missingness rates.

Table 3 gives the SEs and their jackknife estimates. The SE patterns mimic those for the MSEs in Table 1 because bias and SE contribute equally to MSE and the SEs are larger (in absolute value) than the biases. As with the MSEs, ${\widehat{\beta}}_{R}$ had the smallest SEs and ${\widehat{\beta}}_{W}$ had the largest. The SEs decreased as *n* increased, as censoring decreased, and as missingness decreased. The jackknife estimates were always good for ${\widehat{\beta}}_{W}$, but became too small for ${\widehat{\beta}}_{I}$, and even smaller for ${\widehat{\beta}}_{R}$, as *n* decreased, as censoring increased, and as missingness increased.

In order to assess robustness, we analyzed the same simulated data with a “poor” model choice for *m*(*z*). Rather than using *log*{*m*(*z*)/[1 − *m*(*z*)]} = γ_{1} + γ_{2}*ỹ* + γ_{3}*x* to model *m*(*z*) as a logistic function of *ỹ* and *x*, we set γ_{2} = γ_{3} = 0 and modeled *m*(*z*) as a constant, which gives $\widehat{m}(z)={\sum}_{i=1}^{n}{\xi}_{i}{\delta}_{i}/{\sum}_{i=1}^{n}{\xi}_{i}$. Table 4 displays the biases and SEs obtained for this poor model choice. The results for no missing censoring indicators (*MR* = 0%) are the same as in Tables 2 and 3, except for ${\widehat{\beta}}_{R}$, as only ${\widehat{\beta}}_{R}$ depends on $\widehat{m}(z)$ in this situation. The SEs always decrease as sample sizes increase, as does the bias of ${\widehat{\beta}}_{W}$, but the biases of ${\widehat{\beta}}_{R}$ and ${\widehat{\beta}}_{I}$ actually increase with the sample size in many cases. In fact, the bias of ${\widehat{\beta}}_{R}$ is now positive and increases with *n* in every situation, whereas it was always negative and approached 0 as *n* increased when using the better model for *m*(*z*). The bias of ${\widehat{\beta}}_{I}$ also increases with the sample size, which initially shrinks its negative biases in absolute value, but once the biases become positive, they grow in absolute value as *n* increases. In contrast, the bias of ${\widehat{\beta}}_{W}$ for the poor model choice is nearly identical to its bias under the better model. As predicted theoretically, the double-robustness property of ${\widehat{\beta}}_{W}$ keeps its performance from degrading when a poor choice is made for *m*(*z*).

**6 Example: analysis of glioblastoma data**

We illustrate our methods with the glioblastoma data introduced earlier. These data are from a brain cancer clinical trial. As a measure of post-treatment quality of life, we focus on the number of days until a patient declines to a state in which he has lost his mobility and his cancer has returned. Thus, the endpoint of interest is the time (in days) to non-ambulatory progression, and we want to relate that outcome to the patient’s sex and age at study entry.

We have data on 276 glioblastoma patients who were over 50 years old when they entered the trial. By the end of the study, 59 of these patients progressed and were no longer ambulatory, 14 progressed and were still ambulatory, 166 did not progress, and 37 progressed but had an unknown ambulatory status. Therefore, with respect to the time to non-ambulatory progression, we have 59 uncensored observations (ξ = 1, δ = 1), 180 censored observations (ξ = 1, δ = 0), and 37 observations with a missing censoring indicator (ξ = 0). We have data on a covariate vector $X={({X}_{1},{X}_{2},{X}_{3})}^{\top}$, where *X*_{1} is fixed at 1, *X*_{2} indicates the patient’s sex (0 for female, 1 for male), and *X*_{3} is the patient’s age (in years) at study entry. Our data set includes information on 109 women and 167 men, with ages ranging from 51 to 74.

We applied our methods to the glioblastoma data to estimate the effects of sex and age on the time to non-ambulatory progression. As a first step, we fitted the linear regression model: *E*(*Y*|*X* = *x*) = β_{1}*x*_{1} + β_{2}*x*_{2} + β_{3}*x*_{3}, where *Y* is the log of the time to non-ambulatory progression. This initial model specifies parallel lines, with a separate intercept for each sex and a common slope in age. To help assess the appropriateness of the model, we added a sex-age interaction term (β_{4}*x*_{2}*x*_{3}), which allows each line to have its own sex-specific intercept and slope. Estimated regression coefficients and their standard errors, obtained under both models via all three proposed methods, are shown in Table 5. As a further means of assessing model suitability, we also fitted a separate quadratic model in age for each sex, as well as a model with quadratic-age curves that shared common age terms, but none of the quadratic terms were significant, so these results are not presented.

**Table 5** Estimated regression coefficients (and jackknife estimates of their standard errors) under two models for the glioblastoma data from the brain cancer clinical trial (n = 276)

Based on 2-sided Wald tests, none of the analyses provided evidence of a sex effect, but to varying degrees they all suggested a detrimental age effect, with time to non-ambulatory progression decreasing with age. The expanded model gave no hint of a sex-age interaction, regardless of which method was used, so we focus on the simpler model with only main effects for sex and age. Relative to the typical 0.05 level, the statistical significance of the age effect was marginal (*p* = 0.047) in the inverse probability weighted analysis, slightly stronger (*p* = 0.029) in the imputation analysis, and fairly high (*p* < 0.001) in the regression calibration analysis. If we ignore the 37 observations with a missing censoring indicator (13% of the data, which seems inefficient), the Koul et al. (1981) analysis gives similar results, with no evidence of a sex effect and a significant age effect (*p* = 0.022).

Thus, time to non-ambulatory progression does not differ significantly between men and women, but it appears to decrease (i.e., quality of life worsens) with age. Although evidence of this age effect is weak for the inverse probability weighted analysis, it is consistent across all three of the proposed analyses, as well as the Koul et al. (1981) analysis. The relative sizes of the estimated standard errors in Table 5 agree with our simulation results, where the SE for ${\widehat{\beta}}_{R}$ was smallest (and its jackknife estimates were too small), followed by the SE for ${\widehat{\beta}}_{I}$, and finally by the SE for ${\widehat{\beta}}_{W}$ (with good jackknife estimates). This suggests that the test based on ${\widehat{\beta}}_{W}$ might be the least powerful, which further strengthens the conclusion that the observed age effect is real, as even the least powerful test was statistically significant.

In this example, expressing the failure time of interest as a function of two component times reveals a potential source of bias. That is, the time to non-ambulatory progression can be viewed as the maximum of the time to loss of mobility and the time to cancer progression. We treat patients who have progressed but are still ambulatory at the end of the study as contributing censored failure times, but this censoring is likely informative and may introduce a bias, as ambulatory patients who have already progressed should be at greater risk of failing than ambulatory patients who have not yet progressed. Similarly, patients in remission who have already lost their mobility should be at greater risk of failing than patients in remission who are still ambulatory. This problem is not unique to our approach, however, and affects a wide class of existing survival techniques, including something as simple as using a Kaplan–Meier curve to estimate the distribution of time to non-ambulatory progression. Future efforts should be directed at developing methods that properly account for this type of informative censoring, such as a multivariate procedure, although such developments are beyond the scope of this article.

**7 Discussion**

We propose three estimators for the regression coefficients in a linear least squares analysis of censored survival data when some censoring indicators are missing. Our methods generalize the synthetic data methods of Koul et al. (1981), which do not allow for missing censoring indicators. The proposed estimators modify the synthetic responses of Koul et al. so that each missing censoring indicator (δ) is replaced by its estimated conditional expectation. Our estimators differ from each other only with respect to how they handle the known censoring indicators. The regression calibration estimator replaces each known δ by its estimated mean, the imputation estimator uses the known δ, and the inverse probability weighted estimator substitutes a weighted average of the known δ and its estimated mean.

Similar to Dikta (1998), our regression calibration method can be used even when all of the censoring indicators are observed. In that situation, the Koul et al. (1981) estimator (${\widehat{\beta}}_{K}$) transforms the observations (*Ỹ _{i}*, δ_{i}) into synthetic data via (1), whereas the regression calibration approach replaces each δ_{i} with an estimate of *m*(*Z _{i}*). A conditioning argument shows that

$$var\phantom{\rule{thinmathspace}{0ex}}\left\{\frac{{\delta}_{i}{\tilde{Y}}_{i}}{1-G({\tilde{Y}}_{i})}|{X}_{i}\right\}\ge var\phantom{\rule{thinmathspace}{0ex}}\left\{\frac{m({Z}_{i}){\tilde{Y}}_{i}}{1-G({\tilde{Y}}_{i})}|{X}_{i}\right\},$$

which implies that ${\widehat{\beta}}_{K}$ might be improved by replacing δ_{i} in ${Y}_{i}^{*}$ with a regression estimator of *m*(*Z _{i}*) based on the observations {(*Ỹ _{i}*, *X _{i}*, δ_{i})}.
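This variance reduction is easy to see numerically. The toy setup below is ours, not the paper's: with *Y* ~ U(0, 1) and *C* ~ Exp(1), *G* is known and *m*(*ỹ*) = *pr*(δ = 1 | *Ỹ* = *ỹ*) has the closed form 1/(2 − *ỹ*) on (0, 1):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
y = rng.uniform(0.0, 1.0, n)          # true responses, E(Y) = 0.5
c = rng.exponential(1.0, n)           # censoring, 1 - G(t) = exp(-t)
y_obs = np.minimum(y, c)
delta = (y <= c).astype(float)

koul = delta * y_obs / np.exp(-y_obs)   # delta_i Y~_i / (1 - G(Y~_i))
m = 1.0 / (2.0 - y_obs)                 # m(y) = pr(delta = 1 | Y~ = y)
calib = m * y_obs / np.exp(-y_obs)      # m(Y~_i) Y~_i / (1 - G(Y~_i))
```

Both versions have mean near *E*(*Y*) = 0.5, but the calibrated version has a strictly smaller variance, exactly as the conditioning argument predicts.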

Along these same lines, we show in Sects. 3 and 4 that ${\widehat{\beta}}_{I}$ and ${\widehat{\beta}}_{W}$ have larger asymptotic variances than ${\widehat{\beta}}_{R}$, and thus our imputation and inverse probability weighted estimators are asymptotically less efficient than our regression calibration estimator. The simulation studies confirm these results (see Tables 3 and 4). This suggests that, at least with respect to this measure of performance, the regression calibration method for handling missing censoring indicators is better than the other two methods, even though the other two methods can lead to asymptotically efficient estimators in some situations (see, e.g., Robins et al. 1994; Hahn 1998; Hirano et al. 2003; and Wang et al. 2004).

On the other hand, there are several good reasons to favor ${\widehat{\beta}}_{W}$ over ${\widehat{\beta}}_{I}$ or ${\widehat{\beta}}_{R}$. The inverse probability weighted estimator is less biased than the other estimators (see Tables 2 and 4) and its jackknife estimates of SE were the most accurate (see Table 3). Also, the inverse probability weighted estimator enjoys the double-robustness property, which makes it less susceptible to problems caused by incorrectly specified models (see Table 4).

How should one balance the advantages and disadvantages of each approach? In general, we recommend the inverse probability weighted estimator. Not only is the inverse probability weighted estimator more robust to poor model choices for *m*(*z*), but relative to the other estimators, it is less biased and its jackknife variance estimates are more accurate. Although the regression calibration estimator has the smallest variance, its relatively large bias can produce misleading results in practice. In fact, our simulations showed that if a poor model was selected for *m*(*z*), the bias of β̂_{R} could actually increase with the sample size. Currently, the variance estimator for β̂_{R} does not adequately account for uncertainty in estimating *m*(*z*) and *G*(*z*). Also, the variance estimates are functions of the parameter estimates, and β̂_{R} has the largest bias; thus, even though the true variance of β̂_{R} is smallest, the variance of β̂_{R} can also be severely underestimated (see Table 3), making results appear too significant. Future efforts should focus on accounting for uncertainty in estimating *m*(*z*) and *G*(*z*), but until then, we cannot recommend using the regression calibration method in its present form.

Clearly, our estimators depend on the choice of bandwidths. However, each is a global functional, and thus the rate-*n*^{1/2} asymptotic normality of the proposed estimators suggests that a proper choice of *b _{n}* in assumption (C.7) does not depend on the first-order term of the mean squared error. Rather, it may depend only on second- or higher-order terms. This implies that the selection of the bandwidth should have little effect on the first-order asymptotic behavior of the proposed estimators.
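Conditions (C.6) and (C.7) leave the kernel and bandwidth largely free. For example, any bandwidth of the form *b _{n}* = *cn*^{−γ} with 1/2 < γ < 1 satisfies (C.7), since then *nb _{n}* = *cn*^{1−γ} → ∞ and $n{b}_{n}^{2}={c}^{2}{n}^{1-2\gamma}\to 0$. A minimal kernel estimator of the selection probability π(*t*) = *pr*(ξ = 1 | *Ỹ* = *t*) along these lines can be sketched as follows; the Epanechnikov kernel, the default γ = 3/4, and the empty-window fallback are illustrative choices of ours, not prescriptions from the paper.

```python
import numpy as np

def pi_hat(t, y_tilde, xi, c=1.0, gamma=0.75):
    """Kernel (Nadaraya-Watson-type) estimate of pi(t) = pr(xi = 1 | Y~ = t).

    Uses the Epanechnikov kernel, which has bounded support as in (C.6),
    and bandwidth b_n = c * n**(-gamma); any 1/2 < gamma < 1 gives
    n*b_n -> infinity and n*b_n**2 -> 0, as required by (C.7).
    """
    n = len(y_tilde)
    b_n = c * n ** (-gamma)
    u = (t - y_tilde) / b_n
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    total = w.sum()
    # Fallback (ours): overall missingness rate when no point is in-window.
    return (w @ xi) / total if total > 0 else xi.mean()
```

In practice one would tune *c* by a data-driven rule; the asymptotics above suggest the estimators are not sensitive to this choice at first order.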

We assume the censoring variables (*C*_{1}, …, *C _{n}*) are independent and identically distributed and are independent of the covariates. Relaxing this assumption, for example to allow the censoring distribution to depend on the covariates, would be an interesting topic for future research.

Qihua Wang’s research was supported by the National Science Fund for Distinguished Young Scholars in China (10725106), the National Natural Science Foundation of China (10671198), the National Science Fund for Creative Research Groups in China, the Research Grants Council of Hong Kong (HKU 7050/06P), and a grant from the Key Lab of Random Complex Structures and Data Science, CAS. Gregg Dinse’s research was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01-ES-045007-13). The authors thank Dr. Shyamal Peddada and Dr. David Dunson, as well as the editors and referees, for their many helpful comments.

Define $Grad[{m}_{0}(z,\theta )]={\left(\frac{\partial {m}_{0}(z,\theta )}{\partial {\theta}_{1}},\frac{\partial {m}_{0}(z,\theta )}{\partial {\theta}_{2}},\dots ,\frac{\partial {m}_{0}(z,\theta )}{\partial {\theta}_{d}},\frac{\partial {m}_{0}(z,\theta )}{\partial {\theta}_{d+1}}\right)}^{\top}$ and set *A*(θ) = (σ_{r,s}) for 1 ≤ *r*, *s* ≤ *d* + 1, where ${\sigma}_{r,s}=E\left[\frac{\pi (\tilde{Y})}{{m}_{0}(Z,\theta )\{1-{m}_{0}(Z,\theta )\}}\frac{\partial {m}_{0}(Z,\theta )}{\partial {\theta}_{r}}\frac{\partial {m}_{0}(Z,\theta )}{\partial {\theta}_{s}}\right]$. Furthermore, let α(*z*_{1}, *z*_{2}) = *Grad*[*m*_{0}(*z*_{1}, θ)]^{⊤} *A*^{−1}(θ) *Grad*[*m*_{0}(*z*_{2}, θ)] and assume the following conditions:

- (C.1) *m*_{0}(*z*, θ) has bounded first-order derivatives with respect to θ.
- (C.2) σ_{r,s} < ∞ for 1 ≤ *r*, *s* ≤ *d* + 1 and the matrix *A*(θ) = (σ_{r,s}) is positive definite.
- (C.3) $E\{{\Vert X\Vert}^{2}{\tilde{Y}}^{2}/{\overline{H}}^{2}(\tilde{Y})\}<\infty $, where *H*(*t*) = *pr*(*Ỹ* ≤ *t*) and $\overline{H}(t)=1-H(t)$.
- (C.4) *E*{‖*X*‖^{2}} < ∞.
- (C.5) π(·) has a bounded first-order derivative.
- (C.6) *W*(·) is of order 1 with bounded support.
- (C.7) *nb _{n}* → ∞ and $n{b}_{n}^{2}\to 0$.

*Proof of Theorem 2.1* For ${\mathrm{\Sigma}}_{n}=\frac{1}{n}{\sum}_{i=1}^{n}{X}_{i}{X}_{i}^{\top}$ and ${A}_{n}={n}^{-1/2}{\sum}_{i=1}^{n}{X}_{i}({\widehat{Y}}_{i,R}-{X}_{i}^{\top}\beta )$, we can write

$${n}^{1/2}({\widehat{\beta}}_{R}-\beta )={\mathrm{\Sigma}}_{n}^{-1}{A}_{n}.$$

(A.1)

By adding and subtracting identical terms, we can rewrite *A _{n}* as follows:

$$\begin{array}{ll}{A}_{n}& ={n}^{-1/2}{\sum}_{i=1}^{n}{X}_{i}({Y}_{i,R}-{X}_{i}^{\top}\beta )+{n}^{-1/2}{\sum}_{i=1}^{n}{X}_{i}\frac{\{{m}_{0}({Z}_{i},{\widehat{\theta}}_{n})-{m}_{0}({Z}_{i},\theta )\}{\tilde{Y}}_{i}}{1-G({\tilde{Y}}_{i})}\\ & \phantom{=}+{n}^{-1/2}{\sum}_{i=1}^{n}{X}_{i}\frac{{\tilde{Y}}_{i}{m}_{0}({Z}_{i},\theta )}{1-G({\tilde{Y}}_{i})}\frac{{\widehat{G}}_{n}({\tilde{Y}}_{i})-G({\tilde{Y}}_{i})}{1-G({\tilde{Y}}_{i})}+{o}_{p}(1)\\ & :={T}_{n1}+{T}_{n2}+{T}_{n3}+{o}_{p}(1).\end{array}$$

(A.2)

Define $\mu (z)=E\phantom{\rule{thinmathspace}{0ex}}\left\{\frac{X\tilde{Y}}{1-G(\tilde{Y})}\alpha (Z,{Z}_{j})|{Z}_{j}=z\right\}$. Under conditions (C.1) and (C.2), we have

$${T}_{n2}={n}^{-1/2}{\displaystyle \sum _{j=1}^{n}\frac{\mu ({Z}_{j}){\xi}_{j}\{{\delta}_{j}-{m}_{0}({Z}_{j},\theta )\}}{{m}_{0}({Z}_{j},\theta )\{1-{m}_{0}({Z}_{j},\theta )\}}+{o}_{p}(1).}$$

(A.3)

Set ${\tilde{H}}_{0}(t)=pr({\tilde{Y}}_{i}>t,{\delta}_{i}=0)$ and define

$$\psi ({\tilde{Y}}_{i},{\delta}_{i},{\xi}_{i};t)=-\frac{\{{\xi}_{i}-\pi ({\tilde{Y}}_{i})\}\{{\delta}_{i}-u({\tilde{Y}}_{i})\}-\pi ({\tilde{Y}}_{i})\{1-u({\tilde{Y}}_{i})\}}{\pi ({\tilde{Y}}_{i})\{1-H({\tilde{Y}}_{i})\}}I({\tilde{Y}}_{i}\le t)+{\int}_{0}^{{\tilde{Y}}_{i}\wedge t}\frac{d{\tilde{H}}_{0}(s)}{{\{\overline{H}(s)\}}^{2}},$$

(A.4)

where *u*(·) is given in Sect. 2. Similar to Wang and Ng (2008), we have

$${\widehat{G}}_{n}(t)-G(t)=\frac{1-G(t)}{n}{\displaystyle \sum _{i=1}^{n}\psi ({\tilde{Y}}_{i},{\delta}_{i},{\xi}_{i};t)}+{o}_{p}({n}^{-\frac{1}{2}}).$$

The third term in (A.2) can be rewritten as

$${T}_{n3}={n}^{-3/2}{\displaystyle \sum _{i=1}^{n}{\displaystyle \sum _{j=1}^{n}{X}_{i}}}\frac{{\tilde{Y}}_{i}{m}_{0}({Z}_{i},\theta )\psi ({\tilde{Y}}_{j},{\delta}_{j},{\xi}_{j};{\tilde{Y}}_{i})}{1-G({\tilde{Y}}_{i})}+{o}_{p}(1)\phantom{\rule{thinmathspace}{0ex}}.$$

(A.5)

Define

$$h({X}_{i},{\tilde{Y}}_{i},{\delta}_{i},{\xi}_{i};{X}_{j},{\tilde{Y}}_{j},{\delta}_{j},{\xi}_{j})=\frac{{X}_{i}{\tilde{Y}}_{i}{m}_{0}({Z}_{i},\theta )\psi ({\tilde{Y}}_{j},{\delta}_{j},{\xi}_{j};{\tilde{Y}}_{i})}{1-G({\tilde{Y}}_{i})}+\frac{{X}_{j}{\tilde{Y}}_{j}{m}_{0}({Z}_{j},\theta )\psi ({\tilde{Y}}_{i},{\delta}_{i},{\xi}_{i};{\tilde{Y}}_{j})}{1-G({\tilde{Y}}_{j})}$$

and set ${U}_{n}={n}^{-3/2}{\sum}_{1\le i<j\le n}h({X}_{i},{\tilde{Y}}_{i},{\delta}_{i},{\xi}_{i};{X}_{j},{\tilde{Y}}_{j},{\delta}_{j},{\xi}_{j})$. For any fixed unit vector α, write ${h}_{\alpha}={\alpha}^{\top}h$ and ${U}_{n\alpha}={\alpha}^{\top}{U}_{n}$. Then

$${T}_{n3}={U}_{n}+{o}_{p}(1)\phantom{\rule{thinmathspace}{0ex}}.$$

(A.6)

Combining (A.2), (A.3), and (A.6) gives

$${A}_{n}={T}_{n1}+{M}_{n}+{U}_{n}+{o}_{p}(1),$$

(A.7)

where *T*_{n1} is defined in (A.2) and *M _{n}* is the leading term of *T*_{n2} in (A.3).

By the central limit theorem, we have

$${T}_{n1}\to N(0,{\mathrm{\Omega}}_{1}),$$

(A.8)

and under the assumed missingness mechanism, we have

$${M}_{n}\stackrel{\mathcal{L}}{\to}N(0,{\mathrm{\Omega}}_{2}),$$

(A.9)

where ${\mathrm{\Omega}}_{1}=E\{X{X}^{\top}{({Y}_{R}-{X}^{\top}\beta )}^{2}\}$ and ${\mathrm{\Omega}}_{2}=E\left[\frac{\pi (\tilde{Y})\mu (Z){\mu}^{\top}(Z)}{{m}_{0}(Z,\theta )\{1-{m}_{0}(Z,\theta )\}}\right]$. Define

$$g({X}_{1},{\tilde{Y}}_{1},{\delta}_{1},{\xi}_{1})=E\phantom{\rule{thinmathspace}{0ex}}\left\{\frac{{X}_{2}{\tilde{Y}}_{2}{m}_{0}({Z}_{2},\theta )\psi ({\tilde{Y}}_{1},{\delta}_{1},{\xi}_{1};{\tilde{Y}}_{2})}{1-G({\tilde{Y}}_{2})}|{X}_{1},{\tilde{Y}}_{1},{\delta}_{1},{\xi}_{1}\right\}\phantom{\rule{thinmathspace}{0ex}}.$$

Under the assumed missingness mechanism, we have

$$E\{\psi ({\tilde{Y}}_{i},{\delta}_{i},{\xi}_{i};{\tilde{Y}}_{j})|{\tilde{Y}}_{j}\}=0\phantom{\rule{thinmathspace}{0ex}}\text{for}\phantom{\rule{thinmathspace}{0ex}}i\ne j,$$

from which it is straightforward to verify that *E*{*h*_{α}(*X*_{1}, *Ỹ*_{1}, δ_{1}, ξ_{1}; *X*_{2}, *Ỹ*_{2}, δ_{2}, ξ_{2})} = 0. As a result of the relationship

$$E\{{\psi}^{2}({\tilde{Y}}_{i},{\delta}_{i},{\xi}_{i};{\tilde{Y}}_{j})|{\tilde{Y}}_{j}\}=-{\{1-G({\tilde{Y}}_{j})\}}^{2}{\int}_{0}^{{\tilde{Y}}_{j}}{\{\overline{H}(s)\}}^{-2}d{\tilde{H}}_{0}(s)\le \frac{{\{1-G({\tilde{Y}}_{j})\}}^{2}}{{\overline{H}}^{2}({\tilde{Y}}_{j})}\phantom{\rule{thinmathspace}{0ex}}\text{for}\phantom{\rule{thinmathspace}{0ex}}i\ne j$$

and condition (C.3), it follows that $E\phantom{\rule{thinmathspace}{0ex}}({h}_{\alpha}^{2})<\infty $. By the central limit theorem for U-statistics, we have

$${U}_{n\alpha}\stackrel{\mathcal{L}}{\to}N(0,{\alpha}^{\top}{\mathrm{\Omega}}_{3}\alpha ),$$

where

$${\mathrm{\Omega}}_{3}=E\{g({X}_{1},{\tilde{Y}}_{1},{\delta}_{1},{\xi}_{1}){g}^{\top}({X}_{1},{\tilde{Y}}_{1},{\delta}_{1},{\xi}_{1})\}$$

for every such α. By the Cramér–Wold device, this result implies

$${U}_{n}\stackrel{\mathcal{L}}{\to}N(0,{\mathrm{\Omega}}_{3}).$$

(A.10)

As for covariances, under the assumed missingness mechanism, we have

$$cov({T}_{n1},{M}_{n})=E\left[{X}_{i}\left\{\frac{{\tilde{Y}}_{i}{m}_{0}({Z}_{i},\theta )}{1-G({\tilde{Y}}_{i})}-{X}_{i}^{\top}\beta \right\}\frac{{\xi}_{i}\{{\delta}_{i}-{m}_{0}({Z}_{i},\theta )\}{\mu}^{\top}({Z}_{i})}{{m}_{0}({Z}_{i},\theta )\{1-{m}_{0}({Z}_{i},\theta )\}}\right]=0.$$

(A.11)

Note that $E\{{X}_{k}({Y}_{k,R}-{X}_{k}^{\top}\beta ){h}_{ij}^{\top}\}=0$ for *k* ≠ *i*, *k* ≠ *j*, *i* ≠ *j*, and $E\{{X}_{i}({Y}_{i,R}-{X}_{i}^{\top}\beta ){h}_{ij}^{\top}\}=E\{{X}_{1}({Y}_{1,R}-{X}_{1}^{\top}\beta ){g}^{\top}({X}_{1},{\tilde{Y}}_{1},{\delta}_{1},{\xi}_{1})\}$ for *i* ≠ *j*, where ${h}_{ij}=h({X}_{i},{\tilde{Y}}_{i},{\delta}_{i},{\xi}_{i};{X}_{j},{\tilde{Y}}_{j},{\delta}_{j},{\xi}_{j})$.

Similarly, we have

$$cov({T}_{n1},{U}_{n})=\left(1-\frac{1}{n}\right)E\{{X}_{1}({Y}_{1,R}-{X}_{1}^{\top}\beta ){g}^{\top}({X}_{1},{\tilde{Y}}_{1},{\delta}_{1},{\xi}_{1})\}\to E\{{X}_{1}({Y}_{1,R}-{X}_{1}^{\top}\beta ){g}^{\top}({X}_{1},{\tilde{Y}}_{1},{\delta}_{1},{\xi}_{1})\}:={\mathrm{\Omega}}_{1,3}$$

(A.12)

and

$$cov({M}_{n},{U}_{n})=-\left(1-\frac{1}{n}\right)E\left[\frac{{m}_{0}({Z}_{2},\theta ){\tilde{Y}}_{2}{X}_{2}\{1-\pi ({\tilde{Y}}_{1})\}{\mu}^{\top}({Z}_{1})I({\tilde{Y}}_{1}\le {\tilde{Y}}_{2})}{\{1-G({\tilde{Y}}_{2})\}\{1-H({\tilde{Y}}_{1})\}}\right]\to -E\left[\frac{{m}_{0}({Z}_{2},\theta ){\tilde{Y}}_{2}{X}_{2}\{1-\pi ({\tilde{Y}}_{1})\}{\mu}^{\top}({Z}_{1})I({\tilde{Y}}_{1}\le {\tilde{Y}}_{2})}{\{1-G({\tilde{Y}}_{2})\}\{1-H({\tilde{Y}}_{1})\}}\right]:={\mathrm{\Omega}}_{2,3}.$$

(A.13)

Together Eqs. (A.7)–(A.9) and (A.10)–(A.13) prove

$${A}_{n}\stackrel{\mathcal{L}}{\to}N(0,{\mathrm{\Omega}}_{R}),$$

(A.14)

where

$${\mathrm{\Omega}}_{R}={\mathrm{\Omega}}_{1}+{\mathrm{\Omega}}_{2}+{\mathrm{\Omega}}_{3}+2{\mathrm{\Omega}}_{1,3}+2{\mathrm{\Omega}}_{2,3}.$$

(A.15)

By the law of large numbers and (C.4), it follows that ${n}^{-1}{\sum}_{i=1}^{n}{X}_{i}{X}_{i}^{\top}\stackrel{p}{\to}E(X{X}^{\top})$. This result, together with (A.1) and (A.14), proves Theorem 2.1.

*Proof of Theorem 2.2* Asymptotic representations similar to (A.7) can be obtained for ${\sum}_{i=1}^{n}{({\widehat{Y}}_{i,R}-{X}_{i}^{\top}{\widehat{\beta}}_{R})}^{2}{X}_{i}{X}_{i}^{\top}$. Then, by applying these representations to (2) and using some algebra, Theorem 2.2 can be proved.

*Proof of Theorem 3.1* By standard arguments, we have

$$\begin{array}{ll}{n}^{-1/2}{\sum}_{i=1}^{n}{X}_{i}({\widehat{Y}}_{i,I}-{X}_{i}^{\top}\beta )& ={n}^{-1/2}{\sum}_{i=1}^{n}{X}_{i}\left[\frac{\{{\delta}_{i}{\xi}_{i}+(1-{\xi}_{i}){m}_{0}({Z}_{i},\theta )\}{\tilde{Y}}_{i}}{1-G({\tilde{Y}}_{i})}-{X}_{i}^{\top}\beta \right]\\ & \phantom{=}+{n}^{-1/2}{\sum}_{i=1}^{n}{X}_{i}\frac{(1-{\xi}_{i})\{{m}_{0}({Z}_{i},{\widehat{\theta}}_{n})-{m}_{0}({Z}_{i},\theta )\}{\tilde{Y}}_{i}}{1-G({\tilde{Y}}_{i})}\\ & \phantom{=}+{n}^{-1/2}{\sum}_{i=1}^{n}\frac{{X}_{i}{\tilde{Y}}_{i}\{{\delta}_{i}{\xi}_{i}+(1-{\xi}_{i}){m}_{0}({Z}_{i},\theta )\}}{1-G({\tilde{Y}}_{i})}\left\{\frac{{\widehat{G}}_{n}({\tilde{Y}}_{i})-G({\tilde{Y}}_{i})}{1-G({\tilde{Y}}_{i})}\right\}+{o}_{p}(1)\\ & ={T}_{n1}+{T}_{n2}+{T}_{n3}+{T}_{n4}+{T}_{n5}+{T}_{n6}+{o}_{p}(1),\end{array}$$

(A.16)

where *T*_{n1}, *T*_{n2} and *T*_{n3} are defined in (A.2), and *T*_{n4}, *T*_{n5} and *T*_{n6} are defined as follows:

$$\begin{array}{l}{T}_{n4}={n}^{-1/2}{\displaystyle \sum _{i=1}^{n}{X}_{i}}\frac{{\tilde{Y}}_{i}{\xi}_{i}\{{\delta}_{i}-{m}_{0}({Z}_{i},\theta )\}}{1-G({\tilde{Y}}_{i})},\hfill \\ {T}_{n5}=-{n}^{-1/2}{\displaystyle \sum _{i=1}^{n}{X}_{i}}\frac{{\tilde{Y}}_{i}{\xi}_{i}\{{m}_{0}({Z}_{i},{\widehat{\theta}}_{n})-{m}_{0}({Z}_{i},\theta )\}}{1-G({\tilde{Y}}_{i})},\hfill \\ {T}_{n6}={n}^{-1/2}{\displaystyle \sum _{i=1}^{n}{X}_{i}}\frac{{\tilde{Y}}_{i}{\xi}_{i}\{{\delta}_{i}-{m}_{0}({Z}_{i},\theta )\}}{1-G({\tilde{Y}}_{i})}\left\{\frac{{\widehat{G}}_{n}({\tilde{Y}}_{i})-G({\tilde{Y}}_{i})}{1-G({\tilde{Y}}_{i})}\right\}.\hfill \end{array}$$

By (A.7), we have

$${T}_{n1}+{T}_{n2}+{T}_{n3}={T}_{n1}+{M}_{n}+{U}_{n}+{o}_{p}(1).$$

(A.17)

Let $\phantom{\rule{thinmathspace}{0ex}}\tilde{\mu}({Z}_{j})=E\phantom{\rule{thinmathspace}{0ex}}\left\{\frac{X\tilde{Y}\pi (\tilde{Y})}{1-G(\tilde{Y})}\alpha (Z,{Z}_{j})|{Z}_{j}\right\}$. Under conditions (C.1) and (C.2), using arguments similar to those leading to (A.3)–(A.5), we get

$${T}_{n5}=-{n}^{-1/2}{\sum}_{j=1}^{n}\frac{{\xi}_{j}\{{\delta}_{j}-{m}_{0}({Z}_{j},\theta )\}}{{m}_{0}({Z}_{j},\theta )\{1-{m}_{0}({Z}_{j},\theta )\}}\tilde{\mu}({Z}_{j})+{o}_{p}(1).$$

(A.18)

Recalling the definition of *T*_{n4}, it follows from (A.18) that

$$\begin{array}{ll}{T}_{n4}+{T}_{n5}\hfill & ={n}^{-1/2}{\displaystyle \sum _{i=1}^{n}\phantom{\rule{thinmathspace}{0ex}}\left[\frac{{X}_{i}{\tilde{Y}}_{i}}{1-G({\tilde{Y}}_{i})}-\frac{\tilde{\mu}({Z}_{i})}{{m}_{0}({Z}_{i},\theta )\{1-{m}_{0}({Z}_{i},\theta )\}}\right]}\phantom{\rule{thinmathspace}{0ex}}{\xi}_{i}\phantom{\rule{thinmathspace}{0ex}}\{{\delta}_{i}-{m}_{0}({Z}_{i},\theta )\}+{o}_{p}(1)\hfill \\ \hfill & :={T}_{n45}+{o}_{p}(1).\hfill \end{array}$$

By the central limit theorem, we have

$${T}_{n45}\stackrel{\mathcal{L}}{\to}N(0,{\mathrm{\Omega}}_{I}^{*}),$$

(A.19)

where

$${\mathrm{\Omega}}_{I}^{*}=E[\pi (\tilde{Y}){m}_{0}(Z,\theta )\{1-{m}_{0}(Z,\theta )\}L(X,\tilde{Y}){L}^{\top}(X,\tilde{Y})]$$

(A.20)

and $L(X,\tilde{Y})=\frac{X\tilde{Y}}{1-G(\tilde{Y})}-\frac{\tilde{\mu}(Z)}{{m}_{0}(Z,\theta )\{1-{m}_{0}(Z,\theta )\}}$. Similar to the way (A.5) was derived, we obtain

$$\begin{array}{ll}{T}_{n6}\hfill & ={n}^{-3/2}{\displaystyle \sum _{i\ne j}\frac{{X}_{i}{\tilde{Y}}_{i}{\xi}_{i}\{{\delta}_{i}-{m}_{0}({Z}_{i},\theta )\}\psi ({\tilde{Y}}_{j},{\delta}_{j},{\xi}_{j};{\tilde{Y}}_{i})}{1-G({\tilde{Y}}_{i})}+{o}_{p}(1)}\hfill \\ \hfill & :={T}_{n6,1}+{o}_{p}(1).\hfill \end{array}$$

(A.21)

Note that

$$E\phantom{\rule{thinmathspace}{0ex}}\left[{X}_{i}{\tilde{Y}}_{i}{\xi}_{i}\phantom{\rule{thinmathspace}{0ex}}\{{\delta}_{i}-{m}_{0}({Z}_{i},\theta )\}|{\tilde{Y}}_{i}\right]=E\left[\phantom{\rule{thinmathspace}{0ex}}{X}_{i}{\tilde{Y}}_{i}E({\xi}_{i}|{X}_{i},{\tilde{Y}}_{i},{\delta}_{i})\{{\delta}_{i}-{m}_{0}({Z}_{i},\theta )\}|{\tilde{Y}}_{i}\right]=E\phantom{\rule{thinmathspace}{0ex}}\left[{X}_{i}{\tilde{Y}}_{i}E({\xi}_{i}|{X}_{i},{\tilde{Y}}_{i})\{E({\delta}_{i}|{\tilde{Y}}_{i},{X}_{i})-E({\delta}_{i}|{\tilde{Y}}_{i})\}|{\tilde{Y}}_{i}\right]=0.$$

From the fact that *E*{ψ(*Ỹ _{j}*, δ_{j}, ξ_{j}; *Ỹ _{i}*) | *Ỹ _{i}*} = 0 for *i* ≠ *j*, together with the equality above, it follows that

$$E\phantom{\rule{thinmathspace}{0ex}}{\Vert {T}_{n6,1}\Vert}^{2}\le 2{n}^{-1}E\phantom{\rule{thinmathspace}{0ex}}\left\{{\Vert {X}_{1}\Vert}^{2}{\tilde{Y}}_{1}^{2}{\psi}^{2}({\tilde{Y}}_{2},{\delta}_{2},{\xi}_{2};{\tilde{Y}}_{1})\right\}.$$

(A.22)

Equations (A.21) and (A.22) together prove that

$${T}_{n6}={o}_{p}(1).$$

(A.23)

Under the assumed missingness mechanism, a little algebra shows *cov*(*T*_{n1}, *T*_{n45}) = 0 and *cov*(*U _{n}*, *T*_{n45}) = 0. These results, together with (A.16), (A.17), (A.19), and (A.23), prove Theorem 3.1.

*Proof of Theorem 4.1* Similar to the derivation of (A.16), by (C.5), (C.6) and (C.7), we have

$$\begin{array}{ll}{n}^{-1/2}{\sum}_{i=1}^{n}{X}_{i}({\widehat{Y}}_{i,W}-{X}_{i}^{\top}\beta )& ={n}^{-1/2}{\sum}_{i=1}^{n}{X}_{i}({\widehat{Y}}_{i,R}-{X}_{i}^{\top}\beta )+{n}^{-1/2}{\sum}_{i=1}^{n}{X}_{i}\frac{\frac{{\xi}_{i}}{{\widehat{\pi}}_{n}({\tilde{Y}}_{i})}\{{\delta}_{i}-{m}_{0}({Z}_{i},\theta )\}{\tilde{Y}}_{i}}{1-{\widehat{G}}_{n}({\tilde{Y}}_{i})}\\ & \phantom{=}+{n}^{-1/2}{\sum}_{i=1}^{n}{X}_{i}\frac{\frac{{\xi}_{i}}{{\widehat{\pi}}_{n}({\tilde{Y}}_{i})}\{{m}_{0}({Z}_{i},\theta )-{m}_{0}({Z}_{i},{\widehat{\theta}}_{n})\}{\tilde{Y}}_{i}}{1-{\widehat{G}}_{n}({\tilde{Y}}_{i})}\\ & ={T}_{n1}+{M}_{n}+{U}_{n}+{S}_{n}+{R}_{n}+{o}_{p}(1),\end{array}$$

(A.24)

where *T*_{n1}, *M _{n}* and *U _{n}* are as in (A.7), and *S _{n}* and *R _{n}* are defined as follows:

$$\begin{array}{l}{S}_{n}={n}^{-1/2}{\displaystyle \sum _{i=1}^{n}{X}_{i}}\frac{\frac{{\xi}_{i}}{\pi ({\tilde{Y}}_{i})}\{{\delta}_{i}-{m}_{0}({Z}_{i},\theta )\}{\tilde{Y}}_{i}}{1-G({\tilde{Y}}_{i})}\hfill \\ {R}_{n}={n}^{-1/2}{\displaystyle \sum _{i=1}^{n}{X}_{i}}\frac{\frac{{\xi}_{i}}{\pi ({\tilde{Y}}_{i})}\{{m}_{0}({Z}_{i},\theta )-{m}_{0}({Z}_{i},{\widehat{\theta}}_{n})\}{\tilde{Y}}_{i}}{1-G({\tilde{Y}}_{i})}.\hfill \end{array}$$

Similar to the way (A.3) was derived, we can show that

$${R}_{n}=-{n}^{-1/2}{\sum}_{j=1}^{n}\frac{{\xi}_{j}\{{\delta}_{j}-{m}_{0}({Z}_{j},\theta )\}}{{m}_{0}({Z}_{j},\theta )\{1-{m}_{0}({Z}_{j},\theta )\}}\mu ({Z}_{j})+{o}_{p}(1).$$

Hence, it follows from the central limit theorem that

$${S}_{n}+{R}_{n}={n}^{-1/2}{\sum}_{i=1}^{n}\left[\frac{{X}_{i}{\tilde{Y}}_{i}}{1-G({\tilde{Y}}_{i})}-\frac{\pi ({\tilde{Y}}_{i})\mu ({Z}_{i})}{{m}_{0}({Z}_{i},\theta )\{1-{m}_{0}({Z}_{i},\theta )\}}\right]\times \frac{{\xi}_{i}\{{\delta}_{i}-{m}_{0}({Z}_{i},\theta )\}}{\pi ({\tilde{Y}}_{i})}+{o}_{p}(1)\stackrel{\mathcal{L}}{\to}N(0,{\mathrm{\Omega}}_{W}^{*}),$$

(A.25)

where

$${\mathrm{\Omega}}_{W}^{*}=E\left[\frac{{m}_{0}(Z,\theta )\{1-{m}_{0}(Z,\theta )\}}{\pi (\tilde{Y})}\tilde{L}(X,\tilde{Y}){\tilde{L}}^{\top}(X,\tilde{Y})\right]$$

(A.26)

and $\tilde{L}(X,\tilde{Y})=\frac{X\tilde{Y}}{1-G(\tilde{Y})}-\frac{\pi (\tilde{Y})\mu (Z)}{{m}_{0}(Z,\theta )\{1-{m}_{0}(Z,\theta )\}}$.

It can be shown that *cov*(*T*_{n1} + *M _{n}* + *U _{n}*, *S _{n}* + *R _{n}*) = 0. This result, together with Eqs. (A.7), (A.14), (A.24), and (A.25), and the fact that ${n}^{-1}{\sum}_{i=1}^{n}{X}_{i}{X}_{i}^{\top}\stackrel{p}{\to}E(X{X}^{\top})$, proves Theorem 4.1.

Qihua Wang, Department of Mathematics and Statistics, Yunnan University, Kunming 650091, China. Academy of Mathematics and Systems Science, Chinese Academy of Science, Beijing 100190, China.

Gregg E. Dinse, Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA.

- Albert A, Anderson JA. On the existence of maximum likelihood estimates in logistic regression models. Biometrika. 1984;71:1–10.
- Buckley J, James I. Linear regression with censored data. Biometrika. 1979;66:429–436.
- Cox DR. Regression models and life tables (with discussion) J R Stat Soc B. 1972;34:187–220.
- Dewanji A. A note on a test for competing risks with missing failure type. Biometrika. 1992;79:855–857.
- Dikta G. On semiparametric random censorship models. J Stat Plan Inference. 1998;66:253–279.
- Dinse GE. Nonparametric estimation for partially-complete time and type of failure data. Biometrics. 1982;38:417–431.
- Gao G, Tsiatis AA. Semiparametric estimators for the regression coefficients in the linear transformation competing risks model with missing cause of failure. Biometrika. 2005;92:875–891.
- Goetghebeur EJ, Ryan L. A modified logrank test for competing risks with missing failure type. Biometrika. 1990;77:207–211.
- Goetghebeur EJ, Ryan L. Analysis of competing risks survival data when some failure types are missing. Biometrika. 1995;82:821–833.
- Hahn J. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica. 1998;66:315–331.
- Hirano K, Imbens GW, Ridder G. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica. 2003;71:1161–1189.
- Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. J Am Stat Assoc. 1952;47:663–685.
- Jin Z, Lin DY, Wei LJ, Ying Z. Rank-based inference for the accelerated failure time model. Biometrika. 2003;90:341–353.
- Koul H, Susarla V, van Ryzin J. Regression analysis with randomly right-censored data. Ann Stat. 1981;9:1276–1288.
- Lai TL, Ying Z. Rank regression methods for left-truncated and right-censored data. Ann Stat. 1991;19:531–556.
- Leurgans S. Linear models, random censoring and synthetic data. Biometrika. 1987;74:301–309.
- Li G, Wang QH. Empirical likelihood regression analysis for right censored data. Stat Sinica. 2003;13:51–68.
- Lo S-H. Estimating a survival function with incomplete cause-of-death data. J Multivar Anal. 1991;39:217–235.
- Lu W, Liang Y. Analysis of competing risks data with missing cause of failure under additive hazards model. Stat Sinica. 2008;18:219–234.
- Lu K, Tsiatis AA. Multiple imputation methods for estimating regression coefficients in the competing risks model with missing cause of failure. Biometrics. 2001;57:1191–1197.
- Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med. 2004;23:2937–2960.
- McKeague IW, Subramanian S. Product-limit estimators and Cox regression with missing censoring information. Scand J Stat. 1998;25:589–601.
- Miller RG. Least squares regression with censored data. Biometrika. 1976;63:449–464.
- Peddada SD, Patwardhan G. Jackknife variance estimators in linear models. Biometrika. 1992;79:654–657.
- Reid N. A conversation with Sir David Cox. Stat Sci. 1994;9:439–455.
- Ritov Y. Estimation in a linear regression model with censored data. Ann Stat. 1990;18:303–328.
- Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc. 1994;89:846–866.
- Santner TJ, Duffy DE. A note on A. Albert and J. A. Anderson’s conditions for the existence of maximum likelihood estimates in logistic regression models. Biometrika. 1986;73:755–758.
- Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion) J Am Stat Assoc. 1999;94:1096–1146.
- Subramanian S. Asymptotically efficient estimation of a survival function in the missing censoring indicator model. Nonparametr Stat. 2004;16:797–817.
- Subramanian S. Survival analysis for the missing censoring indicator model using kernel density estimation techniques. Stat Methodol. 2006;3:125–136.
- Tsiatis AA. Estimating regression parameters using linear rank tests for censored data. Ann Stat. 1990;18:354–372.
- Tsiatis AA, Davidian M, McNeney B. Multiple imputation methods for testing treatment differences in survival distributions with missing cause of failure. Biometrika. 2002;89:238–244.
- Wang QH, Ng K. Asymptotically efficient product-limit estimators with censoring indicators missing at random. Stat Sinica. 2008;18:749–768.
- Wang QH, Linton O, Härdle W. Semiparametric regression analysis with missing response at random. J Am Stat Assoc. 2004;99:334–345.
- Ying Z. A large sample study of rank estimation for censored regression data. Ann Stat. 1993;21:76–99.
- Zhou X, Sun L. Additive hazards regression with missing censoring information. Stat Sinica. 2003;13:1237–1257.
