Lifetime Data Anal. Author manuscript; available in PMC 2012 April 1.
PMCID: PMC3020262
NIHMSID: NIHMS223231

# Linear regression analysis of survival data with missing censoring indicators

## Abstract

Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.

Keywords: Asymptotic normality, Censoring indicator, Imputation, Inverse probability weighting, Least squares, Missing at random, Regression calibration

## 1 Introduction

Linear regression is a popular statistical tool that has been used successfully in many areas. In survival analysis, a log-transformation of the response variable converts a conventional linear model to an accelerated failure time model, which is an appealing alternative to the Cox (1972) proportional hazards model because of its direct interpretation (cf. Reid 1994). Linear regression has been studied extensively for the analysis of randomly right censored data, and various procedures have been developed; see, for example, Miller (1976), Buckley and James (1979), Koul et al. (1981), Leurgans (1987), Ritov (1990), Tsiatis (1990), Lai and Ying (1991), Ying (1993), Jin et al. (2003), and Li and Wang (2003). All of these methods assume that every censoring indicator is observed.

In practice, some censoring indicators may be unobserved, and we develop methods for this situation. For example, we analyzed data from a clinical trial involving patients with glioblastoma, a form of brain cancer. One measure of post-treatment quality of life is the length of time until a patient declines to a state in which mobility is lost and the cancer has returned. We evaluated data on adult glioblastoma patients over 50 years of age. Progression status was always known, but ambulatory status was unknown in some cases. By the end of the study, some patients were non-ambulatory and had progressed, some had progressed but remained ambulatory, some had progressed but their ambulatory status was unrecorded, and some had not progressed. Our analysis focused on the time to non-ambulatory progression, and these four types of events contributed uncensored observations, censored observations, observations with a missing censoring indicator, and censored observations, respectively. In particular, we were interested in how sex and age affected the time to non-ambulatory progression. Our methods have applications in many other areas, including analyses with missing death certificates in epidemiological studies, unperformed necropsies in animal experiments, and unidentified component failures in quality control settings.

The methods we propose are formulated in the context of a censored survival problem with missing censoring indicators, but they can be used to analyze competing risks data on event-specific failure times when some event indicators are unknown. In practice, the usual competing risks analysis focuses on a particular event type of interest and treats the times to failure from other events as censored observations on the event time of interest. Thus, one can view an unknown failure type as a missing censoring indicator and use our methods.

If censoring indicators are missing, standard estimators for censored data are not directly applicable. One solution is to ignore cases with missing data and apply conventional analyses to the complete cases only. However, the efficiency of this complete-case approach decreases as the degree of missingness increases. Also, the complete-case estimator is consistent only when censoring indicators are missing completely at random (MCAR). Alternatively, some analyses directly incorporate survival data with missing censoring indicators. For example, Dinse (1982), Lo (1991), McKeague and Subramanian (1998), and Subramanian (2004, 2006) proposed survival estimators. Within the regression context, Goetghebeur and Ryan (1990, 1995), Dewanji (1992), Lu and Tsiatis (2001), and Tsiatis et al. (2002) developed methods under a proportional hazards model; Zhou and Sun (2003) and Lu and Liang (2008) worked with an additive hazards model; and Gao and Tsiatis (2005) employed a linear transformation competing risks model. Linear regression techniques, however, have received little attention in situations where censoring indicators are missing, and thus we now focus on that particular problem.

Suppose there are n subjects and their observations follow the linear regression model

$Y_i = X_i^\top \beta + \varepsilon_i,$

where $Y_i$ is a response variable, $X_i$ is a vector of d covariates, β is a d-vector of regression coefficients, and $\varepsilon_i$ is a random error for subject i (i = 1,…,n). Given {X1,…,Xn}, the errors {ε1,…,εn} are independent and identically distributed (i.i.d.) with mean zero. Let {C1,…,Cn} be i.i.d. censoring variables, where $C_i$ can randomly censor $Y_i$ on the right. The observed response is $\tilde{Y}_i = \min(Y_i, C_i)$, the censoring indicator is $\delta_i = I(Y_i \le C_i)$, and the missingness indicator $\xi_i$ is 1 if $\delta_i$ is observed and 0 otherwise. Although calling $\delta_i$ a censoring indicator might seem backwards, as $\delta_i = 1$ indicates an uncensored observation rather than a censored observation, we follow the convention that refers to $\delta_i$ as a censoring indicator.

We observe $\{\tilde{Y}, X, \delta\}$ for subjects with complete data (ξ = 1) and $\{\tilde{Y}, X\}$ for subjects with a missing censoring indicator (ξ = 0). Typically the missingness mechanism is assumed to produce data that are missing at random (MAR), which implies $\mathrm{pr}(\xi = 1 \mid \tilde{Y}, X, \delta) = \mathrm{pr}(\xi = 1 \mid \tilde{Y}, X)$. Instead, we assume $\mathrm{pr}(\xi = 1 \mid \tilde{Y}, X, \delta) = \mathrm{pr}(\xi = 1 \mid \tilde{Y})$, which is more restrictive than the MAR assumption but less restrictive than the common MCAR assumption that $\mathrm{pr}(\xi = 1 \mid \tilde{Y}, X, \delta) = \mathrm{pr}(\xi = 1)$. We made an assumption stricter than MAR to avoid postulating a parametric model for $\mathrm{pr}(\xi = 1 \mid \tilde{Y}, X)$ and to sidestep problems encountered when applying a nonparametric method to sparse data, such as can arise in high-dimensional situations. Under the weaker MAR assumption, we expect to obtain similar results for a reasonable model in the parametric case or with sufficient data in the nonparametric case, but we leave such analyses for future work.

In this paper, we propose three estimators for β. Our methods extend an approach taken by Koul et al. (1981) when all censoring indicators are observed. They used the formula

$Y_i^* = \frac{\delta_i \tilde{Y}_i}{1 - G_n(\tilde{Y}_i)}$
(1)

to transform observed data $(\tilde{Y}_i, \delta_i)$ into synthetic data $Y_i^*$, where $G_n(c)$ is a product-limit type estimator of the censoring distribution $G(c) = \mathrm{pr}(C \le c)$. Koul et al. estimated β by applying a modified least squares procedure to $(Y_i^*, X_i)$, but their estimator does not handle missing censoring indicators. Also, they deflate the synthetic response $Y_i^*$ to 0 when the true response $Y_i$ exceeds $C_i$, and they inflate $Y_i^*$ beyond $Y_i$ when $Y_i$ is actually observed, both of which are counterintuitive. Our estimators ease this dilemma by replacing $\delta_i$ in (1) with a weighted average of $\delta_i$ and its estimated conditional mean $\hat{m}_i$, say $p_i \delta_i + (1 - p_i)\hat{m}_i$, which allows statistical inference about β when some censoring indicators are missing.

The article is organized as follows. Section 2 develops a synthetic data approach based on regression calibration, which sets $p_i \equiv 0$. Sections 3 and 4 define imputation and inverse probability weighted estimators, which set $p_i = \xi_i$ and $p_i = \xi_i/\hat{E}(\xi_i \mid \tilde{Y}_i)$, respectively. As $\delta_i$ is unknown if $\xi_i = 0$, each of the three formulas for $p_i$ is zero in that situation. All three least squares estimators of β are shown to be asymptotically normal. Section 5 reports the results of a simulation study conducted to evaluate the finite sample behavior of the three proposed estimators, and Sect. 6 illustrates their use by applying them to the glioblastoma data. Finally, Sect. 7 discusses our estimators and the underlying assumptions, and proofs of the main results are given in the Appendix.
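The three choices of $p_i$ can be read as one formula with a switch. The following is a minimal sketch, not the authors' code: the function name `synthetic_response` and its arguments are hypothetical, and the arrays `m_hat`, `pi_hat`, and `G_hat` are assumed to hold already-computed estimates of the conditional mean of $\delta_i$, the probability that $\delta_i$ is observed, and the censoring distribution at each observed response.

```python
import numpy as np

def synthetic_response(y_obs, delta, xi, m_hat, pi_hat, G_hat, method):
    """Return {p*delta + (1 - p)*m_hat} * y_obs / (1 - G_hat) for each subject.

    method "R": regression calibration, p = 0
    method "I": imputation,             p = xi
    method "W": inverse prob. weighting, p = xi / pi_hat
    (Hypothetical helper; all inputs are arrays of equal length.)
    """
    y_obs, m_hat, pi_hat, G_hat = (np.asarray(a, dtype=float)
                                   for a in (y_obs, m_hat, pi_hat, G_hat))
    xi = np.asarray(xi, dtype=float)
    # delta is unobserved when xi == 0; zero it out so NaN placeholders cannot leak in.
    d = np.where(xi == 1, np.nan_to_num(np.asarray(delta, dtype=float)), 0.0)
    if method == "R":
        p = np.zeros_like(xi)
    elif method == "I":
        p = xi
    elif method == "W":
        p = xi / pi_hat
    else:
        raise ValueError("method must be 'R', 'I', or 'W'")
    weight = p * d + (1.0 - p) * m_hat
    return weight * y_obs / (1.0 - G_hat)
```

For a subject with a missing indicator ($\xi_i = 0$) all three methods return the same value, so the methods differ only in how they treat observed indicators.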

## 2 Least squares regression calibration estimator

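If we define $Z_i = (\tilde{Y}_i, X_i)$ and $m(z_i) = E(\delta_i \mid Z_i = z_i)$, then one natural generalization of $Y_i^*$ replaces $\delta_i$ by its conditional expectation to obtain $Y_{i,R} = m(Z_i)\tilde{Y}_i/\{1 - G(\tilde{Y}_i)\}$. Averaging over $\delta_i$ and invoking the usual conditional expectation arguments yield:

$E(Y_{i,R} \mid X_i) = E(Y_i \mid X_i) = X_i^\top \beta.$

For known m(·) and G(·), this result suggests the following least squares estimator of β:

$\tilde{\beta}_R = \left(\sum_{i=1}^{n} X_i X_i^\top\right)^{-1} \sum_{i=1}^{n} X_i Y_{i,R}.$

In practice, however, m(·) and G(·) are usually unknown, and hence $\tilde{\beta}_R$ cannot be computed. One simple solution is to replace m(·) and G(·) in $\tilde{\beta}_R$ with estimates.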
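The least squares step itself is ordinary regression of the synthetic responses on the covariates. A minimal sketch follows, assuming the synthetic responses are already available in an array; the function name `ls_on_synthetic` is hypothetical.

```python
import numpy as np

def ls_on_synthetic(X, y_syn):
    """Least squares fit of synthetic responses y_syn on the design matrix X.

    Solves the normal equations (sum_i X_i X_i^T) beta = sum_i X_i y_syn_i.
    X should already contain a column of ones if an intercept is wanted.
    """
    X = np.asarray(X, dtype=float)
    y_syn = np.asarray(y_syn, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y_syn, rcond=None)
    return beta
```

Using `np.linalg.lstsq` rather than an explicit matrix inverse is a numerical convenience only; both give the same estimate when the design matrix has full column rank.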

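For example, consider a parametric model for m(z) of the form $m_0(z, \theta)$, where $m_0(\cdot, \cdot)$ is a known continuous function and θ is a vector of unknown parameters. The parameter vector θ can be estimated by $\hat{\theta}_n$, the maximizer of the likelihood

$L_n(\theta) = \prod_{i=1}^{n} \left[ m_0(Z_i, \theta)^{\delta_i} \{1 - m_0(Z_i, \theta)\}^{1 - \delta_i} \right]^{\xi_i},$

and thus m(z) can be estimated by $\hat{m}_n(z) = m_0(z, \hat{\theta}_n)$. As for estimating G(·), we begin by defining $\pi(\tilde{y}) = E(\xi \mid \tilde{Y} = \tilde{y})$, which can be estimated nonparametrically by

$\hat{\pi}_n(\tilde{y}) = \sum_{i=1}^{n} \xi_i W\left(\frac{\tilde{y} - \tilde{Y}_i}{b_n}\right) \Big/ \sum_{i=1}^{n} W\left(\frac{\tilde{y} - \tilde{Y}_i}{b_n}\right),$

where W(·) is a kernel function and $b_n$ is a bandwidth sequence. Next, we define a Horvitz and Thompson (1952) type inverse probability weighted estimator of $u(\tilde{y}) = E(\delta \mid \tilde{Y} = \tilde{y})$:

$\hat{u}_n(\tilde{y}) = \sum_{i=1}^{n} \delta_i \left(\frac{\xi_i}{\hat{\pi}_n(\tilde{Y}_i)}\right) K\left(\frac{\tilde{y} - \tilde{Y}_i}{h_n}\right) \Big/ \sum_{i=1}^{n} \left(\frac{\xi_i}{\hat{\pi}_n(\tilde{Y}_i)}\right) K\left(\frac{\tilde{y} - \tilde{Y}_i}{h_n}\right),$

where K(·) is a kernel function and $h_n$ is a bandwidth sequence. Finally, similar to Dikta (1998) and Wang and Ng (2008), we define the following estimator of $G(\tilde{y})$:

$\hat{G}_n(\tilde{y}) = 1 - \prod_{i:\, \tilde{Y}_i \le \tilde{y}} \left(\frac{n - R_i}{n - R_i + 1}\right)^{1 - \hat{u}_n(\tilde{Y}_i)},$

where $R_i$ denotes the rank of $\tilde{Y}_i$ (i = 1,…,n). Now, analogous to the least squares estimator for known m(·) and G(·), we can define the following least squares regression calibration estimator of β:

$\hat{\beta}_R = \left(\sum_{i=1}^{n} X_i X_i^\top\right)^{-1} \sum_{i=1}^{n} X_i \hat{Y}_{i,R},$

where $\hat{Y}_{i,R}$ is $Y_{i,R}$ with m(·) and G(·) replaced by $\hat{m}_n(\cdot)$ and $\hat{G}_n(\cdot)$, respectively.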
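The nonparametric pieces can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the kernels are the uniform and biweight kernels used later in the simulations, and the product-limit step assumes the Dikta-type form $1 - \hat{G}_n(t) = \prod_{i:\tilde{Y}_i \le t} \{(n - R_i)/(n - R_i + 1)\}^{1 - \hat{u}_n(\tilde{Y}_i)}$ with no tied responses.

```python
import numpy as np

def uniform_kernel(u):
    return 0.5 * (np.abs(u) <= 1.0)

def biweight_kernel(u):
    # (15/16)(1 - 2u^2 + u^4) on [-1, 1], written as (15/16)(1 - u^2)^2
    return (15.0 / 16.0) * (1.0 - u**2) ** 2 * (np.abs(u) <= 1.0)

def _scaled_diffs(y_eval, y_obs, bandwidth):
    y_eval = np.asarray(y_eval, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    return (y_eval[:, None] - y_obs[None, :]) / bandwidth

def pi_hat(y_eval, y_obs, xi, bn, W=uniform_kernel):
    """Kernel estimate of pi(y) = E(xi | observed response = y).

    Returns NaN at evaluation points with no data inside the window.
    """
    w = W(_scaled_diffs(y_eval, y_obs, bn))
    return (w @ np.asarray(xi, dtype=float)) / w.sum(axis=1)

def u_hat(y_eval, y_obs, delta, xi, bn, hn, W=uniform_kernel, K=biweight_kernel):
    """Inverse probability weighted kernel estimate of u(y) = E(delta | y)."""
    xi = np.asarray(xi, dtype=float)
    observed = xi == 1
    ipw = np.zeros_like(xi)
    ipw[observed] = 1.0 / pi_hat(y_obs, y_obs, xi, bn, W)[observed]
    # Unobserved indicators get weight zero, so their placeholder value is irrelevant.
    d = np.where(observed, np.nan_to_num(np.asarray(delta, dtype=float)), 0.0)
    k = K(_scaled_diffs(y_eval, y_obs, hn))
    return (k @ (ipw * d)) / (k @ ipw)

def G_hat(y_obs, u):
    """Assumed product-limit form, evaluated at each observed response.

    u[i] is the estimate of E(delta | y_obs[i]). At the largest response the
    estimate equals 1 unless u is exactly 1 there, so 1 - G_hat can be zero.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    n = len(y_obs)
    order = np.argsort(y_obs)
    ranks = np.arange(1, n + 1)
    exponents = 1.0 - np.asarray(u, dtype=float)[order]
    factors = ((n - ranks) / (n - ranks + 1.0)) ** exponents
    G = np.empty(n)
    G[order] = 1.0 - np.cumprod(factors)
    return G
```

Two sanity checks follow from the assumed form: if every observation is censored (`u` identically 0), `G_hat` reduces to the empirical distribution function of the observed responses, and if none is censored (`u` identically 1), it is identically 0.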

Define $\sigma_{r,s}$ for $1 \le r, s \le d + 1$ and assume:

• (C.1) $m_0(z, \theta)$ has bounded first-order derivatives with respect to θ.
• (C.2) $\sigma_{r,s} < \infty$ for $1 \le r, s \le d + 1$ and the matrix $A(\theta) = (\sigma_{r,s})$ is positive definite.
• (C.3) $E\{\|X\|^2 \tilde{Y}^2 / \bar{H}^2(\tilde{Y})\} < \infty$, where $H(t) = \mathrm{pr}(\tilde{Y} \le t)$ and $\bar{H}(t) = 1 - H(t)$.
• (C.4) $E\{\|X\|^2\} < \infty$.
• (C.5) π(·) has a bounded first-order derivative.
• (C.6) W(·) is of order 1 with bounded support.
• (C.7) $n b_n \to \infty$ and $n b_n^2 \to 0$.

The following theorem addresses the asymptotic normality of $\hat{\beta}_R$.

Theorem 2.1 Under assumptions (C.1), (C.2), (C.3) and (C.4), we have

$\sqrt{n}(\hat{\beta}_R - \beta) \xrightarrow{d} N(0, V_R),$

where $V_R = \Sigma^{-1} \Omega_R \Sigma^{-1}$, $\Sigma = E(XX^\top)$, and $\Omega_R$ is defined in (A.15) of the Appendix.

The structure of $\Omega_R$ is complex; thus, estimating $V_R$ by replacing the unknowns in $\Omega_R$ with appropriate component estimates is difficult to implement in practice. Alternatively, we estimate $V_R$ by $n\hat{V}_J$, where $\hat{V}_J$ is a jackknife estimator of the asymptotic variance of $\hat{\beta}_R$. In particular, we use a jackknife variance estimator of the form proposed by Peddada and Patwardhan (1992), which was motivated by a minimum norm quadratic unbiased estimator (MINQUE) and which in our case reduces to the simple form:

(2)

Theorem 2.2 Under assumptions (C.1), (C.2), (C.3) and (C.4), we have

$n\hat{V}_J \xrightarrow{p} V_R.$

Proofs of Theorem 2.1 and Theorem 2.2 are given in the Appendix.
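To make the jackknife idea concrete, the sketch below implements the standard delete-one jackknife covariance. It is offered only as an illustration: it is not claimed to match the Peddada and Patwardhan (1992) form used in the paper, and the name `jackknife_cov` and the callable `estimator` are hypothetical. For the estimators in this paper, `estimator` would need to recompute every nuisance estimate (the model for m, the kernel estimates, and the censoring distribution) on each leave-one-out sample.

```python
import numpy as np

def jackknife_cov(estimator, *arrays):
    """Delete-one jackknife covariance matrix of estimator(*arrays).

    estimator: callable returning a scalar or 1-d array of estimates.
    arrays:    data arrays sharing the same first dimension n.
    """
    arrays = [np.asarray(a) for a in arrays]
    n = len(arrays[0])
    full = np.atleast_1d(estimator(*arrays))
    loo = np.empty((n, full.size))
    for i in range(n):
        keep = np.arange(n) != i          # drop subject i
        loo[i] = estimator(*(a[keep] for a in arrays))
    dev = loo - loo.mean(axis=0)
    return (n - 1.0) / n * dev.T @ dev
```

As a check on the scaling, applying this to the sample mean reproduces the usual variance estimate $s^2/n$ exactly.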

## 3 Least squares imputation estimator

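Imputation is a popular method for dealing with missing data, and this section describes such an approach. When some censoring indicators are missing, some synthetic data in (1) are not defined. However, in the expression for $Y_i^*$, we can replace $G_n(\cdot)$ by $\hat{G}_n(\cdot)$ and impute $\delta_i$ with $\hat{m}_n(Z_i)$ if $\delta_i$ is missing ($\xi_i = 0$), which yields the imputed synthetic data

$\hat{Y}_{i,I} = \frac{\{\xi_i \delta_i + (1 - \xi_i)\hat{m}_n(Z_i)\}\tilde{Y}_i}{1 - \hat{G}_n(\tilde{Y}_i)}.$

Analogous to Sect. 2, we define a least squares imputation estimator for β:

$\hat{\beta}_I = \left(\sum_{i=1}^{n} X_i X_i^\top\right)^{-1} \sum_{i=1}^{n} X_i \hat{Y}_{i,I}.$

Under the assumed missingness mechanism, $\hat{\beta}_I$ also can be motivated by the relationship

$E(Y_I \mid X) = E(Y_R \mid X) = X^\top \beta,$

where $Y_I = \{\xi\delta + (1 - \xi)m(Z)\}\tilde{Y}/\{1 - G(\tilde{Y})\}$ and $Y_R = m(Z)\tilde{Y}/\{1 - G(\tilde{Y})\}$.

The following theorem addresses the asymptotic normality of $\hat{\beta}_I$.

Theorem 3.1 Under assumptions (C.1), (C.2), (C.3) and (C.4), we have

$\sqrt{n}(\hat{\beta}_I - \beta) \xrightarrow{d} N(0, V_I),$

where $V_I = \Sigma^{-1} \Omega_I \Sigma^{-1}$, $\Omega_I = \Omega_R + \Omega_I^*$, and $\Omega_I^*$ is defined in (A.20) of the Appendix.

Similar to Sect. 2, we estimate $V_I$ by the jackknife methods described earlier. Clearly $\hat{\beta}_I$ has larger asymptotic variance than $\hat{\beta}_R$, and hence $\hat{\beta}_I$ is asymptotically less efficient than $\hat{\beta}_R$. Intuitively, one reason for this may be that the imputed synthetic observation $\hat{Y}_{i,I}$ is zero if $Y_i$ is known to be censored ($\xi_i = 1$, $\delta_i = 0$). We see that the imputation approach creates synthetic data with greater variability than the regression calibration method by noting

$\begin{aligned} \mathrm{var}(Y_I \mid X) &= E\left( \left[ E(\xi \mid \tilde{Y})\delta + \{1 - E(\xi \mid \tilde{Y})\} m^2(Z) \right] \frac{\tilde{Y}^2}{\{1 - G(\tilde{Y})\}^2} \,\Big|\, X \right) - E^2(Y_I \mid X) \\ &= E\left( \left[ \pi(\tilde{Y}) E(\delta \mid Z) + \{1 - \pi(\tilde{Y})\} m^2(Z) \right] \frac{\tilde{Y}^2}{\{1 - G(\tilde{Y})\}^2} \,\Big|\, X \right) - E^2(Y_R \mid X) \\ &= E\left( \pi(\tilde{Y})\, m(Z)\{1 - m(Z)\} \frac{\tilde{Y}^2}{\{1 - G(\tilde{Y})\}^2} \,\Big|\, X \right) + \mathrm{var}(Y_R \mid X) \\ &\ge \mathrm{var}(Y_R \mid X). \end{aligned}$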
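Two elementary steps drive this comparison, and spelling them out may help. Because ξ and δ are binary, the square of the imputation weight has no cross term, and after conditioning its excess over the squared regression calibration weight is nonnegative:

$\{\xi\delta + (1 - \xi)m(Z)\}^2 = \xi\delta + (1 - \xi)m^2(Z), \qquad \pi(\tilde{Y})m(Z) + \{1 - \pi(\tilde{Y})\}m^2(Z) - m^2(Z) = \pi(\tilde{Y})\,m(Z)\{1 - m(Z)\} \ge 0.$

The first identity uses $\xi^2 = \xi$, $\delta^2 = \delta$, and $\xi(1 - \xi) = 0$; the second uses $E(\delta \mid Z) = m(Z)$ together with $E(\xi \mid \tilde{Y}) = \pi(\tilde{Y})$ under the assumed missingness mechanism.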

## 4 Least squares inverse probability weighted estimator

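Inverse probability weighting is another popular approach for dealing with missing data. If we define $Y_W = [\{\xi/\pi(\tilde{Y})\}\delta + \{1 - \xi/\pi(\tilde{Y})\}m(Z)]\tilde{Y}/\{1 - G(\tilde{Y})\}$, then under the assumed missingness mechanism, conditional expectations similar to those in Sects. 2 and 3 give

$E(Y_W \mid X) = E(Y_R \mid X) = X^\top \beta.$

In practice, m(·), π(·) and G(·) are usually unknown, but we can use estimates of these quantities to calculate inverse probability weighted synthetic data:

$\hat{Y}_{i,W} = \frac{[\{\xi_i/\hat{\pi}_n(\tilde{Y}_i)\}\delta_i + \{1 - \xi_i/\hat{\pi}_n(\tilde{Y}_i)\}\hat{m}_n(Z_i)]\tilde{Y}_i}{1 - \hat{G}_n(\tilde{Y}_i)}.$

Note that $\hat{Y}_{i,W}$ is very similar to $\hat{Y}_{i,I}$, except the former weights each $\xi_i$ inversely proportional to the estimated conditional mean of $\xi_i$ (i.e., the estimated conditional probability that $\delta_i$ is known, given $\tilde{Y}_i$). Applying least squares techniques to these synthetic data produces the inverse probability weighted estimator

$\hat{\beta}_W = \left(\sum_{i=1}^{n} X_i X_i^\top\right)^{-1} \sum_{i=1}^{n} X_i \hat{Y}_{i,W}.$

The following theorem addresses the asymptotic normality of $\hat{\beta}_W$.

Theorem 4.1 Under assumptions (C.1) through (C.7), we have

$\sqrt{n}(\hat{\beta}_W - \beta) \xrightarrow{d} N(0, V_W),$

where $V_W = \Sigma^{-1} \Omega_W \Sigma^{-1}$, $\Omega_W = \Omega_R + \Omega_W^*$, and $\Omega_W^*$ is defined in (A.26) of the Appendix.

As described earlier, $V_W$ can be estimated by jackknife methods. We see $\hat{\beta}_W$ has a larger asymptotic variance than $\hat{\beta}_I$, and hence $\hat{\beta}_R$, by noting $\alpha^\top \Omega_W \alpha \ge \alpha^\top \Omega_I \alpha$ for any d-vector α. One explanation is that if $Y_i$ is known to be censored ($\xi_i = 1$, $\delta_i = 0$), then inverse probability weighting leads to synthetic data of the form $\hat{Y}_{i,W} = \{1 - 1/\hat{\pi}_n(\tilde{Y}_i)\}\hat{m}_n(Z_i)\tilde{Y}_i/\{1 - \hat{G}_n(\tilde{Y}_i)\}$, the sign of which is the opposite of $\tilde{Y}_i$, and hence $Y_i$. Thus, this method creates synthetic data that are more variable than those based on either $Y_R$ or $Y_I$, which can be seen by noting

$\begin{aligned} \mathrm{var}(Y_W \mid X) &= E\left[ \left\{ \frac{E(\xi \mid \tilde{Y})\delta}{\pi^2(\tilde{Y})} + \frac{2E(\xi \mid \tilde{Y})\delta\, m(Z)}{\pi(\tilde{Y})} - \frac{2E(\xi \mid \tilde{Y})\delta\, m(Z)}{\pi^2(\tilde{Y})} \right\} \frac{\tilde{Y}^2}{\{1 - G(\tilde{Y})\}^2} \right. \\ &\qquad \left. + \left\{ m^2(Z) - \frac{2E(\xi \mid \tilde{Y}) m^2(Z)}{\pi(\tilde{Y})} + \frac{E(\xi \mid \tilde{Y}) m^2(Z)}{\pi^2(\tilde{Y})} \right\} \frac{\tilde{Y}^2}{\{1 - G(\tilde{Y})\}^2} \,\Big|\, X \right] - E^2(Y_W \mid X) \\ &= E\left[ \left\{ \frac{E(\delta \mid Z)}{\pi(\tilde{Y})} + 2E(\delta \mid Z)m(Z) - \frac{2E(\delta \mid Z)m(Z)}{\pi(\tilde{Y})} \right\} \frac{\tilde{Y}^2}{\{1 - G(\tilde{Y})\}^2} \right. \\ &\qquad \left. + \left\{ m^2(Z) - 2m^2(Z) + \frac{m^2(Z)}{\pi(\tilde{Y})} \right\} \frac{\tilde{Y}^2}{\{1 - G(\tilde{Y})\}^2} \,\Big|\, X \right] - E^2(Y_R \mid X) \\ &= E\left[ \frac{m(Z)\{1 - m(Z)\}\tilde{Y}^2}{\pi(\tilde{Y})\{1 - G(\tilde{Y})\}^2} \,\Big|\, X \right] + \mathrm{var}(Y_R \mid X) \\ &\ge \mathrm{var}(Y_I \mid X) \ge \mathrm{var}(Y_R \mid X). \end{aligned}$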
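The ordering of the three variances can be checked exactly at a fixed value of Z by enumerating the four outcomes of (ξ, δ). The sketch below is illustrative only: the function name `second_moments` is hypothetical, and it assumes that ξ and δ are independent given Z, which is what the assumed missingness mechanism implies. It returns the conditional second moment of the weight that multiplies $\tilde{Y}/\{1 - G(\tilde{Y})\}$ under each method.

```python
from itertools import product

def second_moments(m, pi):
    """Exact E[weight^2 | Z] for the R, I, and W synthetic responses.

    m  = E(delta | Z), pi = E(xi | observed response); both in (0, 1].
    """
    out = {"R": 0.0, "I": 0.0, "W": 0.0}
    for xi, delta in product((0, 1), repeat=2):
        # Assumed mechanism: xi and delta are independent given Z.
        prob = (pi if xi else 1.0 - pi) * (m if delta else 1.0 - m)
        weights = {
            "R": m,
            "I": xi * delta + (1 - xi) * m,
            "W": (xi / pi) * delta + (1.0 - xi / pi) * m,
        }
        for key in out:
            out[key] += prob * weights[key] ** 2
    return out
```

For instance, with m = 0.6 and π = 0.8 the second moments are $m^2 = 0.36$, $m^2 + \pi m(1 - m) = 0.552$, and $m^2 + m(1 - m)/\pi = 0.66$ for the three methods in turn, matching the closed forms in the variance expansions.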

One advantage of this approach is that if either m(·) or π(·) is modeled correctly, the estimator $\hat{\beta}_W$ will be consistent, which is the so-called “double-robustness” property (see, e.g., Scharfstein et al. 1999; Lunceford and Davidian 2004; and Wang et al. 2004). Note that we estimate π(·) nonparametrically, and thus $\hat{\beta}_W$ enjoys this robustness even if the model for m(·) is not specified correctly. This is an appealing property of $\hat{\beta}_W$, but must be weighed against its larger asymptotic variance.

## 5 Simulation study

The finite-sample properties of the proposed estimators were evaluated via Monte Carlo simulation, with emphasis on the effects of sample size, percentage of observations censored, and percentage of censoring indicators missing. Response data were generated from the linear model: Y = α + βX + ε, with α = 3 and β = 0.5. The random variables ε, X, and C were independently generated from N(0,1), U(0,2), and N(μ, 4) distributions, respectively. For each subject, the observed response was $\tilde{Y} = \min(Y, C)$ and the censoring indicator was $\delta = I(Y \le C)$. Conditional on $\tilde{Y} = \tilde{y}$, the probability that the censoring indicator was missing, $1 - \pi(\tilde{y})$, was determined by the logistic model: $\log\{\pi(\tilde{y})/[1 - \pi(\tilde{y})]\} = \theta_1 + \theta_2 \tilde{y}$.

Censoring rates of 20 and 40% were obtained by setting μ = 5.395 and 4.067, respectively. Values for $\theta_1$ and $\theta_2$ were selected by specifying the proportion of subjects with a missing censoring indicator at $\tilde{y} = 0$ and by specifying the average proportion (over all $\tilde{y}$) with a missing censoring indicator. We simulated data so that the average missingness rate was 20 or 40% and so that the missingness rate increased or decreased with $\tilde{y}$. We chose $\theta_1$ so that $1 - \pi(0) = \{1 + \exp(\theta_1)\}^{-1}$ was 0.15 or 0.25 when the average missingness rate was 20%, and so that 1 − π(0) was 0.30 or 0.50 when the average missingness rate was 40%. We also examined situations in which none of the censoring indicators were missing, both for 20 and 40% censoring, as well as the case where there was no censoring at all.
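One configuration of this design can be sketched as follows. This is an illustration rather than the authors' simulation code: the function name `simulate` is hypothetical, N(μ, 4) is read as a normal distribution with variance 4, $\theta_1$ is set from 1 − π(0) = 0.15, and the value of `theta2` is a placeholder because the values used in the paper are not reported in the text.

```python
import numpy as np

def simulate(n, mu=5.395, theta1=np.log(0.85 / 0.15), theta2=-0.1, seed=0):
    """Generate one data set: Y = 3 + 0.5 X + eps with right censoring
    and censoring indicators that go missing with probability 1 - pi(y_obs).

    theta2 is a hypothetical value; a negative theta2 makes missingness
    increase with the observed response.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2.0, n)
    eps = rng.normal(0.0, 1.0, n)
    y = 3.0 + 0.5 * x + eps
    c = rng.normal(mu, 2.0, n)            # variance 4, i.e. standard deviation 2
    y_obs = np.minimum(y, c)
    delta = (y <= c).astype(float)        # 1 = uncensored
    pi = 1.0 / (1.0 + np.exp(-(theta1 + theta2 * y_obs)))
    xi = (rng.uniform(size=n) < pi).astype(float)   # 1 = indicator observed
    delta_obs = np.where(xi == 1, delta, np.nan)    # hide unobserved indicators
    return x, y_obs, delta_obs, xi, delta
```

The last returned array holds the full indicators, which a real analysis would never see; it is returned only so that the censoring rate of the generated data can be checked.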

We generated 10,000 Monte Carlo random samples of size n = 50, 100, 200 and 400 under each combination of censoring and missingness. Every data set was analyzed under the model: Y = α + βX + ε. For each configuration, we averaged over the 10,000 data sets to estimate the mean squared error (MSE), bias, and standard error (SE) associated with the slope estimators $\hat{\beta}_R$, $\hat{\beta}_I$, and $\hat{\beta}_W$. The results for the intercept estimators $\hat{\alpha}_R$, $\hat{\alpha}_I$, and $\hat{\alpha}_W$ were qualitatively the same, so they are not presented. When none of the censoring indicators were missing, we also calculated the unbounded Koul et al. (1981) estimator:

$\hat{\beta}_K = \left(\sum_{i=1}^{n} X_i X_i^\top\right)^{-1} \sum_{i=1}^{n} X_i Y_i^*,$

where $Y_i^*$ is defined in (1). We view $\hat{\beta}_K$ as a reference for comparisons in our simulations.

We used the uniform kernel function $W(u) = \frac{1}{2}$ if |u| ≤ 1 and W(u) = 0 otherwise, and the biweight kernel function $K(u) = \frac{15}{16}(1 - 2u^2 + u^4)$ if |u| ≤ 1 and K(u) = 0 otherwise. The bandwidths were $h_n = b_n = n^{-1/3}\max(\tilde{y})$. We estimated m(z) = pr(δ = 1|z) under the logistic model: $\log\{m(z)/[1 - m(z)]\} = \gamma_1 + \gamma_2 \tilde{y} + \gamma_3 x$. When the data on δ are completely (or quasi-completely) separated, the maximum likelihood estimate of $\gamma = (\gamma_1, \gamma_2, \gamma_3)$ does not exist (Albert and Anderson 1984; Santner and Duffy 1986). We excluded such data sets and continued simulating until we obtained 10,000 samples. The proportion of samples excluded did not exceed 0.3%, except when the average missingness rate was 40%, the censoring rate was 20%, and n was 50, in which case less than 2.3% of the samples were excluded.
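The logistic fit for m(z) uses only the subjects whose censoring indicator is observed. A minimal sketch via Newton-Raphson follows; the function names are hypothetical, and no safeguard against separation is included, so under complete or quasi-complete separation the iterations would fail to converge or hit a singular Hessian, consistent with the nonexistence of the maximum likelihood estimate noted above.

```python
import numpy as np

def fit_logistic_complete_cases(Z, delta, xi, n_iter=50, tol=1e-10):
    """Maximum likelihood logistic fit of delta on Z using rows with xi == 1.

    Z: (n, k) array of predictors (no intercept column; one is added here).
    Returns the coefficient vector (intercept first).
    """
    keep = np.asarray(xi) == 1
    A = np.column_stack([np.ones(keep.sum()), np.asarray(Z, dtype=float)[keep]])
    d = np.asarray(delta, dtype=float)[keep]   # NaN placeholders are dropped here
    gamma = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ gamma)))
        grad = A.T @ (d - p)                           # score vector
        hess = (A * (p * (1.0 - p))[:, None]).T @ A    # observed information
        step = np.linalg.solve(hess, grad)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma

def m_hat(Z, gamma):
    """Fitted probabilities pr(delta = 1 | z) for every subject."""
    A = np.column_stack([np.ones(len(Z)), np.asarray(Z, dtype=float)])
    return 1.0 / (1.0 + np.exp(-(A @ gamma)))
```

In the simulation setting, `Z` would hold the observed response and the covariate x as its two columns, and `m_hat` would then be evaluated for all subjects, including those with a missing indicator.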

The MSE, bias, and SE for $\hat{\beta}_K$, $\hat{\beta}_R$, $\hat{\beta}_I$, and $\hat{\beta}_W$ are presented in Tables 1, 2, and 3, respectively. The average jackknife estimate of SE is also shown in Table 3. Results are given only for situations where the average missingness rate increased with $\tilde{y}$, as the results for the decreasing cases were virtually identical. The Koul et al. (1981) estimator ($\hat{\beta}_K$) is included as a benchmark when no censoring indicators are missing. The first row of each table corresponds to the special case of no censoring, where all of the estimators are identical because each synthetic response $(Y_i^*, \hat{Y}_{i,R}, \hat{Y}_{i,I}, \hat{Y}_{i,W})$ reduces to the observed response $Y_i$.

Table 1 Mean squared errors for the regression calibration ($\hat{\beta}_R$), imputation ($\hat{\beta}_I$), and inverse probability weighted ($\hat{\beta}_W$) estimators of β by censoring rate (CR), missingness rate (MR), and sample size (n) ...
Table 2 Biases for the regression calibration ($\hat{\beta}_R$), imputation ($\hat{\beta}_I$), and inverse probability weighted ($\hat{\beta}_W$) estimators of β by censoring rate (CR), missingness rate (MR), and sample size (n) based on 10,000 ...
Table 3 Standard errors (and mean jackknife estimates) for the regression calibration ($\hat{\beta}_R$), imputation ($\hat{\beta}_I$), and inverse probability weighted ($\hat{\beta}_W$) estimators of β by censoring rate (CR), missingness rate ...

Table 1 shows that when no censoring indicators were missing, the MSEs for $\hat{\beta}_R$ were less than or equal to those for $\hat{\beta}_K$, whereas the MSEs for $\hat{\beta}_I$ and $\hat{\beta}_W$ were slightly larger, at least for the 40% censoring case. For any given configuration, with or without missing censoring indicators, the MSE for $\hat{\beta}_R$ never exceeded the MSE for $\hat{\beta}_I$ or $\hat{\beta}_W$. As expected, the MSEs for $\hat{\beta}_R$, $\hat{\beta}_I$, and $\hat{\beta}_W$ decreased (i.e., improved) as sample sizes increased, as censoring rates decreased, and as average missingness rates decreased (see Table 1).

The estimators also were evaluated with respect to bias and SE, the squares of which contribute equally to MSE. Table 2 gives the biases, all of which are negative when censoring is present; thus, each approach tended to underestimate the slope parameter. One interesting result is that applying our methods with missing censoring indicators usually produced less bias than applying the standard estimator of Koul et al. (1981) with complete censoring information. In fact, with a 40% censoring rate, the bias for each of our estimators (with missingness rates of either 20 or 40%) was always less than for $\hat{\beta}_K$ with no missing censoring indicators. Among the proposed estimators, the bias was consistently smallest for the inverse probability weighted estimator, $\hat{\beta}_W$. Also, biases decreased as sample sizes increased and as censoring rates decreased, but were not affected as much by missingness rates.

Table 3 gives the SEs and their jackknife estimates. The SE patterns mimic those for the MSEs in Table 1 because squared bias and squared SE contribute equally to MSE and the SEs are larger (in absolute value) than the biases. As with the MSEs, $\hat{\beta}_R$ had the smallest SEs and $\hat{\beta}_W$ had the largest. The SEs decreased as n increased, as censoring decreased, and as missingness decreased. The jackknife estimates were always good for $\hat{\beta}_W$, but became too small for $\hat{\beta}_I$, and even smaller for $\hat{\beta}_R$, as n decreased, as censoring increased, and as missingness increased.

In order to assess robustness, we analyzed the same simulated data with a “poor” model choice for m(z). Rather than using $\log\{m(z)/[1 - m(z)]\} = \gamma_1 + \gamma_2 \tilde{y} + \gamma_3 x$ to model m(z) as a logistic function of $\tilde{y}$ and x, we set $\gamma_2 = \gamma_3 = 0$ and modeled m(z) as a constant, which gives $\hat{m}_n(z) \equiv \sum_{i=1}^{n} \xi_i \delta_i / \sum_{i=1}^{n} \xi_i$. Table 4 displays the biases and SEs obtained for this poor model choice. The results for no missing censoring indicators (MR = 0%) are the same as in Tables 2 and 3, except for $\hat{\beta}_R$, as only $\hat{\beta}_R$ depends on $\hat{m}_n(z)$ in this situation. The SEs always decrease as sample sizes increase, as does the bias of $\hat{\beta}_W$, but the biases of $\hat{\beta}_R$ and $\hat{\beta}_I$ actually increase with the sample size in many cases. In fact, the bias of $\hat{\beta}_R$ is now positive and increases with n in every situation, whereas it was always negative and approached 0 as n increased when using the better model for m(z). The bias of $\hat{\beta}_I$ increases with the sample size, which makes the negative biases smaller in absolute value, but once the biases become positive, they grow in absolute value as n increases. In contrast, the bias of $\hat{\beta}_W$ for the poor model choice is nearly identical to the bias of $\hat{\beta}_W$ when using the better model. As predicted theoretically, the double-robustness property of $\hat{\beta}_W$ apparently keeps its performance from degrading if a poor choice is made when modeling m(z).

Table 4 Biases (and standard errors) for the regression calibration ($\hat{\beta}_R$), imputation ($\hat{\beta}_I$), and inverse probability weighted ($\hat{\beta}_W$) estimators of β for a poor choice of m(z) [i.e., m(z) ≡ m] by censoring ...

## 6 Example: analysis of glioblastoma data

We illustrate our methods with the glioblastoma data introduced earlier. These data are from a brain cancer clinical trial. As a measure of post-treatment quality of life, we focus on the number of days until a patient declines to a state in which he has lost his mobility and his cancer has returned. Thus, the endpoint of interest is the time (in days) to non-ambulatory progression, and we want to relate that outcome to the patient’s sex and age at study entry.

We have data on 276 glioblastoma patients who were over 50 years old when they entered the trial. By the end of the study, 59 of these patients progressed and were no longer ambulatory, 14 progressed and were still ambulatory, 166 did not progress, and 37 progressed but had an unknown ambulatory status. Therefore, with respect to the time to non-ambulatory progression, we have 59 uncensored observations (ξ = 1, δ = 1), 180 censored observations (ξ = 1, δ = 0), and 37 observations with a missing censoring indicator (ξ = 0). We have data on a covariate vector X = (X1, X2, X3), where X1 is fixed at 1, X2 indicates the patient’s sex (0 for female, 1 for male), and X3 is the patient’s age (in years) at study entry. Our data set includes information on 109 women and 167 men, with ages ranging from 51 to 74.

We applied our methods to the glioblastoma data to estimate the effects of sex and age on the time to non-ambulatory progression. As a first step, we fitted the linear regression model: E(Y|X = x) = β1x1 + β2x2 + β3x3, where Y is the log of the time to non-ambulatory progression. This initial model specifies parallel lines, with a separate intercept for each sex and a common slope in age. To help assess the appropriateness of the model, we added a sex-age interaction term (β4x2x3), which allows each line to have its own sex-specific intercept and slope. Estimated regression coefficients and their standard errors, obtained under both models via all three proposed methods, are shown in Table 5. As a further means of assessing model suitability, we also fitted a separate quadratic model in age for each sex, as well as a model with quadratic-age curves that shared common age terms, but none of the quadratic terms were significant, so these results are not presented.

Table 5 Estimated regression coefficients (and jackknife estimates of their standard errors) under two models for the glioblastoma data from the brain cancer clinical trial (n = 276)

Based on 2-sided Wald tests, none of the analyses provided evidence of a sex effect, but to varying degrees they all suggested a detrimental age effect, with time to non-ambulatory progression decreasing with age. The expanded model gave no hint of a sex-age interaction, regardless of which method was used, so we focus on the simpler model with only main effects for sex and age. Relative to the typical 0.05 level, the statistical significance of the age effect was marginal (p = 0.047) in the inverse probability weighted analysis, slightly stronger (p = 0.029) in the imputation analysis, and fairly high (p < 0.001) in the regression calibration analysis. Though excluding 13% of the data seems inefficient, if we ignore the 37 observations with a missing censoring indicator, the Koul et al. (1981) analysis gives similar results, with no evidence of a sex effect and a significant age effect (p = 0.022).
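For readers who want to reproduce p-values of this kind from a coefficient and its standard error, a 2-sided Wald test based on the normal approximation can be sketched as below. The function name `wald_two_sided_p` is hypothetical, and the sketch assumes the reference distribution is standard normal, which the text does not state explicitly.

```python
import math

def wald_two_sided_p(estimate, se):
    """Two-sided p-value 2 * {1 - Phi(|estimate / se|)} for H0: coefficient = 0."""
    z = estimate / se
    return math.erfc(abs(z) / math.sqrt(2.0))
```

The identity $2\{1 - \Phi(|z|)\} = \mathrm{erfc}(|z|/\sqrt{2})$ avoids the need for a statistics library.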

Thus, time to non-ambulatory progression does not differ significantly between men and women, but it appears to decrease (i.e., quality of life worsens) with age. Although evidence of this age effect is weak for the inverse probability weighted analysis, it is consistent across all three of the proposed analyses, as well as the Koul et al. (1981) analysis. The relative sizes of the estimated standard errors in Table 5 agree with our simulation results, where the SE for $\hat{\beta}_R$ was smallest (and its jackknife estimates were too small), followed by the SE for $\hat{\beta}_I$, and finally by the SE for $\hat{\beta}_W$ (with good jackknife estimates). This suggests that the test based on $\hat{\beta}_W$ might be the least powerful, which further strengthens the conclusion that the observed age effect is real, as even the least powerful test was statistically significant.

In this example, expressing the failure time of interest as a function of two component times reveals a potential source of bias. That is, the time to non-ambulatory progression can be viewed as the maximum of the time to loss of mobility and the time to cancer progression. We treat patients who have progressed but are still ambulatory at the end of the study as contributing censored failure times, but this censoring is likely informative and may introduce a bias, as ambulatory patients who have already progressed should be at greater risk of failing than ambulatory patients who have not yet progressed. Similarly, patients in remission who have already lost their mobility should be at greater risk of failing than patients in remission who are still ambulatory. This problem is not unique to our approach, however, and affects a wide class of existing survival techniques, including something as simple as using a Kaplan–Meier curve to estimate the distribution of time to non-ambulatory progression. Future efforts should be directed at developing methods that properly account for this type of informative censoring, such as a multivariate procedure, although such developments are beyond the scope of this article.

## 7 Discussion

We propose three estimators for the regression coefficients in a linear least squares analysis of censored survival data when some censoring indicators are missing. Our methods generalize the synthetic data methods of Koul et al. (1981), which do not allow for missing censoring indicators. The proposed estimators modify the synthetic responses of Koul et al. so that each missing censoring indicator (δ) is replaced by its estimated conditional expectation. Our estimators differ from each other only with respect to how they handle the known censoring indicators. The regression calibration estimator replaces each known δ by its estimated mean, the imputation estimator uses the known δ, and the inverse probability weighted estimator substitutes a weighted average of the known δ and its estimated mean.

Similar to Dikta (1998), our regression calibration method can be used even when all of the censoring indicators are observed. In that situation, the Koul et al. (1981) estimator ($\hat{\beta}_K$) transforms the observations $(\tilde{Y}_i, \delta_i)$ into synthetic responses $(Y_i^*)$ and then applies a modified least squares procedure to $(Y_i^*, X_i)$. Their approach is motivated by the fact that $E[\delta_i \tilde{Y}_i/\{1 - G(\tilde{Y}_i)\} \mid X_i] = E(Y_i \mid X_i)$, but $\hat{\beta}_K$ can perform poorly with small sample sizes. Under the MAR assumption, we have

$\mathrm{var}\left\{\frac{\delta_i \tilde{Y}_i}{1 - G(\tilde{Y}_i)} \,\Big|\, X_i\right\} \ge \mathrm{var}\left\{\frac{m(Z_i)\tilde{Y}_i}{1 - G(\tilde{Y}_i)} \,\Big|\, X_i\right\},$

which implies that $\hat{\beta}_K$ might be improved by replacing $\delta_i$ in $Y_i^*$ with a regression estimator of $m(Z_i)$ based on $\{(\tilde{Y}_i, \delta_i) : i = 1,…,n\}$. Our simulation results support this suggestion, but a detailed discussion is beyond the scope of this paper.

Along these same lines, we show in Sects. 3 and 4 that I and W have larger asymptotic variances than R, and thus our imputation and inverse probability weighted estimators are asymptotically less efficient than our regression calibration estimator. The simulation studies confirm these results (see Tables 3 and and4).4). This suggests that, at least with respect to this measure of performance, the regression calibration method for handling missing censoring indicators is better than the other two methods, even though the other two methods can lead to asymptotically efficient estimators in some situations (see, e.g., Robins et al. 1994; Hahn 1998; Hirano et al. 2003; and Wang et al. 2004).

On the other hand, there are several good reasons to favor W over I or R. The inverse probability weighted estimator is less biased than the other estimators (see Tables 2 and and4)4) and its jackknife estimates of SE were the most accurate (see Table 3). Also, the inverse probability weighted estimator enjoys the double-robustness property, which makes it less susceptible to problems caused by incorrectly specified models (see Table 4).

How should one balance the advantages and disadvantages of each approach? In general, we recommend the inverse probability weighted estimator. Not only is the inverse probability weighted estimator more robust to poor model choices for m(z), but relative to the other estimators, it is less biased and its jackknife variance estimates are more accurate. Although the regression calibration estimator has the smallest variance, its relatively large bias can produce misleading results in practice. In fact, our simulation showed that if a poor model was selected for m(z), the bias of R could actually increase with the sample size. Currently, the variance estimator for R does not adequately account for uncertainty in estimating m(z) and G(z). Also, the variance estimates are functions of the parameter estimates and R has the largest bias; thus, even though the true variance of R is smallest, the variance of R also can be severely underestimated (see Table 3), making results appear too significant. Future efforts should focus on accounting for uncertainty in estimating m(z) and G(z), but until then, we cannot recommend using the regression calibration method in its present form.

Clearly, our estimators depend on the choice of bandwidths. However, each is a global functional, and thus the $n^{1/2}$-rate asymptotic normality of the proposed estimators suggests that a proper choice of $b_n$ in assumption (C.7) does not depend on the first-order term of the mean squared error. Rather, it may depend only on second (or higher) order terms. This implies that the selection of $b_n$ may be fairly flexible as long as the bandwidths satisfy (C.7).

We assume the censoring variables $(C_1,\ldots,C_n)$ are i.i.d., which is the same assumption made by Koul et al. (1981). This form of censoring is more restrictive than the standard noninformative censoring assumed in most survival analyses, as the latter does not require the censoring distribution $G(c)$ to be estimated. Our methods, however, as well as those developed by Koul et al., rely on synthetic data that are functions of an estimate of $G(c)$. One avenue of future research might involve extending our methods by expressing $G(c)$ as a function of the covariates. Generally, this requires either a parametric model, the choice of which must be justified, or else a nonparametric approach, which could suffer in high-dimensional situations if the data were too sparse. Alternatively, we might view $G(c)$ as the marginal distribution of $C$, which “averages” over the covariates.
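The dependence of synthetic data on an estimate of G can be illustrated with the classical construction of Koul et al. (1981) for fully observed indicators, in which each observed time is replaced by δỸ/{1 − G(Ỹ)}. The sketch below assumes no tied observation times and uses a left-continuous Kaplan-Meier estimate of the censoring survival function so that the denominator stays positive; both choices, and the function names, are assumptions made for illustration only.

```python
import numpy as np

def censoring_survival_km(y_obs, delta):
    """Kaplan-Meier estimate of 1 - G(t-) at each observed time (no ties assumed)."""
    y_obs = np.asarray(y_obs, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = len(y_obs)
    order = np.argsort(y_obs, kind="stable")
    d_sorted = delta[order]
    at_risk = n - np.arange(n)  # n, n-1, ..., 1
    # For the censoring distribution, the "events" are the censored points.
    factors = np.where(d_sorted == 0, (at_risk - 1) / at_risk, 1.0)
    surv_after = np.cumprod(factors)
    # Shift by one position to obtain the left limit at each observed time.
    surv_before = np.concatenate(([1.0], surv_after[:-1]))
    out = np.empty(n)
    out[order] = surv_before
    return out

def synthetic_response(y_obs, delta):
    """Synthetic response delta * Y / {1 - G_hat(Y-)}."""
    y_obs = np.asarray(y_obs, dtype=float)
    delta = np.asarray(delta, dtype=float)
    return delta * y_obs / censoring_survival_km(y_obs, delta)
```

With missing indicators, δ would be replaced by one of the surrogate indicators and G would be estimated by a method that accommodates the missingness; this sketch covers only the complete-indicator case.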

## Acknowledgements

Qihua Wang’s research was supported by the National Science Fund for Distinguished Young Scholars in China (10725106), the National Natural Science Foundation of China (10671198), the National Science Fund for Creative Research Groups in China, the Research Grants Council of Hong Kong (HKU 7050/06P), and the grant from Key Lab of Random Complex Structures and Data Science, CAS. Gregg Dinse’s research was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (Z01-ES-045007-13). The authors thank Dr. Shyamal Peddada and Dr. David Dunson, as well as the editors and referees, for their many helpful comments.

## Appendix: Proofs of theorems

Define and set $A(\theta) = (\sigma_{r,s})$ for $1 \le r, s \le d+1$, where . Furthermore, let $\alpha(z_1, z_2) = \mathrm{Grad}[m_0(z_1, \theta)]\,A^{-1}(\theta)\,\mathrm{Grad}[m_0(z_2, \theta)]$ and assume the following conditions:

• (C.1) $m_0(z, \theta)$ has bounded first-order derivatives with respect to $\theta$.
• (C.2) $\sigma_{r,s} < \infty$ for $1 \le r, s \le d+1$ and the matrix $A(\theta) = (\sigma_{r,s})$ is positive definite.
• (C.3) $E\{\|X\|^2\tilde{Y}^2/\bar{H}^2(\tilde{Y})\} < \infty$, where $H(t) = \mathrm{pr}(\tilde{Y} \le t)$ and $\bar{H}(t) = 1 - H(t)$.
• (C.4) $E(\|X\|^2) < \infty$.
• (C.5) $\pi(\cdot)$ has a bounded first-order derivative.
• (C.6) $W(\cdot)$ is of order 1 with bounded support.
• (C.7) $nb_n \to \infty$ and $nb_n^2 \to 0$.
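As a worked illustration of (C.7), read here as $nb_n \to \infty$ and $nb_n^2 \to 0$, consider a polynomial bandwidth $b_n = n^{-a}$; the exponent $a$ is introduced only for this example. Then

$nb_n = n^{1-a} \to \infty \iff a < 1, \qquad nb_n^2 = n^{1-2a} \to 0 \iff a > 1/2,$

so any exponent with $1/2 < a < 1$, for instance $b_n = n^{-2/3}$, would satisfy (C.7) under this reading.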

Proof of Theorem 2.1 For and , we can write

$n^{1/2}(\hat{\beta}_R - \beta) = \Sigma_n^{-1}A_n.$
(A.1)

By adding and subtracting identical terms, we can rewrite An as follows:

(A.2)

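Define $\mu(z) = E\left\{\frac{X\tilde{Y}}{1 - G(\tilde{Y})}\,\alpha(Z, Z_j) \,\middle|\, Z_j = z\right\}$. Under conditions (C.1) and (C.2), we have

$T_{n2} = n^{-1/2}\sum_{j=1}^{n}\mu(Z_j)\,\frac{\xi_j\{\delta_j - m_0(Z_j,\theta)\}}{m_0(Z_j,\theta)\{1 - m_0(Z_j,\theta)\}} + o_p(1).$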
(A.3)

Set 0(t) = pr(i > t, δi = 0) and define

(A.4)

where u(·) is given in Sect. 2. Similar to Wang and Ng (2008), we have

$\hat{G}_n(t) - G(t) = \frac{1 - G(t)}{n}\sum_{i=1}^{n}\psi(\tilde{Y}_i,\delta_i,\xi_i;t) + o_p(n^{-1/2}).$

The third term in (A.2) can be rewritten as

$T_{n3} = n^{-3/2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{X_i\tilde{Y}_i\,m_0(Z_i,\theta)\,\psi(\tilde{Y}_j,\delta_j,\xi_j;\tilde{Y}_i)}{1 - G(\tilde{Y}_i)} + o_p(1).$
(A.5)

Define

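$h(X_i,\tilde{Y}_i,\delta_i,\xi_i;X_j,\tilde{Y}_j,\delta_j,\xi_j) = \frac{X_i\tilde{Y}_i\,m_0(Z_i,\theta)\,\psi(\tilde{Y}_j,\delta_j,\xi_j;\tilde{Y}_i)}{1 - G(\tilde{Y}_i)} + \frac{X_j\tilde{Y}_j\,m_0(Z_j,\theta)\,\psi(\tilde{Y}_i,\delta_i,\xi_i;\tilde{Y}_j)}{1 - G(\tilde{Y}_j)}$

and set $U_n = n^{-3/2}\sum_{i<j}h(X_i,\tilde{Y}_i,\delta_i,\xi_i;X_j,\tilde{Y}_j,\delta_j,\xi_j)$, which leads to

$T_{n3} = U_n + o_p(1).$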
(A.6)

Combining (A.2), (A.3), and (A.6) gives

$A_n = T_{n1} + M_n + U_n + o_p(1),$
(A.7)

where $T_{n1}$ is defined in (A.2) and $M_n$ is the main component of $T_{n2}$ in (A.3).

By the central limit theorem, we have

$T_{n1} \to N(0, \Omega_1),$
(A.8)

and under the assumed missingness mechanism, we have

(A.9)

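where $\Omega_1 = E\{XX^{\top}(Y_R - X^{\top}\beta)^2\}$ and . For any $d$-vector of constants, say $\alpha$, let $h_\alpha(\cdot) = \alpha^{\top}h(\cdot)$, $U_{n\alpha} = \alpha^{\top}U_n$, and $g_\alpha(\cdot) = \alpha^{\top}g(\cdot)$, where

$g(X_1,\tilde{Y}_1,\delta_1,\xi_1) = E\left\{\frac{X_2\tilde{Y}_2\,m_0(Z_2,\theta)\,\psi(\tilde{Y}_1,\delta_1,\xi_1;\tilde{Y}_2)}{1 - G(\tilde{Y}_2)} \,\middle|\, X_1,\tilde{Y}_1,\delta_1,\xi_1\right\}.$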

Under the assumed missingness mechanism, we have

from which it is straightforward to verify that $E\{h_\alpha(X_1,\tilde{Y}_1,\delta_1,\xi_1;X_2,\tilde{Y}_2,\delta_2,\xi_2)\} = 0$. As a result of the relationship

and condition (C.3), it follows that $E(h_\alpha^2) < \infty$. By the central limit theorem for U-statistics, we have

where

and . This result implies

(A.10)

As for covariances, under the assumed missingness mechanism, we have

(A.11)

Note that for $k \ne i$, $k \ne j$, $i \ne j$ and , where $h_{ij} = h(X_i,\tilde{Y}_i,\delta_i,\xi_i;X_j,\tilde{Y}_j,\delta_j,\xi_j)$.

Similarly, we have

(A.12)

and

(A.13)

Together, Eqs. (A.7)–(A.9) and (A.10)–(A.13) prove

(A.14)

where

$\Omega_R = \Omega_1 + \Omega_2 + \Omega_3 + 2\Omega_{1,3} + 2\Omega_{2,3}.$
(A.15)

By the law of large numbers and (C.4), it follows that . This result, together with (A.1) and (A.14), proves Theorem 2.1.

Proof of Theorem 2.2 Asymptotic representations similar to (A.7) can be obtained for for i = 1,…,n. Then, by applying these representations to (2) and using some algebra, Theorem 2.2 can be proved.

Proof of Theorem 3.1 By standard arguments, we have

(A.16)

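where $T_{n1}$, $T_{n2}$ and $T_{n3}$ are defined in (A.2), and $T_{n4}$, $T_{n5}$ and $T_{n6}$ are defined as follows:

$\begin{aligned}
T_{n4} &= n^{-1/2}\sum_{i=1}^{n}\frac{X_i\tilde{Y}_i\,\xi_i\{\delta_i - m_0(Z_i,\theta)\}}{1 - G(\tilde{Y}_i)},\\
T_{n5} &= -n^{-1/2}\sum_{i=1}^{n}\frac{X_i\tilde{Y}_i\,\xi_i\{m_0(Z_i,\hat{\theta}_n) - m_0(Z_i,\theta)\}}{1 - G(\tilde{Y}_i)},\\
T_{n6} &= n^{-1/2}\sum_{i=1}^{n}\frac{X_i\tilde{Y}_i\,\xi_i\{\delta_i - m_0(Z_i,\theta)\}}{1 - G(\tilde{Y}_i)}\left\{\frac{\hat{G}_n(\tilde{Y}_i) - G(\tilde{Y}_i)}{1 - G(\tilde{Y}_i)}\right\}.
\end{aligned}$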

By (A.2) and (A.7), we have

$T_{n1} + T_{n2} + T_{n3} = T_{n1} + M_n + U_n + o_p(1).$
(A.17)

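Let $\tilde{\mu}(Z_j) = E\left\{\frac{X\tilde{Y}\pi(\tilde{Y})}{1 - G(\tilde{Y})}\,\alpha(Z, Z_j) \,\middle|\, Z_j\right\}$. Under conditions (C.1) and (C.2), using arguments similar to those leading to (A.3)–(A.5), we get

$T_{n5} = -n^{-1/2}\sum_{j=1}^{n}\frac{\xi_j\{\delta_j - m_0(Z_j,\theta)\}}{m_0(Z_j,\theta)\{1 - m_0(Z_j,\theta)\}}\,\tilde{\mu}(Z_j) + o_p(1).$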
(A.18)

Recalling the definition of Tn4, it follows from (A.18) that

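$T_{n4} + T_{n5} = n^{-1/2}\sum_{i=1}^{n}\left[\frac{X_i\tilde{Y}_i}{1 - G(\tilde{Y}_i)} - \frac{\tilde{\mu}(Z_i)}{m_0(Z_i,\theta)\{1 - m_0(Z_i,\theta)\}}\right]\xi_i\{\delta_i - m_0(Z_i,\theta)\} + o_p(1) := T_{n45} + o_p(1).$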

By the central limit theorem, we have

(A.19)

where

(A.20)

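and $L(X,\tilde{Y}) = \frac{X\tilde{Y}}{1 - G(\tilde{Y})} - \frac{\tilde{\mu}(\tilde{Y})}{m_0(Z,\theta)\{1 - m_0(Z,\theta)\}}$. Similar to the way (A.5) was derived, we obtain

$T_{n6} = n^{-3/2}\sum_{i \ne j}\frac{X_i\tilde{Y}_i\,\xi_i\{\delta_i - m_0(Z_i,\theta)\}\,\psi(\tilde{Y}_j,\delta_j,\xi_j;\tilde{Y}_i)}{1 - G(\tilde{Y}_i)} + o_p(1) := T_{n6,1} + o_p(1).$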
(A.21)

Note that

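$\begin{aligned}
E[X_i\tilde{Y}_i\xi_i\{\delta_i - m_0(Z_i,\theta)\} \mid \tilde{Y}_i]
&= E[X_i\tilde{Y}_i\,E(\xi_i \mid X_i,\tilde{Y}_i,\delta_i)\{\delta_i - m_0(Z_i,\theta)\} \mid \tilde{Y}_i]\\
&= E[X_i\tilde{Y}_i\,E(\xi_i \mid X_i,\tilde{Y}_i)\{E(\delta_i \mid \tilde{Y}_i,X_i) - E(\delta_i \mid \tilde{Y}_i)\} \mid \tilde{Y}_i] = 0.
\end{aligned}$

From the fact that $E\{\psi(\tilde{Y}_j,\delta_j,\xi_j;\tilde{Y}_i) \mid \tilde{Y}_i\} = 0$, it follows that

$E\|T_{n6,1}\|^2 \le 2n^{-1}E\{\|X_1\|^2\tilde{Y}_1^2\,\psi^2(\tilde{Y}_2,\delta_2,\xi_2;\tilde{Y}_1)\}.$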
(A.22)

Equations (A.21) and (A.22) together prove that

$T_{n6} = o_p(1).$
(A.23)
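The step from (A.22) to (A.23) can be spelled out with Markov's inequality. As a sketch, assuming the expectation on the right-hand side of (A.22) is finite, for any fixed $\varepsilon > 0$ (a constant introduced only for this illustration) we have

$\mathrm{pr}(\|T_{n6,1}\| > \varepsilon) \le \varepsilon^{-2}E\|T_{n6,1}\|^2 \le 2n^{-1}\varepsilon^{-2}E\{\|X_1\|^2\tilde{Y}_1^2\,\psi^2(\tilde{Y}_2,\delta_2,\xi_2;\tilde{Y}_1)\} \to 0,$

so $T_{n6,1}$ converges to zero in probability, and (A.21) then yields (A.23).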

Under the assumed missingness mechanism, a little algebra shows $\mathrm{cov}(T_{n1}, T_{n45}) = 0$, $\mathrm{cov}(U_n, T_{n45}) = 0$ and $\mathrm{cov}(M_n, T_{n45}) = 0$. This result, Eqs. (A.14), (A.16), (A.17), (A.19), and (A.23), and the fact that together prove Theorem 3.1.

Proof of Theorem 4.1 Similar to the derivation of (A.16), by (C.5), (C.6) and (C.7), we have

(A.24)

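where $T_{n1}$, $M_n$ and $U_n$ are defined in (A.7), and $S_n$ and $R_n$ are given by

$\begin{aligned}
S_n &= n^{-1/2}\sum_{i=1}^{n}X_i\,\frac{\xi_i}{\pi(\tilde{Y}_i)}\{\delta_i - m_0(Z_i,\theta)\}\frac{\tilde{Y}_i}{1 - G(\tilde{Y}_i)},\\
R_n &= n^{-1/2}\sum_{i=1}^{n}X_i\,\frac{\xi_i}{\pi(\tilde{Y}_i)}\{m_0(Z_i,\theta) - m_0(Z_i,\hat{\theta}_n)\}\frac{\tilde{Y}_i}{1 - G(\tilde{Y}_i)}.
\end{aligned}$

Similar to the way (A.3) was derived, we can show that

$R_n = -n^{-1/2}\sum_{j=1}^{n}\frac{\xi_j\{\delta_j - m_0(Z_j,\theta)\}}{m_0(Z_j,\theta)\{1 - m_0(Z_j,\theta)\}}\,\mu(\tilde{Y}_j) + o_p(1).$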

Hence, it follows from the central limit theorem that

(A.25)

where

(A.26)

and $\tilde{L}(X,\tilde{Y}) = \frac{X\tilde{Y}}{1 - G(\tilde{Y})} - \frac{\pi(\tilde{Y})\mu(\tilde{Y})}{m_0(Z,\theta)\{1 - m_0(Z,\theta)\}}$.

It can be shown that $\mathrm{cov}(T_{n1} + M_n + U_n, S_n + R_n) = 0$. This result, together with Eqs. (A.7), (A.14), (A.24), and (A.25), and the fact that , proves Theorem 4.1.

## Contributor Information

Qihua Wang, Department of Mathematics and Statistics, Yunnan University, Kunming 650091, China. Academy of Mathematics and Systems Science, Chinese Academy of Science, Beijing 100190, China.

Gregg E. Dinse, Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709, USA.

## References

• Albert A, Anderson JA. On the existence of maximum likelihood estimates in logistic regression models. Biometrika. 1984;71:1–10.
• Buckley J, James I. Linear regression with censored data. Biometrika. 1979;66:429–436.
• Cox DR. Regression models and life tables (with discussion). J R Stat Soc B. 1972;34:187–220.
• Dewanji A. A note on a test for competing risks with missing failure type. Biometrika. 1992;79:855–857.
• Dikta G. On semiparametric random censorship models. J Stat Plan Inference. 1998;66:253–279.
• Dinse GE. Nonparametric estimation for partially-complete time and type of failure data. Biometrics. 1982;38:417–431.
• Gao G, Tsiatis AA. Semiparametric estimators for the regression coefficients in the linear transformation competing risks model with missing cause of failure. Biometrika. 2005;92:875–891.
• Goetghebeur EJ, Ryan L. A modified logrank test for competing risks with missing failure type. Biometrika. 1990;77:207–211.
• Goetghebeur EJ, Ryan L. Analysis of competing risks survival data when some failure types are missing. Biometrika. 1995;82:821–833.
• Hahn J. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica. 1998;66:315–331.
• Hirano K, Imbens GW, Ridder G. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica. 2003;71:1161–1189.
• Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. J Am Stat Assoc. 1952;47:663–685.
• Jin Z, Lin DY, Wei LJ, Ying Z. Rank-based inference for the accelerated failure time model. Biometrika. 2003;90:341–353.
• Koul H, Susarla V, van Ryzin J. Regression analysis with randomly right-censored data. Ann Stat. 1981;9:1276–1288.
• Lai TL, Ying Z. Rank regression methods for left-truncated and right-censored data. Ann Stat. 1991;19:531–556.
• Leurgans S. Linear models, random censoring and synthetic data. Biometrika. 1987;74:301–309.
• Li G, Wang QH. Empirical likelihood regression analysis for right censored data. Stat Sinica. 2003;13:51–68.
• Lo S-H. Estimating a survival function with incomplete cause-of-death data. J Multivar Anal. 1991;39:217–235.
• Lu W, Liang Y. Analysis of competing risks data with missing cause of failure under additive hazards model. Stat Sinica. 2008;18:219–234.
• Lu K, Tsiatis AA. Multiple imputation methods for estimating regression coefficients in the competing risks model with missing cause of failure. Biometrics. 2001;57:1191–1197.
• Lunceford JK, Davidian M. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Stat Med. 2004;23:2937–2960.
• McKeague IW, Subramanian S. Product-limit estimators and Cox regression with missing censoring information. Scand J Stat. 1998;25:589–601.
• Miller RG. Least squares regression with censored data. Biometrika. 1976;63:449–464.
• Peddada SD, Patwardhan G. Jackknife variance estimators in linear models. Biometrika. 1992;79:654–657.
• Reid N. A conversation with Sir David Cox. Stat Sci. 1994;9:439–455.
• Ritov Y. Estimation in a linear regression model with censored data. Ann Stat. 1990;18:303–328.
• Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc. 1994;89:846–866.
• Santner TJ, Duffy DE. A note on A. Albert and J. A. Anderson’s conditions for the existence of maximum likelihood estimates in logistic regression models. Biometrika. 1986;73:755–758.
• Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion). J Am Stat Assoc. 1999;94:1096–1146.
• Subramanian S. Asymptotically efficient estimation of a survival function in the missing censoring indicator model. Nonparametr Stat. 2004;16:797–817.
• Subramanian S. Survival analysis for the missing censoring indicator model using kernel density estimation techniques. Stat Methodol. 2006;3:125–136.
• Tsiatis AA. Estimating regression parameters using linear rank tests for censored data. Ann Stat. 1990;18:354–372.
• Tsiatis AA, Davidian M, McNeney B. Multiple imputation methods for testing treatment differences in survival distributions with missing cause of failure. Biometrika. 2002;89:238–244.
• Wang QH, Ng K. Asymptotically efficient product-limit estimators with censoring indicators missing at random. Stat Sinica. 2008;18:749–768.
• Wang QH, Linton O, Härdle W. Semiparametric regression analysis with missing response at random. J Am Stat Assoc. 2004;99:334–345.
• Ying Z. A large sample study of rank estimation for censored regression data. Ann Stat. 1993;21:76–99.
• Zhou X, Sun L. Additive hazards regression with missing censoring information. Stat Sinica. 2003;13:1237–1257.
