The finite-sample properties of the proposed estimators were evaluated via Monte Carlo simulation, with emphasis on the effects of sample size, percentage of observations censored, and percentage of censoring indicators missing. Response data were generated from the linear model Y = α + βX + ε, with α = 3 and β = 0.5. The random variables ε, X, and C were independently generated from N(0, 1), U(0, 2), and N(μ, 4) distributions, respectively. For each subject, the observed response was Ỹ = min(Y, C) and the censoring indicator was δ = I(Y ≤ C). Conditional on Ỹ = ỹ, the probability that the censoring indicator was missing, 1 − π(ỹ), was determined by the logistic model log{π(ỹ)/[1 − π(ỹ)]} = θ_1 + θ_2 ỹ.
Censoring rates of 20% and 40% were obtained by setting μ = 5.395 and 4.067, respectively. Values for θ_1 and θ_2 were selected by specifying the proportion of subjects with a missing censoring indicator at ỹ = 0 and by specifying the average proportion (over all ỹ) with a missing censoring indicator. We simulated data so that the average missingness rate was 20% or 40% and so that the missingness rate increased or decreased with ỹ. We chose θ_1 so that 1 − π(0) = {1 + exp(θ_1)}^{−1} was 0.15 or 0.25 when the average missingness rate was 20%, and so that 1 − π(0) was 0.30 or 0.50 when the average missingness rate was 40%. We also examined situations in which none of the censoring indicators were missing, for both 20% and 40% censoring, as well as the case where there was no censoring at all.
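The data-generating design described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: N(μ, 4) is read as variance 4 (standard deviation 2), and the text does not say how θ_2 was obtained, so the helper `calibrate_theta2` (a hypothetical name) simply bisects on a large pilot sample until the average missingness rate hits the target.

```python
import numpy as np

ALPHA, BETA = 3.0, 0.5  # true intercept and slope from the text


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))


def generate_sample(n, mu, theta1, theta2, rng):
    """Draw one data set: covariate, observed response, censoring
    indicator, and a flag for whether the indicator is observed."""
    x = rng.uniform(0.0, 2.0, size=n)
    eps = rng.normal(0.0, 1.0, size=n)
    c = rng.normal(mu, 2.0, size=n)      # assumption: N(mu, 4) means variance 4
    y = ALPHA + BETA * x + eps
    y_obs = np.minimum(y, c)             # observed response
    delta = (y <= c).astype(float)       # 1 = uncensored
    pi = expit(theta1 + theta2 * y_obs)  # P(indicator observed | y_obs)
    observed = rng.uniform(size=n) < pi
    return x, y_obs, delta, observed


def calibrate_theta2(mu, theta1, target, rng, n=200_000, lo=-2.0, hi=2.0):
    """Hypothetical helper: bisect for theta2 so that the average
    missingness rate on a large pilot sample equals `target`."""
    _, y_obs, _, _ = generate_sample(n, mu, theta1, 0.0, rng)

    def avg_missing(t2):
        return np.mean(1.0 - expit(theta1 + t2 * y_obs))

    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if avg_missing(mid) > target:
            lo = mid   # too much missingness: raise theta2
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(0)
theta1 = np.log(0.85 / 0.15)             # gives 1 - pi(0) = 0.15
theta2 = calibrate_theta2(5.395, theta1, 0.20, rng)
x, y_obs, delta, observed = generate_sample(200_000, 5.395, theta1, theta2, rng)
print("censoring rate:", 1.0 - delta.mean())
print("missingness rate:", 1.0 - observed.mean())
```

With 1 − π(0) = 0.15 and an average missingness rate of 20%, the calibrated θ_2 is negative, which corresponds to the setting in which missingness increases with ỹ.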
We generated 10,000 Monte Carlo random samples of size n = 50, 100, 200, and 400 under each combination of censoring and missingness. Every data set was analyzed under the model Y = α + βX + ε. For each configuration, we averaged over the 10,000 data sets to estimate the mean squared error (MSE), bias, and standard error (SE) associated with the slope estimators β̂_R, β̂_I, and β̂_W. The results for the intercept estimators α̂_R, α̂_I, and α̂_W were qualitatively the same, so they are not presented. When none of the censoring indicators were missing, we also calculated the unbounded Koul et al. (1981) estimator, β̂_K, whose definition involves the quantity defined in (1). We view β̂_K as a reference for comparisons in our simulations.
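The Monte Carlo summaries can be computed as in the following sketch. It is not the authors' code: `mc_summary` and `simulate_ols_slopes` are hypothetical names, and ordinary least squares on uncensored data stands in for the proposed estimators (the text notes that all estimators coincide when there is no censoring). The SE is taken with divisor equal to the number of replicates (an assumption), so that MSE = bias² + SE² holds exactly.

```python
import numpy as np


def mc_summary(estimates, truth):
    """Bias, SE, and MSE of a set of Monte Carlo estimates."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - truth
    se = est.std(ddof=0)                 # assumption: divisor = number of replicates
    mse = np.mean((est - truth) ** 2)
    return {"bias": bias, "se": se, "mse": mse}


def simulate_ols_slopes(n, reps, rng, alpha=3.0, beta=0.5):
    """Stand-in estimator: OLS slope on uncensored data."""
    slopes = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0.0, 2.0, size=n)
        y = alpha + beta * x + rng.normal(size=n)
        slopes[r] = np.polyfit(x, y, 1)[0]   # leading coefficient = slope
    return slopes


rng = np.random.default_rng(1)
summary = mc_summary(simulate_ols_slopes(100, 500, rng), truth=0.5)
print(summary)
```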
We used the uniform kernel function W(u) = 1/2 if |u| ≤ 1 and W(u) = 0 otherwise, and the biweight kernel function K(u) = (15/16)(1 − u²)² if |u| ≤ 1 and K(u) = 0 otherwise. The bandwidths were h_n = b_n = n^{−1/3} max(ỹ). We estimated m(z) = pr(δ = 1 | z) under the logistic model log{m(z)/[1 − m(z)]} = γ_1 + γ_2 ỹ + γ_3 x. When the data on δ are completely (or quasi-completely) separated, the maximum likelihood estimate of γ = (γ_1, γ_2, γ_3) does not exist (Albert and Anderson 1984; Santner and Duffy 1986). We excluded such data sets and continued simulating until we obtained 10,000 samples. The proportion of samples excluded did not exceed 0.3%, except when the average missingness rate was 40%, the censoring rate was 20%, and n was 50, in which case less than 2.3% of the samples were excluded.
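The kernels and the bandwidth rule can be written down directly. The sketch below assumes the standard normalizations, so that each kernel integrates to one over [−1, 1]: 1/2 for the uniform kernel and (15/16)(1 − u²)² for the biweight kernel. The function names are illustrative only.

```python
import numpy as np


def uniform_kernel(u):
    """W(u) = 1/2 on |u| <= 1, zero elsewhere (assumed normalization)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.5, 0.0)


def biweight_kernel(u):
    """K(u) = (15/16)(1 - u^2)^2 on |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u ** 2) ** 2, 0.0)


def bandwidth(y_obs):
    """h_n = b_n = n^(-1/3) * max of the observed responses."""
    y_obs = np.asarray(y_obs, dtype=float)
    return y_obs.size ** (-1.0 / 3.0) * y_obs.max()


# Numerical check that both kernels integrate to about one.
grid = np.linspace(-1.5, 1.5, 300_001)
du = grid[1] - grid[0]
print(uniform_kernel(grid).sum() * du, biweight_kernel(grid).sum() * du)
```

Because the bandwidth scales with max(ỹ), it adapts to the range of the observed responses while shrinking at rate n^{−1/3}.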
The MSE, bias, and SE for β̂_K, β̂_R, β̂_I, and β̂_W are presented in separate tables, one for each measure. The average jackknife estimate of SE is also shown in the SE table. Results are given only for situations where the average missingness rate increased with ỹ, as the results for the decreasing cases were virtually identical. The Koul et al. (1981) estimator (β̂_K) is included as a benchmark when no censoring indicators are missing. The first row of each table corresponds to the special case of no censoring, where all of the estimators are identical because each synthetic response reduces to the observed response Y_i.
The MSE table shows that when no censoring indicators were missing, the MSEs for β̂_R were less than or equal to those for β̂_K, whereas the MSEs for β̂_I and β̂_W were slightly larger, at least for the 40% censoring case. For any given configuration, with or without missing censoring indicators, the MSE for β̂_R never exceeded the MSE for β̂_I or β̂_W. As expected, the MSEs for β̂_R, β̂_I, and β̂_W decreased (i.e., improved) as sample sizes increased, as censoring rates decreased, and as average missingness rates decreased.
The estimators were also evaluated with respect to bias and SE, the squares of which sum to the MSE (MSE = bias² + SE²). The bias table gives the biases, all of which are negative when censoring is present; thus, each approach tended to underestimate the slope parameter. One interesting result is that applying our methods with missing censoring indicators usually produced less bias than applying the standard estimator of Koul et al. (1981) with complete censoring information. In fact, with a 40% censoring rate, the bias for each of our estimators (with missingness rates of either 20% or 40%) was always smaller in absolute value than that for β̂_K with no missing censoring indicators. Among the proposed estimators, the bias was consistently smallest for the inverse probability weighted estimator, β̂_W. Also, biases decreased as sample sizes increased and as censoring rates decreased, but were not affected as much by missingness rates.
The SE table gives the SEs and their jackknife estimates. The SE patterns mimic those for the MSEs because squared bias and squared SE contribute equally to MSE and the SEs are larger (in absolute value) than the biases. As with the MSEs, β̂_R had the smallest SEs and β̂_W had the largest. The SEs decreased as n increased, as censoring decreased, and as missingness decreased. The jackknife estimates were always good for β̂_W, but became too small for β̂_I, and even smaller for β̂_R, as n decreased, as censoring increased, and as missingness increased.
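For readers unfamiliar with the jackknife, the following sketch shows the standard delete-one jackknife SE for a generic slope estimator. The text does not spell out the jackknife variant used, so the delete-one form is an assumption, and the OLS slope is only a stand-in for the proposed estimators; `jackknife_se` and `ols_slope` are hypothetical names.

```python
import numpy as np


def jackknife_se(estimator, x, y):
    """Delete-one jackknife SE: sqrt((n-1)/n * sum((theta_(i) - mean)^2))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    idx = np.arange(n)
    # Re-estimate with each observation left out in turn.
    loo = np.array([estimator(x[idx != i], y[idx != i]) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))


def ols_slope(x, y):
    """Stand-in estimator: least-squares slope."""
    return np.polyfit(x, y, 1)[0]


rng = np.random.default_rng(2)
x = rng.uniform(0.0, 2.0, size=200)
y = 3.0 + 0.5 * x + rng.normal(size=200)
se_jack = jackknife_se(ols_slope, x, y)
print("jackknife SE of the slope:", se_jack)
```

In this uncensored toy setting the model-based SE of the slope is roughly sqrt(3/200) ≈ 0.12, which gives a rough yardstick for the jackknife value.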
To assess robustness, we analyzed the same simulated data with a "poor" model choice for m(z). Rather than using log{m(z)/[1 − m(z)]} = γ_1 + γ_2 ỹ + γ_3 x to model m(z) as a logistic function of ỹ and x, we set γ_2 = γ_3 = 0 and modeled m(z) as a constant, which gives an estimate m̂(z) that does not depend on z. A separate table displays the biases and SEs obtained for this poor model choice. The results for no missing censoring indicators (MR = 0%) are the same as in the bias and SE tables for the better model, except for β̂_R, as only β̂_R depends on m̂(z) in this situation. The SEs always decrease as sample sizes increase, as does the bias of β̂_W, but the biases of β̂_R and β̂_I actually increase with the sample size in many cases. In fact, the bias of β̂_R is now positive and increases with n in every situation, whereas it was always negative and approached 0 as n increased when using the better model for m(z). The bias of β̂_I increases with the sample size, which makes the negative biases smaller in absolute value, but once the biases become positive, they grow in absolute value as n increases. In contrast, the bias of β̂_W for the poor model choice is nearly identical to the bias of β̂_W when using the better model. As predicted theoretically, the double-robustness property of β̂_W apparently keeps its performance from degrading if a poor choice is made when modeling m(z).