4.1 Simulation 1: Linear Model

We simulate independent tuples {*Y*_{i}, *N*_{i}, *Z*_{i}}, for *i* = 1, …, *n*. Here *N*_{i} is a stationary point process with intensity *X*_{i}, where *X*_{i} ~ 1 + Log Normal(0, 0.5); *Z*_{i} is an error-free Bernoulli random variable independent of *X*_{i} and with mean equal to 0.5; and *Y*_{i} is given by

where

*θ* = (

*θ*_{0},

*θ*_{1},

*θ*_{2})

^{T} = (1, 1, 1)

^{T} and

*ε*_{i} ~ Normal(0, 0.5

^{2}). We assume that

*X*_{i} is unknown but can be estimated by

*W*_{i} =

*N*_{i}((0,

*τ*])/

*τ*, where

*N*_{i}((0,

*τ*]) denotes the number of events of

*N*_{i} contained in the time interval (0,

*τ*]. Note that

*W*_{i} takes the general form given in (3).

We consider two scenarios for *N*_{i}. Under the first scenario, *N*_{i} is a homogeneous Poisson process. To simulate *N*_{i}, we first generate a Poisson random variable *M*_{i} with mean equal to *τX*_{i}, and then randomly simulate *M*_{i} events in (0, *τ*] according to a uniform distribution. Under the second scenario, *N*_{i} is a homogeneous Poisson cluster process. To simulate *N*_{i}, we first generate a homogeneous Poisson process with intensity *ρ*_{i} = *X*_{i}/3 as the parent process. For each parent, we then generate *m* = 3 children on average according to a Poisson distribution, and disperse the children locations independently following a normal distribution centered at the parent location and with standard deviation *ω* = 2. The final process contains only the children event times. For Poisson processes, all events in disjoint time intervals are independent. However, this is not true for Poisson cluster processes.

We repeat the simulation 200 times in each case. For each simulated dataset, we apply our proposed MOM and SUBEX estimators to estimate

*θ*. For the former, we obtain

and

^{2} based on 20 equally spaced

*p*-values in [1/3, 1/2] when regressing

*Ỹ* (

*p*) on

(

*p*). For the latter, we assign

*$\widehat{x}$*(

*p*) = 1/

*p* and

*$\widehat{x}$*(

*p*) = 1/

*p* − (1 −

*p*)

/(

*p*^{2}*τ*^{2}^{2}) and refer to the resulting estimators as SUBEX

_{1} and SUBEX

_{2}, respectively. We set the

*p*-values used for both estimators to be from 0.6 to 0.95 with an increment of 0.05, and use a quadratic function for the extrapolation. In the

online Appendix, we show additional results for SUBEX with a finer choice of

*p*-values and also with a cubic extrapolating function. These results suggest that the SUBEX method is not sensitive to the choice of

*p*-values, since increasing the number of

*p* does not change the results much. Using the cubic function can reduce the bias but also increases the variance of the SUBEX estimator, and leads to a less efficient estimator in terms of mean squared error.

For comparison, we also apply the naive estimator by treating *W*_{i} as if it was *X*_{i}, and the naive MOM (referred to as MOM_{1}) and the empirical SIMEX estimators described in Section 3.4. For the last two estimators, we assume that *N*_{i} in (0, *τ*/2] and (*τ*/2, *τ*] are independent. This is true for the Poisson processes, but not for the Poisson cluster processes. In , we report the bias, standard error, mean of the estimated standard error and relative efficiency of the considered estimators. The relative efficiency of an estimator is defined as the mean squared error of the naive estimator divided by that of the estimator under consideration.

| **Table 1**Simulation results for linear model |

In the Poisson process case, we set *τ* = 10 and *n* = 200. From , we see that the naive estimator for *θ*_{1} has a large bias due to the estimation error in *W*_{i}. However, the biases in the other estimators are much smaller. Since *N*_{i} is Poisson, there is no need to consider within-trajectory dependence. As a result, the naive MOM_{1} estimator, the empirical SIMEX, and SUBEX_{1} have slightly higher relative efficiencies than the proposed MOM_{2} and SUBEX_{2} estimators, both of which are designed to account for within-trajectory dependence. However, none of the estimators for *θ*_{2} suffers from attenuation, and the naive estimator has the smallest standard error. This is because *Z*_{i} is measured without error and is also independent with *X*_{i}. This phenomena has been repeatedly observed throughout our simulation. We will therefore discuss only the results for *θ*_{1} from now on.

In the Poisson cluster process case, we set *τ* = 10, 20 and *n* = 200. From , we see that the naive estimator is even more biased than in the Poisson process case. This is because of the increased estimation error in *W*_{i} due to correlation. Our proposed MOM_{2} estimator has the smallest bias among all estimators: the bias when *τ* = 10 is −0.1869 whereas the next smallest bias in magnitude is −0.4009; the bias when *τ* = 20 further reduces to −0.1014, which is again the smallest among all estimators. The relative efficiency for the proposed MOM_{2} estimator is also the highest. In particular, it can be eight times as efficient as the naive estimator when *τ* = 20. The two SUBEX estimators perform significantly better than the empirical SIMEX. This is expected following our discussion in Section 3.4. Between the two, SUBEX_{2} works slightly better than SUBEX_{1}, because it better accounts for within-trajectory dependence. In general, the amount of information used to estimate *X*_{i} increases as *τ* increases. This fact explains the better results for all estimators when *τ* = 20.

4.2 Simulation 2: Survival Model

An important part of the project is to model the first relapse time after treatment. As explained in Section 2, the data are partially interval censored. We use the second simulation study to mimic the real data and to check the performance of the proposed estimators.

We simulate *X*_{i}, *N*_{i}, and *Z*_{i} as in Section 4.1, for *i* = 1, …, *n*, and simulate the failure time *T*_{i} through a Cox proportional hazard model with the hazard function

where

*λ*_{0}(

*t*) =

*t* is the (unobserved) baseline hazard function. The regression coefficients are

*θ* = (

*β*,

*η*)

_{T} = (1, 1)

^{T}. We assume censoring at random and set the censoring times to be (0.2, 0.5, 1). Let the censoring indicator

*δ*_{i} be a binary variable independent of

*X*_{i} and

*Z*_{i}, with

*P*(

*δ;*_{i} = 1) = 0.5. When

*δ*_{i} = 1, the event time

*T*_{i} is censored in the interval between the two censoring times closest to

*T*_{i}; if

*T*_{i} is less than 0.2, it is censored in [

]; if an event time is over 1, it is automatically right censored at 1. Overall, about 12% of the observations are right censored, 43% are interval censored, and the rest 45% are observed.

For interval censored data, it is difficult to separate estimating the baseline hazard function from estimating

*θ* by using approaches such as the partial likelihood (

Cai and Betensky 2003). Many authors have considered modeling the baseline hazard as spline functions. We follow

Cai and Betensky (2003) and

Ruppert, Wand, and Carroll (2003) to model the log baseline hazard as a linear spline function:

where

*x*_{+} max(

*x*, 0) and

*κ*_{k}’s are the knots. There are two immediate benefits for using this model. First,

*λ*_{0}(·) is guaranteed to be nonnegative, so that we do not need any constraint on the parameters when maximizing the likelihood. Second, since

(·) is modeled as piecewise linear polynomial, the cumulative hazard function can be written out in an explicit form. For higher-order spline functions, such explicit expressions are not available.

Let Θ be the collection of all parameters, including both

*θ* and the spline coefficients in (

11). Then, Θ can be estimated by maximizing the penalized likelihood

where

_{i}(Θ) is the log-likelihood function for the

*i*th subject,

is a tuning parameter that controls the smoothness of

(

*t*), and

**b** = (

*b*_{1},

*b*_{2}, …,

*b*_{K})

^{T}.

Cai and Betensky (2003) gave the detailed expression for

_{i}(Θ) and also suggested a data-driven procedure to estimate

. As pointed out by

Apanasovich, Carroll, and Maity (2009), estimation of the parametric component in a semiparametric setting is not sensitive to the choice of the smoothing parameter. We therefore use a fixed yet reasonable

in our simulation so as to reduce the computational burden.

As in the linear model case, we use both homogeneous Poisson processes and Poisson cluster processes for

*N*_{i}. We consider five estimators: the estimator using the true

*X*_{i}, the naive estimator that replaces

*X*_{i} by

*W*_{i}, the two SUBEX estimators, that is, SUBEX

_{1} and SUBEX

_{2}, and the empirical SIMEX. The results in are obtained by modeling

(

*t*) with a 10-knot spline function. The spline function provides a sufficient approximation, as can be seen from the fact that the estimator based on the true

*X*_{i} is almost unbiased. Thus, all biases in the other four estimators are mainly caused by the estimation error in

*W*_{i}. We have also tried

*K* = 25 knots and obtained very similar results.

| **Table 2**Simulation results in the interval-censored failure time data case |

In the Poisson process case, we set *τ* = 10 and *n* = 200. Similar to the linear model case, the naive estimator is severely biased, but the SUBEX and the empirical SIMEX estimators can significantly reduce the bias. Since there is no within-trajectory dependence in *N*_{i}, the empirical SIMEX perform better than the two SUBEX estimators, and there is no benefit for using SUBEX_{2} over SUBEX_{1}.

In the Poisson cluster process case, we set *τ* = 10, 20 and *n* = 200. We also consider *n* = 400 when *τ* = 20. From , we see that both SUBEX_{1} and SUBEX_{2} have much smaller bias in magnitude than the empirical SIMEX but are also more variable. Thus, there is a trade-off between reducing the bias and increasing the variance. Between the two SUBEX estimators, SUBEX_{2} is slightly more effective in reducing the bias than SUBEX_{1}. In terms of relative efficiency, the two SUBEX estimators perform significantly better than the empirical SIMEX except when *τ* = 20 and *n* = 200. In this case, the relative efficiency of all three estimators are similar.

When

*τ* = 10, the bias is quite large after the bias correction. This is because the within-trajectory dependence is strong in this case. When

*τ* increases to 20, the biases decrease significantly in all cases. However, when

*n* increases from 200 to 400, the biases are nearly unchanged for all estimators, although the standard errors decrease. In general, the bias of the SUBEX estimators depends on the strength of the within-trajectory dependence relative to the length of the observation interval, that is,

*τ*, but not on the sample size

*n*. The statistic 2

/(

*τ* ^{2}^{2}) given in Section 3.4 can be used to gauge the level of dependence. Specifically, it is approximately equal to 0.2519 when

*τ* = 10 and equal to 0.1228 when

*τ* = 20.

In the

online Appendix, we show additional simulation results with a finer choice of

*p*-values and also with a cubic extrapolating function. Similar to the linear model case, we find that using the cubic function can reduce the bias but inflate the variance of the final estimator. Increasing the number of

*p*-values does not change the results much, although variability of the new estimator does become smaller.