In the following section, a simulation-based approach is used to compare the results from various modeling strategies focusing on the estimation of the association parameter θ

_{1} (

equation (3)). We compare the mean and standard deviation of 1000 estimated θ

_{1}’s obtained from the proposed joint model (

*Joint IC*,

equation (7)) using the interval-censored adjustment (

equation (8)) with the results based on five alternative strategies. The

*Joint IC* model accommodates exposure data that are measured with and without error (i.e. either fully observed or known only within an interval due to assay limitations). The

*Joint ALL IC* model applies the same approach except that it treats all exposure observations as interval-censored.

The

*two-stage IC* approach first models the observed exposure data to obtain estimates of

accounting for interval censoring. See Section 3.4 for details on empirical Bayes estimates. These estimates are then used in the second-stage health outcome model. This regression calibration-like method incorporates the interval-censoring adjustment in the first-stage exposure model, accommodating those measured with and without error, and then treats the estimated

as if measured without error in the second stage.

In the

*Joint Midpoint* model, each exposure observation, whether measured without error, set to a fraction (

) of the LOD, or rounded to the nearest integer, is treated as if measured without error. This model addresses the

*associated* measurement error using the joint model (

equation (7)), but ignores potential issues inherent in relying upon rounded values.

The

*two-stage Midpoint* approach first models the observed exposure data, treated expediently as in the joint midpoint approach, to obtain standard empirical Bayes estimates of

. These estimates, analogous to those proposed by Terrell

*et al.* [

24], are then inserted into the second-stage health outcome model.

The

*two-stage OLS* method obtains OLS estimates of the slope for each subject. These slopes are then regressed against potential covariates (i.e. body mass index (BMI)) to obtain adjusted slopes. In the second stage it treats the estimated

, as if measured without error. This method parallels that by Blanck

*et al.* [

6].

The

*True* method, though not available in practice, takes advantage of all (unobserved) true

*y*_{ij}’s available in the simulated data. This model addresses the

*associated* measurement error using the joint model (

equation (7)) approach, and suffers from no other source of error.

4.1. Simulation I: binary outcome

Four exposure measurements were simulated for each of 500 subjects. Time between measurements was generated via a uniform distribution with bounds of 0.5–2 years. BMI at the initial time of exposure was generated from a transformed Beta(2,3) distribution with bounds of 15 and 38. BMI was then centered around the sample mean.

Exposure measurements were generated under the model described in

equation (1) with BMI included as a covariate interacting with time. Covariance parameters were set to the following values:

, σ

_{ab} = −0.04, and σ

^{2} = 0.7. Coefficient parameters were: α

_{0} = 6, β

_{0} = −0.3, and β

_{1} = 0.002.

The lower LOD was set at 3, resulting in ≈ 10 per cent of measurements falling below the LOD. The remaining measures were randomly assigned via a Bernoulli distribution with probability = 0.5 to be fully observed or to be rounded to the nearest integer in order to mimic interval censoring. Simulated data ranged from 0–12. Interval-censored observations fell into one of the following categories: (0, 3), [3,3.5), [3.5,4.5), [4.5,5.5),…,[11.5,12.5),…, with 95 per cent of the data falling within the first six intervals.

A binary outcome was simulated for each subject under the model described in

equation (3) using the simulated

and age as a covariate. The event time for each subject,

, was generated from a uniform distribution with bounds of 1 and 3 years after the final exposure measurement. Age at the time of the health outcome was generated via a transformed Beta(1,3) with bounds 16–60 and was centered around the sample mean, and associated with the outcome via the coefficient θ

_{2} = −0.05. The overall probability of a subject having the health outcome of interest, i.e.

*Pr*[

*R*_{i} = 1], was chosen to be ≈0.25. Three parallel simulations were run, setting the parameter of interest, θ

_{1}, to be 0.2, 0.4, and 0.8, respectively.

Results are given in . Fitting the

*true* model, where only

*associated* measurement error is involved, results in little bias and optimal coverage within the specified logistic model setting. The

*OLS* method results in an average

_{1} that underestimates the true θ

_{1} by more than 50 per cent, together with markedly sub-nominal coverage rates. The

*midpoint* methods, both joint and two-stage, perform considerably better than the OLS method, although the results also show consistent underestimation of the association and sub-nominal coverage rates. The proposed interval-censored adjusted methods perform quite well, showing very little bias for the two smaller magnitudes of the association. The

*two-stage IC* approach exhibits greater bias when θ = 0.08. Also evident is a markedly lower coverage rate, 83 per cent, for the two-stage IC approach when θ = 0.08. The coverage rate of the

*Joint IC* method is 92 per cent, though below the nominal 95 per cent, is quite acceptable relative to the

*True* model.

| **Table I**Simulation I: binary outcome. |

In this simulation, and the two that follow in the next sections, convergence rates were near 99 per cent or better for all methods (see Table footnotes). The *Joint IC* and *Joint Midpt* models were fit in *SAS NLMIXED* using the parameter estimates from the *two-stage Midpt* model as the starting values.

To explore the impact of sample size, a second simulation study was run simulating data for 1000 subjects for θ

_{1} = 0.2 and 0.4 independently. The models were fit to a random set of 100, 200, 300, 400, 500, 750, and 1000 subjects for each simulated data set. A plot of the means of

_{1} against the sample size (see ) shows that the joint

*IC* model consistently improves the estimation of θ

_{1} for both θ

_{1} = 0.2 and 0.4 as compared with the two-stage joint and both midpoint models. The model performs quite well even with a sample size of 100 though we note that convergence is under 90 per cent when

*N* =

*N*_{k} = 100. Model convergence improves to better than 95 per cent when

*N* increases to 200 and to nearly 100 per cent when

*N*≥400.

4.2. Simulation II: binary outcome—coarser exposure

In the second simulation study, we explore the impact of fewer and wider intervals. We assume the identical exposure model and coefficient parameters as described in Section 4.1, however interval-censored observations were assigned to one of the following intervals: (0, 3), [3,4.5), [4.5,6.5), [6.5,8.5), [8.5,10.5), [10.5,12.5),…. Nearly 100 per cent of the data fell within the first four intervals.

Results are given in , and continue to indicate that the *IC* adjustment methods perform quite well for coarser exposure data. Bias is negligible and coverage is near-optimal for both θ_{1} = 0.2 and 0.4. We note that the *midpoint* methods still exhibit underestimation of the association, and sub-nominal coverage for larger θ_{1}.

| **Table II**Simulation II: binary outcome—coarser exposure. |

Here we also emphasize the joint interval-censored adjusted method when all observations are treated as interval-censored (

*Joint All IC*) versus a model that distinguishes between those measured without error and those interval-censored (

*Joint IC*). While these methods were virtually indistinguishable in , in

*Joint IC* yields a smaller average SE, reflecting the potential increase in efficiency when the method is tuned to handle both types of observations. The flexibility and ease with which the tuning is accomplished is an advantage of the proposed implementation via the

*NLMIXED* procedure [

32]. The MFHS study (Section 5) provides an example in which evolving laboratory methods logically lead to a desire to account for both completely observed and rounded (i.e. interval-censored) exposure data.

4.3. Simulation III—continuous outcome

In the third simulation study, we explore similar methods assuming a linear health outcome model. Exposure measurements were simulated and interval-censored in the same manner as described in Section 4.1. In the MFHS context, a potential outcome in this context is the age of onset of menarche in daughters of exposed mothers. The linear health outcome was simulated under the model described in

equation (6). A single covariate (

*B*_{i2}) representing whether the daughter was breastfed or not was included, generated via a Bernoulli distribution with a probability 0.5. We let

*N* = 400,

*N*_{k} = 219, and

*E*[

*n*_{i}] = 4. The mean simulated age of onset of menarche (

*R*_{i}) was ≈ 10.5 years. Three parallel simulations were run with θ

_{1} = −0.05, −0.1, and −0.2. For each simulation, θ

_{0} = 11, θ

_{2} = −0.3, and

.

Results are given in . The *IC* adjustment method again shows very little bias and provides good coverage for all three magnitudes of association, in both the joint and two-stage models. Both *midpoint* methods produce an underestimation of the association, with satisfactory coverage when θ_{1} = −0.05, but sub-nominal coverage for the larger associations. The OLS method again results in considerable bias and sub-nominal coverage.