In the following section, a simulation-based approach is used to compare the results from various modeling strategies focusing on the estimation of the association parameter θ1
). We compare the mean and standard deviation of 1000 estimated θ1
’s obtained from the proposed joint model (Joint IC
, equation (7)
) using the interval-censored adjustment (equation (8)
) with the results based on five alternative strategies. The Joint IC
model accommodates exposure data that are measured with and without error (i.e. either fully observed or known only within an interval due to assay limitations). The Joint ALL IC
model applies the same approach except that it treats all exposure observations as interval-censored.
The two-stage IC
approach first models the observed exposure data to obtain estimates of
accounting for interval censoring. See Section 3.4 for details on empirical Bayes estimates. These estimates are then used in the second-stage health outcome model. This regression calibration-like method incorporates the interval-censoring adjustment in the first-stage exposure model, accommodating those measured with and without error, and then treats the estimated
as if measured without error in the second stage.
In the Joint Midpoint
model, each exposure observation, whether measured without error, set to a fraction (
) of the LOD, or rounded to the nearest integer, is treated as if measured without error. This model addresses the associated
measurement error using the joint model (equation (7)
), but ignores potential issues inherent in relying upon rounded values.
The two-stage Midpoint
approach first models the observed exposure data, treated expediently as in the joint midpoint approach, to obtain standard empirical Bayes estimates of
. These estimates, analogous to those proposed by Terrell et al.
], are then inserted into the second-stage health outcome model.
The two-stage OLS
method obtains OLS estimates of the slope for each subject. These slopes are then regressed against potential covariates (i.e. body mass index (BMI)) to obtain adjusted slopes. In the second stage it treats the estimated
, as if measured without error. This method parallels that by Blanck et al.
method, though not available in practice, takes advantage of all (unobserved) true yij
’s available in the simulated data. This model addresses the associated
measurement error using the joint model (equation (7)
) approach, and suffers from no other source of error.
4.1. Simulation I: binary outcome
Four exposure measurements were simulated for each of 500 subjects. Time between measurements was generated via a uniform distribution with bounds of 0.5–2 years. BMI at the initial time of exposure was generated from a transformed Beta(2,3) distribution with bounds of 15 and 38. BMI was then centered around the sample mean.
Exposure measurements were generated under the model described in equation (1)
with BMI included as a covariate interacting with time. Covariance parameters were set to the following values:
= −0.04, and σ2
= 0.7. Coefficient parameters were: α0
= 6, β0
= −0.3, and β1
The lower LOD was set at 3, resulting in ≈ 10 per cent of measurements falling below the LOD. The remaining measures were randomly assigned via a Bernoulli distribution with probability = 0.5 to be fully observed or to be rounded to the nearest integer in order to mimic interval censoring. Simulated data ranged from 0–12. Interval-censored observations fell into one of the following categories: (0, 3), [3,3.5), [3.5,4.5), [4.5,5.5),…,[11.5,12.5),…, with 95 per cent of the data falling within the first six intervals.
A binary outcome was simulated for each subject under the model described in equation (3)
using the simulated
and age as a covariate. The event time for each subject,
, was generated from a uniform distribution with bounds of 1 and 3 years after the final exposure measurement. Age at the time of the health outcome was generated via a transformed Beta(1,3) with bounds 16–60 and was centered around the sample mean, and associated with the outcome via the coefficient θ2
= −0.05. The overall probability of a subject having the health outcome of interest, i.e. Pr
= 1], was chosen to be ≈0.25. Three parallel simulations were run, setting the parameter of interest, θ1
, to be 0.2, 0.4, and 0.8, respectively.
Results are given in . Fitting the true
model, where only associated
measurement error is involved, results in little bias and optimal coverage within the specified logistic model setting. The OLS
method results in an average 1
that underestimates the true θ1
by more than 50 per cent, together with markedly sub-nominal coverage rates. The midpoint
methods, both joint and two-stage, perform considerably better than the OLS method, although the results also show consistent underestimation of the association and sub-nominal coverage rates. The proposed interval-censored adjusted methods perform quite well, showing very little bias for the two smaller magnitudes of the association. The two-stage IC
approach exhibits greater bias when θ = 0.08. Also evident is a markedly lower coverage rate, 83 per cent, for the two-stage IC approach when θ = 0.08. The coverage rate of the Joint IC
method is 92 per cent, though below the nominal 95 per cent, is quite acceptable relative to the True
Simulation I: binary outcome.
In this simulation, and the two that follow in the next sections, convergence rates were near 99 per cent or better for all methods (see Table footnotes). The Joint IC and Joint Midpt models were fit in SAS NLMIXED using the parameter estimates from the two-stage Midpt model as the starting values.
To explore the impact of sample size, a second simulation study was run simulating data for 1000 subjects for θ1
= 0.2 and 0.4 independently. The models were fit to a random set of 100, 200, 300, 400, 500, 750, and 1000 subjects for each simulated data set. A plot of the means of 1
against the sample size (see ) shows that the joint IC
model consistently improves the estimation of θ1
for both θ1
= 0.2 and 0.4 as compared with the two-stage joint and both midpoint models. The model performs quite well even with a sample size of 100 though we note that convergence is under 90 per cent when N
= 100. Model convergence improves to better than 95 per cent when N
increases to 200 and to nearly 100 per cent when N
Figure 1 Plot of means of 1 against sample size from 500 simulations.
4.2. Simulation II: binary outcome—coarser exposure
In the second simulation study, we explore the impact of fewer and wider intervals. We assume the identical exposure model and coefficient parameters as described in Section 4.1, however interval-censored observations were assigned to one of the following intervals: (0, 3), [3,4.5), [4.5,6.5), [6.5,8.5), [8.5,10.5), [10.5,12.5),…. Nearly 100 per cent of the data fell within the first four intervals.
Results are given in , and continue to indicate that the IC adjustment methods perform quite well for coarser exposure data. Bias is negligible and coverage is near-optimal for both θ1 = 0.2 and 0.4. We note that the midpoint methods still exhibit underestimation of the association, and sub-nominal coverage for larger θ1.
Simulation II: binary outcome—coarser exposure.
Here we also emphasize the joint interval-censored adjusted method when all observations are treated as interval-censored (Joint All IC
) versus a model that distinguishes between those measured without error and those interval-censored (Joint IC
). While these methods were virtually indistinguishable in , in Joint IC
yields a smaller average SE, reflecting the potential increase in efficiency when the method is tuned to handle both types of observations. The flexibility and ease with which the tuning is accomplished is an advantage of the proposed implementation via the NLMIXED
]. The MFHS study (Section 5) provides an example in which evolving laboratory methods logically lead to a desire to account for both completely observed and rounded (i.e. interval-censored) exposure data.
4.3. Simulation III—continuous outcome
In the third simulation study, we explore similar methods assuming a linear health outcome model. Exposure measurements were simulated and interval-censored in the same manner as described in Section 4.1. In the MFHS context, a potential outcome in this context is the age of onset of menarche in daughters of exposed mothers. The linear health outcome was simulated under the model described in equation (6)
. A single covariate (Bi2
) representing whether the daughter was breastfed or not was included, generated via a Bernoulli distribution with a probability 0.5. We let N
= 400, Nk
= 219, and E
] = 4. The mean simulated age of onset of menarche (Ri
) was ≈ 10.5 years. Three parallel simulations were run with θ1
= −0.05, −0.1, and −0.2. For each simulation, θ0
= 11, θ2
= −0.3, and
Results are given in . The IC adjustment method again shows very little bias and provides good coverage for all three magnitudes of association, in both the joint and two-stage models. Both midpoint methods produce an underestimation of the association, with satisfactory coverage when θ1 = −0.05, but sub-nominal coverage for the larger associations. The OLS method again results in considerable bias and sub-nominal coverage.