PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of ijbiostatThe International Journal of BiostatisticsThe International Journal of BiostatisticsSubmit to The International Journal of BiostatisticsSubscribe
 
Int J Biostat. Jan 1, 2011; 7(1): Article 15.
Published online Feb 16, 2011. doi:  10.2202/1557-4679.1281
PMCID: PMC3058406
Methods for Estimation of Radiation Risk in Epidemiological Studies Accounting for Classical and Berkson Errors in Doses
Alexander Kukush, Sergiy Shklyar, Sergii Masiuk, Illya Likhtarov, Lina Kovgan, Raymond J Carroll, and Andre Bouville
Alexander Kukush, Ukraine Radiation Protection Institute;
With a binary response Y, the dose-response model under consideration is logistic in flavor with pr(Y=1 | D) = R (1+R)−1, R = λ0 + EAR D, where λ0 is the baseline incidence rate and EAR is the excess absolute risk per gray. The calculated thyroid dose of a person i is expressed as equation M1. Here, equation M2 is the measured content of radioiodine in the thyroid gland of person i at time tmes, equation M3 is the estimate of the thyroid mass, and fi is the normalizing multiplier. The Qi and Mi are measured with multiplicative errors equation M4 and equation M5, so that equation M6 (this is classical measurement error model) and equation M7 (this is Berkson measurement error model). Here, equation M8 is the true content of radioactivity in the thyroid gland, and equation M9 is the true value of the thyroid mass. The error in fi is much smaller than the errors in ( equation M10, equation M11) and ignored in the analysis.
By means of Parametric Full Maximum Likelihood and Regression Calibration (under the assumption that the data set of true doses has lognormal distribution), Nonparametric Full Maximum Likelihood, Nonparametric Regression Calibration, and by properly tuned SIMEX method we study the influence of measurement errors in thyroid dose on the estimates of λ0 and EAR. The simulation study is presented based on a real sample from the epidemiological studies. The doses were reconstructed in the framework of the Ukrainian-American project on the investigation of Post-Chernobyl thyroid cancers in Ukraine, and the underlying subpolulation was artificially enlarged in order to increase the statistical power. The true risk parameters were given by the values to earlier epidemiological studies, and then the binary response was simulated according to the dose-response model.
Keywords: Berkson measurement error, Chornobyl accident, classical measurement error, estimation of radiation risk, full maximum likelihood estimating procedure, regression calibration, SIMEX estimator, uncertainties in thyroid dose
The current methods of risk estimation (see Health Risks from Exposure to Low Levels of Ionizing Radiation, 2006; Jacob et al., 2006; Likhtarev et al., 2006a) take into account uncertainties related to health effects, but assume that the doses are known with perfect precision and accuracy, something that is not true (Kopecky et al., 2006; Likhtarev et al., 1993; Likhtarev et al., 2003; Ron and Hoffman, 1999). Although many attempts have been made to develop mathematical tools which take into account the dose uncertainty (see Carroll et al., 2006; Hofer, 2008; Kopecky et al., 2006; Li et al., 2007; Masiuk et al., 2008), the problem has yet to be solved in a satisfactory manner. It is our opinion that one of the reasons an accurate model is so hard to create is because the dose estimates are almost inevitably associated with a mixture of classical and Berkson errors (Reeves et al., 1998; Carroll et al., 2006; Li et al., 2007), which leads to difficulties in evaluating the influence of the mixed errors in the estimated dose on risk analysis, which is usually expressed in terms of relative (ERR) or absolute (EAR) risk (see Health Risks from Exposure to Low Levels of Ionizing Radiation, 2006). One such example is the analysis of epidemiological investigations of thyroid cancer among children exposed to radioiodines in fallout from the Chornobyl accident (Jacob et al., 2006; Likhtarev et al., 2006a; Tronko et al., 2006).
In the Chornobyl accident, not only the absolute and relative frequencies of thyroid cancer were determined with accuracy, but also the thyroid doses and their uncertainties of Berkson and classical types were carefully estimated. However, computationally efficient procedures that account for the mixed nature of the dosimetric uncertainties in the risk analysis are not available.
In the epidemiologic study of thyroid cancer conducted in Ukraine (Likhtarev et al., 2006b), the thyroid dose estimates are based on the measurements of Iodine-131 (131I) that were performed on all subjects, and the uncertainties in the thyroid dose estimates are mainly due to those in the measured activities and in the thyroid mass (Likhtarev et al., 2003).
It is assumed in this paper that the thyroid mass is measured with Berkson error, which appears as a result of averaging, while uncertainty in the measured activity is classical. The presence of errors of both types leads to a mixture of classical and Berkson errors in the thyroid dose estimates. In this case, the two components of the uncertainty in the dose are quite distinct and quantifiable, unlike in Reeves et al. (1998) and Li et al. (2007), where the mixture error model involves, besides the true dose, additional, unobservable variables in the model of dose measurement.
The model proposed in this paper is a new way of thinking about these problems. In the Nevada Test Site Study (Apanasovich et al., 2009; Li et al., 2007; Lyon et al., 2006; Mallick et al., 2002) it was recognized that there was a mixture of Berkson and classical errors in the dose estimates, but the available data set did not have information on their relative sizes, and hence the statistical methods used here are quite different. Even the SIMEX version in Apanasovich et al. (2009) is different from the modification developed below because of the different dose uncertainty model.
Though in the simulations and analysis of the source of uncertainty induced by Berkson and classical measurement errors we used the dose estimate and its uncertainty related to the thyroid exposure (Likhtarev et al., 1993; Likhtarev et al., 2003; Likhtarev et al., 2005), the main goal of the present paper is to study the possibility to apply in the risk analysis the most popular methods of regression analysis, e.g., Simulation-Extrapolation (SIMEX), Maximum Likelihood and Regression Calibration (see Carroll et al., 2006; Kopecky et al., 2006; Masiuk et al., 2008; Kukush and Schneeweiss, 2005). Those methods take into account the presence of measurement errors (classical, Berkson and mixed) in regressors. In particular we study the reliability of radiation risk estimates obtained by each of those methods, as well as their robustness with respect to the error magnitude and the form of distribution of the initial data set of dose estimates. Attention will be paid to development and mathematical justification of modifications of the listed methods that under certain conditions lead to significant improvement of the estimates.
In order to get the results that are as much as possible independent of the concrete radio-epidemiological investigations, a cohort was simulated with the following properties:
  • The cohort is sufficiently large that there are enough thyroid cancer disease cases for the existence of a solution to the optimization of the objective loglikelihood function, for all simulated data sets. Thyroid cancer is rare in a population, even for the Chornobyl accident.
  • The radiation risk parameters used are close to the quantities obtained in Jacob, et al. (2006), Likhtarev, et al. (2006a) and Tronko, et al. (2006).
  • The risk model is the simplest but quite popular generalized linear model of absolute risk with binary response (see below) that makes it possible to verify and compare the results with estimates obtained by the well-known program package EPICURE.
We mention the main novelties of the present paper:
  • An efficient version of SIMEX that provides smaller estimation bias compared with ordinary SIMEX.
  • A refined version of SIMEX in equation (A.3) from Appendix that improves statistical performance.
  • A new nonparametric regression calibration procedure.
  • A new nonparametric full maximum likelihood estimating procedure.
  • A simulation study in a unique context.
In this section, we describe the error-free model and computation methods for fitting it.
2.1. The Dose-Response Model
With a binary response Y, our dose-response model is
equation M12
(1)
where D denotes the individual thyroid dose for a given person, the event Y = 1 means that the person developed thyroid cancer within a certain time interval (e.g., during ten years), λ0 is the baseline incidence rate, and λ0β is the excess absolute risk per gray (EAR). Of course, λ0 and β are both positive.
2.2. Maximum Likelihood
The parameters in model (1) can be estimated by maximum likelihood, which consists of solving the equations
equation M13
equation M14
These equations can be obtained by differentiating the loglikelihood function (see e.g. Masiuk, 2008, page 67).
Remark 1 Of course, standard programs such as EPICURE can be used to compute the maximum likelihood estimators, but when we account for measurement error, one of the methods we discuss is the SIMEX algorithm, which is computationally expensive and involves fitting the maximum likelihood method many times. If the bootstrap is employed for inference, the maximum likelihood estimator would have to be computed tens of thousands of times. For this reason, in Section 2.3 we will suggest another approach.
2.3. A Computationally Efficient Method
In order to obtain a computationally efficient estimator, we first note that E{(1 – Y) (1 + βD)} = E(Y0) and E{(1 – Y)} = E[Y / {λ0(1 + β D)}]. Replacing the expectations by their empirical means, this suggests solving
equation M15
(2)
equation M16
(3)
From (2), for any β we have
equation M17
From (2) and (3), by some algebra, we get the equation for [beta]Q as
equation M18
(4)
where
equation M19
(5)
The statistical and large sample properties of the solution to (4) are described in Appendix A.1.
The solution to (4) is readily calculated via any one-dimensional equation solving method. While not the maximum likelihood estimator, in simulations not shown here, we have found that it is nearly as efficient, and in any case can be used as an initial estimate. When we later wish to employ the SIMEX procedure for measurement error correcting, this computational efficiency becomes important, see Remark 1.
According to the dosimetric model described by Likhtarev et al. (2005, 2006b), the calculated thyroid dose of person i is expressed as
equation M20
(6)
where equation M21 is the measured content of 131I in the thyroid gland of person i at time tmes, equation M22 is the estimate of the thyroid mass, and fi is a multiplier which takes into account the parameter values of an ecology-metabolism model for person i. While fi is also measured with error, its error is much smaller than the errors in equation M23 and in equation M24 (see Likhtarev et al., 2003, 2006b) and will be ignored in this analysis. In principle, the uncertainty in fi can be included in the evaluation in many ways, using for example the shared parameter methods of Stram and Kopecky (2003).
The uncertainties in equation M25 and equation M26 are as follows:
  • The measured activity is associated with a multiplicative error, which is determined by the characteristics of the measuring instrument (Likhtarev, et al., 1993; Likhtarev, et al., 2003), so that
    equation M27
    (7)
    where equation M28 is the true 131I content in the thyroid gland, and equation M29 is the independent multiplicative measurement error. Expression (7) of course defines a classical multiplicative error model.
  • The true values of the thyroid mass equation M30 are determined according to a Berkson measurement error model as
    equation M31
    (8)
    where equation M32 is the median of the thyroid mass for a given gender-age group, and equation M33 an independent multiplicative error.
  • The measurement errors ( equation M34, equation M35) are assumed to be lognormally distributed, so that equation M36 and equation M37. In the Chornobyl data, the parameters equation M38 and equation M39 have been reliably estimated (see Likhtarev et al., 1993), therefore in this paper those variables are assumed known.
  • We assume that the random vector ( equation M40, equation M41) and random variables equation M42, equation M43 are jointly independent, and we set equation M44. We allow the random variables equation M45 and equation M46 to be correlated.1
The unobservable error-free dose is denoted by equation M47, and is given as
equation M48
(9)
Remark 2 The assumption that the measurement errors in the thyroid mass have lognormal distributions is justified as follows: our research group possesses the data set of thyroid volume measurements performed by the Sasakawa Foundation in Ukraine for each age-sex-region group (the quantiles of the corresponding distributions are published in Chernobyl: A Decade, 1997, pp. 393–398). It is clear from the data set that the distributions are close to lognormal. By physical reasoning the measured activity is non-negative. Therefore, the multiplicative error in the activity is modeled by a lognormal law. In future we will study the distribution of this error in more detail.
4.1. Parametric Calibration
In practice, regression calibration (see Carroll et al., 2006) is often used to take into account the errors in measured dose. According to this method, prior to statistical processing of the epidemiological data, the measured dose is replaced by the conditional expectation of the true dose given measured quantities. In our case, we the measured dose is replaced by
equation M49
(10)
which is really a short-hand notation for conditioning on the measured quantities ( equation M50, equation M51, fi). Following this, classical regression analysis of the epidemiological data is performed, e.g., by the program package EPICURE (Preston et al., 1993), with inference using a bootstrap.
Consider for example the case that f Qtr/Mmes has a lognormal distribution. Let log(f Qtr/Mmes) = Normal(μ1, equation M52). Because equation M53 and the summands are independent, the conditional distribution of equation M54 given equation M55 is normal,
equation M56
see the corresponding formula in Anderson (1958, Section 2.5). Then
equation M57
Because equation M58, and the factors equation M59 and equation M60 are independent,
equation M61
Finally, we have to estimate μ1 and equation M62 and substitute the estimates:
equation M63
(11)
Consistent estimates for μ1 and equation M64 are the sample mean of equation M65 and the sample variance of equation M66 minus the average value of equation M67.
Also, the calibration of the dose can be done under the assumption that the distribution of log(f Qtr/Mmes) is a mixture of normal distributions. But in our simulations we do not see the advantage of involving the mixture of normals. The reason for that is the unimodality of the distribution of log(f Qtr/Mmes), see Figures 1 and and22 in Section 6.2.
Figure 1:
Figure 1:
Censored distribution of log(fQtr/Mmes).
Figure 2:
Figure 2:
Uncensored distribution of log(fQtr/Mmes).
4.2. Nonparametric Regression Calibration
Now, we suppose that the form of distribution of f Qtr/Mmes is not specified. One way to handle this problem is via a discrete approximation to this distribution. Let the range of log(f Qtr/Mmes) be partitioned into K intervals, with midpoints x1, …,xK. Then construct a discrete approximation to the distribution of equation M68 according to the formula
equation M69
(12)
(in the simulation study we took K = 25). Here the values bk are estimated by maximum likelihood based on known distribution of the measurement errors equation M70 and using the convex optimization technique. Next, by Bayes’ formula we construct the calibrated doses
equation M71
where
equation M72
is the conditional density of equation M73 given equation M74 at point xk.
The estimator obtained by the SIMEX method is randomized, i.e., it is a random function of observations. A similar method of risk parameters estimation was used in Kopecky et al. (2006), but the approach proposed below makes it possible to give proper weight to the structure of measured doses. Note that the SIMEX method does not require the knowledge of the probability distribution for the true activity equation M75.
To account for the Berkson error, the thyroid mass equation M76 is calibrated by formula:
equation M77
(13)
We intend to substitute equation M78 for equation M79. In order to take the classical error into consideration, the following steps are taken:
  • Select a natural number B ≥ 2. Generally, we select B to be a fairly large number, e.g., B = 100. Simulate random perturbations of log-activities equation M80 for b = 1, …, B and i = 1, …, N.
  • Choose various levels of κ ≥ 0. We use κ = (0.0,0.2,0.4,0.6).
  • For each κ, obtain the perturbed doses
    equation M81
    (14)
  • Compute the ordinary (naïve) estimators equation M82, equation M83 for κ = 0, 0.2, 0.4, 0.6 and average them across b, defining equation M84, equation M85.
  • Extrapolate numerically the functions equation M86 and EAR*(κ) to the point κ = −1 and get the SIMEX estimator for the parameters λ0 and EAR = λ0β.
We describe in Remark 1 why a computationally efficient version of SIMEX is desirable in our context. In the computationally efficient modification of the SIMEX method, other values are used instead of the doses (14), namely
equation M87
and as a result the system of equations for the modified estimate takes the form
equation M88
(15)
equation M89
(16)
Hence the left-hand side of (15) is obtained after taking into account the dose errors and correction of the expression equation M90, see equation (2) and Carroll et al. (2006), while the right-hand side of equation (16) is obtained after perturbation of equation M91 in the expressions equation M92, where equation M93 is the dose calibrated with respect to the errors in the thyroid mass. Equations (15)(16) can be solved similarly to equations (2)(3).
This modification of SIMEX is computationally efficient for the following reasons:
  • Each evaluation of the naïve estimator, that is solving (15)–(16), is efficient.
  • Some calculations common to different evaluations of the naïve estimator have to be performed only once.
A more detailed description of the proposed procedure is presented in Appendix A.2, while the characteristics of its implementation are given in Appendix A.3. This modification of the SIMEX estimator is justified in Appendix A.2.2.
6.1. Simulation Setup
We performed a simulation study based on data from epidemiological investigations (see Jacob, et al., 2006; Likhtarev, et al., 2005; Likhtarev, et al., 2006a; Likhtarev, et al., 2006b). For the simulation of doses a real subpopulation of 7077 children and teenagers up to 18 years was used; these persons resided in settlements of Zhitomir, Kyiv, and Chernigov Oblasts of Ukraine at the time of the Chornobyl accident and had direct measurements of thyroid activity in May–June 1986. The thyroid doses for this subpopulation were reconstructed in the framework of the Ukrainian-American project on the investigation of Post-Chornobyl thyroid cancers in Ukraine (Likhtarev, et al., 2006b).
6.2. Two Simulation Scenarios
We simulated under two scenarios, in one of which the distribution of thyroid doses is well-modeled by a lognormal distribution, while in the other it is not.
  • In the first scenario the subpopulation was formed in such a way that the persons with dose f Qtr/Mmes greater than 2 Gray were deleted. In addition, in order to increase the power of the numerical experiment, this subpopulation was artificially enlarged to N = 66665 persons. The cutoff was performed only at the preliminary stage of the simulation. But after the simulation the doses were not cut off and varied from 3 × 10−7 to 10.5 Grays (see Figure 1).
  • In the second scenario, at the preliminary stage the cutoff was not performed, and the subpopulation was artificially enlarged to N = 70770 persons (see Figure 2).
The true parameters of the absolute risk model (1) were given by the values close to the estimates obtained in the epidemiological studies of the thyroid cancers in Ukraine (Jacob, et al., 2006; Likhtarev, et al., 2006a). Namely, we set
equation M94
We simulated 100 different data sets. However, unperturbed data fi, equation M95 and equation M96, are based on real data and are the same for all realizations. In fact,
We used different values of the measurement error variances, including the real ones as obtained in Chernobyl: A Decade (1997), Likhtarev et al. (1993), and Likhtarev et al. (2003). The values of the geometric standard deviation GSDM of the uncertainty distribution for the thyroid mass were 1 (no error), 1.33,2 1.5, and 2.0, while the GSDQ of the uncertainty distribution for the measured activity varied from 1.5 to 5.0.
6.3. Simulation Results and Discussion
6.3.1. Setup and Methods
The simulation results are presented in Tables 14.
Table 1:
Table 1:
Estimates of baseline incidence rate for the first scenario.
Table 4:
Table 4:
Estimates of excess absolute risk for the second scenario.
The estimation of the absolute risk parameters was performed by several methods:
  • Naïve: The ordinary maximum likelihood method in which the dose estimates are taken to be error-free.
  • Parametric Full ML: Full maximum likelihood where the error in the measured activity was taken into account by means of the integral convolution over the distribution of true thyroid doses (Carroll et al., 2006; Masiuk et al., 2008), assuming that f Qtr/Mmes has a lognormal distribution. The error in the thyroid mass was handled by relation (13).
  • Nonparametric Full ML: The full maximum likelihood method described above, but without assumption about the distribution of f Qtr/Mmes. The latter distribution is approximated according to formula (12).
  • Parametric Regression Calibration: Parametric regression calibration as described in Section 4.1.
  • Nonparametric Regression Calibration: Nonparametric calibration as described in Section 4.2.
  • Ordinary SIMEX: The SIMEX method described in Section 5 with refinement (A.3).
  • Efficient SIMEX: Our computationally efficient modification of the SIMEX method presented in Section 5.
Each estimate was computed for 100 different realizations of doses and cases. Then the corresponding estimates were averaged.
For methods 1, 4 and 5, the confidence intervals were not calculated due to their well-known statistical inconsistency in the presence of dose errors, and hence we used the deviance interval (DI), calculated based on the 2.5th and the 97.5th percentiles of the estimates over 100 simulations.
For the estimators 2, 3, 6 and 7 the confidence intervals were constructed based on the asymptotic covariance matrices. The method applied to construct the confidence intervals for both SIMEX estimators is given in Appendix A.3.
6.3.2. The Naïve Method
The analysis of the results shows that in cases where the errors are present only in the measured activities (see the first block in Tables 14), the naïve method underestimates of the excess absolute risk, while the baseline incidence rate is overestimated. The bias increases as the errors increase. For GSDQ = 5 the value of the EAR is underestimated by more than a factor of 10.
If the errors are present both in the measured activity and in the thyroid mass, then the naïve estimate of the excess absolute risk approaches the true value only in case where the error variances are approximately equal. This is connected to the fact that Berkson errors in equation M99 lead to the overestimation of the EAR, while the classical errors lead conversely to its underestimation. Thus, the errors compensate for one another. If one of the errors is larger, then the EAR estimator is biased.
6.3.3. Parametric Regression Calibration and Maximum Likelihood
In Scenario 1, parametric methods are biased (see Tables 1 and and2),2), because the distribution of the thyroid doses is not lognormal. In other words, the misspecification of the true distribution leads to bias of the estimates which are constructed by structural methods (Schneeweiss and Cheng, 2006).
Table 2:
Table 2:
Estimates of excess absolute risk for the first scenario.
In Scenario 2 (see Tables 3 and and4),4), the dose distribution is uncensored and can be approximated by the lognormal law. It thus comes as no surprise that the parametric methods therefore have little bias.
Table 3:
Table 3:
Estimates of baseline incidence rate for the second scenario.
6.3.4. Nonparametric Regression Calibration and Maximum Likelihood
Nonparametric regression calibration and nonparametric full maximum likelihood methods seem to be more adequate, because they estimate the dose distribution in a nonparametric way. As a result those methods show quite modest bias even for the nonsmooth underlying distribution in Scenario 1.
However, in Scenario 2, things might be expected to be different because of the large right tail of true doses, for which the regression calibration approximation might break down. Nonparametric full maximum likelihood behaves quite well in this case, while nonparametric regression calibration has some bias, perhaps because the nonlinear effects of the underlying logistic type model are revealed.
6.3.5. SIMEX
Though SIMEX is robust with respect to the dose distribution, of course its performance gets worse for large errors. In particular ordinary SIMEX shows considerable bias for large measurement errors. The reason is as follows: we use a quadratic extrapolation function, but the naive ordinary likelihood estimator deviates from the quadratic law as a function of the additional error variance. In the computationally efficient SIMEX procedure, we use modified naive estimators given by the estimating equations (15) and (16), and those naive estimators vary almost as a quadratic function of the additional error variance. As a result the efficient SIMEX has small bias even for relatively large errors. In the simulation the level of GSDQ was limited to 5, but for larger classical errors the behavior of the computationally efficient SIMEX estimator could be worse.
6.3.6. Numerical Comparisons
For a particular comparison, we consider the case that GSDQ = 3 and GSDM = 1.5.
In Scenario 1, the %-bias in λ0 for Naïve, Parametric ML, Nonparametric ML, Parametric regression calibration, Nonparametric regression calibration, Ordinary SIMEX and Efficient SIMEX were 95%, 31%, 2%, 35%, 1%, 9% and 1%, respectively. The corresponding biases for Scenario 2 were 102%, 16%, 4%, 6%, 9%, 2% and 2%, respectively.
In Scenario 1, the %-bias in Excess Absolute Risk for Naïve, Parametric ML, Nonparametric ML, Parametric regression calibration, Nonparametric regression calibration, Ordinary SIMEX and computationally Efficient SIMEX were 70%, 28%, 0.1%, 13%, 1%, 53% and 0.2%, respectively. The corresponding biases for Scenario 2 were 50%, 11%, 2%, 5%, 3%, 11% and 2%, respectively.
In terms of variability, we ignore the Naïve estimate because of its severe bias. We considered the length of the 95% CI as a measure of standard deviation, and define the variance efficiency of a method compared to the computationally efficient SIMEX method as the square of the ratio of efficient SIMEX interval length to the method’s interval length. For the methods the CI were not computed, the deviance intervals were used.
In Scenario 1, the variance efficiency in λ0 for the Parametric ML, Nonparametric ML, Parametric regression calibration, Nonparametric regression calibration and Ordinary SIMEX compared to computationally Efficient SIMEX were 100%, 71%, 59%, 118% and 73%, respectively. The corresponding efficiencies for Scenario 2 were 121%, 150%, 100%, 125% and 79%, respectively.
In Scenario 1, the variance efficiency in Excess Absolute Risk for the Parametric ML, Nonparametric ML, Parametric regression calibration, Nonparametric regression calibration and Ordinary SIMEX compared to computationally Efficient SIMEX were 125%, 94%, 158%, 83% and 155%, respectively. The corresponding efficiencies for Scenario 2 were 125%, 81%, 99%, 107% and 137%, respectively.
6.3.7. Other Comments
Simulations showed that the level of the classical and Berkson errors has little influence on the size of confidence intervals for the computationally efficient SIMEX and nonparametric full maximum likelihood estimates. Nevertheless for larger classic errors confidence intervals become wider. Moreover, in some cases larger Berkson errors correspond to narrower confidence intervals. The reason for this effect is that in the presence of a larger Berkson error, the number of disease cases in the simulated response increases because the mean dose is increasing. This in turn yields more information and leads to shrinking of confidence intervals for risk estimators.
We considered model (1) in terms of the baseline incidence rate and EAR because we think it is more natural in the measurement error context. Additionally, EAR is a kind of slope parameter for dose, and the attenuation effect for the naïve estimator of slope parameter is well known (Carroll et al., 2006). But it is possible to compute point estimates of ERR as well and construct the corresponding confidence intervals. A rough estimate of ERR equals the ratio of estimates of EAR and λ0.
We studied the influence of mixed dose errors on radiation risk estimation in binary regression with linear dose dependence. A model of absolute risk was investigated where the doses are observed with a mixture of classical and Berkson multiplicative errors. The algorithms were constructed to compute the parameter estimates by full maximum likelihood, regression calibration and SIMEX method.
For the standard binary model, i.e., under the absence of errors in the dose estimates, a computationally efficient algorithm to compute the risk estimate was proposed, which leads to solving a one variable equation. In Appendix A.1 it is shown that this equation has no more than one root in the region λ0 > 0, β > 0 and it can be solved efficiently. Simulations show that it is almost as accurate as the maximum likelihood estimator and can serve as a good initial approximation for computation of the maximum likelihood estimator. Our simulations suggest that for the range of doses considered, the efficient SIMEX estimators turned out to be the most precise of all the estimators we considered.
We studied two scenarios in our simulation, with the following ideas in mind. Parametric regression calibration and parametric maximum likelihood necessarily make parametric assumptions about the distribution of a latent variable f Qtr/Mmes. We were interested in what happens when the parametric assumptions were very wrong (Scenario 1, Tables 1 and and2)2) or were approximately correct (Scenario 2, Tables 3 and and4).4). Not surprisingly, when the assumptions were very wrong, the parametric methods incurred significant bias. This shows that parametric regression calibration and maximum likelihood are excellent procedures as long as care is taken in the modeling assumptions. Whether Scenario 1 or Scenario 2 is the more practical case will depend on the application.
Among the methods that make no assumptions about the distribution of the latent variable, we generally observed little bias. Although the computationally efficient SIMEX and nonparametric full maximum likelihood seem to be more accurate, the nonparametric regression calibration is easier to apply. Indeed within the nonparametric calibration procedure, after estimating the underlying dose distribution and calibrating the doses, the standard package EPICURE produces the final risk estimates.
Appendix:. Sketch of Technical Arguments
A.1. Consistency of the Computationally Efficient Methods
Introduce a function
equation M104
where Dav is defined by equation (5). The relationship (4) is equivalent to the equation FN(β) = 0, β > 0. Assume the following:
  • Among the observations with Yi = 1, not all the values Di coincide.
  • equation M105.
  • equation M106.
Condition (i) means that not all ill subjects received an identical dose. Condition (ii) is natural as well and means that the mean dose for healthy subjects is less than the mean dose for ill subjects. It is more difficult to interpret condition (iii). One can show that for the structural model where the dose distribution does not degenerate to a single point (i.e. for each D0 ≥ 0, it holds pr(D = D0) < 1), the condition (iii) holds true with probability 1 for all N starting from a certain random number.
Lemma 1 1. Assume condition (i). Then the function FN is strictly decreasing, and therefore, equation (4) has no more than one solution.
2. Assume conditions (i) to (iii). Then there exists a unique solution β > 0 to equation (4).
The proof of 1. consists in verifying inequality equation M107, β ≥ 0. Next, (ii) implies that FN(0) > 0, and from (iii) it follows that equation M108, and then a continuous function FN equals zero at certain point β > 0.
Theorem 1 Assume that in the structural model (1) the distribution of dose D is not concentrated at a single point. Then the computationally efficient estimates [lambda with circumflex]0Q, [beta]Q exist and are unique for all NN0(ω) almost surely, moreover those estimates are strictly consistent, i.e. with probability 1 they converge to the true values equation M109 and equation M110, almost surely.
Here is a sketch proof of Theorem 1. Lemma 1 implies the existence of the estimates for NN0(ω), almost surely. The estimating equations for the efficient estimator can be rewritten as
equation M111
(A.1)
equation M112
(A.2)
(over the domain {λ0 > 0, β > 0} these equations are equivalent to (2)–(3); and to obtain equation (A.2), one has to subtract (3) from (2) and divide the difference by β). Equations (A.1)(A.2) are unbiased, that is
equation M113
equation M114
Here and hereafter Eλ0,β denotes the expectation, provided λ0 and β are true values of the parameters in the model (1). Moreover, the limit system of equations
equation M115
equation M116
has the unique solution [ell]0 = λ0, b = β. This can be proven using the convex function q([ell]0,a) = (1 –Y )([ell]0 + aD) –Y ln([ell]0 + aD). The functions under expectation are its partial derivatives at point ([ell]0,a) = ([ell]0,[ell]0b). By the theory of estimating equations (Carroll et al, 2006; Huber, 1967) the last two facts imply the strict consistency of the efficient estimates.
A.2. Computationally Efficient SIMEX
A.2.1. Description of the Estimator
The SIMEX estimator described in Section 5 differs from ordinary SIMEX as follows.
  • A model of observations of activities Qi with heteroscedastic errors is considered, i.e., the variances of the classical errors equation M117 can vary with the trial number i. At that the variances are assumed known and are not the values of, e.g., a known function of the unobserved value equation M118.
  • The estimator given by equations (2), (3) plays the role of the naïve maximum likelihood estimator for different values of κ. This simplifies calculations without essential loss in accuracy of estimation.
  • Perturbations equation M119 are simulated in such a way that
    equation M120
    (A.3)
We now explain why equality (A.3) provides smaller variability and smaller deviations.
Let θ = (λ0,EAR)T, s,Y,D) be the elementary estimating function for the naïve estimate, equation M121 and [theta w/ hat]*(κ) the estimates of the parameter θ used in the SIMEX method. For ordinary SIMEX estimator which is combined with efficient estimates, the elementary estimation function can be written in the form, see equations (A.1)(A.2),
equation M122
while for the efficient modification of SIMEX
equation M123
where equation M124, and equation M125.
The random function equation M126 is found from the equation
equation M127
We find the derivative equation M128 with equation M129. For this purpose we calculate the partial derivatives equation M130 and equation M131 of the estimating function s,Y,D):
equation M132
Under the condition of existence of the computationally efficient estimate (see Appendix A.1) the derivative equation M133 is a nonsingular matrix.
Then by the Implicit Function Theorem
equation M134
Note that equation M135 does not depend on b and find the derivative of the function equation M136 at point 0:
equation M137
If equation M138 then {d[theta w/ hat]*(r2/dr}r=0, and in the expansion
equation M139
(A.4)
the term equation M140 is vanishing.
Condition (A.3) decreases variability of the estimator. Indeed, under (A.3) the term equation M141 in the expansion (A.4) is vanishing, and this expansion looks more like Taylor expansion w.r.t. κ. Therefore, the extrapolated value [theta w/ hat]*(−1), which is the SIMEX estimator, will be more numerically stable.
A.2.2. Computationally Efficient SIMEX as a Combination of the SIMEX Method and the Corrected Score Method
The expression equation M142 is an unbiased estimator of equation M143, where equation M144 is defined in (9). Indeed,
equation M145
We derived this equality using the model of observations (7)–(8) and the assumption on the lognormality of multiplicative errors.
Therefore, equation (15), which can be rewritten as
equation M146
where the unknown variables are denoted as λ0 and β while in (15) the same variables are denoted as equation M147 and equation M148, is constructed from
equation M149
see equation (2), according to the Corrected Score method (Carroll et al., 2006). Like the SIMEX method, this method is functional and does not need the knowledge of the distribution of the latent variable.
A.3. Implementation of the Computationally Efficient SIMEX Method
Perturbations equation M150, b = 1, …, B, B = 10, were simulated in such a way that for the fixed i, the random variables equation M151 have joint Gaussian distribution with the covariance
equation M152
and therefore, equation M153, i = 1, …,N. The numerical extrapolation of the functions equation M154 and EAR*(κ) was performed via approximation by the least-squares polynomial of the second order at points κ = 0; 0.2; 0.4; 0.6. The explicit formula for the value of the extrapolating polynomial at point κ = −1 is as follows:
equation M155
Here [theta w/ hat]*(κ) stands for either equation M156 or EAR*(κ).
The confidence interval for the SIMEX estimator is constructed based on the SIMEX Sandwich covariance matrix of this estimator. The covariance matrix is computed according to the recommendation in Carroll et al. (2006, pp. 111 and 393–395). We estimate the variance of the naive estimator equation M157 taking Yi and equation M158 as the data using sandwich formula, and denote the variance estimate as equation M159. Then SIMEX Sandwich variance estimator is the extrapolated value of equation M160 to the point κ = −1.
Footnotes
Author Notes: This work was supported by funds from the U.S. National Cancer Institute and the Radiation Protection Institute ATS of Ukraine. The authors also want to thank their colleges from the Institute of Endocrinology and Metabolism AMS of Ukraine and the Radiation Protection Institute ATS of Ukraine who contributed to the preparation of the results presented in the paper.
Alexander Kukush is supported by the Swedish Institute grant SI-01424/2007. Carroll’s research was supported by a grant from the National Cancer Institute (CA57030). This publication is based in part on work supported by Award Number KUSCI-016-04, made by King Abdullah University of Science and Technology (KAUST).
1For the subpopulation from Section 6.1 the sample correlation between log(Qtr) and log(Mmes) is equal to 0.26. The explanation for positive correlation between the thyroid activity and the thyroid gland mass is the following: the larger age of a person the larger the mass, and therefore, in general, the larger the activity.
2Here the estimate of GSDM based on the real data is used in the simulation of equation M100. In this case the simulated error in the thyroid mass is heteroscedastic, and its average GSD is equation M101. For all other cases the simulated errors equation M102 are homoscedastic. Throughout the simulation, the errors equation M103 are homoscedastic.
Contributor Information
Alexander Kukush, Ukraine Radiation Protection Institute.
Sergiy Shklyar, Ukraine Radiation Protection Institute.
Sergii Masiuk, Ukraine Radiation Protection Institute.
Illya Likhtarov, Ukraine Radiation Protection Institute.
Lina Kovgan, Ukraine Radiation Protection Institute.
Raymond J Carroll, Texas A&M University.
Andre Bouville, National Cancer Institute, NIH, DHHS.
  • Anderson TW. An introduction to multivariate statistical analysis. New York: John Wiley and Sons; 1958.
  • Apanasovich TV, Carroll RJ, Maity A. “SIMEX and variance estimation in semiparametric measurement error models,” Electronic Journal of Statistics. 2009;3:318–348. doi: 10.1214/08-EJS341. [PubMed] [Cross Ref]
  • Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CA. Measurement Error in Nonlinear Models A Modern Perspective. Boca Raton: Chapman and Hall/CRC; 2006.
  • Yamashita S, Shibata Y, editors. Chernobyl: A Decade; Proceedings of the Fifth Chernobyl Sasakawa Medical Cooperation Symposium; Kiev, Ukraine. 14–15 October 1996; Amsterdam: Elsevier; 1997.
  • Health Risks from Exposure to Low Levels of Ionizing Radiation. Washington, DC: National Academy Press; (BEIR VII Phase 2, 2006),
  • Hofer E. “How to account for uncertainty due to measurement errors in an uncertainty analysis using Monte Carlo simulation,” Health Physics. 2008;95:277–290. doi: 10.1097/01.HP.0000314761.98655.dd. [PubMed] [Cross Ref]
  • Hosmer D, Lemeshow S. Applied Logistic Regression. New-York: John Wiley and Sons; 2000.
  • Huber PJ. Proceedings of the Fifth Berkeley Symposium in Mathematical Statistics and Probability. Vol. 1. Berkeley: University of California Press; 1967. “The behavior of maximum likelihood estimates under nonstandard conditions,” pp. 221–233.
  • Jacob P, Bogdanova TI, Buglova E, Chepurniy M, Demidchik Y, Gavrilin Y, Kenigsberg J, Meckbach R, Schotola C, Shinkarev S, Tronko MD, Ulanovsky A, Vavilov S, Walsh L. “Thyroid cancer risk in areas of Ukraine and Belarus affected by the Chernobyl accident,” Radiation Research. 2006;165:1–8. doi: 10.1667/RR3479.1. [PubMed] [Cross Ref]
  • Kopecky KJ, Stepanenko V, Rivkind N, Voilleque P, Onstad L, Shakhtarin V, Parshkov E, Kulikov S, Lushnikov E, Abrosimov A, Troshin V, Romanova G, Doroschenko V, Proshin A, Tsyb A, Davis S. “Childhood thyroid cancer, radiation dose from Chernobyl and dose uncertainties in Bryansk Oblast, Russia: A population-based case-control study,” Radiation Research. 2006;166:367–374. doi: 10.1667/RR3596.1. [PubMed] [Cross Ref]
  • Kukush A, Schneeweiss H. “Comparing different estimators in a nonlinear measurement error model. I, II,” Mathematical Methods of Statistics. 2005;14(1):53–79. and 14, No 2, 203–223.
  • Kukush A, Malenko A, Schneeweiss H. “Optimality of the quasi score estimator in a mean-variance model with applications to measurement error models,” Journal of Statistical Planning and Inference. 2009;139:3461–3472. doi: 10.1016/j.jspi.2009.03.022. [Cross Ref]
  • Likhtarev IA, Prohl G, Henrichs K. Munich: GSF-Bericht 19/93, Institut für Strahlenschutz; 1993. “Reliability and accuracy of the 131I thyroid activity measurements performed in the Ukraine after the Chernobyl accident in 1986,”
  • Likhtarev I, Minenko V, Khrouch V, Bouville A. “Uncertainties in thyroid dose reconstruction after Chernobyl,” Radiation Protection Dosimetry. 2003;105(1–4):601–608. [PubMed]
  • Likhtarev I, Kovgan L, Vavilov S, Chepurny M, Bouville A, Luckyanov N, Jacob P, Voillequé P, Voigt G. “Post-Chornobyl thyroid cancers in Ukraine. Report 1: Estimation of thyroid doses,” Radiation Research. 2005;163:125–136. doi: 10.1667/RR3291. [PubMed] [Cross Ref]
  • Likhtarev I, Kovgan L, Vavilov S, Chepurny M, Bouville A, Luckyanov N, Jacob P, Voillequé P, Voigt G. “Post-Chornobyl thyroid cancers in Ukraine. Report 2: Risk analysis,” Radiation Research. 2006a;166:375–386. doi: 10.1667/RR3593.1. [PubMed] [Cross Ref]
  • Likhtarev I, Bouville A, Kovgan L, Luckyanov N, Voilleque P, Chepurny M. “Questionnaire- and Measurement-Based Individual Thyroid Doses in Ukraine Resulting from the Chornobyl Nuclear Reactor Accident,” Radiation Research. 2006b;166:271–286. doi: 10.1667/RR3545.1. [PubMed] [Cross Ref]
  • Li Y, Guolo A, Hoffman FO, Carroll RJ. “Shared uncertainty in measurement error problems, with application to Nevada Test Site fallout data,” Biometrics. 2007;63:1226–1236. [PubMed]
  • Lyon JL, Alder SC, Stone MB, Scholl A, Reading JC, Holubkov R, Sheng X, White GL, Hegmann KT, Anspaugh L, Hoffman FO, Simon SL, Thomas B, Carroll RJ, Meikle AW. “Thyroid disease associated with exposure to the Nevada Test Site radiation: a reevaluation based on corrected dosimetry and examination data,” Epidemiology. 2006;17:604–614. doi: 10.1097/01.ede.0000240540.79983.7f. [PubMed] [Cross Ref]
  • Mallick B, Hoffman FO, Carroll RJ. “Semiparametric regression modeling with mixtures of Berkson and classical error, with application to fallout from the Nevada Test Site,” Biometrics. 2002;58:13–20. doi: 10.1111/j.0006-341X.2002.00013.x. [PubMed] [Cross Ref]
  • Masiuk SV, Shklyar SV, Kukush AG, Vavilov SE. Radiation and Risk. 3. Vol. 17. Bulletin of the National Radiation and Epidemiological Registry; 2008. “Impact of doses uncertainties on the radiation risks estimation,” pp. 64–75. (in Russian). http://mechmat.univ.kiev.ua/eng/ppages/kukush/Public/files/Unsert_risk_influence.doc.
  • Preston DL, Lubin JH, Pierce DA, McConney ME. EPICURE User’s Guide. Seattle, WA: Hirosoft Corporation; 1993.
  • Reeves GK, Cox DR, Darby SC, Whitley E. “Some aspects of measurement error in explanatory variables for continuous and binary regression models,” Statistics in Medicine. 1998;17:2157–2177. doi: 10.1002/(SICI)1097-0258(19981015)17:19<2157::AID-SIM916>3.0.CO;2-F. [PubMed] [Cross Ref]
  • Ron E, Hoffman FO. “Uncertainties in radiation dosimetry and their impact on dose-response analysis”. Proc. of a workshop held; September 3–5, 1997; Bethesda, Maryland. 1999. NIH Publication No. 99-4541.
  • Schneeweiss H, Cheng C-L. “Bias of the structural quasiscore estimator of a measurement error model under misspecification of the regressor distribution,” Journal of Multivariate Analysis. 2006;97:455–473. doi: 10.1016/j.jmva.2005.03.010. [Cross Ref]
  • Schneeweiss H, Kukush A. “Comparing the efficiency of structural and functional methods in measurement error models,” Probability Theory and Mathematical Statistics. 2009;80:131–142. doi: 10.1090/S0094-9000-2010-00800-3. (Teor Ĭmovir Matem Statist, ISSN 0868-6904), 80, 119–129 Reprinted in Theory of Probability and Mathematical Statistics (ISSN 1547-7363, ISSN 0094-9000) [Cross Ref]
  • Shklyar S, Schneeweiss H. “A comparison of asymptotic covariance matrices of three consistent estimators in the Poisson regression model with measurement errors,” Journal of Multivariate Analysis. 2005;94:250–270. doi: 10.1016/j.jmva.2004.05.002. [Cross Ref]
  • Stram DO, Kopecky KJ. “Power and uncertainty analysis of epidemiological studies of radiation-related disease risk in which dose estimates are based on a complex dosimetry system: some observations,” Radiation Research. 2003;160:408–417. doi: 10.1667/3046. [PubMed] [Cross Ref]
  • Tronko MD, Howe GR, Bogdanova TI, Bouville AC, Epstien OV, Brill AB, Likhtarev IA, Fink DJ, Markov VV, Greenebaum E, Olijnyk VA, Masnyk IJ, Shpak VM, McConnell RJ, Tereshchenko VP, Robbins J, Zvinchuk OV, Zablotska LB, Hatch M, Luckyanov NK, Ron E, Thomas TL, Voillequé PG, Beebe GW. “A cohort study of thyroid cancer and other thyroid diseases after the Chornobyl accident: thyroid cancer in Ukraine detected during first screening,” Journal of the National Cancer Institute. 2006;98:897–903. doi: 10.1093/jnci/djj244. [PubMed] [Cross Ref]
Articles from The International Journal of Biostatistics are provided here courtesy of
Berkeley Electronic Press