Home | About | Journals | Submit | Contact Us | Français |

**|**Int J Biostat**|**PMC3058406

Formats

Article sections

- Abstract
- 1. Introduction
- 2. Error-Free Dose-Response Model and Risk Estimation
- 3. Model for Dose
- 4. Regression Calibration
- 5. The SIMEX Estimator and Its Computationally Efficient Modification
- 6. Simulation
- 7. Conclusions
- References

Authors

Related links

Int J Biostat. 2011 January 1; 7(1): Article 15.

Published online 2011 February 16. doi: 10.2202/1557-4679.1281

PMCID: PMC3058406

Alexander Kukush, Ukraine Radiation Protection Institute;

Copyright © 2011 The Berkeley Electronic Press. All rights reserved

This article has been cited by other articles in PMC.

With a binary response *Y*, the dose-response model under consideration is logistic in flavor with pr(*Y*=1 | *D*) = *R* (1+*R*)^{−1}, *R* = *λ*_{0} + EAR *D*, where *λ*_{0} is the baseline incidence rate and EAR is the excess absolute risk per gray. The calculated thyroid dose of a person *i* is expressed as
${D}_{i}^{\mathit{\text{mes}}}={f}_{i}{Q}_{i}^{\mathit{\text{mes}}}/{M}_{i}^{\mathit{\text{mes}}}$. Here,
${Q}_{i}^{\mathit{\text{mes}}}$ is the measured content of radioiodine in the thyroid gland of person *i* at time *t _{mes}*,
${M}_{i}^{\mathit{\text{mes}}}$ is the estimate of the thyroid mass, and

By means of Parametric Full Maximum Likelihood and Regression Calibration (under the assumption that the data set of true doses has lognormal distribution), Nonparametric Full Maximum Likelihood, Nonparametric Regression Calibration, and by properly tuned SIMEX method we study the influence of measurement errors in thyroid dose on the estimates of *λ*_{0} and EAR. The simulation study is presented based on a real sample from the epidemiological studies. The doses were reconstructed in the framework of the Ukrainian-American project on the investigation of Post-Chernobyl thyroid cancers in Ukraine, and the underlying subpolulation was artificially enlarged in order to increase the statistical power. The true risk parameters were given by the values to earlier epidemiological studies, and then the binary response was simulated according to the dose-response model.

The current methods of risk estimation (see *Health Risks from Exposure to Low Levels of Ionizing Radiation*, 2006; Jacob et al., 2006; Likhtarev et al., 2006a) take into account uncertainties related to health effects, but assume that the doses are known with perfect precision and accuracy, something that is not true (Kopecky et al., 2006; Likhtarev et al., 1993; Likhtarev et al., 2003; Ron and Hoffman, 1999). Although many attempts have been made to develop mathematical tools which take into account the dose uncertainty (see Carroll et al., 2006; Hofer, 2008; Kopecky et al., 2006; Li et al., 2007; Masiuk et al., 2008), the problem has yet to be solved in a satisfactory manner. It is our opinion that one of the reasons an accurate model is so hard to create is because the dose estimates are almost inevitably associated with a mixture of classical and Berkson errors (Reeves et al., 1998; Carroll et al., 2006; Li et al., 2007), which leads to difficulties in evaluating the influence of the mixed errors in the estimated dose on risk analysis, which is usually expressed in terms of relative (ERR) or absolute (EAR) risk (see *Health Risks from Exposure to Low Levels of Ionizing Radiation*, 2006). One such example is the analysis of epidemiological investigations of thyroid cancer among children exposed to radioiodines in fallout from the Chornobyl accident (Jacob et al., 2006; Likhtarev et al., 2006a; Tronko et al., 2006).

In the Chornobyl accident, not only the absolute and relative frequencies of thyroid cancer were determined with accuracy, but also the thyroid doses and their uncertainties of Berkson and classical types were carefully estimated. However, computationally efficient procedures that account for the mixed nature of the dosimetric uncertainties in the risk analysis are not available.

In the epidemiologic study of thyroid cancer conducted in Ukraine (Likhtarev et al., 2006b), the thyroid dose estimates are based on the measurements of Iodine-131 (^{131}I) that were performed on all subjects, and the uncertainties in the thyroid dose estimates are mainly due to those in the measured activities and in the thyroid mass (Likhtarev et al., 2003).

It is assumed in this paper that the thyroid mass is measured with Berkson error, which appears as a result of averaging, while uncertainty in the measured activity is classical. The presence of errors of both types leads to a mixture of classical and Berkson errors in the thyroid dose estimates. In this case, the two components of the uncertainty in the dose are quite distinct and quantifiable, unlike in Reeves et al. (1998) and Li et al. (2007), where the mixture error model involves, besides the true dose, additional, unobservable variables in the model of dose measurement.

The model proposed in this paper is a new way of thinking about these problems. In the Nevada Test Site Study (Apanasovich et al., 2009; Li et al., 2007; Lyon et al., 2006; Mallick et al., 2002) it was recognized that there was a mixture of Berkson and classical errors in the dose estimates, but the available data set did not have information on their relative sizes, and hence the statistical methods used here are quite different. Even the SIMEX version in Apanasovich et al. (2009) is different from the modification developed below because of the different dose uncertainty model.

Though in the simulations and analysis of the source of uncertainty induced by Berkson and classical measurement errors we used the dose estimate and its uncertainty related to the thyroid exposure (Likhtarev et al., 1993; Likhtarev et al., 2003; Likhtarev et al., 2005), the main goal of the present paper is to study the possibility to apply in the risk analysis the most popular methods of regression analysis, e.g., Simulation-Extrapolation (SIMEX), Maximum Likelihood and Regression Calibration (see Carroll et al., 2006; Kopecky et al., 2006; Masiuk et al., 2008; Kukush and Schneeweiss, 2005). Those methods take into account the presence of measurement errors (classical, Berkson and mixed) in regressors. In particular we study the reliability of radiation risk estimates obtained by each of those methods, as well as their robustness with respect to the error magnitude and the form of distribution of the initial data set of dose estimates. Attention will be paid to development and mathematical justification of modifications of the listed methods that under certain conditions lead to significant improvement of the estimates.

In order to get the results that are as much as possible independent of the concrete radio-epidemiological investigations, a cohort was simulated with the following properties:

- The cohort is sufficiently large that there are enough thyroid cancer disease cases for the existence of a solution to the optimization of the objective loglikelihood function, for all simulated data sets. Thyroid cancer is rare in a population, even for the Chornobyl accident.
- The radiation risk parameters used are close to the quantities obtained in Jacob, et al. (2006), Likhtarev, et al. (2006a) and Tronko, et al. (2006).
- The risk model is the simplest but quite popular generalized linear model of absolute risk with binary response (see below) that makes it possible to verify and compare the results with estimates obtained by the well-known program package
*EPICURE*.

We mention the main novelties of the present paper:

- An efficient version of SIMEX that provides smaller estimation bias compared with ordinary SIMEX.
- A refined version of SIMEX in equation (A.3) from Appendix that improves statistical performance.
- A new nonparametric regression calibration procedure.
- A new nonparametric full maximum likelihood estimating procedure.
- A simulation study in a unique context.

In this section, we describe the error-free model and computation methods for fitting it.

With a binary response *Y*, our dose-response model is

$$\text{pr}(Y=1|D)=\frac{{\lambda}_{0}(1+\beta D)}{1+{\lambda}_{0}(1+\beta D)}$$

(1)

where *D* denotes the individual thyroid dose for a given person, the event *Y* = 1 means that the person developed thyroid cancer within a certain time interval (e.g., during ten years), λ_{0} is the baseline incidence rate, and λ_{0}β is the excess absolute risk per gray (EAR). Of course, λ_{0} and β are both positive.

The parameters in model (1) can be estimated by maximum likelihood, which consists of solving the equations

$${\sum}_{i=1}^{N}{Y}_{i}{\{{\lambda}_{0}(1+\beta {D}_{i})\}}^{-1}={\sum}_{i=1}^{N}{\{1+{\lambda}_{0}(1+\beta {D}_{i})\}}^{-1};$$

$${\sum}_{i=1}^{N}{Y}_{i}{D}_{i}{\{{\lambda}_{0}(1+\beta {D}_{i})\}}^{-1}={\sum}_{i=1}^{N}{D}_{i}{\{1+{\lambda}_{0}(1+\beta {D}_{i})\}}^{-1}.$$

These equations can be obtained by differentiating the loglikelihood function (see e.g. Masiuk, 2008, page 67).

**Remark 1** Of course, standard programs such as *EPICURE* can be used to compute the maximum likelihood estimators, but when we account for measurement error, one of the methods we discuss is the SIMEX algorithm, which is computationally expensive and involves fitting the maximum likelihood method many times. If the bootstrap is employed for inference, the maximum likelihood estimator would have to be computed tens of thousands of times. For this reason, in Section 2.3 we will suggest another approach.

In order to obtain a computationally efficient estimator, we first note that **E**{(1 – *Y*) (1 + β*D*)} = **E**(*Y*/λ_{0}) and **E**{(1 – *Y*)} = **E**[*Y* / {λ_{0}(1 + β *D*)}]. Replacing the expectations by their empirical means, this suggests solving

$$0={\sum}_{i=1}^{N}\{(1-{Y}_{i})(1+\beta {D}_{i})-{Y}_{i}/{\lambda}_{0}\};$$

(2)

$$0={\sum}_{i=1}^{N}[1-{Y}_{i}-{Y}_{i}/\{{\lambda}_{0}(1+\beta {D}_{i})\}].$$

(3)

From (2), for any β we have

$${\widehat{\lambda}}_{0Q}={\widehat{\lambda}}_{0Q}(\beta )={\sum}_{i=1}^{N}{Y}_{i}/\{{\sum}_{i=1}^{N}(1-{Y}_{i})(1+\beta {D}_{i})\}.$$

From (2) and (3), by some algebra, we get the equation for * _{Q}* as

$$0={\sum}_{i=1}^{N}{Y}_{i}({D}_{i}-{D}_{\text{av}})/(1+\beta {D}_{i}),\beta 0,$$

(4)

where

$${D}_{\text{av}}={\sum}_{i=1}^{N}{D}_{i}(1-{Y}_{i})/\{{\sum}_{i=1}^{N}(1-{Y}_{i})\}.$$

(5)

The statistical and large sample properties of the solution to (4) are described in Appendix A.1.

The solution to (4) is readily calculated via any one-dimensional equation solving method. While not the maximum likelihood estimator, in simulations not shown here, we have found that it is nearly as efficient, and in any case can be used as an initial estimate. When we later wish to employ the SIMEX procedure for measurement error correcting, this computational efficiency becomes important, see Remark 1.

According to the dosimetric model described by Likhtarev et al. (2005, 2006b), the calculated thyroid dose of person *i* is expressed as

$${D}_{i}^{\mathit{\text{ins}}}=\frac{{Q}_{i}^{\mathit{\text{mes}}}}{{M}_{i}^{\mathit{\text{est}}}}{f}_{i},$$

(6)

where
${Q}_{i}^{\mathit{\text{mes}}}$ is the measured content of ^{131}I in the thyroid gland of person *i* at time *t _{mes}*,
${M}_{i}^{\mathit{\text{est}}}$ is the estimate of the thyroid mass, and

The uncertainties in ${Q}_{i}^{\mathit{\text{mes}}}$ and ${M}_{i}^{\mathit{\text{est}}}$ are as follows:

- The measured activity is associated with a multiplicative error, which is determined by the characteristics of the measuring instrument (Likhtarev, et al., 1993; Likhtarev, et al., 2003), so thatwhere ${Q}_{i}^{\mathit{\text{tr}}}$ is the true$${Q}_{i}^{\mathit{\text{mes}}}={Q}_{i}^{\mathit{\text{tr}}}{V}_{i}^{Q},$$(7)
^{131}I content in the thyroid gland, and ${V}_{i}^{Q}$ is the independent multiplicative measurement error. Expression (7) of course defines a classical multiplicative error model. - The true values of the thyroid mass ${M}_{i}^{\mathit{\text{tr}}}$ are determined according to a Berkson measurement error model aswhere ${M}_{i}^{\mathit{\text{mes}}}$ is the median of the thyroid mass for a given gender-age group, and ${V}_{i}^{M}$ an independent multiplicative error.$${M}_{i}^{\mathit{\text{tr}}}={M}_{i}^{\mathit{\text{mes}}}{V}_{i}^{M},$$(8)
- The measurement errors ( ${V}_{i}^{Q}$, ${V}_{i}^{M}$) are assumed to be lognormally distributed, so that $\text{log}({V}_{i}^{Q})=\text{Normal}(0,{\sigma}_{Q,i}^{2})$ and $\text{log}({V}_{i}^{M})=\text{Normal}(0,{\sigma}_{M,i}^{2})$. In the Chornobyl data, the parameters ${\sigma}_{Q,i}^{2}$ and ${\sigma}_{M,i}^{2}$ have been reliably estimated (see Likhtarev et al., 1993), therefore in this paper those variables are assumed known.
- We assume that the random vector ( ${Q}_{i}^{\mathit{\text{tr}}}$, ${M}_{i}^{\mathit{\text{mes}}}$) and random variables ${V}_{i}^{Q}$, ${V}_{i}^{M}$ are jointly independent, and we set ${M}_{i}^{\mathit{\text{est}}}={M}_{i}^{\mathit{\text{mes}}}$. We allow the random variables ${Q}_{i}^{\mathit{\text{tr}}}$ and ${M}_{i}^{\mathit{\text{mes}}}$ to be correlated.
^{1}

The unobservable error-free dose is denoted by ${D}_{i}^{\mathit{\text{tr}}}$, and is given as

$${D}_{i}^{\mathit{\text{tr}}}={Q}_{i}^{\mathit{\text{tr}}}{f}_{i}/{M}_{i}^{tr}.$$

(9)

**Remark 2** The assumption that the measurement errors in the thyroid mass have lognormal distributions is justified as follows: our research group possesses the data set of thyroid volume measurements performed by the Sasakawa Foundation in Ukraine for each age-sex-region group (the quantiles of the corresponding distributions are published in *Chernobyl: A Decade*, 1997, pp. 393–398). It is clear from the data set that the distributions are close to lognormal. By physical reasoning the measured activity is non-negative. Therefore, the multiplicative error in the activity is modeled by a lognormal law. In future we will study the distribution of this error in more detail.

In practice, regression calibration (see Carroll et al., 2006) is often used to take into account the errors in measured dose. According to this method, prior to statistical processing of the epidemiological data, the measured dose is replaced by the conditional expectation of the true dose given measured quantities. In our case, we the measured dose is replaced by

$${D}_{i}^{*}=\mathbf{\text{E}}({D}_{i}^{\mathit{\text{tr}}}|{D}_{i}^{\mathit{\text{ins}}}),$$

(10)

which is really a short-hand notation for conditioning on the measured quantities (
${M}_{i}^{\mathit{\text{est}}}$,
${Q}_{i}^{\mathit{\text{mes}}}$*, f _{i}*). Following this, classical regression analysis of the epidemiological data is performed, e.g., by the program package

Consider for example the case that *f Q ^{tr}/M^{mes}* has a lognormal distribution. Let log(

$$\text{log}\left(\frac{{f}_{i}{Q}_{i}^{\mathit{\text{tr}}}}{{M}_{i}^{\mathit{\text{mes}}}}\right)|{D}_{i}^{\mathit{\text{ins}}}=\text{Normal}\left(\frac{{\sigma}_{1}^{2}\text{log}{D}_{i}^{\mathit{\text{ins}}}+{\sigma}_{Q,i}^{2}{\mu}_{1}}{{\sigma}_{1}^{2}+{\sigma}_{Q,i}^{2}},\frac{{\sigma}_{1}^{2}{\sigma}_{Q,i}^{2}}{{\sigma}_{1}^{2}+{\sigma}_{Q,i}^{2}}\right),$$

see the corresponding formula in Anderson (1958, Section 2.5). Then

$$\mathbf{\text{E}}\left(\frac{{f}_{i}{Q}_{i}^{\mathit{\text{tr}}}}{{M}_{i}^{\mathit{\text{mes}}}}|{D}_{i}^{\mathit{\text{ins}}}\right)=\text{exp}\left(\frac{{\sigma}_{1}^{2}\text{log}{D}_{i}^{\mathit{\text{ins}}}+{\sigma}_{Q,i}^{2}{\mu}_{1}+{\sigma}_{1}^{2}{\sigma}_{Q,i}^{2}/2}{{\sigma}_{1}^{2}+{\sigma}_{Q,i}^{2}}\right).$$

Because ${D}_{i}^{\mathit{\text{tr}}}=({f}_{i}{Q}_{i}^{\mathit{\text{tr}}}/{M}_{i}^{\mathit{\text{mes}}})\times {({V}_{i}^{M})}^{-1}$, and the factors ${f}_{i}{Q}_{i}^{\mathit{\text{tr}}}/{M}_{i}^{\mathit{\text{mes}}}$ and ${({V}_{i}^{M})}^{-1}$ are independent,

$$\begin{array}{l}\mathbf{\text{E}}[{D}_{i}^{\mathit{\text{tr}}}|{D}_{i}^{\mathit{\text{ins}}}]=\mathbf{\text{E}}[{f}_{i}{Q}_{i}^{\mathit{\text{tr}}}/{M}_{i}^{\mathit{\text{mes}}}|{D}_{i}^{\mathit{\text{ins}}}]\mathbf{\text{E}}{({V}_{i}^{M})}^{-1}\\ =\text{exp}\left\{\frac{{\sigma}_{1}^{2}\text{log}{D}_{i}^{\mathit{\text{ins}}}+{\sigma}_{Q,i}^{2}{\mu}_{1}+{\sigma}_{1}^{2}{\sigma}_{Q,i}^{2}/2}{{\sigma}_{1}^{2}+{\sigma}_{Q,i}^{2}}+\frac{{\sigma}_{M,i}^{2}}{2}\right\}.\end{array}$$

Finally, we have to estimate μ_{1} and
${\sigma}_{1}^{2}$ and substitute the estimates:

$${D}_{i}^{*}=\text{exp}\left\{\frac{{\widehat{\sigma}}_{1}^{2}\text{log}{D}_{i}^{\mathit{\text{ins}}}+{\sigma}_{Q,i}^{2}{\widehat{\mu}}_{1}+{\widehat{\sigma}}_{1}^{2}{\sigma}_{Q,i}^{2}/2}{{\widehat{\sigma}}_{1}^{2}+{\sigma}_{Q,i}^{2}}+\frac{{\sigma}_{M,i}^{2}}{2}\right\}.$$

(11)

Consistent estimates for μ_{1} and
${\sigma}_{1}^{2}$ are the sample mean of
$\text{log}({D}_{i}^{\mathit{\text{ins}}})$ and the sample variance of
$\text{log}({D}_{i}^{\mathit{\text{ins}}})$ minus the average value of
${\sigma}_{Q,i}^{2}$.

Also, the calibration of the dose can be done under the assumption that the distribution of log(*f Q ^{tr}/M^{mes}*) is a mixture of normal distributions. But in our simulations we do not see the advantage of involving the mixture of normals. The reason for that is the unimodality of the distribution of log(

Now, we suppose that the form of distribution of *f Q ^{tr}/M^{mes}* is not specified. One way to handle this problem is via a discrete approximation to this distribution. Let the range of log(

$$\text{pr}\{{f}_{i}{Q}_{i}^{\mathit{\text{tr}}}/{M}_{i}^{\mathit{\text{mes}}}=\text{exp}({x}_{k})\}={b}_{k},k=1,\dots ,K$$

(12)

(in the simulation study we took *K* = 25). Here the values *b _{k}* are estimated by maximum likelihood based on known distribution of the measurement errors
${V}_{i}^{Q}$ and using the convex optimization technique. Next, by Bayes’ formula we construct the calibrated doses

$${D}_{i}^{*}=\frac{{\sum}_{k=1}^{K}{b}_{k}{ik}_{\text{exp}}{\sum}_{k=1}^{K}{b}_{k}{ik}_{},}{}$$

where

$${ik}_{=}$$

is the conditional density of
$\text{log}({D}_{i}^{\mathit{\text{mes}}})$ given
${f}_{i}{Q}_{i}^{\mathit{\text{tr}}}/{M}_{i}^{\mathit{\text{mes}}}$ at point *x _{k}*.

The estimator obtained by the SIMEX method is randomized, i.e., it is a random function of observations. A similar method of risk parameters estimation was used in Kopecky et al. (2006), but the approach proposed below makes it possible to give proper weight to the structure of measured doses. Note that the SIMEX method does not require the knowledge of the probability distribution for the true activity ${Q}_{i}^{\mathit{\text{tr}}}$.

To account for the Berkson error, the thyroid mass ${M}_{i}^{\mathit{\text{mes}}}$ is calibrated by formula:

$$\mathbf{\text{E}}\{(1/{M}_{i}^{\mathit{\text{tr}}})|{M}_{i}^{\mathit{\text{mes}}}\}=\mathbf{\text{E}}(1/{V}_{i}^{M})/{M}_{i}^{\mathit{\text{mes}}}=\text{exp}({\sigma}_{M,i}^{2}/2)/{M}_{i}^{\mathit{\text{mes}}}.$$

(13)

We intend to substitute $\mathbf{\text{E}}\{(1/{M}_{i}^{\mathit{\text{tr}}})|{M}_{i}^{\mathit{\text{mes}}}\}$ for $1/{M}_{i}^{\mathit{\text{mes}}}$. In order to take the classical error into consideration, the following steps are taken:

- Select a natural number
*B*≥ 2. Generally, we select*B*to be a fairly large number, e.g.,*B*= 100. Simulate random perturbations of log-activities ${U}_{b,i}^{*}=\text{Normal}(0,{\sigma}_{Q,i}^{2})$ for*b*= 1, …,*B*and*i*= 1, …,*N*. - Choose various levels of κ ≥ 0. We use κ = (0.0,0.2,0.4,0.6).
- For each κ, obtain the perturbed doses$${D}_{b,i}^{*}(\kappa )=\frac{{Q}_{i}^{\mathit{\text{mes}}}}{{M}_{i}^{\mathit{\text{mes}}}}{f}_{i}\text{exp}({\sigma}_{M,i}^{2}/2+\sqrt{\kappa}{U}_{b,i}^{*}).$$(14)
- Compute the ordinary (naïve) estimators ${\widehat{\lambda}}_{0,b}^{*}(\kappa )$, ${\widehat{\beta}}_{b}^{*}(\kappa )$ for κ = 0, 0.2, 0.4, 0.6 and average them across
*b*, defining ${\widehat{\lambda}}_{0}^{*}(\kappa )={B}^{-1}{\sum}_{b=1}^{B}{\widehat{\lambda}}_{0,b}^{*}(\kappa )$, $\text{EAR}*(\kappa )={B}^{-1}{\sum}_{b=1}^{B}{\widehat{\lambda}}_{0,b}^{*}(\kappa ){\widehat{\beta}}_{b}^{*}(\kappa )$. - Extrapolate numerically the functions ${\widehat{\lambda}}_{0}^{*}(\kappa )$ and EAR*(κ) to the point κ = −1 and get the SIMEX estimator for the parameters λ
_{0}and EAR = λ_{0}β.

We describe in Remark 1 why a computationally efficient version of SIMEX is desirable in our context. In the computationally efficient modification of the SIMEX method, other values are used instead of the doses (14), namely

$${D}_{b,i}^{*}(\kappa )=\{\begin{array}{l}({Q}_{i}^{\mathit{\text{mes}}}/{M}_{i}^{\mathit{\text{mes}}}){f}_{i}\text{exp}\left\{\left({\sigma}_{M,i}^{2}-{\sigma}_{Q,i}^{2}\right)/2\right\},{Y}_{i}=0,({Q}_{i}^{\mathit{\text{mes}}}/{M}_{i}^{\mathit{\text{mes}}}){f}_{i}\text{exp}\left({\sigma}_{M,i}^{2}/2+\sqrt{\kappa}{U}_{b,i}^{*}\right),{Y}_{i}=1,\hfill \hfill \end{array}$$

and as a result the system of equations for the modified estimate takes the form

$${\sum}_{i=1}^{N}(1-{Y}_{i})\left[1+{\widehat{\beta}}_{b}^{*}(\kappa )\frac{{Q}_{i}^{\mathit{\text{mes}}}}{{M}_{i}^{\mathit{\text{mes}}}}{f}_{i}\text{exp}\{(1/2)({\sigma}_{M,i}^{2}-{\sigma}_{Q,i}^{2})\}\right]={\sum}_{i=1}^{N}{Y}_{i}/{\widehat{\lambda}}_{0,b}^{*}(\kappa ),$$

(15)

$${\sum}_{i=1}^{N}(1-{Y}_{i})={\sum}_{i=1}^{N}{Y}_{i}{[{\widehat{\lambda}}_{0,b}^{*}(\kappa )\{1+{\widehat{\beta}}_{b}^{*}(\kappa ){D}_{b,i}^{*}(\kappa )\}]}^{-1}.$$

(16)

Hence the left-hand side of (15) is obtained after taking into account the dose errors and correction of the expression ${\sum}_{i=1}^{N}(1-{Y}_{i})(1+\beta {D}_{i})$, see equation (2) and Carroll et al. (2006), while the right-hand side of equation (16) is obtained after perturbation of ${D}_{i}^{M\mathit{\text{cal}}}$ in the expressions ${\sum}_{i=1}^{N}{Y}_{i}/\{{\lambda}_{0}(1+\beta {D}_{i}^{M\mathit{\text{cal}}}$, where ${D}_{i}^{M\mathit{\text{cal}}}$ is the dose calibrated with respect to the errors in the thyroid mass. Equations (15)–(16) can be solved similarly to equations (2)–(3).

This modification of SIMEX is computationally efficient for the following reasons:

- Each evaluation of the naïve estimator, that is solving (15)–(16), is efficient.
- Some calculations common to different evaluations of the naïve estimator have to be performed only once.

A more detailed description of the proposed procedure is presented in Appendix A.2, while the characteristics of its implementation are given in Appendix A.3. This modification of the SIMEX estimator is justified in Appendix A.2.2.

We performed a simulation study based on data from epidemiological investigations (see Jacob, et al., 2006; Likhtarev, et al., 2005; Likhtarev, et al., 2006a; Likhtarev, et al., 2006b). For the simulation of doses a real subpopulation of 7077 children and teenagers up to 18 years was used; these persons resided in settlements of Zhitomir, Kyiv, and Chernigov Oblasts of Ukraine at the time of the Chornobyl accident and had direct measurements of thyroid activity in May–June 1986. The thyroid doses for this subpopulation were reconstructed in the framework of the Ukrainian-American project on the investigation of Post-Chornobyl thyroid cancers in Ukraine (Likhtarev, et al., 2006b).

We simulated under two scenarios, in one of which the distribution of thyroid doses is well-modeled by a lognormal distribution, while in the other it is not.

- In the
*first scenario*the subpopulation was formed in such a way that the persons with dose*f Q*greater than 2 Gray were deleted. In addition, in order to increase the power of the numerical experiment, this subpopulation was artificially enlarged to^{tr}/M^{mes}*N*= 66665 persons. The cutoff was performed only at the preliminary stage of the simulation. But after the simulation the doses were not cut off and varied from 3 × 10^{−7}to 10.5 Grays (see Figure 1). - In the
*second scenario*, at the preliminary stage the cutoff was not performed, and the subpopulation was artificially enlarged to*N*= 70770 persons (see Figure 2).

The true parameters of the absolute risk model (1) were given by the values close to the estimates obtained in the epidemiological studies of the thyroid cancers in Ukraine (Jacob, et al., 2006; Likhtarev, et al., 2006a). Namely, we set

$${\lambda}_{0}={10}^{-3}\frac{\text{cases}}{\text{persons}};EAR=5\times {10}^{-3}\frac{\text{cases}}{\text{Gray}\text{persons}}$$

We simulated 100 different data sets. However, unperturbed data *f _{i}*,
${Q}_{i}^{\mathit{\text{tr}}}$ and
${M}_{i}^{\mathit{\text{mes}}},i=1,\dots ,N$, are based on real data and are the same for all realizations. In fact,

*f*is the estimate from the ecological dosimetric model, see Likhtarev et al. (2006b),_{i}- for ${Q}_{i}^{\mathit{\text{tr}}}$ we took the thyroid activity measured in 1986, see Likhtarev et al. (1993),
- and ${M}_{i}^{\mathit{\text{mes}}}$ is the median of the thyroid mass distribution for corresponding age-sex-region group, see
*Chornobyl: A Decade*(1997).

We used different values of the measurement error variances, including the real ones as obtained in *Chernobyl: A Decade* (1997), Likhtarev et al. (1993), and Likhtarev et al. (2003). The values of the geometric standard deviation GSD* _{M}* of the uncertainty distribution for the thyroid mass were 1 (no error), 1.33,

The simulation results are presented in Tables 1–4.

The estimation of the absolute risk parameters was performed by several methods:

- Naïve: The ordinary maximum likelihood method in which the dose estimates are taken to be error-free.
- Parametric Full ML: Full maximum likelihood where the error in the measured activity was taken into account by means of the integral convolution over the distribution of true thyroid doses (Carroll et al., 2006; Masiuk et al., 2008), assuming that
*f Q*has a lognormal distribution. The error in the thyroid mass was handled by relation (13).^{tr}/M^{mes} - Nonparametric Full ML: The full maximum likelihood method described above, but without assumption about the distribution of
*f Q*. The latter distribution is approximated according to formula (12).^{tr}/M^{mes} - Parametric Regression Calibration: Parametric regression calibration as described in Section 4.1.
- Nonparametric Regression Calibration: Nonparametric calibration as described in Section 4.2.
- Ordinary SIMEX: The SIMEX method described in Section 5 with refinement (A.3).
- Efficient SIMEX: Our computationally efficient modification of the SIMEX method presented in Section 5.

Each estimate was computed for 100 different realizations of doses and cases. Then the corresponding estimates were averaged.

For methods 1, 4 and 5, the confidence intervals were not calculated due to their well-known statistical inconsistency in the presence of dose errors, and hence we used the deviance interval (DI), calculated based on the 2.5th and the 97.5th percentiles of the estimates over 100 simulations.

For the estimators 2, 3, 6 and 7 the confidence intervals were constructed based on the asymptotic covariance matrices. The method applied to construct the confidence intervals for both SIMEX estimators is given in Appendix A.3.

The analysis of the results shows that in cases where the errors are present only in the measured activities (see the first block in Tables 1–4), the naïve method underestimates of the excess absolute risk, while the baseline incidence rate is overestimated. The bias increases as the errors increase. For GSD* _{Q}* = 5 the value of the EAR is underestimated by more than a factor of 10.

If the errors are present both in the measured activity and in the thyroid mass, then the naïve estimate of the excess absolute risk approaches the true value only in case where the error variances are approximately equal. This is connected to the fact that Berkson errors in ${M}_{i}^{\mathit{\text{tr}}}$ lead to the overestimation of the EAR, while the classical errors lead conversely to its underestimation. Thus, the errors compensate for one another. If one of the errors is larger, then the EAR estimator is biased.

In Scenario 1, parametric methods are biased (see Tables 1 and and2),2), because the distribution of the thyroid doses is not lognormal. In other words, the misspecification of the true distribution leads to bias of the estimates which are constructed by structural methods (Schneeweiss and Cheng, 2006).

In Scenario 2 (see Tables 3 and and4),4), the dose distribution is uncensored and can be approximated by the lognormal law. It thus comes as no surprise that the parametric methods therefore have little bias.

Nonparametric regression calibration and nonparametric full maximum likelihood methods seem to be more adequate, because they estimate the dose distribution in a nonparametric way. As a result those methods show quite modest bias even for the nonsmooth underlying distribution in Scenario 1.

However, in Scenario 2, things might be expected to be different because of the large right tail of true doses, for which the regression calibration approximation might break down. Nonparametric full maximum likelihood behaves quite well in this case, while nonparametric regression calibration has some bias, perhaps because the nonlinear effects of the underlying logistic type model are revealed.

Though SIMEX is robust with respect to the dose distribution, of course its performance gets worse for large errors. In particular ordinary SIMEX shows considerable bias for large measurement errors. The reason is as follows: we use a quadratic extrapolation function, but the naive ordinary likelihood estimator deviates from the quadratic law as a function of the additional error variance. In the computationally efficient SIMEX procedure, we use modified naive estimators given by the estimating equations (15) and (16), and those naive estimators vary almost as a quadratic function of the additional error variance. As a result the efficient SIMEX has small bias even for relatively large errors. In the simulation the level of GSD* _{Q}* was limited to 5, but for larger classical errors the behavior of the computationally efficient SIMEX estimator could be worse.

For a particular comparison, we consider the case that GSD* _{Q}* = 3 and GSD

In Scenario 1, the %-bias in λ_{0} for Naïve, Parametric ML, Nonparametric ML, Parametric regression calibration, Nonparametric regression calibration, Ordinary SIMEX and Efficient SIMEX were 95%, 31%, 2%, 35%, 1%, 9% and 1%, respectively. The corresponding biases for Scenario 2 were 102%, 16%, 4%, 6%, 9%, 2% and 2%, respectively.

In Scenario 1, the %-bias in Excess Absolute Risk for Naïve, Parametric ML, Nonparametric ML, Parametric regression calibration, Nonparametric regression calibration, Ordinary SIMEX and computationally Efficient SIMEX were 70%, 28%, 0.1%, 13%, 1%, 53% and 0.2%, respectively. The corresponding biases for Scenario 2 were 50%, 11%, 2%, 5%, 3%, 11% and 2%, respectively.

In terms of variability, we ignore the Naïve estimate because of its severe bias. We considered the length of the 95% CI as a measure of standard deviation, and define the variance efficiency of a method compared to the computationally efficient SIMEX method as the square of the ratio of efficient SIMEX interval length to the method’s interval length. For the methods the CI were not computed, the deviance intervals were used.

In Scenario 1, the variance efficiency in λ_{0} for the Parametric ML, Nonparametric ML, Parametric regression calibration, Nonparametric regression calibration and Ordinary SIMEX compared to computationally Efficient SIMEX were 100%, 71%, 59%, 118% and 73%, respectively. The corresponding efficiencies for Scenario 2 were 121%, 150%, 100%, 125% and 79%, respectively.

In Scenario 1, the variance efficiency in Excess Absolute Risk for the Parametric ML, Nonparametric ML, Parametric regression calibration, Nonparametric regression calibration and Ordinary SIMEX compared to computationally Efficient SIMEX were 125%, 94%, 158%, 83% and 155%, respectively. The corresponding efficiencies for Scenario 2 were 125%, 81%, 99%, 107% and 137%, respectively.

Simulations showed that the level of the classical and Berkson errors has little influence on the size of confidence intervals for the computationally efficient SIMEX and nonparametric full maximum likelihood estimates. Nevertheless for larger classic errors confidence intervals become wider. Moreover, in some cases larger Berkson errors correspond to narrower confidence intervals. The reason for this effect is that in the presence of a larger Berkson error, the number of disease cases in the simulated response increases because the mean dose is increasing. This in turn yields more information and leads to shrinking of confidence intervals for risk estimators.

We considered model (1) in terms of the baseline incidence rate and EAR because we think it is more natural in the measurement error context. Additionally, EAR is a kind of slope parameter for dose, and the attenuation effect for the naïve estimator of slope parameter is well known (Carroll et al., 2006). But it is possible to compute point estimates of ERR as well and construct the corresponding confidence intervals. A rough estimate of ERR equals the ratio of estimates of EAR and λ_{0}.

We studied the influence of mixed dose errors on radiation risk estimation in binary regression with linear dose dependence. A model of absolute risk was investigated where the doses are observed with a mixture of classical and Berkson multiplicative errors. The algorithms were constructed to compute the parameter estimates by full maximum likelihood, regression calibration and SIMEX method.

For the standard binary model, i.e., under the absence of errors in the dose estimates, a computationally efficient algorithm to compute the risk estimate was proposed, which leads to solving a one variable equation. In Appendix A.1 it is shown that this equation has no more than one root in the region λ_{0} > 0, β > 0 and it can be solved efficiently. Simulations show that it is almost as accurate as the maximum likelihood estimator and can serve as a good initial approximation for computation of the maximum likelihood estimator. Our simulations suggest that for the range of doses considered, the efficient SIMEX estimators turned out to be the most precise of all the estimators we considered.

We studied two scenarios in our simulation, with the following ideas in mind. Parametric regression calibration and parametric maximum likelihood necessarily make parametric assumptions about the distribution of a latent variable *f Q ^{tr}/M^{mes}*. We were interested in what happens when the parametric assumptions were very wrong (Scenario 1, Tables 1 and and2)2) or were approximately correct (Scenario 2, Tables 3 and and4).4). Not surprisingly, when the assumptions were very wrong, the parametric methods incurred significant bias. This shows that parametric regression calibration and maximum likelihood are excellent procedures as long as care is taken in the modeling assumptions. Whether Scenario 1 or Scenario 2 is the more practical case will depend on the application.

Among the methods that make no assumptions about the distribution of the latent variable, we generally observed little bias. Although the computationally efficient SIMEX and nonparametric full maximum likelihood seem to be more accurate, the nonparametric regression calibration is easier to apply. Indeed within the nonparametric calibration procedure, after estimating the underlying dose distribution and calibrating the doses, the standard package *EPICURE* produces the final risk estimates.

Introduce a function

$${F}_{N}(\beta )={\left({\sum}_{i=1}^{N}\frac{{Y}_{i}}{1+\beta {D}_{i}}\right)}^{-1}{\sum}_{i=1}^{N}\frac{{Y}_{i}({D}_{i}-{D}_{\text{av}})}{1+\beta {D}_{i}},\beta \ge 0,$$

where *D*_{av} is defined by equation (5). The relationship (4) is equivalent to the equation *F _{N}*(β) = 0, β > 0. Assume the following:

- Among the observations with
*Y*= 1, not all the values_{i}*D*coincide._{i} - ${\sum}_{i=1}^{N}{Y}_{i}({D}_{i}-{D}_{\text{av}})>0$.
- ${\sum}_{i=1}^{N}{Y}_{i}({D}_{i}-{D}_{\text{av}})/{D}_{i}<0$.

Condition (i) means that not all ill subjects received an identical dose. Condition (ii) is natural as well and means that the mean dose for healthy subjects is less than the mean dose for ill subjects. It is more difficult to interpret condition (iii). One can show that for the structural model where the dose distribution does not degenerate to a single point (i.e. for each *D*_{0} ≥ 0, it holds pr(*D* = *D*_{0}) < 1), the condition (iii) holds true with probability 1 for all *N* starting from a certain random number.

**Lemma 1** *1. Assume condition (i). Then the function F _{N} is strictly decreasing, and therefore, equation (4) has no more than one solution.*

*2. Assume conditions (i) to (iii). Then there exists a unique solution* β > 0 *to equation (4).*

The proof of 1. consists in verifying inequality
${F}_{N}^{\prime}(\beta )<0$, β ≥ 0. Next, (ii) implies that *F _{N}*(0) > 0, and from (iii) it follows that
$\underset{\beta \to +\infty}{\text{lim}}{F}_{N}(\beta )<0$, and then a continuous function

**Theorem 1** *Assume that in the structural model (1) the distribution of dose D is not concentrated at a single point. Then the computationally efficient estimates* _{0Q}, _{Q}*exist and are unique for all N* ≥ *N*_{0}(ω) *almost surely, moreover those estimates are strictly consistent, i.e. with probability 1 they converge to the true values*
$\underset{N\to \infty}{\text{lim}}{\widehat{\lambda}}_{0Q}={\lambda}_{0}$ *and*
$\underset{N\to \infty}{\text{lim}}{\widehat{\beta}}_{Q}=\beta $, *almost surely*.

Here is a sketch proof of Theorem 1. Lemma 1 implies the existence of the estimates for *N* ≥ *N*_{0}(ω), almost surely. The estimating equations for the efficient estimator can be rewritten as

$$0={\sum}_{i=1}^{N}[1-{Y}_{i}-{Y}_{i}/\{{\lambda}_{0}(1+\beta {D}_{i})\}],$$

(A.1)

$$0={\sum}_{i=1}^{N}{D}_{i}[1-{Y}_{i}-{Y}_{i}/\{{\lambda}_{0}(1+\beta {D}_{i})\}]$$

(A.2)

(over the domain {λ_{0} > 0, β > 0} these equations are equivalent to (2)–(3); and to obtain equation (A.2), one has to subtract (3) from (2) and divide the difference by β). Equations (A.1)–(A.2) are unbiased, that is

$$0={\mathbf{\text{E}}}_{{\lambda}_{0},\beta}[1-Y-Y/\{{\lambda}_{0}(1+\beta D)\}];$$

$$0={\mathbf{\text{E}}}_{{\lambda}_{0},\beta}(D[1-Y-Y/\{{\lambda}_{0}(1+\beta D)\}]).$$

Here and hereafter **E**_{λ0,β} denotes the expectation, provided λ_{0} and β are true values of the parameters in the model (1). Moreover, the limit system of equations

$$0={\mathbf{\text{E}}}_{{\lambda}_{0},\beta}[1-Y-Y/\{{0}_{(}$$

$$0={\mathbf{\text{E}}}_{{\lambda}_{0},\beta}(D[1-Y-Y/\{{0}_{(}$$

has the unique solution _{0} = λ_{0}, *b* = β. This can be proven using the convex function *q*(_{0}*,a*) = (1 –*Y* )(_{0} + *aD*) –*Y* ln(_{0} + *aD*). The functions under expectation are its partial derivatives at point (_{0}*,a*) = (_{0}*,*_{0}*b*). By the theory of estimating equations (Carroll et al, 2006; Huber, 1967) the last two facts imply the strict consistency of the efficient estimates.

The SIMEX estimator described in Section 5 differs from ordinary SIMEX as follows.

- A model of observations of activities
*Q*with heteroscedastic errors is considered, i.e., the variances of the classical errors ${\sigma}_{Q,i}^{2}$ can vary with the trial number_{i}*i*. At that the variances are assumed known and are not the values of, e.g., a known function of the unobserved value ${Q}_{i}^{\mathit{\text{tr}}}$. - The estimator given by equations (2), (3) plays the role of the naïve maximum likelihood estimator for different values of κ. This simplifies calculations without essential loss in accuracy of estimation.
- Perturbations ${U}_{b,i}^{*}$ are simulated in such a way that$${\sum}_{b=1}^{B}{U}_{b,i}^{*}=0,i=1,\dots ,N.$$(A.3)

We now explain why equality (A.3) provides smaller variability and smaller deviations.

Let θ = (λ_{0}*,EAR*)^{T}, *s*(θ *,Y,D*) be the elementary estimating function for the naïve estimate,
${\widehat{\theta}}_{b}^{*}(\kappa )$ and ^{*}(κ) the estimates of the parameter θ used in the SIMEX method. For ordinary SIMEX estimator which is combined with efficient estimates, the elementary estimation function can be written in the form, see equations (A.1)–(A.2),

$$s(\theta ,Y,D)={(1,D)}^{\text{T}}\{1-Y-Y/({\lambda}_{0}+EARD)\},$$

while for the efficient modification of SIMEX

$$s(\theta ,Y,D)={\{1-Y-Y/({\lambda}_{0}+EARD),(1-Y){\widehat{D}}_{\text{av}}-YD/({\lambda}_{0}+EARD)\}\text{T},}^{}$$

where ${\widehat{D}}_{\text{av}}={\{{\sum}_{i=1}^{N}(1-{Y}_{i})\}}^{-1}{\sum}_{i=1}^{N}(1-{Y}_{i}){D}_{i}^{M\mathit{\text{cal}}}$, and ${D}_{i}^{M\mathit{\text{cal}}}$.

The random function ${\widehat{\theta}}_{b}^{*}(\kappa )$ is found from the equation

$${\sum}_{i=1}^{N}s({\widehat{\theta}}_{b}^{*}(\kappa ),{Y}_{i},{D}_{i}^{M\mathit{\text{cal}}}=0.$$

We find the derivative
${\frac{d{\widehat{\theta}}_{b}^{*}({r}^{2})}{dr}|}_{r=0}$ with
$r=\sqrt{\kappa}$. For this purpose we calculate the partial derivatives
${s}_{D}^{\prime}$ and
${s}_{\theta}^{\prime}$ of the estimating function *s*(θ *,Y,D*):

$$\begin{array}{l}{\frac{{\sum}_{i=1}^{N}s(\theta ,{Y}_{i},{D}_{i}^{M\mathit{\text{cal}}}r|}{r=0}={\sum}_{i=1}^{N}{s}_{D}^{\prime}(\theta ,{Y}_{i},{D}_{i}^{M\mathit{\text{cal}}}\\ {\frac{{\sum}_{i=1}^{N}s(\theta ,{Y}_{i},{D}_{i}^{M\mathit{\text{cal}}}\theta |}{r=0}={\sum}_{i=1}^{N}{s}_{D}^{\prime}(\theta ,{Y}_{i},{D}_{i}^{M\mathit{\text{cal}}}={\sum}_{i=1}^{N}\{{Y}_{i}/{({\theta}_{1}+{\theta}_{2}{D}_{i}^{M\mathit{\text{cal}}}2}^{\}}(\begin{array}{ll}1\hfill & {D}_{i}^{M\mathit{\text{cal}}}\hfill & {D}_{i}^{M\mathit{\text{cal}}}{({D}_{i}^{M\mathit{\text{cal}}}2}^{}\hfill \hfill \\ )\\ .\end{array}}_{}}_{}\end{array}$$

Under the condition of existence of the computationally efficient estimate (see Appendix A.1) the derivative ${\sum}_{i=1}^{N}{s}_{\theta}^{\prime}(\theta ,{Y}_{i},{D}_{i}^{M\mathit{\text{cal}}}$ is a nonsingular matrix.

Then by the Implicit Function Theorem

$${\frac{d{\widehat{\theta}}_{b}^{*}({r}^{2})}{dr}|}_{r=0}=-{\{{\sum}_{i=1}^{N}{s}_{\theta}^{\prime}({\widehat{\theta}}_{b}^{*}(0),{Y}_{i},{D}_{i}^{M\mathit{\text{cal}}}\}-1}^{{\sum}_{i=1}^{N}}$$

Note that
${\widehat{\theta}}_{b}^{*}(0)={\widehat{\theta}}_{1}^{*}(0)$ does not depend on *b* and find the derivative of the function
$\widehat{\theta}*({r}^{2})={B}^{-1}{\sum}_{b=1}^{B}{\widehat{\theta}}_{b}^{*}({r}^{2})$ at point 0:

$${\frac{d\widehat{\theta}*({r}^{2})}{dr}|}_{r=0}=-\frac{{\sum}_{i=1}^{N}{s}_{D}^{\prime}({\widehat{\theta}}_{1}^{*}(0),{Y}_{i},{D}_{i}^{M\mathit{\text{cal}}}B{\sum}_{i=1}^{N}{s}_{\theta}^{\prime}({\widehat{\theta}}_{1}^{*}(0),{Y}_{i},{D}_{i}^{M\mathit{\text{cal}}}}{}$$

If
${\sum}_{b=1}^{B}{U}_{b,i}^{*}=0$ then {*d**(*r*^{2}/*dr*}_{r=0}, and in the expansion

$$\widehat{\theta}*(\kappa )=\widehat{\theta}*(0)+{\mathit{\text{coef}}}_{1}\sqrt{\kappa}+{\mathit{\text{coef}}}_{2}\kappa +{\mathit{\text{coef}}}_{3}\sqrt{{\kappa}^{3}}+\dots $$

(A.4)

the term ${\mathit{\text{coef}}}_{1}\sqrt{\kappa}$ is vanishing.

Condition (A.3) decreases variability of the estimator. Indeed, under (A.3) the term
${\mathit{\text{coef}}}_{1}\sqrt{\kappa}$ in the expansion (A.4) is vanishing, and this expansion looks more like Taylor expansion w.r.t. κ. Therefore, the extrapolated value ^{*}(−1), which is the SIMEX estimator, will be more numerically stable.

The expression ${\sum}_{i=1}^{N}(1-{Y}_{i}){Q}_{i}^{\mathit{\text{mes}}}{f}_{i}\text{exp}\{({\sigma}_{M,i}^{2}-{\sigma}_{Q,i}^{2})/2\}/{M}_{i}^{\mathit{\text{mes}}}={\widehat{D}}_{\text{av}}{\sum}_{i=1}^{N}(1-{Y}_{i})$ is an unbiased estimator of ${\sum}_{i=1}^{N}(1-{Y}_{i}){D}_{i}^{\mathit{\text{tr}}}$, where ${D}_{i}^{\mathit{\text{tr}}}$ is defined in (9). Indeed,

$$\begin{array}{l}\mathbf{\text{E}}[(1-{Y}_{i}){Q}_{i}^{\mathit{\text{mes}}}{f}_{i}\text{exp}\{({\sigma}_{M,i}^{2}-{\sigma}_{Q,i}^{2})/2\}/{M}_{i}^{\mathit{\text{mes}}}|{Y}_{i},{Q}_{i}^{\mathit{\text{tr}}},{M}_{i}^{\mathit{\text{mes}}},{f}_{i}]\\ =(1-{Y}_{i}){f}_{i}\mathbf{\text{E}}[{Q}_{i}^{\mathit{\text{mes}}}\text{exp}(-{\sigma}_{Q,i}^{2}/2)|{Q}_{i}^{\mathit{\text{tr}}}]\text{exp}({\sigma}_{M,i}^{2}/2)/{M}_{i}^{\mathit{\text{mes}}}=(1-{Y}_{i}){f}_{i}{Q}_{i}^{\mathit{\text{tr}}}\mathbf{\text{E}}[{V}_{i}^{Q}\text{exp}(-{\sigma}_{Q,i}^{2}/2)]\mathbf{\text{E}}[{({V}_{i}^{M})}^{-1}]/{M}_{i}^{\mathit{\text{mes}}}=(1-{Y}_{i}){f}_{i}{Q}_{i}^{\mathit{\text{tr}}}\mathbf{\text{E}}[{({M}_{i}^{\mathit{\text{tr}}})}^{-1}|{M}_{i}^{\mathit{\text{mes}}}]=\mathbf{\text{E}}[(1-{Y}_{i}){D}_{i}^{\mathit{\text{tr}}}|{Y}_{i},{Q}_{i}^{\mathit{\text{tr}}},{M}_{i}^{\mathit{\text{mes}}},{f}_{i}].\end{array}$$

We derived this equality using the model of observations (7)–(8) and the assumption on the lognormality of multiplicative errors.

Therefore, equation (15), which can be rewritten as

$${\sum}_{i=1}^{N}(1-{Y}_{i})(1+\beta {\widehat{D}}_{\text{av}})={\sum}_{i=1}^{N}{Y}_{i}/{\lambda}_{0},$$

where the unknown variables are denoted as λ_{0} and β while in (15) the same variables are denoted as
${\widehat{\lambda}}_{0,b}^{*}(\kappa )$ and
${\widehat{\beta}}_{b}^{*}(\kappa )$, is constructed from

$${\sum}_{i=1}^{N}(1-{Y}_{i})(1+\beta {D}_{i}^{tr})={\sum}_{i=1}^{N}{Y}_{i}/{\lambda}_{0},$$

see equation (2), according to the Corrected Score method (Carroll et al., 2006). Like the SIMEX method, this method is functional and does not need the knowledge of the distribution of the latent variable.

Perturbations
${U}_{b,i}^{*}=\text{Normal}(0,{\sigma}_{Q,i}^{2})$, *b* = 1, …, *B*, *B* = 10, were simulated in such a way that for the fixed *i*, the random variables
${U}_{b,i}^{*}$ have joint Gaussian distribution with the covariance

$$\mathbf{\text{E}}({U}_{{b}_{1},i}^{*}{U}_{{b}_{2},i}^{*})=-{\sigma}_{Q,i}^{2}/(B-1),{b}_{1}\ne {b}_{2},$$

and therefore,
${\sum}_{b=1}^{B}{U}_{b,i}^{*}=0$, *i* = 1, …,*N*. The numerical extrapolation of the functions
${\widehat{\lambda}}_{0}^{*}(\kappa )$ and *EAR**(κ) was performed via approximation by the least-squares polynomial of the second order at points κ = 0; 0.2; 0.4; 0.6. The explicit formula for the value of the extrapolating polynomial at point κ = −1 is as follows:

$$\widehat{\theta}*(-1)=12.45\widehat{\theta}*(0)-9.35\widehat{\theta}*(0.2)-10.65\widehat{\theta}*(0.4)+8.55\widehat{\theta}*(0.6).$$

Here ^{*}(κ) stands for either
${\widehat{\lambda}}_{0}^{*}(\kappa )$ or *EAR**(κ).

The confidence interval for the SIMEX estimator is constructed based on the SIMEX Sandwich covariance matrix of this estimator. The covariance matrix is computed according to the recommendation in Carroll et al. (2006, pp. 111 and 393–395). We estimate the variance of the naive estimator
${\widehat{\theta}}_{b}^{*}(\kappa )$ taking *Y _{i}* and
${D}_{b,i}^{*}(\kappa )$ as the data using sandwich formula, and denote the variance estimate as
${\widehat{\tau}}_{b}^{2}(\kappa )$. Then SIMEX Sandwich variance estimator is the extrapolated value of
$\frac{1}{B}{\sum}_{b=1}^{B}{\widehat{\tau}}_{b}^{2}(\kappa )-\frac{1}{B-1}{\sum}_{b=1}^{B}{({\widehat{\theta}}_{b}^{*}(\kappa )-\widehat{\theta}*(\kappa ))}^{2}$ to the point κ = −1.

**Author Notes:** This work was supported by funds from the U.S. National Cancer Institute and the Radiation Protection Institute ATS of Ukraine. The authors also want to thank their colleges from the Institute of Endocrinology and Metabolism AMS of Ukraine and the Radiation Protection Institute ATS of Ukraine who contributed to the preparation of the results presented in the paper.

Alexander Kukush is supported by the Swedish Institute grant SI-01424/2007. Carroll’s research was supported by a grant from the National Cancer Institute (CA57030). This publication is based in part on work supported by Award Number KUSCI-016-04, made by King Abdullah University of Science and Technology (KAUST).

^{1}For the subpopulation from Section 6.1 the sample correlation between log(*Q ^{tr}*) and log(

^{2}Here the estimate of GSD* _{M}* based on the real data is used in the simulation of
${V}_{i}^{M}$. In this case the simulated error in the thyroid mass is heteroscedastic, and its average GSD is
$\frac{1}{N}{\sum}_{i=1}^{N}\text{exp}({\sigma}_{M,i})=1.33$. For all other cases the simulated errors
${V}_{i}^{M}$ are homoscedastic. Throughout the simulation, the errors
${V}_{i}^{Q}$ are homoscedastic.

Alexander Kukush, Ukraine Radiation Protection Institute.

Sergiy Shklyar, Ukraine Radiation Protection Institute.

Sergii Masiuk, Ukraine Radiation Protection Institute.

Illya Likhtarov, Ukraine Radiation Protection Institute.

Lina Kovgan, Ukraine Radiation Protection Institute.

Raymond J Carroll, Texas A&M University.

Andre Bouville, National Cancer Institute, NIH, DHHS.

- Anderson TW. An introduction to multivariate statistical analysis. New York: John Wiley and Sons; 1958.
- Apanasovich TV, Carroll RJ, Maity A. “SIMEX and variance estimation in semiparametric measurement error models,” Electronic Journal of Statistics. 2009;3:318–348. doi: 10.1214/08-EJS341. [PubMed] [Cross Ref]
- Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CA. Measurement Error in Nonlinear Models A Modern Perspective. Boca Raton: Chapman and Hall/CRC; 2006.
- Yamashita S, Shibata Y, editors. Chernobyl: A Decade; Proceedings of the Fifth Chernobyl Sasakawa Medical Cooperation Symposium; Kiev, Ukraine. 14–15 October 1996; Amsterdam: Elsevier; 1997.
- Health Risks from Exposure to Low Levels of Ionizing Radiation. Washington, DC: National Academy Press; (BEIR VII Phase 2, 2006),
- Hofer E. “How to account for uncertainty due to measurement errors in an uncertainty analysis using Monte Carlo simulation,” Health Physics. 2008;95:277–290. doi: 10.1097/01.HP.0000314761.98655.dd. [PubMed] [Cross Ref]
- Hosmer D, Lemeshow S. Applied Logistic Regression. New-York: John Wiley and Sons; 2000.
- Huber PJ. Proceedings of the Fifth Berkeley Symposium in Mathematical Statistics and Probability. Vol. 1. Berkeley: University of California Press; 1967. “The behavior of maximum likelihood estimates under nonstandard conditions,” pp. 221–233.
- Jacob P, Bogdanova TI, Buglova E, Chepurniy M, Demidchik Y, Gavrilin Y, Kenigsberg J, Meckbach R, Schotola C, Shinkarev S, Tronko MD, Ulanovsky A, Vavilov S, Walsh L. “Thyroid cancer risk in areas of Ukraine and Belarus affected by the Chernobyl accident,” Radiation Research. 2006;165:1–8. doi: 10.1667/RR3479.1. [PubMed] [Cross Ref]
- Kopecky KJ, Stepanenko V, Rivkind N, Voilleque P, Onstad L, Shakhtarin V, Parshkov E, Kulikov S, Lushnikov E, Abrosimov A, Troshin V, Romanova G, Doroschenko V, Proshin A, Tsyb A, Davis S. “Childhood thyroid cancer, radiation dose from Chernobyl and dose uncertainties in Bryansk Oblast, Russia: A population-based case-control study,” Radiation Research. 2006;166:367–374. doi: 10.1667/RR3596.1. [PubMed] [Cross Ref]
- Kukush A, Schneeweiss H. “Comparing different estimators in a nonlinear measurement error model. I, II,” Mathematical Methods of Statistics. 2005;14(1):53–79. and
**14**, No 2, 203–223. - Kukush A, Malenko A, Schneeweiss H. “Optimality of the quasi score estimator in a mean-variance model with applications to measurement error models,” Journal of Statistical Planning and Inference. 2009;139:3461–3472. doi: 10.1016/j.jspi.2009.03.022. [Cross Ref]
- Likhtarev IA, Prohl G, Henrichs K. Munich: GSF-Bericht 19/93, Institut für Strahlenschutz; 1993. “Reliability and accuracy of the
^{131}I thyroid activity measurements performed in the Ukraine after the Chernobyl accident in 1986,” - Likhtarev I, Minenko V, Khrouch V, Bouville A. “Uncertainties in thyroid dose reconstruction after Chernobyl,” Radiation Protection Dosimetry. 2003;105(1–4):601–608. [PubMed]
- Likhtarev I, Kovgan L, Vavilov S, Chepurny M, Bouville A, Luckyanov N, Jacob P, Voillequé P, Voigt G. “Post-Chornobyl thyroid cancers in Ukraine. Report 1: Estimation of thyroid doses,” Radiation Research. 2005;163:125–136. doi: 10.1667/RR3291. [PubMed] [Cross Ref]
- Likhtarev I, Kovgan L, Vavilov S, Chepurny M, Bouville A, Luckyanov N, Jacob P, Voillequé P, Voigt G. “Post-Chornobyl thyroid cancers in Ukraine. Report 2: Risk analysis,” Radiation Research. 2006a;166:375–386. doi: 10.1667/RR3593.1. [PubMed] [Cross Ref]
- Likhtarev I, Bouville A, Kovgan L, Luckyanov N, Voilleque P, Chepurny M. “Questionnaire- and Measurement-Based Individual Thyroid Doses in Ukraine Resulting from the Chornobyl Nuclear Reactor Accident,” Radiation Research. 2006b;166:271–286. doi: 10.1667/RR3545.1. [PubMed] [Cross Ref]
- Li Y, Guolo A, Hoffman FO, Carroll RJ. “Shared uncertainty in measurement error problems, with application to Nevada Test Site fallout data,” Biometrics. 2007;63:1226–1236. [PubMed]
- Lyon JL, Alder SC, Stone MB, Scholl A, Reading JC, Holubkov R, Sheng X, White GL, Hegmann KT, Anspaugh L, Hoffman FO, Simon SL, Thomas B, Carroll RJ, Meikle AW. “Thyroid disease associated with exposure to the Nevada Test Site radiation: a reevaluation based on corrected dosimetry and examination data,” Epidemiology. 2006;17:604–614. doi: 10.1097/01.ede.0000240540.79983.7f. [PubMed] [Cross Ref]
- Mallick B, Hoffman FO, Carroll RJ. “Semiparametric regression modeling with mixtures of Berkson and classical error, with application to fallout from the Nevada Test Site,” Biometrics. 2002;58:13–20. doi: 10.1111/j.0006-341X.2002.00013.x. [PubMed] [Cross Ref]
- Masiuk SV, Shklyar SV, Kukush AG, Vavilov SE. Radiation and Risk. 3. Vol. 17. Bulletin of the National Radiation and Epidemiological Registry; 2008. “Impact of doses uncertainties on the radiation risks estimation,” pp. 64–75. (in Russian). http://mechmat.univ.kiev.ua/eng/ppages/kukush/Public/files/Unsert_risk_influence.doc.
- Preston DL, Lubin JH, Pierce DA, McConney ME. EPICURE User’s Guide. Seattle, WA: Hirosoft Corporation; 1993.
- Reeves GK, Cox DR, Darby SC, Whitley E. “Some aspects of measurement error in explanatory variables for continuous and binary regression models,” Statistics in Medicine. 1998;17:2157–2177. doi: 10.1002/(SICI)1097-0258(19981015)17:19<2157::AID-SIM916>3.0.CO;2-F. [PubMed] [Cross Ref]
- Ron E, Hoffman FO. “Uncertainties in radiation dosimetry and their impact on dose-response analysis”. Proc. of a workshop held; September 3–5, 1997; Bethesda, Maryland. 1999. NIH Publication No. 99-4541.
- Schneeweiss H, Cheng C-L. “Bias of the structural quasiscore estimator of a measurement error model under misspecification of the regressor distribution,” Journal of Multivariate Analysis. 2006;97:455–473. doi: 10.1016/j.jmva.2005.03.010. [Cross Ref]
- Schneeweiss H, Kukush A. “Comparing the efficiency of structural and functional methods in measurement error models,” Probability Theory and Mathematical Statistics. 2009;80:131–142. doi: 10.1090/S0094-9000-2010-00800-3. (
*Teor Ĭmovir Matem Statist*, ISSN 0868-6904), 80, 119–129 Reprinted in*Theory of Probability and Mathematical Statistics*(ISSN 1547-7363, ISSN 0094-9000) [Cross Ref] - Shklyar S, Schneeweiss H. “A comparison of asymptotic covariance matrices of three consistent estimators in the Poisson regression model with measurement errors,” Journal of Multivariate Analysis. 2005;94:250–270. doi: 10.1016/j.jmva.2004.05.002. [Cross Ref]
- Stram DO, Kopecky KJ. “Power and uncertainty analysis of epidemiological studies of radiation-related disease risk in which dose estimates are based on a complex dosimetry system: some observations,” Radiation Research. 2003;160:408–417. doi: 10.1667/3046. [PubMed] [Cross Ref]
- Tronko MD, Howe GR, Bogdanova TI, Bouville AC, Epstien OV, Brill AB, Likhtarev IA, Fink DJ, Markov VV, Greenebaum E, Olijnyk VA, Masnyk IJ, Shpak VM, McConnell RJ, Tereshchenko VP, Robbins J, Zvinchuk OV, Zablotska LB, Hatch M, Luckyanov NK, Ron E, Thomas TL, Voillequé PG, Beebe GW. “A cohort study of thyroid cancer and other thyroid diseases after the Chornobyl accident: thyroid cancer in Ukraine detected during first screening,” Journal of the National Cancer Institute. 2006;98:897–903. doi: 10.1093/jnci/djj244. [PubMed] [Cross Ref]

Articles from The International Journal of Biostatistics are provided here courtesy of **Berkeley Electronic Press**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |