


Int J Biostat. 2011 January 1; 7(1): Article 4.

Published online 2011 January 6. doi: 10.2202/1557-4679.1259

PMCID: PMC3404553

Donna Spiegelman, Harvard School of Public Health;

Copyright © 2011 The Berkeley Electronic Press. All rights reserved


The problem of covariate measurement error with heteroscedastic measurement error variance is considered. Standard regression calibration assumes that the measurement error has a homoscedastic measurement error variance. An estimator is proposed to correct regression coefficients for covariate measurement error with heteroscedastic variance. Point and interval estimates are derived. Validation data containing the gold standard must be available. This estimator is a closed-form correction of the uncorrected primary regression coefficients, which may be of logistic or Cox proportional hazards model form, and is closely related to the version of regression calibration developed by Rosner et al. (1990). The primary regression model can include multiple covariates measured without error. The use of these estimators is illustrated in two data sets, one taken from occupational epidemiology (the ACE study) and one taken from nutritional epidemiology (the Nurses’ Health Study). In both cases, although there was evidence of moderate heteroscedasticity, there was little difference in estimation or inference using this new procedure compared to standard regression calibration. It is shown theoretically that unless the relative risk is large or measurement error severe, standard regression calibration approximations will typically be adequate, even with moderate heteroscedasticity in the measurement error model variance. In a detailed simulation study, standard regression calibration performed either as well as or better than the new estimator. When the disease is rare and the errors normally distributed, or when measurement error is moderate, standard regression calibration remains the method of choice.

When validation data are available, the non-iterative *regression calibration* (RC) method can be used to obtain approximately consistent point and interval linear, logistic and Cox regression model parameter estimates with measurement error in one or more continuous covariates, provided certain assumptions are satisfied (Prentice 1982; Fuller 1987; Armstrong, Whittemore et al. 1989; Rosner, Willett et al. 1989; Rosner, Spiegelman et al. 1990; Rosner, Spiegelman et al. 1992; Spiegelman, McDermott et al. 1997; Wang, Hsu et al. 1997; Carroll, Ruppert et al. 2006). In Rosner *et al.*’s version of regression calibration, a standard multiple regression model is used to estimate the uncorrected point and interval estimates of the parameters, and these are bias-corrected using an estimate of the slopes from a linear measurement error model for the true exposure given the surrogate obtained from the validation data. A *Science Citation Index* search in April, 2010 on the three papers by Rosner *et al.*(1989,1990,1992) yielded 627 citations, approximately half of which were published in epidemiology and medical journals. Many of these appear to have involved direct applications of the methodology to original analyses of data.
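A minimal numerical sketch of this idea in the linear-model case may help fix notation: fit the naive regression of the outcome on the surrogate, fit the calibration regression of the true exposure on the surrogate in the validation data, and divide. All values below are illustrative simulated data, not from any study cited here.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical simulated data (illustrative values only) ---
n_main, n_valid, beta1 = 2000, 300, 0.5
x_main = rng.normal(0.0, 1.0, n_main)              # true exposure (unobserved in main study)
X_main = x_main + rng.normal(0.0, 0.8, n_main)     # surrogate with classical error
y = beta1 * x_main + rng.normal(0.0, 1.0, n_main)  # continuous outcome (linear model case)

x_val = rng.normal(0.0, 1.0, n_valid)              # validation study observes both x and X
X_val = x_val + rng.normal(0.0, 0.8, n_valid)

def ols_slope(u, v):
    """Slope from simple least-squares regression of v on u."""
    U = np.column_stack([np.ones_like(u), u])
    return np.linalg.lstsq(U, v, rcond=None)[0][1]

beta_naive = ols_slope(X_main, y)   # uncorrected: regress Y on the surrogate X
gamma1 = ols_slope(X_val, x_val)    # calibration slope: regress x on X in validation data
beta_rc = beta_naive / gamma1       # regression calibration correction

print(beta_naive, beta_rc)          # beta_rc should be much closer to the true 0.5
```

The naive slope is attenuated by the familiar reliability factor, and dividing by the calibration slope undoes that attenuation; the logistic and Cox cases reviewed below use the same correction applied to the uncorrected coefficients.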

This method, along with others proposed previously for the covariate measurement error problem, including SIMEX (Cook and Stefanski 1995) and the application of empirical process theory to survival data analysis (Huang and Wang 2000), requires homoscedastic measurement error variance. The distributions of many environmental and dietary intake variables are highly skewed, raising concern that the homoscedasticity requirement of regression calibration and other methods may often be unrealistic for important potential applications. For example, in a recent publication, moderate heteroscedasticity was observed in the measurement error model for exposure to airborne soot and nitrogen dioxide (Van Roosbroeck, Li et al. 2008), and in a recently published nutritional epidemiology study, moderate heteroscedasticity was observed in the measurement error models for average daily alcohol intake in a pooled analysis of renal cancer incidence (Lee, Hunter et al. 2007). Two other motivating examples of measurement error model heteroscedasticity are studied in depth in this paper, one looking at health symptoms in relation to exposure to anti-neoplastic drugs (Spiegelman and Valanis 1998), and one looking at alcohol intake in relation to breast cancer incidence (Willett, Stampfer et al. 1987). Thus, there is a need to extend regression calibration to apply when the requirement for homoscedasticity of the measurement error model variance is violated, and to compare this extension to several less restrictive iterative approaches, including maximum likelihood and semi-parametric efficient estimating equations (Robins, Hsieh et al. 1995). This paper addresses this need. In Section 2, Rosner *et al.*'s version of regression calibration is reviewed and the new estimator, ${\widehat{\beta}}_{RCH}$, is derived. Next, we consider the case when the true exposure variable,

The parameter of interest is *β _{1}* from the generalized linear model

$$g[E(Y|x,\mathbf{U})]={\beta}_{0}+{\beta}_{1}x+\boldsymbol{\beta}_{2}^{T}\mathbf{U},$$

(1)

where *Y* is the outcome of interest, *g*[*A*] is a link function which linearizes the conditional mean function in the covariates and **U** is a vector of covariates measured without error. Substituting the covariate measured with error,

$$\widehat{\boldsymbol{\beta}}={({\widehat{\beta}}_{1},\widehat{\boldsymbol{\beta}}_{2}^{T})}^{T},$$

are adjusted for measurement error in a one-step procedure. When *g*[*A*] = *E*(*Y* | *X*, **U**), regression calibration is applied to a linear regression model (Fuller 1987). When

$$\log[I(t|X,\mathbf{U})]=\log[I(t|X=0)]+{\beta}_{1}X+\boldsymbol{\beta}_{2}^{T}\mathbf{U},$$

where *I* (*t*) is the incidence rate at time *t*, then when the disease is rare, regression calibration can be applied to a Cox proportional hazards regression model (Prentice 1982; Spiegelman, McDermott et al. 1997; Wang, Hsu et al. 1997; Hu, Tsiatis et al. 1998). The application of regression calibration to these three basic models, all of which are used widely in epidemiology, was unified with a special focus on interval estimation and computing in SAS (Spiegelman, McDermott et al. 1997). Application of measurement error methods to data requires that the main study, which contains data (*Y _{i}*,

The point and interval estimates of effect can be corrected for measurement error using Rosner *et al*.’s formulas (Rosner, Spiegelman et al. 1990)

$${\widehat{\boldsymbol{\beta}}}_{RC}=\widehat{\boldsymbol{\beta}}\,{\widehat{\boldsymbol{\Gamma}}}^{-1},\qquad \widehat{\mathit{Var}}({\widehat{\boldsymbol{\beta}}}_{RC})\approx {\left[{\widehat{\boldsymbol{\Gamma}}}^{-1}\right]}^{T}\widehat{\mathit{Var}}(\widehat{\boldsymbol{\beta}})\,{\widehat{\boldsymbol{\Gamma}}}^{-1}+\widehat{\boldsymbol{\beta}}\,\widehat{\mathit{Var}}({\widehat{\boldsymbol{\Gamma}}}^{-1})\,{\widehat{\boldsymbol{\beta}}}^{T},$$

(2)

where $\widehat{\boldsymbol{\Gamma}}$ and $\widehat{\mathit{Var}}(\widehat{\boldsymbol{\beta}})$ are estimated by fitting (1) to the main study data, (*Y*, *X*, **U**). The first row of $\widehat{\boldsymbol{\Gamma}}$, denoted $\widehat{\boldsymbol{\gamma}}={({\widehat{\gamma}}_{1},\widehat{\boldsymbol{\gamma}}_{2}^{T})}^{T}$, and $\widehat{\mathit{Var}}(\widehat{\boldsymbol{\gamma}})$ are obtained from fitting the linear regression model to the validation data

$$E(x|X,\mathbf{U})={\alpha}^{\prime}+{\gamma}_{1}X+\boldsymbol{\gamma}_{2}^{T}\mathbf{U},$$

(3)

under the assumption that

$$\mathit{Var}(x|X,\mathbf{U})={\sigma}^{2}.$$

(4)

Appendix 1 of Rosner, Spiegelman et al. (1990) gives the construction of $\widehat{\boldsymbol{\Gamma}}$ from $\widehat{\boldsymbol{\gamma}}$ and equation (A7) of the same paper gives $\widehat{\mathit{Var}}({\widehat{\boldsymbol{\Gamma}}}^{-1})$. Assumption (4) is the homoscedasticity assumption, the relaxation of which is the focus of this manuscript. Regression calibration has been presented by others for use in a variety of settings (Prentice 1982; Fuller 1987; Armstrong, Whittemore et al. 1989; Rosner, Spiegelman et al. 1990; Rosner, Spiegelman et al. 1992; Spiegelman, McDermott et al. 1997; Wang, Hsu et al. 1997; Carroll, Ruppert et al. 2006). All versions of the regression calibration method assume that measurement error is non-differential with respect to the response variable, *Y*, i.e.

$$f(Y|x,X,\mathbf{U})=f(Y|x,\mathbf{U}),$$

where *f*(·) is a density function. In addition to the assumptions specified by (1), (3) and (4), in the case of univariate *x*, the key additional requirement for approximate unbiasedness of the regression calibration estimator when *g*[*E*(*Y* | *x*, **U**)] in (1) is logistic is that either
${\beta}_{1}^{2}{\sigma}^{2}$ is small, or

It is sometimes the case with biological variables that heteroscedasticity is evident over the observed range of the data; when this occurs, it typically takes the form of the spread increasing with the level. Sometimes, but by no means always, the variance can be stabilized by log-transforming the data, but this solution can be undesirable when the variable in question is to be used as a predictor in a regression model and the scientific hypothesis focuses on the relationship of the variable, on its originally observed scale, to the outcome of interest. In this paper, we assume that an empirically verifiable linear model for the mean, (3), holds, but that assumption (4), constant variance of the measurement error model for *x* | *X*, **U**, is untenable for the observed data. Instead, it may appear from the available validation data that

$$\mathit{Var}({x}_{i}|{X}_{i},{\mathbf{U}}_{i})=h({X}_{i},{\mathbf{U}}_{i}){\sigma}^{2}$$

(5)

is a more reasonable model for the variance, where *h* (*X _{i}*,

Standard regression calibration, which assumes homoscedastic measurement error, can be derived by a first order Taylor series expansion around the naïve likelihood in which the mis-measured exposure is treated as if there were no error. In what follows, the second order expansion is developed. By adding the additional term, measurement error model heteroscedasticity can be accommodated, albeit with an unavoidably more complex estimator. In another approach to deriving the regression calibration estimator, the approximate logistic likelihood is derived under the assumptions of normal residual measurement error and rare disease. We develop these approaches in more detail in what follows below, to provide the formal derivation for the new estimator.
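To sketch the key step of the second approach: if the disease is rare, so that $\text{Pr}(Y=1|\mathbf{x},\mathbf{U})\approx \exp({\beta}_{0}+\boldsymbol{\beta}_{1}^{T}\mathbf{x}+\boldsymbol{\beta}_{2}^{T}\mathbf{U})$, and $\mathbf{x}\,|\,\mathbf{X},\mathbf{U}$ is normal, then the normal moment generating function gives

$$E\left[{e}^{\boldsymbol{\beta}_{1}^{T}\mathbf{x}}\,|\,\mathbf{X},\mathbf{U}\right]=\exp\left\{\boldsymbol{\beta}_{1}^{T}E(\mathbf{x}|\mathbf{X},\mathbf{U})+\frac{1}{2}\boldsymbol{\beta}_{1}^{T}\mathit{Var}(\mathbf{x}|\mathbf{X},\mathbf{U})\,\boldsymbol{\beta}_{1}\right\},$$

so substituting a linear conditional mean and the heteroscedastic variance ${\sigma}^{2}\mathbf{h}(\mathbf{X},\mathbf{U})$ yields an induced model whose exponent is linear in **X**, **U** and *h*(**X**, **U**); the heteroscedasticity enters only through the additional quadratic term $\frac{{\sigma}^{2}}{2}\boldsymbol{\beta}_{1}^{T}\mathbf{h}(\mathbf{X},\mathbf{U})\boldsymbol{\beta}_{1}$.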

We assume the following mean and variance model for **x** given **X** and **U**:

$$\begin{array}{cc}E(\mathbf{x}|\mathbf{X},\mathbf{U})={\boldsymbol{\alpha}}^{\prime}+{\boldsymbol{\Gamma}}_{1}\mathbf{X}+{\boldsymbol{\Gamma}}_{2}\mathbf{U},& \mathit{Var}(\mathbf{x}|\mathbf{X},\mathbf{U})={\sigma}^{2}\mathbf{h}(\mathbf{X},\mathbf{U}),\end{array}$$

where *dim*(**x**) = *q*, and **h**(

$${f}_{1}(Y=1|\mathbf{X},\mathbf{U})\approx {e}^{{\beta}_{0}+\boldsymbol{\beta}_{1}^{T}({\boldsymbol{\alpha}}^{\prime}+{\boldsymbol{\Gamma}}_{1}\mathbf{X}+{\boldsymbol{\Gamma}}_{2}\mathbf{U})+\frac{{\sigma}^{2}}{2}\boldsymbol{\beta}_{1}^{T}\mathbf{h}(\mathbf{X},\mathbf{U})\boldsymbol{\beta}_{1}+\boldsymbol{\beta}_{2}^{T}\mathbf{U}},$$

(6)

where

$$\begin{array}{c}{\beta}_{0}^{*}={\beta}_{0}+\boldsymbol{\beta}_{1}^{T}{\boldsymbol{\alpha}}^{\prime}\\ {\boldsymbol{\beta}}_{11}^{*}={(\boldsymbol{\beta}_{1}^{T}{\boldsymbol{\Gamma}}_{1})}^{T}\\ {\boldsymbol{\beta}}_{12}^{*}=\frac{\boldsymbol{\beta}_{1}\sigma}{\sqrt{2}}\\ {\boldsymbol{\beta}}_{2}^{*}={(\boldsymbol{\beta}_{1}^{T}{\boldsymbol{\Gamma}}_{2})}^{T}+\boldsymbol{\beta}_{2}\end{array}$$

(Kuha 1994). A similar result for general mean-variance models, using a second-order Taylor series expansion about *E*(*x* | *X*, **U**) (i.e. a small measurement error approximation), was obtained for scalar *x* by Carroll *et al.* (Carroll, Ruppert et al. 2006). For *dim*(*x*) > 1, Carroll and Stefanski (Carroll and Stefanski 1990) presented an analogous result, but omitted the detailed derivation. Similar approximations have been given for the relative risk function in **X**(

Insight can be gained by inspecting the form of (6) with no covariates, **U**, and for scalar

$${f}_{1}(Y|x)=\frac{{e}^{({\beta}_{0}+{\beta}_{1}x)Y}}{1+{e}^{{\beta}_{0}+{\beta}_{1}x}}\approx {e}^{({\beta}_{0}+{\beta}_{1}x)Y}.$$

Figure 1 shows the theoretical relationship between *log[E(Y)]* under the rare disease assumption and *x* given by (1) (solid line), and the approximate induced relationships between *log[E(Y)]* and *X* for *h(X)=X* (dotted line), *h(X)=X ^{2}* (short dashed line), and

To simplify notation, ${[{\beta}_{12}^{*}]}^{2}$ was rewritten as ${\tilde{\beta}}_{12}^{*}$. Then, solving for *β*_{1} in the two terms multiplying *X* and *h*(*X*) in (6), two estimators for *β*_{1},

$${\widehat{\beta}}_{\mathrm{11}}={\widehat{\beta}}_{\mathrm{11}}^{*}/{\widehat{\gamma}}_{1}$$

(7)

and

$${\widehat{\beta}}_{\mathrm{12}}=\mathit{\text{sign}}({\widehat{\beta}}_{\mathrm{11}})\sqrt{2|{\widehat{\tilde{\beta}}}_{\mathrm{12}}^{*}|/{\widehat{\sigma}}^{2}}$$

where *sign*(*t*) = 1 if *t* is positive and −1 if *t* is negative, are available from the uncorrected regression of *Y* on *X* and *h*(*X*). Of course, the approximate likelihood (6) can be directly fit to the data using an iterative approach, jointly estimating all parameters simultaneously. Alternatively, in what follows we suggest a procedure for obtaining ${\widehat{\beta}}_{RCH,1}$ which can be constructed from standard software tools and used in routine analysis. Both approaches will provide consistent estimators, but the maximum likelihood estimator will, of course, be asymptotically more efficient. Later, we will compare the behavior of these estimators in a simulation study.

The new method is as follows:

- A logistic regression model of *Y* on *X*, *h*(*X*, **U**) and **U** is run in the main study to obtain ${\widehat{\beta}}_{11}^{*}$ and ${\widehat{\tilde{\beta}}}_{12}^{*}$ and their estimated variances.
- A weighted linear regression is run in the validation study, with weights 1/*h*(*X*, **U**), to obtain ${\widehat{\sigma}}^{2}$ and ${\widehat{\gamma}}_{1}$. ${\widehat{\beta}}_{11}$ and ${\widehat{\beta}}_{12}$ are then formed from the formulas above and efficiently combined to produce a single estimate, ${\widehat{\beta}}_{RCH,1}$:

$${\widehat{\beta}}_{\mathit{RCH},1}={w}_{1}{\widehat{\beta}}_{11}+(1-{w}_{1}){\widehat{\beta}}_{12}$$
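The two steps above can be sketched numerically. The sketch below uses *h*(*X*) = *X*², simulated data with hypothetical parameter values, and an equal-weight combination as a placeholder for the minimum-variance weights derived in Appendix 1; it is an illustration of the procedure, not the paper's implementation.

```python
import numpy as np

def logistic_fit(D, y, iters=30):
    """Newton-Raphson maximum likelihood for logistic regression."""
    b = np.zeros(D.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(D @ b, -30, 30)))
        b += np.linalg.solve((D * (p * (1 - p))[:, None]).T @ D, D.T @ (y - p))
    return b

def wls_fit(D, y, w):
    """Weighted least squares: coefficients and weighted residual variance."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)
    sigma2 = np.sum(w * (y - D @ coef) ** 2) / (len(y) - D.shape[1])
    return coef, sigma2

rng = np.random.default_rng(1)
n1, n2 = 20000, 500                                   # main and validation study sizes
beta0, beta1, alpha, gamma1, sigma = -4.5, 0.4, 0.2, 0.7, 0.3
h = lambda X: X ** 2                                  # heteroscedasticity function

# Main study observes (Y, X); validation study observes (x, X).
X1 = rng.uniform(1.0, 5.0, n1)
x1 = alpha + gamma1 * X1 + rng.normal(0, sigma * np.sqrt(h(X1)))
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x1))))
X2 = rng.uniform(1.0, 5.0, n2)
x2 = alpha + gamma1 * X2 + rng.normal(0, sigma * np.sqrt(h(X2)))

# Step 1: logistic regression of Y on X and h(X) in the main study.
b_main = logistic_fit(np.column_stack([np.ones(n1), X1, h(X1)]), Y)
b11_star, b12_star_tilde = b_main[1], b_main[2]

# Step 2: weighted linear regression of x on X in the validation study.
coef, sigma2_hat = wls_fit(np.column_stack([np.ones(n2), X2]), x2, 1.0 / h(X2))
gamma1_hat = coef[1]

b11 = b11_star / gamma1_hat
b12 = np.sign(b11) * np.sqrt(2 * abs(b12_star_tilde) / sigma2_hat)
w1 = 0.5   # placeholder; the asymptotically minimum-variance w1 is in Appendix 1
b_rch1 = w1 * b11 + (1 - w1) * b12
print(gamma1_hat, sigma2_hat, b_rch1)
```

Note that the coefficient on *h*(*X*) is small (here ${\sigma}^{2}{\beta}_{1}^{2}/2\approx 0.007$) and *X* and *X*² are highly collinear, so ${\widehat{\beta}}_{12}$ is considerably noisier than ${\widehat{\beta}}_{11}$ in practice, which is why the efficient weighting matters.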

The asymptotically minimum variance weights and their derivation, as well as the derivation of the formula for the variance of ${\widehat{\beta}}_{RCH,1}$, are given in Appendix 1. ${\widehat{\boldsymbol{\beta}}}_{RCH,2}$, the measurement-error-corrected estimator of the coefficients corresponding to **U**, has form

$${\widehat{\mathit{\text{\beta}}}}_{\mathit{\text{RCH}},2}$$

(8)

Its asymptotic variance is derived in Appendix 2, along with $\mathit{Cov}({\widehat{\beta}}_{RCH,1},{\widehat{\boldsymbol{\beta}}}_{RCH,2})$. As is evident from (6), this estimator can only be used for models with scalar *x*. For multivariate *x* with heteroscedastic covariance for **x** | **X**, **U**, a term is added to the model for

$$2{\sigma}^{2}\boldsymbol{\beta}_{1}^{T}\mathbf{h}(\mathbf{X},\mathbf{U};\boldsymbol{\Sigma})\boldsymbol{\beta}_{1},$$

which is a complicated function of the *q* elements of *β*_{1} that cannot be used to uniquely solve for the *q* elements of ${\widehat{\beta}}_{12}^{*}$.

A result similar in form to (6) was given by Prentice (Prentice 1982) for the proportional hazards regression model on a vector **X**(*t*), in which one or more of the elements of **X**(*t*) are measured with error, and where the conditional distribution of **x**(*t*) given **X**(*t*) is multivariate normal with a linear mean in **X**(*t*) and variance **Σ**(

Under either small measurement error or a rare disease with normal errors for *x* | *X*, **U**, it is evident that if either
${\sigma}_{i}^{2}={\sigma}^{2}h({X}_{i},{\mathbf{U}}_{i})$ or
${\beta}_{1}^{2}$ is small, for scalar

Standard regression calibration is valid even if instead of **x**, the gold standard, an unbiased imperfect gold standard,
${x}_{i}^{*}$, is observed in the validation study, where
${x}_{i}^{*}={x}_{i}+{e}_{i}$ and

$$E({x}_{i}^{*}|{X}_{i},{\mathbf{U}}_{i})={\alpha}^{\prime}+{\gamma}_{1}{X}_{i}+\boldsymbol{\gamma}_{2}^{T}{\mathbf{U}}_{i}$$

to the data. However,

$$\mathit{Var}({x}_{i}^{*}|{X}_{i},{\mathbf{U}}_{i})=h({X}_{i},{\mathbf{U}}_{i}){\sigma}^{2}+\mathit{Var}({e}_{i}),$$

where
${e}_{i}={x}_{i}^{*}-{x}_{i}$, so Step 2 in the procedure for obtaining ${\widehat{\beta}}_{RCH,1}$ given above will not provide a valid estimate of *h*(*X _{i}*,

Another important special case emerges when *h(X)=X.* Then, from equation (6) we obtain

$${\widehat{\beta}}_{\mathit{\text{RCH}},1}=\left[-{\widehat{\gamma}}_{1}+\mathit{\text{sign}}({\widehat{\beta}}_{11})\sqrt{{\widehat{\gamma}}_{1}^{2}+2{\widehat{\sigma}}^{2}{\widehat{\beta}}_{11}^{*}}\right]/{\widehat{\sigma}}^{2}$$
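A quick round-trip check of this closed form (with hypothetical parameter values): when *h*(*X*) = *X*, the induced coefficient on *X* in (6) is ${\beta}_{1}{\gamma}_{1}+({\sigma}^{2}/2){\beta}_{1}^{2}$, and the formula above inverts that quadratic to recover *β*_{1}.

```python
import numpy as np

def beta_rch1_linear_h(b11_star, gamma1, sigma2):
    """Invert b11_star = gamma1*b + (sigma2/2)*b**2 for b, as in the display above."""
    s = np.sign(b11_star / gamma1)   # sign of the uncorrected-slope ratio
    return (-gamma1 + s * np.sqrt(gamma1**2 + 2 * sigma2 * b11_star)) / sigma2

beta1, gamma1, sigma2 = 0.5, 0.8, 0.25                 # hypothetical true values
b11_star = gamma1 * beta1 + 0.5 * sigma2 * beta1**2    # induced coefficient on X
print(beta_rch1_linear_h(b11_star, gamma1, sigma2))    # recovers beta1 = 0.5
```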

(9)

where
${\widehat{\beta}}_{11}^{*}$ is the regression coefficient for *X* obtained from fitting a logistic regression model of *Y* on *X* and **U**,

There are an important set of problems where *x* is unobservable, i.e. no “gold standard” exists. In some of these cases, the measurement error model

$$\begin{array}{cccc}{X}_{ij}={x}_{i}+{\epsilon}_{ij},& E({\epsilon}_{ij})=0,& \mathit{Var}({\epsilon}_{ij})={\sigma}^{2}h({x}_{i}),& \mathit{Cov}({\epsilon}_{ij},{\epsilon}_{ik})=0,\ j\ne k,\end{array}$$

(10)

is considered reasonable, where *n _{2i}* is the number of replicates for subject

$$f({x}_{i},\mathrm{{X}_{ij})=\frac{\text{exp}\left\{-\frac{1}{2}\left[\frac{{({X}_{ij}-{x}_{i})}^{2}}{h({x}_{i}){\sigma}^{2}}+\frac{{({x}_{i}-{\mu}_{x})}^{2}}{{\sigma}_{x}^{2}}\right]\right\}}{2\pi \sqrt{h({x}_{i}){\sigma}^{2}{\sigma}_{x}^{2}}},}$$

and

$$f({x}_{i}|{X}_{ij})=\frac{\frac{\text{exp}\left\{-\frac{1}{2}\left[\frac{{({X}_{ij}-{x}_{i})}^{2}}{h({x}_{i}){\sigma}^{2}}+\frac{{({x}_{i}-{\mu}_{x})}^{2}}{{\sigma}_{x}^{2}}\right]\right\}}{2\pi \sqrt{h({x}_{i}){\sigma}^{2}{\sigma}_{x}^{2}}}}{\underset{-\infty}{\overset{+\infty}{\int}}\frac{\text{exp}\left\{-\frac{1}{2}\left[\frac{{({X}_{ij}-{x}_{i})}^{2}}{h({x}_{i}){\sigma}^{2}}+\frac{{({x}_{i}-{\mu}_{x})}^{2}}{{\sigma}_{x}^{2}}\right]\right\}}{2\pi \sqrt{h(x){\sigma}^{2}{\sigma}_{x}^{2}}}dx}.$$

Neither *f(x _{i},X_{ij})* nor

$$f({Y}_{i}|{X}_{i})={\int}_{x}f({Y}_{i}|x)f(x|{X}_{i})dx$$

exists in closed form when
$\mathit{Var}({\epsilon}_{ij})={\sigma}^{2}h({x}_{i})$ and *ε* is Gaussian. If functions *h*(*x _{i}*) are found which fit the data at hand, it is unlikely that the resulting expression for

Iterative methods can also be applied to the problem of heteroscedastic measurement error in regression covariates. These methods will typically have the advantage of relaxing some of the assumptions required by closed-form methods, such as (1), (3) and (4), but will have the disadvantage of computational complexity, which is a barrier to use in applications. In a main study/external validation study design for a binomial outcome and a Gaussian measurement error model, the log likelihood of the data is equal to

$$L(\boldsymbol{\beta},{\alpha}^{\prime},\boldsymbol{\gamma},{\sigma}^{2})=\sum _{i=1}^{{n}_{1}}\log[{f}_{3}({Y}_{i}|{X}_{i},{\mathbf{U}}_{i};\boldsymbol{\beta},{\alpha}^{\prime},\boldsymbol{\gamma},{\sigma}^{2})]+\sum _{i={n}_{1}+1}^{{n}_{1}+{n}_{2}}\log[f({x}_{i}|{X}_{i},{\mathbf{U}}_{i};{\alpha}^{\prime},\boldsymbol{\gamma},{\sigma}^{2})].$$

We assume here that *f*(*x _{i}* | *X _{i}*, **U**_{i}; *α′*, **γ**, ${\sigma}^{2}$) is the normal density with mean given by (3) and variance given by (5), and the probability that *Y _{i}* =

$$\begin{array}{l}\text{Pr}({Y}_{i}|{X}_{i},{\mathbf{U}}_{i})={f}_{3}({Y}_{i}|{X}_{i},{\mathbf{U}}_{i};\boldsymbol{\beta},{\alpha}^{\prime},\boldsymbol{\gamma},{\sigma}^{2})\\ =\underset{-\infty}{\overset{\infty}{\int}}\frac{{[\text{exp}({\beta}_{0}+{\beta}_{1}x+\boldsymbol{\beta}_{2}^{T}{\mathbf{U}}_{i})]}^{{Y}_{i}}}{1+\text{exp}({\beta}_{0}+{\beta}_{1}x+\boldsymbol{\beta}_{2}^{T}{\mathbf{U}}_{i})}\,\frac{\text{exp}\left[-\frac{{(x-{\alpha}^{\prime}-{\gamma}_{1}{X}_{i}-\boldsymbol{\gamma}_{2}^{T}{\mathbf{U}}_{i})}^{2}}{2h({X}_{i}){\sigma}^{2}}\right]}{\sqrt{2\pi h({X}_{i}){\sigma}^{2}}}\,dx.\end{array}$$

The numerical method given by (Crouch and Spiegelman 1990) can be used to evaluate *f _{3}* (
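The integral above has a Gaussian weight, so one natural numerical approach is Gauss-Hermite quadrature after centering and scaling the latent exposure. The sketch below (covariates **U** dropped, hypothetical parameter values, and quadrature in place of the Crouch-Spiegelman method cited above) illustrates the computation:

```python
import numpy as np

def f3(y, X, beta0, beta1, alpha, gamma1, sigma2, h, n_nodes=40):
    """Pr(Y = y | X) by Gauss-Hermite quadrature over the latent true exposure x."""
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    mu = alpha + gamma1 * X                  # measurement error model mean
    sd = np.sqrt(h(X) * sigma2)              # heteroscedastic standard deviation
    x = mu + np.sqrt(2.0) * sd * t           # change of variables for the e^{-t^2} weight
    eta = beta0 + beta1 * x
    lik = np.exp(eta * y) / (1.0 + np.exp(eta))   # logistic Pr(Y = y | x)
    return np.sum(w * lik) / np.sqrt(np.pi)

# Hypothetical values: h(X) = X^2 heteroscedasticity.
p1 = f3(1, 2.0, -3.0, 0.4, 0.2, 0.7, 0.09, lambda X: X ** 2)
p0 = f3(0, 2.0, -3.0, 0.4, 0.2, 0.7, 0.09, lambda X: X ** 2)
print(p1, p0)   # the two probabilities sum to 1
```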

$$\mathit{logit}[{f}_{3}({X}_{i};\boldsymbol{\beta},{\alpha}^{\prime},\boldsymbol{\gamma},{\sigma}^{2})]\approx {\beta}_{0}+{\beta}_{1}[{\alpha}^{\prime}+{\boldsymbol{\gamma}}^{T}({X}_{i},{\mathbf{U}}_{i})]+\boldsymbol{\beta}_{2}^{T}{\mathbf{U}}_{i}+\frac{1-\text{exp}({\beta}_{0}+\boldsymbol{\beta}_{2}^{T}{\mathbf{U}}_{i})}{2[1+\text{exp}({\beta}_{0}+\boldsymbol{\beta}_{2}^{T}{\mathbf{U}}_{i})]}{\beta}_{1}^{2}h({X}_{i},{\mathbf{U}}_{i}){\sigma}^{2}.$$

Likelihoods based on both the exact, ${\widehat{\beta}}_{ML}$, and approximate,

A consistent, semi-parametric efficient estimator was proposed by (Robins, Hsieh et al. 1995), where (Begun, Hall et al. 1983) defined the semi-parametric efficiency bound as the smallest possible variance obtained by any estimator which is consistent for ** β** over all possible measurement error models for

(Valanis, Vollmer et al. 1993) described a cross-sectional study of acute health effects from occupational chemotherapeutics exposure in 675 hospital pharmacists. The research objective is estimation and inference about the prevalence odds ratio for acute health effects related to chemotherapeutics exposure. Here, we will focus on fever prevalence in relation to exposure. There were 110 cases of fever. Average weekly chemotherapeutics exposure (*X*) was self-reported on questionnaire; in a sub-sample of 56 pharmacists on-site drug mixing diaries were kept for 1–2 weeks (*x*). The correlation between these two methods of exposure assessment was 0.70. The correlation between the predicted values from the linear measurement error model for diary data (*x),* conditional upon the questionnaire data *(X)* and other model covariates, and the absolute value of the residuals (Carroll and Ruppert 1988) was 0.21, indicative of moderate heteroscedasticity (Figure 2).

These analyses were adjusted for three covariates (**U**): age in years, shift work (1 if night or rotating shift, 0 if day shift), and employment at a community hospital (1 if yes, 0 if no). In a previously published analysis of these data, the uncorrected prevalence odds ratio for a top to bottom quintile contrast in number of drugs mixed per day, corresponding to an increment of 52 drugs/day, was 1.08 (95% confidence interval (CI) 1.02–1.15) and the regression calibration estimate of the same quantity, ignoring the observed heteroscedasticity in the measurement error model, was 1.22 (95% CI 1.05–1.45) (Spiegelman and Valanis 1998). Using maximum likelihood methods with a gamma measurement error model that was empirically verified in the ACE validation study and allows for heteroscedasticity which depends on covariates in an arbitrary manner, the estimated prevalence ratio was 1.17 (95% CI 1.04–1.26) (Spiegelman and Casella 1997). Note that the odds ratio will be a good approximation for the prevalence ratio when the outcome is rare and when the prevalence ratio is near one.

We first needed to identify a form for *h(X)*, and we searched over the class of functions *h(X)=(X+b) ^{p}*. We sought to find the transformation in this class for which the correlation between the absolute value of the weighted residuals from the weighted least squares regression of
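The transformation search described above can be sketched as a grid search: for each candidate *h*(*X*) = (*X* + *b*)^{p}, fit the weighted least squares calibration regression, then score the weighted absolute residuals against the fitted values. The sketch below uses simulated data with error standard deviation growing linearly in the level (so *p* = 2 should win) and a plain Pearson correlation as the diagnostic; values and details are illustrative, not the ACE analysis itself.

```python
import numpy as np

def abs_resid_corr(x, X, b, p):
    """|weighted residual| vs fitted-value correlation for h(X) = (X + b)^p."""
    w = 1.0 / (X + b) ** p
    D = np.column_stack([np.ones_like(X), X])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(D * sw[:, None], x * sw, rcond=None)
    fitted = D @ coef
    wres = np.abs((x - fitted) * sw)          # residuals standardized by sqrt(h)
    return np.corrcoef(wres, fitted)[0, 1]

rng = np.random.default_rng(2)
X = rng.uniform(0.5, 6.0, 400)                            # surrogate exposure
x = 0.2 + 0.7 * X + rng.normal(0, 0.3 * (X + 0.01))       # Var(x|X) grows as (X+0.01)^2

b = 0.01
corrs = {p: abs(abs_resid_corr(x, X, b, p)) for p in (0.0, 1.0, 2.0, 3.0)}
best_p = min(corrs, key=corrs.get)                        # p that best removes the trend
print(corrs, best_p)
```

A residual correlation near zero indicates the candidate *h*(*X*) has absorbed the heteroscedasticity; under- or over-weighting leaves a positive or negative trend, respectively.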

Correlation between the weighted absolute value of the measurement error model residuals and the predicted values for selected weight functions, ACE Study^{1}

Table 2 gives the results of this analysis, where we find that the results which take into account the apparent heteroscedasticity in the measurement error model are virtually unchanged from the standard regression calibration results which ignore this feature of the data. Both the exact and approximate maximum likelihood estimates and the semi-parametric locally efficient estimate were similar to the regression calibration estimates. Application of the semi-parametric estimator led to a large efficiency loss. The regression calibration estimators took trivially more CPU time than the uncorrected estimator, and the iterative estimators took 10- to 100-fold more CPU time than these. With a small data set such as this one, none of these CPU times were prohibitive.

Willett et al. described a prospective study of the relationship between breast cancer incidence and moderate alcohol consumption among 89,538 U.S. women aged 34–59 who were followed for 4 years beginning in 1980 (Willett, Stampfer et al. 1987). After updating the original data to include 8 years of follow-up, 1466 cases occurred during this study period. Alcohol intake was calculated from three questions about the consumption of beer, wine and liquor that were included on a 61-item food frequency questionnaire. These data were validated in a sub-sample of 173 women with four one-week weighed diet records (Willett, Sampson et al. 1985). The correlation between these two methods of exposure assessment was 0.85. For average daily alcohol intake (g/day), the correlation between the absolute value of the regression residuals and the predicted values from the linear measurement error model for the diet record data (*x*) conditional upon the food frequency data (*X*) and other model covariates was 0.44 (Figure 3).

A logistic regression model, $\mathit{logit}({p}_{i})={\beta}_{0}+{\beta}_{1}{X}_{i}+\boldsymbol{\beta}_{2}^{T}{\mathbf{U}}_{i}$, was fit to the data, where *p _{i}* is the probability that participant

Although *h*(*X*) = log(*X* + 1.005) was an option for minimizing the correlation between the weighted residuals and the predicted values from the measurement error model fit in the validation study, transformations of the exposure of interest are not desirable for substantive interpretability unless absolutely necessary for statistical reasons. Thus, the optimal transformation of *X* for minimizing the correlation between the residuals and the predicted values was again a linear one (Table 3), and the special case of ${\widehat{\beta}}_{RCH}$ discussed earlier was again applied.

Correlation between the weighted absolute value of the measurement error model residuals and the predicted values for selected weight functions, Nurses’ Health Study^{1}

A small non-zero value for the parameter *b* was used for the reasons given above in Section 3.1. There was a slight attenuation in ${\widehat{\beta}}_{RCH}$ relative to

We studied the small sample behavior of all estimators discussed in Section 2 by simulation. We designed the simulation study to follow the Nurses’ Health Study described in Section 3.2, and varied the validation study sample size (*n _{2}=173 or 346*), the parameter

Results from the simulation study are given in Table 5. The most striking result is that standard regression calibration, ${\widehat{\beta}}_{RC}$, performs as well as or better than all the other estimators considered, including maximum likelihood methods, in all scenarios studied. This was true even when measurement error or heteroscedasticity was severe, and from the point of view of both bias and coverage probability. The positive bias predicted in Figure 1 was apparent when

It is of interest what practical gain is likely to be derived from the application of the more robust estimator, ${\widehat{\beta}}_{SPLE}$. As can be seen in Table 5, the percent bias, mean square error and coverage probability of

Although the derivation given in equation (6) has appeared previously, this estimator, ${\widehat{\beta}}_{RCH}$, is novel. The new estimator developed in this paper,

It took longer to find ${\widehat{\beta}}_{aML}$ than

Although the marginal distributions of *x* and *X* were sharply skewed in both the ACE data (for number of drugs mixed per week) and the NHS (for grams of alcohol per day), the distributions of the standardized residuals from the models for *E(x*|*X)* were symmetrized to a large extent. Marginal distributions should not, in general, be used as evidence for or against heteroscedasticity in a conditional variance. The correlations between the absolute value of the measurement error model regression residuals and the predicted values from these regressions were moderate. As shown in Section 2, if either
${\beta}_{1}^{2}$ or *σ ^{2}* is small, the convergent value of

An extensive simulation study of ${\widehat{\beta}}_{RCH}$ and the other estimators was conducted, based upon a data structure motivated by the Nurses' Health Study data considered in this paper. The results clearly indicated that under the scenarios studied,

The coverage probability of ${\widehat{\beta}}_{RCH}$ was below the desired value in nearly all cases considered. This family of heteroscedastic variance functions is similar in spirit to the Box-Cox family of transformations, where, following the initial suggestion of Box and Cox (Box and Cox 1964), standard practice in applied statistics is to estimate the transformation parameter first and then treat this estimate as fixed when estimating the remaining parameters. We did the same here. This two-step procedure was followed in the computation of the iterative estimators as well. It has been well established that in fitting standard models such as (1) with a heteroscedastic variance function such as (5), the asymptotic distribution of

$\frac{{\sigma}^{2}}{2}{\beta}_{1}^{2}\overline{h(X,\mathbf{U})}$ ranged over several orders of magnitude, from 0.0000091 to 0.00882, in the simulations shown in Table 5, where $\overline{h(X,\mathbf{U})}$ is the average *h*(*X*, **U**) in the data. Kuha had previously suggested that the bias in ${\widehat{\beta}}_{RC}$ in the homoscedastic measurement error case would be low when $\frac{{\sigma}^{2}}{2}{\beta}_{1}^{2}$ was less than 0.5 (Kuha 1994), but we found in another implementation of the regression calibration estimator with homoscedastic measurement error that 0.5 was far too liberal a criterion, with unacceptable levels of bias when $\frac{{\sigma}^{2}}{2}{\beta}_{1}^{2}$ was much smaller than 0.5 (Weller, Milton et al. 2007). In the present simulation study, the Spearman correlation of $\frac{{\sigma}^{2}}{2}{\beta}_{1}^{2}\overline{h(X,\mathbf{U})}$ with bias in

The maximum likelihood approaches considered here are strictly valid only when the distribution for *x|X,*** U** is Gaussian. We did not study the performance of the maximum likelihood estimators when a mis-specified likelihood is fit due to incorrect distributional assumptions about

In summary, as predicted by the theory, standard regression calibration is adequate when measurement error is not severe or the mis-measured covariate effect is moderate, even when heteroscedasticity is severe. It may be worthwhile to recall that covariate measurement error leads to increasing information loss given the same sample size and all other conditions held constant; for example, it is well known that under classical homoscedastic measurement error, the effective sample size is decreased by the factor ${\rho}_{x,X}^{2}$ (Fleiss 1986; White, Armstrong et al. 2006).
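The classical-error attenuation underlying this effective-sample-size result can be illustrated with a short simulation sketch. The linear model and parameter values below are illustrative, not the paper's simulation design:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta1 = 0.5
sigma_X, sigma_u = 1.0, 1.0           # true-exposure SD and error SD

X = rng.normal(0.0, sigma_X, n)       # true exposure
x = X + rng.normal(0.0, sigma_u, n)   # classical measurement: x = X + u
Y = beta1 * X + rng.normal(0.0, 1.0, n)

# Under classical error, the naive slope of Y on x estimates
# beta1 * rho^2, where rho^2 = Var(X) / (Var(X) + Var(u)) is the
# squared correlation between x and X; here rho^2 = 0.5.
rho2 = sigma_X**2 / (sigma_X**2 + sigma_u**2)
naive_slope = np.cov(x, Y)[0, 1] / np.var(x)
print(rho2, naive_slope)  # naive_slope ~= beta1 * rho2 = 0.25
```

The same factor $\rho_{x,X}^{2}$ governs the reduction in effective sample size cited above.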

Let ${V}_{1} = \mathit{Var}({\widehat{\beta}}_{11})$, where ${\widehat{\beta}}_{11} = {\widehat{\beta}}_{11}^{*}/{\widehat{\gamma}}_{1}$. By the delta method,

$${V}_{1}\approx \frac{\mathit{\text{Var}}({\widehat{\beta}}_{\mathrm{11}}^{*})}{{\gamma}_{1}^{2}}+\frac{{[{\widehat{\beta}}_{\mathrm{11}}^{*}]}^{2}\mathit{\text{Var}}({\widehat{\gamma}}_{1})}{{\gamma}_{1}^{4}}.$$
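This delta-method approximation for the variance of the ratio can be checked against a Monte Carlo estimate. The component values and sampling variances below are hypothetical, and ${\widehat{\beta}}_{11}^{*}$ and ${\widehat{\gamma}}_{1}$ are taken as independent because they come from the main study and the validation study, respectively:

```python
import numpy as np

rng = np.random.default_rng(2)
b0, g0 = 0.40, 0.80            # hypothetical values of beta11* and gamma1
var_b, var_g = 0.004, 0.002    # hypothetical sampling variances

# Delta-method variance of the ratio beta11 = beta11* / gamma1:
#   V1 ~= Var(b)/g^2 + b^2 Var(g)/g^4
V1 = var_b / g0**2 + b0**2 * var_g / g0**4

# Monte Carlo check of the approximation
b = rng.normal(b0, np.sqrt(var_b), 500_000)
g = rng.normal(g0, np.sqrt(var_g), 500_000)
V1_mc = np.var(b / g)
print(V1, V1_mc)
```

With the coefficient of variation of ${\widehat{\gamma}}_{1}$ small, the approximation and the Monte Carlo value agree closely.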

Under the heteroscedastic measurement error model (5), when $\boldsymbol{\gamma}$ is estimated using weighted linear regression with weights $h({X}_{i},\mathbf{U}_{i})^{-1}$,

$$\mathit{Var}(\widehat{\boldsymbol{\gamma}}) \approx {\sigma}^{2}{\left[{\{({X}_{i},\mathbf{U}_{i}^{T})\}}^{T}\,\mathbf{W}\,\{({X}_{i},\mathbf{U}_{i}^{T})\}\right]}^{-1},$$

where $\{({X}_{i},\mathbf{U}_{i}^{T})\}$ is the ${n}_{2}$-row design matrix of the validation study, including the intercept, and $\mathbf{W} = \mathit{diag}\{h({X}_{i},\mathbf{U}_{i})^{-1}\}$ is the diagonal matrix of weights. Similarly, letting ${V}_{2} = \mathit{Var}({\widehat{\beta}}_{12})$,

$${V}_{2}\approx \frac{\mathit{\text{Var}}({\widehat{\beta}}_{\mathrm{12}}^{*})}{2{\sigma}^{2}|{\beta}_{\mathrm{12}}^{*}|}+\frac{|{\beta}_{\mathrm{12}}^{*}|\mathit{\text{Var}}({\widehat{\sigma}}^{2})}{2{\sigma}^{6}},$$

since by arguments analogous to those given in Appendix 1 of Spiegelman *et al.* (Spiegelman, Carroll et al. 2001), $\mathit{Cov}({\widehat{\beta}}_{12}^{*},{\widehat{\sigma}}^{2})$ is asymptotically 0, and

$${V}_{12} \approx \mathit{sign}({\beta}_{11}^{*})\,\mathit{sign}({\beta}_{12}^{*})\,\frac{\mathit{Cov}({\widehat{\beta}}_{11}^{*},{\widehat{\beta}}_{12}^{*})}{{\gamma}_{1}\,\sigma\,\sqrt{2|{\beta}_{12}^{*}|}}.$$

All other covariance terms are 0, since by the Gauss-Markov theorem, $\mathit{Cov}({\widehat{\gamma}}_{1},{\widehat{\sigma}}^{2}) \approx 0$, and because the main study and validation study are independent,

$$\mathit{\text{Cov}}[\left(\begin{array}{c}{\widehat{\beta}}_{\mathrm{11}}^{*}\\ {\widehat{\beta}}_{\mathrm{12}}^{*}\end{array}\right),\mathrm{\left(\begin{array}{c}{\widehat{\gamma}}_{1}\\ {\widehat{\sigma}}^{2}\end{array}\right)}]\approx 0.$$

Thus,

$$\mathit{\text{Var}}[{w}_{1}{\widehat{\beta}}_{\mathrm{11}}+(1-{w}_{1}){\widehat{\beta}}_{\mathrm{12}}]={w}_{1}^{2}{V}_{1}+{(1-{w}_{1})}^{2}{V}_{2}+2{w}_{1}(1-{w}_{1}){V}_{\mathrm{12}}$$

(1)

The estimates of ${V}_{1}$, ${V}_{2}$, and ${V}_{12}$ are obtained by substituting consistent estimates of the unknown parameters into the expressions above.

Now, to derive the optimal weight, ${w}_{1}$, for (1), we need to minimize

$$g({w}_{1})={w}_{1}^{2}{V}_{1}+{(1-{w}_{1})}^{2}{V}_{2}+2{w}_{1}(1-{w}_{1}){V}_{12}$$

with respect to ${w}_{1}$. Setting ${g}^{\prime}({w}_{1}) = 0$ gives ${w}_{1} = ({V}_{2}-{V}_{12})/({V}_{1}+{V}_{2}-2{V}_{12})$, and substituting this weight into $g$ yields

$$\mathit{\text{Var}}({\widehat{\beta}}_{\mathit{\text{RCH}},1})=\frac{{V}_{1}{V}_{2}-{V}_{\mathrm{12}}^{2}}{{V}_{1}+{V}_{2}-2{V}_{\mathrm{12}}}.$$
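The optimal-weighting algebra can be verified numerically; the variance values below are arbitrary:

```python
# Hypothetical variance components for the two component estimators
V1, V2, V12 = 0.0070, 0.0110, 0.0025

def g(w):
    # Variance of the weighted combination w*b11 + (1-w)*b12
    return w**2 * V1 + (1 - w)**2 * V2 + 2 * w * (1 - w) * V12

# Optimal weight from setting g'(w) = 0
w1 = (V2 - V12) / (V1 + V2 - 2 * V12)

# Closed-form minimized variance
V_min = (V1 * V2 - V12**2) / (V1 + V2 - 2 * V12)

print(w1, g(w1), V_min)  # g(w1) equals the closed form
```

Note that $g(1)={V}_{1}$ and $g(0)={V}_{2}$, so the minimized variance is never larger than the variance of either component estimator alone.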

To estimate $\mathit{Var}({\widehat{\beta}}_{\mathit{RCH},1})$, estimates of ${V}_{1}$, ${V}_{2}$, and ${V}_{12}$ are substituted into the expression above.

By the multivariate delta method,

$$\begin{array}{ll}\widehat{\mathit{Var}}({\widehat{\boldsymbol{\beta}}}_{\mathit{RCH},2}) \approx & \widehat{\mathit{Var}}({\widehat{\boldsymbol{\beta}}}_{2}^{*}) + {\widehat{\beta}}_{\mathit{RCH},1}^{2}\,\widehat{\mathit{Var}}({\widehat{\boldsymbol{\gamma}}}_{2}) + \widehat{\mathit{Var}}({\widehat{\beta}}_{\mathit{RCH},1})\,{\widehat{\boldsymbol{\gamma}}}_{2}{\widehat{\boldsymbol{\gamma}}}_{2}^{T}\\ & -\ \widehat{\mathit{Cov}}({\widehat{\boldsymbol{\beta}}}_{2}^{*},{\widehat{\beta}}_{\mathit{RCH},1})\,{\widehat{\boldsymbol{\gamma}}}_{2}^{T} - {\widehat{\boldsymbol{\gamma}}}_{2}\,{\widehat{\mathit{Cov}}}^{T}({\widehat{\boldsymbol{\beta}}}_{2}^{*},{\widehat{\beta}}_{\mathit{RCH},1})\\ & +\ {\widehat{\beta}}_{\mathit{RCH},1}\left[\widehat{\mathit{Cov}}({\widehat{\boldsymbol{\gamma}}}_{2},{\widehat{\beta}}_{\mathit{RCH},1})\,{\widehat{\boldsymbol{\gamma}}}_{2}^{T} + {\widehat{\boldsymbol{\gamma}}}_{2}\,{\widehat{\mathit{Cov}}}^{T}({\widehat{\boldsymbol{\gamma}}}_{2},{\widehat{\beta}}_{\mathit{RCH},1})\right]\end{array}$$

where

$$\widehat{\mathit{Cov}}({\widehat{\boldsymbol{\beta}}}_{2}^{*},{\widehat{\beta}}_{\mathit{RCH},1}) \approx \frac{{w}_{1}\,\widehat{\mathit{Cov}}({\widehat{\boldsymbol{\beta}}}_{2}^{*},{\widehat{\beta}}_{11}^{*})}{{\widehat{\gamma}}_{1}} + \frac{(1-{w}_{1})\,\mathit{sign}({\widehat{\beta}}_{12}^{*})\,\widehat{\mathit{Cov}}({\widehat{\boldsymbol{\beta}}}_{2}^{*},{\widehat{\beta}}_{12}^{*})}{\widehat{\sigma}\sqrt{2|{\widehat{\beta}}_{12}^{*}|}}$$

(2)

and

$$\widehat{\mathit{Cov}}({\widehat{\boldsymbol{\gamma}}}_{2},{\widehat{\beta}}_{\mathit{RCH},1}) \approx -\frac{{w}_{1}{\widehat{\beta}}_{11}^{*}\,\widehat{\mathit{Cov}}({\widehat{\boldsymbol{\gamma}}}_{2},{\widehat{\gamma}}_{1})}{{\widehat{\gamma}}_{1}^{2}}$$

(3)

where $\widehat{\mathit{Var}}({\widehat{\boldsymbol{\gamma}}}_{2})$ and $\widehat{\mathit{Cov}}({\widehat{\boldsymbol{\gamma}}}_{2},{\widehat{\gamma}}_{1})$ are obtained from the corresponding elements and sub-matrices of

$$\mathit{Var}(\widehat{\boldsymbol{\gamma}}) \approx {\sigma}^{2}{\left[{\{({X}_{i},\mathbf{U}_{i}^{T})\}}^{T}\,\mathbf{W}\,\{({X}_{i},\mathbf{U}_{i}^{T})\}\right]}^{-1},$$

where $\{({X}_{i},\mathbf{U}_{i}^{T})\}$ is the design matrix of the validation study and $\mathbf{W} = \mathit{diag}\{h({X}_{i},\mathbf{U}_{i})^{-1}\}$,

and $\widehat{\mathit{Var}}({\widehat{\boldsymbol{\beta}}}_{2}^{*})$, $\widehat{\mathit{Cov}}({\widehat{\boldsymbol{\beta}}}_{2}^{*},{\widehat{\beta}}_{11}^{*})$, and $\widehat{\mathit{Cov}}({\widehat{\boldsymbol{\beta}}}_{2}^{*},{\widehat{\beta}}_{12}^{*})$ are obtained from the uncorrected logistic regression of $Y$ on $(X,\mathbf{U})$. Finally,

$$\widehat{\mathit{Cov}}({\widehat{\boldsymbol{\beta}}}_{\mathit{RCH},2},{\widehat{\beta}}_{\mathit{RCH},1}) \approx \widehat{\mathit{Cov}}({\widehat{\boldsymbol{\beta}}}_{2}^{*},{\widehat{\beta}}_{\mathit{RCH},1}) - \widehat{\mathit{Var}}({\widehat{\beta}}_{\mathit{RCH},1})\,{\widehat{\boldsymbol{\gamma}}}_{2} - {\widehat{\beta}}_{\mathit{RCH},1}\,\widehat{\mathit{Cov}}({\widehat{\boldsymbol{\gamma}}}_{2},{\widehat{\beta}}_{\mathit{RCH},1})$$

where the covariances in the first and third terms are given by equations (2) and (3), respectively, and $\widehat{\mathit{\text{Var}}}({\widehat{\beta}}_{\mathit{\text{RCH}},1})$ is derived in Appendix 1.

Note that ${\beta}_{11}^{*}={\beta}_{1}{\gamma}_{1}+{\beta}_{1}^{2}{\sigma}^{2}/2$, so that ${\gamma}_{1}^{2}+2{\sigma}^{2}{\beta}_{11}^{*}=({\gamma}_{1}+{\sigma}^{2}{\beta}_{1})^{2}$, and equation (9) simplifies to

$${\beta}_{\mathit{RCH},1}=\frac{-{\gamma}_{1}+\mathit{sign}({\beta}_{11})\sqrt{{\gamma}_{1}^{2}+2{\sigma}^{2}{\beta}_{11}^{*}}}{{\sigma}^{2}}=\left[-{\gamma}_{1}+\mathit{sign}({\beta}_{1}+{\beta}_{1}^{2}{\sigma}^{2}/(2{\gamma}_{1}))\left|{\gamma}_{1}+{\beta}_{1}{\sigma}^{2}\right|\right]/{\sigma}^{2},$$

where ${\beta}_{11}={\beta}_{11}^{*}/{\gamma}_{1}={\beta}_{1}+{\beta}_{1}^{2}{\sigma}^{2}/(2{\gamma}_{1})$.
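As a numeric illustration with arbitrary parameter values, the closed-form correction inverts the relation ${\beta}_{11}^{*}={\beta}_{1}{\gamma}_{1}+{\beta}_{1}^{2}{\sigma}^{2}/2$ and recovers ${\beta}_{1}$:

```python
import math

# Hypothetical true parameters (for illustration only)
beta1, gamma1, sigma2 = 0.5, 0.8, 0.2

# Uncorrected coefficient implied by the relation above
beta11_star = beta1 * gamma1 + beta1**2 * sigma2 / 2   # = 0.425

# Closed-form correction from equation (9), using
# sign(beta11) with beta11 = beta11* / gamma1
sign = math.copysign(1.0, beta11_star / gamma1)
beta_corrected = (-gamma1 + sign * math.sqrt(gamma1**2 + 2 * sigma2 * beta11_star)) / sigma2
print(beta_corrected)  # recovers beta1 ~= 0.5
```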

Without loss of generality, hats indicating estimators are suppressed throughout this proof.

${\beta}_{\mathit{RCH},1}$ is consistent for ${\beta}_{1}$ when

$${\gamma}_{1}+{\sigma}^{2}{\beta}_{1}>0,\quad\textbf{and}$$

(1)

$${\beta}_{1}+\frac{{\sigma}^{2}{\beta}_{1}^{2}}{2{\gamma}_{1}}>0$$

(2)

In addition, ${\beta}_{\mathit{RCH},1}$ is consistent for ${\beta}_{1}$ when

$${\gamma}_{1}+{\sigma}^{2}{\beta}_{1}<0,\mathrm{\mathrm{\mathrm{\mathbf{\text{and}}}}}$$

(3)

$${\beta}_{1}+\frac{{\sigma}^{2}{\beta}_{1}^{2}}{2{\gamma}_{1}}<0$$

(4)

To see this, first consider the first pair of conditions. When condition (2) holds, the sign function yields +1, and condition (1) gives $|{\gamma}_{1}+{\sigma}^{2}{\beta}_{1}|={\gamma}_{1}+{\sigma}^{2}{\beta}_{1}$, so that ${\beta}_{\mathit{RCH},1}=\left[-{\gamma}_{1}+({\gamma}_{1}+{\sigma}^{2}{\beta}_{1})\right]/{\sigma}^{2}={\beta}_{1}$. The second pair of conditions is handled analogously, with both signs negative.

- It is clear that (1) and (2) hold when both ${\beta}_{1}$ and ${\gamma}_{1}$ are positive; hence ${\beta}_{\mathit{RCH},1}$ is always consistent for ${\beta}_{1}$ under these circumstances.
- Next, consider the case ${\gamma}_{1}>0$, ${\beta}_{1}<0$. Condition (1) holds iff $-{\sigma}^{2}{\beta}_{1}<{\gamma}_{1}$, i.e. ${\sigma}^{2}|{\beta}_{1}|<{\gamma}_{1}$ (*). Condition (2) holds iff $\frac{1}{2}{\sigma}^{2}{\beta}_{1}^{2}>-{\beta}_{1}{\gamma}_{1}=|{\beta}_{1}|{\gamma}_{1}$; dividing by $|{\beta}_{1}|$ gives $\frac{1}{2}{\sigma}^{2}|{\beta}_{1}|>{\gamma}_{1}$ (**). Since (*) and (**) cannot hold simultaneously, conditions (1) and (2) cannot both be satisfied when ${\gamma}_{1}>0$ and ${\beta}_{1}<0$. Now consider both expressions negative: (3) ${\gamma}_{1}+{\sigma}^{2}{\beta}_{1}<0$ iff ${\gamma}_{1}<{\sigma}^{2}|{\beta}_{1}|$ (*), and (4) holds iff $\frac{1}{2}{\sigma}^{2}{\beta}_{1}^{2}<-{\beta}_{1}{\gamma}_{1}=|{\beta}_{1}|{\gamma}_{1}$, i.e. $\frac{1}{2}{\sigma}^{2}|{\beta}_{1}|<{\gamma}_{1}$ (**). Both hold when $\frac{1}{2}{\sigma}^{2}|{\beta}_{1}|<{\gamma}_{1}<{\sigma}^{2}|{\beta}_{1}|$. Thus, when ${\beta}_{1}<0$ and ${\gamma}_{1}>0$, the only situation that produces a consistent estimate is $\frac{1}{2}{\sigma}^{2}|{\beta}_{1}|<{\gamma}_{1}<{\sigma}^{2}|{\beta}_{1}|$.
- Next, consider the case ${\beta}_{1}>0$, ${\gamma}_{1}<0$. Condition (1) holds iff ${\sigma}^{2}{\beta}_{1}>|{\gamma}_{1}|$ (*), and condition (2) holds iff ${\beta}_{1}>\frac{{\sigma}^{2}{\beta}_{1}^{2}}{2|{\gamma}_{1}|}$, i.e. $|{\gamma}_{1}|>\frac{{\sigma}^{2}{\beta}_{1}}{2}$ (**). Both hold when $\frac{1}{2}{\sigma}^{2}{\beta}_{1}<|{\gamma}_{1}|<{\sigma}^{2}{\beta}_{1}$. Conditions (3) and (4) would require ${\sigma}^{2}{\beta}_{1}<|{\gamma}_{1}|$ and $|{\gamma}_{1}|<\frac{1}{2}{\sigma}^{2}{\beta}_{1}$ simultaneously, which is impossible. Thus, when ${\beta}_{1}>0$ and ${\gamma}_{1}<0$, the only situation that produces a consistent estimate is $\frac{1}{2}{\sigma}^{2}{\beta}_{1}<|{\gamma}_{1}|<{\sigma}^{2}{\beta}_{1}$.
- Finally, consider the case ${\beta}_{1}<0$, ${\gamma}_{1}<0$. Conditions (1) and (2) cannot both hold, but conditions (3) and (4) always do: ${\gamma}_{1}+{\sigma}^{2}{\beta}_{1}<0$ since both terms are negative, and ${\beta}_{1}+\frac{{\sigma}^{2}{\beta}_{1}^{2}}{2{\gamma}_{1}}<0$ since both terms are negative. Hence, when ${\beta}_{1}<0$ and ${\gamma}_{1}<0$, ${\beta}_{\mathit{RCH},1}$ is always consistent for ${\beta}_{1}$.

Conclusion:

- ${\beta}_{\mathit{RCH},1}$ is always consistent for ${\beta}_{1}$ when both ${\beta}_{1}$ and ${\gamma}_{1}$ are positive.
- ${\beta}_{\mathit{RCH},1}$ is always consistent for ${\beta}_{1}$ when both ${\beta}_{1}$ and ${\gamma}_{1}$ are negative.

When ${\beta}_{1}$ and ${\gamma}_{1}$ are of opposite signs, ${\beta}_{\mathit{RCH},1}$ is consistent for ${\beta}_{1}$ only when $\frac{1}{2}{\sigma}^{2}|{\beta}_{1}|<|{\gamma}_{1}|<{\sigma}^{2}|{\beta}_{1}|$.
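The case analysis in this proof can be checked numerically; the parameter values below are arbitrary:

```python
import math

# Consistency check for the closed-form correction (hats suppressed,
# as in the proof). From equation (9),
#   beta_RCH1 = [-g1 + sign(b11) * sqrt(g1^2 + 2*s2*b11s)] / s2,
# where b11s = b1*g1 + (b1^2)*s2/2 and b11 = b11s/g1.
def beta_rch1(b1, g1, s2):
    b11s = b1 * g1 + b1**2 * s2 / 2
    b11 = b11s / g1
    return (-g1 + math.copysign(1.0, b11) * math.sqrt(g1**2 + 2 * s2 * b11s)) / s2

s2 = 0.2

# Same-sign cases: always consistent
for b1, g1 in [(0.5, 0.8), (-0.5, -0.8)]:
    assert abs(beta_rch1(b1, g1, s2) - b1) < 1e-9

# Opposite signs: consistent only when (s2/2)*|b1| < |g1| < s2*|b1|
b1 = -2.0                      # here s2*|b1| = 0.4
print(beta_rch1(b1, 0.3, s2))  # 0.2 < g1 < 0.4: recovers b1 = -2.0
print(beta_rch1(b1, 0.8, s2))  # g1 outside the interval: inconsistent
```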

**Author Notes:** This study was supported by NIH grants CA50597, ES09411, and CA74112.

Donna Spiegelman, Harvard School of Public Health.

Roger Logan, Harvard School of Public Health.

Douglas Grove, Fred Hutchinson Cancer Research Center.

- Armstrong BG, Whittemore AS, et al. “Analysis Of Case-Control Data With Covariate Measurement Error - Application To Diet And Colon Cancer.” Statistics In Medicine. 1989;8(9):1151–1163. doi: 10.1002/sim.4780080916. [PubMed] [Cross Ref]
- Begun JM, Hall WJ, et al. “Information and asymptotic efficiency in parametric-nonparametric models.” Annals of Statistics. 1983;11:432–452. doi: 10.1214/aos/1176346151. [Cross Ref]
- Bishop YMM, Fienberg SE, et al. Discrete Multivariate Analyses: Theory and Practice. MIT Press; 1975. pp. 492–494.
- Box GEP, Cox DR. “An Analysis of Transformations.” Journal of the Royal Statistical Society, Series B. 1964;26(2):211–252.
- Carroll RJ, Ruppert D. Transformation and Weighting in Regression. London: Chapman and Hall; 1988.
- Carroll RJ, Ruppert D, et al. Measurement Error in Nonlinear Models. London: Chapman & Hall; 2006.
- Carroll RJ, Stefanski LA. “Approximate Quasi-likelihood Estimation in Models with Surrogate Predictors.” Journal of the American Statistical Association. 1990;85:652–663. doi: 10.2307/2290000. [Cross Ref]
- Carroll RJ, Wand MP. “Semiparametric Estimation in Logistic Measurement Error Models.” Journal of the Royal Statistical Society, Series B. 1991;53(3):573–585.
- Cook J, Stefanski LA. “A simulation extrapolation method for parametric measurement error models” Journal of the American Statistical Association. 1995;89:1314–1328. doi: 10.2307/2290994. [Cross Ref]
- Crouch EAC, Spiegelman D. “The Evaluation of Integrals of the Form $\int_{-\infty}^{+\infty} f(t)\exp(-{t}^{2})\,dt$: Application to Logistic-Normal Models.” Journal of the American Statistical Association. 1990;85(410):464–469. doi: 10.2307/2289785. [Cross Ref]
- Fleiss J. The design and analysis of clinical experiments. New York: Wiley; 1986.
- Fuller WA. Measurement Error Models. New York: Wiley; 1987.
- Fung KY, Krewski D. “On measurement error adjustment methods in Poisson regression” Environmetrics. 1999;10(2):213–224. doi: 10.1002/(SICI)1099-095X(199903/04)10:2<213::AID-ENV349>3.0.CO;2-B. [Cross Ref]
- Hu P, Tsiatis AA, et al. “Estimating the parameters in the Cox model when covariate variables are measured with error.” Biometrics. 1998;54(4):1407–1419. doi: 10.2307/2533667. [PubMed] [Cross Ref]
- Huang Y, Wang CY. “Cox regression with accurate covariates unascertainable: a nonparametric-correction approach” Journal of the American Statistical Association. 2000;95:1209–1219. doi: 10.2307/2669761. [Cross Ref]
- Kuha J. “Corrections for exposure measurement error in logistic regression models with an application to nutritional data” Stat Med. 1994;13(11):1135–1148. doi: 10.1002/sim.4780131105. [PubMed] [Cross Ref]
- Kukush A, Schneeweis H, et al. “Three estimators for the Poisson regression model with measurement errors.” Statistical Papers. 2004;45(3):351–368. doi: 10.1007/BF02777577. [Cross Ref]
- Lee JE, Hunter DJ, et al. “Alcohol intake and renal cell cancer in a pooled analysis of 12 prospective studies.” Journal of the National Cancer Institute. 2007;99(10):801–810. doi: 10.1093/jnci/djk181. [PubMed] [Cross Ref]
- Preis SR, Spiegelman D, et al. “Random and correlated errors in gold standards used in nutritional epidemiology: implications for validation studies.” American Journal of Epidemiology. 2010
**In press.** - Prentice RL. “Covariate measurement errors and parameter estimation in a failure time regression model” Biometrika. 1982;69:331–342. doi: 10.1093/biomet/69.2.331. [Cross Ref]
- Robins JM, Hsieh F, et al. “Semi-parametric efficient estimation of a conditional density with missing or mis-measured covariates.” Journal of the Royal Statistical Society, Series B. 1995;57:409–424.
- Rosner B, Spiegelman D, et al. “Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error.” Am J Epidemiol. 1990;132(4):734–745. [PubMed]
- Rosner B, Spiegelman D, et al. “Correction of logistic regression relative risk estimates and confidence intervals for random within-person measurement error.” Am J Epidemiol. 1992;136(11):1400–1413. [PubMed]
- Rosner B, Willett WC, et al. “Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error.” Stat Med. 1989;8(9):1051–1069. doi: 10.1002/sim.4780080905. discussion 1071–1053. [PubMed] [Cross Ref]
- Seber G. Linear Regression Analysis. New York: Wiley & Sons; 1977.
- Spiegelman D, Carroll RJ, et al. “Efficient regression calibration for logistic regression in main study/internal validation study designs with an imperfect reference instrument.” Statistics In Medicine. 2001;20(1):139–160. doi: 10.1002/1097-0258(20010115)20:1<139::AID-SIM644>3.0.CO;2-K. [PubMed] [Cross Ref]
- Spiegelman D, Casella M. “Fully parametric and semi-parametric regression models for common events with covariate measurement error in main study/validation study designs” Biometrics. 1997;53(2):395–409. doi: 10.2307/2533945. [PubMed] [Cross Ref]
- Spiegelman D, Gray R. “Cost-efficient Study Designs for Binary Response Data with Gaussian Covariate Measurement Error.” Biometrics. 1991;47(3):851–869. doi: 10.2307/2532644. [PubMed] [Cross Ref]
- Spiegelman D, McDermott A, et al. “Regression calibration method for correcting measurement-error bias in nutritional epidemiology.” Am J Clin Nutr. 1997;65(4 Suppl):1179S–1186S. [PubMed]
- Spiegelman D, Rosner B, et al. “Estimation and inference for logistic regression with covariate misclassification and measurement error in main study/validation study designs.” Journal of the American Statistical Association. 2000;95:51–61. doi: 10.2307/2669522. [Cross Ref]
- Spiegelman D, Valanis B. “Correcting for bias in relative risk estimates due to exposure measurement error: A case study of occupational exposure to antineoplastics in pharmacists” American Journal of Public Health. 1998;88(3):406–412. doi: 10.2105/AJPH.88.3.406. [PubMed] [Cross Ref]
- Subar AF, Kipnis V, et al. “Using intake biomarkers to evaluate the extent of dietary misreporting in a large sample of adults: the OPEN study.” Am J Epidemiol. 2003;158(1):1–13. doi: 10.1093/aje/kwg092. [PubMed] [Cross Ref]
- Valanis BG, Vollmer WM, et al. “Association of antineoplastic drug handling with acute adverse effects in pharmacy personnel.” Am J Hosp Pharm. 1993;50:445–462. [PubMed]
- Van Roosbroeck S, Li R, et al. “Traffic-related outdoor air pollution and respiratory symptoms in children: the impact of adjustment for exposure measurement error” Epidemiology. 2008;19:409–416. doi: 10.1097/EDE.0b013e3181673bab. [PubMed] [Cross Ref]
- Wang CY, Hsu L, et al. “Regression calibration in failure time regression.” Biometrics. 1997;53(1):131–145. doi: 10.2307/2533103. [PubMed] [Cross Ref]
- Weller EA, Milton DK, et al. “Regression calibration for logistic regression with multiple surrogates for one exposure.” Journal of Statistical Planning and Inference. 2007;137(2):449–461. doi: 10.1016/j.jspi.2006.01.009. [Cross Ref]
- White EJ, Armstrong BK, et al. Principles of Exposure Measurement in Epidemiology: Collecting, Evaluating and Improving Measures of Disease Risk Factors. New York: Oxford University Press; 2006.
- Willett WC, Sampson L, et al. “Reproducibility and validity of a semiquantitative food frequency questionnaire.” American Journal of Epidemiology. 1985;122:51–65. [PubMed]
- Willett WC, Stampfer MJ, et al. “Moderate Alcohol Consumption and the Risk of Breast Cancer.” New England Journal of Medicine. 1987;316(19):1174–1180. doi: 10.1056/NEJM198705073161902. [PubMed] [Cross Ref]

