Article sections

- SUMMARY
- 1. INTRODUCTION
- 2. METHODS
- 3. IMPLEMENTATION
- 4. SIMULATIONS
- 5. ASYMPTOTIC VARIANCES IN A SIMPLIFIED SITUATION
- 6. APPLICATION TO AARP STUDY
- 7. DISCUSSION
- References

Stat Med. Author manuscript; available in PMC 2009 May 2.

Published in final edited form as:

PMCID: PMC2676235

NIHMSID: NIHMS101420

Laurence S. Freedman,^{1,}^{*}^{†} Douglas Midthune,^{2} Raymond J. Carroll,^{3} and Victor Kipnis^{2}

Regression calibration (RC) is a popular method for estimating regression coefficients when one or more continuous explanatory variables, *X*, are measured with error. In this method, the mismeasured covariate, *W*, is replaced by the expectation *E*(*X*|*W*), based on the assumption that the error in the measurement of *X* is non-differential. Using simulations, we compare three versions of RC with two other ‘substitution’ methods, moment reconstruction (MR) and imputation (IM), neither of which relies on the non-differential error assumption. We investigate studies that have an internal calibration sub-study. For RC, we consider (i) the usual version of RC, (ii) RC applied only to the ‘marker’ information in the calibration study, and (iii) an ‘efficient’ version (ERC) in which the estimators (i) and (ii) are combined. Our results show that ERC is preferable when there is non-differential measurement error. Under this condition, there are cases where ERC is less efficient than MR or IM, but they rarely occur in epidemiology. We show that the efficiency gain of usual RC and ERC over the other methods can sometimes be dramatic. The usual version of RC carries similar efficiency gains to ERC over MR and IM, but becomes unstable as measurement error becomes large, leading to bias and poor precision. When differential measurement error does pertain, MR and IM have considerably less bias than RC, but can have much larger variance. We demonstrate our findings with an analysis of dietary fat intake and mortality in a large cohort study.

The problem of error in the measurement of covariates to be used in a regression analysis is often important in epidemiological research, where accurate measurements are commonly difficult to achieve. It is now well understood that such error can cause bias in the estimates of regression coefficients, and a large collection of special methods for eliminating such bias is available [1].

Regression calibration (RC) [2, 3] has been one of the most commonly used methods [4]. One of the principal advantages of RC is its simplicity. In place of each individual’s mismeasured covariate one uses a substitute value and then runs the same estimation procedure as would be used with a precisely measured covariate. The substituted value is the expected value of the true measurement conditional on the observed measurement and other exactly measured covariates in the model.

Recently, it has been realized that two other ‘substitution’ methods are available that would possibly share these advantages enjoyed by RC. The first is moment reconstruction (MR), proposed by Freedman *et al.* [5]. The second is imputation (IM) or multiple IM [6], a class of methods that is well established for handling missing data, but that has been proposed for dealing with errors of measurement in covariates [7]. Cole *et al.* [8] applied this method to an epidemiological study with a binary outcome and a mismeasured binary covariate. MR and IM both allow the covariate measurement error to be differential, that is, informative about the outcome variable, whereas RC requires the measurement error to be non-differential. All three substitution methods require information about the measurement error, usually obtained from a calibration study, in which the mismeasured covariate is supplemented by a reference measurement.

In this paper we study and compare these three methods in the context of adjusting for measurement error in dietary intakes, in studies relating diet to disease. We investigate studies that have an internal calibration sub-study. For RC, we consider three variants: (i) the usual version of RC; (ii) RC applied only to the reference measurements in the calibration study; and (iii) an ‘efficient’ version of RC [9] in which the first two estimators are combined in an efficient manner.

We compare the methods primarily through simulations, using models motivated by nutritional epidemiology. We also give a practical illustration of the methods applied to data from the NIH-AARP Diet and Health Cohort study.

Our results show that ‘efficient’ RC (ERC) is preferable among the methods we compare when there is no reason to suspect differential measurement error. Under this condition, there are cases where ERC is less efficient than MR or IM, but they rarely occur in epidemiology. We show that the efficiency gain of ERC over MR or IM can sometimes be dramatic, and that the price paid by the other methods for relaxing the assumption of non-differential measurement error is high. The usual version of RC carries similar efficiency gains to ERC over MR and IM, but becomes unstable when the measurement error is large, leading to bias and poor precision. When differential measurement error does pertain, then MR and IM have considerably less bias than RC and ERC, but can have much larger variance.

In this section we describe the methods that we will investigate. We consider the following situation. Let *Y* be the disease variable. We will allow *Y* to be either continuous, as for a disease marker, or binary, as for a disease indicator. Let *X* be the exposure variable(s) of interest. We would like to measure *X* exactly, but are unable to do so. Instead we measure *W*, which is *X* measured with error.

The statistical models linking *Y*, *X* and *W* will consist of two parts, the disease model linking *Y* and *X*, and the measurement error model linking *X* and *W*. When the measurement error is differential, then the latter model will also include *Y*.

We express the disease model as

$$E(Y \mid X) = h(\beta_0 + \beta_X X) \qquad (1)$$

where *h* is either the identity function when *Y* is continuous or the logistic function when *Y* is binary. Our aim is to estimate *β _{X}* as well as possible. In many applications, there will also be covariates in the model that are measured without error. However, the insights that we hope to gain from our study do not require their inclusion here.

We express the non-differential measurement error model as

$$W = \gamma_0 + \gamma_X X + \delta \qquad (2)$$

where *δ* is a residual error with zero expectation that is independent of *X* and *Y*. Such a model has been called the non-classical measurement error model to distinguish it from the classical measurement error model, where *γ*_{0} = 0 and *γ _{X}* = 1. The model is motivated by dietary self-report data that appear to conform to it after a suitable transformation [10]. Later in this paper we discuss calibration studies that may be conducted to estimate the parameters of model (2), which must be known in order to implement the three statistical methods that we now describe. We will also consider differential measurement error models, in which equation (2) includes some dependence on *Y*.
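To make the effect of the measurement error concrete, the following short derivation may help. It is a sketch that assumes the linear case, with the disease model written as $Y = \beta_0 + \beta_X X + \varepsilon$ and the measurement error model as $W = \gamma_0 + \gamma_X X + \delta$, where $\varepsilon$ and $\delta$ are uncorrelated with *X* and with each other. The symbol $\lambda$ is introduced here for illustration only.

```latex
\begin{align*}
\frac{\operatorname{cov}(Y, W)}{\operatorname{var}(W)}
  &= \frac{\beta_X \gamma_X \operatorname{var}(X)}
          {\gamma_X^{2} \operatorname{var}(X) + \operatorname{var}(\delta)}
   = \lambda\, \beta_X ,\\[4pt]
\lambda
  &= \frac{\operatorname{cov}(X, W)}{\operatorname{var}(W)}
   = \frac{\gamma_X \operatorname{var}(X)}
          {\gamma_X^{2} \operatorname{var}(X) + \operatorname{var}(\delta)} .
\end{align*}
```

Under these assumptions the naïve regression of *Y* on *W* estimates $\lambda \beta_X$ rather than $\beta_X$. For example, with $\gamma_X = 0.5$ and, purely as an illustrative assumption, $\operatorname{var}(X) = \operatorname{var}(\delta) = 1$, we get $\lambda = 0.5/1.25 = 0.4$, so a true slope of 0.3 would be estimated as about 0.12.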

In this method we first estimate the quantity

$$X_{\mathrm{RC}}(W) = E(X \mid W) \qquad (3)$$

and then substitute this quantity into the regression model (1) in place of the unknown *X*, so as to estimate *β _{X}*. Under the assumption of non-differential measurement error (i.e.

In this method the aim is to find a quantity *X*_{MR}(*W, Y*) that has the same distribution as *X* and then substitute this quantity into the regression model (1) in place of the unknown *X*, so as to estimate *β _{X}*. Standard errors for the estimate of

where *G* = {cov(*X*|*Y*)}^{1/2}{cov(*W*|*Y*)}^{−1/2}. However, this expression was based on the assumption that *W* follows a classical measurement error model, in which case *E*(*X*|*Y*)= *E*(*W*|*Y*). For non-classical measurement error *E*(*X*|*Y*) ≠ *E*(*W*|*Y*), so that a modification is needed to the expression for *X*_{MR}(*W, Y*) so as to preserve the first-moment relationship *E*[*X*_{MR}(*W, Y*)] = *E*(*X*). This is achieved by modifying the definition to

$$X_{\mathrm{MR}}(W, Y) = E(X \mid Y) + G\{W - E(W \mid Y)\} \qquad (4)$$

with *G* defined exactly as before, from which the desired equality of the first two moments follows immediately on taking expectations conditional on *Y*.

Freedman *et al*. [5] demonstrated that when the measurement error model parameters are known, MR is equivalent to RC in linear regression, and in logistic regression with normally distributed covariates the MR estimate of *β _{X}* is, unlike RC, consistent. A further potential benefit of MR is that the conditioning on

In this method we estimate the quantity *E*(*X*|*W,Y*) and then compute

$$X_{\mathrm{IM}}(W, Y) = E(X \mid W, Y) + e \qquad (5)$$

where *e* is a random draw from the distribution of residuals from the regression of *X* on *W* and *Y*. We then substitute this quantity into the regression model (1) in place of the unknown *X*, so as to estimate *β _{X}*. Similar to MR, the method can accommodate differential measurement error since both methods condition on

Each method has several variants that can be considered for use. We have chosen to report on variants that we believe are practical and make efficient use of the available data. The details of each variant will be specified below.

The description of methods in Section 2 is quite general. However, implementation of the methods requires estimation of the measurement error model parameters, based on a calibration study. We consider here the case of an internal calibration study, where the subjects are a random sample of those in the main study sample. We assume that there is a single explanatory variable *X* that is measured unbiasedly by a ‘marker’ *M* in the calibration study. The measurement of *M* is considerably more expensive than that of *W* and can be performed only in the smaller calibration study but not in the main study sample. *M* is related to *X* by the classical measurement error model:

$$M = X + u \qquad (6)$$

where *u* is a random error with zero expectation, independent of *X*, *W* and *Y*. We assume that *M* is measured twice on each person in the calibration study and that the random errors *u* for the two measurements are independent, so that the variance of *u* can be estimated. We consider the calibration study design in which *W* and two values of *M* and the disease variable *Y* are measured on each person. If *Y* is not measured in the calibration study, then the MR and IM methods are not directly available since they require estimates of moments of *X* conditional on *Y*. Indirect versions of MR and IM, based on the assumption of the non-differential measurement error, can be constructed in these circumstances, but we will not pursue these here, deferring comment on these until Section 7.
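The role of the two marker measurements can be sketched in code. The example below is illustrative only: it assumes model (6) in the form M = X + u with independent errors across the two replicates, so that var(M2 − M1) = 2 var(u) and var(X) = var(mean of M1, M2) − var(u)/2. The function names, the sample size and the choice var(X) = 1 are assumptions made for the demonstration, not part of the paper.

```python
import random
import statistics


def marker_error_variance(m1, m2):
    """Estimate var(u) from replicate markers: var(M2 - M1) / 2."""
    diffs = [b - a for a, b in zip(m1, m2)]
    return statistics.variance(diffs) / 2.0


def exposure_variance(m1, m2):
    """Estimate var(X) as var(Mbar) - var(u)/2, Mbar being the replicate mean."""
    means = [(a + b) / 2.0 for a, b in zip(m1, m2)]
    return statistics.variance(means) - marker_error_variance(m1, m2) / 2.0


# Hypothetical data: var(X) = 1 is assumed here; var(u) = 1 as in the simulations.
rng = random.Random(0)
n = 50_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
m1 = [xi + rng.gauss(0.0, 1.0) for xi in x]
m2 = [xi + rng.gauss(0.0, 1.0) for xi in x]

var_u_hat = marker_error_variance(m1, m2)
var_x_hat = exposure_variance(m1, m2)
print(round(var_u_hat, 2), round(var_x_hat, 2))
```

A single marker measurement per person would not allow var(u) to be separated from var(X), which is why the design with two replicates is assumed.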

To simplify the description of the implementation we assume that (*Y, X, W, M*) has a multivariate normal distribution. In the case where *Y* is binary, we assume that (*X, W, M*) has a multivariate normal distribution conditional on *Y*, and also marginally, so that all subsidiary regressions required for our methods are linear. While the conditional and marginal normality assumptions cannot hold simultaneously, they can be approximately true simultaneously when the disease (*Y* = 1) is rare, which we will assume.

We consider here three separate estimates that have previously been termed RC in the literature. The first estimate, $\hat{\beta}_{X,\mathrm{RC1}}$, is the standard RC estimate when the calibration study is external to the main study.

The second estimate, $\hat{\beta}_{X,\mathrm{RC2}}$, is obtained from the RC estimate based on the individuals in the calibration study using their outcome values, *Y*, and their repeated marker values, *M*_{1} and *M*_{2}, that are assumed to follow the classical measurement error model. This estimate is not usually considered, but is of interest in its own right since it does not employ values of *W*, and therefore is valid when the measurement error in *W* is differential. To distinguish it from the other methods, we call this method ‘calibration study RC’.

The third estimate, proposed by Spiegelman *et al.* [9], is a weighted average of $\hat{\beta}_{X,\mathrm{RC1}}$ and $\hat{\beta}_{X,\mathrm{RC2}}$. The two estimates are weighted by the inverse of their estimated variances; see Appendix A for details. It is expected that this estimate will be more efficient than both $\hat{\beta}_{X,\mathrm{RC1}}$ and $\hat{\beta}_{X,\mathrm{RC2}}$, and it has been termed ERC [9].
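The inverse-variance weighting described above can be sketched as follows. This is a minimal illustration that treats the two estimates as uncorrelated; the full procedure in Appendix A may differ in detail, and the function name and the numbers are hypothetical.

```python
def inverse_variance_combine(est1, var1, est2, var2):
    """Combine two estimates with weights proportional to 1/variance.

    Assumes the two estimates are uncorrelated (an assumption of this sketch).
    Returns the combined estimate and its variance under that assumption.
    """
    w1 = 1.0 / var1
    w2 = 1.0 / var2
    combined = (w1 * est1 + w2 * est2) / (w1 + w2)
    combined_var = 1.0 / (w1 + w2)
    return combined, combined_var


# Hypothetical inputs: an RC1-type estimate 0.30 (variance 0.01)
# and an RC2-type estimate 0.40 (variance 0.04).
erc_est, erc_var = inverse_variance_combine(0.30, 0.01, 0.40, 0.04)
print(erc_est, erc_var)
```

With these inputs the weights are 100 and 25, so the combined estimate is 0.32 with variance 0.008, smaller than either component variance. The less precise estimate is pulled toward the more precise one.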

We will denote the above three methods by RC1, RC2 and ERC, respectively. Note that the RC1 and ERC estimates are based on the assumption of non-differential measurement error.

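*X*_{MR}(*W, Y*) = *E*(*X*|*Y*) + *G*{*W* − *E*(*W*|*Y*)} may be calculated as follows. *E*(*W*|*Y*) and var(*W*|*Y*) are estimated from the main study. *E*(*X*|*Y*) and var(*X*|*Y*) are estimated from the calibration study data, via the regression of $\bar{M}$ on *Y*, using $\hat{E}(X \mid Y) = \hat{E}(\bar{M} \mid Y)$ and $\widehat{\mathrm{var}}(X \mid Y) = \widehat{\mathrm{var}}(\bar{M} \mid Y) - \widehat{\mathrm{var}}(u)/2$, where $\bar{M} = (M_1 + M_2)/2$ and $\widehat{\mathrm{var}}(u) = \widehat{\mathrm{var}}(M_2 - M_1)/2$. *G* is then estimated by $\hat{G} = \{\widehat{\mathrm{var}}(X \mid Y)/\widehat{\mathrm{var}}(W \mid Y)\}^{1/2}$, and *X*_{MR} is calculated for each individual in the main study. Finally, we estimate $\hat{\beta}_{X,\mathrm{MR}}$ as the coefficient of *X*_{MR} in the regression of *Y* on *X*_{MR} in the main study sample.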
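The MR calculation can be sketched for a continuous outcome as below. This is an illustration under stated assumptions, not the authors' code: all conditional expectations are taken to be linear in *Y*, the marker follows M = X + u with independent replicate errors, the calibration study is the first `n_cal` subjects of the main study, and the parameter values (including var(X) = 1 and the sample sizes) are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Least-squares fit of y on x: (intercept, slope, residual variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, rss / (n - 2)


def moment_reconstruction(w_main, y_main, m1_cal, m2_cal, y_cal):
    """Return X_MR = E(X|Y) + G {W - E(W|Y)} for every main-study subject."""
    a_w, b_w, var_w_y = fit_line(y_main, w_main)        # E(W|Y), var(W|Y)
    mbar = [(a + b) / 2.0 for a, b in zip(m1_cal, m2_cal)]
    a_x, b_x, var_mbar_y = fit_line(y_cal, mbar)        # E(X|Y) via Mbar
    diffs = [b - a for a, b in zip(m1_cal, m2_cal)]
    md = sum(diffs) / len(diffs)
    var_u = sum((d - md) ** 2 for d in diffs) / (len(diffs) - 1) / 2.0
    var_x_y = var_mbar_y - var_u / 2.0                  # var(X|Y)
    if var_x_y <= 0:
        raise ValueError("estimated var(X|Y) is not positive")
    g = math.sqrt(var_x_y / var_w_y)
    return [a_x + b_x * y + g * (w - (a_w + b_w * y))
            for w, y in zip(w_main, y_main)]


# Hypothetical large-sample check with beta_X = 0.3, gamma_X = 0.5.
rng = random.Random(1)
n_main, n_cal = 40_000, 10_000
x = [rng.gauss(0, 1) for _ in range(n_main)]
y = [0.3 * xi + rng.gauss(0, 1) for xi in x]
w = [0.5 * xi + rng.gauss(0, 1) for xi in x]
m1 = [xi + rng.gauss(0, 1) for xi in x[:n_cal]]
m2 = [xi + rng.gauss(0, 1) for xi in x[:n_cal]]

x_mr = moment_reconstruction(w, y, m1, m2, y[:n_cal])
beta_mr = fit_line(x_mr, y)[1]      # slope of Y on X_MR
beta_naive = fit_line(w, y)[1]      # slope of Y on W, for comparison
print(round(beta_naive, 3), round(beta_mr, 3))
```

The guard on the estimated var(X|Y) matters in small calibration studies, where the difference of two variance estimates can turn out negative.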

We follow the general approach described in Appendix 2 of Cole *et al*. [8]. For each person in the main study sample who is not also in the calibration study we impute *X* using *X*_{IM}(*W, Y*) = *E*(*X*|*W,Y*)+*e*, whereas for persons in the calibration study, we impute using $X_{\mathrm{IM}}(W, Y, \bar{M}) = E(X \mid W, Y, \bar{M}) + e^{*}$, where $\bar{M}$ is the mean of the two marker values. In these formulas, *e* is a random draw from the distribution of residuals in the regression of *X* on (*W,Y*), whereas *e*^{*} is a random draw from the distribution of residuals in the regression of *X* on $(W, Y, \bar{M})$. We repeat the procedure *K* times, thereby creating a total of *K* imputed sets of covariates $X_{\mathrm{IM}}^{(k)}$, $k = 1, \ldots, K$. For each *k* from 1 to *K*, we then regress *Y* on $X_{\mathrm{IM}}^{(k)}$ in the main study to obtain the estimate $\hat{\beta}_{X}^{(k)}$ and the naïve model-based estimate of its variance, $\widehat{\mathrm{var}}(\hat{\beta}_{X}^{(k)})$, that ignores the fact that *X* was imputed.

Finally, we estimate *β _{X}* as

and we estimate var($\hat{\beta}_{X,\mathrm{IM}}$) as

The full details of the procedure are described in Appendix A. Cole *et al*. [8] in their Appendix 2 comment that their IM procedure is ‘proper’ in the sense of Little and Rubin [6, p. 214]. The method is expected to give unbiased estimates of the model parameters, and confidence intervals with coverage probabilities at the nominal level.
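The combination of the *K* per-imputation results can be sketched as follows. The sketch assumes the standard multiple-imputation combining rules (mean of the estimates; within-imputation variance plus (1 + 1/K) times the between-imputation variance). The exact expressions used in the paper are not reproduced in this text, so this should be read as the conventional approach rather than a transcription. The function name and numbers are hypothetical.

```python
import statistics


def combine_imputations(estimates, naive_variances):
    """Pool K per-imputation estimates using the standard combining rules.

    estimates:        the K coefficient estimates, one per imputed data set
    naive_variances:  the K model-based variances that ignore imputation
    """
    k = len(estimates)
    pooled = statistics.mean(estimates)
    within = statistics.mean(naive_variances)      # average naive variance
    between = statistics.variance(estimates)       # spread across imputations
    total_var = within + (1.0 + 1.0 / k) * between
    return pooled, total_var


# Hypothetical results from K = 4 imputed data sets.
beta_im, var_im = combine_imputations(
    [0.28, 0.30, 0.32, 0.30], [0.01, 0.01, 0.01, 0.01]
)
print(beta_im, var_im)
```

The between-imputation term is what the naïve model-based variances miss: it reflects the extra uncertainty from not having observed *X*.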

We investigated both *K* = 10 and 40, the value suggested by Cole *et al*. [8], in our simulations. Parameter estimates with both *K* = 10 and 40 were essentially unbiased. However, the coverage properties of the confidence intervals with *K* = 40 follow the nominal level, whereas with *K* = 10 the coverage is slightly below the nominal level (see Section 4). The results in our tables (which deal with bias and precision of the estimates and not with confidence interval coverage) are those based on *K* = 10.

It will become apparent when considering the simulation results given below that the above five estimation methods RC1, RC2, ERC, MR and IM actually fall into three classes. Methods RC2, MR and IM derive all or most of their information from the marker (*M*) and disease (*Y*) measurements in the internal calibration study. Method RC1 derives most of its information from the exposure (*W*) and disease (*Y*) measurements in the main study. Method ERC combines these two types of information in an efficient manner. When viewed in this manner, one could also ask how efficient combinations of RC1 and MR, or of RC1 and IM, would perform. We content ourselves with studying just one combination method (ERC) in this paper, choosing the method that has already appeared in the literature [9].

In this section, unless stated otherwise, we take as given that each method yields unbiased or nearly unbiased estimates of *β _{X}*. Our main interest is therefore in the precision of the methods, and we use simulations to compare them. We begin with a simulation where the disease model is a linear regression.

We generated data according to the following model parameters and conditions:

*Disease model* (1): Link function *h*^{−1} = Identity; *β*_{0}=0; *β _{X}* =0.3 or 0.6; var(

*Measurement error model* (2): *γ*_{0} = 0; *γ _{X}* = 0.5, 0.75 or 1; var(

‘*Marker*’ (*M*) *model* (6): var(*u*)= 1.0.

*Main study*: sample size (*N*)= 1000; one measurement of *Y* and *W* per person.

*Calibration study*: sample size (*n*)= 100; one measurement of *Y* and *W*, two measurements of *M* per person.

*Number of simulations per scenario*: 1000.

*Methods of estimating β _{X}*: RC1, RC2, ERC, MR and IM.

By varying the regression slope in the disease model, and the error variance in the measurement error model, we compare cases where the disease–diet relationship is strong (larger slope) and weak (smaller slope), and where there is a large measurement error (higher error variance) and a small measurement error (lower error variance).

Note also that we do not yet include differential measurement error in these first simulations.
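A scaled-down version of this linear simulation can be sketched as below, comparing the naïve estimate with standard RC (RC1). Several values are assumptions of the sketch because they are not legible in the text above: var(X) = 1, residual variance of *Y* equal to 1, and var(δ) = 1. The calibration study is taken to be the first `n_cal` subjects, only 200 replications are run to keep the example fast, and all function names are hypothetical.

```python
import math
import random
import statistics


def slope_and_intercept(x, y):
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate_once(rng, beta_x=0.3, gamma_x=0.5, var_delta=1.0,
                  n_main=1000, n_cal=100):
    """One replication: returns (naive estimate, RC1 estimate)."""
    x = [rng.gauss(0, 1) for _ in range(n_main)]              # assumed var(X) = 1
    y = [beta_x * xi + rng.gauss(0, 1) for xi in x]           # assumed var(eps) = 1
    w = [gamma_x * xi + rng.gauss(0, math.sqrt(var_delta)) for xi in x]
    # Mean of two marker replicates, var(u) = 1, calibration subjects only.
    mbar = [xi + (rng.gauss(0, 1) + rng.gauss(0, 1)) / 2 for xi in x[:n_cal]]

    naive, _ = slope_and_intercept(w, y)
    # Calibration equation E(X|W) = alpha + lam * W, fitted via Mbar on W.
    lam, alpha = slope_and_intercept(w[:n_cal], mbar)
    x_rc = [alpha + lam * wi for wi in w]
    rc1, _ = slope_and_intercept(x_rc, y)
    return naive, rc1


rng = random.Random(2024)
results = [simulate_once(rng) for _ in range(200)]
naive_mean = statistics.mean(r[0] for r in results)
rc1_median = statistics.median(r[1] for r in results)
rc1_sd = statistics.stdev(r[1] for r in results)
print(round(naive_mean, 3), round(rc1_median, 3), round(rc1_sd, 3))
```

The median is reported for RC1 because the estimate is a ratio whose denominator (the calibration slope) is estimated from only 100 subjects, so occasional replications can give extreme values.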

The results of these simulations are shown in Table I. The table shows the precision of $\hat{\beta}_{X}$ for the various methods in the 10 (2×5) different scenarios. Examination of the table shows that the naïve method was very biased, whereas the RC2, ERC, MR and IM methods all had little or no bias. When the measurement error was large, the ERC estimate was biased downward, but by less than 10 per cent in our examples. The precision of the ERC estimate was generally greater than or similar to that of the RC2, MR and IM estimates. For example, when

Empirical means of $\hat{\beta}_{X}$ (with empirical standard deviations in parentheses) in 1000 simulations where the disease model is linear, estimated by naïve linear regression of *Y* on *W*, regression calibration for *W* in the main study (RC1), **...**

The table also shows the advantage of ERC over the standard RC estimate, RC1. It may be seen that the latter becomes badly biased with inflated standard error as the measurement error increases. However, its combination with the RC2 estimate (yielding the ERC estimate) stabilizes the estimation procedure and yields standard errors considerably smaller than those of the component parts. Note that the good results for ERC occur partly because the bias of RC1 and its standard error increase together as the measurement error increases. As ERC is a weighted average of RC1 and RC2 with weights equal to the inverse of their variance, the RC1 estimate has an ever-decreasing influence on ERC as its bias grows.

We also examined the empirical coverage of the multiple IM estimated confidence interval for *β _{X}*. With 40 multiply imputed data sets, the coverage was close to the nominal 95 per cent for all 10 scenarios considered in Table I (range: 94.1–95.7 per cent). With only 10 multiply imputed data sets, the coverage was slightly below the nominal level (range: 91.0–93.8 per cent). Full data are available from the authors.

We consider two types of study that may be analyzed by logistic regression: case–control and cohort studies. In case–control studies dietary ascertainment is done after the occurrence (or not) of the disease, resulting in greater opportunity for differential error. Therefore, for this design we simulate scenarios with differential error as well as non-differential error. In cohort studies dietary intake is assessed before any occurrence of the disease, and differential error is much less likely. Therefore, for this design we simulate only non-differential error. Moreover, the sample sizes of these two designs are typically very different. We therefore generated data according to the following model parameters and conditions:

*Disease model* (1): Link function *h*^{−1} = Logistic.

Cohort: *β*_{0} = −2.2; *β _{X}* = 0.3 or 0.6; var(

Case–control: Data are generated from the above cohort model, and cases and controls are randomly selected so that there are equal numbers of each in the case–control study.

*Measurement error model* (2):

Cohort, non-differential error (*δ* independent of *Y*): As for linear regression, *γ*_{0} = 0, *γ _{X}* = 0.5, var(

Case–control, non-differential error: As above.

Case–control, differential error: Three simulations with *β _{X}* = 0.3, as follows (the extra suffix in the symbols below denotes case/control status, with 0= control and 1= case):

‘*Marker*’ (*M*) *model* (6): var(*u*)= 1, as for linear regression above.

Main study

Cohort: sample size (*N*)= 100000; one measurement of *Y* and *W* per person.

Case–control: sample size (*N*)= 1000 cases and 1000 controls; one measurement of *Y* and *W* per person.

Calibration study

Cohort: sample size (*n*)= 1000; one measurement of *Y* and *W*, two measurements of *M* per person.

Case–control: sample size (*n*)= 100 cases and 100 controls; one measurement of *Y* and *W*, two measurements of *M* per person.

*Number of simulations per scenario:* 1000

*Methods of estimating β _{X}*: RC1, RC2, ERC, MR, IM.

The results are shown in Table II and show similar trends to those seen in Table I. For cohort studies with non-differential measurement error, the RC1 and ERC estimates are more precise than those of the RC2, MR and IM methods, sometimes dramatically so. The RC1 and ERC estimates are subject to mild bias especially for larger exposure effects (*β _{X}* = 0.6), but in our simulations the bias was less than 5 per cent of the estimate. For larger exposure effects (

Empirical means of $\hat{\beta}_{X}$ (with empirical standard deviations in parentheses) in 1000 simulations when the disease model is logistic, estimated by different methods: naïve logistic regression of *Y* on *W*, regression calibration for **...**

For case–control studies with non-differential measurement error, ERC is once again more efficient than the RC2, MR and IM methods. The bias in the ERC estimate is a little higher than that in the cohort study simulations but still remains below 10 per cent of the estimate. When the measurement error is large (var(*δ*)= 4), the advantage of ERC over MR and IM lessens, but the mean-squared error of the ERC estimate remains smaller than that of the MR and IM estimates. Note that in these simulations standard RC (RC1) is quite biased, particularly as the exposure effect and the measurement error increase.

For case–control studies with differential measurement error, Table II shows that RC1 and ERC estimates have considerable bias, but will often have smaller variance than the almost unbiased RC2, MR and IM methods. In these situations the trade-off between bias and precision will have to be weighed.

In these simulations, we again examined the empirical coverage of the multiple IM estimated confidence interval for *β _{X}* for the six scenarios of case–control studies with non-differential measurement error that are listed in Table II. As with the linear regression model, we found that with 40 multiply imputed data sets the coverage was close to the nominal 95 per cent (range: 94.2–94.9 per cent), but that with only 10 multiply imputed data sets, the coverage was slightly below the nominal level (range: 92.0–94.3 per cent).

Some aspects of the results presented in Section 4 were surprising to us and not easily understood intuitively. It was particularly surprising that, according to Tables I and II, the precision of estimates from the MR and IM methods appeared insensitive to the measurement error model parameters, in contrast to the RC1 and ERC estimates. In order to gain better insight, and also as a check on our results, we developed asymptotic expressions for the standard error of $\hat{\beta}_{X}$ in linear regression for each of the methods, in a slightly simpler context than our simulations, where

Standard regression calibration (RC1):

Calibration study RC (RC2):

ERC:

Moment reconstruction (MR):

where *ρ _{XY}* is the correlation coefficient between

Imputation (IM):

where
is the multiple correlation coefficient between *X* and (*Y, W*),
, and, under the assumption of non-differential error,
.

These formulas confirm that the asymptotic standard error for the RC1 and ERC estimators are dependent on parameters *ρ _{XW}* and

Although the asymptotic standard error of the IM estimator does involve *ρ _{XW}* (through the expression
), one can in fact show that it is bounded above by the asymptotic standard error of the MR estimator. This follows directly from the observation that
. In the worst-case scenario that

Tabulating these expressions for different values of *ρ _{XW}* and

The NIH-AARP Diet and Health study is a large cohort consisting of 550 644 individuals (325 176 men and 225 468 women) over the age of 50 years, who completed a food frequency questionnaire (FFQ) in 1995–1996 and have since been followed for mortality and cancer incidence. Details of the study are provided by Schatzkin *et al*. [11]. We examine the question whether dietary fat intake is related to mortality. At the time of the analysis, subjects had been followed for a median of 9.6 years, and 65 168 subjects (44 445 men and 20 723 women) had died.

The internal calibration sub-study comprised 1953 subjects (987 men and 966 women), who in addition to completing the FFQ also completed at least one of the two 24-h recalls (24HR) (1890 completed both), to be used as reference measurements. Details of the calibration sub-study are provided by Thompson *et al*. [12]. At the time of the analysis, 208 subjects (114 men and 94 women) in the calibration sub-study had died.

For illustration of our methods, we estimate the parameters in a logistic regression of mortality (*Y*) on the logarithm of per cent calories from fat in the diet (*X*) and age. Reported exposure (*W*) is log per cent calories from fat as measured by the FFQ, and the reference measurements (*M*_{1} and *M*_{2}) are log per cent calories from fat as measured by the two 24HR. Note that there is doubt over whether the 24HR measurements will indeed conform to the classical measurement error model (6), but currently there is no measure of fat intake available that is known to be a valid reference measurement (i.e. unbiased with errors that are uncorrelated with *Y*, *X* and *W*).

Prior to the analysis, we excluded 5034 subjects (976 deaths) who reported dietary intakes that were determined to be outliers of *W* or FFQ log total caloric intake. None of the excluded subjects were in the calibration sub-study. For subjects in the calibration sub-study, we excluded 20 values of *M*_{1} and 23 values of *M*_{2} that were also determined to be outliers. Outliers were defined to be values that fell below the 25th percentile of the distribution of the variable minus two interquartile ranges or above the 75th percentile plus two interquartile ranges.
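The outlier rule stated above can be sketched in a few lines. The percentile definition used in the study is not given, so the interpolation method below (the `inclusive` method of Python's `statistics.quantiles`) is an assumption, and the function name and data are hypothetical.

```python
import statistics


def outlier_flags(values, k=2.0):
    """Flag values below Q1 - k*IQR or above Q3 + k*IQR (k = 2 in the text)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical data: quartiles are 3.25 and 7.75, so the fences are
# -5.75 and 16.75, and only the value 100 is flagged.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flags = outlier_flags(data)
print(flags)
```

Using two interquartile ranges rather than the more common 1.5 makes the rule somewhat more conservative about discarding reported intakes.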

We estimated parameters in the logistic regression of mortality on log per cent calories from fat and age, separately for men and women, using six different methods: naïve regression of *Y* on *W*, RC1, RC2, ERC, MR and IM. Standard errors were estimated using a bootstrap method with 100 replications.

Table IV presents the estimates of the coefficient for log per cent calories from fat. The naïve estimate indicates a moderate association, with an odds ratio of 1.7–2.0 (exp(0.55)–exp(0.71)) for a 2.7-fold (= exp(1.0)) increase in per cent fat intake. This association is highly statistically significant (*z*>20 for both men and women), because of the very large sample size. Adjustment for the measurement error by RC1 or ERC indicates an even stronger association, with an odds ratio of 2.9–4.7 (exp(1.06)–exp(1.54)) for a 2.7-fold increase in per cent fat intake, which is still highly statistically significant (*z*>10 for both men and women). However, the MR and IM estimates have standard errors that are 5–10 times larger than those of the RC1 or ERC estimates, and consequently conventional statistical significance (*z*>1.96) is no longer seen.
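The conversion from log-scale coefficients to the odds ratios quoted above is simple arithmetic and can be checked directly. The coefficient values are those given in the text; the dictionary keys are labels chosen for this sketch.

```python
import math

# Coefficients for log per cent calories from fat, as quoted in the text.
coefficients = {
    "naive_low": 0.55,
    "naive_high": 0.71,
    "rc_low": 1.06,
    "rc_high": 1.54,
}

# A one-unit increase in log per cent fat is an exp(1.0)-fold increase in per cent fat.
fold_increase = round(math.exp(1.0), 1)

# Odds ratio per one-unit increase in the log-scale exposure.
odds_ratios = {name: round(math.exp(b), 1) for name, b in coefficients.items()}
print(fold_increase, odds_ratios)
```

The rounded values are 1.7 and 2.0 for the naïve estimates and 2.9 and 4.7 for the corrected estimates, each per 2.7-fold increase in per cent fat intake.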

AARP study: estimated regression coefficient for log per cent calories from fat in a logistic regression of mortality on log per cent calories from fat and age, using six methods of estimation: naïve logistic regression of mortality on FFQ, regression **...**

This result appears even stronger than that in the first row of the simulated cohort studies seen in Table II, where standard errors of the MR and IM estimates were approximately 4 times larger than that of the ERC estimate. The key to linking the AARP result to the simulations lies in considering the values of *ρ _{XY}* and

Our main conclusion is that the MR and IM methods in this context are grossly inefficient compared with RC1 or ERC. The finding, using RC1 or ERC, of a possibly highly important association between per cent fat intake and total mortality needs further examination. Confounding with other factors needs to be considered. The association may partly reflect the known fat intake–cholesterol–heart disease pathway, which may be studied by examining the association for selected causes of death.

We have described and compared three substitution methods for correcting regression coefficients for measurement error in the covariates, in the context of nutritional epidemiologic studies. We note that in place of our term ‘substitution’, we could have used the word ‘imputation’ (in its general sense), but to do so may have caused confusion. In fact, RC corresponds to the conditional mean IM method described by Little [13], but it is not clear where MR would fit into the array of current IM methods.

We have considered in this paper the case where the calibration study includes information on the disease variable *Y*. Sometimes this information is not available in the calibration study. In these cases, among the methods we have described, only RC1 is available, as others require knowledge of *Y* in the calibration study.

The ‘efficient’ version of RC (ERC) that we have used appeared in our simulations to offer a considerable advantage over the usual RC estimator (denoted by RC1 in our tables). Tables I and II show the large advantage of ERC over standard RC when the measurement error is large. We also found in simulations not reported here that ERC was preferable to using Fuller’s small-sample correction for RC [14].

When the calibration study includes information on disease, and non-differential error pertains, then ERC appears more efficient than MR and IM in almost all of the situations that we have examined in our simulations. The simulations indicated that the gap between the methods narrows as the measurement error variance increases, but we found only one case where the standard error for the ERC estimate was larger than that of the other estimates. Our asymptotic results indicate that when the correlation between *X* and *Y* is high (*ρ _{XY}* ≥0.6) and measurement error is high (

We believe that the efficiency advantage provided by ERC stems primarily from its assumption of non-differential measurement error. RC2, MR and IM do not make this assumption, and payment in the form of increased variance is extracted for the privilege. In some cases the payment is very high. It is in fact possible, although more complex, to construct versions of MR and IM that are based on the assumption of non-differential measurement error and do not use the knowledge of *Y* in the calibration study. We have studied this separately and have found in simulations, not reported here, that they perform very similarly to ERC. This reinforces our view that the increased variance of the MR and IM estimates (differential error version) relative to the ERC estimates indeed results from relaxing the non-differential measurement error assumption. The rare cases where MR and IM improve on ERC will occur through the former methods’ use of *Y*, which, if it is highly correlated with *X*, can supply important extra information for estimating *X*.

We note that ERC can be viewed as an efficient linear combination of the usual RC estimator (RC1) and an RC estimator applied to the marker data in the calibration study (RC2). The insight that MR and IM also derive most of their information from the marker data in the calibration study, leads to the suggestion of combining the RC1 estimator with MR or with IM, instead of with RC2. As MR and IM will generally have somewhat greater precision than RC2, one would expect the resulting combined estimators to have slightly greater precision than ERC. We have not pursued this line here, as we made our aim to compare methods that have been proposed in the literature, but it is of interest to do so. Examining the combination of RC1 with IM would seem most worthwhile, firstly because IM is slightly more precise than MR and, secondly, because the variance of MR can be determined only by bootstrap, making it more cumbersome to obtain the best weights for the linear combination.

When differential measurement error pertains, RC2, MR and IM have considerably less bias than ERC but can have much larger variance, and the choice of method has to be weighed against the expected degree of bias arising from the ERC method. In the important case of prospective studies, however, differential measurement error is less likely, and the choice of method can be based on the estimated variances, as in the AARP example presented.

The MR and IM methods perform similarly, but IM has greater precision in some circumstances. Theoretical results indicate that, asymptotically, IM is always at least as efficient as MR. These results are supported by our simulations, although in many cases there is little practical difference between the two methods. One advantage of the IM estimate is that its standard error can be estimated directly, without resorting to the bootstrap. We found that the confidence intervals for the model parameters had good coverage properties if they were based on 40 multiply imputed data sets.

Contract/grant sponsor: National Cancer Institute; contract/grant number: CA-57030

Carroll’s research was supported by a grant from the National Cancer Institute (CA-57030).

The ERC estimate is a weighted average of two available RC estimates of *β _{X}*. The first estimate,

This is the usual RC estimate when the calibration study is external to the main study. However, as we have an internal calibration study, we can improve upon this estimate.

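The second estimate, $\hat{\beta}_{X,\mathrm{RC2}}$, is obtained by (i) estimating $E(\bar{M})$, $\mathrm{var}(\bar{M})$ and $\mathrm{var}(u) = \mathrm{var}(M_2 - M_1)/2$ in the calibration study, where $\bar{M}$ is the mean of the two determinations of $M$; (ii) calculating $\widehat{\mathrm{var}}(X) = \widehat{\mathrm{var}}(\bar{M}) - \widehat{\mathrm{var}}(u)/2$ and $\hat{\lambda}_M = \widehat{\mathrm{var}}(X)/\widehat{\mathrm{var}}(\bar{M})$; (iii) for each individual in the calibration study, calculating $X_{\mathrm{RC2}} = \hat{E}(\bar{M}) + \hat{\lambda}_M \{\bar{M} - \hat{E}(\bar{M})\}$; and (iv) regressing $Y$ on $X_{\mathrm{RC2}}$ in the calibration study.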
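The RC2 calculation can be sketched in Python as follows. This is a minimal sketch under the classical error model $M_j = X + u_j$ with two replicate markers per person and a linear regression of $Y$ on $X$. The moment formulas used for var($X$) and for the attenuation factor are the standard ones implied by that model and are stated here as assumptions, as are the function name `rc2_estimate` and its arguments.

```python
import numpy as np


def rc2_estimate(m1, m2, y):
    """Regression calibration using only replicate markers in the calibration study.

    m1, m2 : the two marker determinations for each person
    y      : the outcome for the same people
    Returns (slope of y on the calibrated marker, attenuation factor).
    """
    m1, m2, y = (np.asarray(a, dtype=float) for a in (m1, m2, y))
    m_bar = (m1 + m2) / 2.0                   # mean of the two determinations
    mean_m = m_bar.mean()                     # estimate of E(M-bar)
    var_m_bar = m_bar.var(ddof=1)             # estimate of var(M-bar)
    var_u = (m2 - m1).var(ddof=1) / 2.0       # var(u) = var(M2 - M1)/2
    # Assumption: var(M-bar) = var(X) + var(u)/2 under the classical model
    var_x = var_m_bar - var_u / 2.0
    if var_x <= 0:
        raise ValueError("estimated var(X) is not positive; RC2 is undefined")
    lam = var_x / var_m_bar                   # attenuation factor
    x_rc2 = mean_m + lam * (m_bar - mean_m)   # calibrated value per person
    # Ordinary least-squares slope of y on the calibrated values
    slope = np.cov(x_rc2, y, ddof=1)[0, 1] / x_rc2.var(ddof=1)
    return slope, lam


# Small simulated example (true slope 1, var(X) = var(u) = 1)
rng = np.random.default_rng(1)
x = rng.normal(size=20000)
slope, lam = rc2_estimate(x + rng.normal(size=x.size),
                          x + rng.normal(size=x.size),
                          x + rng.normal(size=x.size))
print(slope, lam)
```

The explicit check on the estimated var($X$) reflects the fact that moment estimates of this kind can become unstable when the error variance is large relative to the calibration sample size.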

Finally, we combine the two estimates of *β _{X}* as follows: (i) we estimate variances of

For each person in the main study sample who is not also in the calibration study we impute $X$ using $X_{\mathrm{IM}}(W, Y) = E(X \mid W, Y) + e$, whereas for persons in the calibration study, we impute using $X_{\mathrm{IM}}(W, Y, \bar{M}) = E(X \mid W, Y, \bar{M}) + e^{*}$. In these formulas, $e$ is a random draw from the distribution of residuals in the regression of $X$ on $(W, Y)$, whereas $e^{*}$ is a random draw from the distribution of residuals in the regression of $X$ on $(W, Y, \bar{M})$.
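The first of these imputations can be illustrated with a short Python sketch. It makes simplifying assumptions that are not part of the procedure described in this appendix: $X$ is taken to be observed directly in the calibration sub-study, $E(X \mid W, Y)$ is taken to be linear with normal, homoscedastic residuals, and the parameter uncertainty is ignored (a single imputation with fixed fitted coefficients). The function name `impute_x` and its arguments are hypothetical.

```python
import numpy as np


def impute_x(w_cal, y_cal, x_cal, w_main, y_main, rng):
    """Impute X in the main study as E(X | W, Y) + e.

    The regression of X on (W, Y) is fitted in the calibration data;
    e is drawn from a normal distribution with the residual variance.
    """
    w_cal, y_cal, x_cal, w_main, y_main = (
        np.asarray(a, dtype=float)
        for a in (w_cal, y_cal, x_cal, w_main, y_main)
    )
    # Design matrices with an intercept column
    z_cal = np.column_stack([np.ones_like(w_cal), w_cal, y_cal])
    z_main = np.column_stack([np.ones_like(w_main), w_main, y_main])
    # Least-squares fit of X on (1, W, Y) in the calibration study
    coef, *_ = np.linalg.lstsq(z_cal, x_cal, rcond=None)
    resid = x_cal - z_cal @ coef
    sigma2 = resid @ resid / (len(x_cal) - z_cal.shape[1])
    # Random residual draw added to the fitted conditional mean
    e = rng.normal(0.0, np.sqrt(sigma2), size=len(w_main))
    return z_main @ coef + e
```

Adding the residual draw, rather than using the fitted mean alone, is what distinguishes this from a substitution of $E(X \mid W, Y)$: it restores the conditional variance of $X$ so that regressing $Y$ on the imputed values is not distorted by conditioning on $Y$.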

Assuming that *X* has a normal distribution conditional on *W* and *Y*, and that *u* in (5) has a normal distribution, then

(A1)

where *μ*(*α; W,Y*)= *E*(*X*|*W,Y*), Σ_{12}(*θ; W,Y*)= var(*X*|*W,Y*) and Σ_{11}(*θ; W,Y*)= var(*X*|*W,Y*)+ var(*u*). Then (*X*|*W,Y*)~ N(*μ* (*α; W,Y*), Σ_{12}(*θ; W,Y*)).

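In addition, $(X \mid W, Y, \bar{M}) \sim N(\mu(\alpha; W,Y) + R(\theta; W,Y)\{\bar{M} - \mu(\alpha; W,Y)\},\ \Sigma_{12}(\theta; W,Y)\{1 - R(\theta; W,Y)\})$, where $R(\theta; W,Y) = \Sigma_{12}(\theta; W,Y)/\Sigma_{11}(\theta; W,Y)$.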

The multiple IM procedure is therefore as follows:

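- Fit model (A1), where $\mu$, $\Sigma_{11}$ and $\Sigma_{12}$ are known functions (defined below) of the unknown parameter vectors $\alpha$ or $\theta$, in the calibration study to obtain the estimates $\hat{\alpha}$ and $\hat{\theta}$ and their estimated covariance matrix.
- For $k = 1$ to $K$ IMs,
  - Generate a random draw of the parameter estimates, $(\alpha^{(k)}, \theta^{(k)})$.
  - For each individual in the main study but not in the calibration study, generate $e^{(k)} \sim N(0, \Sigma_{12}(\theta^{(k)}; W,Y))$ and calculate $X_{\mathrm{IM}}^{(k)} = \mu(\alpha^{(k)}; W,Y) + e^{(k)}$.
  - For each individual in the calibration study, generate $e^{*(k)} \sim N(0, \Sigma_{12}(\theta^{(k)}; W,Y)\{1 - R(\theta^{(k)}; W,Y)\})$ and calculate $X_{\mathrm{IM}}^{(k)} = \mu(\alpha^{(k)}; W,Y) + R(\theta^{(k)}; W,Y)\{\bar{M} - \mu(\alpha^{(k)}; W,Y)\} + e^{*(k)}$.
  - Regress $Y$ on $X_{\mathrm{IM}}^{(k)}$ in the main study to obtain the estimate $\hat{\beta}_X^{(k)}$ and the naïve model-based variance estimate $\widehat{\mathrm{var}}(\hat{\beta}_X^{(k)})$ (that ignores the fact that $X$ was imputed).
- Estimate $\beta_X$ as the mean of the $K$ estimates $\hat{\beta}_X^{(k)}$.
- Estimate $\mathrm{var}(\hat{\beta}_X)$ from the $K$ naïve variance estimates together with the between-imputation variance of the $\hat{\beta}_X^{(k)}$.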
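The final two steps, which pool the results of the $K$ imputations, can be sketched in Python as follows. The sketch uses the standard multiple-imputation combining rules (Rubin's rules); we assume, rather than confirm from the text above, that these are the rules intended here. The function name `pool_rubin` and its arguments are hypothetical.

```python
import numpy as np


def pool_rubin(estimates, naive_variances):
    """Combine K imputation-specific results with Rubin's rules.

    estimates       : the K coefficient estimates, one per imputed data set
    naive_variances : the K model-based variances that ignore imputation
    Returns (pooled estimate, total variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(naive_variances, dtype=float)
    k = q.size
    if k < 2 or u.size != k:
        raise ValueError("need K >= 2 estimates with matching variances")
    pooled = q.mean()              # average of the K estimates
    within = u.mean()              # average naive (within-imputation) variance
    between = q.var(ddof=1)        # between-imputation variance
    total = within + (1.0 + 1.0 / k) * between
    return pooled, total


# Example with made-up numbers for K = 3 imputations
pooled, total = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(pooled, total)
```

The between-imputation term is what accounts for the extra uncertainty due to imputing $X$; with a small number of imputations it is estimated imprecisely, which is one reason to use a larger $K$.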

For continuous *Y*, the mean and variance functions are

This parameterization is used to ensure that the estimates of variance are always positive.

For binary *Y*, the mean and variance functions are

Assume the model

Assume that *Y* and *W* are measured in *N* individuals where *N* is very large.

Assume also that *Y*, *W* and *X* are measured in an independent sub-study of *n* individuals where *n* is much smaller than *N*.

This is not exactly the same situation as we simulated (e.g. we assume here that we can measure *X* exactly, whereas in the simulations we had repeat measurements of an unbiased marker of *X*), but we think it is close enough to give us insight into the results of the simulations in Section 4.

Define the following quantities:

where subscript *n* denotes that the quantity is being evaluated in the calibration sub-study.

Estimates of *β _{X}* considered in our paper are given by the OLS regressions of

where
is the estimated variance of _{X,}_{RC}* _{i}* (

Furthermore,

Subscript *N* indicates that the estimate is being made across the full study.

Our task is to evaluate these variances.

- *Standard RC (RC1)*: Assuming *N* very large, using the delta method, the approximate variance of this expression is
- *Calibration study RC (RC2)*: It is simple to show that
- *ERC* (*RC1*): From the above results it follows that the variance for $\hat{\beta}_{X,\mathrm{RC}}$ is given by
- *MR*: where is the sub-study estimate of the regression coefficient of *X* on *Y*, and *ψ* is the regression coefficient of *W* on *Y*. For large *N*, using the delta method liberally, the variance of this quantity simplifies to approximately
- *IM*: where is the sub-study estimate of the regression coefficients of *X* on (*Y, W*). Using the delta method liberally, the variance of this expression for large *N* turns out to be where under the assumption of non-differential measurement error,

To verify the accuracy of the variance expressions for ERC, MR and IM, we compared their values with empirical variances obtained from simulations. In Table BI, the theoretical value is given in the upper half of the cell, and the empirical value in the bottom half. The values var(*X*)= 1, *β*_{0} = 0, *β _{X}* = 1,

Comparison of theoretical asymptotic standard errors (upper half of the cell) with empirical values (lower half of the cell) for efficient regression calibration (ERC), moment reconstruction (MR) and multiple imputation (IM) estimators.

In most cases the approximate formulas agree well with the empirical values. The formula for RC does not appear to do very well when var(*ε*) is large, i.e. 9. However, with a larger *N* (100 000) the empirical variance for ERC in this case reduces to 0.046, much closer to the theoretical (asymptotic) value of 0.042.

1. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. Chapman & Hall/CRC; Boca Raton, FL: 2006.

2. Carroll RJ, Stefanski LA. Approximate quasilikelihood estimation in models with surrogate predictors. Journal of the American Statistical Association. 1990;85:652–663.

3. Gleser LJ. Improvements of the naïve approach to estimation in non-linear errors-in-variables regression models. In: Brown PJ, Fuller WA, editors. Statistical Analysis of Measurement Error Models and Applications. American Mathematical Society; Providence, RI: 1990.

4. Pierce DA, Kellerer AM. Adjusting for covariate errors with nonparametric assessment of the true covariate distribution. Biometrika. 2004;91:863–876.

5. Freedman LS, Fainberg V, Kipnis V, Midthune D, Carroll RJ. A new method for dealing with measurement error in explanatory variables of regression models. Biometrics. 2004;60:172–181.

6. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd ed. Chapter 10. Wiley; Hoboken, NJ: 2002.

7. Brownstone D, Valletta RG. Modeling earnings measurement error: a multiple imputation approach. The Review of Economics and Statistics. 1996;78:705–717.

8. Cole SR, Chu H, Greenland S. Multiple-imputation for measurement-error correction. International Journal of Epidemiology. 2006;35:1074–1081.

9. Spiegelman D, Carroll RJ, Kipnis V. Efficient regression calibration for logistic regression in main study/internal validation study designs with an imperfect reference instrument. Statistics in Medicine. 2001;20:139–160.

10. Kipnis V, Subar AF, Midthune D, Freedman LS, Ballard-Barbash R, Troiano R, Bingham S, Schoeller DA, Schatzkin A, Carroll RJ. The structure of dietary measurement error: results of the OPEN biomarker study. American Journal of Epidemiology. 2003;158:14–21.

11. Schatzkin A, Subar AF, Thompson FE, Harlan LC, Tangrea J, Hollenbeck AR, Hurwitz PE, Coyle L, Schussler N, Michaud DS, Freedman LS, Brown CC, Midthune D, Kipnis V. Design and serendipity in establishing a large cohort with wide dietary intake distributions: the National Institutes of Health-American Association of Retired Persons Diet and Health Study. American Journal of Epidemiology. 2001;154:1119–1125.

12. Thompson FE, Kipnis V, Midthune D, Freedman LS, Carroll RJ, Subar AF, Brown CC, Butcher MS, Mouw T, Leitzmann M, Schatzkin A. Performance of a Food Frequency Questionnaire in the U.S. National Institutes of Health-AARP Diet and Health Study. Public Health Nutrition. 2008;11:183–195.

13. Little RJA. Regression with missing X’s: a review. Journal of the American Statistical Association. 1992;87:1227–1237.

14. Fuller WA. Measurement Error Models. Wiley; New York: 1987.

15. Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for random within-person measurement error. American Journal of Epidemiology. 1992;136:1400–1413.
