Stat Med. Author manuscript; available in PMC 2013 May 1.
Published in final edited form as:
Stat Med. 2012 October 15; 31(23): 2713–2732.
Published online 2012 June 29. doi:  10.1002/sim.5435
PMCID: PMC3640838

Regression Calibration with More Surrogates than Mismeasured Variables


In a recent paper, Weller, Milton, Eisen, and Spiegelman discuss fitting logistic regression models when a scalar main explanatory variable is measured with error by several surrogates, i.e., a situation with more surrogates than variables measured with error. Using a regression calibration approximate model as if it were exact, they compare two methods of adjusting for measurement error. One is the standard regression calibration approach, consisting of substituting an estimated conditional expectation of the true covariate given the observed data into the logistic regression. The other is a novel two-stage approach, in which the logistic regression is first fitted to the multiple surrogates and a linear combination of the estimated slopes is then formed as the estimate of interest. Applying estimated asymptotic variances for both methods in a single data set with some sensitivity analysis, the authors assert superiority of their two-stage approach. We investigate this claim in some detail. A troubling aspect of the proposed two-stage method is that, unlike standard regression calibration and a natural form of maximum likelihood, the resulting estimates are not invariant to reparameterization of nuisance parameters in the model. We show, however, that, under the regression calibration approximation, the two-stage method is asymptotically equivalent to a maximum likelihood formulation, and is therefore in theory superior to standard regression calibration. However, our extensive finite-sample simulations in the practically important parameter space where the regression calibration model provides a good approximation failed to uncover such superiority of the two-stage method. We also discuss extensions to different data structures.

Keywords: attenuation, logistic regression, measurement error, reparameterization

1. Introduction

1.1. Basic problem

Weller et al. [1] consider a problem where a binary variable Y is to be regressed on a scalar main explanatory variable X and a vector covariate Z via logistic regression, but X is subject to measurement error. Instead of observing X, a vector covariate W is observed, so that there are more surrogates, W, than there are variables measured with error, the scalar X. In place of standard regression calibration as described by Carroll and Stefanski [2] and Carroll et al. [3], they describe a novel extension of the two-stage linear approximation approach of Rosner et al. [4] to estimating the slope for X. They compare their new approach with standard regression calibration by estimating the corresponding approximate asymptotic variances based on a single data set and some sensitivity analysis, and conclude that the two-stage approach provides a more efficient estimator than standard regression calibration.

The purpose of this paper is to place these methods in a more general context than just logistic regression, and to replace the somewhat informal calculations of Weller et al. with a more formal theoretical treatment. There are four important aspects of this treatment.

  • Weller et al. work with an approximate model, namely the regression calibration approximation to the conditional distribution of Y given the observed covariates (W, Z) and then do all their theoretical calculations as if this approximate model were true. This is a fairly common strategy when dealing with measurement error modeling and regression calibration, see for example Thurston et al. [5]. In the parameter subspace where such an approximation is close to the correct model, this allows a formal theoretical analysis, which we provide.
  • In this problem, the primary interest is in a single parameter, the regression slope for the main explanatory variable. All other parameters are nuisance parameters. We can thus write the likelihood function of the data as L(ζ, β1), where ζ is the vector of nuisance parameters and β1 is the parameter of interest. In all such likelihood problems, the maximum likelihood estimator of β1 is unchanged when a 1–1 reparameterization of ζ is done: the same is true for standard regression calibration. Unfortunately, the suggested two-stage estimator is not invariant to such 1–1 reparameterizations of the nuisance parameters, and can be made to vary quite widely in a given finite set of data. We suggest a modified two-stage estimator which, in our extensive simulations, provides better finite sample results than the original method.
  • We show that if the regression calibration approximation were exact, the two-stage estimator would have the same asymptotic variance as the maximum likelihood estimator. Extensive numerical work shows that, asymptotically, the maximum likelihood estimator and modified two-stage estimator are typically only slightly less variable than the one based on standard regression calibration, at least in the practically important parameter space where the regression calibration model provides a good approximation.
  • As noted above, the theoretical calculations in Weller et al. and in the present work are based on an approximate model. It is well known that this approximation can be a very poor one if the main explanatory variable has a relatively large effect on the outcome and/or the measurement error is relatively large. In such cases, as is well demonstrated by the results in Weller et al., the model based on the regression calibration assumption is badly misspecified, leading to severely inconsistent estimators with very different asymptotic and finite-sample properties. Comparing asymptotic variances of estimators derived under the regression calibration assumption may be irrelevant in this situation. However, these are exactly the cases where Weller et al. emphasize the superiority of their two-stage method over the standard one, disregarding the biases involved and the finite-sample properties. We therefore provide finite-sample simulation results which suggest that standard regression calibration is, in most cases, at least as good as the two-stage method in terms of bias and mean squared error.

1.2. Basic model assumptions

The setup of the problem follows Weller et al. We have primary study data consisting of (Yip, Wip, Zip) for i = 1, …, np. We also have data from an external validation study (Wiv, Xiv, Ziv) for i = 1, …, nv, where X is scalar. There are two parts to the model. The first is the relationship between X and (W, Z). It is assumed that the regression of X on (W, Z) is linear and homoscedastic in both the primary and the validation data, with identical parameters. In symbols, Xiv = γ0 + WivTγ1 + ZivTγ2 + εiv and Xip = γ0 + WipTγ1 + ZipTγ2 + εip, where E(εiv) = E(εip) = 0 and var(εiv) = var(εip) = σε². As it turns out, all the methods to be discussed in this paper are numerically unchanged if we add the assumption that εiv and εip are also normally distributed. We write this as follows.

  • Assumption 1 The mean of X given (W, Z) equals γ0 + WTγ1 + ZTγ2 and the variance of X given (W, Z) equals σε² in both the primary and the validation data.

    The second part of the model is the relationship between Y and (X, W, Z) in the primary data. Here it is typical to make two assumptions.

  • Assumption 2 The measurement error is non-differential, i.e., W is independent of Y given (X, Z).
  • Assumption 3 In the primary data, the distribution of Y given (X, Z) follows an additive model, with density or mass function f (Y,β0 +X β1 + ZTβ2, ζ), where ζ is a vector of nuisance parameters.

1.3. Regression calibration approximation and approaches to comparisons

Our goal is to compare the two-stage method and the regression calibration method. There are three ways that this can be done.

  • First, one can do simulation studies, something we take up in Section 3.2.
  • The second approach is theoretical, involving asymptotic theory, but it involves an approximation as used by Weller et al. [1] among many other authors. The approximation, called the regression calibration approximation, is stated below as Assumption 4. It allows for a clear asymptotic comparison among the methods, see Appendix A.
  • The third approach is more formal than the other two and does not depend on the regression calibration approximation. Based only on Assumptions 1–3, one can develop the asymptotic distribution of the two-stage and regression calibration methods. We show how to do this for logistic regression in Appendix B. There are however problems with this in terms of numerical comparisons.
    • The two methods generally estimate different quantities; see Section 1.4 for cases in which they do estimate the same thing.
    • Both estimators converge to quantities that are solutions to nonlinear integral equations which depend on the joint distribution of (X, W, Z). This makes numerical comparison of their biases and variances difficult because in each case, an integral equation must be solved.
    • Interestingly, we show in Appendix B that if we make the regression calibration approximation, the results of the developed correct asymptotic theory reduce to those in Appendix A.

The regression calibration approximation, which we state here as Assumption 4, has to do with the relationship between Y and (W, Z), i.e., the distribution of the observed data.

Assumption 4: The distribution of Y given (W, Z) is the same as that of Y given (X, Z), except that X is replaced by its regression on (W, Z), E(X | W, Z), and the intercept is changed [3]. In symbols, it is assumed that the density/mass function of Y given (W, Z) is f(Y, α0 + β1 E(X | W, Z) + ZTβ2, ζ) = f(Y, α0 + WTα1 + ZTα2, ζ), where α1 = β1γ1 and α2 = β2 + β1γ2.

1.4. Discussion of assumptions

It is important to emphasize that, for most nonlinear regressions including logistic regression, Assumptions 1–4 are contradictory. In particular, Assumptions 1–3 are the ones usually made in the analysis, from which not only does Assumption 4 not follow, but, strictly speaking, it is incompatible with them. There are at least three notable exceptions.

  • If β1 = 0, then the regression calibration approximation holds exactly and Assumption 4 is true. In this case, both the two-stage method and standard regression calibration method estimate β1 = 0 asymptotically. It is a consequence of the work in Appendix A that the two estimates are asymptotically equivalent when β1 = 0.
  • The second case occurs when Y given (X, Z) follows a homoscedastic linear model, in which case Assumption 4 is strictly true. As it turns out, in this case the two-stage method and the regression calibration method are asymptotically equivalent, as we show in Appendix A.
  • The third is probit regression, where the probability of disease is pr(Y = 1 | X, Z) = Φ(β0 + Xβ1 + ZTβ2). If X given (W, Z) is normally distributed, then the observed data follow the probit model
    pr(Y = 1 | W, Z) = Φ(β0* + WTγ1β1* + ZTβ2*),
    where β1* = β1/(1 + β1²σε²)1/2, β0* = (β0 + γ0β1)/(1 + β1²σε²)1/2 and β2* = (β2 + β1γ2)/(1 + β1²σε²)1/2. In this probit case, maximum likelihood for the observed data of Y given (W, Z), the two-stage method and the standard regression calibration method all estimate β1*, and our asymptotic results apply. Of course, if β1²σε² is large, then all methods are badly biased.
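The probit attenuation above rests on the identity E{Φ(a + bX)} = Φ((a + bμ)/(1 + b²σ²)1/2) when X is Normal(μ, σ²). A minimal Monte Carlo check of that identity, with all numerical values chosen purely for illustration (they are not from the paper):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

beta0, beta1 = -1.0, 0.8   # illustrative logistic-scale-free probit parameters
mu, sig2_eps = 0.3, 0.5    # a value of E(X | W, Z) and the calibration residual variance

# Monte Carlo: average the probit disease probability over X | (W, Z)
x = rng.normal(mu, np.sqrt(sig2_eps), size=1_000_000)
mc = norm.cdf(beta0 + beta1 * x).mean()

# Closed form implied by the starred (attenuated) parameters
closed = norm.cdf((beta0 + beta1 * mu) / np.sqrt(1.0 + beta1**2 * sig2_eps))

print(mc, closed)  # the two agree to Monte Carlo accuracy
```

The same averaging argument, applied with E(X | W, Z) = γ0 + WTγ1 + ZTγ2, yields the starred parameters in the bullet above.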

Outside of these important cases, we are left with a dilemma about the relevance of the asymptotic theory based on Assumption 4 for comparing the two methods. There are three possible ways to argue the relevance of the asymptotic theory based on Assumption 4, all of which are equivalent for our calculations but are very different conceptually. The first is the conventional approach of staying with the usually made Assumptions 1–3 and using Assumption 4 as an approximation. In cases where β1²σε² is relatively small, this approximation is quite good and the theoretical results are relevant. One should be careful, though, not to extend this theory to the full parameter space. In the absence of relevant theoretical results when Assumption 4 provides a poor approximation, the statistical properties of the estimated slope β1 should be investigated by extensive finite-sample simulations.

The second possibility is to make Assumptions 1, 2 and 4. In this case, the true model remains unknown, but one assumes that β1 has some approximate relevance to the parameter of interest, although we do not know exactly how to specify this relationship. The third possibility is to make Assumptions 1, 3 and 4. In this case, one does not assume that error is non-differential, but rather that it is such that Assumptions 1, 3 and 4 hold. Although all three possibilities are interesting, and our work applies to all of them, we will concentrate on the first one which is usually used in practical applications.

1.5. Summary and outline

In summary, a generalization of the framework of Weller et al. based on Assumptions 1–3 and using Assumption 4 as an approximation is as follows:

    Y given (X, Z) has density/mass function f(Y, β0 + Xβ1 + ZTβ2, ζ);      (1)
    X = γ0 + WTγ1 + ZTγ2 + ε, with E(ε) = 0 and var(ε) = σε²;      (2)
    Y given (W, Z) has density/mass function f(Y, α0 + WTα1 + ZTα2, ζ), where α1 = β1γ1 and α2 = β2 + β1γ2.      (3)

The goal is to estimate β1. Model (1)–(3) represents a likelihood formulation for the observed data. This is a new insight, and it suggests immediately that the data can be fit, efficiently, by maximum likelihood to get an asymptotically efficient estimate of the crucial parameter β1.

An outline of this paper is as follows. In Section 2, we describe the two-stage method of Weller et al. generalized to our context (1)–(3). We then state our three main results: (a) in the linear model for Y given (X, Z), all three methods are asymptotically equivalent; (b) maximum likelihood for (1)–(3) and the two-stage method are asymptotically equivalent; and (c) standard regression calibration is not in general asymptotically equivalent to maximum likelihood for (1)–(3).

In Section 3.1, we take up the issue of how much more efficient the two-stage method is theoretically compared to standard regression calibration, by studying asymptotic variances over a broad range of interesting problems, and via simulations. In contrast to Weller et al., who use a single data set and compute estimated asymptotic variances, we find that in practice the difference in asymptotic efficiency between the two methods typically appears to be slight, although there are some exceptions. As mentioned above, under the usually made Assumptions 1–3, model (1)–(3) is merely an approximation; this suggests that a finite-sample simulation study comparing the standard regression calibration and two-stage methods be undertaken. The results of the simulation study are reported in Section 3.2 and Appendix C. We conclude with a discussion in Section 4. A sketch of the technical details when Assumption 4 approximately holds is given in Appendix A, while in Appendix B we derive the exact asymptotic distributions for logistic regression, and show that they reduce to the results in Appendix A when the regression calibration approximation is invoked.

2. Methods and results

2.1. Methods

Maximum likelihood for model (1)–(3) is straightforward. Standard regression calibration consists of first estimating (γ1, γ2) using the validation data only, substituting the estimates into (3), treating them as fixed and known, and then performing maximum likelihood to estimate the unknown slope β1.
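The standard regression calibration recipe just described can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: Z is dropped for simplicity, the logistic fit uses plain Newton-Raphson, and all data-generating values are placeholders.

```python
import numpy as np

def fit_logistic(D, y, iters=25):
    """Newton-Raphson for a logistic regression of y on design matrix D."""
    b = np.zeros(D.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-D @ b))
        w = p * (1.0 - p)
        b += np.linalg.solve(D.T * w @ D, D.T @ (y - p))
    return b

def regression_calibration(Wv, Xv, Wp, yp):
    """Standard regression calibration, dropping Z for simplicity:
    estimate E(X | W) from the validation data, then plug the
    predictions into the primary-study logistic regression."""
    Dv = np.column_stack([np.ones(len(Wv)), Wv])
    gamma = np.linalg.lstsq(Dv, Xv, rcond=None)[0]        # (gamma0, gamma1)
    xhat = np.column_stack([np.ones(len(Wp)), Wp]) @ gamma
    Dp = np.column_stack([np.ones(len(yp)), xhat])
    return fit_logistic(Dp, yp)[1]                        # slope for X

# toy data under Assumptions 1-3 (all numbers illustrative)
rng = np.random.default_rng(1)
nv, n_p = 1000, 10000
Wv = rng.normal(size=(nv, 2)); Wp = rng.normal(size=(n_p, 2))
gam = np.array([0.6, 0.5]); s_eps = 0.6
Xv = Wv @ gam + rng.normal(0, s_eps, nv)
Xp = Wp @ gam + rng.normal(0, s_eps, n_p)
beta0, beta1 = -1.0, 0.5
yp = rng.binomial(1, 1.0 / (1.0 + np.exp(-(beta0 + beta1 * Xp))))
b1 = regression_calibration(Wv, Xv, Wp, yp)
print(b1)  # roughly the true slope beta1 = 0.5, up to the RC approximation
```

Because β1²σε² is small here, the regression calibration approximation is good and the estimate lands near the true slope.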

The two-stage method is somewhat more complex. Again, Weller et al. first estimate (γ1, γ2) using the validation data to obtain (γ̂1, γ̂2). Then estimation of β1 proceeds in two stages. At stage 1, the model
    f(Y, α0 + WTα1 + ZTα2, ζ)
is fit to the primary data, leading to estimates α̂1 of the “naive” slopes. Because of model (3), one has β1eone = α1//γ1, where // denotes element-wise division and where eone = (1, 1, …, 1)T. At stage 2, Weller et al. estimate β1 by forming the weighted linear combination τT(α̂1//γ̂1) and choosing the weights τ to minimize the asymptotic variance of this linear combination subject to the constraint that eoneTτ = 1.
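The stage-2 minimization has a classical closed form: for componentwise estimates b = α̂1//γ̂1 with covariance matrix V, the variance of τTb subject to eoneTτ = 1 is minimized by τ = V⁻¹eone/(eoneTV⁻¹eone). A small sketch, with an illustrative covariance matrix (not from the paper):

```python
import numpy as np

def optimal_weights(V):
    """Weights minimizing var(tau^T b) subject to sum(tau) = 1,
    i.e. tau = V^{-1} e / (e^T V^{-1} e)."""
    e = np.ones(V.shape[0])
    w = np.linalg.solve(V, e)
    return w / (e @ w)

# illustrative covariance of the two componentwise estimates (an assumption)
V = np.array([[1.0, 0.3],
              [0.3, 2.0]])
tau = optimal_weights(V)
print(tau, tau @ V @ tau)  # weights sum to 1; minimized combined variance
```

Note that the achieved minimum variance, 1/(eoneTV⁻¹eone), is never larger than the variance of the equally weighted average, which is why the combination can beat any single component.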

2.2. Non-invariance of the two-stage method

Let Λ be any full rank square matrix whose number of columns is the dimension of W. Consider using the transformed vectors of surrogates Wiv* = ΛWiv and Wip* = ΛWip, based on the same full-rank linear transformation Λ applied to both the validation and primary data. It is easy to see theoretically that both maximum likelihood and regression calibration give exactly the same estimate of β1 whether (Wiv, Wip) or (Wiv*, Wip*) are used. For example, with regression calibration, dropping precisely measured covariates for simplicity, if γ̂1 is the estimate of γ1 based on the linear regression of Xiv on Wiv, then γ̂1* = (ΛT)−1γ̂1 is the corresponding estimate if Wiv* is used, and the regression calibration predictions are equivalent, i.e., Wip*Tγ̂1* = WipTγ̂1. This is as it should be: a 1–1 transformation of nuisance parameters should not affect the estimate of the main parameter of interest.

In contrast, the two-stage estimator enjoys no such property, and indeed in some settings it can be made to take on almost any value by varying Λ. For example, we took primary and validation data sets in which W is two dimensional and generated 10,000 full rank linear transformations Λ by having their entries be standard normal random variables. Over these 10,000 simulated full rank linear transformations, the two-stage estimator for the transformed W-variables had a minimum of −36.11, a maximum of 2.43, a mean of 0.79 and a 98% confidence interval of [0.75, 0.83].

We suggest linearly transforming W in the validation study to have mean zero and identity covariance matrix, applying the same transformation to the primary data, and then applying the two-stage method. In our extensive simulations, we have found that the modified two-stage method provides better results than the original two-stage estimator.
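The suggested modification, centering and whitening W in the validation study and carrying the same affine map over to the primary study, might be sketched as follows (sample sizes and covariances are illustrative):

```python
import numpy as np

def standardize_pair(Wv, Wp):
    """Center and whiten W in the validation data, then apply the very
    same affine transformation to the primary data."""
    mu = Wv.mean(axis=0)
    cov = np.cov(Wv - mu, rowvar=False)
    A = np.linalg.inv(np.linalg.cholesky(cov))   # whitening matrix: cov = L L^T
    return (Wv - mu) @ A.T, (Wp - mu) @ A.T

rng = np.random.default_rng(2)
Sigma = [[2.0, 0.8], [0.8, 1.0]]
Wv = rng.multivariate_normal([1, 2], Sigma, size=5000)
Wp = rng.multivariate_normal([1, 2], Sigma, size=5000)
Wv2, Wp2 = standardize_pair(Wv, Wp)
print(np.cov(Wv2, rowvar=False))  # approximately the identity matrix
```

Fixing the transformation this way removes the arbitrariness in Λ, which is what makes the modified two-stage estimator well defined.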

2.3. Assumptions and results

We show that the two-stage method achieves optimal asymptotic properties under the regression calibration assumption.

Here is our main result, the proof of which is given in Appendix A. The asymptotic variances for maximum likelihood, regression calibration and the two-stage method are given in Appendix A at equations (A.3), (A.4) and (A.5), respectively.

Theorem 1

Under model (1)-(3), the maximum likelihood and standard regression calibration estimators have the same asymptotic variances if Y given (X, Z) follows a homoscedastic linear model or if β1 = 0, but not in general. Under model (1)-(3), the two-stage estimator of β1 is asymptotically equivalent to maximum likelihood if there are no nuisance parameters ζ or if Y given (W, Z) follows a generalized linear model in the primary data.

3. Methods comparisons

3.1. Asymptotic efficiency comparisons under the regression calibration assumption

3.1.1. Background

While, at least technically, regression calibration is not efficient for approximate model (1)–(3), it remains to be seen how inefficient it is in realistic contexts. The asymptotic variances for all considered estimators are given at (A.3), (A.4) and (A.5) in Appendix A.

To compare these variances, we consider the case of logistic regression. Without loss of generality one can set E(X) = 0 and var(X) = γTcov(W)γ + σε² = 1, and because maximum likelihood, regression calibration and the modified two-stage method for estimating β1 are all invariant to full-rank linear transformations of W, without loss of generality one can assume that cov(W) = I. Assuming that εiv and εip are Normal(0, σε²), the only items that one needs to vary are (a) the distribution of W; (b) (σε², γ) subject to γTγ + σε² = 1; (c) the odds ratio of interest; and (d) the rate of disease in the source population. The latter two determine the underlying logistic intercept β0 and the slope β1.

3.1.2. Normal and Chi-square W

First, W was set to be bivariate, with mean zero and covariance matrix the identity, but generated from (a) a bivariate normal distribution and then rescaled; (b) a bivariate chi-square distribution with one degree of freedom and then rescaled; and (c) a bivariate chi-square distribution with one degree of freedom with a random sign, resulting in a mixture distribution. Without loss of generality, we defined X = WTγ + ε to have mean zero and variance one. We did this by first generating σε² ~ Uniform(0, 1). We then generated γ* ~ Normal(0, I2), and defined γ = (1 − σε²)1/2γ*/(γ*Tγ*)1/2. Then we set the true logistic intercept β0 and the true logistic slope β1 so that the odds ratio of moving from the 10th to the 90th percentile of the distribution of X was in the set (2.0, 3.0, 4.0, 5.0, 6.0), and the probability of disease in the source population was in the set (0.075, 0.150, …, 0.450). Hence, we considered relatively rare to common diseases, and relatively modest to large true odds ratios, along with fairly extreme distributions for W. We varied the ratio of the validation sample size to the primary sample size, cF = nv/np, over (0.01, 0.10). For each of these settings, we searched over 1,000 cases of (σε², γ). When computing asymptotic variances, we set α0 = β0 + β1²σε²/2, the usual regression calibration approximation. Monte Carlo simulation was used to calculate integrals where required.
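The recipe for drawing a parameter setting (σε², γ) on the constraint surface γTγ + σε² = 1 can be written directly (a sketch of the construction just described; the dimension is fixed at 2 as in the text):

```python
import numpy as np

rng = np.random.default_rng(3)

def draw_setting(dim=2):
    """Draw (sigma_eps^2, gamma) subject to gamma^T gamma + sigma_eps^2 = 1:
    sigma_eps^2 ~ Uniform(0, 1), gamma* ~ Normal(0, I), then rescale gamma*."""
    s2 = rng.uniform(0.0, 1.0)
    g_star = rng.normal(size=dim)
    gamma = np.sqrt(1.0 - s2) * g_star / np.sqrt(g_star @ g_star)
    return s2, gamma

s2, gamma = draw_setting()
print(gamma @ gamma + s2)  # equals 1, so var(X) = 1 when cov(W) = I
```

The constraint guarantees var(X) = γTγ + σε² = 1 under cov(W) = I, which is what lets the odds-ratio and disease-rate grids pin down (β0, β1).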

The results are striking: in all cases, the asymptotic variance of the standard regression calibration estimator of β1 was less than 1.5% larger than the asymptotic variance of the maximum likelihood and two-stage estimators.

3.1.3. Lognormal and Gamma W

We also considered lognormal cases. We let Σv be a correlation matrix with correlation ρ in the off-diagonal elements. We generated 1,000 settings, drawing −0.95 ≤ ρ ≤ 0.95 and generating σε² and γ so that var(X) = 1; see Section 3.1.2. We set α0 = β0 + β1²σε²/2, where β0 and β1 are defined below so that the rate of disease and the odds ratio are the same as in Section 3.1.2.

The W data were generated as follows. We describe here the theoretical setting, although simulation was actually used in all cases to compute the integrals involved in the asymptotic variances. First, W* were bivariate independent lognormal random variables such that log(W*) are bivariate independent normal variables with mean zero and variance drawn from Uniform(0, 1). We defined W** to be the standardized version of W*, with both independent components having mean zero and variance one, and then made W a linear transformation of W** having covariance matrix Σv and exactly mean zero. In symbols, let Z be a bivariate standard normal random variable. Then


The lognormal distribution is notorious for generating a few extremely wild observations. For example, in Figures 1 and 2 we give box plots of 5 randomly generated data sets of size 10,000 from the lognormal and gamma distributions, each with coefficient of variation equal to 1.0. These are plotted on the same scale to make the point that the lognormal distribution is much more prone to wild observations than is the gamma distribution. Thus, the distribution that we actually used for W was a truncated one, truncated at 4×IQR above the 75th percentile or below the 25th percentile, where IQR is the interquartile range. Using this rule of thumb, in the lognormal case we removed 1.18% of the distribution of W, while for the gamma distribution the corresponding figure is 0.24%. We then generated X from the remaining observations. We set β1 so that the odds ratio for moving from the 10th to the 90th percentile of the distribution of X would be a predefined value OR, and then formed β0 so that the probability pr(Y = 1) of disease in the population would equal a predefined value π.
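The 4×IQR truncation rule is easy to reproduce empirically. The sketch below draws a lognormal sample with coefficient of variation 1.0 (so the log-scale variance is log 2; the sample size and seed are illustrative) and computes the truncated fraction:

```python
import numpy as np

rng = np.random.default_rng(4)

# Coefficient of variation 1 for a lognormal means exp(sigma^2) - 1 = 1,
# i.e. sigma^2 = log(2) on the log scale.
sigma = np.sqrt(np.log(2.0))
w = rng.lognormal(mean=0.0, sigma=sigma, size=1_000_000)

q1, q3 = np.percentile(w, [25, 75])
iqr = q3 - q1
keep = (w > q1 - 4 * iqr) & (w < q3 + 4 * iqr)
frac = 1.0 - keep.mean()
print(frac)  # fraction truncated, on the order of 1% for this lognormal
```

Since the lognormal is positive and right-skewed, only the upper bound q3 + 4×IQR actually bites, which is consistent with the roughly 1% figure quoted in the text.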

Figure 1
Box plots of five randomly generated lognormal data sets of size 10,000 with coefficient of variation = 1.0.
Figure 2
Box plots of five randomly generated gamma data sets of size 10,000 with coefficient of variation = 1.0.

In most situations of practical interest, with odds ratios OR ≤ 6, the worst case for regression calibration was when π = 0.075, OR = 6, and nv/np = 0.01. In the 1,000 cases, the maximum asymptotic variance inflation of standard regression calibration was 13%, and more than 90% of these test cases had less than a 2% variance inflation. If nv/np = 0.10, there was at worst a 1% variance inflation. In other words, there was little or no asymptotic variance inflation in these cases. We have done calculations without truncating the lognormal distribution, and the variance inflation becomes larger, sometimes much larger, reaching 33% in extreme cases, but these situations seem unrealistic in practice because of the wild observations. In the gamma case, there was essentially no variance inflation.

We have also checked the lognormal case when the odds ratio for moving from the 10th to the 90th percentile of the distribution of X reaches a value of OR = 45, which in our experience is practically impossible. The maximum variance inflation of standard regression calibration was 83% with truncation and 147% without. As mentioned previously, in all those cases the regression calibration approximation leads to a severely misspecified model. The resulting slopes estimated by both methods involve large biases and should not be used for measurement error adjustment.

3.2. Finite-sample simulations

All results up to now have been asymptotic results based on the regression calibration approximation in equations (1)–(3). In this section, we use simulations to examine the finite sample properties of the two-stage and regression calibration methods when the regression calibration assumption given by equation (3) is violated. We especially concentrate on cases when equation (3) does not hold even approximately, and the relevance of asymptotic variance calculations under this assumption becomes questionable.

3.2.1. Normal W

We let Y be a binary disease outcome, X the true exposure, and W1 and W2 the first and second surrogate measurements of X. We performed 5 sets of 10,000 simulations. For each simulation, we generated two independent data sets, a main study and a validation study. In the main study, each subject had variables Y, W1 and W2. In the validation study, each subject had variables X, W1 and W2. The sample size for the main study was 10,000, while the sample size for the validation study was 1,000. We generated the data according to the following models. First, (W1, W2) was bivariate normal with zero means, common variance equal to 1.0, and correlation ρ. We generated ε ~ Normal(0, σε²) and set X = γ0 + γ1W1 + γ2W2 + ε. Finally, we generated Y as a binary variable with pr(Y = 1 | X) = H(β0 + β1X), where H(·) is the logistic distribution function. In all simulations, we set ρ = 0.5, γ0 = 0, and σε² = 1 − (γ1² + γ2² + 2ργ1γ2), thus ensuring that var(X) = 1. Then we set the true logistic intercept β0 and the true logistic slope β1 so that the odds ratio for increasing X by two standard deviations of its distribution equaled a predefined value OR, and the probability of disease in the source population equaled a predefined value π. Table 1 shows the values of the parameters for each simulation. For each simulation, we estimated β1 using standard regression calibration and both the original and modified versions of the two-stage method of Weller et al. Table 2 shows the results of the simulations.
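One replicate of this normal-W design might be generated as follows; the parameter values in the call are placeholders, not the Table 1 settings:

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(rho, g1, g2, beta0, beta1, n_main=10_000, n_val=1_000):
    """One replicate of the normal-W design described above (sketch)."""
    s2 = 1.0 - (g1**2 + g2**2 + 2 * rho * g1 * g2)   # forces var(X) = 1
    cov = [[1.0, rho], [rho, 1.0]]

    # validation study: (X, W1, W2)
    Wv = rng.multivariate_normal([0, 0], cov, size=n_val)
    Xv = g1 * Wv[:, 0] + g2 * Wv[:, 1] + rng.normal(0, np.sqrt(s2), n_val)

    # main study: (Y, W1, W2), with Y logistic in the unobserved X
    Wm = rng.multivariate_normal([0, 0], cov, size=n_main)
    Xm = g1 * Wm[:, 0] + g2 * Wm[:, 1] + rng.normal(0, np.sqrt(s2), n_main)
    Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(beta0 + beta1 * Xm))))
    return (Wv, Xv), (Wm, Y), Xm

(Wv, Xv), (Wm, Y), Xm = simulate(rho=0.5, g1=0.5, g2=0.3, beta0=-1.5, beta1=0.7)
print(np.var(Xm))  # close to 1 by construction
```

The choice of σε² makes var(X) = γTΣvγ + σε² equal to one, so the two-standard-deviation odds ratio translates directly into β1.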

Table 1
Parameters used in simulation study with normal W.
Table 2
Results of simulation study with normal W. Means, standard deviations (SD) and root mean squared errors (RMSE) of the estimators of β1.

The results are instructive. In cases 1–4, the empirical variance of standard regression calibration is equal to or smaller than that of the unmodified two-stage method, with essentially no difference between standard regression calibration and the modified two-stage method. In the most extreme case 5, the original and modified two-stage methods are less variable by 3% and 9%, respectively. However, in all cases, standard regression calibration is less biased than either two-stage method and, as a result, has smaller root mean squared error. When one deals with true odds ratios greater than 20, as in cases 4–5, the regression calibration approximation can be particularly poor, and finite-sample properties of the considered estimators become especially relevant.

3.2.2. Lognormal W

We also ran simulations in which the surrogate measurements had a lognormal distribution. In addition to a normally distributed error ε in the linear regression of X on W, at the request of one of the reviewers, we also considered the case of a lognormal error, both homoscedastic and heteroscedastic (depending on W). The description of those simulations and their results are given in Appendix C. All results are reasonably in line with those in the previous section for the normal case.

3.2.3. Categorical W

We used the data described by Weller et al. to understand the properties of the methods when W1 and W2 are categorical. Following Section 6 of Weller et al., we considered the case when W = (W1, W2) had in total 3 distinct values. We set W to be categorical with 3 levels occurring with probabilities (0.4, 0.3, 0.3), and let W1 and W2 be the dummy variable indicators of the two highest levels, with induced correlation ρ = −3/7. We set γ0 = 0.127, and σε² = 1 − (γ1² + γ2² + 2ργ1γ2). Table 3 shows the values of the parameters for each simulation. In the simulations, the primary sample size was 10,000, the validation sample size was 1,000, and 10,000 simulated data sets were generated at each configuration. The results given in Table 4 are quantitatively similar to those for continuously distributed surrogate values.

Table 3
Parameters used in simulation study with categorical W.
Table 4
Results of simulation study with categorical W. Means, standard deviations (SD) and root mean squared errors (RMSE) of the estimators of β1.

4. Discussion

We have shown that the two-stage method of Weller et al. is not regular in the sense that the finite sample estimates of the odds ratio can vary widely if different 1–1 transformations of nuisance parameters are made. Despite this, the two-stage method is asymptotically equivalent to maximum likelihood for approximate model (1)–(3).

Our most striking conclusion is that, under the regression calibration approximation, the two-stage method is asymptotically no more variable than standard regression calibration in a wide variety of circumstances.

We performed numerical calculations that somewhat temper the main conclusion. We found in most cases that the two-stage and standard regression calibration methods were very nearly equivalent in asymptotic variance, the exceptions being when the odds ratio was very large, and perhaps unrealistically so. It would be interesting to find practical cases with modest odds ratios and well behaved distributions of W in which the variance gain from using the two-stage method was substantial, but we were unable to find such cases.

We have also conducted an extensive simulation study to compare finite-sample properties of the estimators, with special attention to cases where the regression calibration assumption does not provide a good approximation and the relevance of the asymptotic variances of the estimators under this approximation becomes questionable. We found that only in such extreme cases did the standard regression calibration estimator have a slightly greater finite-sample variance than the two-stage estimator. Yet, in all cases, standard regression calibration was less biased and, as a result, had smaller root mean squared error.

Under the regression calibration approximation, the two-stage method is asymptotically efficient for the problems we have described. It would be interesting to see how the method could be applied to other problems and to assess its asymptotic properties in those situations. One of the properties of standard regression calibration, or of maximum likelihood in the regression calibration context, is that it generalizes easily to cases where more than one predictor is measured with error, and the surrogates for each such predictor may or may not be the same. It is possible to define a two-stage procedure for such cases, although it is not straightforward, and the resulting estimators remain not invariant to reparameterization of nuisance parameters. Whether they have optimality properties that match the asymptotic properties of the regression calibration maximum likelihood is unknown.

To illustrate this, suppose that there are K = 2 variables measured with error, with X_1 = W_1^Tγ_1 + W_2^Tγ_2 + ε_1 and X_2 = W_2^Tγ_3 + W_3^Tγ_4 + ε_2. This is a case in which one set of covariates, W_2, appears in both regressions, while the others, W_1 and W_3, each appear in only one. Then write W^T = (W_1^T, W_2^T, W_3^T) and β = (β_1, β_2)^T, so that


It is not clear what α is here, or how one can take advantage of the knowledge that W_3 is not a predictor of X_1 given (W_1, W_2). While it is possible to devise estimators and weight them, their properties are unclear; maximum likelihood and standard regression calibration suffer no such lack of clarity.


We thank Dr. Weller for providing us with the data used in their original analysis. Carroll’s research was supported by a grant from the National Cancer Institute (CA57030) and by Award Number KUS-CI-016-04, made by King Abdullah University of Science and Technology (KAUST).

Appendix A: Sketch of Technical Arguments Under the Regression Calibration Model

A.1. Simplifications

It is readily seen that the scores for estimating (α, β_1, β_2, γ_0, γ_1, γ_2, ζ) in model (3) are orthogonal to the score for estimating σ_ε, so we will without loss of generality assume σ_ε² is known. Also without loss of generality, we center X_iv so that E(X_iv) = 0. We also center W_iv by subtracting a constant vector so that E(W_iv) = 0, and subtract the same constant from W_ip; the same procedure is applied to Z_iv and Z_ip. This means that γ_0 = 0 and we can ignore it in our development.

It suffices to prove the main result in the special case that there are no nuisance parameters ζ. In generalized linear models there may be a nuisance parameter ζ, but the loglikelihood is of the form


However, the score for ζ is orthogonal to the score for (α, β1, γ1, β2). To see this well-known fact, we note that the score for (α, β1, γ1, β2) is


where the superscript denotes first derivative. However, because this is a generalized linear model, E{G(Y, W, Z)| W, Z} = 0. Differentiating the score for (α, β_1, γ_1, β_2) with respect to ζ yields −G(Y, W, Z)/ζ², so it too has mean zero. This means that the Fisher information matrix is block-diagonal, that the scores are orthogonal, and hence in generalized linear models the asymptotic theory is the same as if ζ were known.

It suffices to work in the special case with no accurately measured covariates Z in the model: the extra generality is accomplished by somewhat more complicated definitions of the terms A and B defined below. In this case, we replace γ_1 by γ and β_1 by β. The log-likelihood score then takes the form ℓ(y, v) = ∂ log{f(y, v)}/∂v with v = α + βW^Tγ. Hence, we do the detailed calculations for the model


We also use standard facts about likelihood theory, e.g., that the square of the score has expectation that is −1 times the expectation of the derivative of the score, so that, for example, we define in the primary study


These are easily confirmed by conditioning on W. Define c_F = n_v/n_p.

A.2. Some Matrix Formulae

We use repeatedly the fact that if




where b_22 = (a_22 − a_21a_11^{-1}a_12)^{-1}, b_11 = a_11^{-1} + a_11^{-1}a_12b_22a_21a_11^{-1}, b_12 = −a_11^{-1}a_12b_22, and b_21 = −b_22a_21a_11^{-1}. It is also worth remembering that


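As a quick numerical sanity check, the block-inverse identities above can be verified directly with NumPy; the dimensions and the particular random matrix below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a well-conditioned symmetric partitioned matrix A = [[a11, a12], [a21, a22]].
a11 = rng.normal(size=(3, 3)); a11 = a11 @ a11.T + 3.0 * np.eye(3)
a22 = rng.normal(size=(2, 2)); a22 = a22 @ a22.T + 3.0 * np.eye(2)
a12 = 0.3 * rng.normal(size=(3, 2))
a21 = a12.T                      # symmetric case, as for an information matrix

A = np.block([[a11, a12], [a21, a22]])

i11 = np.linalg.inv(a11)
b22 = np.linalg.inv(a22 - a21 @ i11 @ a12)   # inverse of the Schur complement
b11 = i11 + i11 @ a12 @ b22 @ a21 @ i11
b12 = -i11 @ a12 @ b22
b21 = -b22 @ a21 @ i11
B = np.block([[b11, b12], [b21, b22]])

# B should reproduce the inverse of A up to floating-point error.
max_err = float(np.abs(B @ A - np.eye(5)).max())
```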
A.3. Asymptotic Theory for Maximum Likelihood

The score for (α, γ, β) is given as


It follows with routine calculations, and remembering that cov(W_iv) = Σ_v, that


Let e_1^T = (0, 0, 0, 1). It follows from likelihood theory that


We want to compute e_1^TJ^{-1}e_1. Define




A.4. Asymptotic Theory for Regression Calibration

The estimating equation for (α, γ, β) is




Define M to be the covariance matrix of the left hand side of the last expansion so that


Then we have that


A.5. Asymptotic Theory for the Two-Stage Method

This method estimates γ in the validation data only, and thus n_v^{1/2}(γ̂ − γ) → Normal(0, σ_ε²Σ_v^{-1}). Hence, n_p^{1/2}(γ̂ − γ) → Normal{0, (σ_ε²/c_F)Σ_v^{-1}}. The method then fits the likelihood f(Y_ip, α_0 + W_ip^Tα_1), and the loglikelihood score for this is just (1, W_ip^T)^T ℓ(Y_ip, α_0 + W_ip^Tα_1). This has information matrix


Hence, the asymptotic covariance matrix of n_p^{1/2}(α̂_1 − α_1) is the lower right corner submatrix of J_2^{-1}, and thus


The two-stage method considers a weighted linear combination of {diag(γ̂)}^{-1}α̂_1. Now, by algebra,


remembering that β(1, …, 1)^T = {diag(γ)}^{-1}α_1. Hence,


If e_one = (1, 1, …, 1)^T, then the optimal weight is ω = Ξ^{-1}e_one(e_one^TΞ^{-1}e_one)^{-1}, from which it follows that ω^TΞω = (e_one^TΞ^{-1}e_one)^{-1}, and thus




and hence


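The optimal-weighting step can be checked numerically: for any positive definite Ξ (a hypothetical 3×3 covariance below), the weights ω = Ξ^{-1}e_one(e_one^TΞ^{-1}e_one)^{-1} sum to one, attain variance (e_one^TΞ^{-1}e_one)^{-1}, and do at least as well as any other weighting that sums to one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical covariance matrix Xi of the per-surrogate estimates.
M = rng.normal(size=(3, 3))
Xi = M @ M.T + 0.5 * np.eye(3)          # symmetric positive definite
e = np.ones(3)                          # e_one

Xi_inv = np.linalg.inv(Xi)
omega = Xi_inv @ e / (e @ Xi_inv @ e)   # optimal weights; they sum to one
var_opt = float(omega @ Xi @ omega)     # equals 1 / (e' Xi^{-1} e)

# Any competing weights summing to one give a variance at least as large.
w = rng.dirichlet(np.ones(3))
var_other = float(w @ Xi @ w)
```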
A.6. Asymptotic Equivalence of MLE and Two-Stage Methods

Here we show that the maximum likelihood and two-stage estimators of β are asymptotically equivalent. Write θ=cF/σu2 and N =CBA−1BT. Studying (A.3) and (A.4), we have to show that Ω = Λ where for completeness






a_11 = A^{-1}{1 + β²B^Ta_22BA^{-1}}, a_12 = −βA^{-1}B^Ta_22, and a_21 = −βa_22BA^{-1}.


so that the MLE and the two-stage estimator are asymptotically equivalent. This completes the argument.

A.7. Asymptotic Non-Efficiency of Standard Regression Calibration

Here we show that the regression calibration estimator is not generally asymptotically equivalent to the other estimators. To do this, we simply need to exhibit a special case, and the simplest such case arises when Σv = I and B = 0. When B = 0, for the MLE,


and hence


the last following from our more general result.

For standard regression calibration, we get a different answer. Let D* and M* be the lower non-diagonal versions of Dand M, so that




Direct calculations show that


This is in general different from the maximum likelihood and two-stage variances. It does coincide with them in the case of linear regression, where C = I/σ_ε², and in the case when β = 0.

Table A1

Parameters used in simulation study with lognormal W.

Columns: Case, γ_1, γ_2, β_0, β_1, σ_ε², R²*, β_1²σ_ε², odds ratio exp(2β_1), and probability of disease in the population.
* R² = 1 − σ_ε² is the R-squared of the regression of X on (W_1, W_2).

Table A2

Results of simulation study with lognormal W and normal error (ε). Means, standard deviations (SD) and root mean squared errors (RMSE) of the estimators of β1.

Columns: Case, β_1, β_1²σ_ε², then mean, SD, and RMSE for the Original Two-Stage Method, the Modified Two-Stage Method, and the Regression Calibration Method.

Table A3

Results of simulation study with lognormal W and homoscedastic lognormal error (ε). Means, standard deviations (SD) and root mean squared errors (RMSE) of the estimators of β1.

Columns: Case, β_1, β_1²σ_ε², then mean, SD, and RMSE for the Original Two-Stage Method, the Modified Two-Stage Method, and the Regression Calibration Method.

Table A4

Results of simulation study with lognormal W and heteroscedastic lognormal error (ε). Means, standard deviations (SD) and root mean squared errors (RMSE) of the estimators of β1.

Columns: Case, β_1, β_1²σ_ε², then mean, SD, and RMSE for the Original Two-Stage Method, the Modified Two-Stage Method, and the Regression Calibration Method.

Appendix B: Asymptotic Theory for Logistic Regression

The purpose of this appendix is to provide a precise asymptotic analysis in the case of logistic regression, clearly stating where the regression calibration approximation is used. Where it is used, we replace = by ≐.

Let H(x) = exp(x)/{1 + exp(x)} be the logistic distribution function. Our model is that in the validation and primary studies, which are of sizes n_v and n_p, respectively,


We define c_F = n_v/n_p and use the fact that n_p^{1/2}(γ̂ − γ) → Normal{0, (σ_ε²/c_F)Σ_v^{-1}}.

B.1. Two-stage method

B.1.1. Basic Theory

The two-stage method makes the assumption that there is a unique (α0, α1) such that


In this case, (α0,α1) are the parameters that logistic regression of Y on W in the primary study actually converges to in probability. It should be noted that, precisely,


In other words, (B.2) says that the observed data do not follow a logistic regression exactly. Logistic regression of Y on W in the primary study means solving the estimating equation


Define v_ip = Y_ip − H(α_0 + W_ip^Tα_1), and make the further definitions


where H^{(1)}(x) = H(x){1 − H(x)}. It is a consequence of standard estimating equation theory that


This means that


The two-stage method considers a weighted linear combination of {diag(γ̂)}^{-1}α̂_1. We have that




If e_one = (1, 1, …, 1)^T, then the optimal weighted combination, in terms of the variance of {diag(γ̂)}^{-1}α̂_1, has weights ω = Q_4^{-1}e_one(e_one^TQ_4^{-1}e_one)^{-1}. The two-stage estimator of β is then β̂_TS = ω^T{diag(γ̂)}^{-1}α̂_1, and we have that


Of course, an implication of (B.3) is that the two-stage estimator converges to ω^T{diag(γ)}^{-1}α_1.
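As an informal illustration of the two-stage procedure, the following sketch simulates a hypothetical validation study and primary study. The calibration slopes γ = (0.6, 0.5), the logistic slope β_1 = 0.4, the intercept, the study sizes, and the equal final weighting are all illustrative assumptions, not values from the paper; the paper uses the optimal inverse-covariance weights ω rather than equal weights.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_logistic(Xmat, y, iters=25):
    """Newton-Raphson for logistic regression; Xmat includes the intercept column."""
    b = np.zeros(Xmat.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(Xmat @ b)))
        H = Xmat.T @ (Xmat * (p * (1.0 - p))[:, None])   # Fisher information
        b = b + np.linalg.solve(H, Xmat.T @ (y - p))
    return b

# Hypothetical parameter values.
gamma = np.array([0.6, 0.5])
beta1 = 0.4
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
sig_eps2 = 1.0 - gamma @ Sigma @ gamma          # makes var(X) = 1

# Stage 1 -- validation study (X, W observed): least squares gives gamma_hat.
n_v = 1_000
Wv = rng.multivariate_normal([0.0, 0.0], Sigma, size=n_v)
Xv = Wv @ gamma + rng.normal(scale=np.sqrt(sig_eps2), size=n_v)
gamma_hat, *_ = np.linalg.lstsq(Wv, Xv, rcond=None)

# Primary study (Y, W observed): logistic regression of Y on the surrogates W.
n_p = 20_000
Wp = rng.multivariate_normal([0.0, 0.0], Sigma, size=n_p)
Xp = Wp @ gamma + rng.normal(scale=np.sqrt(sig_eps2), size=n_p)
Yp = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + beta1 * Xp))))
alpha1 = fit_logistic(np.column_stack([np.ones(n_p), Wp]), Yp)[1:]

# Stage 2: per-surrogate slope estimates {diag(gamma_hat)}^{-1} alpha1, then a
# weighted combination (equal weights here, to keep the sketch short).
beta_ts = float(np.mean(alpha1 / gamma_hat))
```

Under the regression calibration approximation each component of {diag(γ̂)}^{-1}α̂_1 estimates the same β, so any weights summing to one give a consistent combination; the choice of weights affects only the variance.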

B.1.2. Implications of the regression calibration approximation

Equation (B.3) makes it clear that the two-stage method actually converges to ω^T{diag(γ)}^{-1}α_1. However, ω = ω(Q_1, Q_2, γ, α_1) depends on (Q_1, Q_2, γ, α_1). Since (α_0, α_1) are the solutions to the integral equations (B.1), which in turn depend on the distributions of (X, W) in the primary study, the asymptotic variance (B.4) is not particularly useful for comparison with regression calibration.

However, there are important simplifications if one makes the same regression calibration approximations as Weller et al. [1], namely that for some β_TS,



where in both cases we have used the symbol ≐ to reflect the regression calibration approximation. In this case, it is readily seen from (B.4) that Q_1 ≐ Q_2 and ω^T{diag(γ)}^{-1}α_1 ≐ β_TS, and the regression calibration approximation to the asymptotic variance of the two-stage method reduces to the result in Appendix A. In particular, define V_ip = H^{(1)}(α_0 + W_ip^Tα_1). Then Q_3 ≐ (CBA^{-1}B^T)^{-1}, which agrees with the result in Appendix A. Using (B.5), we also have


which also agrees with the results in Appendix A.

B.2. Regression calibration method

We will distinguish here between the regression calibration method and the regression calibration approximation.

B.2.1. Basic theory

The regression calibration method makes the assumption that there is a unique (β0, βRC) such that


Then (β̂_0, β̂_1) solve


This means that the estimating equations are


where, if V_ip = H^{(1)}(β_0 + W_ip^Tγβ_RC),


Letting v_ip = Y_ip − H(β_0 + W_ip^Tγβ_RC), and defining


it follows that if e1 = (0, 0, …, 0, 1)T,


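A corresponding informal sketch of the regression calibration method itself, on hypothetical data with illustrative parameter values (not the paper's): γ̂ is estimated by least squares in the validation data, the imputed exposure W^Tγ̂ is substituted for X, and an ordinary logistic regression is fitted in the primary study.

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_logistic(Xmat, y, iters=25):
    """Newton-Raphson for logistic regression; Xmat includes the intercept column."""
    b = np.zeros(Xmat.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(Xmat @ b)))
        H = Xmat.T @ (Xmat * (p * (1.0 - p))[:, None])
        b = b + np.linalg.solve(H, Xmat.T @ (y - p))
    return b

# Hypothetical parameter values.
gamma = np.array([0.6, 0.5])
beta1 = 0.4
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
sig_eps2 = 1.0 - gamma @ Sigma @ gamma

# Calibration step: regress X on W in the validation data.
Wv = rng.multivariate_normal([0.0, 0.0], Sigma, size=1_000)
Xv = Wv @ gamma + rng.normal(scale=np.sqrt(sig_eps2), size=1_000)
gamma_hat, *_ = np.linalg.lstsq(Wv, Xv, rcond=None)

# Primary study: substitute the imputed exposure W' gamma_hat for X and fit
# an ordinary logistic regression of Y on it.
n_p = 20_000
Wp = rng.multivariate_normal([0.0, 0.0], Sigma, size=n_p)
Xp = Wp @ gamma + rng.normal(scale=np.sqrt(sig_eps2), size=n_p)
Yp = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-1.0 + beta1 * Xp))))
Xhat = Wp @ gamma_hat
beta_rc = float(fit_logistic(np.column_stack([np.ones(n_p), Xhat]), Yp)[1])
```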
B.2.2. Implications of the regression calibration approximation

Like (B.3), equation (B.8) is not particularly useful for making comparisons among the various methods because of the need to solve the integral equation (B.6), which in turn depends on the distribution of W_ip, etc. However, the regression calibration approximation equivalent to (B.4)–(B.5) is that


Under (B.4)–(B.5) and (B.9), β_TS ≐ β_RC. Equation (B.9) implies that E(v_ip² | W_ip) ≐ E(V_ip | W_ip). In this case, D_1 and D_2 simplify to



exactly as described in Appendix A.

Appendix C: Simulations with Lognormal W

In this appendix, we present results of three sets of simulations in which the surrogates W_1 and W_2 had a bivariate lognormal distribution. We first generated (Z_1, Z_2) as bivariate standard normal with correlation ρ*. Let σ² = log(5), μ* = exp(σ²/2) and σ*² = exp(2σ²) − exp(σ²), so that the coefficient of variation is σ*/μ* = 2. By simulation, we found ρ* such that corr{exp(σZ_1), exp(σZ_2)} = ρ. Then (W_1, W_2) defined by W_j = {exp(σZ_j) − μ*}/σ*, j = 1, 2, has the desired bivariate lognormal distribution with mean zero, variance one, and correlation ρ.

As with the other simulations described in the main text, we generated X as X = γ_0 + γ_1W_1 + γ_2W_2 + ε, where ε was uncorrelated with (W_1, W_2), E(ε) = 0 and σ_ε² = 1 − (γ_1² + γ_2² + 2ργ_1γ_2), thus ensuring E(X) = 0 and var(X) = 1. As before, we generated Y as a binary variable with pr(Y = 1 | X) = H(β_0 + β_1X), where H(·) is the logistic distribution function. We then set the true logistic intercept β_0 and slope β_1 so that the odds ratio for increasing X by two standard deviations of its distribution equaled a predefined value OR, and the probability of disease in the source population equaled a predefined value π.

In the first set of simulations, we generated ε to be normal and independent of (W_1, W_2). In the second set, we set ε = δ − E(δ), where δ was independent of (W_1, W_2) and had a lognormal distribution with E(δ) = {1 − (γ_1² + γ_2² + 2ργ_1γ_2)}^{1/2} and CV = 1. In the third set of simulations, we considered a case of heteroscedastic error ε = (γ_1W_1 + γ_2W_2)(δ − 1), where δ was independent of (W_1, W_2) and lognormal with mean E(δ) = 1, variance var(δ) = {1 − (γ_1² + γ_2² + 2ργ_1γ_2)}/(γ_1² + γ_2² + 2ργ_1γ_2), and CV(δ) = {var(δ)}^{1/2}.
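A minimal sketch of this data-generating scheme (simulation set 1, with normal ε) in NumPy. The slopes γ_1, γ_2 and the logistic parameters below are illustrative placeholders rather than the values in Table A1, and we invert the closed-form lognormal correlation identity instead of searching for ρ* by simulation.

```python
import numpy as np

rng = np.random.default_rng(4)

# sigma^2 = log 5 gives CV = sqrt(exp(sigma^2) - 1) = 2 for exp(sigma * Z).
sig2 = np.log(5.0)
sig = np.sqrt(sig2)
mu_s = np.exp(sig2 / 2.0)                          # mu*
sd_s = np.sqrt(np.exp(2.0 * sig2) - np.exp(sig2))  # sigma*

# For bivariate normal (Z1, Z2) with correlation rho_star,
# corr{exp(sig Z1), exp(sig Z2)} = {exp(sig2*rho_star) - 1} / {exp(sig2) - 1};
# inverting this gives rho_star for a target rho directly.
rho = 0.5
rho_star = np.log(1.0 + rho * (np.exp(sig2) - 1.0)) / sig2

n = 200_000
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho_star], [rho_star, 1.0]], size=n)
W = (np.exp(sig * Z) - mu_s) / sd_s     # standardized lognormal surrogates

# Illustrative calibration slopes; sig_eps2 is chosen so that var(X) = 1.
gamma1, gamma2 = 0.4, 0.3
sig_eps2 = 1.0 - (gamma1**2 + gamma2**2 + 2.0 * rho * gamma1 * gamma2)
X = gamma1 * W[:, 0] + gamma2 * W[:, 1] + rng.normal(scale=np.sqrt(sig_eps2), size=n)

# Simulation set 1: normal error; illustrative (beta0, beta1).
beta0, beta1 = -2.0, 0.5
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(beta0 + beta1 * X))))
```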

In all simulations, we set ρ = 0.5, γ_0 = 0, n_v = 1,000 and n_p = 100,000. The remaining parameters for each simulation are given in Table A1. Tables A2–A4 present the results for simulation sets 1 to 3, respectively.


1. Weller EA, Milton DK, Eisen EA, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference. 2007;137:449–461.
2. Carroll RJ, Stefanski LA. Approximate quasi-likelihood estimation in models with surrogate predictors. Journal of the American Statistical Association. 1990;85:652–663.
3. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. 2. Chapman and Hall CRC Press; Boca Raton, FL: 2006.
4. Rosner B, Willett WC, Spiegelman D. Correction of logistic regression relative risks and confidence intervals for systematic within-person measurement error. Statistics in Medicine. 1989;8:1051–1070.
5. Thurston SW, Spiegelman D, Ruppert D. Equivalence of regression calibration methods in main study/external validation study designs. Journal of Statistical Planning and Inference. 2003;113:527–539.