Stat Med. Author manuscript; available in PMC May 1, 2013.
PMCID: PMC3640838
NIHMSID: NIHMS453468
Regression Calibration with More Surrogates than Mismeasured Variables
Victor Kipnis,a* Douglas Midthune,a Laurence S. Freedman,b and Raymond J. Carrollc
aBiometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, MD, U.S.A
bGertner Institute for Epidemiology and Health Policy Research, Sheba Medical Center, Tel Hashomer, Israel
cDepartment of Statistics, Texas A&M University, College Station, TX, U.S.A
*Correspondence to: Victor Kipnis, Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Executive Plaza North, Room 3124, 6130 Executive Blvd., MSC 7354, Bethesda, MD 20892-7354, U.S.A
kipnisv@mail.nih.gov, Tel: 301-496-7464, Fax: 301-402-0816
Abstract

In a recent paper, Weller, Milton, Eisen, and Spiegelman discuss fitting logistic regression models when a scalar main explanatory variable is measured with error by several surrogates, i.e., a situation with more surrogates than variables measured with error. Using a regression calibration approximate model as if it were exact, they compare two methods of adjusting for measurement error. One is the standard regression calibration approach, which substitutes an estimated conditional expectation of the true covariate given the observed data into the logistic regression. The other is a novel two-stage approach in which the logistic regression is first fitted to the multiple surrogates and a linear combination of the estimated slopes is then formed as the estimate of interest. Applying estimated asymptotic variances for both methods in a single data set with some sensitivity analysis, the authors assert superiority of their two-stage approach. We investigate this claim in some detail. A troubling aspect of the proposed two-stage method is that, unlike standard regression calibration and a natural form of maximum likelihood, the resulting estimates are not invariant to reparameterization of nuisance parameters in the model. We show, however, that, under the regression calibration approximation, the two-stage method is asymptotically equivalent to a maximum likelihood formulation, and is therefore in theory superior to standard regression calibration. However, our extensive finite-sample simulations in the practically important parameter space where the regression calibration model provides a good approximation failed to uncover such superiority of the two-stage method. We also discuss extensions to different data structures.
Keywords: attenuation, logistic regression, measurement error, reparameterization
1. Introduction

1.1. Basic problem
Weller et al. [1] consider a problem where a binary variable Y is to be regressed on a scalar main explanatory variable X and a vector covariate Z via logistic regression, but X is subject to measurement error. Instead of observing X, a vector covariate W is observed, so that there are more surrogates, W, than there are variables measured with error, the scalar X. In place of standard regression calibration as described by Carroll and Stefanski [2] and Carroll et al. [3], they describe a novel extension of the two-stage linear approximation approach of Rosner et al. [4] to estimating the slope for X. They compare their new approach with standard regression calibration by estimating the corresponding approximate asymptotic variances based on a single data set and some sensitivity analysis, and conclude that the two-stage approach provides a more efficient estimator than standard regression calibration.
The purpose of this paper is to place these methods in a more general context than just logistic regression, and to replace the somewhat informal calculations of Weller et al. with a more formal theoretical treatment. There are four important aspects of this treatment.
  • Weller et al. work with an approximate model, namely the regression calibration approximation to the conditional distribution of Y given the observed covariates (W, Z) and then do all their theoretical calculations as if this approximate model were true. This is a fairly common strategy when dealing with measurement error modeling and regression calibration, see for example Thurston et al. [5]. In the parameter subspace where such an approximation is close to the correct model, this allows a formal theoretical analysis, which we provide.
  • In this problem, the primary interest is in a single parameter, the regression slope for the main explanatory variable. All other parameters are nuisance parameters. We can thus write the likelihood function of the data as L(ζ, β1), where ζ is the vector of nuisance parameters and β1 is the parameter of interest. In all such likelihood problems, the maximum likelihood estimator of β1 is unchanged when a 1–1 reparameterization of ζ is done: the same is true for standard regression calibration. Unfortunately, the suggested two-stage estimator is not invariant to such 1–1 reparameterizations of the nuisance parameters, and can be made to vary quite widely in a given finite set of data. We suggest a modified two-stage estimator which, in our extensive simulations, provides better finite sample results than the original method.
  • We show that if the regression calibration approximation were exact, the two-stage estimator would have the same asymptotic variance as the maximum likelihood estimator. Extensive numerical work shows that, asymptotically, the maximum likelihood estimator and modified two-stage estimator are typically only slightly less variable than the one based on standard regression calibration, at least in the practically important parameter space where the regression calibration model provides a good approximation.
  • As noted above, the theoretical calculations in Weller et al. and in the present work are based on an approximate model. It is well known that this approximation can be a very poor one if the main explanatory variable has a relatively large effect on the outcome and/or the measurement error is relatively large. In such cases, as is well demonstrated by the results in Weller et al., the model based on the regression calibration assumption is badly misspecified, leading to severely inconsistent estimators with very different asymptotic and finite-sample properties. Comparing asymptotic variances of estimators derived under the regression calibration assumption may be irrelevant in this situation. However, these are exactly the cases where Weller et al. emphasize the superiority of their two-stage method over the standard one, disregarding the biases involved and the finite-sample properties. We therefore provide finite-sample simulation results which suggest that standard regression calibration is, in most cases, at least as good as the two-stage method in terms of bias and mean squared error.
1.2. Basic model assumptions
The setup of the problem follows Weller et al. We have primary study data consisting of (Yip, Wip, Zip) for i = 1, …, np. We also have data on an external validation study (Wiv, Xiv, Ziv) for i = 1, …, nv, where X is scalar. There are two parts to the model. The first is the relationship between X and (W, Z). It is assumed that the regression of X on (W, Z) is linear and homoscedastic in both the primary and the validation data with identical parameters. In symbols, Xiv = γ0 + WivTγ1 + ZivTγ2 + εiv and Xip = γ0 + WipTγ1 + ZipTγ2 + εip, where E(εiv) = E(εip) = 0 and var(εiv) = var(εip) = σε2. As it turns out, all the methods to be discussed in this paper are numerically unchanged if we add the assumption that εiv and εip are also normally distributed. We write this as follows.
  • Assumption 1 
    The mean of X given (W, Z) equals γ0 + WTγ1 + ZTγ2 and the variance of X given (W, Z) equals σε2 in both the primary and the validation data.
The second part of the model is the relationship between Y and (X, W, Z) in the primary data. Here it is typical to make two assumptions.
  • Assumption 2 
    The measurement error is non-differential, i.e., W is independent of Y given (X, Z).
  • Assumption 3 
    In the primary data, the distribution of Y given (X, Z) follows an additive model, with density or mass function f (Y,β0 +X β1 + ZTβ2, ζ), where ζ is a vector of nuisance parameters.
1.3. Regression calibration approximation and approaches to comparisons
Our goal is to compare the two-stage method and the regression calibration method. There are three ways that this can be done.
  • First, one can do simulation studies, something we take up in Section 3.2.
  • The second approach is theoretical, involving asymptotic theory, but it involves an approximation as used by Weller et al. [1] among many other authors. The approximation, called the regression calibration approximation, is stated below as Assumption 4. It allows for a clear asymptotic comparison among the methods, see Appendix A.
  • The third approach is more formal than the other two and does not depend on the regression calibration approximation. Based only on Assumptions 1–3, one can develop the asymptotic distribution of the two-stage and regression calibration methods. We show how to do this for logistic regression in Appendix B. There are however problems with this in terms of numerical comparisons.
    • The two methods generally estimate different quantities; see Section 1.4 for cases in which they estimate the same thing.
    • Both estimators converge to quantities that are solutions to nonlinear integral equations which depend on the joint distribution of (X, W, Z). This makes numerical comparison of their biases and variances difficult because in each case, an integral equation must be solved.
    • Interestingly, we show in Appendix B that if we make the regression calibration approximation, the results of the correct asymptotic theory developed there reduce to those in Appendix A.
The regression calibration approximation, which we state here as Assumption 4, has to do with the relationship between Y and (W, Z), i.e., the distribution of the observed data.
Assumption 4: The distribution of Y given (W, Z) is the same as that of Y given (X, Z), except that X is replaced by its regression on (W, Z), E(X | W, Z), and the intercept is changed [3]. In symbols, it is assumed that the density/mass function of Y given (W, Z) is f(Y, α0 + β1E(X | W, Z) + ZTβ2, ζ) = f(Y, α0 + WTα1 + ZTα2, ζ), where α1 = β1γ1 and α2 = β2 + β1γ2.
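To see when this approximation is accurate in the logistic case, here is a minimal numerical check (our own illustration, not from Weller et al.; all parameter values are arbitrary, and the intercept adjustment is ignored). Writing H(·) for the logistic distribution function, it compares the exact pr(Y = 1 | W) = E{H(β0 + β1X) | W} with the regression calibration approximation H{β0 + β1E(X | W)} for a scalar surrogate W and normal calibration error; the two agree closely when β12σε2 is small.

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    H = lambda t: 1.0 / (1.0 + np.exp(-t))

    # Arbitrary illustrative values; X | W ~ Normal(gamma1 * w, sigma_eps**2)
    beta0, beta1, gamma1, sigma_eps = -2.0, 1.0, 0.7, 0.5

    def pr_exact(w):
        # pr(Y = 1 | W = w): integrate H(beta0 + beta1 * x) against the
        # conditional density of X given W = w
        f = lambda x: H(beta0 + beta1 * x) * norm.pdf(x, gamma1 * w, sigma_eps)
        return quad(f, -np.inf, np.inf)[0]

    def pr_rc(w):
        # Regression calibration: substitute E(X | W = w) for X
        return H(beta0 + beta1 * gamma1 * w)

    for w in (-2.0, 0.0, 2.0):
        print(w, pr_exact(w), pr_rc(w))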
1.4. Discussion of assumptions
It is important to emphasize that, for most nonlinear regressions including logistic regression, Assumptions 1–4 are contradictory. In particular, Assumptions 1–3 are the ones usually made in the analysis, from which not only does Assumption 4 not follow, but, strictly speaking, it is incompatible with them. There are at least three notable exceptions.
  • If β1 = 0, then the regression calibration approximation holds exactly and Assumption 4 is true. In this case, both the two-stage method and standard regression calibration method estimate β1 = 0 asymptotically. It is a consequence of the work in Appendix A that the two estimates are asymptotically equivalent when β1 = 0.
  • The second case occurs when Y given (X, Z) follows a homoscedastic linear model, in which case Assumption 4 is strictly true. As it turns out, in this case the two-stage method and the regression calibration method are asymptotically equivalent, as we show in Appendix A.
  • The third is probit regression, where the probability of disease is pr(Y = 1 | X, Z) = Φ(β0 + Xβ1 + ZTβ2). If X given (W, Z) is normally distributed, then the observed data follow the probit model
    pr(Y = 1 | W, Z) = Φ[{β0 + β1E(X | W, Z) + ZTβ2}/(1 + β12σε2)1/2] = Φ(α0* + WTα1* + ZTα2*),
    where α1* = β1γ1/(1 + β12σε2)1/2 and α2* = (β2 + β1γ2)/(1 + β12σε2)1/2. In this probit case, maximum likelihood for the observed data of Y given (W, Z), the two-stage method and the standard regression calibration method all estimate β1/(1 + β12σε2)1/2, and our asymptotic results apply. Of course, if β12σε2 is large, then all methods are badly biased for β1; see the numerical check below.
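This probit identity can be verified numerically. The following minimal check (ours; the values of β0, β1, the conditional mean m, and the error standard deviation σε are arbitrary) compares E{Φ(β0 + β1X) | W, Z}, computed by quadrature, with Φ evaluated at the attenuated linear predictor:

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    # Arbitrary illustrative values; X | (W, Z) ~ Normal(m, sigma_eps**2)
    beta0, beta1, m, sigma_eps = -1.0, 0.8, 0.3, 0.6

    # Left side: E{Phi(beta0 + beta1 * X) | W, Z}
    lhs = quad(lambda x: norm.cdf(beta0 + beta1 * x) * norm.pdf(x, m, sigma_eps),
               -np.inf, np.inf)[0]

    # Right side: Phi of the attenuated linear predictor
    rhs = norm.cdf((beta0 + beta1 * m) / np.sqrt(1.0 + beta1**2 * sigma_eps**2))

    print(lhs, rhs)  # identical up to quadrature error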
Outside of these important cases, we are left with a dilemma about the relevance of the asymptotic theory based on Assumption 4 for comparing the two methods. There are three possible ways to argue the relevance of the asymptotic theory based on Assumption 4, all of which are equivalent for our calculations but are very different conceptually. The first is the conventional approach of staying with the usually made Assumptions 1–3 and using Assumption 4 as an approximation. In cases where β12σε2 is relatively small, this approximation is quite good and the theoretical results are relevant. One should be careful, though, not to extend this theory to the full parameter space. In the absence of relevant theoretical results when Assumption 4 provides a poor approximation, the statistical properties of the estimated slope β1 should be investigated by extensive finite-sample simulations.
The second possibility is to make Assumptions 1, 2 and 4. In this case, the true model remains unknown, but one assumes that β1 has some approximate relevance to the parameter of interest, although we do not know exactly how to specify this relationship. The third possibility is to make Assumptions 1, 3 and 4. In this case, one does not assume that error is non-differential, but rather that it is such that Assumptions 1, 3 and 4 hold. Although all three possibilities are interesting, and our work applies to all of them, we will concentrate on the first one which is usually used in practical applications.
1.5. Summary and outline
In summary, a generalization of the framework of Weller et al. based on Assumptions 1–3 and using Assumption 4 as an approximation is as follows.
Xiv = γ0 + WivTγ1 + ZivTγ2 + εiv, εiv = Normal(0, σε2);
(1)
f(Yip, β0 + Xipβ1 + ZipTβ2, ζ) as the density/mass function of Yip given (Xip, Zip);
(2)
f{Yip, α0 + β1(WipTγ1 + ZipTγ2) + ZipTβ2, ζ} as the density/mass function of Yip given (Wip, Zip).
(3)
The goal is to estimate β1. Model (1)-(3) represents a likelihood formulation for the observed data. This is a new insight, and it suggests immediately that the data can be fit, efficiently, by maximum likelihood to get an asymptotically efficient estimate of the crucial parameter β1.
An outline of this paper is as follows. In Section 2, we describe the two-stage method of Weller et al. generalized to our context (1)–(3). We then state our three main results: (a) in the linear model for Y given (X, Z), all three methods are asymptotically equivalent; (b) maximum likelihood for (1)–(3) and the two-stage method are asymptotically equivalent; and (c) standard regression calibration is not in general asymptotically equivalent to maximum likelihood for (1)–(3).
In Section 3.1, we take up the issue of how much more efficient the two-stage method is theoretically compared to standard regression calibration, by studying asymptotic variances over a broad range of interesting problems, and via simulations. In contrast to Weller et al., who use a single data set and compute estimated asymptotic variances, we find that in practice the difference in asymptotic efficiency between the two methods typically appears to be slight, although there are some exceptions. As mentioned above, under the usually made Assumptions 1–3, model (1)–(3) is merely an approximation; this suggests that a finite-sample simulation study comparing the standard regression calibration and two-stage methods be undertaken. The results of the simulation study are reported in Section 3.2 and Appendix C. We conclude with a discussion in Section 4. A sketch of the technical details when Assumption 4 approximately holds is given in Appendix A, while in Appendix B we derive the exact asymptotic distributions for logistic regression, and show that they reduce to the results in Appendix A when the regression calibration approximation is invoked.
2.1. Methods
Maximum likelihood for model (1)–(3) is straightforward. Standard regression calibration consists of first estimating (γ1, γ2) using the validation data only, substituting them into (3), assuming that γ1 is fixed to this estimated value and known, and then performing maximum likelihood to estimate the unknown slope β1.
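To fix ideas, here is a minimal sketch of standard regression calibration for the logistic case (our own code, not from Weller et al.; the function name and arguments are hypothetical, precisely measured covariates Z are omitted, and the uncertainty in γ̂ is ignored when maximizing the likelihood, exactly as in the description above):

    import numpy as np
    from scipy.optimize import minimize

    def rc_logistic_slope(X_v, W_v, Y_p, W_p):
        # Step 1: linear regression of X on (1, W) in the validation study
        D_v = np.column_stack([np.ones(len(X_v)), W_v])
        gamma = np.linalg.lstsq(D_v, X_v, rcond=None)[0]
        # Step 2: predicted values E(X | W) in the primary study
        xhat = np.column_stack([np.ones(len(Y_p)), W_p]) @ gamma
        # Step 3: logistic maximum likelihood in (beta0, beta1), gamma held fixed
        def negloglik(b):
            eta = b[0] + b[1] * xhat
            return np.sum(np.logaddexp(0.0, eta) - Y_p * eta)
        return minimize(negloglik, x0=np.zeros(2), method="BFGS").x[1]

Treating γ̂ as fixed in the final step is what distinguishes this estimator from full maximum likelihood for (1)–(3), which maximizes over γ and β jointly.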
The two-stage method is somewhat more complex. Again, Weller et al. first estimate (γ1, γ2) using the validation data to obtain (γ̂1, γ̂2). Then estimation of β1 proceeds in two stages. At stage 1, the following model is fit to the data:
f(Yip, α0 + WipTα1 + ZipTα2, ζ)
(4)
leading to estimates α̂1 of the "naive" slopes. Because of model (3), one has β1eone = α1//γ1, where // denotes element-wise division and where eone = (1, 1, …, 1)T. At stage 2, Weller et al. estimate β1 by forming the weighted linear combination τT(α̂1//γ̂1) and choosing the weights τ to minimize the asymptotic variance of this linear combination subject to the constraint that τTeone = 1.
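In matrix terms, stage 2 solves min τTVτ subject to τTeone = 1, where V is an estimate of the asymptotic covariance matrix of α̂1//γ̂1, giving τ = V−1eone/(eoneTV−1eone). A minimal sketch of this combination step (our own code; V is assumed to be supplied, e.g., from asymptotic theory or a bootstrap):

    import numpy as np

    def two_stage_combine(alpha1_hat, gamma1_hat, V):
        # Element-wise ratios: each component estimates beta1 under model (3)
        ratios = alpha1_hat / gamma1_hat
        e = np.ones_like(ratios)
        # Minimum-variance weights tau subject to tau' e = 1:
        # tau = V^{-1} e / (e' V^{-1} e)
        Vinv_e = np.linalg.solve(V, e)
        tau = Vinv_e / (e @ Vinv_e)
        return tau @ ratios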
2.2. Non-invariance of the two-stage method
Let Λ be any full rank square matrix whose number of columns is the dimension of W. Consider using the transformed vectors of surrogates W̃iv = ΛWiv and W̃ip = ΛWip based on the same full-rank linear transformation Λ applied to both the validation and primary data. It is easy to see theoretically that both maximum likelihood and regression calibration give exactly the same estimate of β1 whether (Wiv, Wip) or (W̃iv, W̃ip) are used. For example, with regression calibration, dropping precisely measured covariates for simplicity, if γ̂1 is the estimate of γ1 based on the linear regression of Xiv on Wiv, then (ΛT)−1γ̂1 is the corresponding estimate if W̃iv is used, and the regression calibration predictions are equivalent, i.e., W̃ipT(ΛT)−1γ̂1 = WipTγ̂1. This is as it should be: a 1–1 transformation of nuisance parameters should not affect the estimate of the main parameter of interest.
In contrast, the two-stage estimator enjoys no such property, and indeed in some settings it can be made to take on almost any value by varying Λ. As an example, for one pair of primary and validation data sets in which W is two-dimensional, we generated 10,000 full rank linear transformations Λ by having their entries be standard normal random variables. The two-stage estimator for the transformed W-variables over these 10,000 simulated full rank linear transformations had a minimum of −36.11, a maximum of 2.43, a mean of 0.79 and a 98% confidence interval of [0.75, 0.83].
We suggest linearly transforming W in the validation study to have mean zero and identity covariance matrix, applying the same transformation to the primary data, and then applying the two-stage method. In our extensive simulations, we have found that the modified two-stage method provides better results than the original two-stage estimator.
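A minimal sketch of this modification (our own code with hypothetical array names; W_v and W_p hold the validation and primary surrogates, one row per subject):

    import numpy as np
    from scipy.linalg import sqrtm

    def standardize_surrogates(W_v, W_p):
        # Whiten W in the validation study: mean zero, identity covariance
        mu = W_v.mean(axis=0)
        S = np.cov(W_v, rowvar=False)
        Lam = np.linalg.inv(np.real(sqrtm(S)))  # inverse symmetric square root
        # Apply the identical affine transformation to the primary study
        return (W_v - mu) @ Lam.T, (W_p - mu) @ Lam.T

This fixes one particular, data-driven parameterization and thereby removes the arbitrariness illustrated above.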
2.3. Assumptions and results
We show that the two-stage method achieves optimal asymptotic properties under the regression calibration assumption.
Here is our main result, the proof of which is given in Appendix A. The asymptotic variances for maximum likelihood, regression calibration and the two-stage method are given in Appendix A at equations (A.3), (A.4) and (A.5), respectively.
Theorem 1
Under model (1)-(3), the maximum likelihood and standard regression calibration estimators have the same asymptotic variances if Y given (X, Z) follows a homoscedastic linear model or if β1 = 0, but not in general. Under model (1)-(3), the two-stage estimator of β1 is asymptotically equivalent to maximum likelihood if there are no nuisance parameters ζ or if Y given (W, Z) follows a generalized linear model in the primary data.
3.1. Asymptotic efficiency comparisons under the regression calibration assumption
3.1.1. Background
While, at least technically, regression calibration is not efficient for approximate model (1)-(3), it remains to be seen how inefficient it is in realistic contexts. The asymptotic variances for all considered estimators are given at (A.3), (A.4) and (A.5) in Appendix A.
To compare these variances, we consider the case of logistic regression. Without loss of generality one can set E(X) = 0 and var(X) = 1, and because maximum likelihood, regression calibration and the modified two-stage method for estimating β1 are all invariant to full-rank linear transformations of W, without loss of generality one can assume that cov(W) = I. Assuming that there is no precisely measured covariate Z, the only items that one needs to vary are (a) the distribution of W; (b) (σε2, γ) subject to γTγ + σε2 = 1; (c) the odds ratio of interest; and (d) the rate of disease in the source population. The latter two determine the underlying logistic intercept β0 and the slope β1.
3.1.2. Normal and Chi-square W
First, W was set to be bivariate, with mean zero and covariance matrix the identity, but generated from (a) a bivariate normal distribution and then rescaled; (b) a bivariate chi-square distribution with one degree of freedom and then rescaled; and (c) a bivariate chi-square distribution with one degree of freedom with a random sign, resulting in a mixture distribution. Without loss of generality, we defined X = WTγ + ε to have mean zero and variance one. We did this by first generating σε2 = Uniform(0, 1). We then generated γ* = Normal(0, I2), and defined γ = γ*{(1 − σε2)/γ*Tγ*}1/2. Then we set the true logistic intercept β0 and the true logistic slope β1 so that the odds ratio of moving from the 10th to the 90th percentile of the distribution of X was in the set (2.0, 3.0, 4.0, 5.0, 6.0), and the probability of disease in the source population was in the set (0.075, 0.150, …, 0.450). Hence, we considered relatively rare to common diseases, and relatively modest to large true odds ratios, along with fairly extreme distributions for W. We varied the ratio of the validation sample size to the primary sample size, cF = nv/np, to be (0.01, 0.10). For each of these settings, we searched over 1,000 cases of (σε2, γ). When computing asymptotic variances, we set pr(Y = 1 | W) = H(α0 + β1WTγ), the usual regression calibration approximation, where H(·) is the logistic distribution function. Monte-Carlo was used to calculate integrals where required.
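A minimal sketch of one such parameter draw (our reconstruction of the scheme just described; that σε2 is drawn uniformly on (0, 1) is an assumption consistent with the constraint γTγ + σε2 = 1):

    import numpy as np

    rng = np.random.default_rng(0)
    sigma_eps2 = rng.uniform(0.0, 1.0)       # assumed draw of the residual variance
    gamma_star = rng.standard_normal(2)      # gamma* ~ Normal(0, I_2)
    # Rescale so that var(X) = gamma' gamma + sigma_eps2 = 1 when cov(W) = I
    gamma = gamma_star * np.sqrt((1.0 - sigma_eps2) / (gamma_star @ gamma_star))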
The results are striking: in all cases, the asymptotic variance of the standard regression calibration estimator of β1 was less than 1.5% larger than the asymptotic variances of the maximum likelihood and two-stage estimators.
3.1.3. Lognormal and Gamma W
We also considered lognormal cases. We let Σv be a correlation matrix with correlation ρ in the off-diagonal element. We generated 1,000 settings where we generated −0.95 ≤ ρ ≤ 0.95, σε2 and γ so that var(X) = 1, see Section 3.1.2. We set pr(Y = 1 | X) = H(β0 + β1X), where β0 and β1 are defined below so that the rate of disease and the odds ratio are the same as in Section 3.1.2.
The W data were generated as follows. We describe here the theoretical setting, although simulation was actually used in all cases to compute the integrals involved in the asymptotic variances. First, W* were bivariate independent lognormal random variables such that log(W*) are bivariate independent normal variables with mean zero and variance ζ = Uniform(0, 1). We defined W** to be the standardized version of W*, with both independent variables having mean zero and variance one, then made W to be a linear transformation of W** that had covariance matrix Σv and exactly mean zero. In symbols, let Z be a bivariate standard normal random variable. Then
W* = exp(ζ1/2Z), W** = {W* − E(W*)}/{var(W*)}1/2 componentwise, and W = Σv1/2W**.
The lognormal distribution is notorious for generating a few extremely wild observations. For example, in Figures 1 and 2 we give box plots of 5 randomly generated data sets of size 10,000 from the lognormal and gamma distributions, each with the coefficient of variation equal to 1.0. These are plotted on the same scale to make the point that the lognormal distribution is much more prone to wild observations than is the gamma distribution. Thus, the distribution that we actually used for W was a truncated one, truncated at 4×IQR above the 75th or below the 25th percentile, where IQR is the interquartile range. Using this rule of thumb, in the lognormal case we removed 1.18% of the distribution of W, while for the gamma distribution the corresponding figure is 0.24%. We then generated X from the remaining observations. We set β1 so that the odds ratio for moving from the 10th to the 90th percentile of the distribution of X would be a predefined value OR, and then formed β0 so that the probability pr(Y = 1) of disease in the population would be equal to a predefined value π.
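A minimal sketch of this truncation rule (our own code; W is assumed to be an n × 2 array of surrogate values):

    import numpy as np

    def truncate_iqr(W, k=4.0):
        # Keep rows with every component within k * IQR of the quartiles
        q1, q3 = np.percentile(W, [25, 75], axis=0)
        iqr = q3 - q1
        keep = np.all((W >= q1 - k * iqr) & (W <= q3 + k * iqr), axis=1)
        return W[keep]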
Figure 1. Box plots of five randomly generated lognormal data sets of size 10,000 with coefficient of variation = 1.0.

Figure 2. Box plots of five randomly generated gamma data sets of size 10,000 with coefficient of variation = 1.0.
In most situations of practical interest, with odds ratios OR ≤ 6, the worst case for regression calibration was when π = 0.075, OR = 6, and nv/np = 0.01. In the 1,000 cases, the maximum asymptotic variance inflation of standard regression calibration was 13%, and more than 90% of these test cases had less than a 2% variance inflation. If nv/np = 0.10, there was at worst a 1% variance inflation. In other words, there was little or no asymptotic variance inflation in these cases. We have done calculations without truncating the lognormal distribution, and the variance inflation becomes larger, sometimes much larger, reaching 33% in extreme cases, but these situations seem unrealistic in practice because of the wild observations. In the gamma case, there was essentially no variance inflation.
We have also checked the lognormal case should the odds ratio for moving from the 10th to the 90th percentile of the distribution of X reach a (practically impossible, in our experience) value of OR = 45. The maximum variance inflation of standard regression calibration was 83% with truncation and 147% without. As mentioned previously, in all those cases the regression calibration approximation leads to a severely misspecified model. The resulting slopes estimated by both methods involve large biases and should not be used for measurement error adjustment.
3.2. Finite-sample simulations
All results up to now have been asymptotic results based on the regression calibration approximation in equations (1)–(3). In this section, we use simulations to examine the finite sample properties of the two-stage and regression calibration methods when the regression calibration assumption given by equation (3) is violated. We especially concentrate on cases when equation (3) does not hold even approximately, and the relevance of asymptotic variance calculations under this assumption becomes questionable.
3.2.1. Normal W
We let Y be a binary disease outcome, X = true exposure, W1 = first surrogate measurement of X and W2 = second surrogate measurement of X. We performed 5 sets of 10,000 simulations. For each simulation, we generated two independent data sets, a main study and a validation study. In the main study, each subject had variables Y, W1 and W2. In the validation study, each subject had the variables X, W1 and W2. The sample size for the main study was 10,000, while the sample size for the calibration study was 1,000. We generated the data according to the following models. First, (W1, W2) was bivariate normal with zero means, common variance equal to 1.0, and correlation ρ. We generated ε = Normal(0, σε2) independently of (W1, W2) and set X = γ0 + γ1W1 + γ2W2 + ε. Finally, we generated Y as a binary variable with pr(Y = 1 | X) = H(β0 + β1X), where H(·) is the logistic distribution function. In all simulations, we set ρ = 0.5, γ0 = 0, and σε2 = 1 − (γ12 + γ22 + 2ργ1γ2), thus ensuring that var(X) = 1. Then we set the true logistic intercept β0 and the true logistic slope β1 so that the odds ratio for increasing X by two standard deviations of its distribution was equal to a predefined value OR, and the probability of disease in the source population was equal to a predefined value π. Table 1 shows the values of the parameters for each simulation. For each simulation, we estimated β1 using standard regression calibration and both the original and modified versions of the two-stage method of Weller et al. Table 2 shows the results of the simulations.
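A minimal sketch of one simulated replication under this design (our own code; β0 and β1 are taken as given, having been solved for separately to hit the target OR and π):

    import numpy as np

    def simulate_once(rng, n_p, n_v, rho, g1, g2, beta0, beta1):
        cov = np.array([[1.0, rho], [rho, 1.0]])
        # Residual variance chosen so that var(X) = 1
        s2 = 1.0 - (g1**2 + g2**2 + 2.0 * rho * g1 * g2)
        def draw(n, with_y):
            W = rng.multivariate_normal(np.zeros(2), cov, size=n)
            X = g1 * W[:, 0] + g2 * W[:, 1] + rng.normal(0.0, np.sqrt(s2), n)
            if with_y:  # main study: observe (Y, W1, W2) only
                p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * X)))
                return rng.binomial(1, p, n), W
            return X, W  # validation study: observe (X, W1, W2)
        return draw(n_p, True), draw(n_v, False)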
Table 1. Parameters used in simulation study with normal W.
Table 2. Results of simulation study with normal W. Means, standard deviations (SD) and root mean squared errors (RMSE) of the estimators of β1.
The results are instructive. In cases 1–4, the empirical variance of standard regression calibration is equal to or smaller than that of the unmodified two-stage method, with essentially no difference between standard regression calibration and the modified two-stage method. In the most extreme case 5, there is a 3% and 9% difference in variance, with the original and modified two-stage methods, respectively, being less variable. However, in all cases, standard regression calibration is less biased than either two-stage method and, as a result, has smaller root mean squared error. When one deals with true odds ratios greater than 20 (cases 4–5), the regression calibration approximation can be particularly poor, and the finite-sample properties of the considered estimators become especially relevant.
3.2.2. Lognormal W
We also ran simulations in which the surrogate measurements had a lognormal distribution. In addition to a normally distributed error ε in the linear regression of X on W, at the request of one of the reviewers, we also considered the case of a lognormal error, both homoscedastic and heteroscedastic depending on W. The description of those simulations and their results are given in Appendix C. All results are reasonably in line with those in the previous section for the normal case.
3.2.3. Categorical W
We used the data described by Weller et al. to understand the properties of the methods when W1, W2 are categorical. Following Section 6 in Weller et al., we considered the case when W = (W1, W2) had in total 3 distinct values. We set W to be categorical with 3 levels according to probabilities (0.4, 0.3, 0.3), and let W1, W2 be the dummy variable indicators of the two highest levels, with the induced correlation of ρ = −3/7. We set γ0 = 0.127; Table 3 shows the values of the remaining parameters for each simulation. In the simulations, the primary sample size was 10,000, the validation sample size was 1,000, and 10,000 simulated data sets were generated at each configuration. The results given in Table 4 are quantitatively similar to those for continuously distributed surrogate values.
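A minimal sketch of the categorical surrogate generation (our own code; it also verifies the induced correlation of −3/7):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    level = rng.choice(3, size=n, p=[0.4, 0.3, 0.3])   # three-level categorical W
    W1 = (level == 1).astype(float)                    # dummy for the second level
    W2 = (level == 2).astype(float)                    # dummy for the third level
    # corr(W1, W2) = -p1*p2 / sqrt{p1(1-p1)p2(1-p2)} = -0.09/0.21 = -3/7
    print(np.corrcoef(W1, W2)[0, 1])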
Table 3. Parameters used in simulation study with categorical W.
Table 4. Results of simulation study with categorical W. Means, standard deviations (SD) and root mean squared errors (RMSE) of the estimators of β1.
4. Discussion

We have shown that the two-stage method of Weller et al. is not regular in the sense that finite-sample estimates of the odds ratio can vary widely if different 1–1 transformations of the nuisance parameters are made. Despite this, the two-stage method is asymptotically equivalent to maximum likelihood for the approximate model (1)–(3).
Our most striking conclusion is that, under the regression calibration approximation, the two-stage method is asymptotically no more variable, and sometimes less variable, than standard regression calibration in a wide variety of circumstances.
We performed numerical calculations that somewhat temper this main conclusion. We found in most cases that the two-stage and standard regression calibration methods were very nearly equivalent in asymptotic variance, the exceptions being when the odds ratio was very large, perhaps unrealistically so. It would be interesting to find practical cases with modest odds ratios and well-behaved distributions of W where the variance gain from using the two-stage method was substantial, but we were unable to find such cases.
We have also conducted an extensive simulation study to compare finite-sample properties of the estimators, with special attention to cases where the regression calibration assumption does not provide a good approximation and the relevance of the asymptotic variances of the estimators under this approximation becomes questionable. We found that only in such extreme cases did the standard regression calibration estimator have a slightly greater finite-sample variance than the two-stage estimator. Yet, in all cases, standard regression calibration was less biased and, as a result, had smaller root mean squared error.
Under the regression calibration approximation, the two-stage method is asymptotically efficient for the problems we have described. It would be interesting to see how the method could be applied to other problems and to assess its asymptotic properties in those situations. One of the properties of standard regression calibration, or maximum likelihood in the regression calibration context, is that it generalizes easily to cases where there is more than one predictor measured with error, and the surrogates for each such predictor may or may not be the same. It is possible to define a two-stage procedure for such cases, although it is not straightforward, and the resulting estimators remain not invariant to reparameterization of nuisance parameters. Whether they have optimality properties that match the asymptotic properties of the regression calibration maximum likelihood is unknown.
To illustrate this, suppose that there are K = 2 variables measured with error and E(X1 | W1, W2, W3) = γ10 + W1Tγ11 + W2Tγ12 and E(X2 | W1, W2, W3) = γ20 + W2Tγ21 + W3Tγ22. This is a case in which one set of covariates, W2, appears in both regressions, but the others, W1 and W3, appear in only one of the regressions. Then write W = (W1T, W2T, W3T)T and β = (β1, β2)T, so that
β1E(X1 | W) + β2E(X2 | W) = β1γ10 + β2γ20 + W1T(β1γ11) + W2T(β1γ12 + β2γ21) + W3T(β2γ22).
It is not clear what α is here, and how one can take advantage of the fact that there is knowledge that W3 is not a predictor of X1 given (W1, W2). While it is possible to devise estimators, and weight them, their properties are unclear; maximum likelihood and standard regression calibration have no such lack of clarity.
Acknowledgments
We thank Dr. Weller for providing us with the data used in their original analysis. Carroll’s research was supported by a grant from the National Cancer Institute (CA57030) and by Award Number KUS-CI-016-04, made by King Abdullah University of Science and Technology (KAUST).
Appendix A: Sketch of Technical Arguments Under the Regression Calibration Model
A.1. Simplifications
It is readily seen that the scores for estimating (α, β1, β2, γ0, γ1, γ2, ζ) in model (3) are orthogonal to the score for estimating σε, so we will without loss of generality assume that σε2 is known. Also, without loss of generality, we center Xiv so that E(Xiv) = 0. We also center Wiv by subtracting a constant vector so that E(Wiv) = 0, and subtract the same constant from Wip; the same procedure is applied to Ziv and Zip. This means that γ0 = 0 and we can ignore it in our development.
It suffices to prove the main result in the special case that there are no nuisance parameters ζ. In generalized linear models there may be a nuisance parameter ζ, but the loglikelihood is of the form
{Yv − C(v)}/ζ + D(Y, ζ).
However, the score for ζ is orthogonal to the score for (α, β1, γ1, β2). To see this well-known fact, we note that the score for (α, β1, γ1, β2) is
G(Y, W, Z)/ζ, where G(Y, W, Z) = {Y − C(1)(v)}∂v/∂(α, β1, γ1, β2),
where the superscript means first derivative. However, because this is a generalized linear model, E{G(Y, W, Z) | W, Z} = 0. Differentiating the score for (α, β1, γ1, β2) with respect to ζ yields −G(Y, W, Z)/ζ2, so it too has mean zero. This means that the Fisher information matrix is block-diagonal, that the scores are orthogonal, and hence in generalized linear models the asymptotic theory is the same as if ζ were known.
It suffices to work in the special case with no accurately measured covariates Z in the model: the extra generality is accomplished by somewhat more complicated definitions of the terms A and B defined below. In this case, we replace γ1 by γ and β1 by β. The log-likelihood score then takes the form ℓ(y, v) = ∂log{f(y, v)}/∂v with v = α + βWTγ. Hence, we do the detailed calculations for the model
Xiv = WivTγ + εiv with εiv = Normal(0, σε2), and Yip given Wip having density/mass function f(Yip, α + βWipTγ).
We also use standard facts about likelihood theory, e.g., that the square of the score has expectation that is −1 times the expectation of the derivative of the score, so that, for example, we define in the primary study
equation M56
These are easily confirmed by conditioning on W. Define nv/np = cF.
A.2. Some Matrix Formulae
We use repeatedly the fact that if
equation M57
then
equation M58
where equation M59, and equation M60. It is also worth remembering that
equation M61
(A.1)
A.3. Asymptotic Theory for Maximum Likelihood
The score for (α, γ, β) is given as
equation M62
It follows with routine calculations, and remembering that cov(Wiv) = Σv, that
equation M63
Let equation M64. It follows from likelihood theory that
equation M65
(A.2)
We want to compute equation M66. Define
equation M67
Then
equation M68
(A.3)
equation M69
(A.4)
A.4. Asymptotic Theory for Regression Calibration
The estimating equation for (α, γ, β) is
equation M70
where
equation M71
Define M to be the covariance matrix of the left hand side of the last expansion so that
equation M72
Then we have that
equation M73
(A.4)
A.5. Asymptotic Theory for the Two-Stage Method
This method estimates γ in the validation data only, and thus equation M74. Hence, equation M75. The method then fits the likelihood equation M76, and the loglikelihood score for this is just equation M77. This has information matrix
equation M78
Hence, the asymptotic covariance matrix of equation M79 is the lower right corner submatrix of equation M80, and thus
equation M81
The two-stage method considers a weighted linear combination of {diag(γ̂)}−1α̂1. Now, by algebra,
equation M82
remembering that β(1, …, 1)T = {diag(γ)}−1α1. Hence,
equation M83
If eone = (1, 1, …, 1)T, then the optimal weight is equation M84, from which it follows that equation M85, and thus
equation M86
However,
equation M87
and hence
equation M88
(A.5)
A.6. Asymptotic Equivalence of MLE and Two-Stage Methods
Here we show that the maximum likelihood and two-stage estimators of β are asymptotically equivalent. Write equation M89 and N = C − BA−1BT. Studying (A.3) and (A.5), we have to show that Ω = Λ, where for completeness
equation M90
equation M91
Then
equation M92
where
equation M93,
a11 = A−1{1+β2BTa22BA−1}, a12 = −βA−1BTa22 and a21 = −βa22BA−1.
equation M94
so that the MLE and the two-stage estimator are asymptotically equivalent. This completes the argument.
A.7. Asymptotic Non-Efficiency of Standard Regression Calibration
Here we show that the regression calibration estimator is not generally asymptotically equivalent to the other estimators. To do this, we simply need to exhibit a special case, and the simplest such case arises when Σv = I and B = 0. When B = 0, for the MLE,
equation M95
and hence
equation M96
the last following from our more general result.
For standard regression calibration, we get a different answer. Let D* and M* be the lower non-diagonal versions of D and M, so that
equation M97
Then
equation M98
Direct calculations show that
equation M99
This is in general different from the maximum likelihood and two-stage variance. It does coincide with them in the case of linear regression, where equation M100, and in the case when β = 0.
Table A1. Parameters used in simulation study with lognormal W.

Case   γ1    γ2    β0    β1     σε2    R2 *   β12σε2   Odds Ratio exp(2β1)   Probability of Disease in Population
1      0.5   0.3   −3    0.50   0.51   0.49   0.13     2.7                   0.05
2      0.4   0.2   −2    0.75   0.72   0.28   0.41     4.5                   0.14
3      0.3   0.1   −2    1.00   0.87   0.13   0.87     7.4                   0.16
4      0.2   0.1   −1    1.25   0.93   0.07   1.45     12.2                  0.32
5      0.1   0.05  0     1.50   0.98   0.02   2.21     20.1                  0.50

* R2 is the R-squared of the regression of X on (W1, W2).
Table A2. Results of simulation study with lognormal W and normal error (ε). Means, standard deviations (SD) and root mean squared errors (RMSE) of the estimators of β1.

Case  β1    β12σε2           Original Two-Stage  Modified Two-Stage  Regression Calibration
1     0.50  0.13    Mean     0.490               0.491               0.493
                    SD       0.022               0.022               0.022
                    RMSE     0.024               0.024               0.023
2     0.75  0.41    Mean     0.692               0.697               0.701
                    SD       0.043               0.042               0.043
                    RMSE     0.072               0.068               0.065
3     1.00  0.87    Mean     0.842               0.860               0.871
                    SD       0.090               0.083               0.083
                    RMSE     0.182               0.163               0.153
4     1.25  1.45    Mean     0.933               0.960               0.984
                    SD       0.130               0.123               0.127
                    RMSE     0.343               0.315               0.295
5     1.50  2.21    Mean     0.913               1.010               1.070
                    SD       0.341               0.323               0.337
                    RMSE     0.679               0.587               0.546
Table A3. Results of simulation study with lognormal W and homoscedastic lognormal error (ε). Means, standard deviations (SD) and root mean squared errors (RMSE) of the estimators of β1.

Case  β1    β12σε2           Original Two-Stage  Modified Two-Stage  Regression Calibration
1     0.50  0.13    Mean     0.485               0.486               0.487
                    SD       0.022               0.022               0.022
                    RMSE     0.027               0.026               0.026
2     0.75  0.41    Mean     0.684               0.689               0.693
                    SD       0.046               0.043               0.043
                    RMSE     0.080               0.075               0.071
3     1.00  0.87    Mean     0.838               0.854               0.864
                    SD       0.093               0.083               0.081
                    RMSE     0.187               0.168               0.158
4     1.25  1.45    Mean     1.004               1.033               1.056
                    SD       0.153               0.140               0.139
                    RMSE     0.290               0.258               0.239
5     1.50  2.21    Mean     1.086               1.184               1.260
                    SD       0.377               0.359               0.380
                    RMSE     0.560               0.478               0.449
Table A4. Results of simulation study with lognormal W and heteroscedastic lognormal error (ε). Means, standard deviations (SD) and root mean squared errors (RMSE) of the estimators of β1.

Case  β1    β12σε2           Original Two-Stage  Modified Two-Stage  Regression Calibration
1     0.50  0.13    Mean     0.458               0.473               0.479
                    SD       0.116               0.115               0.109
                    RMSE     0.123               0.118               0.111
2     0.75  0.41    Mean     0.566               0.602               0.616
                    SD       0.187               0.196               0.180
                    RMSE     0.262               0.246               0.224
3     1.00  0.87    Mean     0.700               0.757               0.783
                    SD       0.293               0.298               0.292
                    RMSE     0.419               0.385               0.364
4     1.25  1.45    Mean     0.733               0.793               0.825
                    SD       0.356               0.358               0.355
                    RMSE     0.628               0.581               0.554
5     1.50  2.21    Mean     0.780               0.872               0.911
                    SD       0.450               0.462               0.466
                    RMSE     0.849               0.780               0.751
Appendix B: Asymptotic Theory for Logistic Regression
The purpose of this appendix is to provide a precise asymptotic analysis in the case of logistic regression, clearly stating where the regression calibration approximation is used. In places where the approximation is used, we replace = by ≐.
Let H(x) = exp(x)/{1 + exp(x)} be the logistic distribution function. Our model is that in the validation and primary studies, which are of size nv and np, respectively,
Xiv = WivTγ + εiv and Xip = WipTγ + εip, with pr(Yip = 1 | Xip) = H(β0 + β1Xip).
We define cF = nv/np and use the fact that equation M102.
B.1. Two-stage method
B.1.1. Basic Theory
The two-stage method makes the assumption that there is a unique (α0, α1) such that
0 = E[{Yip − H(α0 + WipTα1)}(1, WipT)T]
(B.1)
In this case, (α0,α1) are the parameters that logistic regression of Y on W in the primary study actually converges to in probability. It should be noted that, precisely,
pr(Yip = 1 | Wip) = E{H(β0 + β1Xip) | Wip} ≠ H(α0 + WipTα1)
(B.2)
In other words, (B.2) says that the observed data do not follow a logistic regression exactly. Logistic regression of Y on W in the primary study means solving the estimating equation
0 = Σi {Yip − H(α̂0 + WipTα̂1)}(1, WipT)T.
Define equation M106, and make the further definitions
equation M107
where H (1)(x) =H (x){1 − H (x)}. It is a consequence of standard estimating equation theory that
equation M108
This means that
equation M109
The two-stage method considers a weighted linear combination of {diag(γ̂)}−1α̂1. We have that
equation M110
where
equation M111
If eone = (1, 1, …, 1)T, then the optimal weighted combination in terms of variance of {diag(γ̂)}−1α̂1 has weights equation M112. The two-stage estimator of β then is β̂TS = ωT{diag(γ̂)}−1α̂1, and we have that
equation M113
(B.3)
Of course, an implication of (B.3) is that the two-stage estimator converges to ωT{diag(γ)}−1α1.
B.1.2. Implications of the regression calibration approximation
Equation (B.3) makes it clear that the two-stage method actually converges to ωT{diag(γ)}−1α1. However, ω = ω(Q1, Q2, γ, α1) depends on (Q1, Q2, γ, α1). Since (α0, α1) are the solutions to the integral equations (B.1), which in turn depend on the distributions of (X, W) in the primary study, the asymptotic variance (B.3) is not particularly useful for comparison with regression calibration.
However, there are important simplifications if one makes the same regression calibration approximations that are made by Weller et al. [1], namely that for some βTS,
equation M114
(B.4)
equation M115
(B.5)
where in both cases the symbol ≐ reflects the regression calibration approximation. In this case, it is readily seen from (B.4) that Q1 ≐ Q2, ωT{diag(γ)}−1α1 ≐ βTS, and then the regression calibration approximation to the asymptotic variance of the two-stage method reduces to the result in Appendix A. In particular, define equation M116. Then we have that Q3 ≐ (C − BA−1BT)−1, which agrees with the result in Appendix A. Using (B.5), we also have
equation M117
which also agrees with the results in Appendix A.
B.2. Regression calibration method
We will distinguish here between the regression calibration method and the regression calibration approximation.
B.2.1. Basic theory
The regression calibration method makes the assumption that there is a unique (β0, βRC) such that
0 = E[{Yip − H(β0 + βRCWipTγ)}(1, WipTγ)T]
(B.6)
Then (β̂0, β̂1) solve
0 = Σi {Yip − H(β̂0 + β̂1WipTγ̂)}(1, WipTγ̂)T
(B.7)
This means that the estimating equations are
equation M120
where if equation M121,
equation M122
Letting vip = Yip − H(β0 + βRCWipTγ), and defining
equation M123
it follows that if e1 = (0, 0, …, 0, 1)T,
equation M124
(B.8)
B.2.2. Implications of the regression calibration approximation
Like (B.3), equation (B.8) is not particularly useful for making comparisons among various methods because of the need to solve the integral equation (B.6) that in turn depends on the distribution of Wip, etc. However, the equivalent regression calibration approximation to (B.4)–(B.5) is that
equation M125
(B.9)
Under (B.4)–(B.5) and (B.9), βTS ≐ βRC. Equation (B.9) implies that equation M126. In this case, D1 and D2 simplify to
equation M127
equation M128
exactly as described in Appendix A.
Appendix C: Simulations with Lognormal W
In this appendix, we present results of three sets of simulations where the surrogates W1 and W2 had bivariate lognormal distributions. We first generated (Z1, Z2) as bivariate standard normal with correlation ρ*. Let σ2 = log(5), μ* = exp(σ2/2) and σ*2 = exp(σ2){exp(σ2) − 1}, so that the coefficient of variation is σ*/μ* = 2. By simulation, we found ρ* such that corr{exp(σZ1), exp(σZ2)} = ρ. Then (W1, W2) defined by Wj = {exp(σZj) − μ*}/σ*, j = 1, 2, has the desired bivariate lognormal distribution with mean zero, variance one, and correlation ρ.
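A minimal sketch of this construction (our own code; the paper found ρ* by simulation, whereas the sketch uses the closed-form lognormal correlation corr{exp(σZ1), exp(σZ2)} = {exp(σ2ρ*) − 1}/{exp(σ2) − 1}, and σ2 = log 5 is the value implied by CV = 2):

    import numpy as np

    def make_lognormal_W(rng, n, rho, sigma2=np.log(5.0)):
        sigma = np.sqrt(sigma2)
        mu_star = np.exp(sigma2 / 2.0)                   # E{exp(sigma * Z)}
        sd_star = np.sqrt(np.exp(sigma2) * (np.exp(sigma2) - 1.0))
        # Normal-scale correlation rho* giving lognormal-scale correlation rho
        grid = np.linspace(-0.99, 0.99, 1999)
        ln_corr = (np.exp(sigma2 * grid) - 1.0) / (np.exp(sigma2) - 1.0)
        rho_star = grid[np.argmin(np.abs(ln_corr - rho))]
        cov = np.array([[1.0, rho_star], [rho_star, 1.0]])
        Z = rng.multivariate_normal(np.zeros(2), cov, size=n)
        return (np.exp(sigma * Z) - mu_star) / sd_star   # mean 0, variance 1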
As with the other simulations described in the main text, we generated X as X = γ0 + γ1W1 + γ2W2 + ε, where ε was uncorrelated with (W1, W2), E(ε) = 0 and var(ε) = σε2 = 1 − (γ12 + γ22 + 2ργ1γ2), thus ensuring E(X) = 0 and var(X) = 1. As before, we generated Y as a binary variable with pr(Y = 1 | X) = H(β0 + β1X), where H(·) is the logistic distribution function. Then we set the true logistic intercept β0 and the true logistic slope β1 so that the odds ratio for increasing X by two standard deviations of its distribution was equal to a predefined value OR, and the probability of disease in the source population was equal to a predefined value π.
In the first set of simulations, we generated ε to be normal and independent of (W1, W2). In the second set, we set ε = δ − E(δ), where δ was independent of (W1, W2) and had a lognormal distribution with var(δ) = σε2 and CV(δ) = 1. In the third set of simulations, we considered a case of heteroscedastic error ε = (γ1W1 + γ2W2) × (δ − 1), where δ was independent of (W1, W2) and lognormal with mean E(δ) = 1, variance equation M133, and CV(δ) = {var(δ)}1/2.
In all simulations, we set ρ = 0.5, γ0 = 0, nv = 1,000 and np = 100,000. Other parameters for each simulation are given in Table A1. Tables A2–A4 present the results for simulation sets 1 to 3, respectively.
References

1. Weller EA, Milton DK, Eisen EA, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference. 2007;137:449–461.
2. Carroll RJ, Stefanski LA. Approximate quasi-likelihood estimation in models with surrogate predictors. Journal of the American Statistical Association. 1990;85:652–663.
3. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. Chapman and Hall/CRC Press; Boca Raton, FL: 2006.
4. Rosner B, Willett WC, Spiegelman D. Correction of logistic regression relative risks and confidence intervals for systematic within-person measurement error. Statistics in Medicine. 1989;8:1051–1070.
5. Thurston SW, Spiegelman D, Ruppert D. Equivalence of regression calibration methods in main study/external validation study designs. Journal of Statistical Planning and Inference. 2003;113:527–539.