The problem of covariate measurement error with heteroscedastic measurement error variance is considered. Standard regression calibration assumes that the measurement error has a homoscedastic measurement error variance. An estimator is proposed to correct regression coefficients for covariate measurement error with heteroscedastic variance. Point and interval estimates are derived. Validation data containing the gold standard must be available. This estimator is a closed-form correction of the uncorrected primary regression coefficients, which may be of logistic or Cox proportional hazards model form, and is closely related to the version of regression calibration developed by Rosner et al. (1990). The primary regression model can include multiple covariates measured without error. The use of these estimators is illustrated in two data sets, one taken from occupational epidemiology (the ACE study) and one taken from nutritional epidemiology (the Nurses’ Health Study). In both cases, although there was evidence of moderate heteroscedasticity, there was little difference in estimation or inference using this new procedure compared to standard regression calibration. It is shown theoretically that unless the relative risk is large or measurement error severe, standard regression calibration approximations will typically be adequate, even with moderate heteroscedasticity in the measurement error model variance. In a detailed simulation study, standard regression calibration performed either as well as or better than the new estimator. When the disease is rare and the errors normally distributed, or when measurement error is moderate, standard regression calibration remains the method of choice.
When validation data are available, the non-iterative regression calibration (RC) method can be used to obtain approximately consistent point and interval linear, logistic and Cox regression model parameter estimates with measurement error in one or more continuous covariates, provided certain assumptions are satisfied (Prentice 1982; Fuller 1987; Armstrong, Whittemore et al. 1989; Rosner, Willett et al. 1989; Rosner, Spiegelman et al. 1990; Rosner, Spiegelman et al. 1992; Spiegelman, McDermott et al. 1997; Wang, Hsu et al. 1997; Carroll, Ruppert et al. 2006). In Rosner et al.'s version of regression calibration, a standard multiple regression model is used to estimate the uncorrected point and interval estimates of the parameters, and these are bias-corrected using an estimate of the slopes from a linear measurement error model for the true exposure given the surrogate, obtained from the validation data. A Science Citation Index search in April 2010 on the three papers by Rosner et al. (1989, 1990, 1992) yielded 627 citations, approximately half of which were published in epidemiology and medical journals. Many of these appear to have involved direct applications of the methodology to original analyses of data.
This method, along with others proposed previously for the covariate measurement error problem, including SIMEX (Cook and Stefanski 1995) and the application of empirical process theory to survival data analysis (Huang and Wang 2000), requires homoscedastic measurement error variance. The distributions of many environmental and dietary intake variables are often highly skewed, raising concern that the homoscedasticity requirement of regression calibration and other methods may often be unrealistic for important potential applications. For example, in a recent publication, moderate heteroscedasticity was observed in the measurement error model for exposure to airborne soot and nitrogen dioxide (Van Roosbroeck, Li et al. 2008), and in a recently published nutritional epidemiology study, moderate heteroscedasticity was observed in the measurement error models for average daily alcohol intake in a pooled analysis of renal cancer incidence (Lee, Hunter et al. 2007). Two other motivating examples of measurement error model heteroscedasticity are studied in depth in this paper, one looking at health symptoms in relation to exposure to anti-neoplastic drugs (Spiegelman and Valanis 1998), and one looking at alcohol intake in relation to breast cancer incidence (Willett, Stampfer et al. 1987). Thus, there is a need to extend regression calibration to apply when the requirement for homoscedasticity of the measurement error model variance is violated, and to compare this extension to several less restrictive iterative approaches, including maximum likelihood and semi-parametric efficient estimating equations (Robins, Hsieh et al. 1995). This paper addresses this need. In Section 2, Rosner et al.'s version of regression calibration is reviewed and the new estimator, RCH, is derived. Next, we consider the case when the true exposure variable, x, is unobservable, but replicate measures for the unbiased surrogate, X, are available in a reliability sub-study.
Then, when heteroscedastic classical measurement error is assumed, we show that it is not possible to obtain a version of the new estimator. Iterative alternatives appropriate to this setting, including maximum likelihood, approximate maximum likelihood, and semi-parametric efficient estimation, are also presented in this section. In Section 3, these estimators are applied to two illustrative examples, one from occupational epidemiology and one from nutritional epidemiology, in which moderate heteroscedasticity is evident. An extensive simulation study of the new estimator, standard regression calibration and the iterative approaches is presented in Section 4. In Section 5, the results of the illustrative examples, of the analytic work and of the simulation study are summarized and recommendations are made.
The parameter of interest is β1 from the generalized linear model

g[E(Y | x, U)] = β0 + β1x + β2′U,   (1)

where Y is the outcome of interest, g[·] is a link function which linearizes the conditional mean function in the covariates, and U is a vector of covariates measured without error. Substituting the covariate measured with error, X, for x, the uncorrected point and interval estimates of effect, β̂1* and Var̂(β̂1*), are adjusted for measurement error in a one-step procedure. When g is the identity link, i.e. g[E(Y | X, U)] = E(Y | X, U), regression calibration is applied to a linear regression model (Fuller 1987). When g[E(Y | X, U)] = logit[E(Y | X, U)], regression calibration is applied to a logistic regression model (Rosner, Spiegelman et al. 1990; Rosner, Spiegelman et al. 1992). When

log[I(t | x, U)] = log[I0(t)] + β1x + β2′U,

where I(t) is the incidence rate at time t, then when the disease is rare, regression calibration can be applied to a Cox proportional hazards regression model (Prentice 1982; Spiegelman, McDermott et al. 1997; Wang, Hsu et al. 1997; Hu, Tsiatis et al. 1998). The application of regression calibration to these three basic models, all of which are used widely in epidemiology, was unified with a special focus on interval estimation and computing in SAS (Spiegelman, McDermott et al. 1997). Application of measurement error methods to data requires that the main study, which contains data (Yi, Xi, Ui), i = 1, …, n1, is supplemented by an external validation study, which contains data (Xi, xi, Ui), i = n1 + 1, …, n1 + n2. Because it is difficult and expensive to validate Xi with xi, the validation study is typically much smaller than the main study, i.e. n1 >> n2.
The point and interval estimates of effect can be corrected for measurement error using Rosner et al.'s formulas (Rosner, Spiegelman et al. 1990),

β̂1 = β̂1*/γ̂1,   Var̂(β̂1) = Var̂(β̂1*)/γ̂1² + (β̂1*)²Var̂(γ̂1)/γ̂1⁴,   (2)

where β̂1* and Var̂(β̂1*) are estimated by fitting (1) to the main study data, (Y, X, U). The first row of Γ̂, denoted γ̂1, and Var̂(γ̂1) are obtained from fitting the linear regression model to the validation data

xi = α0 + γ1Xi + α′Ui + εi,   (3)

under the assumption that

Var(εi | Xi, Ui) = σ².   (4)

Appendix 1 of Rosner, Spiegelman et al. (1990) gives the construction of β̂ from β̂* and Γ̂, and equation (A7) of the same paper gives Var̂(β̂). Assumption (4) is the homoscedasticity assumption, the relaxation of which is the focus of this manuscript. Regression calibration has been presented by others for use in a variety of settings (Prentice 1982; Fuller 1987; Armstrong, Whittemore et al. 1989; Rosner, Spiegelman et al. 1990; Rosner, Spiegelman et al. 1992; Spiegelman, McDermott et al. 1997; Wang, Hsu et al. 1997; Carroll, Ruppert et al. 2006). All versions of the regression calibration method assume that measurement error is non-differential with respect to the response variable, Y, i.e.

f(Y | x, X, U) = f(Y | x, U),

where f(·) is a density function. In addition to the assumptions specified by (1), (3) and (4), in the case of univariate x, the key additional requirement for approximate unbiasedness of the regression calibration estimator when g[E(Y | x, U)] in (1) is logistic is that either β1²σ² is small, or Pr(Y = 1 | x, U) is small and f(x | X, U) is normal (Kuha 1994).
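In the univariate case, the correction divides the uncorrected slope by the measurement error model slope and propagates both sources of uncertainty by the delta method. A minimal sketch in Python; the function name and the numbers in the usage lines are ours and purely illustrative:

```python
import math

def rc_correct(beta_star, var_beta_star, gamma1, var_gamma1):
    """Univariate regression calibration: divide the uncorrected slope by
    the measurement error model slope; delta-method variance."""
    beta = beta_star / gamma1
    var_beta = (var_beta_star / gamma1 ** 2
                + beta_star ** 2 * var_gamma1 / gamma1 ** 4)
    return beta, var_beta

# illustrative numbers only: uncorrected slope 0.045 (SE^2 = 1e-4),
# measurement error model slope 0.60 (SE^2 = 2.5e-3)
beta, var_beta = rc_correct(0.045, 1e-4, 0.60, 2.5e-3)
ci = (beta - 1.96 * math.sqrt(var_beta), beta + 1.96 * math.sqrt(var_beta))
```

Exponentiating `beta` and the interval endpoints gives the corrected odds ratio and its confidence interval, as reported in the examples below.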
It is sometimes the case with biological variables that heteroscedasticity is evident over the observed range of the data; when this occurs, it typically takes the form that the spread increases with the level. Sometimes, but by no means always, the variance can be stabilized by log transforming the data, but this solution can be undesirable when the variable in question is to be used as a predictor variable in a regression model, and the scientific hypothesis focuses on measuring the relationship of the variable in its originally observed scale to the outcome variable of interest. In this paper, we assume that an empirically verifiable linear model for the mean, (3), holds, but assumption (4), that of constant variance of the measurement error model for x | X, U, is untenable for the observed data. Instead, it may appear from the available validation data that

Var(εi | Xi, Ui) = h(Xi, Ui)σ²   (5)

is a more reasonable model for the variance, where h(Xi, Ui) is some function of the covariates which induces heteroscedasticity in the measurement error model. In what follows, we derive an extension to the standard regression calibration estimator given above in (2) and to its multivariate counterpart, in which the constant residual variance requirement given by (4) is eliminated.
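A simple way to screen validation data for such heteroscedasticity, used later in the examples, is the correlation between the absolute residuals and the fitted values of the measurement error model (Carroll and Ruppert 1988). A dependency-free sketch, with our own function names:

```python
def pearson_corr(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db)

def hetero_corr(fitted, residuals):
    """Heteroscedasticity diagnostic: correlation between absolute
    residuals and fitted values; values near 0 support homoscedasticity."""
    return pearson_corr(fitted, [abs(r) for r in residuals])
```

Values around 0.2–0.4, as in the two examples below, would indicate moderate heteroscedasticity.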
Standard regression calibration, which assumes homoscedastic measurement error, can be derived by a first order Taylor series expansion around the naïve likelihood in which the mis-measured exposure is treated as if there were no error. In what follows, the second order expansion is developed. By adding the additional term, measurement error model heteroscedasticity can be accommodated, albeit with an unavoidably more complex estimator. In another approach to deriving the regression calibration estimator, the approximate logistic likelihood is derived under the assumptions of normal residual measurement error and rare disease. We develop these approaches in more detail in what follows below, to provide the formal derivation for the new estimator.
We assume the following mean and variance model for x given (X, U):

E(x | X, U) = α0 + ΓX + α′U,   Var(x | X, U) = h(X, U)σ²,

where dim(x) = q, and h(X, U) is a q × q function. Then, the distribution of Y | X, U for logistic regression with rare disease and multivariate normality is

logit[Pr(Y = 1 | X, U)] ≈ β0 + β1′E(x | X, U) + β2′U + (1/2)β1′h(X, U)σ²β1   (6)

(Kuha 1994). A similar result for general mean-variance models, using a second-order Taylor series expansion about E(x | X, U) (i.e. a small measurement error approximation), was obtained for scalar x by Carroll et al. (Carroll, Ruppert et al. 2006). For dim(x) > 1, Carroll and Stefanski (Carroll and Stefanski 1990) presented an analogous result, but omitted the detailed derivation. Similar approximations have been given for the relative risk function in X(t), a vector-valued, possibly time-varying covariate, in survival data analysis (Prentice 1982), and for linear regression (Spiegelman, McDermott et al. 1997), where (6) is exact.
Insight can be gained by inspecting the form of (6) with no covariates U and for scalar x:

log[E(Y | X)] ≈ β0 + β1(α0 + γ1X) + (β1²σ²/2)h(X).
Figure 1 shows the theoretical relationship between log[E(Y)] under the rare disease assumption and x given by (1) (solid line), and the approximate induced relationships between log[E(Y)] and X for h(X)=X (dotted line), h(X)=X2 (short dashed line), and h(X)=1, the homoscedastic case (long dashed line), when plugging in the measurement error model parameters and logistic regression parameters estimated from the Nurses' Health Study data on breast cancer incidence in relation to alcohol consumption discussed in Section 3.2. It is evident that when h(X)=X, the uncorrected estimator is likely to under-estimate β1, but when h(X)=X2, over-estimation could occur. If h(X)=X, (6) simplifies to a linear model in X as a function of the measurement error model and logistic regression model parameters (see Section 2.1), and it can be seen that the standard regression calibration estimator is also likely to over-estimate β1 (long dashed line).
To simplify notation, write β̂X* and β̂h* for the uncorrected coefficients of X and h(X), respectively. Then, solving for β1 in the two terms multiplying X and h(X) in (6), two estimators for β1,

β̂1,(1) = β̂X*/γ̂1   and   β̂1,(2) = sign(β̂X*γ̂1)√(2|β̂h*|/σ̂²),   (7)

where sign(t) = 1 if t is positive and −1 if t is negative, are available from the uncorrected regression of Y on X and h(X). Of course, the approximate likelihood (6) can be directly fit to the data using an iterative approach, jointly estimating all parameters simultaneously. Alternatively, in what follows we suggest a procedure for obtaining RCH,1 which can be constructed from standard software tools and used in routine analysis. Both approaches will provide consistent estimators, but the maximum likelihood estimator will, of course, be more efficient asymptotically. Later, we will compare the behavior of these estimators in a simulation study.
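Given the uncorrected coefficients on X and on h(X), the two candidate estimators can be computed directly. This is a sketch under the approximation in (6), with our own function names; note that the h(X) term identifies only β1², so the sign of the second estimator must be supplied externally (e.g. from the first):

```python
import math

def beta1_from_linear_term(coef_X, gamma1):
    # the coefficient of X estimates beta1 * gamma1
    return coef_X / gamma1

def beta1_from_quadratic_term(coef_h, sigma2, sign=1.0):
    # the coefficient of h(X) estimates beta1**2 * sigma2 / 2,
    # which determines beta1 only up to sign
    return sign * math.sqrt(2.0 * abs(coef_h) / sigma2)
```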
The new method is as follows:

1. Using the main study data, fit the uncorrected regression of Y on X, h(X) and U, obtaining the coefficients of X and of h(X).

2. Using the validation data, fit the measurement error model (3) by weighted least squares to estimate γ1 and α′, and estimate h(Xi, Ui)σ² from the weighted residuals.

3. Form the two estimators of β1 and combine them as a weighted average, RCH,1, using the asymptotically minimum variance weights.
The asymptotically minimum variance weights and their derivation, as well as the derivation of the formula for the variance of RCH,1, are given in Appendix 1. RCH,2, the measurement-error-corrected estimator of the coefficients corresponding to U, has form

β̂RCH,2 = β̂U* − β̂RCH,1α̂,   (8)

where β̂U* is the vector of uncorrected coefficients of U and α̂ is the vector of coefficients of U estimated from (3).
Its asymptotic variance is derived in Appendix 2, along with Cov(RCH,1, RCH,2). As is evident from (6), this estimator can only be used for models with scalar x. For multivariate x with heteroscedastic covariance for x | X, U, a term is added to the model for E(Y | X, U) equal to

(1/2)β1′h(X, U)σ²β1,

which is a complicated function of the q elements of β1 that cannot be used to uniquely solve for the q elements of β1.
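Returning to the scalar case, the weights in Appendix 1 combine the two estimators in (7) into RCH,1. The generic minimum-variance combination of two correlated estimators of the same parameter illustrates the idea; this is a sketch of that standard formula, not the paper's exact weights:

```python
def combine_estimates(b1, v1, b2, v2, cov=0.0):
    """Minimum-variance weighted average of two (possibly correlated)
    estimators of the same parameter."""
    w = (v2 - cov) / (v1 + v2 - 2.0 * cov)
    est = w * b1 + (1.0 - w) * b2
    var = w ** 2 * v1 + (1.0 - w) ** 2 * v2 + 2.0 * w * (1.0 - w) * cov
    return est, var
```

With equal variances the weights reduce to 1/2; a more precise component estimator receives proportionally more weight.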
A result similar in form to (6) was given by Prentice (Prentice 1982) for the proportional hazards regression model on a vector X(t), in which one or more of the elements of X(t) are measured with error, and where the conditional distribution of x(t) given X(t) is multivariate normal with a linear mean in X(t) and variance Σ(X(t)). Hence, under the conditions specified by Prentice, RCH can be applied to Cox regression models with an arbitrary number of perfectly measured covariates and a single covariate measured with error, just as discussed above for logistic regression.
Under either small measurement error or a rare disease with normal errors for x | X, U, it is evident that, for scalar x, if either β1 or σ² is small, the standard regression calibration estimator will be approximately valid, even in the presence of heteroscedasticity of the measurement error model variance, since the third term in the exponent of (6) vanishes.
Standard regression calibration is valid even if, instead of x, the gold standard, an unbiased imperfect gold standard, x̃i, is observed in the validation study, where x̃i = xi + ei and ei is a random, mean-zero error term. For example, doubly labeled water is considered an unbiased biomarker for total energy intake (Subar, Kipnis et al. 2003). However, doubly labeled water is measured with some random within-person, unbiased error (Preis, Spiegelman et al. 2010). Here as well, for scalar x, as long as E(ei | Xi, Ui) = 0 and ei is independent of εi, γ and α′ can be consistently estimated from fitting the model

x̃i = α0 + γXi + α′Ui + ε*i

to the data. However,

Var(ε*i | Xi, Ui) = h(Xi, Ui)σ² + σe²,

where σe² = Var(ei), so Step 2 in the procedure for obtaining RCH,1 given above will not provide a valid estimate of h(Xi, Ui)σ², the quantity needed for RCH,1. Without replicate data within subjects, followed by additional calculations not described here, RCH is not applicable when an alloyed gold standard is observed.
Another important special case emerges when h(X)=X. Then, from equation (6) we obtain

β̂RCH,1 = [−γ̂1 + √(γ̂1² + 2σ̂²β̂1*)]/σ̂²,   (9)

where β̂1* is the regression coefficient for X obtained from fitting a logistic regression model of Y on X and U, γ̂1 is as above in (7), and RCH,2 is as above in (8). It is evident from (6) that the function h(X)=X+b, where b is a positive constant, can also be considered here. In this case, b is absorbed by the intercept and does not need to be considered any further in the measurement error correction procedure. The variance of RCH,1 in this special case was again derived using the multivariate delta method (Bishop, Fienberg et al. 1975). It should be noted that when β1γ1 < 0, RCH,1 will provide a consistent estimate of β1 only when the condition given in Appendix 3 holds. However, under (3), as long as Corr(x, X) is positive, as would usually be the case when X is a surrogate for x, γ1 will be positive, and whenever it is anticipated that β1 might be negative, x and X can be recoded to avoid this.
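In this special case the correction reduces to the root of a quadratic: per (6) with h(X)=X, the uncorrected coefficient of X estimates β1γ1 + β1²σ²/2, which can be inverted in closed form. A sketch for the β1γ1 > 0 case, with an illustrative round-trip (the function name is ours):

```python
import math

def rch_linear_case(beta_star, gamma1, sigma2):
    """Solve (sigma2/2)*b**2 + gamma1*b - beta_star = 0 for b, taking
    the root consistent with beta1*gamma1 > 0."""
    disc = gamma1 ** 2 + 2.0 * sigma2 * beta_star
    if disc < 0:
        raise ValueError("no real root: correction not defined")
    return (-gamma1 + math.sqrt(disc)) / sigma2
```

For example, with β1 = 0.5, γ1 = 0.8, σ² = 0.4, the induced coefficient is 0.45, and the function recovers 0.5.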
There is an important set of problems where x is unobservable, i.e. no “gold standard” exists. In some of these cases, the measurement error model

Xij = xi + εij,   E(εij | xi) = 0,   Var(εij | xi) = σ²,   j = 1, …, n2i,   (10)

is considered reasonable, where n2i is the number of replicates for subject i. Examples of data which are believed to follow this model include blood pressure, serum biomarkers such as cholesterol and its subfractions, hormones, and vitamin concentrations. When model (10) holds, Rosner et al.’s (1992) version of the regression calibration method applies, in a procedure similar to that described earlier. In the univariate case, estimates of the reliability coefficient, Var(x)/Var(X), and its variance are substituted for γ̂1 and its variance in equation (2). Multivariate generalizations have been given (Rosner, Spiegelman et al. 1992). When the third component of (10) does not hold, i.e. when Var(εij | xi) is not constant, an extension of regression calibration to accommodate this expansion of the model is needed. We rederived the likelihood for scalar x and no additional covariates U to attempt to identify an appropriate estimator in this simple case, now assuming that Xij | xi ~ N(xi, h(xi)σ²), i = 1, …, n2, j = 1, …, nR, where nR is the number of replicates for each subject i, and obtained the resulting joint likelihood.
Neither f(xi, Xij) nor f(xi | Xij) has a Gaussian structure, and the method of completing the square to obtain a closed-form solution for f(Yi | Xi), used to derive (6), is therefore not applicable. It is unlikely that a closed form for f(Yi | Xi) exists when h(xi) is non-constant and ε is Gaussian. If functions h(xi) are found which fit the data at hand, it is unlikely that the resulting expression for f(Yi | Xi) will be of a form such that the link function, g[E(Yi | Xi)], can be found with g equal to the logistic or log transformations. Without these features, even RCH,1, the scalar version of RCH, does not exist.
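When the homoscedastic replicate model (10) does hold, the reliability coefficient in Rosner et al.'s (1992) substitution can be estimated from balanced replicates by a one-way ANOVA decomposition. A univariate sketch (the function name is ours; Rosner et al. give multivariate generalizations):

```python
def reliability_coefficient(replicates):
    """Estimate Var(x)/Var(X) for a single measurement from balanced
    replicate data (a list of per-subject lists, each of length k)."""
    n = len(replicates)
    k = len(replicates[0])
    subj_means = [sum(r) / k for r in replicates]
    grand = sum(subj_means) / n
    # between- and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    msw = sum((x - m) ** 2
              for r, m in zip(replicates, subj_means)
              for x in r) / (n * (k - 1))
    var_x = (msb - msw) / k          # between-person (true) variance
    return var_x / (var_x + msw)     # reliability of one measurement
```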
Iterative methods can also be applied to the problem of heteroscedastic measurement error in regression covariates. These methods will typically have the advantage of relaxing some of the assumptions required by closed-form methods, such as (1), (3) and (4), but will have the disadvantage of computational complexity, which is a barrier to use in applications. In a main study/external validation study design for a binomial outcome and a Gaussian measurement error model, the log likelihood of the data is equal to

l(β, α′, γ, σ²) = Σ_{i=1}^{n1} log f3(Yi | Xi, Ui; β, α′, γ, σ²) + Σ_{i=n1+1}^{n1+n2} log f2(xi | Xi, Ui; α′, γ, σ²).

We assume here that f2(xi | Xi, Ui; α′, γ, σ²) is the normal density with mean given by (3) and variance given by (5), and the probability that Yi = 1 given (Xi, Ui) is

f3(Yi = 1 | Xi, Ui; β, α′, γ, σ²) = ∫ H(β0 + β1x + β2′Ui) f2(x | Xi, Ui; α′, γ, σ²) dx,

where H(v) = [1 + exp(−v)]⁻¹ is the logistic function. The numerical method given by Crouch and Spiegelman (Crouch and Spiegelman 1990) can be used to evaluate f3(Yi | Xi, Ui; β, α′, γ, σ²), or code could be developed which makes use of standard statistical software through the non-linear optimizer that is provided. Generalizing Kuha’s derivation (Kuha 1994) to the heteroscedastic measurement error variance case, by using a second-order Taylor series expansion of logit[Pr(Y = 1)] with respect to β1 around β1 = 0, under the rare disease assumption and with x | X, U normally distributed with linear mean and variance, f3(Yi | Xi, Ui; β, α′, γ, σ²) can be approximated as follows:

logit[Pr(Yi = 1 | Xi, Ui)] ≈ β0 + β1E(xi | Xi, Ui) + β2′Ui + (β1²/2)h(Xi, Ui)σ².
Likelihoods based on both the exact (ML) and approximate (aML) expressions for f3(Yi | Xi, Ui; β, α′, γ, σ²) given above were fit to the data in Section 3 and studied by simulation in Section 4.
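The integral defining f3 has no closed form, but it is one-dimensional and easy to evaluate numerically. A sketch using Gauss–Hermite quadrature, one standard alternative to the Crouch–Spiegelman method cited above, assuming a normal x | X, U and a logistic outcome model (function and argument names are ours):

```python
import math
import numpy as np

def prob_y_given_x_obs(beta0, beta1, mu, var, deg=60):
    """Pr(Y=1 | X, U) = integral of expit(beta0 + beta1*t) against
    N(t; mu, var), via Gauss-Hermite quadrature with `deg` nodes;
    mu and var come from the measurement error model (3) and (5)."""
    nodes, weights = np.polynomial.hermite.hermgauss(deg)
    t = mu + math.sqrt(2.0 * var) * nodes          # change of variable
    expit = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * t)))
    return float(np.sum(weights * expit) / math.sqrt(math.pi))
```

With β1 = 0 the integrand is constant, so the result reduces exactly to expit(β0), which provides a convenient numerical check.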
A consistent, semi-parametric efficient estimator was proposed by Robins et al. (Robins, Hsieh et al. 1995), where Begun et al. (Begun, Hall et al. 1983) defined the semi-parametric efficiency bound as the smallest possible variance obtained by any estimator which is consistent for β over all possible measurement error models for x | X, U. Because the model for x | X, U is unknown a priori and validation study sample sizes are often small, methods that are robust to mis-specification of the model for x | X, U are desirable. The semi-parametric fully efficient estimator of this class requires a computationally cumbersome non-parametric fit of the density of x | X, U; instead, we investigated the semi-parametric locally efficient version, which is consistent even when the density of x | X, U is mis-specified, and uses a parametric fit for x | X, U. As this parametric density, fit and empirically verified in the validation study, approaches the true density for x | X, U, full semi-parametric efficiency is approached. Details on this estimator can be found in Appendix 1 of Spiegelman and Casella (1997), where, here, f2(x | X, U; α′, γ, σ²) was taken to be the normal density with mean given by (3) and variance given by (5), and α′, γ, and σ² are estimated in the validation study, as usual. This estimator, SPLE, was fit to the data in Section 3 and studied by simulation in Section 4 along with the others.
Valanis et al. (Valanis, Vollmer et al. 1993) described a cross-sectional study of acute health effects from occupational chemotherapeutics exposure in 675 hospital pharmacists. The research objective is estimation and inference about the prevalence odds ratio for acute health effects related to chemotherapeutics exposure. Here, we will focus on fever prevalence in relation to exposure. There were 110 cases of fever. Average weekly chemotherapeutics exposure (X) was self-reported by questionnaire; in a sub-sample of 56 pharmacists, on-site drug mixing diaries were kept for 1–2 weeks (x). The correlation between these two methods of exposure assessment was 0.70. The correlation between the predicted values from the linear measurement error model for the diary data (x), conditional upon the questionnaire data (X) and other model covariates, and the absolute value of the residuals (Carroll and Ruppert 1988) was 0.21, indicative of moderate heteroscedasticity (Figure 2).
These analyses were adjusted for three covariates (U): age in years, shift work (1 if night or rotating shift, 0 if day shift), and employment at a community hospital (1 if yes, 0 if no). In a previously published analysis of these data, the uncorrected prevalence odds ratio for a top-to-bottom quintile contrast in number of drugs mixed per day, corresponding to an increment of 52 drugs/day, was 1.08 (95% confidence interval (CI) 1.02–1.15), and the regression calibration estimate of the same quantity, ignoring the observed heteroscedasticity in the measurement error model, was 1.22 (95% CI 1.05–1.45) (Spiegelman and Valanis 1998). Using maximum likelihood methods with a gamma measurement error model that was empirically verified in the ACE validation study and allows for heteroscedasticity which depends on covariates in an arbitrary manner, the estimated prevalence ratio was 1.17 (95% CI 1.04–1.26) (Spiegelman and Casella 1997). Note that the odds ratio will be a good approximation for the prevalence ratio when the outcome is rare and when the prevalence ratio is near one.
We first needed to identify a form for h(X), and we searched over the class of functions h(X)=(X+b)p. We sought the transformation in this class for which the correlation between the absolute value of the weighted residuals from the weighted least squares regression of x on X and the other covariates is nearest to zero, where the weights are h⁻¹(X). As apparent from Table 1, the optimal transformation was h(X) = X, leading us to consider the special case of RCH discussed in Section 2.2. However, in order to identify the best transformation over the class of functions h(X)=(X+b)p as shown in Table 1, we needed a non-zero value for b when p=0 for the log function, since the data contain zero values. In addition, to fit the SPLE method, a parametric working model for x | X needs to be specified. Here, we used a working normal density with mean given by equation (3), as given in the footnote of Table 1, and Var(x | X) = Xσ². Because the normal likelihood is undefined when X = 0 and Var(x | X) = Xσ², the function h(X)=X+b was again needed, with a small positive value of b = 0.1. As previously noted, the measurement error correction procedure is invariant to the choice of constant in the function h(X)=X+b, and the results in Table 2 were unchanged to three digits of precision for b = 0.
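The transformation search just described can be sketched as follows: for each candidate p, fit x on X by weighted least squares with weights h⁻¹(X), and keep the p whose absolute weighted residuals are least correlated with the fitted values. This is our illustrative reconstruction (here p = 0 is taken literally as h ≡ 1 rather than the log case used in Table 1, and numpy routines stand in for the paper's SAS implementation):

```python
import numpy as np

def best_power(x, X, b=0.1, powers=(-1.0, -0.5, 0.0, 0.5, 1.0, 2.0)):
    """Grid search over h(X) = (X+b)**p for the weight function whose
    absolute weighted residuals are least correlated with the fitted
    values of the measurement error model."""
    best_p, best_c = None, float("inf")
    for p in powers:
        h = (X + b) ** p
        w = 1.0 / np.sqrt(h)                      # WLS: Var proportional to h
        A = np.column_stack([np.ones_like(X), X]) * w[:, None]
        coef, *_ = np.linalg.lstsq(A, x * w, rcond=None)
        fitted = A @ coef                         # weighted-scale fit
        resid = x * w - fitted
        c = abs(np.corrcoef(fitted, np.abs(resid))[0, 1])
        if c < best_c:
            best_p, best_c = p, c
    return best_p, best_c
```

In practice the covariates U would be added as extra columns of the design matrix, exactly as in the validation study regressions described above.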
Table 2 gives the results of this analysis, where we find that the results which take into account the apparent heteroscedasticity in the measurement error model are virtually unchanged from the standard regression calibration results which ignore this feature of the data. Both the exact and approximate maximum likelihood estimates and the semi-parametric locally efficient estimate were similar to the regression calibration estimates. Application of the semi-parametric estimator led to a large efficiency loss. The regression calibration estimators took trivially more CPU time than the uncorrected estimator, and the iterative estimators took 10- to 100-fold more CPU time than these. With a small data set such as this one, none of these CPU times were prohibitive.
Willett et al. described a prospective study of the relationship between breast cancer incidence and moderate alcohol consumption among 89,538 U.S. women aged 34–59 who were followed for 4 years beginning in 1980 (Willett, Stampfer et al. 1987). After updating the original data to include 8 years of follow-up, 1466 cases occurred during this study period. Alcohol intake was calculated from three questions about the consumption of beer, wine and liquor that were included on a 61-item food frequency questionnaire. These data were validated in a sub-sample of 173 women with four one-week weighed diet records (Willett, Sampson et al. 1985). The correlation between these two methods of exposure assessment was 0.85. For average daily alcohol intake (g/day), the correlation between the absolute value of the regression residuals and the predicted values from the linear measurement error model for the diet record data (x), conditional upon the food frequency data (X) and other model covariates, was 0.44 (Figure 3).
A logistic regression model, logit[Pr(Yi = 1 | Xi, Ui)] = β0 + β1Xi + β2′Ui, was fit to the data, where Yi indicates whether participant i received a diagnosis of breast cancer between the time of the 1980 questionnaire return and January 1, 1989, Xi is the covariate measured with error, alcohol intake, and Ui is the vector of other covariates, taken to be perfectly measured: age, age at menarche, menopausal status, age at first live birth, history of benign breast disease, family history of breast cancer, body mass index, and parity. The uncorrected and regression calibration point and interval estimates of the rate ratio from a Cox regression model, corresponding to a 12 g/day increase in alcohol intake, in the Nurses’ Health Study based on this same 8-year follow-up period (1466 cases) were given previously as 1.09 (95% CI 1.04–1.14) and 1.15 (95% CI 1.05–1.26), respectively (Spiegelman, McDermott et al. 1997).
Although h(X) = log(X + 1.005) minimized the correlation between the weighted residuals and the predicted values from the measurement error model fit in the validation study, transformations of the exposure of interest are not desirable for substantive interpretability unless absolutely necessary for statistical reasons. Thus, the optimal transformation of X was again taken to be a linear one (Table 3), and the special case of RCH discussed earlier was again applied.
A small non-zero value for the parameter b was used for the reasons given above in Section 3.1. There was a slight attenuation in RCH relative to RC, but the substantive interpretation of the data remained unchanged (Table 4). Both the exact and approximate maximum likelihood estimators gave results similar to those given by RCH, and CPU time was not elevated compared to the regression calibration methods. The semi-parametric efficient estimate was also similar to the others, although slightly larger and less precise. The excessive CPU time needed for calculations makes this estimator impractical for use with such a large data set.
We studied the small-sample behavior of all estimators discussed in Section 2 by simulation. We designed the simulation study to follow the Nurses’ Health Study described in Section 3.2, and varied the validation study sample size (n2 = 173 or 346), the parameter p in h(X)=Xp (p = 0.5, 1, 2), the extent of measurement error as expressed by Corr(x, X) = (0.4, 0.6, 0.8), and the extent of heteroscedasticity as expressed by the correlation between the squared residuals and the predicted values of the measurement error model, varied between 0.2 and 0.6. Following the Nurses’ Health Study, we set (β0, β1, α′) = (−2.633, 0.01, 3.29), and (γ, σ²) were varied to obtain the desired correlations. To simulate the main study, n1 = 8953 X’s were chosen with replacement from the Nurses’ Health Study and used to generate n1 x’s from a normal distribution with mean given by (3) and variance given by (5) at the specified values of (α′, γ, σ²). Then, n1 values of Y were generated from a Bernoulli distribution with parameter given by (1) with the logistic link function, at the fixed values of (β0, β1). To simulate the validation study, n2 X’s were chosen with replacement from the Nurses’ Health Study validation study and used to generate n2 x’s from a normal distribution with mean given by (3) and variance given by (5) at the specified values of (α′, γ, σ²). Five hundred simulations were run for each design point.
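One replicate of the main-study generation just described can be sketched as follows. This is a simplified reconstruction: the measurement error model mean is reduced to α0 + γ1X with no U, and all names are ours:

```python
import numpy as np

def simulate_main_study(X_pool, alpha0, gamma1, sigma2, p, beta0, beta1,
                        n1, rng):
    """One main-study replicate: resample X from an observed pool, draw
    x ~ N(alpha0 + gamma1*X, (X**p)*sigma2), then Y ~ Bernoulli(expit(
    beta0 + beta1*x)). Requires X_pool > 0 so that h(X) = X**p > 0."""
    X = rng.choice(X_pool, size=n1, replace=True)
    x = rng.normal(alpha0 + gamma1 * X, np.sqrt((X ** p) * sigma2))
    pr = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
    Y = rng.binomial(1, pr)
    return X, x, Y
```

The validation study is generated the same way, without the outcome draw, from the validation-study pool of X's.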
Results from the simulation study are given in Table 5. The most striking result is that standard regression calibration, RC, does as well as or better than all the other estimators considered, including the maximum likelihood methods, in all scenarios studied. This was true even when measurement error or heteroscedasticity was severe, and from the point of view of both bias and coverage probability. The positive bias predicted in Figure 1 was apparent when h(X)=X2, especially when measurement error was severe. The new estimator, RCH, performed poorly in many instances, and never did materially better than standard regression calibration. In an attempt to improve finite-sample coverage probability, we compared the asymptotic Wald confidence interval coverage probability to the empirical and normalized bootstrapped confidence interval coverage probability, where the latter is the bootstrapped mean of the estimator plus or minus 1.96 times the square root of the bootstrapped variance. Five hundred bootstrapped samples were generated for each of 500 simulations. Bootstrapping the confidence intervals for RCH solved the problem of its poor empirical coverage probability. The average bias of the bootstrapped estimator for RCH remained larger than that for RC in nearly all cases considered, often substantially so, although when bias was calculated as the average of the median bias or the median of the median bias, the differences decreased considerably and the overall bias dropped to an acceptable value in many cases (data not shown). At the main and validation study sample sizes considered in this simulation study, the asymptotic optimality of the maximum likelihood estimator and its approximation was not evident.
It is of interest to ask what practical gain is likely to be derived from applying the more robust estimator, SPLE. As can be seen in Table 5, the percent bias, mean squared error, and coverage probability of SPLE were acceptable in some of the cases considered, but no better than those obtained from standard regression calibration. When measurement error was severe, SPLE had considerably more bias than standard regression calibration, although its coverage probability was correct. In all cases considered, its computational burden was at least an order of magnitude greater than that of the maximum likelihood methods, and two orders of magnitude greater than that of standard regression calibration.
Although the derivation given in equation (6) has appeared previously, the estimator itself, RCH, is novel, and it has several attractive features. Standard software can be used for the primary regression analysis, which is subsequently corrected for bias in a single non-iterative calculation. RCH relaxes one of the most restrictive assumptions of regression calibration, homoscedasticity of the measurement error model. The new estimator allows for a multivariate vector of variables measured without error in the primary regression model, along with a scalar variable measured with error, a setting applicable in many situations, including the examples motivating this research. In two illustrative examples, the new estimator performed well. Although theoretical justification was provided for use of this methodology with the Cox model for rare outcomes and normally distributed error (e.g., Prentice 1982), further work could study the behavior of the method in this setting, particularly under departures from rare outcomes and from normal errors. Extensions to Poisson regression models with covariate measurement error should also be considered (Fung and Krewski 1999; Kukush, Schneeweis et al. 2004).
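For orientation, in the simplest case of a single mismeasured covariate and no other covariates, the non-iterative correction underlying this family of estimators reduces to dividing the uncorrected slope by the estimated calibration slope, with a delta-method variance that treats the main and validation studies as independent. A minimal sketch (function name hypothetical):

```python
def rc_correct(beta_star, var_beta_star, gamma1, var_gamma1):
    """Univariate regression calibration correction: divide the uncorrected
    slope (main study) by the calibration slope (validation study).
    Variance via the delta method, assuming the two studies are independent."""
    beta = beta_star / gamma1
    var_beta = (var_beta_star / gamma1**2
                + beta_star**2 * var_gamma1 / gamma1**4)
    return beta, var_beta
```

For example, an uncorrected slope of 0.5 with a calibration slope of 0.5 is corrected to 1.0, and the variance is inflated by the square of the correction factor plus a term reflecting uncertainty in the calibration slope.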
In both illustrative examples, computing aML took longer than computing ML itself. Thus, the approximate maximum likelihood estimator need not be considered further.
Although the marginal distributions of x and X were sharply skewed in both the ACE data (for number of drugs mixed per week) and the NHS (for grams of alcohol per day), the distributions of the standardized residuals from the models for E(x|X) were symmetrized to a large extent. Marginal distributions should not, in general, be used as evidence for or against heteroscedasticity in a conditional variance. The correlations between the absolute values of the measurement error model regression residuals and the predicted values from these regressions were moderate. As shown in Section 2, if either β1 or σ2 is small, the convergent value of RCH is likely to be approximately equal to the convergent value of RC. These conditions appear to have been met in both applications: in both cases, there was little difference in the estimates or inferences obtained from the two methods. The approximations used to derive RCH assume either a rare disease and multivariate normality for the true x given the surrogate, or “small” β1. These assumptions are empirically verifiable, and they were verified in the applications in epidemiology that motivated this research. In cases where these assumptions are unreasonable, further research is needed to derive suitable estimators.
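The diagnostic used here, correlating absolute measurement error model residuals with their fitted values, is straightforward to compute. In this illustrative sketch (all names hypothetical), the check is near zero for homoscedastic errors and clearly positive when the error standard deviation grows with the fitted value:

```python
import numpy as np

def hetero_diagnostic(residuals, fitted):
    """Correlation between |residuals| and fitted values; values well above
    zero suggest the error variance grows with the conditional mean."""
    return np.corrcoef(np.abs(residuals), fitted)[0, 1]

rng = np.random.default_rng(7)
fitted = rng.uniform(1.0, 10.0, size=2000)
resid_homo = rng.normal(0.0, 1.0, size=2000)       # constant error variance
resid_hetero = rng.normal(0.0, 0.3 * fitted)       # SD proportional to fit
```

The same computation applied to the standardized residuals from the fitted E(x|X) model gives the informal check reported in this paragraph.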
An extensive simulation study of RCH and the other estimators was conducted, based on a data structure motivated by the Nurses’ Health Study data considered in this paper. The results clearly indicated that, under the scenarios studied, RCH was outperformed by the other estimators. Standard regression calibration, RC, performed well, as did both the likelihood approximation, aML, and the exact maximum likelihood estimator, ML. The good performance of RC under heteroscedasticity was observed previously (Spiegelman, Rosner et al. 2000). In the simulation study presented in that paper, two covariates in the logistic regression of Y on x, one with moderate error and the other with considerable error, were dichotomized. By transforming these continuous covariates into Bernoulli variables, whose variance is a function of the mean, heteroscedasticity was induced. Findings from that study indicated that RC was approximately valid for estimation and inference, at least when the validation study size was at least doubled from the original 173. Results from the present simulation study suggest that validation studies larger than those typically found in nutritional epidemiology are needed when measurement error heteroscedasticity is anticipated.
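The mechanism invoked in that earlier study, that dichotomizing a continuous covariate induces heteroscedasticity because a Bernoulli variable’s variance p(1−p) is tied to its mean, is easy to verify numerically; this is a small illustrative check, not the study’s actual design:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100_000)

# A dichotomized covariate is Bernoulli(p), whose variance p*(1-p)
# is a function of its mean -- heteroscedastic by construction.
for cutpoint in (0.0, 0.5, 1.0):
    Z = (X > cutpoint).astype(float)
    p = Z.mean()
    assert abs(Z.var() - p * (1.0 - p)) < 1e-9   # sample variance = p(1-p)
```

Moving the cutpoint changes p, and with it the variance, so the variance of the dichotomized covariate varies systematically with its mean.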
The coverage probability of RCH was below the desired value in nearly all cases considered. The family of heteroscedastic variance functions used here is similar in spirit to the Box-Cox family of transformations, where, following the initial suggestion of Box and Cox (Box and Cox 1964), standard practice in applied statistics is to estimate the transformation parameter first and then treat this estimate as fixed when estimating the remaining parameters. We followed the same practice here, in the computation of the iterative estimators as well. It is well established that in fitting standard models such as (1) with a heteroscedastic variance function such as (5), the asymptotic distribution of the regression parameter estimates is the same whether p in (5) is known or estimated from the data (Carroll and Ruppert 1988). However, in the presence of covariate measurement error, the current situation differs somewhat from the one considered by Carroll and Ruppert, and it is possible that accounting for the estimation of p, which determines the measurement error variance function within the class considered, could have improved the coverage probabilities. Further research could investigate both the theoretical and empirical properties of the two-step approach in this setting, derive the asymptotic variance of RCH with the variability of the estimate of p taken into account, examine its variance empirically through simulation studies, and compare this approach to joint estimation and inference for the iterative estimators.
The quantity β1²σ²h̄, where h̄ is the average h(X,U) in the data, ranged over several orders of magnitude, from 0.0000091 to 0.00882, in the simulations shown in Table 5. Kuha had previously suggested that the bias in RC in the homoscedastic measurement error case would be low when this quantity was less than 0.5 (Kuha 1994), but we found in another implementation of the regression calibration estimator with homoscedastic measurement error that 0.5 was a far too liberal criterion, with unacceptable levels of bias when the quantity was much smaller than 0.5 (Weller, Milton et al. 2007). In the present simulation study, the Spearman correlation of this quantity with bias in RC was 0.52 and with bias in RCH was 0.22. It appears that it is not a reliable metric for identifying conditions under which these estimators may be biased in finite samples in the case of heteroscedastic measurement error.
The maximum likelihood approaches considered here are strictly valid only when the distribution of x|X,U is Gaussian. We did not study the performance of the maximum likelihood estimators when a mis-specified likelihood is fit due to incorrect distributional assumptions about x|X,U, or when the proper likelihood is derived under alternative distributions for x|X,U. It is likely that the maximum likelihood estimator would exhibit less finite-sample bias in a main study/internal validation study design, since this design is substantially more informative than a main study/external validation study design (Spiegelman and Gray 1991). When the disease is rare, as in the NHS and, typically, in other cohort studies of cancer and other chronic disease endpoints, the external validation study is the design by default. Although the semi-parametric locally efficient estimator, SPLE, is consistent under any distribution for x|X,U, this estimator had large finite-sample bias in some scenarios studied by simulation, especially as measurement error increased. In addition, this estimator is difficult to program and computationally burdensome. RC can be readily computed using SAS macros available at http://www.hsph.harvard.edu/faculty/spiegelman/blinplus.html and http://www.hsph.harvard.edu/faculty/spiegelman/relibpls8.html.
In summary, as predicted by the theory, standard regression calibration is adequate when measurement error is not severe or the mis-measured covariate effect is moderate, even when heteroscedasticity is severe. It may be worthwhile to recall that covariate measurement error leads to an exponentially increasing information loss given the same sample size and all other conditions held constant; for example, it is well known that under classical homoscedastic measurement error, the effective sample size is decreased by the factor ρx,X2 (Fleiss 1986; White, Armstrong et al. 2006). In the most extreme case of measurement error considered in our simulation study, where ρx,X = 0.4, this would lead to more than an 80% reduction in the effective sample size. It appears from the simulation study that heteroscedasticity leads to an even greater information loss for the same amount of measurement error, since RC was found to have much better performance in similarly designed simulation studies in prior publications when measurement error was homoscedastic (Rosner, Willett et al. 1989; Carroll and Wand 1991). When measurement error heteroscedasticity is suspected, larger validation studies than are typically the current norm in epidemiology are needed.
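The effective sample size arithmetic can be made explicit: under classical homoscedastic error, n is deflated by the factor ρx,X2, so ρx,X = 0.4 retains only 16% of the nominal information. A one-line sketch (function name hypothetical):

```python
def effective_sample_size(n, rho):
    """Effective sample size under classical homoscedastic measurement
    error: n deflated by the squared surrogate-truth correlation."""
    return n * rho**2

# rho = 0.4 -> 84% reduction; rho = 0.8 -> 36% reduction.
```

For the main study size used in the simulations, n1 = 8953, a correlation of 0.4 leaves an effective sample size of roughly 1432.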
Let V1 and V2 denote the variances of the two component estimators being combined, and V12 their covariance. By the multivariate delta method,
Under the heteroscedastic measurement error model (5), when γ is estimated using weighted linear regression with weights h(Xi), i=1,…,n2,
where is an n2 × (1 + dim(U)) matrix with columns , i=1,…,n2 (Seber 1977). Again by the multivariate delta method,
since by arguments analogous to those given in Appendix 1 of Spiegelman et al. (Spiegelman, Carroll et al. 2001), is asymptotically 0, and
All other covariance terms are 0, since by the Gauss-Markov theorem, Cov(1, 2)= 0, and by arguments analogous to those given in Appendix 1 of Spiegelman et al. (Spiegelman, Carroll et al. 2001),
The estimates of V1, V2, and V12 are obtained by substituting the parameter estimates from the uncorrected primary regression of Y on [X, h(X), U] in the main study and from the weighted linear regression of x on X and U in the validation study. Likewise, the variances of these parameter estimates are obtained from these same regression analyses and substituted into the expressions for V1, V2, and V12 to obtain estimates of these quantities.
Now, to derive the optimal weight, w1, for (1), we need to minimize the variance of the weighted combination,

w1²V1 + w2²V2 + 2w1w2V12,

with respect to w1, where w2 = 1 − w1 so that the combination remains consistent. The single extremum of this function, subject to the constraint, is at w1 = (V2 − V12)/(V1 + V2 − 2V12). This is a global minimum as long as V1 + V2 > 2V12, a condition likely to be met in most situations. With these optimal weights, the variance of RCH,1 is thus

(V1V2 − V12²)/(V1 + V2 − 2V12).
To estimate Var(RCH,1), estimates of V1, V2, and V12 are obtained from the fit of (1) to the main study data, Var(γ̂1) is estimated by plugging σ̂2 into the expression for Var(γ̂) given above, and Var(σ̂2) = 2σ̂4 / [n2 − dim(γ) − 1].
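The optimal-weight formula for a minimum-variance combination of two correlated estimators is a standard result, and can be checked numerically against a grid search over candidate weights; the values of V1, V2, and V12 below are arbitrary illustrative choices:

```python
import numpy as np

def combined_variance(w1, V1, V2, V12):
    """Variance of the combination w1*T1 + (1 - w1)*T2 for estimators T1, T2
    with variances V1, V2 and covariance V12."""
    w2 = 1.0 - w1
    return w1**2 * V1 + w2**2 * V2 + 2.0 * w1 * w2 * V12

def optimal_w1(V1, V2, V12):
    """Closed-form minimizer, valid when V1 + V2 > 2*V12."""
    return (V2 - V12) / (V1 + V2 - 2.0 * V12)

# Illustrative values; the closed form should beat every weight on a fine grid.
V1, V2, V12 = 2.0, 1.0, 0.3
w_star = optimal_w1(V1, V2, V12)
grid = np.linspace(-1.0, 2.0, 2001)
```

At the optimum, the combined variance equals (V1*V2 − V12**2)/(V1 + V2 − 2*V12), which is never larger than min(V1, V2).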
By the multivariate delta method,
where and are obtained from the corresponding elements and sub-matrices of
and , and are obtained from the uncorrected logistic regression of Y on (X,U). Finally,
Note that and equation (9) simplifies to
Without loss of generality, hats indicating estimators are suppressed throughout this proof.
RCH,1 is consistent for β1 when
In addition, RCH,1 is consistent for β1 when
To see this, first consider the first pair of conditions. When condition 2) holds, the sign function yields +1, and condition 1) gives |γ1 + σ2β1| = γ1 + σ2β1. The numerator then simplifies to σ2β1, and so βRCH,1 = β1. We now consider the possible combinations of signs of β1 and γ1.
Similarly, it is easy to see that both 3) and 4) are always true. Hence, when β1 < 0 and γ1 < 0, RCH,1 is consistent for β1.
When β1 < 0, γ1 > 0 RCH,1 is inconsistent for β1 when or σ2 |β1| < γ1. When β1 > 0, γ1 < 0, RCH,1 is inconsistent for β1 when or σ2 β1 < |γ1|. Hence, when β1 * γ1 < 0, RCH,1 is consistent for β1 when .
Author Notes: This study was supported by NIH grants CA50597, ES09411, and CA74112.
Donna Spiegelman, Harvard School of Public Health.
Roger Logan, Harvard School of Public Health.
Douglas Grove, Fred Hutchinson Cancer Research Center.