The International Journal of Biostatistics
Int J Biostat. Jan 1, 2011; 7(1): Article 4.
Published online Jan 6, 2011. doi:  10.2202/1557-4679.1259
PMCID: PMC3404553
Regression Calibration with Heteroscedastic Error Variance
Donna Spiegelman, Roger Logan, and Douglas Grove
Donna Spiegelman, Harvard School of Public Health;
The problem of covariate measurement error with heteroscedastic measurement error variance is considered. Standard regression calibration assumes that the measurement error has a homoscedastic measurement error variance. An estimator is proposed to correct regression coefficients for covariate measurement error with heteroscedastic variance. Point and interval estimates are derived. Validation data containing the gold standard must be available. This estimator is a closed-form correction of the uncorrected primary regression coefficients, which may be of logistic or Cox proportional hazards model form, and is closely related to the version of regression calibration developed by Rosner et al. (1990). The primary regression model can include multiple covariates measured without error. The use of these estimators is illustrated in two data sets, one taken from occupational epidemiology (the ACE study) and one taken from nutritional epidemiology (the Nurses’ Health Study). In both cases, although there was evidence of moderate heteroscedasticity, there was little difference in estimation or inference using this new procedure compared to standard regression calibration. It is shown theoretically that unless the relative risk is large or measurement error severe, standard regression calibration approximations will typically be adequate, even with moderate heteroscedasticity in the measurement error model variance. In a detailed simulation study, standard regression calibration performed either as well as or better than the new estimator. When the disease is rare and the errors normally distributed, or when measurement error is moderate, standard regression calibration remains the method of choice.
Keywords: measurement error, logistic regression, heteroscedasticity, regression calibration
When validation data are available, the non-iterative regression calibration (RC) method can be used to obtain approximately consistent point and interval linear, logistic and Cox regression model parameter estimates with measurement error in one or more continuous covariates, provided certain assumptions are satisfied (Prentice 1982; Fuller 1987; Armstrong, Whittemore et al. 1989; Rosner, Willett et al. 1989; Rosner, Spiegelman et al. 1990; Rosner, Spiegelman et al. 1992; Spiegelman, McDermott et al. 1997; Wang, Hsu et al. 1997; Carroll, Ruppert et al. 2006). In Rosner et al.’s version of regression calibration, a standard multiple regression model is used to estimate the uncorrected point and interval estimates of the parameters, and these are bias-corrected using an estimate of the slopes from a linear measurement error model for the true exposure given the surrogate obtained from the validation data. A Science Citation Index search in April, 2010 on the three papers by Rosner et al.(1989,1990,1992) yielded 627 citations, approximately half of which were published in epidemiology and medical journals. Many of these appear to have involved direct applications of the methodology to original analyses of data.
This method, along with others proposed previously for the covariate measurement error problem, including SIMEX (Cook and Stefanski 1995) and the application of empirical process theory to survival data analysis (Huang and Wang 2000), requires homoscedastic measurement error variance. The distributions of many environmental and dietary intake variables are highly skewed, raising concern that the homoscedasticity requirement of regression calibration and other methods may often be unrealistic for important potential applications. For example, in a recent publication, moderate heteroscedasticity was observed in the measurement error model for exposure to airborne soot and nitrogen dioxide (Van Roosbroeck, Li et al. 2008), and in a recently published nutritional epidemiology study, moderate heteroscedasticity was observed in the measurement error models for average daily alcohol intake in a pooled analysis of renal cancer incidence (Lee, Hunter et al. 2007). Two other motivating examples of measurement error model heteroscedasticity are studied in depth in this paper, one looking at health symptoms in relation to exposure to anti-neoplastic drugs (Spiegelman and Valanis 1998), and one looking at alcohol intake in relation to breast cancer incidence (Willett, Stampfer et al. 1987). Thus, there is a need to extend regression calibration to apply when the requirement for homoscedasticity of the measurement error model variance is violated, and to compare this extension to several less restrictive iterative approaches, including maximum likelihood and semi-parametric efficient estimating equations (Robins, Hsieh et al. 1995). This paper addresses this need. In Section 2, Rosner et al.’s version of regression calibration is reviewed and the new estimator, [beta]RCH, is derived. Next, we consider the case when the true exposure variable, x, is unobservable, but replicate measures for the unbiased surrogate, X, are available in a reliability sub-study.
Then, when heteroscedastic classical measurement error is assumed, we show that it is not possible to obtain a version of the new estimator. Iterative alternatives appropriate to this setting, including maximum likelihood, approximate maximum likelihood, and semiparametric efficient, are also presented in this section. In Section 3, these estimators are applied to two illustrative examples, one from occupational epidemiology and one from nutritional epidemiology, in which moderate heteroscedasticity is evident. An extensive simulation study of the new estimator, standard regression calibration and the iterative approaches is presented in Section 4. In Section 5, the results of the illustrative examples, of the analytic work and of the simulation study are summarized and recommendations are made.
The parameter of interest is β1 from the generalized linear model
g[E(Y | x, U)] = β0 + β1x + β2′U
(1)
where Y is the outcome of interest, g[A] is a link function which linearizes the conditional mean function in the covariates and U is a vector of covariates measured without error. Substituting the covariate measured with error, X, for x, the uncorrected point and interval estimates of effect,
equation M2
are adjusted for measurement error in a one-step procedure. When g[A] = E(Y | X, U), regression calibration is applied to a linear regression model (Fuller 1987). When g[A] = logit[E(Y | X, U)], regression calibration is applied to a logistic regression model (Rosner, Spiegelman et al. 1990; Rosner, Spiegelman et al. 1992). When
equation M3
where I (t) is the incidence rate at time t, then when the disease is rare, regression calibration can be applied to a Cox proportional hazards regression model (Prentice 1982; Spiegelman, McDermott et al. 1997; Wang, Hsu et al. 1997; Hu, Tsiatis et al. 1998). The application of regression calibration to these three basic models, all of which are used widely in epidemiology, was unified with a special focus on interval estimation and computing in SAS (Spiegelman, McDermott et al. 1997). Application of measurement error methods to data requires that the main study, which contains data (Yi, Xi, Ui), i = 1,…, n1, is supplemented by an external validation study, which contains data (Xi, xi, Ui), i = n1 + 1,…, n1 + n2. Because it is difficult and expensive to validate Xi with xi, the validation study is typically much smaller than the main study, i.e. n1 >> n2.
The point and interval estimates of effect can be corrected for measurement error using Rosner et al.’s formulas (Rosner, Spiegelman et al. 1990)
equation M4
(2)
where [beta] and equation M5 are estimated by fitting (1) to the main study data, (Y,X,U). The first row of [Gamma], denoted equation M6, and equation M7 are obtained from fitting the linear regression model to the validation data
x = α′ + γ1X + γ2′U + ε
(3)
under the assumption that
Var(x | X, U) = σ²
(4)
Appendix 1 of Rosner, Spiegelman et al. (1990) gives the construction of [Gamma] from [gamma with circumflex] and equation (A7) of the same paper gives equation M10. Assumption (4) is the homoscedasticity assumption, the relaxation of which is the focus of this manuscript. Regression calibration has been presented by others for use in a variety of settings (Prentice 1982; Fuller 1987; Armstrong, Whittemore et al. 1989; Rosner, Spiegelman et al. 1990; Rosner, Spiegelman et al. 1992; Spiegelman, McDermott et al. 1997; Wang, Hsu et al. 1997; Carroll, Ruppert et al. 2006). All versions of the regression calibration method assume that measurement error is non-differential with respect to the response variable, Y, i.e.
f(Y | x, X, U) = f(Y | x, U)
where f(·) is a density function. In addition to the assumptions specified by (1), (3) and (4), in the case of univariate x, the key additional requirement for approximate unbiasedness of the regression calibration estimator when g[E(Y | x, U)] in (1) is logistic is that either equation M12 is small, or Pr(Y = 1 | x, U) is small and f(x | X, U) is normal (Kuha 1994).
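For a single mismeasured covariate, the regression calibration correction reduces to dividing the uncorrected slope by the calibration slope from the validation study. The sketch below illustrates that scalar case with a first-order delta-method variance, assuming the main and validation studies are independent. The function names and all numeric inputs are hypothetical and are not taken from either study analyzed in this paper.

```python
import math


def regression_calibration(beta_naive, var_beta_naive, gamma1, var_gamma1):
    """Scalar regression calibration: corrected slope and its variance."""
    beta_rc = beta_naive / gamma1
    # First-order delta method; main and validation studies assumed independent.
    var_rc = var_beta_naive / gamma1**2 + beta_naive**2 * var_gamma1 / gamma1**4
    return beta_rc, var_rc


def odds_ratio_ci(beta, var, increment=1.0, z=1.96):
    """Odds ratio and Wald interval for a given exposure increment."""
    se = math.sqrt(var)
    return tuple(math.exp(increment * b) for b in (beta, beta - z * se, beta + z * se))


# Hypothetical inputs chosen only for illustration.
beta_rc, var_rc = regression_calibration(
    beta_naive=0.08, var_beta_naive=0.03**2, gamma1=0.5, var_gamma1=0.05**2
)
print(beta_rc, var_rc)
print(odds_ratio_ci(beta_rc, var_rc))
```

The second variance term carries the uncertainty in the calibration slope, which is why corrected intervals are wider than a simple rescaling of the uncorrected interval would suggest.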
It is sometimes the case with biological variables that heteroscedasticity is evident over the observed range of the data; when this occurs, it typically takes the form that the spread increases with the level. Sometimes, but by no means always, the variance can be stabilized by log transforming the data, but this solution can be undesirable when the variable in question is to be used as a predictor variable in a regression model, and the scientific hypothesis focuses on measuring the relationship of the variable in its originally observed scale to the outcome variable of interest. In this paper, we assume that an empirically verifiable linear model for the mean, (3), holds, but assumption (4), that of constant variance of the measurement error model for x | X, U, is untenable for the observed data. Instead, it may appear from the available validation data that
Var(xi | Xi, Ui) = h(Xi, Ui)σ²
(5)
is a more reasonable model for the variance, where h (Xi, Ui) is some function of the covariates which induces heteroscedasticity in the measurement error model. In what follows, we derive an extension to the standard regression calibration estimator given above in (2) and to its multivariate counterpart, in which the constant residual variance requirement given by (4) is eliminated.
Standard regression calibration, which assumes homoscedastic measurement error, can be derived by a first order Taylor series expansion around the naïve likelihood in which the mis-measured exposure is treated as if there were no error. In what follows, the second order expansion is developed. By adding the additional term, measurement error model heteroscedasticity can be accommodated, albeit with an unavoidably more complex estimator. In another approach to deriving the regression calibration estimator, the approximate logistic likelihood is derived under the assumptions of normal residual measurement error and rare disease. We develop these approaches in more detail in what follows below, to provide the formal derivation for the new estimator.
We assume that x given (X, U) follows the mean and variance model
equation M14
where dim(x)=q and h(X, U) is a q × q matrix-valued function. Then, the distribution of Y | X, U for logistic regression with rare disease and multivariate normality is
equation M15
(6)
where
equation M16
(Kuha 1994). A similar result for general mean-variance models, using a second-order Taylor series expansion about E(x|X,U) (i.e. a small measurement error approximation) was obtained for scalar x by Carroll et al. (Carroll, Ruppert et al. 2006). For dim(x)>1, Carroll and Stefanski (Carroll and Stefanski 1990) presented an analogous result, but omitted the detailed derivation. Similar approximations have been given for the relative risk function in X(t), a vector-valued, possibly time-varying covariate, in survival data analysis (Prentice 1982), and for linear regression (Spiegelman, McDermott et al. 1997) where (6) is exact.
Insight can be gained by inspecting the form of (6) with no covariates, U, and for scalar x, denoted x,
equation M17
Figure 1 shows the theoretical relationship between log[E(Y)] under the rare disease assumption and x given by (1) (solid line), and the approximate induced relationships between log[E(Y)] and X for h(X)=X (dotted line), h(X)=X² (short dashed line), and h(X)=1, the homoscedastic case (long dashed line), when plugging in the measurement error model parameters and logistic regression parameters estimated from the Nurses’ Health Study data on breast cancer incidence in relation to alcohol consumption discussed in Section 3.2. It is evident that when h(X)=X, the uncorrected estimator is likely to under-estimate β1 but when h(X)=X², over-estimation could occur. If h(X)=X, (6) simplifies to a linear model in X as a function of the measurement error model and logistic regression model parameters (see Section 2.2), and it can be seen that the standard regression calibration estimator is also likely to over-estimate β1 (long dashed line).
Figure 1.
True, observed and regression calibration-corrected regression functions (NHS)
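The qualitative behavior described for Figure 1 can be explored numerically. The sketch below assumes the rare-disease, normal-error approximation with no covariates U, under which log E(Y | X) is approximately β0 + β1(α′ + γ1X) + ½β1²σ²h(X); this follows from the moment generating function of a normal variable. All parameter values are hypothetical rather than the Nurses’ Health Study estimates used in the figure.

```python
import numpy as np


def induced_log_risk(X, beta0, beta1, alpha, gamma1, sigma2, h):
    """Approximate log E(Y | X) when x | X ~ N(alpha + gamma1*X, sigma2*h(X))."""
    mean_x = alpha + gamma1 * X
    var_x = sigma2 * h(X)
    return beta0 + beta1 * mean_x + 0.5 * beta1**2 * var_x


# Hypothetical parameter values, chosen only to make the curvature visible.
params = dict(beta0=-5.0, beta1=0.5, alpha=0.0, gamma1=0.8, sigma2=2.0)
X = np.linspace(1.0, 10.0, 10)

variance_functions = {
    "h(X)=1": np.ones_like,
    "h(X)=X": lambda v: v,
    "h(X)=X^2": lambda v: v**2,
}

slopes_by_h = {}
for name, h in variance_functions.items():
    curve = induced_log_risk(X, h=h, **params)
    # Local slope of the induced relationship between adjacent grid points.
    slopes_by_h[name] = np.diff(curve) / np.diff(X)
    print(name, slopes_by_h[name][0], slopes_by_h[name][-1])
```

With h(X)=1 the induced slope is β1γ1 throughout; with h(X)=X it is shifted by a constant ½β1²σ²; with h(X)=X² it grows with X, which is the source of the possible over-estimation.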
To simplify notation, equation M18 was rewritten as equation M19. Then, solving for β1 in the two terms multiplying X and h(X) in (6), two estimators for β1,
equation M20
(7)
and
equation M21
where sign(t) = 1 if t is positive and −1 if t is negative, are available from the uncorrected regression of Y on X and h(X). Of course, the approximate likelihood (6) can be directly fit to the data using an iterative approach, jointly estimating all parameters simultaneously. Alternatively, we suggest a procedure for obtaining [beta]RCH,1 which can be constructed from standard software tools and used in routine analysis in what follows. Both approaches will provide consistent estimators, but the maximum likelihood estimator will, of course, be more efficient asymptotically. Later, we will compare the behavior of these estimators in a simulation study.
The new method is as follows:
  • A logistic regression model of Y on X, h(X, U) and U is run in the main study to obtain equation M22 and equation M23 and their estimated variances,
  • A weighted linear regression is run in the validation study, with weights 1/h(X, U), to obtain [sigma with hat]2 and [gamma with circumflex]1.
  • [beta]11 and [beta]12 are formed from the formulas above and efficiently combined to produce a single estimate, [beta]RCH,1,
    equation M24
The asymptotically minimum variance weights and their derivation, as well as the derivation of the formula for the variance of [beta]RCH,1, are given in Appendix 1. [beta]RCH,2, the measurement-error-corrected estimator of the coefficients corresponding to U, has form
equation M25
(8)
Its asymptotic variance is derived in Appendix 2, along with Cov([beta]RCH,1, [beta]RCH,2). As is evident from (6), this estimator can only be used for models with scalar x. For multivariate x with heteroscedastic covariance for x | X, U, a term is added to the model for E(Y|X, U) equal to
equation M26
which is a complicated function of the q elements of β1 that cannot be used to uniquely solve for the q elements of equation M27.
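For scalar x, the three-step procedure above can be sketched as follows. The sketch relies on the rare-disease, normal-error approximation, under which the coefficient of X in the main-study model is approximately β1γ1 and the coefficient of h(X, U) is approximately ½β1²σ². Two simplifications are assumptions of this illustration and not the method of Appendix 1: the two estimates are combined by simple inverse-variance weighting that ignores their covariance, and the uncertainty in the validation-study estimates of γ1 and σ² is ignored. The function name and inputs are hypothetical.

```python
import math


def rch_scalar(b_x, var_b_x, b_h, var_b_h, gamma1, sigma2):
    """Combine two approximate estimates of beta1 (illustrative weights only)."""
    # Estimate from the coefficient of X: b_x ~ beta1 * gamma1.
    beta11 = b_x / gamma1
    var11 = var_b_x / gamma1**2

    # Estimate from the coefficient of h(X, U): b_h ~ 0.5 * beta1^2 * sigma2.
    ratio = 2.0 * b_h / sigma2
    if ratio <= 0.0:
        # The square root is undefined; fall back on the first estimate alone.
        return beta11, var11
    sign = 1.0 if beta11 >= 0.0 else -1.0  # assumption: take the sign of beta11
    beta12 = sign * math.sqrt(ratio)
    var12 = var_b_h / (sigma2 * beta12) ** 2  # delta method in b_h only

    # Inverse-variance weighting, ignoring Cov(beta11, beta12).
    w = (1.0 / var11) / (1.0 / var11 + 1.0 / var12)
    beta = w * beta11 + (1.0 - w) * beta12
    var = 1.0 / (1.0 / var11 + 1.0 / var12)
    return beta, var


# Hypothetical inputs consistent with beta1 = 0.5, gamma1 = 0.8, sigma2 = 2.
beta_rch, var_rch = rch_scalar(0.4, 0.01, 0.25, 0.01, gamma1=0.8, sigma2=2.0)
print(beta_rch, var_rch)
```

The fallback branch reflects a practical difficulty: when the estimated coefficient of h(X, U) has the wrong sign, the second estimate does not exist.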
A result similar in form to (6) was given by Prentice (Prentice 1982) for the proportional hazards regression model on a vector X(t), in which one or more of the elements of X(t) are measured with error, and where the conditional distribution of x(t) given X(t) is multivariate normal with a linear mean in X(t) and variance Σ(X(t)). Hence, under the conditions specified by Prentice, [beta]RCH can be applied to Cox regression models with an arbitrary number of perfectly measured covariates and a single covariate measured with error, just as discussed above for logistic regression.
Under either small measurement error or a rare disease with normal errors for x|X, U, it is evident that if either equation M28 or equation M29 is small, for scalar x, the standard regression calibration estimator will be approximately valid, even in the presence of heteroscedasticity of the variance of the measurement error model, since the third term in the exponent of (6) vanishes.
2.1. Special case when an ‘alloyed’ gold standard is available
Standard regression calibration is valid even if instead of x, the gold standard, an unbiased imperfect gold standard, equation M30, is observed in the validation study, where equation M31 and εi is a random, mean-zero error term. For example, doubly labeled water is considered an unbiased biomarker for total energy intake (Subar, Kipnis et al. 2003). However, doubly labeled water is measured with some random within-person, unbiased error (Preis, Spiegelman et al. 2010). Here as well, for scalar x, as long as equation M32 and equation M33, γ and α′ can be consistently estimated from fitting the model
equation M34
to the data. However,
equation M35
where equation M36, so Step 2 in the procedure for obtaining [beta]RCH,1 given above will not provide a valid estimate of h(Xi, Ui) σ2, the quantity needed for [beta]RCH,1. Without replicate data within subjects, followed by additional calculations not described here, [beta]RCH is not applicable when an alloyed gold standard is observed.
2.2. Special case of [beta]RCH when h(X)=X
Another important special case emerges when h(X)=X. Then, from equation (6) we obtain
equation M37
(9)
where equation M38 is the regression coefficient for X obtained from fitting a logistic regression model of Y on X and U, [beta]11 is as above in (7), and [beta]RCH,2 is as above in (8). It is evident from (6) that the function h(X)=X+b, where b is a positive constant, can also be considered here. In this case, b is absorbed by the intercept, equation M39, and does not need to be considered any further in the measurement error correction procedure. The variance of [beta]RCH,1 in the special case was again derived using the multivariate delta method (Bishop, Fienberg et al. 1975). It should be noted that when β1γ1 < 0, [beta]RCH,1 will provide a consistent estimate of β1 only when equation M40 (Appendix 3). However, under (3), as long as Corr(x,X) is positive, as would usually be the case when X is a surrogate for x, γ1 will be positive, and whenever it is anticipated that β1 might be negative, x and X can be recoded to avoid this.
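The structure of this special case can be illustrated with a short calculation. Under the rare-disease, normal-error approximation with h(X)=X, the coefficient of X in the uncorrected model is approximately β1γ1 + ½β1²σ², so β1 solves a quadratic equation. The sketch below assumes γ1 > 0 and σ² > 0 and selects the root that reduces to the standard regression calibration estimator as σ² tends to zero. It is derived from the approximation rather than copied from (9), and the names and numbers are hypothetical.

```python
import math


def beta1_when_h_is_linear(beta_star, gamma1, sigma2):
    """Solve 0.5*sigma2*b^2 + gamma1*b - beta_star = 0 for b (gamma1 > 0 assumed)."""
    disc = gamma1**2 + 2.0 * sigma2 * beta_star
    if disc < 0.0:
        raise ValueError("no real root: the approximation cannot be inverted")
    # Root that tends to beta_star / gamma1 as sigma2 -> 0.
    return (-gamma1 + math.sqrt(disc)) / sigma2


# Hypothetical values: true beta1 = 0.5, gamma1 = 0.8, sigma2 = 2.
beta1_true, gamma1, sigma2 = 0.5, 0.8, 2.0
beta_star = beta1_true * gamma1 + 0.5 * beta1_true**2 * sigma2  # induced slope

beta_rc = beta_star / gamma1  # standard regression calibration
beta_h = beta1_when_h_is_linear(beta_star, gamma1, sigma2)
print(beta_rc, beta_h)
```

In this constructed example the standard correction overshoots the true value while the quadratic solution recovers it, mirroring the over-estimation discussed for h(X)=X.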
2.3. Application to the classical measurement error model
There is an important set of problems where x is unobservable, i.e. no “gold standard” exists. In some of these cases, the measurement error model
equation M41
(10)
is considered reasonable, where n2i is the number of replicates for subject i. Examples of data which are believed to follow this model include blood pressure, serum biomarkers such as cholesterol and its subfractions, hormones, and vitamin concentrations. When model (10) holds, Rosner et al.’s (1992) version of the regression calibration method applies, in a procedure similar to that described earlier. In the univariate case, estimates of the reliability coefficient, Var(x)/Var(X), and its variance are substituted for [gamma with circumflex]1 and its variance in equation (3). Multivariate generalizations have been given (Rosner, Spiegelman et al. 1992). When the third component of (10) does not hold, i.e. when equation M42, an extension of regression calibration to accommodate this expansion of the model is needed. We rederived the likelihood for scalar x and no additional covariates U to attempt to identify an appropriate estimator in this simple case, now assuming that Xij | xi ~ N(xi, h(xi)σ2), equation M43, i=1,…,n2, j=1,…,nR, where nR is the number of replicates for each subject i, and obtained
equation M44
and
equation M45
Neither f(xi,Xij) nor f(xi|Xij) has a Gaussian structure, and the method of completing the square to obtain a closed-form solution for f(Yi|Xi) used to derive (6) is therefore not applicable. It is unlikely that a closed-form for
equation M46
exists when equation M47 and ε is Gaussian. If functions h(xi) are found which fit the data at hand, it is unlikely that the resulting expression for f(Yi|Xi) will be of a form such that the link function, g[E(Yi|Xi)], can be found with g equal to the logistic or log transformations. Without these features, even the scalar version of [beta]RCH does not exist.
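For the homoscedastic classical model (10), where the standard method does apply, the reliability coefficient Var(x)/Var(X) of a single measurement can be estimated from replicates by a method-of-moments variance decomposition. The sketch below assumes a balanced design with the same number of replicates per subject; the function name and the simulated data are hypothetical. It does not address the heteroscedastic case, for which no closed-form estimator was found above.

```python
import numpy as np


def reliability_from_replicates(X_rep):
    """Estimate Var(x)/Var(X) for one measurement from an (n subjects x r replicates) array."""
    X_rep = np.asarray(X_rep, dtype=float)
    n_subjects, n_rep = X_rep.shape
    # Within-subject (error) variance: average of per-subject sample variances.
    var_within = X_rep.var(axis=1, ddof=1).mean()
    # Variance of subject means equals Var(x) + var_within / n_rep.
    var_means = X_rep.mean(axis=1).var(ddof=1)
    var_true = max(var_means - var_within / n_rep, 0.0)
    return var_true / (var_true + var_within)


# Simulated check with hypothetical values: Var(x) = 1 and error variance = 1.
rng = np.random.default_rng(0)
x_true = rng.normal(0.0, 1.0, size=2000)
replicates = x_true[:, None] + rng.normal(0.0, 1.0, size=(2000, 2))
reliability = reliability_from_replicates(replicates)
print(reliability)
```

The truncation at zero guards against a negative between-subject variance estimate, which can occur in small reliability studies.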
2.4. Iterative methods
Iterative methods can also be applied to the problem of heteroscedastic measurement error in regression covariates. These methods will typically have the advantage of relaxing some of the assumptions required by closed-form methods, such as (1), (3) and (4), but will have the disadvantage of computational complexity which is a barrier to use in applications. In a main study/external validation study design for a binomial outcome and a Gaussian measurement error model, the log likelihood of the data is equal to
equation M48
We assume here that [var phi](α′, γ, equation M49) is the normal density with mean given by (3) and variance given by (4), and the probability that Yi = 1 given (Xi, Ui) is
equation M50
The numerical method given by (Crouch and Spiegelman 1990) can be used to evaluate f3 (Yi | Xi, Ui; β, α′, γ, σ2), or code could be developed which makes use of standard statistical software through the non-linear optimizer that is provided. Generalizing Kuha’s derivation (Kuha 1994) to the heteroscedastic measurement error variance case, by using a second-order Taylor series expansion of logit[Pr(Y=1)] with respect to β1 around β1 = 0 under the rare disease assumption and with x | X, U normally distributed with linear mean and variance, f3 (Yi | Xi, Ui; β, α′, γ, σ2) can be approximated as follows:
equation M51
Likelihoods based on both the exact, [beta]ML, and approximate, [beta]aML, expressions for f3 (Yi | Xi, Ui; β, α′, γ, σ2) given above were fit to the data in Section 3 and studied by simulation in Section 4.
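The integral defining Pr(Yi = 1 | Xi, Ui) has no closed form under the logistic link, so some numerical method is required. The sketch below uses Gauss-Hermite quadrature, which is a substitute chosen for illustration and not the method of Crouch and Spiegelman (1990), and compares the result with the rare-disease approximation exp(β0 + β1μ + ½β1²v), where μ and v are the conditional mean and variance of x. Any contribution β2′U can be folded into the intercept argument. Function names and numeric values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def prob_y_given_X(beta0, beta1, mean_x, var_x, n_nodes=40):
    """Integrate expit(beta0 + beta1*x) over x ~ N(mean_x, var_x) by Gauss-Hermite quadrature."""
    nodes, weights = hermgauss(n_nodes)  # rule for the weight function exp(-t^2)
    x = mean_x + np.sqrt(2.0 * var_x) * nodes  # change of variables to N(mean_x, var_x)
    p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
    return float(np.sum(weights * p) / np.sqrt(np.pi))


def rare_disease_approx(beta0, beta1, mean_x, var_x):
    """Approximation obtained by replacing the logistic function with exp."""
    return float(np.exp(beta0 + beta1 * mean_x + 0.5 * beta1**2 * var_x))


# Hypothetical values for a rare outcome.
exact = prob_y_given_X(beta0=-6.0, beta1=0.3, mean_x=2.0, var_x=1.0)
approx = rare_disease_approx(beta0=-6.0, beta1=0.3, mean_x=2.0, var_x=1.0)
print(exact, approx)
```

In a heteroscedastic fit, var_x would be replaced by σ²h(Xi, Ui) for each subject, so the quadrature is evaluated once per observation at each iteration of the optimizer.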
A consistent, semi-parametric efficient estimator was proposed by Robins, Hsieh et al. (1995), where Begun, Hall et al. (1983) defined the semi-parametric efficiency bound as the smallest possible variance obtained by any estimator which is consistent for β over all possible measurement error models for x | X, U. Because the model for x | X, U is unknown a priori and validation study sample sizes are often small, methods that are robust to mis-specification of the model for x | X, U are desirable. The semi-parametric fully efficient estimator of this class requires a computationally cumbersome non-parametric fit of the density of x | X, U. Instead, we investigated the semi-parametric locally efficient version, which is consistent even when the density of x | X, U is mis-specified, and uses a parametric fit for x | X, U. As this parametric density, fit and empirically verified in the validation study, approaches the true density for x | X, U, full semi-parametric efficiency is approached. Details on this estimator can be found in Spiegelman and Casella (1997), Appendix 1, where, here, f2 (x | X, U; α′, γ, σ2) was taken to be the normal density with mean given by (3) and variance given by (4), and α′, γ, and σ2 are estimated in the validation study, as usual. This estimator, [beta]SPLE, was fit to the data in Section 3 and studied by simulation in Section 4 along with the others.
3.1. The ACE Study of the acute health effects of occupational exposure to anti-neoplastics among pharmacists
Valanis, Vollmer et al. (1993) described a cross-sectional study of acute health effects from occupational chemotherapeutics exposure in 675 hospital pharmacists. The research objective is estimation and inference about the prevalence odds ratio for acute health effects related to chemotherapeutics exposure. Here, we will focus on fever prevalence in relation to exposure. There were 110 cases of fever. Average weekly chemotherapeutics exposure (X) was self-reported on a questionnaire; in a sub-sample of 56 pharmacists, on-site drug mixing diaries were kept for 1–2 weeks (x). The correlation between these two methods of exposure assessment was 0.70. The correlation between the predicted values from the linear measurement error model for diary data (x), conditional upon the questionnaire data (X) and other model covariates, and the absolute value of the residuals (Carroll and Ruppert 1988) was 0.21, indicative of moderate heteroscedasticity (Figure 2).
Figure 2.
Evidence for heteroscedasticity in the ACE Study
These analyses were adjusted for three covariates (U): age in years, shift work (1 if night or rotating shift, 0 if day shift), and employed by a community hospital (1 if yes, 0 if no). In a previously published analysis of these data, the uncorrected prevalence odds ratio for a top to bottom quintile contrast in number of drugs mixed per day, corresponding to an increment of 52 drugs/day, was 1.08 (95% confidence interval (CI) 1.02–1.15) and the regression calibration estimate of the same quantity, ignoring the observed heteroscedasticity in the measurement error model, was 1.22 (95% CI 1.05–1.45) (Spiegelman and Valanis 1998). Using maximum likelihood methods with a gamma measurement error model that was empirically verified in the ACE validation study and allows for heteroscedasticity which depends on covariates in an arbitrary manner, the estimated prevalence ratio was 1.17 (95% CI 1.04–1.26) (Spiegelman and Casella 1997). Note that the odds ratio will be a good approximation for the prevalence ratio when the outcome is rare and when the prevalence ratio is near one.
We first needed to identify a form for h(X), and we searched over the class of functions h(X)=(X+b)^p. We sought the transformation in this class for which the correlation between the predicted values and the absolute value of the weighted residuals, both taken from the weighted least squares regression of x on X and the other covariates, is nearest to zero, where the weights are 1/h(X). As is apparent from Table 1, the optimal transformation was h(X) = X, leading us to consider the special case of [beta]RCH discussed in Section 2.2. However, in order to identify the best transformation over the class of functions h(X)=(X+b)^p as shown in Table 1, we needed a non-zero value for b when p=0 for the log function, since the data contain zero values. In addition, to fit the SPLE method, a parametric working model for x | X needs to be specified. Here, we used a working normal density with mean given by equation (3) and in the footnote of Table 1, and Var(x | X) = Xσ². Because the normal likelihood is undefined when X = 0 and Var(x | X) = Xσ², the function h(X)=X+b was again needed with a small positive value of b = 0.1. As previously noted, the measurement error correction procedure is invariant to the choice of constant in the function h(X)=X+b, and the results in Table 2 were unchanged to three digits of precision for b = 0.
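The search over variance functions can be sketched as a small grid evaluation. The code below fits a weighted least squares regression of x on X and U with weights 1/h(X) and reports the correlation between the absolute weighted residuals and the predicted values for each candidate power p in h(X)=(X+b)^p. Scaling the residuals by 1/sqrt(h(X)) is an assumption about how the weighted residuals are defined, the log case (p=0) is omitted, and the data are simulated with hypothetical parameters, not the ACE validation data.

```python
import numpy as np


def abs_resid_corr(x, X, U, h_vals):
    """Correlation of |weighted residuals| with predicted values from WLS with weights 1/h."""
    design = np.column_stack([np.ones_like(X), X, U])
    sw = 1.0 / np.sqrt(h_vals)  # square roots of the weights
    coef, *_ = np.linalg.lstsq(design * sw[:, None], x * sw, rcond=None)
    fitted = design @ coef
    weighted_resid = (x - fitted) * sw
    return float(np.corrcoef(np.abs(weighted_resid), fitted)[0, 1])


def search_power(x, X, U, b, powers):
    """Evaluate the diagnostic for each candidate h(X) = (X + b)^p."""
    return {p: abs_resid_corr(x, X, U, (X + b) ** p) for p in powers}


# Simulated validation data in which Var(x | X, U) is proportional to X.
rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(1.0, 10.0, size=n)
U = rng.normal(size=n)
x = 1.0 + 0.5 * X + 0.2 * U + np.sqrt(X) * rng.normal(size=n)

unweighted = abs_resid_corr(x, X, U, np.ones(n))  # homoscedastic working model
by_power = search_power(x, X, U, b=0.1, powers=[0.5, 1.0, 2.0])
print(unweighted, by_power)
```

The candidate whose correlation is nearest to zero is retained; in this simulated example the data were generated with variance proportional to X, so p=1 should remove most of the association that the unweighted fit displays.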
Table 1.
Correlation between the weighted absolute value of the measurement error model residuals and the predicted values for selected weight functions, ACE Study1
Table 2.
Comparison of [beta]RCH to other methods: The ACE Study1
Table 2 gives the results of this analysis, where we find that the results which take into account the apparent heteroscedasticity in the measurement error model are virtually unchanged from the standard regression calibration results which ignore this feature of the data. Both the exact and approximate maximum likelihood estimates and the semi-parametric locally efficient estimate were similar to the regression calibration estimates. Application of the semi-parametric estimator led to a large efficiency loss. The regression calibration estimators took trivially more CPU time than the uncorrected estimator, and the iterative estimators took 10 to 100 fold more CPU time than these. With a small data set such as here, none of these CPU times were prohibitive.
3.2. The Nurses’ Health Study of the relationship between dietary alcohol intake and breast cancer incidence rates
Willett et al. described a prospective study of the relationship between breast cancer incidence and moderate alcohol consumption among 89,538 U.S. women aged 34–59 who were followed for 4 years beginning in 1980 (Willett, Stampfer et al. 1987). After updating the original data to include 8 years of follow-up, 1466 cases occurred during this study period. Alcohol intake was calculated from three questions about the consumption of beer, wine and liquor that were included on a 61-item food frequency questionnaire. These data were validated in a sub-sample of 173 women with four one-week weighed diet records (Willett, Sampson et al. 1985). The correlation between these two methods of exposure assessment was 0.85. For average daily alcohol intake (g/day), the correlation between the absolute value of the regression residuals and the predicted values from the linear measurement error model for the diet record data (x) conditional upon the food frequency data (X) and other model covariates was 0.44 (Figure 3).
Figure 3.
Evidence for heteroscedasticity in the NHS
A logistic regression model, equation M52, was fit to the data, where Yi indicates whether participant i received a diagnosis of breast cancer between the time of the 1980 questionnaire return and January 1, 1989, Xi is the covariate measured with error, alcohol intake, and Ui is the vector of other covariates, taken to be perfectly measured: age, age at menarche, menopausal status, age at first live birth, history of benign breast disease, family history of breast cancer, body mass index, and parity. The uncorrected and regression calibration point and interval estimates of the rate ratio from a Cox regression model, corresponding to a 12 g/day increase in alcohol intake, in the Nurses’ Health Study based on this same 8 year follow-up period (1466 cases) were given previously as 1.09 (95% CI 1.04–1.14) and 1.15 (95% CI 1.05–1.26), respectively (Spiegelman, McDermott et al. 1997).
Although h(X) = log(X + 1.005) was an option for minimizing the correlation between the weighted residuals and the predicted values from the measurement error model fit in the validation study, transformations of the exposure of interest are not desirable for substantive interpretability unless absolutely necessary for statistical reasons. Thus, the optimal transformation of X for minimizing the correlation between the residuals and the predicted values was again a linear one (Table 3), and the special case of [beta]RCH discussed earlier was again applied.
Table 3.
Correlation between the weighted absolute value of the measurement error model residuals and the predicted values for selected weight functions, Nurses’ Health Study1
A small non-zero value for the parameter b was used for the reasons given above in Section 3.1. There was a slight attenuation in [beta]RCH relative to [beta]RC but substantive interpretation of the data remained unchanged (Table 4). Both the exact and approximate maximum likelihood estimators gave results similar to those given by [beta]RCH, and CPU time was not elevated compared to the regression calibration methods. The semi-parametric efficient estimate was also similar to the others, although slightly larger and less powerful. The excessive CPU time needed for calculations makes this estimator impractical for use with such a large data set.
Table 4.
Comparison of [beta]RCH to other methods: The Nurses’ Health Study1
We studied the small sample behavior of all estimators discussed in Section 2 by simulation. We designed the simulation study to follow the Nurses’ Health Study described in Section 3.2, and varied the validation study sample size (n2=173 or 346), the parameter p in h(X)=X^p (p=0.5,1,2), the extent of measurement error as expressed by Corr(x,X)=(0.4, 0.6, 0.8), and the extent of heteroscedasticity as expressed by Corr(ê2, x̂) between 0.2 and 0.6. Following the Nurses’ Health Study, we set (β0, β1, α′) = (−2.633, 0.01, 3.29) and (γ, σ2) were varied to obtain the desired Corr(x,X) and Corr(ê2, x̂). To simulate the main study, n1=8953 X’s were chosen with replacement from the Nurses’ Health Study and used to generate n1 x’s from a normal distribution with mean given by (3) and variance given by (5) at the specified values of (α′, γ, σ2). Then, n1 values of Y were generated from a Bernoulli with parameter given by (1) with the logistic link function, at the fixed values for (β0, β1). To simulate the validation study, n2 X’s were chosen with replacement from the Nurses’ Health Study validation study and used to generate n2 x’s from a normal distribution with mean given by (3) and variance given by (5) at the specified values of (α′, γ, σ2). Five hundred simulations were run for each design point.
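The data-generating steps of this design can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: equations (3) and (5) are assumed here to take the forms E(x|X) = α′ + γX and Var(x|X) = σ²·X^p, the pool of X values is a hypothetical skewed stand-in for the Nurses’ Health Study data, and the function name `simulate_study` and the values chosen for γ and σ² are illustrative rather than those used in the paper.

```python
import math
import random


def simulate_study(X_pool, n, beta0, beta1, alpha, gamma, sigma2, p, rng):
    """Generate n records (X, x, Y) following the design described in the text.

    Assumed forms (not taken verbatim from the paper):
      model (3): E(x | X)   = alpha + gamma * X
      model (5): Var(x | X) = sigma2 * X**p
      model (1): logit P(Y = 1 | x) = beta0 + beta1 * x
    """
    records = []
    for _ in range(n):
        X = rng.choice(X_pool)                  # surrogate, drawn with replacement
        sd = math.sqrt(sigma2 * X ** p)         # heteroscedastic error SD
        x = rng.gauss(alpha + gamma * X, sd)    # true exposure given the surrogate
        prob = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        Y = 1 if rng.random() < prob else 0     # Bernoulli outcome, logistic link
        records.append((X, x, Y))
    return records


rng = random.Random(2011)
# Hypothetical skewed, non-negative stand-in for observed alcohol intake (g/day).
X_pool = [rng.expovariate(1 / 6.0) for _ in range(2000)]

# gamma = 0.5 and sigma2 = 4.0 are illustrative values only.
main = simulate_study(X_pool, 8953, -2.633, 0.01, 3.29, 0.5, 4.0, 1, rng)
validation = simulate_study(X_pool, 173, -2.633, 0.01, 3.29, 0.5, 4.0, 1, rng)
```

In an analysis mimicking the paper's design, only (X, Y) would be used from `main` and only (X, x) from `validation`; the sketch returns all three values for convenience.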
Results from the simulation study are given in Table 5. The most striking result is that standard regression calibration, [beta]RC, does as well as or better than all the other estimators considered, including maximum likelihood methods, in all scenarios studied. This was true even when measurement error or heteroscedasticity was severe, and from the point of view of both bias and coverage probability. The positive bias predicted in Figure 1 was apparent when h(X)=X^2, especially when measurement error was severe. The new estimator, [beta]RCH, performed poorly in many instances, and never did materially better than standard regression calibration. In an attempt to improve finite sample coverage probability, we compared the asymptotic Wald confidence interval coverage probability to the empirical and normalized bootstrapped confidence interval coverage probability, where the latter is the bootstrapped mean of the estimator plus or minus 1.96 times the square root of the bootstrapped variance. Five hundred bootstrapped samples were generated for each of 500 simulations. Bootstrapping the confidence intervals for [beta]RCH solved the problem of its poor empirical coverage probability. The average bias of the bootstrapped estimator for [beta]RCH remained larger than that for [beta]RC in nearly all cases considered, often substantially so, although when bias was calculated as the average of the median bias or median of the median bias, the differences decreased considerably and the overall bias dropped to an acceptable value in many cases (data not shown). In the main and validation study sample sizes considered in this simulation study, the asymptotic optimality of the maximum likelihood estimator and its approximation were not evident.
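The normalized bootstrapped interval described above (bootstrapped mean plus or minus 1.96 times the square root of the bootstrapped variance) can be sketched as follows. The function name `normalized_bootstrap_ci` and the use of a simple sample mean as the estimator are illustrative assumptions; in the paper the estimator would be [beta]RCH computed from resampled main and validation study data.

```python
import random
import statistics


def normalized_bootstrap_ci(data, estimator, n_boot=500, z=1.96, rng=None):
    """Return (lower, upper) = bootstrap mean -/+ z * bootstrap standard deviation."""
    rng = rng or random.Random()
    n = len(data)
    boot_estimates = []
    for _ in range(n_boot):
        # Resample the observations with replacement and re-estimate.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        boot_estimates.append(estimator(resample))
    center = statistics.mean(boot_estimates)
    half_width = z * statistics.stdev(boot_estimates)  # sqrt of bootstrap variance
    return center - half_width, center + half_width


rng = random.Random(42)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]   # toy data, not study data
lower, upper = normalized_bootstrap_ci(sample, statistics.mean, n_boot=500, rng=rng)
```

Unlike a percentile interval, this interval is symmetric about the bootstrap mean by construction.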
Table 5.
Simulation study of estimators under heteroscedastic measurement error variance
It is of interest what practical gain is likely to be derived from the application of the more robust estimator, [beta]SPLE. As can be seen in Table 5, the percent bias, mean square error and coverage probability of [beta]SPLE were acceptable in some of the cases considered, but no better than those obtained from standard regression calibration. When measurement error was severe, [beta]SPLE had considerably more bias than standard regression calibration, although its coverage probability was correct. In all cases considered, the computational burden was at least an order of magnitude greater than that of the maximum likelihood methods and two orders of magnitude greater than that of the standard regression calibration method.
Although the derivation given in equation (6) has appeared previously, this estimator, [beta]RCH, is novel. The new estimator developed in this paper, [beta]RCH, has several attractive features. Standard software can be used for the primary regression analysis, which is subsequently corrected for bias in a non-iterative calculation at the end. It relaxes one of the most restrictive assumptions of regression calibration, that of homoscedasticity of the measurement error model. This new estimator allows for a multivariate vector of variables measured without error in the primary regression model, along with a scalar variable measured with error, a setting which will be applicable in many situations, including in the examples motivating the research. In two illustrative examples, the new estimator performed well. Although theoretical justification was provided for use of this methodology with the Cox model for rare outcomes and normally distributed error (e.g. (Prentice 1982)), further work could be done to study the behavior of the method in this setting, particularly under departures from rare outcome and from normal errors. In addition, extensions to Poisson regression models with covariate measurement error should also be considered (Fung and Krewski 1999; Kukush, Schneeweis et al. 2004).
It took longer to find [beta]aML than [beta]ML in both illustrative examples. Thus, the approximate maximum likelihood estimator should not be considered further.
Although the marginal distributions of x and X were sharply skewed in both the ACE data (for number of drugs mixed per week) and the NHS (for grams of alcohol per day), the distributions of the standardized residuals from the models for E(x|X) were symmetrized to a large extent. Marginal distributions should not, in general, be used as evidence for or against heteroscedasticity in a conditional variance. The correlations between the absolute value of the measurement error model regression residuals and the predicted values from these regressions were moderate. As shown in Section 2, if either equation M53 or σ2 is small, the convergent value of [beta]RCH is likely to be approximately equal to the convergent value of [beta]RC. These conditions appear to have been met in both applications, as in both cases, there was little difference in estimates or inference obtained from the two methods. The approximations used to derive [beta]RCH assume either rare disease and multivariate normality for the true x given the surrogate, or “small” equation M54. These assumptions are empirically verifiable, and in the applications in epidemiology which motivated this research, they are verified. In cases where these assumptions are unreasonable, further research is needed to derive suitable estimators.
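The heteroscedasticity diagnostic discussed here, the correlation between absolute measurement error model residuals and predicted values, can be illustrated with a short sketch. It assumes an unweighted simple linear regression of x on X and a Pearson correlation; the paper's Table 3 uses weighted residuals and does not state the correlation type in this section, so both choices, and the function names, are assumptions made for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def abs_residual_correlation(X, x):
    """Fit x = a + b*X by least squares; correlate |residual| with fitted value."""
    n = len(X)
    mean_X, mean_x = sum(X) / n, sum(x) / n
    slope = (sum((u - mean_X) * (v - mean_x) for u, v in zip(X, x))
             / sum((u - mean_X) ** 2 for u in X))
    intercept = mean_x - slope * mean_X
    fitted = [intercept + slope * u for u in X]
    abs_resid = [abs(v - f) for v, f in zip(x, fitted)]
    return pearson(abs_resid, fitted)


rng = random.Random(7)
X = [rng.uniform(1.0, 10.0) for _ in range(2000)]            # toy surrogate values
x_hetero = [1.0 + 0.8 * u + rng.gauss(0.0, 0.5 * u) for u in X]  # SD grows with X
x_homo = [1.0 + 0.8 * u + rng.gauss(0.0, 2.0) for u in X]        # constant SD

r_hetero = abs_residual_correlation(X, x_hetero)
r_homo = abs_residual_correlation(X, x_homo)
```

With these toy data, `r_hetero` is clearly positive while `r_homo` is close to zero, which is the pattern the diagnostic is meant to detect.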
An extensive simulation study of [beta]RCH and the other estimators was conducted, based upon a data structure motivated by the Nurses’ Health Study data considered in this paper. The results clearly indicated that under the scenarios studied, [beta]RCH was outperformed by the other estimators. Standard regression calibration, [beta]RC, performed well, as did both the likelihood approximation, [beta]aML, and the exact maximum likelihood estimator, [beta]ML. The good performance of [beta]RC under heteroscedasticity was observed previously (Spiegelman, Rosner et al. 2000). In the simulation study presented in that paper, two covariates in the logistic regression for Y on x, one with moderate error and the other with considerable error, were dichotomized. By transforming these continuous covariates to Bernoullis, where the variance is a function of the mean, heteroscedasticity was induced. Findings from that study indicated that [beta]RC was approximately valid for estimation and inference, at least when the validation study size was doubled or more from the original 173. Results from this simulation study suggest that validation study sizes larger than those typically found in nutritional epidemiology are needed when measurement error heteroscedasticity is anticipated.
The coverage probability of [beta]RCH was below the desired value in nearly all cases considered. This family of heteroscedastic variance functions is similar in spirit to the Box-Cox family of transformations, where, following the initial suggestion of Box and Cox (Box and Cox 1964), standard practice in applied statistics is to estimate the parameter first and then treat this estimate as fixed when estimating the remaining parameters. We did similarly here. This two-step procedure was followed in the computation of the iterative estimators as well. It has been well established that in fitting standard models such as (1) with a heteroscedastic variance function such as (5), the asymptotic distribution of [beta] is the same whether p in (5) is known or estimated from the data (Carroll and Ruppert 1988). However, in the presence of covariate measurement error, the current situation is somewhat different from the one considered by Carroll and Ruppert, and it is possible that accounting for the estimation of the parameter p, which determines the measurement error variance function within the class considered, could have improved the coverage probabilities. Further research could investigate both the theoretical and empirical properties of the two-step approach in this setting, derive the asymptotic variance of [beta]RCH with variability of p̂ taken into account, examine its variance empirically through simulation studies, and compare to a joint estimation and inference approach for iterative estimators.
equation M55 ranged over several orders of magnitude, from 0.0000091 to 0.00882 in the simulations shown in Table 5, where equation M56 is the average h(X, U) in the data. Kuha had previously suggested that the bias in [beta]RC in the homoscedastic measurement error case would be low when equation M57 was less than 0.5 (Kuha 1994), but we found in another implementation of the regression calibration estimator with homoscedastic measurement error that 0.5 was far too liberal a criterion, with unacceptable levels of bias when equation M58 was much smaller than 0.5 (Weller, Milton et al. 2007). In the present simulation study, the Spearman correlation of equation M59 with bias in [beta]RC was 0.52 and with bias in [beta]RCH was 0.22. It appears that equation M60 is not a reliable metric for identifying conditions under which these estimators may be biased in finite samples in the case of heteroscedastic measurement error.
The maximum likelihood approaches considered here are strictly valid only when the distribution for x|X,U is Gaussian. We did not study the performance of the maximum likelihood estimators when a mis-specified likelihood is fit due to incorrect distributional assumptions about x|X,U or when the proper likelihood is derived under alternative distributions for x|X,U. It is indeed likely that the maximum likelihood estimator would exhibit less finite sample bias in a main study/internal validation study design, since this design is substantially more informative than a main study/external validation study design (Spiegelman and Gray 1991). When the disease is rare, as in NHS and, typically, other cohort studies of cancer and other chronic disease endpoints, the external validation study is the design by default. Although the semi-parametric locally efficient estimator, [beta]SPLE, is consistent under any distribution for x|X,U, this estimator had large finite sample bias in some scenarios studied by simulation, especially as measurement error increased. In addition, this estimator is difficult to program and computationally burdensome to calculate. [beta]RC can be readily computed using SAS macros obtained at http://www.hsph.harvard.edu/faculty/spiegelman/blinplus.html and http://www.hsph.harvard.edu/faculty/spiegelman/relibpls8.html.
In summary, as predicted by the theory, standard regression calibration is adequate when measurement error is not severe or the mis-measured covariate effect is moderate, even when heteroscedasticity is severe. It may be worthwhile to recall that covariate measurement error leads to an exponentially increasing information loss given the same sample size and all other conditions held constant; for example, it is well known that under classical homoscedastic measurement error, the effective sample size is decreased by the factor (ρx,X)² (Fleiss 1986; White, Armstrong et al. 2006). In the most extreme case of measurement error considered in our simulation study, where ρx,X = 0.4, this would lead to more than an 80% reduction in the effective sample size. It appears from the simulation study that heteroscedasticity leads to an even greater information loss for the same amount of measurement error, since [beta]RC was found to have much better performance in similarly designed simulation studies in prior publications when measurement error was homoscedastic (Rosner, Willett et al. 1989; Carroll and Wand 1991). When measurement error heteroscedasticity is suspected, larger validation studies than are typically the current norm in epidemiology are needed.
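The effective sample size arithmetic quoted above is easy to check for the three correlations used in the simulation study. The helper name below is hypothetical; the only input taken from the text is the stated reduction factor, the squared correlation between x and X.

```python
def effective_sample_size_factor(rho):
    """Fraction of the sample size retained under classical homoscedastic error."""
    return rho ** 2


for rho in (0.4, 0.6, 0.8):
    retained = effective_sample_size_factor(rho)
    reduction_pct = 100 * (1 - retained)
    print(f"rho = {rho}: retained fraction {retained:.2f}, reduction {reduction_pct:.0f}%")
```

For ρx,X = 0.4 the retained fraction is 0.16, an 84% reduction, consistent with the "more than 80%" stated in the text.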
Appendix 1. Derivation of the optimal weights and the variance for [beta]RCH,1
Let V1 = Var([beta]11), V2 = Var([beta]12), and V12 = Cov([beta]11, [beta]12). By the multivariate delta method,
equation M61
Under the heteroscedastic measurement error model (5), when γ is estimated using weighted linear regression with weights h(Xi), i=1,…,n2,
equation M62
where equation M63 is an n2 × (1 + dim(U)) matrix with columns equation M64, i=1,…,n2 (Seber 1977). Again by the multivariate delta method,
equation M65
since by arguments analogous to those given in Appendix 1 of Spiegelman et al. (Spiegelman, Carroll et al. 2001), equation M66 is asymptotically 0, and
equation M67
All other covariance terms are 0, since by the Gauss-Markov theorem, Cov(γ̂1, σ̂2) = 0, and by arguments analogous to those given in Appendix 1 of Spiegelman et al. (Spiegelman, Carroll et al. 2001),
equation M68
Thus,
equation M69
(1)
The estimates of V1, V2, and V12 are obtained by substituting estimates for the parameters, taken from the uncorrected primary regression of Y on [X,h(X),U] in the main study and from the weighted linear regression of x on X and U in the validation study. Likewise, the variances of these parameter estimates are obtained from these same regression analyses, and substituted into the expressions for V1, V2, and V12 to obtain estimates of these quantities.
Now, to derive the optimal weight, w1, for (1), we need to minimize
equation M70
with respect to w1, since w2=1-w1, in order to obtain a consistent estimator. The single extremum of this function, subject to the constraint, is at equation M71. This is a global minimum as long as V1 + V2 > 2V12, a condition which is likely to be met in most situations. With these optimal weights, the variance of [beta]RCH,1 is thus
equation M72
To estimate Var([beta]RCH,1), estimates of V1, V2, and V12 are obtained from the fit of (1) to the main study data, Var(γ̂1) is estimated by plugging σ̂2 into the expression for Var(γ̂1) given above, and Var(σ̂2) = 2σ4 / [n - dim(γ)-1].
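The weight optimization in this appendix can be illustrated numerically. The sketch below assumes that the function being minimized is the variance of a weighted combination w1·[beta]11 + (1 − w1)·[beta]12, that is, w1²V1 + (1 − w1)²V2 + 2w1(1 − w1)V12; under that assumption the standard minimizer is w1 = (V2 − V12)/(V1 + V2 − 2V12). The function names and input values are illustrative and are not taken from the paper.

```python
def combined_variance(w1, V1, V2, V12):
    """Variance of w1*b1 + (1 - w1)*b2 given Var(b1)=V1, Var(b2)=V2, Cov=V12."""
    w2 = 1.0 - w1
    return w1 ** 2 * V1 + w2 ** 2 * V2 + 2.0 * w1 * w2 * V12


def optimal_weight(V1, V2, V12):
    """Return (w1, minimum variance) under the constraint w1 + w2 = 1."""
    denom = V1 + V2 - 2.0 * V12
    if denom <= 0:
        # The extremum is a minimum only when V1 + V2 > 2 * V12.
        raise ValueError("V1 + V2 must exceed 2 * V12 for a minimum to exist")
    w1 = (V2 - V12) / denom
    min_var = (V1 * V2 - V12 ** 2) / denom
    return w1, min_var


# Illustrative values only.
w1_opt, var_opt = optimal_weight(4.0, 1.0, 0.5)
grid_min = min(combined_variance(k / 1000.0, 4.0, 1.0, 0.5) for k in range(-1000, 2001))
```

The grid search over w1 gives no value below `var_opt`, which agrees with the closed-form minimum when V1 + V2 > 2V12.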
Appendix 2. Derivation of Var([beta]RCH,2) and Cov([beta]RCH,1, [beta]RCH,2)
By the multivariate delta method,
equation M73
where
equation M74
(2)
and
equation M75
(3)
where equation M76 and equation M77 are obtained from the corresponding elements and sub-matrices of
equation M78
and equation M79, equation M80 and equation M81 are obtained from the uncorrected logistic regression of Y on (X,U). Finally,
equation M82
where the covariances in the first and third terms are given by equations (2) and (3), respectively, and equation M83 is derived in Appendix 1.
Appendix 3. Proof of conditions under which [beta]RCH,1 is consistent for β1 when h(X) = X
Note that equation M84 and equation (9) simplifies to
equation M85
Without loss of generality, hats indicating estimators are suppressed throughout this proof.
[beta]RCH,1 is consistent for β1 when
equation M86
(1)
equation M87
(2)
In addition, [beta]RCH,1 is consistent for β1 when
equation M88
(3)
equation M89
(4)
To see this, first consider the first pair of conditions. When condition (2) holds, the sign function yields +1, and condition (1) gives |γ1 + σ2 β1| = γ1 + σ2 β1. Then the numerator simplifies to σ2 β1, and so βRCH,1 = β1. We now consider the possible combinations of β1 and γ1:
  • It is clear that (1) and (2) hold when both β1 and γ1 are positive, hence [beta]RCH,1 is always consistent for β1 under these circumstances.
  • Next, we consider the case when γ1 > 0, β1 < 0.
    • (1) holds if −σ2 β1 < γ1, i.e. σ2 |β1| < γ1 (*)
    • (2) holds if equation M90 iff equation M91
    This is the same as equation M92. Divide by |β1| to get equation M93 (**)
    (**) equation M94 and (*) σ2 |β1| < γ1 are not possible at the same time. So both (1) and (2) positive is not possible when γ1 > 0, β1 < 0.
    Consider both expressions negative:
    • 3) γ1 + σ2 β1 < 0 iff γ1 < σ2 |β1| (*)
    • 4) equation M95 iff equation M96
    Divide by |β1| to get equation M97 (**)
    equation M98
    When β1 < 0 and γ1 > 0, the only situation that produces a valid estimate is when equation M99.
  • Next, we consider the case when β1 > 0, γ1 < 0
    • γ1 + σ2 β1 > 0, and
    • equation M100
      γ1 + σ2 β1 > 0 if σ2 β1 > |γ1| (*)
      equation M101 when equation M102. That is, equation M103 (**)
      Both of these are true when equation M104.
      Now consider, [beta]RCH,1 is consistent for β1 when
    • γ1 + σ2 β1 < 0, and
    • equation M105
      (*) σ2 β1 < |γ1| and (**) equation M106 is not possible.
  • β1 < 0, γ1 < 0
    [beta]RCH,1 is consistent for β1 when
    • γ1 + σ2 β1 > 0, and
    • equation M107
    It is easy to show that 1) and 2) are never both true.
    Now consider, [beta]RCH,1 is consistent for β1 when
    • 3) γ1 + σ2 β1 < 0, and
    • 4) equation M108
Similarly it is easy to see that both 3) and 4) are always true. Hence, when β1 < 0, γ1 < 0, [beta]RCH,1 is consistent for β1.
Conclusion:
  • [beta]RCH,1 is always consistent for β1 when both β1 and γ1 are positive.
    [beta]RCH,1 is always consistent for β1 when both β1 and γ1 are negative.
When β1 < 0, γ1 > 0, [beta]RCH,1 is inconsistent for β1 when equation M109 or σ2 |β1| < γ1. When β1 > 0, γ1 < 0, [beta]RCH,1 is inconsistent for β1 when equation M110 or σ2 β1 < |γ1|. Hence, when β1 * γ1 < 0, [beta]RCH,1 is consistent for β1 when equation M111.
Footnotes
Author Notes: This study was supported by NIH grants CA50597, NIH ES09411, and NIH CA74112.
Contributor Information
Donna Spiegelman, Harvard School of Public Health.
Roger Logan, Harvard School of Public Health.
Douglas Grove, Fred Hutchinson Cancer Research Center.
  • Armstrong BG, Whittemore AS, et al. “Analysis Of Case-Control Data With Covariate Measurement Error - Application To Diet And Colon Cancer.” Statistics In Medicine. 1989;8(9):1151–1163. doi: 10.1002/sim.4780080916. [PubMed] [Cross Ref]
  • Begun JM, Hall WJ, et al. “Information and asymptotic efficiency in parametric-nonparametric models.” Annals of Statistics. 1983;11:432–452. doi: 10.1214/aos/1176346151. [Cross Ref]
  • Bishop YMM, Fienberg SE, et al. Discrete Multivariate Analyses: Theory and Practice. MIT Press; 1975. pp. 492–494.
  • Box GEP, Cox DR. “An analysis of transformations” Journal of the Royal Statistical Society Series B-Statistical Methodology. 1964;26(2):211–252.
  • Carroll RJ, Ruppert D. Transformation and Weighting in Regression. London: Chapman and Hall; 1988.
  • Carroll RJ, Ruppert D, et al. Measurement Error in Nonlinear Models. London: Chapman & Hall; 2006.
  • Carroll RJ, Stefanski LA. “Approximate quasi-likelihood estimation in models with surrogate predictors” Journal of the American Statistical Association. 1990;85:652–663. doi: 10.2307/2290000. [Cross Ref]
  • Carroll RJ, Wand MP. “Semiparametric estimation in logistic measurement error models” Journal of the Royal Statistical Society Series B-Methodological. 1991;53(3):573–585.
  • Cook J, Stefanski LA. “A simulation extrapolation method for parametric measurement error models” Journal of the American Statistical Association. 1995;89:1314–1328. doi: 10.2307/2290994. [Cross Ref]
  • Crouch EAC, Spiegelman D. “The evaluation of integrals of the form ∫ f(t) exp(−t²) dt over (−∞, +∞): application to logistic-normal models” Journal of the American Statistical Association. 1990;85(410):464–469. doi: 10.2307/2289785. [Cross Ref]
  • Fleiss J. The design and analysis of clinical experiments. New York: Wiley; 1986.
  • Fuller WA. Measurement Error Models. New York: Wiley; 1987.
  • Fung KY, Krewski D. “On measurement error adjustment methods in Poisson regression” Environmetrics. 1999;10(2):213–224. doi: 10.1002/(SICI)1099-095X(199903/04)10:2<213::AID-ENV349>3.0.CO;2-B. [Cross Ref]
  • Hu P, Tsiatis AA, et al. “Estimating the parameters in the Cox model when covariate variables are measured with error.” Biometrics. 1998;54(4):1407–1419. doi: 10.2307/2533667. [PubMed] [Cross Ref]
  • Huang Y, Wang CY. “Cox regression with accurate covariates unascertainable: a nonparametric-correction approach” Journal of the American Statistical Association. 2000;95:1209–1219. doi: 10.2307/2669761. [Cross Ref]
  • Kuha J. “Corrections for exposure measurement error in logistic regression models with an application to nutritional data” Stat Med. 1994;13(11):1135–1148. doi: 10.1002/sim.4780131105. [PubMed] [Cross Ref]
  • Kukush A, Schneeweis H, et al. “Three estimators for the Poisson regression model with measurement errors.” Statistical Papers. 2004;45(3):351–368. doi: 10.1007/BF02777577. [Cross Ref]
  • Lee JE, Hunter DJ, et al. “Alcohol intake and renal cell cancer in a pooled analysis of 12 prospective studies.” Journal of the National Cancer Institute. 2007;99(10):801–810. doi: 10.1093/jnci/djk181. [PubMed] [Cross Ref]
  • Preis SR, Spiegelman D, et al. “Random and correlated errors in gold standards used in nutritional epidemiology: implications for validation studies.” American Journal of Epidemiology. 2010 In press.
  • Prentice RL. “Covariate measurement errors and parameter estimation in a failure time regression model” Biometrika. 1982;69:331–342. doi: 10.1093/biomet/69.2.331. [Cross Ref]
  • Robins JM, Hsieh F, et al. “Semi-parametric efficient estimation of a conditional density with missing or mis-measured covariates.” Journal of the Royal Statistical Society, Series B. 1995;57:409–424.
  • Rosner B, Spiegelman D, et al. “Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error.” Am J Epidemiol. 1990;132(4):734–745. [PubMed]
  • Rosner B, Spiegelman D, et al. “Correction of logistic regression relative risk estimates and confidence intervals for random within-person measurement error.” Am J Epidemiol. 1992;136(11):1400–1413. [PubMed]
  • Rosner B, Willett WC, et al. “Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error.” Stat Med. 1989;8(9):1051–1069. doi: 10.1002/sim.4780080905. discussion 1071–1053. [PubMed] [Cross Ref]
  • Seber G. Linear Regression Analysis. New York: Wiley & Sons; 1977.
  • Spiegelman D, Carroll RJ, et al. “Efficient regression calibration for logistic regression in main study/internal validation study designs with an imperfect reference instrument.” Statistics In Medicine. 2001;20(1):139–160. doi: 10.1002/1097-0258(20010115)20:1<139::AID-SIM644>3.0.CO;2-K. [PubMed] [Cross Ref]
  • Spiegelman D, Casella M. “Fully parametric and semi-parametric regression models for common events with covariate measurement error in main study/validation study designs” Biometrics. 1997;53(2):395–409. doi: 10.2307/2533945. [PubMed] [Cross Ref]
  • Spiegelman D, Gray R. “Cost-efficient study designs for binary response data with Gaussian covariate measurement error” Biometrics. 1991;47(3):851–869. doi: 10.2307/2532644. [PubMed] [Cross Ref]
  • Spiegelman D, McDermott A, et al. “Regression calibration method for correcting measurement-error bias in nutritional epidemiology.” Am J Clin Nutr. 1997;65(4 Suppl):1179S–1186S. [PubMed]
  • Spiegelman D, Rosner B, et al. “Estimation and inference for logistic regression with covariate misclassification and measurement error in main study/validation study designs.” Journal of the American Statistical Association. 2000;95:51–61. doi: 10.2307/2669522. [Cross Ref]
  • Spiegelman D, Valanis B. “Correcting for bias in relative risk estimates due to exposure measurement error: A case study of occupational exposure to antineoplastics in pharmacists” American Journal of Public Health. 1998;88(3):406–412. doi: 10.2105/AJPH.88.3.406. [PubMed] [Cross Ref]
  • Subar AF, Kipnis V, et al. “Using intake biomarkers to evaluate the extent of dietary misreporting in a large sample of adults: the OPEN study.” Am J Epidemiol. 2003;158(1):1–13. doi: 10.1093/aje/kwg092. [PubMed] [Cross Ref]
  • Valanis BG, Vollmer WM, et al. “Association of antineoplastic drug handling with acute adverse effects in pharmacy personnel.” Am J Hosp Pharm. 1993;50:445–462. [PubMed]
  • Van Roosbroeck S, Li R, et al. “Traffic-related outdoor air pollution and respiratory symptoms in children: the impact of adjustment for exposure measurement error” Epidemiology. 2008;19:409–416. doi: 10.1097/EDE.0b013e3181673bab. [PubMed] [Cross Ref]
  • Wang CY, Hsu L, et al. “Regression calibration in failure time regression.” Biometrics. 1997;53(1):131–145. doi: 10.2307/2533103. [PubMed] [Cross Ref]
  • Weller EA, Milton DK, et al. “Regression calibration for logistic regression with multiple surrogates for one exposure.” Journal of Statistical Planning and Inference. 2007;137(2):449–461. doi: 10.1016/j.jspi.2006.01.009. [Cross Ref]
  • White EJ, Armstrong BK, et al. Principles of exposure measurement in epidemiology: collecting, evaluating and improving measures of disease risk factors. Oxford, England: New York, New York, Oxford University Press; 2006.
  • Willett WC, Sampson L, et al. “Reproducibility and validity of a semiquantitative food frequency questionnaire.” American Journal of Epidemiology. 1985;122:51–65. [PubMed]
  • Willett WC, Stampfer MJ, et al. “Moderate alcohol consumption and the risk of breast cancer.” New England Journal of Medicine. 1987;316(19):1174–1180. doi: 10.1056/NEJM198705073161902. [PubMed] [Cross Ref]
Articles from The International Journal of Biostatistics are provided here courtesy of
Berkeley Electronic Press