Consider a biomarker assessment W of an (unmeasured) nutrient consumption variable Z, along with a corresponding self-report assessment Q. For example, W may be a (precise) estimate of the logarithm of short-term total daily energy (calories) consumed by a study subject as assessed using the doubly-labeled water method mentioned above, while Z is the logarithm of actual average daily energy consumption over a longer period of time (e.g., a 6-month period), and Q is the logarithm of daily energy consumption over this longer time period as assessed using an FFQ. A plausible statistical model (Prentice et al, 2002) assumes a classical measurement model for the biomarker, and a more complex measurement model for the self-report data:

$$W = Z + u, \qquad Q = Z^* + e, \tag{1}$$

where $Z^* = a_0 + a_1 Z + a_2^T V + a_3^T (V \otimes Z) + r$, and V is a vector of study subject characteristics (e.g., body mass index, age, ethnicity, ‘social desirability’ factors) that may influence dietary self-report measurement error, r is a person-specific random effect, and $a_0, \ldots, a_3$ are constants. Random variables on the right side of (1) are assumed to be statistically independent, given V. An important feature of (1) is the allowance for systematic bias in the self-report data, a modeling feature also proposed in earlier papers in this research area (Prentice, 1996; Carroll et al, 1998; Kipnis et al, 2001). While measurement error without such systematic components primarily attenuates odds ratios or hazard ratios in nutritional epidemiology, systematic biases can more severely distort dose-response associations. The key features of the model (1) are that the ‘reference’ assessment W, here a biomarker, has been properly calibrated so that the measured assessment plausibly adheres to a classical measurement model; the ability to allow for sources of systematic bias in the self-report through the vector V; and the independence of the error terms, u and r + e, for the biomarker and self-report. This independence assumption is crucial to the development of reliable epidemiologic association information, and is unlikely to hold if W is instead based on a second self-report assessment. For example, if Q is based on an FFQ, while W is based on a food record from the same study subject, then positive correlations between W and Q could arise in part or in whole from correlated measurement errors, rather than from the self-reports reflecting the underlying nutrient consumption Z. A biomarker assessment W, on the other hand, has the advantage of objectivity and freedom from self-report biases. However, a biomarker assessment, perhaps based on a urine- or blood-based measure of nutrient consumption, needs to be properly calibrated and the classical measurement model in (1) needs to be applicable.
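A short calculation may help show why the independence assumption carries so much weight. It reads model (1) as W = Z + u and Q = a_0 + a_1 Z + a_2^T V + a_3^T (V ⊗ Z) + r + e, a form reconstructed here from the surrounding definitions, with Z scalar so that a_3^T (V ⊗ Z) = (a_3^T V) Z. The first result uses the stated independence of Z, u, r and e given V. The second is a hypothetical departure in which u is correlated with r + e, as might occur when W is a second self-report, while both errors remain independent of Z.

```latex
\begin{align*}
\operatorname{Cov}(W, Q \mid V)
  &= \operatorname{Cov}\bigl(Z + u,\ (a_1 + a_3^T V)\,Z + r + e \mid V\bigr) \\
  &= (a_1 + a_3^T V)\,\operatorname{Var}(Z \mid V)
  && \text{(independent errors, as in (1))} \\[4pt]
\operatorname{Cov}(W, Q \mid V)
  &= (a_1 + a_3^T V)\,\operatorname{Var}(Z \mid V)
     + \operatorname{Cov}(u,\ r + e \mid V)
  && \text{(correlated errors)}
\end{align*}
```

In the first case any association between W and Q, given V, is driven entirely by Z. In the second, the extra covariance term can produce a positive association even when the self-report carries little information about Z.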
Specialized statistical methods are needed to apply the measurement model (1) to data (Q, V) on a study cohort, and data (Q, V, W) on a biomarker subcohort. For example, Sugar et al (2007) developed regression calibration, refined regression calibration and conditional scores procedures for odds ratio estimation, and provided pertinent asymptotic distribution theory. Extensive simulation studies showed ordinary regression calibration to yield log-odds ratio parameter estimates having good efficiency and robustness, and minimal bias in configurations of interest. Moreover, it was shown that biomarker subsamples as small as 500 could be used to effectively calibrate cohorts of the size included in WHI, even with such high incidence outcomes as breast cancer, or total invasive cancer. Note, however, that about half of the variance in log-odds ratio parameter estimates may be due to variation in calibration equation parameter estimates at a biomarker sample size of 500, depending somewhat on disease incidence. These and other statistical procedures for hazard ratio (Cox model) parameter estimation were developed and compared in an unpublished 2006 Department of Biostatistics, University of Washington, doctoral dissertation by Dr. Pamela Shaw. Once again, ordinary regression calibration proved to be an efficient approach to association parameter (here log-hazard ratio) estimation that involved negligible bias in simulation studies that emulate applications to WHI cohorts. The regression calibration approach involves replacing a self-report nutrient consumption (e.g., log-energy consumption) by a (nearly) unbiased estimate of actual nutrient consumption (e.g., actual log-energy consumption), here under the measurement model (1). Under a joint normality assumption for (Z, r + e) given V, it follows that Z given (Q, V), and hence W given (Q, V), adheres to a simple linear regression model, with non-zero coefficients for V or $V \otimes Z$ indicating systematic bias in the FFQ assessment. Linear regression of W on (Q, V) in the biomarker subsample then allows calibrated consumption estimates to be obtained, throughout the remainder of the study cohort, from each subject’s (Q, V) value.
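The calibration step just described can be sketched on simulated data. This is a minimal illustration, not the procedure of Sugar et al (2007) or of the WHI analyses: it assumes a single standardized covariate V, no V-by-Z interaction (a_3 = 0), Z independent of V, and hypothetical parameter values. All function names are introduced here for illustration only.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def simulate_cohort(n, seed=0, a0=0.2, a1=0.5, a2=-0.3, sd_u=0.5, sd_err=1.0):
    """Draw (Q, V, W, Z) rows under a simplified model (1) with a3 = 0.

    All parameter values are hypothetical; sd_err is the SD of r + e combined.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)        # true log-consumption Z (unobserved in practice)
        v = rng.gauss(0.0, 1.0)        # one standardized subject characteristic V
        w = z + rng.gauss(0.0, sd_u)   # biomarker with classical error: W = Z + u
        q = a0 + a1 * z + a2 * v + rng.gauss(0.0, sd_err)  # biased self-report
        rows.append((q, v, w, z))
    return rows

def fit_calibration(subsample):
    """Regress W on (1, Q, V) in the biomarker subsample."""
    X = [[1.0, q, v] for q, v, _, _ in subsample]
    y = [w for _, _, w, _ in subsample]
    return ols(X, y)

def calibrate(coef, q, v):
    """Calibrated consumption estimate from a subject's (Q, V) value."""
    return coef[0] + coef[1] * q + coef[2] * v

cohort = simulate_cohort(5000, seed=1)
subsample = cohort[:500]   # rows are i.i.d., so the first 500 act as a random subsample
coef = fit_calibration(subsample)
z_hat = [calibrate(coef, q, v) for q, v, _, _ in cohort]
print("calibration coefficients (intercept, Q, V):", [round(c, 3) for c in coef])
```

Under these hypothetical parameter values the population calibration coefficients are -0.08, 0.4 and 0.12, so the fitted values should land near those, with visible sampling noise at a subsample size of 500. The non-zero coefficient on V reflects the systematic bias built into the simulated self-report.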
We recently reported (Neuhouser et al, 2008) calibration equations under (1) for energy, protein, and % of energy from protein derived from biomarker data from 544 weight-stable women recruited from the Nutrient Biomarker Study mentioned above. These women (50% DM trial intervention group; 50% comparison group) were recruited at 12 representative Clinical Centers (from a total of 40 Centers in the WHI). Each woman completed a basic protocol over a two-week period that included DLW, UN, an FFQ and other questionnaires, and each provided a blood specimen. A 20% reliability subsample repeated the entire protocol about 6 months later. Table 1 shows estimated coefficients from linear regression of W on (Q, V), with standard error estimates obtained from a ‘sandwich’ variance estimator.
Table 1 Regression Calibration Coefficients for Log-transformed Total Energy, Total Protein, and % of Energy from Protein (Neuhouser et al, 2008)
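For readers unfamiliar with ‘sandwich’ standard errors, the idea can be illustrated in the simplest setting of one predictor plus an intercept. The sketch below uses the basic heteroscedasticity-robust (HC0) form; which variant Neuhouser et al (2008) used is not stated here, and the simulated data and function name are hypothetical.

```python
import math
import random

def slope_with_standard_errors(x, y):
    """Fit y = a + b*x by least squares; return (b, model-based SE, sandwich SE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Model-based SE: assumes one common residual variance.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_model = math.sqrt(s2 / sxx)
    # Sandwich (HC0) SE: "bread" 1/sxx on each side, "meat" = sum of squared scores.
    meat = sum(((xi - x_bar) * e) ** 2 for xi, e in zip(x, resid))
    se_sandwich = math.sqrt(meat) / sxx
    return b, se_model, se_sandwich

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Hypothetical heteroscedastic errors: the error SD grows with |x|.
y = [0.1 + 0.4 * xi + rng.gauss(0.0, 0.5 + abs(xi)) for xi in x]
b, se_model, se_sandwich = slope_with_standard_errors(x, y)
print(f"slope={b:.3f}  model SE={se_model:.4f}  sandwich SE={se_sandwich:.4f}")
```

When the error variance depends on the predictor, as in this simulated example, the model-based standard error understates the uncertainty in the slope and the sandwich estimate is larger. With constant error variance the two roughly agree.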
Note, for energy, the rather weak signal (coefficient of 0.062) arising from FFQ log-energy, while body mass index (weight in kg divided by the square of height in meters) and age provide more highly significant predictors of biomarker-derived log-energy consumption. The full regression models fitted (Neuhouser et al, 2008) also included some moderate dependencies on ethnicity and socioeconomic factors, but there was little evidence of a dependence of systematic bias on actual consumption (i.e., of $a_3 \neq 0$ in (1)). Log-protein consumption tends to show those same patterns, but with a larger regression coefficient (0.211) for log (FFQ) protein. The coefficient for log (FFQ) % of energy from protein was considerably larger (0.439), indicating better FFQ properties for this nutrient density measure, while there was an inverse dependence on body mass index, suggesting that energy underreporting among overweight and obese women derives primarily from fat and/or carbohydrate underreporting.
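Because both the biomarker and FFQ values are log-transformed, each coefficient on log (FFQ) can be read as an elasticity. The short calculation below converts the three coefficients quoted above into the implied change in calibrated consumption for a 20% higher FFQ value, holding the other covariates fixed. The 20% figure is chosen here only to match the increment used for the hazard ratios, and the function name is hypothetical.

```python
import math

# Coefficients on log(FFQ) quoted in the text (Neuhouser et al, 2008).
ffq_coefficients = {
    "energy": 0.062,
    "protein": 0.211,
    "% of energy from protein": 0.439,
}

def calibrated_ratio(coefficient, ffq_ratio=1.2):
    """Ratio of calibrated consumption implied by a given ratio of FFQ values."""
    return math.exp(coefficient * math.log(ffq_ratio))

for nutrient, coefficient in ffq_coefficients.items():
    percent = 100.0 * (calibrated_ratio(coefficient) - 1.0)
    print(f"{nutrient}: 20% higher FFQ value -> {percent:.1f}% higher calibrated value")
```

A 20% difference in FFQ-reported energy thus moves the calibrated energy estimate by only about 1%, against roughly 4% for protein and 8% for % of energy from protein, which is one way to see how weak the FFQ energy signal is.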
These calibration equations were applied to FFQ data obtained early in WHI to develop calibrated estimates of energy, protein, and % of energy from protein for individual women in the DM trial, as well as for women in the companion WHI Observational Study (OS), which included 93,676 postmenopausal women. Results from Prentice et al (2009) show Cox model hazard ratio parameter estimates for a 20% increment in nutrient consumption, with and without biomarker calibration of the nutrient, based on the analysis of combined data from the DM trial comparison group and the OS that includes 5041 women who developed invasive cancer during WHI follow-up. The log-hazard ratio was modeled as a linear function of Z in these analyses. A regression calibration procedure was used to estimate log-hazard ratio parameters, and a bootstrap procedure was used to estimate standard errors for these parameters. The following variables were included in the hazard ratio model for total invasive cancer (to control confounding) (Prentice et al, 2009): race/ethnicity, education, exercise, current or past cigarette smoking, alcohol consumption, unopposed estrogen use, estrogen plus progestin use, history of diabetes, and hypertension.
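Since Z is log consumption and the log-hazard ratio is linear in Z, the hazard ratio for a 20% increment follows from the coefficient on Z in one step. In the derivation below, β (the log-hazard ratio coefficient for Z) and λ_0 (a baseline hazard) are symbols introduced here for illustration, and the confounding variables are suppressed.

```latex
\begin{align*}
\lambda(t \mid Z) &= \lambda_0(t)\,\exp(\beta Z), \\
\frac{\lambda(t \mid Z + \log 1.2)}{\lambda(t \mid Z)}
  &= \exp(\beta \log 1.2) = 1.2^{\beta}.
\end{align*}
```

Under regression calibration, Z in this expression is replaced by its calibrated estimate from each subject's (Q, V) value.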
Note, from these results, that following biomarker calibration there is a noteworthy positive association of total invasive cancer risk with energy consumption, a weaker positive association with protein consumption, and an inverse association with % of energy from protein, whereas these associations are not evident in the absence of biomarker calibration. The confidence intervals are considerably wider for the calibrated compared to the uncalibrated hazard ratio ‘estimates’, reflecting both the attenuation of coefficients and standard error estimates that attends the hazard ratio estimates without calibration, and the random variation in the coefficient estimates in Table 1 and hence in the calibrated consumption estimates, with a biomarker sample of only 544 women. The positive association with energy was also evident for several site-specific cancers including breast, colon, endometrium, and kidney, in alignment with the obesity associations mentioned in the introduction, whereas these associations were not evident without biomarker calibration (Prentice et al, 2009). The inverse association of total cancer with % of energy from protein points to fat, alcohol and carbohydrate, collectively, as the nutrients responsible for the positive energy association. Corresponding analyses have been carried out for other clinical outcomes, including cardiovascular diseases, diabetes, and frailty, with equally interesting, but yet to be published, results.
The analyses just summarized do not control for body mass index in the disease risk model, and the association between energy consumption and disease risk tends to be reduced or to disappear if body mass index is added to this model. Similarly, the association between body mass index and disease risk also tends to be reduced or to disappear when calibrated energy consumption is included in the disease risk model (Prentice et al, 2009). Basically, the available data are not extensive enough to reliably establish separate roles for the two cancer risk factors. Note that years of consuming a high calorie diet could readily lead to body fat accumulation, so that including body mass index in the disease risk model may lead to ‘overcontrol’. On the other hand, a high body mass implies greater energy requirements, and analyses that do not control for body mass could include some confounding. Further study of this issue, with longitudinal data on body mass and calibrated energy consumption, is needed to sort out the joint association of energy consumption and body mass with the risk of these diseases.
It will also be important for biomarker data of these types to be assembled in additional epidemiologic cohorts, to study the consistency of emerging associations, and to study the transferability of calibration equations from one study population to another. Even then, the fact that suitable biomarkers, plausibly adhering to (1), have been developed only for a few nutrients will remain an important nutritional epidemiology research barrier.