
Article sections

- Summary
- 1. Introduction
- 2. Measurement Error and Regression Calibration Models
- 3. Contribution of FFQ Data
- 4. Relationship between Fish Intake and Mercury Level in Women in NHANES
- 5. Simulation Results
- 6. Discussion
- 7. Supplementary Materials
- Supplementary Material
- References


Biometrics. Author manuscript; available in PMC 2010 June 5.

Published in final edited form as:

PMCID: PMC2881223

NIHMSID: NIHMS202205

Victor Kipnis, Douglas Midthune, Dennis W. Buckman, Kevin W. Dodd, Patricia M. Guenther, Susan M. Krebs-Smith, Amy F. Subar, Janet A. Tooze, Raymond J. Carroll, and Laurence S. Freedman

The publisher's final edited version of this article is available at Biometrics


Dietary assessment of episodically consumed foods gives rise to nonnegative data that have excess zeros and measurement error. Tooze et al. (2006, *Journal of the American Dietetic Association* **106,** 1575–1587) describe a general statistical approach (National Cancer Institute method) for modeling such food intakes reported on two or more 24-hour recalls (24HRs) and demonstrate its use to estimate the distribution of the food’s usual intake in the general population. In this article, we propose an extension of this method to predict individual usual intake of such foods and to evaluate the relationships of usual intakes with health outcomes. Following the regression calibration approach for measurement error correction, individual usual intake is generally predicted as the conditional mean intake given 24HR-reported intake and other covariates in the health model. One feature of the proposed method is that additional covariates potentially related to usual intake may be used to increase the precision of estimates of usual intake and of diet-health outcome associations. Applying the method to data from the Eating at America’s Table Study, we quantify the increased precision obtained from including reported frequency of intake on a food frequency questionnaire (FFQ) as a covariate in the calibration model. We then demonstrate the method in evaluating the linear relationship between log blood mercury levels and fish intake in women by using data from the National Health and Nutrition Examination Survey, and show increased precision when including the FFQ information. Finally, we present simulation results evaluating the performance of the proposed method in this context.

U.S. national nutritional surveys traditionally have used the 24-hour recall (24HR) to collect information on food intake as the primary assessment instrument (Dwyer et al., 2003). The main purposes of such surveys are to estimate the distribution of usual (that is, average long-term) intake of nutrients and foods in the population, and to monitor such intakes over time. Another important purpose is to relate individual usual intakes to health outcomes such as blood pressure.

Even assuming unbiasedness of the 24HR, there has been concern over its use for assessing intake of foods that are not typically consumed every day. Many consumers of such episodically consumed foods report zero intake on the 24HR if the report happens to be on a nonconsumption day. Consequently, with typically only one or two administrations of a 24HR in surveys, usual intake of such foods is difficult to estimate. For this reason, an additional instrument, a food frequency questionnaire (FFQ) that queries frequency of consumption over the past year, was included in the National Health and Nutrition Examination Survey (NHANES) conducted in 2003–2006 (Subar et al., 2006). Although the FFQ leads to biased reporting of intake of energy (Kipnis et al., 2003) and therefore of at least some foods, it might nevertheless provide valuable information together with the 24HR to improve estimates of usual intake.

Dodd et al. (2006) reviewed the methods for estimating distributions of usual intake. Tooze et al. (2006) proposed a new method, called the NCI method, to handle nonnegative data with excess zeros that occur in 24HR reports on episodically consumed foods, and demonstrated its use for estimating distributions of usual intakes. Generalizing the two-part modeling approach to longitudinal semicontinuous observations (Olsen and Schafer, 2001; Tooze, Grunwald, and Jones, 2002) to include latent variables, the NCI method uses a two-part nonlinear mixed effects *measurement error* model with correlated random effects, where both parts may incorporate covariates, including other dietary-assessment instruments, such as a FFQ. In this article, we propose an extension of the NCI method to estimate an individual’s usual intake of episodically consumed foods using 24HR data with covariate information. The method may fill a void in analyzing relationships between usual intake and health outcomes for these foods.

All dietary assessment methods based on self-report, including the 24HR, fail to measure true usual intake precisely. The measurement error distorts diet-health outcome relationships, often attenuating them. A popular method of correcting for measurement error is regression calibration (Carroll et al., 2006), which uses, in place of the unknown usual intake, its best mean square error (MSE) predictor, that is, its estimated conditional expectation given the observed 24HRs and other covariates in the health outcome model.

In this article, we derive this conditional expectation for episodically consumed foods. In Section 2, we describe the measurement error model and derive the corresponding regression calibration predictor. The approach allows conditioning on additional covariates related to intake, which may increase the precision of estimates of usual intake and diet-health outcome associations. In Section 3, using data from the Eating at America’s Table Study (EATS), we quantify the increased precision obtained from including a FFQ report as a covariate. In Section 4, we demonstrate the method for evaluating the relationship between blood mercury and fish intake based on NHANES data and show increased precision of the estimated association from incorporating the FFQ. In Section 5, we present simulations to evaluate the finite-sample performance of the method. Section 6 contains discussion.

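For individual *i* on day *j*, *i* = 1, …, *n*; *j* = 1, …, $J_i$, let $R_{ij}$ denote the 24HR-reported intake of a food or nutrient, let $T_i$ denote the individual's true usual intake, and let $Y_i$ denote a health outcome. Consider the health outcome model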

$$E({Y}_{i}\mid {T}_{i},{\mathbf{Z}}_{i})=m\left({\alpha}_{0}+{\alpha}_{T}{T}_{i}+{\alpha}_{z}^{t}{\mathbf{Z}}_{i}\right),$$

(1)

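where $\mathbf{Z}_i$ is a vector of other covariates in the model and $m^{-1}(\cdot)$ is the link function. The effect of dietary intake on the outcome is measured by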

$${\mathrm{\Delta}}_{T}={m}^{-1}\{E({Y}_{i}\mid {T}_{1},{\mathbf{Z}}_{i})\}-{m}^{-1}\{E({Y}_{i}\mid {T}_{0},{\mathbf{Z}}_{i})\},$$

(2)

when dietary intake changes from *T*_{0} to *T*_{1}. For example, for linear regression, expression (2) represents a change in the mean outcome, while, for logistic regression, a log odds ratio.

Let
${\mathbf{X}}_{i}={({\mathbf{Z}}_{i}^{t},{\mathbf{C}}_{i}^{t})}^{t}$ be a vector of covariates related to usual intake. It generally includes the covariates $\mathbf{Z}_i$ in the health outcome model (1) and some additional factors $\mathbf{C}_i$.

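Regression calibration requires evaluation of the best MSE predictor $E(T_i \mid \mathbf{R}_i, \mathbf{X}_i)$, where $\mathbf{R}_i = (R_{i1}, \ldots, R_{iJ_i})^t$ is the vector of the individual's 24HR-reported intakes.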

Before introducing the measurement error model for episodically consumed foods, we present the model for nutrients and foods consumed daily.

Following convention, the 24HRs are assumed unbiased for individual usual intake, i.e.,

$${R}_{ij}={T}_{i}+{\epsilon}_{ij},\phantom{\rule{0.38889em}{0ex}}E({\epsilon}_{ij}\mid i)=0,$$

(3)

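where within-person random errors $\epsilon_{ij}$ reflect the daily variation in an individual’s intake and other sources of random error. The model requires that the errors $\epsilon_{ij}$ be independent of usual intake $T_i$ and of each other, with a normal distribution of mean zero and constant variance $\sigma_\epsilon^2$ (the classical measurement error model).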

Assume the linear regression of $T_i$ on covariates $\mathbf{X}_i$,

$${T}_{i}={\beta}_{0}+{\mathit{\beta}}_{X}^{t}{\mathbf{X}}_{i}+{u}_{i},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{u}_{i}\sim \text{Normal}(0;{\sigma}_{u}^{2}).$$

(4)

From equations (3)–(4), the 24HR-reported intake follows the linear mixed measurement error model

$${R}_{ij}={\beta}_{0}+{\mathit{\beta}}_{X}^{t}{\mathbf{X}}_{i}+{u}_{i}+{\epsilon}_{ij}.$$

(5)

In addition to the fixed effect population-level parameters
$\mathit{\beta}={({\beta}_{0},{\mathit{\beta}}_{X}^{t})}^{t}$, model (5) includes the random effect $u_i$ representing person-specific deviations of usual intake from the population profile defined by covariates $\mathbf{X}_i$.

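Evaluation of the regression calibration predictor $E(T_i \mid \mathbf{R}_i, \mathbf{X}_i)$ follows the empirical Bayes (EB) approach. Writing usual intake as a function $T_i = \mathfrak{T}(\mathbf{X}_i, u_i; \mathit{\theta})$ of the covariates, the random effect, and the vector of model parameters $\mathit{\theta}$, we have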

$${\widehat{T}}_{i}(\mathit{\theta})\equiv E({T}_{i}\mid {\mathbf{R}}_{i},{\mathbf{X}}_{i};\mathit{\theta})=\int \mathfrak{T}({\mathbf{X}}_{i},{u}_{i};\mathit{\theta})f({u}_{i}\mid {\mathbf{R}}_{i},{\mathbf{X}}_{i};\mathit{\theta})d{u}_{i},$$

(6)

where, according to Bayes’ theorem

$$f({u}_{i}\mid {\mathbf{R}}_{i},{\mathbf{X}}_{i};\mathit{\theta})=\frac{f({\mathbf{R}}_{i}\mid {\mathbf{X}}_{i},{u}_{i};\mathit{\theta})f({u}_{i}\mid {\mathbf{X}}_{i};\mathit{\theta})}{\int f({\mathbf{R}}_{i}\mid {\mathbf{X}}_{i},{u}_{i};\mathit{\theta})f({u}_{i}\mid {\mathbf{X}}_{i};\mathit{\theta})d{u}_{i}}.$$

(7)

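When parameters in ***θ*** are estimated by fitting model (5) to the data, predicted usual intake is the best linear unbiased predictor (BLUP)

$${\widehat{T}}_{i}(\widehat{\mathit{\theta}})={\widehat{w}}_{i}{\overline{R}}_{i}+(1-{\widehat{w}}_{i})({\widehat{\beta}}_{0}+{\widehat{\mathit{\beta}}}_{X}^{t}{\mathbf{X}}_{i}),$$

(8)

a weighted average of the mean of the $J_i$ reported intakes, ${\overline{R}}_{i}$, and the estimated population-level intake ${\widehat{\beta}}_{0}+{\widehat{\mathit{\beta}}}_{X}^{t}{\mathbf{X}}_{i}$, with weight ${\widehat{w}}_{i}={\widehat{\sigma}}_{u}^{2}/({\widehat{\sigma}}_{u}^{2}+{\widehat{\sigma}}_{\epsilon}^{2}/{J}_{i})$.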

This methodology is already known and was previously suggested for classical measurement error correction (e.g., Whittemore, 1989; Tsiatis, DeGruttola, and Wulfsohn, 1995), but is applicable only to reported intakes that follow the classical error model on the original scale.
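As an illustration of the weighted-average form of predictor (8), the following minimal Python sketch computes the prediction for one person. The weight formula, the function name, and all numeric values are assumptions made for the example (the weight is the standard one for a linear mixed model with normal random effects); none of them are taken from the analyses in this article.

```python
import numpy as np

def blup_usual_intake(recalls, x_pred, var_u, var_eps):
    """Shrinkage predictor in the form of equation (8).

    recalls : one person's 24HR reports on the original scale
    x_pred  : fitted population-level value beta0 + beta_X' X_i
    var_u   : between-person variance (assumed already estimated)
    var_eps : within-person variance (assumed already estimated)
    """
    recalls = np.asarray(recalls, dtype=float)
    n_days = recalls.size
    # Assumed weight: more recalls or less within-person noise
    # moves the prediction toward the person's own mean.
    w = var_u / (var_u + var_eps / n_days)
    return w * recalls.mean() + (1.0 - w) * x_pred, w

# Hypothetical values: two recalls, equal weight on both components.
pred, weight = blup_usual_intake([2.0, 4.0], x_pred=2.0, var_u=1.0, var_eps=2.0)
print(pred, weight)
```

With only one or two recalls per person the weight on the individual mean is typically well below one, so the prediction is pulled toward the covariate-based value.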

Often, within-person random error in the 24HR-reported intake depends on the individual mean and has a skewed distribution, violating the classical error model assumptions. The most common remedy has been to monotonically transform the intakes $R_{ij}$ to values
${R}_{ij}^{\ast}=g({R}_{ij})$ that more closely follow the classical model with normally distributed error (Eckert, Carroll, and Wang, 1997). In most cases, this may be achieved using the Box–Cox family of transformations (Box and Cox, 1964)

$$g(v,\lambda )=({v}^{\lambda}-1)/\lambda .$$

(9)
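Transformation (9) and its inverse can be sketched in Python as follows. The logarithm used at λ = 0 is the usual limiting convention for the Box–Cox family and is not stated in the text; the function names and the example intakes are illustrative only.

```python
import numpy as np

def box_cox(v, lam):
    """Box-Cox transformation g(v, lam) of equation (9), for v > 0."""
    v = np.asarray(v, dtype=float)
    if lam == 0:
        return np.log(v)  # limiting case as lam -> 0 (assumed convention)
    return (v ** lam - 1.0) / lam

def inv_box_cox(z, lam):
    """Inverse transformation, valid where lam * z + 1 > 0."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

intakes = np.array([0.5, 1.0, 4.0])      # hypothetical positive intakes
transformed = box_cox(intakes, 0.25)
recovered = inv_box_cox(transformed, 0.25)
print(transformed, recovered)
```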

We assume that such a transformation exists and that, on the transformed scale, we have

$$\begin{array}{l}{R}_{ij}^{\ast}\equiv g({R}_{ij},{\lambda}_{R})\\ =E\{g({R}_{ij},{\lambda}_{R})\mid i\}+{\epsilon}_{ij},{\epsilon}_{ij}\sim \text{Normal}\left(0,{\sigma}_{\epsilon}^{2}\right),\end{array}$$

where $\epsilon_{ij}$ are independent of the individual mean
${\mu}_{i}^{\ast}=E({R}_{ij}^{\ast}\mid i)$ and of each other. Assuming that the regression of
${\mu}_{i}^{\ast}$ on $\mathbf{X}_i$ is linear,

$${\mu}_{i}^{\ast}={\beta}_{0}+{\mathit{\beta}}_{X}^{t}{\mathbf{X}}_{i}+{u}_{i},\phantom{\rule{0.16667em}{0ex}}{u}_{i}\sim \text{Normal}(0;{\sigma}_{u}^{2}),$$

the reported intakes follow the *nonlinear* mixed effects measurement error model

$$g({R}_{ij},{\lambda}_{R})={\beta}_{0}+{\mathit{\beta}}_{X}^{t}{\mathbf{X}}_{i}+{u}_{i}+{\epsilon}_{ij}.$$

(10)

Following convention (Dodd et al., 2006), we continue to assume that 24HRs are unbiased for true individual intake *on the original scale*, so the usual intake of person *i* is:

$$\begin{array}{l}{T}_{i}\equiv E({R}_{ij}\mid {\mathbf{X}}_{i},{u}_{i})\\ =E\{{g}^{-1}({\beta}_{0}+{\mathit{\beta}}_{X}^{t}{\mathbf{X}}_{i}+{u}_{i}+{\epsilon}_{ij},{\lambda}_{R})\mid {\mathbf{X}}_{i},{u}_{i}\}\\ \approx {g}^{\ast}({\beta}_{0}+{\mathit{\beta}}_{X}^{t}{\mathbf{X}}_{i}+{u}_{i},{\lambda}_{R}),\end{array}$$

(11)

where, from Taylor’s expansion,

$${g}^{\ast}(v,{\lambda}_{R})={g}^{-1}(v,{\lambda}_{R})+\frac{1}{2}{\sigma}_{\epsilon}^{2}\frac{{\partial}^{2}\{{g}^{-1}(v,{\lambda}_{R})\}}{\partial {v}^{2}}.$$

(12)

Following equation (11), the best predictor of individual usual intake is given by

$$E({T}_{i}\mid {\mathbf{R}}_{i},{\mathbf{X}}_{i})\approx E\{{g}^{\ast}({\beta}_{0}+{\mathit{\beta}}_{X}^{t}{\mathbf{X}}_{i}+{u}_{i},{\lambda}_{R})\mid {\mathbf{R}}_{i},{\mathbf{X}}_{i}\}.$$
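To illustrate the second-order correction in equation (12), the sketch below evaluates it for a Box–Cox transformation and compares it with the exact mean of the back-transformed value, computed by Gauss–Hermite quadrature. It assumes λ ≠ 0 and λv + 1 > 0 at every point used; the function names and parameter values are hypothetical, and the closed-form second derivative is derived here for the inverse of transformation (9).

```python
import numpy as np

def inv_box_cox(z, lam):
    """Inverse Box-Cox transformation (lam != 0, lam * z + 1 > 0)."""
    return (lam * z + 1.0) ** (1.0 / lam)

def g_star(v, lam, var_eps):
    """Equation (12): back-transformation plus second-order Taylor term."""
    # Second derivative of (lam*v + 1)^(1/lam) with respect to v.
    second_deriv = (1.0 - lam) * (lam * v + 1.0) ** (1.0 / lam - 2.0)
    return inv_box_cox(v, lam) + 0.5 * var_eps * second_deriv

def exact_mean(v, lam, var_eps, n_nodes=20):
    """E{g^{-1}(v + eps)} for eps ~ Normal(0, var_eps), by Gauss-Hermite."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    eps = np.sqrt(2.0 * var_eps) * x
    return np.sum(w * inv_box_cox(v + eps, lam)) / np.sqrt(np.pi)

v, lam, var_eps = 2.0, 0.25, 0.25        # hypothetical values
naive = inv_box_cox(v, lam)              # ignores within-person variance
approx = g_star(v, lam, var_eps)
exact = exact_mean(v, lam, var_eps)
print(naive, approx, exact)
```

In this example the plain back-transformation understates the mean on the original scale, while the corrected value is close to the quadrature result.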

When *g* is the identity function, this conditional expectation reduces to the BLUP (8). For general *g*, it is different and needs to be evaluated according to equations (6)–(7). Because $u_i$ and $\epsilon_{ij}$ are independent and normally distributed, we have

$$\begin{array}{l}f({\mathbf{R}}_{i}^{\ast}\mid {\mathbf{X}}_{i},{u}_{i};\mathit{\theta})=\prod _{j=1}^{{J}_{i}}\left\{\frac{1}{{\sigma}_{\epsilon}}\phi \left(\frac{g({R}_{ij},{\lambda}_{R})-{\beta}_{0}-{\mathit{\beta}}_{X}^{t}{\mathbf{X}}_{i}-{u}_{i}}{{\sigma}_{\epsilon}}\right)\right\},\\ f({u}_{i}\mid {\mathbf{X}}_{i};\mathit{\theta})=\frac{1}{{\sigma}_{u}}\phi ({u}_{i}/{\sigma}_{u}),\end{array}$$

so that

$$\begin{array}{l}{\widehat{T}}_{i}(\mathit{\theta})\equiv E\{{g}^{\ast}({\beta}_{0}+{\mathit{\beta}}_{X}^{t}{\mathbf{X}}_{i}+{u}_{i},{\lambda}_{R})\mid {\mathbf{R}}_{i}^{\ast},{\mathbf{X}}_{i};\mathit{\theta}\}\\ =\frac{\int {g}^{\ast}({\beta}_{0}+{\mathit{\beta}}_{X}^{t}{\mathbf{X}}_{i}+{u}_{i},{\lambda}_{R}){\displaystyle \prod _{j=1}^{{J}_{i}}}\left\{\frac{1}{{\sigma}_{\epsilon}}\phi \left(\frac{g({R}_{ij},{\lambda}_{R})-{\beta}_{0}-{\mathit{\beta}}_{X}^{t}{\mathbf{X}}_{i}-{u}_{i}}{{\sigma}_{\epsilon}}\right)\right\}\frac{1}{{\sigma}_{u}}\phi ({u}_{i}/{\sigma}_{u})d{u}_{i}}{\int {\displaystyle \prod _{j=1}^{{J}_{i}}}\left\{\frac{1}{{\sigma}_{\epsilon}}\phi \left(\frac{g({R}_{ij},{\lambda}_{R})-{\beta}_{0}-{\mathit{\beta}}_{X}^{t}{\mathbf{X}}_{i}-{u}_{i}}{{\sigma}_{\epsilon}}\right)\right\}\frac{1}{{\sigma}_{u}}\phi ({u}_{i}/{\sigma}_{u})d{u}_{i}},\end{array}$$

(13)

where *φ* is the standard normal density. Following the EB approach, the integrals in expression (13) are evaluated by substituting in the estimates of the parameters ***θ*** obtained by fitting model (10) to the data.
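A ratio of integrals such as expression (13) can be approximated numerically. The sketch below is one possible way to do so for a single person, using plain (non-adaptive) Gauss–Hermite quadrature over the random effect; it is an illustration under stated assumptions, not the SAS implementation that accompanies this article. It assumes positive recalls, 0 < λ ≤ 0.5, and parameter values that have already been estimated; all names and numbers are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def box_cox(v, lam):
    return (np.asarray(v, dtype=float) ** lam - 1.0) / lam

def g_star(v, lam, var_eps):
    """Back-transformation with the second-order term of equation (12)."""
    base = np.maximum(lam * v + 1.0, 0.0)  # guard against negative bases
    return (base ** (1.0 / lam)
            + 0.5 * var_eps * (1.0 - lam) * base ** (1.0 / lam - 2.0))

def predict_usual_intake(recalls, x_pred, lam, var_u, var_eps, n_nodes=30):
    """Approximate the ratio of integrals in expression (13)."""
    x, w = hermgauss(n_nodes)
    u = np.sqrt(2.0 * var_u) * x           # nodes for u ~ Normal(0, var_u)
    r_star = box_cox(recalls, lam)
    # Log-likelihood of the transformed recalls at each node; constants
    # common to numerator and denominator are dropped.
    resid = r_star[:, None] - x_pred - u[None, :]
    loglik = -0.5 * np.sum(resid ** 2, axis=0) / var_eps
    lik = np.exp(loglik - loglik.max())
    usual = g_star(x_pred + u, lam, var_eps)
    return np.sum(w * usual * lik) / np.sum(w * lik)

settings = dict(x_pred=1.0, lam=0.25, var_u=0.25, var_eps=0.5)
low = predict_usual_intake([1.0, 3.0], **settings)
high = predict_usual_intake([4.0, 6.0], **settings)
print(low, high)
```

The quadrature nodes here are centered on the prior distribution of the random effect; an adaptive rule that recenters on each person's posterior would usually need fewer nodes.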

As far as we know, this method has not previously been described, and it should be useful in itself for intakes of nutrients or foods that are consumed daily. It is also an intermediate step to the estimation of usual intake for episodically consumed foods.

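Generalizing the approach developed at Iowa State University (Nusser, Fuller, and Guenther, 1997) to deal with episodically consumed foods, Tooze et al. (2006) considered two components of usual intake in the NCI method. The first is the individual *probability* to consume a food on a given day, $p_i = P(T_{ij} > 0 \mid i)$, where $T_{ij}$ denotes true intake on day *j*. The second is the individual usual *amount* consumed on consumption days, $A_i = E(T_{ij} \mid i; T_{ij} > 0)$. Usual intake is then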

$${T}_{i}\equiv E({T}_{ij}\mid i)={p}_{i}{A}_{i},$$

(14)

the product of the probability to consume and the usual amount on consumption days.

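To specify a measurement error model in this case requires modifying the assumptions. Following Tooze et al. (2006), we assume that (i) a food is reported on the 24HR as consumed on a certain day if and only if it *was* consumed on that day, so that $P(R_{ij} > 0 \mid i) = p_i$; and (ii) the 24HR is unbiased for the amount consumed on a consumption day, so that $E(R_{ij} \mid i; R_{ij} > 0) = A_i$. Under these assumptions,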

$$E({R}_{ij}\mid i)={p}_{i}{A}_{i}={T}_{i}.$$

(15)

Following the NCI method, we consider a two-part measurement error model for the 24HR. In the first part, we model the consumption probability as the mixed effects logistic regression

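$$P({R}_{ij}>0\mid i)\equiv {p}_{i}=H({\beta}_{10}+{\mathit{\beta}}_{X1}^{t}{\mathbf{X}}_{1i}+{u}_{1i}),j=1,\dots ,{J}_{i},$$

(16)

where $H(v)={\{1+\exp(-v)\}}^{-1}$ is the logistic function and
${u}_{1i}\sim \text{Normal}(0;{\sigma}_{u1}^{2})$ is independent of $\mathbf{X}_{1i}$. In addition to fixed effect population-level parameters
${\mathit{\beta}}_{1}={({\beta}_{10},{\mathit{\beta}}_{X1}^{t})}^{t}$, the model includes the random effect $u_{1i}$ representing person-specific deviations of the consumption probability from the population profile. In the second part, we model the amount reported on a consumption day, on a transformed scale, as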

$$g({R}_{ij},{\lambda}_{R}\mid {R}_{ij}>0)={\beta}_{20}+{\mathit{\beta}}_{X2}^{t}{\mathbf{X}}_{2i}+{u}_{2i}+{\epsilon}_{ij},$$

(17)

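where ${u}_{2i}\sim \text{Normal}(0;{\sigma}_{u2}^{2})$ and ${\epsilon}_{ij}\sim \text{Normal}(0;{\sigma}_{\epsilon}^{2})$ are independent of each other and of $\mathbf{X}_{2i}$.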

The two parts of the model are linked in two ways. First, the random effects, $u_{1i}$ and $u_{2i}$, are allowed to be correlated, following the bivariate normal distribution

$$\begin{array}{l}{\mathbf{u}}_{i}={({u}_{1i},{u}_{2i})}^{t}\sim \text{Normal}(\mathbf{0},{\mathrm{\Sigma}}_{u}),\\ {\mathrm{\Sigma}}_{u}=\left(\begin{array}{cc}{\sigma}_{u1}^{2}& {\rho}_{u1,u2}{\sigma}_{u1}{\sigma}_{u2}\\ {\rho}_{u1,u2}{\sigma}_{u1}{\sigma}_{u2}& {\sigma}_{u2}^{2}\end{array}\right).\end{array}$$

(18)

Second, both parts of the model may share common covariates among the components of $\mathbf{X}_{1i}$ and $\mathbf{X}_{2i}$.

In our model, the probability of consumption for an individual may be arbitrarily small, but is always positive. The model therefore allows for any finite number of days with zero intakes, but does not incorporate never-consumers, if they exist. We discuss this further in Section 6.
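To make the two-part structure concrete, the following sketch simulates 24HR-like data from an intercept-only version of models (16)–(18): a logistic model for consumption on a given day, a Box–Cox-scale normal model for the amount, and correlated person-level random effects. Covariates are omitted for brevity, and every parameter value, the seed, and the function name are hypothetical choices for illustration.

```python
import numpy as np

def simulate_two_part_recalls(rng, n_people, n_days, beta10, beta20,
                              sd_u1, sd_u2, rho, sd_eps, lam):
    """Simulate recalls with excess zeros from a two-part model."""
    cov = rho * sd_u1 * sd_u2
    sigma_u = np.array([[sd_u1 ** 2, cov], [cov, sd_u2 ** 2]])
    # Correlated random effects (u1, u2) as in equation (18).
    u = rng.multivariate_normal(np.zeros(2), sigma_u, size=n_people)
    # Part 1: person-specific probability of consumption on any day.
    p = 1.0 / (1.0 + np.exp(-(beta10 + u[:, 0])))
    consumed = rng.random((n_people, n_days)) < p[:, None]
    # Part 2: amount on the transformed scale, then back-transformed.
    z = beta20 + u[:, [1]] + rng.normal(0.0, sd_eps, size=(n_people, n_days))
    amount = np.maximum(lam * z + 1.0, 1e-8) ** (1.0 / lam)
    return np.where(consumed, amount, 0.0)

rng = np.random.default_rng(2024)
recalls = simulate_two_part_recalls(rng, n_people=2000, n_days=2,
                                    beta10=-1.5, beta20=1.0, sd_u1=1.0,
                                    sd_u2=0.5, rho=0.3, sd_eps=0.7, lam=0.25)
share_zero = np.mean(recalls == 0.0)
print(recalls.shape, share_zero)
```

Because every simulated person has a positive consumption probability, zeros arise only from nonconsumption days, which mirrors the absence of never-consumers in the model.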

According to equation (15), usual intake $T_i$ on the original scale is

$$\begin{array}{l}{T}_{i}=P({R}_{ij}>0\mid i)E\{{g}^{-1}({R}_{ij}^{\ast},{\lambda}_{R})\mid i;{R}_{ij}>0\}\\ \approx H({\beta}_{10}+{\mathit{\beta}}_{X1}^{t}{\mathbf{X}}_{1i}+{u}_{1i}){g}^{\ast}({\beta}_{20}+{\mathit{\beta}}_{X2}^{t}{\mathbf{X}}_{2i}+{u}_{2i},{\lambda}_{R}),\end{array}$$

(19)

where *g*^{*} is defined by equation (12). As before, we follow formulas (6)–(7) to obtain:

$$\begin{array}{l}{\widehat{T}}_{i}(\mathit{\theta})\equiv E({T}_{i}\mid {\mathbf{R}}_{i},{\mathbf{X}}_{1i},{\mathbf{X}}_{2i};\mathit{\theta})\\ \approx \frac{\int H({\beta}_{10}+{\mathit{\beta}}_{X1}^{t}{\mathbf{X}}_{1i}+{u}_{1i}){g}^{\ast}({\beta}_{20}+{\mathit{\beta}}_{X2}^{t}{\mathbf{X}}_{2i}+{u}_{2i},{\lambda}_{R})f\{g({\mathbf{R}}_{i},{\lambda}_{R})\mid {\mathbf{u}}_{i},{\mathbf{X}}_{1i},{\mathbf{X}}_{2i};\mathit{\theta}\}f({\mathbf{u}}_{i}\mid {\mathbf{X}}_{1i},{\mathbf{X}}_{2i};\mathit{\theta})d{u}_{1i}d{u}_{2i}}{\int f\{g({\mathbf{R}}_{i},{\lambda}_{R})\mid {\mathbf{u}}_{i},{\mathbf{X}}_{1i},{\mathbf{X}}_{2i};\mathit{\theta}\}f({\mathbf{u}}_{i}\mid {\mathbf{X}}_{1i},{\mathbf{X}}_{2i};\mathit{\theta})d{u}_{1i}d{u}_{2i}},\end{array}$$

(20)

where

$$\begin{array}{l}f\{g({\mathbf{R}}_{i},{\lambda}_{R})\mid {\mathbf{u}}_{i},{\mathbf{X}}_{1i},{\mathbf{X}}_{2i};\mathit{\theta}\}=\prod _{j=1}^{{J}_{i}}\left[\frac{{\{exp({\beta}_{10}+{\mathit{\beta}}_{X1}^{t}{\mathbf{X}}_{1i}+{u}_{1i})\}}^{I({R}_{ij}>0)}}{1+exp({\beta}_{10}+{\mathit{\beta}}_{X1}^{t}{\mathbf{X}}_{1i}+{u}_{1i})}\times {\left\{\frac{1}{{\sigma}_{\epsilon}}\phi \left(\frac{g({R}_{ij},{\lambda}_{R})-{\beta}_{20}-{\mathit{\beta}}_{X2}^{t}{\mathbf{X}}_{2i}-{u}_{2i}}{{\sigma}_{\epsilon}}\right)\right\}}^{\text{I}({\text{R}}_{\text{ij}}>0)}\right];\\ f({\mathbf{u}}_{i}\mid {\mathbf{X}}_{1i},{\mathbf{X}}_{2i};\mathit{\theta})\equiv f({\mathbf{u}}_{i};\mathit{\theta})=\frac{1}{{\sigma}_{u1}\sqrt{1-{\rho}_{u1,u2}^{2}}}\times \phi \left(\frac{{u}_{1i}-{\rho}_{u1,u2}({\sigma}_{u1}/{\sigma}_{u2}){u}_{2i}}{{\sigma}_{u1}\sqrt{1-{\rho}_{u1,u2}^{2}}}\right)\frac{1}{{\sigma}_{u2}}\phi \left(\frac{{u}_{2i}}{{\sigma}_{u2}}\right),\\ \mathit{\theta}={({\beta}_{10},{\mathit{\beta}}_{X1}^{t},{\beta}_{20},{\mathit{\beta}}_{X2}^{t},{\rho}_{u1,u2},{\sigma}_{u1},{\sigma}_{u2},{\lambda}_{R})}^{t}.\end{array}$$

Following the EB approach, the integrals in equation (20) may be evaluated using adaptive Gaussian quadrature (Liu and Pierce, 1994) by substituting in the maximum likelihood estimates of parameters ** θ** after simultaneously fitting models (16)–(18) to the data.
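The two-dimensional ratio of integrals in equation (20) can be approximated for one person along the following lines. This sketch swaps in a plain tensor-product Gauss–Hermite rule in place of adaptive Gaussian quadrature and assumes 0 < λ ≤ 0.5; `eta1` and `eta2` stand for the fitted linear predictors of the two parts, and all names and values are hypothetical rather than estimates from this article.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def g_star(v, lam, var_eps):
    """Back-transformation with the second-order term of equation (12)."""
    base = np.maximum(lam * v + 1.0, 0.0)  # guard against negative bases
    return (base ** (1.0 / lam)
            + 0.5 * var_eps * (1.0 - lam) * base ** (1.0 / lam - 2.0))

def predict_two_part(recalls, eta1, eta2, lam, sd_u1, sd_u2, rho, var_eps,
                     n_nodes=30):
    """Approximate equation (20) for one person's recalls (zeros allowed)."""
    recalls = np.asarray(recalls, dtype=float)
    x, w = hermgauss(n_nodes)
    z1, z2 = np.meshgrid(np.sqrt(2.0) * x, np.sqrt(2.0) * x, indexing="ij")
    weights = np.outer(w, w)  # normalizing constants cancel in the ratio
    # Build correlated (u1, u2) from independent standard normal nodes.
    u1 = sd_u1 * z1
    u2 = sd_u2 * (rho * z1 + np.sqrt(1.0 - rho ** 2) * z2)
    p = 1.0 / (1.0 + np.exp(-(eta1 + u1)))
    positive = recalls > 0
    n_pos = int(positive.sum())
    # Part 1: Bernoulli likelihood of consumption on each recall day.
    loglik = n_pos * np.log(p) + (recalls.size - n_pos) * np.log1p(-p)
    # Part 2: normal likelihood of transformed amounts on positive days.
    r_star = (recalls[positive] ** lam - 1.0) / lam
    for r in r_star:
        loglik = loglik - 0.5 * (r - eta2 - u2) ** 2 / var_eps
    lik = np.exp(loglik - loglik.max())
    usual = p * g_star(eta2 + u2, lam, var_eps)  # probability times amount
    return np.sum(weights * usual * lik) / np.sum(weights * lik)

settings = dict(eta1=-1.0, eta2=1.0, lam=0.25, sd_u1=1.0, sd_u2=0.5,
                rho=0.3, var_eps=0.5)
two_zero_days = predict_two_part([0.0, 0.0], **settings)
two_positive_days = predict_two_part([2.0, 3.0], **settings)
print(two_zero_days, two_positive_days)
```

Even a person with zero intake on every recall receives a positive predicted usual intake, because the prediction borrows from the population-level parameters and the random-effect distribution.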

When usual intake has a skewed distribution with high leverage points, one might transform it to a more appropriate scale before relating it to a health outcome in model (1). Assuming that such a transformation is given by $g(T_i, \lambda_T)$, the predictor of transformed usual intake is obtained by using

$$g({T}_{i},{\lambda}_{T})\approx g\{H({\beta}_{10}+{\mathit{\beta}}_{X1}^{t}{\mathbf{X}}_{1i}+{u}_{1i}){g}^{\ast}({\beta}_{20}+{\mathit{\beta}}_{X2}^{t}{\mathbf{X}}_{2i}+{u}_{2i},{\lambda}_{R}),{\lambda}_{T}\}$$

(21)

instead of *T _{i}* in formula (20).

Because the FFQ report is related to true food intake and is independent of the health outcome given true intake and other covariates in model (1), it may be included as an additional covariate (a component of vector **C*** _{i}*) in the calibration model. In this section, we quantify the contribution of the FFQ to the prediction of individual usual intake, using the EATS data (Subar et al., 2001). We considered 965 respondents who successfully completed four 24HRs and a FFQ.

We assumed a simple univariate model relating a transformed food intake to either a hypothetical continuous health outcome (linear regression) or to a dichotomous outcome (logistic regression). For a given food, we fit models (16)–(18) to the data, estimated the distribution of usual intake by applying the NCI method (Tooze et al., 2006), found the best Box–Cox transformation of true intake to approximate normality, and finally predicted individual usual intake on the transformed scale. We assumed the same set of additional covariates for both parts of the model and considered three different sets. The first set was empty; the second contained age, body mass index (BMI), and education (no college, some college, and college graduate); the third set additionally contained the FFQ report. We Box–Cox transformed FFQ positive values to improve linearity and homoscedasticity of model (17), and used an indicator variable for zero FFQ reports.

Finally, we calculated the variances *V*_{1}, *V*_{2}, *V*_{3} of the predicted usual intakes corresponding to each of the three scenarios described above. We quantified the contribution of the additional set of covariates (age, BMI, education) by the ratio *V*_{2}/*V*_{1}, and the additional contribution of the FFQ as *V*_{3}/*V*_{2}. As we prove in Web Appendix A, the larger the ratios *V*_{2}/*V*_{1} and *V*_{3}/*V*_{2} are, the larger is the contribution of the covariate(s) to the precision of the predictor and the greater is the precision of estimated exposure effects in the health outcome model.

We present results of our analyses in Table 1, separately for men and women, for five selected food groups: dark green vegetables, tomatoes and tomato products, fruit, milk and milk products, and fish. The variance ratios show that the FFQ contributed considerably more to the estimation of usual intake than did the other covariates, but the size of contribution varied substantially across foods. For dark green vegetables, tomatoes (for women), and fish intake, the increase in efficiency due to the FFQ was large and ranged from 21% (dark green vegetables for women) to 137% (dark green vegetables for men). For other foods, the increase in efficiency from the FFQ was more modest, ranging from 3% (fruit for men) to 13% (tomatoes for men).

To further illustrate our methods, we examined the relationship between fish intake and blood mercury levels in women of child-bearing age by using NHANES data. This subject is important because it has been shown that high levels of mercury can increase complications of pregnancy (Xue et al., 2007). We analyzed 1605 females, aged 12–49 years, who participated in the 2003–2004 round of NHANES, and provided at least one 24HR, one FFQ, and a blood sample for measurement of serum mercury. Among these participants, 1206 (75%) reported no fish consumption on either 24HR, 342 (21%) reported fish consumption on one of the two days, and 57 (3.6%) reported consuming fish on both days. In view of the large proportion of nonconsumption days, one might expect that the FFQ, with 1553 (96.8%) women reporting fish consumption over the past year, would add useful information.

We examined the linear regression of log serum mercury level (*μ*g/l) on usual fish intake (oz/day), obtained after a suitable Box–Cox transformation. We compared three regressions, one where individual fish intake was represented by the average of the 24HRs (the “naïve” analysis), and the other two that used the regression calibration to adjust for measurement error in the 24HR using the proposed method. In the first calibration model, individual fish intake was predicted using age, race (White, African American, other), and education (no college, some college, college graduate) as additional covariates (components of vector **C*** _{i}*). In the second, fish intake was predicted with the same three additional covariates plus the FFQ frequencies that were handled the same way as described above for the EATS example. Standard errors (SEs) of the estimated regression slopes were estimated using the balanced repeated replication method (Wolter, 1995). Results are presented in Table 2.

Table 2. Estimated difference in log mercury (*μ*g/l) between women with an average of 0.1 oz and 1.0 oz of fish per day in NHANES

The naïve analysis indicates a clear relationship between serum mercury and fish intake (Wald’s *z* = 0.33/0.038 = 8.7), but quantifies the effect as a 39% increase (exp(0.33)) in serum mercury between those who consume an average of 0.1 oz of fish per day and those who consume 1 oz per day. (These consumption levels were approximately the 10th and 90th percentiles in the population.) Such a small increase might only be of modest public health concern. The proposed method estimates the increase in serum mercury to be approximately fivefold (exp(1.58)) to sevenfold (exp(1.97)) with/without the FFQ in the calibration model, respectively, values that certainly would warrant concern. Note that the addition of the FFQ increases efficiency in the estimated effect approximately twofold yielding a SE of 0.38 compared with 0.54 for the regression calibration without the FFQ.
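The arithmetic behind these comparisons can be reproduced in a few lines of Python. The inputs are the estimates and standard errors quoted in the paragraph above; the variable names are illustrative, and relative efficiency is taken here to be the ratio of squared standard errors.

```python
import math

# Estimated differences in log mercury quoted in the text.
naive_est, naive_se = 0.33, 0.038
rc_with_ffq_est, rc_with_ffq_se = 1.58, 0.38
rc_without_ffq_est, rc_without_ffq_se = 1.97, 0.54

wald_z = naive_est / naive_se

# Exponentiating a difference in logs gives a fold change in mercury.
fold_naive = math.exp(naive_est)
fold_with_ffq = math.exp(rc_with_ffq_est)
fold_without_ffq = math.exp(rc_without_ffq_est)

# Relative efficiency from adding the FFQ (ratio of variances).
rel_efficiency = (rc_without_ffq_se / rc_with_ffq_se) ** 2

print(round(wald_z, 1), round(fold_naive, 2), round(fold_with_ffq, 2),
      round(fold_without_ffq, 2), round(rel_efficiency, 2))
```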

To assess model fit, we applied an informal graphical approach, as described in Web Appendix B. The results (Web Figures 1–4) did not exhibit obvious model misspecification.

We conducted a simulation study to evaluate the performance of the proposed method in a finite sample. We designed our simulation to mimic the investigation of the relationship between serum mercury levels and usual fish intake in women in NHANES, described in Section 4. In the simulation, the true relationship was specified as the simple linear regression

$${Y}_{i}=-1.35+0.8{T}_{i}^{\ast}+{\delta}_{i},$$

(22)

where *Y* represented log mercury concentration (*μ*g/l), and *T*^{*} represented Box–Cox transformed usual fish intake (oz/day) with parameter *λ _{T}* = 0.17, chosen to improve the linearity and homoscedasticity of the model. The coefficient of 0.8 leads to a true 1.52 increase in log mercury level between persons who consume 0.1 oz of fish compared with 1 oz per day, similar to the 1.58 estimated from the NHANES data (Table 2). We used

The details of the simulations are provided in Web Appendix C. Three different sets of additional covariates (components of vector **C*** _{i}*) were used in the calibration model: (a) empty set; (b) age, BMI, and education; and (c) age, BMI, education, and Box–Cox transformed FFQ report. Table 3 presents the overall results of applying this procedure to 250 simulated data sets.
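The true effect of 1.52 implied by model (22) can be checked directly: it is the slope of 0.8 multiplied by the difference between the Box–Cox transformed intakes of 1 oz and 0.1 oz per day, with *λ_T* = 0.17. The short sketch below carries out that calculation; the function and variable names are illustrative.

```python
def box_cox(v, lam):
    """Box-Cox transformation (v**lam - 1) / lam for lam != 0."""
    return (v ** lam - 1.0) / lam

slope = 0.8      # coefficient of transformed usual intake in model (22)
lam_t = 0.17     # Box-Cox parameter for usual fish intake

# Difference in mean log mercury between 1 oz/day and 0.1 oz/day.
true_difference = slope * (box_cox(1.0, lam_t) - box_cox(0.1, lam_t))
print(round(true_difference, 2))
```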

Table 3. Estimating the increase in log mercury level between consumers of an average of 0.1 oz and 1.0 oz of fish per day: empirical results based on 250 simulations regressing log mercury level on estimated usual fish intake (true model: ${Y}_{i}=-1.35+0.8$ …)

The results show that, due to measurement error, the naïve approach using the average of the 24HRs grossly underestimates the true value, as expected by theory. On average, estimates based on the proposed method have negligible bias, although, again as expected by theory, their precision is poorer than that of the naïve estimate. Importantly, the precision improves with the inclusion of additional covariates for predicting an individual’s usual intake. The estimate based on demographic covariates and the FFQ report is four times more efficient ([0.018/0.009]^{2}) than the estimate based on no covariates and 2.42 times more efficient than the estimate based on the demographic covariates only. The latter effect is equivalent to reducing the sample size of the study by approximately 60% ([1 − 1/2.42] × 100%), illustrating the potential gains from including the FFQ in the prediction of an individual’s usual intake.
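The link between relative efficiency and sample size used in this paragraph rests on the variance of an estimate being roughly proportional to 1/*n*. A minimal sketch of the calculation follows; the two precision values are those quoted in the bracketed expression above, the value 2.42 is taken from the text, and the variable names are illustrative.

```python
# Values quoted in the bracketed expression for the fourfold comparison.
spread_no_covariates = 0.018
spread_with_ffq = 0.009

# Relative efficiency is the squared ratio of the two values.
rel_eff_vs_no_covariates = (spread_no_covariates / spread_with_ffq) ** 2

# Relative efficiency versus demographic covariates only, as reported.
rel_eff_vs_demographics = 2.42

# If variance scales as 1/n, the same precision could be reached with
# n / 2.42 participants, a reduction of 1 - 1/2.42 in sample size.
sample_size_reduction = 1.0 - 1.0 / rel_eff_vs_demographics

print(round(rel_eff_vs_no_covariates, 2), round(sample_size_reduction, 3))
```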

We have presented a method of predicting an individual’s usual intake of an episodically consumed food and relating it to a health outcome. The method is based on regression calibration prediction applied to short-term repeat observations of intake that contain measurement error and excess zeros, under two important assumptions. First, the fact of short-term consumption is assumed to be correctly classified. Second, the reported intake on consumption days is assumed unbiased for true intake. In our method, information from the main dietary instrument may be combined with that from another longer-term, presumably less precise and even biased, report using an auxiliary instrument. We have demonstrated, through real data and simulations, that the gain from combining two instruments may be substantial, with increases in the precision of the predicted usual intake and of the estimated diet-health outcome relationship.

In our applications, the main instrument was a 24HR and the auxiliary instrument a FFQ. Unfortunately, the assumption of unbiasedness of the main instrument does not strictly apply to the 24HR. Recent biomarker studies (Kipnis et al., 2003) have shown that, for total energy, the 24HR also involves systematic error related to true usual intake. Such biases in reporting energy intake indicate bias also in the reporting of at least some energy-contributing foods. On the other hand, these same studies confirmed that the bias in 24HR reports is considerably less than that in FFQs. Thus, in the absence of any accurate biomarker for most foods and nutrients, using the 24HR in our proposed method may provide the best available approximation.

Our method appears to fill a gap in the analytic tools of nutritional epidemiologists estimating food and health outcome associations. Use of 24HRs alone is known to be problematic when there is a large number of zero values, whereas use of the FFQ alone is marred by the large reporting biases of this instrument. Our examples have demonstrated that the proposed method is feasible to implement and produces nearly unbiased estimates of associations of intakes of episodically consumed foods with health outcomes. The method outperformed the “naïve” approach even without the FFQ in the calibration model, giving an estimate with a much reduced MSE. However, use of the FFQ greatly increased the precision of the estimate.

As shown in Section 3, use of the FFQ will not have a large impact for all foods. Probably the most important factor that determines the impact of the FFQ is the overall probability to consume the food on a given day. For foods with a relatively low probability of consumption (e.g., fish and dark green vegetables in Table 1), the FFQ will most likely provide a larger increase in efficiency. However, a larger sample size (or, alternatively, more repeat 24HRs) is required to obtain reliable estimates of the model parameters when the consumption probability is very low. This is because a substantial number of individuals with at least two consumption days are needed to estimate properly the within-person variance in the second part of the model. In our NHANES example, there were 57 women (out of 1605) who consumed fish on both days. We would not expect reliable fits for very rarely consumed foods (e.g., organ meats or yogurt in NHANES) with considerably fewer than 50 individuals with two positive intakes and indeed we have encountered some convergence problems in simulations of such cases.

In our two-part model, the first part specifies the probability of the point mass at zero, and the second part *conditionally* models the continuous variable given that it is positive. Another potential approach to modeling semicontinuous data with measurement error was proposed by Li, Shao, and Palta (2005). It is based on the sample selection model that posits an underlying continuous variable censored by a random mechanism. Using our notation, true long-term and reported intakes are specified as *T _{i}* = max (0,

Our two-part model assumes that each food is ultimately consumed by all individuals, so that *T _{i}* > 0. This derives from specifying the random effect in the probability part as a continuous variable. In a similar situation, Olsen and Schafer (2001) suggested a two-part mixture for the distribution of this random effect, where the status of a “teetotaler” is specified by a latent class classification variable, but did not provide any details of fitting such a model.

We considered adding a third part to our model, which specifies for each person the probability of being a never-consumer by using fixed-effect logistic regression on a vector of covariates **X**_{3*i*}. We have fitted this model to the data on fish intake in EATS among 515 women, including 30 who reported zero intakes on the FFQ. An indicator variable of whether fish consumption was reported on the FFQ was used as a covariate in

Our methodology is suitable for analysis of a particular food and its relationship with a health outcome that involves no other dietary factors. An extension to a multivariate case with several foods and nutrients requires conditioning in formula (20) on potentially correlated random effects for all considered dietary factors simultaneously and is another area for future research.

Although we concentrated on dietary surveys, the proposed method can also be applied to cohort studies of associations between episodically consumed foods and disease. Currently, most such studies use an FFQ as the main dietary-assessment instrument, while a more precise short-term reference instrument is available only in a calibration substudy. In such cases, the regression calibration is based on estimating

$${\widehat{T}}_{i}(\mathit{\theta})\equiv E({T}_{i}\mid {\mathbf{X}}_{i};\mathit{\theta})=\int \mathfrak{T}({\mathbf{X}}_{i};\mathit{\theta})f({u}_{i}\mid {\mathbf{X}}_{i};\mathit{\theta})d{u}_{i},$$

which involves conditioning on the FFQ and other covariates, but not on the 24HR (and therefore random effects) as in formulas (6)–(7). This simplifies the method and, more importantly, allows its application to a multivariate case with several foods and nutrients by considering regression calibration of each dietary factor, one at a time.
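The integral over the random effects in this predictor can be approximated numerically. The following is a minimal sketch under simplifying assumptions that are ours, not the article's: the integrand is taken to be the product of a logistic consumption probability and a log-scale amount (a log transformation rather than a general Box-Cox transformation), the two person-specific random effects are bivariate normal, and all function and argument names are hypothetical.

```python
import numpy as np


def predicted_usual_intake(eta_prob, eta_amount, sd_u1, sd_u2, rho,
                           var_eps, n_nodes=30):
    """Approximate E(T | X) by two-dimensional Gauss-Hermite quadrature.

    Assumed (illustrative) model:
      probability of consumption: expit(eta_prob + u1)
      amount on a consumption day: exp(eta_amount + u2 + var_eps / 2)
      (u1, u2) bivariate normal, SDs sd_u1 and sd_u2, correlation rho.
    eta_prob and eta_amount are the linear predictors X'beta of the two parts.
    """
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    z = np.sqrt(2.0) * x            # nodes for a standard normal variable
    wz = w / np.sqrt(np.pi)         # matching weights, summing to 1

    z1, z2 = np.meshgrid(z, z, indexing="ij")
    w12 = np.outer(wz, wz)          # product rule for two independent normals

    # Build correlated random effects from independent standard normals.
    u1 = sd_u1 * z1
    u2 = sd_u2 * (rho * z1 + np.sqrt(1.0 - rho**2) * z2)

    prob = 1.0 / (1.0 + np.exp(-(eta_prob + u1)))
    amount = np.exp(eta_amount + u2 + 0.5 * var_eps)
    return float(np.sum(w12 * prob * amount))
```

Because the predictor conditions only on covariates, it can be evaluated for each dietary factor separately with that factor's own fitted parameters, which is the one-at-a-time use described above.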

In the future, as automated 24HRs become available, our methodology could combine multiple administrations of this instrument with the FFQ to achieve more precise results.

Web Appendices A–C, Web Figures 1–4, and NHANES example data, referenced in Sections 2, 4, and 5, as well as the SAS program implementing the proposed method, are available under the Paper Information link at the *Biometrics* website http://www.biometrics.tibs.org.


R.J.C.’s research was supported by a grant from the National Cancer Institute (CA57030) and by Award KUS-CI-016-04, made by King Abdullah University of Science and Technology.

- Box GEP, Cox DR. An analysis of transformations. Journal of the Royal Statistical Society, Series B. 1964;26:211–252.
- Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC Press; 2006.
- Dodd K, Guenther PM, Freedman LS, Subar AF, Kipnis V, Midthune D, Tooze JA, Krebs-Smith SM. Statistical methods for estimating usual intake of nutrients and foods: A review of the theory. Journal of the American Dietetic Association. 2006;106:1640–1650.
- Dwyer J, Picciano MF, Raiten DJ. Members of the Steering Committee. Collection of food and dietary supplement intake data: What we eat in America—NHANES. The Journal of Nutrition. 2003;133:590S–600S.
- Eckert RS, Carroll RJ, Wang N. Transformations to additivity in measurement error models. Biometrics. 1997;53:262–272.
- Kipnis V, Subar AF, Midthune D, Freedman LS, Ballard-Barbash R, Troiano R, Bingham S, Schoeller DA, Schatzkin A, Carroll RJ. The structure of dietary measurement error: Results of the OPEN biomarker study. American Journal of Epidemiology. 2003;158:14–21.
- Li L, Shao J, Palta M. A longitudinal measurement error model with a semicontinuous covariate. Biometrics. 2005;61:824–830.
- Liu Q, Pierce DA. A note on Gauss-Hermite quadrature. Biometrika. 1994;81:624–629.
- McCulloch CE, Searle SR. Generalized, Linear, and Mixed Models. New York: Wiley; 2001.
- Nusser SM, Fuller WA, Guenther PM. Estimating usual dietary intake distributions: Adjusting for measurement error and nonnormality in 24-hour food intake data. In: Lyberg L, Biemer P, Collins M, Deleeuw E, Dippo C, Schwartz N, Trewin D, editors. Survey Measurement and Process Quality. New York: Wiley; 1997. pp. 670–689.
- Olsen MK, Schafer JL. A two-part random-effects model for semicontinuous longitudinal data. Journal of the American Statistical Association. 2001;96:730–745.
- Subar AF, Thompson FE, Kipnis V, Midthune D, Hurwitz P, McNutt S, McIntosh A, Rosenfeld S. Comparative validation of the Block, Willett, and National Cancer Institute food frequency questionnaires: The Eating at America’s Table Study. American Journal of Epidemiology. 2001;154:1089–1099.
- Subar AF, Dodd KW, Guenther PM, Kipnis V, Midthune D, McDowell M, Tooze JA, Freedman LS, Krebs-Smith SM. The Food Propensity Questionnaire (FPQ): Concept, development and validation for use as a covariate in model to estimate usual food intake. Journal of the American Dietetic Association. 2006;106:1556–1563.
- Tooze JA, Grunwald GK, Jones RH. Analysis of repeated measures data clumping at zero. Statistical Methods in Medical Research. 2002;11:341–355.
- Tooze JA, Midthune D, Dodd KW, Freedman LS, Krebs-Smith SM, Subar AF, Carroll RJ, Kipnis V. A new statistical method for estimating the usual intake of episodically consumed foods with application to their distribution. Journal of the American Dietetic Association. 2006;106:1575–1587.
- Tsiatis AA, DeGruttola V, Wulfsohn MS. Modeling the relationship of survival to longitudinal data measured with error. Application to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association. 1995;90:27–37.
- Whittemore AS. Errors-in-variables regression using Stein estimates. The American Statistician. 1989;43:226–228.
- Wolter KM. Introduction to Variance Estimation. New York: Springer-Verlag; 1995.
- Xue F, Holzman C, Rahbar MH, Trosko K, Fischer L. Maternal fish consumption, mercury levels, and risk of preterm delivery. Environmental Health Perspectives. 2007;115:42–47.
