It is of interest to estimate the distribution of usual nutrient intake for a population from repeat 24-h dietary recall assessments. A mixed effects model and quantile estimation procedure, developed at the National Cancer Institute (NCI), may be used for this purpose. The model incorporates a Box–Cox parameter and covariates to estimate usual daily intake of nutrients; model parameters are estimated via quasi-Newton optimization of a likelihood approximated by adaptive Gaussian quadrature. The parameter estimates are used in a Monte Carlo approach to generate empirical quantiles; standard errors are estimated by bootstrap. The NCI method is illustrated and compared with current estimation methods, including the individual mean and the semi-parametric method developed at Iowa State University (ISU), using data from a random sample and computer simulations. Both the NCI and ISU methods for nutrients are superior to the distribution of individual means. For simple (no covariate) models, quantile estimates are similar between the NCI and ISU methods. The bootstrap approach used by the NCI method to estimate standard errors of quantiles appears preferable to Taylor linearization. One major advantage of the NCI method is its ability to provide estimates for subpopulations through the incorporation of covariates into the model. The NCI method may be used for estimating the distribution of usual nutrient intake for populations and subpopulations as part of a unified framework of estimation of usual intake of dietary constituents.
Nutrients are components of foods, including carbohydrate, fat, protein, vitamins, and minerals. Because many nutrients are found in a wide variety of foods, most nutrients are consumed nearly every day by nearly everyone. There is interest in estimating the distribution of the usual dietary intake of nutrients for a population with regard to establishing population norms, evaluating compliance with dietary recommendations, assessing risk, and making public policy decisions.
Usual intake is defined as the long-run average daily intake of a dietary component by an individual. The 24-h recall (24HR) is the standard method of assessing dietary intake in surveillance studies in the U.S.; other methods, such as weighed food records, may also be used. For most nutrients, within-individual variability in daily intake, which arises from day-to-day variation in diet and other random sources of measurement error, is generally large relative to between-individual variability [4, 5]. Because 24HRs are typically interviewer-administered and collect a large amount of detailed food information that must be coded, it is often impractical to administer more than two recalls per individual, which is the minimum number required to separate within-individual from between-individual variability.
The purpose of this paper is to investigate the use of a model and a quantile estimation procedure, developed at the National Cancer Institute (NCI), for estimating the distribution of usual nutrient intake, using data from two or more 24HRs, and to compare the new estimation procedure (‘NCI method’) to standard methods. Section 2 provides a brief review of the standard methods for estimating nutrient intake. The NCI method is presented in Section 3. Section 4 presents an example comparing the NCI method to other methods using data from a random sample. Simulation studies are presented in Section 5.
In dietary surveillance, it is of interest to estimate the entire distribution of usual intake for a population. Often researchers and policy makers are interested in estimating the area in the tails of the distribution to estimate levels of inadequate or excess intake. Traditionally, one 24HR or the mean of two or more 24HR observations for an individual has been considered a representative sample of a person’s dietary intake, and the distribution of these values has been used to estimate the proportion of the population or subpopulations having intakes above or below particular dietary standards. Under the assumption that 24HRs yield unbiased estimates of single-day intake, the mean of the distribution of individual means provides an unbiased estimate of mean usual intake. However, because one observation or the mean of a few observations includes a substantial amount of within-individual variability, the spread of the distribution of individual means is substantially larger than the spread of the usual intake distribution. Thus, using the empirical distribution of individual means to estimate usual intake will lead to biased estimates of the moments of the distribution other than the mean, and in particular of the tail areas. To address these limitations, alternative statistical methods have been developed for the problem of excess within-individual variability with two or more 24HRs per person. These methods, described in detail in Dodd et al., separate within- from between-person variation to obtain an estimated distribution that reflects variation in usual intake. The methods are most convenient to apply when the intakes are normally distributed, but this is rarely the case in practice; intake distributions for most dietary components are skewed to the right. Therefore, the statistical methods apply a transformation to the observed intakes to achieve approximate normality, estimate the requisite parameters, and then backtransform the data to the original scale.
The methods are described briefly in this section.
For dietary components consumed on an almost daily basis by nearly everyone, such as most nutrients, the distributions of 24HR observations can generally be transformed to approximate normality using power or logarithmic transformations. On the transformed scale, a measurement error model may be used to adjust for within-individual variability. Methods employing this approach for the estimation of frequently consumed nutrients are well-established [8–12]. A 1986 National Research Council (NRC) report and a subsequent algorithm published by the Institute of Medicine (IOM) proposed a measurement error model for the estimation of usual intake:

$$Y_{ij} = \mu_i + e_{ij},$$

where $Y_{ij}$ represents the transformed 24HR nutrient intake for individual $i$ at time $j$, $\mu_i$ is the usual intake for the individual on the transformed scale, and the $e_{ij}$ represent within-individual variation. At the population or group level, the $\mu_i$ are assumed to have mean $\mu$ and variance $\sigma_b^2$; the $e_{ij}$ have mean 0, variance $\sigma_w^2$, and are independent of usual intake. The IOM report estimated the distribution of usual intake on the transformed scale by the empirical distribution of $\hat{\mu}_i$, a shrinkage estimator given by:

$$\hat{\mu}_i = \bar{Y}_{\cdot\cdot} + \sqrt{\frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2/m}}\,\left(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}\right),$$

where $\bar{Y}_{i\cdot}$ is the individual mean on the transformed scale, $\bar{Y}_{\cdot\cdot}$ is the group mean, and $m$ is the number of recalls per person. This is an intuitively appealing estimator, as the adjusted values are very similar to the individual means when the within-individual variability is small or there are a large number of replicates, but are very similar to the grand mean when $\sigma_w^2$ is large. The IOM algorithm does not clearly address estimation of the distribution on the original scale, but the original NRC reference suggests applying the inverse of the initial transformation to each adjusted value, then constructing the empirical distribution of the backtransformed values.
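The shrinkage step can be illustrated with a short Python sketch. It assumes a balanced design (the same number of recalls for every person), data already on the transformed scale, simple method-of-moments variance estimates, and the square-root (ratio of standard deviations) form of the shrinkage factor; the function name `nrc_adjusted_means` is hypothetical and is not taken from the NRC or IOM reports.

```python
import math
import statistics


def nrc_adjusted_means(recalls):
    """Shrink each person's mean toward the group mean (transformed scale).

    recalls: list of per-person lists of transformed intakes, each of the
    same length m >= 2 (a balanced design is assumed for simplicity).
    """
    n, m = len(recalls), len(recalls[0])
    person_means = [statistics.fmean(r) for r in recalls]
    group_mean = statistics.fmean(person_means)

    # Pooled within-person variance (estimates sigma_w^2).
    ss_within = sum((y - pm) ** 2
                    for r, pm in zip(recalls, person_means) for y in r)
    var_within = ss_within / (n * (m - 1))

    # The variance of person means estimates sigma_b^2 + sigma_w^2 / m,
    # so subtract the within-person share (truncating at zero).
    var_means = statistics.variance(person_means)
    var_between = max(var_means - var_within / m, 0.0)

    denom = var_between + var_within / m
    shrink = math.sqrt(var_between / denom) if denom > 0 else 0.0
    return [group_mean + shrink * (pm - group_mean) for pm in person_means]


# Three people with two recalls each (toy values on the transformed scale).
adjusted = nrc_adjusted_means([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])
```

In this toy example the individual means are 2, 6, and 10; the adjusted values keep the group mean of 6 but are pulled slightly toward it, which is the intended reduction in spread.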
A method developed at Iowa State University (ISU) is an extension of the NRC method for nutrients and uses a semi-parametric transformation of the data to normality followed by a backtransformation to the original scale after adjusting for within-individual variation. In contrast to the NRC method, the ISU method uses a two-stage normality transformation, fitting a grafted cubic polynomial to the normal quantile plot of the data after an initial power or logarithmic transformation. It also allows for heterogeneous within-individual variances in the transformed scale. The backtransformation to the original scale incorporates an adjustment to ensure that the mean of the estimated usual intake distribution in the original scale is essentially equal to the overall mean of the original 24HR data. The latter adjustment reflects the assumption that 24HRs are unbiased for single-day intake on the original scale. The NRC method as implemented in the IOM algorithm, which does not incorporate such an adjustment, implicitly assumes that the unbiasedness applies only on the transformed scale.
The NRC method does not provide guidance for computing standard errors of estimated quantities such as the percentiles of usual intake or the proportion of the usual intake distribution above or below some cutoff value. The ISU method as implemented in the software package C-SIDE (v 1.03, ISU, Ames, IA) produces approximate standard errors based on Taylor series linearization for data arising from simple random samples, but requires alternative variance estimation techniques if data come from a complex survey.
The NCI method [14, 15] has been proposed for use in estimating the usual intake distributions for episodically-consumed dietary components such as foods, which exhibit a large proportion of zero intakes on any given day. The centerpiece of the NCI method is a two-part model for repeated measures data with correlated random effects. As in the ISU method, the NCI method requires two or more 24HRs on at least a random subset of the population. The model separates usual intake into two parts: the probability of consuming a food on a particular day and, given that the food was consumed, the amount eaten on the consumption day. Nutrient intake may be analyzed as a special case of the two-part model, as we now describe. Because the probability of consumption of a nutrient is close to or equal to 1, only the amount part of the model is needed.
Let $T_{ij}$ be the true nutrient intake for an individual $i$ ($i = 1, \ldots, N$) on day $j$ on the original scale. The usual intake for the individual is $T_i = E(T_{ij} \mid i)$, the conditional expectation of true single-day intakes (the $E(\cdot \mid i)$ notation symbolizes that the expectation is conditional on the $i$th individual). There is no gold standard for measuring nutrient intake; hence, we rely on measures of dietary intake from the 24HR, represented by $R_{ij}$. We assume that the 24HR is an unbiased instrument for the consumption day amount, i.e. $E(R_{ij} \mid i) = T_i$. Following the usual assumption in nutrition, as in the ISU method, we assume that the 24HR is unbiased on the original scale. However, a transformation is almost always required for 24HR nutrient data, due to skewed distributions and within-person error that is dependent on the mean intake. A simple but flexible approach is to apply a Box–Cox transformation, which includes the logarithmic transformation as a limiting case. We define

$$Y_{ij} = g(R_{ij}, \lambda)$$

to be the Box–Cox transformation of the 24HR data, where $g(r, \lambda) = (r^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$; when $\lambda = 0$, the natural log transformation $g(r, 0) = \log r$ is used. The transformation is chosen such that, on the transformed scale, we have

$$Y_{ij} = \mu_i + e_{ij},$$

where within-person errors $e_{ij}$ are additive, uncorrelated with the individual mean $\mu_i$, and independent of each other. We may also incorporate a vector of covariates $X_i$ that may be associated with $\mu_i$ through the model

$$\mu_i = \beta_0 + X_i'\beta + u_i,$$

where the person-specific random effect $u_i$ and the within-person error $e_{ij}$ are normally distributed with mean 0 and variances $\sigma_u^2$ and $\sigma_e^2$, respectively.
This model may be extended so that the variances of ui and/or eij vary with the level of a specific covariate Xi as long as the covariate is categorical with only a few levels. However, for clarity of exposition, we retain the simpler form of the model here.
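For concreteness, the Box–Cox transformation $g(r, \lambda)$ and its inverse can be written as the following minimal Python sketch. The function names `box_cox` and `inv_box_cox` are illustrative and are not part of the NCI software; the inverse is only defined where $\lambda y + 1 > 0$.

```python
import math


def box_cox(r, lam):
    """Box-Cox transform g(r, lam) of a positive intake r."""
    if r <= 0:
        raise ValueError("Box-Cox requires a positive intake")
    if lam == 0:
        return math.log(r)          # limiting case lam -> 0
    return (r ** lam - 1.0) / lam


def inv_box_cox(y, lam):
    """Inverse transform g^{-1}(y, lam), back to the original scale."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    if base <= 0:
        raise ValueError("value is outside the range of the transform")
    return base ** (1.0 / lam)


# Round trip for a single recalled intake (for example, mg of calcium).
y = box_cox(850.0, 0.3)
r = inv_box_cox(y, 0.3)
```

The zero intakes mentioned for some nutrients cannot be transformed directly, which is why a small positive value has to be substituted before a transformation of this kind is applied.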
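The contribution to the likelihood for the $i$th subject, with observed recalls $r_{ij}$ ($j = 1, \ldots, m$), is

$$L_i = \int_{-\infty}^{\infty} \left[ \prod_{j=1}^{m} \frac{1}{\sigma_e}\, \phi\!\left( \frac{g(r_{ij}, \lambda) - \beta_0 - X_i'\beta - u}{\sigma_e} \right) r_{ij}^{\lambda - 1} \right] \frac{1}{\sigma_u}\, \phi\!\left( \frac{u}{\sigma_u} \right) du,$$

where $\phi$ is the standard normal density, $\sigma_u^2$ and $\sigma_e^2$ are the variances of the random effect $u_i$ and the within-person error $e_{ij}$, and the factor $r_{ij}^{\lambda - 1}$ is the Jacobian of the Box–Cox transformation.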
The full likelihood for the model can be maximized using quasi-Newton optimization of a likelihood approximated by adaptive Gaussian quadrature. This method is implemented in the NLMIXED procedure in SAS (SAS Institute, Cary, NC, Version 9.1).
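To make the structure of a single subject's likelihood contribution concrete, the sketch below integrates the random effect out numerically. It assumes normal random effects and normal within-person errors on the Box–Cox scale, and it deliberately swaps in a plain trapezoid rule on a fixed grid for the adaptive Gaussian quadrature used by PROC NLMIXED. The function names and the argument `fixed_part` (standing for the intercept plus covariate effects) are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def box_cox(r, lam):
    return math.log(r) if lam == 0 else (r ** lam - 1.0) / lam


def subject_likelihood(recalls, fixed_part, lam, sd_u, sd_e,
                       n_grid=2001, width=8.0):
    """Marginal likelihood of one person's recalls on the original scale.

    The person-level random effect u is integrated out with a trapezoid
    rule over [-width * sd_u, width * sd_u] (an assumption of this sketch,
    not the adaptive quadrature named in the text).
    """
    ys = [box_cox(r, lam) for r in recalls]
    # Jacobian of the Box-Cox transform, one factor per recall.
    jacobian = math.prod(r ** (lam - 1.0) for r in recalls)
    step = 2.0 * width * sd_u / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        u = -width * sd_u + k * step
        dens = normal_pdf(u, 0.0, sd_u)
        for y in ys:
            dens *= normal_pdf(y, fixed_part + u, sd_e)
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * dens
    return total * step * jacobian


# With a single recall the integral has a closed form, which gives a check:
# the transformed value is normal with variance sd_u^2 + sd_e^2.
lik = subject_likelihood([900.0], fixed_part=20.0, lam=0.3, sd_u=2.0, sd_e=3.0)
closed_form = (normal_pdf(box_cox(900.0, 0.3), 20.0, math.sqrt(13.0))
               * 900.0 ** (0.3 - 1.0))
```

A full fit would sum the logarithms of these contributions over subjects and pass the result to an optimizer; that step is omitted here.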
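Percentiles of the distribution of usual intake of nutrients for the population are estimated by using a Monte Carlo procedure. To generate a distribution of values that reflects the covariate pattern in the population, we use the covariate patterns of the sampled individuals, in combination with the estimates of $\beta_0$, $\beta$, and the variance components $\sigma_u^2$ and $\sigma_e^2$. So that the covariate pattern is proportional to that found in the sample, we first compute the linear predictor $\hat{\beta}_0 + X_i'\hat{\beta}$ for each sampled individual $i$. We then simulate $k$ realizations $u_l$ of a $N(0, \hat{\sigma}_u^2)$ random variable for each person to form simulated values $\mu_l = \hat{\beta}_0 + X_l'\hat{\beta} + u_l$, where $l = 1, \ldots, kN$ and $X_l = X_i$ for $l = k(i-1)+1, \ldots, ki$. We use $k = 100$ simulated values per person. The distribution of the $100N$ values of $\mu_l$ reflects a representative sample of transformed usual intakes for the population. Because the 24HR is assumed to be unbiased on the original scale, the usual intake corresponding to $\mu_l$ is $T_l = E\{g^{-1}(\mu_l + e, \lambda) \mid \mu_l\}$, the expectation over the within-person error $e$, rather than $g^{-1}(\mu_l, \lambda)$ itself. A second-order Taylor series expansion may be used to approximate $T_l$. For the Box–Cox transformation with $\lambda \neq 0$, this backtransformation is:

$$T_l \approx (\lambda \mu_l + 1)^{1/\lambda} + \frac{\sigma_e^2}{2}\,(1 - \lambda)\,(\lambda \mu_l + 1)^{1/\lambda - 2}.$$

The estimates of $\lambda$ and $\sigma_e^2$ are used to obtain $\hat{T}_l$.

When the within-person variability is much larger than the between-person variability and the data are highly skewed, the Taylor series approximation may not work well. In these cases, e.g. for vitamin A, we propose the use of the nine-point approximation used by the ISU method. In this approximation, a set of nine points $c_k$ and nine weights $w_k$ are constructed so that the first five moments are the same for the nine-point distribution and the distribution of the within-person error $e_{lj}$ given $X_l'\beta$:

$$T_l \approx \sum_{k=1}^{9} w_k\, g^{-1}(\mu_l + c_k, \lambda).$$

Again, the estimates of $\lambda$ and $\sigma_e^2$ are used to obtain $\hat{T}_l$.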
Sample quantiles computed from the representative sample of the 100N backtransformed values comprise the estimated population quantiles; the sample fractions that fall above or below a given cutoff comprise the estimates of the proportion of the population with usual intake above or below that cutoff.
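The Monte Carlo step can be sketched in a few lines of Python. The sketch assumes that the fitted linear predictor for each sampled person is already available, that the random effect is normal, and that the backtransformation is the second-order Taylor approximation $g^{-1}(\mu) + \tfrac{1}{2}\sigma_e^2\, \partial^2 g^{-1}/\partial \mu^2$ for the Box–Cox inverse; the nine-point alternative is not shown. All function names and the numerical inputs are illustrative, not estimates from the paper.

```python
import math
import random


def taylor_backtransform(mu, lam, var_e):
    """Second-order Taylor approximation to E[g^{-1}(mu + e)], Var(e) = var_e."""
    if lam == 0:
        # Limit of the formula below as lam -> 0.
        return math.exp(mu) * (1.0 + var_e / 2.0)
    base = lam * mu + 1.0
    if base <= 0:
        raise ValueError("simulated value is outside the Box-Cox range")
    return (base ** (1.0 / lam)
            + 0.5 * var_e * (1.0 - lam) * base ** (1.0 / lam - 2.0))


def simulate_usual_intakes(linear_predictors, lam, var_u, var_e, k=100, seed=0):
    """Draw k random effects per person and backtransform each simulated mean."""
    rng = random.Random(seed)
    sd_u = math.sqrt(var_u)
    intakes = []
    for xb in linear_predictors:          # one fitted value per sampled person
        for _ in range(k):
            mu = xb + rng.gauss(0.0, sd_u)
            intakes.append(taylor_backtransform(mu, lam, var_e))
    return intakes


def percentile(values, p):
    """Nearest-rank empirical percentile (0 < p <= 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical fitted values for two people on the transformed scale.
intakes = simulate_usual_intakes([20.0, 22.0], lam=0.3, var_u=4.0, var_e=9.0)
p5, p95 = percentile(intakes, 5), percentile(intakes, 95)
below_cutoff = sum(t < 800.0 for t in intakes) / len(intakes)
```

The last line mirrors the cutoff estimate described above: the proportion of simulated usual intakes below a chosen value (800 here is an arbitrary illustration).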
The Eating at America’s Table Study (EATS) was conducted to validate the NCI’s Diet History Questionnaire, a food frequency questionnaire (FFQ). One thousand men and women aged 20–70 years from a nationally representative sample completed four 24HRs and the FFQ during 1997 and 1998. The 24HR was telephone-administered by trained interviewers using the U.S. Department of Agriculture’s multiple-pass methodology. Portion size estimation was aided by the use of models and measuring cups. The four 24HRs were completed one per season over a one-year period. Although only the 24HR data are used in this paper, some exclusions based on the FFQ were made in the data cleaning process of the EATS data set: 35 participants were excluded from the analysis for skipping at least two pages of the questionnaire; 79 participants were excluded for reporting implausible energy intake (outside the range of 600 to 3500 kcal per day for women or 800 to 4200 kcal for men).
The model described in Section 3 was applied to the EATS data set for three nutrients: calcium, iron, and vitamin A. These were chosen to represent nutrients with a variety of statistical properties. Calcium tends to be a ‘well behaved’ nutrient, with a ratio of within-person to between-person variance that is generally around 1.5, less skewness than other nutrients, and a distribution that can be transformed to approximate normality with a power transformation; vitamin A is the opposite, with a high ratio of within-person to between-person variance (on the order of 4:1), resulting in a distribution that is highly skewed and more difficult to transform to approximate normality. The distribution of iron generally falls between these extremes.
The distributions of usual intakes of calcium, iron, and vitamin A for women and men, using all 4 days and using only 2 days, were estimated using individuals’ means, the NCI method, and the ISU method (Table I). Six recall days had 0 reported intake of vitamin A; these were set to one-half of the lowest non-zero value. Standard errors for the NCI method were estimated using a simple bootstrap with 200 replications; standard errors for the ISU method were computed based on the Taylor linearization method, which does not account for variability in the choice of normality transformation. Results indicated that the NCI method was comparable to the ISU method, and both methods differed greatly from the 2-d or 4-d simple individual mean. The linearization-based estimated standard errors for the ISU method were larger than the bootstrap-based estimates for the NCI method. Graphical representations of the cumulative distributions for calcium and vitamin A are shown in Figures 1 and 2. Estimates for calcium and iron were computed using the Taylor series backtransformation for the NCI method; however, the nine-point approximation was used for vitamin A because we found that the Taylor series backtransformation did not work well for vitamin A in simulations (described in Section 5).
One advantage of using the NCI method is the ease of performing subgroup analysis by including the subgroup variable of interest as a covariate in the statistical model, rather than doing a stratified analysis as in the ISU method. When some of the model parameters do not vary between subgroups, this approach is also more efficient. The efficiency of model-based subgroup analysis using common variance parameters was studied; results are shown in Tables II(a) and II(b), for women and men, respectively. Models were fit: (1) assuming common variance parameters for the two age groups of interest (20–49 years and 50–70 years), (2) allowing the random effect variance parameter ($\sigma_u^2$) to vary, (3) allowing the within-subject variance parameter ($\sigma_e^2$) to vary, and (4) stratified by subgroup. Because these are nested models, likelihood ratio tests were used to compare each model to the base model, and a ‘best’ model was identified. For women, model (3) was best for calcium, (1) was best for iron, and (2) was best for vitamin A. For men, model (3) was best for calcium, (4) was best for iron, and (2) was best for vitamin A.
After selecting the best model and obtaining estimates of the distribution of usual intake from the NCI method, these values were compared with subgroup estimates from the ISU method and the 4-d mean (Table III). Estimates from the NCI and ISU methods were generally within one standard error of each other. Both modeling methods produced estimates significantly different from those for the 4-d mean method, except near the mean usual intake, where the NCI, ISU, and 4-d mean methods agree by construction. In most of the cases shown in Table III, the linearization-based standard errors for the ISU method were greater than those estimated by the bootstrap method, suggesting that the default standard error calculations for the ISU method may overestimate the uncertainty of its estimates. To investigate this, bootstrap standard errors were calculated for the ISU method for women 20–49 years for calcium and vitamin A. These standard errors tended to be smaller than the linearization-based ISU method estimates, suggesting that the estimated standard errors from the ISU method may be too large in the tails of the distribution. A simulation study (described below) was undertaken in part to investigate this hypothesis.
Two simulation studies were performed based on data from the EATS study; the first was based on calcium intake and the second on vitamin A intake. A goal of both studies was to compare percentile estimates. The first study also compared the accuracy of standard error estimates among the methods of interest.
Three hundred data sets were simulated with 500 individuals each. Data were simulated from a normal distribution based on the distribution of transformed calcium in the EATS data, and then backtransformed using the inverse of the Box–Cox transformation with a lambda of 0.3. Simple means were calculated at the individual level for 2 and 1000 days of intake. Truth was obtained from the percentiles of the 1000-d means for all 300 data sets combined. The ISU method and the NCI method were fit to the two days of data, and the means across data sets of the estimated 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles were compared with truth and with the corresponding percentiles of the 2-d means. The NCI method was fit using the Taylor series approximation in the backtransformation. Both the NCI method and the ISU method produced results that were very close to truth (Table IV). The mean percentiles for the 2-d means overestimated the percentage of participants in the tails, but were fairly accurate for the 50th and 75th percentiles. Table IV shows that the NCI method produced less variable estimates than the ISU method, most likely due to the latter’s use of a more complex normality transformation.
Standard errors of the percentile estimates for the NCI method were derived using the bootstrap with 100 replicates. The values obtained from these methods were compared with the standard deviations of the quantile estimates from the 300 data sets. The simulations indicated that the bootstrap produced estimates that were reasonably close to the estimated standard deviation of the percentile estimates for the simulated data sets (Figure 3). The linearization-based standard errors produced by the ISU method do not show a consistent bias: they were too large in the extreme tails and too small for percentiles near the median. This may explain the small linearization-based standard error for the percentage of women with vitamin A intake below 5000 IU in Table III.
Data were simulated from the vitamin A data for women in EATS, using a gamma distribution to simulate data for the model with a log link; the intercept was 8.7, random effect variance 0.37, and scale parameter 1.26. Five hundred data sets of size 500 were simulated. Truth was estimated by 365-d means for all 250 000 simulated individuals. The ISU method and the NCI method were fit to 2 days of data; the nine-point approximation was used for the backtransformation in both methods. The 2-d mean produced mean percentile estimates that were too small for the 5th, 10th, 25th, and 50th percentiles; were close to truth for the 75th percentile; and were inflated for the 90th and 95th percentiles. Both the NCI and the ISU methods produced mean percentile estimates that were above truth for the 5th, 10th, 25th, and 50th percentiles; were close to truth for the 75th percentile; and were below truth for the 90th and 95th percentiles (Table IV).
Standard errors of the percentile estimates for the NCI method were derived using the bootstrap with 100, 200, 300, 400, and 500 replicates. Standard errors of the ISU method were obtained using the default linearization method and from bootstrap samples. The values obtained from these methods were compared with the standard deviations of the quantile estimates from the 500 data sets. The simulations indicated that the bootstrap produced estimates that were reasonably close to the estimated standard deviation of the percentile estimates for the simulated data sets (Figure 4). There was not a large difference in the estimated standard errors among the different numbers of bootstrap replicates; in particular, there was little additional gain in using more than 200 bootstrap replicates. The ISU linearization method produced standard errors that were biased in the tails; however, bootstrap standard errors from the ISU method were similar to those obtained from the NCI method (Figure 4).
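The simple bootstrap used for these standard errors can be sketched generically in Python: resample individuals with replacement, recompute the estimate on each resample, and take the standard deviation of the replicate estimates. In the sketch below the estimator is a stand-in (the 90th percentile of simulated person-level values); in the analyses described here each replicate would instead refit the full model and rerun the Monte Carlo step. The names `bootstrap_se` and `p90` are hypothetical.

```python
import math
import random
import statistics


def bootstrap_se(people, estimator, n_boot=200, seed=0):
    """Bootstrap standard error of estimator(people).

    people: list with one entry per individual (resampling is by person,
    so all of a person's recalls stay together).
    estimator: function mapping such a list to a single number.
    """
    rng = random.Random(seed)
    n = len(people)
    replicates = []
    for _ in range(n_boot):
        resample = [people[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    return statistics.stdev(replicates)


def p90(values):
    """Nearest-rank 90th percentile, used here as a stand-in estimator."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(0.9 * len(ordered))) - 1]


# Simulated person-level values standing in for a fitted quantity.
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(400)]
se_mean = bootstrap_se(sample, statistics.fmean, n_boot=200, seed=2)
se_p90 = bootstrap_se(sample, p90, n_boot=200, seed=2)
```

For the sample mean the bootstrap standard error should be close to the familiar value of the standard deviation divided by the square root of the sample size, which offers a quick sanity check on the resampling code.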
In this paper, the use of the amount part of the NCI method to obtain estimates of the distribution of usual intake of nutrients has been presented and illustrated using examples from the EATS and simulations. The NCI method performs similarly to the ISU method for the example nutrients and simulations considered, and both methods are superior to the simple individual mean. Although we used 24HR as the dietary assessment instrument in these analyses, it would be possible to apply this methodology to other dietary assessment tools that meet the model assumptions, such as repeat weighed food records. Additionally, this method requires at least two dietary assessments on at least a subset of individuals. In this paper, we used four days of measurement from the EATS data because they were available and covered a full year of intake; however, we have also illustrated that the method works well with only 2 days of data.
One limitation of the use of the 24HR is the possible violation of the assumption that it is unbiased for the consumption day amount. Validation studies of the 24HR using biomarkers for total energy (doubly labeled water) and protein (urinary nitrogen) have demonstrated biases in 24HRs [18, 19]. Freedman et al. demonstrated that incorporating adjustment for protein intake can reduce the bias in 24HRs for total energy. Yanetz et al. describe how to use doubly labeled water data in a validation study to adjust the estimated distribution of usual energy intake in a national survey. Unfortunately, currently there are no adequate biomarkers that produce unbiased estimates of true intake for nutrients other than energy, protein, and possibly potassium. However, the requirement of unbiasedness is essential for the suggested methodology. Therefore, the degree to which the estimation of the distribution of usual intake may be biased for most nutrients is unknown. In this paper, we have assumed that the 24HR meets the assumption of unbiased estimation for the nutrients we examined; however, this is a potential limitation of this instrument for use with this model.
One advantage of using the NCI method over the ISU method is the efficiency in making estimates for subgroups, by being able to fit common parameters for the subgroups. It is apparent, however, that in some cases fitting different variance parameters by subgroup may be necessary. Even if different variance components are fit, efficiency may be gained by fitting other common parameters, as was demonstrated in the analyses of the EATS data in which a model with some common parameters was superior to a fully stratified model. It is also possible to compare nutrient intake of population subgroups, adjusted for other factors that affect intake if necessary. In addition, the ability to incorporate covariates into the model allows the analyst to make adjustments simultaneously for day of the week, season, and sequence effects, and individual covariates, using a straightforward extension of the model.
Standard errors of estimates were calculated for the NCI method using the bootstrap method; they were calculated for the ISU method using Taylor linearization assuming that the transformation is fixed and known and, in some cases, using a simple bootstrap. The assumption of a fixed and known transformation leads to standard errors for the ISU method that ignore the variability due to choosing the normality transformation, which may explain some of the negative bias shown over the percentiles near the median in Figure 3. However, it is important to note that the ISU method’s linearization-based standard errors are intended only for analysis of simple random samples. For analysis of complex survey data, alternative estimation methods, such as the bootstrap and balanced repeated replication methods, will most likely be equally suitable for both the NCI and ISU methods.
The NCI method and ISU method performed similarly in estimating the percent of the sample below a cutoff on the nutrients we examined. However, although this paper provides evidence for the utility of the NCI method for nutrient estimation, further investigation is warranted on the comparison of the NCI method with the ISU method. In particular, further investigation into the impact of sample size, how easily data may be transformed to meet the model assumptions, and estimation of standard errors in the tails is needed. One difference we discovered between the two methods was the type of approximation used in the backtransformation. In the simulations based on calcium, the Taylor series approximation worked well; however, for nutrients with a large amount of within-person variation this method does not appear to work as well as the nine-point approximation. Therefore, the nine-point quadrature approximation is recommended for general use.
The NCI method provides a unified framework for estimating the usual intake of dietary constituents, with applications to estimating the distributions of usual intake of foods and nutrients and to relating usual intake to disease. We have shown here that the NCI method, although originally devised to estimate distributions of the usual intake of episodically consumed foods, also provides a flexible framework for estimating the distribution of usual intake of nutrients. It can therefore be used to estimate the distribution of usual intake of any dietary component of interest.