The HEI-2005 component scores are based upon ratios of reported intakes of food groups or nutrients to reported total energy intake. Estimating the distributional properties of ratios is always more challenging than estimating those of single variables. Ratios may be complicated by measurement errors and other variation in the denominator values, and by correlations between the denominator values and the ratio of the numerator to the denominator [5]. For example, an individual’s energy intake on a specific day is often positively correlated with his/her fat-to-energy ratio on that day.
Such complications are compounded by the HEI-2005 component scores themselves, which are non-linear functions of a ratio, due to the truncation imposed at the minimum and maximum scores. This non-linearity can lead to bias even when the ratio itself is estimated without bias. For example, consider the Whole Fruits component and imagine an individual who consumes exactly 2000 kcal (8368 kJ) consistently each day. Suppose this individual consumes 1 cup-equivalent (240 mL) of whole fruit on half the days, but none on the other half. Then the mean or “usual” ratio for the individual is 0.25 cup-equivalents (60 mL) per 1000 kcal (4184 kJ), leading to a score of 5 × 0.25/0.4 = 3.125, where 0.4 is the truncation point for the maximum achievable score of 5. If we now determine the mean of the ratios over several days, we will obtain over the long-term the correct 0.25 cup-equivalents (60 mL) per 1000 kcal (4184 kJ) (since energy intake is constant). If we determine means of the scores on individual days, however, then over the long-term we will obtain a minimum score of 0 on half the days and a maximum score of 5 on the other half, giving a mean of 2.5, and not the true value of 3.125.
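The arithmetic of this example can be checked with a short sketch. The function name `score_whole_fruit` and the 100-day alternating pattern are illustrative assumptions, not part of the HEI-2005 specification. The scoring rule is the linear rule described above, truncated at 0 and at 5.

```python
def score_whole_fruit(ratio, standard=0.4, max_score=5.0):
    """Score a ratio in cup-equivalents per 1000 kcal, truncated to [0, max_score]."""
    return max(0.0, min(max_score, max_score * ratio / standard))

# Hypothetical individual: 2000 kcal every day, 1 cup-equivalent of whole
# fruit on half of the days and none on the other half.
energy_kcal = 2000.0
fruit_cups = [1.0, 0.0] * 50  # 100 days, alternating

daily_ratios = [cups / (energy_kcal / 1000.0) for cups in fruit_cups]

# Score of the mean ratio: average the ratios first, then score (about 3.125).
mean_ratio = sum(daily_ratios) / len(daily_ratios)
score_of_mean_ratio = score_whole_fruit(mean_ratio)

# Mean of the daily scores: score each day, then average (about 2.5).
daily_scores = [score_whole_fruit(r) for r in daily_ratios]
mean_of_daily_scores = sum(daily_scores) / len(daily_scores)

print(score_of_mean_ratio, mean_of_daily_scores)
```

The gap between the two results arises entirely from the truncation: a day with 0.5 cup-equivalents per 1000 kcal is capped at 5 rather than scoring 6.25.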
These complications make it impossible to predict analytically which of the three proposed estimates is likely to be the least biased. This suggests that the surest way of investigating the matter is through computer simulation. Based on the results in –, the least biased of the three methods for estimating the population’s mean usual HEI-2005 component scores is the score of the population ratio.
Our conclusion is that one should estimate the population’s mean usual HEI-2005 component scores by calculating the score of the population ratio, that is, by taking the score of the ratio of the total food/nutrient intake to the total energy intake. Nevertheless, this conclusion has some caveats. The conclusion is empirically driven and depends on the US distributions of reported intakes of the components included in the HEI-2005, as well as on the standards by which the HEI-2005 component scores are determined.
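To make the distinction between the three estimates concrete, the following sketch computes each from one day of intake per person. The definitions of the mean score and of the score of the mean ratio are our reading of their names, and the function names, the 0.4 standard, the maximum of 5 and the four-person data set are illustrative assumptions only.

```python
def truncated_score(ratio, standard=0.4, max_score=5.0):
    """Linear score of a ratio per 1000 kcal, truncated to [0, max_score]."""
    return max(0.0, min(max_score, max_score * ratio / standard))

def three_estimates(intakes, energies_kcal, standard=0.4, max_score=5.0):
    """Return (mean score, score of mean ratio, score of population ratio)."""
    ratios = [x / (e / 1000.0) for x, e in zip(intakes, energies_kcal)]
    n = len(ratios)
    # 1. Mean score: score each person's ratio, then average the scores.
    mean_score = sum(truncated_score(r, standard, max_score) for r in ratios) / n
    # 2. Score of the mean ratio: average the individual ratios, then score.
    score_of_mean_ratio = truncated_score(sum(ratios) / n, standard, max_score)
    # 3. Score of the population ratio: total intake over total energy, then score.
    population_ratio = sum(intakes) / (sum(energies_kcal) / 1000.0)
    score_of_population_ratio = truncated_score(population_ratio, standard, max_score)
    return mean_score, score_of_mean_ratio, score_of_population_ratio

# Hypothetical one-day reports for four people (cup-equivalents, kcal).
intakes = [1.0, 0.0, 0.5, 0.0]
energies_kcal = [2000.0, 1500.0, 2500.0, 2000.0]
estimates = three_estimates(intakes, energies_kcal)
print(estimates)
```

Even in this small made-up data set the three estimates differ, because truncation is applied at a different stage in each and because energy intake varies between individuals.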
We have found in a sensitivity analysis that our conclusion is robust to the sampling errors involved when estimating the parameters from the sample of 738 women participating in EATS. The results are reported in on-line Appendix C. We have also examined distributions of intake reported by men in the EATS study and by women in the Continuing Survey of Food Intakes by Individuals, 1994–96 [8]. Although we have not modeled these data in the same depth as the data on the women in EATS, we obtained a strong impression that the distributional characteristics were very similar in the three groups (allowing for different levels of absolute intake) and would lead to the same conclusions presented here.
Nevertheless, we are aware that substantial changes in intake distributions or in the scoring standards could change the conclusions. For example, while developing the details of this work, we noticed that changes in the chosen standards for the scores could change the performance of the three methods that we examined.
It is important to check that the data used for calculating the population’s mean usual HEI scores are representative of the usual intake of the population, even if usual intake cannot be assessed in the individual participants. To make inferences about the US population, this requires that the data come from a nationally representative sample and that the dietary reports are collected for all seven days of the week, with proportional representation of weekend days, weekdays and seasons of the year. If probability samples rather than simple random samples are used, then the appropriate weights must be employed when the population ratios of the total food/nutrient intake to total energy intake are estimated. It is also advisable that the sample is quite large, in the order of 1000 individuals or more, to ensure that the standard errors of the estimates are relatively small.
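As a minimal sketch of how sampling weights enter the population ratio, the weighted totals of intake and energy can be formed before dividing. The function name and the two-person data set are hypothetical; in practice a survey package that also accounts for the sample design would be used.

```python
def weighted_population_ratio(intakes, energies_kcal, weights):
    """Weighted total intake divided by weighted total energy (per 1000 kcal)."""
    total_intake = sum(w * x for w, x in zip(weights, intakes))
    total_energy = sum(w * e for w, e in zip(weights, energies_kcal))
    return total_intake / (total_energy / 1000.0)

# Hypothetical sample: the second respondent represents three times as many
# people in the population as the first.
intakes = [1.0, 0.0]            # cup-equivalents
energies_kcal = [2000.0, 2000.0]
weights = [1.0, 3.0]

ratio = weighted_population_ratio(intakes, energies_kcal, weights)
print(ratio)  # about 0.125 cup-equivalents per 1000 kcal
```

Ignoring the weights here would give 0.25 instead, which illustrates why unweighted totals can misstate the population ratio under unequal selection probabilities.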
As mentioned above, we are confident that our conclusion holds true for currently available US population data. However, we are not so sanguine with regard to minority subpopulations of the US, nor with regard to populations in other countries. We recommend that researchers interested in HEI-2005 component scores in these populations carry out a similar exercise to that reported here, simulating data that follow intake distributions reported in the population of interest. Until evidence emerges for the superiority of another estimate, the score of the population ratio would seem to be the best choice in such cases. We also recommend that periodic checks be carried out to confirm that this measure remains optimal for the US population because intake distributions may change.
With the caveats mentioned, we recommend estimating the population’s mean usual HEI-2005 component scores by the score of the population ratio. Constructing a (two-sided) 95% confidence interval for this measure is recommended over estimating a standard error, as the sampling distribution may be asymmetric. A 95% confidence interval for a component score can be constructed using standard survey packages in the following manner. First, determine the confidence interval for the associated population ratio with the package, and then score the end points of the interval. A precision measure for the total HEI-2005 score (the sum of the 12 component scores) is more difficult to develop. An algorithm is given in on-line Appendix D.
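The second step of the component-score interval, scoring the end points, can be sketched as follows. The ratio limits used here are invented for illustration and stand in for the output of a survey package; the function names and the 0.4 standard with a maximum of 5 are likewise assumptions. The end points are sorted after scoring so that the sketch also covers any component whose score decreases as the ratio increases.

```python
def truncated_score(ratio, standard=0.4, max_score=5.0):
    """Linear score of a ratio per 1000 kcal, truncated to [0, max_score]."""
    return max(0.0, min(max_score, max_score * ratio / standard))

def score_confidence_interval(ratio_lower, ratio_upper, score_fn):
    """Score both end points of a ratio interval and return them in order."""
    a, b = score_fn(ratio_lower), score_fn(ratio_upper)
    return min(a, b), max(a, b)

# Hypothetical 95% limits for the population ratio, as a survey package
# might report them (cup-equivalents per 1000 kcal).
ratio_lower, ratio_upper = 0.20, 0.30
score_lower, score_upper = score_confidence_interval(
    ratio_lower, ratio_upper, truncated_score
)
print(score_lower, score_upper)  # about 2.5 and 3.75
```

Because the scoring function is monotone, the scored end points retain the coverage of the ratio interval, although the resulting interval need not be symmetric about the point estimate.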
Our main comparison of the three estimators was based on their biases and not on their standard errors. We considered the standard errors of the estimators to be of secondary importance to the bias, because in the relatively large samples that we envisage the bias will dominate the error of the estimate, especially in this case where the biases are often large. To check this further, we computed from our simulation (under the assumption of a varying probability of consumption that is correlated with amount of intake on consumption days) the standard errors of the three estimates that would be expected from a sample of 1000 individuals. The means of the standard errors taken over the 12 components were 0.09 for the mean score, 0.18 for the score of the mean ratio, and 0.14 for the score of the population ratio, compared with mean absolute biases of 0.73, 0.66 and 0.37, respectively. More details may be found in on-line Appendix E.
Nutritional survey data sometimes include repeated dietary assessments on all or a subset of participants. Such repeat assessments allow statistical modeling of within-person variation and offer the possibility of reducing the bias in estimating the population distribution of usual intakes by using statistical modeling [8]. A future research aim will be to extend such methods to estimate the US population distribution of the usual HEI-2005 component scores. It is clearly advantageous to be able to estimate the full distribution rather than just the mean. Furthermore, if this can be implemented successfully, it would be a short further step to estimate the population mean directly from these distributions. In principle, estimates of the population mean derived in this manner should have minimal bias and could, therefore, be an improvement over the best method when one 24HR is available, namely the score of the population ratio. Currently, the score of the population ratio should be regarded as the principal method for estimating the population mean usual HEI-2005 component and total scores.