Our statistical model assumes that the 24HR gives an unbiased assessment of an individual’s usual intake and that its errors are independent of the FFQ; without such assumptions, it is impossible to model the usual intakes that are central to our analysis. Data from the National Cancer Institute’s Observing Protein and Energy Nutrition (OPEN) Study, which has reference biomarkers for protein, potassium, and energy intakes, showed that these assumptions do not hold for those nutrients (4). However, the 24HR was substantially less biased than the FFQ and is used regularly in dietary surveillance (26). We also note that if the 24HR is biased for certain dietary intakes, regression calibration will produce biased estimates of the risk parameter of interest. In univariate disease models, the bias would be multiplicative and therefore would not change the relative comparisons of the variability of the estimated risk parameters. However, the statistical power and necessary sample sizes under different strategies would change. We estimated this change for energy-adjusted protein and potassium intakes in the OPEN Study for men and women separately. We applied regression calibration adjustment as described above and then estimated the R² ratios in 2 ways: first, assuming that the 24HR is unbiased, and second, using the unbiased biomarkers. The results are displayed in Web Appendix 8. The R² ratios are almost exactly the same for all nutrients except protein in men, indicating that although a biased 24HR may lead to biased estimates of relative risk, the relative powers and sample sizes displayed in our tables are robust to possible biases in the 24HR.
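The logic behind comparing R² ratios can be sketched with a standard regression-calibration sample-size argument (a stylized illustration; the common bias factor $c$ and the proportionality below are simplifications, not quantities estimated in the study):

```latex
\[
  \hat{\beta} \;\approx\; c\,\beta , \qquad
  n \;\propto\; \frac{1}{c^{2}\beta^{2}R^{2}} ,
\]
% Here $c$ is a multiplicative bias factor induced by a biased 24HR and
% $R^{2}$ is the squared correlation between true intake and its
% regression-calibration predictor. For two design strategies sharing
% the same $c$, the required-sample-size ratio
\[
  \frac{n_{1}}{n_{2}} \;=\; \frac{R_{2}^{2}}{R_{1}^{2}}
\]
% is free of $c$: absolute power and sample size shift with the bias,
% but relative comparisons across designs do not.
```

This is why checking whether the R² ratios change when the unbiasedness assumption is dropped probes the robustness of the relative, rather than absolute, results.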
Although the relative comparisons of variability, statistical power, and necessary sample sizes appear robust to our modeling assumptions, this does not mean that regression calibration using a 24HR resolves the measurement error issue. Use of the 24HR is a step in the right direction, and our results indicate how many such 24HRs are needed, but it remains an interim approach until bona fide biomarkers become available for additional dietary components.
These findings provide insights for designing studies that include a measure of dietary intake, where the goal is to obtain the best possible estimate given participant burden and cost. Recalls can now be inexpensively self-administered electronically, but how many recalls participants can reasonably be expected to complete, and at what level of quality, remains uncertain. One study of volunteers showed a willingness to complete 8 or more nonconsecutive automated recalls, with a general but inconsistent decline in mean energy estimates over successive recalls (16). This decline in energy reporting has also been found with multiple consecutive days of food records (30). To date, the Automated Self-Administered 24-Hour Recall is being used in more than 40 studies in multiple populations; results and experiences from these studies should elucidate issues of data quality, drop-off, and utility. Further studies are planned to compare the Automated Self-Administered 24-Hour Recall with interviewer-administered recalls and to validate it in a controlled feeding study and against recovery biomarkers. Finally, although electronic instruments are widely available, some burden remains for researchers, who must set up their studies and contact, monitor, and support participants across assessments.
Our findings reinforce previous findings (1) indicating that FFQs alone contain substantial measurement error, which may hamper the ability to detect diet-disease associations. However, they also show that FFQs contribute important information when combined with 24HR data, especially for episodically consumed foods or nutrients. The critical question is whether researchers can expect participants to complete an FFQ and 4–6 recalls. Is there a point at which the gain in accuracy is offset by the loss of participants due to excess burden? These questions should be answered by future studies of new dietary assessment technologies such as those mentioned above. Such information will further assist in forming recommendations for measuring dietary intake in future research.
Our findings may aid investigators in refining estimates of how many recalls to administer and in deciding whether to include an FFQ. For nutrients and foods consumed nearly daily by most people, such as fats or carbohydrates, one could argue that adding an FFQ to two 24HRs is worth the burden, but that adding one to 4–6 24HRs is not, given the relatively small improvement expected. However, for an episodically consumed nutrient or food group such as dark green vegetables, an FFQ alone performs better than up to nine 24HRs, so there may be no need to administer any recalls at all. For cohort studies, however, investigators almost never limit the scope to selected foods or nutrients, suggesting the need to include both 24HRs and FFQs.
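The diminishing returns from administering additional recalls can be illustrated with a toy calculation under a classical measurement-error model (the variance components below are hypothetical values chosen for illustration, not estimates from the OPEN Study or from our data):

```python
def r2_k(var_t, var_e, k):
    """Squared correlation between usual intake and the mean of k
    unbiased 24HRs under a classical measurement-error model:
    each recall = usual intake + independent within-person error."""
    return var_t / (var_t + var_e / k)

# Hypothetical variance components: between-person (var_t) and
# within-person (var_e) variance on some transformed intake scale.
var_t, var_e = 1.0, 3.0

# Required sample size scales roughly as 1/R^2, so the gain from
# each extra recall shrinks as k grows.
for k in (1, 2, 4, 6, 9):
    r2 = r2_k(var_t, var_e, k)
    print(f"k={k}: R^2={r2:.2f}, relative sample size={1 / r2:.2f}x")
```

The pattern, not the particular numbers, is the point: most of the improvement in R² comes from the first few recalls, which is consistent with the argument that going from two to 4–6 24HRs buys relatively little for nearly daily consumed nutrients.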
We have not addressed whether food records might substitute for recalls as the primary dietary assessment instrument in large studies. Although records are known to be problematic in terms of reactivity, in theory they do not suffer from the memory limitations of 24HRs (if completed throughout the reporting day rather than at its end). The high costs of administering and coding food records pose a barrier to their use in large studies. However, investigators in 2 different cohort studies have successfully collected baseline food records, later selectively coding and analyzing them within a nested case-control design (8). Both found statistically significant diet-disease relations with food records but not with an FFQ. Although we have analyzed multiple 24HRs here, our statistical methods could, and should, be applied to multiple food records as well. In addition, newer food record tools that include digital photography may greatly improve portion-size estimates and may better facilitate real-time recording. It remains unclear how the strengths, limitations, and biases of each of these tools will affect the bottom line in determining what is optimal in terms of quality, expense, practicality, and burden.