Am J Epidemiol. 2010 December 1; 172(11): 1315–1323.
Published online 2010 October 11. doi:  10.1093/aje/kwq284
PMCID: PMC3025629

A Practical Method for Collecting Food Record Data in a Prospective Cohort Study of Breast Cancer Survivors


Multiple-day diet records can be unsuitable for cohort studies because of high administrative and analytical costs. Costs could be reduced if a subsample of participants were analyzed in a nested case-control study. However, completed records are usually reviewed (“documented”) with participants to correct errors and omissions before analysis. The authors evaluated the suitability of using undocumented 3-day food records in 2 samples of women in a Northern California cohort study of breast cancer survivorship (2006–2009). One group of participants (n = 130) received an introduction to the food record at enrollment, while another (n = 70) received more comprehensive instruction. Food records were mailed to participants 6 months later for follow-up and were analyzed as received and after phone documentation. Error rates for adequate completion were high in the first group but substantially lower among persons receiving instruction; prevalences of missing data on serving size and incomplete food descriptions changed from 30% to 4% and from 32% to 6%, respectively (P < 0.0001). Correlations between nutrient intakes calculated from undocumented and documented records were 0.72–0.93 in the first group and were significantly stronger (0.84–0.99) among persons receiving instruction. Documentation had little effect on intraclass correlation coefficients across days, but training increased the coefficients for many nutrients. When participants receive proper instruction, undocumented food records can be satisfactory for large epidemiologic studies.

Keywords: breast neoplasms, cohort studies, data collection, diet records, reproducibility of results

Food frequency questionnaires (FFQs) are commonly used in epidemiologic studies to assess dietary intake because of increased efficiency and low cost, yet in general, multiple-day food records are more precise measures of food intake than FFQs (1). In fact, studies using doubly labeled water and other biomarkers suggest that food records are more valid than FFQs and that the error associated with FFQs is greater than previously estimated (2–6). However, the use of food records in cohort studies is often considered impractical because of the need to conduct a nutritionist-administered review to correct completed records and the prohibitive analytical costs for the entire cohort (1–3). If a food record protocol does not require nutritionist review, a case-cohort or nested case-control study design could be used instead, and only records from a subset of the cohort (i.e., cases and controls) would be needed for data analysis.

To our knowledge, only 1 small study has examined the quality of food records with and without nutritionist review (4), which is a process called documentation. In a random sample of 68 healthy men and women from the Vitamins and Lifestyle (VITAL) cohort study in western Washington State, comparison of undocumented 3-day food records with traditional documented food records yielded comparable mean nutrient intakes (high precision, minimal bias), high nutrient correlations (high validity), and similar within-person day-to-day nutrient variability. We decided to test this validation protocol in a sample of 130 women newly diagnosed with breast cancer and recently enrolled in a prospective cohort study of breast cancer survivors.

Initial results (unpublished data) showed that nutrient values from undocumented food records were less precise and less valid than those from documented records. Therefore, we decided to improve upon the original protocol by training study staff to administer comprehensive, in-person instruction to participants on how to complete a 3-day food record, evaluating this protocol in a unique sample of breast cancer patients enrolled in a study of breast cancer survival. Our aims were 1) to assess the rate of specific errors in the undocumented dietary records, 2) to examine the precision (mean difference and standard error) and validity (correlation) of nutrient intake measures from undocumented records by comparing data from the undocumented records (analyzed as received) with the reference measurement of records documented (corrected) by means of nutritionist interview, and 3) to evaluate whether minimal or comprehensive instruction on completing the food record improved data quality.


The Pathways Study is an ongoing, prospective cohort study that is actively recruiting women recently diagnosed with invasive breast cancer (all stages) from the Kaiser Permanente Northern California patient population (5). The overarching goal of the study is to determine how lifestyle and molecular factors might influence breast cancer recurrence and survival. Written informed consent is obtained from all participants before they are enrolled in the study, typically at the beginning of the in-person baseline interview. The study was approved by the institutional review boards of Kaiser Permanente Northern California and all collaborating institutions.

Data collection

During the baseline interview (2 months postdiagnosis, on average), a series of questionnaires was administered, and information pertinent to this analysis was collected, such as information on demographic factors (race/ethnicity, education, and household income) and self-reported height and weight for calculation of body mass index (weight (kg)/height (m)²). Participants were also asked to complete a baseline 3-day food record. The 3-day food record booklet was designed to be entirely self-administered. It contained instructions for recording food intake (including how to describe food preparation methods, added fats, brand names, and ingredients of mixed dishes and recipes), as well as an example of a correctly completed day's record. The booklet also contained 12 questions on food-use patterns to collect information typically obtained during review of completed food records; these responses were used to assign default values when records were incomplete. The food record can be viewed online.

Six months later, as part of the first follow-up, all participants were mailed another 3-day food record, detailed instructions, and a return envelope. They also received a 12-page serving size booklet containing photographs and other measurement tools to facilitate accurate quantification of foods and beverages consumed. The serving size booklet can also be viewed online. Because of dietary changes that can be caused by effects of adjuvant treatment during the first 6 months after a breast cancer diagnosis, data from food records obtained at the 6-month follow-up, rather than at the baseline interview, were used in this analysis.

Two convenience samples of women were selected from the study population. In the first group (n = 150), the food record was briefly introduced by an interviewer during the baseline interview (hereafter referred to as “minimal instruction”), with only a brief mention to the participant that she would be asked to complete another food record in 6 months as part of her 6-month follow-up. In the second group (n = 76), interviewers were trained to deliver more comprehensive instructions on completing the food record during the baseline interview and repeatedly emphasized to the participant that she would be asked to complete another food record at the 6-month follow-up (hereafter referred to as “comprehensive instruction”). The additional instructions consisted of the following directives: Record 3 nonconsecutive days of food intake, including 1 weekend day; do not change your usual eating habits while keeping the record; describe completely the foods eaten, amounts eaten, and preparation methods (if applicable); and include labels providing nutrition facts (if applicable). The interviewer also went through each page of the booklet with the participant and helped the participant start recording her first day of intake.

Six-month records received within 17 days of the last day the participant recorded food and beverage intake (2 weeks to complete, plus 3 days to mail) were considered sufficiently recent to complete documentation over the phone. Trained staff entered data from these undocumented food records into the Nutrition Data System for Research (version 2008, Food and Nutrient Database 39) (6, 7) using a set of rules to standardize entry of foods with incomplete information. Records were additionally coded for the numbers of foods that were missing the following types of required information: serving size, food description, preparation method, and recipe or mixed food ingredients. An interviewer called the participant to collect missing and incomplete information, and these data were also entered into the Nutrition Data System for Research by staff who were blinded to decisions made during analysis of the undocumented records.
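The 17-day eligibility window described above (2 weeks to complete the record plus 3 days to mail it) can be expressed as a simple date check. This is a minimal sketch, not the study's actual procedure: the function name, its parameters, and the choice to treat day 17 as still eligible are assumptions made for illustration.

```python
from datetime import date

def eligible_for_phone_documentation(last_intake_day, received_day, max_days=17):
    """Return True if the record arrived within max_days of the last recorded intake day.

    max_days=17 mirrors the window in the text (14 days to complete + 3 days to mail).
    Treating exactly 17 days as eligible is an assumption of this sketch.
    """
    elapsed = (received_day - last_intake_day).days
    return 0 <= elapsed <= max_days

# Hypothetical dates, for illustration only
on_time = eligible_for_phone_documentation(date(2009, 3, 1), date(2009, 3, 18))
too_late = eligible_for_phone_documentation(date(2009, 3, 1), date(2009, 3, 19))
```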

Statistical analysis

Error rates (defined as the number of omissions divided by the number of foods subject to that omission) were calculated for each type of missing information for each day's intake. To test whether error rates differed according to age at diagnosis, body mass index, race/ethnicity, education, or household income, we used multivariate weighted linear regression models with daily error rates (as response variables) weighted for the number of foods in each category of error, adjusting for all characteristics of interest. In linear tests for trend, continuous variables (age at diagnosis, body mass index) were modeled with continuous values, and categorical variables (education, household income) were modeled on an ordinal scale. To test whether individual error rates differed between the minimal instruction and comprehensive instruction groups, the data were combined and an indicator variable for group status was included in the multivariate regression model. To assess the impact of recording errors on nutrient estimates, we calculated precision (bias) as the mean difference and standard error between undocumented and documented records, and we computed validity as Pearson's correlation coefficients and Spearman's rank correlation coefficients using log-transformed nutrient values. Therefore, a correlation coefficient of 1.0 would be considered perfect precision (i.e., agreement) between the undocumented and documented nutrient values. We used intraclass correlation coefficients (ICCs) among 3 days of food records to compare day-to-day variation in nutrient intake between the undocumented and documented records. Fisher's z transformation tests were used to compare correlation coefficients and ICCs (8). All reported P values are 2-sided and were considered statistically significant at P < 0.05.
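As an illustration of the precision and validity measures defined above, the following sketch computes the mean difference and its standard error between undocumented and documented values, and a Pearson correlation on log-transformed values. The function names and the example intakes are hypothetical, and the sketch assumes strictly positive nutrient values (how zero intakes, such as alcohol for nondrinkers, were handled before log transformation is not stated in the text).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def precision_and_validity(undocumented, documented):
    """Return (mean difference, standard error, Pearson r on log values).

    Differences are undocumented minus documented, one pair per participant.
    Values must be strictly positive because of the log transformation.
    """
    diffs = [u - d for u, d in zip(undocumented, documented)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd = math.sqrt(sum((x - mean_diff) ** 2 for x in diffs) / (n - 1))
    se = sd / math.sqrt(n)
    r = pearson([math.log(u) for u in undocumented],
                [math.log(d) for d in documented])
    return mean_diff, se, r

# Hypothetical energy intakes (kcal/day) for 5 participants
undoc = [1800, 2100, 1650, 2400, 1950]
doc = [1750, 2080, 1600, 2350, 1900]
mean_diff, se, r = precision_and_validity(undoc, doc)
```

A positive mean difference here means the undocumented record gave a higher estimate than the documented one, which is the direction reported for energy intake in the Results.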


In the minimal and comprehensive instruction groups, respectively, 130 (87%) out of 150 women (19 ineligible, 1 refused) and 70 (92%) out of 76 women (5 ineligible, 1 refused) returned a completed 6-month food record within 17 days of the last intake day and were successfully contacted for review of their food record within 5 days. The main reason for ineligibility was inability to reach the participant for review of her food record within the allotted time period of 5 days. We made an exception for 3 participants in the first group and 1 participant in the second group who were contacted up to 14 days after the record was received, since their records were returned well below the average time for their respective groups. The mean time to return of the food record to the study office was 9.1 days (standard deviation (SD), 3.3) in the first group and 8.5 days (SD, 3.3) in the second group, while time needed to reach the participant for the phone review was 1.7 days (SD, 2.1) in the first group and 2.4 days (SD, 2.3) in the second group. This analysis was limited to the 130 and 70 women with complete data from the minimal and comprehensive instruction groups, respectively. Participants were similar to the overall Pathways cohort (5) by being predominantly white (minimal instruction group, 85%; comprehensive instruction group, 86%), having at least some college education (minimal instruction group, 87%; comprehensive instruction group, 90%), and more often being overweight or obese (minimal instruction group, 65%; comprehensive instruction group, 66%).

Table 1 gives numbers of omissions and error rates in the 3-day food records. Participants reported consuming an average of 15.7 and 15.9 foods per day in the minimal and comprehensive instruction groups, respectively. In the minimal instruction group (top portion of Table 1), error rates were high, and the most common error was missing preparation method (62%), followed by missing description of food item (32%), missing serving size amount (30%), and missing recipe or mixed food ingredients (26%). After interviewer training (comprehensive instruction group), all measures of quality improved: serving size amount (4%), food item description (6%), preparation method (40%), and recipe or mixed food ingredients (23%) (bottom portion of Table 1).
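The error rates in Table 1 follow the definition given under Statistical analysis: the number of omissions divided by the number of foods subject to that omission. A minimal sketch of that calculation for one participant-day is shown below; the counts and dictionary keys are hypothetical, and returning `None` when no foods are subject to an omission is an assumption of this sketch.

```python
def error_rate(omissions, foods_subject):
    """Omissions divided by the number of foods subject to that omission."""
    if foods_subject == 0:
        return None  # no food that day could have had this kind of omission
    return omissions / foods_subject

# Hypothetical counts for one participant-day:
# (number of omissions, number of foods subject to that omission)
day = {
    "serving_size": (5, 16),
    "food_description": (4, 16),
    "preparation_method": (3, 6),   # only foods that require preparation
    "recipe_ingredients": (1, 4),   # only recipes and mixed foods
}

rates = {kind: error_rate(missed, subject)
         for kind, (missed, subject) in day.items()}
```

Note that the denominators differ by error type: every food needs a serving size and a description, but only some foods need a preparation method or a list of ingredients.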

Table 1.
Mean Numbers of Foods and Omissions per Person per Day in Self-Administered 3-Day Food Records in Participant Subgroups Receiving Minimal and Comprehensive Instruction, Pathways Study, Northern California, 2006–2009

Table 1 also gives the associations between error rates and participant characteristics in the 2 groups. In the minimal instruction group (top portion of Table 1), women aged 70 years or more were more likely to miss serving sizes (P = 0.0025), nonwhite women were more likely to miss preparation method (P = 0.019), and increasing household income was inversely related to missing recipes or ingredients in mixed foods, with a borderline statistically significant trend (P = 0.009 for $50,000–$89,999 and P = 0.088 for ≥$90,000; P for trend = 0.061). In addition, less-educated women had more missing data on preparation method (for women with a high school education, P = 0.032; P for trend = 0.018). None of the error rates differed by body mass index. In the comprehensive instruction group (bottom portion of Table 1), women aged 70 years or more consistently had higher error rates: missing serving size amount (borderline statistically significant; P = 0.062), missing food description (P = 0.012; P for age trend = 0.020), missing preparation method (P = 0.033), and missing recipe or mixed food ingredients (P = 0.0017). Although the results were only borderline significant, women with body mass indices of 25–29 also had more missing food descriptions (P = 0.066), and women with annual household incomes of $50,000–$89,999 had more missing recipe or mixed food ingredients (P = 0.062). None of the error rates differed by race/ethnicity or education. Overall, the minimal instruction group had significantly higher error rates than the comprehensive instruction group, except for missing recipe or mixed food ingredients.

Table 2 gives coefficients for the correlation between undocumented and documented food records in both the minimal and comprehensive instruction groups for a variety of macro-, micro-, and fat- and water-soluble nutrients of most interest for cancer research. In the minimal instruction group, agreement was moderate, ranging from 0.72 for trans-fatty acids to 0.93 for alcohol. Agreement was significantly higher in the comprehensive instruction group for each nutrient examined, ranging from 0.84 for trans-fatty acids to 0.99 for alcohol. For example, the correlations for percentage of calories from fat, vitamin D, and calcium increased from 0.79 to 0.92 (P < 0.001), from 0.85 to 0.96 (P < 0.0001), and from 0.86 to 0.96 (P < 0.0001), respectively. Interestingly, after documentation in both the minimal and comprehensive instruction groups, reductions in estimated energy intake of 33.7 kcal/day and 44.4 kcal/day, respectively, were observed. This difference may be explained, to a large extent, by reductions in estimated fat intake of 4.0 g/day and 3.6 g/day, respectively (saturated fatty acids by 1.5 g/day and 1.0 g/day, monounsaturated fatty acids by 1.6 g/day and 0.7 g/day, polyunsaturated fatty acids by 0.6 g/day and 1.6 g/day, and trans-fatty acids by 0.5 g/day and 0.6 g/day). These changes indicate that the fat content of foods could possibly be emphasized when instructing participants on completing the food record in order to improve dietary assessment.
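The comparison of correlation coefficients between the two groups can be illustrated with the standard Fisher z test for two independent correlations. This is a sketch under stated assumptions: the function name is hypothetical, the group sizes of 130 and 70 are taken from the text, and whether the authors used exactly this form of the test is not confirmed by the source.

```python
import math

def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test of equality for two independent Pearson correlations.

    Each r is transformed with Fisher's z (atanh); the difference is divided
    by its standard error, sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    Returns (z statistic, two-sided P value).
    """
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z2 - z1) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Percentage of calories from fat: r = 0.79 (n = 130) vs. r = 0.92 (n = 70)
z, p = fisher_z_test(0.79, 130, 0.92, 70)
```

For these inputs the statistic is roughly 3.4 and the two-sided P value falls below 0.001, in line with the P < 0.001 reported for percentage of calories from fat.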

Table 2.
Pearson and Spearman Coefficients for Correlation Between Undocumented and Documented Food Records in Participant Subgroups Receiving Minimal and Comprehensive Instruction, Pathways Study, Northern California, 2006–2009

Table 3 gives ICCs for correlation among the 3 days of recorded intake between the undocumented and documented food records in the minimal and comprehensive instruction groups for the same nutrients as presented in Table 2. Overall, in both the minimal and comprehensive instruction groups, documentation did not substantially affect the ICCs. However, ICCs were generally and substantially higher in the comprehensive instruction group than in the minimal instruction group, with the exception of lycopene. For example, the documented ICCs for energy, fat, and carbohydrate increased from 0.46 to 0.69 (P < 0.0001), from 0.32 to 0.61 (P < 0.0001), and from 0.46 to 0.67 (P < 0.0001), respectively.
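To make the ICC concrete, the sketch below computes a one-way random-effects ICC from a participants-by-days table of nutrient values. The text does not state which ICC form was used, so the one-way estimator, the function name, and the example data are all assumptions made for illustration.

```python
def icc_oneway(data):
    """One-way random-effects ICC for a list of rows (participants),
    each row holding k repeated measurements (days).

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-person and within-person mean squares from a one-way ANOVA.
    """
    n = len(data)
    k = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(data, row_means)
                    for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical fat intakes (g/day) for 4 participants over 3 recorded days
fat = [
    [60, 65, 62],
    [80, 78, 85],
    [45, 50, 48],
    [70, 66, 72],
]
icc_fat = icc_oneway(fat)
```

An ICC near 1 means that most of the variation lies between participants rather than between days within a participant; values such as 0.32 for fat in the minimal instruction group indicate substantial day-to-day variability.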

Table 3.
Intraclass Correlation Coefficients for Correlation Among 3 Days of Recorded Dietary Intake Between Undocumented and Documented Food Records in Participant Subgroups Receiving Minimal and Comprehensive Instruction, Pathways Study, Northern California, ...


This study of women recently diagnosed with breast cancer found that the quality of data from undocumented food records was acceptable as long as participants received adequate instruction. There were strong correlations between nutrients calculated from documented 3-day food records and undocumented 3-day food records, which were uniformly larger among women who received comprehensive instruction on how to complete the record as compared with those receiving minimal instruction. Similarly, errors such as missing data on portion size or preparation method were common in women receiving minimal instruction and were substantially lower among those receiving comprehensive instruction. Finally, documentation had no effect on the within-person day-to-day variability of nutrients among the 3 days of records; however, the ICCs for many nutrients were higher among the women who received comprehensive instruction than among those who did not.

In a previous assessment of food record quality among 68 healthy men and women in the VITAL Study (4), error rates and measures of precision were largely similar to our results, particularly those for our comprehensive instruction group. VITAL participants reported consuming an average of 17.6 foods per day, of which 3% were missing portion sizes and 8% were incompletely described, while in our comprehensive instruction group, women reported consuming 15.9 foods per day, of which 4% were missing portion sizes and 6% were incompletely described. When comparing mean nutrient intakes between the undocumented and documented food records in the VITAL Study, the Pearson correlation coefficients ranged from 0.87 to 1.00, while the correlation coefficients were 0.84–0.99 in our comprehensive instruction group. Overall, the ICCs comparing undocumented and documented records in both studies were similar.

We found that comprehensive instruction at entry into the study markedly improved the quality of the follow-up undocumented food record, and our protocol for delivering this instruction focused on the study staff. The training of 11 field interviewers who conducted the baseline interview occurred over a 2-month period during the summer of 2008. Initially, all staff completed a 3-day food record based on the written instructions in the food record booklet, and then the completed booklets were mailed for data entry and review by trained nutritionists. Written feedback was provided to staff on errors such as common omissions or the need for clarification of portion sizes and recipes. A conference call was subsequently arranged between the field staff and nutritionists to review samples of completed food records of varying quality, provide additional feedback, and answer any remaining questions. A new introductory script for the food record was also discussed, finalized, and distributed for use. Field observations of all staff using the new script were later conducted with appropriate feedback. A possible reason why the quality of the food records included in this analysis improved with additional instruction (i.e., in the comprehensive instruction group) is that field staff reviewed the food record in detail with the participant at the beginning of the baseline interview, including helping the participant to start recording her first day of intake, and then reiterated the instructions at the end of the interview. Overall, training interviewers to deliver more comprehensive instructions to participants before they complete the food record is more efficient than postinterview documentation, as well as feasible in studies with in-person participant contact. While it was outside the scope of the present study, a logical and informative next step would be a cost analysis comparing the 2 methods.

There were several limitations to this study. As is true for all cohort studies, participants in the Pathways Study and this small validity study were a volunteer sample characterized by their willingness to complete an extensive questionnaire. The overall response rate in the cohort was approximately 51% (5), yet selection bias was probably minimal, because participation in a cohort study is generally not jointly affected by exposure and future (unknown) disease, or in this context disease outcome. We also only examined 2 sources of measurement error, eliminating in-person instruction and postrecord review of completed food records. We did not address errors due to having only 3 days of food records, behavior change due to record-keeping, other potential inaccuracies in recording food intake, or possible differences in dietary patterns between documented and undocumented records, all of which were beyond the scope of this study. Additionally, participants were not randomly assigned to receive minimal or comprehensive instruction. Instead, comprehensive instruction was instituted to improve data quality after evaluating the results from the first group receiving little instruction, which resulted in approximately a 2-year difference in data collection period between the groups. The recorded intake dates for minimal and comprehensive instruction ranged from January 2007 to July 2007 and from March 2009 to July 2009, respectively. Thus, the groups were not strictly comparable; however, they were similar in terms of demographic characteristics and seasonality of data collection, and we have no reason to believe that the second group's better performance in keeping the food records was due to reasons other than the comprehensive training. Finally, because of limited statistical power, we were unable to test potential interactions such as those between participant characteristics and level of training.

Notably, our results are not directly generalizable to studies using undocumented food records with more than 3 days of records. We do not think that results would change substantially with additional recorded days up to 7 days, but we have no data with which to address this possibility. In addition, our results can only be applied directly to diseased female subjects recently enrolled in a cohort study.

Nonetheless, we have shown that food records can be a reasonable and practical alternative for dietary assessment in large cohort studies, such that epidemiologists need not always rely on the more popularly used FFQ. Furthermore, many scientists believe that FFQs are not generating sufficiently accurate dietary data to support research hypotheses (9–11), such as the association between dietary fat and breast cancer risk (12, 13). In the Pathways Study, FFQ data are also collected at the same longitudinal time points as administration of the 3-day food record. We plan to conduct future analyses examining the role of dietary factors of interest, such as fat, on breast cancer prognosis (and other endpoints such as quality of life and comorbid conditions) while comparing dietary assessments from FFQs with those from food records. These analyses aim to provide further insight into the current controversy surrounding use of FFQs in epidemiologic studies.

In conclusion, with prior proper instruction, undocumented 3-day food records are an acceptable approach to collecting dietary intake data, even in diseased study populations such as women recently diagnosed with breast cancer. This streamlined protocol is feasible and should be considered for large epidemiologic cohort studies in which higher-quality measurement of food intake is desired. Furthermore, when food records are used in the context of nested case-control analyses such that completed records can be stored for future analyses as needed, the quality of research on diet and disease outcome could be substantially improved.


Author affiliations: Division of Research, Kaiser Permanente Northern California, Oakland, California (Marilyn L. Kwan, Lawrence H. Kushi, Jun Song, Allegra W. Timperi); Nutrition Assessment Shared Resource, Fred Hutchinson Cancer Research Center, Seattle, Washington (Alanna M. Boynton, Karen M. Johnson, Judi Standley); Department of Epidemiology, School of Public Health and Community Medicine, University of Washington, Seattle, Washington (Alan R. Kristal); and Cancer Prevention Program, Fred Hutchinson Cancer Research Center, Seattle, Washington (Alan R. Kristal).

This work was supported by the National Cancer Institute (grant R01 CA105274 to L. H. K. and grant P30 CA15704 to A. R. K.) and the American Cancer Society (grant RSG-06-209-01-LR to M. L. K.).

The authors thank Dr. Charles P. Quesenberry, Jr., for biostatistical support.

The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the funding agencies.

Conflict of interest: none declared.



Abbreviations
FFQ: food frequency questionnaire
ICC: intraclass correlation coefficient
SD: standard deviation
VITAL: Vitamins and Lifestyle


1. Armstrong BK, White E, Saracci R. Principles of Exposure Measurement in Epidemiology. (Monographs in Epidemiology and Biostatistics) New York, NY: Oxford University Press; 1994.
2. Kristal AR, Shattuck AL, Williams AE. 17th National Nutrient Databank Conference. Baltimore, MD: International Life Sciences Institute; 1994. Food frequency questionnaires for diet intervention research; pp. 110–125.
3. Willett WC. Nutritional Epidemiology. New York, NY: Oxford University Press; 1998.
4. Kolar AS, Patterson RE, White E, et al. A practical method for collecting 3-day food records in a large cohort. Epidemiology. 2005;16(4):579–583.
5. Kwan ML, Ambrosone CB, Lee MM, et al. The Pathways Study: a prospective study of breast cancer survivorship within Kaiser Permanente Northern California. Cancer Causes Control. 2008;19(10):1065–1076.
6. Schakel SF, Buzzard IM, Gebhardt SE. Procedures for estimating nutrient values for food composition databases. J Food Compost Anal. 1997;10:102–114.
7. Schakel SF, Sievert YA, Buzzard IM. Sources of data for developing and maintaining a nutrient database. J Am Diet Assoc. 1988;88(10):1268–1271.
8. Konishi S, Gupta AK. Testing the equality of several intraclass correlation coefficients. J Stat Plan Inference. 1989;21:93–105.
9. Kristal AR, Peters U, Potter JD. Is it time to abandon the food frequency questionnaire? Cancer Epidemiol Biomarkers Prev. 2005;14(12):2826–2828.
10. Kristal AR, Potter JD. Not the time to abandon the food frequency questionnaire: counterpoint. Cancer Epidemiol Biomarkers Prev. 2006;15(10):1759–1760.
11. Willett WC, Hu FB. Not the time to abandon the food frequency questionnaire: point. Cancer Epidemiol Biomarkers Prev. 2006;15(10):1757–1758.
12. Bingham SA, Luben R, Welch A, et al. Are imprecise methods obscuring a relation between fat and breast cancer? Lancet. 2003;362(9379):212–214.
13. Freedman LS, Potischman N, Kipnis V, et al. A comparison of two dietary instruments for evaluating the fat-breast cancer relationship. Int J Epidemiol. 2006;35(4):1011–1021.
14. Foster-Powell K, Holt SH, Brand-Miller JC. International table of glycemic index and glycemic load values: 2002. Am J Clin Nutr. 2002;76(1):5–56.
