To evaluate the reproducibility and validity of the food-frequency questionnaire (FFQ) used in the California Teachers Study (CTS) cohort and to use these data to quantify the effects of correcting nutrient-breast cancer relative risks for measurement error.
195 CTS cohort members participated in a 10-month dietary validation study that included four 24-hour dietary recalls and pre- and post-study FFQs. Shrout-Fleiss intraclass correlations for reproducibility were computed. Under several standard assumptions concerning the correlations of errors in the FFQs and 24-hour recalls, we calculated energy-adjusted deattenuated Pearson correlations for validity and tested for differences in validity according to a number of demographic and other risk factors. For each nutrient, we compared the performance of the FFQ versus the 24-hour recalls, estimating the number of days of recalls that give the same information about true intake as a single FFQ. Finally, the effects of adjustment for measurement error on risk estimates were evaluated in 44,423 postmenopausal cohort members, 1,544 of whom developed breast cancer during seven years of follow-up. Relative risks (RR) and confidence intervals (CI) were calculated using Cox proportional hazards with and without correction for measurement error.
Reproducibility correlations for the nutrients ranged from 0.60 to 0.87. With a few exceptions, validity correlations were reasonably high (range: 0.55–0.85), including r=0.74 for alcohol. Performance of the FFQ differed by age for percent of calories from fat and by body mass index and hormone therapy use for alcohol consumption. For most nutrients examined, our FFQ is comparable to two to six recalls for each subject in capturing true intake. In the measurement error-adjusted risk analyses, corrected RRs were within 13% of uncorrected values for all nutrients examined except for linoleic acid. For alcohol consumption the corrected RR (per 20g/1000kcal/d) was 1.36 (95% CI: 1.03–1.51) compared to the uncorrected estimate of 1.25 (95% CI: 1.10–1.42).
The FFQ dietary assessment used in the CTS is reproducible and valid for all nutrients except the unsaturated fatty acids. Correcting relative risk estimates for measurement error resulted in relatively small changes in the associations between the majority of nutrients and the risk of postmenopausal breast cancer.
The relationship between diet and disease elicits strong public and scientific interest, but it is challenging to measure and quantify usual dietary intake in epidemiologic studies. The most common tool used in these studies is the food-frequency questionnaire (FFQ). The development of any given FFQ is based on the dietary intake of a defined population during a specific period in time. Thus, when these instruments are to be used in populations outside those for which they were specifically developed, it is important to evaluate their reproducibility and validity in the new target population. Additionally, it is important to understand how measurement error can be introduced when nutrient values are assigned to each food group on a FFQ, which are necessarily averages across the foods included in the group, the effects of preparation, seasonal variation, brand differences, and other sources of variation.
While most studies assessing the validity of FFQs generally report good correlations with alternative dietary assessment methods (e.g., 24-hour recalls or diet records), several nutrients often stand out as being measured particularly well or particularly poorly [1–4]. It is important to understand this variation when interpreting the nutrient-disease relationships observed within each population, particularly when these data are being used for purposes of correcting the nutrient-disease estimates for measurement error [1, 5, 6].
The California Teachers Study (CTS), established in 1995–1996, is a prospective cohort study of 133,479 women who are current, former, or retired public school teachers and administrators. Strengths of this cohort include its wide age range; ethnic, geographic, and socioeconomic diversity; and a comprehensive set of exposure characteristics, including usual diet, that were assessed at baseline and are updated periodically. The baseline dietary assessment used in the CTS was an early version of the Block95 FFQ [8, 9]. While various versions of this FFQ have been validated in other populations [10–13], it is critical to describe the reproducibility and validity of this FFQ within the CTS, given the wide range of nutrient-disease investigations being undertaken in this cohort. We also update (with additional follow-up of the cohort) our previously reported results on the association between dietary intake and breast cancer risk among postmenopausal women in the CTS, applying the validation results to correct the effect estimates for measurement error.
This validation study has been described in detail previously. Briefly, this substudy of the CTS included a random sample of 386 CTS participants, who were 85 years old or younger at baseline and resided in the greater San Jose, California area. Forty-six (12%) of the women sampled were not contacted to participate in the substudy because they had died, moved out of the substudy area, or could not be located. The 340 women who were contacted were asked to participate in a 10-month study that included four 24-hour dietary recalls. The dietary recalls were spaced at three-month intervals beginning in early 2000. The first dietary recall was conducted in person and the remaining three via telephone on a random day within the assigned month. Two self-administered FFQs were completed. The first, covering usual dietary intake during 1999, was left with the participant following the first dietary recall and returned to the study office by mail. At the end of the study, a second FFQ, covering usual dietary intake during 2000, was mailed to the participant. The substudy was approved by the Institutional Review Boards of the Northern California Cancer Center and the California Department of Health Services. Written informed consent covering all phases of the study was obtained prior to the phase I interview. Additional verbal consent was obtained before each telephone recall was conducted (phases II–IV).
Of the 340 women invited to participate, 195 (57%) completed the four dietary recalls, 23 (7%) began the study but dropped out prior to its completion, 108 (32%) declined to participate, and 14 (4%) were not interviewed for other reasons (most notably, illness). Of the 195 women participating in all four dietary recalls, 186 (95%) completed the pre-study FFQ and 183 (94%) completed the post-study FFQ; 178 (91%) women completed all of these assessments.
The dietary recalls involved recording all foods, beverages, and vitamin and mineral supplements that were consumed, when they were consumed, how foods and beverages were prepared, and what was added to the foods or beverages (e.g., fats, salt, spices, and condiments) during the 24-hour period directly preceding the interview. During the in-person recall, portion size was estimated using visual aids, such as standard dinner plates, small wood cubes grouped into four portion sizes, glasses, measuring spoons and cups, several models of meat portions, and 2-dimensional images (actual size) of different portions of bananas, pizza, bread, and several other foods. After the in-person recall, interviewers oriented the participants to two-dimensional portion size pictures that were left with the participant and used during the telephone recalls. Nutritionist V software (First DataBank, Inc., San Bruno, CA) was used to assign nutrient values to all dietary recall data.
The FFQ was an early version of the 103 food-item Block95 questionnaire [8, 9] with the addition of a few phytoestrogen-rich foods, for a total of 113 food and beverage items/groups. Frequency of consumption (categories from never to once/day or 5+/day depending on the item) and usual portion size (small, medium, or large relative to a given standard medium portion) were assessed as was regular use of vitamin supplements (including multi-vitamins with and without minerals and single supplements of vitamins A, β-carotene, C, E, or calcium; for C and E, dose was also assessed). Ancillary questions addressed cooking and consumption practices and preferences, the addition of fats in cooking and at the table, consumption of low-fat versions of selected foods, and overall estimates of fruit, vegetable, cereal, and milk consumption. The FFQs were self-administered and checked by study staff for completeness. FFQ data were assigned nutrient values based on an updated version of the Block95 nutrient database. These updates included incorporating more recent data obtained from Dr. Block and from the USDA (most notably for carotenoids). Vitamin D data from the USDA and manufacturer data were also added.
Chi square and Wilcoxon rank sum tests were used to evaluate differences between the substudy participants and the remainder of the cohort [16, 17]. Reproducibility of the nutrient estimates derived from the pre- and post-study FFQs (often referred to as reliability under the assumption that any reported changes in intake during this period are due to reporting errors rather than true changes in eating habits) was estimated using Shrout-Fleiss intraclass correlation coefficients (rSF). Energy-adjusted (based on nutrient densities) deattenuated Pearson correlation coefficients (rP) were used to assess the validity of the FFQ relative to the average of the 24-hour recalls [1, 5].
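The two correlation measures described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes a balanced design (the same number of repeated measurements per person), assumes the one-way random-effects form of the Shrout-Fleiss coefficient (the text does not say which form was used), and assumes the commonly used deattenuation factor sqrt(1 + (within/between)/k) for the mean of k recalls. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def anova_mean_squares(groups):
    """Between- and within-person mean squares for balanced repeated measures.

    `groups` is a list with one inner list of k repeated values per person.
    """
    n_people, k = len(groups), len(groups[0])
    person_means = [mean(g) for g in groups]
    grand_mean = mean(person_means)
    bms = k * sum((m - grand_mean) ** 2 for m in person_means) / (n_people - 1)
    wms = sum((x - m) ** 2 for g, m in zip(groups, person_means) for x in g)
    wms /= n_people * (k - 1)
    return bms, wms


def icc_oneway(groups):
    """One-way random-effects intraclass correlation (assumed ICC form)."""
    bms, wms = anova_mean_squares(groups)
    k = len(groups[0])
    return (bms - wms) / (bms + (k - 1) * wms)


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def deattenuated_r(ffq, recalls):
    """Correlation of the FFQ with mean recall intake, corrected for
    within-person variation in the recalls (assumed correction factor).

    Undefined when the estimated between-person variance is not positive.
    """
    bms, wms = anova_mean_squares(recalls)
    k = len(recalls[0])
    between = (bms - wms) / k   # between-person variance component
    within = wms                # within-person variance component
    r_obs = pearson(ffq, [mean(r) for r in recalls])
    return r_obs * sqrt(1 + (within / between) / k)
```

For reproducibility, `icc_oneway` would be applied to the pairs of pre- and post-study FFQ nutrient densities; for validity, `deattenuated_r` would be applied to one FFQ and the four recalls. In small samples the deattenuated value can exceed 1, which is a known property of the correction rather than a bug in the sketch.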
Under the same assumptions that allow us to provide deattenuated Pearson correlation coefficients (that the errors in each 24-hour recall are independent of each other and independent of the errors in the FFQ), we also estimated, for each energy-adjusted nutrient, the number of days, n, of recalls that an FFQ is equivalent to in terms of its correlation with true intake. Specifically, we solved for n as
$$n = \frac{r^2}{1 - r^2} \cdot \frac{\mathrm{Var}(\text{one 24hr recall} \mid \text{True Intake})}{\mathrm{Var}(\text{True Intake})} \qquad (1)$$
where $r^2$ is the square of the corrected correlation coefficient between true intake and the FFQ value, and Var(True Intake) and Var(one 24hr recall|True Intake) are estimated as the between-person and within-person variances, respectively, in a one-way random effects ANOVA model of the repeated 24-hour recalls (see Appendix). Using n days of recalls (with n computed from equation 1) to estimate nutrient intake will then give the same correlation between estimated intake and true intake as does using a single FFQ to estimate the same nutrient. In general, the more recalls the FFQ is equivalent to, the more precise the FFQ-estimated nutrient intake is considered to be.
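As a hedged illustration of this calculation, the sketch below assumes equation 1 has the form n = [r²/(1 − r²)] × Var(one 24hr recall | True Intake)/Var(True Intake), which is what results from setting the correlation between true intake and the mean of n independent recalls, sqrt(Var(True)/(Var(True) + Var(recall | True)/n)), equal to r. The function names and the input values in the usage note are hypothetical and are not taken from the study.

```python
from math import sqrt


def equivalent_recall_days(r, var_between, var_within):
    """Days of 24-hour recalls that one FFQ is 'worth'.

    r           -- corrected correlation between the FFQ and true intake
    var_between -- between-person variance, i.e. Var(True Intake)
    var_within  -- within-person variance, i.e. Var(one recall | True Intake)
    """
    if not 0 < r < 1:
        raise ValueError("r must lie strictly between 0 and 1")
    return (r ** 2 / (1 - r ** 2)) * (var_within / var_between)


def recall_mean_correlation(n, var_between, var_within):
    """Correlation between true intake and the mean of n independent recalls."""
    return sqrt(var_between / (var_between + var_within / n))
```

For example, with r = 0.6 and a within-to-between variance ratio of 3, `equivalent_recall_days(0.6, 1.0, 3.0)` gives about 1.7 days, and feeding that n back into `recall_mean_correlation` returns 0.6. A larger within-person variance raises n for a given r, because each single recall is then a noisier measure of usual intake.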
We also evaluated whether the performance of the FFQ differed by subgroups of participants based on race/ethnicity (white, non-white), age (<50, 50–64, ≥65 years), body mass index (BMI; <25, 25–29, ≥30 kg/m2), average lifetime (to age 54) strenuous physical activity (none, <4, ≥4 hrs/wk), and use of hormone therapy (HT) before or at baseline (never, ever). In these analyses the calibration equation (estimated by regression of the average of the four 24 hour recalls against the FFQ values) included these other covariates (race/ethnicity, age, BMI, physical activity, hormone therapy) as potential modifiers of the calibration slope and intercept parameters. For each nutrient we report the results for those covariates significantly (p<0.05) acting as modifiers of the calibration slope as judged by a test of interaction. In general we interpreted higher slopes as being more favorable since, in those groups (defined by the covariates, race/ethnicity, age, BMI, physical activity, HT) with significantly higher calibration slopes, more of the variance of the 24 hour recalls would tend to be explained by a given change in FFQ value.
Finally, the validation results were used to correct for measurement error when evaluating the association between dietary exposures and disease risk. We previously reported on the association between dietary intake and breast cancer risk in the CTS. Here we update those findings for postmenopausal women (based on additional follow-up of the cohort) and apply the calibration equation to correct the estimates for measurement error.
Included in our main regression model were 44,423 cohort members who were postmenopausal at baseline and had complete data on dietary and all confounder variables. Follow-up was complete through December 31, 2002, during which time 1,544 women developed invasive breast cancer. Cox proportional hazards regression, with age as the time scale, was used to obtain the hazard ratios (relative risks) and 95% confidence intervals for breast cancer associated with a change equivalent to approximately the interquartile range observed for each nutrient of interest, using separate analyses for each nutrient. Each analysis of nutrient density (i.e., nutrient/calories) was adjusted for race/ethnicity/birthplace (white, nonwhite and born in North America, nonwhite and born elsewhere), average lifetime (until age 54) strenuous physical activity (0, <4, ≥ 4 hours per week), family history of breast cancer in a first degree relative (yes, no), a history of biopsy-diagnosed benign breast disease (yes, no), age at menarche (≤ 12, ≥13 years), nulliparity/age at first full-term pregnancy (nulliparous, ≤ 24, 25–29, ≥30 years), a variable representing the joint effects of HT use and BMI (kg/m2) (no HT and BMI <25.8, no HT and BMI ≥25.8, estrogen alone ≤ 5 years, estrogen alone > 5 years, estrogen and progestin ≤ 5 years, and estrogen and progestin > 5 years; women using estrogen alone followed by estrogen and progestin were included in the latter two groups based on the total time HT was used), and average alcohol consumption in the year prior to baseline (none, <20 g/d, ≥20 g/d).
The validation regression used in the calibration analyses included a subsample of 155 women from the validation study who had participated in all four 24-hr dietary recalls, had completed the post-study FFQ, and had complete confounder data (as described above) from the baseline survey. We estimated a calibration equation for each nutrient, which regressed the average of the four 24 hour recalls against the FFQ values, adjusting for all confounders. The slope associated with the FFQ term was considered the validation coefficient.
The corrected hazard ratio (relative risk) for each nutrient was calculated by dividing the parameter estimate by the validation coefficient and then exponentiating this ratio. The 95% confidence intervals were estimated using the method described by Stram et al. [1]. This approach is known to correspond to the correction method of Rosner et al. in that it gives the same estimates and nearly the same standard errors of those estimates for small to moderate sized risk estimates. In particular, under our procedure the standard errors increase in proportion to the increase in the risk estimates, so that the significance of a test of the null hypothesis that a nutrient variable is unrelated to risk is not changed by the regression calibration.
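The point-estimate correction described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it rescales the log relative risk and both confidence limits by the same factor, 1/validation coefficient, and so ignores the sampling uncertainty in the validation coefficient itself. It should therefore not be expected to reproduce the confidence intervals reported in this paper. The function name and the example values are hypothetical.

```python
from math import exp, log


def calibrate_relative_risk(rr, ci_low, ci_high, validation_coef):
    """Regression-calibration correction of a relative risk.

    The log relative risk (the Cox parameter estimate) is divided by the
    validation coefficient, i.e. the slope from regressing mean recall
    intake on FFQ intake, and then exponentiated. The confidence limits
    are rescaled on the log scale by the same factor (a simplification).
    """
    if validation_coef <= 0:
        raise ValueError("validation coefficient must be positive")
    scale = 1.0 / validation_coef
    return tuple(exp(log(value) * scale) for value in (rr, ci_low, ci_high))
```

With a hypothetical uncorrected RR of 1.2 (95% CI: 1.0 to 1.44) and a validation coefficient of 0.5, the corrected values are 1.44 (1.0 to about 2.07). A lower limit of exactly 1.0 stays at 1.0, which shows why proportional rescaling leaves the significance test unchanged.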
To evaluate the representativeness of our validation study sample, we compared distributions of selected baseline characteristics for the 195 substudy participants to the remainder of the CTS cohort (age ≤85 years) (Table 1; see also ref 15). Substudy participants were similar to the entire cohort in terms of age, body mass index, physical activity, and broad dietary composition (i.e., average daily caloric intake and the contribution of fat and fruits and vegetables to the diet). Menopausal status at baseline was also comparable for substudy participants compared to the remainder of the cohort (50% vs. 53% being postmenopausal; p=0.61). However, substudy participants were more likely than the remainder of the cohort to be Latina (8% vs. 4%) and less likely to be African American (1% vs. 3%) or white (84% vs. 88%), reflecting the racial/ethnic composition of the substudy area compared to the state as a whole.
Table 2 presents the variation in average daily intake of total calories, macronutrients, fiber, selected micronutrients, and alcohol as measured by the pre- and post-study FFQs and the average of the four 24-hour recalls. Usual intakes during the previous year, as measured by FFQ at the beginning and at the end of the 10-month study period, were generally constant, and correlations reflecting the FFQ’s reproducibility for these nutrients were generally high (ranging from rSF=0.60 for oleic acid to rSF=0.87 for alcohol). The average of all reproducibility correlations was 0.71. Correlations quantifying the validity of the FFQ assessment compared to the average of the four 24-hour recalls were low for caloric intake, linoleic acid, and oleic acid (rP=0.14 to 0.44) but moderate to high for all other nutrients (ranging from 0.57 for protein and vitamin A to 0.85 for β-carotene). The average of all validity correlations was 0.63; excluding caloric and oleic and linoleic acid intake, the average correlation was 0.71. Correlations between the recalls and both the pre-study and post-study FFQs were generally similar for all nutrients.
Multiple 24-hour dietary recalls can be used as a gold standard of dietary assessment, under the assumption that each gives an independent and unbiased estimate of true nutrient exposure. For individual nutrients, the number of recalls needed to achieve the same degree of measurement precision represents a useful metric for evaluating a FFQ; the higher the number of recalls a FFQ represents, the greater its precision for estimating the intake of a given nutrient. For most nutrients, estimates obtained from our FFQ corresponded to approximately two to six recalls (Table 2). Dietary fiber, saturated fat, vitamin A and vitamin C were measured particularly well by our FFQ as they were comparable to more than four recalls. As reflected by the low validation coefficients, our FFQ captured caloric and linoleic and oleic acid intake less well. For each, the FFQ assessment was worth less than one recall.
To understand whether the performance of the FFQ differed by selected characteristics of the participants (i.e., race/ethnicity, age, BMI, physical activity, and HT use), we examined the slope of the calibration equation for each dietary factor (Table 3). There was a significant relationship between age and the slope for percent of calories from fat; however, the trend was not monotonic. For white women with a BMI<25 who did not exercise or use HT, the slopes for ages <50, 50–64, and ≥65 years were 0.97 (= 0.78 + 0.19, i.e., as shown in Table 3, the baseline slope estimate + the estimate for women age <50), 0.36, and 0.78, respectively, evidence that the FFQ captured intake less well for women age 50–64 compared to either younger or older women. Similarly, the slope for β-carotene differed by participant age; BMI modified performance of the FFQ for fiber, vitamins C and D, and alcohol; physical activity influenced the estimates for saturated fat (data not shown), vitamin D, and folate; and the slopes for fiber and alcohol differed by HT use.
Table 4 presents the uncorrected and corrected relative risks for the association between dietary intake and postmenopausal breast cancer risk in the study cohort. For all nutrients except linoleic acid, the corrected risk estimates were within 13% of the uncorrected estimates. The only dietary factor significantly associated with breast cancer risk was alcohol (uncorrected RR=1.25 per 20g/1000kcal/d, 95% CI: 1.10–1.42). Correcting for measurement error provided a relative risk of 1.36 per 20g/1000kcal/d (95% CI: 1.03–1.51) for this association. When simultaneous adjustment for measurement error in caloric intake, alcohol consumption, and each nutrient of interest was made using the methods described by Rosner et al. and Spiegelman et al., corrected relative risks of similar magnitude were observed (data not shown).
These data suggest that the FFQ assessment and its associated nutrient database used in the CTS produce reproducible and valid estimates of nutrient intake. Saturated fat, fiber, β-carotene, vitamins C and E, and alcohol were captured particularly well when expressed as nutrient densities while neither the unsaturated fatty acids nor absolute caloric intake were well measured.
Repeated self-administration of our FFQ 10 months apart resulted in correlations ranging from 0.60 to 0.87 for most nutrients, with an average correlation of 0.71. This is similar to the average correlation observed over a 6-month period in the Iowa Women’s Health Study (IWHS) cohort and higher than reproducibility in the IWHS and the Nurses’ Health Study (NHS) over periods of two and four years, respectively. Validity correlations, rP, based on comparing estimates of intake from our FFQ to those from the alternative dietary assessment method (i.e., the average of four 24-hour dietary recalls over a 10-month period), ranged from 0.55 to 0.85 for most of the nutrients examined. These estimates are consistent with validity estimates for FFQs used in other large cohort studies [1, 2, 4, 20]. In all these studies, as well as ours, validity estimates for absolute caloric intake are relatively poor while the relative composition of the diet can be generally well estimated using nutrient densities. This property of FFQs was elegantly demonstrated by the Observing Protein and Energy Nutrition (OPEN) study, which performed an extraordinarily careful characterization of total energy, protein, and protein density using biomarkers and found that FFQ correlations were poor for absolute measures of protein and energy, but substantially better for protein density [21, 22].
We have presented correlations between our dietary recalls and two FFQs. The timing of each of the two FFQs has complementary strengths and limitations. The pre-study FFQ was not likely to be influenced by the increased awareness of diet stemming from study participation, but it refers to an earlier time period than the recalls. Responses to the post-study FFQ may have been somewhat biased by the awareness resulting from study participation, but the time period covered by this FFQ and the dietary recalls is the same. However, the similarity between the two sets of validity correlations (i.e., the FFQs with the recall data) and the reproducibility coefficients (i.e., comparing estimates from the two FFQs) suggests only minimal biases in this regard.
Our validity coefficients reflect not only the correspondence between the two assessment measures but also between the two nutrient databases used. Caloric and nutrient intake from the 24-hour recalls was based on the brand- and preparation-specific data available in the Nutritionist V database, while the corresponding measures for the FFQs were based on averaged data for foods or food groups. Database differences may have accounted for the higher median β-carotene estimates from the FFQ compared to the dietary recalls and for the relatively poor validity correlations observed for the unsaturated fatty acids. In addition, foods high in these nutrients may be poorly measured by the FFQ or infrequently consumed, resulting in inaccurate estimation from only four 24-hour recalls. However, the relatively high correlations for the majority of the nutrients examined suggest that our FFQ database is accurate and robust for measuring nutrient intake.
While the number of dietary recalls we chose to administer as part of this study was based on a balance between statistical precision and practicality, there is always concern that a relatively small number of recalls can affect the accuracy of estimated intake, especially when intake of the nutrients of interest varies substantially within individuals. To minimize possible misclassification related to the number of recalls, we did not tell participants about the specific nature of the recalls until the first in-person meeting and then conducted the second, third, and fourth dietary recalls on random days (including weekends and holidays) within the specified month. In addition, we used statistical techniques to adjust for the number of recalls that were conducted and to account for the resulting within- and between-person variation.
Unlike prior studies [1, 23, 24], we found no racial/ethnic differences in the performance of our FFQ. This may relate to the highly educated nature of our cohort, where the vast majority of teachers hold a bachelor’s degree or higher. However, we did observe differences for several nutrients by age, BMI, physical activity level, or HT use. For percent of calories from fat and vitamin C we can compare our results directly with those found in the Multiethnic Cohort (MEC). Among women participating in the MEC, young age was associated with better reporting of vitamin C, but not fat, on the FFQ; in our cohort the opposite was observed, i.e., young women reported fat intake better than older women whereas there were no age differences in reporting vitamin C intake. BMI did not affect the FFQ performance in the MEC for either of these nutrients, but in our cohort the FFQ characterized vitamin C intake better in thinner women (BMI <25 kg/m2) than in those who were overweight (BMI ≥30 kg/m2).
A major reason to conduct validation/calibration studies is to obtain the data needed for correction of measurement error in exposure-disease association estimates. We examined both uncorrected and corrected estimates for postmenopausal breast cancer risk, finding that the correction generally resulted in similar point estimates and, as expected, wider confidence intervals.
In summary, our dietary assessment appears to be both reproducible and valid with the observed correlations being similar to those reported by other cohort studies. Correcting relative risk estimates of diet-breast cancer associations for measurement error using the data from this validation/calibration study resulted in relatively small changes (<13%) for most nutrients. Our analyses also confirmed the high measurement quality underlying the observation of increased breast cancer risk associated with greater alcohol consumption, which is the sole dietary factor associated with overall breast cancer risk in this study.
The authors would like to thank Dr. Esther John for her work in compiling the vitamin D nutrient data and the members of the CTS Steering Committee, who are responsible for the formation and maintenance of the cohort within which this study was conducted, but are not authors on the current paper: Hoda Anton-Culver, Rosemary Cress, Dennis Deapen, David Peel, Rich Pinder, Ronald K. Ross (deceased), Dee W. West, William E. Wright, Giske Ursin, and Argyrios Ziogas.
Funding Acknowledgement: This research was supported by grant R01 CA77398 from the National Cancer Institute and contract 97–10500 from the California Breast Cancer Research Fund. The funding sources did not contribute to the design or conduct of the study, nor to the writing or submission of this manuscript. The collection of cancer incidence data used in this study was supported by the California Department of Health Services as part of the statewide cancer reporting program mandated by California Health and Safety Code Section 103885; the National Cancer Institute’s Surveillance, Epidemiology and End Results Program under contract N01-PC-35136 awarded to the Northern California Cancer Center, contract N01-PC-35139 awarded to the University of Southern California, and contract N02-PC-15105 awarded to the Public Health Institute; and the Centers for Disease Control and Prevention’s National Program of Cancer Registries, under agreement #U55/CCR921930-02 awarded to the Public Health Institute. The ideas and opinions expressed herein are those of the authors and endorsement by the State of California, Department of Health Services, the National Cancer Institute, and the Centers for Disease Control and Prevention or their contractors and subcontractors is not intended nor should be inferred.
We define n by equating the correlation, r, between the true nutrient intake and the FFQ with the correlation between true nutrient intake and the mean of n 24-hour recalls (which increases as n increases). Note that this is equivalent to equating Var(True Intake | FFQ value) to Var(True Intake | n 24-hour recalls) since $1 - r^2$ is equal to Var(True|Estimated)/Var(True), i.e., it is the “unexplained” portion of the variance of True not captured by the estimate. If we assume that all variables are distributed normally then we can compute both Var(True Intake | FFQ value) and Var(True Intake | n 24-hour recalls) by a standard variance components analysis.
Let $X_i$ be the true intake of the nutrient for subject i. We fit a random intercept model (as in SAS proc mixed) to the 4 recalls per subject ($R_{ij}$, $j = 1, \dots, 4$, $i = 1, \dots, N$), in which $R_{ij}$ is modeled as
$$R_{ij} = X_i + \varepsilon_{ij}$$
where $\varepsilon_{ij}$ is the random error in our measurement. We can estimate Var(X) as the variance of the random intercept and Var(ε) as the residual variance (i.e., the variance of one 24-hour recall given true intake). By an elementary application of conditional probability for normal random variables, and defining $\bar{R}_i = \sum_{j=1}^{n} R_{ij}/n$, we have
$$\mathrm{Var}(X \mid \bar{R}_i) = \mathrm{Var}(X) - \frac{\mathrm{Var}(X)^2}{\mathrm{Var}(X) + \mathrm{Var}(\varepsilon)/n}$$
Note also that by fitting a second random intercept model
$$R_{ij} = a + b\,\mathrm{FFQ}_i + W_i + \varepsilon_{ij}$$
and equating $X_i = a + b\,\mathrm{FFQ}_i + W_i$ (so that the random intercept W is the portion of true X unexplained by the regression on the FFQ value), we can estimate Var(X|FFQ) = Var(W) as the variance of the random intercept term. Thus, to estimate the number of days of 24-hour recalls that one FFQ is “worth”, we equate $\mathrm{Var}(W) = \mathrm{Var}(X \mid \bar{R}_i) = \mathrm{Var}(X) - \mathrm{Var}(X)^2/[\mathrm{Var}(X) + \mathrm{Var}(\varepsilon)/n]$ and solve for n as
$$n = \frac{\mathrm{Var}(\varepsilon)\,[\mathrm{Var}(X) - \mathrm{Var}(W)]}{\mathrm{Var}(X)\,\mathrm{Var}(W)}$$
Since the square of the corrected correlation between the FFQ and the 24-hour recalls can be written as
$$r^2 = \frac{\mathrm{Var}(X) - \mathrm{Var}(W)}{\mathrm{Var}(X)}$$
(this follows since X − W is the portion of X that is “explained” by the FFQ and $r^2$ is always the ratio of the variance of the “explained” portion of X to the total variance of X), we have
$$n = \frac{r^2}{1 - r^2} \cdot \frac{\mathrm{Var}(\varepsilon)}{\mathrm{Var}(X)}$$
which is identical to equation (1).
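The algebra in this appendix can be checked numerically. The sketch below assumes the three quantities described in the text: Var(X) from the random intercept of the first model, Var(W) from the random intercept of the second model, and Var(ε) as the residual variance. The variance values in the usage note are made up for illustration and the function names are hypothetical.

```python
def conditional_variance(var_x, var_eps, n):
    """Var(X | mean of n recalls) under joint normality."""
    return var_x - var_x ** 2 / (var_x + var_eps / n)


def days_from_variances(var_x, var_w, var_eps):
    """Solve Var(W) = Var(X | mean of n recalls) for n."""
    return var_eps * (var_x - var_w) / (var_x * var_w)


def days_from_correlation(var_x, var_w, var_eps):
    """Same n, written with r^2 = [Var(X) - Var(W)] / Var(X)."""
    r_squared = (var_x - var_w) / var_x
    return (r_squared / (1 - r_squared)) * (var_eps / var_x)
```

With Var(X) = 4, Var(W) = 1, and Var(ε) = 6, both functions return 4.5 days, and `conditional_variance(4, 6, 4.5)` returns 1, the assumed Var(W). This confirms that the variance form and the correlation form of n agree for these inputs.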