The performance of the dietary questionnaire used in a multiethnic cohort study in Hawaii and Los Angeles was assessed in a calibration substudy that compared diet reported from the questionnaire with three 24-hour dietary recalls. For the calibration substudy, subjects from each of eight subgroups defined by sex and ethnic group (African-American, Japanese-American, Latino, and White) were chosen randomly from among the cohort members, and each participant’s previous day’s diet was assessed by telephone recall on three occasions over approximately 2 months. After completing the three 24-hour recalls, each calibration subject was sent a second questionnaire; 1,606 persons completed three recalls and a second questionnaire (127 to 267 per ethnic-sex group). This report describes correlation coefficients and calibration slopes for the relation between the 24-hour recalls and second questionnaire values for a selected set of macro- and micronutrients, as absolute intakes, nutrient densities, and calorie-adjusted nutrients. In all subgroups, estimates of the correlation between the questionnaire and 24-hour recalls were greater after energy adjustment (average correlations ranged from 0.57–0.74 for nutrient densities and from 0.55–0.74 for calorie-adjusted nutrients) than when absolute nutrient values were used (average range 0.26–0.57). For absolute nutrient intakes, the correlations were greatest for Whites, somewhat lower for Japanese-Americans and Latinos, and lowest for African-Americans. After energy adjustment, the differences between subgroups were diminished, and the correlations were generally highly satisfactory.
Disease risk estimates derived from cohort or case-control studies of diet are generally smaller than are corresponding risk estimates from ecologic analyses (1, 2). However, while ecologic studies suffer from inadequate control of confounders and other well-known limitations, estimates of risk within cohorts can be substantially attenuated by measurement errors in assessing individual diets (3). Measurement error is particularly a problem in cohort studies that include subgroups for whom the questionnaire may perform differently, because this complicates inter-group comparisons. A calibration substudy, in which data from diet history questionnaires are compared with information from a second source that is assumed to provide an unbiased estimate, allows for correction of risk estimates for measurement error. We report here on the results of a calibration substudy for a recently established cohort study that includes several different ethnic groups. The two principal purposes of conducting the calibration study were: 1) to provide information that will ultimately be used for correction of risk estimates obtained from analysis of nutritional risk factors as cancer incidence is observed prospectively in this new multiethnic cohort; and 2) to evaluate the performance of the questionnaire, by correlational analysis, for each ethnic group under study.
The Hawaii-Los Angeles Multiethnic Cohort Study of diet and cancer is based on the prospective follow-up of over 215,000 adults who were aged 45–75 years at enrollment and who completed a mailed survey instrument that included a quantitative food frequency questionnaire (QFFQ) designed to assess typical food intake of individual males and females in five principal ethnic groups—African-Americans, Japanese-Americans, Latinos, Native Hawaiians, and Whites—and who lived in Hawaii and Los Angeles, California. This paper includes four of the five groups. Native Hawaiians were added to the cohort after the other groups; hence, their calibration data are not as yet complete. The cohort includes substantial numbers of members from each ethnic group. This adds power to the statistical assessment of the relation between diet and cancer, by augmenting the variability of individual dietary habits within ethnic group with the variability between ethnic groups in mean consumption of nutrients and particular food items. Primary issues arising in the use of this cohort to assess cancer risk include the comparability of risk estimates across ethnic groups and the extent to which between-ethnic group differences in cancer rates are explained by the between-ethnic group differences in diet.
The standard approach to the validation and calibration of a dietary questionnaire compares the questionnaire intakes with daily intakes for a representative sample of subjects. In this calibration study, we compared the QFFQ intakes with intakes from three 24-hour telephone recalls of the previous day’s diet. The statistical method used in this comparison is that of regression-calibration (3–5), in which the average of the 24-hour recall intakes is modeled as a function of the QFFQ intake.
The correlation between nutrient intakes from the 24-hour recalls and nutrients derived from the QFFQ is used as a standard measure of validity of the questionnaire. Under the assumption that each of the 24-hour recalls provides an unbiased estimate of a single day’s diet, a regression-calibration equation can be used as a prediction equation for imputing an estimated true consumption given the QFFQ value. These imputed values, when used in an analysis of the relation between disease and nutrient intake, can eliminate (or at least substantially reduce) certain types of measurement error biases that would otherwise affect the risk estimates (3–5).
The QFFQ we used was developed for use in the multiethnic cohort and includes many ethnic food items. Of particular interest in this study is the comparison of the performance of the QFFQ in the comparatively unstudied populations of African-Americans, Japanese-Americans, and Latinos with its performance in Whites. Because we anticipated that our QFFQ might perform differently by ethnic group and sex, we designed our calibration study with enough subjects from each ethnic-sex group to provide well-estimated correlations and regression-calibration equations for each ethnic-sex subcohort. The general criterion used for determining the number of subjects was that the variability in the calibration slope estimate should not add importantly to the variance of corrected risk estimates ultimately obtained from the main study.
Our objective was to produce a self-administered diet history applicable across the ethnic groups. Particular attention was paid to wording and format to ensure that the QFFQ could be understood and completed by persons with less than a high school education.
We developed the QFFQ as a modification of the face-to-face interview method used for many years by our group in Hawaii. In this method, usual frequencies and amounts of foods consumed during the past year are obtained, using photographs showing three portion sizes to estimate quantitative intakes of most foods. To develop the self-administered modification of this method, we initially collected 3-day measured food records from about 60 men and women, aged 45–75 years, from each ethnic group in Hawaii and Los Angeles. These records were utilized to identify food items for inclusion in the questionnaire. For each ethnic group, the percent contribution of the individual foods was computed for each dietary component of major interest (e.g., fat, dietary fiber, vitamin A, carotenoids, and vitamin C). The foods were compiled in a list sorted in order of importance, and the first set of foods contributing at least 85 percent of the nutrient intake was selected. Because foods that met this criterion for any ethnic group were included, the questionnaire actually accounts for much more than 85 percent of the intakes of the major nutrients. Lastly, specific food items uniquely associated with the traditional diets of a particular group were included irrespective of their contribution to intake (e.g., ham hocks for African-Americans, tofu and salted fish for Japanese-Americans, and tamales for Latinos). Similar foods with comparable nutrient composition were grouped into food items.
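The selection rule described above can be sketched in a few lines of Python. This is an illustrative reading rather than the procedure actually used: the function names, the fat contributions, and the decision to stop at the first food that brings the cumulative share to 85 percent are all assumptions made for the example.

```python
def select_foods(contributions, threshold=0.85):
    """Return the ranked foods whose cumulative share first reaches `threshold`.

    `contributions` maps a food name to the amount of one nutrient it
    supplied across the food records of a single ethnic group.
    """
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for food, amount in ranked:
        if running >= threshold * total:
            break  # the foods already chosen cover the target share
        selected.append(food)
        running += amount
    return selected


def questionnaire_items(groups, threshold=0.85):
    """Union of the foods that meet the criterion in any ethnic group."""
    items = set()
    for contributions in groups.values():
        items.update(select_foods(contributions, threshold))
    return items


# Hypothetical fat contributions (grams) for two groups; not study data.
fat = {
    "group_a": {"beef": 50, "rice": 30, "tofu": 15, "butter": 5},
    "group_b": {"cheese": 60, "beef": 30, "bread": 10},
}
print(sorted(questionnaire_items(fat)))
```

Because the final list is the union over groups, the foods retained cover more than 85 percent of intake within any single group, which is the point made in the text.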
We chose a single questionnaire for all ethnic groups rather than separate questionnaires for three reasons: 1) The instrument we have used for many years in face-to-face interviews, from which the present questionnaire is adapted, was developed for and validated in multiethnic populations. 2) Although ethnic diets are distinct, many ethnic foods are eaten sufficiently often by all groups in our population to necessitate their inclusion in all of the questionnaires; thus, although frequency of consumption and quantity consumed would differ by ethnic group, the food items listed in the different questionnaires would have a great deal of overlap, thereby reducing the value of creating ethnic-specific instruments. 3) We did not know with certainty the ethnicity of many subjects when we mailed the questionnaire to them, and we would have risked having an individual respond to an inappropriate questionnaire if the QFFQ was ethnic-specific.
The questionnaire includes eight frequency categories for foods and nine for beverages to permit adequate specificity in defining daily intakes. For food items, the highest frequency category is ≥2 times/day, whereas, for beverages, the highest category is ≥4 times/day.
The 3-day measured food records noted above revealed wide variation in amounts consumed within and between ethnic groups. These differences were utilized to identify three typical serving sizes for each category of food, so that the respondent could choose the size that most often typified his/her usual serving. Because it is difficult to visualize a particular serving size for some items (e.g., 3 oz (85 g) of beef), we printed photographs in the questionnaire illustrating three serving sizes of selected food groups. We also collected information on additions to breads, coffee, and tea, and on intakes of condiments. Modifications to nutrients for selected items were made based on the subject’s use of various fats and oils in cooking and on his/her usual practice of eating the fat on meats and the skin on chicken.
It was considered likely that the QFFQ would assess dietary intake differently in each ethnic-sex group. The calibration study was therefore designed to include subjects from each of the ethnic-sex groups, and in essence a separate calibration study was performed for each of these subgroups.
For each ethnic-sex subgroup, a random sample of approximately 260 subjects was targeted for inclusion in the calibration study, based on lists of cohort members. An initial 24-hour dietary recall (of the previous day’s diet) was performed after telephone contact had been established and then two more recalls were taken on randomly selected days of the week approximately one month apart. Participating subjects were not informed in advance when the 24-hour dietary recall was to occur. Interviewers were registered dietitians, two working at each location (Hawaii and Los Angeles) who were trained in eliciting 24-hour dietary information. The interviewers were instructed to give an assessment of the quality of each 24-hour recall at the time that it was performed, based on their impressions of how forthcoming the subject had been during the interview.
An additional QFFQ identical to the initial questionnaire was mailed to the calibration study subjects approximately 4–6 weeks after the three 24-hour recalls were completed. This second questionnaire covers the reference period of the 24-hour recalls. In addition, the calibration substudy required several years to complete, making the time between administration of the initial questionnaire and the 24-hour recalls vary greatly across calibration substudy participants. Therefore, in this analysis, nutrients derived from the second QFFQ were compared with the corresponding 24-hour recall values, and subjects who did not return the second QFFQ were excluded.
The same basic food composition data were used to compute nutrients from the 24-hour recalls and the QFFQ. However, the food composition data were modified to account for the grouped foods in the QFFQ. We prepared a customized, and in part ethnic-specific, food composition table for the cohort QFFQ. To develop this database, we utilized the three 24-hour recalls from the first 1,362 participants in the calibration study. Individual foods from the recalls and their frequencies were classified according to the listing of the food items in the QFFQ. Several items, such as coleslaw and cauliflower, were listed as single foods on the QFFQ. However, other QFFQ items were composites, e.g., “pizza” included many different types. Final nutrient composition values for each composite food item were weighted averages of the nutrients for the component foods, where the weights were determined by the frequencies of consumption reported in the recalls.
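The frequency weighting used for composite items can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name and the two pizza component values are hypothetical and are not taken from the study's food composition table.

```python
def composite_nutrient(components):
    """Frequency-weighted mean nutrient value for a composite QFFQ item.

    `components` is a list of (recall_frequency, nutrient_value) pairs,
    one pair per individual food grouped under the item.
    """
    total_freq = sum(freq for freq, _ in components)
    if total_freq == 0:
        raise ValueError("no recall reports for this composite item")
    return sum(freq * value for freq, value in components) / total_freq


# Hypothetical example: two pizza types reported 30 and 10 times in the
# recalls, with 10.0 and 14.0 g of fat per serving respectively.
pizza_fat = composite_nutrient([(30, 10.0), (10, 14.0)])
print(pizza_fat)  # (30*10.0 + 10*14.0) / 40 = 11.0
```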
The questionnaire food consumption database has separate records for meats trimmed of all visible fat and for chicken with skin removed. Based on a cohort member’s response as to their usual practice in regard to fat/skin removal, the appropriate food composition value was used. Similarly, the respondent’s usual preference for a specific type of oil (e.g., sunflower), butter, or margarine (e.g., whipped) was used to assign the appropriate nutrients to these items. A few composite food items in the questionnaire were assigned different weighting of the nutrients depending on ethnicity. For instance, the item “corn tortillas, corn muffins, or corn-bread” was assigned a different nutrient composition for Latinos, where corn tortillas were given a higher weight (based on the 24-hour recall data). The food composition data were based primarily on US Department of Agriculture values (6, 7), with supplementation from laboratory analyses (unpublished data) and other research and commercial publications (8–10). In addition to the values of energy and macro- and micronutrients, we included values for carotenoids (11), tocopherols, isoflavonoids, and individual fiber components (unpublished data). The food composition database includes over 1,500 foods and 700 recipes, which were developed from the food records and 24-hour recalls, as well as from established cookbooks.
The standard assumption behind the use of 24-hour recalls to assess the correlations between the QFFQ-based intakes and true nutrient intakes, and to make predictions of true intakes based on the QFFQ intakes, is that each 24-hour recall provides an unbiased estimate of the true consumption of a nutrient (or of some aspect of diet, such as percent calories from fat). For a given nutrient, let $x_i$ indicate the true long-term intake of subject $i$, and let $z_{ij}$ be his/her intake estimated from the $j$th 24-hour recall ($j = 1, 2, 3$). We use the standard method for calculating the correlation, $R_{xQ}$, between $x_i$ and the nutrient value from the QFFQ, $Q_i$ (3–5). Letting $\bar{z}_i$ be the average of the three 24-hour recall nutrient values, we first calculate the correlation, $R_{zQ}$, between $\bar{z}_i$ and $Q_i$. $R_{xQ}$ is then estimated from $R_{zQ}$ by adjusting for the fact that $\bar{z}_i$ is based on only three recall values rather than a large number. The adjustment formula is
$$R_{xQ} = R_{zQ}\sqrt{1 + \frac{\sigma_w^2/\sigma_b^2}{m}},$$
where $\sigma_w^2$ is the within-person variance of the 24-hour recalls, $\sigma_b^2$ is the between-person variance of the 24-hour recalls, and $m$ is the number of recalls per person, here 3. The quantities $\sigma_w^2$ and $\sigma_b^2$ are estimated from a simple variance components analysis of the within- and between-person variability of the 24-hour recalls. The method can be straightforwardly generalized to cover the situation where each QFFQ does not have the same number of associated 24-hour recalls. The $R_{xQ}$ correlations are referred to as “corrected correlations”.
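As an illustration of the correction, the sketch below estimates the two variance components by a balanced one-way analysis of variance and applies the usual deattenuation form, $R_{xQ} = R_{zQ}\sqrt{1 + (\sigma_w^2/\sigma_b^2)/m}$. The function names are hypothetical, the balanced-design estimator is an assumption (the text says only that a simple variance components analysis was used), and the numbers in the usage lines are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def variance_components(recalls):
    """Return (within, between) variance estimates for balanced recalls.

    `recalls` is a list of per-subject lists, each holding the same
    number m of 24-hour recall values.
    """
    n, m = len(recalls), len(recalls[0])
    means = [sum(r) / m for r in recalls]
    grand = sum(means) / n
    msw = sum((z - mu) ** 2 for r, mu in zip(recalls, means) for z in r) / (n * (m - 1))
    msb = m * sum((mu - grand) ** 2 for mu in means) / (n - 1)
    return msw, (msb - msw) / m


def corrected_correlation(recalls, qffq):
    """Correlation of mean recall with QFFQ, corrected for having m recalls."""
    m = len(recalls[0])
    means = [sum(r) / m for r in recalls]
    s2_within, s2_between = variance_components(recalls)
    if s2_between <= 0:
        raise ValueError("between-person variance estimate is not positive")
    return pearson(means, qffq) * sqrt(1 + (s2_within / s2_between) / m)


# Invented data: four subjects, three recalls each, one QFFQ value each.
recalls = [[10, 12, 14], [20, 22, 24], [30, 32, 34], [40, 42, 44]]
qffq = [1, 2, 3, 5]
print(corrected_correlation(recalls, qffq))
```

The correction factor is always at least 1, so the corrected correlation is never smaller in magnitude than the observed one; with noisy recalls it can exceed 1 in small samples.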
In our correlational analyses, we estimate separate correlations for each of the eight ethnic-sex subgroups. These analyses were carried out for each nutrient by considering the nutrients in absolute terms, as nutrient densities (nutrient/total calories – alcohol), and as nutrient values adjusted for calories based on the method of residuals (3).
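The two energy-adjusted forms can be sketched as below. This is an illustration only: the function names and data are hypothetical, the `energy` argument stands for whatever energy denominator the analysis uses (here presumably calories excluding alcohol), and adding back the intake predicted at mean energy is a common convention for the residual method that the text does not spell out.

```python
def nutrient_density(nutrient, energy):
    """Nutrient per unit of energy for each subject."""
    return [n / e for n, e in zip(nutrient, energy)]


def residual_adjusted(nutrient, energy):
    """Energy-adjust nutrient intakes by the method of residuals.

    Each value is the residual from the least-squares regression of
    nutrient on energy, plus the intake predicted at mean energy.
    """
    k = len(energy)
    mean_e = sum(energy) / k
    mean_n = sum(nutrient) / k
    sxx = sum((e - mean_e) ** 2 for e in energy)
    slope = sum((e - mean_e) * (n - mean_n) for e, n in zip(energy, nutrient)) / sxx
    # For a least-squares fit, the prediction at mean energy is mean_n.
    return [n - (mean_n + slope * (e - mean_e)) + mean_n
            for n, e in zip(nutrient, energy)]


# Invented data: fat (g) and energy (kcal) for three subjects.
energy = [1000.0, 2000.0, 3000.0]
fat = [40.0, 90.0, 110.0]
print(nutrient_density(fat, energy))
print(residual_adjusted(fat, energy))
```

By construction the residual-adjusted values are uncorrelated with energy in the sample, which is why the two adjusted forms usually behave similarly in correlational analyses.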
For calibration, we assume the linear model $E(x_i \mid Q_i) = a + bQ_i$ and estimate $a$ and $b$ by ordinary regression of $\bar{z}_i$ on $Q_i$. In this situation, the slope term, $b$, is key to measurement error correction of the relations between disease outcome and QFFQ estimates of nutrient intake in the entire cohort. In its simplest form, the error correction of a log relative risk estimate derived from the main cohort analysis of disease and nutrient intake from the QFFQ is accomplished by dividing the estimate by $b$ (4, 5).
We estimated separate intercept, a, and slope, b, parameters for each of the ethnic-sex subgroups. Only the slope parameters are needed to combine estimates of nutrient effect on disease risk across subgroups, but the intercept parameters are required in addition for the comparison of true intakes between subgroups. We performed additional analyses to investigate whether the slope, b, depends on other factors beyond sex and ethnic group, such as age, educational level, and body mass index (BMI). Such analyses of the calibration study data may be important in later work with the cohort, in order to assess the comparability of risk estimates across these variables as well.
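The calibration step and the simplest form of the correction can be sketched as follows. The function names and numbers are hypothetical, and the example assumes a single subgroup with a linear relation between mean recall intake and QFFQ intake.

```python
from math import exp, log


def calibration_fit(qffq, recall_means):
    """Ordinary least squares of mean recall intake on QFFQ intake.

    Returns the intercept a and slope b of the calibration equation.
    """
    k = len(qffq)
    mq = sum(qffq) / k
    mz = sum(recall_means) / k
    sxy = sum((q - mq) * (z - mz) for q, z in zip(qffq, recall_means))
    sxx = sum((q - mq) ** 2 for q in qffq)
    b = sxy / sxx
    return mz - b * mq, b


def impute_true_intake(q, a, b):
    """Predicted true intake for a subject with QFFQ intake q."""
    return a + b * q


def corrected_log_rr(log_rr, b):
    """Simplest correction: divide the observed log relative risk by b."""
    return log_rr / b


# Invented data: percent of calories from fat for four subjects.
a, b = calibration_fit([20, 30, 40, 50], [25, 30, 35, 40])
print(a, b)                                # intercept 15.0, slope 0.5
print(exp(corrected_log_rr(log(1.2), b)))  # observed RR 1.2 becomes about 1.44
```

With a slope of 0.5, dividing the log relative risk by b is equivalent to squaring the relative risk, which shows how strongly attenuation can mask an association.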
Consider the contribution to the variance of an error-corrected log relative risk estimate due to sampling error in our estimate of slope in the calibration study. In its simplest form, the corrected log relative risk estimate may be written as $\hat{\beta}/\hat{b}$, where $\hat{\beta}$ is the log of the uncorrected relative risk estimate and $\hat{b}$ is the estimate of the calibration slope $b$. For a calibration study with a very large number of subjects, there is no error in the estimation of $b$ so that the variance of the corrected estimate is then
$$\operatorname{Var}(\hat{\beta})/b^2.$$
The ratio of the variance of the corrected estimate based on a finite sized (denoted $n$) calibration study to this value is
$$\frac{b^2\operatorname{Var}(\hat{\beta}/\hat{b}_n)}{\operatorname{Var}(\hat{\beta})}. \tag{1}$$
Under independence of $\hat{\beta}$ and $\hat{b}_n$, equation 1 can be written, after a little algebra, as
$$b^2\left\{\left[E(1/\hat{b}_n)\right]^2 + c\operatorname{Var}(1/\hat{b}_n)\right\}. \tag{2}$$
If we approximate $E(1/\hat{b}_n)$ as $1/b$ and $\operatorname{Var}(1/\hat{b}_n)$ as $\operatorname{Var}(\hat{b}_n)/b^4$, then we see that equation 2 is approximately equal to
$$b^2\left\{\frac{1}{b^2} + c\,\frac{\operatorname{Var}(\hat{b}_n)}{b^4}\right\}, \tag{3}$$
which simplifies to
$$1 + c\,\frac{\operatorname{Var}(\hat{b}_n)}{b^2}, \tag{4}$$
where $c = 1 + [E(\hat{\beta})/SE(\hat{\beta})]^2$. The square root of equation 4 approximates the inflation in the size of confidence intervals for the true log relative risk which is due to correction using only a finite number of subjects in the calibration study. Note that the ratio of expectation to standard error, i.e., $E(\hat{\beta})/SE(\hat{\beta})$, determines the power of the main study to detect a nonzero log relative risk, using the uncorrected estimate. Based on standard sample size considerations, for a main study that is designed to have well specified type I and II error properties,
$$E(\hat{\beta})/SE(\hat{\beta}) = z_{1-\alpha/2} + z_{1-\gamma},$$
where $z_{1-\alpha/2}$ and $z_{1-\gamma}$ are the standard normal quantiles corresponding to the two-sided type I error rate $\alpha$ and the type II error rate $\gamma$.
Thus, for a study with 80 percent power to reject the null hypothesis at the 5 percent level of significance (two-sided), $c$ must be set to $1 + (1.96 + 0.84)^2 = 8.84$.
Note also that the estimate of $b/SE(\hat{b}_n)$ (the inverse of which appears, squared, in equation 4) is the usual $t$ statistic for testing for a nonzero calibration slope, so that equation 4 can be written as $1 + c/t^2$. If $t > 6.5$ in the calibration study, then for a reasonably sized main study (one with approximately 80 percent power to detect an effect), from equation 4 the inflation in the confidence interval for true log relative risk due to errors in the calibration study estimate will be less than 10 percent. If $t = 9.3$, then the inflation factor is 5 percent, and if $t = 4.5$, the inflation is 20 percent. We report below the range of values of $t$ observed in the calibration study for the eight different sex-ethnic group combinations, and use this statistic as a performance criterion for judging the adequacy of the sample size in the calibration.
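The inflation figures quoted above can be checked numerically. The sketch below takes the variance ratio to be $1 + c/t^2$, with $c = 1 + (z_{1-\alpha/2} + z_{\text{power}})^2$; this reading of equation 4 is an assumption, adopted because it reproduces the three values given in the text. The function name and default arguments are hypothetical.

```python
from math import sqrt


def ci_inflation(t, z_alpha=1.96, z_power=0.84):
    """Approximate confidence-interval inflation for a corrected log RR.

    `t` is the t statistic of the calibration slope. The defaults
    correspond to a two-sided 5 percent test with 80 percent power.
    """
    c = 1 + (z_alpha + z_power) ** 2   # 8.84 for the default values
    return sqrt(1 + c / t ** 2)


for t in (4.5, 6.5, 9.3):
    print(t, round(ci_inflation(t), 3))
```

Under the same assumptions, solving $\sqrt{1 + c/t^2} \le 1 + \delta$ for $t$ gives the slope $t$ statistic needed to hold the inflation below a chosen fraction $\delta$.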
Calibration study members completed their initial QFFQs over the period from May 1993 to January 1996. The 24-hour recalls were collected from March 1994 to June 1997, and the second questionnaires were collected from May 1994 to June 1997. Table 1 gives participation rates in the calibration substudy. Because the 24-hour recalls were done by phone, subjects had to be reachable by this mechanism. Our main losses to participation were caused by cohort members whose phone numbers were unavailable. Of those we were able to contact, the majority (range 76–90 percent) agreed to participate and completed all three recalls. However, many of the subjects failed to return the second QFFQ, with response rates ranging from 52 percent to 84 percent. Approximately 95 percent of 24-hour recalls were assessed as being reliable. No unreliable recalls were used in analysis. Additionally, persons were excluded if they had fewer than two reliable recalls or if their calorie intakes from the second QFFQ were outside the range of 500–8,000 kcal per day.
Although a substantial proportion of the originally selected subjects were not included in the calibration analyses, the final sample was reasonably representative of the entire cohort. Table 2 gives background information (age, body weight, BMI, education, and smoking status) on the subjects retained in the analysis compared with all cohort members, by sex and ethnicity. The calibration subjects who completed the second QFFQ and who were retained in the analysis well mirror the full cohort for age, weight, BMI, and smoking status. However, they are somewhat better educated than the full cohort. This implies that it may be especially important, here and in future work, to investigate whether the performance of the questionnaire in assessing diet varies by educational attainment.
The average 24-hour recall estimates of total energy intake were 1,811 kcal/day for males and 1,428 kcal/day for females averaged over all the ethnic groups. While these estimates are low even in comparison with what may be expected in sedentary older populations (12), the impact of underreporting on the 24-hour recalls on calibration slopes and correlations is presently unknown (see Discussion).
Tables 3 and 4 give corrected correlations for the regression of mean 24-hour intakes on the QFFQ intakes by ethnic-sex group for 10 selected dietary components. Nutrient densities generally have better correlations than do absolute values of nutrients. Nutrient densities and calorie-adjusted nutrient values give very similar results, as would be expected.
The subgroups with the highest average correlations for absolute intakes in tables 3 and 4 are White males and females (respective R’s = 0.57 and 0.48); the subgroups with the lowest average correlations are African-American males and females (respective R’s = 0.30 and 0.26). For nutrient densities, all groups have relatively high average correlations, varying from a low of 0.57 in Latino females to a high of 0.74 in White females.
Tables 5 and 6 show the slope estimates from the calibration equations for males and females, respectively. The extent to which these slopes differ from 1.0 reflects the amount of adjustment needed to “correct” the observed QFFQ nutrient intakes for measurement error. For example, based on the results in table 6, African-American and Latino women will require the greatest adjustment to their QFFQ fat density values.
Table 7 gives the overall between-ethnic group significance test results for differences in the slope estimates for nutrient densities. Significant differences (p < 0.05) in the regression slopes of the calibration equations are found between ethnic groups for total calories and vitamin A in males, where the slopes are lower for African-Americans and Japanese-Americans, respectively. For females, significant differences were found for protein, fat, saturated fat, and carbohydrate, generally resulting from the higher slopes in Whites.
The values for $t = \hat{b}/SE(\hat{b})$ ranged from 1.6 to 12.6 for the nutrients and sex-ethnic group combinations considered in the previous tables, with a median value of 5.7. The median value corresponds to an inflation in confidence intervals for the true log relative risk, attributable to variability in the calibration slope estimates, of 13 percent. The estimates of the performance criterion, t, showed patterns similar to the results for the corrected correlations. In general, the t values were smaller for total calories (median 3.8) and absolute nutrients (median 4.4) than for nutrient densities (median 6.8).
As an illustration of further analyses that may be required for a full discussion of questionnaire performance and between-ethnic group comparisons, tables 8 and 9 expand the calibration model for two nutrients: percent of calories from fat, and density of vitamin C. A significant relation between educational level and the slope of the calibration equation for fat density was seen in males, with slopes generally increasing with years of education. However, the relation was not monotonic; for example, slopes for White men in the highest age and BMI groups with <11 years of education, 11–12 years, some college, and college degrees were, respectively, −0.008 (= 0.183 + (−0.191)), 0.135, 0.507, and 0.183. The difference between ethnic groups in the slopes for fat density for women remained significant after adjustment for the other variables.
The calibration slopes for both nutrients in both sexes were larger among subjects with BMI ≤ 29 kg/m² than among more obese subjects, and the difference was significant in males. This finding indicates that any future analyses of the joint relation between nutrients and BMI and risk of specific cancers may need to control carefully for differences in the performance of the questionnaire by BMI group.
Our correlation measures compare favorably, and those for nutrient densities compare very favorably, with those of other investigators and with our previous work. In the Nurses’ Health Study (13), values ranged from 0.26 for vitamin A without supplements to 0.73 for total vitamin C (unadjusted for energy), based on 28 days of food records for each subject. We report (table 5) for White females correlations that ranged from 0.38 for protein to 0.66 for vitamin C. For energy-adjusted nutrient intake among the nurses, Willett (3) reported correlations from 0.47 to 0.59 for protein, fat, saturated fat, and carbohydrate, compared with our range among White females of 0.65 to 0.59. The correlations for protein and fat were 0.33 and 0.39. Pietinen et al. (14) reported the following correlations: total fat, 0.60; vitamin A, 0.51; and vitamin C, 0.59. We report a wider range of correlations here; however this may be expected given that we are reporting values for a total of eight different ethnic-sex groups. In previous work in Hawaii, a preliminary version of this questionnaire was validated against 28 days of food records (15), in White, Native Hawaiian, Japanese, Chinese, and Filipino ethnic groups. Correlations for total fat, vitamin A, and vitamin C ranged from 0.21 to 0.76 for these groups, a range which is quite similar to our current results for Whites and Japanese-Americans. The large multi-centered European Prospective Investigation into Cancer and Nutrition (EPIC) (16) has recently reported pilot-phase corrected correlation coefficients for energy-adjusted nutrient residuals, which may be compared with the results presented here. These correlations appeared to vary over a roughly similar range as seen in our results. For example, the correlations for energy-adjusted fat intake ranged from 0.09 to 0.87 in the EPIC study, compared with the range of 0.52 to 0.76 reported here. 
Our poor correlations for calories are in concordance with the poor correlations that have been observed (28) between questionnaires, daily food records, and physical or biologic measurements of total energy expenditure (e.g., doubly labeled water).
Based on between-group differences in the slope estimates in calibration equations, there were significant differences by ethnicity in the performance of the questionnaire in characterizing intake (table 7). These differences highlight the importance of performing separate calibration studies for each of the different ethnic-sex groups. The calibration slopes shown in tables 5 and 6 are needed as a part of the evaluation of the consistency of effects of diet on cancer risk between the ethnic groups, as described previously by Kaaks et al. (17).
In our data, the QFFQ appeared to characterize nutrient intakes somewhat better in Whites, and especially in White females, than in the other ethnic groups. This is particularly true for absolute values, but also true for energy-adjusted values. The average corrected correlations for nutrient densities in males and females, respectively, were 0.61 and 0.60 for African-Americans, 0.66 and 0.61 for Japanese-Americans, 0.62 and 0.57 for Latinos, and 0.66 and 0.74 for Whites (tables 3 and 4). These correlations are more than satisfactory for all ethnic-sex groups.
Comparisons of diet questionnaires with daily diet records have been performed in relatively few US minority populations. Coates et al. (18) found that food frequency questionnaires appeared to work less well in minority populations than among Whites generally, although this was not true in every study they reviewed. Recently, Kristal et al. (19) found that ethnicity was related to the effectiveness of the diet questionnaire used in the Women’s Health Trial for a number of nutrients, with correlations among African-American females significantly lower than among Whites and Latinos. Liu et al. (20) also found that the questionnaire used in the Coronary Artery Risk Development in Young Adults (CARDIA) Study performed less well in African-Americans than in Whites, when compared with seven 24-hour recalls. However, unlike in the present study, they found that correlations were not necessarily better for nutrient densities than for absolute nutrients; their average corrected correlations for absolute nutrients were 0.77 for White males, 0.56 for White females, 0.54 for African-American males, and 0.37 for African-American females, with the corresponding means for nutrient densities being 0.72, 0.73, 0.44, and 0.34. (Our corresponding means were 0.57, 0.48, 0.30, and 0.26 for absolute nutrients, and 0.66, 0.74, 0.61, and 0.60 for nutrient densities.)
It is notable that for each nutrient and for each ethnic-sex group, the calibration slopes are always less than 1.0 and often less than 0.5. This appears to be a quite general finding in nutritional epidemiology. The meaning of the slope estimate of 0.69 for the group of White males, for example (table 3), is that an observed difference in percent of calories from fat, on the questionnaire, of 20 percent between subjects corresponds to an average true difference of 14 percent [0.14 = 0.69 × (0.40 – 0.20)]. This contraction of differences in mean nutrient intakes between individuals has obvious implications for the interpretation of differences in disease risk from this (or any other) cohort study using diet questionnaires. For example, Hunter et al. (21) found that measurement error correction more than tripled the relative risk estimate per 25 g of fat in a pooled analysis of data from eight cohort studies.
Underlying the use of a calibration study for assessing the performance of the questionnaire are several assumptions, with the most important being that the 24-hour recalls provide unbiased estimates of true diet, and that errors in the 24-hour recalls are uncorrelated with errors in the QFFQ. Other issues, such as whether the 24-hour recalls are more accurate (i.e., less variable around true intake) than the questionnaire or whether the relation between the 24-hour recalls and the questionnaire values is linear (22), are less problematic. As long as a food record or recall is an unbiased estimate of daily food intake, even a single day of data can be used as the comparison instrument in a calibration study. One day of data per subject may give much less information about long-term intake of a particular nutrient than does the questionnaire; however, it is possible to compensate for this by increasing the number of subjects used in the calibration study (23, 24). If examination of a plot of the 24-hour recalls and questionnaire nutrient values indicates that a nonlinear relation holds, then this may be modeled by the inclusion of polynomial or other terms in the regression calibration equation (25).
It is increasingly well recognized that self-reports of 24-hour food intake generally underreport total intake (26), and our study is no exception to this rule. Daily energy intake in a sedentary population is expected to be 1.55 times the basal metabolic rate (12). Briefel et al. (27) reported for the National Health and Nutrition Examination Survey (NHANES) that 24-hour recalls yielded energy intake estimates that averaged 1.47 times the estimated basal metabolic rate for men and 1.26 times for women. Further evidence of underreporting of total energy intake has been found when self-assessments of diet have been compared with energy expenditure measurements using the doubly labeled water method (28).
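As a rough guide to the size of the shortfall these ratios imply, the reported ratio can be compared with the expected value of 1.55. The sketch below assumes that 1.55 is the appropriate benchmark for the population surveyed; the function name is hypothetical.

```python
EXPECTED_RATIO = 1.55  # expected energy intake / basal metabolic rate, sedentary population (12)


def reporting_shortfall(reported_ratio, expected_ratio=EXPECTED_RATIO):
    """Fraction by which reported intake falls short of the expected level."""
    return 1.0 - reported_ratio / expected_ratio


# Ratios reported for NHANES 24-hour recalls (27).
for group, ratio in (("men", 1.47), ("women", 1.26)):
    print(group, f"{reporting_shortfall(ratio):.1%}")
```

On this simple reading, the recalls fall short of the expected intake by about 5 percent for men and about 19 percent for women.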
At present, the significance in calibration and correlational analysis of the evident underreporting of total energy intake in 24-hour recalls is not fully understood, either for this study or for others using the calibration study approach for measurement error correction. If, for example, the 24-hour recalls simply underestimate intake by some relatively fixed amount, or by a relatively constant fraction of intake, then calibration slope estimates on the arithmetic scale (in the first case) or the log scale (in the second) would be unaffected by the underreporting. Serious problems arise when errors in reporting on the two instruments are correlated, conditional on true intake, because this will result in positively biased correlations and calibration slope estimates. The types of experiments required to address these issues of conditional correlation carefully have been performed only on small numbers of subjects. One such study compared protein excretion in 24-hour urine samples with several diet assessment techniques and came to the conclusion that correlation between errors in the self-reports was substantial (29). However, the extent to which this effect is at play in calibration studies such as ours has not yet been carefully examined.
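The three scenarios in the preceding paragraph can be checked with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the study data: true energy intake is taken to be lognormal, and the error standard deviations, the 300 kcal shortfall, the 20 percent shortfall, and the shared person-specific error are all hypothetical values chosen for illustration.

```python
import math
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def logs(values):
    return [math.log(v) for v in values]


rng = random.Random(1)
log_true, q, r, q_corr, r_corr = [], [], [], [], []
for _ in range(20000):
    t = rng.gauss(math.log(2000.0), 0.25)  # log of true energy intake (kcal)
    e_q = rng.gauss(0.0, 0.25)             # questionnaire error
    e_r = rng.gauss(0.0, 0.25)             # recall error
    u = rng.gauss(0.0, 0.20)               # person-specific error shared by both instruments
    log_true.append(t)
    q.append(math.exp(t + e_q))
    r.append(math.exp(t + e_r))
    q_corr.append(math.exp(t + u + e_q))
    r_corr.append(math.exp(t + u + e_r))

# Case 1: recalls too low by a fixed 300 kcal; arithmetic-scale slope is unchanged.
base_arith = ols_slope(q, r)
shifted_arith = ols_slope(q, [v - 300.0 for v in r])

# Case 2: recalls too low by a constant 20 percent; log-scale slope is unchanged.
base_log = ols_slope(logs(q), logs(r))
scaled_log = ols_slope(logs(q), logs([0.8 * v for v in r]))

# Case 3: errors correlated conditional on true intake; the slope of recall on
# questionnaire exceeds the slope of true intake on questionnaire.
target_slope = ols_slope(logs(q_corr), log_true)
observed_slope = ols_slope(logs(q_corr), logs(r_corr))

print(abs(base_arith - shifted_arith) < 1e-6)
print(abs(base_log - scaled_log) < 1e-6)
print(observed_slope > target_slope)
```

In the third case the shared error term inflates the covariance between the two instruments, so the estimated slope overstates how well the questionnaire tracks true intake.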
In our study, we found that men (but not women) with BMI >29 appeared to underreport on the questionnaire their percent of calories from fat and density of vitamin C intake. There have been reports (30, 31) of underreporting of food intake by obese subjects on both questionnaires and daily food records, when reported intake was compared with doubly labeled water measurements of energy expenditure. This observation, while suggestive of the kind of correlation between errors, conditional on true intake, that would bias the results of a calibration study, does not constitute prima facie proof of the existence of such an effect. Obese people may be underreporting more than nonobese people only on an absolute scale and not as a percent of true intake. If this is so, then multiplicative errors in the two instruments would not be correlated with the obesity of the subject, and on the log scale the corrected correlations and calibration slopes should remain unbiased. Indeed, Lissner et al. (32), in an investigation involving intensive study of 63 adult women, found that obese subjects underreported no more than nonobese subjects, on food diaries or 24-hour recalls, when they were compared at the same intake level. If, on the other hand, obese individuals are observed to underreport more on both instruments than do normal subjects at the same true energy level, then this would have an impact on the calibration slope and correlation estimates, because it amounts to a violation of the assumption that errors in the two reports are independent conditional on true intake. The issue of underreporting in the 24-hour recalls is of serious concern. Further knowledge concerning the extent to which errors in the 24-hour recalls are correlated with the errors in the QFFQ will be needed in order to determine whether corrected correlations or calibration slopes have been seriously distorted in this or similar studies.
In the absence of large, carefully designed and analyzed studies, the degree to which our study or other similar calibration studies may be subject to correlation between errors in the two dietary assessments is unknown.
In summary, we found for all ethnic-sex groups that the questionnaire used in this multiethnic cohort gave calibration and correlation results for nutrient densities and energy-corrected nutrients that are comparable with those reported by other groups, and highly satisfactory for energy-adjusted nutrients for all subgroups. The performance of the questionnaire varied by ethnic group, with higher correlations in Whites, and also varied by BMI and, to a lesser degree, education level. These findings indicate the need to control for the differences in performance of the questionnaire across subgroups.
This work was supported in part by Public Health Service grant no. R01 CA54281 from the National Cancer Institute.
The authors thank the registered dietitians who collected the 24-hour recalls by telephone: Kapuanani Rothfus, Deborah Ishiyama, and Sabrina Umphress in Hawaii, and Gretchen Perea, Dawn Narvaez, and Kerne Weaver in Los Angeles.