To quantify measurement error in the estimation of family diet intakes using 7‐day household food inventories and to investigate the effect of measurement‐error adjustment on diet–disease associations.
Historical cohort study in 16 districts in England and Scotland, between 1937 and 1939.
4999 children from 1352 families in the Carnegie Survey of Diet and Health. 86.6% of these children were traced as adults and form the Boyd Orr cohort. The reproducibility analysis was based on 195 families with two assessments of family diet recorded 3–15 months apart.
Intraclass correlation coefficients (ICCs) were calculated for a variety of nutrients and food groups. Diet–cancer associations reported previously in the Boyd Orr cohort were reassessed using two methods: (a) the ICC and (b) the regression calibration.
The ICCs for the dietary intakes ranged from 0.44 (β carotene) to 0.85 (milk and milk products). The crude fully adjusted hazard ratio (HR) for cancer mortality per 1 MJ/day increase in energy intake was 1.15 (95% CI 1.06 to 1.24). After adjustment using the ICC for energy (0.80) the HR (95% CI) increased to 1.19 (1.08 to 1.31), and the estimate from regression calibration was 1.14 (0.98 to 1.32). The crude fully adjusted odds ratio (OR) for cancer incidence per 40 g/day increase in fruit intake was 0.84 (95% CI 0.73 to 0.97). After adjustment using the fruit ICC (0.78) it became 0.81 (0.67 to 0.96) and the OR derived from regression calibration was 0.81 (0.59 to 1.10).
The diet–disease relationships for the dietary intakes with low measurement error were robust to adjustment for measurement error.
Assessment of usual diet in free‐living humans is challenging and prone to error.1 Random errors in the measurement of dietary intakes are likely to lead to an underestimation of the association between dietary intake and disease—that is, non‐differential misclassification.2 If the measurement error is quantifiable, corrections can be made for the error and an adjusted association between the diet variables and the disease outcome can be estimated.3
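The attenuation described above can be illustrated with a small simulation. This is a minimal sketch under assumed conditions that are not taken from the study: a linear outcome model, normally distributed exposure and error, and a hypothetical reliability of 0.8. The function name and all parameter values are illustrative only.

```python
import numpy as np

def simulate_attenuation(true_slope=0.5, reliability=0.8, n=200_000, seed=0):
    """Return (slope on true exposure, slope on error-prone exposure)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)                   # true exposure, variance 1
    # Error variance chosen so that var(x) / var(w) equals the target reliability.
    error_sd = np.sqrt((1.0 - reliability) / reliability)
    w = x + rng.normal(0.0, error_sd, n)          # observed exposure with random error
    y = true_slope * x + rng.normal(0.0, 1.0, n)  # outcome depends on the true exposure
    slope_true = np.polyfit(x, y, 1)[0]           # polyfit returns [slope, intercept]
    slope_observed = np.polyfit(w, y, 1)[0]
    return slope_true, slope_observed

slope_true, slope_observed = simulate_attenuation()
print(f"slope using true exposure:     {slope_true:.3f}")
print(f"slope using observed exposure: {slope_observed:.3f}")
print(f"observed slope / reliability:  {slope_observed / 0.8:.3f}")
```

Under these assumptions the slope estimated from the error-prone exposure is pulled towards zero by a factor close to the reliability, and dividing by the reliability approximately recovers the slope on the true exposure.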
A validation study within the Nurses' Health Study showed that vitamin A and energy intakes had been measured with considerable error.4 On adjusting for this measurement error, the association between vitamin A and the incidence of breast cancer substantially increased. However, in such validation studies the “gold standard” is often simply an alternative dietary assessment method to the one used in the main study; although regarded as relatively reliable for measuring dietary intakes, it may itself be prone to random and systematic measurement errors.
Random within‐person error can be quantified by completing a reproducibility study—that is, repeat measures being taken on the whole sample or a subsample of the subjects studied.3 From these reproducibility studies, the reliability of the estimated nutrient intake can be measured by the intraclass correlation coefficient (ICC), and the methods used to adjust for measurement error include the use of ICC and also regression calibration.5,6
Measurement error correction by regression calibration within categorical covariates (eg, quartiles of food intake) is more complex than using continuous measurements because the assumption that the errors are uncorrelated with the true values does not hold.7 For example, it has been shown, when examining categorical measures of green tea intake in relation to cholesterol levels, that if the correlation between error and true value was ignored, correction for measurement error led to overadjustment.7
The effect of measurement error in confounding variables is less predictable and can lead to overestimation or underestimation of diet–disease associations.3 Hence, account should also be taken of the measurement errors in the confounding variables as well as in the main exposure of interest when statistical adjustments are applied to exposure–disease associations.8
Studies using the Boyd Orr cohort have related childhood family diet to cancer incidence and mortality in adulthood,9,10 but as yet have not taken account of any measurement errors within the dietary data.
This study describes a reproducibility study completed in a subsample of the families who took part in the Carnegie survey of diet and health in pre‐war Britain,11 and the traced children of the families in the survey comprise the Boyd Orr cohort. Using this reproducibility study, we quantify the measurement error in the dietary intakes and assess the impact of this error on previously reported estimates of diet–disease associations using the whole Boyd Orr cohort.9,10
The methods used to establish the Boyd Orr cohort have been described previously.12,13 In summary, the data come from the original records of the Carnegie survey of diet and health in pre‐war Britain. A total of 1352 families from 16 urban and rural areas of England and Scotland were surveyed between 1937 and 1939. Children from the surveyed families were traced as adults through the National Health Service central register. Two studies based on the Boyd Orr cohort have investigated the association between childhood diet and risk of cancer in adulthood.9,10 In Frankel et al's9 analysis of adult cancer mortality in relation to family energy intake in childhood, the follow‐up period was from 1 January 1948 to 30 June 1996. In Maynard et al's10 analysis of cancer incidence and mortality in relation to family fruit and vegetable intake in childhood, the follow‐up period was extended to 31 July 2000. Since this study is based on a reanalysis of previous work on the cohort and historical data, no new ethical approvals were required.
Dietary information on study participants was obtained from a detailed 7‐day inventory of household diet; this was recorded for each of the 1352 participating families. All foods in the home at the beginning and at the end of the seven survey days were weighed and recorded. The number of meals consumed outside the home was also noted, as were the weights and descriptions of the household refuse. The number of family members present for each meal was recorded and the survey personnel visited the family during the week to ensure that all food purchased was being recorded. Subsequently, the dietary information was transcribed onto summary sheets, which were then analysed using the contemporary (1930s) food tables.11
A second 7‐day household inventory was completed by 361 families (27% of all study families). The interval between the baseline and repeat dietary assessment varied from 3 to 15 months between districts. The families were recruited from eight of the survey areas in Scotland and from three in England. The original report did not describe the sampling strategy for the repeat diet assessments. The survey report states, “In order to check the reproducibility of the results which had been obtained by the survey method duplicate investigations of some households were undertaken after an interval that varied from centre to centre. These repeat surveys were made at eight of the Scottish and at three of the English centres and the number of households so treated was 361, approximately one in four of those originally examined…”.11
A total of 151 families in the reproducibility study also took part in a dietary supplementation trial nested within the Carnegie survey14; consequently, dietary intakes from the repeat inventories were higher than those at baseline. These families and also 15 families with missing data were excluded from the reproducibility analysis, which used data provided by the remaining 195 families.
Recoding of the foods from the baseline and repeat inventories was carried out using the “diet in data out” programme, developed by MRC Human Nutrition Research in Cambridge.15 Nutrient intakes were recalculated by linking the diet in data out input with the nutrient values from current food tables.16 Where the composition of foods in the 1930s differs from that of similar foods today (eg, meat and meat products), or where there is no modern day equivalent, nutrient values from pre‐war food tables were added to the database and used to calculate the nutrient intakes.17,18,19,20,21,22
Daily per capita dietary intakes for each family were derived from the household data by taking into account the number of family members and visitors present for each meal during the survey week, and the number of days for which the household inventory was kept. For each family two sets of dietary intakes were estimated, one from baseline and the other from repeat inventory. The nutrient intakes calculated were energy, carbohydrate, fat, protein, retinol, β carotene, vitamins A, C and E, and calcium. Food group intakes were calculated for cereals and cereal products, milk and milk products, eggs and egg products, total vegetables, potatoes and potato products, total fruits, fish and fish products (excluding fish oil), meat and meat products, animal, fish, vegetable fats and oils, and sugars, preserves and confectionery.
Differences in family characteristics between the baseline and repeat inventories, and between those families used in the repeatability analysis and those not, were assessed using χ2 tests, paired t tests and regression analyses. Mean daily per capita dietary intakes from baseline and repeat inventories were compared using paired t tests. The ICC and 95% CIs of the ICC for the repeated dietary data were calculated.
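As an illustration of the reproducibility measure, the following is a minimal sketch of a one-way random-effects ICC for repeated measurements. The text does not state which ICC estimator was used, so the one-way analysis-of-variance form is an assumption here, and the intake values are hypothetical, not study data. The 95% CI calculation is not shown.

```python
import numpy as np

def icc_oneway(measurements):
    """One-way random-effects ICC for an (n_families, k_repeats) array."""
    data = np.asarray(measurements, dtype=float)
    n, k = data.shape
    family_means = data.mean(axis=1)
    grand_mean = data.mean()
    # Between-family and within-family mean squares from one-way ANOVA.
    ms_between = k * np.sum((family_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((data - family_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical per capita energy intakes (MJ/day): baseline and repeat inventory.
energy = [[9.8, 10.1], [11.5, 11.0], [8.2, 8.9], [12.4, 12.0], [10.3, 9.7], [9.1, 9.4]]
print(f"ICC = {icc_oneway(energy):.2f}")
```

An ICC near 1 indicates that most of the variation lies between families rather than between the two inventories of the same family.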
We used dietary intake as a continuous variable in all analyses. For energy intake, the relationship with mortality was expressed per 1 MJ/day increase.9 The associations between cancer and fruit intake10 were recalculated as per 40 g/day increase in fruit intakes. The increment of 40 g/day was chosen since it is half of one recommended serving of fruit.23 No energy adjustment was carried out on dietary measures since, in the previous diet–disease relationships examined,10 energy intakes were taken into account in the original regression models and also in the measurement error adjusted models used here. To enable us to compare the regression coefficients across the various models, we did not take into account clustering (cohort members nested within families) in any of our analyses as the software we used did not enable us to do so in the regression calibration models.
In our initial assessment of measurement error, we adjusted the crude (ignoring measurement error) hazard ratios (HRs) and odds ratios (ORs) by dividing the effect estimates (on the log scale) by the ICC for the respective dietary intakes, and then exponentiating the adjusted estimate to give an adjusted HR/OR.2 The 95% CIs were calculated in an analogous fashion and do not take into account sampling variation in the estimation of ICC. In our subsequent assessment of measurement error effects we carried out a fully adjusted analysis, using regression calibration.24 This method has two steps: first, the calibration step, which develops a model for the unknown “true” exposures based on the replicates; then the predicted “true” values from the calibration models are used in the usual regression model for the outcome. The standard errors for the estimates of diet–disease associations from this model are adjusted (using the bootstrap) for the uncertainty in estimating the measurement error. This method assumes that measurement errors are unrelated to the true covariate values, to each other and to other subject characteristics. The regression calibration method was used to calculate HR/ORs adjusting for error in the measurement of the dietary exposure (energy and fruit intakes), and then additionally to adjust for error in the measurement of a covariate (household food expenditure). Analyses were performed using STATA V.8.
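The ICC-based adjustment described above can be sketched as follows. The function name is hypothetical, and treating the confidence limits "in an analogous fashion" is read here as dividing each log limit by the ICC; like the method in the text, this ignores sampling variation in the ICC. The input values are the crude fully adjusted HR for cancer mortality per 1 MJ/day and the energy ICC reported in this paper.

```python
import math

def icc_adjust(ratio, ci_low, ci_high, icc):
    """Divide the log ratio and log CI limits by the ICC, then exponentiate."""
    if not 0.0 < icc <= 1.0:
        raise ValueError("icc must be in (0, 1]")
    return tuple(math.exp(math.log(value) / icc) for value in (ratio, ci_low, ci_high))

# Crude fully adjusted HR 1.15 (95% CI 1.06 to 1.24), energy ICC 0.80.
hr, lo, hi = icc_adjust(1.15, 1.06, 1.24, icc=0.80)
print(f"{hr:.2f} ({lo:.2f} to {hi:.2f})")  # prints 1.19 (1.08 to 1.31)
```

Because the inputs here are already rounded to two decimal places, applying the same function to other reported estimates may differ from the published adjusted values in the last digit.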
The mean (SD) size of the families increased between baseline (5.74 (1.84)) and repeat inventory (5.94 (1.88); p=0.0003). One of the reasons for the increase was that the mean (SD) number of children (aged 0–20 years) in the household increased from baseline (3.60 (1.83)) to repeat inventory (3.72 (1.95); p=0.0025). For the baseline inventories, 96 (49%) were completed in the spring (March–May), 47 (24%) in the summer (June–August), 28 (14%) in the autumn (September–November) and 24 (12%) in the winter (December–February) compared with 38 (19%) of the repeat inventories being completed in the summer, 87 (45%) in the autumn and 70 (36%) in the winter (p<0.001). The mean (SD) weekly per capita food expenditure did not change appreciably from the baseline (32.0 pence (13.7)) to the repeat inventory (32.2 pence (15.0); p=0.63).
After controlling for seasonal and district differences between participating and non‐participating families, the 195 families that took part in the reproducibility analysis were similar to the non‐participating families with regard to household income, expenditure on food and family size. The social class distribution of the families included in the reproducibility analyses differed slightly from the whole cohort (p=0.031); the social class distribution (based on the head of the household) of the families included in the reproducibility analysis was social class I, 1.7%; II, 15.3%; IIINM, 21.0%; IIIM, 30.1%; IV, 11.4% and unemployed, 20.5% compared with I, 1.5%; II, 7.4%; IIINM, 21.8%; IIIM, 23.6%; IV, 16.5% and unemployed, 29.3% for the non‐participating families. When the categories were collapsed into two combined social groups (I‐IIINM and IIIM‐unemployed), so that ICC differences for the dietary intakes could be investigated between them, there was no evidence of a difference in the proportions of the two groups between the participating and non‐participating families (p=0.55). The only baseline dietary intake for which there was evidence of a difference between the families included in the reproducibility analysis and those not included was that of potatoes and potato products (p=0.016), with a higher baseline mean (SD) intake of 187 g (80.5) for those not included compared with 177 g (78.9) for those included.
There were no marked differences between the mean family nutrient intake measured at baseline and 3–15 months later (table 1). However, more eggs and egg products, and animal, fish, vegetable fats and oils were eaten by families at the baseline assessment than at follow‐up, whereas the converse was the case for vegetables, potatoes and potato products, and meat and meat products (table 2).
The ICC for the nutrient intakes ranged from 0.44 (β carotene) to 0.83 (fat). The ICC for the food groups ranged from 0.48 (eggs and egg products, fish and fish products, total vegetables) to 0.85 (milk and milk products; table 3).
Table 4 shows the age adjusted and fully adjusted HRs for the relationship between energy intake and all‐cause and cancer mortalities.
When the original HRs were adjusted taking into account the ICC (table 4), generally the associations became slightly stronger. When the HRs were adjusted for measurement error in energy intake by regression calibration, there was no consistent change in the strength of associations with some increasing and others decreasing in magnitude. Overall, there were no marked differences in the association between measurement error adjusted and crude estimates of energy and cancer mortality.
Table 5 shows the crude and adjusted estimates of associations between fruit intake and cancer incidence and mortality (for all cancers and those not related to smoking).
With both the ICC and the regression calibration approaches (the latter taking into account the errors in the fruit and energy intakes), the adjusted ORs indicated a stronger protective association between increasing fruit intake and cancer incidence/mortality than the estimates that were not adjusted for measurement error. Generally, this was also the case for the regression calibration when the errors in the estimation of household food expenditure were additionally taken into account.
In our previous work10 we reported associations between high intake of vitamin E and a reduced risk of breast cancer, but we found no protective effect of vegetable‐rich childhood diets. Dietary intakes of both vitamin E (ICC 0.57) and vegetables (ICC 0.48) were, however, not as reproducible as those for energy and fruit intake, and so we investigated the possible effect of measurement error on these associations.
On recalculation of the association between vitamin E (as a continuous variable) and breast cancer mortality, the crude estimate (95% CI) for the fully adjusted model (using the same factors as Maynard et al10) was 0.77 (0.59 to 1.00). On adjustment using the vitamin E ICC, the resulting OR (95% CI) was 0.63 (0.40 to 1.00), and using regression calibration it was 0.76 (0.52 to 1.11), adjusting for measurement errors in vitamin E, energy intakes and household food expenditure, for the fully adjusted model.
The crude OR (95% CI) for the association between vegetable intakes (per 40 g increments) and total cancer incidence was 1.04 (0.94 to 1.14). On measurement error adjustment using the vegetable ICC, the ORs (95% CI) increased to 1.09 (0.88 to 1.31) and by regression calibration, which took into account the measurement errors in the estimation of the vegetable, energy intakes and household food expenditure, it increased to 1.10 (0.89 to 1.35) for the fully adjusted model. For the association between vegetable intakes and total cancer mortality, on measurement error adjustment using the ICC and regression calibration (taking into account the same errors as for cancer incidence), the resulting ORs (95% CI) were 0.98 (0.75 to 1.24) and 1.02 (0.79 to 1.30) respectively, compared with the crude estimate for the fully adjusted model of 0.99 (0.87 to 1.11).
We have shown that dietary intakes, assessed in a sample of 195 families studied in the 1930s using 7‐day household food inventories, are reasonably reproducible. ICCs ranged from 0.44 (β carotene) to 0.85 (milk and milk products). The level of reproducibility of these data increases confidence that the dietary intakes calculated for the overall Boyd Orr cohort (which included these families) were representative of the habitual family diet of survey members in the 1930s.
In reproducibility and validation studies within the Nurses' Health Study, the lowest Pearson correlation coefficients obtained were for intakes of vitamin A (using repeat food records: 0.36, repeat Food Frequency Questionnaire: 0.52).4 In our study the lowest ICC was 0.44 for β carotene and the ICC was 0.49 for total vitamin A (in this study, the ICCs were very similar to the Pearson correlation coefficients), possibly due to variation between the baseline and repeat inventories in the food sources of vitamin A and β carotene in the diet, such as vegetables, meat and meat products, and animal, fish, vegetable fats and oils.25 The highest correlation coefficients obtained in the Nurses' Health Study were for carbohydrate (r=0.72) using the diet records and for sucrose (r=0.71) using the Food Frequency Questionnaires.4 In this study, the carbohydrate intake was also highly reproducible (ICC=0.78), and recordings of foods rich in carbohydrate (cereal and cereal products, and sugar, preserves and confectionery) were likewise highly reproducible.
When measurement errors in the estimation of the energy or fruit intakes were taken into account by using the ICC to adjust the HR and ORs, the strength of the diet–disease associations increased. This is similar to findings in other studies when adjustments were made to diet–disease risk estimates taking account of the measurement error.6,26
The ratios adjusted by regression calibration for errors in measurement of dietary exposures were stronger than the crude estimates on two of five occasions with energy–mortality associations, and on all four occasions with the fruit–cancer associations (partially adjusted models). Additionally, when account was taken of the errors in household food expenditure, the associations between energy and mortality became stronger on one of five occasions, and the associations between fruit intakes and cancer became stronger on three of four occasions (fully adjusted models).
The 95% CIs for the adjusted ratios were wider than those for the crude estimates. This reflects the additional uncertainty in estimating the degree of measurement error—that is, the uncertainty around using the replicate measurements to estimate the “true” dietary intake of energy and fruit consumption. When errors in confounding variables (eg, food expenditure) were also included, CIs became even wider, again due to the uncertainty in estimating the measurement error.
Since both energy and fruit intakes were found to have been estimated with a relatively small amount of measurement error, the effect of adjustment on diet–cancer relationships was also investigated for two intakes measured with more error: vegetables and vitamin E. For the vegetable–cancer incidence/mortality associations, the changes in ORs on measurement error adjustment were similar to those found with the fruit intakes. The direction of the vegetable–cancer associations suggests that increasing levels of vegetable intake are weakly associated with an increased risk of cancer; however, the ORs generally show only a modest increase in risk and all the 95% CIs include the null value of one, providing limited evidence for an adverse effect of childhood family vegetable intakes on cancer incidence/mortality. Evidence was originally found, using quartiles of vitamin E intake, that a higher intake was associated with a decrease in breast cancer mortality.10 On completing measurement error adjustment, the ORs increased in strength but the 95% CIs widened, especially when regression calibration was used and account was taken of the measurement errors in the estimation of the vitamin E, energy intakes and household food expenditure.
This study had two strengths: the sample size of the reproducibility study and the availability of repeat data on a likely confounder of the diet–disease relationship, household food expenditure. Reproducibility studies need a sample size adequate to estimate the measurement error with reasonable precision.27 The sample of families used in this reproducibility analysis (n=195) was reasonable. Sample sizes of this magnitude yield relatively precise estimates of the ICC as indicated by the width of 95% CI in table 3.28 There were only minimal dietary and social differences between the families used in the reproducibility analysis and those in the remaining whole cohort, suggesting that the sample selected will have provided an unbiased estimate of reproducibility.
Imprecision in confounding variables has been shown to cause as much bias in investigations of the association between blood lipid levels and coronary heart disease risk estimates as that from the main exposure variable of interest.8 Here, we adjusted for measurement error in estimates of household food expenditure and energy intake as confounders in adjusted models, using the regression calibration method. This led to little material change in estimates of diet–disease associations, with most of the estimates moving further away from the null value of one on adjustment.
There were a number of methodological issues in this study. Firstly, one component of the variation between the repeat measures could be seasonality. A high proportion of the baseline inventories were completed in spring and summer, whereas the repeat inventories were generally completed in autumn and winter. Owing to the time of the study, it is to be expected that season would be one of several factors causing the variation in the dietary measures, possibly due to differing availabilities of certain foods. Hence it was important that season was taken account of in the adjusted models. Seasonality in this study was linked with the area in which the family resided since the inventories from each area were usually completed within the same season. To account for this, district of residence was included in all models.
Another source of measurement error in the dietary measures could be the way in which the daily per capita intakes were calculated. It was assumed that diets of children and adults were qualitatively similar and that foods and nutrients were evenly distributed within the families. However, no data were available on how foods were distributed within the families, so these assumptions cannot be checked. There is evidence to suggest that food distribution within families does vary and is not always related to need.29 If the measurement errors due to this assumption were random, an underestimation of the diet–disease associations would be seen, whereas if the errors were systematic then the associations may be biased.
In the Boyd Orr study, diet was measured at the household rather than the individual level and food and nutrient intakes are probably less variable at the household than the individual level. Furthermore, in this study, diet was measured in the 1930s at a time when food choice in Britain was not as wide as it is today and obtaining a measure of habitual diet is much easier when a limited variety of foods are consumed. So our findings of high reproducibility may not be generalisable to contemporary patterns of family diet. Thus, in current dietary studies it is important to obtain and use reproducibility data, as shown here.
Another limitation is that no account was taken of dietary intake during adulthood, only dietary intake from childhood was available. However, studies have found that food habits and choices are established in childhood and that for many these behaviours track into adulthood.30,31 Smoking is another factor during adulthood that could have influenced the end points examined here; however, the diet–disease associations were examined separately for smoking and non‐smoking‐related cancers.
A key assumption of our analysis was that the error in a given variable is not related to another variable included in the model. This could not be checked here, as this is a reproducibility study rather than a validity study. However, as the nutrient, food group and energy intakes are each derived from the same food items in the household inventory, it is possible that an error in one inventory item could lead to errors in estimation of both a food group intake and the energy intake, thus leading to correlated errors.32
Finally, although we suggest that the regression models are fully adjusted, this may not necessarily be true. Unknown confounders can affect the associations found in cohort studies.33 However, we have taken care in this study to identify the main confounders in the diet–disease relationships, but cannot guarantee that we have identified all the confounders.
This study increases confidence in the estimation of the dietary intakes for the Boyd Orr cohort as children in the 1930s. These analyses highlight that for the diet measures with low measurement error, the estimates of diet–disease relationships were robust to adjustment for measurement error in explanatory and confounding variables.
We thank Professor Peter Morgan, Director of the Rowett Research Institute, and Walter Duncan, honorary archivist to the Rowett Research Institute, for allowing us access to the original research records and Dr Richard Martin, who helped to provide an updated dataset. We also thank all the research workers and subjects who participated in the original survey 1937–39. The World Cancer Research Fund has funded this specific research on the Boyd Orr cohort.
ICC - intraclass correlation coefficient
Funding: This work was supported by the World Cancer Research Fund.
Competing interests: None.