Using data from the National Institutes of Health-AARP Diet and Health Study, we evaluated the influence of adulthood weight history on mortality risk. The National Institutes of Health-AARP Diet and Health Study is an observational cohort study of US men and women who were aged 50–71 years at entry in 1995–1996. This analysis focused on 109,947 subjects who had never smoked and were younger than age 70 years. We estimated hazard ratios of total and cause-specific mortality for recalled body mass index (BMI; weight (kg)/height (m)²) at ages 18, 35, and 50 years; weight change across 3 adult age intervals; and the effect of first attaining an elevated BMI at 4 successive ages. During 12.5 years' follow-up through 2009, 12,017 deaths occurred. BMI at all ages was positively related to mortality. Weight gain was positively related to mortality, with stronger associations for gain between ages 18 and 35 years and ages 35 and 50 years than between ages 50 and 69 years. Mortality risks were higher in persons who attained or exceeded a BMI of 25.0 at a younger age than in persons who reached that threshold later in adulthood, and risks were lowest in persons who maintained a BMI below 25.0. Heavier initial BMI and weight gain in early to middle adulthood strongly predicted mortality risk in persons aged 50–69 years.
body mass index; body weight; mortality; obesity; weight gain; weight loss
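As a small illustration of the BMI definition used in the abstract above, the following sketch computes weight (kg) divided by height (m) squared and finds the first recalled age at which a person attains or exceeds a BMI of 25.0. The function names and the example weights and height are hypothetical and are not taken from the study.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def attains_threshold(bmi_value: float, threshold: float = 25.0) -> bool:
    """True when BMI attains or exceeds the threshold (25.0 in the abstract)."""
    return bmi_value >= threshold


# Hypothetical recalled weights (kg) at ages 18, 35, and 50 for one person.
recalled = {18: 68.0, 35: 77.0, 50: 85.0}
height_m = 1.80

# First age at which the recalled BMI reaches the threshold, or None if never.
first_age = next(
    (age for age, weight in sorted(recalled.items())
     if attains_threshold(bmi(weight, height_m))),
    None,
)
print(first_age)
```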
It is of interest to estimate the distribution of usual nutrient intake for a population from repeat 24-h dietary recall assessments. A mixed effects model and quantile estimation procedure, developed at the National Cancer Institute (NCI), may be used for this purpose. The model incorporates a Box–Cox parameter and covariates to estimate usual daily intake of nutrients; model parameters are estimated via quasi-Newton optimization of a likelihood approximated by adaptive Gaussian quadrature. The parameter estimates are used in a Monte Carlo approach to generate empirical quantiles; standard errors are estimated by bootstrap. The NCI method is illustrated and compared with current estimation methods, including the individual mean and the semi-parametric method developed at Iowa State University (ISU), using data from a random sample and computer simulations. Both the NCI and ISU methods for nutrients are superior to the distribution of individual means. For simple (no covariate) models, quantile estimates are similar between the NCI and ISU methods. The bootstrap approach used by the NCI method to estimate standard errors of quantiles appears preferable to Taylor linearization. One major advantage of the NCI method is its ability to provide estimates for subpopulations through the incorporation of covariates into the model. The NCI method may be used for estimating the distribution of usual nutrient intake for populations and subpopulations as part of a unified framework of estimation of usual intake of dietary constituents.
statistical distributions; diet surveys; nutrition assessment; mixed-effects model; nutrients; percentiles
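One way to see why the distribution of individual means is a poor estimate of the usual-intake distribution is a small simulation. The sketch below is not the NCI or ISU method: it assumes a simple additive normal model on the original scale (no Box–Cox transformation, no covariates), and all parameter values are hypothetical. Under that assumption, the variance of the mean of k recalls is approximately the between-person variance plus the within-person variance divided by k, so the distribution of individual means is too wide.

```python
import random
import statistics


def simulate_individual_means(n_people, n_recalls, mu, sd_between, sd_within, seed=0):
    """Return (usual_intakes, individual_means) under a simple additive model.

    Assumed model (illustration only): recall_ij = usual_i + error_ij, with
    usual_i ~ Normal(mu, sd_between) and error_ij ~ Normal(0, sd_within).
    """
    rng = random.Random(seed)
    usual, means = [], []
    for _ in range(n_people):
        t = rng.gauss(mu, sd_between)
        recalls = [t + rng.gauss(0.0, sd_within) for _ in range(n_recalls)]
        usual.append(t)
        means.append(sum(recalls) / n_recalls)
    return usual, means


# Hypothetical settings: two recalls per person, within-person SD twice the between-person SD.
usual, means = simulate_individual_means(
    20000, 2, mu=50.0, sd_between=10.0, sd_within=20.0
)
var_usual = statistics.pvariance(usual)  # close to 10**2 = 100
var_means = statistics.pvariance(means)  # close to 100 + 20**2 / 2 = 300
print(round(var_usual), round(var_means))
```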
With the advent of Internet-based 24-hour recall (24HR) instruments, it is now possible to envision their use in cohort studies investigating the relation between nutrition and disease. Understanding that all dietary assessment instruments are subject to measurement errors and correcting for them under the assumption that the 24HR is unbiased for usual intake, here the authors simultaneously address precision, power, and sample size under the following 3 conditions: 1) 1–12 24HRs; 2) a single calibrated food frequency questionnaire (FFQ); and 3) a combination of 24HR and FFQ data. Using data from the Eating at America’s Table Study (1997–1998), the authors found that 4–6 administrations of the 24HR is optimal for most nutrients and food groups and that combined use of multiple 24HR and FFQ data sometimes provides data superior to use of either method alone, especially for foods that are not regularly consumed. For all food groups but the most rarely consumed, use of 2–4 recalls alone, with or without additional FFQ data, was superior to use of FFQ data alone. Thus, if self-administered automated 24HRs are to be used in cohort studies, 4–6 administrations of the 24HR should be considered along with administration of an FFQ.
combining dietary instruments; data collection; dietary assessment; energy adjustment; epidemiologic methods; measurement error; nutrient density; nutrient intake
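The diminishing return from additional recalls can be sketched under the classical measurement error model, in which each recall equals usual intake plus independent within-person error. Under that assumption, the correlation between the mean of k recalls and usual intake is sqrt(1 / (1 + r / k)), where r is the ratio of within-person to between-person variance. The ratio of 3.0 used below is hypothetical and is not an estimate from the Eating at America's Table Study; the authors' analysis also covers power, sample size, and FFQ data, which this sketch does not reproduce.

```python
import math


def correlation_with_usual(k, variance_ratio):
    """Correlation between the mean of k recalls and true usual intake.

    Assumes the classical model recall = usual + independent error, with
    variance_ratio = within-person variance / between-person variance.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return math.sqrt(1.0 / (1.0 + variance_ratio / k))


# Each extra recall helps, but by less and less.
for k in (1, 2, 4, 6, 12):
    print(k, round(correlation_with_usual(k, variance_ratio=3.0), 3))
```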
The authors describe a statistical method of combining self-reports and biomarkers that, with adequate control for confounding, will provide nearly unbiased estimates of diet-disease associations and a valid test of the null hypothesis of no association. The method is based on regression calibration. In cases in which the diet-disease association is mediated by the biomarker, the association needs to be estimated as the total dietary effect in a mediation model. However, the hypothesis of no association is best tested through a marginal model that includes as the exposure the regression calibration-estimated intake but not the biomarker. The authors illustrate the method with data from the Carotenoids in Age-Related Eye Disease Study (2001–2004) and show that inclusion of the biomarker in the regression calibration-estimated intake increases the statistical power. This development sheds light on previous analyses of diet-disease associations reported in the literature.
bias (epidemiology); carotenoids; cataract; lutein; measurement error; sample size
To examine associations between food patterns, constructed with cluster analysis, and colorectal cancer incidence within the National Institutes of Health (NIH)–AARP Diet and Health Study.
A prospective cohort, aged 50–71 years at baseline in 1995–96, followed until the end of 2000.
Subjects and Method
Food patterns were constructed, separately in men (n=293 576) and women (n=198 730), with 181 food variables (daily intake frequency per 1 000 kilocalories) from a food frequency questionnaire. Four large clusters were identified in men and three in women. Cox proportional hazards regression examined associations between patterns and cancer incidence.
In men, a Vegetable and Fruit pattern was associated with reduced colorectal cancer incidence (multivariate HR: 0.85; 95% CI: 0.76, 0.94) when compared with less salutary food choices. Both the Vegetable and Fruit pattern and a Fat-Reduced Foods pattern were associated with reduced rectal cancer incidence in men. In women, a similar Vegetable and Fruit pattern was associated with colorectal cancer protection (age-adjusted HR: 0.82; 95% CI: 0.70, 0.95), but the association was not statistically significant in multivariate analysis.
These results, together with findings from previous studies, support the hypothesis that micronutrient-dense, low-fat, high-fiber food patterns protect against colorectal cancer.
food patterns; cluster analysis; colorectal cancer; prospective cohort
Prospective epidemiologic data on the effects of different types of dietary sugars on cancer incidence have been limited. In this report, we investigated the association of total sugars, sucrose, fructose, added sugars, added sucrose and added fructose in the diet with risk of 24 malignancies. Participants (n = 435,674) aged 50–71 years from the NIH-AARP Diet and Health Study were followed for 7.2 years. The intake of individual sugars was assessed using a 124-item food frequency questionnaire (FFQ). Cox proportional hazards regression was used to estimate hazard ratios (HR) and 95% confidence intervals (CI) in multivariable models adjusted for confounding factors pertinent to individual cancers. We identified 29,099 cancer cases in men and 13,355 cases in women. In gender-combined analyses, added sugars were positively associated with risk of esophageal adenocarcinoma (HR for Q5 vs. Q1: 1.62, 95% CI: 1.07–2.45; P for trend = 0.01); added fructose was associated with risk of small intestine cancer (HR for Q5 vs. Q1: 2.20, 95% CI: 1.16–4.16; P for trend = 0.009); and all investigated sugars were associated with increased risk of pleural cancer. In women, all investigated sugars were inversely associated with ovarian cancer. We found no association between dietary sugars and risk of colorectal or any other major cancer. Measurement error in FFQ-reported dietary sugars may have limited our ability to obtain more conclusive findings. Statistically significant associations observed for the rare cancers are of interest and warrant further investigation.
Sugars; added sugars; diet; cancer; AARP Study
To provide a preliminary clinical performance evaluation of a novel CaP assay, PSA/SIA (Solvent Interaction Analysis), that focuses on changes to the structure of PSA.
A total of 222 men undergoing prostate biopsy for accepted clinical criteria at three sites (University Hospitals Case Medical Center in Cleveland, Cleveland Clinic, and Veterans Administration Boston Healthcare System) were enrolled in an IRB-approved study. Prior to TRUS-guided biopsy, patients received DRE with systematic prostate massage followed by collection of urine. The PSA/SIA assay determined the relative partitioning of heterogeneous PSA isoform populations in urine between two aqueous phases. A structural index, K, whose numerical value is defined as the ratio of the concentrations of all PSA isoforms in the two phases, was determined by total PSA ELISA and used to set a diagnostic threshold for CaP. Performance was assessed using ROC analysis with biopsy as the gold standard.
Biopsies were pathologically classified as case (malignant, n=100) or control (benign, n=122). ROC performance demonstrated AUC=0.90 for PSA/SIA and 0.58 for serum tPSA. At a cutoff value of K=1.73, PSA/SIA displayed sensitivity=100%, specificity=80.3%, PPV=80.6%, and NPV=100%. No attempt was made in this preliminary study to further control patient population or selection criteria for biopsy, nor did we analytically investigate the type of structural differences in PSA that led to changes in K value.
PSA/SIA provides ratiometric information independently of PSA concentration. In this preliminary study, analysis of the overall structurally heterogeneous PSA isoform population using the SIA assay showed promising results to be further evaluated in future studies.
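The reported sensitivity, specificity, PPV, and NPV can be checked against one another with a short calculation. The abstract does not report the underlying confusion-matrix counts; the counts below are inferred from the reported percentages and the group sizes (100 cases, 122 controls) and should be read as an assumption: all 100 cases at or above the cutoff, and 98 of 122 controls below it.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts inferred (not reported in the abstract): 100 cases all detected,
# and 98 of 122 controls below the cutoff, since 98 / 122 is about 80.3%.
metrics = diagnostic_metrics(tp=100, fp=24, tn=98, fn=0)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these inferred counts the four percentages agree with those reported at K=1.73, which suggests the reported figures are internally consistent.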
We consider a Bayesian analysis using WinBUGS to estimate the distribution of usual intake for episodically consumed foods and energy (calories). The model uses measures of nutrition and energy intakes via a food frequency questionnaire (FFQ) along with repeated 24-hour recalls and adjusting covariates. In order to estimate the usual intake of the food, we phrase usual intake in terms of person-specific random effects, along with day-to-day variability in food and energy consumption. Three levels are incorporated in the model. The first level incorporates information about whether an individual in fact reported consumption of a particular food item. The second level incorporates the amount of intake from those individuals who reported consumption of the food, and the third level incorporates the energy intake. Estimates of posterior means of parameters and distributions of usual intakes are obtained by using Markov chain Monte Carlo calculations. This R function reports to users point estimates and credible intervals for parameters in the model, samples from their posterior distribution, samples from the distribution of usual intake and usual energy intake, trace plots of parameters, and summary statistics of usual intake, usual energy intake, and energy-adjusted usual intake.
excess zero models; MCMC; nonlinear mixed models; R; R2WinBUGS; zero-inflation
Dietary measurement error creates serious challenges to reliably discovering new diet–disease associations in nutritional cohort studies. Such error causes substantial underestimation of relative risks and reduction of statistical power for detecting associations. On the basis of data from the Observing Protein and Energy Nutrition Study, we recommend the following approaches to deal with these problems. Regarding data analysis of cohort studies using food-frequency questionnaires, we recommend 1) using energy adjustment for relative risk estimation; 2) reporting estimates adjusted for measurement error along with the usual relative risk estimates, whenever possible (this requires data from a relevant, preferably internal, validation study in which participants report intakes using both the main instrument and a more detailed reference instrument such as a 24-hour recall or multiple-day food record); 3) performing statistical adjustment of relative risks, based on such validation data, if they exist, using univariate (only for energy-adjusted intakes such as densities or residuals) or multivariate regression calibration. We note that whereas unadjusted relative risk estimates are biased toward the null value, statistical significance tests of unadjusted relative risk estimates are approximately valid. Regarding study design, we recommend increasing the sample size to remedy loss of power; however, it is important to understand that this will often be an incomplete solution because the attenuated signal may be too small to distinguish from unmeasured confounding in the model relating disease to reported intake. Future work should be devoted to alleviating the problem of signal attenuation, possibly through the use of improved self-report instruments or by combining dietary biomarkers with self-report instruments.
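The recommendation to increase sample size can be made concrete with a commonly used approximation, offered here as a sketch rather than as the authors' calculation: with nondifferential error, the sample size needed to keep the same power grows roughly by a factor of 1/ρ², where ρ is the correlation between reported and true intake. The function name and the example values are hypothetical and are not taken from the Observing Protein and Energy Nutrition Study.

```python
import math


def inflated_sample_size(n_without_error, rho):
    """Approximate sample size needed when exposure is measured with error.

    Uses the common approximation n_error = n_true / rho**2, where rho is the
    correlation between reported and true intake (nondifferential error).
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must be in (0, 1]")
    return math.ceil(n_without_error / rho ** 2)


# Hypothetical correlations between reported and true intake.
for rho in (1.0, 0.7, 0.5, 0.3):
    print(rho, inflated_sample_size(1000, rho))
```

As the abstract notes, a larger sample restores power but does not undo the attenuation of the relative risk itself.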
To develop a method to validate an FFQ for reported intake of episodically consumed foods when the reference instrument measures short-term intake, and to apply the method in a large prospective cohort.
The FFQ was evaluated in a sub-study of cohort participants who, in addition to the questionnaire, were asked to complete two non-consecutive 24 h dietary recalls (24HR). FFQ-reported intakes of twenty-nine food groups were analysed using a two-part measurement error model that allows for nonconsumption on a given day, using 24HR as a reference instrument under the assumption that 24HR is unbiased for true intake at the individual level.
The National Institutes of Health–AARP Diet and Health Study, a cohort of 567 169 participants living in the USA and aged 50–71 years at baseline in 1995.
A sub-study of the cohort consisting of 2055 participants.
Estimated correlations of true and FFQ-reported energy-adjusted intakes were 0·5 or greater for most of the twenty-nine food groups evaluated, and estimated attenuation factors (a measure of bias in estimated diet–disease associations) were 0·4 or greater for most food groups.
The proposed methodology extends the class of foods and nutrients for which an FFQ can be evaluated in studies with short-term reference instruments. Although violations of the assumption that the 24HR is unbiased could be inflating some of the observed correlations and attenuation factors, results suggest that the FFQ is suitable for testing many, but not all, diet–disease hypotheses in a cohort of this size.
Diet; Food; Epidemiological methods; Questionnaires; Validation studies
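To illustrate what an attenuation factor means for a diet–disease association, the sketch below assumes the standard result for nondifferential error in a log-linear risk model: the expected observed log relative risk is the true log relative risk multiplied by the attenuation factor. The function names and the example relative risk of 2.0 are hypothetical; 0.4 is used only because it is the lower bound quoted for most food groups.

```python
import math


def observed_relative_risk(true_rr, attenuation):
    """Expected observed RR when the log RR is multiplied by the attenuation factor."""
    if true_rr <= 0:
        raise ValueError("relative risk must be positive")
    return math.exp(attenuation * math.log(true_rr))


def deattenuated_relative_risk(observed_rr, attenuation):
    """Invert the attenuation: divide the observed log RR by the attenuation factor."""
    if observed_rr <= 0 or attenuation <= 0:
        raise ValueError("relative risk and attenuation must be positive")
    return math.exp(math.log(observed_rr) / attenuation)


# A hypothetical true RR of 2.0 seen through an attenuation factor of 0.4.
seen = observed_relative_risk(2.0, 0.4)
print(round(seen, 2), round(deattenuated_relative_risk(seen, 0.4), 2))
```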
There has been great public health interest in estimating usual, i.e., long-term average, intake of episodically consumed dietary components that are not consumed daily by everyone, e.g., fish, red meat and whole grains. Short-term measurements of episodically consumed dietary components have zero-inflated skewed distributions. So-called two-part models have been developed for such data in order to correct for measurement error due to within-person variation and to estimate the distribution of usual intake of the dietary component in the univariate case. However, there is arguably much greater public health interest in the usual intake of an episodically consumed dietary component adjusted for energy (caloric) intake, e.g., ounces of whole grains per 1000 kilo-calories, which reflects usual dietary composition and adjusts for different total amounts of caloric intake. Because of this public health interest, it is important to have models to fit such data, and it is important that the model-fitting methods can be applied to all episodically consumed dietary components.
We have recently developed a nonlinear mixed effects model (Kipnis et al., 2010), and have fit it by maximum likelihood using nonlinear mixed effects programs and methodology (the SAS NLMIXED procedure). Maximum likelihood fitting of such a nonlinear mixed model is generally slow because of 3-dimensional adaptive Gaussian quadrature, and there are times when the programs either fail to converge or converge to models with a singular covariance matrix. For these reasons, we develop a Markov chain Monte Carlo (MCMC) approach to fitting this model, which allows for both frequentist and Bayesian inference. There are technical challenges to developing this solution because one of the covariance matrices in the model is patterned. Our main application is to the National Institutes of Health (NIH)-AARP Diet and Health Study, where we illustrate our methods for modeling the energy-adjusted usual intake of fish and whole grains. We demonstrate numerically that our methods lead to increased speed of computation, converge to reasonable solutions, and have the flexibility to be used in either a frequentist or a Bayesian manner.
Bayesian approach; latent variables; measurement error; mixed effects models; nutritional epidemiology; zero-inflated data
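The zero-inflated, skewed shape of short-term data that motivates two-part models can be illustrated with a simplified simulation for a single person. This is a sketch only: it assumes a fixed daily probability of consumption and lognormal amounts on consumption days, and it omits the person-specific random effects, covariates, and energy component of the model described above. All parameter values are hypothetical.

```python
import math
import random


def simulate_recalls(prob_consume, log_mean, log_sd, n_days, seed=0):
    """Simulate daily recall amounts for one person under a two-part model.

    Part 1: the food is consumed on a given day with probability prob_consume.
    Part 2: on consumption days the amount is lognormal(log_mean, log_sd).
    Non-consumption days are recorded as 0, which produces the excess zeros.
    """
    rng = random.Random(seed)
    return [
        rng.lognormvariate(log_mean, log_sd) if rng.random() < prob_consume else 0.0
        for _ in range(n_days)
    ]


def usual_intake(prob_consume, log_mean, log_sd):
    """Long-term average intake: P(consume) times E[amount | consume]."""
    return prob_consume * math.exp(log_mean + log_sd ** 2 / 2.0)


# Hypothetical food eaten on about one day in five.
days = simulate_recalls(0.2, log_mean=1.0, log_sd=0.5, n_days=50000)
share_zero = days.count(0.0) / len(days)
average = sum(days) / len(days)
print(round(share_zero, 2), round(average, 2), round(usual_intake(0.2, 1.0, 0.5), 2))
```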
In the United States the preferred method of obtaining dietary intake data is the 24-hour dietary recall, yet the measure of most interest is usual or long-term average daily intake, which is impossible to measure. Thus, usual dietary intake is assessed with considerable measurement error. Also, diet represents numerous foods, nutrients and other components, each of which has distinctive attributes. Sometimes, it is useful to examine intake of these components separately, but increasingly nutritionists are interested in exploring them collectively to capture overall dietary patterns. Consumption of these components varies widely: some are consumed by almost everyone every day, while others are episodically consumed so that 24-hour recall data are zero-inflated. In addition, they are often correlated with each other. Finally, it is often preferable to analyze the amount of a dietary component relative to the amount of energy (calories) in a diet because dietary recommendations often vary with energy level. The quest to understand overall dietary patterns of usual intake has to this point reached a standstill. There are no statistical methods or models available to model such complex multivariate data with its measurement error and zero inflation. This paper proposes the first such model, and it proposes the first workable solution to fit such a model. After describing the model, we use survey-weighted MCMC computations to fit the model, with uncertainty estimation coming from balanced repeated replication.
The methodology is illustrated through an application to estimating the population distribution of the Healthy Eating Index-2005 (HEI-2005), a multi-component dietary quality index involving ratios of interrelated dietary components to energy, among children aged 2–8 years in the United States. We pose a number of interesting questions about the HEI-2005 and provide answers that were not previously within the realm of possibility, and we indicate ways that our approach can be used to answer other questions of importance to nutritional science and public health.
Bayesian methods; Dietary assessment; Latent variables; Measurement error; Mixed models; Nutritional epidemiology; Nutritional surveillance; Zero-Inflated Data
We propose a semiparametric Bayesian method for handling measurement error in nutritional epidemiological data. Our goal is to estimate nonparametrically the form of association between a disease and exposure variable while the true values of the exposure are never observed. Motivated by nutritional epidemiological data we consider the setting where a surrogate covariate is recorded in the primary data, and a calibration data set contains information on the surrogate variable and repeated measurements of an unbiased instrumental variable of the true exposure. We develop a flexible Bayesian method where not only is the relationship between the disease and exposure variable treated semiparametrically, but also the relationship between the surrogate and the true exposure is modeled semiparametrically. The two nonparametric functions are modeled simultaneously via B-splines. In addition, we model the distribution of the exposure variable as a Dirichlet process mixture of normal distributions, thus making its modeling essentially nonparametric and placing this work into the context of functional measurement error modeling. We apply our method to the NIH-AARP Diet and Health Study and examine its performance in a simulation study.
B-splines; Dirichlet process prior; Gibbs sampling; Measurement error; Metropolis-Hastings algorithm; Partly linear model
The authors compared dietary pattern methods—cluster analysis, factor analysis, and index analysis—with colorectal cancer risk in the National Institutes of Health (NIH)–AARP Diet and Health Study (n = 492,306). Data from a 124-item food frequency questionnaire (1995–1996) were used to identify 4 clusters for men (3 clusters for women), 3 factors, and 4 indexes. Comparisons were made with adjusted relative risks and 95% confidence intervals, distributions of individuals in clusters by quintile of factor and index scores, and health behavior characteristics. During 5 years of follow-up through 2000, 3,110 colorectal cancer cases were ascertained. In men, the vegetables and fruits cluster, the fruits and vegetables factor, the fat-reduced/diet foods factor, and all indexes were associated with reduced risk; the meat and potatoes factor was associated with increased risk. In women, reduced risk was found with the Healthy Eating Index-2005 and increased risk with the meat and potatoes factor. For men, beneficial health characteristics were seen with all fruit/vegetable patterns, diet foods patterns, and indexes, while poorer health characteristics were found with meat patterns. For women, findings were similar except that poorer health characteristics were seen with diet foods patterns. Similarities were found across methods, suggesting basic qualities of healthy diets. Nonetheless, findings vary because each method answers a different question.
colorectal neoplasms; food habits; risk
A major problem in detecting diet-disease associations in nutritional cohort studies is measurement error in self-reported intakes, which causes loss of statistical power. The authors propose using biomarkers correlated with dietary intake to strengthen analyses of diet-disease hypotheses and to increase statistical power. They consider combining self-reported intakes and biomarker levels using principal components or a sum of ranks and relating the combined measure to disease in conventional regression analyses. They illustrate their method in a study of the inverse association of dietary lutein plus zeaxanthin with nuclear cataracts, using serum lutein plus zeaxanthin as the biomarker, with data from the Carotenoids in Age-Related Eye Disease Study (United States, 2001–2004). This example demonstrates that the combined measure provides higher statistical significance than the dietary measure or the serum measure alone, and it potentially provides sample savings of 8%–53% over analysis with dietary intake alone and of 6%–48% over analysis with serum level alone, depending on the definition of the outcome variable and the choice of confounders entered into the regression model. The authors conclude that combining appropriate biomarkers with dietary data in a cohort can strengthen the investigation of diet-disease associations by increasing the statistical power to detect them.
carotenoids; cataract; lutein; ranks; sample size
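The sum-of-ranks combination mentioned above can be sketched as follows: rank participants on the self-reported intake, rank them on the biomarker, and add the two ranks to form a single exposure score for use in a conventional regression. The abstract does not say how ties are handled; assigning tied values their average rank is an assumption made here, and the data are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def sum_of_ranks(self_report, biomarker):
    """Combined exposure score: rank of self-report plus rank of biomarker."""
    if len(self_report) != len(biomarker):
        raise ValueError("inputs must have the same length")
    return [a + b for a, b in zip(average_ranks(self_report), average_ranks(biomarker))]


# Hypothetical dietary intake and serum levels for five participants.
diet = [1.2, 3.4, 2.2, 3.4, 0.8]
serum = [0.10, 0.30, 0.25, 0.20, 0.15]
print(sum_of_ranks(diet, serum))
```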
Dietary assessment of episodically consumed foods gives rise to nonnegative data that have excess zeros and measurement error. Tooze et al. (2006, Journal of the American Dietetic Association 106, 1575–1587) describe a general statistical approach (National Cancer Institute method) for modeling such food intakes reported on two or more 24-hour recalls (24HRs) and demonstrate its use to estimate the distribution of the food’s usual intake in the general population. In this article, we propose an extension of this method to predict individual usual intake of such foods and to evaluate the relationships of usual intakes with health outcomes. Following the regression calibration approach for measurement error correction, individual usual intake is generally predicted as the conditional mean intake given 24HR-reported intake and other covariates in the health model. One feature of the proposed method is that additional covariates potentially related to usual intake may be used to increase the precision of estimates of usual intake and of diet-health outcome associations. Applying the method to data from the Eating at America’s Table Study, we quantify the increased precision obtained from including reported frequency of intake on a food frequency questionnaire (FFQ) as a covariate in the calibration model. We then demonstrate the method in evaluating the linear relationship between log blood mercury levels and fish intake in women by using data from the National Health and Nutrition Examination Survey, and show increased precision when including the FFQ information. Finally, we present simulation results evaluating the performance of the proposed method in this context.
Dietary measurement error; Dietary survey; Episodically consumed foods; Excess zero models; Food frequency questionnaire; Fish; Individual usual intake; Mercury; Nonlinear mixed models; Regression calibration; 24-hour recall
It would be of enormous public health importance if diet and physical activity, both modifiable behavioral factors, were causally related to cancer. Nevertheless, the nutritional epidemiology of cancer remains problematic, in part because of persistent concerns that standard questionnaires measure diet and physical activity with too much error. We present a new strategy for addressing this measurement error problem. First, as background, we note that food frequency and physical activity questionnaires require respondents to report ‘typical’ diet or activity over the previous year or longer. Multiple 24-hour recalls (24HR), based on reporting only the previous day’s behavior, offer potential cognitive advantages over the questionnaires, and biomarker evidence suggests the 24HR is more accurate than the food frequency questionnaire. The expense involved in administering multiple 24HRs in large epidemiologic studies, however, has up to now been prohibitive. In that context, we suggest that internet-based 24HRs, for both diet and physical activity, represent a practical and cost-effective approach for incorporating multiple recalls in large epidemiologic studies. We discuss 1) recent efforts to develop such internet-based instruments and their accompanying software support systems; 2) ongoing studies to evaluate the feasibility of using these new instruments in cohort studies; 3) additional investigations to gauge the accuracy of the internet-based recalls vis-à-vis standard instruments and biomarkers; and 4) new statistical approaches for combining the new instruments with standard assessment tools and biomarkers. The incorporation of internet-based 24HRs into large epidemiologic studies may help advance our understanding of the nutritional determinants of cancer.
Nutrition; Diet; Physical Activity; Cancer; Measurement error
Identifying diet-disease relationships in nutritional cohort studies is plagued by the measurement error in self-reported intakes.
The authors propose using biomarkers known to be correlated with dietary intake, so as to strengthen analyses of diet-disease hypotheses. The authors consider combining self-reported intakes and biomarker levels using principal components, Howe's method, or a joint statistical test of effects in a bivariate model. They compared the statistical power of these methods with that of conventional univariate analyses of self-reported intake or of biomarker level. They used computer simulation of different disease risk models, with input parameters based on data from the literature on the relationship between lutein intake and age-related macular degeneration.
The results showed that if the dietary effect on disease was fully mediated through the biomarker level, then the univariate analysis of the biomarker was the most powerful approach. However, combination methods, particularly principal components and Howe's method, were not greatly inferior in this situation, and were as good as, or better than, univariate biomarker analysis if mediation was only partial or non-existent. In some circumstances sample size requirements were reduced to 20%–50% of those required for conventional analyses of self-reported intake.
The authors conclude that (i) including biomarker data in addition to the usual dietary data in a cohort could greatly strengthen the investigation of diet-disease relationships, and (ii) when the extent of mediation through the biomarker is unknown, use of principal components or Howe's method appears a good strategy.
Renal cell cancer (RCC) incidence has increased in the United States over the past three decades. The authors analyzed the association between body mass index (BMI) and invasive RCC in the National Institutes of Health (NIH)–AARP Diet and Health Study, a large, prospective cohort aged 50–71 years at baseline initiated in 1995–1996, with follow-up through December 2003. Detailed analyses were conducted in a subcohort responding to a second questionnaire, including BMI at younger ages (18, 35, and 50 years); weight change across three consecutive age intervals; waist, hip, and waist-to-hip ratio; and height at age 18 years. Incident RCC was diagnosed in 1,022 men and 344 women. RCC was positively and strongly related to BMI at study baseline. Among subjects analyzed in the subcohort, RCC associations were strongest for baseline BMI and BMI recalled at age 50 years and were successively attenuated for BMI recalled at ages 35 and 18 years. Weight gain in early (18–35 years of age) and mid- (35–50 years of age) adulthood was strongly associated with RCC, whereas weight gain after midlife (age 50 years to baseline) was unrelated. RCC was also positively associated with waist-to-hip ratio in women and with height at age 18 years in both men and women.
body height; body mass index; body size; carcinoma, renal cell; obesity; overweight; waist-hip ratio
We examine two issues of importance in nutritional epidemiology: the relationship between dietary fat intake and breast cancer, and the comparison of different dietary assessment instruments, in our case the food frequency questionnaire (FFQ) and the multiple-day food record (FR). The data we use come from women participants in the control group of the Dietary Modification component of the Women’s Health Initiative (WHI) Clinical Trial. The difficulty with the analysis of this important data set is that it comes from a truncated sample, namely those women for whom fat intake as measured by the FFQ amounted to 32% or more of total calories. We describe methods that allow estimation of logistic regression parameters in such samples, and also allow comparison of different dietary instruments. Because likelihood approaches that specify the full multivariate distribution can be difficult to implement, we develop approximate methods for both our main problems that are simple to compute and have high efficiency. Application of these approximate methods to the WHI study reveals statistically significant relationships between fat intake and breast cancer when an FR is the instrument used, and demonstrates a marginally significant advantage of the FR over the FFQ in the local power to detect such relationships.
Biased sampling; Breast cancer; Case–control studies; Comparison of instruments; Measurement error; Misspecified models; Nutritional epidemiology; Truncation; Women’s Health Initiative
While diet has long been suspected as an etiological factor for colorectal cancer, studies of single foods and nutrients have provided inconsistent results.
We used factor analysis methods to study associations between dietary patterns and colorectal cancer in middle-aged Americans.
Diet was assessed among 293,615 men and 198,767 women in the NIH-AARP Diet and Health Study. Principal components factor analysis identified three primary dietary patterns: a fruits and vegetables, a diet foods, and a red meat and potatoes pattern. State cancer registries identified 2,151 incident cases of colorectal cancer in men and 959 in women between 1995 and 2000.
Men with high scores on the fruit and vegetable factor were at decreased risk (RR for Q5 vs. Q1 = 0.81, 95% CI 0.70–0.93, p for trend = 0.004). Both men and women had a similar risk reduction with high scores on the diet food factor (RR = 0.82, 95% CI 0.72–0.94, p for trend = 0.001 for men and RR = 0.87, 95% CI 0.71–1.07, p for trend = 0.06 for women). High scores on the red meat factor were associated with increased risk (RR = 1.17, 95% CI 1.02–1.35, p for trend = 0.14 for men and RR = 1.48, 95% CI 1.20–1.83, p for trend = 0.0002 for women).
These results suggest that dietary patterns characterized by low frequency of meat and potato consumption and frequent consumption of fruit and vegetables and fat-reduced foods are consistent with a decreased risk of colorectal cancer.
Diet patterns; colorectal cancer; fruits and vegetables; meat; cohort study; dietary fat
Regression calibration (RC) is a popular method for estimating regression coefficients when one or more continuous explanatory variables, X, are measured with error. In this method, the mismeasured covariate, W, is substituted by the expectation E(X|W), based on the assumption that the error in the measurement of X is non-differential. Using simulations, we compare three versions of RC with two other ‘substitution’ methods, moment reconstruction (MR) and imputation (IM), neither of which relies on the non-differential error assumption. We investigate studies that have an internal calibration sub-study. For RC, we consider (i) the usual version of RC, (ii) RC applied only to the ‘marker’ information in the calibration study, and (iii) an ‘efficient’ version (ERC) in which the estimators (i) and (ii) are combined. Our results show that ERC is preferable when there is non-differential measurement error. Under this condition, there are cases where ERC is less efficient than MR or IM, but they rarely occur in epidemiology. We show that the efficiency gain of usual RC and ERC over the other methods can sometimes be dramatic. The usual version of RC carries similar efficiency gains to ERC over MR and IM, but becomes unstable as measurement error becomes large, leading to bias and poor precision. When differential measurement error does pertain, MR and IM have considerably less bias than RC, but can have much larger variance. We demonstrate our findings with an analysis of dietary fat intake and mortality in a large cohort study.
differential measurement error; moment reconstruction; multiple imputation; non-differential measurement error; regression calibration
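The substitution step that defines regression calibration, replacing the mismeasured W by an estimate of E(X|W), can be illustrated with a small simulation. The sketch below is a minimal illustration only and is not the analysis reported in the abstract: it assumes a linear outcome model, classical non-differential error W = X + U with unit variances, and an internal calibration sub-study in which the true X is observed. The function names (`ols`, `simulate`, `regression_calibration`) and all parameter values are hypothetical.

```python
import random
import statistics


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n=20000, beta=0.5, seed=1):
    """Simulate true exposure X, error-prone W = X + U, and outcome Y.

    Assumption: X, U and the outcome noise are independent standard
    normals, so the error is non-differential and the expected
    attenuation factor is var(X) / (var(X) + var(U)) = 0.5.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    w = [xi + rng.gauss(0, 1) for xi in x]
    y = [beta * xi + rng.gauss(0, 1) for xi in x]
    return x, w, y


def regression_calibration(x_cal, w_cal, w, y):
    """Usual RC: fit E(X|W) in the calibration sub-study, then
    regress Y on the predicted exposure in the full cohort."""
    a, b = ols(w_cal, x_cal)            # calibration model X ~ W
    x_hat = [a + b * wi for wi in w]    # substitute E(X|W) for W
    return ols(x_hat, y)[1]


x, w, y = simulate()
n_cal = 2000                            # hypothetical sub-study size
naive_slope = ols(w, y)[1]              # attenuated toward zero
rc_slope = regression_calibration(x[:n_cal], w[:n_cal], w, y)
print(f"naive slope: {naive_slope:.3f}, RC slope: {rc_slope:.3f}")
```

Under these assumptions the naive slope is biased toward zero by the attenuation factor, while the calibrated slope should lie close to the simulated value of beta. The sketch does not cover the marker-only or efficient (ERC) variants, moment reconstruction, or imputation.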
We propose a new statistical method that uses information from two 24-hour recalls (24HRs) to estimate usual intake of episodically-consumed foods.
The method developed at the National Cancer Institute (NCI) accommodates the large number of non-consumption days that arise with episodically-consumed foods by separating the probability of consumption from the consumption-day amount, using a two-part model. Covariates, such as sex, age, race, or information from a food frequency questionnaire (FFQ), may supplement the information from two or more 24HRs using correlated mixed model regression. The model allows for correlation between the probability of consuming a food on a single day and the consumption-day amount. Percentiles of the distribution of usual intake are computed from the estimated model parameters.
The Eating at America's Table Study (EATS) data are used to illustrate the method to estimate the distribution of usual intake for whole grains and dark green vegetables for men and women and the distribution of usual intakes of whole grains by educational level among men. A simulation study indicates that the NCI method leads to substantial improvement over existing methods for estimating the distribution of usual intake of foods.
The NCI method provides distinct advantages over previously proposed methods by accounting for the correlation between probability of consumption and amount consumed and by incorporating covariate information. Researchers interested in estimating the distribution of usual intakes of foods for a population or subpopulation are advised to work with a statistician and incorporate the NCI method in analyses.
Usual intake; Episodically-consumed foods; statistical methods
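The two-part idea described above, usual intake as the product of a person's probability of consumption and that person's mean consumption-day amount, with correlated person-level effects, can be sketched as a Monte Carlo calculation. This is a simplified illustration and not the NCI software: it assumes the model parameters are already estimated, uses a log transformation for amounts where the published method may use a different transformation, omits covariates, and uses invented parameter values. The names `simulate_usual_intake` and `percentile` are hypothetical.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_usual_intake(n_people, b0, g0, sd_u1, sd_u2, rho, sd_e, seed=0):
    """Draw usual intakes from a two-part model with correlated effects.

    Part 1: P(consume on a given day) = expit(b0 + u1).
    Part 2: log(amount on a consumption day) = g0 + u2 + e, e ~ N(0, sd_e^2).
    (u1, u2) are person-level normal effects with correlation rho.
    Usual intake = probability * mean consumption-day amount.
    """
    rng = random.Random(seed)
    intakes = []
    for _ in range(n_people):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u1 = sd_u1 * z1
        u2 = sd_u2 * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        prob = expit(b0 + u1)
        # Mean of a log-normal amount given the person effect u2.
        mean_amount = math.exp(g0 + u2 + sd_e ** 2 / 2)
        intakes.append(prob * mean_amount)
    return intakes


def percentile(values, q):
    """Empirical q-th percentile (nearest-rank definition)."""
    s = sorted(values)
    idx = min(len(s) - 1, max(0, math.ceil(q / 100 * len(s)) - 1))
    return s[idx]


# Invented parameter values, for illustration only.
intakes = simulate_usual_intake(
    n_people=5000, b0=-0.5, g0=0.2, sd_u1=1.0, sd_u2=0.5, rho=0.3, sd_e=0.7
)
for q in (5, 25, 50, 75, 95):
    print(f"P{q}: {percentile(intakes, q):.3f}")
```

Setting `rho` to zero in this sketch corresponds to ignoring the correlation between probability and amount, which is the feature the abstract highlights as an advantage of the method.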
Results from several large cohort studies that were reported 10 to 20 years ago seemed to indicate that the hypothesized link between dietary fat intake and breast cancer risk was illusory. In this article, we review several strands of more recent evidence that have emerged. These include two studies comparing the performance of dietary instruments used to investigate the dietary fat-breast cancer hypothesis, a large randomized disease prevention trial, a more recent meta-analysis of nutritional cohort studies, and a very large nutritional cohort study. Each of the studies discussed in this article suggests that a modest but real association between fat intake and breast cancer is likely. If the association is causative, it would have important implications for public health strategies in reducing breast cancer incidence. The evidence is not yet conclusive, but additional follow-up in the randomized trial, as well as efforts to improve dietary assessment methodology for cohort studies, may be sufficient to provide a convincing answer.
breast cancer; dietary fat; dietary measurement error; food frequency questionnaire; multiple-day food record