To examine associations between food patterns, constructed with cluster analysis, and colorectal cancer incidence within the National Institutes of Health (NIH)–AARP Diet and Health Study.
A prospective cohort, aged 50–71 years at baseline in 1995–96, followed until the end of 2000.
Food patterns were constructed, separately in men (n=293 576) and women (n=198 730), with 181 food variables (daily intake frequency per 1 000 kilocalories) from a food frequency questionnaire. Four large clusters were identified in men and three in women. Cox proportional hazards regression examined associations between patterns and cancer incidence.
In men, a Vegetable and Fruit pattern was associated with reduced colorectal cancer incidence (multivariate HR: 0.85; 95% CI: 0.76, 0.94) when compared with less salutary food choices. Both the Vegetable and Fruit pattern and a Fat-Reduced Foods pattern were associated with reduced rectal cancer incidence in men. In women, a similar Vegetable and Fruit pattern was associated with colorectal cancer protection (age-adjusted HR: 0.82; 95% CI: 0.70, 0.95), but the association was not statistically significant in multivariate analysis.
These results, together with findings from previous studies, support the hypothesis that micronutrient-dense, low-fat, high-fiber food patterns protect against colorectal cancer.
Epidemiological research suggests that dietary factors may both protect against and promote the development of colorectal cancer. High intakes of fiber, folate and calcium have been associated with reduced colorectal cancer risk (Giovannucci 2002) (Bingham et al 2003)(Norat and Riboli 2003)(Larsson et al 2006), and high intakes of meat and fat with increased risk (Norat et al 2005) (Giovannucci et al 1992). Experts argue that because of the multifaceted nature of diet-disease associations, traditional multivariate analysis may be an inefficient approach in nutrition epidemiology (Schatzkin et al 1995) (Jacques and Tucker 2001). Because foods are consumed together, and dietary components act in synergism or are metabolized jointly, it can be argued that the true effect of diet may only be observed when all components are considered simultaneously. Also, analysis of dietary data and interpretation of diet-disease associations are hampered by the difficulties in separating out individual dietary components and adequately modeling their potential interactions (Byers and Gieseker 1997).
Patterning methodologies, including cluster analysis (CA), factor analysis (FA) and diet quality indexes, may turn the analytical difficulties into an advantage (Hu 2002)(Kant 2004)(Newby and Tucker 2004). CA, which aggregates individuals with similar characteristics (Aldenderfer and Blashfield 1984), has been applied successfully in epidemiology (Tucker et al 1992)(Hulshof et al 1992)(Greenwood et al 2000)(Wirfält et al 2001)(Engeset et al 2005), but only a few CA studies have examined food patterns and colorectal health (Rouillier et al 2005)(Austin et al 2007). This study examines associations between food pattern clusters and colorectal cancer incidence in the National Institutes of Health (NIH)–AARP Diet and Health Study. In a series of papers, the same group of researchers is currently investigating different ways of constructing food patterns and their associations with colorectal cancer incidence (Flood et al 2008)(Reedy et al 2008). A forthcoming paper will discuss and compare the experiences of this CA study with other approaches.
The NIH-AARP Diet and Health Study was established in 1995–1996 (Schatzkin et al 2001). A total of 340 148 men and 227 021 women above 50 years of age, residing in six states (California, Florida, Pennsylvania, New Jersey, North Carolina, and Louisiana) and two metropolitan areas (Atlanta, Georgia, and Detroit, Michigan), adequately completed a 16-page mailed questionnaire. The study protocol was approved by the Special Studies Institutional Review Board of the U.S. National Cancer Institute, and all subjects provided informed consent upon entry.
Vital status is ascertained annually through linkage of the cohort to the Social Security Administration Death Master File in the U.S., follow-up searches of the National Death Index Plus for participants who matched to the Social Security Administration Death Master File, cancer registry linkage, and responses to questionnaires and other mailings. The design and maintenance of this cohort have been described in detail elsewhere (Schatzkin et al 2001).
In this analysis, we excluded individuals with prevalent cancer (43 341 men and 26 048 women) or end-stage renal disease (626 men and 371 women) at baseline, and those reporting extreme energy intakes (2 566 men and 1 835 women), defined as being below the 25th percentile minus two interquartile ranges or above the 75th percentile plus two interquartile ranges of energy intake on the logarithmic scale. In a preliminary CA with 100 clusters (performed twice), we also identified individuals with extreme food intakes; individuals in small clusters (fewer than 10 individuals) were removed (39 men and 37 women). The final sample was 293 576 men and 198 730 women.
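The extreme-energy rule and the sample-size arithmetic above can be checked with a short sketch. It assumes the quartiles are computed on log energy with Python's `statistics.quantiles` (whose default interpolation may differ slightly from the SAS percentile definition the authors used), and the toy intakes are hypothetical; in practice the cut-offs would presumably be computed separately for men and women.

```python
import math
import statistics


def energy_limits(energy_kcal):
    """Cut-offs (kcal) for extreme energy intake: below Q1 - 2*IQR or
    above Q3 + 2*IQR, with quartiles taken on the log scale."""
    logs = [math.log(e) for e in energy_kcal]
    # statistics.quantiles(n=4) returns the three quartile cut points; its
    # default 'exclusive' method may differ slightly from SAS percentiles.
    q1, _, q3 = statistics.quantiles(logs, n=4)
    iqr = q3 - q1
    return math.exp(q1 - 2 * iqr), math.exp(q3 + 2 * iqr)


def is_extreme(energy, limits):
    low, high = limits
    return energy < low or energy > high


# Exclusion counts as reported in the text.
cohort = {"men": 340_148, "women": 227_021}
excluded = {  # prevalent cancer, end-stage renal disease, extreme energy, small clusters
    "men": [43_341, 626, 2_566, 39],
    "women": [26_048, 371, 1_835, 37],
}
final = {sex: cohort[sex] - sum(excluded[sex]) for sex in cohort}
print(final)  # {'men': 293576, 'women': 198730}

# Toy energy intakes (kcal/day), hypothetical values for illustration.
sample = [1500, 1800, 1900, 2000, 2100, 2200, 2500, 50000]
limits = energy_limits(sample)
print([e for e in sample if is_extreme(e, limits)])  # [50000]
```

The subtraction reproduces the final sample sizes exactly, which confirms that the four exclusion groups reported above do not overlap.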
Incident cases of cancer were identified by linkage between the NIH-AARP cohort membership and the cancer registry databases of the eight targeted states and metropolitan areas. These registries are estimated to be 95% complete within two years of cancer diagnosis and are certified by the North American Association of Central Cancer Registries as meeting the highest standard of data quality (Michaud et al 2005). Incident colorectal cancer cases were defined according to the International Classification of Diseases for Oncology, 3rd edition (codes C180–C189, C260, C199, and C209). During the 4.5-year period from baseline (1995–96) until the end of 2000, a total of 2 151 men and 959 women were diagnosed with primary incident colorectal cancer. In men, 631 cases were diagnosed with rectal cancer and 1 539 with colon cancer; in women, 258 cases were diagnosed with rectal cancer and 707 with colon cancer. Person-years of observation accumulated from the date of study entry until the date of colorectal cancer diagnosis, or until censoring at the date of cancer diagnosis at another site, death, migration out of the study areas, or December 31, 2000, whichever occurred first.
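The "whichever occurred first" rule for person-years can be sketched as a small function. The argument names are hypothetical, years are approximated as 365.25 days, and the handling of a same-day tie between a colorectal diagnosis and a censoring event is an assumption, since the text does not specify it.

```python
from datetime import date

END_OF_FOLLOW_UP = date(2000, 12, 31)


def follow_up(entry, crc_diagnosis=None, other_cancer=None, death=None,
              moved_out=None, end=END_OF_FOLLOW_UP):
    """Return (person_years, is_case) for one participant.

    Follow-up stops at the earliest of: colorectal cancer diagnosis (an event),
    or censoring at cancer diagnosis at another site, death, migration out of
    the study area, or the end of follow-up."""
    stops = [(end, False)]
    if crc_diagnosis is not None:
        stops.append((crc_diagnosis, True))
    for censor_date in (other_cancer, death, moved_out):
        if censor_date is not None:
            stops.append((censor_date, False))
    # Earliest date wins; on a same-day tie the colorectal case is kept
    # (an assumption: the text does not say how ties were handled).
    stop, is_case = min(stops, key=lambda s: (s[0], not s[1]))
    return (stop - entry).days / 365.25, is_case


print(follow_up(date(1996, 1, 1), crc_diagnosis=date(1998, 7, 1)))
print(follow_up(date(1996, 1, 1), death=date(2001, 5, 1)))
```

In the second call the death occurs after the administrative end date, so the participant is censored on December 31, 2000.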
The baseline food frequency questionnaire (AARP-FFQ) was an early version of the National Cancer Institute's new Diet History Questionnaire (DHQ), which underwent extensive cognitive testing during development (Subar et al 1995)(Subar et al 2001b)(Subar et al 2001a). The AARP-FFQ was evaluated against two 24-hour dietary recalls in a calibration sub-study of 2 000 men and women and demonstrated satisfactory relative validity (Thompson et al 2008). The energy-adjusted validity coefficients in men were 0.43 for protein, 0.71 for carbohydrate, 0.72 for fat, and 0.72 for fruit and vegetables; in women they were 0.50, 0.64, 0.62 and 0.61, respectively. The energy-adjusted attenuation factors were lowest for protein in both men (0.26) and women (0.31), and highest for saturated fat in men (0.68) and for vitamin B6 in women (0.62). The baseline questionnaire included 124 food items with ten frequency response categories (never; 1–6 times/year; 7–11 times/year; once/month; 2–3 times/month; 1–2 times/week; 3–4 times/week; 5–6 times/week; once/day; and twice or more/day) and three portion size alternatives. In addition, 21 questions requested frequency information on intake of low-fat and high-fiber foods and on food preparation, and two cross-checking questions asked about overall consumption of vegetables and fruits. The questionnaire, designed for the general population, includes some regional and ethnic group-specific foods, and three items on the type, frequency and dosage of supplement use. The reference period was the last 12 months. Energy and nutrient intakes were calculated by applying the food frequency and portion size information to a nutrient composition database newly derived from national survey data (the U.S. Department of Agriculture's Continuing Survey of Food Intakes by Individuals, CSFII) (Subar et al 2000). This study examined intakes adjusted for energy using the density method.
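Turning the ten response categories into a daily intake frequency requires a numeric value for each category. The mapping below is hypothetical: it assumes category midpoints, 30 days per month, and a value of 2 for the open-ended top category, none of which is stated in the text.

```python
# Hypothetical conversion of the ten AARP-FFQ frequency categories to times per
# day. Category midpoints (and 30 days/month) are assumptions, not the study's
# published coding.
TIMES_PER_DAY = {
    "never": 0.0,
    "1-6 times/year": 3.5 / 365,
    "7-11 times/year": 9 / 365,
    "once/month": 1 / 30,
    "2-3 times/month": 2.5 / 30,
    "1-2 times/week": 1.5 / 7,
    "3-4 times/week": 3.5 / 7,
    "5-6 times/week": 5.5 / 7,
    "once/day": 1.0,
    "twice or more/day": 2.0,  # open-ended top category, assumed to mean 2
}


def daily_frequencies(responses):
    """Map {food item: response category} to {food item: times per day}."""
    return {food: TIMES_PER_DAY[category] for food, category in responses.items()}


print(daily_frequencies({"apples": "3-4 times/week", "liver": "never"}))
# {'apples': 0.5, 'liver': 0.0}
```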
A total of 204 food frequency variables were available in the database. We reduced these to 181 variables by collapsing those indicating different ways of eating butter and margarines into five variables (i.e., butter, stick margarine, tub margarine, butter-margarine mixture and diet margarine), and non-caloric sweeteners (i.e., aspartame and saccharin) into one variable. Two of the original food variables (“other fruits” and “other vegetables”) were excluded because no consumption was reported.
We used energy-adjusted food frequency variables (i.e., daily food frequencies per 4.19 MJ, or 1 000 kcal) in order to concentrate on dietary proportions and to reduce the measurement error common in food frequency questionnaires (Willett et al 1997)(Kipnis et al 2003). To remove the extraneous effect of variables with large variances on the formation of clusters (1988), we also standardized the energy-adjusted food variables to a mean of zero and a standard deviation of one.
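A minimal sketch of the two transformations described above: the density adjustment (frequency per 1 000 kcal) and z-standardization of each food variable. It assumes the sample standard deviation; the text does not say whether the sample or population SD was used, and the difference is negligible at this cohort's size.

```python
import statistics


def density(times_per_day, energy_kcal):
    """Food frequency per 1 000 kcal (4.19 MJ) of daily energy intake."""
    return times_per_day / (energy_kcal / 1000.0)


def standardize(values):
    """Rescale one food variable to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD; a population SD would also work
    if sd == 0:
        raise ValueError("a variable with no variation cannot be standardized")
    return [(v - mean) / sd for v in values]


print(density(1.5, 2000))            # 0.75 times per 1 000 kcal
print(standardize([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

After standardization every food contributes equally to the Euclidean distances used by k-means, so frequently eaten foods with large variances cannot dominate cluster formation on their own.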
The baseline questionnaire included questions on demographics and potential cancer risk factors. The following variables were used in this study: age; education (< high school; completed high school; some college; college degree and higher); ethnicity (white; black; other); smoking (never; former, <20 cigarettes/day; former, ≥20 cigarettes/day; current, ≤20 cigarettes/day; current, >20 cigarettes/day); leisure time physical activity (never or rarely; 1–3 times/month; 1–2 times/week; 3–4 times/week; 5 or more times/week); body mass index (BMI, kg/m²) computed from self-reported weight and height (<18.5; 18.5 to <25.0; 25.0 to <30.0; 30.0 to <35.0; 35.0 to <40.0; ≥40.0); and, in women only, menopausal hormone therapy (MHT: never use; current use; past use). Where applicable, an indicator variable for missing responses was used for each covariate.
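The BMI categories above are half-open intervals, which is easy to get wrong at the boundaries. A small sketch, assuming weight in kilograms and height in meters (the questionnaire's original units are not stated) and a hypothetical "missing" level feeding the missing-response indicator:

```python
import bisect

BMI_CUTS = [18.5, 25.0, 30.0, 35.0, 40.0]  # kg/m2 boundaries from the text
BMI_LABELS = ["<18.5", "18.5 to <25.0", "25.0 to <30.0",
              "30.0 to <35.0", "35.0 to <40.0", ">=40.0"]


def bmi_category(weight_kg, height_m):
    """Return the BMI category; 'missing' feeds the missing-indicator level."""
    if weight_kg is None or height_m is None:
        return "missing"
    bmi = weight_kg / height_m ** 2
    # bisect_right puts a value equal to a cut point in the higher category,
    # matching the half-open intervals (e.g. 25.0 belongs to "25.0 to <30.0").
    return BMI_LABELS[bisect.bisect_right(BMI_CUTS, bmi)]


print(bmi_category(70, 1.75))   # 18.5 to <25.0
print(bmi_category(None, 1.7))  # missing
```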
We used SAS version 8.1 (Cary, NC) for all statistical analyses. Statistical tests were two-sided at a significance level of 0.05, and all analytical procedures were conducted separately for men and women. Cluster analysis was performed using a k-means method, an iterative partitioning procedure that attempts to group the data into k clusters so as to maximize the overall R² value, defined as R² = 1 − W/T, where W is the sum of squared Euclidean distances between each data point and its within-cluster mean (or center), and T is the sum of squared distances between each data point and the overall mean (Aldenderfer and Blashfield 1984)(1988). Because T is fixed for a given data set, maximizing R² is equivalent to minimizing W. The k-means method is recommended for large data sets and has previously been used in a large number of diet and chronic disease studies (Kant 2004)(Newby and Tucker 2004). Clustering was based on the 181 energy-adjusted and standardized food frequency variables, for k = 3 to 12 clusters. The final number of clusters was chosen based on the stability of the large clusters (n > 10 000) that were formed and on the overall R² values. When the R² values were plotted against the number of clusters, six clusters for men and nine for women accounted for most of the increase in R² and ensured three stable large clusters for each gender. Four clusters in men and three in women were used in subsequent analyses.
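The k-means step and the R² = 1 − W/T criterion can be illustrated with a plain implementation of Lloyd's algorithm. This is a sketch of the general technique, not the SAS procedure the authors ran: initialization, convergence rules and tie-breaking may differ, and a pure-Python loop would be far too slow for 181 variables and roughly 300 000 participants.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(data, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]  # random distinct starting points
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in data]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = mean_vector(members)
    return labels, centers


def r_squared(data, labels, centers):
    """R^2 = 1 - W/T as defined in the text."""
    grand_mean = mean_vector(data)
    w = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
    t = sum(sq_dist(p, grand_mean) for p in data)
    return 1 - w / t


toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centers = kmeans(toy, k=2)
print(round(r_squared(toy, labels, centers), 4))  # 0.9901
```

Running this for a range of k and plotting R² against k reproduces, in miniature, the "elbow" reasoning used to choose the number of clusters. Because k-means only finds a local optimum, repeating it with different seeds (as with the preliminary CA performed twice) is a reasonable stability check.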
The distributions of relative food frequencies and the medians of total energy and energy-adjusted nutrient variables were examined across clusters. Chi-square tests were used to compare the distributions of common colorectal cancer risk factors across clusters.
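The chi-square comparison of a risk factor across clusters reduces to a Pearson statistic on a cluster-by-category contingency table. A minimal sketch with hypothetical counts:

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    (rows: clusters; columns: categories of one risk factor)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(col_totals) - 1)


# Hypothetical counts: two clusters x (never smoked, ever smoked).
stat, df = chi_square([[30, 70], [50, 50]])
print(round(stat, 3), df)  # 8.333 1
```

For one degree of freedom the p-value is `math.erfc(math.sqrt(stat / 2))`; for general degrees of freedom, `scipy.stats.chi2.sf(stat, df)` can be used if SciPy is available. With cluster sizes in the tens of thousands, even small differences in risk-factor distributions will be statistically significant.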
Cox proportional hazards regression (Cox 1972), with time since study entry as the time scale, was used to examine associations between clusters and the incidence of colorectal cancer, colon cancer and rectal cancer. The largest cluster (labeled “Many foods” in both men and women) was used as the reference category. Three models were fit for each cancer end point. The first model included only cluster (categorical) and age (continuous) as covariates. The second model also included BMI, and the third (multivariate) model additionally adjusted for education, ethnicity, smoking, leisure time physical activity, log total energy (continuous) and, in women, MHT. We also assessed the potential impact of dietary fiber, folic acid and calcium intakes, but because the results did not change materially, these nutrients were not included in the final models.
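To make the hazard ratio concrete, the sketch below fits a Cox model with a single 0/1 covariate (membership in one cluster versus the reference cluster) by Newton-Raphson on the partial likelihood, using the Breslow approach to tied event times. It is a deliberately simplified illustration: the study's models included several covariates and were fitted in SAS, the toy data are hypothetical, and the function fails if nobody in the risk sets differs in exposure.

```python
import math


def cox_binary(times, events, exposed, tol=1e-10, max_iter=50):
    """Hazard ratio and 95% CI for a single 0/1 covariate, fitted by
    Newton-Raphson on the Cox partial likelihood (Breslow handling of ties)."""
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i, is_event in enumerate(events):
            if not is_event:
                continue  # censored subjects only contribute to risk sets
            # Risk set: everyone still under observation at this event time.
            s0 = s1 = 0.0
            for t_j, x_j in zip(times, exposed):
                if t_j >= times[i]:
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
            mean_x = s1 / s0
            score += exposed[i] - mean_x
            info += mean_x * (1 - mean_x)  # variance of a 0/1 covariate in the risk set
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1 / math.sqrt(info)
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)


# Toy data (hypothetical): follow-up time, event indicator, exposure indicator.
hr, lo, hi = cox_binary(times=[1, 2, 3, 4], events=[1, 1, 1, 1], exposed=[1, 0, 1, 0])
print(round(hr, 4))  # 2.5616, i.e. (1 + sqrt(17)) / 2 for this toy data
```

For this toy data the score equation reduces to u² − u − 4 = 0 with u = exp(β), so the exact hazard ratio is (1 + √17)/2. The O(n²) risk-set loop is fine for illustration but not for a cohort of this size.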
The food and nutrient characteristics of the clusters are described in Tables 1 and 2 (see Appendices 1 and 2 for a detailed description of the clusters). In men, four clusters with more than 10 000 individuals emerged. For the largest cluster (“Many foods”), the CA procedure did not indicate any specific distinguishing food, but intakes of alcohol and sweets ranked comparatively high. The second largest cluster (“Vegetable and fruit”) was characterized by high intakes of vegetables, fruits and low-fat foods such as fish and lean chicken. This pattern was the lowest in fat and the densest in micronutrients. The third largest cluster (“Fatty meats”) was characterized by regular-fat meats. The fourth largest cluster (“Fat-reduced foods”) was characterized by fat-reduced foods (but not lean meats), with skim milk ranking comparatively high. Specific food items (i.e., pumpkin pie, custard pie, lard, bacon and eggs) influenced the formation of the two smallest clusters.
In women, three of the nine clusters had more than 30 000 individuals, while six clusters had fewer than 10 000 individuals. As in men, no specific food emerged as the distinguishing feature of the largest cluster (“Many foods”), but sweets ranked comparatively high. The second largest cluster (“Vegetable and fruit”) had characteristics similar to those of the “Vegetable and fruit” cluster in men, but skim milk with cereals and yogurt also ranked high in this cluster in women. Alcohol intakes were lower overall in women than in men, but appeared to rank comparatively high in both the “Vegetable and fruit” and the “Many foods” clusters. Various diet foods and lean meats characterized the third largest cluster in women (“Diet foods and lean meats”). As in men, the formation of the smallest clusters was driven by frequent consumption of specific foods (i.e., several types of pie or chicken, shortening, lard, or liver).
Tables 3 and 4 show the within-cluster distributions of some potential risk factors for colorectal cancer. In men, the “Vegetable and fruit” cluster was associated with being older, more educated, more likely to have never smoked, more physically active and less obese than the total sample, while the “Many foods” cluster was associated with being younger, less educated, more likely to have smoked, less physically active and more obese. Similar tendencies were seen for the comparable clusters in women. The “Diet foods and lean meats” cluster in women was associated with obesity, but the “Fat-reduced foods” cluster in men was not. MHT use appeared more common among women in the “Vegetable and fruit” cluster.
Hazard ratio estimates for colorectal cancer incidence are shown in Table 5 for clusters with more than 10 000 individuals; smaller clusters had too few cases to give meaningful estimates. In men, the “Vegetable and fruit” cluster was statistically significantly associated with reduced colorectal cancer incidence when compared with the “Many foods” cluster, and the association remained significant after multivariate adjustment (HR: 0.85; 95% CI 0.76, 0.94). In women, the “Vegetable and fruit” cluster was statistically significantly associated with reduced colorectal cancer incidence in the age- and BMI-adjusted models (HR: 0.83; 95% CI 0.72, 0.97), but not in the multivariate model.
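A reported hazard ratio and 95% CI can be converted back into an approximate standard error, Wald z statistic and p-value, which shows how far each estimate lies from the null. The sketch assumes the CIs were computed symmetrically on the log scale; because the published limits are rounded to two decimals, the back-calculated values are approximate.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Back-calculate the SE of log(HR), the Wald z and a two-sided p-value
    from a reported 95% CI, assuming the CI is symmetric on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


men = wald_from_ci(0.85, 0.76, 0.94)    # multivariate, men
women = wald_from_ci(0.83, 0.72, 0.97)  # age- and BMI-adjusted, women
print([round(v, 3) for v in men])
print([round(v, 3) for v in women])
```

For the multivariate estimate in men, z is close to −3 (p about 0.003); for the age- and BMI-adjusted estimate in women, z is about −2.45 (p about 0.01).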
When the analyses were repeated with colon and rectal cancer as separate end points (Table 6), both the “Vegetable and fruit” (HR: 0.74; 95% CI 0.60, 0.91) and the “Fat-reduced foods” (HR: 0.56; 95% CI 0.34, 0.95) clusters in men were inversely associated with rectal cancer after multivariate adjustment for other risk factors. The “Vegetable and fruit” cluster also showed a borderline inverse association with colon cancer. In women, no significant associations were observed for any food pattern when colon and rectal cancer were examined as separate end points.
Several large clusters of diverse dietary composition were identified in the NIH-AARP cohort. A food pattern characterized by high intakes of vegetables, fruits and other foods high in micronutrients and low in fat was associated with reduced colorectal cancer incidence in men, even after adjustment for other known risk factors. In men, the “Vegetable and fruit” and “Fat-reduced foods” patterns were also associated with reduced rectal cancer incidence, although the small number of cases (n=15) in the “Fat-reduced foods” pattern makes this finding tentative. In women, a similar “Vegetable and fruit” pattern was associated with reduced colorectal cancer incidence, but that association was not independent of other risk factors.
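Why 15 cases make an estimate tentative can be seen from a common rough approximation for the variance of a log hazard ratio comparing two groups, var(log HR) ≈ 1/d₁ + 1/d₀, where d₁ and d₀ are the numbers of events in the two groups (exact for a Poisson rate ratio, approximate for a Cox model). Letting d₀ grow without bound gives a lower limit on how wide the 95% CI can be with d₁ = 15; this is an illustrative bound, not a recomputation of the study's estimate.

```python
import math


def ci_ratio_lower_bound(cases, z_crit=1.96):
    """Smallest upper/lower 95% CI ratio for a hazard ratio whose comparison
    group has `cases` events, using var(log HR) ~ 1/d1 + 1/d0 with d0 -> infinity."""
    se_min = math.sqrt(1 / cases)
    return math.exp(2 * z_crit * se_min)


print(round(ci_ratio_lower_bound(15), 2))  # 2.75
print(round(0.95 / 0.34, 2))               # 2.79, the reported rectal cancer CI
```

With only 15 cases, the upper CI limit must be at least about 2.75 times the lower limit, close to the ratio of the reported interval (0.34, 0.95), so the precision of that estimate is limited almost entirely by the small case count.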
The major advantages of the NIH-AARP Diet and Health Study are its large sample size and endpoint ascertainment from high-quality registries (Schatzkin et al 2001). Further, prospective dietary data collection avoids biases associated with differential recall by cases and non-cases. We kept the aggregation of the original food items to a minimum in order to avoid the potential attenuation of food pattern–disease associations that may occur with broader food groups (McCann et al 2001). The use of density variables, based on consumption frequency and standardized to have the same variance, allowed food patterns characterized by low-energy foods to emerge. This may be an advantage when diet-disease hypotheses involve the health benefits of plant foods that contribute little energy (Giovannucci 2002)(Bingham et al 2003)(Norat and Riboli 2003)(Larsson et al 2006).
Findings of other food pattern studies (Randall et al 1992)(Slattery et al 1998)(Terry P 2001)(Harnack et al 2002)(Fung et al 2003)(Dixon et al 2004)(Mizoue T and S. 2005) are largely consistent with ours. Although two previous CA studies of dietary patterns and colorectal adenomas used distinct analytical approaches, their findings were also consistent with ours (Rouillier et al 2005)(Austin et al 2007). A French case-control study (n=1 372) identified five clusters by first reducing the diet history data (159 food items) to 13 factors and then applying a CA procedure to these factors (Rouillier et al 2005). A U.S. case-control study (n=725) used FFQ data converted to grams per 1 000 kcal, but aggregated the food variables into 18 food groups (Austin et al 2007). The French study found that a food pattern high in animal fat, processed meat and total energy was associated with increased risk of colorectal cancer (Rouillier et al 2005), while the U.S. study found that a pattern high in fruit and low in meat was associated with reduced risk (Austin et al 2007).
Although red and processed meats are thought to contain carcinogenic substances for large bowel cancer, and other studies have linked these foods to increased colorectal cancer risk (WCRF/AICR 2007), comparable associations were not seen in our study. The lack of a significant association with the “Fatty meats” cluster in men was unexpected. However, alcohol intake, which has previously been associated with increased colorectal cancer risk (WCRF/AICR 2007), was comparatively low in this cluster and may have contributed to this finding. In women, no cluster characterized by fatty meats emerged; instead, hamburgers and meatloaf ranked comparatively high in the “Many foods” cluster. Overall, the largest clusters in men and in women appear to show similar dietary characteristics. However, low-fat dairy foods ranked comparatively high in the “Vegetable and fruit” cluster in women, whereas in men these foods ranked high in the “Fat-reduced foods” cluster. Previous reports from this cohort also indicate differences in dietary heterogeneity between men and women (Schatzkin et al 2001). Because we used energy-adjusted food variables, these differences cannot simply be a result of different energy intakes. Such gender differences in food selection, consistent with previous research in this area (Randall et al 1992)(Wirfält et al 2001), may influence the formation of patterns and could partly explain the observed differences in associations with colorectal cancer. Food choice differences could depend on differences in health behavior awareness and social desirability (Hebert et al 1997). A Danish review concluded that higher education in men was associated with food habits that tended to be more similar to those of women (O’Doherty Jensen and Holm 1999). These differences could translate into actual dietary differences or, alternatively, into differences in the reporting of diet (measurement error) (Macintyre and Anderson 1997).
Dietary measurement error may affect food pattern analysis in two ways. First, it may influence the formation of clusters, distorting the main exposure. Although the effect of this potential distortion on the estimated hazard ratio has not been sufficiently studied, it is likely to attenuate the estimated cluster effect in a simple univariate analysis. Second, dietary measurement error may affect covariate adjustment, even for exactly measured confounders, by producing residual confounding in a multivariate model. The OPEN study, which used reference biomarkers for protein and energy intake, indicated that measurement error may be a greater threat to dietary assessment in women than in men (Kipnis et al 2003), and could therefore contribute to the differences in associations observed in this study. The smaller sample size in women also resulted in fewer cases and less statistical power to detect associations than in men, which, especially in the presence of measurement error, could have contributed to the observed differences in study outcomes by gender.
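The attenuation referred to above can be illustrated with a small simulation of classical measurement error in a continuous intake variable. This is a simplified analogue (linear regression on a continuous exposure, not cluster membership in a Cox model), and all parameter values are hypothetical. The expected slope on the error-prone measure is λβ, where λ = var(X) / (var(X) + var(error)); λ is the same kind of quantity as the attenuation factors reported for the AARP-FFQ in the Methods (e.g., 0.26 for protein in men).

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=20_000, beta=1.0, error_sd=1.0, seed=1):
    """Return (slope on true intake, slope on error-prone intake)."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0, 1) for _ in range(n)]               # true intake, variance 1
    y = [beta * x + rng.gauss(0, 1) for x in true_x]            # outcome
    observed_x = [x + rng.gauss(0, error_sd) for x in true_x]  # classical error
    return ols_slope(true_x, y), ols_slope(observed_x, y)


print(simulate_attenuation())
```

With error_sd = 1, λ = 1/(1 + 1) = 0.5, so the slope on the observed intake should be close to half the true slope of 1.0.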
Moreover, not only diet but also lifestyle and socio-economic factors may be imperfectly measured, so residual confounding could affect the results even when the major potential confounders are included in the model. Also, because dietary patterns tend to co-vary with lifestyle and socio-economic factors in both men and women (Patterson et al 1994)(Greenwood et al 2000)(Reedy et al 2005)(Engeset et al 2005), other unknown risk factors could confound associations between clusters and disease risk, even in multivariate analysis.
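Residual confounding from an imperfectly measured confounder can be demonstrated with a simulation in which the exposure has no effect at all on the outcome. The setup is hypothetical and uses linear regression rather than a Cox model: the exposure is correlated (ρ) with a confounder that drives the outcome, and the model adjusts either for the confounder itself or for an error-prone version of it.

```python
import math
import random


def ols_two_predictors(x1, x2, y):
    """Slopes (b1, b2) of y ~ 1 + x1 + x2, solved from the centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [a - m1 for a in x1]
    d2 = [a - m2 for a in x2]
    dy = [a - my for a in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * b for a, b in zip(d1, dy))
    s2y = sum(a * b for a, b in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate_residual_confounding(n=20_000, rho=0.6, error_sd=1.0, seed=2):
    """Return (exposure coefficient adjusted for the true confounder,
    exposure coefficient adjusted for the mismeasured confounder)."""
    rng = random.Random(seed)
    confounder = [rng.gauss(0, 1) for _ in range(n)]
    # Exposure is correlated (rho) with the confounder but has NO effect on y.
    exposure = [rho * c + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for c in confounder]
    y = [c + rng.gauss(0, 1) for c in confounder]
    confounder_obs = [c + rng.gauss(0, error_sd) for c in confounder]
    b_exact, _ = ols_two_predictors(exposure, confounder, y)
    b_noisy, _ = ols_two_predictors(exposure, confounder_obs, y)
    return b_exact, b_noisy


print(simulate_residual_confounding())
```

Adjusting for the exactly measured confounder gives an exposure coefficient near zero, whereas adjusting for the mismeasured version leaves a spurious coefficient of about ρσ²/(1 + σ² − ρ²) = 0.6/1.64 ≈ 0.37 for these settings (σ is the error SD), even though the confounder is "in the model".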
To conclude, food patterns characterized by plant foods high in micronutrients and low in fat were associated with reduced colorectal cancer incidence in the NIH-AARP study. The associations were stronger in men than in women, and in men they persisted in a multivariate model adjusting for other known risk factors. In men, these food patterns were also more strongly associated with rectal cancer than with colon cancer. The observed gender differences may reflect actual differences in reported food choices, resulting in different clusters, or, alternatively, differences in statistical power or in residual confounding between men and women. Our findings are supported by previous food pattern studies.
This research was supported by the Intramural Research Program of the NIH, National Cancer Institute. Cancer incidence data from the Atlanta metropolitan area were collected by the Georgia Center for Cancer Statistics, Department of Epidemiology, Rollins School of Public Health, Emory University. Cancer incidence data from California were collected by the California Department of Health Services, Cancer Surveillance Section. Cancer incidence data from the Detroit metropolitan area were collected by the Michigan Cancer Surveillance Program, Community Health Administration, State of Michigan. The Florida cancer incidence data used in this report were collected by the Florida Cancer Data System under contract to the Department of Health (DOH). The views expressed herein are solely those of the authors and do not necessarily reflect those of the contractor or DOH. Cancer incidence data from Louisiana were collected by the Louisiana Tumor Registry, Louisiana State University Medical Center in New Orleans. Cancer incidence data from New Jersey were collected by the New Jersey State Cancer Registry, Cancer Epidemiology Services, New Jersey State Department of Health and Senior Services. Cancer incidence data from North Carolina were collected by the North Carolina Central Cancer Registry. Cancer incidence data from Pennsylvania were supplied by the Division of Health Statistics and Research, Pennsylvania Department of Health, Harrisburg, Pennsylvania. The Pennsylvania Department of Health specifically disclaims responsibility for any analyses, interpretations or conclusions. We are indebted to the participants in the NIH-AARP Diet and Health Study for their outstanding cooperation. Funding was also received from The Swedish Cancer Foundation (contract 05 0128 to E.W.) and the Swedish Council for Working Life and Social Research (contract 2005-1703 to E.W.).