While diet has long been suspected as an etiological factor for colorectal cancer, studies of single foods and nutrients have provided inconsistent results.
We used factor analysis methods to study associations between dietary patterns and colorectal cancer in middle-aged Americans.
Diet was assessed among 293,615 men and 198,767 women in the NIH-AARP Diet and Health Study. Principal components factor analysis identified three primary dietary patterns: a fruits and vegetables, a diet foods, and a red meat and potatoes pattern. State cancer registries identified 2,151 incident cases of colorectal cancer in men and 959 in women between 1995 and 2000.
Men with high scores on the fruit and vegetable factor were at decreased risk (RR for Q5 vs. Q1 = 0.81, 95% CI 0.70–0.93, p for trend = 0.004). Both men and women had a similar risk reduction with high scores on the diet food factor (RR = 0.82, 95% CI 0.72–0.94, p for trend = 0.001 for men and RR = 0.87, 95% CI 0.71–1.07, p for trend = 0.06 for women). High scores on the red meat factor were associated with increased risk (RR = 1.17, 95% CI 1.02–1.35, p for trend = 0.14 for men and RR = 1.48, 95% CI 1.20–1.83, p for trend = 0.0002 for women).
These results suggest that dietary patterns characterized by low frequency of meat and potato consumption and frequent consumption of fruit and vegetables and fat-reduced foods are consistent with a decreased risk of colorectal cancer.
Colorectal cancer is the second most commonly diagnosed malignancy (excluding non-melanoma skin cancer) in the United States (1), and for decades, epidemiologists have pointed to evidence from migrant studies and in-country time trends to implicate lifestyle factors as playing a major role in disease etiology (2). Whereas diet and nutrition have long been among the chief areas of interest for investigators trying to identify the lifestyle factors that are responsible for causing or preventing colorectal cancer (2–6), studies of nutrients, single foods, or food groups have in many cases provided inconsistent results.
A major difficulty in conducting studies of single foods or nutrients in relation to colorectal cancer is the high degree of correlation among dietary constituents. In this situation, isolating the particular effects of a single food or nutrient becomes a serious methodological problem. Moreover, the assumption that single foods or nutrients have isolated effects may not be valid; foods and nutrients more likely act in synergy such that the joint effects of the foods and nutrients work in something other than a simple additive fashion (7, 8). Recognition of these facts has led several investigators to propose a diet patterns approach as an alternative to the reductionist, single-food or single-nutrient focus of many studies of diet and chronic disease (7, 9, 10). A diet patterns approach could, it is hoped, capture the totality of dietary experience, including all the nutrient interactions, in a manner that studies of single nutrients or individual foods cannot.
Factor analysis is a variable consolidation technique designed to generate a small number of variables that will capture much of the information in a larger data set. In this way factor analysis allows an investigator to reduce information on frequency of food intake among the members of a study population over the entire range of foods covered by a food frequency questionnaire (FFQ) into 2 or 3 variables that capture the primary sources of variation in reported diet. These variables, by identifying where the major sources of dietary variation lie, are one way of describing the main dietary patterns in that study population.
While a limited number of prospective studies have used factor analysis to investigate diet patterns as risk factors for colorectal cancer (11–17), the results have not been entirely consistent and have frequently suffered from a lack of statistical power. We used principal components factor analysis to identify dietary patterns in a large cohort of middle-aged Americans and then used the factor scores on each of these patterns to determine their association with subsequent risk of colorectal cancer.
The National Institutes of Health (NIH)-AARP Diet and Health Study is a prospective cohort study designed for the investigation of dietary and other lifestyle causes of cancer in the U.S. population. Details of the study rationale and methods have been published elsewhere (18), but are summarized briefly here.
AARP (formerly known as the American Association of Retired Persons) is a large, nonprofit organization in the United States with membership open to all Americans over the age of 50. In 1995–1996 investigators from the National Cancer Institute mailed a 16-page diet and lifestyle questionnaire to approximately 3.5 million members of AARP in six states (California, Florida, Pennsylvania, New Jersey, North Carolina, and Louisiana) plus two metropolitan areas (Atlanta and Detroit), and they received a total of 617,119 questionnaires (17.6 percent of the original mailing sample) in reply. Excluded from the sample were 35,685 questionnaires with inadequate or incomplete data, 14,203 that were either duplicates or from someone who was either ineligible or not the intended respondent, and 824 from respondents who subsequently withdrew from the study.
We further excluded respondents who were proxies for the intended respondent (N=15,760), who had a prior cancer (N=52,867), or who self-reported end-stage renal disease (N=997). Finally, we excluded 1,835 women and 2,566 men who were outliers on calorie intake, which we defined as being below the 25th percentile minus two interquartile ranges or above the 75th percentile plus two interquartile ranges of energy intake on the logarithmic scale. After these exclusions, 492,382 of the initial respondents remained in the analytic sample we used for this study.
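The energy-intake exclusion rule described above can be sketched in a few lines. This is a minimal illustration only, not the study's code: the function names, the example intakes, and the choice of quartile method are assumptions, since the text does not state how percentiles were computed. The final lines simply tally the exclusion counts reported in the text.

```python
import math
import statistics

def log_energy_fences(energies_kcal, k=2.0):
    """Return (low, high) kcal cut-offs: Q1 - k*IQR and Q3 + k*IQR on the log scale."""
    logs = [math.log(e) for e in energies_kcal]
    # Quartile method is an assumption; the text does not specify one.
    q1, _median, q3 = statistics.quantiles(logs, n=4, method="inclusive")
    iqr = q3 - q1
    return math.exp(q1 - k * iqr), math.exp(q3 + k * iqr)

def drop_energy_outliers(energies_kcal, k=2.0):
    """Keep only intakes that fall inside the log-scale fences."""
    low, high = log_energy_fences(energies_kcal, k)
    return [e for e in energies_kcal if low <= e <= high]

# Hypothetical intakes (kcal/day), not study data.
intakes = [50, 2000, 1500, 60000, 2500, 1000, 3000]
kept = drop_energy_outliers(intakes)

# Tally of the exclusions as reported in the text.
received = 617_119
exclusions = [35_685, 14_203, 824, 15_760, 52_867, 997, 1_835, 2_566]
analytic_n = received - sum(exclusions)
```

Working on the logarithmic scale makes the fences multiplicative on the kcal scale, so the rule treats implausibly low and implausibly high intakes symmetrically even though energy intake is right-skewed.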
The NIH-AARP Diet and Health Study was approved by the Special Studies Institutional Review Board of the U.S. National Cancer Institute, and all subjects provided their informed consent upon entry.
The NIH-AARP Diet and Health Study baseline questionnaire included a 124-item FFQ that was an early version of the Diet History Questionnaire (18). Participants were asked to report their usual frequency of intake and portion size over the preceding 12 months for each of the 124 food items and to respond to an additional 21 questions on intake of low-fat and high-fiber foods and on food preparation. Respondents reported intake using 10 frequency categories ranging from ‘never’ to ‘6+ times per day’ for beverages and from ‘never’ to ‘2+ times per day’ for solid foods, as well as 3 categories for portion size. Responses to frequency and portion size questions allowed for calculation of estimated daily energy intake based on national dietary data from the US Department of Agriculture’s 1994–96 Continuing Survey of Food Intakes by Individuals (19). Additional details of the questionnaire design and its validity have been reported elsewhere (19–23).
The baseline questionnaire, in addition to assessing diet, also captured information on demographic characteristics, alcohol intake, tobacco use and physical activity.
We identified incident cases of colorectal cancer (International Classification of Diseases for Oncology, 3rd ed., codes C180–C189, C260, C199, and C209) that occurred during follow-up through December 31, 2000 by probabilistic linkage between the AARP cohort membership and the eight state cancer registry databases covering the places of residency for the study participants. The cancer registries for this cohort have been estimated to be 95% complete within two years of cancer incidence and have been certified by the North American Association of Central Cancer Registries (NAACCR) for meeting the highest standard of data quality (24). Our case ascertainment method has been described in a previous study (25). Vital status was ascertained through annual linkage of the cohort to the Social Security Administration Death Master File in the U.S., follow-up searches of the National Death Index Plus for participants who were determined to be deceased by the Social Security Administration Death Master File, cancer registry linkage, questionnaire responses, and responses to other mailings. Incident colorectal cancer cases had to be invasive and, if multiple cancers were diagnosed in the same participant, had to be the first malignancy diagnosed during the follow-up period. We further classified colorectal cancer by anatomic subsite: proximal colon (C180–C184), distal colon (C185–C187), and rectum (C199, C209). Using these methods, we identified 2,151 incident cases of colorectal cancer in men and 959 in women.
We identified dietary patterns separately for men and women using principal components factor analysis based on responses to the baseline questionnaire. The FFQ database provided information on 204 separate food items, which we aggregated into 181 food groups. Variables indicating different ways of eating butter and margarines were collapsed into five variables (i.e., butter, stick margarine, tub margarine, butter-margarine mixture and diet margarine), and non-caloric sweeteners (i.e., aspartame and saccharin) were collapsed into one variable. Two of the original food variables (i.e., “other fruits” and “other vegetables”) were excluded due to low reported consumption. Using a caloric density approach, we divided each individual’s daily frequency of consumption of each of the 181 food groups by his or her total daily calorie consumption in order to adjust for energy, and then we standardized the energy-adjusted frequency values to a mean of 0 and standard deviation of 1.0. Each of the standardized, energy-adjusted frequency variables entered the factor analysis (using PROC FACTOR in SAS statistical software, version 8.2), and based on inspection of scree plots, we retained three factors. The factors were rotated using the varimax procedure to facilitate their interpretation. For every subject we calculated factor scores on each of the three retained factors by summing frequency of consumption multiplied by factor loadings across all food items.
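The energy adjustment, standardization, and scoring steps described above can be illustrated with a small sketch. It is not the SAS procedure used in the study and does not perform the factor extraction or varimax rotation; the loadings are supplied as hypothetical inputs. The data, the function names, the use of the sample standard deviation, and the choice to score on the standardized energy-adjusted frequencies are all assumptions made for illustration.

```python
import statistics

def energy_adjust(freqs, kcal):
    """Caloric density: each daily frequency divided by total daily kcal."""
    return [[f / k for f in row] for row, k in zip(freqs, kcal)]

def standardize(matrix):
    """Z-score each column (food group) to mean 0 and SD 1."""
    cols = list(zip(*matrix))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # sample SD is an assumption
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in matrix]

def factor_scores(z, loadings):
    """Score = sum over food groups of standardized value times loading."""
    return [sum(x * w for x, w in zip(row, loadings)) for row in z]

# Hypothetical data: 4 respondents, 3 food groups (times per day).
freqs = [
    [2.0, 0.5, 1.0],
    [1.0, 1.5, 0.2],
    [3.0, 0.2, 0.5],
    [0.5, 2.0, 1.5],
]
kcal = [2000, 2500, 1800, 2200]   # hypothetical total daily energy
loadings = [0.7, -0.2, 0.1]       # hypothetical loadings on one factor

z = standardize(energy_adjust(freqs, kcal))
scores = factor_scores(z, loadings)
```

Because every column of `z` has mean 0, the resulting scores are centered on 0 as well, so a positive score marks a respondent whose diet leans toward the foods that load positively on the factor.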
We used Cox proportional hazards regression (PROC PHREG in SAS version 8.2) with person-years as the underlying time metric to generate rate ratios (RRs) and 95% confidence intervals (CIs) for factor scores on each of the 3 factors separately for men and women in both age-adjusted and multivariate-adjusted models. The multivariate models adjusted for the following potential confounders using dummy variables: ethnicity (white, African American, or “other”), tobacco use (never smoker, former smoker of 1 pack or less per day, former smoker of more than 1 pack per day, current smoker of 1 pack or less per day, current smoker of more than 1 pack per day), physical activity (rarely or never engaging in 20 minutes of moderate to vigorous physical activity per day, doing so 1 to 3 times per month, 1–2 times per week, 3–4 times per week, or 5 or more times per week), body mass index (BMI, less than 25, 25–30, 30–35, 35–40, or more than 40 kg/m²), education (less than high school graduate, high school graduate, some college education, or college graduate), and, in the case of women, use of menopausal hormone therapy (never user, previous user, or current user). All multivariate models also included separate dummy variables indicating a missing value for each of these potentially confounding factors.
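The dummy-variable coding with a separate missing-value indicator can be sketched as follows, using BMI as the example covariate. This is an illustrative assumption about how such coding might look, not the study's code: the helper names, the variable naming scheme, the choice of the lowest BMI category as the reference, and the handling of values that fall exactly on a category boundary are all hypothetical.

```python
BMI_LEVELS = ["<25", "25-30", "30-35", "35-40", ">40"]
BMI_CUTS = [(25, "<25"), (30, "25-30"), (35, "30-35"), (40, "35-40")]

def bmi_category(bmi):
    """Map BMI (kg/m^2) to a category label; None stays None (missing)."""
    if bmi is None:
        return None
    for upper, label in BMI_CUTS:
        if bmi < upper:          # lower-inclusive boundaries are an assumption
            return label
    return ">40"                 # values of 40 and above land here (assumption)

def dummy_code(value, levels, reference, prefix):
    """One 0/1 indicator per non-reference level, plus a missing indicator."""
    row = {f"{prefix}_{lvl}": int(value == lvl) for lvl in levels if lvl != reference}
    row[f"{prefix}_missing"] = int(value is None)
    return row

# Hypothetical respondents: one obese, one in the reference group, one missing.
coded_obese = dummy_code(bmi_category(32.0), BMI_LEVELS, "<25", "bmi")
coded_reference = dummy_code(bmi_category(22.0), BMI_LEVELS, "<25", "bmi")
coded_missing = dummy_code(bmi_category(None), BMI_LEVELS, "<25", "bmi")
```

Coding missingness as its own indicator keeps respondents with incomplete covariate data in the model rather than dropping them, at the cost of treating "missing" as if it were an additional category.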
All p-values were two-sided. To test for trend, we entered the factor scores into the model as continuous terms.
Principal components factor analysis identified three primary dietary patterns in men and women. Prior to rotation, these three factors explained 35.1% of the variance in the men and 34.2% of the variance in women. The foods with the highest factor loadings on each of the 3 factors for both men and women appear in Table 1. The first factor looked very similar for men and women with the highest factor loadings concentrated in the fruits and vegetables. Among men, the foods that loaded most heavily on the second factor were fat-reduced foods, diet foods, and lean meats. We also observed a factor with a similar pattern of factor loadings among women, but rather than the second factor, this was the third factor. The second factor for women was one where the highest factor loadings were for high-fat foods, red meats, and potatoes. Again, we observed a factor with a similar pattern of factor loadings among men, but in this case it was factor 3. As a statistical procedure, factor analysis selects factors such that foods loading heavily on one factor typically do not load even modestly on any of the others. This is apparent in Table 1 and indicates that the factors captured distinct sources of variation in the dietary practices of the men and women in the AARP cohort.
Baseline characteristics for the cohort by quintile of factor score for each of the three dietary patterns among both men and women appear in Table 2. Men and women with high scores on the fruit and vegetable factor had slightly lower BMI, were more physically active, were more likely to be college graduates, were less likely to be current smokers, and consumed less alcohol. Men with high fruit and vegetable-factor scores reported consuming many fewer calories per day, although for women there was little difference in energy intake. With respect to the red meat and potatoes factor, high scores were associated with higher BMI, increased energy intake, decreased physical activity, a lower likelihood of being a college graduate, and increased smoking for both men and women. Thus the fruit and vegetable pattern was associated with many behaviors and characteristics commonly understood to indicate or be predictive of good health while the red meat and potatoes pattern was associated with those that are indicative of poor health status. The fat-reduced and diet-foods pattern was also associated with many of the same health behaviors as was the fruit and vegetables pattern, but the degree of association was generally lower.
Results of the proportional hazards regression analyses of the factor scores appear in Table 3. Increasing score on the fruit and vegetable factor was associated with significantly lower risk of colorectal cancer in both age-adjusted and multivariate-adjusted models for men (RR = 0.81, 95% CI 0.70–0.93 for Q5 vs. Q1; p trend 0.004 in the multivariate model). The fat-reduced and diet-foods factor showed a similar inverse association with risk of incident colorectal cancer (RR = 0.82, 95% CI 0.72–0.94 for Q5 vs. Q1; p trend 0.0001 in the multivariate model). For each of these two patterns, there was a nearly monotonic decrease in risk with increasing quintile of factor score. In contrast to these results, men with a higher score on the red meat and potatoes factor were at increased risk of colorectal cancer in both the age-adjusted and multivariate-adjusted models, though the magnitude of association was modest (RR = 1.17, 95% CI 1.02–1.35 for Q5 vs. Q1; p trend 0.14 in the multivariate model).
Unlike in men, there was essentially no association between factor score on the fruit and vegetable factor and subsequent risk of colorectal cancer among women in the cohort (RR = 1.06, 95% CI 0.86–1.30 for Q5 vs. Q1; p trend 0.22 in the multivariate model). On the other hand, we did observe a similar reduction in risk among the women for higher scores on the fat-reduced and diet-foods factor, although the magnitude of the association was somewhat less than that found in the men, and it was not statistically significant (RR = 0.87, 95% CI 0.71–1.07 for Q5 vs. Q1; p trend 0.06 in the multivariate model). The red meat and potatoes pattern showed a strong positive association with incident colorectal cancer that was even more pronounced than that observed among the men (RR = 1.48, 95% CI 1.20–1.83 for Q5 vs. Q1; p trend 0.0002 in the multivariate model).
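As a reading aid, the reported 95% confidence intervals can be used to back out an approximate standard error and Wald statistic for each Q5 vs. Q1 rate ratio. The sketch below assumes the intervals are symmetric on the log scale and uses a normal approximation; the function name is hypothetical. It concerns only the Q5 vs. Q1 contrast and does not reproduce the p values for trend, which come from a separate model with the factor score entered as a continuous term.

```python
import math

def wald_from_ci(rr, lo, hi, z_crit=1.96):
    """Approximate log-scale SE, Wald z, and two-sided p from an RR and its 95% CI."""
    se = (math.log(hi) - math.log(lo)) / (2 * z_crit)
    z = math.log(rr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

# Values as reported in the text for Q5 vs. Q1 (multivariate models).
se_m, z_m, p_m = wald_from_ci(1.17, 1.02, 1.35)   # men, red meat and potatoes
se_w, z_w, p_w = wald_from_ci(1.06, 0.86, 1.30)   # women, fruit and vegetables

# On the log scale the point estimate should sit near the geometric
# midpoint of the interval, up to rounding of the published figures.
midpoint_m = math.sqrt(1.02 * 1.35)
midpoint_w = math.sqrt(0.86 * 1.30)
```

Under these assumptions the contrast for men on the red meat and potatoes factor is nominally significant even though the reported trend test is not, which illustrates why the two tests should be read as answering different questions.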
We examined the associations among factor scores for the three dietary patterns in men and women and subsequent risk of cancer by anatomic subsite (Table 4). While there were individual examples where some factors appeared to have a somewhat different association depending on subsite, the inconsistencies in these few subsite-specific differences in risk estimates made it difficult to conclude that dietary patterns as observed in the AARP cohort had effects that were more or less pronounced in any subsite of the lower gastrointestinal tract.
In addition to the analyses described above, we also ran models with interaction terms for the factor scores and both NSAID use and menopausal hormone therapy (MHT) use but saw little evidence of effect modification (data not shown).
Rather than describing hypothesized healthy eating patterns or recommended patterns, the patterns we identified in this analysis were reflective of the patterns of consumption that actually existed within the AARP cohort population. A potential criticism of this approach is that, given the data-driven nature of the factors, they are dependent on the study population for their validity. Following from this, in a different population, or even in the same population at a different time, we might have observed a different set of factors, thus limiting the interpretive value of these dietary patterns. But analogous versions of two of the three patterns we observed, the fruit and vegetable pattern and the meat and potatoes pattern, have emerged repeatedly in studies using factor analysis to study dietary patterns in North America (12, 17), Europe (11, 13, 16), and Asia (14, 15). It is frequently the case that these studies identified additional factors, often of unique relevance to the locality of the study, but the fruit and vegetable and the meat and starch patterns were ubiquitous. The common observation of these patterns occurred despite the obvious geographical and cultural differences, despite the use of different FFQs, and despite different decisions by investigators with respect to food groupings and number of factors to retain. Furthermore, in longitudinal analyses, these patterns have been shown to be highly stable over time (26). Given the broad geographic and temporal consistency in factor analysis results, it is reasonable to conclude that the fruit and vegetable and the meat and starch patterns we observed are not likely to be the results of chance observations but rather reflect true underlying dietary patterns observed in many populations over time, and therefore do capture important dimensions of the dietary experience in the AARP cohort.
The seven previous prospective studies of diet patterns as identified by factor analysis and subsequent risk of colorectal cancer (11–17) have found associations that were generally consistent with what we observed in the AARP cohort. The fruit and vegetable pattern has been associated with a reduced risk of colorectal cancer in most cohort studies (12–17), but the associations have been modest and typically were not statistically significant. Conversely, the meat and starch pattern has been associated with increased risk of colorectal cancer in most (12–14, 17), but not all (11, 15, 16), studies, and even in the studies that did find increased risk with high score on the meat and potatoes factor, the associations were not statistically significant.
The first factor had the appearance of a global fruit and vegetable score. In that way, using it as an exposure may not have been too different from simply using fruits and vegetables as an exposure and then controlling for all other dietary variables. The advantage of factor scores in this situation may have been the ability of the factors to account adequately for variation in all other dietary components (since all foods made weighted contributions to the factor score), whereas in a traditional single food (or food group) analysis, controlling for potential confounding by other dietary constituents may have been incomplete. If this is true, it may help explain why we were able to observe a fruit and vegetable effect in men when many cohorts looking only at fruit or at vegetables observed null results (27–34). We still observed a null result for women on the fruit and vegetable factor, though, and this is consistent with most recent results from prospective studies of these food groups taken individually (27, 28, 30–37).
The associations we observed for the meat and potatoes factor are consistent with results published in two recent meta-analyses of meat and risk of colorectal cancer (38, 39). While some have noted that almost all of the individual studies in those meta-analyses, and in subsequent publications (40), found only modest and not statistically significant increases in risk, the association we observed between high score on the meat and potatoes factor and risk of subsequent disease provides support to the notion that diets characterized by high intake of meat, and particularly red meat, increase risk of colorectal cancer.
While, as expected, we did observe the two ubiquitous patterns (fruits and vegetables and meat and potatoes), the third pattern, the fat-reduced and diet food factor, was novel. The dieting-related features of this pattern caused us to wonder what other characteristics described people with high scores on this factor. For example, were they overweight people whose adoption of this pattern reflected a desire to address poor health and dissatisfaction with current weight? But on the contrary, high scores on this factor were associated with health-promoting behaviors and lower BMI. When we controlled for these, however, we still observed the inverse association giving us greater confidence that this dietary pattern was itself responsible for the decreased risk. Nonetheless, it is likely that we did not control for the “healthy” lifestyle factors completely, either through imperfect measurement of the exposures or through failure to control for other unknown or unmeasured confounding factors, and therefore we cannot rule out residual confounding as an explanation for the inverse association.
Interestingly, we found different associations between what appeared to be similar factors in men and women and subsequent risk of colorectal cancer. It is possible that women and men completed the FFQ differently resulting in different degrees of measurement error. If the reported diet in women had more random error, then associations would be attenuated compared to what we observed in the men. But the associations were not simply attenuated, as we actually observed stronger associations for the meat and potatoes factor in women. The use of MHT was more common among women with low scores on the meat and potatoes factor, and given previous studies showing a possible inverse association between MHT and colorectal cancer (41, 42), this could, in part, explain the stronger association in women than in men for this variable. We did control for MHT use in multivariate models, but it is possible that we did so imperfectly and thus we cannot rule out residual confounding. There was no association between MHT and the fruit and vegetable factor scores, however, and yet we still saw differences in risk estimates for men and women on this variable. An alternate possibility is that genuine differences exist in the diet-related pathology of colorectal cancers between men and women.
In summary, we observed that both for men and (especially) women, a dietary pattern characterized by frequent meat and potatoes consumption was associated with an increased risk of colorectal cancer, whereas a dietary pattern typified by frequent consumption of fat-reduced and diet foods was associated with a significant reduction in risk among men and the suggestion of a decreased risk among women. And while a vegetables and fruits pattern was associated with reduced risk among men, it was not associated with colorectal cancer outcomes in women. These differences in the associations among dietary patterns and risk of colorectal cancer do raise interesting questions with respect to a possible physiological role of gender in disease etiology, but in general, dietary patterns characterized by comparatively low frequency of meat and potato consumption, high frequency of fruit and vegetable intake, and high frequency of fat-reduced foods are consistent with a decreased risk of disease.
Contributions of Authors: Andrew Flood developed the analytic strategy, analyzed the data, and drafted the manuscript. Tanuja Rastogi performed initial data analysis and assisted in drafting the manuscript. Elisabet Wirfält, Panagiota N. Mitrou, and Jill Reedy assisted in developing the analytic strategy and assisted in drafting the manuscript. Amy F. Subar developed the dietary assessment tool, assisted in the conceptualization of the NIH-AARP Diet and Health Study, and assisted in drafting the manuscript. Victor Kipnis provided statistical support, assisted in the conceptualization of the NIH-AARP Diet and Health Study, and assisted in drafting the manuscript. Traci Mouw provided data management support and assisted in drafting the manuscript. Albert R. Hollenbeck assisted in the conceptualization of the NIH-AARP Diet and Health Study and assisted in drafting the manuscript. Michael Leitzmann provided overall study management and assisted in drafting the manuscript. Arthur Schatzkin initiated the NIH-AARP Diet and Health Study, provided overall study management, and assisted in drafting the manuscript. None of the authors have any conflict of interest.
Cancer incidence data from the Atlanta metropolitan area were collected by the Georgia Center for Cancer Statistics, Department of Epidemiology, Rollins School of Public Health, Emory University. Cancer incidence data from California were collected by the California Department of Health Services, Cancer Surveillance Section. Cancer incidence data from the Detroit metropolitan area were collected by the Michigan Cancer Surveillance Program, Community Health Administration, State of Michigan. The Florida cancer incidence data used in this report were collected by the Florida Cancer Data System under contract to the Department of Health (DOH). The views expressed herein are solely those of the authors and do not necessarily reflect those of the contractor or DOH. Cancer incidence data from Louisiana were collected by the Louisiana Tumor Registry, Louisiana State University Medical Center in New Orleans. Cancer incidence data from New Jersey were collected by the New Jersey State Cancer Registry, Cancer Epidemiology Services, New Jersey State Department of Health and Senior Services. Cancer incidence data from North Carolina were collected by the North Carolina Central Cancer Registry. Cancer incidence data from Pennsylvania were supplied by the Division of Health Statistics and Research, Pennsylvania Department of Health, Harrisburg, Pennsylvania. The Pennsylvania Department of Health specifically disclaims responsibility for any analyses, interpretations or conclusions.
Funding: National Institutes of Health (K07 CA108910-01A1 to A.F.) and Intramural Research Program funds from the National Cancer Institute, Bethesda, MD.