Predictors of colorectal cancer have been extensively studied with some evidence suggesting that risk factors vary by subsite. Using data from 2 prospective cohort studies, we examined established risk factors to determine whether they were differentially associated with colon and rectal cancer. Our study population included 87,733 women from the Nurses’ Health Study (NHS) and 46,632 men from the Health Professionals Follow Up Study (HPFS). Exposure information was collected via biennial questionnaires (dietary variables were collected every 4 years). During the follow-up period (NHS: 1980 to May 31, 2000; HPFS: 1986 to January 31, 2000), we identified 1,139 cases of colon cancer and 339 cases of rectal cancer. We used pooled logistic regression to estimate multivariate relative risks for the 2 outcomes separately and then used polytomous logistic regression to compare these estimates. In the combined cohort, age, gender, family history of colon or rectal cancer, height, body mass index, physical activity, folate, intake of beef, pork or lamb as a main dish, intake of processed meat and alcohol were significantly associated with colon cancer risk. However, only age and sex were associated with rectal cancer. In a stepwise polytomous logistic regression procedure, family history and physical activity were associated with statistically significant different relative risks of colon and rectal cancer. Our findings support previous suggestions that family history and physical activity are not strong contributors to the etiology of rectal cancer. Future investigations of colon or rectal cancer should take into consideration risk factor differences by subsite.
Many etiologic factors have been established as contributing to the risk of colorectal cancer. However, the statistical models that evaluate the endpoint of colorectal cancer assume constancy of the relative risks for colon cancer and rectal cancer, or ignore any heterogeneity that exists between them. If colon cancer and rectal cancer have distinct etiologies, these assumptions may be inappropriate, and established risk factors for colorectal cancer may or may not be risk factors for colon or rectal cancer when considered separately.
Recently, several investigators have studied associations between exposures and subsites in the large bowel. These studies suggest etiologic distinctions by anatomic location within the colon.1–6 For example, physical inactivity and body mass index have been associated with colon cancer but not with rectal cancer.7 Because rectal cancers make up approximately 30% of all large bowel cancers in developed countries, analyses of rectal cancer as a distinct endpoint have often been inconclusive as a result of limited statistical power. To our knowledge, no study has formally compared risk factors for colon cancer and rectal cancer. To further our understanding of etiologic similarities and differences between colon cancer and rectal cancer, we used polytomous logistic regression techniques to examine and compare risk estimates for colon cancer and rectal cancer across a wide range of dietary and lifestyle factors.
We used data from 2 prospective cohort studies, the Nurses’ Health Study (NHS) and the Health Professionals Follow-Up Study (HPFS). The Nurses’ Health Study began in 1976 with 121,700 female married nurses who were 30 to 55 years of age and resided in 11 states in the United States. Every 2 years since 1976, the Nurses have been sent questionnaires to ascertain their current lifestyle habits and disease history. Dietary intake was first assessed in 1980 and subsequently in 1984, 1986 and every 4 years since 1986.
The Health Professionals Follow-Up Study is a prospective cohort study begun in 1986. At baseline, it comprised 51,529 male dentists, optometrists, osteopaths, podiatrists, pharmacists and veterinarians between 40 and 75 years of age. The cohort is sent a questionnaire every 2 years to update information on current lifestyle habits and disease history. Dietary intake was first assessed in 1986 and every 4 years since then. We analyzed data for men and women separately and also aggregated the data from the cohorts. However, we performed the polytomous logistic regression analysis only on the combined cohort due to sample size limitations.
We excluded any participants who reported previous cancer (except for nonmelanoma skin cancer), ulcerative colitis, Crohn’s disease or familial polyposis syndrome at baseline. We also excluded individuals who reported implausible caloric intakes (<800 or >4,200 kcal for men and <600 or >3,500 kcal for women) or who had a significant number of items left blank on the FFQ (10 or more blank out of the 61 question food-frequency questionnaire in 1980 for the women, or 70 or more on the 131 item food-frequency questionnaire in 1986 for the men). This left 87,733 women and 46,632 men eligible for analysis and a combined total of 134,365 subjects in the combined cohort.
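The exclusion rules above can be written as a small eligibility check. This is a minimal sketch rather than the study's own code; the function name, argument names and data layout are hypothetical, and only the thresholds are taken from the text.

```python
# Illustrative sketch of the baseline exclusions; all names are hypothetical.

# Plausible daily energy intake ranges (kcal) quoted in the text.
KCAL_LIMITS = {"male": (800, 4200), "female": (600, 3500)}
# Number of blank FFQ items at which a questionnaire is excluded
# (70 of 131 items for men in 1986, 10 of 61 items for women in 1980).
BLANK_ITEM_CUTOFF = {"male": 70, "female": 10}


def is_eligible(sex, kcal, blank_items, prior_cancer=False,
                ulcerative_colitis=False, crohns=False, polyposis=False):
    """Return True if a participant passes the baseline exclusions.

    prior_cancer should be False for nonmelanoma skin cancer, which the
    text exempts from the exclusion.
    """
    if prior_cancer or ulcerative_colitis or crohns or polyposis:
        return False
    low, high = KCAL_LIMITS[sex]
    if kcal < low or kcal > high:
        return False
    return blank_items < BLANK_ITEM_CUTOFF[sex]
```

For example, a woman reporting 1,800 kcal with 3 blank items would be retained, whereas the same woman with 10 blank items would be excluded.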
All exposure information was based on data collected via the biennial questionnaires. The primary exposures of interest defined a priori were family history, smoking, physical activity, body mass index (weight in kilograms divided by square of height in meters), height, calcium, folate, alcohol, red meat and processed meat. Each factor had been evaluated previously in relation to colon cancer in these 2 cohorts.5,8–20 The disparity of information collected from the NHS and HPFS questionnaires on aspirin dose and duration was such that combining the data on this variable was not feasible. Therefore, we did not include it in this analysis. We also did not consider other potential exposures such as vitamin D or alternative methods of measuring anthropometry due to a desire to limit this analysis to the risk factors with the most well-established associations. We also did not evaluate post-menopausal hormone use, due to the limited number of cases in the women.
The smoking behavior information collected in the NHS and HPFS allows for current, past or never classification or a cumulative pack-year cigarette history. One pack-year of smoking is equivalent to having smoked 1 pack or 20 cigarettes per day for 1 year. Using information on pack-years accumulated throughout the subjects’ participation in the study, we calculated pack-year totals for various intervals in the person’s life. We chose to use pack-years of smoking accumulated prior to age 30 based on previous studies in both of these cohorts.8,9 These previous studies showed this variable largely accounted for the long latency and therefore captured the association of smoking with an increased risk for colorectal cancer. Also, using this variable facilitated use of a single continuous variable for smoking in the polytomous logistic regression models.
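The pack-year definition above lends itself to a short worked example. The sketch below assumes smoking history is available as age intervals with a constant daily cigarette count; that representation and the function names are illustrative assumptions, not the format of the cohort questionnaires.

```python
def pack_years(cigarettes_per_day, years):
    """Pack-years for one interval: (cigarettes per day / 20) * years."""
    return cigarettes_per_day / 20.0 * years


def pack_years_before_age(intervals, cutoff_age=30):
    """Sum pack-years over intervals, counting only time before cutoff_age.

    intervals: iterable of (start_age, end_age, cigarettes_per_day);
    this layout is a hypothetical one chosen for illustration.
    """
    total = 0.0
    for start_age, end_age, cigarettes_per_day in intervals:
        # Truncate each interval at the cutoff age; ignore time after it.
        years = max(0.0, min(end_age, cutoff_age) - start_age)
        total += pack_years(cigarettes_per_day, years)
    return total
```

A person who smoked 20 cigarettes per day from age 18 to 25 and 10 per day from age 25 to 40 would accumulate 7 + 2.5 = 9.5 pack-years before age 30 under this calculation.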
For physical activity in the HPFS, we divided the men into quintiles based on their MET hours per week of activity at baseline (1986). To calculate the MET hours per week, we multiplied the time spent on each activity asked about (walking or hiking outdoors, jogging, running, bicycling, lap swimming, tennis, squash or racquetball and calisthenics or rowing) by the typical energy expenditure for that activity expressed in metabolic equivalents. We also added in METS for the number of flights of stairs climbed daily and the usual walking pace. The validity of the MET-hours per week has been studied and reported previously in the NHS II cohort (a cohort similar to the NHS cohort) as well as the HPFS.21,22 We used the medians of each of the quintiles for the linear tests of trend and in the polytomous logistic regression.
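The MET-hours calculation and the quintile medians described above can be sketched as follows. The MET values in the dictionary are placeholders chosen for illustration only and are not the values used in the study; the rank-based quintile split is likewise a simplifying assumption.

```python
from statistics import median

# Placeholder MET values for illustration only; not the study's values.
ILLUSTRATIVE_METS = {"walking": 3.0, "jogging": 7.0, "bicycling": 7.0}


def met_hours_per_week(hours_by_activity, met_values=None):
    """Sum of (weekly hours * MET value) over the reported activities."""
    if met_values is None:
        met_values = ILLUSTRATIVE_METS
    return sum(hours * met_values[activity]
               for activity, hours in hours_by_activity.items())


def quintile_medians(values):
    """Split the sorted values into five rank-based groups and return the
    median of each group. Requires at least five values."""
    ordered = sorted(values)
    n = len(ordered)
    groups = [ordered[i * n // 5:(i + 1) * n // 5] for i in range(5)]
    return [median(group) for group in groups]
```

Each participant would then be assigned the median of his or her quintile, and that value would be entered as a continuous term for the trend test.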
In the NHS, we were unable to derive MET-hours comparable to the men until the 1986 questionnaire. Thus, to keep the follow-up period consistent for men and women and to use all cases that arose in the follow-up period, we used the responses on the 1980 questionnaire to classify participants into 5 physical activity categories. More specifically, in 1980 the nurses reported whether they engaged in exercise intensely enough to produce sweating and then the intensity of the specific activity that they reported engaging in most frequently. Women who reported never sweating were classified as low activity (the reference group). Women were considered to have moderate physical activity if they reported sweating and engaged in a moderately intense activity (i.e., a metabolic equivalent score between 1 and 4). Participants who reported sweating and engaged in vigorous activity (MET score of 4 or more) were categorized as high activity. We further divided the moderate category into those who engaged in moderate activity less than 2 times a week or 2 or more times per week. The high activity category was also further divided into those who reported 4 to 6 METS and those with more than 6 METS. Thus, we had 5 categories based on a combination of sweating or not, and MET scores for their primary physical activity. A similar derivation of physical activity in this cohort has been published previously.23 Starting in 1986, we were able to directly calculate a MET hours per week for each NHS participant. Therefore, in order to improve the comparability between the men and women, we used the medians of quintiles of the 1986 MET hours per week for the linear tests of trend and for the polytomous logistic regression.
Baseline height was obtained in the NHS in 1976 and in the HPFS in 1986. Current weight was also obtained at baseline and on the biennial mailed questionnaires. We used self-reported height and weight to calculate each participant’s body mass index (BMI). The self-reported height and weight variables have been previously validated in these cohorts.24 After adjusting for age and within-person variability, correlations of 0.97 for the HPFS participants and 0.96 in the NHS participants were found between self-reported and measured weight (although the NHS participants underreported on average 1.5 kg).
A semi-quantitative food frequency questionnaire developed specifically for these populations was used for dietary assessment in these 2 cohorts. This type of questionnaire has been extensively validated since its creation.25–30 Nutrient intakes are computed by multiplying the reported frequency of consumption of specific food items by the nutrient composition of the specified portion size, using composition values from the U.S. Department of Agriculture sources supplemented with other sources. Nutrient values are adjusted for total energy intake. For the dietary variables alcohol, folate, beef as a main dish and processed meat, we used baseline values of intakes reported (1980 for women and 1986 for men). We used baseline intake for those variables that were previously found to be associated with colon or colorectal cancer in these cohorts. For calcium, we used a cumulative average of intake over the follow-up period because this was associated with colon cancer in these cohorts.5
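As a rough illustration of the nutrient computation described above, the sketch below multiplies reported frequencies by per-portion nutrient content and sums over foods. The food names and nutrient values in the usage note are hypothetical, energy adjustment is not shown, and the cumulative average is assumed here to be the running mean of all reports up to each questionnaire cycle, which the text does not spell out.

```python
def daily_nutrient_intake(servings_per_day, nutrient_per_serving):
    """Sum of (servings per day * nutrient content per serving) over foods."""
    return sum(frequency * nutrient_per_serving[food]
               for food, frequency in servings_per_day.items())


def cumulative_averages(intakes):
    """Running mean of intake across successive questionnaires.

    Assumption: the cumulative average at cycle k is the mean of the
    intakes reported on cycles 1..k.
    """
    averages = []
    running_total = 0.0
    for count, intake in enumerate(intakes, start=1):
        running_total += intake
        averages.append(running_total / count)
    return averages
```

With hypothetical calcium contents of 300 mg per serving of milk and 40 mg per serving of broccoli, 2 servings of milk and 0.5 servings of broccoli per day give 620 mg/day.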
Family history of colorectal cancer was ascertained in 1982, 1988, 1992, 1996 and 2000 in the women and 1986, 1990, 1992, 1996 and 2000 for the men. We asked the nurse or health professional whether first-degree relatives had ever been diagnosed with colon or rectal cancer. If the participant responded affirmatively with a family member having been diagnosed, he or she was categorized as having a positive family history. As described in a previous analysis,31 we updated this information throughout the follow-up period to account for newly diagnosed family members.
For both the NHS and HPFS, the participants were asked in 1990 whether they had ever had a sigmoidoscopy or endoscopy and if so, the year they had the test done. We used the response to this question to categorize participants as ever having had an endoscopy before baseline (1980 or 1986). We controlled for screening history in all of the multivariate models as individuals with a prior endoscopy had a lower subsequent risk for colorectal cancer.
For both cohorts, each mailed biennial questionnaire ascertained whether a participant was diagnosed with cancer or other disease outcomes within the previous 2 years. When a participant (or next of kin for fatal outcomes) reported a diagnosis of cancer of the colon or rectum on the questionnaire, we asked for permission to obtain hospital records and pathology reports regarding this diagnosis. A physician reviewed the medical records to verify information on histological type, anatomic location and stage of the cancer. Most deaths in these cohorts were reported by family members or the postal system. The National Death Index was used to identify deaths among nonrespondents and has been shown to be highly specific and sensitive.32
For the age-adjusted and multivariate pooled logistic models, variables were modeled as dichotomous or in categories with the same cut points for the men and women. We chose the cut points based on previous literature but also adjusted them when necessary to keep a reasonable number of cases in each category. The only variable for which cut points were gender-specific was height. For height, we generated 4 categories separately for men and women based on the distribution in each cohort and then combined the categories. Red meat and processed meat were modeled in servings per day.
Each study participant contributed person-time from the date of return of the 1980 questionnaire in women and the 1986 questionnaire in men to the date of a colon or rectal cancer diagnosis, death from any cause, or the end of follow-up (May 31, 2000, for the NHS; January 31, 2000, for the HPFS), whichever came first. The lowest category was always used as the reference group. We calculated relative risks by dividing the incidence rate in each category by the rate among those in the lowest category (reference category). We used the Mantel-Haenszel summary estimator for the stratified analysis. To test for trend, we modeled the medians from each category as a continuous variable.
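The rate-ratio calculation described above can be illustrated with a short sketch. The function names and input layout are hypothetical, and the Mantel-Haenszel function uses the standard person-time form of the estimator, which is assumed rather than confirmed to be the exact variant applied in the analysis.

```python
def relative_risks(categories):
    """Crude rate ratios versus the first (reference) category.

    categories: list of (cases, person_years), reference category first.
    """
    reference_cases, reference_py = categories[0]
    reference_rate = reference_cases / reference_py
    return [(cases / person_years) / reference_rate
            for cases, person_years in categories]


def mantel_haenszel_rate_ratio(strata):
    """Mantel-Haenszel summary rate ratio for person-time data.

    strata: iterable of (exposed_cases, exposed_py,
                         reference_cases, reference_py), one per stratum.
    """
    numerator = 0.0
    denominator = 0.0
    for exposed_cases, exposed_py, reference_cases, reference_py in strata:
        total_py = exposed_py + reference_py
        numerator += exposed_cases * reference_py / total_py
        denominator += reference_cases * exposed_py / total_py
    return numerator / denominator
```

With a single stratum the summary estimate reduces to the crude rate ratio, which is a convenient sanity check.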
To adjust for more than one variable at a time, we used logistic regression pooled over each 2-year time period. Pooled logistic regression asymptotically approximates results from a Cox proportional hazards model, provided that the follow-up intervals are short and the number of events in each interval is low.33
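Pooling over 2-year periods amounts to expanding each participant into one record per follow-up interval and fitting an ordinary logistic model to the stacked records. The sketch below shows only the expansion step; the function name and the use of calendar years as interval boundaries are simplifying assumptions.

```python
def person_period_records(entry_year, exit_year, had_event, interval_years=2):
    """Expand one participant into one record per follow-up interval.

    Each record is (interval_start, event_flag). The flag is 1 only in the
    final interval, and only if follow-up ended with a diagnosis.
    """
    records = []
    start = entry_year
    while start < exit_year:
        end = start + interval_years
        is_last_interval = end >= exit_year
        records.append((start, 1 if (had_event and is_last_interval) else 0))
        start = end
    return records
```

In practice each record would also carry the covariate values updated at the start of that interval, so that time-varying exposures enter the model with their most recent values.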
To test for differences in associations with colon and rectal cancer, we used polytomous logistic regression on the combined cohort.34 In particular, we used a custom software program described by Marshall and Chisholm.35 This program provides formal tests of the differences in magnitude of the beta estimate of each risk factor for the separate components of a composite endpoint. This program allows the user to specify which variables are modeled with a common beta estimate for both outcomes and which are modeled with 2 distinct betas.
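One common way to write such a model, given here as an illustrative formulation in notation that is not taken from the cited program, is a baseline-category logit in which exposures x have outcome-specific coefficients and exposures z share a single coefficient across the two cancer outcomes:

```latex
\log \frac{\Pr(Y = j \mid x, z)}{\Pr(Y = 0 \mid x, z)}
  = \alpha_j + \beta_j^{\top} x + \gamma^{\top} z,
  \qquad j \in \{\text{colon}, \text{rectal}\},

H_0 : \beta_{\text{colon}} = \beta_{\text{rectal}}
  \quad \text{(tested separately for each exposure in } x\text{)}
```

Here Y = 0 denotes the nondiseased category. Moving an exposure from x to z corresponds to constraining it to a common beta estimate for both outcomes.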
We began with a model that allowed all of the risk estimates for each exposure variable to vary between the 2 outcomes. Then we conducted a “manual” stepwise procedure, each time constraining 1 risk factor to be uniform across the 2 endpoints while allowing all the others to vary. To evaluate the appropriateness of modeling each exposure variable as the same for both outcomes we used likelihood ratio tests. Comparing each successive model with the baseline model, we determined which variable had the highest p-value and set that to have one common estimate for colon and rectal cancer. This became the baseline model for the next set of models. We repeated this procedure with each of the remaining variables, setting more variables to be the same for the 2 outcomes until all the remaining variables had p-values less than or equal to 0.05, indicating that they were likely to have different associations with the 2 outcomes.
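The manual stepwise procedure above can be sketched as a loop around a likelihood ratio test. This is an illustration under stated assumptions: `fit_loglik` is a hypothetical callable standing in for the model-fitting program, and each constraint is treated as a 1 degree of freedom test, which would not hold for variables that enter together with a missing-value indicator.

```python
import math


def lrt_p_value_1df(loglik_unconstrained, loglik_constrained):
    """P-value of a 1 degree of freedom likelihood ratio test."""
    statistic = max(0.0, 2.0 * (loglik_unconstrained - loglik_constrained))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(statistic / 2.0))


def stepwise_constrain(variables, fit_loglik, alpha=0.05):
    """Return the set of variables constrained to one common estimate.

    fit_loglik(shared) is a hypothetical callable returning the maximized
    log-likelihood of the model in which every variable in the frozenset
    `shared` has a single coefficient for both outcomes.
    """
    shared = frozenset()
    while True:
        baseline = fit_loglik(shared)
        p_values = {
            variable: lrt_p_value_1df(baseline, fit_loglik(shared | {variable}))
            for variable in variables if variable not in shared
        }
        if not p_values:
            return shared
        least_different = max(p_values, key=p_values.get)
        # Stop once every remaining variable has p <= alpha.
        if p_values[least_different] <= alpha:
            return shared
        shared = shared | {least_different}
```

Variables left outside the returned set are those whose estimates differ between the 2 outcomes at the chosen significance level.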
Using polytomous regression, modeling 3 outcome categories (nondiseased, colon cancer cases and rectal cancer cases) is extremely computationally intensive. Thus, to reduce the complexity of the computations and to simplify the interpretation of the estimates, we simplified the expressions of each variable for the polytomous logistic regression. We modeled age as a continuous variable and family history and endoscopy as dichotomous variables. For BMI, physical activity, smoking, beef as a main dish, processed meat, height, calcium, folate, alcohol and fiber, we used the medians of the categories (or quintiles) as a continuous variable. Thus, the estimates of risk from the polytomous logistic models represent risk on a different scale than the pooled logistic models and cannot be compared directly. However, when variables are defined identically, results from the polytomous logistic regression and a traditional logistic model are comparable.36 For variables that had missing values (alcohol, physical activity and endoscopy), we assigned an indicator variable for “missing” and another variable for the value if not missing. In the polytomous logistic regression stepwise procedure, we moved both variables in and out of the model together.
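The missing-indicator coding described above can be illustrated with a short sketch. Setting the value term to zero when the observation is missing is a common convention and is assumed here; the function name is hypothetical.

```python
def missing_indicator_encode(value):
    """Return (missing_flag, value_term) for one covariate observation.

    Assumption: the value term is set to 0 when the observation is
    missing, so the indicator alone carries the effect of missingness.
    """
    if value is None:
        return 1, 0.0
    return 0, float(value)
```

Both terms would then be entered into or removed from the model together, as described for the stepwise procedure.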
Table I shows the distribution of the various exposure variables across the outcome categories. For both the men and women, we show the values at baseline for each cohort (1980 for women and 1986 for men). The combined cohort had 1,139 colon cancer cases, and 339 rectal cancer cases over 2,302,712 person years. (For the distribution of these cases by cohort, see Table I). The men were more physically active and taller, had higher intake of alcohol, had slightly higher intakes of folate, calcium and processed meats and smoked more pack-years before age 30 when compared to the women. Rectal cancer cases tended to have slightly higher folate and slightly lower calcium intake, to be more physically active and to have a lower frequency of positive family history of colorectal cancer when compared to colon cancer cases.
For each variable of interest, and for each cohort (NHS, HPFS, combined cohort), we calculated the number of cases and the multivariate relative risks of colon and rectal cancer (Table II). We also report p-values based on tests for trend. We found age, sex, family history, height, BMI, physical activity, folate, beef, pork or lamb as a main dish, processed meat and alcohol to be related to colon cancer, whereas only age and gender showed statistically significant associations with rectal cancer.
For BMI, there was a suggestion of a different effect between men and women. Although there was an increased risk for colon cancer for both men and women, there was also a statistically significant increased risk in the highest category of BMI among the women for rectal cancer (MVRR = 1.56, 95% confidence interval: 1.01 to 2.42; p for trend = 0.04).
Based on the manual stepwise polytomous logistic regression procedure, we found that family history and physical activity had significantly different risk estimates for colon versus rectal cancer. Height had the lowest nonstatistically significant p-value for a difference between colon cancer and rectal cancer (p = 0.07). Table III shows results from the polytomous logistic regression including the p-values for the difference between the estimates for colon cancer versus rectal cancer. Log likelihood methods indicated that all other variables appear to be modeled appropriately when risk estimates were set to be the same for colon and rectal cancer.
In a combined cohort made up of 134,365 women and men, we found that relative risks for family history of colon or rectal cancer and physical activity were significantly different between colon cancer and rectal cancer.
Our finding that physical activity is more strongly associated with colon cancer than with rectal cancer is consistent with the previous literature. One possible biological mechanism to explain this difference is that the colon is more susceptible to the effects of insulin. With increased physical activity, insulin sensitivity improves.37 If insulin and insulin-like growth factors are responsible for promoting adenoma development or growth, it is possible that the receptors responsible for this effect are less common in the rectum compared to the colon. Furthermore, we found some disparity between the effect of BMI in men and women. These results are consistent with the hypothesis that the mechanisms by which BMI influences risk of colon and rectal cancer are more complex and less consistent in women than in men, possibly influenced by the effects of estrogen.38
Similarly, our findings of an increased risk for colon cancer with increased adult height are consistent with previous literature.13 Previously, it has been proposed that adult height may be a proxy for energy balance during childhood, which would also be related to insulin resistance and levels of growth factors.11 As proposed above, one explanation for the differential association we found for height and physical activity between the colon and rectum is the possibility that the colon has a different pattern of receptors for these growth factors and/or a different sensitivity to insulin effects compared to the rectum. The potential for a gradient of hormone receptors between the colon and rectum is supported by the fact that the colon and rectal mucosa arise from different embryonic tissue, the colon arising from the midgut and the rectum arising from the hindgut.6
More generally, the colon and rectum serve different functions and are exposed to fecal matter for different durations. The rectum is exposed to fecal matter in a more concentrated and direct way compared to the colon. Also, as undigested matter travels through the colon, it is coated with alkaline mucus. The different levels of pH in proximal and distal locations within the colon may influence susceptibility to environmental factors.6
In their histologic study of large bowel carcinogenesis, Konishi et al.39 suggest that cancers that arise in the distal colon and rectum begin as adenomatous polyps whereas a de novo pathway is more important in lesions that arise in the proximal colon. Also, the prevalence of K-ras mutations and mutation patterns in the p53 gene in rectal cancers differ from those seen in colon cancers.40 In a recent review, Iacopetta highlights the clinical and molecular differences between tumors of the proximal and distal colon. One compelling difference is that the familial forms of colorectal cancer, familial polyposis syndrome (FAP) and hereditary nonpolyposis colorectal cancer (HNPCC), arise first in different sections of the large bowel (FAP: rectum and distal colon, HNPCC: proximal colon).6 These findings are compatible with different etiologic factors for colon cancer and rectal cancer.
Previously, risk estimates for rectal cancer have been reported in the NHS and the HPFS cohorts for several variables. These secondary analyses have suggested that there are etiologic differences between the 2 outcomes. For example, in an analysis of family history and colorectal cancer in these 2 cohorts, Fuchs et al.31 reported a relative risk (RR) of 1.99 (95% confidence interval (CI): 1.50–2.63) for colon cancer and a relative risk of 0.86 (95% CI: 0.43–1.70) for rectal cancer among those who reported a family history of colon or rectal cancer. They report a p-value for the difference between colon and rectal cancer that is based on a simple 1 degree of freedom chi-square test. Our methods offer a more rigorous comparison of the risk estimates for the 2 outcomes and provide flexibility in allowing some risk estimates to be the same and others to be different based on likelihood ratio methods.35 Our results confirm the hypothesis that a family history of colon or rectal cancer appears to affect risk for colon cancer more strongly than risk for rectal cancer; however, because the questionnaire only queried a history of colon or rectal cancer, we cannot rule out the possibility that colon cancer cases arose in those with a family history of colon cancer and rectal cancer arose in those with a family history of rectal cancer.
For several variables such as smoking, calcium and beef as a main dish, our results appear slightly weaker than what was previously reported in these cohorts. Generally, these differences can be explained by comparing model specification and follow-up time from each of those results. Previous analyses within these cohorts often used covariates in their models that were at the time known or suspected to be confounders. With time, our understanding of possible confounding factors has improved and therefore we are now able to control for confounding more precisely. Thus, in comparison to prior analyses we have increased the number of covariates in our models and hopefully better captured the underlying associations. Furthermore, with longer follow-up time and therefore more cases, we have improved power to report accurate and precise risk estimates, particularly with respect to rectal cancer. Because we combined the results from the men and women and did not evaluate finer definitions of subsite (e.g., proximal colon vs. distal colon), we may have, for example, also underestimated associations that exist most strongly in the proximal colon.
One example of differences between our results and previous analyses in these cohorts is for pack-years before age 30. Giovannucci et al.8 have previously reported results for smoking before age 30 in these cohorts by subsite. In the HPFS, they reported a significant association with colon cancer for ≥16 pack-years before age 30 vs. 0 pack-years before age 30 (RR = 1.96, 95% CI: 1.16–3.29) and found suggestive but not statistically significant results for rectal cancer (RR = 1.62, 95% CI: 0.60–4.37). These results, however, were based on total colorectal cases up to 1992, including only 44 rectal cancer cases. In the NHS, they reported elevated risks for both colon and rectal cancer among women 55 and older. However, the association was slightly stronger for rectal cancer (greater than 10 pack-years vs. zero pack-years: RR colon = 1.59, 95% CI: 1.14–2.23; RR rectal = 2.36, 95% CI: 1.22–4.55).9 Although our results showed weaker associations for colon cancer and rectal cancer than what was found previously, our relative risks are within the confidence intervals they reported. Furthermore, their multivariate models were specified with different covariates than in our study. Giovannucci et al.8 controlled for age, family history, BMI, and intake of saturated fat, dietary fiber, folate and alcohol, whereas our multivariate models also include height, beef, pork or lamb as a main dish, processed meat, calcium and history of endoscopy. Our results, which show a slightly stronger association of cigarette smoking with rectal cancer than with colon cancer, are also consistent with a recent prospective cohort study by Terry et al.41 In their cohort of 89,835 women, they found a statistically significant association between 40 or more years since smoking commenced vs. never smoking and rectal cancer (hazard ratio = 2.27, 95% CI: 1.06–4.87) but a weaker and nonstatistically significant association for colon cancer (hazard ratio = 1.12, 95% CI: 0.62–2.04).
Further investigations of smoking and rectal cancer should be undertaken to clarify and confirm this association.
Another variable for which our results differ slightly from the previously published findings is calcium. In their analysis, Wu et al.5 found the strongest association of calcium with disease arising in the distal colon (multivariate pooled RR for >1,250 mg/day versus ≤500 mg/day = 0.65, 95% CI = 0.43–0.98). However, their results were stronger in men than women (men: multivariate RR = 0.58, 95% CI = 0.32–1.05; women: multivariate RR = 0.73, 95% CI = 0.41–1.27). Furthermore, there appeared to be a threshold effect with intakes above 700 mg/day. Since we are combining the proximal and distal colon and, in the polytomous logistic regression (PLR) model, looking at calcium as a continuous variable, our effect estimates appear weaker. Because there was previously a suggestion of a threshold effect, the relative risk in the polytomous model for calcium in particular, but also to some degree for the other variables, must be interpreted with caution, since for the PLR analysis we specified the exposures as continuous variables.
A visual comparison of relative risks can suggest a difference in their contribution to disease in each subsite. However, visually comparing the relative risks from separate logistic models, particularly from different cohorts can be problematic due to the dependence of standardized regression coefficients on study-specific variability in risk factors. Also, a comparison of statistical significance between outcomes is limited by the dependence of levels of statistical significance on numbers of events for each component of the outcome. One major strength of our study is that the Marshall and Chisholm method we used for the polytomous logistic regression models allows for some exposure variables to vary by outcome while others remain the same.
One limitation of our study is that with 339 rectal cancer cases, we may not have enough power to detect small differences in risk estimates between colon and rectal cancer. Therefore, we cannot rule out etiologic differences between the colon and rectum that were suggested by nonsignificant p-values. Given our sample size, for example, it would be premature to conclude a lack of differential risk for colon and rectal cancer with increased height (p = 0.07).
Another limitation of our study is that we considered the colon in its entirety when, in fact, etiologic distinctions between the proximal and distal colon may exist. Possibly more significant differences may lie between the proximal (or distal) colon and the rectum, but these differences are not detectable when combining estimates between the proximal and distal colon. In fact, heterogeneity in tumor formation and growth is likely more precisely defined by molecular characteristics than by anatomic location. However, differentiating risk factors using anatomic definitions may uncover patterns for future analyses using molecular markers.
Some degree of measurement error in our dietary and lifestyle variables is inevitable but any random misclassification of the exposure variables would only attenuate the associations we observed and would not differentially influence one outcome and not the other. However, because in this analysis we are focused on differences between the colon and rectum, any measurement error would reduce our power to compare estimates. The questions used to obtain dietary and lifestyle data from these 2 cohorts are very similar for all variables with the exception of physical activity (and aspirin, which we did not include for this reason). The slight differences in the way we used the physical activity information may have caused some error associated with combining the cohort information for this variable. However, because we used the information from 1986 for both cohorts (directly comparable derivations of MET-hours) for the polytomous logistic regression analysis, these results should still be valid.
In summary, our findings support the hypothesis that some risk factors, including family history, physical activity and possibly height, differ in their association with colon and rectal cancer. We cannot exclude the possibility that the remaining variables we examined actually have different associations with colon and rectal cancer that were not appreciated due to lack of statistical power. Because risk factors for colorectal cancer do not appear to contribute equally to colon and rectal cancers, future investigations into risk for colorectal cancer should ideally be done differentially by subsite.
Grant sponsor: NIH; Grant number: CA87969