Carcinogenesis. 2013 February; 34(2): 299–306.
Published online 2012 November 3. doi:  10.1093/carcin/bgs344
PMCID: PMC3564440

Cell cycle–related genes as modifiers of age of onset of colorectal cancer in Lynch syndrome: a large-scale study in non-Hispanic white patients


Heterogeneity in age of onset of colorectal cancer in individuals with mutations in DNA mismatch repair genes (Lynch syndrome) suggests the influence of other lifestyle and genetic modifiers. We hypothesized that genes regulating the cell cycle influence the observed heterogeneity as cell cycle–related genes respond to DNA damage by arresting the cell cycle to provide time for repair and induce transcription of genes that facilitate repair. We examined the association of 1456 single nucleotide polymorphisms (SNPs) in 128 cell cycle–related genes and 31 DNA repair–related genes in 485 non-Hispanic white participants with Lynch syndrome to determine whether there are SNPs associated with age of onset of colorectal cancer. Genotyping was performed on an Illumina GoldenGate platform, and data were analyzed using Kaplan–Meier survival analysis, Cox regression analysis and classification and regression tree (CART) methods. Ten SNPs were independently significant in a multivariable Cox proportional hazards regression model after correcting for multiple comparisons (P < 5×10⁻⁴). Furthermore, risk modeling using CART analysis defined combinations of genotypes for these SNPs with which subjects could be classified into low-risk, moderate-risk and high-risk groups that had median ages of colorectal cancer onset of 63, 50 and 42 years, respectively. The age-associated risk of colorectal cancer in the high-risk group was more than four times the risk in the low-risk group (hazard ratio = 4.67, 95% CI = 3.16–6.92). The additional genetic markers identified may help in refining risk groups for more tailored screening and follow-up of non-Hispanic white patients with Lynch syndrome.


Lynch syndrome (also called hereditary non-polyposis colorectal cancer) is an autosomal dominant inherited cancer predisposition disorder. It is caused by defects in DNA mismatch repair (MMR) due to mutations in DNA MMR genes—MLH1, MSH2, MSH6 and PMS2 (1–4) and more recently also due to mutations in TACSTD1 (or EPCAM) (5,6). The cancers most commonly seen in affected individuals are early onset colorectal cancer (CRC) and endometrial cancer, but cancers of other sites are also observed, such as cancers of the stomach, biliary tract, pancreas, kidneys, brain and skin (7,8). Lynch syndrome tumors are characterized by microsatellite instability resulting from deficient DNA MMR and also demonstrate loss of staining for one or more of the MMR proteins (9–11). These tumor characteristics are key in suspecting Lynch syndrome in an individual and form the basis for genetic testing for Lynch syndrome.

Although mutations in DNA MMR genes are the underlying cause of Lynch syndrome, there is heterogeneity in expression of the cancer phenotype, suggesting that other genetic and lifestyle factors may influence cancer risk. The heterogeneity is particularly evident in the variability in age of onset of CRC seen in these patients. Carriers of MSH6 mutations have a later age of CRC onset than MLH1 and MSH2 mutation carriers, and CRC is less frequent in MSH6 mutation carriers (12,13) but mutations in the different MMR genes only account for some of the variability observed in age of onset of CRC.

Cell cycle checkpoints respond to DNA damage by arresting the cell cycle to provide time for repair and by inducing transcription of genes that facilitate repair (14). Checkpoint loss and perturbation of cell cycle control result in genomic instability, which is a hallmark of cancer. More subtle genetic changes due to functional polymorphisms in cell cycle–related genes can act as genetic risk modifiers for the development of cancer. Our previous studies indicate that polymorphisms in the cell cycle–related genes cyclin D1, p53 and AURKA are associated with earlier age of onset of CRC in MMR gene mutation carriers (15–17). Other cell cycle–related genes have also been implicated in modifying cancer risk, including p16 (18), p15 and Rb1 (19), p21 (19), p27 (20) and CHEK2 (21). We hypothesized that in addition to genes regulating MMR, genes regulating the cell cycle influence the heterogeneity in CRC age of onset in patients with Lynch syndrome. To test our hypothesis, we examined the association of 1456 single nucleotide polymorphisms (SNPs) in 128 cell cycle–related genes and 31 DNA repair–related genes in 485 non-Hispanic white subjects with Lynch syndrome to determine whether one or more of the SNPs modified the age-associated risk of CRC. The overarching goal of our study was to provide a better understanding of the role of multiple genetic variants in cell cycle–related genes as risk factors responsible for variation in the age of onset of CRC in Lynch syndrome.

To capture the combined effect of multiple SNPs in the cell cycle pathway, we used a pathways-based genotyping approach, which may amplify the effects of individual polymorphisms that interact in the same pathway and enhance the predictive power. In addition, we utilized a tree-based statistical approach to identify genetic risk factors influencing the age-associated risk of CRC in Lynch syndrome. We selected a tree-based analysis because it is often able to uncover complex interactions between predictors that may be difficult or impossible to uncover using traditional multivariate techniques. Furthermore, tree-based modeling is adept at uncovering predictors that may be largely operative within specific patient subgroups but may have minimal or no effect in other patient subgroups.

Materials and methods

Study population

Patients and family members with a confirmed MMR mutation in MLH1, MSH2 or MSH6 were included in the study. To avoid heterogeneity attributable to racial differences in allele frequencies, the analysis was limited to self-reported non-Hispanic white subjects. There were 266 study participants from The University of Texas MD Anderson Cancer Center, USA, and 216 from the Hunter Medical Research Institute, Australia. All participants provided written informed consent for use of their DNA for this research, and the study was approved by the Institutional Review Board of MD Anderson Cancer Center and the Institutional Ethics Review Board of the Hunter New England Health Service.

Gene and SNP selection

To select the cell cycle–related genes included in this study, we used the KnowledgeNet algorithm (22), which is an effective tool to identify genes associated with a specific function. It combines literature mining with data on functional classification of genes by the Gene Ontology database. First, a list of key words describing the specific gene function is identified. We used cell cycle, cycle progression, cycle arrest, cell cycle progression, cell cycle arrest, cycle regulation, cycle control, cell cycle regulation, cell cycle control, cell cycle checkpoint, cycle checkpoint, cell cycle checkpoint control, checkpoint and checkpoint control as key words to identify cell cycle–related genes. KnowledgeNet provides a ranking of genes with confidence scores, which we used to prioritize genes. We next used Ingenuity Pathways Analysis software (Ingenuity Systems) to search the cell cycle pathway in the Ingenuity Pathways Analysis library of canonical pathways. Genes that appeared on the list identified by searching KnowledgeNet but did not appear within the Ingenuity Pathways Analysis cell cycle pathway were excluded from the list. From the remaining genes, we chose to study the top 124 genes ranked by KnowledgeNet using the cutoff confidence score >0.04.

To further refine the list, we used the KnowledgeNet algorithm to select genes that are considered to be important for colorectal cancer. We searched using the terms crc, cancer crc, colorectal cancer crc, crcs, crc risk, sporadic crc, carcinoma crc, primary crc, colorectal carcinoma crc, cancers crc, familial crc, colorectal cancer crc risk, sporadic crcs, familial colorectal, sporadic colorectal cancer crc, colorectal neoplasms, crc susceptibility, cancers crcs, sporadic colorectal, colorectal cancer and colorectal carcinoma. Four cell cycle–related genes including TGFBR2, Mlh1, Msh2 and Msh6 that were ranked highly for CRC but were not on our top-124 list (described in the preceding paragraph) were added to our list.

We used SNPbrowser version 4.0 (23) to select tagging SNPs. This software was designed for selection of SNPs based on observed linkage disequilibrium (LD), through construction of metric LD maps and selection of haplotype tagging SNPs. The application supports visualization of SNPs by showing gene structure, the LD map and haplotype block information. Its tagging SNP wizard selects maximally informative tagging SNPs based on user-selected parameters. SNP selection is based on the ethnic-specific LD patterns identified by the HapMap Project. The tagging SNPs were chosen with an r² of 0.80 or more and a minor allele frequency (MAF) of 0.05 or more in the Caucasian population. SNPs from the adjacent 10 kb regions on either side of the gene were also included. All validated non-synonymous SNPs were included regardless of MAF. We also included some functional SNPs from 31 DNA repair–related genes that have been reported to influence risk for cancer.
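
To illustrate the r² ≥ 0.80 tagging criterion, the following minimal sketch computes pairwise r² between two biallelic SNPs from haplotype and allele frequencies. It is not the metric LD map method implemented in SNPbrowser; the function names and the example frequencies are hypothetical and serve only to show what the threshold means.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Pairwise r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def is_tagged(p_ab: float, p_a: float, p_b: float, threshold: float = 0.80) -> bool:
    """True if one SNP captures the other at the chosen r^2 threshold."""
    return ld_r2(p_ab, p_a, p_b) >= threshold


# Hypothetical frequencies: the two alleles always travel together.
print(is_tagged(p_ab=0.30, p_a=0.30, p_b=0.30))
```

When r² is close to 1, genotyping one SNP of the pair recovers nearly all of the information carried by the other, which is why only one of them needs to be placed on the array.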

Genotyping and data cleaning

A GoldenGate assay (Illumina, San Diego, CA) was developed to examine 1536 SNPs that were assayable (design score >0.60) according to the GoldenGate genotyping platform criteria. Genotypes were called using Beadstudio software (Illumina). Plates were constructed with duplicate and quality control samples. There were 24 duplicated DNA samples for genotyping quality control. The average discordance rate of duplicates was 0.07%. We removed 12 SNPs with an MAF of 0.01 or less, 26 SNPs with a call rate of <95%, 26 SNPs with discordance between duplicates, 16 SNPs that deviated from Hardy–Weinberg equilibrium (P value of 10⁻⁵ or less) and 1 individual with a call rate of <90% across SNPs. The final data set consisted of 485 white patients with Lynch syndrome with genotype results for 1456 SNPs.
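
The per-SNP filters described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: genotypes are assumed to be coded 0/1/2 with `None` for a missing call, Hardy–Weinberg equilibrium is tested with a 1-df chi-square test (the text does not say which test was used), and the duplicate-discordance filter is omitted. All function names are hypothetical.

```python
import math


def call_rate(genotypes):
    """Fraction of non-missing calls; genotypes are 0, 1, 2 or None."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(freq, 1 - freq)


def hwe_p_value(genotypes):
    """Chi-square (1 df) test of Hardy-Weinberg proportions."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    observed = [called.count(k) for k in (0, 1, 2)]
    q = (observed[1] + 2 * observed[2]) / (2 * n)
    p = 1 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(genotypes):
    """Apply the thresholds reported in the text to one SNP."""
    return (minor_allele_frequency(genotypes) > 0.01
            and call_rate(genotypes) >= 0.95
            and hwe_p_value(genotypes) > 1e-5)


# Bookkeeping check on the SNP counts reported in the text.
assert 1536 - 12 - 26 - 26 - 16 == 1456
```

The final assertion confirms that the four exclusion counts account exactly for the reduction from 1536 assayed SNPs to the 1456 analyzed.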

In silico tools for examining functional relevance of SNPs

We explored the functional consequences of the SNPs using two searchable databases: F-SNP (24) and the UCSC Genome browser (25,26). The UCSC Genome browser incorporates visualization of some of the Encyclopedia of DNA elements (ENCODE) functional elements, such as regions of transcription, transcription factor association, chromatin structure and histone modification (27).

Statistical methods

The outcome variable for the analysis was time to CRC onset. The data were explored for differences in CRC age of onset by sex and MMR mutation type using the log-rank test. Hazard ratios (HRs) and 95% CIs were generated using Cox proportional hazards regression analysis to test the association of each of the 1456 SNPs with risk of CRC. All association analyses were adjusted for sex and MMR mutation type, and to allow for correlation between CRC onset age between multiple family members, we applied the Huber–White robust variance correction and clustered on the family ID (28). STATA software (version 10, StataCorp LP, College Station, TX) was used to perform the analyses.
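
For readers unfamiliar with time-to-onset analysis, the sketch below shows how a Kaplan–Meier curve and a median age of onset can be obtained when unaffected carriers are treated as censored at their age at last follow-up. It is a plain-Python illustration, not the STATA code used in the study; the function names and the toy data in the usage line are hypothetical.

```python
def kaplan_meier(ages, events):
    """Return [(age, cancer-free probability)] at each distinct event age.

    ages: age at CRC diagnosis, or age at last follow-up if censored
    events: 1 if CRC was diagnosed at that age, 0 if censored
    """
    data = sorted(zip(ages, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for a, e in data if a == t]  # everyone observed at age t
        diagnosed = sum(tied)
        if diagnosed > 0:
            survival *= 1 - diagnosed / at_risk
            curve.append((t, survival))
        at_risk -= len(tied)  # diagnosed and censored subjects leave the risk set
        i += len(tied)
    return curve


def median_onset_age(ages, events):
    """Smallest age at which the cancer-free probability drops to 0.5 or below."""
    for t, s in kaplan_meier(ages, events):
        if s <= 0.5:
            return t
    return None  # median not reached


# Toy data: five carriers, one censored at age 50.
print(median_onset_age([40, 45, 50, 55, 60], [1, 1, 0, 1, 1]))
```

Censored carriers contribute to the risk set up to their last follow-up, which is why the median from such a curve can differ from the simple median age among affected subjects.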

Principal components analysis was conducted to evaluate the potential effects of population structure between Australian and US samples. There was no significant difference in eigenvector loadings for the first five factors showing that Australian–USA differences in structure were a minor source of population variability. Therefore, we did not condition the analysis on study site.

The analyses were performed in five stages, as follows:

  • Stage 1: Single SNP association analysis with genotypes coded as 0 = wild-type, 1 = heterozygous and 2 = homozygous variant. The models were additive (continuous effect of increasing number of variant alleles 0 versus 1 versus 2), dominant (0 versus 1 and 2), recessive (0, 1 versus 2) and genotypic (0 versus 1, 0 versus 2). All stage 1 Cox regression analyses were adjusted for sex, MMR mutation type and familial correlation.
  • Stage 2: SNPs that were significant in one or more of the genetic models were ranked by their smallest P value. To limit the probability of false-positives due to multiple testing, the false discovery rate method of Benjamini and Hochberg (29) was used to calculate q-values, and a false discovery rate cutoff of 0.05 was applied to select the top SNPs. Fourteen SNPs passed this cutoff.
  • Stage 3: The top 14 SNPs and the covariates sex and MMR mutation type were run in a Cox forward-selection regression model to select the most parsimonious model.
  • Stage 4: The top SNPs retained in the multivariable model were classified into favorable and unfavorable (risk) genotypes and survival analysis methods were used to determine the effect of having none versus one or more unfavorable genotypes.
  • Stage 5: CART analysis was used to construct survival trees to identify subgroups of patients with different risks. We applied the RPART function written in S-PLUS software (version 8.0, Insightful Corporation, Seattle, WA) to construct the risk groups.
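
The four genetic models in stage 1 differ only in how the 0/1/2 genotype is turned into model covariates. A minimal sketch of that coding is given below; the function name `encode` is hypothetical, and the resulting covariates would be passed to whatever Cox regression routine is in use.

```python
def encode(genotype, model):
    """Map a 0/1/2 genotype to the covariate(s) used under a genetic model.

    0 = wild-type, 1 = heterozygous, 2 = homozygous variant.
    """
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    if model == "additive":
        return [genotype]  # per-allele trend: 0 versus 1 versus 2
    if model == "dominant":
        return [int(genotype >= 1)]  # 0 versus 1 and 2
    if model == "recessive":
        return [int(genotype == 2)]  # 0 and 1 versus 2
    if model == "genotypic":
        # Two indicators, each compared with the wild-type reference.
        return [int(genotype == 1), int(genotype == 2)]
    raise ValueError(f"unknown model: {model}")


for model in ("additive", "dominant", "recessive", "genotypic"):
    print(model, [encode(g, model) for g in (0, 1, 2)])
```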

The multistage approach allowed us to thoroughly interrogate the association signals and identify clinically meaningful risk groups.
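
The false discovery rate step in stage 2 can be illustrated with the standard Benjamini–Hochberg step-up calculation. The sketch below is a generic implementation, assuming one P value per SNP (the smallest across the genetic models, as in stage 2); the function name and the example P values are hypothetical and are not results from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical P values for four SNPs.
p = [0.01, 0.04, 0.03, 0.005]
selected = [i for i, q in enumerate(benjamini_hochberg(p)) if q < 0.05]
print(selected)
```

SNPs whose q-value falls below the chosen cutoff (0.05 in this study) are carried forward to the multivariable model.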


The study sample consisted of 205 men and 280 women from 272 families. All participants were self-reported non-Hispanic Whites. A majority of the subjects had mutations in MLH1 (44.9%) or MSH2 (49.5%), but 27 subjects (5.6%) had MSH6 mutations (Table I). A majority of the families (66.2%) had only one family member in the study, but the study also included families with two family members (accounting for 14.0% of families), three family members (10.3%) or four or more family members (9.5%). The median age at CRC diagnosis was older in women (58 years) than in men (48 years; log-rank test P = 0.0002), and women had a lower rate of CRC diagnosis (Table I). Similarly, the median age at CRC diagnosis was older in MSH6 mutation carriers (66 years) than in carriers of mutations in MLH1 (48 years) or MSH2 (52 years), and MSH6 mutation carriers had a lower rate of CRC diagnosis (Table I).

Table I.
Subject characteristics (n = 485)* and HRs and 95% confidence intervals (CI) for colorectal cancer risk

There were 191 SNPs associated with age at diagnosis of CRC at P < 0.05 in the adjusted Cox regression analysis, adjusting for sex, mutation type and familial correlation due to presence of multiple family members in the sample. Fourteen SNPs in 13 genes remained significant after correction for multiple comparisons (results in Table II). None of the 14 SNPs violated Hardy–Weinberg equilibrium. In the Cox forward-selection regression model, four SNPs on chromosome 5 (CDC25C: rs17171794; KDM3B/FAM53C: rs3734168; CDC25C: rs6874130 and SKP2: rs3804439) were no longer significant at P < 0.05; these SNPs were not included in the multivariable model. Three of the four SNPs that dropped out of the model, rs17171794, rs3734168 and rs6874130, were in high LD with rs3734166 (r² ≥ 0.60), as seen in an LD plot generated using Haploview (30) (Supplementary Figure 1, available at Carcinogenesis Online), and were likely dropped from the model because they were correlated with rs3734166. The remaining 10 SNPs in 10 genes were significant in the multivariable model, suggesting independent effects on age of CRC onset.

Table II.
Association between genetic variants in the cell cycle pathway and age of onset of CRC in non-Hispanic whites with Lynch syndrome

As 10 genes independently influenced age-associated CRC risk, we performed combined analysis according to the number of unfavorable (risk increasing) genotypes carried by each individual with the underlying hypothesis that people with a larger number of unfavorable genotypes would be at higher risk for developing CRC at a younger age. Although single SNPs may confer relatively low risk individually, it has been shown that a panel of SNPs in the same pathway may significantly amplify the effects of individual SNPs (31–33). Unfavorable genotypes were classified on the basis of the genetic model that attained significance in the Cox regression. For example, if a SNP was significant in a recessive model, the homozygous variant genotype was considered unfavorable, whereas if the SNP was significant in a dominant model, the heterozygous and the homozygous variant genotypes were considered unfavorable. Using this classification method, we found that subjects carried between 0 and 7 unfavorable genotypes. Compared with people carrying no unfavorable genotype (of any of the 10 SNPs) as the reference group, people carrying one or two unfavorable genotypes had more than twice the risk and those carrying three or more unfavorable genotypes had more than four times the risk of CRC (Table III) after adjustment for sex, type of MMR mutation and familial correlation. The median age at onset differed significantly between the three groups: it was 58 years for people with no unfavorable genotypes, 48 years for those with 1–2 unfavorable genotypes and 40 years for those with three or more unfavorable genotypes (Figure 1).
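
The counting rule described above can be sketched as follows. The sketch covers only the dominant and recessive cases given as examples in the text and assumes that the variant allele is the risk-increasing one. The SNP labels and their assigned models are placeholders, not the assignments reported in Table II.

```python
def is_unfavorable(genotype, model):
    """genotype: 0 = wild-type, 1 = heterozygous, 2 = homozygous variant."""
    if model == "dominant":
        return genotype >= 1  # heterozygous or homozygous variant
    if model == "recessive":
        return genotype == 2  # homozygous variant only
    raise ValueError(f"unsupported model: {model}")


def count_unfavorable(genotypes, models):
    """Count unfavorable genotypes for one subject, skipping missing calls."""
    return sum(
        is_unfavorable(genotypes[snp], model)
        for snp, model in models.items()
        if genotypes.get(snp) is not None
    )


def risk_category(n_unfavorable):
    """Group subjects as in the combined analysis: 0, 1-2, or 3 or more."""
    if n_unfavorable == 0:
        return "0"
    if n_unfavorable <= 2:
        return "1-2"
    return "3+"


# Placeholder SNP labels and models, for illustration only.
models = {"snp_a": "dominant", "snp_b": "recessive", "snp_c": "recessive"}
subject = {"snp_a": 1, "snp_b": 1, "snp_c": 2}
print(risk_category(count_unfavorable(subject, models)))
```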

Table III.
Cumulative analysis of unfavorable genotypes
Fig. 1.
Kaplan–Meier estimates of age at CRC onset by number of unfavorable genotypes.

CART analysis was performed using genotypes of the 10 SNPs, sex and MMR mutation type. The final resulting tree is shown in Figure 2. There was an initial split on KIF20A: rs10038448. The subgroup with oldest age of onset of CRC (node 1) had the following characteristics: KIF20A: rs10038448 wild-type genotype (WW) and heterozygous variant genotype (WM); female; TGFB1: rs12980942 WM/MM; BCL2: rs1531697 WM/MM; and CHFR: rs11610954 WW. The median onset age in these patients was 63 years. The subgroup with the youngest age of onset of CRC, 35 years (node 5), had the following characteristics: KIF20A: rs10038448 WW/WM; female; and TGFB1: rs12980942 homozygous variant genotype (MM). Furthermore, we used the Cox proportional hazards model to estimate HRs for all the groups and used the subgroup with the latest median age of CRC onset (node 1) as the referent (Figure 2). Because there may be a correlation of time to cancer onset in individuals from the same family due to genetic or familial factors, we applied a robust variance correction in the Cox regression analysis to adjust for the differences (34). We grouped the terminal seven nodes into three categories based on the estimated HRs for each node: low risk (node 1), moderate risk (nodes 2–4) and high risk (nodes 5–7). Compared with the low-risk group, we found HRs of 2.19 (95% CI, 1.50–3.19) and 4.67 (95% CI, 3.16–6.92) for the moderate-risk and high-risk groups, respectively (Table IV). The log-rank test (P = 4.81×10⁻¹⁴) demonstrated a statistically significant difference among the time-to-onset curves of these three groups (Figure 2). The median age at onset was 42 years for the high-risk group, 50 years for the moderate-risk group and 63 years for the low-risk group.
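
The reported HRs and confidence intervals can be checked for internal consistency. The sketch below assumes that the intervals are Wald-type, that is, symmetric on the log-hazard scale with a 1.96 multiplier, which is the usual form for Cox models but is not stated explicitly in the text. The function names are hypothetical.

```python
import math


def log_hr_se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR) implied by a Wald-type 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ci_from_hr(hr, se, z=1.96):
    """Wald-type confidence interval for an HR given the SE of log(HR)."""
    return (math.exp(math.log(hr) - z * se), math.exp(math.log(hr) + z * se))


# High-risk versus low-risk group, as reported in the text.
se_high = log_hr_se_from_ci(3.16, 6.92)
low, high = ci_from_hr(4.67, se_high)
print(round(se_high, 3), round(low, 2), round(high, 2))

# Moderate-risk versus low-risk group.
se_moderate = log_hr_se_from_ci(1.50, 3.19)
print(round(se_moderate, 3), [round(x, 2) for x in ci_from_hr(2.19, se_moderate)])
```

Under this assumption, the interval rebuilt from each point estimate agrees with the reported limits to within rounding, so the HRs and CIs are mutually consistent.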

Table IV.
Results according to risk groups generated by classification and regression tree analysis
Fig. 2.
(A) Time-to-onset tree for age of onset of CRC. Inside each node is the number of affected CRC patients/the total number of subjects. WW, wild-type; WM, heterozygote; MM, homozygous polymorphism; (B) Kaplan–Meier curves for age of CRC onset by ...


In this hypothesis-generating study, we identified 10 SNPs significantly associated with age-associated risk of CRC after correction for multiple comparisons. We used a pathways-based multigenic approach to capture the combined effect of multiple SNPs in the cell cycle pathway. The analysis showed evidence of a significant gene-dosage effect. People with a larger number of unfavorable genotypes were at higher risk. Furthermore, CART analysis identified a subgroup with high probability of cancer occurrence at younger ages, a median CRC onset age of 42 years as well as a subgroup with the later age of onset, a median CRC onset age of 63 years.

The genes associated with the 10 significant SNPs were XRCC5, TTC28, TNF, TGFB1, PPP2R2B, BCL2, KIF20A, CHFR, CDC25C and ATM. These genes are all directly or indirectly involved in cell cycle checkpoint control based on Ingenuity canonical pathways (Ingenuity® Systems) (Supplementary Figure 2, available at Carcinogenesis Online).

Many of the significant SNPs in these genes are potentially functional as they are in the 5′ or 3′ untranslated region (UTR) or in the coding region. The SNP rs1051685 is located in the 3′ UTR of XRCC5 and may be of functional relevance because it is located in an exonic splice enhancer sequence as determined by PupaSNP (35). The SNP rs12980942 is located in the upstream 5′ UTR of TGFB1 and may affect mRNA stability and translation. TGFB1 plays a critical role in regulation of cell proliferation, differentiation and apoptosis and serves as a tumor suppressor in normal intestinal epithelium (36). The SNPs rs3734166 and rs1800057 are located in the coding regions of CDC25C and ATM, respectively. CDC25C is a phosphatase that serves as a regulator of G2/M transition and mediates this checkpoint in response to DNA damage. The non-synonymous SNP rs3734166 (CDC25C R70C) was significantly associated with early age of onset of CRC in our study. ATM- and Chk1/2-mediated phosphorylation of CDC25C plays a major role in G2/M arrest. Our study also found that SNP rs1800057 in ATM, resulting in the amino acid change P1054R, was associated with early age of onset of CRC. The SNP was predicted to be deleterious. Heterozygosity for P1054R is associated with decreased ATM expression in tumors (37). The SNP is in complete LD with SNP rs1800056 (F858L). The two SNPs were reported to be associated with risk of CRC (38). ATM is critical for regulation of cell cycle checkpoints. Activation of ATM by DNA damage leads to ATM-dependent phosphorylation of CHEK2.

The remaining SNPs that we identified were located in the intronic regions of genes. Recently, the Encyclopedia of DNA elements (ENCODE) project has reported that 80% of the genome is related to some biochemical function after systematically mapping regions of transcription, transcription factor association, chromatin structure and histone modification (27). We found that many of the intronic SNPs mapped to areas of histone modification (modification of histone proteins present in chromatin influences gene expression by changing how accessible the chromatin is to transcription), DNaseI hypersensitivity clusters (DNase hypersensitivity sites map to regions believed to contain regulatory elements, including CpG islands, and highly conserved sequences) (39), and altered transcription factor binding sites. SNPs that were predicted to alter a transcription factor binding site using the in silico tool TFSearch included rs12980942 (5′ UTR of TGFB1), rs1051685 (3′ UTR of XRCC5), rs3093662 (intronic of TNF and 3′ of LTA) and rs6874130 (CDC25C). In particular, the A-to-G base change of rs3093662 was predicted to create a binding site for CDP (CCAAT displacement protein), a transcription regulator known to be involved in cellular proliferation and cell cycle progression (40,41), which are key processes in carcinogenesis. Using the UCSC Genome browser, we found that SNPs related to areas of histone modification included rs10477307 (PPP2R2B), rs12980942 (5′ UTR of TGFB1), rs1051685 (3′ UTR of XRCC5) and rs1531697 (BCL2), and some of these SNPs (rs10477307, rs12980942, rs1051685, rs3093662, rs1531697 and rs6874130) were also related to DNaseI hypersensitivity sites. Although intronic, many of these SNPs may therefore influence gene function. Alternatively, the significant SNPs located in intronic regions of genes may be linked to other causal SNPs that affect gene activity.
In addition, some of the SNPs that we found to be significantly associated with age-associated risk of CRC were associated with susceptibility to other cancers or diseases. For example, SNP rs1051685 in XRCC5 has been reported to be associated with susceptibility to myeloma (35). SNP rs12980942, in the upstream 5′ UTR of TGFB1, has been reported to be associated with increased susceptibility to asthma (42). SNP rs1800057 in ATM has been found to be associated with increased risk of prostate cancer and breast cancer and to modify the effect of radiotherapy (43–47).

When we performed combined analysis for all 10 significant SNPs, we found that people with a larger number of unfavorable genotypes were at higher risk. Our findings suggest a cumulative effect of SNPs that interact in the same pathway on age-associated risk of CRC.

The CART analysis identified a few subgroups with higher risk of early onset CRC. There was an initial split on KIF20A: rs10038448, suggesting that this variation was one of the most important risk factors for CRC. SNP rs10038448 was selected as a tagging SNP for cell cycle gene CDC23 and is in complete LD with SNP rs2864, which is located in the 3′ UTR of CDC23. CDC23 is a protein essential for cell cycle progression through the G2/M transition. CDC23 is a component of the anaphase-promoting complex required for degrading mitotic cyclins and other cell cycle regulators (48). Wang et al. (49) reported that mutant human CDC23 protein decreased cell cycle progression of colon epithelial cells and was involved in expression of human Cyclin b1 protein. The tree shown in Figure 2 may provide some clues regarding how genes act together on the age-associated risk of CRC in patients with Lynch syndrome. As shown in the figure, TGFB1: rs12980942, BCL2: rs1531697 and CHFR: rs11610954 act together. TGFB1 is a well-known cell cycle inhibitor. BCL2 family proteins regulate and contribute to programmed cell death or apoptosis. It has been reported that TGFB1 elevates the protein content of the apoptosis-preventing BCL2 (50), and BCL2 plays a crucial role in regulating the G1/S transition of hematopoietic cells induced by TGFB1 (51). Checkpoint with forkhead and ring finger domains (CHFR) functions as an important checkpoint protein early in the G2/M transition, and its activation delays entry into metaphase in response to mitotic stress (52). The tree in Figure 2 showed that the TNF gene was also a risk factor for CRC. The TNF gene encodes a multifunctional proinflammatory cytokine that belongs to the tumor necrosis factor (TNF) superfamily. Human TNF protein was reported to increase the arrest in G1 phase of MV-4–11 and MCF7 cells (53,54). 
Recently, it was reported that TNF can promote G1/S transition in vascular endothelial cells and facilitate the cell cycle activation induced by vascular endothelial growth factor (55). Our findings suggest that these genes work together and play important roles in different cell cycle phases to control cell cycle checkpoints. However, the results should be interpreted with caution. Like any model, CART has its own weaknesses. CART splits each node on a single variable and does not use combinations of variables. The tree structure may also be unstable: small variations in the data can lead to radical changes in the tree. In addition, CART learns the tree with a heuristic algorithm that chooses the locally optimal split at each node, so the resulting tree may not be globally optimal. Both problems can be alleviated by training multiple trees in an ensemble learner, each on a sample drawn randomly with replacement. We acknowledge that CART analysis is an exploratory analysis. Prospective validation in independent studies is required to confirm the results of our study.

To our knowledge, ours is the first study to examine the association of a large panel of tagging SNPs in cell cycle and DNA repair genes with age-associated CRC risk in people with Lynch syndrome. Other strengths of our study include the multigenic pathway-based approach, which helped us identify subgroups of individuals who differ significantly in risk profiles. Potentially, the SNPs identified may also influence risk for sporadic CRC as shown in our earlier studies where a cyclin D1 A870G SNP that modified age of CRC onset in Lynch syndrome mutation carriers (17) was also shown to increase sporadic CRC risk at a younger age (in people <60 years) (56). Similarly, two other studies reported that the cyclin D1 A870G polymorphism was associated with increased CRC risk in younger patients within the Taiwanese population (57) and also increased risk for incident sporadic colorectal adenomas (58). In our current study, we further validated the cyclin D1 A870G SNP, and although the P value was less than the significance level (P < 0.05), it did not meet the multiple testing threshold and so was not included in the top SNPs. Upon validation of our study results, future studies could be directed towards examining the influence of cell cycle–related modifiers of CRC risk as modifiers of non-syndromic (sporadic) CRC risk.

Recent genome-wide association studies (GWASs) have identified at least 15 common genetic susceptibility loci associated with CRC (59–61). Interestingly, six susceptibility loci map to genes involved in the transforming growth factor-beta (TGFβ) signaling pathway, including GREM1, BMP2, BMP4, SMAD7, RHPN2 and LAMA5. Our study focused on cell cycle–related genes. Of the 10 significant loci in our study, two map to genes involved in the TGFβ signaling pathway: TGFB1 and BCL2. The findings of our study together with the overrepresentation of TGFβ-related loci in the GWASs suggest that perturbation in the TGFβ signaling pathway plays a critical role in CRC susceptibility (62).

The limitations of our study include the restriction of the population to non-Hispanic whites. Cooperative interinstitutional studies are needed to examine SNPs potentially associated with age of onset of CRC in other ethnic groups. Additionally, the frequency of adverse genotypes for some of the significant SNPs was low (<5%), and those results may be unstable. Furthermore, in our study we were not able to allow for the potential modifying effects of environmental risk factors on age-associated CRC risk. Factors such as smoking (63–65), meat consumption and meat preparation (66), overweight and obesity (67,68), oral contraceptive use (69), fruit consumption and dietary fiber intake (63) have been examined as potential modifiers of adenoma or CRC risk in people with Lynch syndrome. In particular, there is convincing evidence for smoking as a modifier of risk. However, we did not have complete smoking data from our study participants and therefore could not adjust for smoking or other environmental risk factors in the multivariable model. Another limitation of our study is lack of data for tumor site. We were unable to conduct analysis stratifying by rectal and colon cancer. Several GWASs have reported that there were notable site-specific differences in risk associated with some SNPs including rs3802842 at 11q23 and rs4939827 (SMAD7) (70–72), although Broderick et al. (73) did not observe differences by site.

In conclusion, we identified SNPs in 10 genes that were associated with earlier age of onset of CRC in non-Hispanic white patients with Lynch syndrome, and we applied a risk modeling approach to classify individuals into different risk groups on the basis of their genotypes. Ongoing, larger and pooled GWAS analyses as well as studies in other ethnic populations may help identify additional susceptibility alleles and together these may better classify CRC risk. Such classification may help refine the frequency and intensity of screening required for these at-risk subjects. Although our results were based on a large sample size, further validation of these findings is warranted in non-Hispanic whites and in other ethnic groups.


Funding

The University of Texas MD Anderson Cancer Center is supported in part by the National Institutes of Health through Cancer Center Support (CA016672) and National Cancer Institute (CA 70759 to M.L.F. and K07CA160753 to M.P.).


Acknowledgements

We thank Di Zhang, Joshua D. Rother and Haidee Chancoco for technical assistance with sample preparation and genotyping assays, Laura Lucio for patient recruitment and Stephanie Deming for scientific editing.

Conflict of Interest Statement: None declared.



Abbreviations:

CART: classification and regression tree
CHFR: checkpoint with forkhead and ring finger domains
CI: confidence intervals
CRC: colorectal cancer
GWAS: genome-wide association studies
HR: hazard ratios
LD: linkage disequilibrium
MAF: minor allele frequency
MMR: mismatch repair
SNPs: single nucleotide polymorphisms
TGF-β: transforming growth factor-beta
TNF: tumor necrosis factor
UTR: untranslated region


1. Edelmann L., et al. (2004). Loss of DNA mismatch repair function and cancer predisposition in the mouse: animal models for human hereditary nonpolyposis colorectal cancer. Am. J. Med. Genet. C. Semin. Med. Genet. 129C, 91–99.
2. Nakagawa H., et al. (2004). Mismatch repair gene PMS2: disease-causing germline mutations are frequent in patients whose tumors stain negative for PMS2 protein, but paralogous genes obscure mutation detection and interpretation. Cancer Res. 64, 4721–4727.
3. Peltomäki P., et al. (2004). Mutations associated with HNPCC predisposition – Update of ICG-HNPCC/INSiGHT mutation database. Dis. Markers. 20, 269–276.
4. Worthley D.L., et al. (2005). Familial mutations in PMS2 can cause autosomal dominant hereditary nonpolyposis colorectal cancer. Gastroenterology. 128, 1431–1436.
5. Kovacs M.E., et al. (2009). Deletions removing the last exon of TACSTD1 constitute a distinct class of mutations predisposing to Lynch syndrome. Hum. Mutat. 30, 197–203.
6. Ligtenberg M.J., et al. (2009). Heritable somatic methylation and inactivation of MSH2 in families with Lynch syndrome due to deletion of the 3’ exons of TACSTD1. Nat. Genet. 41, 112–117.
7. Lynch H.T., et al. (1991). Hereditary nonpolyposis colorectal cancer (Lynch syndromes I & II). Genetics, pathology, natural history, and cancer control, Part I. Cancer Genet. Cytogenet. 53, 143–160.
8. Mecklin J.P., et al. (1991). Tumor spectrum in cancer family syndrome (hereditary nonpolyposis colorectal cancer). Cancer. 68, 1109–1112.
9. Boland C.R., et al. (1998). A National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Res. 58, 5248–5257.
10. Rodriguez-Bigas M.A., et al. (1997). A National Cancer Institute Workshop on Hereditary Nonpolyposis Colorectal Cancer Syndrome: meeting highlights and Bethesda guidelines. J. Natl Cancer Inst. 89, 1758–1762.
11. Umar A., et al. (2004). Revised Bethesda Guidelines for hereditary nonpolyposis colorectal cancer (Lynch syndrome) and microsatellite instability. J. Natl Cancer Inst. 96, 261–268.
12. Hendriks Y.M., et al. (2004). Cancer risk in hereditary nonpolyposis colorectal cancer due to MSH6 mutations: impact on counseling and surveillance. Gastroenterology. 127, 17–25.
13. Plaschke J., et al. (2004). Lower incidence of colorectal cancer and later age of disease onset in 27 families with pathogenic MSH6 germline mutations compared with families with MLH1 or MSH2 mutations: the German Hereditary Nonpolyposis Colorectal Cancer Consortium. J. Clin. Oncol. 22, 4486–4494.
14. Elledge S.J. (1996). Cell cycle checkpoints: preventing an identity crisis. Science. 274, 1664–1672.
15. Chen J., et al. (2007). Association between Aurora-A kinase polymorphisms and age of onset of hereditary nonpolyposis colorectal cancer in a Caucasian population. Mol. Carcinog. 46, 249–256.
16. Jones J.S., et al. (2004). p53 polymorphism and age of onset of hereditary nonpolyposis colorectal cancer in a Caucasian population. Clin. Cancer Res. 10, 5845–5849.
17. Kong S., et al. (2000). Effects of cyclin D1 polymorphism on age of onset of hereditary nonpolyposis colorectal cancer. Cancer Res. 60, 249–252.
18. Zheng Y., et al. (2002). Haplotypes of two variants in p16 (CDKN2/MTS-1/INK4a) exon 3 and risk of squamous cell carcinoma of the head and neck: a case-control study. Cancer Epidemiol. Biomarkers Prev. 11, 640–645.
19. Starinsky S., et al. (2005). Genotype phenotype correlations in Israeli colorectal cancer patients. Int. J. Cancer. 114, 58–73.
20. Li G., et al. (2004). Association between the V109G polymorphism of the p27 gene and the risk and progression of oral squamous cell carcinoma. Clin. Cancer Res. 10(12 Pt 1), 3996–4002.
21. Simon M., et al. (2006). Variant of the CHEK2 gene as a prognostic marker in glioblastoma multiforme. Neurosurgery. 59, 1078–1085.
22. Yue P., et al. (2006). SNPs3D: candidate gene and SNP selection for association studies. BMC Bioinformatics. 7, 166.
23. De La Vega F.M., et al. (2006). A tool for selecting SNPs for association studies based on observed linkage disequilibrium patterns. Pac. Symp. Biocomput. 487–498.
24. Lee P.H., et al. (2009). An integrative scoring system for ranking SNPs by their potential deleterious effects. Bioinformatics. 25, 1048–1055.
25. Kent W.J., et al. (2002). The human genome browser at UCSC. Genome Res. 12, 996–1006.
26. Rosenbloom K.R., et al. (2010). ENCODE whole-genome data in the UCSC Genome Browser. Nucleic Acids Res. 38, D620–D625.
27. Bernstein B.E., et al. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature. 489, 57–74.
28. Williams R.L. (2000). A note on robust variance estimation for cluster-correlated data. Biometrics. 56, 645–646.
29. Benjamini Y., et al. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc. 57, 289–300.
30. Barrett J.C., et al. (2005). Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 21, 263–265.
31. Han J., et al. (2004). Polymorphisms in DNA double-strand break repair genes and skin cancer risk. Cancer Res. 64, 3009–3013.
32. Popanda O., et al. (2004). Specific combinations of DNA repair gene variants and increased risk for non-small cell lung cancer. Carcinogenesis. 25, 2433–2441.
33. Wu X., et al. (2006). Bladder cancer predisposition: a multigenic approach to DNA-repair and cell-cycle-control genes. Am. J. Hum. Genet. 78, 464–479.
34. Lin D., et al. (1989). The robust inference for the Cox proportional hazards model. J. Am. Stat. Assoc. 84, 1074–1078.
35. Hayden P.J., et al. (2007). Variation in DNA repair genes XRCC3, XRCC4, XRCC5 and susceptibility to myeloma. Hum. Mol. Genet. 16, 3117–3127.
36. Massagué J. (2008). TGFbeta in Cancer. Cell. 134, 215–230.
37. Stankovic T., et al. (1999). Inactivation of ataxia telangiectasia mutated gene in B-cell chronic lymphocytic leukaemia. Lancet. 353, 26–29.
38. Webb E.L., et al. (2006). Search for low penetrance alleles for colorectal cancer through a scan of 1467 non-synonymous SNPs in 2575 cases and 2707 controls with validation by kin-cohort analysis of 14 704 first-degree relatives. Hum. Mol. Genet. 15, 3263–3271.
39. Crawford G.E., et al. (2006). Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res. 16, 123–131.
40. Michl P., et al. (2005). CUTL1 is a target of TGF(beta) signaling that enhances cancer cell motility and invasiveness. Cancer Cell. 7, 521–532.
41. van Wijnen A.J., et al. (1996). CDP/cut is the DNA-binding subunit of histone gene transcription factor HiNF-D: a mechanism for gene regulation at the G1/S phase cell cycle transition point independent of transcription factor E2F. Proc. Natl Acad. Sci. U.S.A. 93, 11516–11521.
42. Sharma S., et al. (2011). Association of variants in innate immune genes with asthma and eczema. Pediatr. Allergy Immunol. 23, 315–323.
43. Angèle S., et al. (2004). ATM polymorphisms as risk factors for prostate cancer development. Br. J. Cancer. 91, 783–787.
44. Cesaretti J.A., et al. (2005). ATM sequence variants are predictive of adverse radiotherapy response among patients treated for prostate cancer. Int. J. Radiat. Oncol. Biol. Phys. 61, 196–202.
45. Gutiérrez-Enríquez S., et al. (2004). Functional consequences of ATM sequence variants for chromosomal radiosensitivity. Genes Chromosomes Cancer. 40, 109–119.
46. Larson G.P., et al. (1997). An allelic variant at the ATM locus is implicated in breast cancer susceptibility. Genet. Test. 1, 165–170.
47. Meyer A., et al. (2007). ATM missense variant P1054R predisposes to prostate cancer. Radiother. Oncol. 83, 283–288.
48. Schreiber A., et al. (2011). Structural basis for the subunit assembly of the anaphase-promoting complex. Nature. 470, 227–232.
49. Wang Q., et al. (2003). Alterations of anaphase-promoting complex genes in human colon cancer cells. Oncogene. 22, 1486–1490.
50. Chatzaki E., et al. (2003). Transforming growth factor beta1 exerts an autocrine regulatory effect on human endometrial stromal cell apoptosis, involving the FasL and Bcl-2 apoptotic pathways. Mol. Hum. Reprod. 9, 91–95.
51. Katayama N., et al. (2000). Bcl-2 in cell-cycle regulation of hematopoietic cells by transforming growth factor-beta1. Leuk. Lymphoma. 39, 601–605.
52. Scolnick D.M., et al. (2000). Chfr defines a mitotic stress checkpoint that delays entry into metaphase. Nature. 406, 430–435.
53. Cai Z., et al. (1997). Resistance of MCF7 human breast carcinoma cells to TNF-induced cell death is associated with loss of p53 function. Oncogene. 15, 2817–2826.
54. Hu X., et al. (2002). Ubiquitin/proteasome-dependent degradation of D-type cyclins is linked to tumor necrosis factor-induced cell cycle arrest. J. Biol. Chem. 277, 16528–16537.
55. Chen Y., et al. (2012). Time-course network analysis reveals TNF-α can promote G1/S transition of cell cycle in vascular endothelial cells. Bioinformatics. 28, 1–4.
56. Kong S., et al. (2001). Cyclin D1 polymorphism and increased risk of colorectal cancer at young age. J. Natl Cancer Inst. 93, 1106–1108.
57. Lewis R.C., et al. (2003). Polymorphism of the cyclin D1 gene, CCND1, and risk for incident sporadic colorectal adenomas. Cancer Res. 63, 8549–8553.
58. Huang W.S., et al. (2006). Impact of the cyclin D1 A870G polymorphism on susceptibility to sporadic colorectal cancer in Taiwan. Dis. Colon Rectum. 49, 602–608.
59. Dunlop M.G., et al. (2012). Common variation near CDKN1A, POLD3 and SHROOM2 influences colorectal cancer risk. Nat. Genet. 44, 770–776.
60. Houlston R.S., et al. (2008). Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nat. Genet. 40, 1426–1435.
61. Houlston R.S., et al. (2010). Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nat. Genet. 42, 973–977.
62. Tenesa A., et al. (2009). New insights into the aetiology of colorectal cancer from genome-wide association studies. Nat. Rev. Genet. 10, 353–358.
63. Diergaarde B., et al. (2007). Environmental factors and colorectal tumor risk in individuals with hereditary nonpolyposis colorectal cancer. Clin. Gastroenterol. Hepatol. 5, 736–742.
64. Pande M., et al. (2010). Smoking and colorectal cancer in Lynch syndrome: results from the Colon Cancer Family Registry and the University of Texas M.D. Anderson Cancer Center. Clin. Cancer Res. 16, 1331–1339.
65. Watson P., et al. (2004). Tobacco use and increased colorectal cancer risk in patients with hereditary nonpolyposis colorectal cancer (Lynch syndrome). Arch. Intern. Med. 164, 2429–2431.
66. Voskuil D.W., et al. (2002). Meat consumption and meat preparation in relation to colorectal adenomas among sporadic and HNPCC family patients in The Netherlands. Eur. J. Cancer. 38, 2300–2308.
67. Botma A., et al. (2010). Body mass index increases risk of colorectal adenomas in men with Lynch syndrome: the GEOLynch cohort study. J. Clin. Oncol. 28, 4346–4353.
68. Campbell P.T., et al. (2010). Case-control study of overweight, obesity, and colorectal cancer risk, overall and by tumor microsatellite instability status. J. Natl Cancer Inst. 102, 391–400.
69. Blokhuis M.M., et al. (2010). Lynch syndrome: the influence of environmental factors on extracolonic cancer risk in hMLH1 c.C1528T mutation carriers and their mutation-negative sisters. Fam. Cancer. 9, 357–363.
70. Curtin K., et al. (2009). Meta association of colorectal cancer confirms risk alleles at 8q24 and 18q21. Cancer Epidemiol. Biomarkers Prev. 18, 616–621.
71. Pittman A.M., et al. (2008). Refinement of the basis and impact of common 11q23.1 variation to the risk of developing colorectal cancer. Hum. Mol. Genet. 17, 3720–3727.
72. Tenesa A., et al. (2008). Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat. Genet. 40, 631–637.
73. Broderick P., et al. (2007). A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat. Genet. 39, 1315–1317.
