Colorectal cancer (CRC) is one of the leading causes of cancer-related deaths in the Western world, but our understanding of this disease is incomplete. The recent advent of new technologies has provided novel insights into the pathogenesis of CRC.
Genome-wide association studies have recently linked CRC to 10 common genetic variants or single-nucleotide polymorphisms that map to chromosomes 8q23, 8q24, 10p14, 11q23, 14q22, 15q13, 16q22, 18q21, 19q13 and 20p12. However, the causal significance of these variants is not understood, and some are located in poorly characterized genomic regions or gene deserts. Recent studies indicate that the single-nucleotide polymorphism rs6983267, which maps to 8q24, serves as an enhancer of MYC expression by binding T cell factor 4 (TCF4) and influencing Wnt signaling. In addition, several microRNAs interact with genes such as K-RAS, APC, p53, PTEN, TCF4, COX-2, DNMT3a and DNMT3b. Germline hypermethylation of the DNA mismatch repair genes MLH1 and MSH2 may serve as predisposing events in some CRC patients.
Recent studies have elucidated novel mechanisms involved in CRC, including the involvement of single-nucleotide polymorphisms not located within traditional genes, the role of microRNAs and epimutations in DNA mismatch repair genes. Interestingly, most of this progress has been made by understanding DNA that does not encode genes.
Colorectal cancer (CRC) is a common and deadly disease. The availability of tissues and cultured cells for analysis has provided opportunities for the elucidation of the cellular, genetic and molecular mechanisms involved in tumor initiation, progression and metastasis. CRCs arise through a multistep carcinogenic process in which genetic and epigenetic alterations accumulate sequentially. Although the genetic alterations occur in a stochastic fashion in individual cells, the defects accumulate in a nonrandom fashion in tumors because of growth or survival advantages conferred by rare mutations in the selected clones. Therefore, a key mechanistic component of initiation and progression in CRC is the occurrence of genomic and epigenetic instability, which increases the rate at which such alterations accumulate and thereby permits the adaptations characteristic of malignant tumors.
CRC is a disease that is largely influenced by lifestyle and dietary factors, and studies in recent years have begun to recognize the importance of single-nucleotide polymorphisms (SNPs) in genes involved in xenobiotic metabolism, which might account for CRC risk in the context of certain environmental exposures. Until recently, it was impractical to explore the millions of SNPs in the human genome using candidate gene approaches. However, recent advances in technology permit genome-wide screening, and several candidate CRC susceptibility loci have been identified. In addition, microRNAs (miRNAs) have emerged as important factors in human cancer, as they regulate the expression of approximately 30% of human genes. Identification of gene targets of miRNAs has been a daunting task; however, recent studies indicate that many miRNAs target key growth regulatory genes. Finally, germline hypermethylation of DNA mismatch repair (MMR) genes can be a factor in some proportion of patients who appear to have Lynch syndrome but do not have germline mutations in the suspected gene. In this review, we examine new concepts uncovered in the past 2 years that are relevant to the pathogenesis of CRC, and briefly discuss the opportunities and challenges that lie ahead.
Genome-wide association studies (GWAS) provide a powerful approach for high-throughput identification of common, low penetrance alleles that can modify the risk for multiple diseases including CRC. These minor but common variations in coding or noncoding DNA sequences are referred to as SNPs. At one end of the disease spectrum, there are syndromic familial CRC diseases that are caused by rare mutations in high-penetrance genes such as adenomatous polyposis coli (APC) (causing familial adenomatous polyposis) and the DNA MMR genes that cause Lynch syndrome. However, these mutations explain only about 3–4% of CRC. The human genome project and extensive linkage analysis suggest that the principal genes causing these high-penetrance diseases have essentially been identified. Other familial clusters of CRC are currently thought to arise through the influence of common, low-penetrance genetic factors.
GWAS technology allows linkage analysis of hundreds of thousands of SNPs simultaneously, making it possible to determine linkage using sets of tagged SNPs that correspond to common variants in a genome. This permits determination of disease associations without a priori knowledge of the location or function of the DNA sequence. These data have allowed an unprecedented opportunity to better understand the role of common genetic variants in the cause of cancer and other diseases. Although the GWAS concept has existed for years, it was not until late 2007 that the first common low-penetrance susceptibility variant was associated with the risk of CRC. Since then, 10 variants have been linked to CRC and replicated in multiple studies through genotype analysis of tens of thousands of individuals, as summarized in Table 1 [4–17]. The initial data indicate that these variants exert relatively minor effects on cancer risk by themselves; however, combinations of multiple variants correlated with environmental exposures offer a promising possibility to develop robust predictive models for CRC risk stratification.
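The idea of a tagged SNP rests on linkage disequilibrium: a genotyped marker can stand in for a nearby variant when the two are strongly correlated. The following is a minimal sketch of the standard r² statistic for two biallelic SNPs, not a description of the pipeline used in the cited studies. The function name and the frequencies in the usage note are hypothetical and chosen only for illustration.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Return r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    # D measures how far the observed haplotype frequency departs
    # from what independence of the two loci would predict.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom
```

With these illustrative inputs, `ld_r_squared(0.3, 0.3, 0.3)` describes two alleles that always travel together (r² close to 1), whereas `ld_r_squared(0.09, 0.3, 0.3)` describes independent loci (r² close to 0). A marker in high r² with an untyped variant is what allows association to be detected without prior knowledge of the causal sequence.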
Most of the published GWAS on CRC have been undertaken by British [4–6,8] and Canadian researchers. These studies were performed in two phases. Typically, the first phase used modest sample sizes (~1000 patients and controls), and these identified six novel CRC susceptibility loci mapping to 8q23, 8q24, 10p14, 11q23, 15q13 and 18q21 [6,8,18]. Although statistically significant, these common sequence variants had relatively modest effects on CRC risk [odds ratios (ORs) were no more than ≈1.2]. These initial studies were followed by a meta-analysis and a second phase of studies with tens of thousands of patients and controls, which identified four new risk loci mapping to chromosomes 14q22, 16q22, 19q13 and 20p12; however, these had even smaller effect sizes (ORs ≈1.1). Most of these data have been successfully replicated in multiple independent studies, and susceptibility loci mapping to 8q23, 8q24 [12,13,16], 11q23 [14,15] and 18q21 [12,17] have been validated in different populations.
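To make the effect sizes above concrete, the sketch below computes an allelic odds ratio and an approximate 95% confidence interval (Wald method on the log scale) from a 2×2 table of allele counts. It is an illustration only: the function name is hypothetical, and the counts in the usage note are invented, not taken from any of the cited studies.

```python
import math

def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio for carrying the risk allele, with a Wald confidence interval.

    Arguments are allele counts (two alleles per genotyped individual);
    z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

For example, with 1000 hypothetical cases (1100 risk alleles, 900 other alleles) and 1000 hypothetical controls (1000 of each allele), `allelic_odds_ratio(1100, 900, 1000, 1000)` gives an OR of about 1.22 with an interval that lies above 1. Effects of this size can be statistically significant in samples of a few thousand alleles while remaining modest for any individual carrier.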
Although the discovery of these novel susceptibility loci for CRC generated enthusiasm, some investigators began to question the causal role and biological significance of these variants, as none of these loci was located within or near a coding (exonic) sequence. All loci were found in noncoding introns, some so devoid of possible coding sequences or transcriptional activity that they were referred to as gene deserts. For instance, the variants on chromosome 8q24 that were associated with CRC and other tumors are 330 kb away from the nearest gene.
Moreover, five of the 10 SNPs identified tag linkage disequilibrium blocks that include or are near genes of the transforming growth factor-beta superfamily signaling pathway, including SMAD7, gremlin 1 (GREM1), BMP2, BMP4 and rhophilin-like protein (RHPN2). These data suggest that, although these susceptibility variants individually have a modest effect on CRC risk, they might be associated with large functional effects if a combination of critical variants were present in any one individual.
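One simple way to see how individually weak variants could add up is a log-additive (multiplicative) model, in which each risk allele multiplies the odds by its per-allele OR. The sketch below assumes the loci act independently and without interaction, which is an assumption made here for illustration and not a finding of the cited studies. The function name and the per-allele ORs in the usage note are hypothetical placeholders, not the values from Table 1.

```python
import math

def combined_odds_ratio(per_allele_ors, risk_allele_counts):
    """Combined OR relative to a person carrying no risk alleles.

    per_allele_ors:     per-allele odds ratio at each locus
    risk_allele_counts: number of risk alleles carried at each locus (0, 1 or 2)
    Assumes independent loci and multiplicative effects (no interaction).
    """
    if len(per_allele_ors) != len(risk_allele_counts):
        raise ValueError("one allele count is required per locus")
    # Sum on the log scale, then convert back to an odds ratio.
    log_or = sum(n * math.log(o) for o, n in zip(per_allele_ors, risk_allele_counts))
    return math.exp(log_or)
```

As a usage example, `combined_odds_ratio([1.2] * 6 + [1.1] * 4, [2] * 10)` models a person homozygous for the risk allele at ten loci with illustrative ORs of 1.2 and 1.1. Under these assumptions the combined OR is far larger than any single-locus effect, although such a genotype would be rare and real loci need not combine multiplicatively.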
In a significant development, two independent research groups made seminal discoveries that offer insight into the functional role of one of the three 8q24 variants, rs6983267. First, the haplotype containing the rs6983267 G allele is found in 50% of Europeans and nearly 100% of Africans; so, it is quite common. Homozygosity for the G allele of this SNP increases CRC risk 1.5-fold (a relatively weak effect), but this allele shows relative copy number increase during tumor development [19••,20••]. The novel finding is that this region acts as a transcriptional enhancer and contains a sequence that can enhance Wnt signaling, a key pathway in CRC. Furthermore, regulatory elements of MYC are located within the gene desert on chromosome 8q24 [20••]. This region also preferentially binds the transcription factor T cell factor 7-like 2 (TCF7L2, also known as TCF4), which is a key participant in the Wnt signaling cascade [19••,20••]. These data illustrate the utility of GWAS and shed light on the connection between the SNP on 8q24, activated Wnt signaling, increased MYC expression and CRC. This concept will certainly be exploited in future studies with other SNPs to better understand the pathogenesis of many common diseases.
miRNAs constitute a class of unique, single-stranded, evolutionarily conserved, small (19–25 ribonucleotides), noncoding RNAs that function as posttranscriptional gene regulators. miRNAs have emerged as new molecular players in carcinogenesis, and deregulation of their expression has been linked to multiple human cancers. miRNAs contribute to oncogenesis by functioning either as tumor suppressors (tsmiRs) or tumor promoters (oncomiRs). miRNAs were first discovered in worms, and thus far approximately 550 human miRNAs have been identified. Each miRNA can regulate the expression of several mRNA targets; however, identifying the relationship between miRNAs and their target genes has been a challenge. Much of the current knowledge in this regard comes from in-silico predictions, but given the burgeoning evidence that miRNAs play a critical role in cancer initiation and progression, there is growing interest in identifying the authentic, functional targets of miRNAs in human cancers. In the last few years, several miRNAs have been shown to be up- or downregulated in CRC. In the last year, several publications [23–34,35••,36–41] have highlighted the functional role of miRNAs in CRC, as summarized in Table 2.
miRNA profiling of CRCs has identified several up- and downregulated miRNAs. Of interest, expression of miR-31, miR-183, miR-17-5, miR-18a, miR-20a and miR-92 has been found to be significantly higher in CRC than in normal tissues, whereas miR-143 and miR-145 are expressed at lower levels in CRCs. Findings from this study further revealed that CRCs with overexpression of miR-18a tended to have a poorer prognosis than tumors with lower expression of this miRNA. miR-18a functions as a tumor suppressor miRNA by targeting the K-RAS oncogene. Increased expression of cyclooxygenase-2 (COX-2) is a frequent event in CRC, and data indicate that downregulation of miR-101 is associated with overexpression of COX-2 in human CRC cells.
Interrogation of the functional role of individual miRNAs has demonstrated that miR-135a and miR-135b directly target the 3′-untranslated region of the APC gene, suppress its expression and activate Wnt signaling. Conversely, APC regulates expression of the tsmiR miR-122a and significantly downregulates miR-122a expression in gastrointestinal cell lines and tissues. Inactivation of APC is considered a gatekeeper event for the initiation of CRC. These data reveal a miRNA-mediated mechanism for control of the APC gene and the activation of the Wnt signaling pathway.
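In-silico target prediction of the kind mentioned earlier often starts from seed pairing: nucleotides 2–8 of the miRNA pair with a complementary site in the 3′-untranslated region of the target mRNA. The sketch below shows only this first step, under the assumption of perfect Watson-Crick seed complementarity; real prediction tools add further criteria. The function names and both sequences in the usage note are hypothetical toy examples, not the sequences of miR-135a, miR-135b or APC.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(mirna: str) -> str:
    """Return the mRNA motif (5'->3') that pairs with miRNA nucleotides 2-8."""
    mirna = mirna.upper().replace("T", "U")
    seed = mirna[1:8]  # positions 2-8, counted from the 5' end
    # The mRNA strand is antiparallel, so take the reverse complement.
    return seed.translate(COMPLEMENT)[::-1]

def find_seed_sites(mirna: str, utr: str) -> list:
    """Return 0-based start positions of perfect seed matches in a 3'-UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]
```

For the toy miRNA `"UACGGAUCCAUUGCAAGCUAGC"`, `seed_site` returns `"GAUCCGU"`, and `find_seed_sites` on the toy UTR `"AAAGAUCCGUAAACCCGAUCCGUUU"` returns `[3, 16]`. A candidate site found this way is only a prediction; as the text notes, functional targets must be confirmed experimentally.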
Adenomatous polyps are precursors of most CRCs, and the progression of these lesions to cancer is a multistep process orchestrated through various genetic and epigenetic alterations. Increased expression of miR-21 has been linked to poor outcome and survival in CRC, indicating that it functions as an oncomiR. Moreover, recent reports of higher expression of miR-21 in adenomas and CRCs relative to normal surrounding tissue suggest that abnormal expression of this miRNA represents an early cellular event in the progression of CRC [28,29]. It has also been shown that miR-21 promotes cell migration and invasion by targeting the programmed cell death protein 4 (PDCD4) and phosphatase and tensin homolog (PTEN) tumor suppressor genes.
Both miR-143 and miR-145 are downregulated in CRC, and the loci encoding these miRNAs are both located on 5q23 [31,32]. Downregulation of insulin receptor substrate-1 (IRS-1) plays a significant role in the tumor suppressor activity of miR-145. Upon further exploration of the functional role of these miRNAs, it was recently discovered that the tumor-suppressive role of miR-143 is achieved by targeting the DNA methyltransferase 3a (DNMT3A) gene [42••]. In further support of this, miR-143 was shown to inhibit translation of K-RAS. Given the role of DNMTs and RAS/RAF signaling in the epigenetic regulation of gene expression, these data provide new directions for the possible development of miRNA-based targeted approaches to epigenetic therapy.
The molecular mechanisms responsible for the deregulated expression of miRNAs in human cancer are poorly understood. A group of Japanese investigators [35••] elegantly demonstrated that the tumor suppressor gene p53 enhances the posttranscriptional maturation of several tsmiRs, including miR-16-1, miR-143 and miR-145, revealing a previously unrecognized function of p53 in miRNA processing. Additionally, it has been suggested that expression of tumor-suppressive miRNAs can also be silenced via hypermethylation. In this regard, miR-34b, miR-34c, miR-9-1, miR-129-2 and miR-137, all of which are embedded in CpG islands, have been demonstrated to be targets of hypermethylation in CRC cell lines and tumor tissues [36,37]. miR-9-1 methylation is also associated with the presence of lymph node metastases. Taken together, there is a growing appreciation for the role of miRNAs in CRC and other cancers. The use of miRNAs as biomarkers is a newly emerging field, as is the potential exploitation of miRNAs as therapeutic targets.
The paradigm for hereditary cancer syndromes has been to find a germline mutation in a coding region, splice site or promoter of the gene associated with that disease. Recent data suggest that some cases of the hereditary CRC syndrome, Lynch syndrome, are associated with epigenetic inactivation of the gene caused by promoter methylation. This has been seen in multiple family members, acting like a classic autosomal dominant disease. The theoretical problem is that epigenetic alterations (such as methylation) are thought to be ‘erased’ early in embryogenesis, which conflicts with the observation of vertical passage of the trait through a family. A novel explanation has been found for this.
Lynch syndrome is an autosomal dominant cancer syndrome characterized by early-onset CRCs and a variety of extracolonic tumors. It is caused by germline mutations in DNA MMR genes, most often MutL homolog 1 (MLH1) and MutS homolog 2 (MSH2), and less frequently MutS homolog 6 (MSH6) and postmeiotic segregation increased 2 (PMS2). The MLH1 gene can also be inactivated by methylation in sporadic CRCs, leading to a tumor that mimics Lynch syndrome, but this is not inherited. This acquired situation is strongly associated with the CpG island methylator phenotype.
In 2002, it was reported that MLH1 can be methylated in the peripheral blood as well as the tumor tissues of some CRC patients, and subsequently, about 20 CRC patients have been reported with monoallelic MLH1 methylation in the tumor tissue and in DNA isolated from lymphocytes and other tissues [44–46]. However, it has been controversial whether these ‘soma-wide’ epimutations can be inherited, and the conventional wisdom is that, in contrast to genetic mutations, MLH1 epimutations are reversible between generations and are not inherited in a Mendelian fashion [45,46].
Large genomic alterations of MSH2 are a frequent mechanism for its inactivation in Lynch syndrome patients because this gene is embedded in an archipelago of Alu sequences, which predisposes to internal homologous recombination and excisional deletion. Evidence for germline methylation has also been reported for MSH2 in a few ‘Lynch syndrome’ families [47,48••,49]. These families had multiple affected members with features of Lynch syndrome and loss of the MSH2 and MSH6 proteins, but lacked the germline mutations in either of these genes that are required for that diagnosis [47,49]. A Finnish study recently demonstrated simultaneous large genomic deletions in MSH2 and germline epimutations in MLH1 in some proportion of mutation-negative suspected Lynch syndrome families. Interestingly, in contrast to MLH1, epimutations in MSH2 were documented to be stably inherited in multiple individuals across three generations, providing compelling evidence for Mendelian transmission of this epimutation.
In a breakthrough study, a novel molecular explanation for germline MSH2 methylation was identified. In this report, germline deletions at the 3′-end of the epithelial cell adhesion molecule (EpCAM) gene (formerly known as tumor-associated calcium signal transducer 1 or TACSTD1), which is located immediately upstream of MSH2 and expressed in the same direction, were identified in the epimutation carriers [48••]. The deletions in EpCAM included the termination signal, which abolished transcriptional termination of EpCAM and resulted in transcriptional read-through into MSH2. These fusion transcripts were significantly overexpressed in the epithelial tissues and correlated with extensive MSH2 methylation in these patients [48••,51].
Our current understanding of the contribution of epimutations in MMR genes in CRC is in its infancy; however, these recent discoveries are provocative as identification of such epimutations has important implications for surveillance recommendations in affected families.
A number of common genetic variations (SNPs) associated with CRC risk play a critical role in the cause of CRC by modifying the expression of target genes that regulate cell behaviors. Our genome encodes hundreds of miRNAs, some of which play key roles in human cancer by regulating the expression of a cascade of other genes. The roles of SNPs and miRNAs represent examples of how parts of the genome once assumed to be ‘junk’ have been shown to play an important role in cancer. Similarly, the importance of the epigenetic silencing of genes has been recognized in CRC, and this has been linked to a deletion that occurred outside of the gene mediating the disease. Curiously, recent progress has occurred by finding variations or aberrations outside of the coding regions of the genes that play a more proximate role in regulating cellular behavior. Much of the recent progress has occurred by looking in some of the more unlikely places in the genome.
This work was supported by NIH grants #CA72851 and #CA129286 and funds from the Baylor Research Institute.
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest
Additional references related to this topic can also be found in the Current World Literature section in this issue (pp. 79–80).