Genome-wide association studies (GWAS) provide a powerful new approach to identify common, low-penetrance susceptibility loci without prior knowledge of biological function. Results from three GWAS conducted in populations of European ancestry are available for colorectal cancer (CRC). These studies have identified eleven disease loci, most of which were not previously suspected to be related to CRC. The proportions of the familial and population risks explained by these loci are small, and they are not currently useful for risk prediction. However, the power of these studies was low, indicating that a number of other loci may be identified in new ongoing GWAS and in pooled analyses. Thus, the risk prediction ability of susceptibility markers identified in GWAS for CRC may improve as more variants are discovered. This may in turn have important implications for targeting high-risk individuals for colonoscopy screening.
Colorectal cancer (CRC) is known to aggregate in families, with the disease being two to three times more common among the first-degree relatives of cases than among those of population controls. The contribution of inherited factors (mainly genetic) to the etiology of the disease has been estimated in twin studies to be 35% (1). However, for the most part, the underlying susceptibility genes for CRC remain unknown. In recent decades, linkage studies used collections of multi-case families to identify a number of rare mutations in highly penetrant genes that cause well-characterized Mendelian syndromes (e.g., HNPCC, FAP, juvenile polyposis, Peutz-Jeghers syndrome) (2). However, these mutations explain only 2–6% of CRCs and only a small fraction of the familial risk. Thus, it is likely that additional susceptibility genes exist for CRC.
In recent years, linkage studies have failed to discover additional high-penetrance genes, suggesting that multiple low-penetrance alleles may explain the remaining genetic risk for CRC. Indeed, association studies, in which the frequencies of genetic variants are directly compared between large series of patients and unrelated controls, are now thought to be more appropriate than linkage studies for the identification of susceptibility loci for complex diseases, including CRC (3).
It is estimated that there are ~10 million single-nucleotide polymorphisms (SNPs) in the human genome, half of which have a minor allele frequency (MAF) over 10% (4). These genetic variants and other types of polymorphisms (insertion/deletion, copy number variations) are expected to explain approximately 90% of human heterozygosity, including susceptibility to disease (4). Variants that were deleterious during evolution (such as mutations that cause early-onset diseases) are typically rare, due to natural selection. Conversely, disease variants that act after reproduction, or that are pleiotropic in effect, may have been neutral or subject to balancing selection (e.g., sickle cell anemia and malaria). In such cases, most of the genetic variation underlying disease risk may be common.
Detailed studies of the variation in the human genome across individuals found sizeable regions, or “linkage disequilibrium (LD) blocks”, over which little evidence for past recombination was observed, and within which more than 90% of all chromosomes matched to only one of a few common haplotypes (5). These studies showed that nearly all of the common diversity at a given locus could be captured by genotyping a small subset of common markers.
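As a rough illustration of how LD between two markers is commonly quantified, the sketch below computes the squared correlation (r²) between two biallelic SNPs from a haplotype frequency and the two allele frequencies. The function name and the example frequencies are hypothetical and are not taken from the cited studies; a pair of markers with r² close to 1 carries nearly redundant information, which is why one of them can serve as a "tag" for the other.

```python
def ld_r2(hap_freq_ab, freq_a, freq_b):
    """Squared correlation r^2 between two biallelic SNPs.

    hap_freq_ab: frequency of the haplotype carrying allele A at SNP 1
                 and allele B at SNP 2 (hypothetical input).
    freq_a, freq_b: frequencies of allele A and allele B.
    """
    # D measures the departure of the haplotype frequency from independence.
    d = hap_freq_ab - freq_a * freq_b
    denom = freq_a * (1 - freq_a) * freq_b * (1 - freq_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Illustrative values only: allele A always travels with allele B.
r2_perfect = ld_r2(hap_freq_ab=0.3, freq_a=0.3, freq_b=0.3)
# Illustrative values only: the two SNPs are statistically independent.
r2_none = ld_r2(hap_freq_ab=0.06, freq_a=0.2, freq_b=0.3)
```

Under these assumed inputs, the first pair is in complete LD (genotyping one SNP captures the other), whereas the second pair would each need to be genotyped separately.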
Over the past seven years, international efforts have resulted in a public reference human genome diversity database, a Haplotype Map of the Human Genome (HapMap), which has identified and validated over 3 million SNPs (6). These resources, as well as the development of high-throughput microarray platforms for the simultaneous genotyping of hundreds of thousands of SNPs, now allow the testing of a high proportion of all common SNPs (with frequency >5%) for association with disease in studies called “genome-wide association studies” (GWAS). These studies allow the scanning of the entire genome for association with disease without prior knowledge of biological function and, thus, have the potential to reveal unsuspected regions and novel biological mechanisms. However, these studies require large sample sizes to account for the inflated Type-I error resulting from the very high number of case-control comparisons and to detect effect sizes that are expected to be small.
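The multiple-testing burden can be made concrete with a simple Bonferroni calculation. This is a minimal sketch: the figure of 500,000 SNPs and the family-wise error rate of 0.05 are illustrative assumptions, and Bonferroni correction is only one common choice, not necessarily the method used in the studies reviewed here.

```python
def bonferroni_threshold(family_alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise
    Type-I error at or below family_alpha across n_tests comparisons."""
    return family_alpha / n_tests


def expected_false_positives(per_test_alpha, n_tests):
    """Expected number of false positives if every one of n_tests null
    hypotheses is true and each is tested at per_test_alpha."""
    return per_test_alpha * n_tests


# Assumed values for illustration only.
N_SNPS = 500_000
FAMILY_ALPHA = 0.05

per_snp_threshold = bonferroni_threshold(FAMILY_ALPHA, N_SNPS)
uncorrected_false_hits = expected_false_positives(FAMILY_ALPHA, N_SNPS)
```

Under these assumptions, testing each SNP at the conventional 0.05 level would be expected to yield tens of thousands of chance associations, whereas the corrected per-SNP threshold is on the order of 10⁻⁷, which is one reason large sample sizes are required.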
Over the last two years, results from GWAS have been published for CRC. These studies have all been case-control studies conducted in populations of European ancestry and used multistage designs (7–13). Table 1 summarizes the published findings from these studies as of January 2009. The risks conferred by each risk allele have uniformly been low, with odds ratios in the range of 1.1–1.3 per allele (7–13).
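To give a sense of what a per-allele odds ratio in this range implies, the sketch below converts it into genotype-specific odds ratios. It assumes a log-additive (multiplicative) model, in which each additional copy of the risk allele multiplies the odds by the same factor; this model and the function name are assumptions made for illustration and are not stated in the text above.

```python
def genotype_odds_ratios(per_allele_or):
    """Odds ratios for carriers of 0, 1 and 2 risk alleles, relative to
    non-carriers, under an assumed log-additive (multiplicative) model."""
    if per_allele_or <= 0:
        raise ValueError("an odds ratio must be positive")
    return (1.0, per_allele_or, per_allele_or ** 2)


# Lower and upper ends of the per-allele range quoted in the text.
low_end = genotype_odds_ratios(1.1)
high_end = genotype_odds_ratios(1.3)
```

Even at the upper end of the reported range, carriers of two risk alleles would have well under a doubling of the odds of disease under this model, which is consistent with the modest risk conferred by any single locus.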
The first susceptibility locus for CRC identified in these studies was 8q24. This genomic region first emerged for prostate cancer, through a linkage study followed up by an association study and, independently, through an admixture scan in African Americans (14,15). At least three different susceptibility loci were identified for prostate cancer in 8q24 (16). One of these loci (128.1–128.7 Mb) was found to be associated with CRC in two GWAS (7,8) and, independently, in a case-control study nested in the Multiethnic Cohort study (17). In this region, subsequently also associated with ovarian cancer, there are no known genes or annotated coding transcripts, with the exception of a pseudogene (POU5F1P1). However, approximately 300 kb telomeric to this region is the c-MYC (MYC) oncogene. Replication, sequencing and fine-mapping studies of this locus have identified rs6983267 as the most promising variant for functional assessment (18). This SNP lies in a sequence that is highly conserved across vertebrates and is predicted to have regulatory function (18). Although MYC is often amplified in colon and prostate cancers, rs6983267 has not been found to modify the expression of this gene in colon tumors and lymphoblastoid cell lines. Thus, the mechanism underlying the association of this SNP with CRC and several other common cancers remains unknown. Its relative proximity to MYC nevertheless makes it plausible that it disrupts one of the gene's putative distant enhancers, an effect that may not be observable in tumors.
A locus at 9p24, another region with no obvious candidate gene, was found to be associated with CRC in the original ARCTIC report (7) and was replicated in the Colorectal Cancer Family Registry (19). However, since this association was not observed in some of the ARCTIC replication populations, it may not exist in all populations.
A number of the subsequently reported loci fall within or close to a gene (18q21: SMAD7; 15q13.3: CRAC1; 8q23.3: EIF3H; 14q22.2: BMP4; 16q22.1: CDH1; and 19q13.1: RHPN2). SMAD7 is known to act as an intracellular antagonist of TGF-β signaling, and perturbation of its expression had been shown to affect CRC progression (9). EIF3H regulates cell growth and viability (11). CRAC1 had already been linked to hereditary mixed polyposis syndrome and CRC in Ashkenazi Jews (10). However, the other associated loci (10p14, 11q23.1, 18q23, 20p12.3), like 8q24 and 9p24, lie in intergenic regions with no known biological relevance. Thus, a large amount of work is needed to understand the biological mechanisms underlying these associations.
However, before functional studies can be initiated, re-sequencing and fine-mapping efforts are needed to identify the best candidate causal variants at these eleven newly identified loci. Moreover, very little information is available on whether these associations generalize to ethnic/racial groups other than whites. The only exception is rs6983267 at 8q24, which has been shown to be consistently associated with CRC among the five ethnic/racial populations in the Multiethnic Cohort (Japanese Americans, Native Hawaiians, African Americans, Whites and Latinos) (17) and to be the best candidate variant in the region (18). Tenesa et al. (12) have also suggested that rs3802842 at 11q23 may not be associated with CRC in Japanese. Fine-mapping studies in populations with different LD structures are potentially very useful to identify the true causal variant at a particular locus, as well as novel ethnic/racial-specific risk alleles.
Only limited data are available on the epidemiological characteristics of these associations. Rs3802842 at 11q23 and rs4939827 (SMAD7) have been reported to be more strongly associated with rectal cancer than with colon cancer (12). No differences in risk have been reported by tumor molecular subtype for the eleven published variants, with the exception of rs4444235 (BMP4), for which the association was found to be significantly stronger for mismatch repair (MMR)-proficient tumors than for MMR-deficient tumors (13). The largest analysis conducted to date, a pooled analysis of the two UK studies, has suggested that each of the risk alleles identified so far has an independent effect and that, as a group, they explain only a small proportion of cases in the population (13). However, even in this pooled analysis, power to detect associations with SNPs having a MAF <0.3 was limited. This suggests that additional, somewhat less common, susceptibility variants exist and points to the need for a pooled analysis with the ongoing North American GWAS and for additional GWAS.
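The dependence of power on allele frequency can be illustrated with a normal-approximation power calculation for a simple allelic case-control test. This is a sketch under stated assumptions only: the sample sizes, the significance threshold, the choice of test and the function names are hypothetical and do not reproduce the power calculations of the published studies.

```python
from statistics import NormalDist


def case_allele_freq(control_maf, odds_ratio):
    """Risk-allele frequency in cases implied by the control frequency
    and a per-allele odds ratio."""
    odds = odds_ratio * control_maf / (1 - control_maf)
    return odds / (1 + odds)


def allelic_test_power(control_maf, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided test comparing allele frequencies
    between cases and controls (each person contributes two alleles).
    The negligible contribution of the opposite tail is ignored."""
    p0 = control_maf
    p1 = case_allele_freq(p0, odds_ratio)
    # Standard error of the difference in allele frequencies.
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p0) / se - z_crit)


# Hypothetical design: 5,000 cases, 5,000 controls, threshold of 1e-7,
# and a per-allele odds ratio of 1.1.
power_rarer = allelic_test_power(0.10, 1.1, 5000, 5000, 1e-7)
power_common = allelic_test_power(0.40, 1.1, 5000, 5000, 1e-7)
```

For a fixed odds ratio and sample size, the approximate power is lower for the less common allele than for the more common one, and it rises as cases and controls are added, which is the rationale for pooling studies.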
The potential modifying effects of the newly identified susceptibility variants also need to be investigated. It is very clear from migrant and temporal trend studies that the etiology of CRC has a very strong environmental component (20). Thus, large cohort studies, in which lifestyle risk factors for CRC were assessed before diagnosis, are being used to investigate gene-environment (GxE) interactions with the risk alleles identified in GWAS. Pooled analyses of published and existing GWAS may also provide adequate power to detect novel modifying genes in investigations of GxG interactions in the primary data (21). Finally, populations that are especially susceptible to the effect of a Western lifestyle on CRC risk, such as the Japanese, may provide a particularly suitable population for identifying GxE interactions (20,22).
GWAS provide an efficient new approach to identify common, low-penetrance susceptibility loci without prior knowledge of biological function. Results from GWAS conducted in populations of European ancestry living in the UK and Canada have been published for CRC. These studies have identified eleven well-replicated disease loci, most of which were not previously suspected to be related to CRC. “Post-GWAS” studies are being initiated to: 1) characterize the epidemiology of these associations across populations, tumor molecular subtypes and clinically relevant variables; 2) explore gene-environment interactions to detect modifying effects that may explain a greater proportion of the population risk; and 3) identify the best candidate causal variants for subsequent functional studies aimed at elucidating the underlying biological mechanisms. The proportions of the familial and population risks explained by the published loci are small, and they are not currently useful for risk prediction. However, the power of the published studies was low, indicating that a number of other loci may be found in additional ongoing GWAS, especially as the result of pooled analyses of all the combined primary data. Thus, there is potential for the risk prediction ability of susceptibility markers identified in GWAS to improve as more variants are found. This may in turn have important implications for targeting high-risk individuals for colonoscopy screening.