In this genome-wide association study of early-onset MSS/MSI-low CRC, we identified no selected personal or lifestyle characteristic that significantly modified the effect of genetic variants on the risk of CRC at a strict genome-wide significance level of less than 6.5 × 10⁻⁸, whether using an exhaustive case-control or case-only test or the appropriate significance levels for the two-step method of Murcray et al. (8). We identified seven significant interactions with previously identified hits from published GWAS of CRC. Interestingly, one of these interactions was between rs3802842 and post-menopausal hormone use; rs3802842 has previously been reported to be associated with an increased risk of CRC among females with Lynch syndrome (18). However, none of these seven interactions was statistically significant at the 5% level in an independent replication sample.
Little of the genetic variation in CRC has been explained, and it is likely that many more variants remain to be identified. One potential way to identify additional susceptibility alleles is to search for GxE interactions, and thereby identify genetic variants that may have an effect only in a given subgroup of individuals defined by a common environmental risk factor or molecular profile. We applied an efficient two-step approach described by Murcray et al. for detecting loci involved in GxE interactions. This approach is performed independently of any initial scan for main effects and incorporates a preliminary screening step constructed to use all available information efficiently (8). Other methods have been proposed, such as a 2-df test for assessing genetic main effects and interactions jointly (19) and approaches designed to combine the case-control and case-only analyses (20, 21), but there has been no formal comparison of these methods.
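To make the tests discussed above concrete, the following is a minimal sketch, not the analysis code used in this study. It assumes a binary exposure and a binary (carrier versus non-carrier) genotype, so that each test reduces to Wald tests on 2×2 tables of counts. Under those assumptions, the screening step tests G-E association in the combined case-control sample, and the second step tests the interaction only for SNPs that pass the screen, with a Bonferroni correction over the number that passed. All function names, the count layout `(n11, n10, n01, n00)`, and the thresholds `alpha1` and `alpha` are illustrative choices rather than values taken from the study.

```python
import math

def log_or(table):
    """Log odds ratio and its variance for counts (n11, n10, n01, n00).

    First index is genotype (carrier = 1), second is exposure (exposed = 1).
    0.5 is added to every cell to avoid division by zero (an assumption
    of this sketch, not something specified in the text).
    """
    a, b, c, d = (x + 0.5 for x in table)
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def p_value(estimate, variance):
    """Two-sided Wald p-value against a standard normal reference."""
    return math.erfc(abs(estimate) / math.sqrt(2 * variance))

def screen_p(cases, controls):
    """Step 1: G-E association in the combined case-control sample."""
    combined = tuple(x + y for x, y in zip(cases, controls))
    return p_value(*log_or(combined))

def case_control_p(cases, controls):
    """Interaction test: difference in G-E log odds ratios, cases vs controls."""
    lor1, v1 = log_or(cases)
    lor0, v0 = log_or(controls)
    return p_value(lor1 - lor0, v1 + v0)

def case_only_p(cases):
    """Case-only test: valid only if G and E are independent in the population."""
    return p_value(*log_or(cases))

def two_step(snps, alpha1=0.05, alpha=0.05):
    """Screen all SNPs, then test the survivors at alpha / (number passed)."""
    passed = [name for name, (cases, controls) in snps.items()
              if screen_p(cases, controls) < alpha1]
    if not passed:
        return passed, []
    threshold = alpha / len(passed)
    hits = [name for name in passed
            if case_control_p(*snps[name]) < threshold]
    return passed, hits

# Hypothetical counts for two SNPs: (cases, controls), each (n11, n10, n01, n00).
example = {
    "snpA": ((70, 30, 30, 70), (50, 50, 50, 50)),  # G-E association in cases only
    "snpB": ((50, 50, 50, 50), (50, 50, 50, 50)),  # no association anywhere
}
passed, hits = two_step(example)
```

The gain in this sketch comes from the reduced multiple-testing burden in the second step: only SNPs surviving the screen are counted in the Bonferroni correction.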
Achieving sufficient statistical power is challenging in a genome-wide context, even with these recently described methodologies. Our power calculations highlight this point, especially where the expected gene, exposure, and interaction effects are modest. They give the sample size required to attain 80% power with the two-step approach for various combinations of minor allele frequencies, exposure prevalences, and interaction odds ratios. In these calculations, we assumed that there were no SNP main effects, corresponding to the scenario in which a GxE scan could detect a SNP that a standard GWAS based on SNP main effects would not. We found that a typical GWAS of 1,000 cases and 1,000 controls could detect interaction odds ratios of 2 or higher, given highly prevalent exposures and high allele frequencies. There are likely to be many GxE interactions, but our study is underpowered to detect them. International consortia gathering GWAS data in CRC may aid in this effort if environmental covariates are available and there is potential for harmonization of variable definitions. However, even this increased sample size will not suffice to detect interaction odds ratios below 1.4, especially for less frequent exposures and lower allele frequencies.
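The dependence of power on allele frequency, exposure prevalence, and interaction odds ratio can be illustrated with a rough analytic approximation. The sketch below is not the power calculation used in this study: it evaluates a single-step exhaustive case-control Wald test rather than the two-step approach, and so will generally require larger samples than the two-step figures quoted above. It further assumes a binary exposure, dominant (carrier) genotype coding, G-E independence in the population, a rare disease (so that controls follow the population distribution), and no SNP main effect. The function names and default values are illustrative.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()

def carrier_frequency(maf):
    """Carrier frequency under Hardy-Weinberg proportions, dominant coding."""
    return 1.0 - (1.0 - maf) ** 2

def interaction_power(n_cases, n_controls, maf, exposure_prev, or_ge,
                      alpha=6.5e-8, or_e=1.0):
    """Approximate power of a two-sided Wald test for a GxE interaction."""
    pg = carrier_frequency(maf)
    cells = [(g, e) for g in (0, 1) for e in (0, 1)]
    # Population (and, by the rare-disease assumption, control) cell probabilities.
    pop = {(g, e): (pg if g else 1 - pg) *
                   (exposure_prev if e else 1 - exposure_prev)
           for g, e in cells}
    # Relative risk per cell: exposure effect and interaction, no SNP main effect.
    rel = {(g, e): (or_e if e else 1.0) * (or_ge if g and e else 1.0)
           for g, e in cells}
    total = sum(pop[k] * rel[k] for k in cells)
    case = {k: pop[k] * rel[k] / total for k in cells}
    # Variance of the interaction log odds ratio: sum of 1 / expected count.
    var = sum(1 / (n_cases * case[k]) + 1 / (n_controls * pop[k])
              for k in cells)
    lam = abs(math.log(or_ge)) / math.sqrt(var)
    z = -_NORMAL.inv_cdf(alpha / 2)
    return _NORMAL.cdf(lam - z) + _NORMAL.cdf(-lam - z)

def required_per_group(maf, exposure_prev, or_ge, target=0.8,
                       alpha=6.5e-8, step=100, cap=2_000_000):
    """Smallest equal number of cases and controls (in steps) reaching target power."""
    for n in range(step, cap + 1, step):
        if interaction_power(n, n, maf, exposure_prev, or_ge, alpha) >= target:
            return n
    return None  # target not reached within the cap

n_needed = required_per_group(maf=0.3, exposure_prev=0.5, or_ge=2.0)
```

Varying `maf`, `exposure_prev`, and `or_ge` in `required_per_group` shows how quickly the required sample size grows as exposures become rarer, alleles less frequent, or interaction effects weaker.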
We also investigated whether any of the recently reported and robustly replicated susceptibility loci identified through GWA studies of CRC were modulated by selected environmental factors. We considered only replicated susceptibility variants from published GWA studies of independent CRC cases and unaffected controls (1–6). We identified a few interactions that were significant at the 5% level, but none of these was significant in an independent case-control study of CRC that had collected epidemiologic data using the same questionnaires from individuals in one of the same geographical regions. One potential reason for our failure to replicate is that we were unable to restrict our replication sample to cases with early-onset MSS or MSI-low cancers. Common environmental exposures, such as alcohol intake, cigarette smoking, and obesity, have been reported to differ by MSI strata (22, 23). Furthermore, for four known susceptibility alleles we found no association with colorectal cancer in the Colon CFR, and in the absence of a main effect the prospects of identifying a GxE interaction may be lower.
There are some limitations to this study. The main concern is the limited statistical power to investigate GxE interactions for less common exposures and less frequent alleles. Collaborative consortia offer the important advantage of increased sample size; however, they also have important limitations, including the potential introduction of heterogeneity from combining different study designs, exposure measures, and cancer outcomes. Consortia with central quality control procedures and careful standardization and harmonization of definitions and measurements may be helpful. However, a large sample size alone does not guarantee high-quality, reliable results (24). In this study, we had uniform data collection protocols and all cases were defined in a standard manner as MSS or MSI-low. Another potential limitation is our relatively crude definitions of the environmental factors. Furthermore, because of the study design, we were unable to investigate the potential effects of ethnicity, family history of CRC, or other phenotypes of CRC (i.e., MSI-high). Lastly, there is no consensus about the correct statistical method to model gene-environment interactions, and more research is required.
In summary, we identified no genome-wide significant GxE interactions in this genome-wide association study of early-onset MSS/MSI-low CRC. Much of the evidence from descriptive epidemiology, migrant studies, and changes in CRC rates in countries undergoing rapid economic development (most obviously Japan in the second half of the twentieth century; Japan now has the highest rates of CRC in the world) points to environmental risk factors as the major determinants of the international variation in CRC. It is crucial therefore that we gain a better understanding of susceptibility to these environmental factors. This, in turn, underscores the need to detect GxE interactions, which will require large collaborations of GWA studies with adequate data collection on exposures.