Our study presents one of the first large-scale explorations of gene-gene interactions in the setting of a multi-stage GWAS. Our analysis identifies a list () of pair-wise SNP interactions that, through follow-up study, may elucidate the functional relevance of CaP susceptibility SNPs. It also provides insights into the methodological challenges that future large-scale studies will face in establishing conclusive interaction with either the primary or a secondary trait, and it presents an empirical evaluation of modern methods for interaction analyses using case-control data.
The results we report for the conditional scans were obtained using a recently proposed EB method. We focused on that method because previous simulations demonstrated it is more powerful than standard logistic regression (25) and our empirical evaluation suggests it is more robust than case-only type methods. Our observation that the CML method can suffer bias even when scan and conditioning SNPs are on different chromosomes is particularly cautionary. The robustness of the EB method is expected to be similarly beneficial in studies of gene-environment interactions for which it can be difficult to assess an independence assumption.
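The trade-off between case-only efficiency and case-control robustness can be illustrated with a minimal sketch of an empirical-Bayes-type shrinkage on a 2x2x2 table of carrier counts. This is an illustration under stated assumptions, not necessarily the exact estimator used in our scans: it assumes binary carrier coding for both SNPs, a 0.5 continuity correction, and a shrinkage weight based on the squared SNP-SNP log odds ratio in controls. The function names and the example counts are hypothetical.

```python
import math

def log_or(n11, n10, n01, n00):
    """Log odds ratio and its approximate variance for a 2x2 table
    (0.5 continuity correction; an assumption of this sketch)."""
    a, b, c, d = (x + 0.5 for x in (n11, n10, n01, n00))
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def interaction_estimates(cases, controls):
    """cases, controls: (n11, n10, n01, n00) counts by carrier status
    of SNP1 x SNP2. Returns interaction log odds ratio estimates."""
    lor_cases, var_cases = log_or(*cases)
    theta, var_controls = log_or(*controls)  # SNP-SNP association in controls
    beta_co = lor_cases                      # case-only: assumes theta = 0
    beta_cc = lor_cases - theta              # case-control: no such assumption
    var_cc = var_cases + var_controls
    # Shrink toward the case-only estimate when the controls show little
    # evidence against independence (theta near 0 relative to var_cc).
    w = var_cc / (theta ** 2 + var_cc)
    beta_eb = w * beta_co + (1 - w) * beta_cc
    return {"case_only": beta_co, "case_control": beta_cc,
            "eb": beta_eb, "weight_case_only": w}

# Hypothetical counts: the two SNPs are nearly independent in controls.
print(interaction_estimates((120, 380, 280, 1220), (100, 400, 300, 1200)))
```

When the two SNPs are nearly independent in controls, the weight on the case-only estimate approaches 1 and the estimator borrows its efficiency. When the controls show association, the weight falls toward 0 and the estimator reverts to the unbiased case-control contrast.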
Perhaps the most noteworthy result of our study is the top SNP for interaction with 8q24 Region 3 in Stage II: rs4953347, which is intronic to EPAS1. That gene belongs to a family of hypoxia-inducible factors that promote key carcinogenic processes such as angiogenesis and metastasis (31). Under hypoxic conditions, which are common in malignant tumors, EPAS1 directly binds and activates POU5F1. By activating POU5F1, EPAS1 has been shown to promote tumorigenesis (34). Both EPAS1 and POU5F1B are over-expressed in CaP, but POU5F1 is not expressed in either healthy or malignant prostate tissue (13). Given these data, we propose that POU5F1B mediates the observed EPAS1-8q24 Region 3 interaction. Our hypothesis involves an assumption that EPAS1 participates in the regulation of POU5F1B, which is currently poorly understood.
Kastler et al. suggested the over-expression of POU5F1B in CaP may mimic the ectopic expression of POU5F1, which has been shown to promote epithelial tumors (35). That hypothesis aligns well with reports that CaP progression involves the reactivation of embryonic pathways (36) because POU5F1 is central to the regulation of stem cell pluripotency (17) and its encoded transcription factor is functionally similar to the protein of POU5F1B. Specifically, multipotential progenitor cells are thought to be seeds of tumorigenesis in CaP (37) and ectopic POU5F1 expression is thought to promote epithelial tumorigenesis by inhibiting the differentiation of progenitor cells (35). Given these data, we cautiously hypothesize that the CaP association of 8q24 Region 3 involves a type of pluripotency network centered on POU5F1B rather than on POU5F1. Our follow-up analysis of 8q24 Region 3 with ESRRB offers preliminary support of the hypothesis because those genes, which function as both regulators and targets of POU5F1, demonstrate a significant interaction with 8q24 Region 3.
The preceding results should be interpreted cautiously. Future effort is needed to replicate the finding of statistical interaction for EPAS1 and 8q24 Region 3 in independent studies. To obtain conclusive evidence, the sample size for those studies needs to be large due to the modest magnitude of the anticipated interaction. Even if our findings replicate, they cannot provide direct evidence for the proposed model underlying the interaction. Additional functional studies would be needed. We believe these future studies should consider not only CaP but all epithelial cancers associated with 8q24 Region 3 because, in subsets of those cancers, EPAS1 is over-expressed (31), mRNA transcripts of POU5F1B have been detected (15), and embryonic pathways are implicated (38). Notably, all those features characterize colon cancer.
The Stage II analysis produced other notable results, including two for the conditioning region KLK2-KLK3. The region-wide significant result for rs17714461 is noteworthy because its nearby gene, KLK4, has been shown to stimulate cellular proliferation in CaP in conjunction with KLK2, and additional reports have linked KLK4 to various aspects of CaP progression, including mesenchymal transition, invasion and metastasis (40). The top SNP for interaction in the KLK2-KLK3 conditional scan was rs1558875, a SNP intronic to PRRX2, which is also associated with cellular proliferation (43). This interaction replicated in our relatively small, independent CGEMS Stage I analysis. These preliminary results suggest that the KLK2-KLK3 susceptibility region contributes to CaP risk through interaction with genes involved in cellular proliferation. Functional follow-up studies and additional replication efforts are warranted.
A second notable finding from our investigation of potential “cis” interactions involved rs4314620 for the extended 8q24 region. Its pair-wise interactions with three independent known risk alleles for CaP within the 8q24 region suggest different 8q24 susceptibility loci may be related by some common underlying biologic mechanism. It is hard to speculate what that mechanism may be because it is a gene-poor region, but one possible explanation for the observed interactions is long-range gene regulation. rs4314620 resides in a sub-region of 8q24 that is associated with bladder cancer () (44) and contains two regulatory regions for the oncogene MYC that flanks 8q24 Region 4 (45).
In our analysis of CGEMS Stage I data, one SNP-pair exceeded genome-wide significance for interaction in the conditional scan for 8q24 Region 4. The result was unlikely to be due to genotyping error as a second SNP in strong LD with the original signal also showed strong significance. Yet, the interaction failed to replicate when we genotyped the SNPs in an additional 2439 cases and 2241 controls in CGEMS Stage II. This example illustrates the challenge of employing rank p-values for prioritization of interaction as well as of establishing the threshold needed for a conclusive finding. We note that Stage I of CGEMS, which included 1175 cases and 1100 controls, was underpowered for study of interactions (). In the future, one way of reducing such false positives would be to consider Bayesian methods (46) that can incorporate both power and biological plausibility into measures of statistical significance. Another strategy to gain power, particularly for initial GWAS stages, is meta-analysis of multiple studies, enabling one to increase sample size while retaining the full array of SNPs.
[Figure: Power curves for detecting interactions in CGEMS at genome-wide significance.]
Due to the scarcity of highly significant findings, we carefully examined the power of CGEMS Stages I and II to detect interactions at genome-wide significance levels (alpha = 1.0e-7 and 1.85e-6 for Stage I and II, respectively). In these calculations, we focused only on quantitative interactions, where the effect of one locus can be modified, but not reversed, by the other locus and vice versa. Stage I of CGEMS had virtually no power to detect interaction odds ratios in the scenarios we examined, which ranged from 1.13 to 2.05. The larger Stage II, in contrast, had high power for detecting modest to large interaction odds ratios (≥ 1.7) even after accounting for the fact that some power is lost due to selection of the SNPs at Stage I by main effect only. It is notable that under a model of quantitative interaction, a larger interaction odds ratio also corresponds to larger main effects. Thus, the loss of power at Stage I due to the selection by main effect is often small when the interaction odds ratio and MAFs are reasonably large. In contrast, in the presence of qualitative interaction, the main effects of loci could be very weak or even non-existent even when the interaction odds ratio is large. The power for detecting such loci in our analysis is low, as the probability of selecting them for Stage II is low.
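The dependence of power on sample size and significance threshold can be sketched with a simple Wald-test approximation. This sketch is not the calculation behind our power curves: it assumes binary carrier coding, independence of the two SNPs in controls, a rare disease, and a standard case-control contrast rather than the EB method, and it ignores the loss of power from Stage I selection. The carrier frequencies and odds ratios are hypothetical; the sample sizes and alpha levels are those quoted in the text, with the Stage II samples analyzed alone as a further assumption.

```python
import math
from statistics import NormalDist

def interaction_power(n_cases, n_controls, f1, f2, or1, or2, or_int, alpha):
    """Approximate two-sided Wald power for a multiplicative interaction.
    f1, f2: carrier frequencies in controls (assumed independent).
    or1, or2: odds ratios for carrying one variant alone; or_int:
    interaction odds ratio."""
    # Control cell probabilities for carrier status 11, 10, 01, 00.
    p = [f1 * f2, f1 * (1 - f2), (1 - f1) * f2, (1 - f1) * (1 - f2)]
    # Relative odds of disease in each cell (rare-disease assumption).
    rel = [or1 * or2 * or_int, or1, or2, 1.0]
    weights = [pi * ri for pi, ri in zip(p, rel)]
    total = sum(weights)
    q = [wi / total for wi in weights]  # case cell probabilities
    # Variance of the log interaction odds ratio: sum of inverse
    # expected cell counts over all eight cells.
    var = (sum(1 / (n_cases * qi) for qi in q)
           + sum(1 / (n_controls * pi) for pi in p))
    z_effect = abs(math.log(or_int)) / math.sqrt(var)
    nd = NormalDist()
    z_crit = -nd.inv_cdf(alpha / 2)
    return nd.cdf(z_effect - z_crit) + nd.cdf(-z_effect - z_crit)

# Hypothetical scenario: 30% carriers at each locus, modest main effects.
for label, n1, n0, alpha in [("Stage I", 1175, 1100, 1.0e-7),
                             ("Stage II", 2439, 2241, 1.85e-6)]:
    power = interaction_power(n1, n0, 0.3, 0.3, 1.2, 1.2, 1.7, alpha)
    print(label, round(power, 3))
```

Under these simplifying assumptions the qualitative pattern matches the text: the smaller stage with the stricter threshold has far less power than the larger stage for the same interaction odds ratio. The absolute values depend on the assumed frequencies and coding and should not be read as a reproduction of the reported curves.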
We conclude that the susceptibility SNPs we have studied are unlikely to have quantitative interactions of large magnitude with other SNPs in the genome. Theoretical calculations (48), as well as a lack of findings of epistasis for other diseases, also point towards the possibility that large non-multiplicative or non-additive effects may not be abundant in the etiology of complex traits. It is possible that our study has missed qualitative interactions, but the biologic plausibility of the presence of many such extreme types of interaction is questionable. It is also possible that epistasis, if it plays an important role in the etiology of CaP, will have a much more complex form than the pairwise SNP-SNP interactions we studied. Finding such higher order interactions in large scale studies, however, will remain an intrinsically challenging problem because of both the computationally daunting task of exploring all possible multi-locus models and the requirement for extremely large sample sizes that will be necessary to achieve sufficient power while minimizing the chance of false positives.
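The scale of the multi-locus search can be made concrete with a short count of candidate models. The array size of 500,000 SNPs and the family-wise error rate of 0.05 below are assumptions chosen only for illustration, and the Bonferroni-style threshold is a generic correction rather than the thresholds used in our scans.

```python
import math

def n_models(n_snps, order):
    """Number of distinct SNP subsets of the given interaction order."""
    return math.comb(n_snps, order)

N_SNPS = 500_000  # hypothetical array size, for illustration only
FWER = 0.05       # assumed family-wise error rate

for order in (2, 3, 4):
    count = n_models(N_SNPS, order)
    # Per-test threshold if every subset were tested once (Bonferroni).
    print(order, f"{count:.3e}", f"{FWER / count:.3e}")
```

Each additional locus multiplies the number of candidate models by roughly the number of SNPs divided by the interaction order, so both the computational load and the stringency of any multiplicity correction grow very quickly beyond pairs.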
In the future, detecting evidence of gene-gene interactions through study of statistical interactions between SNP markers will likely require very large sample sizes that are achievable only by sharing individual level data in consortiums of GWAS. For smaller scale studies, the exercise of exploring gene-gene interactions is unlikely to lead to definitive findings, but it can be useful in generating lists of loci that require follow-up in replication studies. Incorporating biological knowledge from reliable pathway and network databases could enhance the power for detection, validation, and interpretation of interaction.
Our exploration of gene-gene interactions in CGEMS identified a list of SNPs that require future replication effort with varying degrees of priority (). We hope its public availability will motivate replication studies. The EB method we highlight is appropriate for those analyses, as it is both powerful and robust. We consider our most notable finding to be an interaction between SNPs in EPAS1 and 8q24 Region 3 because it generates a preliminary hypothesis, centered on the recently characterized gene POU5F1B, about the poorly understood association of 8q24 Region 3 with multiple epithelial cancers. A second result with high priority for follow-up is an interaction between PRRX2 and the KLK2-KLK3 region. It suggests that the functional relevance of the KLK2-KLK3 susceptibility region in the etiology of CaP may involve cellular proliferation.